restructure results

author: Christian Krinitsin <mail@krinitsin.com> 2025-07-03 19:39:53 +0200
committer: Christian Krinitsin <mail@krinitsin.com> 2025-07-03 19:39:53 +0200
commit: dee4dcba78baf712cab403d47d9db319ab7f95d6
tree: 418478faf06786701a56268672f73d6b0b4eb239 /results/classifier/016/none
parent: 4d9e26c0333abd39bdbd039dcdb30ed429c475ba
11 files changed, 0 insertions, 7675 deletions
diff --git a/results/classifier/016/none/23300761 b/results/classifier/016/none/23300761
deleted file mode 100644
index 2a3e6f16..00000000
--- a/results/classifier/016/none/23300761
+++ /dev/null
@@ -1,340 +0,0 @@
-i386: 0.475
-x86: 0.171
-debug: 0.052
-files: 0.038
-performance: 0.030
-register: 0.029
-virtual: 0.027
-PID: 0.025
-TCG: 0.019
-semantic: 0.018
-operating system: 0.017
-socket: 0.013
-boot: 0.013
-hypervisor: 0.012
-device: 0.012
-user-level: 0.011
-risc-v: 0.010
-alpha: 0.007
-ppc: 0.006
-VMM: 0.005
-vnc: 0.004
-network: 0.004
-architecture: 0.003
-permissions: 0.003
-assembly: 0.003
-peripherals: 0.003
-kernel: 0.002
-arm: 0.002
-graphic: 0.002
-mistranslation: 0.001
-KVM: 0.000
-
-[Qemu-devel] [BUG] 216 Alerts reported by LGTM for QEMU (some might be release critical)
-
-Hi,
-LGTM reports 16 errors, 81 warnings and 119 recommendations:
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
-Some of them are already known (wrong format strings), others look like
-real errors:
-- several multiplication results which don't work as they should in
-contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
-32 bit!), target/i386/translate.c and other files
-- potential buffer overflows in gdbstub.c and other files
-I am afraid that the overflows in the block code are release critical,
-maybe that in target/i386/translate.c and other errors, too.
-About half of the alerts are issues which can be fixed later.
-
-Regards
-
-Stefan
-
-On 13/07/19 19:46, Stefan Weil wrote:
->
->
-LGTM reports 16 errors, 81 warnings and 119 recommendations:
->
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
->
->
-Some of them are already known (wrong format strings), others look like
->
-real errors:
->
->
-- several multiplication results which don't work as they should in
->
-contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
->
-32 bit!), target/i386/translate.c and other files
-m->nb_clusters here is limited by s->l2_slice_size (see for example
-handle_alloc) so I wouldn't be surprised if this is a false positive.  I
-couldn't find this particular multiplication in Coverity, but it has
-about 250 issues marked as intentional or false positive so there's
-probably a lot of overlap with what LGTM found.
-
-Paolo
-
-Am 13.07.2019 um 21:42 schrieb Paolo Bonzini:
->
-On 13/07/19 19:46, Stefan Weil wrote:
->
-> LGTM reports 16 errors, 81 warnings and 119 recommendations:
->
->
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
->
->
->
-> Some of them are already known (wrong format strings), others look like
->
-> real errors:
->
->
->
-> - several multiplication results which don't work as they should in
->
-> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
->
-> 32 bit!), target/i386/translate.c and other files
->
-m->nb_clusters here is limited by s->l2_slice_size (see for example
->
-handle_alloc) so I wouldn't be surprised if this is a false positive.  I
->
-couldn't find this particular multiplication in Coverity, but it has
->
-about 250 issues marked as intentional or false positive so there's
->
-probably a lot of overlap with what LGTM found.
->
->
-Paolo
->
-From other projects I know that there is a certain overlap between the
-results from Coverity Scan and LGTM, but it is good to have both
-analyzers, and the results from LGTM are typically quite reliable.
-
-Even if we know that there is no multiplication overflow, the code could
-be modified. Either the assigned value should use the same data type as
-the factors (possible when there is never an overflow, avoids a size
-extension), or the multiplication could use the larger data type by
-adding a type cast to one of the factors (then an overflow cannot
-happen, static code analysers and human reviewers have an easier job,
-but the multiplication costs more time).
-
-Stefan
-
-Am 14.07.2019 um 15:28 hat Stefan Weil geschrieben:
->
-Am 13.07.2019 um 21:42 schrieb Paolo Bonzini:
->
-> On 13/07/19 19:46, Stefan Weil wrote:
->
->> LGTM reports 16 errors, 81 warnings and 119 recommendations:
->
->>
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
->
->>
->
->> Some of them are already known (wrong format strings), others look like
->
->> real errors:
->
->>
->
->> - several multiplication results which don't work as they should in
->
->> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
->
->> 32 bit!), target/i386/translate.c and other files
-Request sizes are limited to 32 bit in the generic block layer before
-they are even passed to the individual block drivers, so most if not all
-of these are going to be false positives.
-
->
-> m->nb_clusters here is limited by s->l2_slice_size (see for example
->
-> handle_alloc) so I wouldn't be surprised if this is a false positive.  I
->
-> couldn't find this particular multiplication in Coverity, but it has
->
-> about 250 issues marked as intentional or false positive so there's
->
-> probably a lot of overlap with what LGTM found.
->
->
->
-> Paolo
->
->
-From other projects I know that there is a certain overlap between the
->
-results from Coverity Scan and LGTM, but it is good to have both
->
-analyzers, and the results from LGTM are typically quite reliable.
->
->
-Even if we know that there is no multiplication overflow, the code could
->
-be modified. Either the assigned value should use the same data type as
->
-the factors (possible when there is never an overflow, avoids a size
->
-extension), or the multiplication could use the larger data type by
->
-adding a type cast to one of the factors (then an overflow cannot
->
-happen, static code analysers and human reviewers have an easier job,
->
-but the multiplication costs more time).
-But if you look at the code we're talking about, you see that it's
-complaining about things where being more explicit would make things
-less readable.
-
-For example, it complains about the multiplication in this line:
-
-    s->file_size += n * s->header.cluster_size;
-
-We know that n * s->header.cluster_size fits in 32 bits, but
-s->file_size is 64 bits (and has to be 64 bits). Do you really think we
-should introduce another uint32_t variable to store the intermediate
-result? And if we cast n to uint64_t, not only might the multiplication
-cost more time, but also human readers would wonder why the result could
-become larger than 32 bits. So a cast would be misleading.
-
-
-It also complains about this line:
-
-    ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size,
-                        PREALLOC_MODE_OFF, &local_err);
-
-Here, we don't even assign the result to a 64 bit variable, but just
-pass it to a function which takes a 64 bit parameter. Again, I don't
-think introducing additional variables for the intermediate result or
-adding casts would be an improvement of the situation.
-
-
-So I don't think this is a good enough tool to base our code on what it
-does and doesn't understand. It would have too much of a negative impact
-on our code. We'd rather need a way to mark false positives as such and
-move on without changing the code in such cases.
-
-Kevin
-
-On Sat, 13 Jul 2019 at 18:46, Stefan Weil <address@hidden> wrote:
->
-LGTM reports 16 errors, 81 warnings and 119 recommendations:
->
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
-I had a look at some of these before, but mostly I came
-to the conclusion that it wasn't worth trying to put the
-effort into keeping up with the site because they didn't
-seem to provide any useful way to mark things as false
-positives. Coverity has its flaws but at least you can do
-that kind of thing in its UI (it runs at about a 33% fp
-rate, I think.) "Analyzer thinks this multiply can overflow
-but in fact it's not possible" is quite a common false
-positive cause...
-
-Anyway, if you want to fish out specific issues, analyse
-whether they're false positive or real, and report them
-to the mailing list as followups to the patches which
-introduced the issue, that's probably the best way for
-us to make use of this analyzer. (That is essentially
-what I do for coverity.)
-
-thanks
--- PMM
-
-Am 14.07.2019 um 19:30 schrieb Peter Maydell:
-[...]
->
-"Analyzer thinks this multiply can overflow
->
-but in fact it's not possible" is quite a common false
->
-positive cause...
-The analysers don't complain because a multiply can overflow.
-
-They complain because the code indicates that a larger result is
-expected, for example uint64_t = uint32_t * uint32_t. They would not
-complain for the same multiplication if it were assigned to a uint32_t.
-
-So there is a simple solution to write the code in a way which avoids
-false positives...
-
-Stefan
-
-Stefan Weil <address@hidden> writes:
-
->
-Am 14.07.2019 um 19:30 schrieb Peter Maydell:
->
-[...]
->
-> "Analyzer thinks this multiply can overflow
->
-> but in fact it's not possible" is quite a common false
->
-> positive cause...
->
->
->
-The analysers don't complain because a multiply can overflow.
->
->
-They complain because the code indicates that a larger result is
->
-expected, for example uint64_t = uint32_t * uint32_t. They would not
->
-complain for the same multiplication if it were assigned to a uint32_t.
-I agree this is an anti-pattern.
-
->
-So there is a simple solution to write the code in a way which avoids
->
-false positives...
-You wrote elsewhere in this thread:
-
-    Either the assigned value should use the same data type as the
-    factors (possible when there is never an overflow, avoids a size
-    extension), or the multiplication could use the larger data type by
-    adding a type cast to one of the factors (then an overflow cannot
-    happen, static code analysers and human reviewers have an easier
-    job, but the multiplication costs more time).
-
-Makes sense to me.
-
-On 7/14/19 5:30 PM, Peter Maydell wrote:
->
-I had a look at some of these before, but mostly I came
->
-to the conclusion that it wasn't worth trying to put the
->
-effort into keeping up with the site because they didn't
->
-seem to provide any useful way to mark things as false
->
-positives. Coverity has its flaws but at least you can do
->
-that kind of thing in its UI (it runs at about a 33% fp
->
-rate, I think.)
-Yes, LGTM wants you to modify the source code with
-
-  /* lgtm [cpp/some-warning-code] */
-
-and on the same line as the reported problem.  Which is mildly annoying in that
-you're definitely committing to LGTM in the long term.  Also for any
-non-trivial bit of code, it will almost certainly run over 80 columns.
-
-
-r~
-
diff --git a/results/classifier/016/none/42613410 b/results/classifier/016/none/42613410
deleted file mode 100644
index 387e80bd..00000000
--- a/results/classifier/016/none/42613410
+++ /dev/null
@@ -1,176 +0,0 @@
-network: 0.116
-x86: 0.043
-TCG: 0.038
-operating system: 0.031
-files: 0.031
-register: 0.030
-socket: 0.029
-virtual: 0.026
-i386: 0.021
-ppc: 0.020
-PID: 0.020
-VMM: 0.020
-hypervisor: 0.018
-arm: 0.018
-device: 0.017
-risc-v: 0.016
-alpha: 0.016
-boot: 0.013
-vnc: 0.013
-semantic: 0.012
-debug: 0.010
-KVM: 0.006
-kernel: 0.005
-user-level: 0.005
-performance: 0.004
-peripherals: 0.003
-architecture: 0.003
-permissions: 0.002
-graphic: 0.002
-assembly: 0.001
-mistranslation: 0.001
-
-[Qemu-devel] [PATCH, Bug 1612908] scripts: Add TCP endpoints for qom-* scripts
-
-From: Carl Allendorph <address@hidden>
-
-I've created a patch for bug #1612908. The current docs for the scripts
-in the "scripts/qmp/" directory suggest that both unix sockets and
-tcp endpoints can be used. The TCP endpoints don't work for most of the
-scripts, with notable exception of 'qmp-shell'. This patch attempts to
-refactor the process of distinguishing between unix path endpoints and
-tcp endpoints to work for all of these scripts.
-
-Carl Allendorph (1):
-  scripts: Add ability for qom-* python scripts to target tcp endpoints
-
- scripts/qmp/qmp-shell | 22 ++--------------------
- scripts/qmp/qmp.py    | 23 ++++++++++++++++++++---
- 2 files changed, 22 insertions(+), 23 deletions(-)
-
---
-2.7.4
-
-From: Carl Allendorph <address@hidden>
-
-The current code for QEMUMonitorProtocol accepts both a unix socket
-endpoint as a string and a tcp endpoint as a tuple. Most of the scripts
-that use this class don't massage the command line argument to generate
-a tuple. This patch refactors qmp-shell slightly to reuse the existing
-parsing of the "host:port" string for all the qom-* scripts.
-
-Signed-off-by: Carl Allendorph <address@hidden>
----
- scripts/qmp/qmp-shell | 22 ++--------------------
- scripts/qmp/qmp.py    | 23 ++++++++++++++++++++---
- 2 files changed, 22 insertions(+), 23 deletions(-)
-
-diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
-index 0373b24..8a2a437 100755
---- a/scripts/qmp/qmp-shell
-+++ b/scripts/qmp/qmp-shell
-@@ -83,9 +83,6 @@ class QMPCompleter(list):
- class QMPShellError(Exception):
-     pass
- 
--class QMPShellBadPort(QMPShellError):
--    pass
--
- class FuzzyJSON(ast.NodeTransformer):
-     '''This extension of ast.NodeTransformer filters literal "true/false/null"
-     values in an AST and replaces them by proper "True/False/None" values that
-@@ -103,28 +100,13 @@ class FuzzyJSON(ast.NodeTransformer):
- #       _execute_cmd()). Let's design a better one.
- class QMPShell(qmp.QEMUMonitorProtocol):
-     def __init__(self, address, pretty=False):
--        qmp.QEMUMonitorProtocol.__init__(self, self.__get_address(address))
-+        qmp.QEMUMonitorProtocol.__init__(self, address)
-         self._greeting = None
-         self._completer = None
-         self._pretty = pretty
-         self._transmode = False
-         self._actions = list()
- 
--    def __get_address(self, arg):
--        """
--        Figure out if the argument is in the port:host form, if it's not it's
--        probably a file path.
--        """
--        addr = arg.split(':')
--        if len(addr) == 2:
--            try:
--                port = int(addr[1])
--            except ValueError:
--                raise QMPShellBadPort
--            return ( addr[0], port )
--        # socket path
--        return arg
--
-     def _fill_completion(self):
-         for cmd in self.cmd('query-commands')['return']:
-             self._completer.append(cmd['name'])
-@@ -400,7 +382,7 @@ def main():
- 
-         if qemu is None:
-             fail_cmdline()
--    except QMPShellBadPort:
-+    except qmp.QMPShellBadPort:
-         die('bad port number in command-line')
- 
-     try:
-diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py
-index 62d3651..261ece8 100644
---- a/scripts/qmp/qmp.py
-+++ b/scripts/qmp/qmp.py
-@@ -25,21 +25,23 @@ class QMPCapabilitiesError(QMPError):
- class QMPTimeoutError(QMPError):
-     pass
- 
-+class QMPShellBadPort(QMPError):
-+    pass
-+
- class QEMUMonitorProtocol:
-     def __init__(self, address, server=False, debug=False):
-         """
-         Create a QEMUMonitorProtocol class.
- 
-         @param address: QEMU address, can be either a unix socket path (string)
--                        or a tuple in the form ( address, port ) for a TCP
--                        connection
-+                        or a TCP endpoint (string in the format "host:port")
-         @param server: server mode listens on the socket (bool)
-         @raise socket.error on socket connection errors
-         @note No connection is established, this is done by the connect() or
-               accept() methods
-         """
-         self.__events = []
--        self.__address = address
-+        self.__address = self.__get_address(address)
-         self._debug = debug
-         self.__sock = self.__get_sock()
-         if server:
-@@ -47,6 +49,21 @@ class QEMUMonitorProtocol:
-             self.__sock.bind(self.__address)
-             self.__sock.listen(1)
- 
-+    def __get_address(self, arg):
-+        """
-+        Figure out if the argument is in the port:host form, if it's not it's
-+        probably a file path.
-+        """
-+        addr = arg.split(':')
-+        if len(addr) == 2:
-+            try:
-+                port = int(addr[1])
-+            except ValueError:
-+                raise QMPShellBadPort
-+            return ( addr[0], port )
-+        # socket path
-+        return arg
-+
-     def __get_sock(self):
-         if isinstance(self.__address, tuple):
-             family = socket.AF_INET
--- 
-2.7.4
-
diff --git a/results/classifier/016/none/42974450 b/results/classifier/016/none/42974450
deleted file mode 100644
index 9ab3582a..00000000
--- a/results/classifier/016/none/42974450
+++ /dev/null
@@ -1,456 +0,0 @@
-operating system: 0.713
-kernel: 0.463
-debug: 0.442
-hypervisor: 0.390
-x86: 0.334
-virtual: 0.259
-files: 0.192
-TCG: 0.182
-register: 0.171
-device: 0.116
-KVM: 0.071
-i386: 0.064
-VMM: 0.054
-PID: 0.052
-ppc: 0.049
-boot: 0.047
-assembly: 0.037
-architecture: 0.035
-socket: 0.033
-network: 0.028
-user-level: 0.023
-risc-v: 0.023
-arm: 0.022
-semantic: 0.017
-vnc: 0.014
-alpha: 0.007
-peripherals: 0.007
-performance: 0.005
-permissions: 0.004
-graphic: 0.002
-mistranslation: 0.001
-
-[Bug Report] Possible Missing Endianness Conversion
-
-The virtio packed virtqueue support patch[1] suggests converting
-endianness by lines:
-
-virtio_tswap16s(vdev, &e->off_wrap);
-virtio_tswap16s(vdev, &e->flags);
-
-Though both of these conversion statements aren't present in the
-latest qemu code here[2]
-
-Is this intentional?
-
-[1]:
-https://mail.gnu.org/archive/html/qemu-block/2019-10/msg01492.html
-[2]:
-https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L314
-
-CCing Jason.
-
-On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
->
->
-The virtio packed virtqueue support patch[1] suggests converting
->
-endianness by lines:
->
->
-virtio_tswap16s(vdev, &e->off_wrap);
->
-virtio_tswap16s(vdev, &e->flags);
->
->
-Though both of these conversion statements aren't present in the
->
-latest qemu code here[2]
->
->
-Is this intentional?
-Good catch!
-
-It looks like it was removed (maybe by mistake) by commit
-d152cdd6f6 ("virtio: use virtio accessor to access packed event")
-
-Jason can you confirm that?
-
-Thanks,
-Stefano
-
->
->
-[1]:
-https://mail.gnu.org/archive/html/qemu-block/2019-10/msg01492.html
->
-[2]:
-https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L314
->
-
-On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
->
->
-CCing Jason.
->
->
-On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
->
->
->
-> The virtio packed virtqueue support patch[1] suggests converting
->
-> endianness by lines:
->
->
->
-> virtio_tswap16s(vdev, &e->off_wrap);
->
-> virtio_tswap16s(vdev, &e->flags);
->
->
->
-> Though both of these conversion statements aren't present in the
->
-> latest qemu code here[2]
->
->
->
-> Is this intentional?
->
->
-Good catch!
->
->
-It looks like it was removed (maybe by mistake) by commit
->
-d152cdd6f6 ("virtio: use virtio accessor to access packed event")
-That commit changes from:
-
--    address_space_read_cached(cache, off_off, &e->off_wrap,
--                              sizeof(e->off_wrap));
--    virtio_tswap16s(vdev, &e->off_wrap);
-
-which does a byte read of 2 bytes and then swaps the bytes
-depending on the host endianness and the value of
-virtio_access_is_big_endian()
-
-to this:
-
-+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
-
-virtio_lduw_phys_cached() is a small function which calls
-either lduw_be_phys_cached() or lduw_le_phys_cached()
-depending on the value of virtio_access_is_big_endian().
-(And lduw_be_phys_cached() and lduw_le_phys_cached() do
-the right thing for the host-endianness to do a "load
-a specifically big or little endian 16-bit value".)
-
-Which is to say that because we use a load/store function that's
-explicit about the size of the data type it is accessing, the
-function itself can handle doing the load as big or little
-endian, rather than the calling code having to do a manual swap after
-it has done a load-as-bag-of-bytes. This is generally preferable
-as it's less error-prone.
-
-(Explicit swap-after-loading still has a place where the
-code is doing a load of a whole structure out of the
-guest and then swapping each struct field after the fact,
-because it means we can do a single load-from-guest-memory
-rather than a whole sequence of calls all the way down
-through the memory subsystem.)
-
-thanks
--- PMM
-
-On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
-On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
-CCing Jason.
-
-On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
->
-> The virtio packed virtqueue support patch[1] suggests converting
-> endianness by lines:
->
-> virtio_tswap16s(vdev, &e->off_wrap);
-> virtio_tswap16s(vdev, &e->flags);
->
-> Though both of these conversion statements aren't present in the
-> latest qemu code here[2]
->
-> Is this intentional?
-
-Good catch!
-
-It looks like it was removed (maybe by mistake) by commit
-d152cdd6f6 ("virtio: use virtio accessor to access packed event")
-That commit changes from:
-
--    address_space_read_cached(cache, off_off, &e->off_wrap,
--                              sizeof(e->off_wrap));
--    virtio_tswap16s(vdev, &e->off_wrap);
-
-which does a byte read of 2 bytes and then swaps the bytes
-depending on the host endianness and the value of
-virtio_access_is_big_endian()
-
-to this:
-
-+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
-
-virtio_lduw_phys_cached() is a small function which calls
-either lduw_be_phys_cached() or lduw_le_phys_cached()
-depending on the value of virtio_access_is_big_endian().
-(And lduw_be_phys_cached() and lduw_le_phys_cached() do
-the right thing for the host-endianness to do a "load
-a specifically big or little endian 16-bit value".)
-
-Which is to say that because we use a load/store function that's
-explicit about the size of the data type it is accessing, the
-function itself can handle doing the load as big or little
-endian, rather than the calling code having to do a manual swap after
-it has done a load-as-bag-of-bytes. This is generally preferable
-as it's less error-prone.
-Thanks for the details!
-
-So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
-
-I mean:
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
-index 893a072c9d..2e5e67bdb9 100644
---- a/hw/virtio/virtio.c
-+++ b/hw/virtio/virtio.c
-@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
-     /* Make sure flags is seen before off_wrap */
-     smp_rmb();
-     e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
--    virtio_tswap16s(vdev, &e->flags);
- }
-
- static void vring_packed_off_wrap_write(VirtIODevice *vdev,
-
-Thanks,
-Stefano
-(Explicit swap-after-loading still has a place where the
-code is doing a load of a whole structure out of the
-guest and then swapping each struct field after the fact,
-because it means we can do a single load-from-guest-memory
-rather than a whole sequence of calls all the way down
-through the memory subsystem.)
-
-thanks
--- PMM
-
-On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella <sgarzare@redhat.com> wrote:
->
->
-On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
->
->On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
->
->>
->
->> CCing Jason.
->
->>
->
->> On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
->
->> >
->
->> > The virtio packed virtqueue support patch[1] suggests converting
->
->> > endianness by lines:
->
->> >
->
->> > virtio_tswap16s(vdev, &e->off_wrap);
->
->> > virtio_tswap16s(vdev, &e->flags);
->
->> >
->
->> > Though both of these conversion statements aren't present in the
->
->> > latest qemu code here[2]
->
->> >
->
->> > Is this intentional?
->
->>
->
->> Good catch!
->
->>
->
->> It looks like it was removed (maybe by mistake) by commit
->
->> d152cdd6f6 ("virtio: use virtio accessor to access packed event")
->
->
->
->That commit changes from:
->
->
->
->-    address_space_read_cached(cache, off_off, &e->off_wrap,
->
->-                              sizeof(e->off_wrap));
->
->-    virtio_tswap16s(vdev, &e->off_wrap);
->
->
->
->which does a byte read of 2 bytes and then swaps the bytes
->
->depending on the host endianness and the value of
->
->virtio_access_is_big_endian()
->
->
->
->to this:
->
->
->
->+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
->
->
->
->virtio_lduw_phys_cached() is a small function which calls
->
->either lduw_be_phys_cached() or lduw_le_phys_cached()
->
->depending on the value of virtio_access_is_big_endian().
->
->(And lduw_be_phys_cached() and lduw_le_phys_cached() do
->
->the right thing for the host-endianness to do a "load
->
->a specifically big or little endian 16-bit value".)
->
->
->
->Which is to say that because we use a load/store function that's
->
->explicit about the size of the data type it is accessing, the
->
->function itself can handle doing the load as big or little
->
->endian, rather than the calling code having to do a manual swap after
->
->it has done a load-as-bag-of-bytes. This is generally preferable
->
->as it's less error-prone.
->
->
-Thanks for the details!
->
->
-So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
->
->
-I mean:
->
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
->
-index 893a072c9d..2e5e67bdb9 100644
->
---- a/hw/virtio/virtio.c
->
-+++ b/hw/virtio/virtio.c
->
-@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
->
-/* Make sure flags is seen before off_wrap */
->
-smp_rmb();
->
-e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
->
--    virtio_tswap16s(vdev, &e->flags);
->
-}
-That definitely looks like it's probably not correct...
-
--- PMM
-
-On Fri, Jun 28, 2024 at 03:53:09PM GMT, Peter Maydell wrote:
-On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella <sgarzare@redhat.com> wrote:
-On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
->On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
->>
->> CCing Jason.
->>
->> On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
->> >
->> > The virtio packed virtqueue support patch[1] suggests converting
->> > endianness by lines:
->> >
->> > virtio_tswap16s(vdev, &e->off_wrap);
->> > virtio_tswap16s(vdev, &e->flags);
->> >
->> > Though both of these conversion statements aren't present in the
->> > latest qemu code here[2]
->> >
->> > Is this intentional?
->>
->> Good catch!
->>
->> It looks like it was removed (maybe by mistake) by commit
->> d152cdd6f6 ("virtio: use virtio accessor to access packed event")
->
->That commit changes from:
->
->-    address_space_read_cached(cache, off_off, &e->off_wrap,
->-                              sizeof(e->off_wrap));
->-    virtio_tswap16s(vdev, &e->off_wrap);
->
->which does a byte read of 2 bytes and then swaps the bytes
->depending on the host endianness and the value of
->virtio_access_is_big_endian()
->
->to this:
->
->+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
->
->virtio_lduw_phys_cached() is a small function which calls
->either lduw_be_phys_cached() or lduw_le_phys_cached()
->depending on the value of virtio_access_is_big_endian().
->(And lduw_be_phys_cached() and lduw_le_phys_cached() do
->the right thing for the host-endianness to do a "load
->a specifically big or little endian 16-bit value".)
->
->Which is to say that because we use a load/store function that's
->explicit about the size of the data type it is accessing, the
->function itself can handle doing the load as big or little
->endian, rather than the calling code having to do a manual swap after
->it has done a load-as-bag-of-bytes. This is generally preferable
->as it's less error-prone.
-
-Thanks for the details!
-
-So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
-
-I mean:
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
-index 893a072c9d..2e5e67bdb9 100644
---- a/hw/virtio/virtio.c
-+++ b/hw/virtio/virtio.c
-@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
-      /* Make sure flags is seen before off_wrap */
-      smp_rmb();
-      e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
--    virtio_tswap16s(vdev, &e->flags);
-  }
-That definitely looks like it's probably not correct...
-Yeah, I just sent that patch:
-20240701075208.19634-1-sgarzare@redhat.com
-https://lore.kernel.org/qemu-devel/
-20240701075208.19634-1-sgarzare@redhat.com
-We can continue the discussion there.
-
-Thanks,
-Stefano
-
diff --git a/results/classifier/016/none/48245039 b/results/classifier/016/none/48245039
deleted file mode 100644
index 913c2333..00000000
--- a/results/classifier/016/none/48245039
+++ /dev/null
@@ -1,557 +0,0 @@
-user-level: 0.787
-performance: 0.642
-operating system: 0.416
-risc-v: 0.375
-debug: 0.341
-x86: 0.185
-TCG: 0.172
-ppc: 0.166
-device: 0.139
-arm: 0.119
-VMM: 0.111
-boot: 0.111
-files: 0.104
-PID: 0.099
-vnc: 0.095
-register: 0.088
-socket: 0.085
-network: 0.081
-i386: 0.071
-alpha: 0.059
-hypervisor: 0.056
-virtual: 0.055
-peripherals: 0.054
-kernel: 0.026
-semantic: 0.025
-architecture: 0.011
-KVM: 0.010
-mistranslation: 0.005
-assembly: 0.004
-graphic: 0.004
-permissions: 0.002
-
-[Qemu-devel] [BUG] gcov support appears to be broken
-
-Hello, according to our docs, here is the procedure that should produce
-coverage report for execution of the complete "make check":
-
-#./configure --enable-gcov
-#make
-#make check
-#make coverage-report
-
-It seems that first three commands execute as expected. (For example, there are 
-plenty of files generated by "make check" that would've not been generated if 
-"enable-gcov" hadn't been chosen.) However, the last command complains about 
-some missing files related to FP support. If those files are added (for 
-example, artificially, using "touch <missing-file>"), then it starts complaining
-about missing some decodetree-generated files. Other kinds of files are 
-involved too.
-
-It would be nice to have coverage support working. Please somebody take a look, 
-or explain if I make a mistake or misunderstood our gcov support.
-
-Yours,
-Aleksandar
-
-On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote:
->
->
-Hello, according to our docs, here is the procedure that should produce
->
-coverage report for execution of the complete "make check":
->
->
-#./configure --enable-gcov
->
-#make
->
-#make check
->
-#make coverage-report
->
->
-It seems that first three commands execute as expected. (For example, there
->
-are plenty of files generated by "make check" that would've not been
->
-generated if "enable-gcov" hadn't been chosen.) However, the last command
->
-complains about some missing files related to FP support. If those files are
->
-added (for example, artificially, using "touch <missing-file>"), then it
->
-starts complaining about missing some decodetree-generated files. Other kinds
->
-of files are involved too.
->
->
-It would be nice to have coverage support working. Please somebody take a
->
-look, or explain if I make a mistake or misunderstood our gcov support.
-Cc'ing Alex who's probably the closest we have to a gcov expert.
-
-(make/make check of a --enable-gcov build is in the set of things our
-Travis CI setup runs, so we do defend that part against regressions.)
-
-thanks
--- PMM
-
-Peter Maydell <address@hidden> writes:
-
->
-On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote:
->
->
->
-> Hello, according to our docs, here is the procedure that should produce a
->
-> coverage report for execution of the complete "make check":
->
->
->
-> #./configure --enable-gcov
->
-> #make
->
-> #make check
->
-> #make coverage-report
->
->
->
-> It seems that first three commands execute as expected. (For example,
->
-> there are plenty of files generated by "make check" that would've not
->
-> been generated if "enable-gcov" hadn't been chosen.) However, the
->
-> last command complains about some missing files related to FP
->
-> support. If those files are added (for example, artificially, using
->
-> "touch <missing-file>"), then it starts complaining about missing some
->
-> decodetree-generated files. Other kinds of files are involved too.
-The gcov tool is fairly noisy about missing files but that just
-indicates the tests haven't exercised those code paths. "make check"
-especially doesn't touch much of the TCG code and a chunk of floating
-point.
-
->
->
->
-> It would be nice to have coverage support working. Please somebody
->
-> take a look, or explain if I make a mistake or misunderstood our gcov
->
-> support.
-So your failure mode is no report is generated at all? It's working for
-me here.
-
->
->
-Cc'ing Alex who's probably the closest we have to a gcov expert.
->
->
-(make/make check of a --enable-gcov build is in the set of things our
->
-Travis CI setup runs, so we do defend that part against regressions.)
-We defend the build but I have just checked and it seems our
-check_coverage script is currently failing:
-https://travis-ci.org/stsquad/qemu/jobs/567809808#L10328
-But as it's an after_success script it doesn't fail the build.
-
->
->
-thanks
->
--- PMM
---
-Alex Bennée
-
->
-> #./configure --enable-gcov
->
-> #make
->
-> #make check
->
-> #make coverage-report
->
->
->
-> It seems that first three commands execute as expected. (For example,
->
-> there are plenty of files generated by "make check" that would've not
->
-> been generated if "enable-gcov" hadn't been chosen.) However, the
->
-> last command complains about some missing files related to FP
->
-So your failure mode is no report is generated at all? It's working for
->
-me here.
-Alex, no report is generated for my test setups (in fact, "make 
-coverage-report" even says that it explicitly deletes what appears to be the 
-main coverage report HTML file).
-
-This is the terminal output of an unsuccessful execution of "make 
-coverage-report" for recent ToT:
-
-~/Build/qemu-TOT-TEST$ make coverage-report
-make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp'
-make[1]: Nothing to be done for 'all'.
-make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp'
-        CHK version_gen.h
-  GEN     coverage-report.html
-Traceback (most recent call last):
-  File "/usr/bin/gcovr", line 1970, in <module>
-    print_html_report(covdata, options.html_details)
-  File "/usr/bin/gcovr", line 1473, in print_html_report
-    INPUT = open(data['FILENAME'], 'r')
-IOError: [Errno 2] No such file or directory: 'wrap.inc.c'
-Makefile:1048: recipe for target 
-'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed
-make: *** 
-[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1
-make: *** Deleting file 
-'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html'
-
-This instance was executed in the QEMU 3.0 source tree (so it looks like the 
-problem has existed for quite some time):
-
-~/Build/qemu-3.0$ make coverage-report
-        CHK version_gen.h
-  GEN     coverage-report.html
-Traceback (most recent call last):
-  File "/usr/bin/gcovr", line 1970, in <module>
-    print_html_report(covdata, options.html_details)
-  File "/usr/bin/gcovr", line 1473, in print_html_report
-    INPUT = open(data['FILENAME'], 'r')
-IOError: [Errno 2] No such file or directory: 
-'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c'
-Makefile:992: recipe for target 
-'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed
-make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] 
-Error 1
-make: *** Deleting file 
-'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html'
-
-Fond regards,
-Aleksandar
-
-
->
-Alex Bennée
-
->
-> #./configure --enable-gcov
->
-> #make
->
-> #make check
->
-> #make coverage-report
->
->
->
-> It seems that first three commands execute as expected. (For example,
->
-> there are plenty of files generated by "make check" that would've not
->
-> been generated if "enable-gcov" hadn't been chosen.) However, the
->
-> last command complains about some missing files related to FP
->
-So your failure mode is no report is generated at all? It's working for
->
-me here.
-Another piece of info:
-
-~/Build/qemu-TOT-TEST$ gcov --version
-gcov (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010
-Copyright (C) 2015 Free Software Foundation, Inc.
-This is free software; see the source for copying conditions.
-There is NO warranty; not even for MERCHANTABILITY or 
-FITNESS FOR A PARTICULAR PURPOSE.
-
-~/Build/qemu-TOT-TEST$ gcc --version
-gcc (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0
-Copyright (C) 2017 Free Software Foundation, Inc.
-This is free software; see the source for copying conditions.  There is NO
-warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-
-
-
-
-Alex, no report is generated for my test setups (in fact, "make 
-coverage-report" even says that it explicitly deletes what appears to be the 
-main coverage report HTML file).
-
-This is the terminal output of an unsuccessful execution of "make 
-coverage-report" for recent ToT:
-
-~/Build/qemu-TOT-TEST$ make coverage-report
-make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp'
-make[1]: Nothing to be done for 'all'.
-make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp'
-        CHK version_gen.h
-  GEN     coverage-report.html
-Traceback (most recent call last):
-  File "/usr/bin/gcovr", line 1970, in <module>
-    print_html_report(covdata, options.html_details)
-  File "/usr/bin/gcovr", line 1473, in print_html_report
-    INPUT = open(data['FILENAME'], 'r')
-IOError: [Errno 2] No such file or directory: 'wrap.inc.c'
-Makefile:1048: recipe for target 
-'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed
-make: *** 
-[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1
-make: *** Deleting file 
-'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html'
-
-This instance was executed in the QEMU 3.0 source tree (so it looks like the 
-problem has existed for quite some time):
-
-~/Build/qemu-3.0$ make coverage-report
-        CHK version_gen.h
-  GEN     coverage-report.html
-Traceback (most recent call last):
-  File "/usr/bin/gcovr", line 1970, in <module>
-    print_html_report(covdata, options.html_details)
-  File "/usr/bin/gcovr", line 1473, in print_html_report
-    INPUT = open(data['FILENAME'], 'r')
-IOError: [Errno 2] No such file or directory: 
-'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c'
-Makefile:992: recipe for target 
-'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed
-make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] 
-Error 1
-make: *** Deleting file 
-'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html'
-
-Fond regards,
-Aleksandar
-
-
->
-Alex Bennée
-
->
-> #./configure --enable-gcov
->
-> #make
->
-> #make check
->
-> #make coverage-report
->
->
->
-> It seems that first three commands execute as expected. (For example,
->
-> there are plenty of files generated by "make check" that would've not
->
-> been generated if "enable-gcov" hadn't been chosen.) However, the
->
-> last command complains about some missing files related to FP
->
-So your failure mode is no report is generated at all? It's working for
->
-me here.
-Alex, here is the thing:
-
-Seeing that my gcovr was a relatively old version (3.2, from 2014), I upgraded it from 
-the git repo to the most recent 4.1 (actually, to a dev version, from the very tip 
-of the tree), and "make coverage-report" started generating coverage reports. 
-It did emit some error messages (totally different from the previous ones), but it 
-did not stop as it used to with gcovr 3.2.
-
-Perhaps you would want to add minimum gcov/gcovr version info to our docs 
-(or at least a statement such as "this was tested with such-and-such gcc, gcov 
-and gcovr")?
-
-The coverage report looked fine at first glance, but it somewhat disappointed me 
-when I dug deeper into its content. For example, it shows very low coverage 
-for our FP code (softfloat), while, in fact, we know that "make check" contains 
-detailed tests of FP functionality. But this is most likely a separate 
-problem of a very different nature, perhaps related to the separate git repo for 
-FP tests (testfloat) that our FP tests use as a mid-layer.
-
-I'll try how everything works with my test examples, and will let you know.
-
-Your help is greatly appreciated,
-Aleksandar
-
-Fond regards,
-Aleksandar
-
-
->
-Alex Bennée
-
-Aleksandar Markovic <address@hidden> writes:
-
->
->> #./configure --enable-gcov
->
->> #make
->
->> #make check
->
->> #make coverage-report
->
->>
->
->> It seems that first three commands execute as expected. (For example,
->
->> there are plenty of files generated by "make check" that would've not
->
->> been generated if "enable-gcov" hadn't been chosen.) However, the
->
->> last command complains about some missing files related to FP
->
->
-> So your failure mode is no report is generated at all? It's working for
->
-> me here.
->
->
-Alex, here is the thing:
->
->
-Seeing that my gcovr is relatively old (2014) 3.2 version, I upgraded it from
->
-git repo to the most recent 4.1 (actually, to a dev version, from the very
->
-tip of the tree), and "make coverage-report" started generating coverage
->
-reports. It did emit some error messages (totally different than previous),
->
-but still it did not stop like it used to do with gcovr 3.2.
->
->
-Perhaps you would want to add some gcov/gcovr minimal version info in our
->
-docs. (or at least a statement "this was tested with such and such gcc, gcov
->
-and gcovr", etc.?)
->
->
-The coverage report looked fine at first glance, but it somewhat
->
-disappointed me when I dug deeper into its content - for example,
->
-it shows very low coverage for our FP code (softfloat), while, in
->
-fact, we know that "make check" contains detailed tests on FP
->
-functionalities. But this is most likely a separate problem of a very
->
-different nature, perhaps the issue of separate git repo for FP tests
->
-(testfloat) that our FP tests use as a mid-layer.
-I get:
-
-68.6 %  2593 / 3782     62.2 %  1690 / 2718
-
-Which is not bad considering we don't exercise the 80 and 128 bit
-softfloat code at all (which is not shared by the re-factored 16/32/64
-bit code).
-
->
->
-I'll try how everything works with my test examples, and will let you know.
->
->
-Your help is greatly appreciated,
->
-Aleksandar
->
->
-Fond regards,
->
-Aleksandar
->
->
->
-> Alex Bennée
---
-Alex Bennée
-
->
-> it shows very low coverage for our FP code (softfloat), while, in
->
-> fact, we know that "make check" contains detailed tests on FP
->
-> functionalities. But this is most likely a separate problem of a very
->
-> different nature, perhaps the issue of separate git repo for FP tests
->
-> (testfloat) that our FP tests use as a mid-layer.
->
->
-I get:
->
->
-68.6 %  2593 / 3782     62.2 %  1690 / 2718
->
-I would expect that kind of result too.
-
-However, I get:
-
-File:   fpu/softfloat.c                 Lines:  8       3334    0.2 %
-Date:   2019-08-05 19:56:58             Branches:       3       2376    0.1 %
-
-:(
-
-OK, I'll try to figure that out, and most likely I could live with it if it is 
-an isolated problem.
-
-Thank you for your assistance in this matter,
-Aleksandar
-
->
-Which is not bad considering we don't exercise the 80 and 128 bit
->
-softfloat code at all (which is not shared by the re-factored 16/32/64
->
-bit code).
->
->
-Alex Bennée
-
->
-> it shows very low coverage for our FP code (softfloat), while, in
->
-> fact, we know that "make check" contains detailed tests on FP
->
-> functionalities. But this is most likely a separate problem of a very
->
-> different nature, perhaps the issue of separate git repo for FP tests
->
-> (testfloat) that our FP tests use as a mid-layer.
->
->
-I get:
->
->
-68.6 %  2593 / 3782     62.2 %  1690 / 2718
->
-This problem is solved too (and it was my fault).
-
-I worked with multiple versions of QEMU, and my previous low-coverage results 
-were for QEMU 3.0, and for that version the directory tests/fp did not even 
-exist. :D (<blush>)
-
-For QEMU ToT, I now get:
-
-fpu/softfloat.c         
-        68.8 %  2592 / 3770     62.3 %  1693 / 2718
-
-which is identical for all intents and purposes to your result.
-
-Yours cordially,
-Aleksandar
-
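The thread above ends with gcovr 3.2 aborting "make coverage-report" and a 4.1 development version producing a report. A minimal sketch of a pre-flight check along those lines follows. The 4.1 threshold is only the version that happened to work for the reporter, not a documented minimum, and the helper names (`parse_version`, `gcovr_is_recent_enough`) and the assumed shape of the `gcovr --version` output are illustrative assumptions.

```python
import re
import shutil
import subprocess

# Assumption: 4.1 is simply the version that worked in the thread above,
# not a minimum stated anywhere in the QEMU docs.
ASSUMED_MIN_GCOVR = (4, 1)


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError("no version number found in: %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def gcovr_is_recent_enough(version_output, minimum=ASSUMED_MIN_GCOVR):
    """Compare the parsed version against the assumed minimum."""
    return parse_version(version_output) >= minimum


def main():
    if shutil.which("gcovr") is None:
        print("gcovr not found in PATH")
        return 1
    # Capture stdout and stderr together; only the version number is needed.
    result = subprocess.run(
        ["gcovr", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    if gcovr_is_recent_enough(result.stdout):
        print("gcovr looks recent enough for 'make coverage-report'")
        return 0
    print("gcovr may be too old; the thread saw failures with 3.2")
    return 1


if __name__ == "__main__":
    raise SystemExit(main())
```

Running the script before "make coverage-report" would only flag a likely-too-old gcovr; it does not validate the gcov or gcc versions, which the thread also suggests documenting.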
diff --git a/results/classifier/016/none/50773216 b/results/classifier/016/none/50773216
deleted file mode 100644
index 5a856c2f..00000000
--- a/results/classifier/016/none/50773216
+++ /dev/null
@@ -1,137 +0,0 @@
-virtual: 0.431
-debug: 0.366
-register: 0.178
-x86: 0.170
-vnc: 0.116
-operating system: 0.087
-hypervisor: 0.084
-files: 0.082
-PID: 0.068
-TCG: 0.058
-i386: 0.053
-network: 0.041
-user-level: 0.040
-performance: 0.039
-kernel: 0.038
-semantic: 0.031
-socket: 0.027
-alpha: 0.027
-ppc: 0.026
-device: 0.018
-boot: 0.017
-permissions: 0.007
-assembly: 0.006
-arm: 0.006
-peripherals: 0.004
-risc-v: 0.004
-VMM: 0.004
-graphic: 0.003
-architecture: 0.003
-KVM: 0.002
-mistranslation: 0.002
-
-[Qemu-devel] Can I have someone's feedback on [bug 1809075] Concurrency bug on keyboard events: capslock LED messing up keycode streams causes character misses at guest kernel
-
-Hi everyone.
-Can I please have someone's feedback on this bug?
-https://bugs.launchpad.net/qemu/+bug/1809075
-Briefly, the guest OS loses characters sent to it via VNC, and I spotted the
-bug in relation to the ps2 driver.
-I'm thinking of possible fixes and I might want to use a memory barrier.
-But I would really like to have some suggestion from a qemu developer
-first. For example, can we brutally drop capslock LED key events in ps2
-queue?
-It is actually relevant to openQA, an automated QA tool for openSUSE.
-And this bug blocks a few test cases for us.
-Thank you in advance!
-
-Kind regards,
-Gao Zhiyuan
-
-Cc'ing Marc-André & Gerd.
-
-On 12/19/18 10:31 AM, Gao Zhiyuan wrote:
->
-Hi everyone.
->
->
-Can I please have someone's feedback on this bug?
->
-https://bugs.launchpad.net/qemu/+bug/1809075
->
-Briefly, guest OS loses characters sent to it via vnc. And I spot the
->
-bug in relation to ps2 driver.
->
->
-I'm thinking of possible fixes and I might want to use a memory barrier.
->
-But I would really like to have some suggestion from a qemu developer
->
-first. For example, can we brutally drop capslock LED key events in ps2
->
-queue?
->
->
-It is actually relevant to openQA, an automated QA tool for openSUSE.
->
-And this bug blocks a few test cases for us.
->
->
-Thank you in advance!
->
->
-Kind regards,
->
-Gao Zhiyuan
->
-
-On Thu, Jan 03, 2019 at 12:05:54PM +0100, Philippe Mathieu-Daudé wrote:
->
-Cc'ing Marc-André & Gerd.
->
->
-On 12/19/18 10:31 AM, Gao Zhiyuan wrote:
->
-> Hi everyone.
->
->
->
-> Can I please have someone's feedback on this bug?
->
->
-https://bugs.launchpad.net/qemu/+bug/1809075
->
-> Briefly, guest OS loses characters sent to it via vnc. And I spot the
->
-> bug in relation to ps2 driver.
->
->
->
-> I'm thinking of possible fixes and I might want to use a memory barrier.
->
-> But I would really like to have some suggestion from a qemu developer
->
-> first. For example, can we brutally drop capslock LED key events in ps2
->
-> queue?
-There is no "capslock LED key event".  0xfa is KBD_REPLY_ACK, and the
-device queues it in response to guest port writes.  Yes, the ack can
-race with actual key events.  But IMO that isn't a bug in qemu.
-
-Probably the linux kernel just throws away everything until it got the
-ack for the port write, and that way the key event gets lost.  On
-physical hardware you will not notice because it is next to impossible
-to type fast enough to hit the race window.
-
-So, go fix the kernel.
-
-Alternatively, fix vncdotool to send uppercase letters properly with the
-shift key pressed.  Then qemu wouldn't generate capslock key events
-(that happens because qemu thinks guest and host capslock state is out
-of sync) and the guest's capslock LED update request wouldn't get in
-the way.
-
-cheers,
-  Gerd
-
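The race Gerd describes can be illustrated with a toy model. The sketch below is not the Linux kernel's or QEMU's code; it only encodes the hypothesis stated in the reply ("probably the linux kernel just throws away everything until it got the ack"). The function name `guest_receive` and the example scancodes are illustrative assumptions; only the value 0xfa for KBD_REPLY_ACK comes from the thread.

```python
KBD_REPLY_ACK = 0xFA  # value quoted in the thread


def guest_receive(queue, waiting_for_ack):
    """Toy guest driver: while waiting for an ACK, drop every byte before it."""
    delivered = []
    for byte in queue:
        if waiting_for_ack:
            if byte == KBD_REPLY_ACK:
                waiting_for_ack = False
            # The ACK itself, and anything seen before it (including
            # real key events), never reaches the input layer.
            continue
        delivered.append(byte)
    return delivered


# Hypothetical key press/release bytes, chosen only for illustration.
KEY_DOWN, KEY_UP = 0x1E, 0x9E

# ACK arrives first: both key bytes are delivered.
in_order = guest_receive([KBD_REPLY_ACK, KEY_DOWN, KEY_UP], waiting_for_ack=True)

# Key press is queued ahead of the ACK: the press is lost.
raced = guest_receive([KEY_DOWN, KBD_REPLY_ACK, KEY_UP], waiting_for_ack=True)

print([hex(b) for b in in_order])
print([hex(b) for b in raced])
```

Under this model, a fast sender such as an automated VNC client hits the window far more often than a human typist would, which matches the explanation given in the reply.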
diff --git a/results/classifier/016/none/55753058 b/results/classifier/016/none/55753058
deleted file mode 100644
index 7cfee9ee..00000000
--- a/results/classifier/016/none/55753058
+++ /dev/null
@@ -1,320 +0,0 @@
-x86: 0.784
-operating system: 0.778
-kernel: 0.648
-debug: 0.645
-user-level: 0.550
-hypervisor: 0.097
-files: 0.093
-performance: 0.091
-assembly: 0.071
-virtual: 0.065
-PID: 0.040
-TCG: 0.037
-register: 0.035
-ppc: 0.022
-semantic: 0.010
-network: 0.007
-device: 0.006
-boot: 0.005
-architecture: 0.004
-i386: 0.004
-alpha: 0.003
-arm: 0.003
-socket: 0.002
-permissions: 0.002
-risc-v: 0.002
-vnc: 0.002
-graphic: 0.002
-peripherals: 0.001
-VMM: 0.001
-mistranslation: 0.001
-KVM: 0.000
-
-[RESEND][BUG FIX HELP] QEMU main thread endlessly hangs in __ppoll()
-
-Hi Genius,
-I am a user of QEMU v4.2.0 and am stuck on an interesting bug, which may still
-exist in the mainline.
-Thanks in advance to heroes who can take a look and share understanding.
-
-The qemu main thread hangs endlessly while handling the qmp statement:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-and the call trace looks like:
-#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1,
-timeout=<optimized out>, timeout@entry=0x7ffc56c66db0,
-sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
-#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0,
-__nfds=<optimized out>, __fds=<optimized out>)
-at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
-#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
-timeout=<optimized out>) at util/qemu-timer.c:348
-#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0,
-blocking=blocking@entry=true) at util/aio-posix.c:669
-#4 0x000055561019268d in bdrv_do_drained_begin (poll=true,
-ignore_bds_parents=false, parent=0x0, recursive=false,
-bs=0x55561138b0a0) at block/io.c:430
-#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>,
-parent=0x0, ignore_bds_parents=<optimized out>,
-poll=<optimized out>) at block/io.c:396
-#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0,
-child=0x7f36dc0ce380, errp=<optimized out>)
-at block/quorum.c:1063
-#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120
-"colo-disk0", has_child=<optimized out>,
-child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0,
-errp=0x7ffc56c66f98) at blockdev.c:4494
-#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized
-out>, ret=<optimized out>, errp=0x7ffc56c67018)
-at qapi/qapi-commands-block-core.c:1538
-#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010,
-allow_oob=<optimized out>, request=<optimized out>,
-cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
-#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized
-out>, allow_oob=<optimized out>)
-at qapi/qmp-dispatch.c:175
-#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40,
-req=<optimized out>) at monitor/qmp.c:145
-#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized out>)
-at monitor/qmp.c:234
-#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164bGrateful0) at
-util/async.c:117
-#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117
-#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at
-util/aio-posix.c:459
-#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>,
-callback=<optimized out>, user_data=<optimized out>)
-at util/async.c:260
-#17 0x00007f3c22302fbd in g_main_context_dispatch () from
-/lib/x86_64-linux-gnu/libglib-2.0.so.0
-#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219
-#19 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
-#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
-#21 0x000055560ff600fe in main_loop () at vl.c:1814
-#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized out>,
-envp=<optimized out>) at vl.c:4503
-We found that we are polling endlessly at this line of
-block/io.c:bdrv_do_drained_begin():
-BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
-and it turns out that bdrv_drain_poll() always gets true from:
-- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)
-- AND atomic_read(&bs->in_flight)
-
-I personally think this is a deadlock issue in the QEMU block layer
-(as we know, we have some #FIXME comments in the related code, such as the block
-permission update).
-Any comments are welcome and appreciated.
-
----
-thx,likexu
-
-On 2/28/21 9:39 PM, Like Xu wrote:
-Hi Genius,
-I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
-still exist in the mainline.
-Thanks in advance to heroes who can take a look and share understanding.
-Do you have a test case that reproduces on 5.2? It'd be nice to know if
-it was still a problem in the latest source tree or not.
---js
-The qemu main thread endlessly hangs in the handle of the qmp statement:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-and we have the call trace looks like:
-#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1,
-timeout=<optimized out>, timeout@entry=0x7ffc56c66db0,
-sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
-#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0,
-__nfds=<optimized out>, __fds=<optimized out>)
-at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
-#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
-timeout=<optimized out>) at util/qemu-timer.c:348
-#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0,
-blocking=blocking@entry=true) at util/aio-posix.c:669
-#4 0x000055561019268d in bdrv_do_drained_begin (poll=true,
-ignore_bds_parents=false, parent=0x0, recursive=false,
-bs=0x55561138b0a0) at block/io.c:430
-#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>,
-parent=0x0, ignore_bds_parents=<optimized out>,
-poll=<optimized out>) at block/io.c:396
-#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0,
-child=0x7f36dc0ce380, errp=<optimized out>)
-at block/quorum.c:1063
-#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120
-"colo-disk0", has_child=<optimized out>,
-child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0,
-errp=0x7ffc56c66f98) at blockdev.c:4494
-#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized
-out>, ret=<optimized out>, errp=0x7ffc56c67018)
-at qapi/qapi-commands-block-core.c:1538
-#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010,
-allow_oob=<optimized out>, request=<optimized out>,
-cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
-#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized
-out>, allow_oob=<optimized out>)
-at qapi/qmp-dispatch.c:175
-#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40,
-req=<optimized out>) at monitor/qmp.c:145
-#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized
-out>) at monitor/qmp.c:234
-#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164bGrateful0) at
-util/async.c:117
-#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117
-#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at
-util/aio-posix.c:459
-#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>,
-callback=<optimized out>, user_data=<optimized out>)
-at util/async.c:260
-#17 0x00007f3c22302fbd in g_main_context_dispatch () from
-/lib/x86_64-linux-gnu/libglib-2.0.so.0
-#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219
-#19 os_host_main_loop_wait (timeout=<optimized out>) at
-util/main-loop.c:242
-#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
-#21 0x000055560ff600fe in main_loop () at vl.c:1814
-#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized
-out>, envp=<optimized out>) at vl.c:4503
-We found that we're doing endless check in the line of
-block/io.c:bdrv_do_drained_begin():
-    BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
-and it turns out that the bdrv_drain_poll() always get true from:
-- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)
-- AND atomic_read(&bs->in_flight)
-
-I personally think this is a deadlock issue in the QEMU block layer
-(as we know, we have some #FIXME comments in the related code, such as the
-block permission update).
-Any comments are welcome and appreciated.
-
----
-thx,likexu
-
-Hi John,
-
-Thanks for your comment.
-
-On 2021/3/5 7:53, John Snow wrote:
-On 2/28/21 9:39 PM, Like Xu wrote:
-Hi Genius,
-I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
-still exist in the mainline.
-Thanks in advance to heroes who can take a look and share understanding.
-Do you have a test case that reproduces on 5.2? It'd be nice to know if it
-was still a problem in the latest source tree or not.
-We narrowed down the source of the bug, which basically came from
-the following qmp usage:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-One of the test cases is the COLO usage (docs/colo-proxy.txt).
-
-This issue is sporadic; the probability may be 1/15 for an I/O-heavy guest.
-
-I believe it's reproducible on 5.2 and the latest tree.
---js
-The qemu main thread endlessly hangs in the handle of the qmp statement:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-and we have the call trace looks like:
-#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1,
-timeout=<optimized out>, timeout@entry=0x7ffc56c66db0,
-sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
-#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0,
-__nfds=<optimized out>, __fds=<optimized out>)
-at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
-#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
-timeout=<optimized out>) at util/qemu-timer.c:348
-#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0,
-blocking=blocking@entry=true) at util/aio-posix.c:669
-#4 0x000055561019268d in bdrv_do_drained_begin (poll=true,
-ignore_bds_parents=false, parent=0x0, recursive=false,
-bs=0x55561138b0a0) at block/io.c:430
-#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>,
-parent=0x0, ignore_bds_parents=<optimized out>,
-poll=<optimized out>) at block/io.c:396
-#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0,
-child=0x7f36dc0ce380, errp=<optimized out>)
-at block/quorum.c:1063
-#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120
-"colo-disk0", has_child=<optimized out>,
-child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0,
-errp=0x7ffc56c66f98) at blockdev.c:4494
-#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized
-out>, ret=<optimized out>, errp=0x7ffc56c67018)
-at qapi/qapi-commands-block-core.c:1538
-#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010,
-allow_oob=<optimized out>, request=<optimized out>,
-cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
-#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized
-out>, allow_oob=<optimized out>)
-at qapi/qmp-dispatch.c:175
-#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40,
-req=<optimized out>) at monitor/qmp.c:145
-#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized
-out>) at monitor/qmp.c:234
-#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164bGrateful0) at
-util/async.c:117
-#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117
-#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at
-util/aio-posix.c:459
-#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>,
-callback=<optimized out>, user_data=<optimized out>)
-at util/async.c:260
-#17 0x00007f3c22302fbd in g_main_context_dispatch () from
-/lib/x86_64-linux-gnu/libglib-2.0.so.0
-#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219
-#19 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
-#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
-#21 0x000055560ff600fe in main_loop () at vl.c:1814
-#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized
-out>, envp=<optimized out>) at vl.c:4503
-We found that we're doing endless check in the line of
-block/io.c:bdrv_do_drained_begin():
-    BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
-and it turns out that the bdrv_drain_poll() always get true from:
-- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)
-- AND atomic_read(&bs->in_flight)
-
-I personally think this is a deadlock issue in the QEMU block layer
-(as we know, we have some #FIXME comments in the related code, such as the block
-permission update).
-Any comments are welcome and appreciated.
-
----
-thx,likexu
-
-On 3/4/21 10:08 PM, Like Xu wrote:
-Hi John,
-
-Thanks for your comment.
-
-On 2021/3/5 7:53, John Snow wrote:
-On 2/28/21 9:39 PM, Like Xu wrote:
-Hi Genius,
-I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
-still exist in the mainline.
-Thanks in advance to heroes who can take a look and share understanding.
-Do you have a test case that reproduces on 5.2? It'd be nice to know
-if it was still a problem in the latest source tree or not.
-We narrowed down the source of the bug, which basically came from
-the following qmp usage:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-One of the test cases is the COLO usage (docs/colo-proxy.txt).
-
-This issue is sporadic; the probability may be 1/15 for an I/O-heavy guest.
-
-I believe it's reproducible on 5.2 and the latest tree.
-Can you please test and confirm that this is the case, and then file a
-bug report on the LP:
-https://launchpad.net/qemu
-and include:
-- The exact commit you used (current origin/master debug build would be
-the most ideal.)
-- Which QEMU binary you are using (qemu-system-x86_64?)
-- The shortest command line you are aware of that reproduces the problem
-- The host OS and kernel version
-- An updated call trace
-- Any relevant commands issued prior to the one that caused the hang; or
-detailed reproduction steps if possible.
-Thanks,
---js
-
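Part of the information John asks for (QEMU binary, host OS, kernel version) can be gathered mechanically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the default binary name `qemu-system-x86_64` is taken from John's example question, and the commit hash, command line, call trace and reproduction steps still have to be added by hand.

```python
import platform
import shutil
import subprocess


def qemu_version(binary):
    """Return the first line of '<binary> --version', or None if not installed."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


def collect_report_info(qemu_binary="qemu-system-x86_64"):
    """Collect the environment details that can be read automatically."""
    return {
        "qemu_binary": qemu_binary,
        "qemu_version": qemu_version(qemu_binary),
        "host_os": platform.platform(),
        "kernel": platform.release(),
    }


if __name__ == "__main__":
    for key, value in collect_report_info().items():
        print("%s: %s" % (key, value))
```

The output is meant to be pasted into the bug report next to the manually collected items from the list above.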
diff --git a/results/classifier/016/none/56309929 b/results/classifier/016/none/56309929
deleted file mode 100644
index 9eb7151e..00000000
--- a/results/classifier/016/none/56309929
+++ /dev/null
@@ -1,207 +0,0 @@
-kernel: 0.698
-files: 0.249
-operating system: 0.101
-semantic: 0.071
-TCG: 0.056
-debug: 0.041
-virtual: 0.024
-ppc: 0.017
-PID: 0.015
-hypervisor: 0.015
-register: 0.013
-VMM: 0.013
-x86: 0.011
-user-level: 0.007
-device: 0.007
-performance: 0.007
-network: 0.004
-architecture: 0.003
-risc-v: 0.003
-alpha: 0.003
-KVM: 0.003
-permissions: 0.002
-arm: 0.002
-peripherals: 0.002
-vnc: 0.002
-socket: 0.002
-boot: 0.002
-graphic: 0.001
-assembly: 0.001
-mistranslation: 0.001
-i386: 0.001
-
-[Qemu-devel] [BUG 2.6] Broken CONFIG_TPM?
-
-A compilation test with clang -Weverything reported this problem:
-
-config-host.h:112:20: warning: '$' in identifier
-[-Wdollar-in-identifier-extension]
-
-The line of code looks like this:
-
-#define CONFIG_TPM $(CONFIG_SOFTMMU)
-
-This is fine for Makefile code, but won't work as expected in C code.
-
-On 28.04.2016 at 22:33, Stefan Weil wrote:
->
-A compilation test with clang -Weverything reported this problem:
->
->
-config-host.h:112:20: warning: '$' in identifier
->
-[-Wdollar-in-identifier-extension]
->
->
-The line of code looks like this:
->
->
-#define CONFIG_TPM $(CONFIG_SOFTMMU)
->
->
-This is fine for Makefile code, but won't work as expected in C code.
->
-A complete 64 bit build with clang -Weverything creates a log file of
-1.7 GB.
-Here are the unique warnings, sorted by frequency:
-
-      1 -Wflexible-array-extensions
-      1 -Wgnu-folding-constant
-      1 -Wunknown-pragmas
-      1 -Wunknown-warning-option
-      1 -Wunreachable-code-loop-increment
-      2 -Warray-bounds-pointer-arithmetic
-      2 -Wdollar-in-identifier-extension
-      3 -Woverlength-strings
-      3 -Wweak-vtables
-      4 -Wgnu-empty-struct
-      4 -Wstring-conversion
-      6 -Wclass-varargs
-      7 -Wc99-extensions
-      7 -Wc++-compat
-      8 -Wfloat-equal
-     11 -Wformat-nonliteral
-     16 -Wshift-negative-value
-     19 -Wglobal-constructors
-     28 -Wc++11-long-long
-     29 -Wembedded-directive
-     38 -Wvla
-     40 -Wcovered-switch-default
-     40 -Wmissing-variable-declarations
-     49 -Wold-style-cast
-     53 -Wgnu-conditional-omitted-operand
-     56 -Wformat-pedantic
-     61 -Wvariadic-macros
-     77 -Wc++11-extensions
-     83 -Wgnu-flexible-array-initializer
-     83 -Wzero-length-array
-     96 -Wgnu-designator
-    102 -Wmissing-noreturn
-    103 -Wconditional-uninitialized
-    107 -Wdisabled-macro-expansion
-    115 -Wunreachable-code-return
-    134 -Wunreachable-code
-    243 -Wunreachable-code-break
-    257 -Wfloat-conversion
-    280 -Wswitch-enum
-    291 -Wpointer-arith
-    298 -Wshadow
-    378 -Wassign-enum
-    395 -Wused-but-marked-unused
-    420 -Wreserved-id-macro
-    493 -Wdocumentation
-    510 -Wshift-sign-overflow
-    565 -Wgnu-case-range
-    566 -Wgnu-zero-variadic-macro-arguments
-    650 -Wbad-function-cast
-    705 -Wmissing-field-initializers
-    817 -Wgnu-statement-expression
-    968 -Wdocumentation-unknown-command
-   1021 -Wextra-semi
-   1112 -Wgnu-empty-initializer
-   1138 -Wcast-qual
-   1509 -Wcast-align
-   1766 -Wextended-offsetof
-   1937 -Wsign-compare
-   2130 -Wpacked
-   2404 -Wunused-macros
-   3081 -Wpadded
-   4182 -Wconversion
-   5430 -Wlanguage-extension-token
-   6655 -Wshorten-64-to-32
-   6995 -Wpedantic
-   7354 -Wunused-parameter
-  27659 -Wsign-conversion
-
-Stefan Weil <address@hidden> writes:
-
->
-A compilation test with clang -Weverything reported this problem:
->
->
-config-host.h:112:20: warning: '$' in identifier
->
-[-Wdollar-in-identifier-extension]
->
->
-The line of code looks like this:
->
->
-#define CONFIG_TPM $(CONFIG_SOFTMMU)
->
->
-This is fine for Makefile code, but won't work as expected in C code.
-Broken in commit 3b8acc1 "configure: fix TPM logic".  Cc'ing Paolo.
-
-Impact: #ifdef CONFIG_TPM never disables code.  There are no other uses
-of CONFIG_TPM in C code.
-
-I had a quick peek at configure and create_config, but refrained from
-attempting to fix this, since I don't understand when exactly CONFIG_TPM
-should be defined.
-
-On 29 April 2016 at 08:42, Markus Armbruster <address@hidden> wrote:
->
-Stefan Weil <address@hidden> writes:
->
->
-> A compilation test with clang -Weverything reported this problem:
->
->
->
-> config-host.h:112:20: warning: '$' in identifier
->
-> [-Wdollar-in-identifier-extension]
->
->
->
-> The line of code looks like this:
->
->
->
-> #define CONFIG_TPM $(CONFIG_SOFTMMU)
->
->
->
-> This is fine for Makefile code, but won't work as expected in C code.
->
->
-Broken in commit 3b8acc1 "configure: fix TPM logic".  Cc'ing Paolo.
->
->
-Impact: #ifdef CONFIG_TPM never disables code.  There are no other uses
->
-of CONFIG_TPM in C code.
->
->
-I had a quick peek at configure and create_config, but refrained from
->
-attempting to fix this, since I don't understand when exactly CONFIG_TPM
->
-should be defined.
-Looking at 'git blame' suggests this has been wrong like this for
-some years, so we don't need to scramble to fix it for 2.6.
-
-thanks
--- PMM
-
diff --git a/results/classifier/016/none/65781993 b/results/classifier/016/none/65781993
deleted file mode 100644
index 92cd0275..00000000
--- a/results/classifier/016/none/65781993
+++ /dev/null
@@ -1,2820 +0,0 @@
-debug: 0.668
-hypervisor: 0.630
-operating system: 0.229
-socket: 0.188
-files: 0.071
-performance: 0.053
-network: 0.043
-x86: 0.027
-virtual: 0.022
-register: 0.021
-TCG: 0.019
-kernel: 0.013
-i386: 0.011
-device: 0.010
-permissions: 0.009
-alpha: 0.008
-PID: 0.008
-semantic: 0.006
-ppc: 0.006
-assembly: 0.004
-risc-v: 0.004
-user-level: 0.004
-boot: 0.003
-architecture: 0.003
-arm: 0.002
-vnc: 0.002
-VMM: 0.002
-mistranslation: 0.002
-graphic: 0.002
-peripherals: 0.001
-KVM: 0.001
-
-[Qemu-devel] Reply: Re: Reply: Re: [BUG] COLO failover hang
-
-Thank you.
-
-I have tested it already.
-
-When the Primary Node panics, the Secondary Node QEMU hangs at the same place.
-
-According to the COLO wiki
-http://wiki.qemu-project.org/Features/COLO
-killing the Primary Node QEMU
-will not reproduce the problem, but a Primary Node panic can.
-
-I think this is because the channel does not have the
-QIO_CHANNEL_FEATURE_SHUTDOWN feature set.
-
-
-During failover, channel_shutdown could not shut down the channel,
-so colo_process_incoming_thread hangs in recvmsg.
-
-I tested a patch:
-
-diff --git a/migration/socket.c b/migration/socket.c
-index 13966f1..d65a0ea 100644
---- a/migration/socket.c
-+++ b/migration/socket.c
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
-     }
- 
-     trace_migration_socket_incoming_accepted();
- 
-     qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
-     migration_channel_process_incoming(migrate_get_current(),
-                                        QIO_CHANNEL(sioc));
-     object_unref(OBJECT(sioc));
-
-With this patch, my test no longer hangs.
-
-Original message
-
-From: address@hidden
-To: 王广10165992 address@hidden
-Cc: address@hidden address@hidden
-Date: 2017-03-21 15:58
-Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
-
-
-
-
-
-Hi, Wang.
-
-You can test this branch:
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-and please follow the wiki to make sure your configuration is correct:
-http://wiki.qemu-project.org/Features/COLO
-Thanks
-
-Zhang Chen
-
-
-On 03/21/2017 03:27 PM, address@hidden wrote:
->
-> hi.
->
-> I tested git QEMU master and it has the same problem.
->
-> (gdb) bt
->
-> #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-> niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->
-> #1  0x00007f658e4aa0c2 in qio_channel_read
-> (address@hidden, address@hidden "",
-> address@hidden, address@hidden) at io/channel.c:114
->
-> #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
-> buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
-> migration/qemu-file-channel.c:78
->
-> #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-> migration/qemu-file.c:295
->
-> #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-> address@hidden) at migration/qemu-file.c:555
->
-> #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-> migration/qemu-file.c:568
->
-> #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-> migration/qemu-file.c:648
->
-> #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-> address@hidden) at migration/colo.c:244
->
-> #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
-> out>, address@hidden,
-> address@hidden)
->
->     at migration/colo.c:264
->
-> #9  0x00007f658e3e740e in colo_process_incoming_thread
-> (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
->
-> #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->
-> #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->
-> (gdb) p ioc->name
->
-> $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->
-> (gdb) p ioc->features        (does not support QIO_CHANNEL_FEATURE_SHUTDOWN)
->
-> $3 = 0
->
->
-> (gdb) bt
->
-> #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-> condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->
-> #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
-> gmain.c:3054
->
-> #2  g_main_context_dispatch (context=<optimized out>,
-> address@hidden) at gmain.c:3630
->
-> #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->
-> #4  os_host_main_loop_wait (timeout=<optimized out>) at
-> util/main-loop.c:258
->
-> #5  main_loop_wait (address@hidden) at
-> util/main-loop.c:506
->
-> #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->
-> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
-> out>) at vl.c:4709
->
-> (gdb) p ioc->features
->
-> $1 = 6
->
-> (gdb) p ioc->name
->
-> $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->
->
-> Maybe socket_accept_incoming_migration should
-> call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
->
->
-> Thank you.
->
->
->
->
->
-> Original message
-> address@hidden
-> address@hidden
-> address@hidden@huawei.com
-> *Date:* 2017-03-16 14:46
-> *Subject:* *Re: [Qemu-devel] COLO failover hang*
->
->
->
->
-> On 03/15/2017 05:06 PM, wangguang wrote:
-> > I am testing the QEMU COLO feature described here [QEMU
-> > Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-> >
-> > When the Primary Node panics, the Secondary Node QEMU hangs.
-> > It hangs at recvmsg in qio_channel_socket_readv.
-> > I then ran { 'execute': 'nbd-server-stop' } and { "execute":
-> > "x-colo-lost-heartbeat" } in the Secondary VM's
-> > monitor, but the Secondary Node QEMU still hangs at recvmsg.
-> >
-> > I found that COLO in QEMU is not complete yet.
-> > Is there a development plan for COLO?
->
-> Yes, we are developing it. You can see some of the patches we are pushing.
->
-> > Has anyone ever run it successfully? Any help is appreciated!
->
-> Our internal version can run it successfully.
-> For failover details, you can ask Zhanghailiang for help.
-> Next time you have a question about COLO,
-> please cc me and zhanghailiang address@hidden
->
->
-> Thanks
-> Zhang Chen
->
->
-> >
-> >
-> >
-> > centos7.2+qemu2.7.50
-> > (gdb) bt
-> > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-> > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
-> > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
-> > io/channel-socket.c:497
-> > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-> > address@hidden "", address@hidden,
-> > address@hidden) at io/channel.c:97
-> > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
-> > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
-> > migration/qemu-file-channel.c:78
-> > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-> > migration/qemu-file.c:257
-> > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-> > address@hidden) at migration/qemu-file.c:510
-> > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-> > migration/qemu-file.c:523
-> > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-> > migration/qemu-file.c:603
-> > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-> > address@hidden) at migration/colo.c:215
-> > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-> > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
-> > migration/colo.c:546
-> > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-> > migration/colo.c:649
-> > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-> > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-> >
-> >
-> >
-> >
-> >
-> > --
-> > View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-> > Sent from the Developer mailing list archive at Nabble.com.
-> >
-> >
-> >
-> >
->
-> --
-> Thanks
-> Zhang Chen
->
->
->
->
->
-
--- 
-Thanks
-Zhang Chen
-
-Hi,
-
-On 2017/3/21 16:10, address@hidden wrote:
-Thank you.
-
-I have tested it already.
-
-When the Primary Node panics, the Secondary Node QEMU hangs at the same place.
-
-According to the COLO wiki
-http://wiki.qemu-project.org/Features/COLO
-killing the Primary Node QEMU
-will not reproduce the problem, but a Primary Node panic can.
-
-I think this is because the channel does not have the
-QIO_CHANNEL_FEATURE_SHUTDOWN feature set.
-Yes, you are right. When we do failover for the primary/secondary VM, we
-shut down the related fd in case a thread is stuck in a read/write on it.
-
-It seems that you didn't follow the wiki instructions above exactly in your test.
-Could you share your test procedure, especially the commands used?
-
-Thanks,
-Hailiang
-
-Hi,
-
-Thanks for reporting this. I confirmed it in my test, and it is a bug.
-
-We do call qemu_file_shutdown() to shut down the related fd, in case the
-COLO thread/incoming thread is stuck in read()/write() during failover,
-but it has no effect: every fd used by COLO (and by migration) is wrapped
-in a QIO channel, and the channel will not call the shutdown API unless
-qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
-has been called on it.
-
-Cc: Dr. David Alan Gilbert <address@hidden>
-
-I suspect migration cancel has the same problem: it may get stuck in write()
-if we try to cancel the migration.
-
-void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
-                                 Error **errp)
-{
-    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
-    migration_channel_connect(s, ioc, NULL);
-    ... ...
-We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN) above, and then in
-migrate_fd_cancel()
-{
- ... ...
-    if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-        qemu_file_shutdown(f);  --> This will not take effect. No ?
-    }
-}
-
-Thanks,
-Hailiang
-
-* Hailiang Zhang (address@hidden) wrote:
->
-Hi,
->
->
-Thanks for reporting this, and i confirmed it in my test, and it is a bug.
->
->
-Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
->
-case COLO thread/incoming thread is stuck in read/write() while do failover,
->
-but it didn't take effect, because all the fd used by COLO (also migration)
->
-has been wrapped by qio channel, and it will not call the shutdown API if
->
-we didn't qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-QIO_CHANNEL_FEATURE_SHUTDOWN).
->
->
-Cc: Dr. David Alan Gilbert <address@hidden>
->
->
-I doubted migration cancel has the same problem, it may be stuck in write()
->
-if we tried to cancel migration.
->
->
-void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error
->
-**errp)
->
-{
->
-qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
->
-migration_channel_connect(s, ioc, NULL);
->
-... ...
->
-We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
->
-and the
->
-migrate_fd_cancel()
->
-{
->
-... ...
->
-if (s->state == MIGRATION_STATUS_CANCELLING && f) {
->
-qemu_file_shutdown(f);  --> This will not take effect. No ?
->
-}
->
-}
-(cc'd in Daniel Berrange).
-I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN); 
-at the
-top of qio_channel_socket_new;  so I think that's safe isn't it?
-
-Dave
-
->
-Thanks,
->
-Hailiang
->
->
-On 2017/3/21 16:10, address@hidden wrote:
->
-> Thank you.
->
->
->
-> I have tested it already.
->
->
->
-> When the Primary Node panics, the Secondary Node QEMU hangs at the same place.
->
->
->
-> According to the COLO wiki
-http://wiki.qemu-project.org/Features/COLO
-killing the Primary Node
->
-> QEMU will not reproduce the problem, but a Primary Node panic can.
->
->
->
-> I think this is because the channel does not have the
->
-> QIO_CHANNEL_FEATURE_SHUTDOWN feature set.
->
->
->
->
->
-> During failover, channel_shutdown could not shut down the channel,
->
->
->
->
->
-> so colo_process_incoming_thread hangs in recvmsg.
->
->
->
->
->
-> I test a patch:
->
->
->
->
->
-> diff --git a/migration/socket.c b/migration/socket.c
->
->
->
->
->
-> index 13966f1..d65a0ea 100644
->
->
->
->
->
-> --- a/migration/socket.c
->
->
->
->
->
-> +++ b/migration/socket.c
->
->
->
->
->
-> @@ -147,8 +147,9 @@ static gboolean
->
-> socket_accept_incoming_migration(QIOChannel *ioc,
->
->
->
->
->
->       }
->
->
->
->
->
->
->
->
->
->
->
->       trace_migration_socket_incoming_accepted()
->
->
->
->
->
->
->
->
->
->
->
->       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
->
->
->
->
->
-> +    qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-> QIO_CHANNEL_FEATURE_SHUTDOWN)
->
->
->
->
->
->       migration_channel_process_incoming(migrate_get_current(),
->
->
->
->
->
->                                          QIO_CHANNEL(sioc))
->
->
->
->
->
->       object_unref(OBJECT(sioc))
->
->
->
->
->
->
->
->
->
-> With this patch, my test no longer hangs.
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
-> Original Mail
->
->
->
->
->
->
->
-> From: address@hidden
->
-> To: 王广10165992 address@hidden
->
-> Cc: address@hidden address@hidden
->
-> Date: 2017-03-21 15:58
->
-> Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
->
->
->
->
->
->
->
->
->
->
->
-> Hi,Wang.
->
->
->
-> You can test this branch:
->
->
->
->
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
->
->
->
-> and please follow the wiki to make sure your own configuration is correct.
->
->
->
->
-http://wiki.qemu-project.org/Features/COLO
->
->
->
->
->
-> Thanks
->
->
->
-> Zhang Chen
->
->
->
->
->
-> On 03/21/2017 03:27 PM, address@hidden wrote:
->
-> ï¼
->
-> ï¼ hi.
->
-> ï¼
->
-> ï¼ I test the git qemu master have the same problem.
->
-> ï¼
->
-> ï¼ (gdb) bt
->
-> ï¼
->
-> ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
->
-> ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->
-> ï¼
->
-> ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read
->
-> ï¼ (address@hidden, address@hidden "",
->
-> ï¼ address@hidden, address@hidden) at io/channel.c:114
->
-> ï¼
->
-> ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
->
-> ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
->
-> ï¼ migration/qemu-file-channel.c:78
->
-> ï¼
->
-> ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
->
-> ï¼ migration/qemu-file.c:295
->
-> ï¼
->
-> ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
->
-> ï¼ address@hidden) at migration/qemu-file.c:555
->
-> ï¼
->
-> ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
->
-> ï¼ migration/qemu-file.c:568
->
-> ï¼
->
-> ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
->
-> ï¼ migration/qemu-file.c:648
->
-> ï¼
->
-> ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
->
-> ï¼ address@hidden) at migration/colo.c:244
->
-> ï¼
->
-> ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
->
-> ï¼ outï¼, address@hidden,
->
-> ï¼ address@hidden)
->
-> ï¼
->
-> ï¼     at migration/colo.c:264
->
-> ï¼
->
-> ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread
->
-> ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
->
-> ï¼
->
-> ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->
-> ï¼
->
-> ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->
-> ï¼
->
-> ï¼ (gdb) p ioc-ï¼name
->
-> ï¼
->
-> ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->
-> ï¼
->
-> ï¼ (gdb) p ioc-ï¼features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
->
-> ï¼
->
-> ï¼ $3 = 0
->
-> ï¼
->
-> ï¼
->
-> ï¼ (gdb) bt
->
-> ï¼
->
-> ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
->
-> ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->
-> ï¼
->
-> ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
->
-> ï¼ gmain.c:3054
->
-> ï¼
->
-> ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼,
->
-> ï¼ address@hidden) at gmain.c:3630
->
-> ï¼
->
-> ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->
-> ï¼
->
-> ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
->
-> ï¼ util/main-loop.c:258
->
-> ï¼
->
-> ï¼ #5  main_loop_wait (address@hidden) at
->
-> ï¼ util/main-loop.c:506
->
-> ï¼
->
-> ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->
-> ï¼
->
-> ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
->
-> ï¼ outï¼) at vl.c:4709
->
-> ï¼
->
-> ï¼ (gdb) p ioc-ï¼features
->
-> ï¼
->
-> ï¼ $1 = 6
->
-> ï¼
->
-> ï¼ (gdb) p ioc-ï¼name
->
-> ï¼
->
-> ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->
-> ï¼
->
-> ï¼
->
-> ï¼ May be socket_accept_incoming_migration should
->
-> ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
->
-> ï¼
->
-> ï¼
->
-> ï¼ thank you.
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼ åå§é®ä»¶
->
-> ï¼ address@hidden
->
-> ï¼ address@hidden
->
-> ï¼ address@hidden@huawei.comï¼
->
-> ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
->
-> ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
->
-> ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
->
-> ï¼ ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
->
-> ï¼ ï¼
->
-> ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
->
-> ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
->
-> ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
->
-> ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
->
-> ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
->
-> ï¼ ï¼
->
-> ï¼ ï¼ I found that the colo in qemu is not complete yet.
->
-> ï¼ ï¼ Do the colo have any plan for development?
->
-> ï¼
->
-> ï¼ Yes, We are developing. You can see some of patch we pushing.
->
-> ï¼
->
-> ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
->
-> ï¼
->
-> ï¼ In our internal version can run it successfully,
->
-> ï¼ The failover detail you can ask Zhanghailiang for help.
->
-> ï¼ Next time if you have some question about COLO,
->
-> ï¼ please cc me and zhanghailiang address@hidden
->
-> ï¼
->
-> ï¼
->
-> ï¼ Thanks
->
-> ï¼ Zhang Chen
->
-> ï¼
->
-> ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼ centos7.2+qemu2.7.50
->
-> ï¼ ï¼ (gdb) bt
->
-> ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
->
-> ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼,
->
-> ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0)
->
-> at
->
-> ï¼ ï¼ io/channel-socket.c:497
->
-> ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
->
-> ï¼ ï¼ address@hidden "", address@hidden,
->
-> ï¼ ï¼ address@hidden) at io/channel.c:97
->
-> ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
->
-> ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
->
-> ï¼ ï¼ migration/qemu-file-channel.c:78
->
-> ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
->
-> ï¼ ï¼ migration/qemu-file.c:257
->
-> ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
->
-> ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
->
-> ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
->
-> ï¼ ï¼ migration/qemu-file.c:523
->
-> ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
->
-> ï¼ ï¼ migration/qemu-file.c:603
->
-> ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
->
-> ï¼ ï¼ address@hidden) at migration/colo.c:215
->
-> ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
->
-> ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
->
-> ï¼ ï¼ migration/colo.c:546
->
-> ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
->
-> ï¼ ï¼ migration/colo.c:649
->
-> ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
->
-> ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼ --
->
-> ï¼ ï¼ View this message in context:
->
->
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
->
-> ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼ ï¼
->
-> ï¼
->
-> ï¼ --
->
-> ï¼ Thanks
->
-> ï¼ Zhang Chen
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
-> ï¼
->
->
->
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
-* Hailiang Zhang (address@hidden) wrote:
-Hi,
-
-Thanks for reporting this. I confirmed it in my test, and it is a bug.
-
-We do call qemu_file_shutdown() to shut down the related fd in case the
-COLO thread/incoming thread is stuck in read/write() during failover,
-but it has no effect: every fd used by COLO (and by migration) is
-wrapped in a qio channel, which will not call the shutdown API unless
-we first call qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN).
-
-Cc: Dr. David Alan Gilbert <address@hidden>
-
-I suspect migration cancel has the same problem: it may get stuck in write()
-if we try to cancel migration.
-
-void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error 
-**errp)
-{
-     qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
-     migration_channel_connect(s, ioc, NULL);
-     ... ...
-We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
-and the
-migrate_fd_cancel()
-{
-  ... ...
-     if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-         qemu_file_shutdown(f);  --> This will not take effect. No ?
-     }
-}
-(cc'd in Daniel Berrange).
-I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)
-at the
-top of qio_channel_socket_new(), so I think that's safe, isn't it?
-Hmm, you are right, this problem only exists for the migration incoming fd,
-thanks.
-Dave
-Thanks,
-Hailiang
-
-On 2017/3/21 16:10, address@hidden wrote:
-Thank you.
-
-I have already tested it.
-
-When the Primary Node panics, the Secondary Node qemu hangs at the same place.
-
-According to
-http://wiki.qemu-project.org/Features/COLO
-, killing the Primary Node qemu
-will not reproduce the problem, but a Primary Node panic can.
-
-I think this is because the channel does not support
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-
-
-During failover, channel_shutdown cannot shut down the channel,
-
-
-so the colo_process_incoming_thread hangs at recvmsg.
-
-
-I tested a patch:
-
-
-diff --git a/migration/socket.c b/migration/socket.c
-
-
-index 13966f1..d65a0ea 100644
-
-
---- a/migration/socket.c
-
-
-+++ b/migration/socket.c
-
-
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
-*ioc,
-
-
-       }
-
-
-
-
-
-       trace_migration_socket_incoming_accepted();
-
-
-
-
-
-       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
-
-
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
-
-
-       migration_channel_process_incoming(migrate_get_current(),
-
-
-                                          QIO_CHANNEL(sioc));
-
-
-       object_unref(OBJECT(sioc));
-
-
-
-
-With this patch, my test no longer hangs.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Original Mail
-
-
-
-From: address@hidden
-To: 王广10165992 address@hidden
-Cc: address@hidden address@hidden
-Date: 2017-03-21 15:58
-Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
-
-
-
-
-
-Hi, Wang.
-
-You can test this branch:
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-and please follow the wiki to make sure your own configuration is correct.
-http://wiki.qemu-project.org/Features/COLO
-Thanks
-
-Zhang Chen
-
-
-On 03/21/2017 03:27 PM, address@hidden wrote:
->
-> hi.
->
-> I tested git qemu master and it has the same problem.
->
-> (gdb) bt
->
-> #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-> niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->
-> #1  0x00007f658e4aa0c2 in qio_channel_read
-> (address@hidden, address@hidden "",
-> address@hidden, address@hidden) at io/channel.c:114
->
-> #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
-> buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
-> migration/qemu-file-channel.c:78
->
-> #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-> migration/qemu-file.c:295
->
-> #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-> address@hidden) at migration/qemu-file.c:555
->
-> #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-> migration/qemu-file.c:568
->
-> #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-> migration/qemu-file.c:648
->
-> #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-> address@hidden) at migration/colo.c:244
->
-> #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
-> out>, address@hidden,
-> address@hidden)
->
->     at migration/colo.c:264
->
-> #9  0x00007f658e3e740e in colo_process_incoming_thread
-> (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
->
-> #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->
-> #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->
-> (gdb) p ioc->name
->
-> $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->
-> (gdb) p ioc->features        Does not support QIO_CHANNEL_FEATURE_SHUTDOWN
->
-> $3 = 0
->
->
-> (gdb) bt
->
-> #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-> condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->
-> #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
-> gmain.c:3054
->
-> #2  g_main_context_dispatch (context=<optimized out>,
-> address@hidden) at gmain.c:3630
->
-> #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->
-> #4  os_host_main_loop_wait (timeout=<optimized out>) at
-> util/main-loop.c:258
->
-> #5  main_loop_wait (address@hidden) at
-> util/main-loop.c:506
->
-> #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->
-> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
-> out>) at vl.c:4709
->
-> (gdb) p ioc->features
->
-> $1 = 6
->
-> (gdb) p ioc->name
->
-> $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->
->
-> Maybe socket_accept_incoming_migration should
-> call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
->
->
-> Thank you.
->
->
->
->
->
-> Original Mail
-> address@hidden
-> address@hidden
-> address@hidden@huawei.com>
-> *Date:* 2017-03-16 14:46
-> *Subject:* *Re: [Qemu-devel] COLO failover hang*
->
->
->
->
-> On 03/15/2017 05:06 PM, wangguang wrote:
-> > I am testing the QEMU COLO feature described here [QEMU
-> > Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-> >
-> > When the Primary Node panics, the Secondary Node qemu hangs.
-> > It hangs at recvmsg in qio_channel_socket_readv.
-> > When I run { 'execute': 'nbd-server-stop' } and { "execute":
-> > "x-colo-lost-heartbeat" } in the Secondary VM's
-> > monitor, the Secondary Node qemu still hangs at recvmsg.
-> >
-> > I found that COLO in qemu is not complete yet.
-> > Is there any development plan for COLO?
->
-> Yes, we are developing it. You can see some of the patches we are pushing.
->
-> > Has anyone ever run it successfully? Any help is appreciated!
->
-> Our internal version can run it successfully.
-> For details of the failover you can ask Zhanghailiang for help.
-> Next time, if you have questions about COLO,
-> please cc me and zhanghailiang address@hidden
->
->
-> Thanks
-> Zhang Chen
->
->
-> >
-> >
-> >
-> > centos7.2+qemu2.7.50
-> > (gdb) bt
-> > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-> > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
-> > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
-> > io/channel-socket.c:497
-> > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-> > address@hidden "", address@hidden,
-> > address@hidden) at io/channel.c:97
-> > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
-> > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
-> > migration/qemu-file-channel.c:78
-> > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-> > migration/qemu-file.c:257
-> > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-> > address@hidden) at migration/qemu-file.c:510
-> > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-> > migration/qemu-file.c:523
-> > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-> > migration/qemu-file.c:603
-> > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-> > address@hidden) at migration/colo.c:215
-> > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-> > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
-> > migration/colo.c:546
-> > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-> > migration/colo.c:649
-> > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-> > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-> >
-> >
-> >
-> >
-> >
-> > --
-> > View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-> > Sent from the Developer mailing list archive at Nabble.com.
-> >
-> >
-> >
-> >
->
-> --
-> Thanks
-> Zhang Chen
->
->
->
->
->
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-.
-
-* Hailiang Zhang (address@hidden) wrote:
->
-On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
->
-> * Hailiang Zhang (address@hidden) wrote:
->
-> > Hi,
->
-> >
->
-> > Thanks for reporting this. I confirmed it in my test, and it is a bug.
->
-> >
->
-> > We do call qemu_file_shutdown() to shut down the related fd
->
-> > in
->
-> > case the COLO thread/incoming thread is stuck in read/write() during
->
-> > failover,
->
-> > but it has no effect: every fd used by COLO (and by
->
-> > migration)
->
-> > is wrapped in a qio channel, which will not call the shutdown API unless
->
-> > we first call qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-> > QIO_CHANNEL_FEATURE_SHUTDOWN).
->
-> >
->
-> > Cc: Dr. David Alan Gilbert <address@hidden>
->
-> >
->
-> > I suspect migration cancel has the same problem: it may get stuck in
->
-> > write()
->
-> > if we try to cancel migration.
->
-> >
->
-> > void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
->
-> > Error **errp)
->
-> > {
->
-> >      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
->
-> >      migration_channel_connect(s, ioc, NULL);
->
-> >      ... ...
->
-> > We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-> > QIO_CHANNEL_FEATURE_SHUTDOWN) above,
->
-> > and the
->
-> > migrate_fd_cancel()
->
-> > {
->
-> >   ... ...
->
-> >      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
->
-> >          qemu_file_shutdown(f);  --> This will not take effect. No ?
->
-> >      }
->
-> > }
->
->
->
-> (cc'd in Daniel Berrange).
->
-> I see that we call qio_channel_set_feature(ioc,
->
-> QIO_CHANNEL_FEATURE_SHUTDOWN); at the
->
-> top of qio_channel_socket_new;  so I think that's safe isn't it?
->
->
->
->
-Hmm, you are right, this problem only exists for the migration incoming fd,
->
-thanks.
-Yes, and I don't think we normally do a cancel on the incoming side of a 
-migration.
-
-Dave
-
->
-> Dave
->
->
->
-> > Thanks,
->
-> > Hailiang
->
-> >
->
-> > On 2017/3/21 16:10, address@hidden wrote:
->
-> > > Thank you.
->
-> > >
->
-> > > I have already tested it.
->
-> > >
->
-> > > When the Primary Node panics, the Secondary Node qemu hangs at the same
->
-> > > place.
->
-> > >
->
-> > > According to
-http://wiki.qemu-project.org/Features/COLO
-, killing the Primary
->
-> > > Node qemu will not reproduce the problem, but a Primary Node panic can.
->
-> > >
->
-> > > I think this is because the channel does not support
->
-> > > QIO_CHANNEL_FEATURE_SHUTDOWN.
->
-> > >
->
-> > >
->
-> > > During failover, channel_shutdown cannot shut down the channel,
->
-> > >
->
-> > >
->
-> > > so the colo_process_incoming_thread hangs at recvmsg.
->
-> > >
->
-> > >
->
-> > > I tested a patch:
->
-> > >
->
-> > >
->
-> > > diff --git a/migration/socket.c b/migration/socket.c
->
-> > >
->
-> > >
->
-> > > index 13966f1..d65a0ea 100644
->
-> > >
->
-> > >
->
-> > > --- a/migration/socket.c
->
-> > >
->
-> > >
->
-> > > +++ b/migration/socket.c
->
-> > >
->
-> > >
->
-> > > @@ -147,8 +147,9 @@ static gboolean
->
-> > > socket_accept_incoming_migration(QIOChannel *ioc,
->
-> > >
->
-> > >
->
-> > >        }
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >        trace_migration_socket_incoming_accepted()
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >        qio_channel_set_name(QIO_CHANNEL(sioc),
->
-> > > "migration-socket-incoming")
->
-> > >
->
-> > >
->
-> > > +    qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-> > > QIO_CHANNEL_FEATURE_SHUTDOWN)
->
-> > >
->
-> > >
->
-> > >        migration_channel_process_incoming(migrate_get_current(),
->
-> > >
->
-> > >
->
-> > >                                           QIO_CHANNEL(sioc))
->
-> > >
->
-> > >
->
-> > >        object_unref(OBJECT(sioc))
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > > With this patch, my test no longer hangs.
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > > Original Mail
->
-> > >
->
-> > >
->
-> > >
->
-> > > From: address@hidden
->
-> > > To: 王广10165992 address@hidden
->
-> > > Cc: address@hidden address@hidden
->
-> > > Date: 2017-03-21 15:58
->
-> > > Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > > Hi,Wang.
->
-> > >
->
-> > > You can test this branch:
->
-> > >
->
-> > >
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
->
-> > >
->
-> > > and please follow the wiki to make sure your own configuration is correct.
->
-> > >
->
-> > >
-http://wiki.qemu-project.org/Features/COLO
->
-> > >
->
-> > >
->
-> > > Thanks
->
-> > >
->
-> > > Zhang Chen
->
-> > >
->
-> > >
->
-> > > On 03/21/2017 03:27 PM, address@hidden wrote:
->
-> > > ï¼
->
-> > > ï¼ hi.
->
-> > > ï¼
->
-> > > ï¼ I test the git qemu master have the same problem.
->
-> > > ï¼
->
-> > > ï¼ (gdb) bt
->
-> > > ï¼
->
-> > > ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
->
-> > > ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->
-> > > ï¼
->
-> > > ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read
->
-> > > ï¼ (address@hidden, address@hidden "",
->
-> > > ï¼ address@hidden, address@hidden) at io/channel.c:114
->
-> > > ï¼
->
-> > > ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
->
-> > > ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
->
-> > > ï¼ migration/qemu-file-channel.c:78
->
-> > > ï¼
->
-> > > ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
->
-> > > ï¼ migration/qemu-file.c:295
->
-> > > ï¼
->
-> > > ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
->
-> > > ï¼ address@hidden) at migration/qemu-file.c:555
->
-> > > ï¼
->
-> > > ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
->
-> > > ï¼ migration/qemu-file.c:568
->
-> > > ï¼
->
-> > > ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
->
-> > > ï¼ migration/qemu-file.c:648
->
-> > > ï¼
->
-> > > ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
->
-> > > ï¼ address@hidden) at migration/colo.c:244
->
-> > > ï¼
->
-> > > ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
->
-> > > ï¼ outï¼, address@hidden,
->
-> > > ï¼ address@hidden)
->
-> > > ï¼
->
-> > > ï¼     at migration/colo.c:264
->
-> > > ï¼
->
-> > > ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread
->
-> > > ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
->
-> > > ï¼
->
-> > > ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->
-> > > ï¼
->
-> > > ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->
-> > > ï¼
->
-> > > ï¼ (gdb) p ioc-ï¼name
->
-> > > ï¼
->
-> > > ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->
-> > > ï¼
->
-> > > ï¼ (gdb) p ioc-ï¼features        Do not support
->
-> > > QIO_CHANNEL_FEATURE_SHUTDOWN
->
-> > > ï¼
->
-> > > ï¼ $3 = 0
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ (gdb) bt
->
-> > > ï¼
->
-> > > ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
->
-> > > ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->
-> > > ï¼
->
-> > > ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
->
-> > > ï¼ gmain.c:3054
->
-> > > ï¼
->
-> > > ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼,
->
-> > > ï¼ address@hidden) at gmain.c:3630
->
-> > > ï¼
->
-> > > ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->
-> > > ï¼
->
-> > > ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
->
-> > > ï¼ util/main-loop.c:258
->
-> > > ï¼
->
-> > > ï¼ #5  main_loop_wait (address@hidden) at
->
-> > > ï¼ util/main-loop.c:506
->
-> > > ï¼
->
-> > > ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->
-> > > ï¼
->
-> > > ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
->
-> > > ï¼ outï¼) at vl.c:4709
->
-> > > ï¼
->
-> > > ï¼ (gdb) p ioc-ï¼features
->
-> > > ï¼
->
-> > > ï¼ $1 = 6
->
-> > > ï¼
->
-> > > ï¼ (gdb) p ioc-ï¼name
->
-> > > ï¼
->
-> > > ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ May be socket_accept_incoming_migration should
->
-> > > ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ thank you.
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ åå§é®ä»¶
->
-> > > ï¼ address@hidden
->
-> > > ï¼ address@hidden
->
-> > > ï¼ address@hidden@huawei.comï¼
->
-> > > ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
->
-> > > ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
->
-> > > ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
->
-> > > ï¼ ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
->
-> > > ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
->
-> > > ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
->
-> > > ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
->
-> > > ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼ I found that the colo in qemu is not complete yet.
->
-> > > ï¼ ï¼ Do the colo have any plan for development?
->
-> > > ï¼
->
-> > > ï¼ Yes, We are developing. You can see some of patch we pushing.
->
-> > > ï¼
->
-> > > ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
->
-> > > ï¼
->
-> > > ï¼ In our internal version can run it successfully,
->
-> > > ï¼ The failover detail you can ask Zhanghailiang for help.
->
-> > > ï¼ Next time if you have some question about COLO,
->
-> > > ï¼ please cc me and zhanghailiang address@hidden
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ Thanks
->
-> > > ï¼ Zhang Chen
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼ centos7.2+qemu2.7.50
->
-> > > ï¼ ï¼ (gdb) bt
->
-> > > ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
->
-> > > ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized
->
-> > > outï¼,
->
-> > > ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0,
->
-> > > errp=0x0) at
->
-> > > ï¼ ï¼ io/channel-socket.c:497
->
-> > > ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
->
-> > > ï¼ ï¼ address@hidden "", address@hidden,
->
-> > > ï¼ ï¼ address@hidden) at io/channel.c:97
->
-> > > ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized
->
-> > > outï¼,
->
-> > > ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
->
-> > > ï¼ ï¼ migration/qemu-file-channel.c:78
->
-> > > ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
->
-> > > ï¼ ï¼ migration/qemu-file.c:257
->
-> > > ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
->
-> > > ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
->
-> > > ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
->
-> > > ï¼ ï¼ migration/qemu-file.c:523
->
-> > > ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
->
-> > > ï¼ ï¼ migration/qemu-file.c:603
->
-> > > ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
->
-> > > ï¼ ï¼ address@hidden) at migration/colo.c:215
->
-> > > ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message
->
-> > > (errp=0x7f3d62bfaa48,
->
-> > > ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
->
-> > > ï¼ ï¼ migration/colo.c:546
->
-> > > ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
->
-> > > ï¼ ï¼ migration/colo.c:649
->
-> > > ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from
->
-> > > /lib64/libpthread.so.0
->
-> > > ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼ --
->
-> > > ï¼ ï¼ View this message in context:
->
-> > >
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
->
-> > > ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼ ï¼
->
-> > > ï¼
->
-> > > ï¼ --
->
-> > > ï¼ Thanks
->
-> > > ï¼ Zhang Chen
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > > ï¼
->
-> > >
->
-> >
->
-> --
->
-> Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
->
->
-> .
->
->
->
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
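The thread in the file above turns on one mechanism: the shutdown request is skipped unless the channel advertises a SHUTDOWN feature bit, and the channel returned by accept never had that bit set. What follows is a minimal, self-contained C sketch of that gating, not QEMU code. `Channel`, `channel_set_feature`, `channel_has_feature`, `file_shutdown`, and the `FEATURE_*` bit positions are hypothetical stand-ins chosen for illustration; the real QEMU definitions may differ.

```c
#include <stdbool.h>

/* Hypothetical feature indexes (an assumption, not QEMU's real enum). */
enum ChannelFeature {
    FEATURE_FD_PASS  = 0,
    FEATURE_SHUTDOWN = 1,
    FEATURE_LISTEN   = 2,
};

/* Toy stand-in for a qio channel: a name, a feature bitmask, an open flag. */
typedef struct Channel {
    const char *name;
    unsigned int features;
    bool open;
} Channel;

/* Set one feature bit on the channel. */
static void channel_set_feature(Channel *c, enum ChannelFeature f)
{
    c->features |= 1u << f;
}

/* Report whether one feature bit is set. */
static bool channel_has_feature(const Channel *c, enum ChannelFeature f)
{
    return (c->features & (1u << f)) != 0;
}

/*
 * Models the behaviour described in the thread: the shutdown request is
 * dropped (returns -1 and the channel stays open) unless FEATURE_SHUTDOWN
 * has been set on the channel beforehand.
 */
static int file_shutdown(Channel *c)
{
    if (!channel_has_feature(c, FEATURE_SHUTDOWN)) {
        return -1;
    }
    c->open = false;
    return 0;
}
```

In this toy model, calling `file_shutdown` on a channel whose `features` is 0 leaves it open, while the same call after `channel_set_feature(&ch, FEATURE_SHUTDOWN)` closes it, which is the effect both proposed patches aim for. With the assumed bit positions, a channel holding SHUTDOWN and LISTEN has mask 6, which is at least consistent with the two gdb printouts quoted in the thread.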
diff --git a/results/classifier/016/none/70294255 b/results/classifier/016/none/70294255
deleted file mode 100644
index 286c1cc4..00000000
--- a/results/classifier/016/none/70294255
+++ /dev/null
@@ -1,1088 +0,0 @@
-socket: 0.753
-debug: 0.454
-network: 0.316
-operating system: 0.278
-files: 0.213
-hypervisor: 0.060
-virtual: 0.046
-kernel: 0.045
-performance: 0.044
-i386: 0.043
-alpha: 0.037
-TCG: 0.036
-permissions: 0.032
-device: 0.025
-x86: 0.024
-PID: 0.021
-boot: 0.014
-KVM: 0.013
-semantic: 0.011
-risc-v: 0.010
-VMM: 0.010
-register: 0.009
-assembly: 0.009
-architecture: 0.008
-arm: 0.007
-vnc: 0.006
-ppc: 0.006
-peripherals: 0.005
-user-level: 0.004
-graphic: 0.003
-mistranslation: 0.002
-
-[Qemu-devel] Reply: Re:   Reply: Re:  Reply: Re: Reply: Re: [BUG]COLO failover hang
-
-Hi,
-
-Yes, it is better.
-
-And should we delete
-
-
-
-
-#ifdef WIN32
-
-    QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL);
-
-#endif
-
-
-
-
-in qio_channel_socket_accept?
-
-qio_channel_socket_new already has it.
-
-
-
-
-
-
-
-
-
-
-
-
-Original Mail
-
-
-
-From: address@hidden
-To: 王广10165992
-Cc: address@hidden address@hidden address@hidden address@hidden
-Date: 2017-03-22 15:03
-Subject: Re: [Qemu-devel]  Reply: Re:  Reply: Re: Reply: Re: [BUG]COLO failover hang
-
-
-
-
-
-Hi,
-
-On 2017/3/22 9:42, address@hidden wrote:
-> diff --git a/migration/socket.c b/migration/socket.c
->
->
-> index 13966f1..d65a0ea 100644
->
->
-> --- a/migration/socket.c
->
->
-> +++ b/migration/socket.c
->
->
-> @@ -147,8 +147,9 @@ static gboolean 
-socket_accept_incoming_migration(QIOChannel *ioc,
->
->
->       }
->
->
->
->
->
->       trace_migration_socket_incoming_accepted();
->
->
->
->
->
->       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
->
->
-> +    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
->
->
->       migration_channel_process_incoming(migrate_get_current(),
->
->
->                                          QIO_CHANNEL(sioc));
->
->
->       object_unref(OBJECT(sioc));
->
->
->
->
-> Is this patch ok?
->
-
-Yes, I think this works, but a better way may be to call
-qio_channel_set_feature()
-in qio_channel_socket_accept(); we didn't set the SHUTDOWN feature for the
-socket accept fd.
-Or fix it like this:
-
-diff --git a/io/channel-socket.c b/io/channel-socket.c
-index f546c68..ce6894c 100644
---- a/io/channel-socket.c
-+++ b/io/channel-socket.c
-@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
-                            Error **errp)
-  {
-      QIOChannelSocket *cioc;
--
--    cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
--    cioc->fd = -1;
-+
-+    cioc = qio_channel_socket_new();
-      cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
-      cioc->localAddrLen = sizeof(ioc->localAddr);
-
-
-Thanks,
-Hailiang
-
-> I have tested it. The test no longer hangs.
->
->
->
->
->
->
->
->
->
->
->
->
-> Original Message
->
->
->
-> From: address@hidden
-> To: address@hidden address@hidden
-> Cc: address@hidden address@hidden address@hidden
-> Date: 2017-03-22 09:11
-> Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
->
->
->
->
->
-> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
-> > * Hailiang Zhang (address@hidden) wrote:
-> >> Hi,
-> >>
-> >> Thanks for reporting this, and i confirmed it in my test, and it is a bug.
-> >>
-> >> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
-> >> case COLO thread/incoming thread is stuck in read/write() while do
-failover,
-> >> but it didn't take effect, because all the fd used by COLO (also migration)
-> >> has been wrapped by qio channel, and it will not call the shutdown API if
-> >> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN).
-> >>
-> >> Cc: Dr. David Alan Gilbert address@hidden
-> >>
-> >> I doubted migration cancel has the same problem, it may be stuck in write()
-> >> if we tried to cancel migration.
-> >>
-> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
-Error **errp)
-> >> {
-> >>      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
-> >>      migration_channel_connect(s, ioc, NULL);
-> >>      ... ...
-> >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
-> >> and the
-> >> migrate_fd_cancel()
-> >> {
-> >>   ... ...
-> >>      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-> >>          qemu_file_shutdown(f);  --> This will not take effect. No ?
-> >>      }
-> >> }
-> >
-> > (cc'd in Daniel Berrange).
-> > I see that we call qio_channel_set_feature(ioc,
-QIO_CHANNEL_FEATURE_SHUTDOWN) at the
-> > top of qio_channel_socket_new  so I think that's safe isn't it?
-> >
->
-> Hmm, you are right, this problem is only exist for the migration incoming fd,
-thanks.
->
-> > Dave
-> >
-> >> Thanks,
-> >> Hailiang
-> >>
-ï¼ ï¼ï¼ On 2017/3/21 16:10, address@hidden wrote:
-ï¼ ï¼ï¼ï¼ Thank youã
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ I have test areadyã
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ When the Primary Node panic,the Secondary Node qemu hang at the same 
-placeã
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ Incorrding
-http://wiki.qemu-project.org/Features/COLO
-ï¼kill Primary Node 
-qemu will not produce the problem,but Primary Node panic canã
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ I think due to the feature of channel does not support 
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ when failover,channel_shutdown could not shut down the channel.
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ so the colo_process_incoming_thread will hang at recvmsg.
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ I test a patch:
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ diff --git a/migration/socket.c b/migration/socket.c
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ index 13966f1..d65a0ea 100644
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ --- a/migration/socket.c
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ +++ b/migration/socket.c
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ @@ -147,8 +147,9 @@ static gboolean 
-socket_accept_incoming_migration(QIOChannel *ioc,
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼        }
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼        trace_migration_socket_incoming_accepted()
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼        qio_channel_set_name(QIO_CHANNEL(sioc), 
-"migration-socket-incoming")
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ +    qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN)
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼        migration_channel_process_incoming(migrate_get_current(),
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼                                           QIO_CHANNEL(sioc))
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼        object_unref(OBJECT(sioc))
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ My test will not hang any more.
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ åå§é®ä»¶
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ åä»¶äººï¼ address@hidden
-ï¼ ï¼ï¼ï¼ æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
-ï¼ ï¼ï¼ï¼ æéäººï¼ address@hidden address@hidden
-ï¼ ï¼ï¼ï¼ æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58
-ï¼ ï¼ï¼ï¼ ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  [BUG]COLO failover hang
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ Hi,Wang.
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ You can test this branch:
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ and please follow wiki ensure your own configuration correctly.
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-http://wiki.qemu-project.org/Features/COLO
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ Thanks
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ Zhang Chen
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ On 03/21/2017 03:27 PM, address@hidden wrote:
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ hi.
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ I test the git qemu master have the same problem.
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ (gdb) bt
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-ï¼ ï¼ï¼ï¼ ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read
-ï¼ ï¼ï¼ï¼ ï¼ (address@hidden, address@hidden "",
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden, address@hidden) at io/channel.c:114
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ ï¼ï¼ï¼ ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file.c:295
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden) at migration/qemu-file.c:555
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file.c:568
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file.c:648
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden) at migration/colo.c:244
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
-ï¼ ï¼ï¼ï¼ ï¼ outï¼, address@hidden,
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden)
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼     at migration/colo.c:264
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread
-ï¼ ï¼ï¼ï¼ ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ $3 = 0
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ (gdb) bt
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-ï¼ ï¼ï¼ï¼ ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
-ï¼ ï¼ï¼ï¼ ï¼ gmain.c:3054
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼,
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden) at gmain.c:3630
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
-ï¼ ï¼ï¼ï¼ ï¼ util/main-loop.c:258
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #5  main_loop_wait (address@hidden) at
-ï¼ ï¼ï¼ï¼ ï¼ util/main-loop.c:506
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
-ï¼ ï¼ï¼ï¼ ï¼ outï¼) at vl.c:4709
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ $1 = 6
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ May be socket_accept_incoming_migration should
-ï¼ ï¼ï¼ï¼ ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ thank you.
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ åå§é®ä»¶
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden@huawei.comï¼
-ï¼ ï¼ï¼ï¼ ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
-ï¼ ï¼ï¼ï¼ ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
-ï¼ ï¼ï¼ï¼ ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ I found that the colo in qemu is not complete yet.
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ Do the colo have any plan for development?
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ Yes, We are developing. You can see some of patch we pushing.
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ In our internal version can run it successfully,
-ï¼ ï¼ï¼ï¼ ï¼ The failover detail you can ask Zhanghailiang for help.
-ï¼ ï¼ï¼ï¼ ï¼ Next time if you have some question about COLO,
-ï¼ ï¼ï¼ï¼ ï¼ please cc me and zhanghailiang address@hidden
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ Thanks
-ï¼ ï¼ï¼ï¼ ï¼ Zhang Chen
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ centos7.2+qemu2.7.50
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ (gdb) bt
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized 
-outï¼,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, 
-errp=0x0) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ io/channel-socket.c:497
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ address@hidden "", address@hidden,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at io/channel.c:97
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:257
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:523
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:603
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/colo.c:215
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message 
-(errp=0x7f3d62bfaa48,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:546
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:649
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc..so.6
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ --
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ --
-ï¼ ï¼ï¼ï¼ ï¼ Thanks
-ï¼ ï¼ï¼ï¼ ï¼ Zhang Chen
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼
-ï¼ ï¼ --
-ï¼ ï¼ Dr. David Alan Gilbert / address@hidden / Manchester, UK
-ï¼ ï¼
-ï¼ ï¼ .
-ï¼ ï¼
-ï¼
-
-On 2017/3/22 16:09, address@hidden wrote:
-hi:
-
-Yes, it is better.
-
-And should we delete
-Yes, you are right.
-#ifdef WIN32
-
-     QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL);
-
-#endif
-
-
-
-
-in qio_channel_socket_accept()?
-
-qio_channel_socket_new() already does it.
-
-
-
-
-
-
-
-
-
-
-
-
-Original Message
-
-
-
-From: address@hidden
-To: 王广10165992
-Cc: address@hidden address@hidden address@hidden address@hidden
-Date: 2017-03-22 15:03
-Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
-
-
-
-
-
-Hi,
-
-On 2017/3/22 9:42, address@hidden wrote:
-> diff --git a/migration/socket.c b/migration/socket.c
->
->
-> index 13966f1..d65a0ea 100644
->
->
-> --- a/migration/socket.c
->
->
-> +++ b/migration/socket.c
->
->
-> @@ -147,8 +147,9 @@ static gboolean
-socket_accept_incoming_migration(QIOChannel *ioc,
->
->
->       }
->
->
->
->
->
->       trace_migration_socket_incoming_accepted();
->
->
->
->
->
->       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
->
->
-> +    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
->
->
->       migration_channel_process_incoming(migrate_get_current(),
->
->
->                                          QIO_CHANNEL(sioc));
->
->
->       object_unref(OBJECT(sioc));
->
->
->
->
-> Is this patch ok?
->
-
-Yes, I think this works, but a better way may be to call
-qio_channel_set_feature()
-in qio_channel_socket_accept(); we didn't set the SHUTDOWN feature for the
-socket accept fd.
-Or fix it like this:
-
-diff --git a/io/channel-socket.c b/io/channel-socket.c
-index f546c68..ce6894c 100644
---- a/io/channel-socket.c
-+++ b/io/channel-socket.c
-@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
-                             Error **errp)
-   {
-       QIOChannelSocket *cioc;
--
--    cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
--    cioc->fd = -1;
-+
-+    cioc = qio_channel_socket_new();
-       cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
-       cioc->localAddrLen = sizeof(ioc->localAddr);
-
-
-Thanks,
-Hailiang
-
-> I have tested it. The test no longer hangs.
->
->
->
->
->
->
->
->
->
->
->
->
-> Original Message
->
->
->
-> From: address@hidden
-> To: address@hidden address@hidden
-> Cc: address@hidden address@hidden address@hidden
-> Date: 2017-03-22 09:11
-> Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
->
->
->
->
->
-> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
-> > * Hailiang Zhang (address@hidden) wrote:
-> >> Hi,
-> >>
-> >> Thanks for reporting this, and i confirmed it in my test, and it is a bug.
-> >>
-> >> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
-> >> case COLO thread/incoming thread is stuck in read/write() while do
-failover,
-> >> but it didn't take effect, because all the fd used by COLO (also migration)
-> >> has been wrapped by qio channel, and it will not call the shutdown API if
-> >> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN).
-> >>
-> >> Cc: Dr. David Alan Gilbert address@hidden
-> >>
-> >> I doubted migration cancel has the same problem, it may be stuck in write()
-> >> if we tried to cancel migration.
-> >>
-> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
-Error **errp)
-> >> {
-> >>      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
-> >>      migration_channel_connect(s, ioc, NULL);
-> >>      ... ...
-> >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
-> >> and the
-> >> migrate_fd_cancel()
-> >> {
-> >>   ... ...
-> >>      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-> >>          qemu_file_shutdown(f);  --> This will not take effect. No ?
-> >>      }
-> >> }
-> >
-> > (cc'd in Daniel Berrange).
-> > I see that we call qio_channel_set_feature(ioc,
-QIO_CHANNEL_FEATURE_SHUTDOWN) at the
-> > top of qio_channel_socket_new  so I think that's safe isn't it?
-> >
->
-> Hmm, you are right, this problem is only exist for the migration incoming fd,
-thanks.
->
-> > Dave
-> >
-> >> Thanks,
-> >> Hailiang
-> >>
-ï¼ ï¼ï¼ On 2017/3/21 16:10, address@hidden wrote:
-ï¼ ï¼ï¼ï¼ Thank youã
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ I have test areadyã
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ When the Primary Node panic,the Secondary Node qemu hang at the same 
-placeã
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ Incorrding
-http://wiki.qemu-project.org/Features/COLO
-ï¼kill Primary Node 
-qemu will not produce the problem,but Primary Node panic canã
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ I think due to the feature of channel does not support 
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ when failover,channel_shutdown could not shut down the channel.
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ so the colo_process_incoming_thread will hang at recvmsg.
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ I test a patch:
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ diff --git a/migration/socket.c b/migration/socket.c
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ index 13966f1..d65a0ea 100644
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ --- a/migration/socket.c
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ +++ b/migration/socket.c
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ @@ -147,8 +147,9 @@ static gboolean 
-socket_accept_incoming_migration(QIOChannel *ioc,
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼        }
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼        trace_migration_socket_incoming_accepted()
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼        qio_channel_set_name(QIO_CHANNEL(sioc), 
-"migration-socket-incoming")
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ +    qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN)
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼        migration_channel_process_incoming(migrate_get_current(),
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼                                           QIO_CHANNEL(sioc))
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼        object_unref(OBJECT(sioc))
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ My test will not hang any more.
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ åå§é®ä»¶
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ åä»¶äººï¼ address@hidden
-ï¼ ï¼ï¼ï¼ æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
-ï¼ ï¼ï¼ï¼ æéäººï¼ address@hidden address@hidden
-ï¼ ï¼ï¼ï¼ æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58
-ï¼ ï¼ï¼ï¼ ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  [BUG]COLO failover hang
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ Hi,Wang.
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ You can test this branch:
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ and please follow wiki ensure your own configuration correctly.
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-http://wiki.qemu-project.org/Features/COLO
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ Thanks
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ Zhang Chen
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼ï¼ On 03/21/2017 03:27 PM, address@hidden wrote:
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ hi.
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ I test the git qemu master have the same problem.
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ (gdb) bt
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-ï¼ ï¼ï¼ï¼ ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read
-ï¼ ï¼ï¼ï¼ ï¼ (address@hidden, address@hidden "",
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden, address@hidden) at io/channel.c:114
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ ï¼ï¼ï¼ ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file.c:295
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden) at migration/qemu-file.c:555
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file.c:568
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file.c:648
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden) at migration/colo.c:244
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
-ï¼ ï¼ï¼ï¼ ï¼ outï¼, address@hidden,
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden)
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼     at migration/colo.c:264
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread
-ï¼ ï¼ï¼ï¼ ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ $3 = 0
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ (gdb) bt
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-ï¼ ï¼ï¼ï¼ ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
-ï¼ ï¼ï¼ï¼ ï¼ gmain.c:3054
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼,
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden) at gmain.c:3630
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
-ï¼ ï¼ï¼ï¼ ï¼ util/main-loop.c:258
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #5  main_loop_wait (address@hidden) at
-ï¼ ï¼ï¼ï¼ ï¼ util/main-loop.c:506
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
-ï¼ ï¼ï¼ï¼ ï¼ outï¼) at vl.c:4709
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ $1 = 6
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ May be socket_accept_incoming_migration should
-ï¼ ï¼ï¼ï¼ ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ thank you.
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ åå§é®ä»¶
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden
-ï¼ ï¼ï¼ï¼ ï¼ address@hidden@huawei.comï¼
-ï¼ ï¼ï¼ï¼ ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
-ï¼ ï¼ï¼ï¼ ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
-ï¼ ï¼ï¼ï¼ ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ I found that the colo in qemu is not complete yet.
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ Do the colo have any plan for development?
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ Yes, We are developing. You can see some of patch we pushing.
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ In our internal version can run it successfully,
-ï¼ ï¼ï¼ï¼ ï¼ The failover detail you can ask Zhanghailiang for help.
-ï¼ ï¼ï¼ï¼ ï¼ Next time if you have some question about COLO,
-ï¼ ï¼ï¼ï¼ ï¼ please cc me and zhanghailiang address@hidden
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ Thanks
-ï¼ ï¼ï¼ï¼ ï¼ Zhang Chen
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ centos7.2+qemu2.7.50
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ (gdb) bt
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized 
-outï¼,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, 
-errp=0x0) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ io/channel-socket.c:497
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ address@hidden "", address@hidden,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at io/channel.c:97
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file-channel.c:78
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:257
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:523
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:603
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/colo.c:215
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message 
-(errp=0x7f3d62bfaa48,
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:546
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:649
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc..so.6
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ --
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-ï¼ ï¼ï¼ï¼ ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼ --
-ï¼ ï¼ï¼ï¼ ï¼ Thanks
-ï¼ ï¼ï¼ï¼ ï¼ Zhang Chen
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼ ï¼
-ï¼ ï¼ï¼ï¼
-ï¼ ï¼ï¼
-ï¼ ï¼ --
-ï¼ ï¼ Dr. David Alan Gilbert / address@hidden / Manchester, UK
-ï¼ ï¼
-ï¼ ï¼ .
-ï¼ ï¼
-ï¼
-
diff --git a/results/classifier/016/none/70868267 b/results/classifier/016/none/70868267
deleted file mode 100644
index 3f50c2ef..00000000
--- a/results/classifier/016/none/70868267
+++ /dev/null
@@ -1,67 +0,0 @@
-x86: 0.245
-operating system: 0.079
-files: 0.026
-hypervisor: 0.023
-TCG: 0.023
-debug: 0.020
-network: 0.019
-PID: 0.018
-i386: 0.011
-virtual: 0.008
-register: 0.006
-user-level: 0.004
-ppc: 0.003
-semantic: 0.003
-device: 0.002
-socket: 0.002
-assembly: 0.002
-VMM: 0.002
-kernel: 0.002
-performance: 0.001
-arm: 0.001
-alpha: 0.001
-graphic: 0.001
-vnc: 0.001
-peripherals: 0.001
-architecture: 0.001
-boot: 0.001
-risc-v: 0.001
-permissions: 0.000
-KVM: 0.000
-mistranslation: 0.000
-
-[Qemu-devel] [BUG] Failed to compile using gcc7.1
-
-Hi all,
-
-After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc.
-
-The error is:
-
-------
-  CC      block/blkdebug.o
-block/blkdebug.c: In function 'blkdebug_refresh_filename':
-block/blkdebug.c:693:31: error: '%s' directive output may be truncated
-writing up to 4095 bytes into a region of size 4086
-[-Werror=format-truncation=]
-"blkdebug:%s:%s", s->config_file ?: "",
-                               ^~
-In file included from /usr/include/stdio.h:939:0,
-                 from /home/adam/qemu/include/qemu/osdep.h:68,
-                 from block/blkdebug.c:25:
-/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk'
-output 11 or more bytes (assuming 4106) into a destination of size 4096
-return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
-          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-        __bos (__s), __fmt, __va_arg_pack ());
-        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-cc1: all warnings being treated as errors
-make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1
-------
-
-It seems that gcc 7 introduces stricter checks for printf.
-With clang, although there are some extra warnings, the build at least
-completes.
-Thanks,
-Qu
-
diff --git a/results/classifier/016/none/80604314 b/results/classifier/016/none/80604314
deleted file mode 100644
index 8112f757..00000000
--- a/results/classifier/016/none/80604314
+++ /dev/null
@@ -1,1507 +0,0 @@
-hypervisor: 0.669
-network: 0.654
-debug: 0.554
-operating system: 0.404
-virtual: 0.190
-files: 0.103
-TCG: 0.102
-PID: 0.097
-boot: 0.095
-device: 0.090
-user-level: 0.089
-VMM: 0.084
-vnc: 0.081
-register: 0.062
-socket: 0.055
-kernel: 0.042
-KVM: 0.021
-risc-v: 0.019
-performance: 0.014
-assembly: 0.011
-semantic: 0.008
-architecture: 0.007
-alpha: 0.004
-ppc: 0.003
-permissions: 0.003
-graphic: 0.003
-peripherals: 0.002
-mistranslation: 0.002
-x86: 0.001
-arm: 0.001
-i386: 0.000
-
-[BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
-
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, 
-    config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-
-Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
-the autogenerated virtio-net-ccw device is present) works. Specifying
-several "-device virtio-net-pci" works as well.
-
-Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
-client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
-works (in-between state does not compile).
-
-This is reproducible with tcg as well. Same problem both with
---enable-vhost-vdpa and --disable-vhost-vdpa.
-
-Have not yet tried to figure out what might be special with
-virtio-ccw... anyone have an idea?
-
-[This should probably be considered a blocker?]
-
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-When I start qemu with a second virtio-net-ccw device (i.e. adding
->
--device virtio-net-ccw in addition to the autogenerated device), I get
->
-a segfault. gdb points to
->
->
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
-config=0x55d6ad9e3f80 "RT") at
->
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-146       if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
->
-(backtrace doesn't go further)
->
->
-Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
->
-the autogenerated virtio-net-ccw device is present) works. Specifying
->
-several "-device virtio-net-pci" works as well.
->
->
-Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
->
-client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
->
-works (in-between state does not compile).
-Ouch. I didn't test all in-between states :(
-But I wish we had a 0-day infrastructure like the kernel has,
-that catches things like that.
-
->
-This is reproducible with tcg as well. Same problem both with
->
---enable-vhost-vdpa and --disable-vhost-vdpa.
->
->
-Have not yet tried to figure out what might be special with
->
-virtio-ccw... anyone have an idea?
->
->
-[This should probably be considered a blocker?]
-
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin" <mst@redhat.com> wrote:
-
->
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-> When I start qemu with a second virtio-net-ccw device (i.e. adding
->
-> -device virtio-net-ccw in addition to the autogenerated device), I get
->
-> a segfault. gdb points to
->
->
->
-> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
->     config=0x55d6ad9e3f80 "RT") at
->
-> /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> 146     if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
->
->
-> (backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-
->
->
->
-> Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
->
-> the autogenerated virtio-net-ccw device is present) works. Specifying
->
-> several "-device virtio-net-pci" works as well.
->
->
->
-> Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
->
-> client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
->
-> works (in-between state does not compile).
->
->
-Ouch. I didn't test all in-between states :(
->
-But I wish we had a 0-day infrastructure like the kernel has,
->
-that catches things like that.
-Yep, that would be useful... so patchew only builds the complete series?
-
->
->
-> This is reproducible with tcg as well. Same problem both with
->
-> --enable-vhost-vdpa and --disable-vhost-vdpa.
->
->
->
-> Have not yet tried to figure out what might be special with
->
-> virtio-ccw... anyone have an idea?
->
->
->
-> [This should probably be considered a blocker?]
-I think so, as it makes s390x unusable with more than one
-virtio-net-ccw device, and I don't even see a workaround.
-
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
-On Fri, 24 Jul 2020 09:30:58 -0400
->
-"Michael S. Tsirkin" <mst@redhat.com> wrote:
->
->
-> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-> > When I start qemu with a second virtio-net-ccw device (i.e. adding
->
-> > -device virtio-net-ccw in addition to the autogenerated device), I get
->
-> > a segfault. gdb points to
->
-> >
->
-> > #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
-> >     config=0x55d6ad9e3f80 "RT") at
->
-> > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> > 146           if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
-> >
->
-> > (backtrace doesn't go further)
->
->
-The core was incomplete, but running under gdb directly shows that it
->
-is just a bog-standard config space access (first for that device).
->
->
-The cause of the crash is that nc->peer is not set... no idea how that
->
-can happen, not that familiar with that part of QEMU. (Should the code
->
-check, or is that really something that should not happen?)
->
->
-What I don't understand is why it is set correctly for the first,
->
-autogenerated virtio-net-ccw device, but not for the second one, and
->
-why virtio-net-pci doesn't show these problems. The only difference
->
-between -ccw and -pci that comes to my mind here is that config space
->
-accesses for ccw are done via an asynchronous operation, so timing
->
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-
->
-> >
->
-> > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
->
-> > the autogenerated virtio-net-ccw device is present) works. Specifying
->
-> > several "-device virtio-net-pci" works as well.
->
-> >
->
-> > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
->
-> > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
->
-> > works (in-between state does not compile).
->
->
->
-> Ouch. I didn't test all in-between states :(
->
-> But I wish we had a 0-day infrastructure like the kernel has,
->
-> that catches things like that.
->
->
-Yep, that would be useful... so patchew only builds the complete series?
->
->
->
->
-> > This is reproducible with tcg as well. Same problem both with
->
-> > --enable-vhost-vdpa and --disable-vhost-vdpa.
->
-> >
->
-> > Have not yet tried to figure out what might be special with
->
-> > virtio-ccw... anyone have an idea?
->
-> >
->
-> > [This should probably be considered a blocker?]
->
->
-I think so, as it makes s390x unusable with more than one
->
-virtio-net-ccw device, and I don't even see a workaround.
-
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin" <mst@redhat.com> wrote:
-
->
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
-> On Fri, 24 Jul 2020 09:30:58 -0400
->
-> "Michael S. Tsirkin" <mst@redhat.com> wrote:
->
->
->
-> > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-> > > When I start qemu with a second virtio-net-ccw device (i.e. adding
->
-> > > -device virtio-net-ccw in addition to the autogenerated device), I get
->
-> > > a segfault. gdb points to
->
-> > >
->
-> > > #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
-> > >     config=0x55d6ad9e3f80 "RT") at
->
-> > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> > > 146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
-> > >
->
-> > > (backtrace doesn't go further)
->
->
->
-> The core was incomplete, but running under gdb directly shows that it
->
-> is just a bog-standard config space access (first for that device).
->
->
->
-> The cause of the crash is that nc->peer is not set... no idea how that
->
-> can happen, not that familiar with that part of QEMU. (Should the code
->
-> check, or is that really something that should not happen?)
->
->
->
-> What I don't understand is why it is set correctly for the first,
->
-> autogenerated virtio-net-ccw device, but not for the second one, and
->
-> why virtio-net-pci doesn't show these problems. The only difference
->
-> between -ccw and -pci that comes to my mind here is that config space
->
-> accesses for ccw are done via an asynchronous operation, so timing
->
-> might be different.
->
->
-Hopefully Jason has an idea. Could you post a full command line
->
-please? Do you need a working guest to trigger this? Does this trigger
->
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on 
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
- 
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-
->
->
-> > >
->
-> > > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
->
-> > > the autogenerated virtio-net-ccw device is present) works. Specifying
->
-> > > several "-device virtio-net-pci" works as well.
->
-> > >
->
-> > > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
->
-> > > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
->
-> > > works (in-between state does not compile).
->
-> >
->
-> > Ouch. I didn't test all in-between states :(
->
-> > But I wish we had a 0-day infrastructure like the kernel has,
->
-> > that catches things like that.
->
->
->
-> Yep, that would be useful... so patchew only builds the complete series?
->
->
->
-> >
->
-> > > This is reproducible with tcg as well. Same problem both with
->
-> > > --enable-vhost-vdpa and --disable-vhost-vdpa.
->
-> > >
->
-> > > Have not yet tried to figure out what might be special with
->
-> > > virtio-ccw... anyone have an idea?
->
-> > >
->
-> > > [This should probably be considered a blocker?]
->
->
->
-> I think so, as it makes s390x unusable with more than one
->
-> virtio-net-ccw device, and I don't even see a workaround.
->
-
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-     config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me like we forgot to check the existence of peer.
-
-Please try the attached patch to see if it works.
-
-Thanks
-0001-virtio-net-check-the-existence-of-peer-before-accesi.patch
-Description:
-Text Data
-
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-
->
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
->
-> On Fri, 24 Jul 2020 11:17:57 -0400
->
-> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
->
->
->> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
->>> On Fri, 24 Jul 2020 09:30:58 -0400
->
->>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
->>>
->
->>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
->>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
->
->>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
->
->>>>> a segfault. gdb points to
->
->>>>>
->
->>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
->>>>>      config=0x55d6ad9e3f80 "RT") at
->
->>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
->>>>> 146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
->>>>>
->
->>>>> (backtrace doesn't go further)
->
->>> The core was incomplete, but running under gdb directly shows that it
->
->>> is just a bog-standard config space access (first for that device).
->
->>>
->
->>> The cause of the crash is that nc->peer is not set... no idea how that
->
->>> can happen, not that familiar with that part of QEMU. (Should the code
->
->>> check, or is that really something that should not happen?)
->
->>>
->
->>> What I don't understand is why it is set correctly for the first,
->
->>> autogenerated virtio-net-ccw device, but not for the second one, and
->
->>> why virtio-net-pci doesn't show these problems. The only difference
->
->>> between -ccw and -pci that comes to my mind here is that config space
->
->>> accesses for ccw are done via an asynchronous operation, so timing
->
->>> might be different.
->
->> Hopefully Jason has an idea. Could you post a full command line
->
->> please? Do you need a working guest to trigger this? Does this trigger
->
->> on an x86 host?
->
-> Yes, it does trigger with tcg-on-x86 as well. I've been using
->
->
->
-> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu
->
-> qemu,zpci=on
->
-> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
->
-> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
->
-> -device
->
-> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
->
-> -device virtio-net-ccw
->
->
->
-> It seems it needs the guest actually doing something with the nics; I
->
-> cannot reproduce the crash if I use the old advent calendar moon buggy
->
-> image and just add a virtio-net-ccw device.
->
->
->
-> (I don't think it's a problem with my local build, as I see the problem
->
-> both on my laptop and on an LPAR.)
->
->
->
-It looks to me like we forgot to check the existence of peer.
->
->
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck <cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-
-On 2020/7/27 2:43 PM, Cornelia Huck wrote:
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-      config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me like we forgot to check the existence of peer.
-
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck <cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-It can be hit with virtio-net-pci as well (just start without peer).
-For autogenerated virtio-net-ccw, I think the reason is that it has
-already had a peer set.
-Thanks
-
-On Mon, 27 Jul 2020 15:38:12 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-
->
-On 2020/7/27 2:43 PM, Cornelia Huck wrote:
->
-> On Sat, 25 Jul 2020 08:40:07 +0800
->
-> Jason Wang <jasowang@redhat.com> wrote:
->
->
->
->> On 2020/7/24 11:34 PM, Cornelia Huck wrote:
->
->>> On Fri, 24 Jul 2020 11:17:57 -0400
->
->>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
->>>
->
->>>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
->>>>> On Fri, 24 Jul 2020 09:30:58 -0400
->
->>>>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
->>>>>
->
->>>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
->>>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
->
->>>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
->
->>>>>>> a segfault. gdb points to
->
->>>>>>>
->
->>>>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
->>>>>>>       config=0x55d6ad9e3f80 "RT") at
->
->>>>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
->>>>>>> 146       if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
->>>>>>>
->
->>>>>>> (backtrace doesn't go further)
->
->>>>> The core was incomplete, but running under gdb directly shows that it
->
->>>>> is just a bog-standard config space access (first for that device).
->
->>>>>
->
->>>>> The cause of the crash is that nc->peer is not set... no idea how that
->
->>>>> can happen, not that familiar with that part of QEMU. (Should the code
->
->>>>> check, or is that really something that should not happen?)
->
->>>>>
->
->>>>> What I don't understand is why it is set correctly for the first,
->
->>>>> autogenerated virtio-net-ccw device, but not for the second one, and
->
->>>>> why virtio-net-pci doesn't show these problems. The only difference
->
->>>>> between -ccw and -pci that comes to my mind here is that config space
->
->>>>> accesses for ccw are done via an asynchronous operation, so timing
->
->>>>> might be different.
->
->>>> Hopefully Jason has an idea. Could you post a full command line
->
->>>> please? Do you need a working guest to trigger this? Does this trigger
->
->>>> on an x86 host?
->
->>> Yes, it does trigger with tcg-on-x86 as well. I've been using
->
->>>
->
->>> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu
->
->>> qemu,zpci=on
->
->>> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
->
->>> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
->
->>> -device
->
->>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
->
->>> -device virtio-net-ccw
->
->>>
->
->>> It seems it needs the guest actually doing something with the nics; I
->
->>> cannot reproduce the crash if I use the old advent calendar moon buggy
->
->>> image and just add a virtio-net-ccw device.
->
->>>
->
->>> (I don't think it's a problem with my local build, as I see the problem
->
->>> both on my laptop and on an LPAR.)
->
->>
->
->> It looks to me like we forgot to check the existence of peer.
->
->>
->
->> Please try the attached patch to see if it works.
->
-> Thanks, that patch gets my guest up and running again. So, FWIW,
->
->
->
-> Tested-by: Cornelia Huck <cohuck@redhat.com>
->
->
->
-> Any idea why this did not hit with virtio-net-pci (or the autogenerated
->
-> virtio-net-ccw device)?
->
->
->
-It can be hit with virtio-net-pci as well (just start without peer).
-Hm, I had not been able to reproduce the crash with a 'naked' -device
-virtio-net-pci. But checking seems to be the right idea anyway.
-
->
->
-For autogenerated virtio-net-ccw, I think the reason is that it has
->
-already had a peer set.
-Ok, that might well be.
-
-On 2020/7/27 4:41 PM, Cornelia Huck wrote:
-On Mon, 27 Jul 2020 15:38:12 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-On 2020/7/27 2:43 PM, Cornelia Huck wrote:
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-       config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me like we forgot to check the existence of peer.
-
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck <cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-It can be hit with virtio-net-pci as well (just start without peer).
-Hm, I had not been able to reproduce the crash with a 'naked' -device
-virtio-net-pci. But checking seems to be the right idea anyway.
-Sorry for being unclear, I meant for networking part, you just need
-start without peer, and you need a real guest (any Linux) that is trying
-to access the config space of virtio-net.
-Thanks
-For autogenerated virtio-net-ccw, I think the reason is that it has
-already had a peer set.
-Ok, that might well be.
-
-On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
->
->
-On 2020/7/27 4:41 PM, Cornelia Huck wrote:
->
-> On Mon, 27 Jul 2020 15:38:12 +0800
->
-> Jason Wang <jasowang@redhat.com> wrote:
->
->
->
-> > On 2020/7/27 2:43 PM, Cornelia Huck wrote:
->
-> > > On Sat, 25 Jul 2020 08:40:07 +0800
->
-> > > Jason Wang <jasowang@redhat.com> wrote:
->
-> > > > On 2020/7/24 11:34 PM, Cornelia Huck wrote:
->
-> > > > > On Fri, 24 Jul 2020 11:17:57 -0400
->
-> > > > > "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
-> > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
-> > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400
->
-> > > > > > > "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
-> > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-> > > > > > > > > When I start qemu with a second virtio-net-ccw device (i.e.
->
-> > > > > > > > > adding
->
-> > > > > > > > > -device virtio-net-ccw in addition to the autogenerated
->
-> > > > > > > > > device), I get
->
-> > > > > > > > > a segfault. gdb points to
->
-> > > > > > > > >
->
-> > > > > > > > > #0  0x000055d6ab52681d in virtio_net_get_config
->
-> > > > > > > > > (vdev=<optimized out>,
->
-> > > > > > > > >        config=0x55d6ad9e3f80 "RT") at
->
-> > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> > > > > > > > > 146     if (nc->peer->info->type ==
->
-> > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) {
->
-> > > > > > > > >
->
-> > > > > > > > > (backtrace doesn't go further)
->
-> > > > > > > The core was incomplete, but running under gdb directly shows
->
-> > > > > > > that it
->
-> > > > > > > is just a bog-standard config space access (first for that
->
-> > > > > > > device).
->
-> > > > > > >
->
-> > > > > > > The cause of the crash is that nc->peer is not set... no idea
->
-> > > > > > > how that
->
-> > > > > > > can happen, not that familiar with that part of QEMU. (Should
->
-> > > > > > > the code
->
-> > > > > > > check, or is that really something that should not happen?)
->
-> > > > > > >
->
-> > > > > > > What I don't understand is why it is set correctly for the
->
-> > > > > > > first,
->
-> > > > > > > autogenerated virtio-net-ccw device, but not for the second
->
-> > > > > > > one, and
->
-> > > > > > > why virtio-net-pci doesn't show these problems. The only
->
-> > > > > > > difference
->
-> > > > > > > between -ccw and -pci that comes to my mind here is that config
->
-> > > > > > > space
->
-> > > > > > > accesses for ccw are done via an asynchronous operation, so
->
-> > > > > > > timing
->
-> > > > > > > might be different.
->
-> > > > > > Hopefully Jason has an idea. Could you post a full command line
->
-> > > > > > please? Do you need a working guest to trigger this? Does this
->
-> > > > > > trigger
->
-> > > > > > on an x86 host?
->
-> > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using
->
-> > > > >
->
-> > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu
->
-> > > > > qemu,zpci=on
->
-> > > > > -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
->
-> > > > > -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
->
-> > > > > -device
->
-> > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
->
-> > > > > -device virtio-net-ccw
->
-> > > > >
->
-> > > > > It seems it needs the guest actually doing something with the nics;
->
-> > > > > I
->
-> > > > > cannot reproduce the crash if I use the old advent calendar moon
->
-> > > > > buggy
->
-> > > > > image and just add a virtio-net-ccw device.
->
-> > > > >
->
-> > > > > (I don't think it's a problem with my local build, as I see the
->
-> > > > > problem
->
-> > > > > both on my laptop and on an LPAR.)
->
-> > > > It looks to me like we forgot to check the existence of peer.
->
-> > > >
->
-> > > > Please try the attached patch to see if it works.
->
-> > > Thanks, that patch gets my guest up and running again. So, FWIW,
->
-> > >
->
-> > > Tested-by: Cornelia Huck <cohuck@redhat.com>
->
-> > >
->
-> > > Any idea why this did not hit with virtio-net-pci (or the autogenerated
->
-> > > virtio-net-ccw device)?
->
-> >
->
-> > It can be hit with virtio-net-pci as well (just start without peer).
->
-> Hm, I had not been able to reproduce the crash with a 'naked' -device
->
-> virtio-net-pci. But checking seems to be the right idea anyway.
->
->
->
-Sorry for being unclear, I meant for networking part, you just need start
->
-without peer, and you need a real guest (any Linux) that is trying to access
->
-the config space of virtio-net.
->
->
-Thanks
-A pxe guest will do it, but that doesn't support ccw, right?
-
-I'm still unclear why this triggers with ccw but not pci -
-any idea?
-
->
->
->
->
-> > For autogenerated virtio-net-cww, I think the reason is that it has
->
-> > already had a peer set.
->
-> Ok, that might well be.
->
->
->
->
-
-On 2020/7/27 7:43 PM, Michael S. Tsirkin wrote:
-On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
-On 2020/7/27 4:41 PM, Cornelia Huck wrote:
-On Mon, 27 Jul 2020 15:38:12 +0800
-Jason Wang<jasowang@redhat.com>  wrote:
-On 2020/7/27 2:43 PM, Cornelia Huck wrote:
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang<jasowang@redhat.com>  wrote:
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>   wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>   wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-        config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me we forget the check the existence of peer.
-
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck<cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-It can be hit with virtio-net-pci as well (just start without peer).
-Hm, I had not been able to reproduce the crash with a 'naked' -device
-virtio-net-pci. But checking seems to be the right idea anyway.
-Sorry for being unclear, I meant for networking part, you just need start
-without peer, and you need a real guest (any Linux) that is trying to access
-the config space of virtio-net.
-
-Thanks
-A pxe guest will do it, but that doesn't support ccw, right?
-Yes, it depends on the cli actually.
-I'm still unclear why this triggers with ccw but not pci -
-any idea?
-I don't test pxe but I can reproduce this with pci (just start a linux
-guest without a peer).
-Thanks
-
-On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote:
->
->
-On 2020/7/27 7:43 PM, Michael S. Tsirkin wrote:
->
-> On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
->
-> > On 2020/7/27 4:41 PM, Cornelia Huck wrote:
->
-> > > On Mon, 27 Jul 2020 15:38:12 +0800
->
-> > > Jason Wang<jasowang@redhat.com>  wrote:
->
-> > >
->
-> > > > On 2020/7/27 2:43 PM, Cornelia Huck wrote:
->
-> > > > > On Sat, 25 Jul 2020 08:40:07 +0800
->
-> > > > > Jason Wang<jasowang@redhat.com>  wrote:
->
-> > > > > > On 2020/7/24 11:34 PM, Cornelia Huck wrote:
->
-> > > > > > > On Fri, 24 Jul 2020 11:17:57 -0400
->
-> > > > > > > "Michael S. Tsirkin"<mst@redhat.com>   wrote:
->
-> > > > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
-> > > > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400
->
-> > > > > > > > > "Michael S. Tsirkin"<mst@redhat.com>   wrote:
->
-> > > > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck
->
-> > > > > > > > > > wrote:
->
-> > > > > > > > > > > When I start qemu with a second virtio-net-ccw device
->
-> > > > > > > > > > > (i.e. adding
->
-> > > > > > > > > > > -device virtio-net-ccw in addition to the autogenerated
->
-> > > > > > > > > > > device), I get
->
-> > > > > > > > > > > a segfault. gdb points to
->
-> > > > > > > > > > >
->
-> > > > > > > > > > > #0  0x000055d6ab52681d in virtio_net_get_config
->
-> > > > > > > > > > > (vdev=<optimized out>,
->
-> > > > > > > > > > >         config=0x55d6ad9e3f80 "RT") at
->
-> > > > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> > > > > > > > > > > 146         if (nc->peer->info->type ==
->
-> > > > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) {
->
-> > > > > > > > > > >
->
-> > > > > > > > > > > (backtrace doesn't go further)
->
-> > > > > > > > > The core was incomplete, but running under gdb directly
->
-> > > > > > > > > shows that it
->
-> > > > > > > > > is just a bog-standard config space access (first for that
->
-> > > > > > > > > device).
->
-> > > > > > > > >
->
-> > > > > > > > > The cause of the crash is that nc->peer is not set... no
->
-> > > > > > > > > idea how that
->
-> > > > > > > > > can happen, not that familiar with that part of QEMU.
->
-> > > > > > > > > (Should the code
->
-> > > > > > > > > check, or is that really something that should not happen?)
->
-> > > > > > > > >
->
-> > > > > > > > > What I don't understand is why it is set correctly for the
->
-> > > > > > > > > first,
->
-> > > > > > > > > autogenerated virtio-net-ccw device, but not for the second
->
-> > > > > > > > > one, and
->
-> > > > > > > > > why virtio-net-pci doesn't show these problems. The only
->
-> > > > > > > > > difference
->
-> > > > > > > > > between -ccw and -pci that comes to my mind here is that
->
-> > > > > > > > > config space
->
-> > > > > > > > > accesses for ccw are done via an asynchronous operation, so
->
-> > > > > > > > > timing
->
-> > > > > > > > > might be different.
->
-> > > > > > > > Hopefully Jason has an idea. Could you post a full command
->
-> > > > > > > > line
->
-> > > > > > > > please? Do you need a working guest to trigger this? Does
->
-> > > > > > > > this trigger
->
-> > > > > > > > on an x86 host?
->
-> > > > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using
->
-> > > > > > >
->
-> > > > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg
->
-> > > > > > > -cpu qemu,zpci=on
->
-> > > > > > > -m 1024 -nographic -device
->
-> > > > > > > virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
->
-> > > > > > > -drive
->
-> > > > > > > file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
->
-> > > > > > > -device
->
-> > > > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
->
-> > > > > > > -device virtio-net-ccw
->
-> > > > > > >
->
-> > > > > > > It seems it needs the guest actually doing something with the
->
-> > > > > > > nics; I
->
-> > > > > > > cannot reproduce the crash if I use the old advent calendar
->
-> > > > > > > moon buggy
->
-> > > > > > > image and just add a virtio-net-ccw device.
->
-> > > > > > >
->
-> > > > > > > (I don't think it's a problem with my local build, as I see the
->
-> > > > > > > problem
->
-> > > > > > > both on my laptop and on an LPAR.)
->
-> > > > > > It looks to me we forget the check the existence of peer.
->
-> > > > > >
->
-> > > > > > Please try the attached patch to see if it works.
->
-> > > > > Thanks, that patch gets my guest up and running again. So, FWIW,
->
-> > > > >
->
-> > > > > Tested-by: Cornelia Huck<cohuck@redhat.com>
->
-> > > > >
->
-> > > > > Any idea why this did not hit with virtio-net-pci (or the
->
-> > > > > autogenerated
->
-> > > > > virtio-net-ccw device)?
->
-> > > > It can be hit with virtio-net-pci as well (just start without peer).
->
-> > > Hm, I had not been able to reproduce the crash with a 'naked' -device
->
-> > > virtio-net-pci. But checking seems to be the right idea anyway.
->
-> > Sorry for being unclear, I meant for networking part, you just need start
->
-> > without peer, and you need a real guest (any Linux) that is trying to
->
-> > access
->
-> > the config space of virtio-net.
->
-> >
->
-> > Thanks
->
-> A pxe guest will do it, but that doesn't support ccw, right?
->
->
->
-Yes, it depends on the cli actually.
->
->
->
->
->
-> I'm still unclear why this triggers with ccw but not pci -
->
-> any idea?
->
->
->
-I don't test pxe but I can reproduce this with pci (just start a linux guest
->
-without a peer).
->
->
-Thanks
->
-Might be a good addition to a unit test. Not sure what would the
-test do exactly: just make sure guest runs? Looks like a lot of work
-for an empty test ... maybe we can poke at the guest config with
-qtest commands at least.
-
--- 
-MST
-
-On 2020/7/27 9:16 PM, Michael S. Tsirkin wrote:
-On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote:
-On 2020/7/27 7:43 PM, Michael S. Tsirkin wrote:
-On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
-On 2020/7/27 4:41 PM, Cornelia Huck wrote:
-On Mon, 27 Jul 2020 15:38:12 +0800
-Jason Wang<jasowang@redhat.com>  wrote:
-On 2020/7/27 2:43 PM, Cornelia Huck wrote:
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang<jasowang@redhat.com>  wrote:
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>   wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>   wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-         config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me we forget the check the existence of peer.
-
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck<cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-It can be hit with virtio-net-pci as well (just start without peer).
-Hm, I had not been able to reproduce the crash with a 'naked' -device
-virtio-net-pci. But checking seems to be the right idea anyway.
-Sorry for being unclear, I meant for networking part, you just need start
-without peer, and you need a real guest (any Linux) that is trying to access
-the config space of virtio-net.
-
-Thanks
-A pxe guest will do it, but that doesn't support ccw, right?
-Yes, it depends on the cli actually.
-I'm still unclear why this triggers with ccw but not pci -
-any idea?
-I don't test pxe but I can reproduce this with pci (just start a linux guest
-without a peer).
-
-Thanks
-Might be a good addition to a unit test. Not sure what would the
-test do exactly: just make sure guest runs? Looks like a lot of work
-for an empty test ... maybe we can poke at the guest config with
-qtest commands at least.
-That should work or we can simply extend the exist virtio-net qtest to
-do that.
-Thanks
-