11 files changed, 7675 insertions, 0 deletions
diff --git a/results/classifier/016/none/23300761 b/results/classifier/016/none/23300761
new file mode 100644
index 00000000..2a3e6f16
--- /dev/null
+++ b/results/classifier/016/none/23300761
@@ -0,0 +1,340 @@
+i386: 0.475
+x86: 0.171
+debug: 0.052
+files: 0.038
+performance: 0.030
+register: 0.029
+virtual: 0.027
+PID: 0.025
+TCG: 0.019
+semantic: 0.018
+operating system: 0.017
+socket: 0.013
+boot: 0.013
+hypervisor: 0.012
+device: 0.012
+user-level: 0.011
+risc-v: 0.010
+alpha: 0.007
+ppc: 0.006
+VMM: 0.005
+vnc: 0.004
+network: 0.004
+architecture: 0.003
+permissions: 0.003
+assembly: 0.003
+peripherals: 0.003
+kernel: 0.002
+arm: 0.002
+graphic: 0.002
+mistranslation: 0.001
+KVM: 0.000
+
+[Qemu-devel] [BUG] 216 Alerts reported by LGTM for QEMU (some might be release critical)
+
+Hi,
+LGTM reports 16 errors, 81 warnings and 119 recommendations:
+https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
+.
+Some of them are already known (wrong format strings), others look like
+real errors:
+- several multiplication results which don't work as they should in
+contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
+32 bit!), target/i386/translate.c and other files
+- potential buffer overflows in gdbstub.c and other files
+I am afraid that the overflows in the block code are release critical,
+maybe that in target/i386/translate.c and other errors, too.
+About half of the alerts are issues which can be fixed later.
+
+Regards
+
+Stefan
+
+On 13/07/19 19:46, Stefan Weil wrote:
+>
+>
+LGTM reports 16 errors, 81 warnings and 119 recommendations:
+>
+https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
+.
+>
+>
+Some of them are already known (wrong format strings), others look like
+>
+real errors:
+>
+>
+- several multiplication results which don't work as they should in
+>
+contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
+>
+32 bit!), target/i386/translate.c and other files
+m->nb_clusters here is limited by s->l2_slice_size (see for example
+handle_alloc) so I wouldn't be surprised if this is a false positive.  I
+couldn't find this particular multiplication in Coverity, but it has
+about 250 issues marked as intentional or false positive so there's
+probably a lot of overlap with what LGTM found.
+
+Paolo
+
+Am 13.07.2019 um 21:42 schrieb Paolo Bonzini:
+>
+On 13/07/19 19:46, Stefan Weil wrote:
+>
+> LGTM reports 16 errors, 81 warnings and 119 recommendations:
+>
+>
+https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
+.
+>
+>
+>
+> Some of them are already known (wrong format strings), others look like
+>
+> real errors:
+>
+>
+>
+> - several multiplication results which don't work as they should in
+>
+> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
+>
+> 32 bit!), target/i386/translate.c and other files
+>
+m->nb_clusters here is limited by s->l2_slice_size (see for example
+>
+handle_alloc) so I wouldn't be surprised if this is a false positive.  I
+>
+couldn't find this particular multiplication in Coverity, but it has
+>
+about 250 issues marked as intentional or false positive so there's
+>
+probably a lot of overlap with what LGTM found.
+>
+>
+Paolo
+>
+From other projects I know that there is a certain overlap between the
+results from Coverity Scan and LGTM, but it is good to have both
+analyzers, and the results from LGTM are typically quite reliable.
+
+Even if we know that there is no multiplication overflow, the code could
+be modified. Either the assigned value should use the same data type as
+the factors (possible when there is never an overflow, avoids a size
+extension), or the multiplication could use the larger data type by
+adding a type cast to one of the factors (then an overflow cannot
+happen, static code analysers and human reviewers have an easier job,
+but the multiplication costs more time).
+
+Stefan
+
+Am 14.07.2019 um 15:28 hat Stefan Weil geschrieben:
+>
+Am 13.07.2019 um 21:42 schrieb Paolo Bonzini:
+>
+> On 13/07/19 19:46, Stefan Weil wrote:
+>
+>> LGTM reports 16 errors, 81 warnings and 119 recommendations:
+>
+>>
+https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
+.
+>
+>>
+>
+>> Some of them are already known (wrong format strings), others look like
+>
+>> real errors:
+>
+>>
+>
+>> - several multiplication results which don't work as they should in
+>
+>> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
+>
+>> 32 bit!), target/i386/translate.c and other files
+Request sizes are limited to 32 bit in the generic block layer before
+they are even passed to the individual block drivers, so most if not all
+of these are going to be false positives.
+
+>
+> m->nb_clusters here is limited by s->l2_slice_size (see for example
+>
+> handle_alloc) so I wouldn't be surprised if this is a false positive.  I
+>
+> couldn't find this particular multiplication in Coverity, but it has
+>
+> about 250 issues marked as intentional or false positive so there's
+>
+> probably a lot of overlap with what LGTM found.
+>
+>
+>
+> Paolo
+>
+>
+From other projects I know that there is a certain overlap between the
+>
+results from Coverity Scan and LGTM, but it is good to have both
+>
+analyzers, and the results from LGTM are typically quite reliable.
+>
+>
+Even if we know that there is no multiplication overflow, the code could
+>
+be modified. Either the assigned value should use the same data type as
+>
+the factors (possible when there is never an overflow, avoids a size
+>
+extension), or the multiplication could use the larger data type by
+>
+adding a type cast to one of the factors (then an overflow cannot
+>
+happen, static code analysers and human reviewers have an easier job,
+>
+but the multiplication costs more time).
+But if you look at the code we're talking about, you see that it's
+complaining about things where being more explicit would make things
+less readable.
+
+For example, it complains about the multiplication in this line:
+
+    s->file_size += n * s->header.cluster_size;
+
+We know that n * s->header.cluster_size fits in 32 bits, but
+s->file_size is 64 bits (and has to be 64 bits). Do you really think we
+should introduce another uint32_t variable to store the intermediate
+result? And if we cast n to uint64_t, not only might the multiplication
+cost more time, but also human readers would wonder why the result could
+become larger than 32 bits. So a cast would be misleading.
+
+
+It also complains about this line:
+
+    ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size,
+                        PREALLOC_MODE_OFF, &local_err);
+
+Here, we don't even assign the result to a 64 bit variable, but just
+pass it to a function which takes a 64 bit parameter. Again, I don't
+think introducing additional variables for the intermediate result or
+adding casts would be an improvement of the situation.
+
+
+So I don't think this is a good enough tool to base our code on what it
+does and doesn't understand. It would have too much of a negative impact
+on our code. We'd rather need a way to mark false positives as such and
+move on without changing the code in such cases.
+
+Kevin
+
+On Sat, 13 Jul 2019 at 18:46, Stefan Weil <address@hidden> wrote:
+>
+LGTM reports 16 errors, 81 warnings and 119 recommendations:
+>
+https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
+.
+I had a look at some of these before, but mostly I came
+to the conclusion that it wasn't worth trying to put the
+effort into keeping up with the site because they didn't
+seem to provide any useful way to mark things as false
+positives. Coverity has its flaws but at least you can do
+that kind of thing in its UI (it runs at about a 33% fp
+rate, I think.) "Analyzer thinks this multiply can overflow
+but in fact it's not possible" is quite a common false
+positive cause...
+
+Anyway, if you want to fish out specific issues, analyse
+whether they're false positive or real, and report them
+to the mailing list as followups to the patches which
+introduced the issue, that's probably the best way for
+us to make use of this analyzer. (That is essentially
+what I do for coverity.)
+
+thanks
+-- PMM
+
+Am 14.07.2019 um 19:30 schrieb Peter Maydell:
+[...]
+>
+"Analyzer thinks this multiply can overflow
+>
+but in fact it's not possible" is quite a common false
+>
+positive cause...
+The analysers don't complain because a multiply can overflow.
+
+They complain because the code indicates that a larger result is
+expected, for example uint64_t = uint32_t * uint32_t. They would not
+complain for the same multiplication if it were assigned to a uint32_t.
+
+So there is a simple solution to write the code in a way which avoids
+false positives...
+
+Stefan
+
+Stefan Weil <address@hidden> writes:
+
+>
+Am 14.07.2019 um 19:30 schrieb Peter Maydell:
+>
+[...]
+>
+> "Analyzer thinks this multiply can overflow
+>
+> but in fact it's not possible" is quite a common false
+>
+> positive cause...
+>
+>
+>
+The analysers don't complain because a multiply can overflow.
+>
+>
+They complain because the code indicates that a larger result is
+>
+expected, for example uint64_t = uint32_t * uint32_t. They would not
+>
+complain for the same multiplication if it were assigned to a uint32_t.
+I agree this is an anti-pattern.
+
+>
+So there is a simple solution to write the code in a way which avoids
+>
+false positives...
+You wrote elsewhere in this thread:
+
+    Either the assigned value should use the same data type as the
+    factors (possible when there is never an overflow, avoids a size
+    extension), or the multiplication could use the larger data type by
+    adding a type cast to one of the factors (then an overflow cannot
+    happen, static code analysers and human reviewers have an easier
+    job, but the multiplication costs more time).
+
+Makes sense to me.
+
+On 7/14/19 5:30 PM, Peter Maydell wrote:
+>
+I had a look at some of these before, but mostly I came
+>
+to the conclusion that it wasn't worth trying to put the
+>
+effort into keeping up with the site because they didn't
+>
+seem to provide any useful way to mark things as false
+>
+positives. Coverity has its flaws but at least you can do
+>
+that kind of thing in its UI (it runs at about a 33% fp
+>
+rate, I think.)
+Yes, LGTM wants you to modify the source code with
+
+  /* lgtm [cpp/some-warning-code] */
+
+and on the same line as the reported problem.  Which is mildly annoying in that
+you're definitely committing to LGTM in the long term.  Also for any
+non-trivial bit of code, it will almost certainly run over 80 columns.
+
+
+r~
+
diff --git a/results/classifier/016/none/42613410 b/results/classifier/016/none/42613410
new file mode 100644
index 00000000..387e80bd
--- /dev/null
+++ b/results/classifier/016/none/42613410
@@ -0,0 +1,176 @@
+network: 0.116
+x86: 0.043
+TCG: 0.038
+operating system: 0.031
+files: 0.031
+register: 0.030
+socket: 0.029
+virtual: 0.026
+i386: 0.021
+ppc: 0.020
+PID: 0.020
+VMM: 0.020
+hypervisor: 0.018
+arm: 0.018
+device: 0.017
+risc-v: 0.016
+alpha: 0.016
+boot: 0.013
+vnc: 0.013
+semantic: 0.012
+debug: 0.010
+KVM: 0.006
+kernel: 0.005
+user-level: 0.005
+performance: 0.004
+peripherals: 0.003
+architecture: 0.003
+permissions: 0.002
+graphic: 0.002
+assembly: 0.001
+mistranslation: 0.001
+
+[Qemu-devel] [PATCH, Bug 1612908] scripts: Add TCP endpoints for qom-* scripts
+
+From: Carl Allendorph <address@hidden>
+
+I've created a patch for bug #1612908. The current docs for the scripts
+in the "scripts/qmp/" directory suggest that both unix sockets and
+tcp endpoints can be used. The TCP endpoints don't work for most of the
+scripts, with notable exception of 'qmp-shell'. This patch attempts to
+refactor the process of distinguishing between unix path endpoints and
+tcp endpoints to work for all of these scripts.
+
+Carl Allendorph (1):
+  scripts: Add ability for qom-* python scripts to target tcp endpoints
+
+ scripts/qmp/qmp-shell | 22 ++--------------------
+ scripts/qmp/qmp.py    | 23 ++++++++++++++++++++---
+ 2 files changed, 22 insertions(+), 23 deletions(-)
+
+--
+2.7.4
+
+From: Carl Allendorph <address@hidden>
+
+The current code for QEMUMonitorProtocol accepts both a unix socket
+endpoint as a string and a tcp endpoint as a tuple. Most of the scripts
+that use this class don't massage the command line argument to generate
+a tuple. This patch refactors qmp-shell slightly to reuse the existing
+parsing of the "host:port" string for all the qom-* scripts.
+
+Signed-off-by: Carl Allendorph <address@hidden>
+---
+ scripts/qmp/qmp-shell | 22 ++--------------------
+ scripts/qmp/qmp.py    | 23 ++++++++++++++++++++---
+ 2 files changed, 22 insertions(+), 23 deletions(-)
+
+diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
+index 0373b24..8a2a437 100755
+--- a/scripts/qmp/qmp-shell
++++ b/scripts/qmp/qmp-shell
+@@ -83,9 +83,6 @@ class QMPCompleter(list):
+ class QMPShellError(Exception):
+     pass
+ 
+-class QMPShellBadPort(QMPShellError):
+-    pass
+-
+ class FuzzyJSON(ast.NodeTransformer):
+     '''This extension of ast.NodeTransformer filters literal "true/false/null"
+     values in an AST and replaces them by proper "True/False/None" values that
+@@ -103,28 +100,13 @@ class FuzzyJSON(ast.NodeTransformer):
+ #       _execute_cmd()). Let's design a better one.
+ class QMPShell(qmp.QEMUMonitorProtocol):
+     def __init__(self, address, pretty=False):
+-        qmp.QEMUMonitorProtocol.__init__(self, self.__get_address(address))
++        qmp.QEMUMonitorProtocol.__init__(self, address)
+         self._greeting = None
+         self._completer = None
+         self._pretty = pretty
+         self._transmode = False
+         self._actions = list()
+ 
+-    def __get_address(self, arg):
+-        """
+-        Figure out if the argument is in the port:host form, if it's not it's
+-        probably a file path.
+-        """
+-        addr = arg.split(':')
+-        if len(addr) == 2:
+-            try:
+-                port = int(addr[1])
+-            except ValueError:
+-                raise QMPShellBadPort
+-            return ( addr[0], port )
+-        # socket path
+-        return arg
+-
+     def _fill_completion(self):
+         for cmd in self.cmd('query-commands')['return']:
+             self._completer.append(cmd['name'])
+@@ -400,7 +382,7 @@ def main():
+ 
+         if qemu is None:
+             fail_cmdline()
+-    except QMPShellBadPort:
++    except qmp.QMPShellBadPort:
+         die('bad port number in command-line')
+ 
+     try:
+diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py
+index 62d3651..261ece8 100644
+--- a/scripts/qmp/qmp.py
++++ b/scripts/qmp/qmp.py
+@@ -25,21 +25,23 @@ class QMPCapabilitiesError(QMPError):
+ class QMPTimeoutError(QMPError):
+     pass
+ 
++class QMPShellBadPort(QMPError):
++    pass
++
+ class QEMUMonitorProtocol:
+     def __init__(self, address, server=False, debug=False):
+         """
+         Create a QEMUMonitorProtocol class.
+ 
+         @param address: QEMU address, can be either a unix socket path (string)
+-                        or a tuple in the form ( address, port ) for a TCP
+-                        connection
++                        or a TCP endpoint (string in the format "host:port")
+         @param server: server mode listens on the socket (bool)
+         @raise socket.error on socket connection errors
+         @note No connection is established, this is done by the connect() or
+               accept() methods
+         """
+         self.__events = []
+-        self.__address = address
++        self.__address = self.__get_address(address)
+         self._debug = debug
+         self.__sock = self.__get_sock()
+         if server:
+@@ -47,6 +49,21 @@ class QEMUMonitorProtocol:
+             self.__sock.bind(self.__address)
+             self.__sock.listen(1)
+ 
++    def __get_address(self, arg):
++        """
++        Figure out if the argument is in the host:port form, if it's not it's
++        probably a file path.
++        """
++        addr = arg.split(':')
++        if len(addr) == 2:
++            try:
++                port = int(addr[1])
++            except ValueError:
++                raise QMPShellBadPort
++            return ( addr[0], port )
++        # socket path
++        return arg
++
+     def __get_sock(self):
+         if isinstance(self.__address, tuple):
+             family = socket.AF_INET
+-- 
+2.7.4
+
diff --git a/results/classifier/016/none/42974450 b/results/classifier/016/none/42974450
new file mode 100644
index 00000000..9ab3582a
--- /dev/null
+++ b/results/classifier/016/none/42974450
@@ -0,0 +1,456 @@
+operating system: 0.713
+kernel: 0.463
+debug: 0.442
+hypervisor: 0.390
+x86: 0.334
+virtual: 0.259
+files: 0.192
+TCG: 0.182
+register: 0.171
+device: 0.116
+KVM: 0.071
+i386: 0.064
+VMM: 0.054
+PID: 0.052
+ppc: 0.049
+boot: 0.047
+assembly: 0.037
+architecture: 0.035
+socket: 0.033
+network: 0.028
+user-level: 0.023
+risc-v: 0.023
+arm: 0.022
+semantic: 0.017
+vnc: 0.014
+alpha: 0.007
+peripherals: 0.007
+performance: 0.005
+permissions: 0.004
+graphic: 0.002
+mistranslation: 0.001
+
+[Bug Report] Possible Missing Endianness Conversion
+
+The virtio packed virtqueue support patch[1] suggests converting
+endianness by lines:
+
+virtio_tswap16s(vdev, &e->off_wrap);
+virtio_tswap16s(vdev, &e->flags);
+
+Though both of these conversion statements aren't present in the
+latest qemu code here[2]
+
+Is this intentional?
+
+[1]:
+https://mail.gnu.org/archive/html/qemu-block/2019-10/msg01492.html
+[2]:
+https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L314
+
+CCing Jason.
+
+On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
+>
+>
+The virtio packed virtqueue support patch[1] suggests converting
+>
+endianness by lines:
+>
+>
+virtio_tswap16s(vdev, &e->off_wrap);
+>
+virtio_tswap16s(vdev, &e->flags);
+>
+>
+Though both of these conversion statements aren't present in the
+>
+latest qemu code here[2]
+>
+>
+Is this intentional?
+Good catch!
+
+It looks like it was removed (maybe by mistake) by commit
+d152cdd6f6 ("virtio: use virtio accessor to access packed event")
+
+Jason can you confirm that?
+
+Thanks,
+Stefano
+
+>
+>
+[1]:
+https://mail.gnu.org/archive/html/qemu-block/2019-10/msg01492.html
+>
+[2]:
+https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L314
+>
+
+On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
+>
+>
+CCing Jason.
+>
+>
+On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
+>
+>
+>
+> The virtio packed virtqueue support patch[1] suggests converting
+>
+> endianness by lines:
+>
+>
+>
+> virtio_tswap16s(vdev, &e->off_wrap);
+>
+> virtio_tswap16s(vdev, &e->flags);
+>
+>
+>
+> Though both of these conversion statements aren't present in the
+>
+> latest qemu code here[2]
+>
+>
+>
+> Is this intentional?
+>
+>
+Good catch!
+>
+>
+It looks like it was removed (maybe by mistake) by commit
+>
+d152cdd6f6 ("virtio: use virtio accessor to access packed event")
+That commit changes from:
+
+-    address_space_read_cached(cache, off_off, &e->off_wrap,
+-                              sizeof(e->off_wrap));
+-    virtio_tswap16s(vdev, &e->off_wrap);
+
+which does a byte read of 2 bytes and then swaps the bytes
+depending on the host endianness and the value of
+virtio_access_is_big_endian()
+
+to this:
+
++    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+
+virtio_lduw_phys_cached() is a small function which calls
+either lduw_be_phys_cached() or lduw_le_phys_cached()
+depending on the value of virtio_access_is_big_endian().
+(And lduw_be_phys_cached() and lduw_le_phys_cached() do
+the right thing for the host-endianness to do a "load
+a specifically big or little endian 16-bit value".)
+
+Which is to say that because we use a load/store function that's
+explicit about the size of the data type it is accessing, the
+function itself can handle doing the load as big or little
+endian, rather than the calling code having to do a manual swap after
+it has done a load-as-bag-of-bytes. This is generally preferable
+as it's less error-prone.
+
+(Explicit swap-after-loading still has a place where the
+code is doing a load of a whole structure out of the
+guest and then swapping each struct field after the fact,
+because it means we can do a single load-from-guest-memory
+rather than a whole sequence of calls all the way down
+through the memory subsystem.)
+
+thanks
+-- PMM
+
+On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
+On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
+CCing Jason.
+
+On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
+>
+> The virtio packed virtqueue support patch[1] suggests converting
+> endianness by lines:
+>
+> virtio_tswap16s(vdev, &e->off_wrap);
+> virtio_tswap16s(vdev, &e->flags);
+>
+> Though both of these conversion statements aren't present in the
+> latest qemu code here[2]
+>
+> Is this intentional?
+
+Good catch!
+
+It looks like it was removed (maybe by mistake) by commit
+d152cdd6f6 ("virtio: use virtio accessor to access packed event")
+That commit changes from:
+
+-    address_space_read_cached(cache, off_off, &e->off_wrap,
+-                              sizeof(e->off_wrap));
+-    virtio_tswap16s(vdev, &e->off_wrap);
+
+which does a byte read of 2 bytes and then swaps the bytes
+depending on the host endianness and the value of
+virtio_access_is_big_endian()
+
+to this:
+
++    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+
+virtio_lduw_phys_cached() is a small function which calls
+either lduw_be_phys_cached() or lduw_le_phys_cached()
+depending on the value of virtio_access_is_big_endian().
+(And lduw_be_phys_cached() and lduw_le_phys_cached() do
+the right thing for the host-endianness to do a "load
+a specifically big or little endian 16-bit value".)
+
+Which is to say that because we use a load/store function that's
+explicit about the size of the data type it is accessing, the
+function itself can handle doing the load as big or little
+endian, rather than the calling code having to do a manual swap after
+it has done a load-as-bag-of-bytes. This is generally preferable
+as it's less error-prone.
+Thanks for the details!
+
+So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
+
+I mean:
+diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
+index 893a072c9d..2e5e67bdb9 100644
+--- a/hw/virtio/virtio.c
++++ b/hw/virtio/virtio.c
+@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
+     /* Make sure flags is seen before off_wrap */
+     smp_rmb();
+     e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+-    virtio_tswap16s(vdev, &e->flags);
+ }
+
+ static void vring_packed_off_wrap_write(VirtIODevice *vdev,
+
+Thanks,
+Stefano
+(Explicit swap-after-loading still has a place where the
+code is doing a load of a whole structure out of the
+guest and then swapping each struct field after the fact,
+because it means we can do a single load-from-guest-memory
+rather than a whole sequence of calls all the way down
+through the memory subsystem.)
+
+thanks
+-- PMM
+
+On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella <sgarzare@redhat.com> wrote:
+>
+>
+On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
+>
+>On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
+>
+>>
+>
+>> CCing Jason.
+>
+>>
+>
+>> On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
+>
+>> >
+>
+>> > The virtio packed virtqueue support patch[1] suggests converting
+>
+>> > endianness by lines:
+>
+>> >
+>
+>> > virtio_tswap16s(vdev, &e->off_wrap);
+>
+>> > virtio_tswap16s(vdev, &e->flags);
+>
+>> >
+>
+>> > Though both of these conversion statements aren't present in the
+>
+>> > latest qemu code here[2]
+>
+>> >
+>
+>> > Is this intentional?
+>
+>>
+>
+>> Good catch!
+>
+>>
+>
+>> It looks like it was removed (maybe by mistake) by commit
+>
+>> d152cdd6f6 ("virtio: use virtio accessor to access packed event")
+>
+>
+>
+>That commit changes from:
+>
+>
+>
+>-    address_space_read_cached(cache, off_off, &e->off_wrap,
+>
+>-                              sizeof(e->off_wrap));
+>
+>-    virtio_tswap16s(vdev, &e->off_wrap);
+>
+>
+>
+>which does a byte read of 2 bytes and then swaps the bytes
+>
+>depending on the host endianness and the value of
+>
+>virtio_access_is_big_endian()
+>
+>
+>
+>to this:
+>
+>
+>
+>+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+>
+>
+>
+>virtio_lduw_phys_cached() is a small function which calls
+>
+>either lduw_be_phys_cached() or lduw_le_phys_cached()
+>
+>depending on the value of virtio_access_is_big_endian().
+>
+>(And lduw_be_phys_cached() and lduw_le_phys_cached() do
+>
+>the right thing for the host-endianness to do a "load
+>
+>a specifically big or little endian 16-bit value".)
+>
+>
+>
+>Which is to say that because we use a load/store function that's
+>
+>explicit about the size of the data type it is accessing, the
+>
+>function itself can handle doing the load as big or little
+>
+>endian, rather than the calling code having to do a manual swap after
+>
+>it has done a load-as-bag-of-bytes. This is generally preferable
+>
+>as it's less error-prone.
+>
+>
+Thanks for the details!
+>
+>
+So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
+>
+>
+I mean:
+>
+diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
+>
+index 893a072c9d..2e5e67bdb9 100644
+>
+--- a/hw/virtio/virtio.c
+>
++++ b/hw/virtio/virtio.c
+>
+@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
+>
+/* Make sure flags is seen before off_wrap */
+>
+smp_rmb();
+>
+e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+>
+-    virtio_tswap16s(vdev, &e->flags);
+>
+}
+That definitely looks like it's probably not correct...
+
+-- PMM
+
+On Fri, Jun 28, 2024 at 03:53:09PM GMT, Peter Maydell wrote:
+On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella <sgarzare@redhat.com> wrote:
+On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
+>On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
+>>
+>> CCing Jason.
+>>
+>> On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
+>> >
+>> > The virtio packed virtqueue support patch[1] suggests converting
+>> > endianness by lines:
+>> >
+>> > virtio_tswap16s(vdev, &e->off_wrap);
+>> > virtio_tswap16s(vdev, &e->flags);
+>> >
+>> > Though both of these conversion statements aren't present in the
+>> > latest qemu code here[2]
+>> >
+>> > Is this intentional?
+>>
+>> Good catch!
+>>
+>> It looks like it was removed (maybe by mistake) by commit
+>> d152cdd6f6 ("virtio: use virtio accessor to access packed event")
+>
+>That commit changes from:
+>
+>-    address_space_read_cached(cache, off_off, &e->off_wrap,
+>-                              sizeof(e->off_wrap));
+>-    virtio_tswap16s(vdev, &e->off_wrap);
+>
+>which does a byte read of 2 bytes and then swaps the bytes
+>depending on the host endianness and the value of
+>virtio_access_is_big_endian()
+>
+>to this:
+>
+>+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+>
+>virtio_lduw_phys_cached() is a small function which calls
+>either lduw_be_phys_cached() or lduw_le_phys_cached()
+>depending on the value of virtio_access_is_big_endian().
+>(And lduw_be_phys_cached() and lduw_le_phys_cached() do
+>the right thing for the host-endianness to do a "load
+>a specifically big or little endian 16-bit value".)
+>
+>Which is to say that because we use a load/store function that's
+>explicit about the size of the data type it is accessing, the
+>function itself can handle doing the load as big or little
+>endian, rather than the calling code having to do a manual swap after
+>it has done a load-as-bag-of-bytes. This is generally preferable
+>as it's less error-prone.
+
+Thanks for the details!
+
+So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
+
+I mean:
+diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
+index 893a072c9d..2e5e67bdb9 100644
+--- a/hw/virtio/virtio.c
++++ b/hw/virtio/virtio.c
+@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
+      /* Make sure flags is seen before off_wrap */
+      smp_rmb();
+      e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+-    virtio_tswap16s(vdev, &e->flags);
+  }
+That definitely looks like it's probably not correct...
+Yeah, I just sent that patch:
+https://lore.kernel.org/qemu-devel/20240701075208.19634-1-sgarzare@redhat.com
+We can continue the discussion there.
+
+Thanks,
+Stefano
+
diff --git a/results/classifier/016/none/48245039 b/results/classifier/016/none/48245039
new file mode 100644
index 00000000..913c2333
--- /dev/null
+++ b/results/classifier/016/none/48245039
@@ -0,0 +1,557 @@
+user-level: 0.787
+performance: 0.642
+operating system: 0.416
+risc-v: 0.375
+debug: 0.341
+x86: 0.185
+TCG: 0.172
+ppc: 0.166
+device: 0.139
+arm: 0.119
+VMM: 0.111
+boot: 0.111
+files: 0.104
+PID: 0.099
+vnc: 0.095
+register: 0.088
+socket: 0.085
+network: 0.081
+i386: 0.071
+alpha: 0.059
+hypervisor: 0.056
+virtual: 0.055
+peripherals: 0.054
+kernel: 0.026
+semantic: 0.025
+architecture: 0.011
+KVM: 0.010
+mistranslation: 0.005
+assembly: 0.004
+graphic: 0.004
+permissions: 0.002
+
+[Qemu-devel] [BUG] gcov support appears to be broken
+
+Hello, according to our docs, here is the procedure that should produce
+coverage report for execution of the complete "make check":
+
+#./configure --enable-gcov
+#make
+#make check
+#make coverage-report
+
+It seems that the first three commands execute as expected. (For example, there are
+plenty of files generated by "make check" that would not have been generated if
+"--enable-gcov" hadn't been chosen.) However, the last command complains about
+some missing files related to FP support. If those files are added (for
+example, artificially, using "touch <missing-file>"), then it starts complaining
+about some missing decodetree-generated files. Other kinds of files are
+involved too.
+
+It would be nice to have coverage support working. Could somebody please take a
+look, or explain if I made a mistake or misunderstood our gcov support?
+
+Yours,
+Aleksandar
+
+On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote:
+>
+>
+Hello, according to our docs, here is the procedure that should produce
+>
+coverage report for execution of the complete "make check":
+>
+>
+#./configure --enable-gcov
+>
+#make
+>
+#make check
+>
+#make coverage-report
+>
+>
+It seems that first three commands execute as expected. (For example, there
+>
+are plenty of files generated by "make check" that would've not been
+>
+generated if "enable-gcov" hadn't been chosen.) However, the last command
+>
+complains about some missing files related to FP support. If those files are
+>
+added (for example, artificially, using "touch <missing-file>"), then it
+>
+starts complaining about missing some decodetree-generated files. Other kinds
+>
+of files are involved too.
+>
+>
+It would be nice to have coverage support working. Please somebody take a
+>
+look, or explain if I make a mistake or misunderstood our gcov support.
+Cc'ing Alex who's probably the closest we have to a gcov expert.
+
+(make/make check of a --enable-gcov build is in the set of things our
+Travis CI setup runs, so we do defend that part against regressions.)
+
+thanks
+-- PMM
+
+Peter Maydell <address@hidden> writes:
+
+>
+On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote:
+>
+>
+>
+> Hello, according to our docs, here is the procedure that should produce
+>
+> coverage report for execution of the complete "make check":
+>
+>
+>
+> #./configure --enable-gcov
+>
+> #make
+>
+> #make check
+>
+> #make coverage-report
+>
+>
+>
+> It seems that first three commands execute as expected. (For example,
+>
+> there are plenty of files generated by "make check" that would've not
+>
+> been generated if "enable-gcov" hadn't been chosen.) However, the
+>
+> last command complains about some missing files related to FP
+>
+> support. If those files are added (for example, artificially, using
+>
+> "touch <missing-file>"), then it starts complaining about missing some
+>
+> decodetree-generated files. Other kinds of files are involved too.
+The gcov tool is fairly noisy about missing files but that just
+indicates the tests haven't exercised those code paths. "make check"
+especially doesn't touch much of the TCG code and a chunk of floating
+point.
+
+>
+>
+>
+> It would be nice to have coverage support working. Please somebody
+>
+> take a look, or explain if I make a mistake or misunderstood our gcov
+>
+> support.
+So your failure mode is no report is generated at all? It's working for
+me here.
+
+>
+>
+Cc'ing Alex who's probably the closest we have to a gcov expert.
+>
+>
+(make/make check of a --enable-gcov build is in the set of things our
+>
+Travis CI setup runs, so we do defend that part against regressions.)
+We defend the build but I have just checked and it seems our
+check_coverage script is currently failing:
+https://travis-ci.org/stsquad/qemu/jobs/567809808#L10328
+But as it's an after_success script it doesn't fail the build.
+
+>
+>
+thanks
+>
+-- PMM
+--
+Alex Bennée
+
+>
+> #./configure --enable-gcov
+>
+> #make
+>
+> #make check
+>
+> #make coverage-report
+>
+>
+>
+> It seems that first three commands execute as expected. (For example,
+>
+> there are plenty of files generated by "make check" that would've not
+>
+> been generated if "enable-gcov" hadn't been chosen.) However, the
+>
+> last command complains about some missing files related to FP
+>
+So your failure mode is no report is generated at all? It's working for
+>
+me here.
+Alex, no report is generated for my test setups - in fact, "make 
+coverage-report" even says that it explicitly deletes what appears to be the 
+main coverage report html file.
+
+This is the terminal output of an unsuccessful execution of "make 
+coverage-report" for recent ToT:
+
+~/Build/qemu-TOT-TEST$ make coverage-report
+make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp'
+make[1]: Nothing to be done for 'all'.
+make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp'
+        CHK version_gen.h
+  GEN     coverage-report.html
+Traceback (most recent call last):
+  File "/usr/bin/gcovr", line 1970, in <module>
+    print_html_report(covdata, options.html_details)
+  File "/usr/bin/gcovr", line 1473, in print_html_report
+    INPUT = open(data['FILENAME'], 'r')
+IOError: [Errno 2] No such file or directory: 'wrap.inc.c'
+Makefile:1048: recipe for target 
+'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed
+make: *** 
+[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1
+make: *** Deleting file 
+'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html'
+
+This instance was executed in the QEMU 3.0 source tree (so it looks like the 
+problem has existed for quite some time):
+
+~/Build/qemu-3.0$ make coverage-report
+        CHK version_gen.h
+  GEN     coverage-report.html
+Traceback (most recent call last):
+  File "/usr/bin/gcovr", line 1970, in <module>
+    print_html_report(covdata, options.html_details)
+  File "/usr/bin/gcovr", line 1473, in print_html_report
+    INPUT = open(data['FILENAME'], 'r')
+IOError: [Errno 2] No such file or directory: 
+'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c'
+Makefile:992: recipe for target 
+'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed
+make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] 
+Error 1
+make: *** Deleting file 
+'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html'
+
+Fond regards,
+Aleksandar
+
+
+>
+Alex Bennée
+
+>
+> #./configure --enable-gcov
+>
+> #make
+>
+> #make check
+>
+> #make coverage-report
+>
+>
+>
+> It seems that first three commands execute as expected. (For example,
+>
+> there are plenty of files generated by "make check" that would've not
+>
+> been generated if "enable-gcov" hadn't been chosen.) However, the
+>
+> last command complains about some missing files related to FP
+>
+So your failure mode is no report is generated at all? It's working for
+>
+me here.
+Another piece of info:
+
+~/Build/qemu-TOT-TEST$ gcov --version
+gcov (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010
+Copyright (C) 2015 Free Software Foundation, Inc.
+This is free software; see the source for copying conditions.
+There is NO warranty; not even for MERCHANTABILITY or 
+FITNESS FOR A PARTICULAR PURPOSE.
+
+:~/Build/qemu-TOT-TEST$ gcc --version
+gcc (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0
+Copyright (C) 2017 Free Software Foundation, Inc.
+This is free software; see the source for copying conditions.  There is NO
+warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+
+
+
+
+Alex, no report is generated for my test setups - in fact, "make 
+coverage-report" even says that it explicitly deletes what appears to be the 
+main coverage report html file.
+
+This is the terminal output of an unsuccessful execution of "make 
+coverage-report" for recent ToT:
+
+~/Build/qemu-TOT-TEST$ make coverage-report
+make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp'
+make[1]: Nothing to be done for 'all'.
+make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp'
+        CHK version_gen.h
+  GEN     coverage-report.html
+Traceback (most recent call last):
+  File "/usr/bin/gcovr", line 1970, in <module>
+    print_html_report(covdata, options.html_details)
+  File "/usr/bin/gcovr", line 1473, in print_html_report
+    INPUT = open(data['FILENAME'], 'r')
+IOError: [Errno 2] No such file or directory: 'wrap.inc.c'
+Makefile:1048: recipe for target 
+'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed
+make: *** 
+[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1
+make: *** Deleting file 
+'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html'
+
+This instance was executed in the QEMU 3.0 source tree (so it looks like the 
+problem has existed for quite some time):
+
+~/Build/qemu-3.0$ make coverage-report
+        CHK version_gen.h
+  GEN     coverage-report.html
+Traceback (most recent call last):
+  File "/usr/bin/gcovr", line 1970, in <module>
+    print_html_report(covdata, options.html_details)
+  File "/usr/bin/gcovr", line 1473, in print_html_report
+    INPUT = open(data['FILENAME'], 'r')
+IOError: [Errno 2] No such file or directory: 
+'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c'
+Makefile:992: recipe for target 
+'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed
+make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] 
+Error 1
+make: *** Deleting file 
+'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html'
+
+Fond regards,
+Aleksandar
+
+
+>
+Alex Bennée
+
+>
+> #./configure --enable-gcov
+>
+> #make
+>
+> #make check
+>
+> #make coverage-report
+>
+>
+>
+> It seems that first three commands execute as expected. (For example,
+>
+> there are plenty of files generated by "make check" that would've not
+>
+> been generated if "enable-gcov" hadn't been chosen.) However, the
+>
+> last command complains about some missing files related to FP
+>
+So your failure mode is no report is generated at all? It's working for
+>
+me here.
+Alex, here is the thing:
+
+Seeing that my gcovr is a relatively old (2014) 3.2 version, I upgraded it from 
+the git repo to the most recent 4.1 (actually, to a dev version, from the very 
+tip of the tree), and "make coverage-report" started generating coverage reports. 
+It did emit some error messages (totally different from the previous ones), but 
+it did not stop like it used to with gcovr 3.2.
+
+Perhaps you would want to add some gcov/gcovr minimal version info in our docs. 
+(or at least a statement "this was tested with such and such gcc, gcov and 
+gcovr", etc.?)
+
+The coverage report looked fine at first glance, but it kind of disappointed me 
+when I dug deeper into its content - for example, it shows very low coverage 
+for our FP code (softfloat), while, in fact, we know that "make check" contains 
+detailed tests of FP functionality. But this is most likely a separate 
+problem of a very different nature, perhaps an issue with the separate git repo 
+for FP tests (testfloat) that our FP tests use as a mid-layer.
+
+I'll try how everything works with my test examples, and will let you know.
+
+Your help is greatly appreciated,
+Aleksandar
+
+Fond regards,
+Aleksandar
+
+
+>
+Alex Bennée
+
+Aleksandar Markovic <address@hidden> writes:
+
+>
+>> #./configure --enable-gcov
+>
+>> #make
+>
+>> #make check
+>
+>> #make coverage-report
+>
+>>
+>
+>> It seems that first three commands execute as expected. (For example,
+>
+>> there are plenty of files generated by "make check" that would've not
+>
+>> been generated if "enable-gcov" hadn't been chosen.) However, the
+>
+>> last command complains about some missing files related to FP
+>
+>
+> So your failure mode is no report is generated at all? It's working for
+>
+> me here.
+>
+>
+Alex, here is the thing:
+>
+>
+Seeing that my gcovr is relatively old (2014) 3.2 version, I upgraded it from
+>
+git repo to the most recent 4.1 (actually, to a dev version, from the very
+>
+tip of the tree), and "make coverage-report" started generating coverage
+>
+reports. It did emit some error messages (totally different than previous),
+>
+but still it did not stop like it used to do with gcovr 3.2.
+>
+>
+Perhaps you would want to add some gcov/gcovr minimal version info in our
+>
+docs. (or at least a statement "this was tested with such and such gcc, gcov
+>
+and gcovr", etc.?)
+>
+>
+Coverage report looked fine at first glance, but it kind of
+>
+disappointed me when I dug deeper into its content - for example,
+>
+it shows very low coverage for our FP code (softfloat), while, in
+>
+fact, we know that "make check" contains detailed tests on FP
+>
+functionalities. But this is most likely a separate problem of a very
+>
+different nature, perhaps the issue of separate git repo for FP tests
+>
+(testfloat) that our FP tests use as a mid-layer.
+I get:
+
+68.6 %  2593 / 3782     62.2 %  1690 / 2718
+
+Which is not bad considering we don't exercise the 80 and 128 bit
+softfloat code at all (which is not shared by the re-factored 16/32/64
+bit code).
+
+>
+>
+I'll try how everything works with my test examples, and will let you know.
+>
+>
+Your help is greatly appreciated,
+>
+Aleksandar
+>
+>
+Fond regards,
+>
+Aleksandar
+>
+>
+>
+> Alex Bennée
+--
+Alex Bennée
+
+>
+> it shows very low coverage for our FP code (softfloat), while, in
+>
+> fact, we know that "make check" contains detailed tests on FP
+>
+> functionalities. But this is most likely a separate problem of a very
+>
+> different nature, perhaps the issue of separate git repo for FP tests
+>
+> (testfloat) that our FP tests use as a mid-layer.
+>
+>
+I get:
+>
+>
+68.6 %  2593 / 3782     62.2 %  1690 / 2718
+>
+I would expect that kind of result too.
+
+However, I get:
+
+File:   fpu/softfloat.c                 Lines:  8       3334    0.2 %
+Date:   2019-08-05 19:56:58             Branches:       3       2376    0.1 %
+
+:(
+
+OK, I'll try to figure that out, and most likely I could live with it if it is 
+an isolated problem.
+
+Thank you for your assistance in this matter,
+Aleksandar
+
+>
+Which is not bad considering we don't exercise the 80 and 128 bit
+>
+softfloat code at all (which is not shared by the re-factored 16/32/64
+>
+bit code).
+>
+>
+Alex Bennée
+
+>
+> it shows very low coverage for our FP code (softfloat), while, in
+>
+> fact, we know that "make check" contains detailed tests on FP
+>
+> functionalities. But this is most likely a separate problem of a very
+>
+> different nature, perhaps the issue of separate git repo for FP tests
+>
+> (testfloat) that our FP tests use as a mid-layer.
+>
+>
+I get:
+>
+>
+68.6 %  2593 / 3782     62.2 %  1690 / 2718
+>
+This problem is solved too. (and it is my fault)
+
+I worked with multiple versions of QEMU, and my previous low-coverage results 
+were for QEMU 3.0, and for that version the directory tests/fp did not even 
+exist. :D (<blush>)
+
+For QEMU ToT, I get now:
+
+fpu/softfloat.c         
+        68.8 %  2592 / 3770     62.3 %  1693 / 2718
+
+which is identical for all intents and purposes to your result.
+
+Yours cordially,
+Aleksandar
+
diff --git a/results/classifier/016/none/50773216 b/results/classifier/016/none/50773216
new file mode 100644
index 00000000..5a856c2f
--- /dev/null
+++ b/results/classifier/016/none/50773216
@@ -0,0 +1,137 @@
+virtual: 0.431
+debug: 0.366
+register: 0.178
+x86: 0.170
+vnc: 0.116
+operating system: 0.087
+hypervisor: 0.084
+files: 0.082
+PID: 0.068
+TCG: 0.058
+i386: 0.053
+network: 0.041
+user-level: 0.040
+performance: 0.039
+kernel: 0.038
+semantic: 0.031
+socket: 0.027
+alpha: 0.027
+ppc: 0.026
+device: 0.018
+boot: 0.017
+permissions: 0.007
+assembly: 0.006
+arm: 0.006
+peripherals: 0.004
+risc-v: 0.004
+VMM: 0.004
+graphic: 0.003
+architecture: 0.003
+KVM: 0.002
+mistranslation: 0.002
+
+[Qemu-devel] Can I have someone's feedback on [bug 1809075] Concurrency bug on keyboard events: capslock LED messing up keycode streams causes character misses at guest kernel
+
+Hi everyone.
+Can I please have someone's feedback on this bug?
+https://bugs.launchpad.net/qemu/+bug/1809075
+Briefly, the guest OS loses characters sent to it via VNC, and I spotted the
+bug in relation to the ps2 driver.
+I'm thinking of possible fixes and I might want to use a memory barrier.
+But I would really like to have some suggestion from a qemu developer
+first. For example, can we brutally drop capslock LED key events in ps2
+queue?
+It is actually relevant to openQA, an automated QA tool for openSUSE.
+And this bug blocks a few test cases for us.
+Thank you in advance!
+
+Kind regards,
+Gao Zhiyuan
+
+Cc'ing Marc-André & Gerd.
+
+On 12/19/18 10:31 AM, Gao Zhiyuan wrote:
+>
+Hi everyone.
+>
+>
+Can I please have someone's feedback on this bug?
+>
+https://bugs.launchpad.net/qemu/+bug/1809075
+>
+Briefly, guest OS loses characters sent to it via vnc. And I spot the
+>
+bug in relation to ps2 driver.
+>
+>
+I'm thinking of possible fixes and I might want to use a memory barrier.
+>
+But I would really like to have some suggestion from a qemu developer
+>
+first. For example, can we brutally drop capslock LED key events in ps2
+>
+queue?
+>
+>
+It is actually relevant to openQA, an automated QA tool for openSUSE.
+>
+And this bug blocks a few test cases for us.
+>
+>
+Thank you in advance!
+>
+>
+Kind regards,
+>
+Gao Zhiyuan
+>
+
+On Thu, Jan 03, 2019 at 12:05:54PM +0100, Philippe Mathieu-Daudé wrote:
+>
+Cc'ing Marc-André & Gerd.
+>
+>
+On 12/19/18 10:31 AM, Gao Zhiyuan wrote:
+>
+> Hi everyone.
+>
+>
+>
+> Can I please have someone's feedback on this bug?
+>
+>
+https://bugs.launchpad.net/qemu/+bug/1809075
+>
+> Briefly, guest OS loses characters sent to it via vnc. And I spot the
+>
+> bug in relation to ps2 driver.
+>
+>
+>
+> I'm thinking of possible fixes and I might want to use a memory barrier.
+>
+> But I would really like to have some suggestion from a qemu developer
+>
+> first. For example, can we brutally drop capslock LED key events in ps2
+>
+> queue?
+There is no "capslock LED key event".  0xfa is KBD_REPLY_ACK, and the
+device queues it in response to guest port writes.  Yes, the ack can
+race with actual key events.  But IMO that isn't a bug in qemu.
+
+Probably the linux kernel just throws away everything until it got the
+ack for the port write, and that way the key event gets lost.  On
+physical hardware you will not notice because it is next to impossible
+to type fast enough to hit the race window.
+
+So, go fix the kernel.
+
+Alternatively fix vncdotool to send uppercase letters properly with
+shift key pressed.  Then qemu wouldn't generate capslock key events
+(that happens because qemu thinks guest and host capslock state is out
+of sync) and the guest's capslock LED update request wouldn't get in
+the way.
+
+cheers,
+  Gerd
+
diff --git a/results/classifier/016/none/55753058 b/results/classifier/016/none/55753058
new file mode 100644
index 00000000..7cfee9ee
--- /dev/null
+++ b/results/classifier/016/none/55753058
@@ -0,0 +1,320 @@
+x86: 0.784
+operating system: 0.778
+kernel: 0.648
+debug: 0.645
+user-level: 0.550
+hypervisor: 0.097
+files: 0.093
+performance: 0.091
+assembly: 0.071
+virtual: 0.065
+PID: 0.040
+TCG: 0.037
+register: 0.035
+ppc: 0.022
+semantic: 0.010
+network: 0.007
+device: 0.006
+boot: 0.005
+architecture: 0.004
+i386: 0.004
+alpha: 0.003
+arm: 0.003
+socket: 0.002
+permissions: 0.002
+risc-v: 0.002
+vnc: 0.002
+graphic: 0.002
+peripherals: 0.001
+VMM: 0.001
+mistranslation: 0.001
+KVM: 0.000
+
+[RESEND][BUG FIX HELP] QEMU main thread endlessly hangs in __ppoll()
+
+Hi Genius,
+I am a user of QEMU v4.2.0 and am stuck on an interesting bug, which may still
+exist in the mainline.
+Thanks in advance to heroes who can take a look and share understanding.
+
+The qemu main thread hangs endlessly while handling the qmp command:
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
+'drive_del replication0' } }
+and the call trace looks like:
+#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1,
+timeout=<optimized out>, timeout@entry=0x7ffc56c66db0,
+sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
+#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0,
+__nfds=<optimized out>, __fds=<optimized out>)
+at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
+#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
+timeout=<optimized out>) at util/qemu-timer.c:348
+#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0,
+blocking=blocking@entry=true) at util/aio-posix.c:669
+#4 0x000055561019268d in bdrv_do_drained_begin (poll=true,
+ignore_bds_parents=false, parent=0x0, recursive=false,
+bs=0x55561138b0a0) at block/io.c:430
+#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>,
+parent=0x0, ignore_bds_parents=<optimized out>,
+poll=<optimized out>) at block/io.c:396
+#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0,
+child=0x7f36dc0ce380, errp=<optimized out>)
+at block/quorum.c:1063
+#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120
+"colo-disk0", has_child=<optimized out>,
+child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0,
+errp=0x7ffc56c66f98) at blockdev.c:4494
+#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized
+out>, ret=<optimized out>, errp=0x7ffc56c67018)
+at qapi/qapi-commands-block-core.c:1538
+#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010,
+allow_oob=<optimized out>, request=<optimized out>,
+cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
+#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized
+out>, allow_oob=<optimized out>)
+at qapi/qmp-dispatch.c:175
+#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40,
+req=<optimized out>) at monitor/qmp.c:145
+#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized out>)
+at monitor/qmp.c:234
+#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164b0) at
+util/async.c:117
+#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117
+#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at
+util/aio-posix.c:459
+#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>,
+callback=<optimized out>, user_data=<optimized out>)
+at util/async.c:260
+#17 0x00007f3c22302fbd in g_main_context_dispatch () from
+/lib/x86_64-linux-gnu/libglib-2.0.so.0
+#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219
+#19 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
+#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
+#21 0x000055560ff600fe in main_loop () at vl.c:1814
+#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized out>,
+envp=<optimized out>) at vl.c:4503
+We found that we're stuck in an endless check at this line of
+block/io.c:bdrv_do_drained_begin():
+BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
+and it turns out that bdrv_drain_poll() always gets true from:
+- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)
+- AND atomic_read(&bs->in_flight)
+
+I personally think this is a deadlock issue in the QEMU block layer
+(as we know, we have some #FIXME comments in the related code, such as block
+permission update).
+Any comments are welcome and appreciated.
+
+---
+thx,likexu
+
+On 2/28/21 9:39 PM, Like Xu wrote:
+Hi Genius,
+I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
+still exist in the mainline.
+Thanks in advance to heroes who can take a look and share understanding.
+Do you have a test case that reproduces on 5.2? It'd be nice to know if
+it was still a problem in the latest source tree or not.
+--js
+The qemu main thread endlessly hangs in the handle of the qmp statement:
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
+'drive_del replication0' } }
+and we have the call trace looks like:
+#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1,
+timeout=<optimized out>, timeout@entry=0x7ffc56c66db0,
+sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
+#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0,
+__nfds=<optimized out>, __fds=<optimized out>)
+at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
+#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
+timeout=<optimized out>) at util/qemu-timer.c:348
+#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0,
+blocking=blocking@entry=true) at util/aio-posix.c:669
+#4 0x000055561019268d in bdrv_do_drained_begin (poll=true,
+ignore_bds_parents=false, parent=0x0, recursive=false,
+bs=0x55561138b0a0) at block/io.c:430
+#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>,
+parent=0x0, ignore_bds_parents=<optimized out>,
+poll=<optimized out>) at block/io.c:396
+#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0,
+child=0x7f36dc0ce380, errp=<optimized out>)
+at block/quorum.c:1063
+#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120
+"colo-disk0", has_child=<optimized out>,
+child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0,
+errp=0x7ffc56c66f98) at blockdev.c:4494
+#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized
+out>, ret=<optimized out>, errp=0x7ffc56c67018)
+at qapi/qapi-commands-block-core.c:1538
+#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010,
+allow_oob=<optimized out>, request=<optimized out>,
+cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
+#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized
+out>, allow_oob=<optimized out>)
+at qapi/qmp-dispatch.c:175
+#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40,
+req=<optimized out>) at monitor/qmp.c:145
+#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized
+out>) at monitor/qmp.c:234
+#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164b0) at
+util/async.c:117
+#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117
+#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at
+util/aio-posix.c:459
+#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>,
+callback=<optimized out>, user_data=<optimized out>)
+at util/async.c:260
+#17 0x00007f3c22302fbd in g_main_context_dispatch () from
+/lib/x86_64-linux-gnu/libglib-2.0.so.0
+#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219
+#19 os_host_main_loop_wait (timeout=<optimized out>) at
+util/main-loop.c:242
+#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
+#21 0x000055560ff600fe in main_loop () at vl.c:1814
+#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized
+out>, envp=<optimized out>) at vl.c:4503
+We found that we're stuck in an endless check at this line of
+block/io.c:bdrv_do_drained_begin():
+    BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
+and it turns out that bdrv_drain_poll() always gets true from:
+- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)
+- AND atomic_read(&bs->in_flight)
+
+I personally think this is a deadlock issue in the QEMU block layer
+(as we know, we have some #FIXME comments in the related code, such as
+block permission update).
+Any comments are welcome and appreciated.
+
+---
+thx,likexu
+
+Hi John,
+
+Thanks for your comment.
+
+On 2021/3/5 7:53, John Snow wrote:
+On 2/28/21 9:39 PM, Like Xu wrote:
+Hi Genius,
+I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
+still exist in the mainline.
+Thanks in advance to heroes who can take a look and share understanding.
+Do you have a test case that reproduces on 5.2? It'd be nice to know if it
+was still a problem in the latest source tree or not.
+We narrowed down the source of the bug, which basically came from
+the following qmp usage:
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
+'drive_del replication0' } }
+One of the test cases is the COLO usage (docs/colo-proxy.txt).
+
+This issue is sporadic; the probability may be 1/15 for an I/O-heavy guest.
+
+I believe it's reproducible on 5.2 and the latest tree.
+--js
+The qemu main thread endlessly hangs in the handle of the qmp statement:
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
+'drive_del replication0' } }
+and we have the call trace looks like:
+#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1,
+timeout=<optimized out>, timeout@entry=0x7ffc56c66db0,
+sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
+#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0,
+__nfds=<optimized out>, __fds=<optimized out>)
+at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
+#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
+timeout=<optimized out>) at util/qemu-timer.c:348
+#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0,
+blocking=blocking@entry=true) at util/aio-posix.c:669
+#4 0x000055561019268d in bdrv_do_drained_begin (poll=true,
+ignore_bds_parents=false, parent=0x0, recursive=false,
+bs=0x55561138b0a0) at block/io.c:430
+#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>,
+parent=0x0, ignore_bds_parents=<optimized out>,
+poll=<optimized out>) at block/io.c:396
+#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0,
+child=0x7f36dc0ce380, errp=<optimized out>)
+at block/quorum.c:1063
+#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120
+"colo-disk0", has_child=<optimized out>,
+child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0,
+errp=0x7ffc56c66f98) at blockdev.c:4494
+#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized
+out>, ret=<optimized out>, errp=0x7ffc56c67018)
+at qapi/qapi-commands-block-core.c:1538
+#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010,
+allow_oob=<optimized out>, request=<optimized out>,
+cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
+#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized
+out>, allow_oob=<optimized out>)
+at qapi/qmp-dispatch.c:175
+#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40,
+req=<optimized out>) at monitor/qmp.c:145
+#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized
+out>) at monitor/qmp.c:234
+#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164b0) at
+util/async.c:117
+#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117
+#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at
+util/aio-posix.c:459
+#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>,
+callback=<optimized out>, user_data=<optimized out>)
+at util/async.c:260
+#17 0x00007f3c22302fbd in g_main_context_dispatch () from
+/lib/x86_64-linux-gnu/libglib-2.0.so.0
+#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219
+#19 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
+#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
+#21 0x000055560ff600fe in main_loop () at vl.c:1814
+#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized
+out>, envp=<optimized out>) at vl.c:4503
+We found that we're stuck in an endless check at this line of
+block/io.c:bdrv_do_drained_begin():
+    BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
+and it turns out that bdrv_drain_poll() always gets true from:
+- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)
+- AND atomic_read(&bs->in_flight)
+
+I personally think this is a deadlock issue in the QEMU block layer
+(as we know, we have some #FIXME comments in the related code, such as block
+permission update).
+Any comments are welcome and appreciated.
+
+---
+thx,likexu
+
+On 3/4/21 10:08 PM, Like Xu wrote:
+Hi John,
+
+Thanks for your comment.
+
+On 2021/3/5 7:53, John Snow wrote:
+On 2/28/21 9:39 PM, Like Xu wrote:
+Hi Genius,
+I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
+still exist in the mainline.
+Thanks in advance to heroes who can take a look and share understanding.
+Do you have a test case that reproduces on 5.2? It'd be nice to know
+if it was still a problem in the latest source tree or not.
+We narrowed down the source of the bug, which basically came from
+the following qmp usage:
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
+'drive_del replication0' } }
+One of the test cases is the COLO usage (docs/colo-proxy.txt).
+
+This issue is sporadic,the probability may be 1/15 for a io-heavy guest.
+
+I believe it's reproducible on 5.2 and the latest tree.
+Can you please test and confirm that this is the case, and then file a
+bug report on the LP:
+https://launchpad.net/qemu
+and include:
+- The exact commit you used (current origin/master debug build would be
+the most ideal.)
+- Which QEMU binary you are using (qemu-system-x86_64?)
+- The shortest command line you are aware of that reproduces the problem
+- The host OS and kernel version
+- An updated call trace
+- Any relevant commands issued prior to the one that caused the hang; or
+detailed reproduction steps if possible.
+Thanks,
+--js
+
diff --git a/results/classifier/016/none/56309929 b/results/classifier/016/none/56309929
new file mode 100644
index 00000000..9eb7151e
--- /dev/null
+++ b/results/classifier/016/none/56309929
@@ -0,0 +1,207 @@
+kernel: 0.698
+files: 0.249
+operating system: 0.101
+semantic: 0.071
+TCG: 0.056
+debug: 0.041
+virtual: 0.024
+ppc: 0.017
+PID: 0.015
+hypervisor: 0.015
+register: 0.013
+VMM: 0.013
+x86: 0.011
+user-level: 0.007
+device: 0.007
+performance: 0.007
+network: 0.004
+architecture: 0.003
+risc-v: 0.003
+alpha: 0.003
+KVM: 0.003
+permissions: 0.002
+arm: 0.002
+peripherals: 0.002
+vnc: 0.002
+socket: 0.002
+boot: 0.002
+graphic: 0.001
+assembly: 0.001
+mistranslation: 0.001
+i386: 0.001
+
+[Qemu-devel] [BUG 2.6] Broken CONFIG_TPM?
+
+A compilation test with clang -Weverything reported this problem:
+
+config-host.h:112:20: warning: '$' in identifier
+[-Wdollar-in-identifier-extension]
+
+The line of code looks like this:
+
+#define CONFIG_TPM $(CONFIG_SOFTMMU)
+
+This is fine for Makefile code, but won't work as expected in C code.
+
+Am 28.04.2016 um 22:33 schrieb Stefan Weil:
+>
+A compilation test with clang -Weverything reported this problem:
+>
+>
+config-host.h:112:20: warning: '$' in identifier
+>
+[-Wdollar-in-identifier-extension]
+>
+>
+The line of code looks like this:
+>
+>
+#define CONFIG_TPM $(CONFIG_SOFTMMU)
+>
+>
+This is fine for Makefile code, but won't work as expected in C code.
+>
+A complete 64 bit build with clang -Weverything creates a log file of
+1.7 GB.
+Here are the unique warnings sorted by their frequency:
+
+      1 -Wflexible-array-extensions
+      1 -Wgnu-folding-constant
+      1 -Wunknown-pragmas
+      1 -Wunknown-warning-option
+      1 -Wunreachable-code-loop-increment
+      2 -Warray-bounds-pointer-arithmetic
+      2 -Wdollar-in-identifier-extension
+      3 -Woverlength-strings
+      3 -Wweak-vtables
+      4 -Wgnu-empty-struct
+      4 -Wstring-conversion
+      6 -Wclass-varargs
+      7 -Wc99-extensions
+      7 -Wc++-compat
+      8 -Wfloat-equal
+     11 -Wformat-nonliteral
+     16 -Wshift-negative-value
+     19 -Wglobal-constructors
+     28 -Wc++11-long-long
+     29 -Wembedded-directive
+     38 -Wvla
+     40 -Wcovered-switch-default
+     40 -Wmissing-variable-declarations
+     49 -Wold-style-cast
+     53 -Wgnu-conditional-omitted-operand
+     56 -Wformat-pedantic
+     61 -Wvariadic-macros
+     77 -Wc++11-extensions
+     83 -Wgnu-flexible-array-initializer
+     83 -Wzero-length-array
+     96 -Wgnu-designator
+    102 -Wmissing-noreturn
+    103 -Wconditional-uninitialized
+    107 -Wdisabled-macro-expansion
+    115 -Wunreachable-code-return
+    134 -Wunreachable-code
+    243 -Wunreachable-code-break
+    257 -Wfloat-conversion
+    280 -Wswitch-enum
+    291 -Wpointer-arith
+    298 -Wshadow
+    378 -Wassign-enum
+    395 -Wused-but-marked-unused
+    420 -Wreserved-id-macro
+    493 -Wdocumentation
+    510 -Wshift-sign-overflow
+    565 -Wgnu-case-range
+    566 -Wgnu-zero-variadic-macro-arguments
+    650 -Wbad-function-cast
+    705 -Wmissing-field-initializers
+    817 -Wgnu-statement-expression
+    968 -Wdocumentation-unknown-command
+   1021 -Wextra-semi
+   1112 -Wgnu-empty-initializer
+   1138 -Wcast-qual
+   1509 -Wcast-align
+   1766 -Wextended-offsetof
+   1937 -Wsign-compare
+   2130 -Wpacked
+   2404 -Wunused-macros
+   3081 -Wpadded
+   4182 -Wconversion
+   5430 -Wlanguage-extension-token
+   6655 -Wshorten-64-to-32
+   6995 -Wpedantic
+   7354 -Wunused-parameter
+  27659 -Wsign-conversion
+
+Stefan Weil <address@hidden> writes:
+
+>
+A compilation test with clang -Weverything reported this problem:
+>
+>
+config-host.h:112:20: warning: '$' in identifier
+>
+[-Wdollar-in-identifier-extension]
+>
+>
+The line of code looks like this:
+>
+>
+#define CONFIG_TPM $(CONFIG_SOFTMMU)
+>
+>
+This is fine for Makefile code, but won't work as expected in C code.
+Broken in commit 3b8acc1 "configure: fix TPM logic".  Cc'ing Paolo.
+
+Impact: #ifdef CONFIG_TPM never disables code.  There are no other uses
+of CONFIG_TPM in C code.
+
+I had a quick peek at configure and create_config, but refrained from
+attempting to fix this, since I don't understand when exactly CONFIG_TPM
+should be defined.
+
+On 29 April 2016 at 08:42, Markus Armbruster <address@hidden> wrote:
+>
+Stefan Weil <address@hidden> writes:
+>
+>
+> A compilation test with clang -Weverything reported this problem:
+>
+>
+>
+> config-host.h:112:20: warning: '$' in identifier
+>
+> [-Wdollar-in-identifier-extension]
+>
+>
+>
+> The line of code looks like this:
+>
+>
+>
+> #define CONFIG_TPM $(CONFIG_SOFTMMU)
+>
+>
+>
+> This is fine for Makefile code, but won't work as expected in C code.
+>
+>
+Broken in commit 3b8acc1 "configure: fix TPM logic".  Cc'ing Paolo.
+>
+>
+Impact: #ifdef CONFIG_TPM never disables code.  There are no other uses
+>
+of CONFIG_TPM in C code.
+>
+>
+I had a quick peek at configure and create_config, but refrained from
+>
+attempting to fix this, since I don't understand when exactly CONFIG_TPM
+>
+should be defined.
+Looking at 'git blame' suggests this has been wrong like this for
+some years, so we don't need to scramble to fix it for 2.6.
+
+thanks
+-- PMM
+
diff --git a/results/classifier/016/none/65781993 b/results/classifier/016/none/65781993
new file mode 100644
index 00000000..92cd0275
--- /dev/null
+++ b/results/classifier/016/none/65781993
@@ -0,0 +1,2820 @@
+debug: 0.668
+hypervisor: 0.630
+operating system: 0.229
+socket: 0.188
+files: 0.071
+performance: 0.053
+network: 0.043
+x86: 0.027
+virtual: 0.022
+register: 0.021
+TCG: 0.019
+kernel: 0.013
+i386: 0.011
+device: 0.010
+permissions: 0.009
+alpha: 0.008
+PID: 0.008
+semantic: 0.006
+ppc: 0.006
+assembly: 0.004
+risc-v: 0.004
+user-level: 0.004
+boot: 0.003
+architecture: 0.003
+arm: 0.002
+vnc: 0.002
+VMM: 0.002
+mistranslation: 0.002
+graphic: 0.002
+peripherals: 0.001
+KVM: 0.001
+
+[Qemu-devel] Reply: Re: Reply: Re: [BUG] COLO failover hang
+
+Thank you.
+
+I have already tested it.
+
+When the Primary Node panics, the Secondary Node QEMU hangs at the same place.
+
+According to http://wiki.qemu-project.org/Features/COLO, killing the
+Primary Node QEMU does not reproduce the problem, but a Primary Node panic does.
+
+I think this is because the channel does not support
+QIO_CHANNEL_FEATURE_SHUTDOWN.
+
+
+During failover, channel_shutdown cannot shut down the channel,
+so colo_process_incoming_thread hangs at recvmsg.
+
+I tested a patch:
+
+diff --git a/migration/socket.c b/migration/socket.c
+index 13966f1..d65a0ea 100644
+--- a/migration/socket.c
++++ b/migration/socket.c
+@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
+     }
+ 
+     trace_migration_socket_incoming_accepted();
+ 
+     qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
++    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
+     migration_channel_process_incoming(migrate_get_current(),
+                                        QIO_CHANNEL(sioc));
+     object_unref(OBJECT(sioc));
+
+With this patch, my test no longer hangs.
+
+Original Mail
+
+From: address@hidden
+To: 王广10165992 address@hidden
+Cc: address@hidden address@hidden
+Date: 2017-03-21 15:58
+Subject: Re: [Qemu-devel] Reply: Re: [BUG] COLO failover hang
+
+Hi, Wang.
+
+You can test this branch:
+https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+and please follow the wiki to make sure your own configuration is correct:
+http://wiki.qemu-project.org/Features/COLO
+Thanks
+
+Zhang Chen
+
+
+On 03/21/2017 03:27 PM, address@hidden wrote:
+>
+> Hi.
+>
+> I tested QEMU git master and it has the same problem.
+>
+> (gdb) bt
+> #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
+>     niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+> #1  0x00007f658e4aa0c2 in qio_channel_read
+>     (address@hidden, address@hidden "",
+>     address@hidden, address@hidden) at io/channel.c:114
+> #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
+>     buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
+>     migration/qemu-file-channel.c:78
+> #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
+>     migration/qemu-file.c:295
+> #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
+>     address@hidden) at migration/qemu-file.c:555
+> #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
+>     migration/qemu-file.c:568
+> #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
+>     migration/qemu-file.c:648
+> #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
+>     address@hidden) at migration/colo.c:244
+> #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized out>,
+>     address@hidden, address@hidden)
+>     at migration/colo.c:264
+> #9  0x00007f658e3e740e in colo_process_incoming_thread
+>     (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
+> #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+> #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+>
+> (gdb) p ioc->name
+> $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+>
+> (gdb) p ioc->features        (QIO_CHANNEL_FEATURE_SHUTDOWN is not set)
+> $3 = 0
+>
+> (gdb) bt
+> #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
+>     condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
+> #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
+>     gmain.c:3054
+> #2  g_main_context_dispatch (context=<optimized out>,
+>     address@hidden) at gmain.c:3630
+> #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+> #4  os_host_main_loop_wait (timeout=<optimized out>) at
+>     util/main-loop.c:258
+> #5  main_loop_wait (address@hidden) at
+>     util/main-loop.c:506
+> #6  0x00007fdccb526187 in main_loop () at vl.c:1898
+> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
+>     at vl.c:4709
+>
+> (gdb) p ioc->features
+> $1 = 6
+>
+> (gdb) p ioc->name
+> $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+>
+> Maybe socket_accept_incoming_migration should
+> call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
+>
+> Thank you.
+>
+> Original Mail
+> address@hidden
+> address@hidden
+> address@hidden@huawei.com>
+> *Date:* 2017-03-16 14:46
+> *Subject:* *Re: [Qemu-devel] COLO failover hang*
+>
+> On 03/15/2017 05:06 PM, wangguang wrote:
+> > I am testing the QEMU COLO feature described here: [QEMU
+> > Wiki](http://wiki.qemu-project.org/Features/COLO).
+> >
+> > When the Primary Node panics, the Secondary Node QEMU hangs
+> > at recvmsg in qio_channel_socket_readv.
+> > I then ran { 'execute': 'nbd-server-stop' } and { "execute":
+> > "x-colo-lost-heartbeat" } in the Secondary VM's
+> > monitor, but the Secondary Node QEMU still hangs at recvmsg.
+> >
+> > It seems that COLO in QEMU is not complete yet.
+> > Is there a development plan for COLO?
+>
+> Yes, we are still developing it. You can see some of the patches we are pushing.
+>
+> > Has anyone ever run it successfully? Any help is appreciated!
+>
+> Our internal version runs it successfully.
+> For failover details, you can ask Zhanghailiang for help.
+> Next time you have a question about COLO,
+> please cc me and zhanghailiang address@hidden
+>
+> Thanks
+> Zhang Chen
+>
+> >
+> > CentOS 7.2 + QEMU 2.7.50
+> > (gdb) bt
+> > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+> > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
+> >     iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
+> >     io/channel-socket.c:497
+> > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+> >     address@hidden "", address@hidden,
+> >     address@hidden) at io/channel.c:97
+> > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
+> >     buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+> >     migration/qemu-file-channel.c:78
+> > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+> >     migration/qemu-file.c:257
+> > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+> >     address@hidden) at migration/qemu-file.c:510
+> > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+> >     migration/qemu-file.c:523
+> > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+> >     migration/qemu-file.c:603
+> > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+> >     address@hidden) at migration/colo.c:215
+> > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
+> >     checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+> >     migration/colo.c:546
+> > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+> >     migration/colo.c:649
+> > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+> > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+> >
+> > --
+> > View this message in context:
+> > http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+> > Sent from the Developer mailing list archive at Nabble.com.
+> >
+>
+> --
+> Thanks
+> Zhang Chen
+>
+
+-- 
+Thanks
+Zhang Chen
+
+Hi,
+
+On 2017/3/21 16:10, address@hidden wrote:
+> Thank you.
+>
+> I have already tested it.
+>
+> When the Primary Node panics, the Secondary Node QEMU hangs at the same place.
+>
+> According to http://wiki.qemu-project.org/Features/COLO, killing the
+> Primary Node QEMU does not reproduce the problem, but a Primary Node panic does.
+>
+> I think this is because the channel does not support
+> QIO_CHANNEL_FEATURE_SHUTDOWN.
+
+Yes, you are right. When we do failover for the primary/secondary VM, we shut
+down the related fd in case a thread is stuck reading from or writing to it.
+
+It seems that you didn't follow the introduction above exactly when testing.
+Could you share your test procedure, especially the commands used in the test?
+
+Thanks,
+Hailiang
+
+Hi,
+
+Thanks for reporting this. I confirmed it in my test, and it is a bug.
+
+We call qemu_file_shutdown() to shut down the related fd in case the COLO
+thread or the incoming thread is stuck in read()/write() during failover,
+but it has no effect: every fd used by COLO (and by migration) is wrapped
+in a QIO channel, and the channel will not call the shutdown API unless
+qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
+has been called.
+
+Cc: Dr. David Alan Gilbert <address@hidden>
+
+I suspect migration cancel has the same problem: it may get stuck in write()
+if we try to cancel the migration.
+
+void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
+                                 Error **errp)
+{
+    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
+    migration_channel_connect(s, ioc, NULL);
+    ... ...
+We don't call qio_channel_set_feature(QIO_CHANNEL(sioc),
+QIO_CHANNEL_FEATURE_SHUTDOWN) above,
+and then in
+migrate_fd_cancel()
+{
+ ... ...
+    if (s->state == MIGRATION_STATUS_CANCELLING && f) {
+        qemu_file_shutdown(f);  --> This will not take effect, will it?
+    }
+}
+
+Thanks,
+Hailiang
+
+* Hailiang Zhang (address@hidden) wrote:
+>
+Hi,
+>
+>
+Thanks for reporting this, and i confirmed it in my test, and it is a bug.
+>
+>
+Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
+>
+case COLO thread/incoming thread is stuck in read/write() while do failover,
+>
+but it didn't take effect, because all the fd used by COLO (also migration)
+>
+has been wrapped by qio channel, and it will not call the shutdown API if
+>
+we didn't qio_channel_set_feature(QIO_CHANNEL(sioc),
+>
+QIO_CHANNEL_FEATURE_SHUTDOWN).
+>
+>
+Cc: Dr. David Alan Gilbert <address@hidden>
+>
+>
+I doubted migration cancel has the same problem, it may be stuck in write()
+>
+if we tried to cancel migration.
+>
+>
+void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error
+>
+**errp)
+>
+{
+>
+qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
+>
+migration_channel_connect(s, ioc, NULL);
+>
+... ...
+>
+We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
+>
+QIO_CHANNEL_FEATURE_SHUTDOWN) above,
+>
+and the
+>
+migrate_fd_cancel()
+>
+{
+>
+... ...
+>
+if (s->state == MIGRATION_STATUS_CANCELLING && f) {
+>
+qemu_file_shutdown(f);  --> This will not take effect. No ?
+>
+}
+>
+}
+(cc'd in Daniel Berrange).
+I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
+at the top of qio_channel_socket_new, so I think that's safe, isn't it?
+
+Dave
+
+>
+Thanks,
+>
+Hailiang
+>
+>
+On 2017/3/21 16:10, address@hidden wrote:
+>
+> Thank you.
+>
+> I have already tested it.
+>
+> When the Primary Node panics, the Secondary Node QEMU hangs at the same place.
+>
+> According to http://wiki.qemu-project.org/Features/COLO, killing the
+> Primary Node QEMU does not reproduce the problem, but a Primary Node panic does.
+>
+> I think this is because the channel does not support
+> QIO_CHANNEL_FEATURE_SHUTDOWN.
+>
+> During failover, channel_shutdown cannot shut down the channel,
+> so colo_process_incoming_thread hangs at recvmsg.
+>
+> I tested a patch:
+>
+>
+>
+>
+>
+> diff --git a/migration/socket.c b/migration/socket.c
+>
+>
+>
+>
+>
+> index 13966f1..d65a0ea 100644
+>
+>
+>
+>
+>
+> --- a/migration/socket.c
+>
+>
+>
+>
+>
+> +++ b/migration/socket.c
+>
+>
+>
+>
+>
+> @@ -147,8 +147,9 @@ static gboolean
+>
+> socket_accept_incoming_migration(QIOChannel *ioc,
+>
+>
+>
+>
+>
+>       }
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>       trace_migration_socket_incoming_accepted()
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
+>
+>
+>
+>
+>
+> +    qio_channel_set_feature(QIO_CHANNEL(sioc),
+>
+> QIO_CHANNEL_FEATURE_SHUTDOWN)
+>
+>
+>
+>
+>
+>       migration_channel_process_incoming(migrate_get_current(),
+>
+>
+>
+>
+>
+>                                          QIO_CHANNEL(sioc))
+>
+>
+>
+>
+>
+>       object_unref(OBJECT(sioc))
+>
+>
+>
+>
+>
+>
+>
+>
+>
+> My test will not hang any more.
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+> åå§é®ä»¶
+>
+>
+>
+>
+>
+>
+>
+> åä»¶äººï¼ address@hidden
+>
+> æ¶ä»¶äººï¼çå¹¿10165992 address@hidden
+>
+> æéäººï¼ address@hidden address@hidden
+>
+> æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58
+>
+> ä¸» é¢ ï¼Re: [Qemu-devel]  çå¤: Re:  [BUG]COLO failover hang
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+> Hi,Wang.
+>
+>
+>
+> You can test this branch:
+>
+>
+>
+>
+https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+>
+>
+>
+> and please follow wiki ensure your own configuration correctly.
+>
+>
+>
+>
+http://wiki.qemu-project.org/Features/COLO
+>
+>
+>
+>
+>
+> Thanks
+>
+>
+>
+> Zhang Chen
+>
+>
+>
+>
+>
+> On 03/21/2017 03:27 PM, address@hidden wrote:
+>
+> ï¼
+>
+> ï¼ hi.
+>
+> ï¼
+>
+> ï¼ I test the git qemu master have the same problem.
+>
+> ï¼
+>
+> ï¼ (gdb) bt
+>
+> ï¼
+>
+> ï¼ #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
+>
+> ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+>
+> ï¼
+>
+> ï¼ #1  0x00007f658e4aa0c2 in qio_channel_read
+>
+> ï¼ (address@hidden, address@hidden "",
+>
+> ï¼ address@hidden, address@hidden) at io/channel.c:114
+>
+> ï¼
+>
+> ï¼ #2  0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼,
+>
+> ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at
+>
+> ï¼ migration/qemu-file-channel.c:78
+>
+> ï¼
+>
+> ï¼ #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
+>
+> ï¼ migration/qemu-file.c:295
+>
+> ï¼
+>
+> ï¼ #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
+>
+> ï¼ address@hidden) at migration/qemu-file.c:555
+>
+> ï¼
+>
+> ï¼ #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
+>
+> ï¼ migration/qemu-file.c:568
+>
+> ï¼
+>
+> ï¼ #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
+>
+> ï¼ migration/qemu-file.c:648
+>
+> ï¼
+>
+> ï¼ #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
+>
+> ï¼ address@hidden) at migration/colo.c:244
+>
+> ï¼
+>
+> ï¼ #8  0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized
+>
+> ï¼ outï¼, address@hidden,
+>
+> ï¼ address@hidden)
+>
+> ï¼
+>
+> ï¼     at migration/colo.c:264
+>
+> ï¼
+>
+> ï¼ #9  0x00007f658e3e740e in colo_process_incoming_thread
+>
+> ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577
+>
+> ï¼
+>
+> ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+>
+> ï¼
+>
+> ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+>
+> ï¼
+>
+> ï¼ (gdb) p ioc-ï¼name
+>
+> ï¼
+>
+> ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+>
+> ï¼
+>
+> ï¼ (gdb) p ioc-ï¼features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
+>
+> ï¼
+>
+> ï¼ $3 = 0
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼ (gdb) bt
+>
+> ï¼
+>
+> ï¼ #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
+>
+> ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
+>
+> ï¼
+>
+> ï¼ #1  0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at
+>
+> ï¼ gmain.c:3054
+>
+> ï¼
+>
+> ï¼ #2  g_main_context_dispatch (context=ï¼optimized outï¼,
+>
+> ï¼ address@hidden) at gmain.c:3630
+>
+> ï¼
+>
+> ï¼ #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+>
+> ï¼
+>
+> ï¼ #4  os_host_main_loop_wait (timeout=ï¼optimized outï¼) at
+>
+> ï¼ util/main-loop.c:258
+>
+> ï¼
+>
+> ï¼ #5  main_loop_wait (address@hidden) at
+>
+> ï¼ util/main-loop.c:506
+>
+> ï¼
+>
+> ï¼ #6  0x00007fdccb526187 in main_loop () at vl.c:1898
+>
+> ï¼
+>
+> ï¼ #7  main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized
+>
+> ï¼ outï¼) at vl.c:4709
+>
+> ï¼
+>
+> ï¼ (gdb) p ioc-ï¼features
+>
+> ï¼
+>
+> ï¼ $1 = 6
+>
+> ï¼
+>
+> ï¼ (gdb) p ioc-ï¼name
+>
+> ï¼
+>
+> ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼ May be socket_accept_incoming_migration should
+>
+> ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼ thank you.
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼ åå§é®ä»¶
+>
+> ï¼ address@hidden
+>
+> ï¼ address@hidden
+>
+> ï¼ address@hidden@huawei.comï¼
+>
+> ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46
+>
+> ï¼ *ä¸» é¢ ï¼**Re: [Qemu-devel] COLO failover hang*
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼ On 03/15/2017 05:06 PM, wangguang wrote:
+>
+> ï¼ ï¼   am testing QEMU COLO feature described here [QEMU
+>
+> ï¼ ï¼ Wiki](
+http://wiki.qemu-project.org/Features/COLO
+).
+>
+> ï¼ ï¼
+>
+> ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang.
+>
+> ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv.
+>
+> ï¼ ï¼ And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
+>
+> ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's
+>
+> ï¼ ï¼ monitor,the  Secondary Node qemu still hang at recvmsg .
+>
+> ï¼ ï¼
+>
+> ï¼ ï¼ I found that the colo in qemu is not complete yet.
+>
+> ï¼ ï¼ Do the colo have any plan for development?
+>
+> ï¼
+>
+> ï¼ Yes, We are developing. You can see some of patch we pushing.
+>
+> ï¼
+>
+> ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated!
+>
+> ï¼
+>
+> ï¼ In our internal version can run it successfully,
+>
+> ï¼ The failover detail you can ask Zhanghailiang for help.
+>
+> ï¼ Next time if you have some question about COLO,
+>
+> ï¼ please cc me and zhanghailiang address@hidden
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼ Thanks
+>
+> ï¼ Zhang Chen
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼ ï¼
+>
+> ï¼ ï¼
+>
+> ï¼ ï¼
+>
+> ï¼ ï¼ centos7.2+qemu2.7.50
+>
+> ï¼ ï¼ (gdb) bt
+>
+> ï¼ ï¼ #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+>
+> ï¼ ï¼ #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼,
+>
+> ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0)
+>
+> at
+>
+> ï¼ ï¼ io/channel-socket.c:497
+>
+> ï¼ ï¼ #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+>
+> ï¼ ï¼ address@hidden "", address@hidden,
+>
+> ï¼ ï¼ address@hidden) at io/channel.c:97
+>
+> ï¼ ï¼ #3  0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼,
+>
+> ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at
+>
+> ï¼ ï¼ migration/qemu-file-channel.c:78
+>
+> ï¼ ï¼ #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+>
+> ï¼ ï¼ migration/qemu-file.c:257
+>
+> ï¼ ï¼ #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+>
+> ï¼ ï¼ address@hidden) at migration/qemu-file.c:510
+>
+> ï¼ ï¼ #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+>
+> ï¼ ï¼ migration/qemu-file.c:523
+>
+> ï¼ ï¼ #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+>
+> ï¼ ï¼ migration/qemu-file.c:603
+>
+> ï¼ ï¼ #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+>
+> ï¼ ï¼ address@hidden) at migration/colo.c:215
+>
+> ï¼ ï¼ #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
+>
+> ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at
+>
+> ï¼ ï¼ migration/colo.c:546
+>
+> ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+>
+> ï¼ ï¼ migration/colo.c:649
+>
+> ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+>
+> ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+>
+> ï¼ ï¼
+>
+> ï¼ ï¼
+>
+> ï¼ ï¼
+>
+> ï¼ ï¼
+>
+> ï¼ ï¼
+>
+> ï¼ ï¼ --
+>
+> ï¼ ï¼ View this message in context:
+>
+>
+http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+>
+> ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com.
+>
+> ï¼ ï¼
+>
+> ï¼ ï¼
+>
+> ï¼ ï¼
+>
+> ï¼ ï¼
+>
+> ï¼
+>
+> ï¼ --
+>
+> ï¼ Thanks
+>
+> ï¼ Zhang Chen
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼
+>
+> ï¼
+>
+>
+>
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
+On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
+* Hailiang Zhang (address@hidden) wrote:
+Hi,
+
+Thanks for reporting this. I confirmed it in my test, and it is a bug.
+
+We do call qemu_file_shutdown() to shut down the related fd in case the
+COLO thread/incoming thread is stuck in read()/write() during failover,
+but it has no effect: every fd used by COLO (and by migration) is wrapped
+in a qio channel, and the channel will not call the shutdown API unless we
+have called qio_channel_set_feature(QIO_CHANNEL(sioc),
+QIO_CHANNEL_FEATURE_SHUTDOWN).
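Why the skipped shutdown matters can be shown with a small POSIX sketch. This is not QEMU code: the helper names `probe_read`, `probe_fresh_socket` and `probe_after_shutdown` are invented for the illustration, and it uses a local `socketpair()` rather than the TCP connection between the COLO nodes. It assumes a Linux-like system that provides `MSG_DONTWAIT`. With a silent peer, a read has nothing to return, so a blocking reader would wait indefinitely (the state the COLO incoming thread is in); after `shutdown()` on the same socket, the read reports end-of-file at once.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Try to read one byte without blocking.
 * Returns 1 if data was read, 0 on end-of-file, -1 if a blocking
 * read would have had to wait, and -2 on any other error.
 */
int probe_read(int fd)
{
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_DONTWAIT);

    if (n > 0) {
        return 1;
    }
    if (n == 0) {
        return 0;
    }
    return (errno == EAGAIN || errno == EWOULDBLOCK) ? -1 : -2;
}

/* Peer is connected but silent: a blocking reader would be stuck here. */
int probe_fresh_socket(void)
{
    int sv[2];
    int result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -2;
    }
    result = probe_read(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return result;
}

/* Same situation, but the local end has been shut down first. */
int probe_after_shutdown(void)
{
    int sv[2];
    int result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -2;
    }
    if (shutdown(sv[0], SHUT_RDWR) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }
    result = probe_read(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

In this sketch `probe_fresh_socket()` reports "would block" while `probe_after_shutdown()` reports end-of-file, which is the effect failover relies on to release a reader stuck in recvmsg.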
+
+Cc: Dr. David Alan Gilbert <address@hidden>
+
+I suspect migration cancel has the same problem: it may get stuck in write()
+if we try to cancel the migration.
+
+void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
+                                 Error **errp)
+{
+     qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
+     migration_channel_connect(s, ioc, NULL);
+     ... ...
+We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
+QIO_CHANNEL_FEATURE_SHUTDOWN) above,
+and the
+migrate_fd_cancel()
+{
+     ... ...
+     if (s->state == MIGRATION_STATUS_CANCELLING && f) {
+         qemu_file_shutdown(f);  --> This will not take effect. No ?
+     }
+}
+
+(cc'd in Daniel Berrange).
+I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
+at the top of qio_channel_socket_new; so I think that's safe, isn't it?
+
+Hmm, you are right, this problem only exists for the migration incoming fd,
+thanks.
+
+Dave
+Thanks,
+Hailiang
+
+On 2017/3/21 16:10, address@hidden wrote:
+Thank you.
+
+I have tested it already.
+
+When the Primary Node panics, the Secondary Node qemu hangs at the same place.
+
+According to http://wiki.qemu-project.org/Features/COLO, killing the Primary
+Node qemu will not reproduce the problem, but a Primary Node panic can.
+
+I think this is because the channel does not support
+QIO_CHANNEL_FEATURE_SHUTDOWN.
+
+When failover happens, channel_shutdown cannot shut down the channel,
+so colo_process_incoming_thread hangs at recvmsg.
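The mechanism described above, a shutdown request that is silently skipped when the channel does not advertise the feature, can be modelled in a few lines. This is a toy sketch rather than QEMU's implementation: every `toy_` name is hypothetical, and the feature numbering is an assumption chosen so that the two `ioc->features` values printed by gdb in this thread (0 for the accepted channel, 6 for the listener) decode as "no features" and "shutdown + listen".

```c
#include <stdbool.h>

/* Hypothetical feature numbers; the ordering is an assumption. */
enum toy_feature {
    TOY_FEATURE_FD_PASS  = 0,
    TOY_FEATURE_SHUTDOWN = 1,
    TOY_FEATURE_LISTEN   = 2
};

struct toy_channel {
    unsigned int features;   /* one bit per feature number */
    bool is_shut_down;
};

/* Advertise a feature by setting its bit. */
void toy_set_feature(struct toy_channel *ch, enum toy_feature f)
{
    ch->features |= 1u << f;
}

bool toy_has_feature(const struct toy_channel *ch, enum toy_feature f)
{
    return (ch->features & (1u << f)) != 0;
}

/*
 * Failover path: the shutdown is only performed when the channel
 * advertises it; otherwise the request is dropped and a reader
 * blocked on the channel is never released.
 */
bool toy_shutdown(struct toy_channel *ch)
{
    if (!toy_has_feature(ch, TOY_FEATURE_SHUTDOWN)) {
        return false;
    }
    ch->is_shut_down = true;
    return true;
}

/* Helper: would a channel with this raw feature mask honour a shutdown? */
bool toy_shutdown_works_for_mask(unsigned int features)
{
    struct toy_channel ch = { features, false };
    return toy_shutdown(&ch);
}
```

Under these assumptions a mask of 0 makes `toy_shutdown()` a no-op, while a mask of 6 lets it through, which matches the difference between the hanging accepted channel and the listener.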
+
+
+I tested a patch:
+
+diff --git a/migration/socket.c b/migration/socket.c
+index 13966f1..d65a0ea 100644
+--- a/migration/socket.c
++++ b/migration/socket.c
+@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
+      }
+
+      trace_migration_socket_incoming_accepted();
+
+      qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
++    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
+      migration_channel_process_incoming(migrate_get_current(),
+                                         QIO_CHANNEL(sioc));
+      object_unref(OBJECT(sioc));
+
+With this patch, my test no longer hangs.
+
+Original Mail
+
+From: address@hidden
+To: 王广10165992 address@hidden
+Cc: address@hidden address@hidden
+Date: 2017-03-21 15:58
+Subject: Re: [Qemu-devel] Reply: Re: [BUG]COLO failover hang
+
+Hi, Wang.
+
+You can test this branch:
+https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+and please follow the wiki to make sure your own configuration is correct.
+http://wiki.qemu-project.org/Features/COLO
+
+Thanks
+
+Zhang Chen
+
+On 03/21/2017 03:27 PM, address@hidden wrote:
+>
+> hi.
+>
+> I tested git qemu master and it has the same problem.
+>
+> (gdb) bt
+> #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
+> niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+> #1  0x00007f658e4aa0c2 in qio_channel_read
+> (address@hidden, address@hidden "",
+> address@hidden, address@hidden) at io/channel.c:114
+> #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
+> buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
+> migration/qemu-file-channel.c:78
+> #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
+> migration/qemu-file.c:295
+> #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
+> address@hidden) at migration/qemu-file.c:555
+> #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
+> migration/qemu-file.c:568
+> #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
+> migration/qemu-file.c:648
+> #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
+> address@hidden) at migration/colo.c:244
+> #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
+> out>, address@hidden,
+> address@hidden)
+>     at migration/colo.c:264
+> #9  0x00007f658e3e740e in colo_process_incoming_thread
+> (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
+> #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+> #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+>
+> (gdb) p ioc->name
+> $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+>
+> (gdb) p ioc->features        (does not support QIO_CHANNEL_FEATURE_SHUTDOWN)
+> $3 = 0
+>
+> (gdb) bt
+> #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
+> condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
+> #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
+> gmain.c:3054
+> #2  g_main_context_dispatch (context=<optimized out>,
+> address@hidden) at gmain.c:3630
+> #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+> #4  os_host_main_loop_wait (timeout=<optimized out>) at
+> util/main-loop.c:258
+> #5  main_loop_wait (address@hidden) at
+> util/main-loop.c:506
+> #6  0x00007fdccb526187 in main_loop () at vl.c:1898
+> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
+> out>) at vl.c:4709
+>
+> (gdb) p ioc->features
+> $1 = 6
+>
+> (gdb) p ioc->name
+> $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+>
+> Maybe socket_accept_incoming_migration should
+> call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
+>
+> thank you.
+>
+> Original Mail
+> address@hidden
+> address@hidden
+> address@hidden@huawei.com>
+> *Date:* 2017-03-16 14:46
+> *Subject:* *Re: [Qemu-devel] COLO failover hang*
+>
+> On 03/15/2017 05:06 PM, wangguang wrote:
+> > I am testing the QEMU COLO feature described here: [QEMU
+> > Wiki](http://wiki.qemu-project.org/Features/COLO).
+> >
+> > When the Primary Node panics, the Secondary Node qemu hangs.
+> > It hangs at recvmsg in qio_channel_socket_readv.
+> > I then ran { 'execute': 'nbd-server-stop' } and { "execute":
+> > "x-colo-lost-heartbeat" } in the Secondary VM's
+> > monitor, but the Secondary Node qemu still hangs at recvmsg.
+> >
+> > I found that COLO in qemu is not complete yet.
+> > Is there a development plan for COLO?
+>
+> Yes, we are developing it. You can see some of the patches we are pushing.
+>
+> > Has anyone ever run it successfully? Any help is appreciated!
+>
+> Our internal version can run it successfully.
+> For the failover details, you can ask Zhanghailiang for help.
+> Next time you have a question about COLO,
+> please cc me and zhanghailiang address@hidden
+>
+> Thanks
+> Zhang Chen
+>
+> > centos7.2+qemu2.7.50
+> > (gdb) bt
+> > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+> > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
+> > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
+> > io/channel-socket.c:497
+> > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+> > address@hidden "", address@hidden,
+> > address@hidden) at io/channel.c:97
+> > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
+> > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+> > migration/qemu-file-channel.c:78
+> > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+> > migration/qemu-file.c:257
+> > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+> > address@hidden) at migration/qemu-file.c:510
+> > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+> > migration/qemu-file.c:523
+> > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+> > migration/qemu-file.c:603
+> > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+> > address@hidden) at migration/colo.c:215
+> > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
+> > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+> > migration/colo.c:546
+> > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+> > migration/colo.c:649
+> > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+> > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+> >
+> > --
+> > View this message in context:
+> > http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+> > Sent from the Developer mailing list archive at Nabble.com.
+> >
+>
+> --
+> Thanks
+> Zhang Chen
+>
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
+* Hailiang Zhang (address@hidden) wrote:
+> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
+> > * Hailiang Zhang (address@hidden) wrote:
+> > > Hi,
+> > >
+> > > Thanks for reporting this. I confirmed it in my test, and it is a bug.
+> > >
+> > > We do call qemu_file_shutdown() to shut down the related fd in case the
+> > > COLO thread/incoming thread is stuck in read()/write() during failover,
+> > > but it has no effect: every fd used by COLO (and by migration) is wrapped
+> > > in a qio channel, and the channel will not call the shutdown API unless we
+> > > have called qio_channel_set_feature(QIO_CHANNEL(sioc),
+> > > QIO_CHANNEL_FEATURE_SHUTDOWN).
+> > >
+> > > Cc: Dr. David Alan Gilbert <address@hidden>
+> > >
+> > > I suspect migration cancel has the same problem: it may get stuck in
+> > > write() if we try to cancel the migration.
+> > >
+> > > void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
+> > >                                  Error **errp)
+> > > {
+> > >      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
+> > >      migration_channel_connect(s, ioc, NULL);
+> > >      ... ...
+> > > We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
+> > > QIO_CHANNEL_FEATURE_SHUTDOWN) above,
+> > > and the
+> > > migrate_fd_cancel()
+> > > {
+> > >      ... ...
+> > >      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
+> > >          qemu_file_shutdown(f);  --> This will not take effect. No ?
+> > >      }
+> > > }
+> >
+> > (cc'd in Daniel Berrange).
+> > I see that we call qio_channel_set_feature(ioc,
+> > QIO_CHANNEL_FEATURE_SHUTDOWN); at the
+> > top of qio_channel_socket_new; so I think that's safe, isn't it?
+>
+> Hmm, you are right, this problem only exists for the migration incoming fd,
+> thanks.
+
+Yes, and I don't think we normally do a cancel on the incoming side of a 
+migration.
+
+Dave
+
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
diff --git a/results/classifier/016/none/70294255 b/results/classifier/016/none/70294255
new file mode 100644
index 00000000..286c1cc4
--- /dev/null
+++ b/results/classifier/016/none/70294255
@@ -0,0 +1,1088 @@
+socket: 0.753
+debug: 0.454
+network: 0.316
+operating system: 0.278
+files: 0.213
+hypervisor: 0.060
+virtual: 0.046
+kernel: 0.045
+performance: 0.044
+i386: 0.043
+alpha: 0.037
+TCG: 0.036
+permissions: 0.032
+device: 0.025
+x86: 0.024
+PID: 0.021
+boot: 0.014
+KVM: 0.013
+semantic: 0.011
+risc-v: 0.010
+VMM: 0.010
+register: 0.009
+assembly: 0.009
+architecture: 0.008
+arm: 0.007
+vnc: 0.006
+ppc: 0.006
+peripherals: 0.005
+user-level: 0.004
+graphic: 0.003
+mistranslation: 0.002
+
+[Qemu-devel] Reply: Re: Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
+
+hi:
+
+Yes, it is better.
+
+And should we delete
+
+#ifdef WIN32
+    QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL);
+#endif
+
+in qio_channel_socket_accept?
+
+qio_channel_socket_new already has it.
+
+Original Mail
+
+From: address@hidden
+To: 王广10165992
+Cc: address@hidden address@hidden address@hidden address@hidden
+Date: 2017-03-22 15:03
+Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
+
+Hi,
+
+On 2017/3/22 9:42, address@hidden wrote:
+> diff --git a/migration/socket.c b/migration/socket.c
+> index 13966f1..d65a0ea 100644
+> --- a/migration/socket.c
+> +++ b/migration/socket.c
+> @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
+>       }
+>
+>       trace_migration_socket_incoming_accepted();
+>
+>       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
+> +    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
+>       migration_channel_process_incoming(migrate_get_current(),
+>                                          QIO_CHANNEL(sioc));
+>       object_unref(OBJECT(sioc));
+>
+> Is this patch ok?
+>
+
+Yes, I think this works, but a better way may be to call 
+qio_channel_set_feature()
+in qio_channel_socket_accept(); we didn't set the SHUTDOWN feature for the 
+socket accept fd.
+Or fix it like this:
+
+diff --git a/io/channel-socket.c b/io/channel-socket.c
+index f546c68..ce6894c 100644
+--- a/io/channel-socket.c
++++ b/io/channel-socket.c
+@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
+                            Error **errp)
+  {
+      QIOChannelSocket *cioc;
+-
+-    cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
+-    cioc->fd = -1;
++
++    cioc = qio_channel_socket_new();
+      cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
+      cioc->localAddrLen = sizeof(ioc->localAddr);
+
+
+Thanks,
+Hailiang
+
+> I have tested it. The test does not hang any more.
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+> Original Mail
+>
+>
+>
+> From: address@hidden
+> To: address@hidden address@hidden
+> Cc: address@hidden address@hidden address@hidden
+> Date: 2017-03-22 09:11
+> Subject: Re: [Qemu-devel]  Reply: Re:  Reply: Re: [BUG]COLO failover hang
+>
+>
+>
+>
+>
+> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
+> > * Hailiang Zhang (address@hidden) wrote:
+> >> Hi,
+> >>
+> >> Thanks for reporting this, and i confirmed it in my test, and it is a bug.
+> >>
+> >> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
+> >> case COLO thread/incoming thread is stuck in read/write() while do 
+failover,
+> >> but it didn't take effect, because all the fd used by COLO (also migration)
+> >> has been wrapped by qio channel, and it will not call the shutdown API if
+> >> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN).
+> >>
+> >> Cc: Dr. David Alan Gilbert address@hidden
+> >>
+> >> I doubted migration cancel has the same problem, it may be stuck in write()
+> >> if we tried to cancel migration.
+> >>
+> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
+Error **errp)
+> >> {
+> >>      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
+> >>      migration_channel_connect(s, ioc, NULL);
+> >>      ... ...
+> >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN) above,
+> >> and the
+> >> migrate_fd_cancel()
+> >> {
+> >>   ... ...
+> >>      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
+> >>          qemu_file_shutdown(f);  --> This will not take effect. No ?
+> >>      }
+> >> }
+> >
+> > (cc'd in Daniel Berrange).
+> > I see that we call qio_channel_set_feature(ioc, 
+QIO_CHANNEL_FEATURE_SHUTDOWN) at the
+> > top of qio_channel_socket_new  so I think that's safe isn't it?
+> >
+>
+> Hmm, you are right, this problem is only exist for the migration incoming fd, 
+thanks.
+>
+> > Dave
+> >
+> >> Thanks,
+> >> Hailiang
+> >>
+> >> On 2017/3/21 16:10, address@hidden wrote:
+> >>> Thank you.
+> >>>
+> >>> I have tested already.
+> >>>
+> >>> When the Primary Node panic,the Secondary Node qemu hang at the same 
+place.
+> >>>
+> >>> According to
+http://wiki.qemu-project.org/Features/COLO
+, killing the Primary Node 
+qemu will not reproduce the problem, but a Primary Node panic can.
+> >>>
+> >>> I think this is because the channel does not support 
+QIO_CHANNEL_FEATURE_SHUTDOWN.
+> >>>
+> >>>
+> >>> On failover, channel_shutdown could not shut down the channel.
+> >>>
+> >>>
+> >>> So the colo_process_incoming_thread will hang at recvmsg.
+> >>>
+> >>>
+> >>> I tested a patch:
+> >>>
+> >>>
+> >>> diff --git a/migration/socket.c b/migration/socket.c
+> >>>
+> >>>
+> >>> index 13966f1..d65a0ea 100644
+> >>>
+> >>>
+> >>> --- a/migration/socket.c
+> >>>
+> >>>
+> >>> +++ b/migration/socket.c
+> >>>
+> >>>
+> >>> @@ -147,8 +147,9 @@ static gboolean 
+socket_accept_incoming_migration(QIOChannel *ioc,
+> >>>
+> >>>
+> >>>        }
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>        trace_migration_socket_incoming_accepted();
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>        qio_channel_set_name(QIO_CHANNEL(sioc), 
+"migration-socket-incoming");
+> >>>
+> >>>
+> >>> +    qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN);
+> >>>
+> >>>
+> >>>        migration_channel_process_incoming(migrate_get_current(),
+> >>>
+> >>>
+> >>>                                           QIO_CHANNEL(sioc));
+> >>>
+> >>>
+> >>>        object_unref(OBJECT(sioc));
+> >>>
+> >>>
+> >>>
+> >>>
+> >>> My test will not hang any more.
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>> Original Mail
+> >>>
+> >>>
+> >>>
+> >>> From: address@hidden
+> >>> To: Wang Guang 10165992 address@hidden
+> >>> Cc: address@hidden address@hidden
+> >>> Date: 2017-03-21 15:58
+> >>> Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>> Hi,Wang.
+> >>>
+> >>> You can test this branch:
+> >>>
+> >>>
+https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+> >>>
+> >>> and please follow wiki ensure your own configuration correctly.
+> >>>
+> >>>
+http://wiki.qemu-project.org/Features/COLO
+> >>>
+> >>>
+> >>> Thanks
+> >>>
+> >>> Zhang Chen
+> >>>
+> >>>
+> >>> On 03/21/2017 03:27 PM, address@hidden wrote:
+> >>> >
+> >>> > hi.
+> >>> >
+> >>> > I test the git qemu master have the same problem.
+> >>> >
+> >>> > (gdb) bt
+> >>> >
+> >>> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
+> >>> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+> >>> >
+> >>> > #1  0x00007f658e4aa0c2 in qio_channel_read
+> >>> > (address@hidden, address@hidden "",
+> >>> > address@hidden, address@hidden) at io/channel.c:114
+> >>> >
+> >>> > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
+> >>> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
+> >>> > migration/qemu-file-channel.c:78
+> >>> >
+> >>> > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
+> >>> > migration/qemu-file.c:295
+> >>> >
+> >>> > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
+> >>> > address@hidden) at migration/qemu-file.c:555
+> >>> >
+> >>> > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
+> >>> > migration/qemu-file.c:568
+> >>> >
+> >>> > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
+> >>> > migration/qemu-file.c:648
+> >>> >
+> >>> > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
+> >>> > address@hidden) at migration/colo.c:244
+> >>> >
+> >>> > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
+> >>> > out>, address@hidden,
+> >>> > address@hidden)
+> >>> >
+> >>> >     at migration/colo.c:264
+> >>> >
+> >>> > #9  0x00007f658e3e740e in colo_process_incoming_thread
+> >>> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
+> >>> >
+> >>> > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+> >>> >
+> >>> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+> >>> >
+> >>> > (gdb) p ioc->name
+> >>> >
+> >>> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+> >>> >
+> >>> > (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
+> >>> >
+> >>> > $3 = 0
+> >>> >
+> >>> >
+> >>> > (gdb) bt
+> >>> >
+> >>> > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
+> >>> > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
+> >>> >
+> >>> > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
+> >>> > gmain.c:3054
+> >>> >
+> >>> > #2  g_main_context_dispatch (context=<optimized out>,
+> >>> > address@hidden) at gmain.c:3630
+> >>> >
+> >>> > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+> >>> >
+> >>> > #4  os_host_main_loop_wait (timeout=<optimized out>) at
+> >>> > util/main-loop.c:258
+> >>> >
+> >>> > #5  main_loop_wait (address@hidden) at
+> >>> > util/main-loop.c:506
+> >>> >
+> >>> > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
+> >>> >
+> >>> > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
+> >>> > out>) at vl.c:4709
+> >>> >
+> >>> > (gdb) p ioc->features
+> >>> >
+> >>> > $1 = 6
+> >>> >
+> >>> > (gdb) p ioc->name
+> >>> >
+> >>> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+> >>> >
+> >>> >
+> >>> > May be socket_accept_incoming_migration should
+> >>> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
+> >>> >
+> >>> >
+> >>> > thank you.
+> >>> >
+> >>> >
+> >>> >
+> >>> >
+> >>> >
+> >>> > Original Mail
+> >>> > address@hidden
+> >>> > address@hidden
+> >>> > address@hidden@huawei.com>
+> >>> > *Date:* 2017-03-16 14:46
+> >>> > *Subject:* *Re: [Qemu-devel] COLO failover hang*
+> >>> >
+> >>> >
+> >>> >
+> >>> >
+> >>> > On 03/15/2017 05:06 PM, wangguang wrote:
+> >>> > > I am testing QEMU COLO feature described here [QEMU
+> >>> > > Wiki](
+http://wiki.qemu-project.org/Features/COLO
+).
+> >>> > >
+> >>> > > When the Primary Node panic,the Secondary Node qemu hang.
+> >>> > > hang at recvmsg in qio_channel_socket_readv.
+> >>> > > And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
+> >>> > > "x-colo-lost-heartbeat" } in Secondary VM's
+> >>> > > monitor,the  Secondary Node qemu still hang at recvmsg .
+> >>> > >
+> >>> > > I found that the colo in qemu is not complete yet.
+> >>> > > Do the colo have any plan for development?
+> >>> >
+> >>> > Yes, We are developing. You can see some of patch we pushing.
+> >>> >
+> >>> > > Has anyone ever run it successfully? Any help is appreciated!
+> >>> >
+> >>> > In our internal version can run it successfully,
+> >>> > The failover detail you can ask Zhanghailiang for help.
+> >>> > Next time if you have some question about COLO,
+> >>> > please cc me and zhanghailiang address@hidden
+> >>> >
+> >>> >
+> >>> > Thanks
+> >>> > Zhang Chen
+> >>> >
+> >>> >
+> >>> > >
+> >>> > >
+> >>> > >
+> >>> > > centos7.2+qemu2.7.50
+> >>> > > (gdb) bt
+> >>> > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+> >>> > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized 
+out>,
+> >>> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, 
+errp=0x0) at
+> >>> > > io/channel-socket.c:497
+> >>> > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+> >>> > > address@hidden "", address@hidden,
+> >>> > > address@hidden) at io/channel.c:97
+> >>> > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
+> >>> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+> >>> > > migration/qemu-file-channel.c:78
+> >>> > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+> >>> > > migration/qemu-file.c:257
+> >>> > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+> >>> > > address@hidden) at migration/qemu-file.c:510
+> >>> > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+> >>> > > migration/qemu-file.c:523
+> >>> > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+> >>> > > migration/qemu-file.c:603
+> >>> > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+> >>> > > address@hidden) at migration/colo.c:215
+> >>> > > #9  0x00007f3e0327250d in colo_wait_handle_message 
+(errp=0x7f3d62bfaa48,
+> >>> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+> >>> > > migration/colo.c:546
+> >>> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+> >>> > > migration/colo.c:649
+> >>> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+> >>> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+> >>> > >
+> >>> > >
+> >>> > >
+> >>> > >
+> >>> > >
+> >>> > > --
+> >>> > > View this message in context:
+http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+> >>> > > Sent from the Developer mailing list archive at Nabble.com.
+> >>> > >
+> >>> > >
+> >>> > >
+> >>> > >
+> >>> >
+> >>> > --
+> >>> > Thanks
+> >>> > Zhang Chen
+> >>> >
+> >>> >
+> >>> >
+> >>> >
+> >>> >
+> >>>
+> >>
+> > --
+> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
+> >
+> > .
+> >
+>
+
+On 2017/3/22 16:09, address@hidden wrote:
+Hi,
+
+Yes, it is better.
+
+And should we delete
+Yes, you are right.
+#ifdef WIN32
+
+     QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL);
+
+#endif
+
+
+
+
+in qio_channel_socket_accept?
+
+qio_channel_socket_new already has it.
+
+
+
+
+
+
+
+
+
+
+
+
+Original Mail
+
+
+
+From: address@hidden
+To: Wang Guang 10165992
+Cc: address@hidden address@hidden address@hidden address@hidden
+Date: 2017-03-22 15:03
+Subject: Re: [Qemu-devel]  Reply: Re:  Reply: Re: Reply: Re: [BUG]COLO failover hang
+
+
+
+
+
+Hi,
+
+On 2017/3/22 9:42, address@hidden wrote:
+> diff --git a/migration/socket.c b/migration/socket.c
+>
+>
+> index 13966f1..d65a0ea 100644
+>
+>
+> --- a/migration/socket.c
+>
+>
+> +++ b/migration/socket.c
+>
+>
+> @@ -147,8 +147,9 @@ static gboolean 
+socket_accept_incoming_migration(QIOChannel *ioc,
+>
+>
+>       }
+>
+>
+>
+>
+>
+>       trace_migration_socket_incoming_accepted();
+>
+>
+>
+>
+>
+>       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
+>
+>
+> +    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
+>
+>
+>       migration_channel_process_incoming(migrate_get_current(),
+>
+>
+>                                          QIO_CHANNEL(sioc));
+>
+>
+>       object_unref(OBJECT(sioc));
+>
+>
+>
+>
+> Is this patch ok?
+>
+
+Yes, I think this works, but a better way may be to call 
+qio_channel_set_feature()
+in qio_channel_socket_accept(); we didn't set the SHUTDOWN feature for the 
+socket accept fd.
+Or fix it like this:
+
+diff --git a/io/channel-socket.c b/io/channel-socket.c
+index f546c68..ce6894c 100644
+--- a/io/channel-socket.c
++++ b/io/channel-socket.c
+@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
+                             Error **errp)
+   {
+       QIOChannelSocket *cioc;
+-
+-    cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
+-    cioc->fd = -1;
++
++    cioc = qio_channel_socket_new();
+       cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
+       cioc->localAddrLen = sizeof(ioc->localAddr);
+
+
+Thanks,
+Hailiang
+
+> I have tested it. The test does not hang any more.
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+> Original Mail
+>
+>
+>
+> From: address@hidden
+> To: address@hidden address@hidden
+> Cc: address@hidden address@hidden address@hidden
+> Date: 2017-03-22 09:11
+> Subject: Re: [Qemu-devel]  Reply: Re:  Reply: Re: [BUG]COLO failover hang
+>
+>
+>
+>
+>
+> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
+> > * Hailiang Zhang (address@hidden) wrote:
+> >> Hi,
+> >>
+> >> Thanks for reporting this, and i confirmed it in my test, and it is a bug.
+> >>
+> >> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
+> >> case COLO thread/incoming thread is stuck in read/write() while do 
+failover,
+> >> but it didn't take effect, because all the fd used by COLO (also migration)
+> >> has been wrapped by qio channel, and it will not call the shutdown API if
+> >> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN).
+> >>
+> >> Cc: Dr. David Alan Gilbert address@hidden
+> >>
+> >> I doubted migration cancel has the same problem, it may be stuck in write()
+> >> if we tried to cancel migration.
+> >>
+> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
+Error **errp)
+> >> {
+> >>      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
+> >>      migration_channel_connect(s, ioc, NULL);
+> >>      ... ...
+> >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN) above,
+> >> and the
+> >> migrate_fd_cancel()
+> >> {
+> >>   ... ...
+> >>      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
+> >>          qemu_file_shutdown(f);  --> This will not take effect. No ?
+> >>      }
+> >> }
+> >
+> > (cc'd in Daniel Berrange).
+> > I see that we call qio_channel_set_feature(ioc, 
+QIO_CHANNEL_FEATURE_SHUTDOWN) at the
+> > top of qio_channel_socket_new  so I think that's safe isn't it?
+> >
+>
+> Hmm, you are right, this problem is only exist for the migration incoming fd, 
+thanks.
+>
+> > Dave
+> >
+> >> Thanks,
+> >> Hailiang
+> >>
+> >> On 2017/3/21 16:10, address@hidden wrote:
+> >>> Thank you.
+> >>>
+> >>> I have tested already.
+> >>>
+> >>> When the Primary Node panic,the Secondary Node qemu hang at the same 
+place.
+> >>>
+> >>> According to
+http://wiki.qemu-project.org/Features/COLO
+, killing the Primary Node 
+qemu will not reproduce the problem, but a Primary Node panic can.
+> >>>
+> >>> I think this is because the channel does not support 
+QIO_CHANNEL_FEATURE_SHUTDOWN.
+> >>>
+> >>>
+> >>> On failover, channel_shutdown could not shut down the channel.
+> >>>
+> >>>
+> >>> So the colo_process_incoming_thread will hang at recvmsg.
+> >>>
+> >>>
+> >>> I tested a patch:
+> >>>
+> >>>
+> >>> diff --git a/migration/socket.c b/migration/socket.c
+> >>>
+> >>>
+> >>> index 13966f1..d65a0ea 100644
+> >>>
+> >>>
+> >>> --- a/migration/socket.c
+> >>>
+> >>>
+> >>> +++ b/migration/socket.c
+> >>>
+> >>>
+> >>> @@ -147,8 +147,9 @@ static gboolean 
+socket_accept_incoming_migration(QIOChannel *ioc,
+> >>>
+> >>>
+> >>>        }
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>        trace_migration_socket_incoming_accepted();
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>        qio_channel_set_name(QIO_CHANNEL(sioc), 
+"migration-socket-incoming");
+> >>>
+> >>>
+> >>> +    qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN);
+> >>>
+> >>>
+> >>>        migration_channel_process_incoming(migrate_get_current(),
+> >>>
+> >>>
+> >>>                                           QIO_CHANNEL(sioc));
+> >>>
+> >>>
+> >>>        object_unref(OBJECT(sioc));
+> >>>
+> >>>
+> >>>
+> >>>
+> >>> My test will not hang any more.
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>> Original Mail
+> >>>
+> >>>
+> >>>
+> >>> From: address@hidden
+> >>> To: Wang Guang 10165992 address@hidden
+> >>> Cc: address@hidden address@hidden
+> >>> Date: 2017-03-21 15:58
+> >>> Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
+> >>>
+> >>>
+> >>>
+> >>>
+> >>>
+> >>> Hi,Wang.
+> >>>
+> >>> You can test this branch:
+> >>>
+> >>>
+https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+> >>>
+> >>> and please follow wiki ensure your own configuration correctly.
+> >>>
+> >>>
+http://wiki.qemu-project.org/Features/COLO
+> >>>
+> >>>
+> >>> Thanks
+> >>>
+> >>> Zhang Chen
+> >>>
+> >>>
+> >>> On 03/21/2017 03:27 PM, address@hidden wrote:
+> >>> >
+> >>> > hi.
+> >>> >
+> >>> > I test the git qemu master have the same problem.
+> >>> >
+> >>> > (gdb) bt
+> >>> >
+> >>> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
+> >>> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+> >>> >
+> >>> > #1  0x00007f658e4aa0c2 in qio_channel_read
+> >>> > (address@hidden, address@hidden "",
+> >>> > address@hidden, address@hidden) at io/channel.c:114
+> >>> >
+> >>> > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
+> >>> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
+> >>> > migration/qemu-file-channel.c:78
+> >>> >
+> >>> > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
+> >>> > migration/qemu-file.c:295
+> >>> >
+> >>> > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
+> >>> > address@hidden) at migration/qemu-file.c:555
+> >>> >
+> >>> > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
+> >>> > migration/qemu-file.c:568
+> >>> >
+> >>> > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
+> >>> > migration/qemu-file.c:648
+> >>> >
+> >>> > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
+> >>> > address@hidden) at migration/colo.c:244
+> >>> >
+> >>> > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
+> >>> > out>, address@hidden,
+> >>> > address@hidden)
+> >>> >
+> >>> >     at migration/colo.c:264
+> >>> >
+> >>> > #9  0x00007f658e3e740e in colo_process_incoming_thread
+> >>> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
+> >>> >
+> >>> > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+> >>> >
+> >>> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+> >>> >
+> >>> > (gdb) p ioc->name
+> >>> >
+> >>> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+> >>> >
+> >>> > (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
+> >>> >
+> >>> > $3 = 0
+> >>> >
+> >>> >
+> >>> > (gdb) bt
+> >>> >
+> >>> > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
+> >>> > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
+> >>> >
+> >>> > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
+> >>> > gmain.c:3054
+> >>> >
+> >>> > #2  g_main_context_dispatch (context=<optimized out>,
+> >>> > address@hidden) at gmain.c:3630
+> >>> >
+> >>> > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+> >>> >
+> >>> > #4  os_host_main_loop_wait (timeout=<optimized out>) at
+> >>> > util/main-loop.c:258
+> >>> >
+> >>> > #5  main_loop_wait (address@hidden) at
+> >>> > util/main-loop.c:506
+> >>> >
+> >>> > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
+> >>> >
+> >>> > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
+> >>> > out>) at vl.c:4709
+> >>> >
+> >>> > (gdb) p ioc->features
+> >>> >
+> >>> > $1 = 6
+> >>> >
+> >>> > (gdb) p ioc->name
+> >>> >
+> >>> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+> >>> >
+> >>> >
+> >>> > May be socket_accept_incoming_migration should
+> >>> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
+> >>> >
+> >>> >
+> >>> > thank you.
+> >>> >
+> >>> >
+> >>> >
+> >>> >
+> >>> >
+> >>> > Original Mail
+> >>> > address@hidden
+> >>> > address@hidden
+> >>> > address@hidden@huawei.com>
+> >>> > *Date:* 2017-03-16 14:46
+> >>> > *Subject:* *Re: [Qemu-devel] COLO failover hang*
+> >>> >
+> >>> >
+> >>> >
+> >>> >
+> >>> > On 03/15/2017 05:06 PM, wangguang wrote:
+> >>> > > I am testing QEMU COLO feature described here [QEMU
+> >>> > > Wiki](
+http://wiki.qemu-project.org/Features/COLO
+).
+> >>> > >
+> >>> > > When the Primary Node panic,the Secondary Node qemu hang.
+> >>> > > hang at recvmsg in qio_channel_socket_readv.
+> >>> > > And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
+> >>> > > "x-colo-lost-heartbeat" } in Secondary VM's
+> >>> > > monitor,the  Secondary Node qemu still hang at recvmsg .
+> >>> > >
+> >>> > > I found that the colo in qemu is not complete yet.
+> >>> > > Do the colo have any plan for development?
+> >>> >
+> >>> > Yes, We are developing. You can see some of patch we pushing.
+> >>> >
+> >>> > > Has anyone ever run it successfully? Any help is appreciated!
+> >>> >
+> >>> > In our internal version can run it successfully,
+> >>> > The failover detail you can ask Zhanghailiang for help.
+> >>> > Next time if you have some question about COLO,
+> >>> > please cc me and zhanghailiang address@hidden
+> >>> >
+> >>> >
+> >>> > Thanks
+> >>> > Zhang Chen
+> >>> >
+> >>> >
+> >>> > >
+> >>> > >
+> >>> > >
+> >>> > > centos7.2+qemu2.7.50
+> >>> > > (gdb) bt
+> >>> > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+> >>> > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized 
+out>,
+> >>> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, 
+errp=0x0) at
+> >>> > > io/channel-socket.c:497
+> >>> > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+> >>> > > address@hidden "", address@hidden,
+> >>> > > address@hidden) at io/channel.c:97
+> >>> > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
+> >>> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+> >>> > > migration/qemu-file-channel.c:78
+> >>> > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+> >>> > > migration/qemu-file.c:257
+> >>> > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+> >>> > > address@hidden) at migration/qemu-file.c:510
+> >>> > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+> >>> > > migration/qemu-file.c:523
+> >>> > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+> >>> > > migration/qemu-file.c:603
+> >>> > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+> >>> > > address@hidden) at migration/colo.c:215
+> >>> > > #9  0x00007f3e0327250d in colo_wait_handle_message 
+(errp=0x7f3d62bfaa48,
+> >>> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+> >>> > > migration/colo.c:546
+> >>> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+> >>> > > migration/colo.c:649
+> >>> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+> >>> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+> >>> > >
+> >>> > >
+> >>> > >
+> >>> > >
+> >>> > >
+> >>> > > --
+> >>> > > View this message in context:
+http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+> >>> > > Sent from the Developer mailing list archive at Nabble.com.
+> >>> > >
+> >>> > >
+> >>> > >
+> >>> > >
+> >>> >
+> >>> > --
+> >>> > Thanks
+> >>> > Zhang Chen
+> >>> >
+> >>> >
+> >>> >
+> >>> >
+> >>> >
+> >>>
+> >>
+> > --
+> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
+> >
+> > .
+> >
+>
+
diff --git a/results/classifier/016/none/70868267 b/results/classifier/016/none/70868267
new file mode 100644
index 00000000..3f50c2ef
--- /dev/null
+++ b/results/classifier/016/none/70868267
@@ -0,0 +1,67 @@
+x86: 0.245
+operating system: 0.079
+files: 0.026
+hypervisor: 0.023
+TCG: 0.023
+debug: 0.020
+network: 0.019
+PID: 0.018
+i386: 0.011
+virtual: 0.008
+register: 0.006
+user-level: 0.004
+ppc: 0.003
+semantic: 0.003
+device: 0.002
+socket: 0.002
+assembly: 0.002
+VMM: 0.002
+kernel: 0.002
+performance: 0.001
+arm: 0.001
+alpha: 0.001
+graphic: 0.001
+vnc: 0.001
+peripherals: 0.001
+architecture: 0.001
+boot: 0.001
+risc-v: 0.001
+permissions: 0.000
+KVM: 0.000
+mistranslation: 0.000
+
+[Qemu-devel] [BUG] Failed to compile using gcc7.1
+
+Hi all,
+
+After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc.
+
+The error is:
+
+------
+  CC      block/blkdebug.o
+block/blkdebug.c: In function 'blkdebug_refresh_filename':
+block/blkdebug.c:693:31: error: '%s' directive output may be truncated
+writing up to 4095 bytes into a region of size 4086
+[-Werror=format-truncation=]
+"blkdebug:%s:%s", s->config_file ?: "",
+                               ^~
+In file included from /usr/include/stdio.h:939:0,
+                 from /home/adam/qemu/include/qemu/osdep.h:68,
+                 from block/blkdebug.c:25:
+/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk'
+output 11 or more bytes (assuming 4106) into a destination of size 4096
+return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
+          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+        __bos (__s), __fmt, __va_arg_pack ());
+        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+cc1: all warnings being treated as errors
+make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1
+------
+
+It seems that gcc 7 introduces stricter checks for printf-style functions.
+With clang, although there are some extra warnings, it can at least
+compile successfully.
+Thanks,
+Qu
+
diff --git a/results/classifier/016/none/80604314 b/results/classifier/016/none/80604314
new file mode 100644
index 00000000..8112f757
--- /dev/null
+++ b/results/classifier/016/none/80604314
@@ -0,0 +1,1507 @@
+hypervisor: 0.669
+network: 0.654
+debug: 0.554
+operating system: 0.404
+virtual: 0.190
+files: 0.103
+TCG: 0.102
+PID: 0.097
+boot: 0.095
+device: 0.090
+user-level: 0.089
+VMM: 0.084
+vnc: 0.081
+register: 0.062
+socket: 0.055
+kernel: 0.042
+KVM: 0.021
+risc-v: 0.019
+performance: 0.014
+assembly: 0.011
+semantic: 0.008
+architecture: 0.007
+alpha: 0.004
+ppc: 0.003
+permissions: 0.003
+graphic: 0.003
+peripherals: 0.002
+mistranslation: 0.002
+x86: 0.001
+arm: 0.001
+i386: 0.000
+
+[BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
+
+When I start qemu with a second virtio-net-ccw device (i.e. adding
+-device virtio-net-ccw in addition to the autogenerated device), I get
+a segfault. gdb points to
+
+#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, 
+    config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
+146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+
+(backtrace doesn't go further)
+
+Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
+the autogenerated virtio-net-ccw device is present) works. Specifying
+several "-device virtio-net-pci" works as well.
+
+Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
+client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
+works (in-between state does not compile).
+
+This is reproducible with tcg as well. Same problem both with
+--enable-vhost-vdpa and --disable-vhost-vdpa.
+
+Have not yet tried to figure out what might be special with
+virtio-ccw... anyone have an idea?
+
+[This should probably be considered a blocker?]
+
+On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
+>
+When I start qemu with a second virtio-net-ccw device (i.e. adding
+>
+-device virtio-net-ccw in addition to the autogenerated device), I get
+>
+a segfault. gdb points to
+>
+>
+#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
+>
+config=0x55d6ad9e3f80 "RT") at
+>
+/home/cohuck/git/qemu/hw/net/virtio-net.c:146
+>
+146       if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+>
+>
+(backtrace doesn't go further)
+>
+>
+Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
+>
+the autogenerated virtio-net-ccw device is present) works. Specifying
+>
+several "-device virtio-net-pci" works as well.
+>
+>
+Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
+>
+client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
+>
+works (in-between state does not compile).
+Ouch. I didn't test all in-between states :(
+But I wish we had a 0-day infrastructure like the kernel has,
+that catches things like that.
+
+>
+This is reproducible with tcg as well. Same problem both with
+>
+--enable-vhost-vdpa and --disable-vhost-vdpa.
+>
+>
+Have not yet tried to figure out what might be special with
+>
+virtio-ccw... anyone have an idea?
+>
+>
+[This should probably be considered a blocker?]
+
+On Fri, 24 Jul 2020 09:30:58 -0400
+"Michael S. Tsirkin" <mst@redhat.com> wrote:
+
+>
+On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
+>
+> When I start qemu with a second virtio-net-ccw device (i.e. adding
+>
+> -device virtio-net-ccw in addition to the autogenerated device), I get
+>
+> a segfault. gdb points to
+>
+>
+>
+> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
+>
+>     config=0x55d6ad9e3f80 "RT") at
+>
+> /home/cohuck/git/qemu/hw/net/virtio-net.c:146
+>
+> 146     if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+>
+>
+>
+> (backtrace doesn't go further)
+The core was incomplete, but running under gdb directly shows that it
+is just a bog-standard config space access (first for that device).
+
+The cause of the crash is that nc->peer is not set... no idea how that
+can happen, not that familiar with that part of QEMU. (Should the code
+check, or is that really something that should not happen?)
+
+What I don't understand is why it is set correctly for the first,
+autogenerated virtio-net-ccw device, but not for the second one, and
+why virtio-net-pci doesn't show these problems. The only difference
+between -ccw and -pci that comes to my mind here is that config space
+accesses for ccw are done via an asynchronous operation, so timing
+might be different.
+
+>
+>
+>
+> Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
+>
+> the autogenerated virtio-net-ccw device is present) works. Specifying
+>
+> several "-device virtio-net-pci" works as well.
+>
+>
+>
+> Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
+>
+> client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
+>
+> works (in-between state does not compile).
+>
+>
+Ouch. I didn't test all in-between states :(
+>
+But I wish we had a 0-day infrastructure like the kernel has,
+>
+that catches things like that.
+Yep, that would be useful... so patchew only builds the complete series?
+
+>
+>
+> This is reproducible with tcg as well. Same problem both with
+>
+> --enable-vhost-vdpa and --disable-vhost-vdpa.
+>
+>
+>
+> Have not yet tried to figure out what might be special with
+>
+> virtio-ccw... anyone have an idea?
+>
+>
+>
+> [This should probably be considered a blocker?]
+I think so, as it makes s390x unusable with more than one
+virtio-net-ccw device, and I don't even see a workaround.
+
+On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
+>
+On Fri, 24 Jul 2020 09:30:58 -0400
+>
+"Michael S. Tsirkin" <mst@redhat.com> wrote:
+>
+>
+> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
+>
+> > When I start qemu with a second virtio-net-ccw device (i.e. adding
+>
+> > -device virtio-net-ccw in addition to the autogenerated device), I get
+>
+> > a segfault. gdb points to
+>
+> >
+>
+> > #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
+>
+> >     config=0x55d6ad9e3f80 "RT") at
+>
+> > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
+>
+> > 146           if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+>
+> >
+>
+> > (backtrace doesn't go further)
+>
+>
+The core was incomplete, but running under gdb directly shows that it
+>
+is just a bog-standard config space access (first for that device).
+>
+>
+The cause of the crash is that nc->peer is not set... no idea how that
+>
+can happen, not that familiar with that part of QEMU. (Should the code
+>
+check, or is that really something that should not happen?)
+>
+>
+What I don't understand is why it is set correctly for the first,
+>
+autogenerated virtio-net-ccw device, but not for the second one, and
+>
+why virtio-net-pci doesn't show these problems. The only difference
+>
+between -ccw and -pci that comes to my mind here is that config space
+>
+accesses for ccw are done via an asynchronous operation, so timing
+>
+might be different.
+Hopefully Jason has an idea. Could you post a full command line
+please? Do you need a working guest to trigger this? Does this trigger
+on an x86 host?
+
+>
+> >
+>
+> > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
+>
+> > the autogenerated virtio-net-ccw device is present) works. Specifying
+>
+> > several "-device virtio-net-pci" works as well.
+>
+> >
+>
+> > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
+>
+> > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
+>
+> > works (in-between state does not compile).
+>
+>
+>
+> Ouch. I didn't test all in-between states :(
+>
+> But I wish we had a 0-day infrastructure like the kernel has,
+>
+> that catches things like that.
+>
+>
+Yep, that would be useful... so patchew only builds the complete series?
+>
+>
+>
+>
+> > This is reproducible with tcg as well. Same problem both with
+>
+> > --enable-vhost-vdpa and --disable-vhost-vdpa.
+>
+> >
+>
+> > Have not yet tried to figure out what might be special with
+>
+> > virtio-ccw... anyone have an idea?
+>
+> >
+>
+> > [This should probably be considered a blocker?]
+>
+>
+I think so, as it makes s390x unusable with more than one
+>
+virtio-net-ccw device, and I don't even see a workaround.
+
+On Fri, 24 Jul 2020 11:17:57 -0400
+"Michael S. Tsirkin" <mst@redhat.com> wrote:
+
+>
+On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
+>
+> On Fri, 24 Jul 2020 09:30:58 -0400
+>
+> "Michael S. Tsirkin" <mst@redhat.com> wrote:
+>
+>
+>
+> > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
+>
+> > > When I start qemu with a second virtio-net-ccw device (i.e. adding
+>
+> > > -device virtio-net-ccw in addition to the autogenerated device), I get
+>
+> > > a segfault. gdb points to
+>
+> > >
+>
+> > > #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
+>
+> > >     config=0x55d6ad9e3f80 "RT") at
+>
+> > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
+>
+> > > 146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+>
+> > >
+>
+> > > (backtrace doesn't go further)
+>
+>
+>
+> The core was incomplete, but running under gdb directly shows that it
+>
+> is just a bog-standard config space access (first for that device).
+>
+>
+>
+> The cause of the crash is that nc->peer is not set... no idea how that
+>
+> can happen, not that familiar with that part of QEMU. (Should the code
+>
+> check, or is that really something that should not happen?)
+>
+>
+>
+> What I don't understand is why it is set correctly for the first,
+>
+> autogenerated virtio-net-ccw device, but not for the second one, and
+>
+> why virtio-net-pci doesn't show these problems. The only difference
+>
+> between -ccw and -pci that comes to my mind here is that config space
+>
+> accesses for ccw are done via an asynchronous operation, so timing
+>
+> might be different.
+>
+>
+Hopefully Jason has an idea. Could you post a full command line
+>
+please? Do you need a working guest to trigger this? Does this trigger
+>
+on an x86 host?
+Yes, it does trigger with tcg-on-x86 as well. I've been using
+
+s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on 
+-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 
+-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 
+-device 
+scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
+ 
+-device virtio-net-ccw
+
+It seems it needs the guest actually doing something with the nics; I
+cannot reproduce the crash if I use the old advent calendar moon buggy
+image and just add a virtio-net-ccw device.
+
+(I don't think it's a problem with my local build, as I see the problem
+both on my laptop and on an LPAR.)
+
+>
+>
+> > >
+>
+> > > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
+>
+> > > the autogenerated virtio-net-ccw device is present) works. Specifying
+>
+> > > several "-device virtio-net-pci" works as well.
+>
+> > >
+>
+> > > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
+>
+> > > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
+>
+> > > works (in-between state does not compile).
+>
+> >
+>
+> > Ouch. I didn't test all in-between states :(
+>
+> > But I wish we had a 0-day infrastructure like the kernel has,
+>
+> > that catches things like that.
+>
+>
+>
+> Yep, that would be useful... so patchew only builds the complete series?
+>
+>
+>
+> >
+>
+> > > This is reproducible with tcg as well. Same problem both with
+>
+> > > --enable-vhost-vdpa and --disable-vhost-vdpa.
+>
+> > >
+>
+> > > Have not yet tried to figure out what might be special with
+>
+> > > virtio-ccw... anyone have an idea?
+>
+> > >
+>
+> > > [This should probably be considered a blocker?]
+>
+>
+>
+> I think so, as it makes s390x unusable with more than one
+>
+> virtio-net-ccw device, and I don't even see a workaround.
+>
+
+On 2020/7/24 11:34 PM, Cornelia Huck wrote:
+On Fri, 24 Jul 2020 11:17:57 -0400
+"Michael S. Tsirkin"<mst@redhat.com>  wrote:
+On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
+On Fri, 24 Jul 2020 09:30:58 -0400
+"Michael S. Tsirkin"<mst@redhat.com>  wrote:
+On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
+When I start qemu with a second virtio-net-ccw device (i.e. adding
+-device virtio-net-ccw in addition to the autogenerated device), I get
+a segfault. gdb points to
+
+#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
+     config=0x55d6ad9e3f80 "RT") at 
+/home/cohuck/git/qemu/hw/net/virtio-net.c:146
+146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+
+(backtrace doesn't go further)
+The core was incomplete, but running under gdb directly shows that it
+is just a bog-standard config space access (first for that device).
+
+The cause of the crash is that nc->peer is not set... no idea how that
+can happen, not that familiar with that part of QEMU. (Should the code
+check, or is that really something that should not happen?)
+
+What I don't understand is why it is set correctly for the first,
+autogenerated virtio-net-ccw device, but not for the second one, and
+why virtio-net-pci doesn't show these problems. The only difference
+between -ccw and -pci that comes to my mind here is that config space
+accesses for ccw are done via an asynchronous operation, so timing
+might be different.
+Hopefully Jason has an idea. Could you post a full command line
+please? Do you need a working guest to trigger this? Does this trigger
+on an x86 host?
+Yes, it does trigger with tcg-on-x86 as well. I've been using
+
+s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
+-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
+-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
+-device 
+scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
+-device virtio-net-ccw
+
+It seems it needs the guest actually doing something with the nics; I
+cannot reproduce the crash if I use the old advent calendar moon buggy
+image and just add a virtio-net-ccw device.
+
+(I don't think it's a problem with my local build, as I see the problem
+both on my laptop and on an LPAR.)
+It looks to me like we forgot to check the existence of peer.
+
+Please try the attached patch to see if it works.
+
+Thanks
+0001-virtio-net-check-the-existence-of-peer-before-accesi.patch
+Description:
+Text Data
+
+On Sat, 25 Jul 2020 08:40:07 +0800
+Jason Wang <jasowang@redhat.com> wrote:
+
+>
+On 2020/7/24 11:34 PM, Cornelia Huck wrote:
+>
+> On Fri, 24 Jul 2020 11:17:57 -0400
+>
+> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
+>
+>
+>
+>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
+>
+>>> On Fri, 24 Jul 2020 09:30:58 -0400
+>
+>>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
+>
+>>>
+>
+>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
+>
+>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
+>
+>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
+>
+>>>>> a segfault. gdb points to
+>
+>>>>>
+>
+>>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
+>
+>>>>>      config=0x55d6ad9e3f80 "RT") at
+>
+>>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146
+>
+>>>>> 146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+>
+>>>>>
+>
+>>>>> (backtrace doesn't go further)
+>
+>>> The core was incomplete, but running under gdb directly shows that it
+>
+>>> is just a bog-standard config space access (first for that device).
+>
+>>>
+>
+>>> The cause of the crash is that nc->peer is not set... no idea how that
+>
+>>> can happen, not that familiar with that part of QEMU. (Should the code
+>
+>>> check, or is that really something that should not happen?)
+>
+>>>
+>
+>>> What I don't understand is why it is set correctly for the first,
+>
+>>> autogenerated virtio-net-ccw device, but not for the second one, and
+>
+>>> why virtio-net-pci doesn't show these problems. The only difference
+>
+>>> between -ccw and -pci that comes to my mind here is that config space
+>
+>>> accesses for ccw are done via an asynchronous operation, so timing
+>
+>>> might be different.
+>
+>> Hopefully Jason has an idea. Could you post a full command line
+>
+>> please? Do you need a working guest to trigger this? Does this trigger
+>
+>> on an x86 host?
+>
+> Yes, it does trigger with tcg-on-x86 as well. I've been using
+>
+>
+>
+> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu
+>
+> qemu,zpci=on
+>
+> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
+>
+> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
+>
+> -device
+>
+> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
+>
+> -device virtio-net-ccw
+>
+>
+>
+> It seems it needs the guest actually doing something with the nics; I
+>
+> cannot reproduce the crash if I use the old advent calendar moon buggy
+>
+> image and just add a virtio-net-ccw device.
+>
+>
+>
+> (I don't think it's a problem with my local build, as I see the problem
+>
+> both on my laptop and on an LPAR.)
+>
+>
+>
+It looks to me like we forgot to check the existence of peer.
+>
+>
+Please try the attached patch to see if it works.
+Thanks, that patch gets my guest up and running again. So, FWIW,
+
+Tested-by: Cornelia Huck <cohuck@redhat.com>
+
+Any idea why this did not hit with virtio-net-pci (or the autogenerated
+virtio-net-ccw device)?
+
+On 2020/7/27 2:43 PM, Cornelia Huck wrote:
+On Sat, 25 Jul 2020 08:40:07 +0800
+Jason Wang <jasowang@redhat.com> wrote:
+On 2020/7/24 11:34 PM, Cornelia Huck wrote:
+On Fri, 24 Jul 2020 11:17:57 -0400
+"Michael S. Tsirkin"<mst@redhat.com>  wrote:
+On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
+On Fri, 24 Jul 2020 09:30:58 -0400
+"Michael S. Tsirkin"<mst@redhat.com>  wrote:
+On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
+When I start qemu with a second virtio-net-ccw device (i.e. adding
+-device virtio-net-ccw in addition to the autogenerated device), I get
+a segfault. gdb points to
+
+#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
+      config=0x55d6ad9e3f80 "RT") at 
+/home/cohuck/git/qemu/hw/net/virtio-net.c:146
+146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+
+(backtrace doesn't go further)
+The core was incomplete, but running under gdb directly shows that it
+is just a bog-standard config space access (first for that device).
+
+The cause of the crash is that nc->peer is not set... no idea how that
+can happen, not that familiar with that part of QEMU. (Should the code
+check, or is that really something that should not happen?)
+
+What I don't understand is why it is set correctly for the first,
+autogenerated virtio-net-ccw device, but not for the second one, and
+why virtio-net-pci doesn't show these problems. The only difference
+between -ccw and -pci that comes to my mind here is that config space
+accesses for ccw are done via an asynchronous operation, so timing
+might be different.
+Hopefully Jason has an idea. Could you post a full command line
+please? Do you need a working guest to trigger this? Does this trigger
+on an x86 host?
+Yes, it does trigger with tcg-on-x86 as well. I've been using
+
+s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
+-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
+-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
+-device 
+scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
+-device virtio-net-ccw
+
+It seems it needs the guest actually doing something with the nics; I
+cannot reproduce the crash if I use the old advent calendar moon buggy
+image and just add a virtio-net-ccw device.
+
+(I don't think it's a problem with my local build, as I see the problem
+both on my laptop and on an LPAR.)
+It looks to me like we forgot to check the existence of peer.
+
+Please try the attached patch to see if it works.
+Thanks, that patch gets my guest up and running again. So, FWIW,
+
+Tested-by: Cornelia Huck <cohuck@redhat.com>
+
+Any idea why this did not hit with virtio-net-pci (or the autogenerated
+virtio-net-ccw device)?
+It can be hit with virtio-net-pci as well (just start without peer).
+For autogenerated virtio-net-ccw, I think the reason is that it has
+already had a peer set.
+Thanks
+
+On Mon, 27 Jul 2020 15:38:12 +0800
+Jason Wang <jasowang@redhat.com> wrote:
+
+>
+On 2020/7/27 2:43 PM, Cornelia Huck wrote:
+>
+> On Sat, 25 Jul 2020 08:40:07 +0800
+>
+> Jason Wang <jasowang@redhat.com> wrote:
+>
+>
+>
+>> On 2020/7/24 11:34 PM, Cornelia Huck wrote:
+>
+>>> On Fri, 24 Jul 2020 11:17:57 -0400
+>
+>>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
+>
+>>>
+>
+>>>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
+>
+>>>>> On Fri, 24 Jul 2020 09:30:58 -0400
+>
+>>>>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
+>
+>>>>>
+>
+>>>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
+>
+>>>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
+>
+>>>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
+>
+>>>>>>> a segfault. gdb points to
+>
+>>>>>>>
+>
+>>>>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
+>
+>>>>>>>       config=0x55d6ad9e3f80 "RT") at
+>
+>>>>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146
+>
+>>>>>>> 146       if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+>
+>>>>>>>
+>
+>>>>>>> (backtrace doesn't go further)
+>
+>>>>> The core was incomplete, but running under gdb directly shows that it
+>
+>>>>> is just a bog-standard config space access (first for that device).
+>
+>>>>>
+>
+>>>>> The cause of the crash is that nc->peer is not set... no idea how that
+>
+>>>>> can happen, not that familiar with that part of QEMU. (Should the code
+>
+>>>>> check, or is that really something that should not happen?)
+>
+>>>>>
+>
+>>>>> What I don't understand is why it is set correctly for the first,
+>
+>>>>> autogenerated virtio-net-ccw device, but not for the second one, and
+>
+>>>>> why virtio-net-pci doesn't show these problems. The only difference
+>
+>>>>> between -ccw and -pci that comes to my mind here is that config space
+>
+>>>>> accesses for ccw are done via an asynchronous operation, so timing
+>
+>>>>> might be different.
+>
+>>>> Hopefully Jason has an idea. Could you post a full command line
+>
+>>>> please? Do you need a working guest to trigger this? Does this trigger
+>
+>>>> on an x86 host?
+>
+>>> Yes, it does trigger with tcg-on-x86 as well. I've been using
+>
+>>>
+>
+>>> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu
+>
+>>> qemu,zpci=on
+>
+>>> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
+>
+>>> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
+>
+>>> -device
+>
+>>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
+>
+>>> -device virtio-net-ccw
+>
+>>>
+>
+>>> It seems it needs the guest actually doing something with the nics; I
+>
+>>> cannot reproduce the crash if I use the old advent calendar moon buggy
+>
+>>> image and just add a virtio-net-ccw device.
+>
+>>>
+>
+>>> (I don't think it's a problem with my local build, as I see the problem
+>
+>>> both on my laptop and on an LPAR.)
+>
+>>
+>
+>> It looks to me like we forgot to check the existence of peer.
+>
+>>
+>
+>> Please try the attached patch to see if it works.
+>
+> Thanks, that patch gets my guest up and running again. So, FWIW,
+>
+>
+>
+> Tested-by: Cornelia Huck <cohuck@redhat.com>
+>
+>
+>
+> Any idea why this did not hit with virtio-net-pci (or the autogenerated
+>
+> virtio-net-ccw device)?
+>
+>
+>
+It can be hit with virtio-net-pci as well (just start without peer).
+Hm, I had not been able to reproduce the crash with a 'naked' -device
+virtio-net-pci. But checking seems to be the right idea anyway.
+
+>
+>
+For autogenerated virtio-net-ccw, I think the reason is that it has
+>
+already had a peer set.
+Ok, that might well be.
+
+On 2020/7/27 4:41 PM, Cornelia Huck wrote:
+On Mon, 27 Jul 2020 15:38:12 +0800
+Jason Wang <jasowang@redhat.com> wrote:
+On 2020/7/27 2:43 PM, Cornelia Huck wrote:
+On Sat, 25 Jul 2020 08:40:07 +0800
+Jason Wang <jasowang@redhat.com> wrote:
+On 2020/7/24 11:34 PM, Cornelia Huck wrote:
+On Fri, 24 Jul 2020 11:17:57 -0400
+"Michael S. Tsirkin"<mst@redhat.com>  wrote:
+On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
+On Fri, 24 Jul 2020 09:30:58 -0400
+"Michael S. Tsirkin"<mst@redhat.com>  wrote:
+On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
+When I start qemu with a second virtio-net-ccw device (i.e. adding
+-device virtio-net-ccw in addition to the autogenerated device), I get
+a segfault. gdb points to
+
+#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
+       config=0x55d6ad9e3f80 "RT") at 
+/home/cohuck/git/qemu/hw/net/virtio-net.c:146
+146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+
+(backtrace doesn't go further)
+The core was incomplete, but running under gdb directly shows that it
+is just a bog-standard config space access (first for that device).
+
+The cause of the crash is that nc->peer is not set... no idea how that
+can happen, not that familiar with that part of QEMU. (Should the code
+check, or is that really something that should not happen?)
+
+What I don't understand is why it is set correctly for the first,
+autogenerated virtio-net-ccw device, but not for the second one, and
+why virtio-net-pci doesn't show these problems. The only difference
+between -ccw and -pci that comes to my mind here is that config space
+accesses for ccw are done via an asynchronous operation, so timing
+might be different.
+Hopefully Jason has an idea. Could you post a full command line
+please? Do you need a working guest to trigger this? Does this trigger
+on an x86 host?
+Yes, it does trigger with tcg-on-x86 as well. I've been using
+
+s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
+-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
+-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
+-device 
+scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
+-device virtio-net-ccw
+
+It seems it needs the guest actually doing something with the nics; I
+cannot reproduce the crash if I use the old advent calendar moon buggy
+image and just add a virtio-net-ccw device.
+
+(I don't think it's a problem with my local build, as I see the problem
+both on my laptop and on an LPAR.)
+It looks to me like we forgot to check the existence of peer.
+
+Please try the attached patch to see if it works.
+Thanks, that patch gets my guest up and running again. So, FWIW,
+
+Tested-by: Cornelia Huck <cohuck@redhat.com>
+
+Any idea why this did not hit with virtio-net-pci (or the autogenerated
+virtio-net-ccw device)?
+It can be hit with virtio-net-pci as well (just start without peer).
+Hm, I had not been able to reproduce the crash with a 'naked' -device
+virtio-net-pci. But checking seems to be the right idea anyway.
+Sorry for being unclear, I meant for the networking part, you just need to
+start without a peer, and you need a real guest (any Linux) that is trying
+to access the config space of virtio-net.
+Thanks
+For autogenerated virtio-net-ccw, I think the reason is that it has
+already had a peer set.
+Ok, that might well be.
+
+On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
+>
+>
+On 2020/7/27 4:41 PM, Cornelia Huck wrote:
+>
+> On Mon, 27 Jul 2020 15:38:12 +0800
+>
+> Jason Wang <jasowang@redhat.com> wrote:
+>
+>
+>
+> > On 2020/7/27 2:43 PM, Cornelia Huck wrote:
+>
+> > > On Sat, 25 Jul 2020 08:40:07 +0800
+>
+> > > Jason Wang <jasowang@redhat.com> wrote:
+>
+> > > > On 2020/7/24 11:34 PM, Cornelia Huck wrote:
+>
+> > > > > On Fri, 24 Jul 2020 11:17:57 -0400
+>
+> > > > > "Michael S. Tsirkin"<mst@redhat.com>  wrote:
+>
+> > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
+>
+> > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400
+>
+> > > > > > > "Michael S. Tsirkin"<mst@redhat.com>  wrote:
+>
+> > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
+>
+> > > > > > > > > When I start qemu with a second virtio-net-ccw device (i.e.
+>
+> > > > > > > > > adding
+>
+> > > > > > > > > -device virtio-net-ccw in addition to the autogenerated
+>
+> > > > > > > > > device), I get
+>
+> > > > > > > > > a segfault. gdb points to
+>
+> > > > > > > > >
+>
+> > > > > > > > > #0  0x000055d6ab52681d in virtio_net_get_config
+>
+> > > > > > > > > (vdev=<optimized out>,
+>
+> > > > > > > > >        config=0x55d6ad9e3f80 "RT") at
+>
+> > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
+>
+> > > > > > > > > 146     if (nc->peer->info->type ==
+>
+> > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) {
+>
+> > > > > > > > >
+>
+> > > > > > > > > (backtrace doesn't go further)
+>
+> > > > > > > The core was incomplete, but running under gdb directly shows
+>
+> > > > > > > that it
+>
+> > > > > > > is just a bog-standard config space access (first for that
+>
+> > > > > > > device).
+>
+> > > > > > >
+>
+> > > > > > > The cause of the crash is that nc->peer is not set... no idea
+>
+> > > > > > > how that
+>
+> > > > > > > can happen, not that familiar with that part of QEMU. (Should
+>
+> > > > > > > the code
+>
+> > > > > > > check, or is that really something that should not happen?)
+>
+> > > > > > >
+>
+> > > > > > > What I don't understand is why it is set correctly for the
+>
+> > > > > > > first,
+>
+> > > > > > > autogenerated virtio-net-ccw device, but not for the second
+>
+> > > > > > > one, and
+>
+> > > > > > > why virtio-net-pci doesn't show these problems. The only
+>
+> > > > > > > difference
+>
+> > > > > > > between -ccw and -pci that comes to my mind here is that config
+>
+> > > > > > > space
+>
+> > > > > > > accesses for ccw are done via an asynchronous operation, so
+>
+> > > > > > > timing
+>
+> > > > > > > might be different.
+>
+> > > > > > Hopefully Jason has an idea. Could you post a full command line
+>
+> > > > > > please? Do you need a working guest to trigger this? Does this
+>
+> > > > > > trigger
+>
+> > > > > > on an x86 host?
+>
+> > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using
+>
+> > > > >
+>
+> > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu
+>
+> > > > > qemu,zpci=on
+>
+> > > > > -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
+>
+> > > > > -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
+>
+> > > > > -device
+>
+> > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
+>
+> > > > > -device virtio-net-ccw
+>
+> > > > >
+>
+> > > > > It seems it needs the guest actually doing something with the nics;
+>
+> > > > > I
+>
+> > > > > cannot reproduce the crash if I use the old advent calendar moon
+>
+> > > > > buggy
+>
+> > > > > image and just add a virtio-net-ccw device.
+>
+> > > > >
+>
+> > > > > (I don't think it's a problem with my local build, as I see the
+>
+> > > > > problem
+>
+> > > > > both on my laptop and on an LPAR.)
+>
+> > > > It looks to me like we forgot to check the existence of peer.
+>
+> > > >
+>
+> > > > Please try the attached patch to see if it works.
+>
+> > > Thanks, that patch gets my guest up and running again. So, FWIW,
+>
+> > >
+>
+> > > Tested-by: Cornelia Huck <cohuck@redhat.com>
+>
+> > >
+>
+> > > Any idea why this did not hit with virtio-net-pci (or the autogenerated
+>
+> > > virtio-net-ccw device)?
+>
+> >
+>
+> > It can be hit with virtio-net-pci as well (just start without peer).
+>
+> Hm, I had not been able to reproduce the crash with a 'naked' -device
+>
+> virtio-net-pci. But checking seems to be the right idea anyway.
+>
+>
+>
+Sorry for being unclear, I meant for the networking part, you just need to start
+>
+without a peer, and you need a real guest (any Linux) that is trying to access
+>
+the config space of virtio-net.
+>
+>
+Thanks
+A pxe guest will do it, but that doesn't support ccw, right?
+
+I'm still unclear why this triggers with ccw but not pci -
+any idea?
+
+>
+>
+>
+>
+> > For autogenerated virtio-net-ccw, I think the reason is that it has
+>
+> > already had a peer set.
+>
+> Ok, that might well be.
+>
+>
+>
+>
+
+On 2020/7/27 7:43 PM, Michael S. Tsirkin wrote:
+On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
+On 2020/7/27 4:41 PM, Cornelia Huck wrote:
+On Mon, 27 Jul 2020 15:38:12 +0800
+Jason Wang<jasowang@redhat.com>  wrote:
+On 2020/7/27 2:43 PM, Cornelia Huck wrote:
+On Sat, 25 Jul 2020 08:40:07 +0800
+Jason Wang<jasowang@redhat.com>  wrote:
+On 2020/7/24 11:34 PM, Cornelia Huck wrote:
+On Fri, 24 Jul 2020 11:17:57 -0400
+"Michael S. Tsirkin"<mst@redhat.com>   wrote:
+On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
+On Fri, 24 Jul 2020 09:30:58 -0400
+"Michael S. Tsirkin"<mst@redhat.com>   wrote:
+On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
+When I start qemu with a second virtio-net-ccw device (i.e. adding
+-device virtio-net-ccw in addition to the autogenerated device), I get
+a segfault. gdb points to
+
+#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
+        config=0x55d6ad9e3f80 "RT") at 
+/home/cohuck/git/qemu/hw/net/virtio-net.c:146
+146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+
+(backtrace doesn't go further)
+The core was incomplete, but running under gdb directly shows that it
+is just a bog-standard config space access (first for that device).
+
+The cause of the crash is that nc->peer is not set... no idea how that
+can happen, not that familiar with that part of QEMU. (Should the code
+check, or is that really something that should not happen?)
+
+What I don't understand is why it is set correctly for the first,
+autogenerated virtio-net-ccw device, but not for the second one, and
+why virtio-net-pci doesn't show these problems. The only difference
+between -ccw and -pci that comes to my mind here is that config space
+accesses for ccw are done via an asynchronous operation, so timing
+might be different.
+Hopefully Jason has an idea. Could you post a full command line
+please? Do you need a working guest to trigger this? Does this trigger
+on an x86 host?
+Yes, it does trigger with tcg-on-x86 as well. I've been using
+
+s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
+-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
+-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
+-device 
+scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
+-device virtio-net-ccw
+
+It seems it needs the guest actually doing something with the nics; I
+cannot reproduce the crash if I use the old advent calendar moon buggy
+image and just add a virtio-net-ccw device.
+
+(I don't think it's a problem with my local build, as I see the problem
+both on my laptop and on an LPAR.)
+It looks to me we forget to check the existence of peer.
+
+Please try the attached patch to see if it works.
+Thanks, that patch gets my guest up and running again. So, FWIW,
+
+Tested-by: Cornelia Huck<cohuck@redhat.com>
+
+Any idea why this did not hit with virtio-net-pci (or the autogenerated
+virtio-net-ccw device)?
+It can be hit with virtio-net-pci as well (just start without peer).
+Hm, I had not been able to reproduce the crash with a 'naked' -device
+virtio-net-pci. But checking seems to be the right idea anyway.
+Sorry for being unclear, I meant for networking part, you just need to start
+without peer, and you need a real guest (any Linux) that is trying to access
+the config space of virtio-net.
+
+Thanks
+A pxe guest will do it, but that doesn't support ccw, right?
+Yes, it depends on the cli actually.
+I'm still unclear why this triggers with ccw but not pci -
+any idea?
+I don't test pxe but I can reproduce this with pci (just start a linux
+guest without a peer).
+Thanks
+
+On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote:
+>
+>
+On 2020/7/27 7:43 PM, Michael S. Tsirkin wrote:
+>
+> On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
+>
+> > On 2020/7/27 4:41 PM, Cornelia Huck wrote:
+>
+> > > On Mon, 27 Jul 2020 15:38:12 +0800
+>
+> > > Jason Wang<jasowang@redhat.com>  wrote:
+>
+> > >
+>
+> > > > On 2020/7/27 2:43 PM, Cornelia Huck wrote:
+>
+> > > > > On Sat, 25 Jul 2020 08:40:07 +0800
+>
+> > > > > Jason Wang<jasowang@redhat.com>  wrote:
+>
+> > > > > > On 2020/7/24 11:34 PM, Cornelia Huck wrote:
+>
+> > > > > > > On Fri, 24 Jul 2020 11:17:57 -0400
+>
+> > > > > > > "Michael S. Tsirkin"<mst@redhat.com>   wrote:
+>
+> > > > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
+>
+> > > > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400
+>
+> > > > > > > > > "Michael S. Tsirkin"<mst@redhat.com>   wrote:
+>
+> > > > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck
+>
+> > > > > > > > > > wrote:
+>
+> > > > > > > > > > > When I start qemu with a second virtio-net-ccw device
+>
+> > > > > > > > > > > (i.e. adding
+>
+> > > > > > > > > > > -device virtio-net-ccw in addition to the autogenerated
+>
+> > > > > > > > > > > device), I get
+>
+> > > > > > > > > > > a segfault. gdb points to
+>
+> > > > > > > > > > >
+>
+> > > > > > > > > > > #0  0x000055d6ab52681d in virtio_net_get_config
+>
+> > > > > > > > > > > (vdev=<optimized out>,
+>
+> > > > > > > > > > >         config=0x55d6ad9e3f80 "RT") at
+>
+> > > > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
+>
+> > > > > > > > > > > 146         if (nc->peer->info->type ==
+>
+> > > > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) {
+>
+> > > > > > > > > > >
+>
+> > > > > > > > > > > (backtrace doesn't go further)
+>
+> > > > > > > > > The core was incomplete, but running under gdb directly
+>
+> > > > > > > > > shows that it
+>
+> > > > > > > > > is just a bog-standard config space access (first for that
+>
+> > > > > > > > > device).
+>
+> > > > > > > > >
+>
+> > > > > > > > > The cause of the crash is that nc->peer is not set... no
+>
+> > > > > > > > > idea how that
+>
+> > > > > > > > > can happen, not that familiar with that part of QEMU.
+>
+> > > > > > > > > (Should the code
+>
+> > > > > > > > > check, or is that really something that should not happen?)
+>
+> > > > > > > > >
+>
+> > > > > > > > > What I don't understand is why it is set correctly for the
+>
+> > > > > > > > > first,
+>
+> > > > > > > > > autogenerated virtio-net-ccw device, but not for the second
+>
+> > > > > > > > > one, and
+>
+> > > > > > > > > why virtio-net-pci doesn't show these problems. The only
+>
+> > > > > > > > > difference
+>
+> > > > > > > > > between -ccw and -pci that comes to my mind here is that
+>
+> > > > > > > > > config space
+>
+> > > > > > > > > accesses for ccw are done via an asynchronous operation, so
+>
+> > > > > > > > > timing
+>
+> > > > > > > > > might be different.
+>
+> > > > > > > > Hopefully Jason has an idea. Could you post a full command
+>
+> > > > > > > > line
+>
+> > > > > > > > please? Do you need a working guest to trigger this? Does
+>
+> > > > > > > > this trigger
+>
+> > > > > > > > on an x86 host?
+>
+> > > > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using
+>
+> > > > > > >
+>
+> > > > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg
+>
+> > > > > > > -cpu qemu,zpci=on
+>
+> > > > > > > -m 1024 -nographic -device
+>
+> > > > > > > virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
+>
+> > > > > > > -drive
+>
+> > > > > > > file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
+>
+> > > > > > > -device
+>
+> > > > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
+>
+> > > > > > > -device virtio-net-ccw
+>
+> > > > > > >
+>
+> > > > > > > It seems it needs the guest actually doing something with the
+>
+> > > > > > > nics; I
+>
+> > > > > > > cannot reproduce the crash if I use the old advent calendar
+>
+> > > > > > > moon buggy
+>
+> > > > > > > image and just add a virtio-net-ccw device.
+>
+> > > > > > >
+>
+> > > > > > > (I don't think it's a problem with my local build, as I see the
+>
+> > > > > > > problem
+>
+> > > > > > > both on my laptop and on an LPAR.)
+>
+> > > > > > It looks to me we forget to check the existence of peer.
+>
+> > > > > >
+>
+> > > > > > Please try the attached patch to see if it works.
+>
+> > > > > Thanks, that patch gets my guest up and running again. So, FWIW,
+>
+> > > > >
+>
+> > > > > Tested-by: Cornelia Huck<cohuck@redhat.com>
+>
+> > > > >
+>
+> > > > > Any idea why this did not hit with virtio-net-pci (or the
+>
+> > > > > autogenerated
+>
+> > > > > virtio-net-ccw device)?
+>
+> > > > It can be hit with virtio-net-pci as well (just start without peer).
+>
+> > > Hm, I had not been able to reproduce the crash with a 'naked' -device
+>
+> > > virtio-net-pci. But checking seems to be the right idea anyway.
+>
+> > Sorry for being unclear, I meant for networking part, you just need to start
+>
+> > without peer, and you need a real guest (any Linux) that is trying to
+>
+> > access
+>
+> > the config space of virtio-net.
+>
+> >
+>
+> > Thanks
+>
+> A pxe guest will do it, but that doesn't support ccw, right?
+>
+>
+>
+Yes, it depends on the cli actually.
+>
+>
+>
+>
+>
+> I'm still unclear why this triggers with ccw but not pci -
+>
+> any idea?
+>
+>
+>
+I don't test pxe but I can reproduce this with pci (just start a linux guest
+>
+without a peer).
+>
+>
+Thanks
+>
+Might be a good addition to a unit test. Not sure what the test
+would do exactly: just make sure guest runs? Looks like a lot of work
+for an empty test ... maybe we can poke at the guest config with
+qtest commands at least.
+
+-- 
+MST
+
+On 2020/7/27 9:16 PM, Michael S. Tsirkin wrote:
+On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote:
+On 2020/7/27 7:43 PM, Michael S. Tsirkin wrote:
+On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
+On 2020/7/27 4:41 PM, Cornelia Huck wrote:
+On Mon, 27 Jul 2020 15:38:12 +0800
+Jason Wang<jasowang@redhat.com>  wrote:
+On 2020/7/27 2:43 PM, Cornelia Huck wrote:
+On Sat, 25 Jul 2020 08:40:07 +0800
+Jason Wang<jasowang@redhat.com>  wrote:
+On 2020/7/24 11:34 PM, Cornelia Huck wrote:
+On Fri, 24 Jul 2020 11:17:57 -0400
+"Michael S. Tsirkin"<mst@redhat.com>   wrote:
+On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
+On Fri, 24 Jul 2020 09:30:58 -0400
+"Michael S. Tsirkin"<mst@redhat.com>   wrote:
+On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
+When I start qemu with a second virtio-net-ccw device (i.e. adding
+-device virtio-net-ccw in addition to the autogenerated device), I get
+a segfault. gdb points to
+
+#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
+         config=0x55d6ad9e3f80 "RT") at 
+/home/cohuck/git/qemu/hw/net/virtio-net.c:146
+146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+
+(backtrace doesn't go further)
+The core was incomplete, but running under gdb directly shows that it
+is just a bog-standard config space access (first for that device).
+
+The cause of the crash is that nc->peer is not set... no idea how that
+can happen, not that familiar with that part of QEMU. (Should the code
+check, or is that really something that should not happen?)
+
+What I don't understand is why it is set correctly for the first,
+autogenerated virtio-net-ccw device, but not for the second one, and
+why virtio-net-pci doesn't show these problems. The only difference
+between -ccw and -pci that comes to my mind here is that config space
+accesses for ccw are done via an asynchronous operation, so timing
+might be different.
+Hopefully Jason has an idea. Could you post a full command line
+please? Do you need a working guest to trigger this? Does this trigger
+on an x86 host?
+Yes, it does trigger with tcg-on-x86 as well. I've been using
+
+s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
+-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
+-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
+-device 
+scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
+-device virtio-net-ccw
+
+It seems it needs the guest actually doing something with the nics; I
+cannot reproduce the crash if I use the old advent calendar moon buggy
+image and just add a virtio-net-ccw device.
+
+(I don't think it's a problem with my local build, as I see the problem
+both on my laptop and on an LPAR.)
+It looks to me we forget to check the existence of peer.
+
+Please try the attached patch to see if it works.
+Thanks, that patch gets my guest up and running again. So, FWIW,
+
+Tested-by: Cornelia Huck<cohuck@redhat.com>
+
+Any idea why this did not hit with virtio-net-pci (or the autogenerated
+virtio-net-ccw device)?
+It can be hit with virtio-net-pci as well (just start without peer).
+Hm, I had not been able to reproduce the crash with a 'naked' -device
+virtio-net-pci. But checking seems to be the right idea anyway.
+Sorry for being unclear, I meant for networking part, you just need to start
+without peer, and you need a real guest (any Linux) that is trying to access
+the config space of virtio-net.
+
+Thanks
+A pxe guest will do it, but that doesn't support ccw, right?
+Yes, it depends on the cli actually.
+I'm still unclear why this triggers with ccw but not pci -
+any idea?
+I don't test pxe but I can reproduce this with pci (just start a linux guest
+without a peer).
+
+Thanks
+Might be a good addition to a unit test. Not sure what the test
+would do exactly: just make sure guest runs? Looks like a lot of work
+for an empty test ... maybe we can poke at the guest config with
+qtest commands at least.
+That should work or we can simply extend the existing virtio-net qtest to
+do that.
+Thanks
+
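The patch itself is not included in this thread, so the following is only a minimal, self-contained C sketch of the kind of guard being discussed ("check the existence of peer") for the dereference `nc->peer->info->type` shown in the backtrace. The struct and enum definitions are simplified stand-ins, not QEMU's real ones, and `peer_is_vhost_vdpa` is a hypothetical helper name introduced purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for QEMU's net types (assumption: NOT the real definitions). */
typedef enum {
    NET_CLIENT_DRIVER_NONE,
    NET_CLIENT_DRIVER_TAP,
    NET_CLIENT_DRIVER_VHOST_VDPA,
} NetClientDriver;

typedef struct NetClientInfo {
    NetClientDriver type;
} NetClientInfo;

typedef struct NetClientState {
    const NetClientInfo *info;
    struct NetClientState *peer;   /* NULL when the NIC was started without a backend */
} NetClientState;

/*
 * Hypothetical helper: report whether the NIC's peer is a vhost-vdpa backend.
 * The peer pointer is tested before it is dereferenced, so a NIC without a
 * peer yields false instead of a NULL pointer dereference.
 */
static bool peer_is_vhost_vdpa(const NetClientState *nc)
{
    assert(nc != NULL);
    return nc->peer != NULL &&
           nc->peer->info != NULL &&
           nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA;
}
```

With a guard of this shape, a config space read on a device added with a bare `-device virtio-net-ccw` (no peer) would skip the vhost-vdpa branch rather than crash. Whether the real fix takes exactly this form is an assumption.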