operating system: 0.984 vnc: 0.983 permissions: 0.983 assembly: 0.979 register: 0.979 debug: 0.979 user-level: 0.979 risc-v: 0.978 semantic: 0.978 architecture: 0.978 graphic: 0.978 peripherals: 0.977 performance: 0.976 PID: 0.976 network: 0.975 socket: 0.975 arm: 0.975 virtual: 0.975 ppc: 0.974 device: 0.974 alpha: 0.973 mistranslation: 0.973 kernel: 0.973 KVM: 0.971 boot: 0.969 hypervisor: 0.962 files: 0.961 i386: 0.960 VMM: 0.960 TCG: 0.958 x86: 0.955 [Qemu-devel] [vhost-user BUG ?] QEMU process segfault when shutdown or reboot with vhost-user Hi, We catch a segfault in our project. Qemu version is 2.3.0 The Stack backtrace is: (gdb) bt #0 0x0000000000000000 in ?? () #1 0x00007f7ad9280b2f in qemu_deliver_packet (sender=, flags=, data=, size=100, opaque= 0x7f7ad9d6db10) at net/net.c:510 #2 0x00007f7ad92831fa in qemu_net_queue_deliver (size=, data=, flags=, sender=, queue=) at net/queue.c:157 #3 qemu_net_queue_flush (queue=0x7f7ad9d39630) at net/queue.c:254 #4 0x00007f7ad9280dac in qemu_flush_or_purge_queued_packets (nc=0x7f7ad9d6db10, purge=true) at net/net.c:539 #5 0x00007f7ad9280e76 in net_vm_change_state_handler (opaque=, running=, state=100) at net/net.c:1214 #6 0x00007f7ad915612f in vm_state_notify (running=0, state=RUN_STATE_SHUTDOWN) at vl.c:1820 #7 0x00007f7ad906db1a in do_vm_stop (state=) at /usr/src/packages/BUILD/qemu-kvm-2.3.0/cpus.c:631 #8 vm_stop (state=RUN_STATE_SHUTDOWN) at /usr/src/packages/BUILD/qemu-kvm-2.3.0/cpus.c:1325 #9 0x00007f7ad915e4a2 in main_loop_should_exit () at vl.c:2080 #10 main_loop () at vl.c:2131 #11 main (argc=, argv=, envp=) at vl.c:4721 (gdb) p *(NetClientState *)0x7f7ad9d6db10 $1 = {info = 0x7f7ad9824520, link_down = 0, next = {tqe_next = 0x7f7ad0f06d10, tqe_prev = 0x7f7ad98b1cf0}, peer = 0x7f7ad0f06d10, incoming_queue = 0x7f7ad9d39630, model = 0x7f7ad9d39590 "vhost_user", name = 0x7f7ad9d39570 "hostnet0", info_str = "vhost-user to charnet0", '\000' , receive_disabled = 0, destructor = 0x7f7ad92821f0 , queue_index = 0, rxfilter_notify_enabled = 0} (gdb) p *(NetClientInfo *)0x7f7ad9824520 $2 = {type = NET_CLIENT_OPTIONS_KIND_VHOST_USER, size = 360, receive = 0, receive_raw = 0, receive_iov = 0, can_receive = 0, cleanup = 0x7f7ad9288850 , link_status_changed = 0, query_rx_filter = 0, poll = 0, has_ufo = 0x7f7ad92886d0 , has_vnet_hdr = 0x7f7ad9288670 , has_vnet_hdr_len = 0, using_vnet_hdr = 0, set_offload = 0, set_vnet_hdr_len = 0} (gdb) The corresponding codes where gdb reports error are: (We have added some codes in net.c) ssize_t qemu_deliver_packet(NetClientState *sender, unsigned flags, const uint8_t *data, size_t size, void *opaque) { NetClientState *nc = opaque; ssize_t ret; if (nc->link_down) { return size; } if (nc->receive_disabled) { return 0; } if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { ret = nc->info->receive_raw(nc, data, size); } else { ret = nc->info->receive(nc, data, size); ----> Here is 510 line } I'm not quite familiar with vhost-user, but for vhost-user, these two callback functions seem to be always NULL, Why we can come here ? Is it an error to add VM state change handler for vhost-user ? Thanks, zhanghailiang Hi On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang wrote: > The corresponding codes where gdb reports error are: (We have added some > codes in net.c) Can you reproduce with unmodified qemu? Could you give instructions to do so? > ssize_t qemu_deliver_packet(NetClientState *sender, > unsigned flags, > const uint8_t *data, > size_t size, > void *opaque) > { > NetClientState *nc = opaque; > ssize_t ret; > > if (nc->link_down) { > return size; > } > > if (nc->receive_disabled) { > return 0; > } > > if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { > ret = nc->info->receive_raw(nc, data, size); > } else { > ret = nc->info->receive(nc, data, size); ----> Here is 510 line > } > > I'm not quite familiar with vhost-user, but for vhost-user, these two > callback functions seem to be always NULL, > Why we can come here ? You should not come here, vhost-user has nc->receive_disabled (it changes in 2.5) -- Marc-André Lureau On 2015/11/3 22:54, Marc-André Lureau wrote: Hi On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang wrote: The corresponding codes where gdb reports error are: (We have added some codes in net.c) Can you reproduce with unmodified qemu? Could you give instructions to do so? OK, i will try to do it. There is nothing special, we run iperf tool in VM, and then shutdown or reboot it. There is change you can catch segfault. ssize_t qemu_deliver_packet(NetClientState *sender, unsigned flags, const uint8_t *data, size_t size, void *opaque) { NetClientState *nc = opaque; ssize_t ret; if (nc->link_down) { return size; } if (nc->receive_disabled) { return 0; } if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { ret = nc->info->receive_raw(nc, data, size); } else { ret = nc->info->receive(nc, data, size); ----> Here is 510 line } I'm not quite familiar with vhost-user, but for vhost-user, these two callback functions seem to be always NULL, Why we can come here ? You should not come here, vhost-user has nc->receive_disabled (it changes in 2.5) I have looked at the newest codes, i think we can still have chance to come here, since we will change nc->receive_disable to false temporarily in qemu_flush_or_purge_queued_packets(), there is no difference between 2.3 and 2.5 for this. Besides, is it possible for !QTAILQ_EMPTY(&queue->packets) to be true in qemu_net_queue_flush() for vhost-user ? i will try to reproduce it by using newest qemu. Thanks, zhanghailiang On 11/04/2015 10:24 AM, zhanghailiang wrote: > On 2015/11/3 22:54, Marc-André Lureau wrote: > > Hi > > > > On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang > > wrote: > >> The corresponding codes where gdb reports error are: (We have added > >> some > >> codes in net.c) > > > > Can you reproduce with unmodified qemu? Could you give instructions > > to do so? > > > > OK, i will try to do it. There is nothing special, we run iperf tool > in VM, > and then shutdown or reboot it. There is change you can catch segfault. > > >> ssize_t qemu_deliver_packet(NetClientState *sender, > >> unsigned flags, > >> const uint8_t *data, > >> size_t size, > >> void *opaque) > >> { > >> NetClientState *nc = opaque; > >> ssize_t ret; > >> > >> if (nc->link_down) { > >> return size; > >> } > >> > >> if (nc->receive_disabled) { > >> return 0; > >> } > >> > >> if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { > >> ret = nc->info->receive_raw(nc, data, size); > >> } else { > >> ret = nc->info->receive(nc, data, size); ----> Here is > >> 510 line > >> } > >> > >> I'm not quite familiar with vhost-user, but for vhost-user, these two > >> callback functions seem to be always NULL, > >> Why we can come here ? > > > > You should not come here, vhost-user has nc->receive_disabled (it > > changes in 2.5) > > > > I have looked at the newest codes, i think we can still have chance to > come here, since we will change nc->receive_disable to false > temporarily in > qemu_flush_or_purge_queued_packets(), there is no difference between > 2.3 and 2.5 > for this. > Besides, is it possible for !QTAILQ_EMPTY(&queue->packets) to be true > in qemu_net_queue_flush() for vhost-user ? The only thing I can image is self announcing. Are you trying to do migration? 2.5 only support sending rarp through this. And it's better to have a breakpoint to see why a packet was queued for vhost-user. The stack trace may also help in this case. > > i will try to reproduce it by using newest qemu. > > Thanks, > zhanghailiang > On 2015/11/4 11:19, Jason Wang wrote: On 11/04/2015 10:24 AM, zhanghailiang wrote: On 2015/11/3 22:54, Marc-André Lureau wrote: Hi On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang wrote: The corresponding codes where gdb reports error are: (We have added some codes in net.c) Can you reproduce with unmodified qemu? Could you give instructions to do so? OK, i will try to do it. There is nothing special, we run iperf tool in VM, and then shutdown or reboot it. There is change you can catch segfault. ssize_t qemu_deliver_packet(NetClientState *sender, unsigned flags, const uint8_t *data, size_t size, void *opaque) { NetClientState *nc = opaque; ssize_t ret; if (nc->link_down) { return size; } if (nc->receive_disabled) { return 0; } if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { ret = nc->info->receive_raw(nc, data, size); } else { ret = nc->info->receive(nc, data, size); ----> Here is 510 line } I'm not quite familiar with vhost-user, but for vhost-user, these two callback functions seem to be always NULL, Why we can come here ? You should not come here, vhost-user has nc->receive_disabled (it changes in 2.5) I have looked at the newest codes, i think we can still have chance to come here, since we will change nc->receive_disable to false temporarily in qemu_flush_or_purge_queued_packets(), there is no difference between 2.3 and 2.5 for this. Besides, is it possible for !QTAILQ_EMPTY(&queue->packets) to be true in qemu_net_queue_flush() for vhost-user ? The only thing I can image is self announcing. Are you trying to do migration? 2.5 only support sending rarp through this. Hmm, it's not triggered by migration, For qemu-2.5, IMHO, it doesn't have such problem, since the callback function 'receive' is not NULL. It is vhost_user_receive(). And it's better to have a breakpoint to see why a packet was queued for vhost-user. The stack trace may also help in this case. OK, i'm trying to reproduce it. Thanks, zhanghailiang i will try to reproduce it by using newest qemu. Thanks, zhanghailiang .