summary refs log tree commit diff stats
path: root/net/net.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* net: implement UDP tunnel features offloadingPaolo Abeni2025-10-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When any host or guest GSO over UDP tunnel offload is enabled the virtio net header includes the additional tunnel-related fields, update the size accordingly. Push the GSO over UDP tunnel offloads all the way down to the tap device extending the newly introduced NetFeatures struct, and eventually enable the associated features. As per virtio specification, to convert features bit to offload bit, map the extended features into the reserved range. Finally, make the vhost backend aware of the exact header layout, to copy it correctly. The tunnel-related field are present if either the guest or the host negotiated any UDP tunnel related feature: add them to the kernel supported features list, to allow qemu transfer to the backend the needed information. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <093b4bc68368046bffbcab2202227632d6e4e83b.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* net: implement tunnel probingPaolo Abeni2025-10-041-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | Tap devices support GSO over UDP tunnel offload. Probe for such feature in a similar manner to other offloads. GSO over UDP tunnel needs to be enabled in addition to a "plain" offload (TSO or USO). No need to check separately for the outer header checksum offload: the kernel is going to support both of them or none. The new features are disabled by default to avoid compat issues, and could be enabled, after that hw_compat_10_1 will be added, together with the related compat entries. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <a987a8a7613cbf33bb2209c7c7f5889b512638a7.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* net: bundle all offloads in a single structPaolo Abeni2025-10-041-3/+2
| | | | | | | | | | | | | | | | The set_offload() argument list is already pretty long and we are going to introduce soon a bunch of additional offloads. Replace the offload arguments with a single struct and update all the relevant call-sites. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <a9d4dd043b8c71b791e9ff05e17ef06072d9714e.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Stefan Hajnoczi2025-07-161-0/+9
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging virtio,pci,pc: features, fixes, tests SPCR acpi table can now be disabled vhost-vdpa can now report hashing capability to guest PPTT acpi table now tells guest vCPUs are identical vost-user-blk now shuts down faster loongarch64 now supports bios-tables-test intel_iommu now supports ATS cxl now supports DCD Fabric Management Command Set arm now supports acpi pci hotplug fixes, cleanups Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmh1+7APHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpcZ8H/2udpCZ49vjPB8IwQAGdFTw2TWVdxUQFHexQ # pOsCGyFBNAXqD1bmb8lwWyYVJ08WELyL6xWsQ5tfVPiXpKYYHPHl4rNr/SPoyNcv # joY++tagudmOki2DU7nfJ+rPIIuigOTUHbv4TZciwcHle6f65s0iKXhR1sL0cj4i # TS6iJlApSuJInrBBUxuxSUomXk79mFTNKRiXj1k58LRw6JOUEgYvtIW8i+mOUcTg # h1dZphxEQr/oG+a2pM8GOVJ1AFaBPSfgEnRM4kTX9QuTIDCeMAKUBo/mwOk6PV7z # ZhSrDPLrea27XKGL++EJm0fFJ/AsHF1dTks2+c0rDrSK+UV87Zc= # =sktm # -----END PGP SIGNATURE----- # gpg: Signature made Tue 15 Jul 2025 02:56:48 EDT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (97 commits) hw/cxl: mailbox-utils: 0x5605 - FMAPI Initiate DC Release hw/cxl: mailbox-utils: 0x5604 - FMAPI Initiate DC Add hw/cxl: Create helper function to create DC Event Records from extents hw/cxl: mailbox-utils: 0x5603 - FMAPI Get DC Region Extent Lists hw/cxl: mailbox-utils: 0x5602 - FMAPI Set DC Region Config hw/mem: cxl_type3: Add DC Region bitmap lock hw/cxl: Move definition for dynamic_capacity_uuid and enum for DC event types to header hw/cxl: mailbox-utils: 0x5601 - FMAPI Get Host Region Config hw/mem: cxl_type3: Add dsmas_flags to CXLDCRegion struct hw/cxl: mailbox-utils: 0x5600 - FMAPI Get DCD Info hw/cxl: fix DC extent capacity tracking tests: virt: Update expected ACPI tables for virt test hw/acpi/aml-build: Build a root node in the PPTT table hw/acpi/aml-build: Set identical implementation flag for PPTT processor nodes tests: virt: Allow changes to PPTT test table qtest/bios-tables-test: Generate reference blob for DSDT.acpipcihp qtest/bios-tables-test: Generate reference blob for DSDT.hpoffacpiindex tests/qtest/bios-tables-test: Add aarch64 ACPI PCI hotplug test tests/qtest/bios-tables-test: Prepare for addition of acpi pci hp tests hw/arm/virt: Let virt support pci hotplug/unplug GED event ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Conflicts: net/vhost-vdpa.c vhost_vdpa_set_steering_ebpf() was removed, resolve the context conflict.
| * net/vhost-vdpa: Report hashing capabilityAkihiko Odaki2025-07-141-0/+9
| | | | | | | | | | | | | | | | | | | | Report hashing capability so that virtio-net can deliver the correct capability information to the guest. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20250530-vdpa-v1-2-5af4109b1c19@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | net: Add passt network backendLaurent Vivier2025-07-141-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit introduces support for passt as a new network backend. passt is an unprivileged, user-mode networking solution that provides connectivity for virtual machines by launching an external helper process. The implementation reuses the generic stream data handling logic. It launches the passt binary using GSubprocess, passing it a file descriptor from a socketpair() for communication. QEMU connects to the other end of the socket pair to establish the network data stream. The PID of the passt daemon is tracked via a temporary file to ensure it is terminated when QEMU exits. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* | net: Define net_client_set_link()Laurent Vivier2025-07-141-12/+20
|/ | | | | | | | | | | | | | | | | | | | | The code to set the link status is currently located in qmp_set_link(). This function identifies the device by name, searches for the corresponding NetClientState, and then updates the link status. In some parts of the code, such as vhost-user.c, the NetClientState are already available. Calling qmp_set_link() from these locations leads to a redundant search for the clients. This patch refactors the logic by introducing a new function, net_client_set_link(), which accepts a NetClientState array directly. qmp_set_link() is simplified to be a wrapper that performs the client search and then calls the new function. The vhost-user implementation is updated to use net_client_set_link() directly, thereby eliminating the unnecessary client lookup. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: move backend cleanup to NIC cleanupEugenio Pérez2025-03-101-6/+27
| | | | | | | | | | | | | | | | | | | | | | Commit a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present") effectively delayed the backend cleanup, allowing the frontend or the guest to access it resources as long as the frontend is still visible to the guest. However it does not clean up the resources until the qemu process is over. This causes an effective leak if the device is deleted with device_del, as there is no way to close the vdpa device. This makes impossible to re-add that device to this or other QEMU instances until the first instance of QEMU is finished. Move the cleanup from qemu_cleanup to the NIC deletion and to net_cleanup. Fixes: a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present") Reported-by: Lei Yang <leiyang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: parameterize the removing client from nc listEugenio Pérez2025-03-101-5/+8
| | | | | | | | | | | | This change is used in later commits so we can avoid the removal of the netclient if it is delayed. No functional change intended. Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* qapi: Move include/qapi/qmp/ to include/qobject/Daniel P. Berrangé2025-02-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | The general expectation is that header files should follow the same file/path naming scheme as the corresponding source file. There are various historical exceptions to this practice in QEMU, with one of the most notable being the include/qapi/qmp/ directory. Most of the headers there correspond to source files in qobject/. This patch corrects most of that inconsistency by creating include/qobject/ and moving the headers for qobject/ there. This also fixes MAINTAINERS for include/qapi/qmp/dispatch.h: scripts/get_maintainer.pl now reports "QAPI" instead of "No maintainers found". Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> #s390x Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20241118151235.2665921-2-armbru@redhat.com> [Rebased]
* net: Fix announce_selfLaurent Vivier2025-01-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | b9ad513e1876 ("net: Remove receive_raw()") adds an iovec entry in qemu_deliver_packet_iov() to add the virtio-net header in the data when QEMU_NET_PACKET_FLAG_RAW is set but forgets to increase the number of iovec entries in the array, so receive_iov() will only send the first entry (the virtio-net entry, full of 0) and no data. The packet will be discarded. The only user of QEMU_NET_PACKET_FLAG_RAW is announce_self. We can see the problem with tcpdump: - QEMU parameters: .. -monitor stdio \ -netdev bridge,id=netdev0,br=virbr0 \ -device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0 \ - HMP command: (qemu) announce_self - TCP dump: $ sudo tcpdump -nxi virbr0 without the fix: <nothing> with the fix: ARP, Reverse Request who-is 9a:2b:2c:2d:2e:2f tell 9a:2b:2c:2d:2e:2f, length 46 0x0000: 0001 0800 0604 0003 9a2b 2c2d 2e2f 0000 0x0010: 0000 9a2b 2c2d 2e2f 0000 0000 0000 0000 0x0020: 0000 0000 0000 0000 0000 0000 0000 Reported-by: Xiaohui Li <xiaohli@redhat.com> Bug: https://issues.redhat.com/browse/RHEL-73891 Fixes: b9ad513e1876 ("net: Remove receive_raw()") Cc: akihiko.odaki@daynix.com Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
* include: Rename sysemu/ -> system/Philippe Mathieu-Daudé2024-12-201-1/+1
| | | | | | | | | | | | | Headers in include/sysemu/ are not only related to system *emulation*, they are also used by virtualization. Rename as system/ which is clearer. Files renamed manually then mechanical change using sed tool. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Message-Id: <20241203172445.28576-1-philmd@linaro.org>
* net: Check if nc is NULL in qemu_get_vnet_hdr_len()Akihiko Odaki2024-10-281-0/+4
| | | | | | | | | | | | | | A netdev may not have a peer specified, resulting in NULL. We should make it behave like /dev/null in such a case instead of letting it cause segmentatin fault. Fixes: 4b52d63249a5 ("tap: Remove qemu_using_vnet_hdr()") Cc: qemu-stable@nongnu.org Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Tested-by; Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: Remove deadcodeDr. David Alan Gilbert2024-10-031-10/+0
| | | | | | | | | | | | | | | | | net_hub_port_find is unused since 2018's commit af1a5c3eb4 ("net: Remove the deprecated "vlan" parameter") qemu_receive_packet_iov is unused since commit ffbd2dbd8e ("e1000e: Perform software segmentation for loopback") in turn it was the last user of qemu_net_queue_receive_iov. Remove them. Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
* net: Fix '-net nic,model=' for non-help argumentsDavid Woodhouse2024-08-121-1/+1
| | | | | | | | | | | Oops, don't *delete* the model option when checking for 'help'. Fixes: 64f75f57f9d2 ("net: Reinstate '-net nic, model=help' output as documented in man page") Reported-by: Hans <sungdgdhtryrt@gmail.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Cc: qemu-stable@nongnu.org Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: Reinstate '-net nic, model=help' output as documented in man pageDavid Woodhouse2024-08-021-3/+22
| | | | | | | | | | | While refactoring the NIC initialization code, I broke '-net nic,model=help' which no longer outputs a list of available NIC models. Fixes: 2cdeca04adab ("net: report list of available models according to platform") Cc: qemu-stable@nongnu.org Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: Remove receive_raw()Akihiko Odaki2024-06-041-6/+12
| | | | | | | | | | | While netmap implements virtio-net header, it does not implement receive_raw(). Instead of implementing receive_raw for netmap, add virtio-net headers in the common code and use receive_iov()/receive() instead. This also fixes the buffer size for the virtio-net header. Fixes: fbbdbddec0 ("tap: allow extended virtio header with hash info") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: Move virtio-net header length assertionAkihiko Odaki2024-06-041-0/+5
| | | | | | | The virtio-net header length assertion should happen for any clients. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* tap: Remove qemu_using_vnet_hdr()Akihiko Odaki2024-06-041-23/+1
| | | | | | | | | | | | Since qemu_set_vnet_hdr_len() is always called when qemu_using_vnet_hdr() is called, we can merge them and save some code. For consistency, express that the virtio-net header is not in use by returning 0 with qemu_get_vnet_hdr_len() instead of having a dedicated function, qemu_get_using_vnet_hdr(). Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: make nb_nics and nd_table[] static in net/net.cDavid Woodhouse2024-02-021-0/+3
| | | | | | | | | Also remove the stale declaration of host_net_devices; the actual definition was removed long ago in commit 7cc28cb06104 ("net: Remove the deprecated 'host_net_add' and 'host_net_remove' HMP commands") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Thomas Huth <thuth@redhat.com>
* net: remove qemu_show_nic_models(), qemu_find_nic_model()David Woodhouse2024-02-021-33/+6
| | | | | | | | | | These old functions can be removed now too. Let net_param_nic() print the full set of network devices directly, and also make it note that a list more specific to this platform/config will be available by using '-nic model=help' instead. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Thomas Huth <thuth@redhat.com>
* net: remove qemu_check_nic_model()David Woodhouse2024-02-021-13/+0
| | | | | | | | There are no callers of this function any more, as they have all been converted to qemu_{create,configure}_nic_device(). Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Thomas Huth <thuth@redhat.com>
* net: add qemu_create_nic_bus_devices()David Woodhouse2024-02-021-0/+53
| | | | | | | | | | | This will instantiate any NICs which live on a given bus type. Each bus is allowed *one* substitution (for PCI it's virtio → virtio-net-pci, for Xen it's xen → xen-net-device; no point in overengineering it unless we actually want more). Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Thomas Huth <thuth@redhat.com>
* net: report list of available models according to platformDavid Woodhouse2024-02-021-0/+94
| | | | | | | | | By noting the models for which a configuration was requested, we can give the user an accurate list of which NIC models were actually available on the platform/configuration that was otherwise chosen. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
* net: add qemu_{configure,create}_nic_device(), qemu_find_nic_info()David Woodhouse2024-02-021-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most code which directly accesses nd_table[] and nb_nics uses them for one of two things. Either "I have created a NIC device and I'd like a configuration for it", or "I will create a NIC device *if* there is a configuration for it". With some variants on the theme around whether they actually *check* if the model specified in the configuration is the right one. Provide functions which perform both of those, allowing platforms to be a little more consistent and as a step towards making nd_table[] and nb_nics private to the net code. One might argue that platforms ought to be consistent about whether they create the unconfigured devices or not, but making significant user-visible changes is explicitly *not* the intent right now. The new functions leave the 'model' field of the NICInfo as NULL after using it for the default NIC model, unlike the qemu_check_nic_model() function which does set nd->model to match default_model explicitly. This is acceptable because there is no code which consumes nd->model except this NIC-matching code in net/net.c, and no reasonable excuse for any code wanting to use nd->model in future. Also export the qemu_find_nic_info() helper, as some platforms have special cases they need to handle. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Thomas Huth <thuth@redhat.com>
* net: do not delete nics in net_cleanup()David Woodhouse2023-11-211-6/+22
| | | | | | | | | | | | | | | | | | | | | | | In net_cleanup() we only need to delete the netdevs, as those may have state which outlives Qemu when it exits, and thus may actually need to be cleaned up on exit. The nics, on the other hand, are owned by the device which created them. Most devices don't bother to clean up on exit because they don't have any state which will outlive Qemu... but XenBus devices do need to clean up their nodes in XenStore, and do have an exit handler to delete them. When the XenBus exit handler destroys the xen-net-device, it attempts to delete its nic after net_cleanup() had already done so. And crashes. Fix this by only deleting netdevs as we walk the list. As the comment notes, we can't use QTAILQ_FOREACH_SAFE() as each deletion may remove *multiple* entries, including the "safely" saved 'next' pointer. But we can store the *previous* entry, since nics are safe. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: Update MemReentrancyGuard for NICAkihiko Odaki2023-11-211-0/+14
| | | | | | | | | | | | | | | | | | Recently MemReentrancyGuard was added to DeviceState to record that the device is engaging in I/O. The network device backend needs to update it when delivering a packet to a device. This implementation follows what bottom half does, but it does not add a tracepoint for the case that the network device backend started delivering a packet to a device which is already engaging in I/O. This is because such reentrancy frequently happens for qemu_flush_queued_packets() and is insignificant. Fixes: CVE-2023-3019 Reported-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Acked-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: Provide MemReentrancyGuard * to qemu_new_nic()Akihiko Odaki2023-11-211-0/+1
| | | | | | | | | | | | | Recently MemReentrancyGuard was added to DeviceState to record that the device is engaging in I/O. The network device backend needs to update it when delivering a packet to a device. In preparation for such a change, add MemReentrancyGuard * as a parameter of qemu_new_nic(). Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: Fix a misleading error messageMarkus Armbruster2023-11-171-3/+3
| | | | | | | | | | | | | | The error message $ qemu-system-x86_64 -netdev user,id=net0,ipv6-net=fec0::0/ qemu-system-x86_64: -netdev user,id=net0,ipv6-net=fec0::0/: Parameter 'ipv6-prefixlen' expects a number points to ipv6-prefixlen instead of ipv6-net. Fix: qemu-system-x86_64: -netdev user,id=net0,ipv6-net=fec0::0/: parameter 'ipv6-net' expects a number after '/' Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20231031111059.3407803-6-armbru@redhat.com>
* net/net: Clean up global variable shadowingPhilippe Mathieu-Daudé2023-10-061-7/+7
| | | | | | | | | | | | | | | | | | | | | | | Fix: net/net.c:1680:35: error: declaration shadows a variable in the global scope [-Werror,-Wshadow] bool netdev_is_modern(const char *optarg) ^ net/net.c:1714:38: error: declaration shadows a variable in the global scope [-Werror,-Wshadow] void netdev_parse_modern(const char *optarg) ^ net/net.c:1728:60: error: declaration shadows a variable in the global scope [-Werror,-Wshadow] void net_client_parse(QemuOptsList *opts_list, const char *optarg) ^ /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/getopt.h:77:14: note: previous declaration is here extern char *optarg; /* getopt(3) external variables */ ^ Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20231004120019.93101-4-philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
* net: add initial support for AF_XDP network backendIlya Maximets2023-09-181-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AF_XDP is a network socket family that allows communication directly with the network device driver in the kernel, bypassing most or all of the kernel networking stack. In the essence, the technology is pretty similar to netmap. But, unlike netmap, AF_XDP is Linux-native and works with any network interfaces without driver modifications. Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't require access to character devices or unix sockets. Only access to the network interface itself is necessary. This patch implements a network backend that communicates with the kernel by creating an AF_XDP socket. A chunk of userspace memory is shared between QEMU and the host kernel. 4 ring buffers (Tx, Rx, Fill and Completion) are placed in that memory along with a pool of memory buffers for the packet data. Data transmission is done by allocating one of the buffers, copying packet data into it and placing the pointer into Tx ring. After transmission, device will return the buffer via Completion ring. On Rx, device will take a buffer form a pre-populated Fill ring, write the packet data into it and place the buffer into Rx ring. AF_XDP network backend takes on the communication with the host kernel and the network interface and forwards packets to/from the peer device in QEMU. Usage example: -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1 XDP program bridges the socket with a network interface. It can be attached to the interface in 2 different modes: 1. skb - this mode should work for any interface and doesn't require driver support. With a caveat of lower performance. 2. native - this does require support from the driver and allows to bypass skb allocation in the kernel and potentially use zero-copy while getting packets in/out userspace. By default, QEMU will try to use native mode and fall back to skb. Mode can be forced via 'mode' option. To force 'copy' even in native mode, use 'force-copy=on' option. This might be useful if there is some issue with the driver. Option 'queues=N' allows to specify how many device queues should be open. Note that all the queues that are not open are still functional and can receive traffic, but it will not be delivered to QEMU. So, the number of device queues should generally match the QEMU configuration, unless the device is shared with something else and the traffic re-direction to appropriate queues is correctly configured on a device level (e.g. with ethtool -N). 'start-queue=M' option can be used to specify from which queue id QEMU should start configuring 'N' queues. It might also be necessary to use this option with certain NICs, e.g. MLX5 NICs. See the docs for examples. In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN or CAP_BPF capabilities in order to load default XSK/XDP programs to the network interface and configure BPF maps. It is possible, however, to run with no capabilities. For that to work, an external process with enough capabilities will need to pre-load default XSK program, create AF_XDP sockets and pass their file descriptors to QEMU process on startup via 'sock-fds' option. Network backend will need to be configured with 'inhibit=on' to avoid loading of the program. QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue or CAP_IPC_LOCK. There are few performance challenges with the current network backends. First is that they do not support IO threads. This means that data path is handled by the main thread in QEMU and may slow down other work or may be slowed down by some other work. This also means that taking advantage of multi-queue is generally not possible today. Another thing is that data path is going through the device emulation code, which is not really optimized for performance. The fastest "frontend" device is virtio-net. But it's not optimized for heavy traffic either, because it expects such use-cases to be handled via some implementation of vhost (user, kernel, vdpa). In practice, we have virtio notifications and rcu lock/unlock on a per-packet basis and not very efficient accesses to the guest memory. Communication channels between backend and frontend devices do not allow passing more than one packet at a time as well. Some of these challenges can be avoided in the future by adding better batching into device emulation or by implementing vhost-af-xdp variant. There are also a few kernel limitations. AF_XDP sockets do not support any kinds of checksum or segmentation offloading. Buffers are limited to a page size (4K), i.e. MTU is limited. Multi-buffer support implementation for AF_XDP is in progress, but not ready yet. Also, transmission in all non-zero-copy modes is synchronous, i.e. done in a syscall. That doesn't allow high packet rates on virtual interfaces. However, keeping in mind all of these challenges, current implementation of the AF_XDP backend shows a decent performance while running on top of a physical NIC with zero-copy support. Test setup: 2 VMs running on 2 physical hosts connected via ConnectX6-Dx card. Network backend is configured to open the NIC directly in native mode. The driver supports zero-copy. NIC is configured to use 1 queue. Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd for PPS testing. iperf3 result: TCP stream : 19.1 Gbps dpdk-testpmd (single queue, single CPU core, 64 B packets) results: Tx only : 3.4 Mpps Rx only : 2.0 Mpps L2 FWD Loopback : 1.5 Mpps In skb mode the same setup shows much lower performance, similar to the setup where pair of physical NICs is replaced with veth pair: iperf3 result: TCP stream : 9 Gbps dpdk-testpmd (single queue, single CPU core, 64 B packets) results: Tx only : 1.2 Mpps Rx only : 1.0 Mpps L2 FWD Loopback : 0.7 Mpps Results in skb mode or over the veth are close to results of a tap backend with vhost=on and disabled segmentation offloading bridged with a NIC. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool) Signed-off-by: Jason Wang <jasowang@redhat.com>
* tap: Add check for USO featuresYuri Benditovich2023-09-181-0/+9
| | | | | | | | | Tap indicates support for USO features according to capabilities of current kernel module. Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Signed-off-by: Andrew Melnychecnko <andrew@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* tap: Add USO support to tap device.Andrew Melnychenko2023-09-181-2/+2
| | | | | | | | | Passing additional parameters (USOv4 and USOv6 offloads) when setting TAP offloads Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: Strip virtio-net header when dumpingAkihiko Odaki2023-03-101-0/+18
| | | | | | | | | | | filter-dump specifiees Ethernet as PCAP LinkType, which does not expect virtio-net header. Having virtio-net header in such PCAP file breaks PCAP unconsumable. Unfortunately currently there is no LinkType for virtio-net so for now strip virtio-net header to convert the output to Ethernet. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: Replace "Supported NIC models" with "Available NIC models"Thomas Huth2023-02-171-1/+1
| | | | | | | | | | | Just because a NIC model is compiled into the QEMU binary does not necessary mean that it can be used with each and every machine. So let's rather talk about "available" models instead of "supported" models, just to avoid confusion. Reviewed-by: Claudio Fontana <cfontana@suse.de> Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: Restore printing of the help text with "-nic help"Thomas Huth2023-02-171-2/+12
| | | | | | | | | | | | Running QEMU with "-nic help" used to work in QEMU 5.2 and earlier versions (it showed the available netdev backends), but this feature got broken during some refactoring in version 6.0. Let's restore the old behavior, and while we're at it, let's also print the available NIC models here now since this option can be used to configure both, netdev backend and model in one go. Fixes: ad6f932fe8 ("net: do not exit on "netdev_add help" monitor command") Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: Move the code to collect available NIC models to a separate functionThomas Huth2023-02-171-0/+34
| | | | | | | | | The code that collects the available NIC models is not really specific to PCI anymore and will be required in the next patch, too, so let's move this into a new separate function in net.c instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: Move hmp_info_network() to net-hmp-cmds.cMarkus Armbruster2023-02-041-27/+1
| | | | | Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230124121946.1139465-17-armbru@redhat.com>
* qapi net: Elide redundant has_FOO in generated CMarkus Armbruster2022-12-141-13/+12
| | | | | | | | | | | | | | | | The has_FOO for pointer-valued FOO are redundant, except for arrays. They are also a nuisance to work with. Recent commit "qapi: Start to elide redundant has_FOO in generated C" provided the means to elide them step by step. This is the step for qapi/net.json. Said commit explains the transformation in more detail. The invariant violations mentioned there do not occur here. Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221104160712.3005652-19-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> [Fixes for MacOS squashed in]
* qapi: net: add stream and dgram netdevsLaurent Vivier2022-10-281-1/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Copied from socket netdev file and modified to use SocketAddress to be able to introduce new features like unix socket. "udp" and "mcast" are squashed into dgram netdev, multicast is detected according to the IP address type. "listen" and "connect" modes are managed by stream netdev. An optional parameter "server" defines the mode (off by default) The two new types need to be parsed the modern way with -netdev, because with the traditional way, the "type" field of netdev structure collides with the "type" field of SocketAddress and prevents the correct evaluation of the command line option. Moreover the traditional way doesn't allow to use the same type (SocketAddress) several times with the -netdev option (needed to specify "local" and "remote" addresses). The previous commit paved the way for parsing the modern way, but omitted one detail: how to pick modern vs. traditional, in netdev_is_modern(). We want to pick based on the value of parameter "type". But how to extract it from the option argument? Parsing the option argument, either the modern or the traditional way, extracts it for us, but only if parsing succeeds. If parsing fails, there is no good option. No matter which parser we pick, it'll be the wrong one for some arguments, and the error reporting will be confusing. Fortunately, the traditional parser accepts *anything* when called in a certain way. This maximizes our chance to extract the value of "type", and in turn minimizes the risk of confusing error reporting. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: introduce qemu_set_info_str() functionLaurent Vivier2022-10-281-5/+12
| | | | | | | | | | Embed the setting of info_str in a function. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jason Wang <jasowang@redhat.com>
* qapi: net: introduce a way to bypass qemu_opts_parse_noisily()Laurent Vivier2022-10-281-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As qemu_opts_parse_noisily() flattens the QAPI structures ("type" field of Netdev structure can collides with "type" field of SocketAddress), we introduce a way to bypass qemu_opts_parse_noisily() and use directly visit_type_Netdev() to parse the backend parameters. More details from Markus: qemu_init() passes the argument of -netdev, -nic, and -net to net_client_parse(). net_client_parse() parses with qemu_opts_parse_noisily(), passing QemuOptsList qemu_netdev_opts for -netdev, qemu_nic_opts for -nic, and qemu_net_opts for -net. Their desc[] are all empty, which means any keys are accepted. The result of the parse (a QemuOpts) is stored in the QemuOptsList. Note that QemuOpts is flat by design. In some places, we layer non-flat on top using dotted keys convention, but not here. net_init_clients() iterates over the stored QemuOpts, and passes them to net_init_netdev(), net_param_nic(), or net_init_client(), respectively. These functions pass the QemuOpts to net_client_init(). They also do other things with the QemuOpts, which we can ignore here. net_client_init() uses the opts visitor to convert the (flat) QemOpts to a (non-flat) QAPI object Netdev. Netdev is also the argument of QMP command netdev_add. The opts visitor was an early attempt to support QAPI in (QemuOpts-based) CLI. It restricts QAPI types to a certain shape; see commit eb7ee2cbeb "qapi: introduce OptsVisitor". A more modern way to support QAPI is qobject_input_visitor_new_str(). It uses keyval_parse() instead of QemuOpts for KEY=VALUE,... syntax, and it also supports JSON syntax. The former isn't quite as expressive as JSON, but it's a lot closer than QemuOpts + opts visitor. This commit paves the way to use of the modern way instead. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: simplify net_client_parse() error managementLaurent Vivier2022-10-281-4/+2
| | | | | | | | | | | | | | All net_client_parse() callers exit in case of error. Move exit(1) to net_client_parse() and remove error checking from the callers. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: remove the @errp argument of net_client_inits()Laurent Vivier2022-10-281-13/+7
| | | | | | | | | | | | | The only caller passes &error_fatal, so use this directly in the function. It's what we do for -blockdev, -device, and -object. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: introduce convert_host_port()Laurent Vivier2022-10-281-30/+32
| | | | | | | | Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net: improve error message for missing netdev backendDaniel P. Berrangé2022-10-281-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current message when using '-net user...' with SLIRP disabled at compile time is: qemu-system-x86_64: -net user: Parameter 'type' expects a net backend type (maybe it is not compiled into this binary) An observation is that we're using the 'netdev->type' field here which is an enum value, produced after QAPI has converted from its string form. IOW, at this point in the code, we know that the user's specified type name was a valid network backend. The only possible scenario that can make the backend init function be NULL, is if support for that backend was disabled at build time. Given this, we don't need to caveat our error message with a 'maybe' hint, we can be totally explicit. The use of QERR_INVALID_PARAMETER_VALUE doesn't really lend itself to user friendly error message text. Since this is not used to set a specific QAPI error class, we can simply stop using this pre-formatted error text and provide something better. Thus the new message is: qemu-system-x86_64: -net user: network backend 'user' is not compiled into this binary The case of passing 'hubport' for -net is also given a message reminding people they should have used -netdev/-nic instead, as this backend type is only valid for the modern syntax. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* net/vmnet: add vmnet backends to qapi/netVladislav Yaroshchuk2022-05-171-0/+10
| | | | | | | | | | | | | Create separate netdevs for each vmnet operating mode: - vmnet-host - vmnet-shared - vmnet-bridged Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com> Tested-by: Akihiko Odaki <akihiko.odaki@gmail.com> Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Vladislav Yaroshchuk <Vladislav.Yaroshchuk@jetbrains.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* Remove qemu-common.h include from most unitsMarc-André Lureau2022-04-061-1/+0
| | | | | | Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Replace config-time define HOST_WORDS_BIGENDIANMarc-André Lureau2022-04-061-2/+2
| | | | | | | | | | | | | | | | | | | Replace a config-time define with a compile time condition define (compatible with clang and gcc) that must be declared prior to its usage. This avoids having a global configure time define, but also prevents from bad usage, if the config header wasn't included before. This can help to make some code independent from qemu too. gcc supports __BYTE_ORDER__ from about 4.6 and clang from 3.2. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> [ For the s390x parts I'm involved in ] Acked-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220323155743.1585078-7-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* net: introduce control clientJason Wang2021-10-201-3/+21
| | | | | | | | | | | | This patch introduces a boolean for the device has control queue which can accepts control command via network queue. The first user would be the control virtqueue support for vhost. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20211020045600.16082-6-jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>