summary refs log tree commit diff stats
path: root/hw/ppc/spapr_pci.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
...
* spapr: add hotplug hooks for PHB hotplugGreg Kurz2019-02-261-15/+1
| | | | | | | | | | | | | | | | | | | Hotplugging PHBs is a machine-level operation, but PHBs reside on the main system bus, so we register spapr machine as the handler for the main system bus. Provide the usual pre-plug, plug and unplug-request handlers. Move the checking of the PHB index to the pre-plug handler. It is okay to do that and assert in the realize function because the pre-plug handler is always called, even for the oldest machine types we support. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> (Fixed interrupt controller phandle in "interrupt-map" and TCE table size in "ibm,dma-window" FDT fragment, Greg Kurz) Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059672926.1466090.13612804072190051439.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: add ibm, my-drc-index property for PHB hotplugMichael Roth2019-02-261-0/+9
| | | | | | | | | | This is needed to denote a boot-time PHB as being hot-pluggable. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059672420.1466090.15147504040270659866.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: provide node start offset via spapr_populate_pci_dt()Michael Roth2019-02-261-1/+4
| | | | | | | | | | | | | | | PHB hotplug re-uses PHB device tree generation code and passes it to a guest via RTAS. Doing this requires knowledge of where exactly in the device tree the node describing the PHB begins. Provide this via a new optional pointer that can be used to store the PHB node's start offset. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059671912.1466090.10891589403973703473.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: add PHB unrealizeGreg Kurz2019-02-261-4/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To support PHB hotplug we need to clean up lingering references, memory, child properties, etc. prior to the PHB object being finalized. Generally this will be called as a result of calling object_unparent() on the PHB object, which in turn would normally be called as the result of an unplug() operation. When the PHB is finalized, child objects will be unparented in turn, and finalized if the PHB was the only reference holder. so we don't bother to explicitly unparent child objects of the PHB, with the notable exception of DRCs. This is needed to avoid a QEMU crash when unplugging a PHB and resetting the machine before the guest could handle the event. The DRCs are removed from the QOM tree by pci_unregister_root_bus() and we must make sure we're not leaving stale aliases under the global /dr-connector path. The formula that gives the number of DMA windows is moved to an inline function in the hw/pci-host/spapr.h header because it will have other users. The unrealize function is able to cope with partially realized PHBs. It is hence used to implement proper rollback on the realize error path. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <155059669881.1466090.13515030705986041517.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr/drc: Drop spapr_drc_attach() fdt argumentGreg Kurz2019-02-261-1/+1
| | | | | | | | | | | All DRC subtypes have been converted to generate the FDT fragment at configure connector time instead of attach time. The fdt and fdt_offset arguments of spapr_drc_attach() aren't needed anymore. Drop them and make the implementation of the dt_populate() method mandatory. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059667853.1466090.16527852453054217565.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr/pci: Generate FDT fragment at configure connector timeGreg Kurz2019-02-261-7/+12
| | | | | | Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059667346.1466090.326696113231137772.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* qdev: pass an Object * to qbus_set_hotplug_handler()Michael Roth2019-02-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Certain devices types, like memory/CPU, are now being handled using a hotplug interface provided by a top-level MachineClass. Hotpluggable host bridges are another such device where it makes sense to use a machine-level hotplug handler. However, unlike those devices, host-bridges have a parent bus (the main system bus), and devices with a parent bus use a different mechanism for registering their hotplug handlers: qbus_set_hotplug_handler(). This interface currently expects a handler to be a subclass of DeviceClass, but this is not the case for MachineClass, which derives directly from ObjectClass. Internally, the interface only requires an ObjectClass, so expose that in qbus_set_hotplug_handler(). Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <154999589921.690774.3640149277362188566.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: Fix interrupt leak in rtas_ibm_change_msi() error pathGreg Kurz2019-02-171-0/+6
| | | | | | | | | Now that IRQ allocation has been split in two (first allocate IRQ numbers, then claim them), if the claiming fails, we must release the IRQs. Fixes: 4fe75a8ccd80 "spapr: split the IRQ allocation sequence" Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Rename xics to intc in interrupt controller agnostic codeGreg Kurz2019-02-171-3/+3
| | | | | | | | All this code is used with both the XICS and XIVE interrupt controllers. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: Fix endianness in assigned-addresses propertyAlexey Kardashevskiy2019-02-041-1/+1
| | | | | | | | | | | | | | | | reg->phys_hi and assigned->phys_hi are big endian but we do an extra byteswap anyway when copying reg->phys_hi to assigned->phys_hi. To make things slightly more messy, we also add a relocatable bit (b_n()) although in the right endianness. This fixes endianness of assigned->phys_hi. This is unlikely to produce any visible difference though as we should end up there only in the case of PCI hotplug and even then I am not sure if (d->io_regions[i].addr == PCI_BAR_UNMAPPED) == true. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr/pci: Fix primary bus number for PCI bridgesDavid Hildenbrand2019-02-041-4/+1
| | | | | | | | | | | | | | | | | | | | | | | While looking at the s390x implementation, looks like spapr has a similar BUG when building the topology. The primary bus number corresponds always to the bus number of the bus the bridge is attached to. Right now, if we have two bridges attached to the same bus (e.g. root bus) this is however not the case. The first bridge will have primary bus 0, the second bridge primary bus 1, which is wrong. Fix the assignment. While at it, drop setting the PCI_SUBORDINATE_BUS temporarily to 0xff. Setting it temporarily to that value (as discussed e.g. in [1]), is only relevant for a running system that probes the buses. The value is effectively unused for us just doing a DFS. [1] http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: move spapr_create_phb() to core machine codeGreg Kurz2019-01-091-11/+0
| | | | | | | | | | This function is only used when creating the default PHB. Let's rename it and move it to the core machine code for clarity. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: perform unplug via the hotplug handlerDavid Hildenbrand2018-12-201-12/+21
| | | | | | | | | | | | | | | | | | Introduce and use the "unplug" callback. This is a preparation for multi-stage hotplug handlers, whereby the bus hotplug handler is overwritten by the machine hotplug handler. This handler will then pass control to the bus hotplug handler. So to get this running cleanly, we also have to make sure to go via the hotplug handler chain when actually unplugging a device after an unplug request. Lookup the hotplug handler and call "unplug". Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* spapr_pci: convert g_malloc() to g_new()Greg Kurz2018-11-081-1/+1
| | | | | | | | | | | | When allocating an array, it is a recommended coding practice to call g_new(FooType, n) instead of g_malloc(n * sizeof(FooType)) because it takes care to avoid overflow when calculating the size of the allocated block and it returns FooType *, which allows the compiler to perform type checking. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* error: Fix use of error_prepend() with &error_fatal, &error_abortMarkus Armbruster2018-10-191-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | From include/qapi/error.h: * Pass an existing error to the caller with the message modified: * error_propagate(errp, err); * error_prepend(errp, "Could not frobnicate '%s': ", name); Fei Li pointed out that doing error_propagate() first doesn't work well when @errp is &error_fatal or &error_abort: the error_prepend() is never reached. Since I doubt fixing the documentation will stop people from getting it wrong, introduce error_propagate_prepend(), in the hope that it lures people away from using its constituents in the wrong order. Update the instructions in error.h accordingly. Convert existing error_prepend() next to error_propagate to error_propagate_prepend(). If any of these get reached with &error_fatal or &error_abort, the error messages improve. I didn't check whether that's the case anywhere. Cc: Fei Li <fli@suse.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181017082702.5581-2-armbru@redhat.com>
* spapr_pci: add an extra 'nr_msis' argument to spapr_populate_pci_dtCédric Le Goater2018-09-251-6/+3
| | | | | | | | | So that we don't have to call qdev_get_machine() to get the machine class and the sPAPRIrq backend holding the number of MSIs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: introduce a spapr_irq class 'nr_msis' attributeCédric Le Goater2018-09-251-2/+3
| | | | | | | | | | | | | | | | | | | | The number of MSI interrupts a sPAPR machine can allocate is in direct relation with the number of interrupts of the sPAPRIrq backend. Define statically this value at the sPAPRIrq class level and use it for the "ibm,pe-total-#msi" property of the sPAPR PHB. According to the PAPR specs, "ibm,pe-total-#msi" defines the maximum number of MSIs that are available to the PE. We choose to advertise the maximum number of MSIs that are available to the machine for simplicity of the model and to avoid segmenting the MSI interrupt pool which can be easily shared. If the pool limit is reached, it can be extended dynamically. Finally, remove XICS_IRQS_SPAPR which is now unused. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: fix potential NULL pointer dereferenceGreg Kurz2018-08-281-1/+1
| | | | | | | | | | | | | | | | | | Commit 2c88b098e76fd added a call to SPAPR_MACHINE_GET_CLASS(spapr) in spapr_phb_realize() before we check spapr isn't NULL. This causes QEMU to crash when starting a non-pseries machine with a sPAPR PHB. This could be fixed by setting the smc variable after the null check, but it seems more explicit to use a ternary operator to skip the call to SPAPR_MACHINE_GET_CLASS() if spapr is NULL, since spapr_phb_realize() will return immediately in this case. This was reported by Coverity (CID 1395170 and 1395183). Fixes: 2c88b098e76fde0c7fcc0476dd3f80ce58409505 Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: factorize the use of SPAPR_MACHINE_GET_CLASS()Cédric Le Goater2018-08-211-5/+6
| | | | | | | | It should save us some CPU cycles as these routines perform a lot of checks. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: introduce a fixed IRQ number spaceCédric Le Goater2018-08-211-7/+22
| | | | | | | | | | | | | | | | | | This proposal introduces a new IRQ number space layout using static numbers for all devices, depending on a device index, and a bitmap allocator for the MSI IRQ numbers which are negotiated by the guest at runtime. As the VIO device model does not have a device index but a "reg" property, we introduce a formula to compute an IRQ number from a "reg" value. It should minimize most of the collisions. The previous layout is kept in pre-3.1 machines raising the 'legacy_irq_allocation' machine class flag. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: split the IRQ allocation sequenceCédric Le Goater2018-06-211-3/+20
| | | | | | | | | | | | | Today, when a device requests for IRQ number in a sPAPR machine, the spapr_irq_alloc() routine first scans the ICSState status array to find an empty slot and then performs the assignement of the selected numbers. Split this sequence in two distinct routines : spapr_irq_find() for lookups and spapr_irq_claim() for claiming the IRQ numbers. This will ease the introduction of a static layout of IRQ numbers. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: Remove unhelpful pagesize warningDavid Gibson2018-06-121-7/+0
| | | | | | | | | | | | | | | | | | | | By default, the IOMMU model built into the spapr virtual PCI host bridge supports 4kiB and 64kiB IOMMU page sizes. However this can be overridden which may be desirable to allow larger IOMMU page sizes when running a guest with hugepage backing and passthrough devices. For that reason a warning was printed when the device wasn't configured to allow the pagesize with which guest RAM is backed. Experience has proven, however, that this message is more confusing than useful. Worse it sometimes makes little sense when the host-available page sizes don't match those available on the guest, which can happen with a POWER8 guest running on a POWER9 KVM host. Long term we do want better handling to allow large IOMMU page sizes to be used, but for now this parameter and warning don't really accomplish it. So, remove the message, pending a better solution. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: fix MSI/MSIX selectionGreg Kurz2018-01-291-19/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In various place we don't correctly check if the device supports MSI or MSI-X. This can cause devices to be advertised with MSI support, even if they only support MSI-X (like virtio-pci-* devices for example): ethernet@0 { ibm,req#msi = <0x1>; <--- wrong! . ibm,loc-code = "qemu_virtio-net-pci:0000:00:00.0"; . ibm,req#msi-x = <0x3>; }; Worse, this can also cause the "ibm,change-msi" RTAS call to corrupt the PCI status and cause migration to fail: qemu-system-ppc64: get_pci_config_device: Bad config data: i=0x6 read: 0 device: 10 cmask: 10 wmask: 0 w1cmask:0 ^^ PCI_STATUS_CAP_LIST bit which is assumed to be constant This patch changes spapr_populate_pci_child_dt() to properly check for MSI support using msi_present(): this ensures that PCIDevice::msi_cap was set by msi_init() and that msi_nr_vectors_allocated() will look at the right place in the config space. Checking PCIDevice::msix_entries_nr is enough for MSI-X but let's add a call to msix_present() there as well for consistency. It also changes rtas_ibm_change_msi() to select the appropriate MSI type in Function 1 instead of always selecting plain MSI. This new behaviour is compliant with LoPAPR 1.1, as described in "Table 71. ibm,change-msi Argument Call Buffer": Function 1: If Number Outputs is equal to 3, request to set to a new number of MSIs (including set to 0). If the “ibm,change-msix-capable” property exists and Number Outputs is equal to 4, request is to set to a new number of MSI or MSI-X (platform choice) interrupts (including set to 0). Since MSI is the the platform default (LoPAPR 6.2.3 MSI Option), let's check for MSI support first. And finally, it checks the input parameters are valid, as described in LoPAPR 1.1 "R1–7.3.10.5.1–3": For the MSI option: The platform must return a Status of -3 (Parameter error) from ibm,change-msi, with no change in interrupt assignments if the PCI configuration address does not support MSI and Function 3 was requested (that is, the “ibm,req#msi” property must exist for the PCI configuration address in order to use Function 3), or does not support MSI-X and Function 4 is requested (that is, the “ibm,req#msi-x” property must exist for the PCI configuration address in order to use Function 4), or if neither MSIs nor MSI-Xs are supported and Function 1 is requested. This ensures that the ret_intr_type variable contains a valid MSI type for this device, and that spapr_msi_setmsg() won't corrupt the PCI status. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Merge remote-tracking branch 'origin/master' into HEADMichael S. Tsirkin2018-01-111-10/+9
|\ | | | | | | | | | | Resolve conflicts around apb. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * spapr_pci: use warn_report()Greg Kurz2018-01-101-3/+3
| | | | | | | | | | | | | | These two are definitely warnings. Let's use the appropriate API. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * spapr: fix LSI interrupt specifiers in the device treeGreg Kurz2017-12-151-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LoPAPR 1.1 B.6.9.1.2 describes the "#interrupt-cells" property of the PowerPC External Interrupt Source Controller node as follows: “#interrupt-cells” Standard property name to define the number of cells in an interrupt- specifier within an interrupt domain. prop-encoded-array: An integer, encoded as with encode-int, that denotes the number of cells required to represent an interrupt specifier in its child nodes. The value of this property for the PowerPC External Interrupt option shall be 2. Thus all interrupt specifiers (as used in the standard “interrupts” property) shall consist of two cells, each containing an integer encoded as with encode-int. The first integer represents the interrupt number the second integer is the trigger code: 0 for edge triggered, 1 for level triggered. This patch fixes the interrupt specifiers in the "interrupt-map" property of the PHB node, that were setting the second cell to 8 (confusion with IRQ_TYPE_LEVEL_LOW ?) instead of 1. VIO devices and RTAS event sources use the same format for interrupt specifiers: while here, we introduce a common helper to handle the encoding details. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> -- v3: - reference public LoPAPR instead of internal PAPR+ in changelog - change helper name to spapr_dt_xics_irq() v2: - drop the erroneous changes to the "interrupts" prop in PCI device nodes - introduce a common helper to encode interrupt specifiers Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * spapr: introduce a spapr_qirq() helperCédric Le Goater2017-12-151-1/+1
| | | | | | | | | | | | | | | | | | | | xics_get_qirq() is only used by the sPAPR machine. Let's move it there and change its name to reflect its scope. It will be useful for XIVE support which will use its own set of qirqs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * spapr: move the IRQ allocation routines under the machineCédric Le Goater2017-12-151-4/+4
| | | | | | | | | | | | | | | | | | | | | | Also change the prototype to use a sPAPRMachineState and prefix them with spapr_irq_. It will let us synchronise the IRQ allocation with the XIVE interrupt mode when available. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* | pci: Eliminate redundant PCIDevice::bus pointerDavid Gibson2017-12-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bus pointer in PCIDevice is basically redundant with QOM information. It's always initialized to the qdev_get_parent_bus(), the only difference is the type. Therefore this patch eliminates the field, instead creating a pci_get_bus() helper to do the type mangling to derive it conveniently from the QOM Device object underneath. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
* | pci: Rename root bus initialization functions for clarityDavid Gibson2017-12-051-4/+4
|/ | | | | | | | | | | | | | | | | | pci_bus_init(), pci_bus_new_inplace(), pci_bus_new() and pci_register_bus() are misleadingly named. They're not used for initializing *any* PCI bus, but only for a root PCI bus. Non-root buses - i.e. ones under a logical PCI to PCI bridge - are instead created with a direct qbus_create_inplace() (see pci_bridge_initfn()). This patch renames the functions to make it clear they're only used for a root bus. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
* spapr_pci: fail gracefully with non-pseries machine typesGreg Kurz2017-10-171-1/+11
| | | | | | | | | | | | | | | | | | | QEMU currently crashes when the user tries to add an spapr-pci-host-bridge on a non-pseries machine: $ qemu-system-ppc64 -M ppce500 -device spapr-pci-host-bridge,index=1 hw/ppc/spapr_pci.c:1535:spapr_phb_realize: Object 0x1003dacae60 is not an instance of type spapr-machine Aborted (core dumped) The same thing happens with the deprecated but still available child type spapr-pci-vfio-host-bridge. Fix both by checking the machine type with object_dynamic_cast(). Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Merge remote-tracking branch ↵Peter Maydell2017-09-271-2/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'remotes/dgilbert/tags/pull-migration-20170927a' into staging Migration pull 2017-09-27 # gpg: Signature made Wed 27 Sep 2017 14:56:23 BST # gpg: using RSA key 0x0516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * remotes/dgilbert/tags/pull-migration-20170927a: migration: Route more error paths migration: Route errors up through vmstate_save migration: wire vmstate_save_state errors up to vmstate_subsection_save migration: Check field save returns migration: check pre_save return in vmstate_save_state migration: pre_save return int migration: disable auto-converge during bulk block migration Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * migration: pre_save return intDr. David Alan Gilbert2017-09-271-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify the pre_save method on VMStateDescription to return an int rather than void so that it potentially can fail. Changed zillions of devices to make them return 0; the only case I've made it return non-0 is hw/intc/s390_flic_kvm.c that already had an error_report/return case. Note: If you add an error exit in your pre_save you must emit an error_report to say why. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170925112917.21340-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* | spapr_pci: make index property mandatoryGreg Kurz2017-09-271-54/+8
|/ | | | | | | | | | | | | | | | | | | | | | | | | | PHBs can be created with an index property, in which case the machine code automatically sets all the MMIO windows at addresses derived from the index. Alternatively, they can be manually created without index, but the user has to provide addresses for all MMIO windows. The non-index way happens to be more trouble than it's worth: it's difficult to use, keeps requiring (potentially incompatible) changes when some new parameter needs adding, and is awkward to check for collisions. It currently even has a bug that prevents to use two non-index PHBs because their child DRCs are all derived from the same index == -1 value, and, thus, collide. This patch hence makes the index property mandatory. As a consequence, the PHB's memory regions and BUID are now always configured according to the index, and it is no longer possible to set them from the command line. This DOES BREAK backwards compat, but we don't think the non-index PHB feature was used in practice (at least libvirt doesn't) and the simplification is worth it. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: don't create 64-bit MMIO window if we don't need toGreg Kurz2017-09-151-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running a pseries-2.2 or older machine type, we get the following lines in info mtree: address-space: memory ... ffffffffffffffff-ffffffffffffffff (prio 0, i/o): alias pci@800000020000000.mmio64-alias @pci@800000020000000.mmio ffffffffffffffff-ffffffffffffffff address-space: cpu-memory ... ffffffffffffffff-ffffffffffffffff (prio 0, i/o): alias pci@800000020000000.mmio64-alias @pci@800000020000000.mmio ffffffffffffffff-ffffffffffffffff The same thing occurs when running a pseries-2.7 with -global spapr-pci-host-bridge.mem_win_size=2147483648 This happens because we always create a 64-bit MMIO window, even if we didn't explicitely requested it (ie, mem64_win_size == 0) and the 32-bit window is below 2GiB. It doesn't seem to have an impact on the guest though because spapr_populate_pci_dt() doesn't advertise the bogus windows when mem64_win_size == 0. Since these memory regions don't induce any state, we can safely choose to not create them when their address is equal to -1, without breaking migration from existing setups. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: convert sprintf() to g_strdup_printf()Greg Kurz2017-09-151-9/+12
| | | | | | | In order to follow a QEMU common practice. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: handle FDT creation errors with _FDT()Greg Kurz2017-09-151-23/+7
| | | | | | | | | | | | | | | | libfdt failures when creating the FDT should cause QEMU to terminate. Let's use the _FDT() macro which does just that instead of propagating the error to the caller. spapr_populate_pci_child_dt() no longer needs to return a value in this case. Note that, on the way, this get rids of the following nonsensical lines: g_assert(!ret); if (ret) { Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: use the common _FDT() helperGreg Kurz2017-09-151-9/+1
| | | | | | | | | | All other users in hw/ppc already consider an error when building the FDT to be fatal, even on hotplug paths. There's no valid reason for spapr_pci to behave differently. So let's used the common _FDT() helper which terminates QEMU when libfdt fails. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: use g_strdup_printf()Greg Kurz2017-09-151-9/+10
| | | | | | | | Building strings with g_strdup_printf() instead of snprintf() is a QEMU common practice. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: drop useless check in spapr_populate_pci_child_dt()Greg Kurz2017-09-151-5/+1
| | | | | | | | | | spapr_phb_get_loc_code() either returns a non-null pointer, or aborts if g_strdup_printf() failed to allocate memory. Signed-off-by: Greg Kurz <groug@kaod.org> [dwg: Grammatical fix to commit message] Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: drop useless check in spapr_phb_vfio_get_loc_code()Greg Kurz2017-09-151-2/+2
| | | | | | | | | | g_strdup_printf() either returns a non-null pointer, or aborts if it failed to allocate memory. Signed-off-by: Greg Kurz <groug@kaod.org> [dwg: Grammatical fix to commit message] Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: parent the MSI memory region to the PHBGreg Kurz2017-09-081-1/+1
| | | | | | | | | This memory region should be owned by the PHB. This ensures the PHB cannot be finalized as long as the the region is guest visible, or used by a CPU or a device. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: use memory_region_add_subregion() with DMA windowsGreg Kurz2017-09-081-2/+2
| | | | | | | | | Passing a null priority to memory_region_add_subregion_overlap() is strictly equivalent to calling memory_region_add_subregion(). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: Fix obsolete comment about MSIX encoding in addr/dataAlexey Kardashevskiy2017-07-251-3/+1
| | | | | | | | f1c2dc7c866a "spapr-pci: rework MSI/MSIX" (07/2013) changed MSIX encoding but forgot to change the comment so this changes it. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Cleanups relating to DRC awaiting_release fieldDavid Gibson2017-07-171-4/+2
| | | | | | | | | | | | | | | | | | | | | 'awaiting_release' indicates that the host has requested an unplug of the device attached to the DRC, but the guest has not (yet) put the device into a state where it is safe to complete removal. 1. Rename it to 'unplug_requested' which to me at least is clearer 2. Remove the ->release_pending() method used to check this from outside spapr_drc.c. The method only plausibly has one implementation, so use a plain function (spapr_drc_unplug_requested()) instead. 3. Remove it from the migration stream. Attempting to migrate mid-unplug is broken not just for spapr - in general management has no good way to determine if the device should be present on the destination or not. So, until that's fixed, there's no point adding extra things to the stream. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
* spapr: Refactor spapr_drc_detach()David Gibson2017-07-171-6/+1
| | | | | | | | | | | | This function has two unused parameters - remove them. It also sets awaiting_release on all paths, except one. On that path setting it is harmless, since it will be immediately cleared by spapr_drc_release(). So factor it out of the if statements. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
* spapr: Treat devices added before inbound migration as coldpluggedLaurent Vivier2017-07-171-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When migrating a guest which has already had devices hotplugged, libvirt typically starts the destination qemu with -incoming defer, adds those hotplugged devices with qmp, then initiates the incoming migration. This causes problems for the management of spapr DRC state. Because the device is treated as hotplugged, it goes into a DRC state for a device immediately after it's plugged, but before the guest has acknowledged its presence. However, chances are the guest on the source machine *has* acknowledged the device's presence and configured it. If the source has fully configured the device, then DRC state won't be sent in the migration stream: for maximum migration compatibility with earlier versions we don't migrate DRCs in coldplug-equivalent state. That means that the DRC effectively changes state over the migrate, causing problems later on. In addition, logging hotplug events for these devices isn't what we want because a) those events should already have been issued on the source host and b) the event queue should get wiped out by the incoming state anyway. In short, what we really want is to treat devices added before an incoming migration as if they were coldplugged. To do this, we first add a spapr_drc_hotplugged() helper which determines if the device is hotplugged in the sense relevant for DRC state management. We only send hotplug events when this is true. Second, when we add a device which isn't hotplugged in this sense, we force a reset of the DRC state - this ensures the DRC is in a coldplug-equivalent state (there isn't usually a system reset between these device adds and the incoming migration). This is based on an earlier patch by Laurent Vivier, cleaned up and extended. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
* spapr: Only report host/guest IOMMU page size mismatches on KVMDavid Gibson2017-07-111-1/+2
| | | | | | | | | | | | | | | We print a warning if the spapr IOMMU isn't configured to support a page size matching the host page size backing RAM. When that's the case we need more complex logic to translate VFIO mappings, which is slower. But, it's not so slow that it would be at all noticeable against the general slowness of TCG. So, only warn when using KVM. This removes some noisy and unhelpful warnings from make check on hosts with page sizes which typically differ from those on POWER (e.g. Sparc). Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com>
* spapr: Use unplug_request for PCI hot unplugDavid Gibson2017-07-111-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | AIUI, ->unplug_request in the HotplugHandler is used for "soft" unplug, where acknowledgement from the guest is required before completing the unplug, whereas ->unplug is used for "hard" unplug where qemu unilaterally removes the device, and the guest just has to cope with its sudden absence. For spapr we (correctly) use ->unplug_request for CPU and memory hot unplug but we use ->unplug for PCI. While I think it might be possible to support "hard" PCI unplug within the PAPR model, that's not how it actually works now. Although it's called from ->unplug, the PCI unplug path will usually just mark the device for removal, with completion of the unplug delayed until userspace responds to the unplug notification. If the guest doesn't respond as expected, that could delay the unplug completion arbitrarily long. To reflect that, change the PCI unplug path to be called from ->unplug_request. We also rename spapr_phb_hot_plug_child() and spapr_phb_hot_unplug_child() to spapr_pci_plug() and spapr_pci_unplug_request() to more obviously reflect the callbacks they're implementing. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>
* spapr: Remove unnecessary differences between hotplug and coldplug pathsDavid Gibson2017-07-111-2/+1
| | | | | | | | | | | | | spapr_drc_attach() has a 'coldplug' parameter which sets the DRC into configured state initially, instead of the usual ISOLATED/UNUSABLE state. It turns out this is unnecessary: although coldplugged devices do need to be in CONFIGURED state once the guest starts, that will already be accomplished by the reset code which will move DRCs for already plugged devices into a coldplug equivalent state. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>