summary refs log tree commit diff stats
path: root/include/hw/ppc/spapr.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
...
* spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)Alexey Kardashevskiy2016-07-051-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for Dynamic DMA Windows (DDW) option defined by the SPAPR specification which allows to have additional DMA window(s) The "ddw" property is enabled by default on a PHB but for compatibility the pseries-2.6 machine and older disable it. This also creates a single DMA window for the older machines to maintain backward migration. This implements DDW for PHB with emulated and VFIO devices. The host kernel support is required. The advertised IOMMU page sizes are 4K and 64K; 16M pages are supported but not advertised by default, in order to enable them, the user has to specify "pgsz" property for PHB and enable huge pages for RAM. The existing linux guests try creating one additional huge DMA window with 64K or 16MB pages and map the entire guest RAM to. If succeeded, the guest switches to dma_direct_ops and never calls TCE hypercalls (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM and not waste time on map/unmap later. This adds a "dma64_win_addr" property which is a bus address for the 64bit window and by default set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware uses and this allows having emulated and VFIO devices on the same bus. This adds 4 RTAS handlers: * ibm,query-pe-dma-window * ibm,create-pe-dma-window * ibm,remove-pe-dma-window * ibm,reset-pe-dma-window These are registered from type_init() callback. These RTAS handlers are implemented in a separate file to avoid polluting spapr_iommu.c with PCI. This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs and updates all references to dma_liobn. However this does not add 64bit LIOBN to the migration stream as in fact even 32bit LIOBN is rather pointless there (as it is a PHB property and the management software can/should pass LIOBNs via CLI) but we keep it for the backward migration support. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* ppc/xics: Replace "icp" with "xics" in most placesBenjamin Herrenschmidt2016-07-011-1/+1
| | | | | | | | | | | | | | | | The "ICP" is a different object than the "XICS". For historical reasons, we have a number of places where we name a variable "icp" while it contains a XICSState pointer. There *is* an ICPState structure too so this makes the code really confusing. This is a mechanical replacement of all those instances to use the name "xics" instead. There should be no functional change. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [spapr_cpu_init has been moved to spapr_cpu_core.c, change there] Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: CPU hotplug supportBharata B Rao2016-06-171-0/+2
| | | | | | | | | | Set up device tree entries for the hotplugged CPU core and use the exising RTAS event logging infrastructure to send CPU hotplug notification to the guest. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: convert boot CPUs into CPU core devicesBharata B Rao2016-06-171-0/+2
| | | | | | | | | | | | | | | | | | | Introduce sPAPRMachineClass.dr_cpu_enabled to indicate support for CPU core hotplug. Initialize boot time CPUs as core deivces and prevent topologies that result in partially filled cores. Both of these are done only if CPU core hotplug is supported. Note: An unrelated change in the call to xics_system_init() is done in this patch as it makes sense to use the local variable smt introduced in this patch instead of kvmppc_smt_threads() call here. TODO: We derive sPAPR core type by looking at -cpu <model>. However we don't take care of "compat=" feature yet for boot time as well as hotplug CPUs. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Move spapr_cpu_init() to spapr_cpu_core.cBharata B Rao2016-06-171-0/+2
| | | | | | | | | | | | | Start consolidating CPU init related routines in spapr_cpu_core.c. As part of this, move spapr_cpu_init() and its dependencies from spapr.c to spapr_cpu_core.c No functionality change in this patch. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> [dwg: Rename TIMEBASE_FREQ to SPAPR_TIMEBASE_FREQ, since it's now in a public(ish) header] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Abstract CPU core device and type specific core devicesBharata B Rao2016-06-171-0/+1
| | | | | | | | | | | | | Add sPAPR specific abastract CPU core device that is based on generic CPU core device. Use this as base type to create sPAPR CPU specific core devices. TODO: - Add core types for other remaining CPU types - Handle CPU model alias correctly Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Ensure all LMBs are represented in ibm,dynamic-memoryBharata B Rao2016-06-141-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Memory hotplug can fail for some combinations of RAM and maxmem when DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends on maximum addressable memory returned by guest and this value is currently being calculated wrongly by the guest kernel routine memory_hotplug_max(). While there is an attempt to fix the guest kernel, this patch works around the problem within QEMU itself. memory_hotplug_max() routine in the guest kernel arrives at max addressable memory by multiplying lmb-size with the lmb-count obtained from ibm,dynamic-memory property. There are two assumptions here: - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM where only hot-pluggable LMBs are present in this property. - The memory area comprising of RAM and hotplug region is contiguous: This needn't be true always for PowerKVM as there can be gap between boot time RAM and hotplug region. To work around this guest kernel bug, ensure that ibm,dynamic-memory has information about all the LMBs (RMA, boot-time LMBs, future hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and hotpluggable region). RMA is represented separately by memory@0 node. Hence mark RMA LMBs and also the LMBs for the gap b/n RAM and hotpluggable region as reserved and as having no valid DRC so that these LMBs are not considered by the guest. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_iommu: Add root memory regionAlexey Kardashevskiy2016-06-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | We are going to have multiple DMA windows at different offsets on a PCI bus. For the sake of migration, we will have as many TCE table objects pre-created as many windows supported. So we need a way to map windows dynamically onto a PCI bus when migration of a table is completed but at this stage a TCE table object does not have access to a PHB to ask it to map a DMA window backed by just migrated TCE table. This adds a "root" memory region (UINT64_MAX long) to the TCE object. This new region is mapped on a PCI bus with enabled overlapping as there will be one root MR per TCE table, each of them mapped at 0. The actual IOMMU memory region is a subregion of the root region and a TCE table enables/disables this subregion and maps it at the specific offset inside the root MR which is 1:1 mapping of a PCI address space. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_iommu: Migrate full stateAlexey Kardashevskiy2016-06-071-0/+3
| | | | | | | | | | | | | | The source guest could have reallocated the default TCE table and migrate bigger/smaller table. This adds reallocation in post_load() if the default table size is different on source and destination. This adds @bus_offset, @page_shift to the migration stream as a subsection so when DDW is added, migration to older machines will still be possible. As @bus_offset and @page_shift are not used yet, this makes no change in behavior. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_iommu: Introduce "enabled" state for TCE tableAlexey Kardashevskiy2016-06-071-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently TCE tables are created once at start and their sizes never change. We are going to change that by introducing a Dynamic DMA windows support where DMA configuration may change during the guest execution. This changes spapr_tce_new_table() to create an empty zero-size IOMMU memory region (IOMMU MR). Only LIOBN is assigned by the time of creation. It still will be called once at the owner object (VIO or PHB) creation. This introduces an "enabled" state for TCE table objects, some helper functions are added: - spapr_tce_table_enable() receives TCE table parameters, stores in sPAPRTCETable and allocates a guest view of the TCE table (in the user space or KVM) and sets the correct size on the IOMMU MR; - spapr_tce_table_disable() disposes the table and resets the IOMMU MR size; it is made public as the following DDW code will be using it. This changes the PHB reset handler to do the default DMA initialization instead of spapr_phb_realize(). This does not make differenct now but later with more than just one DMA window, we will have to remove them all and create the default one on a system reset. No visible change in behaviour is expected except the actual table will be reallocated every reset. We might optimize this later. The other way to implement this would be dynamically create/remove the TCE table QOM objects but this would make migration impossible as the migration code expects all QOM objects to exist at the receiver so we have to have TCE table objects created when migration begins. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* ppc: Rework POWER7 & POWER8 exception modelCédric Le Goater2016-04-051-5/+0
| | | | | | | | | | | | | | | | | | From: Benjamin Herrenschmidt <benh@kernel.crashing.org> This patch fixes the current AIL implementation for POWER8. The interrupt vector address can be calculated directly from LPCR when the exception is handled. The excp_prefix update becomes useless and we can cleanup the H_SET_MODE hcall. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: Removed LPES0/1 handling for HV vs. !HV Fixed LPCR_ILE case for POWERPC_EXCP_POWER8 ] Signed-off-by: Cédric Le Goater <clg@fr.ibm.com> [dwg: This was written as a cleanup, but it also fixes a real bug where setting an alternative interrupt location would not be correctly migrated] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* pseries: Simplify handling of the hash page table fdDavid Gibson2016-02-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | When migrating the 'pseries' machine type with KVM, we use a special fd to access the hash page table stored within KVM. Usually, this fd is opened at the beginning of migration, and kept open until the migration is complete. However, if there is a guest reset during the migration, the fd can become stale and we need to re-open it. At the moment we use an 'htab_fd_stale' flag in sPAPRMachineState to signal this, which is checked in the migration iterators. But that's rather ugly. It's simpler to just close and invalidate the fd on reset, and lazily re-open it in migration if necessary. This patch implements that change. This requires a small addition to the machine state's instance_init, so that htab_fd is initialized to -1 (telling the migration code it needs to open it) instead of 0, which could be a valid fd. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
* spapr: Remove rtas_st_buffer_direct()David Gibson2016-01-301-8/+0
| | | | | | | | | rtas_st_buffer_direct() is a not particularly useful wrapper around cpu_physical_memory_write(). All the callers are in rtas_ibm_configure_connector, where it's better handled by local helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
* spapr: Small fixes to rtas_ibm_get_system_parameter, remove rtas_st_bufferDavid Gibson2016-01-301-19/+9
| | | | | | | | | | | | | | | | | | rtas_st_buffer() appears in spapr.h as though it were a widely used helper, but in fact it is only used for saving data in a format used by rtas_ibm_get_system_parameter(). This changes it to a local helper more specifically for that function. While we're there fix a couple of small defects in rtas_ibm_get_system_parameter: - For the string value SPLPAR_CHARACTERISTICS, it wasn't including the terminating \0 in the length which it should according to LoPAPR 7.3.16.1 - It now checks that the supplied buffer has at least enough space for the length of the returned data, and returns an error if it does not. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
* hw/ppc/spapr: Use XHCI as host controller for new spapr machinesThomas Huth2016-01-111-1/+2
| | | | | | | | The OHCI has some bugs and performance issues, so for newer machines it's preferable to use XHCI instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_iommu: Provide a function to switch a TCE table to allowing VFIODavid Gibson2015-10-231-0/+2
| | | | | | | | | | | | | | | | | Because of the way non-VFIO guest IOMMU operations are KVM accelerated, not all TCE tables (guest IOMMU contexts) can support VFIO devices. Currently, this is decided at creation time. To support hotplug of VFIO devices, we need to allow a TCE table which previously didn't allow VFIO devices to be switched so that it can. This patch adds an spapr_tce_set_need_vfio() function to do this, by reallocating the table in userspace if necessary. Currently this doesn't allow the KVM acceleration to be re-enabled if all the VFIO devices are removed. That's an optimization for another time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
* spapr_iommu: Rename vfio_accel parameterDavid Gibson2015-10-231-2/+2
| | | | | | | | | | | | | | | | | | The vfio_accel parameter used when creating a new TCE table (guest IOMMU context) has a confusing name. What it really means is whether we need the TCE table created to be able to support VFIO devices. VFIO is relevant, because when available we use in-kernel acceleration of the TCE table, but that may not work with VFIO devices because updates to the table are handled in kernel, bypass qemu and so don't hit qemu's infrastructure for keeping the VFIO host IOMMU state in sync with the guest IOMMU state. Rename the parameter to "need_vfio" throughout. This is a cosmetic change, with no impact on the logic. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
* ppc/spapr: Implement H_RANDOM hypercall in QEMUThomas Huth2015-09-231-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PAPR interface defines a hypercall to pass high-quality hardware generated random numbers to guests. Recent kernels can already provide this hypercall to the guest if the right hardware random number generator is available. But in case the user wants to use another source like EGD, or QEMU is running with an older kernel, we should also have this call in QEMU, so that guests that do not support virtio-rng yet can get good random numbers, too. This patch now adds a new pseudo-device to QEMU that either directly provides this hypercall to the guest or is able to enable the in-kernel hypercall if available. The in-kernel hypercall can be enabled with the use-kvm property, e.g.: qemu-system-ppc64 -device spapr-rng,use-kvm=true For handling the hypercall in QEMU instead, a "RngBackend" is required since the hypercall should provide "good" random data instead of pseudo-random (like from a "simple" library function like rand() or g_random_int()). Since there are multiple RngBackends available, the user must select an appropriate back-end via the "rng" property of the device, e.g.: qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=gid0 \ -device spapr-rng,rng=gid0 ... See http://wiki.qemu-project.org/Features-Done/VirtIORNG for other example of specifying RngBackends. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Support hotplug by specifying DRC countBharata B Rao2015-09-231-2/+6
| | | | | | | | | | | | | | | | Support hotplug identifier type RTAS_LOG_V6_HP_ID_DRC_COUNT that allows hotplugging of DRCs by specifying the DRC count. While we are here, rename spapr_hotplug_req_add_event() to spapr_hotplug_req_add_by_index() spapr_hotplug_req_remove_event() to spapr_hotplug_req_remove_by_index() so that they match with spapr_hotplug_req_add_by_count(). Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Support ibm,dynamic-reconfiguration-memoryBharata B Rao2015-09-231-1/+14
| | | | | | | | | | | | | | | | | | | | Parse ibm,architecture.vec table obtained from the guest and enable memory node configuration via ibm,dynamic-reconfiguration-memory if guest supports it. This is in preparation to support memory hotplug for sPAPR guests. This changes the way memory node configuration is done. Currently all memory nodes are built upfront. But after this patch, only memory@0 node for RMA is built upfront. Guest kernel boots with just that and rest of the memory nodes (via memory@XXX or ibm,dynamic-reconfiguration-memory) are built when guest does ibm,client-architecture-support call. Note: This patch needs a SLOF enhancement which is already part of SLOF binary in QEMU. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Add LMB DR connectorsDavid Gibson2015-09-231-0/+1
| | | | | | | | | | | | | | | Enable memory hotplug for pseries 2.4 and add LMB DR connectors. With memory hotplug, enforce RAM size, NUMA node memory size and maxmem to be a multiple of SPAPR_MEMORY_BLOCK_SIZE (256M) since that's the granularity in which LMBs are represented and hot-added. LMB DR connectors will be used by the memory hotplug code. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> [spapr_drc_reset implementation] [since this missed the 2.4 cutoff, changing to only enable for 2.5] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Initialize hotplug memory address spaceBharata B Rao2015-09-231-0/+12
| | | | | | | | | | | | Initialize a hotplug memory region under which all the hotplugged memory is accommodated. Also enable memory hotplug by setting CONFIG_MEM_HOTPLUG. Modelled on i386 memory hotplug. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_drc: don't allow 'empty' DRCs to be unisolated or allocatedMichael Roth2015-09-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Logical resources start with allocation-state:UNUSABLE / isolation-state:ISOLATED. During hotplug, guests will transition them to allocation-state:USABLE, and then to isolation-state:UNISOLATED. For cases where we cannot transition to allocation-state:USABLE, in this case due to no device/resource being association with the logical DRC, we should return an error -3. For physical DRCs, we default to allocation-state:USABLE and stay there, so in this case we should report an error -3 when the guest attempts to make the isolation-state:ISOLATED transition for a DRC with no device associated. These are as documented in PAPR 2.7, 13.5.3.4. We also ensure allocation-state:USABLE when the guest attempts transition to isolation-state:UNISOLATED to deal with misbehaving guests attempting to bring online an unallocated logical resource. This is as documented in PAPR 2.7, 13.7. Currently we implement no such error logic. Fix this by handling these error cases as PAPR defines. Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* sPAPR: Introduce rtas_ldq()Gavin Shan2015-09-231-0/+5
| | | | | | | | | | | This introduces rtas_ldq() to load 64-bits parameter from continuous two 4-bytes memory chunk of RTAS parameter buffer, to simplify the code. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* ppc/spapr: Use qemu_log_mask() for hcall_dprintf()Thomas Huth2015-09-231-8/+3
| | | | | | | | | | | | | | | | | | | | | To see the output of the hcall_dprintf statements, you currently have to enable the DEBUG_SPAPR_HCALLS macro in include/hw/ppc/spapr.h. This is ugly because a) not every user who wants to debug guest problems can or wants to recompile QEMU to be able to see such issues, and b) since this macro is disabled by default, the code in the hcall_dprintf() brackets tends to bitrot until somebody temporarily enables that macro again. Since the hcall_dprintf statements except one indicate guest problems, let's always use qemu_log_mask(LOG_GUEST_ERROR, ...) for this macro instead. One spot indicated an unimplemented host feature, so this is changed into qemu_log_mask(LOG_UNIMP, ...) instead. Now it's possible to see all those messages by simply adding the CLI parameter "-d guest_errors,unimp", without the need to re-compile the binary. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Support ibm, lrdr-capacity device tree propertyBharata B Rao2015-07-071-0/+2
| | | | | | | | | | | | | | | Add support for ibm,lrdr-capacity since this is needed by the guest kernel to know about the possible hot-pluggable CPUs and Memory. With this, pseries kernels will start reporting correct maxcpus in /sys/devices/system/cpu/possible. Also define the minimum hotpluggable memory size as 256MB. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [agraf: Fix compile error on 32bit hosts] Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr: Add sPAPRMachineClassDavid Gibson2015-07-071-0/+15
| | | | | | | | | | | | Currently although we have an sPAPRMachineState descended from MachineState we don't have an sPAPRMAchineClass descended from MachineClass. So far it hasn't been needed, but several upcoming features are going to want it, so this patch creates a stub implementation. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr: Remove obsolete entry_point field from sPAPRMachineStateDavid Gibson2015-07-071-1/+1
| | | | | | | | | | | | | The sPAPRMachineState structure includes an entry_point field containing the initial PC value for starting the machine, even though this always has the value 0x100. I think this is a hangover from very early versions which bypassed the firmware when using -kernel. In any case it has no function now, so remove it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr: Remove obsolete ram_limit field from sPAPRMachineStateDavid Gibson2015-07-071-1/+0
| | | | | | | | | | | | | | | The ram_limit field was imported from sPAPREnvironment where it predates the machine's ram size being available generically from machine->ram_size. Worse, the existing code was inconsistent about where it got the ram size from. Sometimes it used spapr->ram_limit, sometimes the global 'ram_size' and sometimes a local 'ram_size' masking the global. This cleans up the code to consistently use machine->ram_size, eliminating spapr->ram_limit in the process. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr: Merge sPAPREnvironment into sPAPRMachineStateDavid Gibson2015-07-071-9/+24
| | | | | | | | | | | | | | | | | The code for -machine pseries maintains a global sPAPREnvironment structure which keeps track of general state information about the guest platform. This predates the existence of the MachineState structure, but performs basically the same function. Now that we have the generic MachineState, fold sPAPREnvironment into sPAPRMachineState, the pseries specific subclass of MachineState. This is mostly a matter of search and replace, although a few places which relied on the global spapr variable are changed to find the structure via qdev_get_machine(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr_events: event-scan RTAS interfaceTyrel Datwyler2015-06-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | We don't actually rely on this interface to surface hotplug events, and instead rely on the similar-but-interrupt-driven check-exception RTAS interface used for EPOW events. However, the existence of this interface is needed to ensure guest kernels initialize the event-reporting interfaces which will in turn be used by userspace tools to handle these events, so we implement this interface here. Since events surfaced by this call are mutually exclusive to those surfaced via check-exception, we also update the RTAS event queue code to accept a boolean to mark/filter for events accordingly. Events of this sort are not currently generated by QEMU, but the interface has been tested by surfacing hotplug events via event-scan in place of check-exception. Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr_events: re-use EPOW event infrastructure for hotplug eventsNathan Fontenot2015-06-031-1/+13
| | | | | | | | | | | | | | | | | | | | | | | This extends the data structures currently used to report EPOW events to guests via the check-exception RTAS interfaces to also include event types for hotplug/unplug events. This is currently undocumented and being finalized for inclusion in PAPR specification, but we implement this here as an extension for guest userspace tools to implement (existing guest kernels simply log these events via a sysfs interface that's read by rtas_errd, and current versions of rtas_errd/powerpc-utils already support the use of this mechanism for initiating hotplug operations). We also add support for queues of pending RTAS events, since in the case of hotplug there's chance for multiple events being in-flight at any point in time. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr_rtas: add ibm, configure-connector RTAS interfaceMichael Roth2015-06-031-0/+14
| | | | | | | | | | | | | | | | | | | | This interface is used to fetch an OF device-tree nodes that describes a newly-attached device to guest. It is called multiple times to walk the device-tree node and fetch individual properties into a 'workarea'/buffer provided by the guest. The device-tree is generated by QEMU and passed to an sPAPRDRConnector during the initial hotplug operation, and the state of these RTAS calls is tracked by the sPAPRDRConnector. When the last of these properties is successfully fetched, we report as special return value to the guest and transition the device to a 'configured' state on the QEMU/DRC side. See docs/specs/ppc-spapr-hotplug.txt for a complete description of this interface. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr: add rtas_st_buffer_direct() helperMichael Roth2015-06-031-2/+8
| | | | | | | | | | | | This is similar to the existing rtas_st_buffer(), but for cases where the guest is not expecting a length-encoded byte array. Namely, for calls where a "work area" buffer is used to pass around arbitrary fields/data. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr_rtas: add set-indicator RTAS interfaceMike Day2015-06-031-0/+11
| | | | | | | | | | | | | | | | This interface allows a guest to control various platform/device sensors. Initially, we only implement support necessary to control sensors that are required for hotplug: DR connector indicators/LEDs, resource allocation state, and resource isolation state. See docs/specs/ppc-spapr-hotplug.txt for a complete description of this interface. Signed-off-by: Mike Day <ncmike@ncultra.org> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* hw/ppc/spapr_iommu: Fix the check for invalid upper bits in liobnThomas Huth2015-06-031-1/+1
| | | | | | | | | | | The check "liobn & 0xFFFFFFFF00000000ULL" in spapr_tce_find_by_liobn() is completely useless since liobn is only declared as an uint32_t parameter. Fix this by using target_ulong instead (this is what most of the callers of this function are using, too). Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr_iommu: Make spapr_tce_find_by_liobn() publicAlexey Kardashevskiy2015-06-031-0/+1
| | | | | | | | | | | | | | At the moment spapr_tce_find_by_liobn() is used by H_PUT_TCE/... handlers to find an IOMMU by LIOBN. We are going to implement Dynamic DMA windows (DDW), new code will go to a new file and we will use spapr_tce_find_by_liobn() there too so let's make it public. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr_iommu: Add separate trace points for PCI DMA operationsAlexey Kardashevskiy2015-06-031-0/+1
| | | | | | | | This is to reduce VIO noise while debugging PCI DMA. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr_vio: Introduce a liobn number generating macrosAlexey Kardashevskiy2015-06-031-0/+1
| | | | | | | | | | | | | This introduces a macro which makes up a LIOBN from fixed prefix and VIO device address (@reg property). This is to keep LIOBN macros rendering consistent - the same macro for PCI has been added by the previous patch. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr_pci: Introduce a liobn number generating macrosAlexey Kardashevskiy2015-06-031-1/+3
| | | | | | | | | | | | | | | | | | We are going to have multiple DMA windows per PHB and we want them to migrate so we need a predictable way of assigning LIOBNs. This introduces a macro which makes up a LIOBN from fixed prefix, PHB index (unique PHB id) and window number. This introduces a SPAPR_PCI_DMA_WINDOW_NUM() to know the window number from LIOBN. It is used to distinguish the default 32bit windows from dynamic windows and avoid picking default DMA window properties from a wrong TCE table. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* sPAPR: Implement EEH RTAS callsGavin Shan2015-03-091-2/+41
| | | | | | | | | | | | | | | | | | | | | | | The emulation for EEH RTAS requests from guest isn't covered by QEMU yet and the patch implements them. The patch defines constants used by EEH RTAS calls and adds callbacks sPAPRPHBClass::{eeh_set_option, eeh_get_state, eeh_reset, eeh_configure}, which are going to be used as follows: * RTAS calls are received in spapr_pci.c, sanity check is done there. * RTAS handlers handle what they can. If there is something it cannot handle and the corresponding sPAPRPHBClass callback is defined, it is called. * Those callbacks are only implemented for VFIO now. They do ioctl() to the IOMMU container fd to complete the calls. Error codes from that ioctl() are transferred back to the guest. [aik: defined RTAS tokens for EEH RTAS calls] Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* pseries: Switch VGA endian on H_SET_MODEDavid Gibson2015-03-091-0/+1
| | | | | | | | | | | | | When the guest switches the interrupt endian mode, which essentially means a global machine endian switch, we want to change the VGA framebuffer endian mode as well in order to be backward compatible with existing guests who don't know about the new endian control register. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
* pseries: Move rtc_offset into RTC device's state structureDavid Gibson2015-03-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | The initial creation of the PAPR RTC qdev class left a wart - the rtc's offset was left in the sPAPREnvironment structure, accessed via a global. This patch moves it into the RTC device's own state structure, were it belongs. This requires a small change to the migration stream format. In order to handle incoming streams from older versions, we also need to retain the rtc_offset field in the sPAPREnvironment structure, so that it can be loaded into via the vmsd, then pushed into the RTC device. Since we're changing the migration format, this also takes the opportunity to: * Change the rtc offset from a value in seconds to a value in nanoseconds, allowing nanosecond offsets between host and guest rtc time, if desired. * Remove both the already unused "next_irq" field and now unused "rtc_offset" field from the new version of the spapr migration stream Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* pseries: Make the PAPR RTC a qdev deviceDavid Gibson2015-03-091-2/+5
| | | | | | | | | | | | | | | | | | At present the PAPR RTC isn't a "device" as such - it's accessed only via firmware/hypervisor calls, and is handled in the sPAPR core code. This becomes inconvenient as we extend it in various ways. This patch makes the PAPR RTC a separate device in the qemu device model. For now, the only piece of device state - the rtc_offset - is still kept in the global sPAPREnvironment structure. That's clearly wrong, but leaving it to be fixed in a following patch makes for a clearer separation between the internal re-organization of the device, and the behavioural changes (because the migration stream format needs to change slightly when the offset is moved into the device's own state). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* pseries: Add spapr_rtc_read() helper functionDavid Gibson2015-03-091-0/+1
| | | | | | | | | | | | | | The virtual RTC time is used in two places in the pseries machine. First is in the RTAS get-time-of-day function which returns the RTC time to the guest. Second is in the spapr events code which is used to timestamp event messages from the hypervisor to the guest. Currently both call qemu_get_timedate() directly, but we want to change that so we can properly handle the various -rtc options. In preparation, create a helper function to return the virtual RTC time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* pseries: Move sPAPR RTC code into its own fileDavid Gibson2015-03-091-0/+1
| | | | | | | | | | | At the moment the RTAS (firmware/hypervisor) time of day functions are implemented in spapr_rtas.c along with a bunch of other things. Since we're going to be expanding these a bit, move the RTAS RTC related code out into new file spapr_rtc.c. Also add its own initialization function, spapr_rtc_init() called from the main machine init routine. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr_vio/spapr_iommu: Move VIO bypass where it belongsAlexey Kardashevskiy2015-03-091-1/+1
| | | | | | | | | | | | | | | | | | | | Instead of tweaking a TCE table device by adding there a bypass flag, let's add an alias to RAM and IOMMU memory region, and enable/disable those according to the selected bypass mode. This way IOMMU memory region can have size of the actual window rather than ram_size which is essential for upcoming DDW support. This moves bypass logic to VIO layer and keeps @bypass flag in TCE table for migration compatibility only. This replaces spapr_tce_set_bypass() calls with explicit assignment to avoid confusion as the function could do something more that just syncing the @bypass flag. This adds a pointer to VIO device into the sPAPRTCETable struct to provide the sPAPRTCETable device a way to update bypass mode for the VIO device. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr: Fix stale HTAB during live migration (KVM)Samuel Mendoza-Jonas2015-01-071-0/+1
| | | | | | | | | | | If a guest reboots during a running migration, changes to the hash page table are not necessarily updated on the destination. Opening a new file descriptor to the HTAB forces the migration handler to resend the entire table. Signed-off-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr_pci: map the MSI window in each PHBGreg Kurz2014-09-081-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | On sPAPR, virtio devices are connected to the PCI bus and use MSI-X. Commit cc943c36faa192cd4b32af8fe5edb31894017d35 has modified MSI-X so that writes are made using the bus master address space and follow the IOMMU path. Unfortunately, the IOMMU address space address space does not have an MSI window: the notification is silently dropped in unassigned_mem_write instead of reaching the guest... The most visible effect is that all virtio devices are non-functional on sPAPR since then. :( This patch does the following: 1) map the MSI window into the IOMMU address space for each PHB - since each PHB instantiates its own IOMMU address space, we can safely map the window at a fixed address (SPAPR_PCI_MSI_WINDOW) - no real need to keep the MSI window setup in a separate function, the spapr_pci_msi_init() code moves to spapr_phb_realize(). 2) kill the global MSI window as it is not needed in the end Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr: Locate RTAS and device-tree based on real RMABenjamin Herrenschmidt2014-09-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently calculate the final RTAS and FDT location based on the early estimate of the RMA size, cropped to 256M on KVM since we only know the real RMA size at reset time which happens much later in the boot process. This means the FDT and RTAS end up right below 256M while they could be much higher, using precious RMA space and limiting what the OS bootloader can put there which has proved to be a problem with some OSes (such as when using very large initrd's) Fortunately, we do the actual copy of the device-tree into guest memory much later, during reset, late enough to be able to do it using the final RMA value, we just need to move the calculation to the right place. However, RTAS is still loaded too early, so we change the code to load the tiny blob into qemu memory early on, and then copy it into guest memory at reset time. It's small enough that the memory usage doesn't matter. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [aik: fixed errors from checkpatch.pl, defined RTAS_MAX_ADDR] Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [agraf: fix compilation on 32bit hosts] Signed-off-by: Alexander Graf <agraf@suse.de>