summary refs log tree commit diff stats
Commit message (Collapse)AuthorAgeFilesLines
...
| * target/ppc: cpu_init: Move Timebase registration into the common functionFabiano Rosas2022-02-181-80/+18
| | | | | | | | | | | | | | | | | | | | Now that the 601 was removed, all of our CPUs have a timebase, so that can be moved into the common function. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20220216162426.1885923-5-farosas@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * target/ppc: cpu_init: Group registration of generic SPRsFabiano Rosas2022-02-181-26/+32
| | | | | | | | | | | | | | | | | | | | | | The top level init_proc calls register_generic_sprs but also registers some other SPRs outside of that function. Let's group everything into a single place. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20220216162426.1885923-4-farosas@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * target/ppc: cpu_init: Remove G2LE init codeFabiano Rosas2022-02-181-41/+1
| | | | | | | | | | | | | | | | | | | | The G2LE CPU initialization code is the same as the G2. Use the latter for both. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20220216162426.1885923-3-farosas@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * target/ppc: cpu_init: Remove not implemented commentsFabiano Rosas2022-02-181-329/+253
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The /* XXX : not implemented */ comments all over cpu_init are confusing and ambiguous. Do they mean not implemented by QEMU, not implemented in a specific access mode? Not implemented by the CPU? Do they apply to just the register right after or to a whole block? Do they mean we have an action to take in the future to implement these? Are they only informative? Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20220216162426.1885923-2-farosas@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * spapr: implement nested-hv capability for the virtual hypervisorNicholas Piggin2022-02-185-11/+452
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This implements the Nested KVM HV hcall API for spapr under TCG. The L2 is switched in when the H_ENTER_NESTED hcall is made, and the L1 is switched back in returned from the hcall when a HV exception is sent to the vhyp. Register state is copied in and out according to the nested KVM HV hcall API specification. The hdecr timer is started when the L2 is switched in, and it provides the HDEC / 0x980 return to L1. The MMU re-uses the bare metal radix 2-level page table walker by using the get_pate method to point the MMU to the nested partition table entry. MMU faults due to partition scope errors raise HV exceptions and accordingly are routed back to the L1. The MMU does not tag translations for the L1 (direct) vs L2 (nested) guests, so the TLB is flushed on any L1<->L2 transition (hcall entry and exit). Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> [ clg: checkpatch fixes ] Message-Id: <20220216102545.1808018-10-npiggin@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * target/ppc: Introduce a vhyp framework for nested HV supportNicholas Piggin2022-02-185-13/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce virtual hypervisor methods that can support a "Nested KVM HV" implementation using the bare metal 2-level radix MMU, and using HV exceptions to return from H_ENTER_NESTED (rather than cause interrupts). HV exceptions can now be raised in the TCG spapr machine when running a nested KVM HV guest. The main ones are the lev==1 syscall, the hdecr, hdsi and hisi, hv fu, and hv emu, and h_virt external interrupts. HV exceptions are intercepted in the exception handler code and instead of causing interrupts in the guest and switching the machine to HV mode, they go to the vhyp where it may exit the H_ENTER_NESTED hcall with the interrupt vector numer as return value as required by the hcall API. Address translation is provided by the 2-level page table walker that is implemented for the bare metal radix MMU. The partition scope page table is pointed to the L1's partition scope by the get_pate vhc method. Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20220216102545.1808018-9-npiggin@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * target/ppc: Add powerpc_reset_excp_state helperNicholas Piggin2022-02-181-20/+22
| | | | | | | | | | | | | | | | | | | | | | This moves the logic to reset the QEMU exception state into its own function. Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [ clg: checkpatch fixes ] Message-Id: <20220216102545.1808018-8-npiggin@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * target/ppc: add helper for books vhyp hypercall handlerNicholas Piggin2022-02-181-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | The virtual hypervisor currently always intercepts and handles hypercalls but with a future change this will not always be the case. Add a helper for the test so the logic is abstracted from the mechanism. Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20220216102545.1808018-7-npiggin@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * target/ppc: make vhyp get_pate method take lpid and return successNicholas Piggin2022-02-183-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In prepartion for implementing a full partition table option for vhyp, update the get_pate method to take an lpid and return a success/fail indicator. The spapr implementation currently just asserts lpid is always 0 and always return success. Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [ clg: checkpatch fixes ] Message-Id: <20220216102545.1808018-6-npiggin@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * target/ppc: add vhyp addressing mode helper for radix MMUNicholas Piggin2022-02-181-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | The radix on vhyp MMU uses a single-level radix table walk, with the partition scope mapping provided by the flat QEMU machine memory. A subsequent change will use the two-level radix walk on vhyp in some situations, so provide a helper which can abstract that logic. Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20220216102545.1808018-5-npiggin@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * ppc: allow the hdecr timer to be created/destroyedNicholas Piggin2022-02-182-0/+24
| | | | | | | | | | | | | | | | | | | | | | Machines which don't emulate the HDEC facility are able to use the timer for something else. Provide functions to start and stop the hdecr timer. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [ clg: checkpatch fixes ] Message-Id: <20220216102545.1808018-4-npiggin@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * spapr: prevent hdec timer being set up under virtual hypervisorNicholas Piggin2022-02-182-4/+4
| | | | | | | | | | | | | | | | | | | | | | The spapr virtual hypervisor does not require the hdecr timer. Remove it. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20220216102545.1808018-3-npiggin@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * target/ppc: raise HV interrupts for partition table entry problemsNicholas Piggin2022-02-181-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Invalid or missing partition table entry exceptions should cause HV interrupts. HDSISR is set to bad MMU config, which is consistent with the ISA and experimentally matches what POWER9 generates. Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [ clg: checkpatch fixes ] Message-Id: <20220216102545.1808018-2-npiggin@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * spapr: nvdimm: Introduce spapr-nvdimm deviceShivaprasad G Bhat2022-02-181-0/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the device backend is not persistent memory for the nvdimm, there is need for explicit IO flushes on the backend to ensure persistence. On SPAPR, the issue is addressed by adding a new hcall to request for an explicit flush from the guest when the backend is not pmem. So, the approach here is to convey when the hcall flush is required in a device tree property. The guest once it knows the device backend is not pmem, makes the hcall whenever flush is required. To set the device tree property, a new PAPR specific device type inheriting the nvdimm device is implemented. When the backend doesn't have pmem=on the device tree property "ibm,hcall-flush-required" is set, and the guest makes hcall H_SCM_FLUSH requesting for an explicit flush. The new device has boolean property pmem-override which when "on" advertises the device tree property even when pmem=on for the backend. The flush function invokes the fdatasync or pmem_persist() based on the type of backend. The vmstate structures are made part of the spapr-nvdimm device object. The patch attempts to keep the migration compatibility between source and destination while rejecting the incompatibles ones with failures. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <164396256092.109112.17933240273840803354.stgit@ltczzess4.aus.stglabs.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * spapr: nvdimm: Implement H_SCM_FLUSH hcallShivaprasad G Bhat2022-02-184-1/+266
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds support for the SCM flush hcall for the nvdimm devices. To be available for exploitation by guest through the next patch. The hcall is applicable only for new SPAPR specific device class which is also introduced in this patch. The hcall expects the semantics such that the flush to return with H_LONG_BUSY_ORDER_10_MSEC when the operation is expected to take longer time along with a continue_token. The hcall to be called again by providing the continue_token to get the status. So, all fresh requests are put into a 'pending' list and flush worker is submitted to the thread pool. The thread pool completion callbacks move the requests to 'completed' list, which are cleaned up after collecting the return status for the guest in subsequent hcall from the guest. The semantics makes it necessary to preserve the continue_tokens and their return status across migrations. So, the completed flush states are forwarded to the destination and the pending ones are restarted at the destination in post_load. The necessary nvdimm flush specific vmstate structures are also introduced in this patch which are to be saved in the new SPAPR specific nvdimm device to be introduced in the following patch. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <164396254862.109112.16675611182159105748.stgit@ltczzess4.aus.stglabs.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
| * nvdimm: Add realize, unrealize callbacks to NVDIMMDevice classShivaprasad G Bhat2022-02-184-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | A new subclass inheriting NVDIMMDevice is going to be introduced in subsequent patches. The new subclass uses the realize and unrealize callbacks. Add them on NVDIMMClass to appropriately call them as part of plug-unplug. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Acked-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <164396253158.109112.1926755104259023743.stgit@ltczzess4.aus.stglabs.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
* | Merge remote-tracking branch ↵Peter Maydell2022-02-1934-132/+1123
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20220217b' into staging V3: virtiofs pull 2022-02-17 Security label improvements from Vivek - includes a fix for building against new kernel headers [V3: checkpatch style fixes] [V2: Fix building on old Linux] Blocking flock disable from Sebastian SYNCFS support from Greg Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> # gpg: Signature made Thu 17 Feb 2022 17:24:25 GMT # gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full] # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * remotes/dgilbert-gitlab/tags/pull-virtiofs-20220217b: virtiofsd: Add basic support for FUSE_SYNCFS request virtiofsd: Add an option to enable/disable security label virtiofsd: Create new file using O_TMPFILE and set security context virtiofsd: Create new file with security context virtiofsd: Add helpers to work with /proc/self/task/tid/attr/fscreate virtiofsd: Move core file creation code in separate function virtiofsd, fuse_lowlevel.c: Add capability to parse security context virtiofsd: Extend size of fuse_conn_info->capable and ->want fields virtiofsd: Parse extended "struct fuse_init_in" linux-headers: Update headers to v5.17-rc1 virtiofsd: Fix breakage due to fuse_init_in size change virtiofsd: Do not support blocking flock Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * | virtiofsd: Add basic support for FUSE_SYNCFS requestGreg Kurz2022-02-174-0/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Honor the expected behavior of syncfs() to synchronously flush all data and metadata to disk on linux systems. If virtiofsd is started with '-o announce_submounts', the client is expected to send a FUSE_SYNCFS request for each individual submount. In this case, we just create a new file descriptor on the submount inode with lo_inode_open(), call syncfs() on it and close it. The intermediary file is needed because O_PATH descriptors aren't backed by an actual file and syncfs() would fail with EBADF. If virtiofsd is started without '-o announce_submounts' or if the client doesn't have the FUSE_CAP_SUBMOUNTS capability, the client only sends a single FUSE_SYNCFS request for the root inode. The server would thus need to track submounts internally and call syncfs() on each of them. This will be implemented later. Note that syncfs() might suffer from a time penalty if the submounts are being hammered by some unrelated workload on the host. The only solution to prevent that is to avoid shared mounts. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20220215181529.164070-2-groug@kaod.org> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
| * | virtiofsd: Add an option to enable/disable security labelVivek Goyal2022-02-173-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide an option "-o security_label/no_security_label" to enable/disable security label functionality. By default these are turned off. If enabled, server will indicate to client that it is capable of handling one security label during file creation. Typically this is expected to be a SELinux label. File server will set this label on the file. It will try to set it atomically wherever possible. But its not possible in all the cases. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-11-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
| * | virtiofsd: Create new file using O_TMPFILE and set security contextVivek Goyal2022-02-171-8/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If guest and host policies can't work with each other, then guest security context (selinux label) needs to be set into an xattr. Say remap guest security.selinux xattr to trusted.virtiofs.security.selinux. That means setting "fscreate" is not going to help as that's ony useful for security.selinux xattr on host. So we need another method which is atomic. Use O_TMPFILE to create new file, set xattr and then linkat() to proper place. But this works only for regular files. So dir, symlinks will continue to be non-atomic. Also if host filesystem does not support O_TMPFILE, we fallback to non-atomic behavior. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-10-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
| * | virtiofsd: Create new file with security contextVivek Goyal2022-02-171-29/+200
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for creating new file with security context as sent by client. It basically takes three paths. - If no security context enabled, then it continues to create files without security context. - If security context is enabled and but security.selinux has not been remapped, then it uses /proc/thread-self/attr/fscreate knob to set security context and then create the file. This will make sure that newly created file gets the security context as set in "fscreate" and this is atomic w.r.t file creation. This is useful and host and guest SELinux policies don't conflict and can work with each other. In that case, guest security.selinux xattr is not remapped and it is passthrough as "security.selinux" xattr on host. - If security context is enabled but security.selinux xattr has been remapped to something else, then it first creates the file and then uses setxattr() to set the remapped xattr with the security context. This is a non-atomic operation w.r.t file creation. This mode will be most versatile and allow host and guest to have their own separate SELinux xattrs and have their own separate SELinux policies. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-9-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
| * | virtiofsd: Add helpers to work with /proc/self/task/tid/attr/fscreateVivek Goyal2022-02-171-0/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Soon we will be able to create and also set security context on the file atomically using /proc/self/task/tid/attr/fscreate knob. If this knob is available on the system, first set the knob with the desired context and then create the file. It will be created with the context set in fscreate. This works basically for SELinux and its per thread. This patch just introduces the helper functions. Subsequent patches will make use of these helpers. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-8-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Manually merged gettid syscall number fixup from Vivek
| * | virtiofsd: Move core file creation code in separate functionVivek Goyal2022-02-171-11/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move core file creation bits in a separate function. Soon this is going to get more complex as file creation need to set security context also. And there will be multiple modes of file creation in next patch. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-7-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
| * | virtiofsd, fuse_lowlevel.c: Add capability to parse security contextVivek Goyal2022-02-173-1/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add capability to enable and parse security context as sent by client and put into fuse_req. Filesystems now can get security context from request and set it on files during creation. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-6-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
| * | virtiofsd: Extend size of fuse_conn_info->capable and ->want fieldsVivek Goyal2022-02-172-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ->capable keeps track of what capabilities kernel supports and ->wants keep track of what capabilities filesytem wants. Right now these fields are 32bit in size. But now fuse has run out of bits and capabilities can now have bit number which are higher than 31. That means 32 bit fields are not suffcient anymore. Increase size to 64 bit so that we can add newer capabilities and still be able to use existing code to check and set the capabilities. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-5-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
| * | virtiofsd: Parse extended "struct fuse_init_in"Vivek Goyal2022-02-171-22/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add some code to parse extended "struct fuse_init_in". And use a local variable "flag" to represent 64 bit flags. This will make it easier to add more features without having to worry about two 32bit flags (->flags and ->flags2) in "fuse_struct_in". Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-4-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Fixed up long line
| * | linux-headers: Update headers to v5.17-rc1Vivek Goyal2022-02-1726-76/+469
| | | | | | | | | | | | | | | | | | | | | | | | | | | Update headers to 5.17-rc1. I need latest fuse changes. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-3-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
| * | virtiofsd: Fix breakage due to fuse_init_in size changeVivek Goyal2022-02-171-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kernel version 5.17 has increased the size of "struct fuse_init_in" struct. Previously this struct was 16 bytes and now it has been extended to 64 bytes in size. Once qemu headers are updated to latest, it will expect to receive 64 byte size struct (for protocol version major 7 and minor > 6). But if guest is booting older kernel (older than 5.17), then it still sends older fuse_init_in of size 16 bytes. And do_init() fails. It is expecting 64 byte struct. And this results in mount of virtiofs failing. Fix this by parsing 16 bytes only for now. Separate patches will be posted which will parse rest of the bytes and enable new functionality. Right now we don't support any of the new functionality, so we don't lose anything by not parsing bytes beyond 16. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-2-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
| * | virtiofsd: Do not support blocking flockSebastian Hasler2022-02-161-0/+9
| |/ | | | | | | | | | | | | | | | | | | | | | | With the current implementation, blocking flock can lead to deadlock. Thus, it's better to return EOPNOTSUPP if a user attempts to perform a blocking flock request. Signed-off-by: Sebastian Hasler <sebastian.hasler@stuvus.uni-stuttgart.de> Message-Id: <20220113153249.710216-1-sebastian.hasler@stuvus.uni-stuttgart.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>
* | Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20220217' ↵Peter Maydell2022-02-197-92/+96
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging 9pfs: fixes and cleanup * Fifth patch fixes a 9pfs server crash that happened on some systems due to incorrect (system dependant) handling of struct dirent size. * Tests: Second patch fixes a test error that happened on some systems due mkdir() being called twice for creating the test directory for the 9p 'local' tests. * Tests: Third patch fixes a memory leak. * Tests: The remaining two patches are code cleanup. # gpg: Signature made Thu 17 Feb 2022 16:19:25 GMT # gpg: using RSA key 96D8D110CF7AF8084F88590134C2B58765A47395 # gpg: issuer "qemu_oss@crudebyte.com" # gpg: Good signature from "Christian Schoenebeck <qemu_oss@crudebyte.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: ECAB 1A45 4014 1413 BA38 4926 30DB 47C3 A012 D5F4 # Subkey fingerprint: 96D8 D110 CF7A F808 4F88 5901 34C2 B587 65A4 7395 * remotes/cschoenebeck/tags/pull-9p-20220217: 9pfs: Fix segfault in do_readdir_many caused by struct dirent overread tests/9pfs: Use g_autofree and g_autoptr where possible tests/9pfs: Fix leak of local_test_path tests/9pfs: fix mkdir() being called twice tests/9pfs: use g_autofree where possible Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * 9pfs: Fix segfault in do_readdir_many caused by struct dirent overreadVitaly Chikunov2022-02-175-5/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `struct dirent' returned from readdir(3) could be shorter (or longer) than `sizeof(struct dirent)', thus memcpy of sizeof length will overread into unallocated page causing SIGSEGV. Example stack trace: #0 0x00005555559ebeed v9fs_co_readdir_many (/usr/bin/qemu-system-x86_64 + 0x497eed) #1 0x00005555559ec2e9 v9fs_readdir (/usr/bin/qemu-system-x86_64 + 0x4982e9) #2 0x0000555555eb7983 coroutine_trampoline (/usr/bin/qemu-system-x86_64 + 0x963983) #3 0x00007ffff73e0be0 n/a (n/a + 0x0) While fixing this, provide a helper for any future `struct dirent' cloning. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/841 Cc: qemu-stable@nongnu.org Co-authored-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Vitaly Chikunov <vt@altlinux.org> Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Acked-by: Greg Kurz <groug@kaod.org> Tested-by: Vitaly Chikunov <vt@altlinux.org> Message-Id: <20220216181821.3481527-1-vt@altlinux.org> [C.S. - Fix typo in source comment. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
| * tests/9pfs: Use g_autofree and g_autoptr where possibleGreg Kurz2022-02-171-9/+4
| | | | | | | | | | | | | | | | | | | | It is recommended to use g_autofree or g_autoptr as it reduces the odds of introducing memory leaks in future changes. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20220201151508.190035-3-groug@kaod.org> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
| * tests/9pfs: Fix leak of local_test_pathGreg Kurz2022-02-171-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | local_test_path is allocated in virtio_9p_create_local_test_dir() to hold the path of the temporary directory. It should be freed in virtio_9p_remove_local_test_dir() when the temporary directory is removed. Clarify the lifecycle of local_test_path while here. Based-on: <f6602123c6f7d0d593466231b04fba087817abbd.1642879848.git.qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20220201151508.190035-2-groug@kaod.org> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
| * tests/9pfs: fix mkdir() being called twiceChristian Schoenebeck2022-02-171-15/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 9p test cases use mkdtemp() to create a temporary directory for running the 'local' 9p tests with real files/dirs. Unlike mktemp() which only generates a unique file name, mkdtemp() also creates the directory, therefore the subsequent mkdir() was wrong and caused errors on some systems. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Fixes: 136b7af2 (tests/9pfs: fix test dir for parallel tests) Reported-by: Daniel P. Berrangé <berrange@redhat.com> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/832 Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Greg Kurz <Greg Kurz <groug@kaod.org> Message-Id: <f6602123c6f7d0d593466231b04fba087817abbd.1642879848.git.qemu_oss@crudebyte.com>
| * tests/9pfs: use g_autofree where possibleChristian Schoenebeck2022-02-171-63/+27
|/ | | | | | Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <E1mn1fA-0005qZ-TM@lizzy.crudebyte.com>
* Merge remote-tracking branch ↵Peter Maydell2022-02-1627-370/+3252
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'remotes/alistair/tags/pull-riscv-to-apply-20220216' into staging Fourth RISC-V PR for QEMU 7.0 * Remove old Ibex PLIC header file * Allow writing 8 bytes with generic loader * Fixes for RV128 * Refactor RISC-V CPU configs * Initial support for XVentanaCondOps custom extension * Fix for vill field in vtype * Fix trap cause for RV32 HS-mode CSR access from RV64 HS-mode * Support for svnapot, svinval and svpbmt extensions # gpg: Signature made Wed 16 Feb 2022 06:24:52 GMT # gpg: using RSA key F6C4AC46D4934868D3B8CE8F21E10D29DF977054 # gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [full] # Primary key fingerprint: F6C4 AC46 D493 4868 D3B8 CE8F 21E1 0D29 DF97 7054 * remotes/alistair/tags/pull-riscv-to-apply-20220216: (35 commits) docs/system: riscv: Update description of CPU target/riscv: add support for svpbmt extension target/riscv: add support for svinval extension target/riscv: add support for svnapot extension target/riscv: add PTE_A/PTE_D/PTE_U bits check for inner PTE target/riscv: Ignore reserved bits in PTE for RV64 hw/intc: Add RISC-V AIA APLIC device emulation target/riscv: Allow users to force enable AIA CSRs in HART hw/riscv: virt: Use AIA INTC compatible string when available target/riscv: Implement AIA IMSIC interface CSRs target/riscv: Implement AIA xiselect and xireg CSRs target/riscv: Implement AIA mtopi, stopi, and vstopi CSRs target/riscv: Implement AIA interrupt filtering CSRs target/riscv: Implement AIA hvictl and hviprioX CSRs target/riscv: Implement AIA CSRs for 64 local interrupts on RV32 target/riscv: Implement AIA local interrupt priorities target/riscv: Allow AIA device emulation to set ireg rmw callback target/riscv: Add defines for AIA CSRs target/riscv: Add AIA cpu feature target/riscv: Allow setting CPU feature from machine/device emulation ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * docs/system: riscv: Update description of CPUYu Li2022-02-161-3/+3
| | | | | | | | | | | | | | | | | | | | | | Since the hypervisor extension been non experimental and enabled for default CPU, the previous command is no longer available and the option `x-h=true` or `h=true` is also no longer required. Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <9040401e-8f87-ef4a-d840-6703f08d068c@bytedance.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * target/riscv: add support for svpbmt extensionWeiwei Li2022-02-163-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | - add PTE_PBMT bits: It uses two PTE bits, but otherwise has no effect on QEMU, since QEMU is sequentially consistent and doesn't model PMAs currently - add PTE_PBMT bit check for inner PTE Signed-off-by: Weiwei Li <liweiwei@iscas.ac.cn> Signed-off-by: Junqiang Wang <wangjunqiang@iscas.ac.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20220204022658.18097-6-liweiwei@iscas.ac.cn> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * target/riscv: add support for svinval extensionWeiwei Li2022-02-165-0/+85
| | | | | | | | | | | | | | | | | | | | | | | | - sinval.vma, hinval.vvma and hinval.gvma do the same as sfence.vma, hfence.vvma and hfence.gvma except extension check - do nothing other than extension check for sfence.w.inval and sfence.inval.ir Signed-off-by: Weiwei Li <liweiwei@iscas.ac.cn> Signed-off-by: Junqiang Wang <wangjunqiang@iscas.ac.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20220204022658.18097-5-liweiwei@iscas.ac.cn> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * target/riscv: add support for svnapot extensionWeiwei Li2022-02-163-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | - add PTE_N bit - add PTE_N bit check for inner PTE - update address translation to support 64KiB continuous region (napot_bits = 4) Signed-off-by: Weiwei Li <liweiwei@iscas.ac.cn> Signed-off-by: Junqiang Wang <wangjunqiang@iscas.ac.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20220204022658.18097-4-liweiwei@iscas.ac.cn> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * target/riscv: add PTE_A/PTE_D/PTE_U bits check for inner PTEWeiwei Li2022-02-161-0/+3
| | | | | | | | | | | | | | | | | | | | | | For non-leaf PTEs, the D, A, and U bits are reserved for future standard use. Signed-off-by: Weiwei Li <liweiwei@iscas.ac.cn> Signed-off-by: Junqiang Wang <wangjunqiang@iscas.ac.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20220204022658.18097-3-liweiwei@iscas.ac.cn> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * target/riscv: Ignore reserved bits in PTE for RV64Guo Ren2022-02-163-1/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Highest bits of PTE has been used for svpbmt, ref: [1], [2], so we need to ignore them. They cannot be a part of ppn. 1: The RISC-V Instruction Set Manual, Volume II: Privileged Architecture 4.4 Sv39: Page-Based 39-bit Virtual-Memory System 4.5 Sv48: Page-Based 48-bit Virtual-Memory System 2: https://github.com/riscv/virtual-memory/blob/main/specs/663-Svpbmt-diff.pdf Signed-off-by: Guo Ren <ren_guo@c-sky.com> Reviewed-by: Liu Zhiwei <zhiwei_liu@c-sky.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Cc: Bin Meng <bmeng.cn@gmail.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20220204022658.18097-2-liweiwei@iscas.ac.cn> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * hw/intc: Add RISC-V AIA APLIC device emulationAnup Patel2022-02-164-0/+1061
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RISC-V AIA (Advanced Interrupt Architecture) defines a new interrupt controller for wired interrupts called APLIC (Advanced Platform Level Interrupt Controller). The APLIC is capabable of forwarding wired interupts to RISC-V HARTs directly or as MSIs (Message Signaled Interupts). This patch adds device emulation for RISC-V AIA APLIC. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Anup Patel <anup@brainfault.org> Reviewed-by: Frank Chang <frank.chang@sifive.com> Message-id: 20220204174700.534953-19-anup@brainfault.org Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * target/riscv: Allow users to force enable AIA CSRs in HARTAnup Patel2022-02-162-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | We add "x-aia" command-line option for RISC-V HART using which allows users to force enable CPU AIA CSRs without changing the interrupt controller available in RISC-V machine. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Anup Patel <anup@brainfault.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Frank Chang <frank.chang@sifive.com> Message-id: 20220204174700.534953-18-anup@brainfault.org Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * hw/riscv: virt: Use AIA INTC compatible string when availableAnup Patel2022-02-161-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | We should use the AIA INTC compatible string in the CPU INTC DT nodes when the CPUs support AIA feature. This will allow Linux INTC driver to use AIA local interrupt CSRs. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Anup Patel <anup@brainfault.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Frank Chang <frank.chang@sifive.com> Message-id: 20220204174700.534953-17-anup@brainfault.org Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * target/riscv: Implement AIA IMSIC interface CSRsAnup Patel2022-02-161-0/+203
| | | | | | | | | | | | | | | | | | | | | | | | The AIA specification defines IMSIC interface CSRs for easy access to the per-HART IMSIC registers without using indirect xiselect and xireg CSRs. This patch implements the AIA IMSIC interface CSRs. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Anup Patel <anup@brainfault.org> Reviewed-by: Frank Chang <frank.chang@sifive.com> Message-id: 20220204174700.534953-16-anup@brainfault.org Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * target/riscv: Implement AIA xiselect and xireg CSRsAnup Patel2022-02-163-0/+187
| | | | | | | | | | | | | | | | | | | | | | | | The AIA specification defines [m|s|vs]iselect and [m|s|vs]ireg CSRs which allow indirect access to interrupt priority arrays and per-HART IMSIC registers. This patch implements AIA xiselect and xireg CSRs. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Anup Patel <anup@brainfault.org> Reviewed-by: Frank Chang <frank.chang@sifive.com> Message-id: 20220204174700.534953-15-anup@brainfault.org Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * target/riscv: Implement AIA mtopi, stopi, and vstopi CSRsAnup Patel2022-02-161-0/+156
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The AIA specification introduces new [m|s|vs]topi CSRs for reporting pending local IRQ number and associated IRQ priority. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Anup Patel <anup@brainfault.org> Reviewed-by: Frank Chang <frank.chang@sifive.com> Message-id: 20220204174700.534953-14-anup@brainfault.org [ Changed by AF: - Fixup indentation ] Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * target/riscv: Implement AIA interrupt filtering CSRsAnup Patel2022-02-161-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The AIA specificaiton adds interrupt filtering support for M-mode and HS-mode. Using AIA interrupt filtering M-mode and H-mode can take local interrupt 13 or above and selectively inject same local interrupt to lower privilege modes. At the moment, we don't have any local interrupts above 12 so we add dummy implementation (i.e. read zero and ignore write) of AIA interrupt filtering CSRs. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Anup Patel <anup@brainfault.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Frank Chang <frank.chang@sifive.com> Message-id: 20220204174700.534953-13-anup@brainfault.org Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * target/riscv: Implement AIA hvictl and hviprioX CSRsAnup Patel2022-02-163-1/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The AIA hvictl and hviprioX CSRs allow hypervisor to control interrupts visible at VS-level. This patch implements AIA hvictl and hviprioX CSRs. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Anup Patel <anup@brainfault.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Frank Chang <frank.chang@sifive.com> Message-id: 20220204174700.534953-12-anup@brainfault.org [ Changes by AF: - Fix possible unintilised variable error in rmw_sie() ] Signed-off-by: Alistair Francis <alistair.francis@wdc.com>