| Commit message (Collapse) | Author | Files | Lines |
|
Add basic tests for file-based migration.
Note that we cannot use test_precopy_common because that routine
expects it to be possible to run the migration live. With the file
transport there is no live migration because we must wait for the
source to finish writing the migration data to the file before the
destination can start reading. Add a new migration function
specifically to handle the file migration.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230712190742.22294-7-farosas@suse.de>
|
|
Add a smoke test that migrates to a file and gives it to the
script. It should catch the most annoying errors such as changes in
the ram flags.
After code has been merged it becomes way harder to figure out what is
causing the script to fail, the person making the change is the most
likely to know right away what the problem is.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231009184326.15777-7-farosas@suse.de>
|
|
The migration code uses unsigned values for 16, 32 and 64-bit
operations. Fix the script to do the same.
This was causing an issue when parsing the migration stream generated
on the ppc64 target because one of instance_ids was larger than the
32bit signed maximum:
Traceback (most recent call last):
File "/home/fabiano/kvm/qemu/build/scripts/analyze-migration.py", line 658, in <module>
dump.read(dump_memory = args.memory)
File "/home/fabiano/kvm/qemu/build/scripts/analyze-migration.py", line 592, in read
classdesc = self.section_classes[section_key]
KeyError: ('spapr_iommu', -2147483648)
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231009184326.15777-6-farosas@suse.de>
|
|
The script is currently broken when the x-ignore-shared capability is
used:
Traceback (most recent call last):
File "./scripts/analyze-migration.py", line 656, in <module>
dump.read(dump_memory = args.memory)
File "./scripts/analyze-migration.py", line 593, in read
section.read()
File "./scripts/analyze-migration.py", line 163, in read
self.name = self.file.readstr(len = namelen)
File "./scripts/analyze-migration.py", line 53, in readstr
return self.readvar(len).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 55: invalid start byte
We're currently adding data to the middle of the ram section depending
on the presence of the capability. As a consequence, any code loading
the ram section needs to know about capabilities so it can interpret
the stream.
Skip the byte that's added when x-ignore-shared is used to fix the
script.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231009184326.15777-5-farosas@suse.de>
|
|
The script is broken when the configuration/capabilities section is
present. Add support for parsing the capabilities so we can fix it in
the next patch.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231009184326.15777-4-farosas@suse.de>
|
|
The 'configuration' state subsections are currently not being parsed
and the script fails when analyzing an aarch64 stream:
Traceback (most recent call last):
File "./scripts/analyze-migration.py", line 625, in <module>
dump.read(dump_memory = args.memory)
File "./scripts/analyze-migration.py", line 571, in read
raise Exception("Unknown section type: %d" % section_type)
Exception: Unknown section type: 5
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231009184326.15777-3-farosas@suse.de>
|
|
Make the migration json writer part of MigrationState struct, allowing
the 'configuration' object be serialized to json.
This will facilitate the parsing of the 'configuration' object in the
next patch that fixes analyze-migration.py for arm.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231009184326.15777-2-farosas@suse.de>
|
|
qemu_ram_block_from_host() may return NULL, which will be dereferenced w/o
check. Usualy return value is checked for this function.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Dmitry Frolov <frolov@swemel.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231010104851.802947-1-frolov@swemel.ru>
|
|
Migration bandwidth is a very important value to live migration. It's
because it's one of the major factors that we'll make decision on when to
switchover to destination in a precopy process.
This value is currently estimated by QEMU during the whole live migration
process by monitoring how fast we were sending the data. This can be the
most accurate bandwidth if in the ideal world, where we're always feeding
unlimited data to the migration channel, and then it'll be limited to the
bandwidth that is available.
However in reality it may be very different, e.g., over a 10Gbps network we
can see query-migrate showing migration bandwidth of only a few tens of
MB/s just because there are plenty of other things the migration thread
might be doing. For example, the migration thread can be busy scanning
zero pages, or it can be fetching dirty bitmap from other external dirty
sources (like vhost or KVM). It means we may not be pushing data as much
as possible to migration channel, so the bandwidth estimated from "how many
data we sent in the channel" can be dramatically inaccurate sometimes.
With that, the decision to switchover will be affected, by assuming that we
may not be able to switchover at all with such a low bandwidth, but in
reality we can.
The migration may not even converge at all with the downtime specified,
with that wrong estimation of bandwidth, keeping iterations forever with a
low estimation of bandwidth.
The issue is QEMU itself may not be able to avoid those uncertainties on
measuing the real "available migration bandwidth". At least not something
I can think of so far.
One way to fix this is when the user is fully aware of the available
bandwidth, then we can allow the user to help providing an accurate value.
For example, if the user has a dedicated channel of 10Gbps for migration
for this specific VM, the user can specify this bandwidth so QEMU can
always do the calculation based on this fact, trusting the user as long as
specified. It may not be the exact bandwidth when switching over (in which
case qemu will push migration data as fast as possible), but much better
than QEMU trying to wildly guess, especially when very wrong.
A new parameter "avail-switchover-bandwidth" is introduced just for this.
So when the user specified this parameter, instead of trusting the
estimated value from QEMU itself (based on the QEMUFile send speed), it
trusts the user more by using this value to decide when to switchover,
assuming that we'll have such bandwidth available then.
Note that specifying this value will not throttle the bandwidth for
switchover yet, so QEMU will always use the full bandwidth possible for
sending switchover data, assuming that should always be the most important
way to use the network at that time.
This can resolve issues like "unconvergence migration" which is caused by
hilarious low "migration bandwidth" detected for whatever reason.
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231010221922.40638-1-peterx@redhat.com>
|
|
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011023627.86691-1-philmd@linaro.org>
|
|
Current migration_completion function is a bit long. Refactor the long
implementation into different subfunctions:
- migration_completion_precopy: completion code related to precopy
- migration_completion_postcopy: completion code related to postcopy
Rename await_return_path_close_on_source to
close_return_path_on_source: It is renamed to match with
open_return_path_on_source.
This improves readability and is easier for future updates (e.g. add new
subfunctions when completion code related to new features are needed). No
functional changes intended.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230804093053.5037-1-wei.w.wang@intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
|
|
This patch fixes invalid ufs register fields.
This fixes an issue reported by Bin Meng that
caused ufs to fail over riscv.
Fixes: bc4e68d362ec ("hw/ufs: Initial commit for emulated Universal-Flash-Storage")
Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>
Reported-by: Bin Meng <bmeng@tinylab.org>
Reviewed-by: Bin Meng <bmeng@tinylab.org>
Tested-by: Bin Meng <bmeng@tinylab.org>
|
|
Fixed four ufs-related coverity issues.
The coverity issues and fixes are as follows
1. CID 1519042: Security issue with the rand() function
Changed to use a fixed value (0xab) instead of rand() as
the value for testing
2. CID 1519043: Dereference after null check
Removed useless (redundant) null checks
3. CID 1519050: Out-of-bounds access issue
Fix to pass an array type variable to find_first_bit and
find_next_bit using DECLARE_BITMAP()
4. CID 1519051: Out-of-bounds read issue
Fix incorrect range check for lun
Fix coverity CID: 1519042 1519043 1519050 1519051
Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>
|
|
I haven't really been working on LoongArch for some time now,
so let's remove myself from this entry.
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20231012095135.1423071-1-yangxiaojuan@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
|
|
The system test shutdown uses the 'loongarch_virt_pm' region.
We can use the write AcpiFadtData.sleep_clt register to realize the shutdown.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-ID: <20231012072351.1409344-1-gaosong@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
|
|
The LoongArch 'virt' machine doesn't use its ISA I/O region.
If a ISA device were to be mapped there, there is no support
for ISA IRQ. Unlikely useful. Simply remove.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20231010135342.40219-3-philmd@linaro.org>
Signed-off-by: Song Gao <gaosong@loongson.cn>
|
|
The LoongArch 'virt' machine doesn't use any ISA UART.
No need to build the device model, remove its Kconfig entry.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20231010135342.40219-2-philmd@linaro.org>
Signed-off-by: Song Gao <gaosong@loongson.cn>
|