semantic: 0.984 other: 0.982 assembly: 0.982 boot: 0.980 socket: 0.976 vnc: 0.976 device: 0.974 instruction: 0.974 graphic: 0.973 network: 0.967 KVM: 0.963 mistranslation: 0.949 [Qemu-devel] [BUG] Migrate failes between boards with different PMC counts Hi all, Recently, I found migration failed when enable vPMU. migrate vPMU state was introduced in linux-3.10 + qemu-1.7. As long as enable vPMU, qemu will save / load the vmstate_msr_architectural_pmu(msr_global_ctrl) register during the migration. But global_ctrl generated based on cpuid(0xA), the number of general-purpose performance monitoring counters(PMC) can vary according to Intel SDN. The number of PMC presented to vm, does not support configuration currently, it depend on host cpuid, and enable all pmc defaultly at KVM. It cause migration to fail between boards with different PMC counts. The return value of cpuid (0xA) is different dur to cpu, according to Intel SDN,18-10 Vol. 3B: Note: The number of general-purpose performance monitoring counters (i.e. N in Figure 18-9) can vary across processor generations within a processor family, across processor families, or could be different depending on the configuration chosen at boot time in the BIOS regarding Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom processors; N =4 for processors based on the Nehalem microarchitecture; for processors based on the Sandy Bridge microarchitecture, N = 4 if Intel Hyper Threading Technology is active and N=8 if not active). Also I found, N=8 if HT is not active based on the broadwell,, such as CPU E7-8890 v4 @ 2.20GHz # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming tcp::8888 Completed 100 % qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. Aborted So make number of pmc configurable to vm ? Any better idea ? Regards, -Zhuang Yanying * Zhuangyanying (address@hidden) wrote: > Hi all, > > Recently, I found migration failed when enable vPMU. > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. > > As long as enable vPMU, qemu will save / load the > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the migration. > But global_ctrl generated based on cpuid(0xA), the number of general-purpose > performance > monitoring counters(PMC) can vary according to Intel SDN. The number of PMC > presented > to vm, does not support configuration currently, it depend on host cpuid, and > enable all pmc > defaultly at KVM. It cause migration to fail between boards with different > PMC counts. > > The return value of cpuid (0xA) is different dur to cpu, according to Intel > SDN,18-10 Vol. 3B: > > Note: The number of general-purpose performance monitoring counters (i.e. N > in Figure 18-9) > can vary across processor generations within a processor family, across > processor families, or > could be different depending on the configuration chosen at boot time in the > BIOS regarding > Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom processors; > N =4 for processors > based on the Nehalem microarchitecture; for processors based on the Sandy > Bridge > microarchitecture, N = 4 if Intel Hyper Threading Technology is active and > N=8 if not active). > > Also I found, N=8 if HT is not active based on the broadwell,, > such as CPU E7-8890 v4 @ 2.20GHz > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming > tcp::8888 > Completed 100 % > qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: > kvm_put_msrs: > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. > Aborted > > So make number of pmc configurable to vm ? Any better idea ? Coincidentally we hit a similar problem a few days ago with -cpu host - it took me quite a while to spot the difference between the machines was the source had hyperthreading disabled. An option to set the number of counters makes sense to me; but I wonder how many other options we need as well. Also, I'm not sure there's any easy way for libvirt etc to figure out how many counters a host supports - it's not in /proc/cpuinfo. Dave > > Regards, > -Zhuang Yanying -- Dr. David Alan Gilbert / address@hidden / Manchester, UK On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: > * Zhuangyanying (address@hidden) wrote: > > Hi all, > > > > Recently, I found migration failed when enable vPMU. > > > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. > > > > As long as enable vPMU, qemu will save / load the > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the > > migration. > > But global_ctrl generated based on cpuid(0xA), the number of > > general-purpose performance > > monitoring counters(PMC) can vary according to Intel SDN. The number of PMC > > presented > > to vm, does not support configuration currently, it depend on host cpuid, > > and enable all pmc > > defaultly at KVM. It cause migration to fail between boards with different > > PMC counts. > > > > The return value of cpuid (0xA) is different dur to cpu, according to Intel > > SDN,18-10 Vol. 3B: > > > > Note: The number of general-purpose performance monitoring counters (i.e. N > > in Figure 18-9) > > can vary across processor generations within a processor family, across > > processor families, or > > could be different depending on the configuration chosen at boot time in > > the BIOS regarding > > Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom > > processors; N =4 for processors > > based on the Nehalem microarchitecture; for processors based on the Sandy > > Bridge > > microarchitecture, N = 4 if Intel Hyper Threading Technology is active and > > N=8 if not active). > > > > Also I found, N=8 if HT is not active based on the broadwell,, > > such as CPU E7-8890 v4 @ 2.20GHz > > > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming > > tcp::8888 > > Completed 100 % > > qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: > > kvm_put_msrs: > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. > > Aborted > > > > So make number of pmc configurable to vm ? Any better idea ? > > Coincidentally we hit a similar problem a few days ago with -cpu host - it > took me > quite a while to spot the difference between the machines was the source > had hyperthreading disabled. > > An option to set the number of counters makes sense to me; but I wonder > how many other options we need as well. Also, I'm not sure there's any > easy way for libvirt etc to figure out how many counters a host supports - > it's not in /proc/cpuinfo. We actually try to avoid /proc/cpuinfo whereever possible. We do direct CPUID asm instructions to identify features, and prefer to use /sys/devices/system/cpu if that has suitable data Where do the PMC counts come from originally ? CPUID or something else ? Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| * Daniel P. Berrange (address@hidden) wrote: > On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: > > * Zhuangyanying (address@hidden) wrote: > > > Hi all, > > > > > > Recently, I found migration failed when enable vPMU. > > > > > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. > > > > > > As long as enable vPMU, qemu will save / load the > > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the > > > migration. > > > But global_ctrl generated based on cpuid(0xA), the number of > > > general-purpose performance > > > monitoring counters(PMC) can vary according to Intel SDN. The number of > > > PMC presented > > > to vm, does not support configuration currently, it depend on host cpuid, > > > and enable all pmc > > > defaultly at KVM. It cause migration to fail between boards with > > > different PMC counts. > > > > > > The return value of cpuid (0xA) is different dur to cpu, according to > > > Intel SDN,18-10 Vol. 3B: > > > > > > Note: The number of general-purpose performance monitoring counters (i.e. > > > N in Figure 18-9) > > > can vary across processor generations within a processor family, across > > > processor families, or > > > could be different depending on the configuration chosen at boot time in > > > the BIOS regarding > > > Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom > > > processors; N =4 for processors > > > based on the Nehalem microarchitecture; for processors based on the Sandy > > > Bridge > > > microarchitecture, N = 4 if Intel Hyper Threading Technology is active > > > and N=8 if not active). > > > > > > Also I found, N=8 if HT is not active based on the broadwell,, > > > such as CPU E7-8890 v4 @ 2.20GHz > > > > > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda > > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming > > > tcp::8888 > > > Completed 100 % > > > qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff > > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: > > > kvm_put_msrs: > > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. > > > Aborted > > > > > > So make number of pmc configurable to vm ? Any better idea ? > > > > Coincidentally we hit a similar problem a few days ago with -cpu host - it > > took me > > quite a while to spot the difference between the machines was the source > > had hyperthreading disabled. > > > > An option to set the number of counters makes sense to me; but I wonder > > how many other options we need as well. Also, I'm not sure there's any > > easy way for libvirt etc to figure out how many counters a host supports - > > it's not in /proc/cpuinfo. > > We actually try to avoid /proc/cpuinfo whereever possible. We do direct > CPUID asm instructions to identify features, and prefer to use > /sys/devices/system/cpu if that has suitable data > > Where do the PMC counts come from originally ? CPUID or something else ? Yes, they're bits 8..15 of CPUID leaf 0xa Dave > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- Dr. David Alan Gilbert / address@hidden / Manchester, UK On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote: > * Daniel P. Berrange (address@hidden) wrote: > > On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: > > > * Zhuangyanying (address@hidden) wrote: > > > > Hi all, > > > > > > > > Recently, I found migration failed when enable vPMU. > > > > > > > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. > > > > > > > > As long as enable vPMU, qemu will save / load the > > > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the > > > > migration. > > > > But global_ctrl generated based on cpuid(0xA), the number of > > > > general-purpose performance > > > > monitoring counters(PMC) can vary according to Intel SDN. The number of > > > > PMC presented > > > > to vm, does not support configuration currently, it depend on host > > > > cpuid, and enable all pmc > > > > defaultly at KVM. It cause migration to fail between boards with > > > > different PMC counts. > > > > > > > > The return value of cpuid (0xA) is different dur to cpu, according to > > > > Intel SDN,18-10 Vol. 3B: > > > > > > > > Note: The number of general-purpose performance monitoring counters > > > > (i.e. N in Figure 18-9) > > > > can vary across processor generations within a processor family, across > > > > processor families, or > > > > could be different depending on the configuration chosen at boot time > > > > in the BIOS regarding > > > > Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom > > > > processors; N =4 for processors > > > > based on the Nehalem microarchitecture; for processors based on the > > > > Sandy Bridge > > > > microarchitecture, N = 4 if Intel Hyper Threading Technology is active > > > > and N=8 if not active). > > > > > > > > Also I found, N=8 if HT is not active based on the broadwell,, > > > > such as CPU E7-8890 v4 @ 2.20GHz > > > > > > > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda > > > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true > > > > -incoming tcp::8888 > > > > Completed 100 % > > > > qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff > > > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: > > > > kvm_put_msrs: > > > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. > > > > Aborted > > > > > > > > So make number of pmc configurable to vm ? Any better idea ? > > > > > > Coincidentally we hit a similar problem a few days ago with -cpu host - > > > it took me > > > quite a while to spot the difference between the machines was the source > > > had hyperthreading disabled. > > > > > > An option to set the number of counters makes sense to me; but I wonder > > > how many other options we need as well. Also, I'm not sure there's any > > > easy way for libvirt etc to figure out how many counters a host supports - > > > it's not in /proc/cpuinfo. > > > > We actually try to avoid /proc/cpuinfo whereever possible. We do direct > > CPUID asm instructions to identify features, and prefer to use > > /sys/devices/system/cpu if that has suitable data > > > > Where do the PMC counts come from originally ? CPUID or something else ? > > Yes, they're bits 8..15 of CPUID leaf 0xa Ok, that's easy enough for libvirt to detect then. More a question of what libvirt should then do this with the info.... Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| > -----Original Message----- > From: Daniel P. Berrange [ mailto:address@hidden > Sent: Monday, April 24, 2017 6:34 PM > To: Dr. David Alan Gilbert > Cc: Zhuangyanying; Zhanghailiang; wangxin (U); address@hidden; > Gonglei (Arei); Huangzhichao; address@hidden > Subject: Re: [Qemu-devel] [BUG] Migrate failes between boards with different > PMC counts > > On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote: > > * Daniel P. Berrange (address@hidden) wrote: > > > On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: > > > > * Zhuangyanying (address@hidden) wrote: > > > > > Hi all, > > > > > > > > > > Recently, I found migration failed when enable vPMU. > > > > > > > > > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. > > > > > > > > > > As long as enable vPMU, qemu will save / load the > > > > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the > migration. > > > > > But global_ctrl generated based on cpuid(0xA), the number of > > > > > general-purpose performance monitoring counters(PMC) can vary > > > > > according to Intel SDN. The number of PMC presented to vm, does > > > > > not support configuration currently, it depend on host cpuid, and > > > > > enable > all pmc defaultly at KVM. It cause migration to fail between boards with > different PMC counts. > > > > > > > > > > The return value of cpuid (0xA) is different dur to cpu, according to > > > > > Intel > SDN,18-10 Vol. 3B: > > > > > > > > > > Note: The number of general-purpose performance monitoring > > > > > counters (i.e. N in Figure 18-9) can vary across processor > > > > > generations within a processor family, across processor > > > > > families, or could be different depending on the configuration > > > > > chosen at boot time in the BIOS regarding Intel Hyper Threading > > > > > Technology, (e.g. N=2 for 45 nm Intel Atom processors; N =4 for > processors based on the Nehalem microarchitecture; for processors based on > the Sandy Bridge microarchitecture, N = 4 if Intel Hyper Threading Technology > is active and N=8 if not active). > > > > > > > > > > Also I found, N=8 if HT is not active based on the broadwell,, > > > > > such as CPU E7-8890 v4 @ 2.20GHz > > > > > > > > > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m > > > > > 4096 -hda > > > > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true > > > > > -incoming tcp::8888 Completed 100 % > > > > > qemu-system-x86_64: error: failed to set MSR 0x38f to > > > > > 0x7000000ff > > > > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: > kvm_put_msrs: > > > > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. > > > > > Aborted > > > > > > > > > > So make number of pmc configurable to vm ? Any better idea ? > > > > > > > > Coincidentally we hit a similar problem a few days ago with -cpu > > > > host - it took me quite a while to spot the difference between > > > > the machines was the source had hyperthreading disabled. > > > > > > > > An option to set the number of counters makes sense to me; but I > > > > wonder how many other options we need as well. Also, I'm not sure > > > > there's any easy way for libvirt etc to figure out how many > > > > counters a host supports - it's not in /proc/cpuinfo. > > > > > > We actually try to avoid /proc/cpuinfo whereever possible. We do > > > direct CPUID asm instructions to identify features, and prefer to > > > use /sys/devices/system/cpu if that has suitable data > > > > > > Where do the PMC counts come from originally ? CPUID or something > else ? > > > > Yes, they're bits 8..15 of CPUID leaf 0xa > > Ok, that's easy enough for libvirt to detect then. More a question of what > libvirt > should then do this with the info.... > Do you mean to do a validation at the begining of migration? in qemuMigrationBakeCookie() & qemuMigrationEatCookie(), if the PMC numbers are not equal, just quit migration? It maybe a good enough first edition. But for a further better edition, maybe it's better to support Heterogeneous migration I think, so we might need to make PMC number configrable, then we need to modify KVM/qemu as well. Regards, -Zhuang Yanying * Zhuangyanying (address@hidden) wrote: > > > > -----Original Message----- > > From: Daniel P. Berrange [ mailto:address@hidden > > Sent: Monday, April 24, 2017 6:34 PM > > To: Dr. David Alan Gilbert > > Cc: Zhuangyanying; Zhanghailiang; wangxin (U); address@hidden; > > Gonglei (Arei); Huangzhichao; address@hidden > > Subject: Re: [Qemu-devel] [BUG] Migrate failes between boards with different > > PMC counts > > > > On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote: > > > * Daniel P. Berrange (address@hidden) wrote: > > > > On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: > > > > > * Zhuangyanying (address@hidden) wrote: > > > > > > Hi all, > > > > > > > > > > > > Recently, I found migration failed when enable vPMU. > > > > > > > > > > > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. > > > > > > > > > > > > As long as enable vPMU, qemu will save / load the > > > > > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the > > migration. > > > > > > But global_ctrl generated based on cpuid(0xA), the number of > > > > > > general-purpose performance monitoring counters(PMC) can vary > > > > > > according to Intel SDN. The number of PMC presented to vm, does > > > > > > not support configuration currently, it depend on host cpuid, and > > > > > > enable > > all pmc defaultly at KVM. It cause migration to fail between boards with > > different PMC counts. > > > > > > > > > > > > The return value of cpuid (0xA) is different dur to cpu, according > > > > > > to Intel > > SDN,18-10 Vol. 3B: > > > > > > > > > > > > Note: The number of general-purpose performance monitoring > > > > > > counters (i.e. N in Figure 18-9) can vary across processor > > > > > > generations within a processor family, across processor > > > > > > families, or could be different depending on the configuration > > > > > > chosen at boot time in the BIOS regarding Intel Hyper Threading > > > > > > Technology, (e.g. N=2 for 45 nm Intel Atom processors; N =4 for > > processors based on the Nehalem microarchitecture; for processors based on > > the Sandy Bridge microarchitecture, N = 4 if Intel Hyper Threading > > Technology > > is active and N=8 if not active). > > > > > > > > > > > > Also I found, N=8 if HT is not active based on the broadwell,, > > > > > > such as CPU E7-8890 v4 @ 2.20GHz > > > > > > > > > > > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m > > > > > > 4096 -hda > > > > > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true > > > > > > -incoming tcp::8888 Completed 100 % > > > > > > qemu-system-x86_64: error: failed to set MSR 0x38f to > > > > > > 0x7000000ff > > > > > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: > > kvm_put_msrs: > > > > > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. > > > > > > Aborted > > > > > > > > > > > > So make number of pmc configurable to vm ? Any better idea ? > > > > > > > > > > Coincidentally we hit a similar problem a few days ago with -cpu > > > > > host - it took me quite a while to spot the difference between > > > > > the machines was the source had hyperthreading disabled. > > > > > > > > > > An option to set the number of counters makes sense to me; but I > > > > > wonder how many other options we need as well. Also, I'm not sure > > > > > there's any easy way for libvirt etc to figure out how many > > > > > counters a host supports - it's not in /proc/cpuinfo. > > > > > > > > We actually try to avoid /proc/cpuinfo whereever possible. We do > > > > direct CPUID asm instructions to identify features, and prefer to > > > > use /sys/devices/system/cpu if that has suitable data > > > > > > > > Where do the PMC counts come from originally ? CPUID or something > > else ? > > > > > > Yes, they're bits 8..15 of CPUID leaf 0xa > > > > Ok, that's easy enough for libvirt to detect then. More a question of what > > libvirt > > should then do this with the info.... > > > > Do you mean to do a validation at the begining of migration? in > qemuMigrationBakeCookie() & qemuMigrationEatCookie(), if the PMC numbers are > not equal, just quit migration? > It maybe a good enough first edition. > But for a further better edition, maybe it's better to support Heterogeneous > migration I think, so we might need to make PMC number configrable, then we > need to modify KVM/qemu as well. Yes agreed; the only thing I wanted to check was that libvirt would have enough information to be able to use any feature we added to QEMU. Dave > Regards, > -Zhuang Yanying -- Dr. David Alan Gilbert / address@hidden / Manchester, UK