diff --git a/results/classifier/deepseek-1/output/hypervisor/1470720 b/results/classifier/deepseek-1/output/hypervisor/1470720
new file mode 100644
index 00000000..c48e7976
--- /dev/null
+++ b/results/classifier/deepseek-1/output/hypervisor/1470720
@@ -0,0 +1,54 @@
+
+high IRQ-TLB generates network interruptions
+
+We are having a problem on our hosts: all the VMs running on them suddenly lose network connectivity for a few seconds.
+
+The root cause appears to be a jump in IRQ-TLB from low values (less than 20) to more than 100k; the spike lasts only a few seconds, then everything goes back to normal.
+
+I've uploaded a collectd screenshot for one hypervisor here:
+http://zumbi.com.ar/tmp/irq-tlb.png
+
+
+We have hosts running precise (qemu 1.5, ovs 2.0.2, libvirt 1.2.2 and kernel 3.13) where the issue is frequent. We also have a small percentage of our fleet running trusty (qemu 2.0.0, ovs 2.0.2, libvirt 1.2.2 and kernel 3.16) where the problem seemed to be nonexistent until today.
+
+The issue seems to be isolated to < 10% of our hypervisors; some hypervisors have had this problem every few days, others only once or twice. Our VMs are a black box to us: we don't know what users run on them, but it is mostly CPU- and network-bound workloads.
+Most of our guests run CentOS 6.5 (kernel 2.6.32).
+
+VMs are bridged to a linuxbridge, then wired via veth to an OVS switch (neutron openvswitch agent setup).
+
+
+
+Maybe the first part is not clear, so here it goes again:
+
+This happens on some hypervisors at random times, not on all hypervisors at the same time, and affects all VMs on the hypervisor.
+
+The overcommit ratio on the latest server where I had the problem is 3.6 (3.6 vCPUs for each CPU). Would that be part of the problem? I see other servers that never had the problem with overcommit ratios as high as 4.1.
+
+Seeing the same here, also happens on overbooked hypervisors.
+
+Just one or two hosts have this behaviour.
+
+We are using:
+qemu-kvm                             2.0.0+dfsg-2ubuntu1.25
+libvirt-bin                          1.2.9
+kernel                               3.13.0-92-generic
+
+We are using Contrail as an SDN.
+
+It looks like it started after upgrading a bunch of packages including kernel (we came from 3.13.0-83-generic)
+
+
+Disabling transparent huge pages (THP) seems to help.
+Strangely, this should theoretically make the issue worse, but so far we have not seen it since disabling THP.
+(We have not seen high load spikes in a week, but this might also be holiday related.)
+
+So other people can try it out:
+echo never > /sys/kernel/mm/transparent_hugepage/defrag
+echo never > /sys/kernel/mm/transparent_hugepage/enabled
+
+
+
+Looking through old bug tickets... can you still reproduce this issue with the latest version of QEMU? Or could we close this ticket nowadays?
+
+[Expired for QEMU because there has been no activity for 60 days.]
+