-
- Novice
- Posts: 4
- Liked: never
- Joined: Oct 31, 2011 2:43 pm
- Contact:
Virtual Machine Compute Latency Analysis, how does it work?
Recently we upgraded our SCOM environment from 2007 R2 with nWorks 5.x to 2012 SP1 with nWorks 6.
Soon after, we saw a lot of warnings from the monitor Veeam VMware: Virtual Machine Compute Latency Analysis.
An example of one of the warnings:
The virtual machine VMName running on host host.local.domain has a Compute Latency issue.
High CPU Latency of 11.86% indicates the VM is waiting for CPU resources from the vSphere host server. This indicates the host is under stress attempting to allocate CPU resources.
When I look in the report in SCOM for this machine I see this graph:
When I look in VMware at the host I see this:
http://tweakers.net/ext/f/J5NS0ibVwdzXA ... e/full.png
When I look at the vm I see this:
http://tweakers.net/ext/f/Lmcd9V7ivxJnR ... g/full.png
Looking in the host with ESXtop we see this:
http://tweakers.net/ext/f/Ib4nPCzItBut9 ... D/full.png
Everything seems to be running fine, but I cannot find much information on why this message is being generated or whether we are missing something. What exactly does the monitor do, and should we adjust it?
-
- Veteran
- Posts: 452
- Liked: 76 times
- Joined: May 02, 2012 1:49 pm
- Full Name: Sergey Goncharenko
- Contact:
Re: Virtual Machine Compute Latency Analysis, how does it wo
Hello,
Your VM is clearly experiencing a CPU scheduling issue. I do not see any contradiction between the SCOM chart and the VMware host and VM charts: all of them indicate that the VM is experiencing about 15% CPU latency, with peaks of almost 25%. According to your esxtop output you have a 6-CPU host, which means that 25% latency on a 1-vCPU VM corresponds to roughly 4% overall CPU latency for the host, and on a 2-vCPU VM to roughly 8%. So either this is the only VM experiencing the issue and it has more than 1 vCPU, or you have a couple of 1-vCPU VMs with high CPU latency.
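To make the arithmetic above concrete, here is a minimal Python sketch. It assumes the simple model used in this post (a VM's latency percentage is per vCPU and is spread evenly over the host's CPUs); the function name is made up for illustration and this is not how the management pack itself calculates anything.

```python
def host_share_of_vm_latency(vm_latency_pct: float, vcpus: int, host_cpus: int) -> float:
    """Spread a VM's per-vCPU latency percentage across all host CPUs.

    Assumption: each vCPU contributes vm_latency_pct, and the host total
    is averaged over host_cpus physical CPUs.
    """
    if host_cpus <= 0 or vcpus <= 0:
        raise ValueError("vcpus and host_cpus must be positive")
    return vm_latency_pct * vcpus / host_cpus


# Values from this thread: 25% peak latency on a 6-CPU host.
one_vcpu = host_share_of_vm_latency(25.0, vcpus=1, host_cpus=6)
two_vcpu = host_share_of_vm_latency(25.0, vcpus=2, host_cpus=6)
print(f"1 vCPU: {one_vcpu:.1f}% of host, 2 vCPU: {two_vcpu:.1f}% of host")
```

This gives roughly 4% and 8%, matching the figures quoted above.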
Since esxtop shows very low CPU Ready, we can conclude that something else is affecting CPU latency.
It could be CPU Co-Stop, which is possible if you have a 2-vCPU VM; let us know if this is the case. However, for a VM to experience CPU Co-Stop, the host would have to be running a lot of other VMs with different numbers of virtual CPUs assigned (1, 3, 4 or more vCPUs), which should affect CPU Ready too, so I think it's unlikely. Another argument against this assumption is that, from what you are saying, you didn't get any alerts about CPU Ready or CPU Co-Stop with 5.7.
Another reason for high cpu latency could be some other wait time:
- swap wait (check if this host has enough memory for all running VMs)
- IO wait (check if you have a big difference between CPU Idle and CPU Wait time for this VM; this could indicate issues with IO caused by slow storage, a snapshot or some other storage activity)
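The IO wait check in the list above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample values and the 5% threshold are assumptions, not values taken from the product or from VMware documentation.

```python
def wait_idle_gap_pct(cpu_wait_pct: float, cpu_idle_pct: float) -> float:
    """Part of the wait time that is not plain idle time, in percent."""
    return cpu_wait_pct - cpu_idle_pct


def io_wait_suspected(cpu_wait_pct: float, cpu_idle_pct: float,
                      gap_threshold_pct: float = 5.0) -> bool:
    """Flag a VM whose wait time is noticeably larger than its idle time.

    gap_threshold_pct is an arbitrary cut-off chosen for this sketch.
    """
    return wait_idle_gap_pct(cpu_wait_pct, cpu_idle_pct) > gap_threshold_pct


# Made-up sample readings for two VMs.
print(io_wait_suspected(92.0, 91.5))  # wait and idle almost identical
print(io_wait_suspected(92.0, 70.0))  # large gap, worth checking storage
```

If wait and idle are almost identical, as in the first call, IO wait is an unlikely explanation for the latency.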
If you can provide more details about this VM, we can do more analysis to understand why it has high CPU latency. 25% is quite high: for a quarter of the time the VM is ready to work, but it is instead doing nothing while it waits for host subsystems.
Thanks.
-
- Novice
- Posts: 4
- Liked: never
- Joined: Oct 31, 2011 2:43 pm
- Contact:
Re: Virtual Machine Compute Latency Analysis, how does it wo
The host is an HP ProLiant BL460c G6 with 96 GB of memory and 2 Intel Xeon X5570 CPUs.
There is no overcommit configured. If we look in VMware, memory usage is around 65% across the cluster. We have 15 hosts in this cluster. The average CPU usage on a host is 30%.
The VM is a Windows 2003 Standard server with 2 vCPUs and 4 GB of memory. But we also have this problem with 2008 servers, for example.
The underlying storage is an EVA6400.
What sort of information do you need exactly? Thanks for the explanation so far.
I've checked the cpu wait and idle for big differences, but they are almost identical.
-
- Veteran
- Posts: 452
- Liked: 76 times
- Joined: May 02, 2012 1:49 pm
- Full Name: Sergey Goncharenko
- Contact:
Re: Virtual Machine Compute Latency Analysis, how does it wo
Hi,
Could you provide us with a chart for the problematic VM that shows cpuIdlePct, cpuWaitPct, cpuReadyPct, cpuLatencyPct, cpuCoStopPct and cpuSwapWaitPct, like on the screenshot below?
Unfortunately, VMware doesn't document exactly what is included in the CPU latency counter. We believe that it could be calculated as 100% - (%RUN + %WAIT_IDLE); in that case it's very hard to tell what exactly is forcing the VM into a waiting state. We can only say that it's not a memory issue if swap wait is low, and not a scheduling issue if CPU Ready and Co-Stop are low. If that is the case in your environment, there must be something else. Snapshots are a very common reason for performance degradation, and I would also check kernel latency for the HBAs on the host.
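As a rough Python sketch of the reasoning above: the formula is only the guess stated in this post, not a documented VMware definition, and the function names, the sample readings and the 1% "low" cut-off are assumptions made for illustration.

```python
def estimated_latency_pct(run_pct: float, wait_idle_pct: float) -> float:
    """Guess at CPU latency as 100% - (%RUN + %WAIT_IDLE), floored at 0."""
    return max(0.0, 100.0 - (run_pct + wait_idle_pct))


def remaining_suspects(ready_pct: float, costop_pct: float,
                       swap_wait_pct: float, low_pct: float = 1.0) -> list:
    """Rule out memory and scheduling when their counters are low.

    low_pct is an arbitrary cut-off chosen for this sketch.
    """
    suspects = []
    if swap_wait_pct >= low_pct:
        suspects.append("memory (swap wait)")
    if ready_pct >= low_pct or costop_pct >= low_pct:
        suspects.append("CPU scheduling (ready / co-stop)")
    if not suspects:
        suspects.append("something else (snapshots, HBA kernel latency, ...)")
    return suspects


# Made-up readings: the VM runs 60% of the time and is idle 28% of the time.
print(estimated_latency_pct(run_pct=60.0, wait_idle_pct=28.0))
print(remaining_suspects(ready_pct=0.2, costop_pct=0.0, swap_wait_pct=0.0))
```

With low ready, co-stop and swap wait values the sketch falls through to "something else", which is exactly the situation where the counter alone cannot tell you the cause.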
Thanks.
-
- Novice
- Posts: 4
- Liked: never
- Joined: Oct 31, 2011 2:43 pm
- Contact:
Re: Virtual Machine Compute Latency Analysis, how does it wo
I don't see your screenshot? But here are the ones I created:
Co-Stop & Swap Wait
http://tweakers.net/ext/f/8mJKqZChJQ3Nu ... 1/full.png
Latency
http://tweakers.net/ext/f/BdDXzwt3jFA1r ... y/full.png
Wait - Idle - Ready
http://tweakers.net/ext/f/SqAwT6vTBOPdo ... 2/full.png
And the adapter
http://tweakers.net/ext/f/gxu99pb4YBcOG ... 2/full.png
I think the storage is being overloaded because I also see more and more alerts about storage.
-
- Novice
- Posts: 7
- Liked: never
- Joined: Jan 04, 2014 10:16 am
- Full Name: shocko
- Contact:
Re: Virtual Machine Compute Latency Analysis, how does it wo
I'm facing the same issue. I believe the CPU latency counter in vCenter was only added in ESXi 5? Regardless, I'm trying to work out why it is at 155 in my environment. I do not have any CPU power saving in place, and the power profile on my HP blades is set to static high performance. There is also no CPU over-commitment, and all RAM for the VMs is fully reserved, so there is no swapping.
-
- Veteran
- Posts: 452
- Liked: 76 times
- Joined: May 02, 2012 1:49 pm
- Full Name: Sergey Goncharenko
- Contact:
Re: Virtual Machine Compute Latency Analysis, how does it wo
Hi,
From what we've been able to learn about CPU latency, it can also be affected by NUMA node configuration, so some delays are possible even without over-commitment. Are you saying that CPU latency is 155% in SCOM? This is definitely something we need to investigate and fix, so we would appreciate more details. Could you also send us a screenshot of this metric in SCOM and in the vSphere client?
Thanks.
-
- Novice
- Posts: 4
- Liked: never
- Joined: May 09, 2014 6:09 pm
- Full Name: Mordock
- Contact:
Re: Virtual Machine Compute Latency Analysis, how does it wo
No resolution has been posted for the issues described here, so here are a few things to look at for those who land on this thread. You need to drill down in esxtop. The CPU latency counter in esxtop is %LAT_C (go to (c)pu, then (f)ields, and enable I for stats to see it). Use V to limit the display to VMs only, then (l)imit the display to the VM in question and (e)xpand it to show all the processes that make up the VM. In my case, I found that all of the CPU latency was in the vmx process, a low-priority process used to handle various non-critical communications, so the 12% latency I was seeing could be ignored for now. I have yet to find any more detail on what might be causing it, or on how to resolve or reduce it so that it stops raising flags all over the place. If anyone has any insight, I would appreciate it.
Another note: there are 3 settings in most machine BIOSes that affect CPU latency. Under processor, there are C-state and C1E (or C1E state). These control whether cores are halted when not needed. Then, under power management, you can generally choose High Performance or OS Control. This controls whether the cores run at full speed or are throttled down to as little as 50% under reduced load. OS Control allows you to set the performance level in the vSphere client under Configuration/Power Management.
To see the impact of these power management settings, run esxtop and press p for the (p)ower view. This shows the state of each core/hyperthread. If C-states are enabled, columns for %C0, %C1 and %C2 are displayed. If C-states are disabled, these columns do not appear.
-
- Lurker
- Posts: 1
- Liked: never
- Joined: Feb 13, 2014 9:10 am
- Full Name: Ben B
- Contact:
Re: Virtual Machine Compute Latency Analysis, how does it wo
Hello,
We had CPU latency alerts, and after trying many different things, the solution in the end was to disable power management in the BIOS and at the ESX level.
http://kb.vmware.com/kb/1018206
-
- Veteran
- Posts: 452
- Liked: 76 times
- Joined: May 02, 2012 1:49 pm
- Full Name: Sergey Goncharenko
- Contact:
Re: Virtual Machine Compute Latency Analysis, how does it wo
Wow, thank you very much for your input.
Looks like we are receiving more and more confirmations that BIOS power management settings could cause issues with increased CPU latency. We'll make sure to include this information in our KB articles in the product, so that our customers are aware of the possible root causes when they encounter high CPU latency alerts in SCOM.
Once again, thank you all for useful comments and suggestions. And keep us posted, we would appreciate any piece of information that can help our customers and other Virtualization experts to identify root causes of poor VM performance.
-
- Veeam Software
- Posts: 87
- Liked: 6 times
- Joined: May 29, 2012 7:33 pm
- Full Name: Paul Wallace
- Contact:
Re: Virtual Machine Compute Latency Analysis, how does it wo
It seems I'm a bit late to this thread but I felt this contribution would be well placed here -
Host Power Management in VMware vSphere 5.5:
http://www.vmware.com/files/pdf/techpap ... here55.pdf
In my case, setting the BIOS DBPM setting to OS Control (enabling me to control the power management setting through ESXi rather than needing to reboot and do so through the BIOS) and changing the Power Management settings in ESXi to High Performance got rid of these alerts. The document above is a great resource for understanding these settings and configuring them for the best possible performance.