Monitoring and reporting for Veeam Backup & Replication, VMware vSphere and Microsoft Hyper-V in a single System Center Operations Manager Console
goudzwaardc
Novice
Posts: 4
Liked: never
Joined: Oct 31, 2011 2:43 pm
Contact:

Virtual Machine Compute Latency Analysis, how does it work?

Post by goudzwaardc »

Recently we upgraded our SCOM environment from 2007 R2 with nWorks 5.x to 2012 SP1 with nWorks 6.

Soon afterwards we saw a lot of warnings from the monitor Veeam VMware: Virtual Machine Compute Latency Analysis.
An example of one of the warnings:
The virtual machine VMName running on host host.local.domain has a Compute Latency issue.
High CPU Latency of 11.86% indicates the VM is waiting for CPU resources from the vSphere host server. This indicates the host is under stress attempting to allocate CPU resources.
When I look in the report in SCOM for this machine I see this graph:
Image

When I look in VMware at the host I see this:
http://tweakers.net/ext/f/J5NS0ibVwdzXA ... e/full.png

When I look at the vm I see this:
http://tweakers.net/ext/f/Lmcd9V7ivxJnR ... g/full.png

Looking in the host with ESXtop we see this:
http://tweakers.net/ext/f/Ib4nPCzItBut9 ... D/full.png

It seems all is running fine but I cannot find a lot of info on why this message is being generated and if we are missing something. What does the monitor exactly do and should we adjust it?
sergey.g
Veteran
Posts: 452
Liked: 76 times
Joined: May 02, 2012 1:49 pm
Full Name: Sergey Goncharenko
Contact:

Re: Virtual Machine Compute Latency Analysis, how does it wo

Post by sergey.g »

Hello,

Clearly your VM is experiencing a CPU scheduling issue. I do not see any contradiction between the SCOM chart and the VMware host and VM charts: all of them indicate that the VM is experiencing about 15% CPU latency, with peaks of almost 25%. According to your esxtop output you have a 6-CPU host, which means that 25% latency on a 1-vCPU VM equals roughly 4% overall CPU latency for the host, and on a 2-vCPU VM roughly 8%. So either this is the only VM experiencing the issue and it has more than 1 vCPU, or you have a couple of 1-vCPU VMs with high CPU latency.
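
To make the arithmetic above explicit, here is a minimal Python sketch. The function name is made up for illustration, and it assumes the simple model used in this post: per-vCPU latency is spread evenly over all host CPUs.

```python
def host_latency_share(vm_latency_pct: float, vcpus: int, host_cpus: int) -> float:
    """Spread a VM's per-vCPU latency percentage over all host CPUs.

    Assumption: every vCPU of the VM sees the same latency percentage.
    """
    if vcpus <= 0 or host_cpus <= 0:
        raise ValueError("vcpus and host_cpus must be positive")
    return vm_latency_pct * vcpus / host_cpus


if __name__ == "__main__":
    # 25% peak latency on the 6-CPU host from the esxtop screenshot.
    for vcpus in (1, 2):
        share = host_latency_share(25.0, vcpus, host_cpus=6)
        print(f"{vcpus} vCPU at 25% latency -> {share:.1f}% of the host")
```

With the numbers from this thread that gives about 4.2% for a 1-vCPU VM and about 8.3% for a 2-vCPU VM.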

Since we see very low CPU Ready in esxtop, we can conclude that something else is affecting CPU latency.

It could be CPU Co-Stop, which is possible if you have a 2-vCPU VM; let us know if this is the case. However, for a VM to have CPU Co-Stop, the host should be running a lot of other VMs with different numbers of virtual CPUs assigned (1, 3, 4 or more vCPUs), which should affect CPU Ready too, so I think it's unlikely. Another argument against this assumption is that, from what you are saying, it looks like with 5.7 you didn't experience any alerts about CPU Ready or CPU Co-Stop.

Another reason for high CPU latency could be some other wait time:
- swap wait (check if this host has enough memory for all running VMs)
- IO wait (check if you have a big difference between CPU Idle and CPU Wait time for this VM; this could indicate I/O issues caused by slow storage, snapshots or some other storage activity)
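
The checks discussed so far (ready, co-stop, swap wait, and the gap between wait and idle) can be sketched as a small triage helper. This is only an illustration: the function name is hypothetical and the 5% cut-off is an arbitrary assumption, not a Veeam or VMware threshold.

```python
def triage_cpu_latency(ready_pct, costop_pct, swap_wait_pct, wait_pct, idle_pct,
                       threshold_pct=5.0):
    """Return the wait counters that look suspicious for one VM.

    threshold_pct is an arbitrary illustrative cut-off (assumption).
    """
    suspects = []
    if ready_pct > threshold_pct:
        suspects.append("CPU ready (scheduling contention)")
    if costop_pct > threshold_pct:
        suspects.append("CPU co-stop (multi-vCPU scheduling)")
    if swap_wait_pct > threshold_pct:
        suspects.append("swap wait (host memory pressure)")
    # Wait time includes idle time, so a large gap hints at I/O wait.
    if wait_pct - idle_pct > threshold_pct:
        suspects.append("I/O wait (storage, snapshots)")
    return suspects or ["none of the known wait counters; look elsewhere"]


if __name__ == "__main__":
    print(triage_cpu_latency(ready_pct=1.0, costop_pct=0.0, swap_wait_pct=0.0,
                             wait_pct=95.0, idle_pct=70.0))
```

If every counter is below the cut-off, the helper says so instead of guessing, which matches the situation where latency is high but ready, co-stop and swap wait are all low.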

If you can provide more details about this VM, we can do more analysis to understand why it has high CPU latency. 25% is quite high: for a quarter of the time the VM is ready to work, but instead it's doing nothing while waiting for host subsystems.

Thanks.
goudzwaardc
Novice
Posts: 4
Liked: never
Joined: Oct 31, 2011 2:43 pm
Contact:

Re: Virtual Machine Compute Latency Analysis, how does it wo

Post by goudzwaardc »

The host is an HP ProLiant BL460c G6 with 96 GB memory and 2 Intel Xeon X5570 CPUs.
There is no overcommit configured. If we look in VMware, the memory used is around 65% across the cluster. We have 15 hosts in this cluster. The average CPU usage on a host is 30%.

The VM is a Windows 2003 Standard server with 2 vCPUs and 4 GB memory. But we also have this problem with 2008 servers, for example.
The underlying storage is an EVA6400.

What sort of information do you need exactly? Thanks for the explanation so far.

I've checked CPU wait and CPU idle for big differences, but they are almost identical.
sergey.g
Veteran
Posts: 452
Liked: 76 times
Joined: May 02, 2012 1:49 pm
Full Name: Sergey Goncharenko
Contact:

Re: Virtual Machine Compute Latency Analysis, how does it wo

Post by sergey.g »

Hi,

Could you provide us with a chart for the problematic VM that shows cpuIdlePct, cpuWaitPct, cpuReadyPct, cpuLatencyPct, cpuCoStopPct and cpuSwapWaitPct, like the one in the screenshot below?

Image

Unfortunately, VMware doesn't document exactly what is included in the CPU latency counter. We believe it could be calculated as 100% - (%RUN + %WAIT_IDLE); if so, it's very hard to tell what exactly is forcing the VM into a waiting state. We can only say that it's not a memory issue if swap wait is low, and not a scheduling issue if CPU ready and co-stop are low. If this is the case in your environment, there must be something else. Snapshots are a very common reason for performance degradation, and I would also check kernel latency for the HBAs on the host.
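
As a rough illustration of the guess above, here is a Python sketch of the 100% - (%RUN + %WAIT_IDLE) formula. This is not a documented VMware formula. The second helper, which subtracts the known wait counters to see what is left unexplained, is a further assumption added only for illustration, and both function names are made up.

```python
def estimated_cpu_latency(run_pct: float, idle_wait_pct: float) -> float:
    """Guessed formula from this thread: 100% - (%RUN + %WAIT_IDLE)."""
    return max(0.0, 100.0 - (run_pct + idle_wait_pct))


def unexplained_latency(latency_pct: float, ready_pct: float,
                        costop_pct: float, swap_wait_pct: float) -> float:
    """Part of the latency not covered by ready, co-stop or swap wait.

    Assumption: the counters can simply be subtracted from each other.
    """
    return max(0.0, latency_pct - (ready_pct + costop_pct + swap_wait_pct))


if __name__ == "__main__":
    latency = estimated_cpu_latency(run_pct=30.0, idle_wait_pct=55.0)
    rest = unexplained_latency(latency, ready_pct=1.0, costop_pct=0.5,
                               swap_wait_pct=0.0)
    print(f"estimated latency {latency:.1f}%, unexplained {rest:.1f}%")
```

A large unexplained remainder is the case described above, where the next places to look are snapshots and storage or HBA latency.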

Thanks.
goudzwaardc
Novice
Posts: 4
Liked: never
Joined: Oct 31, 2011 2:43 pm
Contact:

Re: Virtual Machine Compute Latency Analysis, how does it wo

Post by goudzwaardc »

I don't see your screenshot? But here are the ones I created:

Co-Stop & Swap Wait
http://tweakers.net/ext/f/8mJKqZChJQ3Nu ... 1/full.png

Latency
http://tweakers.net/ext/f/BdDXzwt3jFA1r ... y/full.png

Wait - Idle - Ready
http://tweakers.net/ext/f/SqAwT6vTBOPdo ... 2/full.png

And the adapter
http://tweakers.net/ext/f/gxu99pb4YBcOG ... 2/full.png

I think the storage is being overloaded because I also see more and more alerts about storage.
shocko
Novice
Posts: 7
Liked: never
Joined: Jan 04, 2014 10:16 am
Full Name: shocko
Contact:

Re: Virtual Machine Compute Latency Analysis, how does it wo

Post by shocko »

I'm facing the same issue. I believe the CPU latency counter in vCenter was only added in ESXi 5? Regardless, I'm trying to work out why it is at 155 in my environment. I do not have any CPU power saving in place, and the CPU power state on my HP blades is set to static high performance. There is also no CPU over-commitment, and all RAM for the VMs is fully reserved, so there is no swapping.
sergey.g
Veteran
Posts: 452
Liked: 76 times
Joined: May 02, 2012 1:49 pm
Full Name: Sergey Goncharenko
Contact:

Re: Virtual Machine Compute Latency Analysis, how does it wo

Post by sergey.g »

Hi,

From what we've been able to learn about CPU latency, it can also be affected by NUMA node configuration, so even without over-commitment some delays are possible. Are you saying that CPU latency is 155% in SCOM? This is definitely something we need to investigate and fix, so we would appreciate more details. Could you also send us a screenshot of this metric in SCOM and in the vSphere client?

Thanks.
dsellens
Novice
Posts: 4
Liked: never
Joined: May 09, 2014 6:09 pm
Full Name: Mordock
Contact:

Re: Virtual Machine Compute Latency Analysis, how does it wo

Post by dsellens »

No resolution has been posted for the issues described here, so here are a few things to look at for those hitting this thread. You need to drill down in esxtop. The CPU latency counter in esxtop is %LAT_C (go to (c)pu, then (f)ields, and enable I for stats to see it). Use V to limit the display to just VMs, then (l)imit the display to the VM in question, then (e)xpand it to show all the processes that make up the VM. In my case, I found that all of the CPU latency was in the vmx process, a low-priority process used to control various non-critical communications, so the 12% latency I was seeing could be ignored for now. I have yet to find any more detail on what might be causing it or how to resolve or reduce it so that it stops raising flags all over the place. If anyone has any insight, I would appreciate it.
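
If you would rather not do this interactively, esxtop also has a batch mode (esxtop -b) that writes CSV. The Python sketch below pulls the most recent value of every column whose header contains a given substring. The column headers in the sample are invented for illustration; check the header row of your own export for the real names.

```python
import csv
import io


def latest_values(csv_text: str, needle: str) -> dict:
    """Return {column name: last-row value} for headers containing needle."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) < 2:
        return {}
    header, last = rows[0], rows[-1]
    return {
        name: float(value)
        for name, value in zip(header, last)
        if needle.lower() in name.lower() and value != ""
    }


if __name__ == "__main__":
    # Hypothetical sample; real esxtop batch headers are much longer.
    sample = (
        "time,vm1 % Ready,vm1 % Latency\n"
        "t0,1.0,10.0\n"
        "t1,0.5,12.0\n"
    )
    print(latest_values(sample, "latency"))
```

Matching on a substring keeps the helper independent of the exact header format, which I have not verified against a real export.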

Another note: there are 3 settings in most machine BIOSes that impact CPU latency. Under processor, there are C-state and C1E (or C1E-state). These control whether cores are halted when not needed. Then, under power management, you can generally set High Performance or OS Control. This controls whether the cores run at full speed or are throttled down to as little as 50% under reduced load. OS Control allows you to set the performance level in the vSphere client under Configuration/Power Management.

To see the impact of these power management settings, use esxtop and press (p)ower. This lets you see the state of each core/hyperthread. If C-state is enabled, columns for %C0, %C1 and %C2 will be displayed. If C-state is disabled, these columns will not appear.
bbrouhard
Lurker
Posts: 1
Liked: never
Joined: Feb 13, 2014 9:10 am
Full Name: Ben B
Contact:

Re: Virtual Machine Compute Latency Analysis, how does it wo

Post by bbrouhard »

Hello,

We had CPU latency alerts, and after trying many different things to solve them, the solution in the end was to disable power management in the BIOS and at the ESX level.
http://kb.vmware.com/kb/1018206
sergey.g
Veteran
Posts: 452
Liked: 76 times
Joined: May 02, 2012 1:49 pm
Full Name: Sergey Goncharenko
Contact:

Re: Virtual Machine Compute Latency Analysis, how does it wo

Post by sergey.g »

Wow, thank you very much for your input.

Looks like we are receiving more and more confirmations that BIOS power management settings could cause issues with increased CPU latency. We'll make sure to include this information in our KB articles in the product, so that our customers are aware of the possible root causes when they encounter high CPU latency alerts in SCOM.

Once again, thank you all for useful comments and suggestions. And keep us posted, we would appreciate any piece of information that can help our customers and other Virtualization experts to identify root causes of poor VM performance.
Paul W.
Veeam Software
Posts: 87
Liked: 6 times
Joined: May 29, 2012 7:33 pm
Full Name: Paul Wallace
Contact:

Re: Virtual Machine Compute Latency Analysis, how does it wo

Post by Paul W. »

It seems I'm a bit late to this thread but I felt this contribution would be well placed here -

Host Power Management in VMware vSphere 5.5:
http://www.vmware.com/files/pdf/techpap ... here55.pdf

In my case, setting the BIOS DBPM setting to OS Control (enabling me to control the power management setting through ESXi rather than needing to reboot and do so through the BIOS) and changing the Power Management settings in ESXi to High Performance got rid of these alerts. The document above is a great resource for understanding these settings and configuring them for the best possible performance.