VMWare VM CPU Usage: Feature Request extended time options

Post by **mervincm** » Feb 11, 2014 5:50 pm

We have a pure VMware environment, and we increasingly rely on Veeam ONE to monitor it.
We use the VM CPU Usage alarm to accomplish two goals:

1) Find VMs that could use additional vCPUs. Planning does not always keep up with reality; some systems get used more than they were in the past, and our Operations team would like to find them and add CPU resources to keep them running well.

2) Find machines that are using more CPU time than they legitimately should because of a problem, such as a runaway process, a hung machine, or an unintentional heavy task.

Like most environments, we have a mix of systems,
a) some that idle the majority of the time, and spike for a few minutes or so
b) some that idle the majority of the time, and heavy spike for 30min / one / two / three hours
c) some that do some significant CPU work regularly including short spikes for much of the workday

Using the default VM CPU Usage alert on type "a" systems works well: no false positives, and if we get an alert it means something has gone wrong, so we investigate and fix the issue. Using the default VM CPU Usage alert on type "b" or "c" systems does not work well: too many false positives.

Excluding type "b" or "c" machines is not an option, because then we would not meet goal #2.

So, I created an additional alert, a copy of VM CPU usage, calling it VM CPU usage -HEAVY. This is set to alert on 98% usage and for the max time of 60 minutes. Type b and c are assigned to this alert and excluded from the default alert.

Using the VM CPU usage -HEAVY alert on "b" type systems, it works well for only some of them. I still get too many false positives, since some of them have a heavy CPU task that can last more than 60 minutes. Using the VM CPU usage -HEAVY alert on "c" type systems, it works well. no false positives, and if we get an alert it means something has gone wrong and we investigate and fix the issue.

I am left with machines that idle almost all the time, but on occasion work hard for more than 60 minutes straight.
If Veeam had an option to alert after 90/120/180 minutes etc., this group could be taken care of.
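To illustrate the idea, here is a minimal sketch of a "sustained usage" check with a configurable duration. It is not how Veeam ONE evaluates alarms internally; the function name, the 5-minute sample spacing, and the sample data are all assumptions made up for the example.

```python
from typing import Iterable, List, Tuple


def sustained_alerts(
    samples: Iterable[Tuple[int, float]],
    threshold_pct: float = 98.0,
    min_duration_min: int = 60,
) -> List[Tuple[int, int]]:
    """Return (start, end) minute pairs for every run where CPU usage stayed
    at or above threshold_pct for at least min_duration_min minutes.

    samples: (minute, cpu_percent) pairs in time order (hypothetical format).
    """
    alerts: List[Tuple[int, int]] = []
    run_start = None  # minute at which the current high-usage run began
    last_high = None  # most recent minute that was still above the threshold

    for minute, usage in samples:
        if usage >= threshold_pct:
            if run_start is None:
                run_start = minute
            last_high = minute
        else:
            # The run ended; report it only if it lasted long enough.
            if run_start is not None and last_high - run_start >= min_duration_min:
                alerts.append((run_start, last_high))
            run_start = None

    # Handle a run that is still in progress at the end of the data.
    if run_start is not None and last_high - run_start >= min_duration_min:
        alerts.append((run_start, last_high))
    return alerts


# A made-up batch job: 100% CPU for 90 minutes, then idle, sampled every 5 minutes.
batch_job = [(m, 100.0 if m <= 90 else 5.0) for m in range(0, 180, 5)]

alerts_60 = sustained_alerts(batch_job, min_duration_min=60)    # fires on the batch job
alerts_120 = sustained_alerts(batch_job, min_duration_min=120)  # stays quiet
```

With a 60-minute window the legitimate 90-minute batch run raises an alert, while a 120-minute window lets it pass and would still catch a machine that is pegged for several hours.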

Thus my feature request

PS: I don't add more vCPUs for these tasks because they are batch processes, and it does not matter that they take an hour or two. Adding more vCPUs to these machines would soon get me into trouble with CPU scheduling, impacting the environment as a whole. We stick to the "use as few vCPUs as the job requires" rule as often as possible.

Post by **Vitaliy S.** » Feb 14, 2014 1:50 pm

Hi Mervin,

mervincm wrote:find VM's that could use additional vCPU's. Planning does not always keep up with reality, and sometimes systems get used more than in the past, and our Operations team would like to find these systems and add more CPU resources, keep these machines running well.

You can also use the CPU ready metric to locate VMs that are experiencing a lack of CPU resources.

mervincm wrote:I still get too many false positives, since some of them have a heavy CPU task that can last more than 60 minutes.

mervincm wrote:If veeam had an option to alert after 90/120/180 minutes etc, this group could be taken care of.

Yes, we can add more options to this list. However, if these tasks happen on a regular basis, have you considered configuring a suppression period for a specific time window?

P.S. phew...it took me a while to read it and mull it over

Great post, btw!

Thanks!

Post by **mervincm** » Feb 18, 2014 4:29 pm

mervincm wrote:find VM's that could use additional vCPU's. Planning does not always keep up with reality, and sometimes systems get used more than in the past, and our Operations team would like to find these systems and add more CPU resources, keep these machines running well.

Vitaliy S. wrote:You can also use CPU ready metric to locate VMs that are experiencing lack of CPU resources.

By lack of CPU resources, I mean "didn't add enough vCPUs to satisfy the workload".
I understand CPU ready to be useful for finding cases of "added more vCPUs than the physical hardware could provide because it was busy feeding other virtual machines".
Do I have this incorrect?

mervincm wrote:I still get too many false positives, since some of them have a heavy CPU task that can last more than 60 minutes.

mervincm wrote:If veeam had an option to alert after 90/120/180 minutes etc, this group could be taken care of.

Vitaliy S. wrote:Yes, we can add more options to this list, however if these tasks happen on regular basis, have you considered configuring suppress period by a specific time period?

Suppressions would work if they could be done per VM, if there was a repeating pattern, and if I had the time to learn the pattern.
I just think a few more options here would be a lot easier and would take care of the majority of my false positives.

Post by **Vitaliy S.** » Feb 18, 2014 7:33 pm

mervincm wrote:I understand CPU ready to be useful to find cases of "added more vCPU's than the physical hardware could provide because it was busy feeding other virtual machines"
Do I have this incorrect?

You got this correct, I was just trying to point out that you can also track physical CPU usage via this metric.
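For anyone reading CPU ready values straight from vCenter: the counter is reported as a summation in milliseconds per sampling interval, so it is easier to compare across VMs once converted to a percentage. Below is a minimal sketch of the commonly used conversion, assuming the 20-second real-time interval; the function name and the per-vCPU normalisation are illustrative choices, not something taken from Veeam ONE.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU ready summation (milliseconds) into a percentage.

    ready_ms:   CPU ready summation value for one sampling interval
    interval_s: length of the sampling interval in seconds
                (assumed 20 s, the vCenter real-time chart interval)
    vcpus:      number of vCPUs; dividing by it gives an average per vCPU
                when the summation covers the whole VM (an assumption here)
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms / (interval_s * 1000.0 * vcpus) * 100.0


# 1000 ms of ready time in a 20 s interval on a single-vCPU VM.
single_vcpu = cpu_ready_percent(1000)

# 2000 ms of ready time spread over 4 vCPUs in the same interval.
four_vcpu = cpu_ready_percent(2000, vcpus=4)
```

A high ready percentage points at contention on the host, which is the "more vCPUs than the hardware could provide" case; it does not by itself show that a VM needs more vCPUs.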

mervincm wrote:I just see a few more options here would be a lot easier and would take care of the majority of my false positives.

Thanks for the feedback, I will ask our dev team to add more options then.

Post by **mervincm** » Mar 31, 2014 5:31 pm

Is there any feedback on whether this can be done, and if so, some idea of the time frame? We need to decide whether this can or cannot be used for our purposes.

Post by **Vitaliy S.** » Mar 31, 2014 9:32 pm

Yes, I have asked to add more time periods in the options list of v8. Meanwhile, I believe it might be possible to set different options via SQL script. I will check with the devs tomorrow.

Post by **mervincm** » Apr 07, 2014 2:45 pm

great, thanks for the amazing customer interaction!

Post by **Vitaliy S.** » Apr 07, 2014 2:46 pm

Unfortunately, it is not possible to adjust these periods in v7, but at least I've seen them already added in one of the builds of the next version.
