Datastore Latency Analysis monitor


by nico.weytens » Tue May 27, 2014 9:42 am

We get several alerts a day from the Datastore Latency Analysis monitor. We bug our storage team about them when we see a pattern, but they claim everything is fine at their end and that we are exaggerating the problem. According to them, there are only sporadic, short latency spikes that can safely be ignored.

I've examined the Product Knowledge for this monitor, and the possible overrides, but I'd like some extra info.
The Product Knowledge summary:
This monitor tracks threshold breaches for the following metric: maxDeviceLatency - the highest of maxDeviceReadLatency and maxDeviceWriteLatency
This is a 'Top N' monitor - the top hosts reporting latency, and their I/O to this datastore, will be listed in the alert description.

Possible overrides, with their default values:
  • Instance Count 5
  • Num Samples 1
  • Threshold1 40
  • Threshold2 80
I figure the Instance Count of 5 stands for the Top N values in the alert, while the thresholds represent 40ms warning and 80ms critical level. I'm somewhat confused on the Num Samples value though... The sample interval isn't mentioned anywhere: not in the monitor, nor in the maxDeviceLatency/maxDeviceReadLatency/maxDeviceWriteLatency collection rules.

Am I correct to assume the interval is the collection interval we've set for the collectors in our VES web portal? The default is 5 minutes, but we have it set to 10.
So if we'd set the Num Samples value to 2, the monitor would only spawn an alert when the threshold is breached over 2 consecutive collections, in our case 10mins apart.

Any holes in my reasoning? :)
nico.weytens
Influencer
 
Posts: 17
Liked: 2 times
Joined: Mon Jul 02, 2012 8:30 am
Location: Belgium
Full Name: Nico Weytens

Re: Datastore Latency Analysis monitor

by Alec King » Tue May 27, 2014 9:48 am

Hey Nico!

You are entirely correct :D

Num Samples defines how many over-threshold-triggers we need, before we generate an alert. And each sample is delivered on the poll schedule you defined in Veeam Extensions settings.

So if you override Num Samples to 2, then it will be 2 x 10 minutes = 20 minutes (in your configuration) before average latency generates an alert.
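The Num Samples behaviour described here can be sketched in a few lines of Python. This is only an illustration of the logic as discussed in this thread, not the management pack's actual implementation. It assumes breaches must be consecutive (as Nico puts it), and the function name `first_alert_index` and the sample values are made up for the example.

```python
from typing import Optional, Sequence


def first_alert_index(samples_ms: Sequence[float],
                      threshold_ms: float = 40.0,
                      num_samples: int = 1) -> Optional[int]:
    """Return the index of the sample at which an alert would fire.

    Hypothetical helper: an alert fires once `num_samples` consecutive
    samples are above `threshold_ms`. Returns None if that never happens.
    """
    consecutive = 0
    for i, value in enumerate(samples_ms):
        if value > threshold_ms:
            consecutive += 1
            if consecutive >= num_samples:
                return i
        else:
            consecutive = 0  # a healthy sample resets the streak
    return None


# One delivered sample per collection interval (10 minutes in Nico's setup).
samples = [12.0, 55.0, 18.0, 47.0, 62.0]
print(first_alert_index(samples, num_samples=1))  # 1
print(first_alert_index(samples, num_samples=2))  # 4
```

With Num Samples at 1 the isolated breach at index 1 already alerts; with Num Samples at 2 the alert waits for the back-to-back breaches at indexes 3 and 4.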

Cheers,
Alec
Alec King
Veeam Software
 
Posts: 700
Liked: 116 times
Joined: Sun Jan 01, 2006 1:01 am

Re: Datastore Latency Analysis monitor

by nico.weytens » Tue May 27, 2014 10:38 am

alright, great :D

Still 2 remarks though...
1. isn't it 10mins apart, not 20? Because on a timeline it's sample-10min-sample-10min-sample-10min etc, I mean: 2 samples are 10mins apart
2. what do you mean by 'average' in "before average latency generates an alert"? This monitor doesn't compute averages, does it? The samples taken either breach the threshold or they don't. No?
Or is the sample itself already an average from the 10min interval?
If the latter would be the case, then I don't see why our storage team can claim there are only short spikes. If an average latency over 10mins is over 40ms, then that's BAD!

*10min in our situation, the default is 5min

Re: Datastore Latency Analysis monitor

by Alec King » Tue May 27, 2014 10:52 am

1. OK, so what I meant by "20 minutes" was ~20 minutes since sampling started. The timeline could be -
00.00 Collector sampling starts
00.02 high latency starts
sample for 10 mins...
00.10 deliver sample of >40 ms
sample for 10 mins...
00.20 deliver sample of >40 ms
If NumSamples = 2, then now we get an alert.
So, it could be ~20 minutes after high latency started (in my example, 18 minutes after). But you are correct, the time between samples is 10 minutes.
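The timeline in point 1 can be reproduced with a small calculation. This is a sketch under simplifying assumptions that are not stated by the product: samples are delivered at exact multiples of the collection interval counted from when sampling started, and every sample delivered after the latency starts is over the threshold. The function name `minutes_until_alert` is hypothetical.

```python
import math


def minutes_until_alert(latency_start_min: float,
                        interval_min: float = 10.0,
                        num_samples: int = 2) -> float:
    """Minutes between the start of high latency and the alert.

    Illustration only: assumes deliveries at multiples of `interval_min`
    and that every delivery after `latency_start_min` breaches the threshold.
    """
    # First delivery strictly after the latency started.
    first_delivery = (math.floor(latency_start_min / interval_min) + 1) * interval_min
    # Each additional required sample adds one full interval.
    alert_time = first_delivery + (num_samples - 1) * interval_min
    return alert_time - latency_start_min


print(minutes_until_alert(2))                 # 18.0
print(minutes_until_alert(2, num_samples=1))  # 8.0
```

The first call matches the example timeline: latency starts at 00.02, the second breaching sample arrives at 00.20, so the alert comes 18 minutes after the problem began.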

2. The latency metric is an average over the sample interval, we take "realtime" samples (in vCenter, that's every 20 seconds) and average those to deliver each data point.
So, if you get a sample of >40ms in SCOM; then in your case that means average latency over 10 minutes was >40 ms. And I agree with you - that's bad! That's why our default NumSamples setting is 1 :wink:
I'd say, that MP is working correctly as designed to give you those latency alerts, and maybe you should talk with your storage team again.....
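To see why an averaged sample over 40 ms is hard to explain as a "short spike", here is a rough sketch of the averaging described in point 2. The 20-second realtime period comes from the post; the 5 ms baseline and 200 ms spike are invented numbers for illustration, and `delivered_sample` is a hypothetical helper, not part of the management pack.

```python
from statistics import mean

REALTIME_PERIOD_S = 20  # vCenter "realtime" sampling period mentioned above


def delivered_sample(interval_min: int, baseline_ms: float,
                     spike_ms: float, spike_duration_s: int) -> float:
    """Average latency delivered for one collection interval.

    Builds a hypothetical series of 20-second readings: `spike_ms` for the
    duration of the spike, `baseline_ms` for the rest, then averages them.
    """
    total = interval_min * 60 // REALTIME_PERIOD_S
    spiking = min(total, spike_duration_s // REALTIME_PERIOD_S)
    readings = [spike_ms] * spiking + [baseline_ms] * (total - spiking)
    return mean(readings)


# 10-minute interval, 5 ms baseline, a 200 ms spike of varying length.
print(delivered_sample(10, 5.0, 200.0, 60))   # 24.5, below the 40 ms threshold
print(delivered_sample(10, 5.0, 200.0, 120))  # 44.0, above the 40 ms threshold
```

With these made-up numbers, a one-minute spike of 200 ms is averaged away, while it takes two full minutes at 200 ms to push the 10-minute data point over 40 ms.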

Re: Datastore Latency Analysis monitor

by nico.weytens » Wed May 28, 2014 6:24 am

OK, crystal clear, Alec. Thanks for that.

We'll take this up with our storage guys again.

Re: Datastore Latency Analysis monitor

by keithkleiman » Fri May 29, 2015 4:39 pm

Alec,

Some clarification on the following comment:

"2. The latency metric is an average over the sample interval, we take "realtime" samples (in vCenter, that's every 20 seconds) and average those to deliver each data point."

So the "collection interval" in the "collector settings" of the web UI is a sample taken from an average of 20 second samples from vCenter? In other words...

If my "collection interval" in the collector settings is set to 15 minutes, then I am capturing a sample that is written back to SCOM every 15 minutes. That sample (taken every 15 min) is actually an average of 45 samples [15 (collection interval) x3 (vsphere samples per minute)] from vsphere.

So if I increase Num Samples to "2" in the "Veeam VMware: Datastore Latency Analysis" monitor, a performance sample will still be written every 15 minutes (per the collector settings), but the monitor will not generate an alert until 30 minutes have passed and both 15-minute samples written to SCOM average over the threshold.
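The arithmetic in this post can be checked with a couple of one-liners. The 20-second realtime period is the figure quoted from Alec; the function names are hypothetical and only restate the reasoning above.

```python
REALTIME_PERIOD_S = 20  # vCenter realtime sampling period quoted above


def realtime_samples_per_data_point(collection_interval_min: int) -> int:
    """Number of 20-second readings averaged into one delivered sample."""
    return collection_interval_min * 60 // REALTIME_PERIOD_S


def minutes_of_data_for_alert(collection_interval_min: int, num_samples: int) -> int:
    """Minutes of sampling covered by `num_samples` delivered samples."""
    return collection_interval_min * num_samples


print(realtime_samples_per_data_point(15))  # 45
print(minutes_of_data_for_alert(15, 2))     # 30
```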

TIA,
Keith
keithkleiman
Enthusiast
 
Posts: 42
Liked: never
Joined: Mon May 23, 2011 8:38 pm
Full Name: Keith Kleiman

Re: Datastore Latency Analysis monitor

by Alec King » Fri May 29, 2015 4:47 pm

Hi Keith,

Yes you got it exactly 8)

Cheers,
Alec

Re: Datastore Latency Analysis monitor

by stanyb » Tue Jan 19, 2016 8:35 am

Hi Nico,
over the last few weeks we have run into the same issue you had in mid-2014. Did your storage guys do something, or did you find another way to solve it?

rgds,
Stany
stanyb
Influencer
 
Posts: 22
Liked: 2 times
Joined: Wed Nov 24, 2010 12:09 pm
Full Name: Stanislas Borgilion

Re: Datastore Latency Analysis monitor

by sergey.g » Thu Jan 21, 2016 12:52 pm

Hi,

Hopefully Nico can reply with his own experience of dealing with the issue. What I would recommend for datastore latency alarms is to check the Datastore Traffic Analysis dashboard for the affected datastore: if you can spot a specific VM with increased activity around the time you receive the latency alarm, it could be the root cause of the issue. If datastore usage is within expected bounds, then it's probably time to review the storage configuration. I would also recommend checking VM latency values in the Datastore Latency Analysis dashboard: if some group of VMs is more affected than others, check whether they reside on the same host, as it could be a host storage issue as well. There is a kernel latency counter on each host, so check that one too.
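If you copy the VM latency values out of the dashboard by hand, the "do the affected VMs share a host?" check can be done with a few lines of Python. This is only a sketch: the row layout, the VM and host names, and the 40 ms cut-off are assumptions for illustration, not an export format the management pack provides.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical rows: (vm_name, host_name, latency_ms)
Row = Tuple[str, str, float]


def affected_vms_by_host(rows: Iterable[Row],
                         threshold_ms: float = 40.0) -> Dict[str, List[str]]:
    """Group VMs whose latency exceeds `threshold_ms` by the host they run on."""
    by_host: Dict[str, List[str]] = defaultdict(list)
    for vm, host, latency_ms in rows:
        if latency_ms > threshold_ms:
            by_host[host].append(vm)
    return dict(by_host)


rows = [
    ("vm-a", "esx01", 55.0),
    ("vm-b", "esx01", 61.0),
    ("vm-c", "esx02", 12.0),
    ("vm-d", "esx03", 44.0),
]
print(affected_vms_by_host(rows))
# {'esx01': ['vm-a', 'vm-b'], 'esx03': ['vm-d']}
```

A host that collects most of the affected VMs is a candidate for a host-side storage problem rather than an array-side one.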

Hope this is helpful.
Thanks.
sergey.g
Veeam Software
 
Posts: 453
Liked: 75 times
Joined: Wed May 02, 2012 1:49 pm
Full Name: Sergey Goncharenko

Re: Datastore Latency Analysis monitor

by stanyb » Fri Jan 22, 2016 8:03 am

Hi Sergey,
thanks for your reply. We believe it's storage related, but we have to convince our storage vendor of this. I'm not very familiar with the Veeam MP reports, but I'll check them today.

Concerning kernel latency counters, I know that the 4 hosts connected to the storage box on which we see these issues sometimes raise these alarms.

rgds,
Stany

