Monitoring and reporting for Veeam Data Platform
kevdpc
Influencer
Posts: 24
Liked: 2 times
Joined: Feb 18, 2020 5:45 pm
Full Name: Kevin Chubb
Contact:

Help me understand datastore latency alarms

Post by kevdpc »

Lately I've been getting alarms for datastore read and write latency and I'm having trouble understanding what I'm seeing.

For example, I received an alarm email at 8:50 PM with these details: "Disk/Datastore: Datastore Read Latency" (102.0 Milliseconds) is above a defined threshold (100.0 Milliseconds)

Here are the 'Datastore read latency' alarm rules (pretty sure they're the default).

[Image: 'Datastore read latency' alarm rules]

Here is the 'Datastore Read Latency' performance graph for 'Last day.' You can see that there is a spike up to 7 ms at about 8:50 PM.

[Image: 'Datastore Read Latency' performance graph, 'Last day']

So why does the email alarm say 102.0 ms but the performance graph only shows 7 ms?
RomanK
Veeam Software
Posts: 745
Liked: 190 times
Joined: Nov 01, 2016 11:26 am
Contact:

Re: Help me understand datastore latency alarms

Post by RomanK » 1 person likes this post

Hello Kevin,

As far as I remember, the alarm checks the max_value metric, while performance graphs use several aggregated current_value metrics.
Given that, the alarm is the more precise of the two: you did indeed hit 102 ms around 8:50:00, while the graph took the values sampled at 8:49:10, 8:49:30...8:50:10 and plotted a single averaged value for that point on the graph.
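To make the max-versus-average difference concrete, here is a minimal Python sketch. The sample values are invented so that the results line up with the numbers in this thread; they are not real measurements, and this is not how Veeam ONE is actually implemented.

```python
from statistics import mean

# Hypothetical 20-second latency samples (ms) that feed one graph point.
# Invented values: nineteen quiet samples and a single 102 ms spike.
samples_ms = [2.0] * 19 + [102.0]

# A max-based alarm rule compares the largest sample to the threshold.
alarm_value = max(samples_ms)

# An averaged graph point smooths the same spike away.
graph_value = mean(samples_ms)

THRESHOLD_MS = 100.0
alarm_fires = alarm_value > THRESHOLD_MS

print(f"alarm sees {alarm_value} ms, graph plots {graph_value} ms, fires={alarm_fires}")
```

With these made-up samples the alarm sees 102.0 ms and fires, while the graph point comes out at 7.0 ms.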

Thanks

Re: Help me understand datastore latency alarms

Post by kevdpc »

Okay that's helpful, thank you.

If it's checking the max value, then what's the purpose of the 15-minute time period?

Re: Help me understand datastore latency alarms

Post by kevdpc »

Also, if it's checking the max value, then why is the field called Aggregation?

Re: Help me understand datastore latency alarms

Post by RomanK »

Hello Kevin,

Aggregation is just a general label for the alarm rule. The rule can apply a min, max or avg function to the data set, because the same rule can be used in multiple alarms with different customization.
So the alarm rule collects the performance data and selects the min, max or avg among all values for that period. Then it checks the thresholds and starts again. In practice, 15 minutes is enough to prevent an alarm storm.
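As a rough illustration of the cycle described above (collect a period of data, aggregate it, check the threshold, start again), here is a small Python sketch. The function name `evaluate_windows`, the fixed non-overlapping windows, and the sample data are all assumptions made for the example; the real product logic may differ.

```python
from statistics import mean

# Aggregation functions a rule could be configured with.
AGGREGATIONS = {"min": min, "max": max, "avg": mean}

def evaluate_windows(samples, window_s, aggregation, threshold):
    """Group (timestamp_seconds, value) samples into fixed windows and
    return one (window_start, aggregated_value, breached) tuple per window."""
    agg = AGGREGATIONS[aggregation]
    buckets = {}
    for ts, value in samples:
        start = ts - ts % window_s  # start of the window this sample falls in
        buckets.setdefault(start, []).append(value)
    results = []
    for start, values in sorted(buckets.items()):
        aggregated = agg(values)
        results.append((start, aggregated, aggregated > threshold))
    return results

# Hypothetical data: a sample every 20 s for 30 minutes, with three
# spikes inside the first 15-minute window.
spikes = {100: 102.0, 300: 150.0, 500: 120.0}
samples = [(t, spikes.get(t, 2.0)) for t in range(0, 1800, 20)]

results_max = evaluate_windows(samples, 900, "max", 100.0)
results_avg = evaluate_windows(samples, 900, "avg", 100.0)
print(results_max)
print(results_avg)
```

In this sketch the three spikes produce a single breach for the first 15-minute window under `max` rather than three separate ones, which is the alarm-storm point; under `avg` the same data does not breach the threshold at all.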

Thanks

Re: Help me understand datastore latency alarms

Post by kevdpc »

Okay I understand now, thank you.