nWorks Alerts finetuning


by thenamaris » Thu Nov 03, 2011 5:13 pm

Hello,

We recently purchased the SCOM MP and it is raising several alerts, most of which close on their own after a few minutes. Since we monitor our systems through the emails SCOM sends us, we would like to fine-tune the alerting so that we only receive emails about real issues. Below are the two most common auto-closing alerts (they appear many times per day). We would appreciate help understanding what they tell us, what we can do about them, and whether we should override them:

Alert: nworks VMware: ESX Host VMHBA has logged SCSI aborts
Source: vmhba3:C0:T0:L8
Path: server1.thenamaris.gr;DISK:server1.thenamaris.gr
Last modified by: System
Last modified time: 11/3/2011 6:58:05 PM
Alert description: VMHost LUN vmhba3:C0:T0:L8 on ESX Host server1.thenamaris.gr has exceeded threshold over 4 samples by logging 6 aborts.

Alert: nworks VMware: ESX Host VMHBA has exceeded threshold for queueLatency
Source: vmhba3:C0:T0:L8
Path: server2.thenamaris.gr;DISK:server2.thenamaris.gr
Last modified by: System
Last modified time: 11/3/2011 3:58:08 PM
Alert description: VMHost LUN vmhba3:C0:T0:L8 on ESX Host server2.thenamaris.gr has exceeded threshold over 2 samples by logging 2390 ms.

The latter also often comes up with TotalReadLatency or TotalWriteLatency.
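
For readers wondering why such alerts open and then close by themselves, here is a minimal Python sketch of a consecutive-sample threshold check. It assumes, based only on the wording "exceeded threshold over 4 samples", that an alert opens after N consecutive samples above a threshold and closes when a sample falls back to or below it. The actual nworks monitor logic may differ; the function name, the threshold of 1000 ms, and the sample data are all hypothetical.

```python
from typing import Iterable, List


def monitor_states(samples: Iterable[float], threshold: float, required: int) -> List[bool]:
    """Return, per sample, whether the alert would be open after that sample.

    Assumption (not confirmed by the MP documentation): the alert opens once
    `required` consecutive samples exceed `threshold`, and closes again on the
    first sample at or below it.
    """
    states = []
    streak = 0  # consecutive samples above the threshold so far
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        states.append(streak >= required)
    return states


# Hypothetical queue latency samples in ms: a short burst, then recovery.
queue_latency_ms = [5, 2390, 2100, 40, 3, 1800, 6]
print(monitor_states(queue_latency_ms, threshold=1000, required=2))
```

Under this reading, a short burst of slow IO is enough to open an alert, and the next healthy sample closes it, which would match the auto-closing behaviour described above.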

Many Thanks in advance.
thenamaris
Novice
 
Posts: 4
Liked: never
Joined: Thu Nov 03, 2011 4:39 pm
Full Name: Thenamaris Inc.

Re: nWorks Alerts finetuning

by ZachW » Fri Nov 04, 2011 12:14 am

Hi,

Please open up a case with support and we would be more than happy to assist you with this.

http://www.veeam.com/support-form.html

-Zach
ZachW
Enthusiast
 
Posts: 68
Liked: 10 times
Joined: Tue Aug 02, 2011 6:09 pm
Full Name: Zach Weed

Re: nWorks Alerts finetuning

by thenamaris » Fri Nov 04, 2011 10:07 am

Hello and many thanks for the answer.
We have opened a support case.

Re: nWorks Alerts finetuning

by Alec King » Mon Nov 07, 2011 7:16 am

Hi! Judging from the two alerts you listed, I would also say you are having a problem with your back-end storage.
The aborts monitor looks for storage commands that have timed out.
The latency monitor looks for storage commands that spend too long in the internal vmkernel queue waiting to be processed.

I'd advise diving into the performance and configuration of that VMHBA on that host. Six aborts is bad but not terrible; a queue latency of 2390 ms, however, is nearly two and a half seconds! That is a lifetime of waiting in disk IO terms.

I'd say you have a storage performance issue on that host. And I'd say the nworks MP is working as designed by alerting you to that! :wink:

Cheers,
Alec
Alec King
Vice President, Product Management
Veeam Software
Alec King
Veeam Software
 
Posts: 700
Liked: 116 times
Joined: Sun Jan 01, 2006 1:01 am

Re: nWorks Alerts finetuning

by thenamaris » Tue Nov 08, 2011 8:43 am

Hello Alec and many thanks for the answer.
We have contacted our IT infrastructure support in order to investigate the backend issue.
I will report back as soon as possible.

Re: nWorks Alerts finetuning

by thenamaris » Mon Nov 14, 2011 1:48 pm

Hello all,

Some new issues with totalReadLatency and totalWriteLatency have appeared on some LUNs.

The default threshold levels are:

totalWriteLatency: 60/100
totalReadLatency: 100/250

The ‘problematic’ LUNs produce values that range between:

totalWriteLatency: 65 - 410
totalReadLatency: 110 - 480

No overrides have been set up.
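
To make the comparison concrete, here is a small Python sketch that classifies the reported values against the default thresholds. Reading each threshold pair as warning/critical in milliseconds is an assumption on our part, and the `classify` helper is purely illustrative, not part of the management pack.

```python
from typing import Dict, Tuple

# Default thresholds quoted above. Reading each pair as (warning, critical)
# in milliseconds is an assumption, not something the thread confirms.
DEFAULT_THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "totalWriteLatency": (60, 100),
    "totalReadLatency": (100, 250),
}


def classify(counter: str, value: float) -> str:
    """Map a sample to 'healthy', 'warning' or 'critical'."""
    warning, critical = DEFAULT_THRESHOLDS[counter]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "healthy"


# Lower and upper ends of the ranges reported for the problematic LUNs.
for counter, value in [
    ("totalWriteLatency", 65),
    ("totalWriteLatency", 410),
    ("totalReadLatency", 110),
    ("totalReadLatency", 480),
]:
    print(f"{counter}={value}: {classify(counter, value)}")
```

Under that assumption, even the low end of each range already sits above the first threshold, and the high end is well past the second.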

From your experience, do you think that these metrics should be overridden?
Are these thresholds a bit “strict” or should we check our storage infrastructure for bottlenecks?

Last but not least, we're kind of puzzled by the definition of the deviceReadLatency/deviceWriteLatency counters:

The “Product knowledge” tab for the above metrics states:

*** totalReadLatency ***
This totalReadLatency counter shows the latency from vmkernel to device (HBA) through to the back-end storage, e.g. SAN.
Note there is another counter deviceReadLatency that show latency from vmkernel to HBA only, this should help you troubleshoot where the performance bottleneck is located.

*** totalWriteLatency ***
This totalWriteLatency counter shows the latency from vmkernel to device (HBA) through to the back-end storage, e.g. SAN.
Note there is another counter deviceWriteLatency that show latency from vmkernel to HBA only, this should help you troubleshoot where the performance bottleneck is located.

So that means that deviceReadLatency and deviceWriteLatency check the vmkernel <--> HBA path.

But, copying from your “metrics definition” (http://www.veeam.com/support/metrics/dictionary.html):

*** deviceReadLatency ***
The average amount of time taken to complete a read from the physical device.
This is the time from the device to the HBA in milliseconds.

*** deviceWriteLatency ***
The average amount of time taken to complete a write to the physical device.
This is the time from the HBA to the device in milliseconds.

So here, these 2 metrics seem to check the HBA <--> Device (Storage) path.

Can you please clarify which path these metrics exactly monitor?

Thanks in advance.

Re: nWorks Alerts finetuning

by vBPav » Thu Jan 19, 2012 7:05 am

Hello,

We will shortly be releasing our Best Practice and Advanced Configuration Guide, which explains in detail how you may want to tune the latency monitors. The short answer is YES, you will probably want to tune these monitors for your environment. Disk latency depends on several factors, including:

IO throughput
The LUN's ability to service IO

If you have a LUN on slow storage (iSCSI with SATA disks, for example), you can expect higher latency than on a LUN with fast storage (fiber with fiber disks). Baselining with our reports is the best way to determine which thresholds to set for each vmHBA, and it is always a good idea to baseline the different types of storage in your environment. You may find that a 40-60 ms response time is normal for faster storage, whereas 100-200 ms may be normal for slower storage.
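
As a rough illustration of the baselining idea, the Python sketch below derives candidate thresholds from historical latency samples for each storage type. The percentile rule (warning at the 95th percentile, critical at the 99th) is a hypothetical rule of thumb, not a Veeam recommendation, and the sample histories are invented; in practice the numbers would come from the reports.

```python
import statistics
from typing import Dict, List


def suggest_thresholds(history_ms: List[float]) -> Dict[str, float]:
    """Suggest warning/critical thresholds from historical latency samples.

    Hypothetical rule of thumb: warning at the 95th percentile of the
    baseline, critical at the 99th. Needs at least two samples.
    """
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(history_ms, n=100, method="inclusive")
    return {"warning": round(cuts[94], 1), "critical": round(cuts[98], 1)}


# Invented baseline samples in ms for two storage types.
baselines = {
    "iSCSI/SATA LUN": [110, 130, 95, 180, 150, 120, 200, 140],
    "fiber LUN": [35, 42, 50, 38, 61, 45, 40, 55],
}
for storage, history in baselines.items():
    print(storage, suggest_thresholds(history))
```

Keeping one baseline per storage type means a slow LUN is judged against its own normal behaviour rather than against thresholds tuned for fast storage.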

The "totalWriteLatency" and "totalReadLatency" monitors measure the total time it takes to write/read data from the kernel to the HBA to the SAN and back. deviceReadLatency and deviceWriteLatency are the time it takes just from the kernel to the HBA. High device latency indicates an issue or bottleneck at the host/host HBA level. Low device latency combined with high total latency indicates that the SAN is having performance issues.

Keep a look out for our BPAC Guides. These should be published soon! :)
Brian Pavnick | Cireson | Solutions Architect

- Follow me on Twitter @ vbpav
- Reach me on e-mail @ brian.pavnick@cireson.com
vBPav
Expert
 
Posts: 181
Liked: 13 times
Joined: Wed Jan 13, 2010 6:08 pm
Full Name: Brian Pavnick

Re: nWorks Alerts finetuning

by treemon » Wed Feb 29, 2012 11:21 am

Hi there

We are also seeing a few latency issues.
Are there any updates on the BPAC guides?

Thanks
treemon
Lurker
 
Posts: 1
Liked: never
Joined: Wed Feb 29, 2012 11:19 am

Re: nWorks Alerts finetuning

by Alec King » Wed Feb 29, 2012 7:28 pm

Hi, the BPAC Guides have been released and are in the downloads section here - http://www.veeam.com/vmware-microsoft-esx-monitoring/resources.html
Enjoy! :D

