NetApp OnCommand Unified Manager Reporting Errors Related to Veeam

May 28, 2020 1:15 pm

This is a common question that comes in to the NetApp Alliance here at Veeam regarding error alerts generated by NetApp's ONTAP monitoring tool, "OnCommand Unified Manager", or "OCUM" for short. These alerts often look like this:

An incident was generated by localhost that requires your attention.

Incident - Volume Offline
Impact Area - Availability
Severity - Critical
State - New
Source - svm01:/VeeamAUX_EXCHANGE_LOGS_Rescan
Trigger Condition - Volume offline

To see why these alerts are happening, it’s important to understand how Veeam and OnCommand Unified Manager’s monitoring each behave.

Veeam’s Behavior:
By default, every 10 minutes, Veeam checks the volumes we are allowed to scan for snapshots that we (Veeam B&R) didn’t create. If we find any, we have to scan each of those snapshots to determine which VMs it contains. If the volume in question is a “block” volume, meaning it has a LUN in it, we have to create a FlexClone of that volume (a FlexClone was created by Veeam when it has “VeeamAux” in the name) and then mount that snapshot to a Veeam proxy for scanning. Once that LUN has been scanned, we offline the cloned volume and delete the FlexClone.
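
Because Veeam-created FlexClones carry “VeeamAux” in the name, an alert like the sample above can be checked against that naming rule. The following is only a minimal sketch: the function names are hypothetical, the "Key - Value" layout is assumed from the sample alert, and the match is case-insensitive because the sample shows "VeeamAUX".

```python
def parse_alert(text):
    """Parse 'Key - Value' lines from an alert body into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" - ")
        if sep:  # skip lines that do not follow the 'Key - Value' layout
            fields[key.strip()] = value.strip()
    return fields


def is_veeam_clone_alert(fields):
    """Return True if the alert source is a volume with 'VeeamAux' in its name."""
    source = fields.get("Source", "")
    # Source looks like "svm01:/VolumeName"; keep the part after ":/".
    volume = source.split(":/", 1)[-1]
    return "veeamaux" in volume.lower()


SAMPLE = """Incident - Volume Offline
Impact Area - Availability
Severity - Critical
State - New
Source - svm01:/VeeamAUX_EXCHANGE_LOGS_Rescan
Trigger Condition - Volume offline"""

if __name__ == "__main__":
    print(is_veeam_clone_alert(parse_alert(SAMPLE)))
```

An alert that matches this check and reports "Volume offline" is most likely the temporary clone being cleaned up, not a production volume going offline.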

OCUM’s Behavior:
OnCommand Unified Manager sees the FlexClone volumes go offline and sends alerts on this expected behavior. OCUM only scans at regular intervals, so not all Veeam activity will be alerted on, as we may create and remove FlexClones before OCUM discovers them. This is why the alerting may be “sporadic” and why not all activity generates an alert.
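
To illustrate why only some clones trigger an alert, here is a small simulation sketch. It is not how OCUM is implemented: the 900-second polling interval and the 20-second offline windows are made-up numbers for illustration, and the function name is hypothetical. Only the 600-second spacing between clones reflects Veeam's default rescan timer.

```python
def clones_seen_offline(clone_windows, poll_interval, horizon):
    """Return the indexes of clones that a periodic poller would observe.

    clone_windows: list of (start, end) times in seconds during which a
                   clone exists in the offline state.
    poll_interval: seconds between polls (first poll at t = 0).
    horizon:       last second to simulate (inclusive).
    """
    poll_times = range(0, horizon + 1, poll_interval)
    seen = []
    for index, (start, end) in enumerate(clone_windows):
        # A clone is noticed only if some poll lands inside its window.
        if any(start <= t <= end for t in poll_times):
            seen.append(index)
    return seen


# One clone per 600-second rescan, each offline for an assumed 20 seconds.
windows = [(600 * k + 290, 600 * k + 310) for k in range(6)]

if __name__ == "__main__":
    print(clones_seen_offline(windows, poll_interval=900, horizon=3600))  # → [1, 4]
```

In this made-up run, six clones go offline in an hour but only two coincide with a poll, which matches the “sporadic” pattern described above.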

What can be done:
First, we need to verify that VBR is properly using the “All volumes except” or “Only the following volumes” functionality, so that we are not scanning volumes that do not contain VMware VMDKs.

Also, it’s recommended that you use Veeam to schedule and manage the creation of all snapshots in the volumes we are scanning, rather than using a third-party tool or scheduling those snapshots with ONTAP directly.

Second, it’s possible to adjust what OCUM alerts on. If you are getting false positive alerts, they can be disabled. See here: https://library.netapp.com/ecmdocs/ECMP ... CF24F.html

Third, you can change Veeam’s default 10-minute rescan timer if absolutely necessary. This can be done with the following registry value:

SanMonitorTimeout
Type: REG_DWORD
Default value: 600 (seconds)
Description: defines how frequently we should monitor the SAN infrastructure and run incremental rescan in case of new instances
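
As a rough sketch of how the value could be set, the helper below builds a Windows `reg add` command line for a chosen interval. The registry path is an assumption (the usual Veeam Backup & Replication location on the backup server) and is not stated in this post, so verify it before running anything. The function name is hypothetical.

```python
# Assumed key location; not given in the post, so confirm it for your install.
REG_KEY = r"HKLM\SOFTWARE\Veeam\Veeam Backup and Replication"


def san_monitor_reg_command(minutes):
    """Build a 'reg add' command that sets SanMonitorTimeout (in seconds)."""
    if minutes <= 0:
        raise ValueError("rescan interval must be a positive number of minutes")
    seconds = int(minutes * 60)  # the value is stored in seconds
    return (
        f'reg add "{REG_KEY}" /v SanMonitorTimeout '
        f"/t REG_DWORD /d {seconds} /f"
    )


if __name__ == "__main__":
    # Example: move from the default 10 minutes (600) to 30 minutes (1800).
    print(san_monitor_reg_command(30))
```

The script only prints the command, so it can be reviewed before being run in an elevated prompt on the backup server.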

Hope this helps everyone to understand what's happening in your monitoring tools!

Regards,
Adam Bergh
