Metrics skewed - Hyper-V 2019 Cluster

Post by **JRRW** » Nov 04, 2020 7:08 pm

All,
I'm on the Community Edition (CE), so I figured I'd start here before going to support:

Five Hyper-V 2019 installs (not Server Core) in a cluster, with ~20 CSVs, running 2x 8Gb FC to a 3PAR 7200c and 2x 8Gb to the SAN and a Pure x50.

I'm seeing some really odd metrics that don't match Task Manager and/or our other monitoring in two key areas:
- Disk latency
- Memory usage

For disk latency, when I compare the max values to WhatsUp Gold monitoring the same systems, I see a huge difference (WUG shows latency of <1-2 ms, while Veeam reports >7-20 ms). That doesn't make any sense: our SAN switches don't show even half of the bandwidth being used, and our SANs are asleep (the Pure x50 is an all-NVMe array, which makes it next to impossible to get high latency outside of network congestion).
When we run weekly performance reports, multiple CSVs show 'error' on latency, with max latency waaayyy beyond what it should be hitting, so I'm wondering whether those numbers are wrong.

For memory, it mostly errors on Hyper-V Services memory usage, but these hosts all have at LEAST 150 GB of RAM free at all times; they have 512 GB of RAM each and rarely use more than 300-330 GB per host.

Anyone see a similar issue in their environments?

Post by **HannesK** » Nov 05, 2020 6:20 am

Hello,
Task Manager has different metrics, yes. Could you tell us what "WUG" is? I tried to google it, but had no luck.

Veeam ONE uses the values that Microsoft / Hyper-V (or VMware, if used) gives us. If you don't trust them, I can only recommend checking with support.

Best regards,
Hannes

Post by **JRRW** » Nov 06, 2020 9:19 pm

Hi Hannes,

WUG = WhatsUp Gold (I referenced it in 'when I compare max to whatsup gold monitoring'); sorry for the acronym usage!

I just sat through an SAP presentation and wanted to throw things in anger over how many acronyms they use...

So my question is whether there is documentation on where some of these metrics are pulled from and how they are combined. Even within Veeam ONE itself, many of the alarms are defined very vaguely.

Specifically, for disk latency on CSVs: when I look at the metrics in WhatsUp Gold, which polls CSV latency per host (not all hosts combined), I don't see the latency Veeam ONE is reporting. Is that because Veeam ONE combines ALL hosts' CSV latency into one metric? If so, the alarms/warnings are super misleading and not all that helpful.
For example, during a poll on Thursday at 23:45:

Host01 -> CSV01 = 2.5 ms latency
Host02 -> CSV01 = 1.5 ms latency
Host03 -> CSV01 = 0.5 ms latency
Host04 -> CSV01 = 0.25 ms latency
Host05 -> CSV01 = 0.25 ms latency

Under Hyper-V Cluster -> Cluster Shared Volumes -> CSV01, does Veeam report 2.5 ms (the 'max' during that poll) or 5 ms (2.5 + 1.5 + 0.5 + 0.25 + 0.25 = 5)?
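The two readings in question can be compared directly. The Python sketch below uses the per-host values from the poll above; which aggregation Veeam ONE actually applies is not stated in this thread, so the max, sum, and unweighted mean are only candidates to check against the reported figure.

```python
# Candidate ways a per-CSV latency figure could be aggregated across hosts.
# Which one (if any) Veeam ONE uses is an open question in this thread.
from statistics import mean

# Per-host CSV01 latency in ms, from the Thursday 23:45 poll above.
csv01_latency_ms = {
    "Host01": 2.5,
    "Host02": 1.5,
    "Host03": 0.5,
    "Host04": 0.25,
    "Host05": 0.25,
}

def aggregate(latencies_ms):
    """Return the max, sum, and unweighted mean of per-host latencies (ms)."""
    values = list(latencies_ms.values())
    return {
        "max": max(values),
        "sum": sum(values),
        "mean": mean(values),
    }

if __name__ == "__main__":
    for name, value in aggregate(csv01_latency_ms).items():
        print(f"{name}: {value} ms")
```

If the value Veeam ONE shows for that poll lines up with the sum (5.0 ms) rather than the max (2.5 ms), that would support the summing theory. Note that the unweighted mean (1.0 ms) ignores how many IOs each host issued, so it is a rough comparison point only.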

For memory: what exactly is Hyper-V Services memory? This page defines the alarm as "Average Hyper-V Services memory usage for 15 minutes is above 80%." with the reason "This host is low on available memory.", which is not true in 99% of the cases where we get this warning. The hosts have plenty of unused memory remaining.
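By the host-wide numbers given earlier (512 GB per host, rarely more than 300-330 GB used, at least 150 GB free), a 15-minute average above 80% should not occur. A minimal sketch of that check follows, assuming the alarm compares a plain average of used memory against total host memory. The thread does not say how Hyper-V Services memory usage is actually calculated, and the one-sample-per-minute series and the helper names are hypothetical.

```python
# Hypothetical model of the rule "average memory usage for 15 minutes is above 80%".
# Assumptions: usage = used / total host memory, sampled once per minute.

HOST_RAM_GB = 512
THRESHOLD_PCT = 80.0

def average_usage_pct(samples_used_gb, total_gb):
    """Average of the per-sample usage percentages over the window."""
    return sum(used / total_gb * 100 for used in samples_used_gb) / len(samples_used_gb)

def alarm_fires(samples_used_gb, total_gb=HOST_RAM_GB, threshold_pct=THRESHOLD_PCT):
    """True if the window's average usage exceeds the threshold."""
    return average_usage_pct(samples_used_gb, total_gb) > threshold_pct

if __name__ == "__main__":
    typical_peak = [330] * 15            # 330 GB used in each of 15 one-minute samples
    min_free_bound = [512 - 150] * 15    # 150 GB free means 362 GB used
    for label, window in (("330 GB used", typical_peak), ("150 GB free", min_free_bound)):
        pct = average_usage_pct(window, HOST_RAM_GB)
        print(f"{label}: {pct:.1f}% -> alarm: {alarm_fires(window)}")
```

Under these assumptions, the average would have to exceed 409.6 GB used (80% of 512 GB) for the alarm to fire, so an alarm at 300-330 GB suggests the metric measures something other than total host memory. That is the kind of detail a support case would need to confirm.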

~PerplexedInArms

Post by **wishr** » Nov 09, 2020 10:29 am

Hi Ryan,

Could you please create a support case and let me know your case ID? I'll ask our support team to collect all the details and take a look at them.

Thanks
