MPECSInc
Service Provider
Posts: 24
Liked: 11 times
Joined: Jul 25, 2016 2:36 pm
Full Name: Philip Elder
Location: St. Albert, AB, Canada
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by MPECSInc »

5x Node Dell RX740
@cptkommin

2x NVMe cache devices to 10x HDDs = 1:5.
80TB Capacity : 12.8TB Cache = 16%

That's the first killer I see there. Cache disk count to capacity disk count is weak.
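
For anyone wanting to run the same numbers on their own build, here is a minimal sketch of the arithmetic above. The function name and parameters are made up for illustration; the inputs are the figures quoted in this post.

```python
def cache_ratios(cache_devices, capacity_devices, cache_tb, capacity_tb):
    """Return (capacity devices per cache device, cache as % of capacity)."""
    devices_per_cache = capacity_devices / cache_devices
    cache_pct = 100.0 * cache_tb / capacity_tb
    return devices_per_cache, cache_pct


# Figures from the post: 2x NVMe cache, 10x HDD, 12.8TB cache, 80TB capacity.
per_cache, pct = cache_ratios(2, 10, 12.8, 80.0)
print(f"1:{per_cache:g} cache-to-capacity devices, cache = {pct:.0f}% of capacity")
```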

How are the two Intel E810 series pNIC ports set up?

We would set them up:
pNIC 0 Port 0 + pNIC 1 Port 0 = SET vSwitch - Management/VM Intent
pNIC 0 Port 1 + pNIC 1 Port 1 = NO SET vSwitch only dedicated I/O traffic.

That's an aggregate of 50Gbps across 5 nodes.

How many VMs? How many CSVs?

What are the switches?
cptkommin
Novice
Posts: 9
Liked: 3 times
Joined: Apr 17, 2023 6:25 am
Full Name: Fred Lessing
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by cptkommin »

@MPECSInc

The build was as quoted from the supplier.
From what I can see, our read cache hit is 98%+ (so good)
Our issue is more write latency related, but I haven't seen any metrics that could show me if there are write cache issues.

We have all 4 ports in a SET vSwitch, with 4xRDMA vNICs each set to a preferred interface, the management and backup vNICs have no preference.
We have 2x Dell S5248F switches. From the switching metrics and general network throughput metrics on the NICs, we do not see any cause for concern.

5 CSVs, 165VMs

Taking into account the CPU spikes, the cache-to-capacity ratio, the write latency on the storage, and the effect on VM performance and CPU time, I believe this could be cause and effect of the underlying file system issue identified.
joelg
Influencer
Posts: 17
Liked: 7 times
Joined: Jan 16, 2023 3:13 pm
Full Name: Joel G
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by joelg » 1 person likes this post

briani wrote: Oct 14, 2024 2:26 pm Could someone please explain how to actually open a Hyper-V case with MS and have them actively work on it for an extended period like this? Usually, we can't ever get past the third-party support partners and reach actual Microsoft teams and employees.
Does your organization have a Customer Success Account Manager or an Incident Manager? It might be something that only comes with E5, but that is how we had our case escalated.

Joel
nmdange
Veteran
Posts: 528
Liked: 144 times
Joined: Aug 20, 2015 9:30 pm
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by nmdange » 2 people like this post

cptkommin wrote: Oct 14, 2024 9:40 pm We are experiencing an issue with our WS2022 HCI S2D cluster where storage/CSV performance degrades rapidly once a node is taken offline, e.g. for Windows patching. We see massive latency (in the seconds) on our CSVs, causing the hosted VMs to start failing and causing data corruption. We also see random high write latencies during the day. In Event Viewer, the only event we find that relates to high latency is Event ID 9 in the Hyper-V-StorageVSP channel:

"An I/O request for device 'C:\ClusterStorage\3WM_CSV03\Virtual Machine Name\Virtual Hard Disks\Virtual Machine Name - C_Drive.vhdx' took 24040 milliseconds to complete. Operation code = SYNCHRONIZE CACHE, Data transfer length = 0, Status = SRB_STATUS_SUCCESS."
To be honest, I suspect your issue is not related to the bug in this post if it only happens when a node is offline. I can only say that I do not see your issue in my environment with multiple Server 2022 clusters with 4 nodes even when I forgot I left a node in maintenance mode for like 3 weeks. However, my #1 piece of advice to any Hyper-V customer running S2D is to only use Mellanox/Nvidia NICs. They seem to be the most stable when it comes to RDMA and other advanced Hyper-V Features. It does require ensuring you have correctly configured RoCE/Priority Flow Control on your switches.

There are some great folks that may be able to help you troubleshoot on the Azure Stack HCI slack channel at https://azurestackhci.slack.com/
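
If you export those Event ID 9 entries to text, a small parser can make the latencies easier to sort and chart. This is only a sketch: the regular expression is built from the single message quoted above, so other wordings of the event may not match, and `parse_slow_io` is a made-up helper name.

```python
import re

# Pattern derived from the one StorageVSP message quoted in this thread.
EVENT_RE = re.compile(
    r"An I/O request for device '(?P<path>.+?)' took (?P<ms>\d+) milliseconds to complete\. "
    r"Operation code = (?P<op>.+?), Data transfer length = (?P<length>\d+), "
    r"Status = (?P<status>\w+)\."
)


def parse_slow_io(message):
    """Return the fields of a slow-I/O event message, or None if it does not match."""
    m = EVENT_RE.search(message)
    if m is None:
        return None
    return {
        "path": m["path"],
        "ms": int(m["ms"]),
        "op": m["op"],
        "length": int(m["length"]),
        "status": m["status"],
    }


SAMPLE = (
    r"An I/O request for device 'C:\ClusterStorage\3WM_CSV03\Virtual Machine Name"
    r"\Virtual Hard Disks\Virtual Machine Name - C_Drive.vhdx' took 24040 milliseconds "
    r"to complete. Operation code = SYNCHRONIZE CACHE, Data transfer length = 0, "
    r"Status = SRB_STATUS_SUCCESS."
)
event = parse_slow_io(SAMPLE)
print(event["ms"], event["op"])
```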
mbroaders
Service Provider
Posts: 128
Liked: 11 times
Joined: May 15, 2012 9:06 am
Full Name: Martin Broaders
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by mbroaders » 2 people like this post

Coming in late to this one. I have had issues with a Server 2022 Hyper-V cluster where a SQL VM is seeing significant write latency within the guest OS, but I could not see any indication of latency either on the Hyper-V host or on the underlying storage. Would this be indicative of the issue here? And if so, if I am reading this correctly, if I live migrate the VM as a test, I should see a performance improvement as it resets the VM to be accessed in an unbuffered manner?
john.aubrey
Enthusiast
Posts: 31
Liked: 5 times
Joined: Jan 29, 2020 9:43 pm
Full Name: John Aubrey
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by john.aubrey »

briani wrote: Oct 14, 2024 2:26 pm Could someone please explain how to actually open a Hyper-V case with MS and have them actively work on it for an extended period like this? Usually, we can't ever get past the third-party support partners and reach actual Microsoft teams and employees.
I think Veeam was probably putting enough pressure on them about this.
mkeating44
Influencer
Posts: 13
Liked: 3 times
Joined: Jun 07, 2022 10:57 pm
Full Name: Michael Keating
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by mkeating44 » 3 people like this post

mbroaders wrote: Oct 18, 2024 10:09 am Coming in late to this one. I have had issues with a Server 2022 Hyper-V cluster where a SQL VM is seeing significant write latency within the guest OS, but I could not see any indication of latency either on the Hyper-V host or on the underlying storage. Would this be indicative of the issue here? And if so, if I am reading this correctly, if I live migrate the VM as a test, I should see a performance improvement as it resets the VM to be accessed in an unbuffered manner?
This has been what we see in our environment with this issue, yes. Live Migration should make this behave again.
cptkommin
Novice
Posts: 9
Liked: 3 times
Joined: Apr 17, 2023 6:25 am
Full Name: Fred Lessing
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by cptkommin »

stephc_msft wrote: Oct 01, 2024 7:54 am Investigations and work on a likely fix are ongoing, as are internal discussions with Veeam. More news in due course.
Hi, just checking in on this. Do you perhaps have any update on progress or status of the fix for us?

Thank You
MPECSInc
Service Provider
Posts: 24
Liked: 11 times
Joined: Jul 25, 2016 2:36 pm
Full Name: Philip Elder
Location: St. Albert, AB, Canada
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by MPECSInc »

cptkommin wrote: Oct 14, 2024 10:53 pm @MPECSInc

The build was as quoted from the supplier.
From what I can see, our read cache hit is 98%+ (so good)
Our issue is more write latency related, but I haven't seen any metrics that could show me if there are write cache issues.

We have all 4 ports in a SET vSwitch, with 4xRDMA vNICs each set to a preferred interface, the management and backup vNICs have no preference.
We have 2x Dell S5248F switches. From the switching metrics and general network throughput metrics on the NICs, we do not see any cause for concern.

5 CSVs, 165VMs

Taking into account the CPU spikes, the cache-to-capacity ratio, the write latency on the storage, and the effect on VM performance and CPU time, I believe this could be cause and effect of the underlying file system issue identified.
We would not go that route as far as the network setup goes.
SET vSwitch Team Port 0 on each pNIC.
Standalone port VLAN 999 + RoCE RDMA (PFC/ETS)
Standalone port VLAN 888 + RoCE RDMA (PFC/ETS)
All on Mellanox/NVIDIA.
Intel X7xx are to be avoided like the plague while E8xx are not far behind. :-(

1: Given the cache-to-capacity disk count, I think what's happening here is that when a node is taken offline, your disk access balance is suddenly off. That pulls in the spindles.
2: CSV count should be at least ten. 2x CSV per node = 10.
3: VM to CSV ownership should be the same to maximize performance. This may also be why things choke down.
4: Cache:Capacity: when the node goes down, Storage Spaces will rebuild any missing parity into free pool space. This again will drag in the spindles.

Question: How much free space is there in the Storage Pool?
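
Points 2 and 3 above can be turned into a quick sanity check once you have an inventory of CSV owners and VM placements. The sketch below assumes you have already collected that inventory into plain dictionaries. The node, CSV, and VM names are hypothetical, and nothing here queries a real cluster.

```python
def layout_findings(node_count, csv_owner, vm_placement):
    """Check CSV count (2 per node) and VM-to-CSV-owner alignment.

    csv_owner:    {csv name: owner node}
    vm_placement: {vm name: (host node, csv name)}
    """
    findings = []
    recommended = 2 * node_count
    if len(csv_owner) < recommended:
        findings.append(
            f"{len(csv_owner)} CSVs present, {recommended} recommended (2 per node)"
        )
    for vm, (host, csv) in sorted(vm_placement.items()):
        owner = csv_owner[csv]
        if owner != host:
            findings.append(f"{vm} runs on {host} but {csv} is owned by {owner}")
    return findings


# Hypothetical 5-node inventory with 5 CSVs, as in the cluster discussed above.
csv_owner = {f"CSV0{i}": f"NODE{i}" for i in range(1, 6)}
vm_placement = {
    "VM-A": ("NODE1", "CSV01"),  # aligned with the CSV owner
    "VM-B": ("NODE2", "CSV03"),  # I/O is redirected through NODE3
}
for finding in layout_findings(5, csv_owner, vm_placement):
    print(finding)
```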
cptkommin
Novice
Posts: 9
Liked: 3 times
Joined: Apr 17, 2023 6:25 am
Full Name: Fred Lessing
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by cptkommin » 2 people like this post

Hi, thank you for the input MPECSInc and nmdange, I appreciate it.

Some feedback.
We had a maintenance window this past weekend where we tested the live migration workaround. Success.

We saw a 10x decrease in latency, and we were able to put nodes into maintenance without write latency spiking into the seconds. Before the live migrations, the avg latency was ~25ms+. Once we live migrated all the VMs, our avg latency dropped to sub-millisecond. We then put nodes into maintenance one by one and the highest spike we saw was 5ms: no write latency climbing into the 100s, none of the previous issues. Previously, we would start seeing VMs fall over once a node had been paused for ~20 minutes. Now, nothing: stable, and latencies were low. VMs were happy, everything was happy.

The cluster and its storage were stable. Our avg latency during production hours is now ~2ms, and this is with the same IO footprint on the volumes. Further monitoring shows that the latencies start increasing after backups, but once I live migrate all the VMs, they drop again.

I am confident that this CBT(RCT)/ReFS bug is present on our platform.
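
The pattern described above (latency creeps up after backups and drops after live migration) lends itself to a simple rolling-average check that tells you when a round of live migrations is due. This is a minimal sketch only: the 10ms threshold and 5-sample window are arbitrary assumptions, the class name is made up, and collecting the latency samples is left to whatever monitoring you already have.

```python
from collections import deque


class LatencyWatch:
    """Flag when the rolling average write latency exceeds a threshold."""

    def __init__(self, threshold_ms=10.0, window=5):
        self.threshold_ms = threshold_ms
        self.samples = deque(maxlen=window)

    def add(self, latency_ms):
        """Record a sample; return True once a full window averages above the threshold."""
        self.samples.append(latency_ms)
        full = len(self.samples) == self.samples.maxlen
        return full and sum(self.samples) / len(self.samples) > self.threshold_ms


# Illustrative readings: ~2ms normally, ~25ms once the problem sets in.
watch = LatencyWatch()
readings = [2, 2, 3, 25, 26, 27, 25, 26]
flags = [watch.add(r) for r in readings]
print(flags)
```

Requiring a full window before flagging avoids reacting to a single spike right after startup.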

Regarding our storage layout, MPECSInc, thank you for bringing this to my attention. I have oversubscribed the pool a bit and will be correcting that in a future maintenance window. But I don't think this has much of an impact on the cluster if I look at the metrics now.

Regarding the NICs, yes, we are also firm believers in Mellanox/NVIDIA. However, this build was specced by our suppliers, and I will say, we have had our share of issues with the Intel NICs and their drivers, but with the current driver we have in place, they are behaving themselves nicely.

We have provided our findings (This forum, the case numbers in it, and the results of the maintenance over the weekend) on our MS Case, and will be speaking to an Escalation Team Lead and Support Engineer tomorrow, who will share their findings.
janeggen
Lurker
Posts: 1
Liked: never
Joined: Nov 19, 2024 8:05 am
Full Name: Jan Eggen
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by janeggen »

Any updates on this issue?

We have the exact "high disk latency after Veeam backup until live migration" issue with our 6-node Windows Server 2022 cluster running on an HPE Primera SAN cluster with 3 CSVs presented to Hyper-V.
Our consultants who installed our Hyper-V cluster have another customer with the exact same issue. They could also replicate it on Windows Server 2016 and 2019, and it only happens on a VM that runs SQL Server.

As a side note, we do not take SQL backups through Veeam; we use the built-in maintenance plans to create .BAK and .TRN dumps to another server for various reasons, so the Veeam backup causing the high disk latency is a pure VM backup without anything fancy.
We experience this disk latency issue almost daily, sometimes 2-3 days apart.
Andreas Neufert
VP, Product Management
Posts: 7077
Liked: 1510 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Andreas Neufert »

Can you please check with HPE whether you could temporarily deactivate ODX, and see if this helps in your case?
Twikkibanan
Lurker
Posts: 2
Liked: 1 time
Joined: Nov 19, 2024 10:13 pm
Full Name: Daniel Schmidt
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Twikkibanan »

Good evening folks. I have been pulling my hair out for months now. And finally I found this post.

I think I am experiencing the same issue as you guys, and just wanted to give my input.

We've installed a new 2-node cluster, Server 2022 with S2D. All NVMe disks.
However, people started reporting time-outs of databases, and slow performance on the SMB side.

We also installed Veeam, and I cannot for the life of me understand why this would be slow. When I take a backup, Veeam reports the source as the bottleneck at 99%. However, my initial test of the server when it was clustered showed a read rate of about 12-15 GB/s, so this should in no way be the bottleneck.

When the backup is taken, my VMs almost grind to a halt and are almost unusable. I can pull at most 200Mb/s from them with Veeam. ???

Is the fix here to disable CBT in the Veeam job, or do I need to shut down the VM, delete those files (.mrt + .rct) and start up the VM again?
EricDJ
Novice
Posts: 4
Liked: 2 times
Joined: Nov 10, 2020 1:49 pm
Full Name: Eric de Jonge
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by EricDJ » 1 person likes this post

Hi,
We have also been experiencing the same issue as you guys for several years now, first on Hyper-V 2019 and now on Hyper-V 2022.
As a workaround we have a PowerShell script running that does the daily failover and resolves the latency issue.
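
The script itself isn't posted here, so the following is not it. It is only a sketch of the planning step such a workaround needs: deciding where each VM should be live migrated. It moves every VM to the next node in the list. The function name and all node/VM names are hypothetical, and the actual migration call is deliberately left out.

```python
def plan_live_migrations(vm_hosts, nodes):
    """Return [(vm, source, destination)], moving each VM to the next node in the list.

    vm_hosts: {vm name: current node}
    nodes:    ordered list of cluster node names
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to live migrate")
    plan = []
    for vm, source in sorted(vm_hosts.items()):
        destination = nodes[(nodes.index(source) + 1) % len(nodes)]
        plan.append((vm, source, destination))
    return plan


# Hypothetical inventory; a real script would query the cluster instead.
nodes = ["NODE1", "NODE2", "NODE3"]
vm_hosts = {"SQL01": "NODE3", "APP01": "NODE1"}
plan = plan_live_migrations(vm_hosts, nodes)
for vm, source, destination in plan:
    print(f"{vm}: {source} -> {destination}")
```

Running the same plan a second time moves each VM one node further round the ring, which keeps the load spread roughly even as long as it started that way.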

After creating a support call at Veeam, they gave the following statement:
To clarify, I’d like to share an overview of how the Veeam backup process interacts with Hyper-V, explained in this article: https://helpcenter.veeam.com/docs/backu ... ml?ver=120. During the backup process, Veeam doesn't manage or directly manipulate either the Hyper-V host or the machines there. Veeam sends a WMI call to trigger a checkpoint creation, which is performed and managed by Hyper-V itself, and the same applies to checkpoint removal.
In summary, Veeam doesn’t directly manage or manipulate Hyper-V hosts or VMs. Instead, it initiates backup processes through WMI calls that trigger checkpoint creation and removal, tasks which are fully managed by Hyper-V itself.
Due to this, reaching out to Microsoft would be the most suitable approach when you encounter performance issues like these. Since Veeam only initiates the checkpoint/snapshot processes without directly handling VM management, any performance issues that arise are likely related to the underlying Hyper-V management of these processes.
So, we created a Microsoft support call for this issue today, number: 2411200050000927
My question to all of you: perhaps we can bundle the Microsoft support calls in order to achieve a faster response?
gman42
Novice
Posts: 4
Liked: 1 time
Joined: Jun 13, 2024 3:43 pm
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by gman42 » 1 person likes this post

stephc_msft wrote: Oct 01, 2024 7:54 am Investigations and work on a likely fix are ongoing, as are internal discussions with Veeam. More news in due course.
I know Rome wasn't built in a day, but shouldn't there be some kind of update six weeks later? Even if it's an "oops sorry didn't fix the problem, back to the drawing board".
Twikkibanan
Lurker
Posts: 2
Liked: 1 time
Joined: Nov 19, 2024 10:13 pm
Full Name: Daniel Schmidt
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Twikkibanan » 1 person likes this post

EricDJ wrote: Nov 20, 2024 10:04 am Hi,
We have also been experiencing the same issue as you guys for several years now, first on Hyper-V 2019 and now on Hyper-V 2022.
As a workaround we have a PowerShell script running that does the daily failover and resolves the latency issue.

After creating a support call at Veeam, they gave the following statement:



So, we created a Microsoft support call for this issue today, number: 2411200050000927
My question to all of you, perhaps we can bundle the Microsoft support calls in order to achieve a faster response?
I've created a Microsoft ticket today and referred to your ticket number.

It's unacceptable that this has not been fixed yet. Especially if it's fixed in 2025, it can be patched in 2022. If not, Microsoft needs to provide us with free upgrades to 2025.

Let me know if you hear anything. I'm gonna push them hard now. :)
rold
Service Provider
Posts: 12
Liked: 8 times
Joined: Sep 14, 2016 12:04 pm
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by rold » 1 person likes this post

Good luck guys pushing that five-year-old issue :D :D
lgeorgiev
Lurker
Posts: 1
Liked: never
Joined: Dec 05, 2024 8:20 pm
Full Name: Lyubomir Georgiev
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by lgeorgiev »

Hello,

Does anyone have a reliable way to reproduce the issue as needed? We seem to get it randomly and I need to easily reproduce it for troubleshooting.
halvorsond
Novice
Posts: 3
Liked: 6 times
Joined: Jun 03, 2014 7:40 pm
Full Name: Dustin
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by halvorsond » 6 people like this post

We received an update on our support case, along with lots of communication from the product team. Bullet points of what's coming and what you need to do:

- Windows Server 2025: it's resolved.
- Windows Server 2022: a hotfix exists and can be obtained via support. A permanent fix is slated to be included in updates in Q1, likely February.
- Windows Server 2019: a hotfix exists, but you are required to stay on an old security update. There will not be a new patch issued for this OS. There are no plans at this point to patch Server 2019 with a permanent fix.


Long story short: if you are on Server 2022, you should be getting the permanent patch in the coming 2 months. If you are still on Server 2019, start planning your upgrade to Server 2022 (or Server 2025 if you feel you need to be bleeding edge).
Why is Server 2019 not being patched? It came out of mainstream support in January 2024.
Rmachado
Service Provider
Posts: 26
Liked: 6 times
Joined: Dec 15, 2016 11:39 pm
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Rmachado »

Hey Halvorsond, thanks for the info!

Do you know the name of the 2019 hotfix and which security patch it requires? Can you share more info about that?

Many customers are still on 2019 because of storage requirements.

Thanks!
SodaPop87
Influencer
Posts: 10
Liked: 3 times
Joined: Oct 26, 2023 3:09 pm
Full Name: Daniel Roth
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by SodaPop87 »

halvorsond wrote: Dec 07, 2024 6:19 pm We received an update on our support case, along with lots of communication from the product team. Bullet points of what's coming and what you need to do:

- Windows Server 2025: it's resolved.
- Windows Server 2022: a hotfix exists and can be obtained via support. A permanent fix is slated to be included in updates in Q1, likely February.
- Windows Server 2019: a hotfix exists, but you are required to stay on an old security update. There will not be a new patch issued for this OS. There are no plans at this point to patch Server 2019 with a permanent fix.


Long story short: if you are on Server 2022, you should be getting the permanent patch in the coming 2 months. If you are still on Server 2019, start planning your upgrade to Server 2022 (or Server 2025 if you feel you need to be bleeding edge).
Why is Server 2019 not being patched? It came out of mainstream support in January 2024.
Hey, thanks for the update, Halvorsond. So, should we reach out to Microsoft support for the 2022 hotfix if we don't want to wait for the Q1 patch? Is there a suggested way to reach out? Thanks again.
andy.dedeckker
Lurker
Posts: 1
Liked: never
Joined: Dec 09, 2024 2:38 pm
Full Name: Andy De Deckker
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by andy.dedeckker »

I just started a ticket / request for the Hotfix through our o365 portal. I hope it will work that way.
Does anyone know the hotfix ID? It would probably help speed up the ticket.

@halvorsond: many thanks for the info.
halvorsond
Novice
Posts: 3
Liked: 6 times
Joined: Jun 03, 2014 7:40 pm
Full Name: Dustin
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by halvorsond »

At the time there was no ID for the hotfix, just the exe (noted for testing purposes only).

If they are still offering this (it's 6 months old now, and you would need to downgrade your servers back to the July CU if you want to apply it), you need to obtain it via support / your CSM.

We are still waiting for the 2022 hotfix ourselves, and it likely comes with the same restriction that you won't be able to patch your servers while the hotfix is installed.