Host-based backup of Microsoft Hyper-V VMs.
stephc_msft
Technology Partner
Posts: 30
Liked: 26 times
Joined: May 04, 2016 12:35 pm
Full Name: Stephen Cole
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by stephc_msft »

Hello all,
Can anyone confirm whether they see the I/O of an affected VM slow down when a manual checkpoint is taken, and then when the checkpoint is deleted (merged)?
We expect a slight degradation while a checkpoint and its .avhdx are in use, but some people are seeing a significant slowdown, then OK again after the merge.

However, in the problem scenario the Veeam-initiated backup checkpoint (and the merge after the backup finishes) seems to be somewhat different.
(It is unclear how much degradation there is while the backup is active and the checkpoint exists, but some is obviously expected due to disk contention etc.)
The issue there is that the VM sometimes remains degraded (or even more degraded?) after the backup finishes and the merge has happened.
slwaldrop
Novice
Posts: 4
Liked: 1 time
Joined: Mar 02, 2023 3:10 pm
Full Name: Shane Waldrop
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by slwaldrop » 1 person likes this post

We've been constantly dealing with slowness since moving from a 2012 R2 cluster to 2019 and have worked around it by writing custom PowerShell scripts to live migrate the VMs during the health check or merge phases. Live migrating definitely fixes the issue, but it's been nothing but finding and implementing workarounds. Has anyone who tried the patch actually had it work?
Stev0
Lurker
Posts: 1
Liked: 2 times
Joined: Mar 03, 2023 4:51 pm
Full Name: Steve
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Stev0 » 2 people like this post

This thread has been very informative. I have a very different use case, but what I found might be closely related. In my case, it used to be (in Server 2012 and 2016) that I could do unbuffered writes to a file, perform buffered reads back from the same file, and writes would not be sacrificed.

Ever since attempting to support Server 2019, it appears that the buffered reads dramatically impact the ability to write those files at full speed. Write queue depths do indeed drop substantially when attempting the read. If this is a regression that was introduced, would it explain the issues seen here? I would like to see the details of the private fix in case it addresses my issue as well.

Hoping someone insightful can help me determine if these dots connect. Cheers!
joelg
Influencer
Posts: 17
Liked: 7 times
Joined: Jan 16, 2023 3:13 pm
Full Name: Joel G
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by joelg »

We installed the beta fix on our disaster recovery site Hyper-V host and have not had an error since (Jan 19), however we were only seeing these approximately one night a month on that host, so not a great test.

We're hesitant to install a beta fix on our prod hosts, where we're having these errors most days. So I tried disabling CBT in the backup jobs, shutting down the guests and deleting the MRT and RCT files for some of our guests. We still saw the errors with CBT disabled. To work around the issue, would the VMs still need to be migrated to another host?

Joel
MarkGould
Lurker
Posts: 1
Liked: never
Joined: Mar 08, 2023 12:16 am
Full Name: Mark Gould
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by MarkGould »

stephc_msft wrote: Feb 05, 2023 5:44 pm KB900379 is the test fix (and that is just a temporary KB number, it will not be called that if/when it gets rolled up into an update)
Ping me if you want to try it (ws2019/ws2022), but note this does not guarantee it will become an official fix.

The above is for the RCT issue
Also note, there seems to be a similar issue where significant io degradation can occur even without RCT being used.
And again the only way to 'clear' the host problem state is to reboot the host or live migrate the affected VM to a 'fresh' host.
This is also being investigated.
Stephen

Any ideas on how I can contact you? I am being told: "We are sorry, but you are not authorised to use this feature. You may have just registered here and may need to participate more in discussions to be able to use this feature."

We are on Arcserve UDP9 on new R450s (WIN2022) with an MD5024 with 12Gb SAS on RAID 5 Linear and are seeing this issue.
I have a case open with Dell and Microsoft and have just requested KB900379.
I believe we are seeing the exact issue, particularly on our main data server that holds user profiles.

Thanks

Mark
steendp
Influencer
Posts: 11
Liked: 3 times
Joined: Jan 11, 2023 2:47 pm
Full Name: Steen Dalsgaard Pedersen
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by steendp »

stephc_msft wrote: Feb 14, 2023 1:56 pm Just a reminder, there seem to be two separate, distinct issues, although they seem very similar and somewhat related.
1. Issue with RCT leading to some serialization of IO
2. Issue with IO slowdown after a Veeam backup even without RCT [and also possibly triggered by other things involving checkpoints] [and only on certain disks with certain workloads]

In fact most people are probably hitting issue 2, and this is the one that live migrating to a fresh host works around.

A characteristic of issue 2 seems to be that the Queue Length in the problem state shows as very low.
[as per the user 'rold's finding above]

If anyone can perfmon their HV host, look at the PhysicalDisk counter for the relevant disk (or CSV) and check the Avg. Disk Write Queue Length, it would be useful.
This might confirm a highish number (20+) in the good state, and a low number (0 or 1) in the bad state.
Exactly why queue depth usage is affected in the problem state is being investigated.

Hi Stephen,

I can confirm this. I ran diskspd, and Avg. Disk Write Queue Length was just below 1 before live migration and closer to 20 after migration (it fluctuates a bit in testing).
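To make the rule of thumb from the quoted post concrete, here is a minimal Python sketch that labels a host's state from a handful of Avg. Disk Write Queue Length samples. The two thresholds (roughly 20 or more looking healthy, 0 to 1 looking like the problem state) are the figures mentioned in this thread. The function name, the use of the median and the "unclear" middle band are my own assumptions. Collecting the samples (perfmon or similar) is not shown.

```python
from statistics import median

# Thresholds taken from the figures mentioned in this thread; treat them as
# rough guidance, not as values documented by Microsoft.
BAD_MAX = 1.0    # problem state reported as a queue length of 0 or 1
GOOD_MIN = 20.0  # good state reported as "highish", 20+


def classify_write_queue(samples):
    """Label Avg. Disk Write Queue Length samples (hypothetical helper).

    Returns "bad", "good" or "unclear". The median is used so that a single
    spike or dip does not flip the result.
    """
    if not samples:
        raise ValueError("need at least one sample")
    mid = median(samples)
    if mid <= BAD_MAX:
        return "bad"
    if mid >= GOOD_MIN:
        return "good"
    return "unclear"


# Made-up sample values shaped like the numbers reported above.
before_migration = [0.8, 0.9, 0.7, 1.0, 0.9]
after_migration = [18.5, 21.0, 20.3, 22.1, 19.7]
print(classify_write_queue(before_migration))  # bad
print(classify_write_queue(after_migration))   # good
```

Anything that lands in the "unclear" band probably needs a longer sampling window or a comparison against the same disk on a freshly migrated VM.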

Difference in performance from diskspd was crazy:
Total IO
thread | bytes | I/Os | MiB/s | I/O per s | file
------------------------------------------------------------------------------
PRE MIGRATION total: 262471680 | 4005 | 4.17 | 66.75
POST MIGRATION total: 43597365248 | 665243 | 692.85 | 11085.60
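For anyone comparing their own runs, the two totals rows can be sanity-checked with a few lines of Python. This is only a sketch: the `pre`/`post` dictionaries and the `summarize` helper are names I made up, the numbers are copied from the output above, and the duration is estimated from bytes and MiB/s (1 MiB = 2**20 bytes) rather than read from diskspd.

```python
# Totals copied from the diskspd output above.
pre = {"bytes": 262471680, "ios": 4005, "mib_s": 4.17, "iops": 66.75}
post = {"bytes": 43597365248, "ios": 665243, "mib_s": 692.85, "iops": 11085.60}


def summarize(run):
    """Derive block size and approximate test duration from one totals row."""
    block_size = run["bytes"] // run["ios"]             # bytes per I/O
    duration_s = run["bytes"] / (run["mib_s"] * 2**20)  # bytes / (bytes per second)
    return block_size, duration_s


for name, run in (("pre", pre), ("post", post)):
    block_size, duration_s = summarize(run)
    print(f"{name}: {block_size // 1024} KiB per I/O, ~{duration_s:.0f} s run")

speedup = post["mib_s"] / pre["mib_s"]
print(f"throughput ratio post/pre: ~{speedup:.0f}x")
```

Both rows work out to 64 KiB per I/O over a run of about 60 seconds, so the two tests are comparable, and the post-migration throughput is roughly 166 times the pre-migration figure.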

I'm really, really interested in any progress/testing/suggestions you might have.

Br Steen
peno@edgemo.com
Novice
Posts: 4
Liked: never
Joined: Feb 02, 2023 9:42 am
Full Name: Peter Rostén Nørredal
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by peno@edgemo.com »

Hi,
It seems that all troubleshooting activities have come to a halt.
Hopefully MS is still investigating this issue.
bhead
Influencer
Posts: 12
Liked: 6 times
Joined: Sep 30, 2020 9:18 am
Full Name: Bjoern Goerlich
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by bhead »

Hello everyone,

unfortunately there is no news on this issue in today's release notes :evil:

https://support.microsoft.com/en-us/top ... 57e1600ccf

Has anybody rolled out KB5023702 yet?

For some reason the IO issue stopped inside one of our VMs after installing Veeam V.12 last week.
This is getting even worse and more frustrating!
bhead
Influencer
Posts: 12
Liked: 6 times
Joined: Sep 30, 2020 9:18 am
Full Name: Bjoern Goerlich
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by bhead »

Hello everyone,

I ran some tests today after installing KB5023702 and unfortunately, nothing has changed.

I also talked to somebody who's running thousands of VMs in a similar environment.
The only difference: they are using FC instead of iSCSI.
They're not dealing with this issue.

I kindly asked Microsoft to reopen our case!

Regards
mbauer
Lurker
Posts: 2
Liked: 1 time
Joined: Mar 27, 2023 3:05 pm
Full Name: Mike Bauer
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by mbauer »

@stephc_msft We have been experiencing VM disk I/O performance problems since upgrading our failover cluster from Hyper-V Server 2016 to Hyper-V Server 2019. Hosts are directly connected to the SAN via FC. Live migrating the VMs to another host temporarily resolves the issue until the next backup job runs. It does seem slightly better after installing KB5023702. How can we obtain the KB900379 private hotfix to test?
mbauer
Lurker
Posts: 2
Liked: 1 time
Joined: Mar 27, 2023 3:05 pm
Full Name: Mike Bauer
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by mbauer » 1 person likes this post

@stephc_msft supplied us with the KB900379 private hotfix which we initially installed on all Hyper-V Server 2019 hosts in our DR environment. Disk transfer speeds increased from ~20 MB/s to 400+ MB/s and are stable now, even after running multiple Veeam backup jobs. I was a little nervous but decided to install the hotfix today on one of the hosts in our production environment. I will let everyone know how it goes.

@stephc_msft do you know when an official fix will be released for this issue?
isnetworking
Lurker
Posts: 1
Liked: never
Joined: Mar 31, 2023 12:16 am
Full Name: Garrett Hooper
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by isnetworking »

Hey @stephc_msft ,

Is there any way I could get this test patch sent to me? I tried to PM you, but my account is too fresh. We're suffering from the same issues. I just ran CUA and it didn't pick up on either of the two patches mentioned in this thread.

Our open Veeam case # 05981808
the_extremist
Lurker
Posts: 1
Liked: never
Joined: Apr 06, 2023 8:39 pm
Full Name: extremist
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by the_extremist »

Hello, we have the same problems with different configurations: Dell / HP servers, one environment with a cluster, another without. The same errors appear in the event log.
The common point is Veeam on all these machines.

@stephc_msft

Could you share your private fix ?
arc-xel
Lurker
Posts: 1
Liked: never
Joined: Apr 10, 2023 5:19 am
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by arc-xel »

Hello,

We have been suffering from the same situation for about 4 months, since installing Hyper-V + Veeam.
I don't know how to describe it, because every f*ing week we have to move all VMs to another host.
We've upgraded Veeam from version 11 to 12 and the issue became worse. It now appears every 3 days.
My boss demands that I solve this issue quickly.
I have only one option left - change the backup vendor :(
Gostev
Chief Product Officer
Posts: 31804
Liked: 7298 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Gostev » 1 person likes this post

Hello, would it be a smart decision though in a situation when the issue is confirmed to be with Hyper-V itself, and there's even a private fix available from Microsoft?
RexfordHaugen_COLT
Novice
Posts: 4
Liked: never
Joined: Feb 15, 2023 7:37 am
Full Name: Rexford Haugen
Location: Colorado
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by RexfordHaugen_COLT »

Have the April updates for Server helped anyone so far? Specifically, KB5025230 for 2022 and KB5025229 for 2019. I do not see any references to this issue in the release notes though the 2022 KB does address a CSV issue relating to BitLocker.
msavage21
Lurker
Posts: 1
Liked: never
Joined: Apr 19, 2023 2:32 am
Full Name: Savage
Location: Texas
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by msavage21 »

This seems to be an existing problem that is getting worse. We have verified it in testing on Hyper-V 2022 and 2019, with some RDSH at 2016. How can we get the hotfix to test? We have several environments for testing.
nielsengelen
Product Manager
Posts: 5796
Liked: 1215 times
Joined: Jul 15, 2013 11:09 am
Full Name: Niels Engelen
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by nielsengelen »

As the hotfix seems to be by Microsoft, best would be to contact them directly as Veeam will not be able to pass it out.
Personal blog: https://foonet.be
GitHub: https://github.com/nielsengelen
steendp
Influencer
Posts: 11
Liked: 3 times
Joined: Jan 11, 2023 2:47 pm
Full Name: Steen Dalsgaard Pedersen
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by steendp »

The latest update I received in the MS case is that the hotfix is not effective in solving the issue... As standard operating procedure we are now live migrating 200+ VMs every morning, and again every time we get a service desk call that is related to performance. Crazy that it has been ongoing for so long.
johan.h
Veeam Software
Posts: 723
Liked: 185 times
Joined: Jun 05, 2013 9:45 am
Full Name: Johan Huttenga
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by johan.h »

@steendp, can you clarify which private hotfix from Microsoft you installed and what OS version you are on? Did you end up seeing any changes in behavior?
RBeismann
Lurker
Posts: 1
Liked: 1 time
Joined: Feb 01, 2022 8:24 am
Full Name: Robin Beismann
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by RBeismann » 1 person likes this post

For those of you that had to migrate to a newer OS like we did (during a hardware renewal), here is a script I made that actively searches for EventID 19080 from Hyper-V VMMS which indicates a successful snapshot merge and triggers a live migration using SCVMM.
The script saves its state (processed event record IDs) in the registry so that it does not trigger a migration for the same event twice.
We'll use this as workaround until the underlying issue is fixed.

https://gist.github.com/RobinBeismann/6 ... 471d6f2e62

You running this script/function means you will not blame the author(s) if this breaks your stuff.
This script/function is provided AS IS without warranty of any kind. Author(s) disclaim all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose.
The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall author(s) be held liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the script or documentation.
Neither this script/function, nor any part of it other than those parts that are explicitly copied from others, may be republished without author(s) express written permission. Author(s) retain the right to alter this disclaimer at any time.
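For readers who only want the idea and not the linked script, the "don't react to the same event twice" part can be sketched in a few lines of Python. This is not the author's script: the event dictionaries, the `vms_to_migrate` name and the in-memory set standing in for the registry state are all assumptions for illustration, and the actual live migration call (SCVMM in the post above) is left out.

```python
MERGE_EVENT_ID = 19080  # successful snapshot merge, per the post above


def vms_to_migrate(events, processed_ids):
    """Return (VM names to migrate, updated set of handled record IDs).

    events: iterable of dicts with "record_id", "event_id" and "vm" keys
            (a hypothetical shape, not the real event log schema).
    processed_ids: record IDs already handled on earlier runs.
    """
    pending = []
    seen = set(processed_ids)
    for event in events:
        if event["event_id"] != MERGE_EVENT_ID:
            continue  # not a merge-completed event
        if event["record_id"] in seen:
            continue  # already handled on an earlier run
        seen.add(event["record_id"])
        if event["vm"] not in pending:
            pending.append(event["vm"])
    return pending, seen


# Made-up events: two merges for the same VM plus one unrelated event.
events = [
    {"record_id": 101, "event_id": 19080, "vm": "FS01"},
    {"record_id": 102, "event_id": 18500, "vm": "SQL01"},
    {"record_id": 103, "event_id": 19080, "vm": "FS01"},
]
first, state = vms_to_migrate(events, set())
print(first)   # ['FS01']
second, state = vms_to_migrate(events, state)
print(second)  # []
```

Whatever stores `state` between runs (the registry in the original, a file or database elsewhere) has to survive a host reboot, otherwise old merge events would trigger migrations again.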
stephc_msft
Technology Partner
Posts: 30
Liked: 26 times
Joined: May 04, 2016 12:35 pm
Full Name: Stephen Cole
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by stephc_msft »

As mentioned earlier, there seem to be two separate, distinct issues, although they seem very similar and somewhat related:
1. The RCT issue leading to some serialization of IO (which the private fix attempts to address).
2. Another issue that can cause slow IO in a VM, usually after a backup, and that 'clogs up' (for want of a better word) the Hyper-V host's handling of that vhdx. Live migrating the VM to a fresh host overcomes it (until the next time the slowness occurs). It possibly even causes StorageVSP event 9s on the Hyper-V host as well.

For the latter: can anyone who successfully uses live migration as a workaround confirm that the storage in use is a CSV (as opposed to a standard cluster shared drive with a drive letter)? The latest investigation suggests this other issue only occurs if it's a CSV.
And/or, can anyone hitting the issue on a CSV temporarily take the disk out of being a CSV, back to a normal cluster shared drive (with a drive letter), put it in a role with one or more VMs that use it and access it by drive letter, and see whether that avoids the VM getting into the problem state, so that live migration is no longer needed?
[Obviously, not being a CSV is a less flexible arrangement, especially with multiple VMs on the same disk, given the (non-CSV) need for them to all be on the same node.]
ChristineAlexa
Enthusiast
Posts: 47
Liked: 10 times
Joined: Aug 26, 2019 7:04 am
Full Name: Christine Boersen
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by ChristineAlexa »

stephc_msft
Issue one is the one where disabling hyper-threading made it tolerable (it eliminated the serialization), at the expense of losing hyper-threading.
MisterLuciano
Lurker
Posts: 2
Liked: never
Joined: May 12, 2023 11:54 am
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by MisterLuciano »

We have the same problem. It has only occurred for us since 01.05.2023, but then within 1 week it appeared on all 6 hosts.

Is there still no solution?
JRRW
Enthusiast
Posts: 78
Liked: 46 times
Joined: Dec 10, 2019 3:59 pm
Full Name: Ryan Walker
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by JRRW »

stephc_msft wrote: May 04, 2023 4:49 pm As mentioned earlier, there seem to be two separate, distinct issues, although they seem very similar and somewhat related:
1. The RCT issue leading to some serialization of IO (which the private fix attempts to address).
2. Another issue that can cause slow IO in a VM, usually after a backup, and that 'clogs up' (for want of a better word) the Hyper-V host's handling of that vhdx. Live migrating the VM to a fresh host overcomes it (until the next time the slowness occurs). It possibly even causes StorageVSP event 9s on the Hyper-V host as well.

For the latter: can anyone who successfully uses live migration as a workaround confirm that the storage in use is a CSV (as opposed to a standard cluster shared drive with a drive letter)? The latest investigation suggests this other issue only occurs if it's a CSV.
And/or, can anyone hitting the issue on a CSV temporarily take the disk out of being a CSV, back to a normal cluster shared drive (with a drive letter), put it in a role with one or more VMs that use it and access it by drive letter, and see whether that avoids the VM getting into the problem state, so that live migration is no longer needed?
[Obviously, not being a CSV is a less flexible arrangement, especially with multiple VMs on the same disk, given the (non-CSV) need for them to all be on the same node.]

I have a private patch (not installed yet) created by Premier Support because of this issue.

It impacted a main file server so badly that running a diskperf on the CSV in question resulted in something like 1-5 IOPS. Hardware was (2) 32Gb FC CSV to a PureStorage X50R2 all-NVMe array, with (7) R650 hosts (Xeon Gold) that can do 150k IOPS without breaking a sweat.

Migrating the VM resolved the issue, and we have seen it twice since then - but I haven't installed the patch as we're still somewhat wary about getting out of 'supported' code.

One thing I've used as a canary in the well is checkpoints - when this happens, Veeam's removal of the checkpoint will take a long time, vs right away (as the performance is so badly impacted).
GabesVirtualWorld
Expert
Posts: 248
Liked: 38 times
Joined: Jun 15, 2009 10:49 am
Full Name: Gabrie van Zanten
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by GabesVirtualWorld »

stephc_msft wrote: May 04, 2023 4:49 pm As mentioned earlier, there seem to be two separate, distinct issues, although they seem very similar and somewhat related:
1. The RCT issue leading to some serialization of IO (which the private fix attempts to address).
2. Another issue that can cause slow IO in a VM, usually after a backup, and that 'clogs up' (for want of a better word) the Hyper-V host's handling of that vhdx. Live migrating the VM to a fresh host overcomes it (until the next time the slowness occurs). It possibly even causes StorageVSP event 9s on the Hyper-V host as well.

For the latter: can anyone who successfully uses live migration as a workaround confirm that the storage in use is a CSV (as opposed to a standard cluster shared drive with a drive letter)? The latest investigation suggests this other issue only occurs if it's a CSV.
And/or, can anyone hitting the issue on a CSV temporarily take the disk out of being a CSV, back to a normal cluster shared drive (with a drive letter), put it in a role with one or more VMs that use it and access it by drive letter, and see whether that avoids the VM getting into the problem state, so that live migration is no longer needed?
[Obviously, not being a CSV is a less flexible arrangement, especially with multiple VMs on the same disk, given the (non-CSV) need for them to all be on the same node.]

I have a long running support case with Microsoft on the CBT issue and I just saw your CSV question. We're having the CBT issue on CSV volumes as well, though the customer has two Hyper-V environments. One is managed by us and running on Pure storage connected over FC with CSV volumes. The CBT issue pops up after backup and is resolved by Live Migrating the VM.

The other environment they have uses CBT for backup as well, but is running on SMB shares. There we have no issues at all. Hope that adds some info.
joelg
Influencer
Posts: 17
Liked: 7 times
Joined: Jan 16, 2023 3:13 pm
Full Name: Joel G
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by joelg »

We installed the beta fix on our 3 clustered Server 2019 hosts on May 10th. We have seen a reasonable decrease in the number of Event ID 9 errors in our environment:

Event ID 9 occurrences over each 3-week period:

Server1:
  Apr 18-May 9 - 11717
  May 11-Jun 1 - 2361
Server2:
  Apr 18-May 9 - 8772
  May 11-Jun 1 - 2897
Server3:
  Apr 18-May 9 - 9765
  May 11-Jun 1 - 1901
3 Week Totals:
  Apr 18-May 9 - 30254
  May 11-Jun 1 - 7159

We're still looking into whether we have the second of the issues Stephen noted, and obviously I'd like to not see any of those errors at all, but it is an improvement.
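For anyone who wants to put a number on "reasonable decrease", the drop per host can be computed from the counts above. A small Python sketch follows. The `counts` dictionary and the `reduction_pct` helper are names chosen here for illustration, and the figures are copied from the list above.

```python
# Event ID 9 counts per host, copied from the figures above:
# (Apr 18-May 9, before the fix; May 11-Jun 1, after the fix).
counts = {
    "Server1": (11717, 2361),
    "Server2": (8772, 2897),
    "Server3": (9765, 1901),
}


def reduction_pct(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


for host, (before, after) in counts.items():
    print(f"{host}: {reduction_pct(before, after):.1f}% fewer events")

total_before = sum(before for before, _ in counts.values())
total_after = sum(after for _, after in counts.values())
print(f"Total: {total_before} -> {total_after}, "
      f"{reduction_pct(total_before, total_after):.1f}% fewer events")
```

Across the three hosts that works out to a drop of roughly 76% in Event ID 9 errors between the two three-week windows.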

Joel
stephc_msft
Technology Partner
Posts: 30
Liked: 26 times
Joined: May 04, 2016 12:35 pm
Full Name: Stephen Cole
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by stephc_msft »

Has anyone tried moving from CSV-based storage to either non-CSV-based or SMB-based storage, and if so, did that work around the issue?
Non-CSV-based means the shared drive would have a drive letter and only be online on the owner node, and any VMs using it would have to be on the same node (which limits flexibility, balancing moves, etc.).
SMB-based would allow more flexible movement, but requires separating the storage from the HV side and adds another layer (which would normally add extra IO overhead, but in this case there are already other bad IO issues that it might avoid).
I am not sure whether one could have an HA file server on the same cluster as the HV workload and have the VMs use an SMB connection back to a file share on the same cluster (to be tested, but it is supposed to be possible now).
ChristineAlexa
Enthusiast
Posts: 47
Liked: 10 times
Joined: Aug 26, 2019 7:04 am
Full Name: Christine Boersen
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by ChristineAlexa »

I know in my prior testing (see old old comments), CSV seemed fine, on same cluster or another, with hyper-threading on (which was the scenario that caused the serialization of IO writes). This was the scenario where turning hyper-threading off "mostly" solved the issue. I pretty much only get the serialization issue now during a cluster node up/down, or under very high stress for a few seconds. That was the compromise I had to make on my equipment (give up hyper threading to get the disks performing correctly).

The combination I never tried was leaving hyper-threading enabled but using the "classic" scheduler, because of its security vulnerabilities - https://learn.microsoft.com/en-us/windo ... -selection.
JRRW
Enthusiast
Posts: 78
Liked: 46 times
Joined: Dec 10, 2019 3:59 pm
Full Name: Ryan Walker
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by JRRW »

GabesVirtualWorld wrote: May 30, 2023 6:48 am I have a long running support case with Microsoft on the CBT issue and I just saw your CSV question. We're having the CBT issue on CSV volumes as well, though the customer has two Hyper-V environments. One is managed by us and running on Pure storage connected over FC with CSV volumes. The CBT issue pops up after backup and is resolved by Live Migrating the VM.

The other environment they have uses CBT for backup as well, but is running on SMB shares. There we have no issues at all. Hope that adds some info.

Hi! That's me. I'm the problem.

Haven't installed the hotfix yet as we're pretty wary of 'non-GA' fixes, vs Microsoft just... You know, fixing their product. It's also only happened on LARGE VMs (namely a 21TB File Server) so while heavily impactful, it hasn't - since the first time - caused downtime as we're able to quickly mitigate it.

ChristineAlexa wrote: Jun 08, 2023 6:48 pm I know in my prior testing (see old old comments), CSV seemed fine, on same cluster or another, with hyper-threading on (which was the scenario that caused the serialization of IO writes). This was the scenario where turning hyper-threading off "mostly" solved the issue. I pretty much only get the serialization issue now during a cluster node up/down, or under very high stress for a few seconds. That was the compromise I had to make on my equipment (give up hyper threading to get the disks performing correctly).

The combination I never tried was leaving hyper-threading enabled but using the "classic" scheduler, because of its security vulnerabilities - https://learn.microsoft.com/en-us/windo ... -selection.

Yeah, from my perspective, "Disabling security and/or HyperThreading" isn't even remotely a solution; at best it's a temporary band-aid, you know?