Host-based backup of Microsoft Hyper-V VMs.
john.friel
Lurker
Posts: 1
Liked: never
Joined: Nov 07, 2023 7:49 pm
Full Name: John Friel
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by john.friel »

Also found this thread because of the StorageVSP Event ID 9 errors I am experiencing. I have a 2-node S2D cluster, and somewhere late in the Server 2016 updates I started having issues live migrating CSVs between the two nodes. I upgraded the cluster to Server 2019 a year ago hoping that would fix everything, but it hasn't. At this point, the only way I can apply updates is to shut down all the VMs, Pause/Drain the server, then slowly bring them back up on the other node. Then I can do maintenance. What a hassle. Back in the Server 2012 R2 days (before S2D), maintenance on my 3-node cluster was a simple Pause/Drain, patch, and resume.

I now have DataON working with me to find the root cause as the latest Server CU did not fix anything for me. Will keep everyone posted if I find anything.
stephc_msft
Technology Partner
Posts: 26
Liked: 15 times
Joined: May 04, 2016 12:35 pm
Full Name: Stephen Cole
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by stephc_msft »

As mentioned previously, the October Windows update has a fix that should improve VM IO performance when RCT is in use
(and an updated version of that fix is coming in the November update to avoid the VM won't-start issue).

However, there may still be another issue that some people were reporting, where IO performance seems to get stuck in a low-IO state for just a particular VM, or a particular disk in a VM (usually one used by SQL?), after something triggers it into that state (the trigger is usually a Veeam backup).

The only effective workaround is to live migrate the VM to another host, which brings the IO performance back to normal.

If anyone has a consistent repro, on WS2022 hosts, of this VM IO degradation aspect that is resolved by live migrating the VM, please let me know.
[I have some ideas, but only applicable to WS2022 at the moment]
rold
Service Provider
Posts: 11
Liked: 7 times
Joined: Sep 14, 2016 12:04 pm
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by rold »

Internally we call this second issue "queue depth=1". Both read and write IO are affected.

I can't get a stable repro, but yes, usually VMs with big disks are affected, when they are backed up for more than X minutes.
SodaPop87
Novice
Posts: 9
Liked: 3 times
Joined: Oct 26, 2023 3:09 pm
Full Name: Daniel Roth
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by SodaPop87 »

stephc_msft wrote: Nov 09, 2023 10:54 am However there may still be some other issue that some people were reporting where the IO performance seems to get stuck in a low IO state for just a particular VM or disk in a VM...
Hey Steph, I have a consistent repro; well, it seems to happen about 7-10 days into running backups 2x-3x a day on a large SQL server. The Hyper-V host is WS2022. The last time I had to perform a live migration was 11/7, so I should normally see some degradation soon.
SodaPop87
Novice
Posts: 9
Liked: 3 times
Joined: Oct 26, 2023 3:09 pm
Full Name: Daniel Roth
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by SodaPop87 »

Correction: we did a LM on it 4 days ago, so we may see it in a few days.
dm_ch
Novice
Posts: 4
Liked: 2 times
Joined: Nov 15, 2023 8:01 am
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by dm_ch »

stephc_msft wrote: Nov 09, 2023 10:54 am As mentioned previously, the October Windows update has a fix that should improve VM IO performance when RCT is in use...
Hi Steph,

unfortunately I can only provide a proper repro on a Windows 2019 host :-(. Just let me know if it will help anyway.

Br DM
GabesVirtualWorld
Expert
Posts: 244
Liked: 38 times
Joined: Jun 15, 2009 10:49 am
Full Name: Gabrie van Zanten
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by GabesVirtualWorld »

stephc_msft wrote: Nov 09, 2023 10:54 am As mentioned previously, the October Windows update has a fix that should improve VM IO performance when RCT is in use...
I only have a Win2019 environment, on which we can see it almost daily, but I can provide info if that helps.

By the way, is there a perfmon counter or event message we can monitor to see when the bug is being hit? With the roughly 6 big SQL VMs we have, we can clearly notice the impact and live migrate to solve it, but there are many more VMs in that environment whose users don't report the issue, probably because the tasks those VMs run are not heavy enough for it to be noticed. I would still like to see which VMs are affected.
dm_ch
Novice
Posts: 4
Liked: 2 times
Joined: Nov 15, 2023 8:01 am
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by dm_ch » 2 people like this post

Hi Gabe,

we used the following PowerShell to find the VHDX with high InitiatorLatency:
Get-StorageQosFlow -VolumeId <c73462030054.....> | Sort-Object InitiatorLatency
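If it helps, here is a slightly fuller sketch of the same idea (assuming the Storage QoS cmdlets are available on the cluster; the property names are as they appear on our WS2019 hosts), listing the worst flows across all volumes:

```powershell
# Sketch: show the ten highest-latency Storage QoS flows cluster-wide,
# to spot which VM/vhdx is stuck in the low-IO state.
# Run on a cluster node with admin rights; assumes Get-StorageQosFlow is available there.
Get-StorageQosFlow |
    Sort-Object InitiatorLatency -Descending |
    Select-Object -First 10 InitiatorName, FilePath, InitiatorLatency, InitiatorIOPS |
    Format-Table -AutoSize
```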

Maybe it helps..

BR DM
slwaldrop
Novice
Posts: 4
Liked: 1 time
Joined: Mar 02, 2023 3:10 pm
Full Name: Shane Waldrop
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by slwaldrop »

I'm currently migrating 181 VMs every night after backups with a post-job PS script, so I'm hoping this issue can finally be tracked down. We're running two clusters of 8 Hyper-V 2019 hosts each, with Nimble on the backend for CSVs.
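For anyone curious, the post-job script is conceptually just a loop over the clustered VM roles. A rough sketch of the idea (not my exact script; assumes the FailoverClusters module and that letting the cluster pick the destination node is acceptable):

```powershell
# Sketch: live migrate every clustered VM to another node after backups finish,
# which clears the stuck low-IO state discussed in this thread.
Import-Module FailoverClusters

Get-ClusterGroup |
    Where-Object { $_.GroupType -eq 'VirtualMachine' } |
    ForEach-Object {
        # -MigrationType Live keeps the VM running; omitting -Node lets the
        # cluster choose the best possible destination host.
        Move-ClusterVirtualMachineRole -Name $_.Name -MigrationType Live
    }
```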
SodaPop87
Novice
Posts: 9
Liked: 3 times
Joined: Oct 26, 2023 3:09 pm
Full Name: Daniel Roth
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by SodaPop87 » 2 people like this post

stephc_msft wrote: Nov 09, 2023 10:54 am If anyone has a consistent repro, on WS2022 hosts, of this VM IO degradation aspect that is resolved by live migrating the VM, please let me know.
[I have some ideas, but only applicable to WS2022 at the moment]
Hi stephc_msft, do you still have some ideas, or have they been thwarted? I have WS2022 hosts and continue to have these issues. I essentially have to live migrate a large hosted SQL database server every morning to make sure no performance issues appear after backups. The slowness creeps in over maybe 3-5 days before the server is really slammed, so we now just do the live migration first thing every morning, after its morning backup.
mlehmann
Novice
Posts: 5
Liked: 1 time
Joined: Jun 21, 2018 1:18 am
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by mlehmann »

stephc_msft wrote: Nov 09, 2023 10:54 am If anyone has a consistent repro, on WS2022 hosts, of this VM IO degradation aspect that is resolved by live migrating the VM, please let me know.
[I have some ideas, but only applicable to WS2022 at the moment]
I have the same issue here and can provide information if needed. Am on WS2022.
sira38
Lurker
Posts: 1
Liked: 1 time
Joined: Dec 15, 2023 10:47 pm
Full Name: Sylvain Desreumaux
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by sira38 » 1 person likes this post

stephc_msft wrote: Feb 05, 2023 5:44 pm KB900379 is the test fix (and that is just a temporary KB number; it will not be called that if/when it gets rolled up into an update).
Ping me if you want to try it (WS2019/WS2022), but note this does not guarantee it will become an official fix.

The above is for the RCT issue.
Also note, there seems to be a similar issue where significant IO degradation can occur even without RCT being used,
and again the only way to 'clear' the host problem state is to reboot the host or live migrate the affected VM to a 'fresh' host.
This is also being investigated.
Hello!
Could you please send me the KB?
Thanks!
stephc_msft
Technology Partner
Posts: 26
Liked: 15 times
Joined: May 04, 2016 12:35 pm
Full Name: Stephen Cole
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by stephc_msft »

That's an old post. The RCT fix is in the Oct 2023 update (though it is best to use the Nov 2023 update, as there was an issue in some scenarios with the Oct version).
There is still the 'other issue', where a disk in a VM sometimes seems to get stuck in a 'low IO' state; that is still being investigated [hampered by the fact that I can't repro or get consistent access to a system showing it].
joelg
Influencer
Posts: 11
Liked: 2 times
Joined: Jan 16, 2023 3:13 pm
Full Name: Joel G
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by joelg »

We're consistently getting thousands of Event ID 9 entries daily, on Server 2019. Happy to provide any logs and access to our servers for diagnostics!

Joel
slwaldrop
Novice
Posts: 4
Liked: 1 time
Joined: Mar 02, 2023 3:10 pm
Full Name: Shane Waldrop
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by slwaldrop »

stephc_msft wrote: Dec 18, 2023 9:38 am That's an old post. The RCT fix is in the Oct 2023 update (though it is best to use the Nov 2023 update, as there was an issue in some scenarios with the Oct version).
There is still the 'other issue', where a disk in a VM sometimes seems to get stuck in a 'low IO' state; that is still being investigated [hampered by the fact that I can't repro or get consistent access to a system showing it].
Can we name this issue at least for now to avoid any confusion moving forward?

I also have 2019 hosts that I could try a fix on.
stephc_msft
Technology Partner
Posts: 26
Liked: 15 times
Joined: May 04, 2016 12:35 pm
Full Name: Stephen Cole
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by stephc_msft » 1 person likes this post

'Welcome' to year 4? of the ongoing issues.
There are basically three issues affecting Hyper-V VMs doing heavy disk IO to vhdx files (on WS2019 and WS2022 hosts).

1. The RCT issue, where use of RCT can slow down IO (especially writes) to vhdx files, due to some serialization of the IO pattern.
This should be mitigated somewhat by the Oct 2023 update [or better, the Nov 2023 update, which corrects an issue with it].
There have been some reports here of it successfully improving things.

2. The 'other issue', where a disk in a VM sometimes seems to get stuck in a 'low IO' state (getting 10x lower throughput and 10x higher latency),
usually triggered by a Veeam backup (even if not using RCT/CBT).
Note that other VMs and their disks, even those running on the same Hyper-V host, are NOT affected;
i.e. only a particular VM and a particular vhdx gets affected.
It is usually a VM and disk doing high IO, e.g. SQL, and only after something triggers the issue.
This is the issue that can be worked around by live migrating the affected VM to another host, which restores IO performance to normal until the next trigger event.
This issue is still being investigated [hampered by the fact that I can't repro or get consistent access to a system showing it].
It relates to something on the Hyper-V host and how it accesses the vhdx on behalf of the VM [via the Hyper-V storage-VSP layer],
and seems likely to be related to the IO pattern to the file on the CSV, where unexpected buffered reads cause degradation of the normal unbuffered IO [although there are a few reports of it happening on non-CSV storage!?].
There is also very strong evidence, from reports, that this only started happening with WS2019 (and remains in WS2022) and never happened on older WS2016 systems.

3. Unexpected Hyper-V-StorageVSP/Admin event 9s, about an IO request (read, write, other) taking >10000 ms, i.e. >10 seconds, to a particular vhdx.
Often, but not always, there are also reports that the affected VM (using the vhdx) is having slow IO issues.
Note this event was added in WS2019 (hence it was never seen in WS2016), and the threshold for logging it is 10000 ms, so we only see the IOs longer than that, even though there are probably others in the 1-9 second range occurring as well.
Often this event 9 is related to the vhdx going into the 'low IO' state of issue #2 (even though issue #2 only degrades latency from, say, 2 ms to 20 ms, definitely not to several seconds).
Other times it only occurs during genuinely high IO activity or backups etc., i.e. not continuously.
>10000 ms is a massive delay, if it really is happening and is not some sort of false reading, and surely would get noticed by the VM; yet often the VM seems to be totally unaware of it.
Side note: one scenario where an event 9 may occur is right at the start of a backup, as the related checkpoint causes an avhdx file to be created and used. This avhdx starts small and rapidly grows as the VM's IO continues. This initial growth of the avhdx can cause delays and the occasional event 9 (it is logged against the vhdx even though it is actually the avhdx causing it).

3a. Occasional, or even frequent, unexpected Hyper-V-StorageVSP/Admin event 8s about "Failed to map guest I/O buffer", often coming in a batch? (0xC0000044, implying some resource exhaustion?). This doesn't seem to have any impact on the relevant VM?, presumably because the IO gets successfully retried. It is unclear whether this is related to issue #3 or is some other unique issue in its own right.
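For anyone who wants to watch for issues #3/#3a without trawling Event Viewer, something like this sketch should work (note: the exact log name below is my assumption of how the channel appears to Get-WinEvent; verify it first with Get-WinEvent -ListLog *StorageVSP*):

```powershell
# Sketch: count StorageVSP event 9 (slow IO) and event 8 (failed buffer map)
# entries from the last 24 hours, grouped by event id.
$log = 'Microsoft-Windows-Hyper-V-StorageVSP-Admin'   # assumption - verify on your host
Get-WinEvent -FilterHashtable @{ LogName = $log; Id = 8, 9; StartTime = (Get-Date).AddDays(-1) } |
    Group-Object Id |
    Select-Object Name, Count |
    Format-Table -AutoSize
```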
Nick-SAC
Enthusiast
Posts: 75
Liked: 15 times
Joined: Oct 27, 2017 5:42 pm
Full Name: Nick
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Nick-SAC »

>10000 ms is a massive delay, if it really is happening and is not some sort of false reading, and surely would get noticed by the VM; yet often the VM seems to be totally unaware of it.

The VM might not 'notice' the delay if it wasn't doing anything 'important' at the time... In our case, the Exchange and SQL Servers ALWAYS noticed, and balked about it.
rold
Service Provider
Posts: 11
Liked: 7 times
Joined: Sep 14, 2016 12:04 pm
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by rold »

stephc_msft wrote: Jan 04, 2024 2:39 pm This issue is still being investigated [hampered by the fact that I cant repro or get consistent access to a system showing it]
I'll try to build a lab (on virtual machines) reproducing this issue next week. As far as I have researched the problem, it only occurs when the backup system copies the VHDX file for longer than some threshold, about 30 minutes, so the VHDX size and its daily diff need to be big enough.
stephc_msft wrote: Jan 04, 2024 2:39 pm Although there are a few reports of it happening on non CSV storage!?
We use disaggregated S2D with SOFS, and it also happens there. But sometimes live VM migration is not enough, and we need to move the CSV with the affected VHDX from one storage node to another, and also move the SOFS role to another node. Turning the VM off and on again also helps :)
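If it helps with the lab, one way I'd keep a guest disk busy so the backup runs past that ~30 minute threshold is Microsoft's diskspd load generator (the file path and sizes below are just examples, and assume diskspd.exe has been copied into the guest):

```powershell
# Sketch: ~30 minutes of sustained mixed random IO inside the guest,
# so the backup of this VHDX takes long enough to trigger the issue.
# -c50G create a 50 GB test file, -d1800 run for 1800 s, -r random IO,
# -w50 50% writes, -t4 four threads, -o8 queue depth 8, -b64K 64 KiB blocks
.\diskspd.exe -c50G -d1800 -r -w50 -t4 -o8 -b64K D:\repro\load.dat
```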
mlehmann
Novice
Posts: 5
Liked: 1 time
Joined: Jun 21, 2018 1:18 am
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by mlehmann »

"hampered by the fact that I cant repro or get consistent access to a system showing it"

I have this issue happening right now, if you have time to check. I have about 30 minutes before I have to fail over the VM to get it performing at full speed again.
mkeating44
Influencer
Posts: 10
Liked: never
Joined: Jun 07, 2022 10:57 pm
Full Name: Michael Keating
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by mkeating44 »

stephc_msft wrote: Jan 04, 2024 2:39 pm 2. The 'other issue', where a disk in a VM sometimes seems to get stuck in a 'low io' state (and get 10X lower throughput and 10X higher latency)...
Hi - we currently have this issue on our S2D hyperconverged clusters. We regularly see our high-use SQL servers trigger excessive volume latency after a Veeam backup completes. I'm not sure whether it's related to VSS on the host itself or another cause, and it doesn't happen on every backup, but we will easily see 5-6x the normal latency for the volume that one of the VHDX files is on. Live migrating the VM instantly brings the volume latency back to normal. We see no consistency with this issue on another S2D cluster of the same spec in another data centre running similar workloads: that cluster will be fine for weeks, then sometimes the issue comes up.
stephc_msft wrote: Jan 04, 2024 2:39 pm 3. Unexpected Hyper-V-StorageVSP/Admin event 9's about IO request (reads, writes, other) taking > 10000 mS ie >10 seconds to a particular vhdx...
I'm running through our logging to find these, as we do see "pauses" inside VMs that look to correlate with this. VM disk access will stop, and Windows Explorer windows will go to "Not Responding" for 30 seconds before returning to normal. This doesn't seem to line up with backups being taken. We also see random cluster events where a CSV cannot be accessed from one of the cluster nodes, with event 5120. Sometimes there is no impact; other times VMs will restart; other times they need to be powered off and back on for their disk access to resume.

stephc_msft - what is the best way to get this to be looked at? Lodge a case with Microsoft? Or is there another avenue?
steendp
Influencer
Posts: 11
Liked: 3 times
Joined: Jan 11, 2023 2:47 pm
Full Name: Steen Dalsgaard Pedersen
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by steendp »

mkeating44 wrote: stephc_msft - what is the best way to get this to be looked at? Lodge a case with Microsoft? Or is there another avenue?
Our Microsoft case has been ongoing for over a year, and today a representative from Microsoft contacted us with the suggestion to archive the case due to the unavailability of a solution. I strongly recommend initiating a new case, submitting relevant logs, and potentially helping to apply pressure for the allocation of adequate resources to address the issue. Moving off Hyper-V seems to be the only alternative at the moment, albeit not an ideal one.
joelg
Influencer
Posts: 11
Liked: 2 times
Joined: Jan 16, 2023 3:13 pm
Full Name: Joel G
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by joelg »

stephc_msft wrote: Jan 04, 2024 2:39 pm 'Welcome' to year 4? of the ongoing issues.
There are basically three issues affecting Hyper-V VM's doing heavy disk io to vhdx files (in WS2019 and WS2022 hosts)....
Thank you for the detailed description of the potential causes of the problem. I know we're all anxiously awaiting resolution to the problem, but if nothing else it is nice to at least have confirmation from Microsoft that there are bugs that need resolving!

Joel
m.novelli
Veeam ProPartner
Posts: 521
Liked: 91 times
Joined: Dec 29, 2009 12:48 pm
Full Name: Marco Novelli
Location: Asti - Italy
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by m.novelli »

Following the thread

Marco
mkaec
Veteran
Posts: 464
Liked: 134 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by mkaec » 1 person likes this post

steendp wrote: Jan 22, 2024 12:28 pm Our Microsoft case has been ongoing for over a year, and today, a representative from Microsoft contacted us with the suggestion to archive the case due to the unavailability of a solution...
Good support would want to escalate a case in this condition, not get rid of it. Sadly, good support is not common these days.
dm_ch
Novice
Posts: 4
Liked: 2 times
Joined: Nov 15, 2023 8:01 am
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by dm_ch »

stephc_msft wrote: Jan 04, 2024 2:39 pm 'Welcome' to year 4? of the ongoing issues.
There are basically three issues affecting Hyper-V VM's doing heavy disk io to vhdx files (in WS2019 and WS2022 hosts)....
Many thanks to stephc_msft for trying to solve the problem (2, the 'other issue'). To me, more intensive collaboration between MS and Veeam seems necessary, as the trigger is clearly the Veeam backup (or maybe how Veeam is using the MS backup tools), while the workaround (live migration) comes from the Microsoft side. Please contact each other, try to improve technical support, and finally find a solution.

@stephc_msft: By the way, we have had a case open with MS since Sept '23, and I'm looking forward to a solution. Please send me a private message if you need the case number.

BR DM
Rmachado
Service Provider
Posts: 23
Liked: 4 times
Joined: Dec 15, 2016 11:39 pm
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Rmachado »

Still waiting for a solution.

We have many customers with exactly this problem.
hallsos
Novice
Posts: 4
Liked: 1 time
Joined: Oct 10, 2019 6:20 pm
Full Name: Chris
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by hallsos »

Same here: lots of customers with this issue and still no resolution. As this has been an ongoing issue for many years now, it's quite frustrating that Microsoft has not provided a fix. Frustrating enough that we now need to re-think whether we will recommend Microsoft Storage Spaces Direct for future engagements.
m.novelli
Veeam ProPartner
Posts: 521
Liked: 91 times
Joined: Dec 29, 2009 12:48 pm
Full Name: Marco Novelli
Location: Asti - Italy
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by m.novelli »

How is it possible that this works in Azure and not on-premises? :shock:

Maybe Azure Backup works differently from Veeam?

Marco
Ci2Group
Service Provider
Posts: 19
Liked: 2 times
Joined: Dec 20, 2016 3:16 pm
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Ci2Group »

stephc_msft wrote: Jan 04, 2024 2:39 pm 'Welcome' to year 4? of the ongoing issues.
There are basically three issues affecting Hyper-V VM's doing heavy disk io to vhdx files (in WS2019 and WS2022 hosts)....
stephc_msft, if you are still looking for that 2022 environment... I have been waiting since January for someone from MSFT to take a look at my environment, so it seems like we need to talk.

Our environment:
4-node Hyper-V cluster
Dell 640s
All VMs running on a Dell SVC3020 with all-flash drives
Each host server connected via iSCSI using MPIO

We are regularly seeing this issue, and it is causing a great deal of frustration. We cannot seem to reproduce it on demand, but it happens consistently. Sometimes the consequences are not too serious; other times it seems to completely overwhelm the cluster/hosts and things really go sideways. I am not ruling out a possible configuration problem, but if I never run a Veeam backup, we never see any issues and the cluster runs very well.
mkaec
Veteran
Posts: 464
Liked: 134 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by mkaec » 2 people like this post

m.novelli wrote: Feb 20, 2024 9:52 pm How it’s possible this works in Azure and not On-Premises? :shock:

Code: Select all

if (!isAzure)
{
    Random rnd = new Random();
    if (rnd.Next(1, 13) == 7)
        this.SlowDownIO(rnd.Next(1,6));
}