-
- Lurker
- Posts: 1
- Liked: never
- Joined: Nov 07, 2023 7:49 pm
- Full Name: John Friel
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Also found this thread because of the StorageVSP Event ID 9 errors I am experiencing. I have a 2-node S2D cluster, and somewhere late in the Server 2016 updates I started having issues live migrating CSVs between the two. I upgraded the cluster to Server 2019 a year ago hoping that would fix everything, but it hasn't. I'm at the point where the only way to apply updates anymore is to shut down all the VMs, Pause/Drain the server, then slowly bring them back up on the other node. Then I can do maintenance. What a hassle. Back in the Server 2012 R2 days (before S2D), my 3-node cluster was a simple Pause/Drain, do maintenance, and back up.
I now have DataON working with me to find the root cause as the latest Server CU did not fix anything for me. Will keep everyone posted if I find anything.
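For context on the maintenance flow being described, here is a minimal sketch of the usual pause/drain cycle (the node name is a placeholder):

Code: Select all
# Drain all roles off the node, do maintenance/reboot, then resume and fail back.
Import-Module FailoverClusters
Suspend-ClusterNode -Name 'HV-NODE1' -Drain -Wait
# ... apply updates and reboot the node ...
Resume-ClusterNode -Name 'HV-NODE1' -Failback Immediate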
-
- Technology Partner
- Posts: 30
- Liked: 26 times
- Joined: May 04, 2016 12:35 pm
- Full Name: Stephen Cole
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
As mentioned previously, the October Windows update has a fix that should improve VM IO performance when RCT is in use (and an updated version of that fix is coming in the November update to avoid the VM won't-start issue).
However, there may still be some other issue that some people were reporting, where the IO performance seems to get stuck in a low-IO state for just a particular VM, or a particular disk in a VM (usually one used by SQL?), after something triggers it into that state (the trigger is usually a Veeam backup),
and the only effective workaround is to live migrate the VM to another host, which then brings the IO performance back to normal.
If anyone has a consistent repro, on WS2022 hosts, of this VM IO degradation aspect that is resolved by live migrating the VM, please let me know.
[I have some ideas, but only applicable to WS2022 at the moment]
-
- Service Provider
- Posts: 12
- Liked: 8 times
- Joined: Sep 14, 2016 12:04 pm
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Internally we call this second issue "queue depth=1". Both read and write IO are affected.
I can't get a stable repro, but yes, usually VMs with big disks are affected when they are backed up for more than X minutes.
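For anyone wanting to check for this from the host side, a hedged sketch: the 'Hyper-V Virtual Storage Device' performance counter set is assumed here to exist with a Queue Length counter (verify with Get-Counter -ListSet 'Hyper-V Virtual Storage Device'); a disk that should be busy but averages a queue length of about 1 would match this symptom.

Code: Select all
# Average the per-vhdx queue length over ~30 seconds; counter set and counter
# names are assumptions to verify on your build.
Get-Counter -Counter '\Hyper-V Virtual Storage Device(*)\Queue Length' `
    -SampleInterval 5 -MaxSamples 6 |
    ForEach-Object { $_.CounterSamples } |
    Group-Object InstanceName |
    ForEach-Object {
        [pscustomobject]@{
            Vhdx           = $_.Name
            AvgQueueLength = ($_.Group | Measure-Object CookedValue -Average).Average
        }
    } |
    Sort-Object AvgQueueLength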
-
- Influencer
- Posts: 10
- Liked: 3 times
- Joined: Oct 26, 2023 3:09 pm
- Full Name: Daniel Roth
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Hey Steph, I have a consistent repro; well, it seems to happen after about 7-10 days, with backups running 2x-3x a day, on a large SQL server. The Hyper-V host is WS2022. The last time I had to perform a live migration was 11/7, so normally I should see some degradation soon.
stephc_msft wrote: ↑Nov 09, 2023 10:54 am
However, there may still be some other issue that some people were reporting, where the IO performance seems to get stuck in a low-IO state for just a particular VM, or a particular disk in a VM (usually one used by SQL?), after something triggers it into that state (the trigger is usually a Veeam backup),
and the only effective workaround is to live migrate the VM to another host, which then brings the IO performance back to normal.
If anyone has a consistent repro, on WS2022 hosts, of this VM IO degradation aspect that is resolved by live migrating the VM, please let me know.
[I have some ideas, but only applicable to WS2022 at the moment]
-
- Influencer
- Posts: 10
- Liked: 3 times
- Joined: Oct 26, 2023 3:09 pm
- Full Name: Daniel Roth
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Correction: we did a live migration on it 4 days ago, so we may see it again in a few days.
-
- Novice
- Posts: 5
- Liked: 2 times
- Joined: Nov 15, 2023 8:01 am
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Hi Steph,
stephc_msft wrote: ↑Nov 09, 2023 10:54 am
As mentioned previously, the October Windows update has a fix that should improve VM IO performance when RCT is in use (and an updated version of that fix is coming in the November update to avoid the VM won't-start issue).
However, there may still be some other issue that some people were reporting, where the IO performance seems to get stuck in a low-IO state for just a particular VM, or a particular disk in a VM (usually one used by SQL?), after something triggers it into that state (the trigger is usually a Veeam backup),
and the only effective workaround is to live migrate the VM to another host, which then brings the IO performance back to normal.
If anyone has a consistent repro, on WS2022 hosts, of this VM IO degradation aspect that is resolved by live migrating the VM, please let me know.
[I have some ideas, but only applicable to WS2022 at the moment]
Unfortunately I can only provide a proper repro on a Windows 2019 host. Just let me know if it will help anyway.
BR DM
-
- Expert
- Posts: 249
- Liked: 38 times
- Joined: Jun 15, 2009 10:49 am
- Full Name: Gabrie van Zanten
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
stephc_msft wrote: ↑Nov 09, 2023 10:54 am
As mentioned previously, the October Windows update has a fix that should improve VM IO performance when RCT is in use (and an updated version of that fix is coming in the November update to avoid the VM won't-start issue).
However, there may still be some other issue that some people were reporting, where the IO performance seems to get stuck in a low-IO state for just a particular VM, or a particular disk in a VM (usually one used by SQL?), after something triggers it into that state (the trigger is usually a Veeam backup),
and the only effective workaround is to live migrate the VM to another host, which then brings the IO performance back to normal.
If anyone has a consistent repro, on WS2022 hosts, of this VM IO degradation aspect that is resolved by live migrating the VM, please let me know.
[I have some ideas, but only applicable to WS2022 at the moment]
I only have a Win2019 environment, on which we can see it almost daily, but I can provide info if that helps.
By the way, is there a perfmon counter or event message that we can monitor to see that the bug is being hit? With the roughly six big SQL VMs we have, we can clearly notice the impact and live migrate to resolve it, but there are many more VMs in that environment whose users don't report the issue, probably because the tasks those VMs run are not heavy enough for it to be noticed. I would still like to see which VMs are affected.
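One thing that can be watched today is the StorageVSP event 9 stream mentioned earlier in the thread. A minimal sketch, assuming the channel name Microsoft-Windows-Hyper-V-StorageVSP-Admin (verify with Get-WinEvent -ListLog '*StorageVSP*'):

Code: Select all
# List the last day of event 9 (>10 s IO) entries; the message text names the
# affected vhdx. The channel name is an assumption - verify with -ListLog.
Get-WinEvent -FilterHashtable @{
    LogName   = 'Microsoft-Windows-Hyper-V-StorageVSP-Admin'
    Id        = 9
    StartTime = (Get-Date).AddDays(-1)
} -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Message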
-
- Novice
- Posts: 5
- Liked: 2 times
- Joined: Nov 15, 2023 8:01 am
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Hi Gabe,
we used PowerShell to find the VHDX with high InitiatorLatency:
Get-StorageQoSFlow -VolumeId <c73462030054.....> | Sort-Object InitiatorLatency
Maybe it helps.
BR DM
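For a cluster-wide view, the same cmdlet can be run without the volume filter. A sketch, assuming Storage QoS is available (i.e. the VMs sit on CSV or SOFS storage) and that the flow objects expose the usual WS2016+ properties:

Code: Select all
# Show the ten flows with the worst initiator-side latency, with VM name and
# vhdx path, to spot a disk stuck in the low-IO state.
Get-StorageQoSFlow |
    Sort-Object InitiatorLatency -Descending |
    Select-Object -First 10 InitiatorName, FilePath, InitiatorLatency, InitiatorIOPS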
-
- Novice
- Posts: 4
- Liked: 1 time
- Joined: Mar 02, 2023 3:10 pm
- Full Name: Shane Waldrop
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
I'm currently migrating 181 VMs every night after backups with a post-job PowerShell script, so I'm hoping this issue can finally be tracked down. We're running two clusters of 8 Hyper-V 2019 hosts with Nimble on the backend for CSVs.
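For reference, a minimal sketch of what such a post-backup migration pass might look like (this is not the poster's actual script; the node selection is placeholder logic):

Code: Select all
# Live migrate every clustered VM to a different 'Up' node to clear the
# stuck low-IO state after backups.
Import-Module FailoverClusters
$nodes = Get-ClusterNode | Where-Object State -eq 'Up'
foreach ($group in Get-ClusterGroup | Where-Object GroupType -eq 'VirtualMachine') {
    # Pick any up node other than the VM's current owner.
    $target = $nodes | Where-Object Name -ne $group.OwnerNode.Name | Get-Random
    Move-ClusterVirtualMachineRole -Name $group.Name -Node $target.Name -MigrationType Live
}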
-
- Influencer
- Posts: 10
- Liked: 3 times
- Joined: Oct 26, 2023 3:09 pm
- Full Name: Daniel Roth
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Hi stephc_msft, do you still have some ideas, or have they been thwarted? I have WS2022 hosts and continue to have these issues. We have to live migrate a large SQL database server that we host every morning, essentially to make sure no performance issues appear after backups. The slowness creeps in maybe 3-5 days before it really gets slammed, so we have now just been doing the live migration first thing every morning after its morning backup.
stephc_msft wrote: ↑Nov 09, 2023 10:54 am
If anyone has a consistent repro, on WS2022 hosts, of this VM IO degradation aspect that is resolved by live migrating the VM, please let me know.
[I have some ideas, but only applicable to WS2022 at the moment]
-
- Novice
- Posts: 5
- Liked: 1 time
- Joined: Jun 21, 2018 1:18 am
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
stephc_msft wrote: ↑Nov 09, 2023 10:54 am
If anyone has a consistent repro, on WS2022 hosts, of this VM IO degradation aspect that is resolved by live migrating the VM, please let me know.
[I have some ideas, but only applicable to WS2022 at the moment]
I have the same issue here and can provide information if needed. I am on WS2022.
-
- Lurker
- Posts: 1
- Liked: 1 time
- Joined: Dec 15, 2023 10:47 pm
- Full Name: Sylvain Desreumaux
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Hello!
stephc_msft wrote: ↑Feb 05, 2023 5:44 pm
KB900379 is the test fix (and that is just a temporary KB number; it will not be called that if/when it gets rolled up into an update).
Ping me if you want to try it (WS2019/WS2022), but note this does not guarantee it will become an official fix.
The above is for the RCT issue.
Also note, there seems to be a similar issue where significant IO degradation can occur even without RCT being used.
And again, the only way to 'clear' the host problem state is to reboot the host or live migrate the affected VM to a 'fresh' host.
This is also being investigated.
Could you please send me the KB?
Thanks!!
-
- Technology Partner
- Posts: 30
- Liked: 26 times
- Joined: May 04, 2016 12:35 pm
- Full Name: Stephen Cole
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
That's an old post. The RCT fix is in the Oct 2023 update (well, best to use the Nov 2023 update, as there was an issue in some scenarios with the Oct version).
There is still the 'other issue', where a disk in a VM sometimes seems to get stuck in a 'low IO' state; that is still being investigated [hampered by the fact that I can't repro or get consistent access to a system showing it].
-
- Influencer
- Posts: 17
- Liked: 7 times
- Joined: Jan 16, 2023 3:13 pm
- Full Name: Joel G
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
We're consistently getting thousands of Event ID 9 entries daily on Server 2019. Happy to provide any logs and access to our servers for diagnostics!
Joel
-
- Novice
- Posts: 4
- Liked: 1 time
- Joined: Mar 02, 2023 3:10 pm
- Full Name: Shane Waldrop
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
stephc_msft wrote: ↑Dec 18, 2023 9:38 am
That's an old post. The RCT fix is in the Oct 2023 update (well, best to use the Nov 2023 update, as there was an issue in some scenarios with the Oct version).
There is still the 'other issue', where a disk in a VM sometimes seems to get stuck in a 'low IO' state; that is still being investigated [hampered by the fact that I can't repro or get consistent access to a system showing it].
Can we name this issue, at least for now, to avoid any confusion moving forward?
I also have 2019 hosts that I could try a fix on.
-
- Technology Partner
- Posts: 30
- Liked: 26 times
- Joined: May 04, 2016 12:35 pm
- Full Name: Stephen Cole
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
'Welcome' to year 4(?) of the ongoing issues.
There are basically three issues affecting Hyper-V VMs doing heavy disk IO to vhdx files (on WS2019 and WS2022 hosts):
1. The RCT issue, where use of RCT can slow down IO (especially writes) to vhdx's, due to some serialization of the IO pattern.
This should be mitigated somewhat by the Oct 2023 update [or better, the Nov 2023 update, which corrects an issue with it].
There have been some reports here of it successfully improving things.
2. The 'other issue', where a disk in a VM sometimes seems to get stuck in a 'low IO' state (getting 10x lower throughput and 10x higher latency),
usually triggered by a Veeam backup (even if not using RCT/CBT).
Note that other VMs and their disks, even if running on the same Hyper-V host, are NOT affected;
i.e. it is only a particular VM and a particular vhdx that gets affected.
Usually it is a VM and disk doing high IO, e.g. SQL, and only after something triggers the issue.
This is the issue that can be worked around by live migrating the affected VM to another host, which restores the IO performance to normal until the next trigger event.
This issue is still being investigated [hampered by the fact that I can't repro or get consistent access to a system showing it].
It relates to something on the Hyper-V host and how it is accessing the vhdx on behalf of the VM [via the Hyper-V storage-VSP layer],
and seems likely to be related to the IO pattern to the file on the CSV, where unexpected buffered reads cause degradation to the normal unbuffered IO [although there are a few reports of it happening on non-CSV storage!?].
There is also very strong evidence, or at least reports, that this only started happening with WS2019 (and remains in WS2022) and never happened on older WS2016 systems.
3. Unexpected Hyper-V-StorageVSP/Admin event 9s about an IO request (read, write, other) taking > 10000 ms, i.e. > 10 seconds, to a particular vhdx.
And often, but not always, reports that the affected VM (using the vhdx) is having slow-IO issues.
Note this event was added in WS2019 (hence never seen on WS2016), and the threshold for logging it is 10000 ms, so you only see the requests longer than this, even though there are probably others in the 1-9 second range occurring as well.
Often this event 9 is related to the vhdx going into the 'low IO' state of issue #2 (even though issue #2 only degrades latency from, say, 2 ms to 20 ms, definitely not to several seconds).
Other times it only occurs during really high IO activity, backups, etc., i.e. not continuously.
The > 10000 ms is a massive delay, if it really is happening and is not some sort of false reading, and surely would get noticed by the VM, yet often the VM seems to be totally unaware of it.
Side note: one scenario where an event 9 may occur is just at the start of a backup, as the related checkpoint causes an avhdx file to be created and used. This avhdx starts small and rapidly grows as the VM's IO continues. This initial growing of the avhdx can cause delays and the occasional event 9. (It is logged against the vhdx even though it is actually the avhdx causing it.)
3a. Occasional, or even frequent, unexpected Hyper-V-StorageVSP/Admin event 8s about "Failed to map guest I/O buffer", often coming in a batch (0xC0000044, implying some resource exhaustion?). This doesn't seem to have any impact on the relevant VM, presumably because the IO gets successfully retried. It is unclear whether this is related to issue #3 or some other unique issue in its own right (a query sketch for spotting event 8 bursts follows below).
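As referenced in 3a, a quick way to see whether the event 8s come in bursts is to bucket them by hour. A sketch, with the same assumed channel name as the event 9 query earlier in the thread:

Code: Select all
# Count 'Failed to map guest I/O buffer' (event 8) entries per hour to spot
# bursts; channel name is an assumption - verify with Get-WinEvent -ListLog.
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft-Windows-Hyper-V-StorageVSP-Admin'
    Id      = 8
} -ErrorAction SilentlyContinue |
    Group-Object { $_.TimeCreated.ToString('yyyy-MM-dd HH:00') } |
    Sort-Object Name |
    Select-Object Name, Count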
-
- Enthusiast
- Posts: 76
- Liked: 16 times
- Joined: Oct 27, 2017 5:42 pm
- Full Name: Nick
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
stephc_msft wrote: ↑Jan 04, 2024 2:39 pm
The > 10000 ms is a massive delay, if it really is happening and is not some sort of false reading, and surely would get noticed by the VM, yet often the VM seems to be totally unaware of it.
The VM might not 'notice' the delay if it wasn't doing anything 'important' at the time... In our case, the Exchange and SQL Servers ALWAYS noticed, and balked about it.
-
- Service Provider
- Posts: 12
- Liked: 8 times
- Joined: Sep 14, 2016 12:04 pm
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
stephc_msft wrote: ↑Jan 04, 2024 2:39 pm
This issue is still being investigated [hampered by the fact that I can't repro or get consistent access to a system showing it].
I'll try to build a lab (on virtual machines) reproducing this issue next week. As far as I have researched the problem, it only occurs when the backup system copies the VHDX file for longer than some amount of time, about 30 minutes. So the VHDX size and its daily diff need to be big enough.
stephc_msft wrote: ↑Jan 04, 2024 2:39 pm
Although there are a few reports of it happening on non-CSV storage!?
We use disaggregated S2D with SOFS and it also happens there. But sometimes live VM migration is not enough, and we need to move the CSV with the affected VHDX from one storage node to another, and also the SOFS role to another node. Turning the VM off and on completely also helps.
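The CSV and SOFS role moves described above map to standard failover-cluster cmdlets; a minimal sketch with placeholder names:

Code: Select all
# Move ownership of the CSV holding the affected VHDX, and the SOFS role,
# to another storage node (disk/role/node names are placeholders).
Move-ClusterSharedVolume -Name 'Cluster Disk 2' -Node 'STOR-NODE2'
Move-ClusterGroup -Name 'SOFS' -Node 'STOR-NODE2'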
-
- Novice
- Posts: 5
- Liked: 1 time
- Joined: Jun 21, 2018 1:18 am
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
"hampered by the fact that I cant repro or get consistent access to a system showing it"
I have this issue happening currently, if you have time to check. I have about 30 minutes before I have to fail over the VM to get it performing at speed again.
-
- Influencer
- Posts: 13
- Liked: 3 times
- Joined: Jun 07, 2022 10:57 pm
- Full Name: Michael Keating
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
stephc_msft wrote: ↑Jan 04, 2024 2:39 pm
2. The 'other issue', where a disk in a VM sometimes seems to get stuck in a 'low IO' state (getting 10x lower throughput and 10x higher latency), usually triggered by a Veeam backup (even if not using RCT/CBT)....
Hi - we currently have this issue on our S2D hyperconverged clusters. We regularly see our high-use SQL servers trigger excessive volume latency after a Veeam backup completes. I'm not sure if it's related to VSS on the host itself or another cause, and it doesn't happen every backup, but we will easily see 5-6x the volume latency for the volume that one of the VHDX files is on. Live migrating the VM instantly brings the volume latency back to normal. We don't have any consistency with this issue on another S2D cluster of the same spec in another data centre running similar workloads - that cluster will be fine for weeks, then sometimes the issue comes up.
stephc_msft wrote: ↑Jan 04, 2024 2:39 pm
3. Unexpected Hyper-V-StorageVSP/Admin event 9s about an IO request (read, write, other) taking > 10000 ms, i.e. > 10 seconds, to a particular vhdx....
I'm running through our logging to find these, as we do see "pauses" inside VMs that look to correlate with this. VM disk access will stop, and Windows Explorer windows will go to "Not responding" for 30 seconds before returning to normal. This doesn't seem to line up with backups being taken. We also see random cluster events where a CSV is not able to be accessed from one of the cluster nodes, with event 5120. Sometimes there is no impact; other times VMs will restart; other times they need to be powered off and back on for their disk access to resume.
stephc_msft - what is the best way to get this looked at? Lodge a case with Microsoft? Or is there another avenue?
-
- Influencer
- Posts: 11
- Liked: 3 times
- Joined: Jan 11, 2023 2:47 pm
- Full Name: Steen Dalsgaard Pedersen
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
mkeating44 wrote:
stephc_msft - what is the best way to get this looked at? Lodge a case with Microsoft? Or is there another avenue?
Our Microsoft case has been ongoing for over a year, and today a representative from Microsoft contacted us with the suggestion to archive the case due to the unavailability of a solution. I strongly recommend initiating a new case, submitting relevant logs, and potentially helping to apply pressure for the allocation of adequate resources to address the issue. Offboarding Hyper-V seems to be the only alternative at the moment, albeit not an ideal one.
-
- Influencer
- Posts: 17
- Liked: 7 times
- Joined: Jan 16, 2023 3:13 pm
- Full Name: Joel G
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
stephc_msft wrote: ↑Jan 04, 2024 2:39 pm
'Welcome' to year 4(?) of the ongoing issues.
There are basically three issues affecting Hyper-V VMs doing heavy disk IO to vhdx files (on WS2019 and WS2022 hosts)....
Thank you for the detailed description of the potential causes of the problem. I know we're all anxiously awaiting a resolution, but if nothing else it is nice to at least have confirmation from Microsoft that there are bugs that need resolving!
Joel
-
- Veeam ProPartner
- Posts: 566
- Liked: 103 times
- Joined: Dec 29, 2009 12:48 pm
- Full Name: Marco Novelli
- Location: Asti - Italy
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Following the thread
Marco
-
- Veteran
- Posts: 465
- Liked: 136 times
- Joined: Jul 16, 2015 1:31 pm
- Full Name: Marc K
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Good support would want to escalate a case in this condition, not get rid of it. Sadly, good support is not common these days.
-
- Novice
- Posts: 5
- Liked: 2 times
- Joined: Nov 15, 2023 8:01 am
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
stephc_msft wrote: ↑Jan 04, 2024 2:39 pm
'Welcome' to year 4(?) of the ongoing issues.
There are basically three issues affecting Hyper-V VMs doing heavy disk IO to vhdx files (on WS2019 and WS2022 hosts)....
Many thanks to stephc_msft for trying to solve the problem (2. The 'other issue'...). For me, more intensive collaboration between MS and Veeam would be necessary, as the trigger is clearly the Veeam backup (or maybe how Veeam is using the MS backup tools), while the workaround (live migration) comes from Microsoft. Please contact each other, try to improve technical support, and finally find a solution.
@stephc_msft: By the way, we have had a case open with MS since Sept. '23 and I'm looking forward to a solution. Please send me a private message if you need the case number.
BR DM
-
- Service Provider
- Posts: 26
- Liked: 6 times
- Joined: Dec 15, 2016 11:39 pm
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Still waiting for a solution.
We have many customers with this exact problem.
-
- Novice
- Posts: 4
- Liked: 1 time
- Joined: Oct 10, 2019 6:20 pm
- Full Name: Chris
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Same here, lots of customers with this issue and still no resolution. As this has been an ongoing issue for many years now, it's quite frustrating that Microsoft has not provided a fix. Frustrating enough that we now need to rethink whether we will recommend Microsoft Storage Spaces Direct for future engagements.
-
- Veeam ProPartner
- Posts: 566
- Liked: 103 times
- Joined: Dec 29, 2009 12:48 pm
- Full Name: Marco Novelli
- Location: Asti - Italy
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
How is it possible that this works in Azure and not on-premises?
Maybe Azure Backup works differently from Veeam?
Marco
-
- Service Provider
- Posts: 19
- Liked: 2 times
- Joined: Dec 20, 2016 3:16 pm
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
stephc_msft, if you are still looking for that 2022 environment..... I have been waiting since January for someone from MSFT to take a look at my environment, so it seems like we need to talk.
stephc_msft wrote: ↑Jan 04, 2024 2:39 pm
'Welcome' to year 4(?) of the ongoing issues.
There are basically three issues affecting Hyper-V VMs doing heavy disk IO to vhdx files (on WS2019 and WS2022 hosts)....
Our environment:
- 4-node Hyper-V cluster
- Dell 640s
- All VMs running on a Dell SVC3020 with all-flash drives
- Each host server connected via iSCSI using MPIO
We are regularly seeing this issue, and it is causing a great deal of frustration. We cannot seem to reproduce the issue on demand, but it is happening consistently. Sometimes the consequences are not too serious; other times it seems to completely overwhelm the cluster / hosts or something, and things really go sideways. I am not ruling out a possible configuration problem, but if I never run a backup using Veeam, we never see any issues and the cluster runs very well.
-
- Veteran
- Posts: 465
- Liked: 136 times
- Joined: Jul 16, 2015 1:31 pm
- Full Name: Marc K
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Code: Select all
if (!isAzure)
{
    Random rnd = new Random();
    if (rnd.Next(1, 13) == 7)
        this.SlowDownIO(rnd.Next(1, 6));
}