Nick (Enthusiast)
Windows Server 2019 Hyper-V VM I/O Performance Problem
In yesterday’s Word from Gostev Forum Digest he stated:
“Important Hyper-V news: for the past few months, we've been working on a strange issue with a few customers: poor I/O performance on VMs protected by Veeam. ... our support was able to isolate the issue to VMs with Resilient Changed Tracking (RCT) enabled. So we've opened a case with Microsoft ... and finally got a solid answer last week. They were able to reproduce noticeable and abnormal I/O performance degradation from enabling RCT on a VM, and believe this is the result of security changes made as a part of Meltdown/Spectre patches. So they're now tracking this as a bug to be fixed.”
And since we’ve been troubleshooting an inexplicable Hyper-V VM I/O Performance problem for the last 2 months – even though it is not occurring during the VBR backups – I figured I’d throw the situation & details up here just in case there is some connection...
The Configuration:
Dell PowerEdge R540 Server
Dual 8 Core CPUs
64 GB RAM
Dell PERC H730P RAID Controller w/2GB Cache
8 HDDs configured as 4 RAID-1 Volumes
iSCSI attached NAS as the VBR Primary Repository (ReFS)
Windows Server 2019 Hyper-V Host
Windows Server 2019 VM [Domain Controller]
Windows Server 2016 VM [Exchange Server 2016]
Windows Server 2012R2 VM [Exchange Server 2013] (this was a temp install used to migrate from SBS2008/Exchange2007 and is now shut down)
The Host and each of the VMs are on their own dedicated RAID-1 Volume
The Exchange Server 2016 Database & Logs are now on their own dedicated RAID-1 Volume
VB&R is installed as an All-In-One configuration on the Hyper-V Host
This is currently a very lightly loaded environment: 70 active users (half of whom are sparse, email-only users), a total of only several hundred emails a day, and 5 to 10 people working on Word docs & Excel sheets.
The Problem:
Periodically the Hyper-V Host will Log a simultaneous sequence of 5 to 15 of these events:
-------------------------------------------------------------------------------------------------------------
Log Name: Microsoft-Windows-Hyper-V-StorageVSP-Admin
Source: Microsoft-Windows-Hyper-V-StorageVSP
Date: 7/27/2019 2:08:58 AM
Event ID: 9
Task Category: None
Level: Warning
Keywords:
User: N/A
Computer: <Server Name>
Description:
An I/O request for device 'E:\Hyper-V\<Server Name>\Virtual Hard Disks\<Server Name>.vhdx' took 13470 milliseconds to complete.
Operation code = WRITE16, Data transfer length = 4096, Status = SRB_STATUS_SUCCESS.
-------------------------------------------------------------------------------------------------------------
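In case anyone wants to check their own Host for these, the entries can be pulled with PowerShell. A minimal sketch, using the log name from the event above:

Code:
# List all Event ID 9 (slow I/O) warnings from the Hyper-V storage VSP
# log, newest first, with the timestamp and the affected VHDX path
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft-Windows-Hyper-V-StorageVSP-Admin'
    Id      = 9
} -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Message |
    Format-List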
The events are tied to all of the VMs (first the 3 and now the 2 that are still active) and all of their VHDs (in no apparent order or sequence). It has never occurred on the Host’s RAID Volume.
The delay times always run a remarkably consistent 10 to 14 seconds (10,000 to 14,000ms); in fact, the times are so consistent that I thought it might be a Power Management Disk Suspend/Spin-up issue, but that doesn't appear to be the case.
There is no apparent rhyme or reason to the occurrence of these events, i.e., I can't find any other process or task which coincides with them. Sometimes they'll occur several times in one day, at random hours (including at night when there is no user activity at all), and then not at all for several days?!
Dell Tech Support is stumped and is now recommending that we take it to Microsoft. I was just about to do that when I saw Gostev's message, which has me wondering... Could this be the same RCT-Spectre/Meltdown Patch bug even though it's not occurring during the backups?
Hannes Kasparick (Product Manager)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Hello,
I would say it's impossible to give a good answer without logs. So please open a case and post the case number here for reference.
The issue I see: even if you know the answer... there is no solution at this point in time. So honestly I would just wait for Microsoft updates as Veeam support cannot really help you anyway.
Best regards,
Hannes
Nick (Enthusiast)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Thanks Hannes, but all I was really looking for here was feedback as to whether this may be the same bug that Gostev spoke of, i.e., whether you are only seeing the I/O problem during the backups (in which case my issue may be unrelated and I'll contact Microsoft now) or whether you're seeing the same thing as I described (in which case I'll just wait for Microsoft to fix it).
Thanks again,
Nick
Nick (Enthusiast)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Close enough... I'll go with a 'definite maybe'...
Thanks again
Nick
Didier Van Hoye (Veeam Vanguard)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Have you checked the Meltdown/Spectre mitigations are in place? Can you disable them and compare with/without? We have not seen the issue and are 100% patched. What BIOS versions are you running? I have been in contact with MSFT about this but I am waiting to see if they have any scenarios to test to reproduce/mitigate.
Collin P (Expert)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
I'm seeing a huge jump in latency on source storage during backups and our backup windows are extended. We've had to implement throttling as a result. Does anyone know which patch it is that we would need to uninstall on the Hyper-V hosts to return to normal?
Marc K (Veteran)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
I too am interested in more details about the issue Gostev mentioned. It sounds like Hyper-V performance could take a dive simply by activating RCT. But under what circumstances?
Michael Høyer (Service Provider)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Totally unrelated to the discussion about the I/O performance problem bug, but have you considered that it could be because each VM only has the performance of one physical disk to work with in your RAID setup?
With your hardware, I would have done a RAID 10 of all the disks and put all the VMs on that, unless you have specific reasons to split it up?
Nick (Enthusiast)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
WorkingHardInIt wrote: "Have you checked the Meltdown/Spectre mitigations are in place?"
Frankly (and I am embarrassed to say this) I'm not certain about which of the Spectre/Meltdown mitigations are or aren't in place anymore!
Between all of the Spectre/Meltdown vulnerability variants, the CVE & ADV articles, the KB Articles & Patches, the Microcode Updates and the Registry mods... this has become one of the most murky morasses of confusion I’ve ever seen!
I haven't been too, too worried about it because all of the HV Hosts & their respective VMs that we're administering are single-tenant environments running trusted code, and NOBODY has direct access to them other than a very small number of highly trusted Admins – so no one is browsing the Internet with them, etc., and I don't think we have to worry about the SQL Server spying on the Exchange Server... yet!
Nick (Enthusiast)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
mkh wrote: "... have you considered it could be because each VM only has the performance of one physical disk to work with in your RAID setup?"
I did consider alternate RAID configs, but this is such a lightly loaded environment that even their former SBS 2008 Server – providing the AD DS, DNS, DHCP, Exchange Server and WSUS Roles & Services ALL on a single RAID-1 Volume (including the Exchange DB & Log Drives) – never hit unacceptable Response Times or Disk Queues... and certainly never came anywhere near these absurd 10,000ms+ I/O times.
By contrast, the current config has the Exchange Server on its own Spindle/RAID-1 pair and the Exchange DB & Logs on its own Spindle/RAID-1 pair — and when I just rebooted it (while the office is closed and there’s Zero load on anything) it threw a bunch of these 10+ Second I/O Times, including & simultaneously on the DC VM’s different RAID-1 Volume.
To me, this sure appears to be an issue with the Server’s RAID Controller and/or the Server 2019 HV Host’s Drivers for it.
nmdange (Veteran)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Nick-SAC wrote: "Frankly (and I am embarrassed to say this) I'm not certain about which of the Spectre/Meltdown mitigations are or aren't in place anymore!"
You can check the status of mitigations using a PowerShell module: https://support.microsoft.com/en-us/hel ... powershell
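A minimal sketch of using it (the module installs from the PowerShell Gallery, so the machine needs gallery access):

Code:
# Install Microsoft's SpeculationControl module and report which
# Meltdown/Spectre mitigations are actually enabled on this machine
Install-Module -Name SpeculationControl -Scope CurrentUser
Import-Module SpeculationControl
Get-SpeculationControlSettings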
Nick (Enthusiast)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
nmdange – thanks for the PowerShell tip. Very helpful...
Given that some of these Meltdown/Spectre vulnerabilities need to be addressed (or not) on a case-by-case, environment-by-environment basis (to mitigate or not to mitigate, that is the question), and I haven't yet seen which of the patches might be the problem... I'm going to leave them as is, at least until I hear back from Dell, Microsoft or Veeam (all of whom I've opened support cases with on this issue).
FWIW, the Veeam Case # is 03805975 (and I just sent them a bunch of Logs, etc.)
Nick (Enthusiast)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
It’s now clearly evident that the I/O Delay problem that I’m seeing has nothing to do with Veeam.
After completely uninstalling VAW and VBR, including ALL of their sub-components (other than SQL Server), from the Server 2019 Hyper-V Host, the I/O problem reoccurred (simultaneously, 4 times, on both active VMs and 3 of their VHDs spanning all 3 RAID-1 Volumes).
So, I'm back to believing that it's either:
The Dell/MS Server 2019 driver for this RAID Controller, or
Some incompatibility between this RAID Controller and the HDDs holding the VMs' VHDs, or
The RAID Controller itself being defective (a long shot to be sure, but every once in a while it is the hardware).
Kevin Boddy (Service Provider)
[MERGED] Poor I/O performance on VMs protected by Veeam
Hi,
We have seen the I/O performance issue that Gostev mentioned in his blog email with a number of customers on our Hyper-V cluster who run heavily loaded SQL instances. I have logged a support request #03870293 to try to find out more, but it doesn't seem like anyone actually knows what is going on.
I know it's not a Veeam issue and relates to RCT in Windows, but surely Veeam initially assisted customers with this issue and has some kind of understanding around what causes it?
Would removing any of the spectre/meltdown patches help at all? How are other customers dealing with this issue?
Thanks
Johnny (Lurker)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Hi @Nick-SAC,
have you been able to solve your problem? I have the same issue with a Hyper-V 2019 server, but on an HP Gen10 server with 4x 2TB SSDs.
The customer is facing extreme latency issues with his file server, and I see Event ID 9 in the Hyper-V-StorageVSP event log with error messages like this:
An I/O request for device 'D:\Hyper-V\Virtual Hard Disks\MyServername-02.vhdx' took 172343 milliseconds to complete. Operation code = SYNCHRONIZE CACHE, Data transfer length = 0, Status = SRB_STATUS_SUCCESS.
Best regards,
Johnny
Nick (Enthusiast)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Hi Johnny,
Sadly no, despite a slew of testing and countless interactions with Dell and Microsoft Support, I have not had any success in solving the I/O Delay problem.
I’ll spare you the details at the moment for the sake of brevity but it boils down to this:
The I/O Delay Events are still very intermittent, sometimes not occurring at all for up to 6 Days... and then occurring several times in a single day – at no discernible time, and with no pattern or correlation with any other Event, Task or Process.
When the I/O Delays do occur, they often do so in a group, i.e., the Event Log may show 10 or 20 simultaneous entries where each one is pointing to a different VM and its different VHDs which reside on different RAID-1 Volumes.
1) The problem occurs on all of the running VMs' VHDs, which reside on all of the RAID Volumes with all of the HDD types (SAS and NL-SAS).
2) The problem has never occurred on the Hyper-V Host itself, i.e., it appears in the Host's Event Logs but only as related to the VHDs of running VMs and never any of the Host’s own OS Files or Processes, etc.
3) The problem did not occur with a Test VM whose VHDs resided on an iSCSI attached NAS. I’ve no idea what to make of this other than maaaybe the test period just wasn’t long enough to catch one or perhaps Hyper-V recognizes iSCSI attached devices as such and interacts with them differently.
4) It’s not a defective RAID Controller as we’ve just swapped that out and it changed nothing.
Does this correspond with what you’re seeing, particularly that the I/O Delays only occur when accessing the VM's VHDs and not the Host’s own OS File access?
Also, is there any chance that you typo’d your stated Delay Time? We’re seeing them at a very consistent 10,000 to 14,000ms (a bad enough 10-14 Seconds) but if you’re really getting 172,343ms that’s in the 3 Minutes go-get-a-cup-of-coffee YIKES range!
Thanks,
Nick
Matthew McCord (Lurker)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
I think we're running into the same problem. We are running Hyper-V Server 2016 on a Dell T630 w/ H730P. Slowdown for us starts about 5-20 minutes after an incremental backup is finished, and continues for another 5-20 minutes. During this time, there is no physical disk queue on the host, but the VM and vhdx indicate huge disk queues. Writes get delayed into the 10-15 second range, enough to trigger the 'delayed write' event log entries from MSSQL. Obviously, performance on the database goes to hell while it's happening.
This all happens well after the backup is completed and the checkpoint is merged back into the vhdx. Opened a case with veeam today, but my gut feeling is a hyper-v problem.
Fedor Maslov (Veteran)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Hi Matthew,
Could you please share your case ID with us?
Thanks
Enthusiast
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Just wanted to leave a "me too":
Running a 4-node Azure Stack HCI solution (aka an S2D cluster) and experiencing the same issues with it roughly since August. Logged events include the delayed I/O requests (Event 9) from the VHDX, but also Event 1020 from SMB Server: "File System Operation has taken longer than expected." Here I see delays of up to 25 seconds. Often – almost every time – these errors occur when one of the nodes is paused due to maintenance like CAU or baseline updates. Sometimes the VM roles are switched off and restarted from RCM. I've checked a lot of things, and the hardware vendor does not see a related issue. Now investigating in all kinds of directions; pretty sure this is NOT a Veeam-related issue. Event 1020 from SMB Server reports: "The underlying file system has taken too long to respond to an operation. This typically indicates a problem with the storage and not SMB" – and I think so, too. I do not know, but I have a slight suspicion that this is file-system/ReFS related, because I see the same events on my Storage Spaces servers, too. All systems are WS2019 LTSC.
Johnny (Lurker)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
In our case, the Microsoft case found that our issues go along with deduplication.
We disabled dedup on the Hyper-V hosts and the problems are gone.
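If you want to check whether dedup could be in play on your own hosts, here is a minimal sketch using the built-in Deduplication cmdlets (they are only present where the Data Deduplication feature is installed; the 'E:' volume is just an example):

Code:
# Show which volumes have Data Deduplication enabled
Get-DedupVolume | Select-Object Volume, Enabled, SavedSpace, SavingsRate

# Disable dedup on the volume holding the VHDXs; existing data stays
# deduplicated until an unoptimization job is run
Disable-DedupVolume -Volume 'E:'
Start-DedupJob -Volume 'E:' -Type Unoptimization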
Service Provider
Re: [MERGED] Poor I/O performance on VMs protected by Veeam
kevin.boddy wrote: "I know it's not a Veeam issue and relates to RCT in Windows but surely Veeam initially assisted customers with this issue and have some kind of understanding around what causes it? Would removing any of the spectre/meltdown patches help at all? How are other customers dealing with this issue?"
It's not related to the Spectre/Meltdown patches. I have tried Windows Server 2016 Technical Preview 2 and it's also affected.
Lots of testing – and it looks like this is just Microsoft's poor design of RCT. I don't believe it will be fixed in 2016 or 2019.
Veeam, please make your own CBT Driver supported for 2016 and 2019.
Mats (Lurker)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Hi.
We have the same problem in 4 different clusters after Veeam backup, with two different iSCSI SANs (Dell Compellent & Hitachi VSP G370).
Hosts: DELL R640 (latest patches)
OS: Windows Server 2019 (latest patches)
Storage: CSV, iSCSI, Dell Compellent & Hitachi VSP G370
Backup: Veeam 9.5 u4
We have found a workaround: if we move the VM that has problems (usually the SQL data disk – oddly, no other disks are affected) to another host, then we get normal performance again.
We have a script that checks the disk performance every two hours, and if it finds a server (VM) with poor performance, the script moves that server to another host in the cluster.
We suffer quite a lot from this.
/Mats
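Roughly, the idea of the script looks like this. This is a simplified sketch rather than our production script: the VM lookup from the VHDX path and the destination host name 'HV02' are placeholders.

Code:
# Sketch: if a VHDX logged a slow-I/O warning (Event ID 9) in the last
# two hours, live-migrate the owning VM to another cluster node
$since = (Get-Date).AddHours(-2)
$events = Get-WinEvent -FilterHashtable @{
    LogName   = 'Microsoft-Windows-Hyper-V-StorageVSP-Admin'
    Id        = 9
    StartTime = $since
} -ErrorAction SilentlyContinue

foreach ($e in $events) {
    # Pull the .vhdx path out of the event message text
    if ($e.Message -match "'(?<path>[^']+\.vhdx)'") {
        $vm = Get-VM | Where-Object {
            ($_ | Get-VMHardDiskDrive).Path -contains $Matches['path']
        }
        if ($vm) { Move-VM -Name $vm.Name -DestinationHost 'HV02' }
    }
}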
Jim Gandy (Lurker)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Is there any update on this issue? I am also seeing these Event ID 9s.
Aragorn Labs (Service Provider)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Does anyone have any update?
I'm facing the exact same issue as you: "An I/O request for device 'xxxxxxxxxx.vhdx' took xxxxx milliseconds to complete. Operation code = WRITE, Data transfer length = xxxx, Status = SRB_STATUS_SUCCESS."
It doesn't occur during Veeam operations themselves but appears randomly, and reliably during I/O-heavy operations like standard file backups, sometimes causing big compression operations to fail.
I'm on a Dell PowerEdge T340 with a PERC H740P and a RAID 10 SAS 10k volume. All firmware is fully up to date; only the Toshiba drives are not.
Dell support states that hardware is ok.
Thanks.
Nicola Farina
Aragorn Labs
LeslieUC (Influencer)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Hi, in Q3/Q4 of 2019 we also opened a case (03857505) about CBT and RCT, with a reference to the Word from Gostev.
We were seeing a huge performance degradation if we did a Hyper-V on-host backup.
Ever since, we have been doing agent backups of our Hyper-V SQL and Exchange servers. It's not ideal, but there's no performance loss.
Veeam's response:
As mentioned by a colleague of mine, we have worked with a similar issue before.
In Hyper-V 2016/2019, MS introduced their own technology for tracking changed blocks, so that backup solutions can gather this info during backup jobs. To get this info, a so-called recovery checkpoint has to be created. This type of checkpoint was designed specifically for backup purposes, and it generates RCT files next to the VM disks. These files hold info about changed blocks.
During our previous testing we noticed that random write operations specifically take longer after these types of files appear.
The way of reproducing the issue was similar to the one you've described in the email:
1. diskspd test for random writes to FILE1
2. Recovery checkpoint creation
3. Recovery checkpoint deletion
4. (After step 2 the RCT files are already created, but after step 3 there is no checkpoint itself anymore; think of it as the backup job no longer running)
5. Run the same test against FILE1
6. Performance is lower than it was
7. The same test was executed against a new file, let's say FILE2, while the RCT files were still in place. Performance was back to original.
8. RCT files removed manually. Test with FILE1 executed again. Performance is back to normal.
We concluded that this affects files which are already on the disk, so most production applications might suffer from performance degradation.
A Microsoft case was created, and after a while they accepted this behavior as a problem, with a promise to fix it in future versions; however, there is no ETA. This fact is very unfortunate for us too.
On the other hand, we made a few experiments with using another type of checkpoint during backup. This 'new' type is the same as you would use during checkpoint creation via HV Manager. This checkpoint does not trigger RCT file creation and thus does not cause performance degradation.
As a downside, it does not allow the use of changed block tracking, which might significantly increase the backup window.
This scenario still requires additional testing, since this type of checkpoint was not initially designed for backup purposes. So it might be a part of further releases. Also, we rely on Microsoft in case this issue is addressed in any of the upcoming updates.
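For anyone wanting to try the diskspd part of that repro, a minimal sketch of steps 1 and 5 (the file name, size and duration are placeholders; the recovery checkpoint in steps 2-3 normally has to come from an RCT-aware backup run, since a plain Checkpoint-VM only creates a standard checkpoint):

Code:
# Inside the test VM: 60 seconds of 4K random writes against a 10GB
# test file, with per-I/O latency statistics (-L). Run once before and
# once after the checkpoint cycle and compare the latency results.
diskspd.exe -c10G -b4K -d60 -t4 -o32 -r -w100 -L D:\test\FILE1.dat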
Marc K (Veteran)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Thanks for posting the details.
MGREENINTELLICOMP (Lurker)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
Hey Guys,
I haven't done nearly as much research as everyone else on here, but we are running an FX2 chassis with 2 blade servers and 2x 10Gbps iSCSI paths back to a Dell Compellent with MPIO enabled on each blade, and we are seeing "random" Event ID 9s indicating latency on various virtual machines across both hosts. Some googling led me to this post.
Has anyone tested this without Veeam running and made the issue recur just by taking a recovery snapshot? I googled a bit more, and the ONLY place talking about this issue is this one thread on Veeam's forums. I would have expected something more widespread if it's not specifically caused by the way Veeam is interacting.
Are we able to disable RCT easily without breaking existing backup sets that Veeam has?
Nick (Enthusiast)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
MGREENINTELLICOMP
As I noted in my initial post, we're seeing these I/O delays during times that VBR is NOT running, i.e., not actively doing backups. However, here's an email exchange between myself and a Veeam tech re the possibility of the CBT/RCT issue being involved...
--------------------------------------------------------->snip<-----------------------------------------------
[My Message to the Veeam Tech]
Over on the forum thread I started on this issue ( microsoft-hyper-v-f25/windows-server-20 ... 62112.html ) I see a number of folks now saying that their I/O delays have been identified as a Changed Block Tracking (CBT/RCT) issue (in fact that's what Gostev mentioned in the very first paragraph of the post where I quoted him).
Now, as you may recall, at one point I uninstalled VB&R from the Hyper-V Host that's experiencing the I/O delays (which is also the host for the production VMs). However, what just occurred to me is that I then installed VB&R on another Hyper-V host to use as a temporary backup server – but I was still using CBT on the backup jobs!
Now, as I've previously noted, these I/O delays are NOT occurring DURING the backup jobs. However, I see that the VHDX.RCT files get updated at times that are also NOT during the backups, so...
I frankly don't know squat about the inner workings of CBT & RCT, etc... but do you think it's possible that simply having CBT enabled on the backup jobs – and thus having Veeam's associated Hyper-V CBT integration component deployed on the host and working with Microsoft's RCT component – could be causing it?
And, if there's even a remote theoretical possibility that it could be... can I just disable CBT in the Jobs or would I need to do something else to neuter the Hyper-V Host's use of RCT?
--------------------------------------------------------->snip<-----------------------------------------------
[Veeam Tech’s Reply]
Great questions. RCT basically takes over for Veeam's proprietary CBT mechanism that was developed pre-HV2016.
RCT now combines with VM reference points to determine the changes in the VM in what Microsoft calls the "most efficient manner". Starting with VM hardware version 8 and above, RCT is enabled by default and, as far as I understand it, cannot be disabled. You can certainly disable Veeam CBT using the GUI, but this would only affect the original Veeam CBT file system driver for pre-HV2016 hosts/clusters.
There's some additional information in the user guide about it:
https://helpcenter.veeam.com/docs/backu ... r=95u4#rct
It's possible that it could be causing the issue, but I do not think there is a way to disable it aside from down-leveling either the hosts/cluster or the VM HW versions, which don't sound like good solutions. You may be able to set up a new VM with HW version 7 or below and then try testing with that – but that's a lot easier written than done in practice, I'm sure.
--------------------------------------------------------->snip<-----------------------------------------------
I did do some searching to see if it was possible to disable RCT on a 2019 Server but I wasn’t able to find any way to do so.
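For anyone wanting to watch this on their own Host: the RCT tracking files sit right next to the VHDXs, so you can at least see when they get touched. A minimal sketch, assuming the 'E:\Hyper-V' layout from my first post:

Code:
# Show the RCT/MRT change-tracking files next to the VHDXs and when
# they were last written (i.e., inside or outside the backup window)
Get-ChildItem 'E:\Hyper-V' -Recurse -Include *.rct, *.mrt |
    Select-Object FullName, Length, LastWriteTime |
    Sort-Object LastWriteTime -Descending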
And I haven’t had the time to try setting up a Version 7 VM yet...
Nick
nmdange (Veteran)
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem
FYI, you would want to use version 5.0, which is the version that corresponds to Windows Server 2012 R2. Everything between 5.0 and 8.0 was used on early releases of Windows 10 and technical previews of Windows Server 2016.
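A minimal sketch of creating such a down-level test VM (the VM name, path and sizes are placeholders):

Code:
# List the VM configuration versions this host supports
Get-VMHostSupportedVersion

# Create a test VM pinned to configuration version 5.0 (the WS2012R2,
# pre-RCT level) instead of the host default
New-VM -Name 'RCT-TestVM' -MemoryStartupBytes 2GB -Generation 1 `
    -NewVHDPath 'E:\Hyper-V\RCT-TestVM\test.vhdx' -NewVHDSizeBytes 60GB `
    -Version 5.0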