Nick-SAC
Enthusiast
Posts: 44
Liked: 7 times
Joined: Oct 27, 2017 5:42 pm
Full Name: Nick
Contact:

Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Nick-SAC » Sep 30, 2019 5:36 pm

In yesterday’s Word from Gostev Forum Digest he stated:

Important Hyper-V news: for the past few months, we've been working on a strange issue with a few customers: poor I/O performance on VMs protected by Veeam. ... our support was able to isolate the issue to VMs with Resilient Change Tracking (RCT) enabled. So we've opened a case with Microsoft ... and finally got a solid answer last week. They were able to reproduce noticeable and abnormal I/O performance degradation from enabling RCT on a VM, and believe this is the result of security changes made as a part of Meltdown/Spectre patches. So they're now tracking this as a bug to be fixed.

And since we’ve been troubleshooting an inexplicable Hyper-V VM I/O Performance problem for the last 2 months – even though it is not occurring during the VBR backups – I figured I’d throw the situation & details up here just in case there is some connection...


The Configuration:

Dell PowerEdge R540 Server
Dual 8 Core CPUs
64 GB RAM
Dell PERC H730P RAID Controller w/2GB Cache
8 HDDs configured as 4 RAID-1 Volumes
iSCSI attached NAS as the VBR Primary Repository (ReFS)

Windows Server 2019 Hyper-V Host
Windows Server 2019 VM [Domain Controller]
Windows Server 2016 VM [Exchange Server 2016]
Windows Server 2012R2 VM [Exchange Server 2013] (this was a temp install used to migrate from SBS2008/Exchange2007 and is now shut down)

The Host and each of the VMs are on their own dedicated RAID-1 Volume
The Exchange Server 2016 Database & Logs are now on their own dedicated RAID-1 Volume

VB&R is installed as an All-In-One configuration on the Hyper-V Host

This is currently a very lightly loaded environment: 70 active users (half of whom are light, email-only users), a total of several hundred emails processed per day, and 5 to 10 people working on Word docs & Excel sheets.


The Problem:

Periodically the Hyper-V Host will Log a simultaneous sequence of 5 to 15 of these events:
-------------------------------------------------------------------------------------------------------------
Log Name: Microsoft-Windows-Hyper-V-StorageVSP-Admin
Source: Microsoft-Windows-Hyper-V-StorageVSP
Date: 7/27/2019 2:08:58 AM
Event ID: 9
Task Category: None
Level: Warning
Keywords:
User: N/A
Computer: <Server Name>
Description:
An I/O request for device 'E:\Hyper-V\<Server Name>\Virtual Hard Disks\<Server Name>.vhdx' took 13470 milliseconds to complete.
Operation code = WRITE16, Data transfer length = 4096, Status = SRB_STATUS_SUCCESS.
-------------------------------------------------------------------------------------------------------------
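For anyone collecting these warnings in bulk, the description text of the Event ID 9 entry above can be split into fields with a regular expression. This is only a minimal sketch in Python, assuming the description always follows the wording quoted in this thread; `SlowIo` and `parse_slow_io` are hypothetical names (not part of any Microsoft or Veeam tooling) and the sample path is made up.

```python
import re
from typing import NamedTuple, Optional


class SlowIo(NamedTuple):
    device: str   # path of the VHDX named in the warning
    millis: int   # how long the request took, in milliseconds
    opcode: str   # operation code, e.g. WRITE16 or SYNCHRONIZE CACHE
    length: int   # data transfer length as reported in the event
    status: str   # SRB status string


# Assumption: the wording matches the Event ID 9 text quoted in this thread.
_PATTERN = re.compile(
    r"An I/O request for device '(?P<device>[^']+)' "
    r"took (?P<millis>\d+) milliseconds to complete\.\s*"
    r"Operation code = (?P<opcode>[^,]+), "
    r"Data transfer length = (?P<length>\d+), "
    r"Status = (?P<status>\w+)\."
)


def parse_slow_io(description: str) -> Optional[SlowIo]:
    """Return the parsed fields, or None if the text does not match."""
    match = _PATTERN.search(description)
    if match is None:
        return None
    return SlowIo(
        device=match["device"],
        millis=int(match["millis"]),
        opcode=match["opcode"].strip(),
        length=int(match["length"]),
        status=match["status"],
    )


if __name__ == "__main__":
    # Hypothetical sample modelled on the event text above.
    sample = (
        r"An I/O request for device 'E:\Hyper-V\vm01.vhdx' took 13470 "
        "milliseconds to complete.\n"
        "Operation code = WRITE16, Data transfer length = 4096, "
        "Status = SRB_STATUS_SUCCESS."
    )
    print(parse_slow_io(sample))
```

Once the events are in this form, it becomes easier to sort them by delay or to count them per VHD.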

The events are tied to all of the VMs (first the 3 and now the 2 that are still active) and all of their VHDs (in no apparent order or sequence). It has never occurred on the Host’s RAID Volume.

The delay times always run a remarkably consistent 10 to 14 seconds (10,000 to 14,000 ms). In fact, the times are so consistent that I thought it might be a Power Management Disk Suspend/Spin-up issue, but that doesn’t appear to be the case.

There is no apparent rhyme or reason to the occurrence of these events, i.e., I can’t find any other process or task that coincides with them. Sometimes they’ll occur several times in one day at random hours (including at night when there is no user activity at all) and then not occur at all for several days!

Dell Tech Support is stumped and is now recommending that we take it to Microsoft and I was just about to do that when I saw Gostev’s message which has me wondering... Could this be the same RCT-Spectre/Meltdown Patch bug even though it’s not occurring during the backups?

HannesK
Veeam Software
Posts: 4492
Liked: 565 times
Joined: Sep 01, 2014 11:46 am
Location: Austria
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by HannesK » Oct 01, 2019 5:25 am

Hello,
I would say it's impossible to give a good answer without logs. So please open a case and post the case number here for reference.

The issue I see: even if you know the answer... there is no solution at this point in time. So honestly I would just wait for Microsoft updates as Veeam support cannot really help you anyway.

Best regards,
Hannes


Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Nick-SAC » Oct 01, 2019 1:03 pm

Thanks Hannes but all I was really looking for here was feedback as to whether this may be the same bug that Gostev spoke of, i.e., if you are only seeing the I/O problem during the Backups (in which case my issue may be unrelated and I’ll contact Microsoft now) or if you’re seeing the same thing as I described (in which case I’ll just wait for Microsoft to fix it).

Thanks again,
Nick


Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by HannesK » Oct 01, 2019 2:08 pm

yes it may be :-)


Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Nick-SAC » Oct 01, 2019 2:15 pm

Close enough... I'll go with a 'definite maybe'... :wink:

Thanks again
Nick

WorkingHardInIt
Veeam Vanguard
Posts: 26
Liked: 4 times
Joined: Feb 14, 2014 1:27 pm
Full Name: Didier Van Hoye
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by WorkingHardInIt » Oct 02, 2019 10:25 am

Have you checked the Meltdown/Spectre mitigations are in place? Can you disable them and compare with/without? We have not seen the issue and are 100% patched. What BIOS versions are you running? I have been in contact with MSFT about this but I am waiting to see if they have any scenarios to test to reproduce/mitigate.

collinp
Expert
Posts: 152
Liked: 10 times
Joined: Feb 14, 2012 8:56 pm
Full Name: Collin P
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by collinp » Oct 02, 2019 4:50 pm

I'm seeing a huge jump in latency on source storage during backups and our backup windows are extended. We've had to implement throttling as a result. Does anyone know which patch it is that we would need to uninstall on the Hyper-V hosts to return to normal?

mkaec
Expert
Posts: 327
Liked: 76 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by mkaec » Oct 04, 2019 4:34 pm

I too am interested in more details about the issue Gostev mentioned. It sounds like Hyper-V performance could take a dive simply by activating RCT. But under what circumstances?

mkh
Service Provider
Posts: 5
Liked: never
Joined: Apr 20, 2018 6:17 am
Full Name: Michael Høyer
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by mkh » Oct 07, 2019 7:07 am

Totally unrelated to the discussion about the I/O performance bug, but have you considered that it could be because each VM only has the performance of one physical disk to work with in your RAID setup?
With your hardware, I would have built a RAID 10 across all the disks and put all the VMs on that, unless you have specific reasons to split it up.
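The RAID argument can be put into rough numbers. The sketch below uses the common rule of thumb that a mirrored layout can read from every disk but must write each block twice; the 150 IOPS per disk figure is purely an assumption for illustration, the function names are hypothetical, and the estimate ignores the controller's write cache entirely.

```python
def raid1_iops(disk_iops: float) -> dict:
    """Rule-of-thumb estimate for a two-disk RAID-1 mirror."""
    # Reads can be served by either disk; every write goes to both disks.
    return {"read": 2 * disk_iops, "write": disk_iops}


def raid10_iops(disk_iops: float, disks: int) -> dict:
    """Rule-of-thumb estimate for RAID 10 over an even number of disks."""
    if disks < 4 or disks % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    # Reads spread across all disks; writes pay a penalty of 2 for mirroring.
    return {"read": disks * disk_iops, "write": disks * disk_iops / 2}


if __name__ == "__main__":
    assumed_disk_iops = 150  # assumption, not a measured value for these HDDs
    print("one RAID-1 pair :", raid1_iops(assumed_disk_iops))
    print("8-disk RAID 10  :", raid10_iops(assumed_disk_iops, 8))
```

Under these assumptions, a single RAID-1 pair writes at roughly the speed of one disk, while one RAID 10 set across all eight disks offers about four disks' worth of write performance, shared by every VM.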


Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Nick-SAC » Oct 08, 2019 3:15 am

WorkingHardInIt wrote:
Wed Oct 02, 2019 6:25 am
Have you checked the Meltdown/Spectre mitigations are in place?

Frankly (and I am embarrassed to say this) I’m not certain about which of the Spectre/Meltdown mitigations are or aren’t in place anymore!

Between all of the Spectre/Meltdown vulnerability variants, the CVE & ADV articles, the KB Articles & Patches, the Microcode Updates and the Registry mods... this has become one of the most murky morasses of confusion I’ve ever seen!

I haven’t been too, too worried about it because all of the HV Hosts & their respective VMs that we’re administering are all single tenant environments running trusted code and NOBODY has direct access to them other than a very small number of highly trusted Admins – so no one is browsing the Internet with them, etc. and I don’t think we have to worry about the SQL Server spying on the Exchange Server... yet!


Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Nick-SAC » Oct 08, 2019 4:11 am

mkh wrote:
Mon Oct 07, 2019 3:07 am
... have you considered it could be because each VM only has the performance of one physical disk to work with in your RAID setup?

I did consider alternate RAID configs, but this is such a lightly loaded environment that even their former SBS 2008 Server, which provided the AD DS, DNS, DHCP, Exchange Server and WSUS Roles & Services ALL on a single RAID-1 Volume (including the Exchange DB & Log Drives), never hit unacceptable Response Times or Disk Queues... and certainly never came anywhere near these absurd 10,000+ ms I/O times.

By contrast, the current config has the Exchange Server on its own Spindle/RAID-1 pair and the Exchange DB & Logs on its own Spindle/RAID-1 pair — and when I just rebooted it (while the office is closed and there’s Zero load on anything) it threw a bunch of these 10+ Second I/O Times, including & simultaneously on the DC VM’s different RAID-1 Volume.

To me, this sure appears to be an issue with the Server’s RAID Controller and/or the Server 2019 HV Host’s Drivers for it.

nmdange
Expert
Posts: 482
Liked: 121 times
Joined: Aug 20, 2015 9:30 pm
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by nmdange » Oct 08, 2019 1:51 pm

Nick-SAC wrote:
Oct 08, 2019 3:15 am
Frankly (and I am embarrassed to say this) I’m not certain about which of the Spectre/Meltdown mitigations are or aren’t in place anymore!

Between all of the Spectre/Meltdown vulnerability variants, the CVE & ADV articles, the KB Articles & Patches, the Microcode Updates and the Registry mods... this has become one of the most murky morasses of confusion I’ve ever seen!

I haven’t been too, too worried about it because all of the HV Hosts & their respective VMs that we’re administering are all single tenant environments running trusted code and NOBODY has direct access to them other than a very small number of highly trusted Admins – so no one is browsing the Internet with them, etc. and I don’t think we have to worry about the SQL Server spying on the Exchange Server... yet!
You can check the status of mitigations using a PowerShell module https://support.microsoft.com/en-us/hel ... powershell


Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Nick-SAC » Oct 10, 2019 8:47 pm

nmdange,
Thanks for the PowerShell tip. Very helpful...

Given that some of these Meltdown/Spectre vulnerabilities need to be addressed (or not) on a case-by-case environment basis (to mitigate or not to mitigate, that is the question) and I haven’t yet seen which of the patches might be the problem... I’m going to leave them as is, at least until I hear back from Dell, Microsoft or Veeam (all of whom I’ve opened support cases with on this issue).

FWIW, the Veeam Case # is 03805975 (and I just sent them a bunch of Logs, etc.)


Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Nick-SAC » Oct 30, 2019 12:22 am

It’s now clearly evident that the I/O Delay problem that I’m seeing has nothing to do with Veeam.

After completely uninstalling VAW and VBR, including ALL of their sub-components (other than SQL Server), from the Server 2019 Hyper-V Host, the I/O problem recurred (simultaneously, 4 times on both active VMs and 3 of their VHDs spanning all 3 RAID-1 Volumes).

So, I’m back to believing that it’s one of the following:
1) The Dell/MS Server 2019 Driver for this RAID Controller, or
2) Some incompatibility between this RAID Controller and the HDDs that are holding the VMs’ VHDs, or
3) A defective RAID Controller (a long shot to be sure, but every once in a while it is the hardware).

kevin.boddy
Service Provider
Posts: 12
Liked: never
Joined: Jan 30, 2018 3:24 pm
Full Name: Kevin Boddy
Contact:

[MERGED] Poor I/O performance on VMs protected by Veeam

Post by kevin.boddy » Dec 09, 2019 9:50 am

Hi,

We have seen the I/O performance issue that Gostev mentioned in his blog email with a number of customers on our Hyper-V cluster who run heavily loaded SQL instances. I have logged support request #03870293 to try to find out more, but it doesn't seem like anyone actually knows what is going on.

I know it's not a Veeam issue and that it relates to RCT in Windows, but surely Veeam initially assisted customers with this issue and has some kind of understanding of what causes it?

Would removing any of the Spectre/Meltdown patches help at all? How are other customers dealing with this issue?

Thanks

JohnnyB
Lurker
Posts: 2
Liked: never
Joined: Dec 09, 2019 10:33 am
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by JohnnyB » Dec 09, 2019 10:36 am

Hi @Nick-SAC,

Have you been able to solve your problem? I have the same issue with a Hyper-V 2019 server, but on an HP Gen10 server with 4x 2TB SSDs.
The customer is facing extremely high latency on his file server, and I see Event ID 9 in the Hyper-V-StorageVSP event log with messages like this:
An I/O request for device 'D:\Hyper-V\Virtual Hard Disks\MyServername-02.vhdx' took 172343 milliseconds to complete. Operation code = SYNCHRONIZE CACHE, Data transfer length = 0, Status = SRB_STATUS_SUCCESS.

Best regards,
Johnny


Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Nick-SAC » Dec 10, 2019 5:30 am

Hi Johnny,

Sadly no, despite a slew of testing and countless interactions with Dell and Microsoft Support, I have not had any success in solving the I/O Delay problem.

I’ll spare you the details at the moment for the sake of brevity but it boils down to this:

The I/O Delay Events are still very intermittent, sometimes not occurring at all for up to 6 Days... and then occurring several times in a single day, with no discernible timing, pattern or correlation with any other Event, Task or Process.

When the I/O Delays do occur, they often do so in a group, i.e., the Event Log may show 10 or 20 simultaneous entries where each one is pointing to a different VM and its different VHDs which reside on different RAID-1 Volumes.


1) The problem occurs on all of the running VMs’ VHDs, on all of the RAID Volumes, with all of the HDDs (SAS and NL-SAS).

2) The problem has never occurred on the Hyper-V Host itself, i.e., it appears in the Host's Event Logs but only as related to the VHDs of running VMs and never any of the Host’s own OS Files or Processes, etc.

3) The problem did not occur with a Test VM whose VHDs resided on an iSCSI attached NAS. I’ve no idea what to make of this other than maaaybe the test period just wasn’t long enough to catch one or perhaps Hyper-V recognizes iSCSI attached devices as such and interacts with them differently.

4) It’s not a defective RAID Controller as we’ve just swapped that out and it changed nothing.
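The "groups of simultaneous entries" pattern described above can be checked mechanically by clustering event timestamps into bursts and counting how many different VHDs each burst touches. This is a rough Python sketch with made-up sample data; `group_bursts` and the 30-second gap are assumptions for illustration, not anything taken from the Hyper-V logs themselves.

```python
from datetime import datetime, timedelta


def group_bursts(events, gap=timedelta(seconds=30)):
    """Group (timestamp, device) pairs into bursts.

    An event joins the current burst when it occurs no more than `gap`
    after the previous event; otherwise it starts a new burst.
    """
    bursts = []
    for ts, device in sorted(events):
        if bursts and ts - bursts[-1][-1][0] <= gap:
            bursts[-1].append((ts, device))
        else:
            bursts.append([(ts, device)])
    return bursts


if __name__ == "__main__":
    # Hypothetical events; real ones would come from the StorageVSP log.
    events = [
        (datetime(2019, 7, 27, 2, 8, 58), "dc.vhdx"),
        (datetime(2019, 7, 27, 2, 8, 58), "exchange-db.vhdx"),
        (datetime(2019, 7, 27, 2, 9, 1), "exchange.vhdx"),
        (datetime(2019, 7, 29, 14, 0, 0), "dc.vhdx"),
    ]
    for burst in group_bursts(events):
        devices = {device for _, device in burst}
        print(burst[0][0], len(burst), "events on", len(devices), "VHDs")
```

A burst that spans VHDs on several different RAID volumes points away from any single disk pair and toward something the volumes share.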


Does this correspond with what you’re seeing, particularly that the I/O Delays only occur when accessing the VM's VHDs and not the Host’s own OS File access?

Also, is there any chance that you typo’d your stated Delay Time? We’re seeing them at a very consistent 10,000 to 14,000ms (a bad enough 10-14 Seconds) but if you’re really getting 172,343ms that’s in the 3 Minutes go-get-a-cup-of-coffee YIKES range! :shock:
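As a quick sanity check on the two figures being compared, here is the millisecond arithmetic as a tiny Python helper; `format_delay` is just an illustrative name.

```python
def format_delay(ms: int) -> str:
    """Convert a delay in milliseconds to whole minutes plus seconds."""
    minutes, remainder_ms = divmod(ms, 60_000)
    return f"{minutes} min {remainder_ms / 1000:.1f} s"


if __name__ == "__main__":
    for delay_ms in (13470, 172343):
        print(delay_ms, "ms =", format_delay(delay_ms))
```

So 13,470 ms is about 13.5 seconds, while 172,343 ms is just under 2 minutes 53 seconds.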

Thanks,
Nick

mmccord
Lurker
Posts: 1
Liked: never
Joined: Dec 04, 2019 1:40 pm
Full Name: Matthew McCord
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by mmccord » Dec 13, 2019 6:03 pm

I think we're running into the same problem. We are running Hyper-V Server 2016 on a Dell T630 w/ H730P. Slowdown for us starts about 5-20 minutes after an incremental backup is finished, and continues for another 5-20 minutes. During this time, there is no physical disk queue on the host, but the VM and vhdx indicate huge disk queues. Writes get delayed into the 10-15 second range, enough to trigger the 'delayed write' event log entries from MSSQL. Obviously, performance on the database goes to hell while it's happening.

This all happens well after the backup is completed and the checkpoint is merged back into the vhdx. Opened a case with Veeam today, but my gut feeling is that this is a Hyper-V problem.
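One way to test the "5-20 minutes after the incremental finishes" observation is to compute, for each slow-I/O event, how long ago the most recent backup ended. The sketch below is a Python illustration with invented timestamps; `minutes_since_last_backup` is a hypothetical helper, and the backup end times would have to be taken from the job history by hand or by export.

```python
from bisect import bisect_right
from datetime import datetime


def minutes_since_last_backup(event_time, backup_ends):
    """Minutes between an event and the latest backup that ended before it.

    Returns None when no backup had finished before the event.
    """
    ends = sorted(backup_ends)
    index = bisect_right(ends, event_time)
    if index == 0:
        return None
    return (event_time - ends[index - 1]).total_seconds() / 60


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    backup_ends = [datetime(2019, 12, 12, 22, 10), datetime(2019, 12, 13, 22, 5)]
    slow_events = [datetime(2019, 12, 12, 22, 25), datetime(2019, 12, 13, 22, 31)]
    for event in slow_events:
        print(event, "->", minutes_since_last_backup(event, backup_ends), "min")
```

If most of the lags fall in a narrow band after a backup, that supports a link to post-backup activity; lags scattered across many hours would suggest the timing is coincidental.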

wishr
Veeam Software
Posts: 1412
Liked: 142 times
Joined: Aug 07, 2018 3:11 pm
Full Name: Fedor Maslov
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by wishr » Dec 16, 2019 9:59 am

Hi Matthew,

Could you please share your case ID with us?

Thanks

Markus M.
Novice
Posts: 3
Liked: never
Joined: Dec 09, 2019 5:41 pm
Contact:

Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by Markus M. » Dec 16, 2019 4:57 pm

Just wanted to leave a "me too":
Running a 4-node Azure Stack HCI solution (aka S2D cluster) and experiencing the same issues with it roughly since August. Logged events include the delayed I/O requests (Event 9) from the vhdx, but also Event 1020 from SMB Server: "File System Operation has taken longer than expected." Here I see delays of up to 25 seconds. Almost every time, these errors occur when one of the nodes is paused for maintenance such as CAU or baseline updates. Sometimes the VM roles are switched off and restarted from RCM.

We have checked a lot of things, and the hardware vendor does not see a related issue. We are now investigating in all kinds of directions, but I am pretty sure this is NOT a Veeam-related issue. The Event 1020 from SMBServer reports: "The underlying file system has taken too long to respond to an operation. This typically indicates a problem with the storage and not SMB", and I think so, too. I do not know for sure, but I have a slight suspicion that this is file system/ReFS related, because I see the same events on my Storage Spaces servers, too. All systems are WS2019 LTSC.


Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by JohnnyB » Jan 27, 2020 7:44 am

In our case, the Microsoft support case found that our issues were related to Deduplication.
We disabled dedup on the Hyper-V hosts and the problems are gone.
