Host-based backup of Microsoft Hyper-V VMs.
wim17
Novice
Posts: 8
Liked: never
Joined: May 15, 2017 7:42 am
Full Name: Wim van Rooij
Contact:

Disconnecting D drives

Post by wim17 »

Hi Forum,

In our datacenter we sometimes see something strange: the D drives of Domain Controller servers disconnect. This rarity usually happens just after Veeam removes the snapshot. My colleagues are urging me to open a support case with Veeam because they consider it obviously a Veeam bug, but I think it is more of a resource problem.
Just after the removal of the snapshot, the domain controller logs this error in the event log:
ID 153
The IO operation at logical block address 0x10 for Disk 1 (PDO name: \Device\0000002d) was retried.

Are there any known cases like this? The funny thing is that it only triggers on Domain Controllers.
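
For anyone collecting these warnings across many VMs, here is a minimal Python sketch that pulls the block address, disk number and PDO name out of an Event ID 153 message like the one quoted above. The names `RetriedIo` and `parse_event_153` are made up for illustration and are not part of any Veeam or Microsoft tool. The pattern is only assumed to match the message wording shown in this thread.

```python
import re
from typing import NamedTuple, Optional


class RetriedIo(NamedTuple):
    """Fields extracted from one Event ID 153 message (hypothetical helper type)."""
    lba: int        # logical block address, converted from hex
    disk: int       # disk number as reported by the guest
    pdo_name: str   # physical device object name


# Assumption: the message text follows the wording quoted in this thread.
_EVENT_153 = re.compile(
    r"logical block address (0x[0-9a-fA-F]+) for Disk (\d+) "
    r"\(PDO name: ([^)]+)\) was retried"
)


def parse_event_153(message: str) -> Optional[RetriedIo]:
    """Return the parsed fields, or None if the message has a different shape."""
    match = _EVENT_153.search(message)
    if match is None:
        return None
    return RetriedIo(int(match.group(1), 16), int(match.group(2)), match.group(3))
```

Grouping the parsed results by disk number makes it easier to see whether the retries only ever hit the disk that later disappears.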
wim17
Novice
Posts: 8
Liked: never
Joined: May 15, 2017 7:42 am
Full Name: Wim van Rooij
Contact:

Re: Disconnecting D drives

Post by wim17 »

A small addition: the Veeam jobs don't report any errors and produce a good backup.
tdewin
Veeam Software
Posts: 1775
Liked: 646 times
Joined: Mar 02, 2012 1:40 pm
Full Name: Timothy Dewin
Contact:

Re: Disconnecting D drives

Post by tdewin »

Are you running vSphere 5.5 or older? You might consider Backup from Storage Snapshots and/or upgrading to 6: https://www.virtualtothecore.com/en/vsp ... hing-past/

In the end we just create a snapshot and remove it. You can try to emulate the problem by creating a snapshot and keeping it open for the time it takes to do the backup. Then remove the snapshot and see if you get the same issue.
wim17
Novice
Posts: 8
Liked: never
Joined: May 15, 2017 7:42 am
Full Name: Wim van Rooij
Contact:

Re: Disconnecting D drives

Post by wim17 »

Thanks for the reply.
We are using Hyper-V 2016 and hardware version 5 on the VMs.
I should have said Checkpoint, sorry.
evander
Enthusiast
Posts: 86
Liked: 5 times
Joined: Nov 17, 2011 7:55 am
Contact:

Re: Disconnecting D drives

Post by evander »

I don't believe this is related to Veeam at all, but rather just your disk taking a bit of strain during a backup run, regardless of the backup vendor. I checked my Domain Controller and I have a few similar events in my logs too. Mostly it's just retries that don't add up to much to worry about.
When you say your D drive is getting disconnected, do you mean completely disconnected, as in you lose your D drive and have to manually reconnect it, or just that it seems like the D drive has disconnected because of the error messages listed above?

If possible, I would suggest you migrate your AD virtual machine to another storage platform or datastore and see if you get the same issues. Is your underlying storage good? RAID levels etc.?
wim17
Novice
Posts: 8
Liked: never
Joined: May 15, 2017 7:42 am
Full Name: Wim van Rooij
Contact:

Re: Disconnecting D drives

Post by wim17 »

Really disconnected. We have to reboot the machines in order to get the drives back. The server itself keeps running but with a lot of errors; after the reboot it works fine again. All of the servers are on a 3PAR that was configured by HP, and we have never had any trouble with it.
evander
Enthusiast
Posts: 86
Liked: 5 times
Joined: Nov 17, 2011 7:55 am
Contact:

Re: Disconnecting D drives

Post by evander »

Ouch, that's not cool. Are the C drive and D drive on different LUNs?
I use VMware rather than Hyper-V so I can't offer too much advice on the setup of that, but is your Domain Controller doing anything other than being a Domain Controller? The weirdest thing for me is that you say it only affects the Domain Controllers. Are they running a different Windows Server version from the other servers? I'm thinking the retry value might be different on them than on the other servers, but if they all share the same underlying infrastructure then they should all be equally affected. Pure Domain Controllers are also not very "busy" servers with large amounts of data changes, so backups should be easy to complete.

Does this help at all: https://community.spiceworks.com/topic/ ... under-load
wim17
Novice
Posts: 8
Liked: never
Joined: May 15, 2017 7:42 am
Full Name: Wim van Rooij
Contact:

Re: Disconnecting D drives

Post by wim17 »

Thanks for the link, I will check that out.

The Domain Controllers do nothing other than the DC tasks. The Windows version is the same as on the other Windows servers. The Domain Controllers are not all in the same job. We host a lot of customers on the same platform and divided the Veeam jobs by customer, so every job has a DC, a database server, a file server, a terminal server, etc. Almost all of those jobs have had the problem with the DC and not with the other servers. The C: and D: drives are on the same storage. The problem occurs during the first backup job run (so not during a retry).
nielsengelen
Product Manager
Posts: 5618
Liked: 1177 times
Joined: Jul 15, 2013 11:09 am
Full Name: Niels Engelen
Contact:

Re: Disconnecting D drives

Post by nielsengelen »

You can always contact support for insight; however, I don't think this is caused by Veeam but rather by how the VM reacts to the checkpoint (which we do create, though).
Personal blog: https://foonet.be
GitHub: https://github.com/nielsengelen
ESWIT
Lurker
Posts: 1
Liked: never
Joined: Nov 23, 2016 10:07 am
Full Name: ESW IT
Contact:

Re: Disconnecting D drives

Post by ESWIT »

Hi,

We have the same problem with a disconnected drive during a backup. We recently upgraded our Hyper-V infrastructure to 2016.
In our case the error occurs on a 2012 R2 SQL Server VM with a non-system disk. Seconds before the disk disappears we can see disk IO retries on the disk.

The IO operation at logical block address 0x0 for Disk 1 (PDO name: \Device\00000030) was retried.
Faulting application name: sqlservr.exe, version: 0.0.0.0, time stamp: 0x5764deac (the SQL databases reside on the problematic disk)

The backup job in Veeam runs fine without problems. The error happens when the snapshot is closed/merged, and it is very random. You can have no problems for weeks, or it can happen twice a day (we back up our SQL server incrementally 4 times a day).
I think this is more of a Hyper-V 2016 related problem, as the original poster also has a 2016 infrastructure. We had no problems with 2012 R2.

Dear Veeam support, maybe you can use your excellent connections to the Hyper-V team at Microsoft to ask whether there is a 'known' problem which is not that public yet?

Thank you
Christoph Werner
willa.t
Novice
Posts: 6
Liked: 1 time
Joined: Nov 25, 2013 3:33 pm
Full Name: Thomas Willa
Contact:

Re: Disconnecting D drives

Post by willa.t »

Hello together,

We have the same issue. Have you already found a solution?

Thanks, Thomas
foggy
Veeam Software
Posts: 21069
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Disconnecting D drives

Post by foggy »

Hi Thomas, I don't see a single case number here, so I cannot check. Please contact technical support directly.
SteelContainer
Service Provider
Posts: 146
Liked: 21 times
Joined: May 21, 2014 8:47 am
Location: New Zealand
Contact:

Re: Disconnecting D drives

Post by SteelContainer » 2 people like this post

Hi there,

Just confirming we are also having this issue. We recently updated some of our hosts from 2012 R2 to Server 2016. This issue only affects VMs running on the 2016 hosts and the guests affected so far run SQL and IIS so I don't think the issue is AD related.
We opened a case with Veeam #02433873 but they could not find anything in the logs. We currently have a case open with Microsoft.

Safe to say it's an issue with Hyper-V on Server 2016.
willa.t
Novice
Posts: 6
Liked: 1 time
Joined: Nov 25, 2013 3:33 pm
Full Name: Thomas Willa
Contact:

Re: Disconnecting D drives

Post by willa.t » 1 person likes this post

Hi Cullan,

I can confirm that the issue is Windows Server 2016 related. We also have trouble with VMs running SQL Server versions 2014 to 2016. The problem is 100% not AD related; it occurs when the checkpoint is merging. I was able to reproduce the issue by generating heavy IO on a test VM and creating/deleting checkpoints manually. Furthermore, it doesn't matter on which storage subsystem the VMs are located. We also have this issue with local SSD storage.

I also think this is one more issue of Hyper-V 2016 and have opened an MS support call, too :-)

If I have any news, I will let you know.
foggy
Veeam Software
Posts: 21069
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Disconnecting D drives

Post by foggy »

Thanks, guys, I'd appreciate it if you shared the results of your findings.
pterpumpkin
Enthusiast
Posts: 36
Liked: 4 times
Joined: Jun 14, 2016 9:36 am
Full Name: Pter Pumpkin
Contact:

Re: Disconnecting D drives

Post by pterpumpkin »

Hi all!

We are having the exact same issue.

Very random, and very hard to replicate.

I have tried setting a low storage QoS (500 IOPS and ~10 MB/s max throughput) on the C:\ and D:\ drives of a VM that has been affected, then running IO Meter to simulate high disk IO/latency. The disk queue lengths were 150+, but when backing up the server we were unable to replicate the issue.

Anyone else had any luck?


Pter
willa.t
Novice
Posts: 6
Liked: 1 time
Joined: Nov 25, 2013 3:33 pm
Full Name: Thomas Willa
Contact:

Re: Disconnecting D drives

Post by willa.t »

Hi Peter,

Please try the following:
- Generate heavy IO
- Create a checkpoint manually via VMM or Hyper-V
- Delete the checkpoint within the following 60 sec.
Now you should get the disk warning on the disk with high IO.
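
The steps above can be sketched as a small Python script that drives the Hyper-V PowerShell cmdlets `Checkpoint-VM` and `Remove-VMSnapshot` from the host. This is only a sketch under assumptions: the function names, the default checkpoint name `repro-153` and the 30 second hold time are made up for illustration, and the heavy IO is assumed to be generated separately inside the guest before the script is started.

```python
import subprocess
import time
from typing import List


def build_repro_commands(vm_name: str, checkpoint_name: str = "repro-153") -> List[List[str]]:
    """Build the create/remove checkpoint commands (hypothetical helper)."""
    create = f"Checkpoint-VM -Name '{vm_name}' -SnapshotName '{checkpoint_name}'"
    remove = (
        f"Remove-VMSnapshot -VMName '{vm_name}' -Name '{checkpoint_name}' "
        "-Confirm:$false"
    )
    return [["powershell.exe", "-NoProfile", "-Command", cmd] for cmd in (create, remove)]


def run_repro(vm_name: str, hold_seconds: int = 30) -> None:
    """Create a checkpoint, hold it briefly, then delete it to trigger the merge.

    Assumption: heavy IO is already running inside the guest.
    """
    if hold_seconds >= 60:
        raise ValueError("delete the checkpoint within 60 seconds of creating it")
    create, remove = build_repro_commands(vm_name)
    subprocess.run(create, check=True)   # create the checkpoint
    time.sleep(hold_seconds)             # keep it open while IO continues
    subprocess.run(remove, check=True)   # deleting it starts the merge
```

Run it on the Hyper-V host with something like `run_repro("TestVM")` and then check the guest's System event log for Event ID 153.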

Regards, Thomas
pterpumpkin
Enthusiast
Posts: 36
Liked: 4 times
Joined: Jun 14, 2016 9:36 am
Full Name: Pter Pumpkin
Contact:

Re: Disconnecting D drives

Post by pterpumpkin »

Hi willa.t,

That is what I did; however, I was unable to replicate it.


Pter
willa.t
Novice
Posts: 6
Liked: 1 time
Joined: Nov 25, 2013 3:33 pm
Full Name: Thomas Willa
Contact:

Re: Disconnecting D drives

Post by willa.t »

Hi Peter,

Is it possible to deactivate your storage QoS policy? I have to generate roughly 20-30k IOPS to reproduce this issue.

Thomas
pterpumpkin
Enthusiast
Posts: 36
Liked: 4 times
Joined: Jun 14, 2016 9:36 am
Full Name: Pter Pumpkin
Contact:

Re: Disconnecting D drives

Post by pterpumpkin »

Hi willa.t,

Tried disabling storage QoS.

We have been unable to replicate the issue still. We've tried generating 100K+ IOPS of all different block sizes and read/write ratios. The disk queue length & latency is very high, but we still can't reproduce the issue.

We have noticed that all but one of the servers this has happened to (about 10 in total now) have had either an MS database or an Oracle database on them. Unsure if related or coincidence.

We have an MS case open, but it's not getting very far as they can't replicate it either.

Has anyone else had this issue?!


Pter.
willa.t
Novice
Posts: 6
Liked: 1 time
Joined: Nov 25, 2013 3:33 pm
Full Name: Thomas Willa
Contact:

Re: Disconnecting D drives

Post by willa.t »

Hello Pter,

That is curious. In our scenario we can reproduce it this way. Anyway, I have good news from our MS call: the issue is confirmed by MS and they are working on it. Unfortunately it is categorized as a high-risk bug, so testing will take time. The final release of the fix is planned for August.

Thomas
pterpumpkin
Enthusiast
Posts: 36
Liked: 4 times
Joined: Jun 14, 2016 9:36 am
Full Name: Pter Pumpkin
Contact:

Re: Disconnecting D drives

Post by pterpumpkin »

Hi willa.t,

We have a case open with Microsoft, but we have made zero progress with them.

Would it be possible to get your case number so that we can get them to cross reference?


Thanks!
Pter.
willa.t
Novice
Posts: 6
Liked: 1 time
Joined: Nov 25, 2013 3:33 pm
Full Name: Thomas Willa
Contact:

Re: Disconnecting D drives

Post by willa.t »

You have PM :-)
jason.
Novice
Posts: 3
Liked: never
Joined: Jul 11, 2018 4:24 am
Full Name: Jason
Contact:

Re: Disconnecting D drives

Post by jason. »

Hey everyone!!

Is there any word back from MS yet?

We've been lucky enough to have this issue too...

Tx
nmdange
Veteran
Posts: 527
Liked: 142 times
Joined: Aug 20, 2015 9:30 pm
Contact:

Re: Disconnecting D drives

Post by nmdange »

I've had this issue intermittently as well. I think the fix may have been included in the August 30th CU. I'll be applying the September CU soon so hopefully that will resolve the problem.

https://support.microsoft.com/en-us/help/4343884
Addresses an issue that occurs when performing backup of a Virtual Machine (VM) or removing VM snapshots in a guest OS on Windows Server 2016. A “153 disk error” appears because the I/O takes longer than expected.