Hotadd causing issues during Replication

Postby th83 » Tue Dec 06, 2011 2:47 am

I spent most of the weekend tweaking our new v6 installation and have it working great - my backup window has been cut more than in half :D , but there was an issue I ran into (and worked around) with hotadd for writes on the new replication jobs.

Symptoms: About half the time, during the initiation of the replication job, the proxy used for the destination portion of the replica job would 'lock up' for about 30-40 seconds while vCenter showed 'Reconfiguring VM-Name'. During this time, the VM did not respond to commands and could not be pinged. When the hotadd completed, the VM would start responding again, but the replication job (as well as any other jobs running on that proxy) would fail with timeout errors. Only the destination proxy seemed to lock up; the source proxy never did.
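To put a number on the lock-up described above, one option is to probe the proxy once a second while the job starts and then compute the longest unreachable window. The following is only a minimal sketch using the Python standard library. The function names, the one-second interval, and the use of a TCP connect instead of ICMP ping are all assumptions made for illustration, not anything taken from Veeam or VMware tooling.

```python
import socket
import time
from typing import Iterable, List, Tuple

# One probe result: (timestamp in seconds, True if the proxy answered)
Sample = Tuple[float, bool]


def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def collect(host: str, port: int, duration: float, interval: float = 1.0) -> List[Sample]:
    """Probe the host repeatedly for roughly `duration` seconds."""
    samples: List[Sample] = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval)
    return samples


def longest_outage(samples: Iterable[Sample]) -> float:
    """Longest span, in seconds, from the first failed probe of a run
    to the next successful probe (or to the last sample if it never recovers)."""
    longest = 0.0
    start = None      # timestamp of the first failure in the current run
    last_ts = None    # timestamp of the most recent sample
    for ts, ok in samples:
        last_ts = ts
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            longest = max(longest, ts - start)
            start = None
    if start is not None and last_ts is not None:
        longest = max(longest, last_ts - start)
    return longest
```

For example, calling `collect("proxy01", 3389, 120)` while the replication job starts and passing the result to `longest_outage` would give a rough stun duration. The host name and port here are hypothetical; any TCP port the proxy normally answers on would do.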

My workaround: Once I figured out the issue, I configured one proxy to only use NBD mode and assigned that proxy as the destination proxy for my replication jobs. Since I did this, I have not seen the issue again.

My question is, has anybody else seen this? If so, is it an issue with vSphere, or with Veeam? I am running vSphere 5. Any input would be appreciated as I'd like to be able to use hotadd for writes. I didn't want to bother Support with this yet since I have a good workaround.

Thanks!
th83
Member
 
Posts: 11
Liked: never
Joined: Mon Oct 24, 2011 7:42 pm
Full Name: Tim Haner

Re: Hotadd causing issues during Replication

Postby foggy » Tue Dec 06, 2011 9:56 am

Tim, my first thought is some environmental issue (a slow datastore, for example). You can check how long it takes to mount the same disk manually to confirm or eliminate this assumption. Further, I would still suggest investigating this with support. Thanks.
foggy
Veeam Software
 
Posts: 2390
Liked: 103 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Hotadd causing issues during Replication

Postby tsightler » Tue Dec 06, 2011 1:19 pm

Hi Tim. I have seen exactly this behavior in my lab as well, but haven't yet discovered the exact nature of it, although I have my suspicions. I don't actually see the issue all the time, but once it starts with a particular VM, it seems that VM fails consistently. I would strongly suggest opening a support case for investigation since you are seeing the issue out in the wild.

My observed behavior is exactly the same as what you see: when using hotadd, the target will become unresponsive for anywhere from 10-30 seconds, which causes one of two failures. Either the Veeam server will report "connection reset by peer" or the job will fail with a VDDK error. I'm not 100% sure, but in my lab it seemed to always start if a previous replication had failed for some reason. I'm also suspicious that it might have something to do with the number of restore points, their size, and the performance of the underlying disk. Also, as you noted, it only affects hotadd; other modes work great.

I'm also suspicious that it may only affect W2K8/W2K8R2, and not W2K3 servers.
tsightler
Veeam MVP
 
Posts: 2406
Liked: 402 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Hotadd causing issues during Replication

Postby jyounght » Mon Dec 12, 2011 9:43 pm

I noticed this timeout too. I am trying out your fix now. A job would randomly work, but most replications would fail with the following error. Then all the other servers on the job would fail. Very frustrating... Support has had my logs since Friday and I am waiting for a response.
jyounght
Novice
 
Posts: 5
Liked: never
Joined: Mon Dec 12, 2011 6:40 pm
Full Name: Justin young

Re: Hotadd causing issues during Replication

Postby -Gertjan- » Tue Dec 13, 2011 10:45 am

th83 wrote: Symptoms: About half the time, during the initiation of the replication job, the proxy used for the destination portion of the replica job would 'lock up' for about 30-40 seconds while vCenter showed 'Reconfiguring VM-Name'. During this time, the VM did not respond to commands and could not be pinged. When the hotadd completed, the VM would start responding again, but the replication job (as well as any other jobs running on that proxy) would fail with timeout errors. Only the destination proxy seemed to lock up; the source proxy never did.

My workaround: Once I figured out the issue, I configured one proxy to only use NBD mode and assigned that proxy as the destination proxy for my replication jobs. Since I did this, I have not seen the issue again.

My question is, has anybody else seen this? If so, is it an issue with vSphere, or with Veeam? I am running vSphere 5. Any input would be appreciated as I'd like to be able to use hotadd for writes. I didn't want to bother Support with this yet since I have a good workaround.

Thanks!


Exactly the same issue in my lab. What is your proxy OS? Mine is Windows XP.
Switching to NBD solved this for me.
-Gertjan-
Novice
 
Posts: 3
Liked: never
Joined: Tue Dec 13, 2011 10:30 am
Full Name: Gertjan

Re: Hotadd causing issues during Replication

Postby jyounght » Tue Dec 13, 2011 1:44 pm

Running Server 2003 R2 on both proxy agents and 2008 R2 as the main Veeam server. I have VMware ESXi 4.1 on all servers (Standard License).
jyounght
Novice
 
Posts: 5
Liked: never
Joined: Mon Dec 12, 2011 6:40 pm
Full Name: Justin young

Re: Hotadd causing issues during Replication

Postby jyounght » Tue Dec 13, 2011 1:48 pm

Ticket ID#5159767 is the ticket I had already opened before I found this thread. They said the logs show that I am seeing disconnects, which is correct, except it's the proxy that disconnects.
jyounght
Novice
 
Posts: 5
Liked: never
Joined: Mon Dec 12, 2011 6:40 pm
Full Name: Justin young

Re: Hotadd causing issues during Replication

Postby th83 » Wed Dec 14, 2011 12:37 am

-Gertjan- wrote: Exactly the same issue in my lab. What is your proxy OS? Mine is Windows XP.
Switching to NBD solved this for me.

I tried Server 2008 R2 (obviously 64 bit) and Server 2003 (32 bit) as the target proxies with the same result. I only had to change the proxy used as the destination proxy to NBD to fix the error. I've been running hotadd for the 'read' side of the replication for a while now and have not seen any errors, but when the 'write' proxy uses hotadd, it fails almost every time. Very strange.
th83
Member
 
Posts: 11
Liked: never
Joined: Mon Oct 24, 2011 7:42 pm
Full Name: Tim Haner

Re: Hotadd causing issues during Replication

Postby jyounght » Wed Dec 14, 2011 1:02 am

I have a seed job running until tomorrow, so I cannot test. This is what support suggested. If anyone else wants to test before I can, please post your results.

It appears that communication on your host was having difficulty.
You can restart the management agents which oftentimes corrects this.
See this KB for instructions on how to do this:
http://kb.vmware.com/selfservice/micros ... Id=1003490
Note that doing this will not affect any running VMs but will affect any Veeam jobs that are running.
jyounght
Novice
 
Posts: 5
Liked: never
Joined: Mon Dec 12, 2011 6:40 pm
Full Name: Justin young

Re: Hotadd causing issues during Replication

Postby skoch » Thu Dec 15, 2011 5:01 pm

Add me to the list of people having the issue. I have also opened a ticket with Veeam. What is everyone using for storage for their destination proxy (Fiber, iSCSI, onboard)?

Environment:

Windows 2008 R2 for source and target proxy.
Target vSphere storage is all iSCSI
vSphere 4.1
skoch
Member
 
Posts: 11
Liked: 1 time
Joined: Mon Jun 20, 2011 5:12 pm

Re: Hotadd causing issues during Replication

Postby th83 » Fri Dec 16, 2011 1:50 am

skoch wrote:Add me to the list of people having the issue. I have also opened a ticket with Veeam. What is everyone using for storage for their destination proxy (Fiber, iSCSI, onboard)?

Environment:

Windows 2008 R2 for source and target proxy.
Target vSphere storage is all iSCSI
vSphere 4.1

iSCSI at both the source and target
Proxy servers running Server 2008 R2
Target storage is a StarWind iSCSI target running on Server 2003; source storage is Dell EqualLogic.
vSphere 5

I have not opened a ticket with Veeam because I simply don't have the time right now and changing the target proxy to NBD fixed it for now.
th83
Member
 
Posts: 11
Liked: never
Joined: Mon Oct 24, 2011 7:42 pm
Full Name: Tim Haner

Re: Hotadd causing issues during Replication

Postby Gostev » Fri Dec 16, 2011 2:24 pm

We are ready to investigate this with anyone willing to spend some time working with our support (webex would be ideal). The more people who open support cases, the better, as it may help to track down similarities. Thanks!
Gostev
Veeam Software
 
Posts: 12927
Liked: 315 times
Joined: Sun Jan 01, 2006 1:01 am
Full Name: Anton Gostev

Re: Hotadd causing issues during Replication

Postby skoch » Fri Dec 16, 2011 3:49 pm

Gostev wrote: We are ready to investigate this with anyone willing to spend some time working with our support (webex would be ideal). The more people who open support cases, the better, as it may help to track down similarities. Thanks!


I was just working with Tim (Ticket# 5160363). We did a webex and he took some screenshots of the behavior. Hopefully this helps in troubleshooting the cause. If you need more info or a guinea pig, just let me know.
skoch
Member
 
Posts: 11
Liked: 1 time
Joined: Mon Jun 20, 2011 5:12 pm

Re: Hotadd causing issues during Replication

Postby Gostev » Tue Dec 27, 2011 4:26 pm

Based on our investigation of this issue, the immediately available workaround is to disable CBT on the Veeam backup proxy VM. We have opened a support case with VMware as well, since this issue (long VM stun on hot add operation) is easily confirmed without our software present.
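For anyone unsure whether CBT (Changed Block Tracking) is currently on for their proxy VM, a rough check is to look at the VM's .vmx file, where CBT is recorded as a VM-level `ctkEnabled` entry plus per-disk entries such as `scsi0:0.ctkEnabled`. The sketch below only reads and reports those entries from text you supply; the helper names and the sample contents are hypothetical, and it does not change any setting or talk to vCenter.

```python
import re
from typing import Dict

# Matches lines such as:
#   ctkEnabled = "TRUE"
#   scsi0:0.ctkEnabled = "FALSE"
_CTK_LINE = re.compile(
    r'^\s*((?:[A-Za-z]+\d+:\d+\.)?ctkEnabled)\s*=\s*"([^"]*)"\s*$',
    re.IGNORECASE,
)


def cbt_settings(vmx_text: str) -> Dict[str, bool]:
    """Return every ctkEnabled entry found in the .vmx text and whether it is TRUE."""
    settings: Dict[str, bool] = {}
    for line in vmx_text.splitlines():
        match = _CTK_LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2).strip().lower() == "true"
    return settings


def cbt_enabled(vmx_text: str) -> bool:
    """True if any VM-level or per-disk ctkEnabled entry is set to TRUE."""
    return any(cbt_settings(vmx_text).values())


# Hypothetical .vmx excerpt, for illustration only
SAMPLE_VMX = '''displayName = "proxy01"
ctkEnabled = "TRUE"
scsi0:0.ctkEnabled = "TRUE"
scsi0:1.ctkEnabled = "FALSE"
'''
```

With the sample above, `cbt_enabled(SAMPLE_VMX)` reports that CBT is on, and `cbt_settings(SAMPLE_VMX)` shows which entries are responsible. Actually disabling CBT should follow the VMware or Veeam support instructions for your version.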
Gostev
Veeam Software
 
Posts: 12927
Liked: 315 times
Joined: Sun Jan 01, 2006 1:01 am
Full Name: Anton Gostev

Re: Hotadd causing issues during Replication

Postby skoch » Tue Dec 27, 2011 4:39 pm

Gostev wrote: Based on investigation of this issue, the immediately available workaround is to disable CBT on the Veeam backup proxy VM. We have opened a support case with VMware as well, since this issue (super long VM freeze on hot add) can be easily confirmed without our software present.

Any idea why the issue only appears on the destination proxy and not the source proxy?
skoch
Member
 
Posts: 11
Liked: 1 time
Joined: Mon Jun 20, 2011 5:12 pm
