Host-based backup of VMware vSphere VMs.
th83
Influencer
Posts: 11
Liked: never
Joined: Oct 24, 2011 7:42 pm
Full Name: Tim Haner

Hotadd causing issues during Replication

Post by th83 »

I spent most of the weekend tweaking our new v6 installation and have it working great (my backup window has been cut by more than half :D), but I ran into an issue (and worked around it) with hotadd for writes on the new replication jobs.

Symptoms: During the initiation of the replication job, about half the time, the proxy used for the destination portion of the replica job would 'lock up' for about 30-40 seconds, with vCenter showing 'Reconfiguring VM-Name' for the whole time. During this time, the proxy VM did not respond to commands and could not be pinged. When the hotadd completed, the VM would start responding again, but the replication job (as well as any other jobs running on that proxy) would fail with timeout errors. Only the destination proxy seemed to lock up; the source proxy never did.
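A rough way to put numbers on a lock-up like this is to ping the proxy VM once a second while the job runs and look for gaps in the replies. The sketch below is hypothetical: the function name, the one-second sampling and the 10-second threshold are assumptions, and the ping loop that produces the samples is not shown. It only turns recorded (timestamp, replied) pairs into outage windows.

```python
from typing import List, Optional, Sequence, Tuple


def outage_windows(samples: Sequence[Tuple[float, bool]],
                   min_seconds: float = 10.0) -> List[Tuple[float, float]]:
    """Return (start, end) spans during which the proxy stopped replying.

    `samples` are time-ordered (timestamp_seconds, replied) pairs, e.g. one
    ping per second. A window opens at the first missed reply and closes at
    the next successful one; windows shorter than `min_seconds` are ignored.
    """
    windows: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for ts, replied in samples:
        if not replied and start is None:
            start = ts  # first missed reply opens a window
        elif replied and start is not None:
            if ts - start >= min_seconds:
                windows.append((start, ts))
            start = None  # reply received, window closed
    # A window still open when sampling stopped ends at the last sample.
    if start is not None and samples[-1][0] - start >= min_seconds:
        windows.append((start, samples[-1][0]))
    return windows


# Hypothetical run: no replies from t=10 to t=44, plus a one-second blip at t=3.
samples = [(float(t), not (t == 3 or 10 <= t < 45)) for t in range(60)]
print(outage_windows(samples))  # [(10.0, 45.0)]
```

The short blip at t=3 is filtered out by the threshold, so only the long stun is reported.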

My workaround: Once I figured out the issue, I configured one proxy to only use NBD mode and assigned that proxy as the destination proxy for my replication jobs. Since I did this, I have not seen the issue again.

My question is, has anybody else seen this? If so, is it an issue with vSphere, or with Veeam? I am running vSphere 5. Any input would be appreciated as I'd like to be able to use hotadd for writes. I didn't want to bother Support with this yet since I have a good workaround.

Thanks!
foggy
Veeam Software
Posts: 21139
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Hotadd causing issues during Replication

Post by foggy »

Tim, my first thought is some environmental issue (a slow datastore, for example). You can check how long it takes to mount the same disk manually to confirm or rule out this assumption. That said, I would still suggest investigating this with support. Thanks.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Hotadd causing issues during Replication

Post by tsightler »

Hi Tim. I have seen exactly this behavior in my lab as well, but haven't yet discovered the exact nature of it, although I have my suspicions. I don't actually see the issue all the time, but once it starts with a particular VM, it seems that VM fails consistently. I would strongly suggest opening a support case for investigation since you are seeing the issue out in the wild.

My observed behavior is exactly the same as yours: when using hotadd, the target becomes unresponsive for anywhere from 10-30 seconds, which causes one of two failures. Either the Veeam server reports "connection reset by peer", or the job fails with a VDDK error. I'm not 100% sure, but in my lab it always seemed to start after a previous replication had failed for some reason. I also suspect it might have something to do with the number of restore points, their size, and the performance of the underlying disk. Also, as you noted, it only affects hotadd; other modes work great.

I'm also suspicious that it may only affect W2K8/W2K8R2, and not W2K3 servers.
jyounght
Novice
Posts: 5
Liked: never
Joined: Dec 12, 2011 6:40 pm
Full Name: Justin young

Re: Hotadd causing issues during Replication

Post by jyounght »

I noticed this timeout too, and I am trying out your fix now. A job would randomly work, but most replications would fail with the following error, and then all the other servers in the job would fail. Very frustrating... Support has had my logs since Friday and I am waiting for a response.
-Gertjan-
Novice
Posts: 3
Liked: never
Joined: Dec 13, 2011 10:30 am
Full Name: Gertjan

Re: Hotadd causing issues during Replication

Post by -Gertjan- »

th83 wrote: Symptoms: During the initiation of the replication job, about half the time, the proxy used for the destination portion of the replica job would 'lock up' for about 30-40 seconds [...]

My workaround: Once I figured out the issue, I configured one proxy to only use NBD mode and assigned that proxy as the destination proxy for my replication jobs. [...]
Exactly the same issue in my lab. What is your proxy OS? Mine is Windows XP.
Switching to NBD solved this for me.
jyounght
Novice
Posts: 5
Liked: never
Joined: Dec 12, 2011 6:40 pm
Full Name: Justin young

Re: Hotadd causing issues during Replication

Post by jyounght »

Running Server 2003 R2 on both proxy agents and 2008 R2 on the main Veeam server. I have VMware ESXi 4.1 on all hosts (Standard license).
jyounght
Novice
Posts: 5
Liked: never
Joined: Dec 12, 2011 6:40 pm
Full Name: Justin young

Re: Hotadd causing issues during Replication

Post by jyounght »

Ticket ID#5159767 is the ticket I opened before I found this thread. They said the logs show disconnects, which is correct, except that it's the proxy that disconnects.
th83
Influencer
Posts: 11
Liked: never
Joined: Oct 24, 2011 7:42 pm
Full Name: Tim Haner

Re: Hotadd causing issues during Replication

Post by th83 »

-Gertjan- wrote:Exactly the same issue in my lab. What is your proxy OS? Mine is Windows XP.
Switching to NBD solved this for me.
I tried Server 2008 R2 (obviously 64 bit) and Server 2003 (32 bit) as the target proxies with the same result. I only had to change the proxy used as the destination proxy to NBD to fix the error. I've been running hotadd for the 'read' side of the replication for a while now and have not seen any errors, but when the 'write' proxy uses hotadd, it fails almost every time. Very strange.
jyounght
Novice
Posts: 5
Liked: never
Joined: Dec 12, 2011 6:40 pm
Full Name: Justin young

Re: Hotadd causing issues during Replication

Post by jyounght »

I have a seed job running until tomorrow, so I cannot test yet. This is what support suggested; if anyone else wants to test before I can, please post your results.

It appears that communications on your host were having difficulty.
You can restart the management agents which oftentimes corrects this.
See this KB for instructions on how to do this:
http://kb.vmware.com/selfservice/micros ... Id=1003490
Note that doing this will not affect any running VMs but will affect any Veeam jobs that are running.
skoch
Influencer
Posts: 11
Liked: 1 time
Joined: Jun 20, 2011 5:12 pm

Re: Hotadd causing issues during Replication

Post by skoch »

Add me to the list of people having the issue. I have also opened a ticket with Veeam. What is everyone using for storage for their destination proxy (Fiber, iSCSI, onboard)?

Environment:

Windows 2008 R2 for source and target proxy.
Target vSphere storage is all iSCSI
vSphere 4.1
th83
Influencer
Posts: 11
Liked: never
Joined: Oct 24, 2011 7:42 pm
Full Name: Tim Haner

Re: Hotadd causing issues during Replication

Post by th83 »

skoch wrote:Add me to the list of people having the issue. I have also opened a ticket with Veeam. What is everyone using for storage for their destination proxy (Fiber, iSCSI, onboard)?

Environment:

Windows 2008 R2 for source and target proxy.
Target vSphere storage is all iSCSI
vSphere 4.1
iSCSI at both the source and target
Proxy servers running Server 2008 R2
Target storage is a Starwind iSCSI target running on Server 2003, source storage is Dell Equallogic.
vSphere 5

I have not opened a ticket with Veeam because I simply don't have the time right now and changing the target proxy to NBD fixed it for now.
Gostev
Chief Product Officer
Posts: 31807
Liked: 7300 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Hotadd causing issues during Replication

Post by Gostev »

We are ready to investigate this with anyone willing to spend some time working with our support (a WebEx session would be ideal). The more people open support cases, the better, as it may help us track down similarities. Thanks!
skoch
Influencer
Posts: 11
Liked: 1 time
Joined: Jun 20, 2011 5:12 pm
Contact:

Re: Hotadd causing issues during Replication

Post by skoch »

Gostev wrote:We are ready to investigate this with anyone willing to spend some time working with our support (webex would be ideal). The more people will open support cases, the better - may help to track down similarities. Thanks!
I was just working with Tim (Ticket# 5160363). We did a webex and he took some screenshots of the behavior. Hopefully this helps in troubleshooting the cause. If you need more info or a guinea pig, just let me know.
Gostev
Chief Product Officer
Posts: 31807
Liked: 7300 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Hotadd causing issues during Replication

Post by Gostev »

Based on our investigation of this issue, the immediately available workaround is to disable CBT on the Veeam backup proxy VM. We have opened a support case with VMware as well, since this issue (long VM stun on hot add operation) is easily confirmed without our software present.
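For reference, CBT on a VM is controlled by `ctkEnabled` entries in its .vmx file: a VM-wide `ctkEnabled` key plus one per-disk key such as `scsi0:0.ctkEnabled`. The Python sketch below only rewrites .vmx text so those keys read FALSE on the proxy VM; it is an illustration under assumptions, not a Veeam or VMware tool, and the `veeam-proxy` names in the example are hypothetical. It assumes the proxy VM is powered off while its .vmx is edited; the change generally takes effect only once the VM is powered on again (or otherwise goes through a stun/unstun cycle).

```python
import re

# Matches the VM-wide key (ctkEnabled) or a per-disk key such as scsi0:0.ctkEnabled.
_CTK_KEY = re.compile(r'^\s*((?:[A-Za-z]+\d+:\d+\.)?ctkEnabled)\s*=', re.IGNORECASE)


def disable_cbt(vmx_text: str) -> str:
    """Return .vmx contents with every ctkEnabled entry set to "FALSE".

    Existing VM-wide and per-disk ctkEnabled lines are rewritten in place;
    if no VM-wide key exists, one is appended. Other lines are untouched.
    """
    out = []
    has_vm_key = False
    for line in vmx_text.splitlines():
        match = _CTK_KEY.match(line)
        if match:
            key = match.group(1)
            if "." not in key:
                has_vm_key = True  # VM-wide key, not a per-disk one
            out.append(f'{key} = "FALSE"')
        else:
            out.append(line)
    if not has_vm_key:
        out.append('ctkEnabled = "FALSE"')
    return "\n".join(out) + "\n"


example = (
    'displayName = "veeam-proxy"\n'
    'ctkEnabled = "TRUE"\n'
    'scsi0:0.fileName = "veeam-proxy.vmdk"\n'
    'scsi0:0.ctkEnabled = "TRUE"\n'
)
print(disable_cbt(example))
```

Only the proxy VM is touched here; the VMs being replicated keep their own CBT settings.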
skoch
Influencer
Posts: 11
Liked: 1 time
Joined: Jun 20, 2011 5:12 pm
Contact:

Re: Hotadd causing issues during Replication

Post by skoch »

Gostev wrote:Based on investigation of this issue, the immediately available workaround is to disable CBT on Veeam backup proxy VM. We have opened a support case with VMware as well, since this issue (super long VM freeze on hot add) can be easily confirmed without our software present.
Any idea why the issue only appears on the destination proxy and not the source proxy?
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Hotadd causing issues during Replication

Post by tsightler »

In my experience, the longer/larger the snapshot chain, the longer the delay. Since the source side doesn't typically have snapshots, there is no delay there; the destination side, however, typically has a snapshot chain for restore points, and the longer and larger that chain is, the longer the delay. It's good to know that disabling CBT works around this, as that's pretty reasonable as long as it's a dedicated proxy.
Gostev
Chief Product Officer
Posts: 31807
Liked: 7300 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Hotadd causing issues during Replication

Post by Gostev »

skoch wrote:Any idea why the issue only appears on the destination proxy and not the source proxy?
Actually, I've seen the issue affecting both sides, although admittedly the source side is more of an exception. What happens is that VMware tries to initialize CBT for all hot added disks, including their snapshots. For some reason, it tries to do them all at once (a design flaw), so in some cases the backup proxy VM may remain stunned for minutes, depending on the number of disks (and snapshots) hot added.
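To make the scaling argument concrete, here is a toy model (an assumption, not measured data): if CBT is initialized for every hot-added base disk and every snapshot delta in one go, and each file costs a fixed, made-up number of seconds, the stun grows with the total file count. That is why a replica target carrying a chain of restore-point snapshots is hit much harder than a source disk with none. `HotAddedDisk`, `estimated_stun_seconds` and the 1.5 s cost are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HotAddedDisk:
    name: str
    snapshots: int  # delta files on top of the base disk (e.g. replica restore points)


def estimated_stun_seconds(disks: List[HotAddedDisk], seconds_per_file: float) -> float:
    """Toy model: CBT is initialized for every base disk and every snapshot
    delta at once, so the stun is proportional to the total file count."""
    files = sum(1 + disk.snapshots for disk in disks)
    return files * seconds_per_file


# Placeholder cost per file; not a measured value.
COST = 1.5
source = [HotAddedDisk("system", 0), HotAddedDisk("data", 0)]
target = [HotAddedDisk("system", 7), HotAddedDisk("data", 7)]
print(estimated_stun_seconds(source, COST))  # 3.0
print(estimated_stun_seconds(target, COST))  # 24.0
```

The real cost per file will depend on the datastore and host; the point is only that the total scales with disks times chain length.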
Gostev
Chief Product Officer
Posts: 31807
Liked: 7300 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Hotadd causing issues during Replication

Post by Gostev »

tsightler wrote:It's good to know that disabling CBT will work around this as that's pretty reasonable as long as it's a dedicated proxy.
That, plus for replication specifically, you can simply force the target backup proxy to use network processing mode.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Hotadd causing issues during Replication

Post by tsightler »

Yep, that works great for probably 90% of replication scenarios where inter-site bandwidth is already the limiting factor. But that's a little more limiting for high change rate, high bandwidth setups with lots of servers to replicate. In those cases hotadd is still generally quite a bit faster, but those are also the cases that are generally using dedicated proxies on the target so they now have the option of either continuing to use network mode, or disabling CBT and going back to hotadd. I have one client that will definitely be happy with the second option as they were seeing 2X the performance with hotadd on the target.
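The trade-off above can be written down as a small decision helper. This is purely illustrative: `target_transport_mode` and its inputs are hypothetical and do not correspond to any Veeam setting or API. They just encode the advice in this thread: network mode when the WAN is the bottleneck, hot add only on a dedicated target proxy that has CBT disabled.

```python
def target_transport_mode(wan_is_bottleneck: bool,
                          dedicated_target_proxy: bool,
                          cbt_disabled_on_proxy: bool) -> str:
    """Hypothetical helper: pick a transport mode for the replication target proxy."""
    if wan_is_bottleneck:
        # Inter-site bandwidth limits the job anyway, so network mode costs little.
        return "network"
    if dedicated_target_proxy and cbt_disabled_on_proxy:
        # With CBT off on the proxy VM, hot add avoids the long stun and is usually faster.
        return "hotadd"
    # Otherwise stay on network mode until a dedicated proxy can have CBT disabled.
    return "network"


print(target_transport_mode(wan_is_bottleneck=False,
                            dedicated_target_proxy=True,
                            cbt_disabled_on_proxy=True))  # hotadd
```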
pufferdude
Expert
Posts: 223
Liked: 15 times
Joined: Jul 02, 2009 8:26 pm
Full Name: Jim

Re: Hotadd causing issues during Replication

Post by pufferdude »

I'm adding my experience with this issue here in case it's relevant... Yesterday I set up remote replication of a VM with v6 for the first time. The remote proxy is on Win2008R2 doing hotadd to local ESXi 4.1 storage, and the initial replication went fine as well as a manual subsequent delta. I then left the job to run automatically early the next morning. Today I looked at my alerts and saw that the job had failed with the dreaded "connection forcibly reset..." error.

The only thing that changed between the initial replication/first delta and the next run was that in between those times I'd actually *backed up* the proxy VM on the remote site (backed it up to the main site, because it's not a dedicated proxy VM and contains data I want to back up off-site). I assumed something "changed" in the proxy VM as a result of backing it up and that's what caused the subsequent replications using *it* as a proxy to fail...

Then I found this thread, set the remote proxy to NBD-only, and voila, replication is working again. So now I don't know if the replication failure was just a coincidence and would have happened regardless of whether I'd backed-up said proxy, but thought I'd mention it in case it's somehow related.
MattR
Influencer
Posts: 10
Liked: never
Joined: May 23, 2011 1:33 pm
Full Name: Matt

Re: Hotadd causing issues during Replication

Post by MattR »

Add me to the list. In my experience it does not cause jobs to fail unless they go to a WAN target (the target proxy is used for WAN jobs) or are file copy jobs, which are more sensitive to interruptions. Setup includes:

Same source and target proxy (virtual on ESX ServerA) - Veeam on Win2008 SP2
Datastore = SAS SAN w/ 10krpm
Target storage = local datastore w/ 7200RPM disks

Symptoms:
Job gets past 'create helper snapshot' and begins to hot add the disk, at which point the system stun/freeze occurs
Only happens for replications from SAN to ESX ServerA not to other ESX servers

Troubleshooting Efforts:
Restarted Management Agent on ESX ServerA - Not fixed.
Changed proxy to Network Mode to remove hotadd and establish NBD - Job indicates nbd only and successful completion with no stun/freeze
Changed proxy to Virtual Appliance instead of automatic to re-establish hotadd - Job indicates hotadd;nbd and returns to stun/freeze

I will leave at Network Mode and await an update.

agentduke
Lurker
Posts: 2
Liked: never
Joined: Sep 20, 2011 1:12 am
Full Name: AgentDuke

Re: Hotadd causing issues during Replication

Post by agentduke »

I am having this same error
"Error: Client error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond ip x.x.x.x:2501"

But I think it's not for the same reason as the others. I am trying to use the ESX server as a repository. When I use a Windows repository, everything works fine, and this also worked fine on version 5.x. It only started right after the upgrade to v6. I have even reinstalled v6 from scratch, with the same results. Somehow using an ESX repository gives this error (and trying to add it as a Linux repository when it already exists in the tree under vCenter creates a whole world of new issues).

I have tried all the workarounds in this topic with no success. I have even updated ESX 4.1 U2 to the latest build (version 500K something). vCenter is 4.1. I also installed the latest v6 patch, with no success. Any suggestions?
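Since the error names port 2501 on the target, a quick first check is whether anything on that host accepts TCP connections on that port at all. The sketch below uses only Python's standard `socket` module; `port_reachable` is a hypothetical helper name, and a successful connect says nothing about whether the target is a supported repository type.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, unreachable hosts, DNS failures and timeouts all land here.
        return False
```

For example, `port_reachable("repository-host", 2501)`, where `repository-host` is a placeholder for the target named in the error message.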
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Hotadd causing issues during Replication

Post by tsightler »

Using ESX as a backup target is no longer supported in V6. I would think you could add it as a Linux box, perhaps by IP address, with an alternate DNS name entered in the hosts file on the Veeam server, or with a secondary management NIC. If you tried to add it with the same name, I can certainly understand that it might cause a lot of trouble.
kevvy_wilch
Novice
Posts: 3
Liked: never
Joined: Nov 02, 2011 4:19 pm
Full Name: Kevin Wiltshire

Re: Hotadd causing issues during Replication

Post by kevvy_wilch »

agentduke wrote:I am having this same error
"Error: Client error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond ip x.x.x.x:2501"
[...]
Hi, I'm having the same issues as you; they appeared after upgrading to V6. Have you managed to work around them?

Thanks

Kevin
Gostev
Chief Product Officer
Posts: 31807
Liked: 7300 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Hotadd causing issues during Replication

Post by Gostev »

The issue discussed in this topic was addressed in v6 Patch #3. Thanks.
mquinn
Lurker
Posts: 2
Liked: never
Joined: Nov 01, 2012 2:49 pm
Full Name: Mike Quinn

Re: Hotadd causing issues during Replication

Post by mquinn »

I don't mean to bump an old thread, but I'm seeing this problem on Veeam 7 today.
foggy
Veeam Software
Posts: 21139
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Hotadd causing issues during Replication

Post by foggy »

Mike, have you contacted support already?