-
- Influencer
- Posts: 11
- Liked: never
- Joined: Oct 24, 2011 7:42 pm
- Full Name: Tim Haner
- Contact:
Hotadd causing issues during Replication
I spent most of the weekend tweaking our new v6 installation and have it working great - my backup window has been cut by more than half, but there was an issue I ran into (and worked around) with hotadd for writes on the new replication jobs.
Symptoms: During the initiation of the replication job, about half the time, the proxy used for the destination portion of the replica job would 'lock up' for about 30-40 seconds while vCenter said 'Reconfiguring VM-Name'. During this time, the VM did not respond to commands and could not be pinged. When the hotadd completed about 30-40 seconds later, the VM would start responding again, but the replication job (as well as any other jobs running on that proxy) would fail with timeout errors. Only the destination proxy seemed to lock up; the source proxy never did.
My workaround: Once I figured out the issue, I configured one proxy to only use NBD mode and assigned that proxy as the destination proxy for my replication jobs. Since I did this, I have not seen the issue again.
My question is, has anybody else seen this? If so, is it an issue with vSphere, or with Veeam? I am running vSphere 5. Any input would be appreciated as I'd like to be able to use hotadd for writes. I didn't want to bother Support with this yet since I have a good workaround.
Thanks!
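For anyone who wants to put numbers on the lock-up described above, here is a minimal Python sketch. It assumes you already have a log of reachability samples for the proxy VM (for example, collected by a ping loop). The function name `unresponsive_windows`, the sample format, and the `min_gap` threshold are all hypothetical choices for illustration and are not part of Veeam or vSphere.

```python
from typing import List, Tuple

# One sample: (timestamp in seconds, did the proxy answer?)
Sample = Tuple[float, bool]


def unresponsive_windows(samples: List[Sample],
                         min_gap: float = 5.0) -> List[Tuple[float, float]]:
    """Return (start, end) spans during which the proxy did not answer.

    start is the timestamp of the first failed sample in a run; end is the
    timestamp of the next successful sample (or of the last sample if the
    log ends while the proxy is still down). Spans shorter than min_gap
    seconds are ignored as ordinary packet loss.
    """
    windows: List[Tuple[float, float]] = []
    start = None  # timestamp where the current outage began, if any
    last = None   # timestamp of the most recent sample seen
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            if ts - start >= min_gap:
                windows.append((start, ts))
            start = None
        last = ts
    # Log ended while the proxy was still unresponsive.
    if start is not None and last is not None and last - start >= min_gap:
        windows.append((start, last))
    return windows
```

Comparing the resulting windows with the 'Reconfiguring VM-Name' task times in vCenter should show whether the outage lines up with the hotadd.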
-
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Hotadd causing issues during Replication
Tim, my first thought is some environmental issue (a slow datastore, for example). You can check how long it takes to mount the same disk manually to confirm or eliminate this assumption. Further, I would still suggest investigating this with support. Thanks.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Hotadd causing issues during Replication
Hi Tim. I have seen exactly this behavior in my lab as well, but haven't yet discovered the exact nature of it, although I have my suspicions. I don't actually see the issue all the time, but once it starts with a particular VM, it seems that VM fails consistently. I would strongly suggest opening a support case for investigation since you are seeing the issue out in the wild.
My observed behavior is exactly the same as what you see. When using hotadd, the target will become unresponsive for anywhere from 10-30 seconds, which causes one of two failures: either the Veeam server will report "connection reset by peer" or the job will fail with a VDDK error. I'm not 100% sure, but in my lab it seemed to always start if a previous replication had failed for some reason. I'm also suspicious that it might have something to do with the number of restore points, their size, and the performance of the underlying disk. Also, as you noted, it only affects hotadd; other modes work great.
I'm also suspicious that it may only affect W2K8/W2K8R2, and not W2K3 servers.
-
- Novice
- Posts: 5
- Liked: never
- Joined: Dec 12, 2011 6:40 pm
- Full Name: Justin young
Re: Hotadd causing issues during Replication
I noticed this time out too. I am trying out your fix now. I would have a job randomly work but most replication would fail with the following error. Then all the other servers on the job would fail. Very frustrating... Support has my logs since Friday and I am waiting for a response.
-
- Novice
- Posts: 3
- Liked: never
- Joined: Dec 13, 2011 10:30 am
- Full Name: Gertjan
Re: Hotadd causing issues during Replication
th83 wrote: Symptoms: During the initiation of the replication job, about half the time, the proxy used for the destination portion of the replica job would 'lock up' for about 30-40 seconds while vCenter said 'Reconfiguring VM-Name' during the time the VM is locked up. During this time, the VM did not respond to commands and could not be pinged. When the hotadd completed about 30-40 seconds later, the VM would start responding again, but the replication job (as well as any other jobs running on that proxy) would fail with timeout errors. Only the destination proxy seemed to lock up, the source proxy never did.
My workaround: Once I figured out the issue, I configured one proxy to only use NBD mode and assigned that proxy as the destination proxy for my replication jobs. Since I did this, I have not seen the issue again.
My question is, has anybody else seen this? If so, is it an issue with vSphere, or with Veeam? I am running vSphere 5. Any input would be appreciated as I'd like to be able to use hotadd for writes. I didn't want to bother Support with this yet since I have a good workaround.
Thanks!
Exactly the same issue in my lab. What is your proxy OS? Mine is Windows XP.
Switching to NBD solved this for me.
-
- Novice
- Posts: 5
- Liked: never
- Joined: Dec 12, 2011 6:40 pm
- Full Name: Justin young
Re: Hotadd causing issues during Replication
Running Server 2003 R2 on both proxy agents and 2008 R2 as the main Veeam server. I have VMware ESXi 4.1 on all servers (Standard License).
-
- Novice
- Posts: 5
- Liked: never
- Joined: Dec 12, 2011 6:40 pm
- Full Name: Justin young
Re: Hotadd causing issues during Replication
Ticket ID#5159767 is the ticket I opened before I found this thread. They said the logs show disconnects, which is correct, except it's the proxy that disconnects.
-
- Influencer
- Posts: 11
- Liked: never
- Joined: Oct 24, 2011 7:42 pm
- Full Name: Tim Haner
- Contact:
Re: Hotadd causing issues during Replication
-Gertjan- wrote: Exactly the same issue in my lab, What is your proxy OS? Mine is windows XP
Switching to NBD solved this for me.
I tried Server 2008 R2 (obviously 64 bit) and Server 2003 (32 bit) as the target proxies with the same result. I only had to change the proxy used as the destination proxy to NBD to fix the error. I've been running hotadd for the 'read' side of the replication for a while now and have not seen any errors, but when the 'write' proxy uses hotadd, it fails almost every time. Very strange.
-
- Novice
- Posts: 5
- Liked: never
- Joined: Dec 12, 2011 6:40 pm
- Full Name: Justin young
Re: Hotadd causing issues during Replication
I have a seed job running until tomorrow so I can not test. This is what support suggested. If anyone else wants to test before I can please post your results.
It appears that the communications on your host was having difficulty.
You can restart the management agents which oftentimes corrects this.
See this KB for instructions on how to do this:
http://kb.vmware.com/selfservice/micros ... Id=1003490
Note that doing this will not affect any running VMs but will affect any Veeam jobs that are running.
-
- Influencer
- Posts: 11
- Liked: 1 time
- Joined: Jun 20, 2011 5:12 pm
- Contact:
Re: Hotadd causing issues during Replication
Add me to the list of people having the issue. I have also opened a ticket with Veeam. What is everyone using for storage for their destination proxy (Fiber, iSCSI, onboard)?
Environment:
Windows 2008 R2 for source and target proxy.
Target vSphere storage is all iSCSI
vSphere 4.1
-
- Influencer
- Posts: 11
- Liked: never
- Joined: Oct 24, 2011 7:42 pm
- Full Name: Tim Haner
- Contact:
Re: Hotadd causing issues during Replication
skoch wrote: Add me to the list of people having the issue. I have also opened a ticket with Veeam. What is everyone using for storage for their destination proxy (Fiber, iSCSI, onboard)?
Environment:
Windows 2008 R2 for source and target proxy.
Target vSphere storage is all iSCSI
vSphere 4.1
iSCSI at both the source and target.
Proxy servers running Server 2008 R2
Target storage is a Starwind iSCSI target running on Server 2003, source storage is Dell Equallogic.
vSphere 5
I have not opened a ticket with Veeam because I simply don't have the time right now and changing the target proxy to NBD fixed it for now.
-
- Chief Product Officer
- Posts: 31789
- Liked: 7291 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Hotadd causing issues during Replication
We are ready to investigate this with anyone willing to spend some time working with our support (webex would be ideal). The more people who open support cases, the better, as it may help to track down similarities. Thanks!
-
- Influencer
- Posts: 11
- Liked: 1 time
- Joined: Jun 20, 2011 5:12 pm
- Contact:
Re: Hotadd causing issues during Replication
Gostev wrote: We are ready to investigate this with anyone willing to spend some time working with our support (webex would be ideal). The more people will open support cases, the better - may help to track down similarities. Thanks!
I was just working with Tim (Ticket# 5160363). We did a webex and he took some screenshots of the behavior. Hopefully this helps in troubleshooting the cause. If you need more info or a guinea pig, just let me know.
-
- Chief Product Officer
- Posts: 31789
- Liked: 7291 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Hotadd causing issues during Replication
Based on investigation of this issue, the immediately available workaround is to disable CBT on the Veeam backup proxy VM. We have opened a support case with VMware as well, since this issue (long VM stun on hot add operation) is easily confirmed without our software present.
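The post does not say how CBT is switched off on the proxy VM. As a rough sketch only: CBT is, as far as VMware's documentation describes it, controlled by a VM-level `ctkEnabled` setting plus one `scsiX:Y.ctkEnabled` entry per disk in the VM configuration. The Python helper below (the name `disable_cbt_in_vmx` is hypothetical) rewrites those entries in the text of a .vmx file. It assumes the proxy VM is powered off, has no snapshots, and that you keep a copy of the original file; check the VMware knowledge base before relying on it.

```python
import re

# Matches the VM-level key (ctkEnabled) and per-disk keys such as
# scsi0:0.ctkEnabled at the start of a .vmx line. Assumed key names.
_CTK_KEY = re.compile(r'^\s*((?:[a-z]+\d+:\d+\.)?ctkEnabled)\s*=', re.IGNORECASE)


def disable_cbt_in_vmx(vmx_text: str) -> str:
    """Return vmx_text with every ctkEnabled entry set to "FALSE".

    Lines that are not CBT settings are passed through unchanged.
    Any trailing newline in the input is not preserved.
    """
    out = []
    for line in vmx_text.splitlines():
        match = _CTK_KEY.match(line)
        if match:
            out.append(f'{match.group(1)} = "FALSE"')
        else:
            out.append(line)
    return "\n".join(out)
```

Only a dedicated proxy VM is a sensible candidate for this, since a VM with CBT disabled can no longer be backed up incrementally using changed block tracking.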
-
- Influencer
- Posts: 11
- Liked: 1 time
- Joined: Jun 20, 2011 5:12 pm
- Contact:
Re: Hotadd causing issues during Replication
Gostev wrote: Based on investigation of this issue, the immediately available workaround is to disable CBT on Veeam backup proxy VM. We have opened a support case with VMware as well, since this issue (super long VM freeze on hot add) can be easily confirmed without our software present.
Any idea why the issue only appears on the destination proxy and not the source proxy?
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Hotadd causing issues during Replication
In my experience, the longer/larger the snapshot chain, the longer the delay. Since the source side doesn't typically have snapshots, there is no delay. The destination side, however, typically has a snapshot chain for restore points, and the longer and larger those are, the longer the delay is. It's good to know that disabling CBT will work around this, as that's pretty reasonable as long as it's a dedicated proxy.
-
- Chief Product Officer
- Posts: 31789
- Liked: 7291 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Hotadd causing issues during Replication
skoch wrote: Any idea why the issue only appears on the destination proxy and not the source proxy?
Actually, I've seen the issue affecting both sides, although admittedly source is more of an exception. What happens is that VMware tries to initialize CBT for all hot added disks, including their snapshots. For some reason, it tries to do them all at once (design flaw), so in some cases the backup proxy VM may remain stunned for minutes, depending on the number of disks (and snapshots) hot added.
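To illustrate why the destination proxy is hit harder, here is a small Python sketch of the reasoning in this thread. It assumes, purely for illustration, that the stun grows linearly with the number of files CBT has to initialize (one base disk plus one delta per snapshot). The function names and the `seconds_per_file` parameter are hypothetical; the per-file cost is something you would have to measure in your own environment, not a figure from this thread.

```python
from typing import Dict


def files_needing_cbt_init(snapshots_per_disk: Dict[str, int]) -> int:
    """Count files touched when the given disks are hot added.

    Each hot added disk contributes its base disk plus one delta file
    per snapshot (restore point) in its chain.
    """
    return sum(1 + count for count in snapshots_per_disk.values())


def estimated_stun_seconds(snapshots_per_disk: Dict[str, int],
                           seconds_per_file: float) -> float:
    """Assumed linear model: stun = files to initialize * cost per file."""
    return files_needing_cbt_init(snapshots_per_disk) * seconds_per_file


# A source VM disk normally has no snapshot chain; a replica disk
# carries one snapshot per restore point.
source = {"source-disk": 0}
replica = {"replica-disk-1": 7, "replica-disk-2": 7}
print(files_needing_cbt_init(source))   # 1
print(files_needing_cbt_init(replica))  # 16
```

Under this assumed model, a replica with two disks and seven restore points means sixteen files initialized at once, against one for a typical source disk.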
-
- Chief Product Officer
- Posts: 31789
- Liked: 7291 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Hotadd causing issues during Replication
tsightler wrote: It's good to know that disabling CBT will work around this as that's pretty reasonable as long as it's a dedicated proxy.
That, plus for replication specifically, you can simply force the target backup proxy to use network processing mode.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Hotadd causing issues during Replication
Yep, that works great for probably 90% of replication scenarios where inter-site bandwidth is already the limiting factor. But that's a little more limiting for high change rate, high bandwidth setups with lots of servers to replicate. In those cases hotadd is still generally quite a bit faster, but those are also the cases that are generally using dedicated proxies on the target so they now have the option of either continuing to use network mode, or disabling CBT and going back to hotadd. I have one client that will definitely be happy with the second option as they were seeing 2X the performance with hotadd on the target.
-
- Expert
- Posts: 223
- Liked: 15 times
- Joined: Jul 02, 2009 8:26 pm
- Full Name: Jim
- Contact:
Re: Hotadd causing issues during Replication
I'm adding my experience with this issue here in case it's relevant... Yesterday I set up remote replication of a VM with v6 for the first time. The remote proxy is on Win2008R2 doing hotadd to local ESXi 4.1 storage, and the initial replication went fine as well as a manual subsequent delta. I then left the job to run automatically early the next morning. Today I looked at my alerts and saw that the job had failed with the dreaded "connection forcibly reset..." error.
The only thing that changed between the initial replication/first delta and the next run was that in between those times I'd actually *backed up* the proxy VM on the remote site (backed it up to the main site, because it's not a dedicated proxy VM and contains data I want to back up off-site). I assumed something "changed" in the proxy VM as a result of backing it up and that's what caused the subsequent replications using *it* as a proxy to fail...
Then I found this thread, set the remote proxy to NBD-only, and voila, replication is working again. So now I don't know if the replication failure was just a coincidence and would have happened regardless of whether I'd backed-up said proxy, but thought I'd mention it in case it's somehow related.
-
- Influencer
- Posts: 10
- Liked: never
- Joined: May 23, 2011 1:33 pm
- Full Name: Matt
- Contact:
Re: Hotadd causing issues during Replication
Add me to the list. From my experience it is not causing jobs to fail unless they are to a WAN target (target proxy is used for WAN jobs) or a file copy job, which are more sensitive to interruptions in the job. Setup includes:
Same Source and target Proxy (virtual on ESX ServerA) - Veeam on Win2008 SP2
Datastore = SAS SAN w/ 10krpm
Target storage = local datastore w/ 7200RPM disks
Symptoms:
Job gets past create helper snapshot and begins to add the disk at which point the system stun/freeze occurs
Only happens for replications from SAN to ESX ServerA not to other ESX servers
Troubleshooting Efforts:
Restarted Management Agent on ESX ServerA - Not fixed.
Changed proxy to Network Mode to remove hotadd and establish NBD - Job indicates nbd only and successful completion with no stun/freeze
Changed proxy to Virtual Appliance instead of automatic to re-establish hotadd - Job indicates hotadd;nbd and returns to stun/freeze
I will leave at Network Mode and await an update.
--
-
- Lurker
- Posts: 2
- Liked: never
- Joined: Sep 20, 2011 1:12 am
- Full Name: AgentDuke
Re: Hotadd causing issues during Replication
I am having this same error
"Error: Client error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond ip x.x.x.x:2501"
But not for the same reason as others, I think. I am trying to use the ESX server as a repository. When I use a Windows repository, everything works fine. This also worked fine on version 5.x and only started right after the upgrade to v6. I have even reinstalled v6 from scratch with the same results. Somehow using an ESX repository gives this error (and trying to add it as a Linux repository when it already exists in the tree under vCenter creates a whole world of new issues).
I have tried all the workarounds in this topic with no success. I have even updated ESX 4.1 U2 to the latest version (version 500K something). vCenter is 4.1. I also applied the latest v6 patch with no success. Any suggestions?
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Hotadd causing issues during Replication
Using ESX as a backup target is no longer supported with V6. I would think you could add it as a Linux box, perhaps by IP address, or with an alternate DNS name entered in the hosts file on the Veeam server, or perhaps with a secondary management NIC. If you tried to add it with the same name, I can certainly understand that it might cause a lot of trouble.
-
- Novice
- Posts: 3
- Liked: never
- Joined: Nov 02, 2011 4:19 pm
- Full Name: Kevin Wiltshire
- Contact:
Re: Hotadd causing issues during Replication
agentduke wrote: I am having this same error
"Error: Client error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond ip x.x.x.x:2501"
But not for the same reason as others I think. I am trying to use the ESX server as a repository. When I use a windows repository, everything works fine. And this also worked fine on version 5.x. This only started right after the upgrade to v6. I have even reinstalled v6 from scratch with same results. Somehow using an ESX repository gives this error (and if I try to add it as a linux repository after it already exists in the tree under the vCenter creates a whole world of new issues).
I have tried all the workarounds on this topic with no success. I have even updated the esx 4.1u2 to the latest version (version 500K something). vCenter is 4.1. Also updated the latest v6 patch with no success. Any suggestions?
Hi, I'm having the same issues as you, appeared after upgrading to V6. Have you managed to work around the issues?
Thanks
Kevin
-
- Chief Product Officer
- Posts: 31789
- Liked: 7291 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Hotadd causing issues during Replication
The issue discussed in this topic was addressed in v6 Patch #3. Thanks.
-
- Lurker
- Posts: 2
- Liked: never
- Joined: Nov 01, 2012 2:49 pm
- Full Name: Mike Quinn
- Contact:
Re: Hotadd causing issues during Replication
I don't mean to bump an old thread, but I'm seeing this problem on Veeam 7 today.
-
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Hotadd causing issues during Replication
Mike, have you contacted support already?