-
- Service Provider
- Posts: 29
- Liked: 3 times
- Joined: Jul 15, 2015 2:13 pm
- Contact:
Replication Jobs Failing With Connection Errors - 02023509
We have been running about a dozen replication jobs over a high speed WAN connection for the past 3 months. Until last week these jobs ran without a hitch.
Last week we started seeing numerous retries on the jobs, with several resulting in complete failures. A review of the logs indicates the jobs fail due to a connection error with the vCenter Server.
Sample of error: 12/30/2016 10:23:33 AM :: Processing DBnode2 Error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.0.100.24:443
What is odd is that the jobs actually do connect to the vCenter Server, often stay connected for an extended period of time, and even process data (I have watched the activity on both the Veeam backup server and the target vSphere environment). Some jobs get to almost 100% complete during disk processing before failing. Even more perplexing, every job contains multiple virtual machines, and some of the machines within a job will fail while others succeed. The issue is getting progressively worse, with more and more jobs failing completely.
I have ticket # 02023509 open with Veeam. They are pointing to the vCenter Server as the culprit and so I have checked the vCenter logs, general health and even rebooted it a few times ... no luck. They also are suggesting network instability; however, we are not having any issues with the 20 or so backup copy jobs we also process across this same WAN to the same target vSphere environment using the same vCenter Server.
There have been no changes made to the target vSphere environment or its vCenter Server in over a year. We are running vSphere 6.0 in the target environment and 5.5 u3d in the production environment. The only change made in the past three months has been an upgrade from Veeam Availability Suite v9.0 Ent plus to v9.5.
Anyone have any ideas or experiencing this same problem?
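For anyone who wants to test the "network instability" theory independently of the jobs, below is a minimal sketch of a probe that could be left running on the backup server while replication is active. It only opens and closes a TCP connection to the address and port from the error message above and logs how long each attempt takes. The function names, timeout, and interval are assumptions for illustration, not anything Veeam or VMware ships, and a successful TCP connect does not prove the vCenter API itself is healthy.

```python
import socket
import time
from datetime import datetime


def probe(host, port, timeout=5.0):
    """Try one TCP connection; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        return False, time.monotonic() - start, str(exc)


def monitor(host, port, attempts=10, interval=1.0, timeout=5.0):
    """Probe repeatedly, print one timestamped line per attempt, return the results."""
    results = []
    for _ in range(attempts):
        ok, elapsed, err = probe(host, port, timeout)
        stamp = datetime.now().isoformat(timespec="seconds")
        status = "OK" if ok else "FAIL"
        print(f"{stamp} {host}:{port} {status} {elapsed:.3f}s {err}")
        results.append(ok)
        time.sleep(interval)
    return results


# Example (address taken from the error above; attempts/interval are arbitrary):
# monitor("10.0.100.24", 443, attempts=60, interval=5.0)
```

If the probe stays clean while jobs are failing, that would point away from the WAN and vCenter and toward the machine initiating the connections.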
-
- Product Manager
- Posts: 14716
- Liked: 1703 times
- Joined: Feb 04, 2013 2:07 pm
- Full Name: Dmitry Popov
- Location: Prague
- Contact:
Re: Replication Jobs Failing With Connection Errors - 020235
Hi manofbronze,
Try creating a new replication job for a VM that constantly fails to replicate with this error, and check the result of the new job. Let us know how it goes.
-
- VP, Product Management
- Posts: 7076
- Liked: 1510 times
- Joined: May 04, 2011 8:36 am
- Full Name: Andreas Neufert
- Location: Germany
- Contact:
Re: Replication Jobs Failing With Connection Errors - 020235
I saw that the case was already escalated within our support system.
In the past when such errors occurred, the usage of Network/NBD mode helped till the real cause was detected. Is your vCenter environment under heavy load (SOAP connection errors)?
-
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Replication Jobs Failing With Connection Errors - 020235
I'd also try to limit the number of concurrently running jobs and see whether reducing the load on the virtual environment helps to avoid this issue.
-
- Service Provider
- Posts: 29
- Liked: 3 times
- Joined: Jul 15, 2015 2:13 pm
- Contact:
Re: Replication Jobs Failing With Connection Errors - 020235
After numerous network "tweaks", a job recreation, vCenter Server resets, a vCenter Server resource review, etc... no luck. Despite all performance counters showing healthy inside and outside the backup server VM, in a fit of frustration I powered the darn thing off, gave it a "rest" and powered it back on.
And of course, replication jobs are running swimmingly again.
Not a fix, I know, but the issue is "resolved" for now ... until next time that is ...
-
- VP, Product Management
- Posts: 7076
- Liked: 1510 times
- Joined: May 04, 2011 8:36 am
- Full Name: Andreas Neufert
- Location: Germany
- Contact:
Re: Replication Jobs Failing With Connection Errors - 020235
Interesting... Was it the ESXi host that you rebooted? Which build is it?
-
- Service Provider
- Posts: 29
- Liked: 3 times
- Joined: Jul 15, 2015 2:13 pm
- Contact:
Re: Replication Jobs Failing With Connection Errors - 020235
I reset the Veeam Backup Server. It is a Windows Server 2012 R2 VM with the most recent patches applied, running under vSphere 5.5 u3d.
-
- Service Provider
- Posts: 9
- Liked: 2 times
- Joined: Jan 21, 2019 2:29 am
- Full Name: John Loy
- Contact:
Re: Replication Jobs Failing With Connection Errors - 02023509
I just ran into this issue myself. I am using 9.5 Update 4a and I had to reboot the Veeam server to get it working again. I am glad I found this post because I would have spent a lot of time trying to figure this out.