manofbronze
Service Provider
Posts: 29
Liked: 3 times
Joined: Jul 15, 2015 2:13 pm

Replication Jobs Failing With Connection Errors - 02023509

Post by manofbronze »

We have been running about a dozen replication jobs over a high-speed WAN connection for the past 3 months. Until last week, these jobs ran without a hitch.
Last week we started seeing numerous retries on the jobs, with several resulting in complete failures. A review of the logs indicates the jobs fail due to a connection error with the vCenter Server.

Sample of error: 12/30/2016 10:23:33 AM :: Processing DBnode2 Error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.0.100.24:443

What is odd is that the jobs actually do connect to the vCenter Server, often stay connected for an extended period of time, and even process data (I have watched the activity on both the Veeam backup server and the target vSphere environment). Some jobs get to almost 100% complete during disk processing before failing. Even more perplexing, every job contains multiple virtual machines, and some of the machines within a job will fail while others succeed. The issue is getting progressively worse, with more and more jobs failing completely. :|
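
To see whether the failures cluster on particular VMs or endpoints, the error lines can be tallied. Below is a minimal Python sketch, assuming every failure line has the same layout as the single sample above (timestamp, `::`, `Processing <VM> Error: ...`, optional trailing `IP:port`). The pattern and the helper names `parse_error` and `failures_by_vm` are illustrative only and are not part of any Veeam tooling; real job logs may be formatted differently.

```python
import re
from collections import Counter

# Pattern inferred from the one sample line in this thread (an assumption);
# other log lines may not match and are simply skipped.
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) :: "
    r"Processing (?P<vm>\S+) Error: (?P<msg>.*?)"
    r"(?: (?P<endpoint>\d{1,3}(?:\.\d{1,3}){3}:\d+))?$"
)


def parse_error(line):
    """Return a dict with ts, vm, msg, endpoint, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def failures_by_vm(lines):
    """Count matching error lines per VM name."""
    counts = Counter()
    for line in lines:
        record = parse_error(line)
        if record:
            counts[record["vm"]] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "12/30/2016 10:23:33 AM :: Processing DBnode2 Error: A connection attempt "
        "failed because the connected party did not properly respond after a period "
        "of time, or established connection failed because connected host has failed "
        "to respond 10.0.100.24:443"
    )
    print(parse_error(sample))
    print(failures_by_vm([sample]))
```

If the counts concentrate on a few VMs or a single endpoint, that narrows the search; an even spread across VMs points more toward the shared components (backup server, proxy, or network path).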

I have ticket # 02023509 open with Veeam. They are pointing to the vCenter Server as the culprit and so I have checked the vCenter logs, general health and even rebooted it a few times ... no luck. They also are suggesting network instability; however, we are not having any issues with the 20 or so backup copy jobs we also process across this same WAN to the same target vSphere environment using the same vCenter Server.
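
One way to test the network-instability theory independently of the jobs is to probe the vCenter endpoint from the machine that reports the error. The Python sketch below is a rough illustration using only the standard library; the function names `probe` and `probe_repeatedly`, the attempt count, and the timeouts are assumptions chosen for the example, and the address is the one from the sample error. A successful TCP connect only shows that port 443 is reachable; it does not prove the vSphere API behind it is healthy.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Attempt one TCP connection; return (succeeded, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False, time.monotonic() - start


def probe_repeatedly(host, port, attempts=10, interval=1.0, timeout=5.0):
    """Run several probes spaced `interval` seconds apart and collect the results."""
    results = []
    for i in range(attempts):
        results.append(probe(host, port, timeout))
        if i < attempts - 1:
            time.sleep(interval)
    return results


if __name__ == "__main__":
    # Address taken from the sample error; adjust for your environment.
    for ok, elapsed in probe_repeatedly("10.0.100.24", 443, attempts=5):
        print("ok" if ok else "FAILED", f"{elapsed:.3f}s")
```

Running this from the Veeam backup server while a job is failing, and again from another machine on the same network, helps separate a problem on the WAN path from a problem local to the backup server.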

There have been no changes made to the target vSphere environment or the vCenter Server in over a year. We are running vSphere 6.0 in the target environment and 5.5 U3d in the production environment. The only change made in the past 3 months has been an upgrade from Veeam Availability Suite v9.0 Enterprise Plus to v9.5.

Does anyone have any ideas, or is anyone else experiencing this same problem?
Dima P.
Product Manager
Posts: 14716
Liked: 1703 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague

Re: Replication Jobs Failing With Connection Errors - 020235

Post by Dima P. » 1 person likes this post

Hi manofbronze,

Try creating a new replication job for a VM that constantly fails to replicate with this error, and check the result of this new job. Let us know how it goes.
Andreas Neufert
VP, Product Management
Posts: 7076
Liked: 1510 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany

Re: Replication Jobs Failing With Connection Errors - 020235

Post by Andreas Neufert » 1 person likes this post

I saw that the case has already been escalated within our support system.
In the past, when such errors occurred, using Network (NBD) transport mode helped until the real cause was found. Is your vCenter environment under heavy load (SOAP connection errors)?
foggy
Veeam Software
Posts: 21138
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Replication Jobs Failing With Connection Errors - 020235

Post by foggy » 1 person likes this post

I'd also try limiting the number of concurrently running jobs to see whether reducing the load on the virtual environment helps to avoid this issue.

Re: Replication Jobs Failing With Connection Errors - 020235

Post by manofbronze »

After numerous network "tweaks", a job recreation, vCenter Server resets, a vCenter Server resource review, etc. ... no luck. Despite all performance counters showing healthy inside and outside the backup server VM, in a fit of frustration I powered the darn thing off, gave it a "rest" and powered it back on.

And of course, replication jobs are running swimmingly again. :?

Not a fix, I know, but the issue is "resolved" for now ... until next time, that is ...

Re: Replication Jobs Failing With Connection Errors - 020235

Post by Andreas Neufert »

Interesting... Was it the ESXi host that you rebooted? Which build is it?

Re: Replication Jobs Failing With Connection Errors - 020235

Post by manofbronze »

I reset the Veeam Backup Server. It is a Windows Server 2012 R2 VM with the most recent patches applied, running under vSphere 5.5 U3d.
GT-Engineer
Service Provider
Posts: 9
Liked: 2 times
Joined: Jan 21, 2019 2:29 am
Full Name: John Loy

Re: Replication Jobs Failing With Connection Errors - 02023509

Post by GT-Engineer »

I just ran into this issue myself. I am using 9.5 Update 4a, and I had to reboot the Veeam server to get it working again. I am glad I found this post, because otherwise I would have spent a lot of time trying to figure this out.