Replication Jobs Failing With Connection Errors - 02023509

Post by **manofbronze** » Dec 30, 2016 4:00 pm

We have been running about a dozen replication jobs over a high-speed WAN connection for the past 3 months. Until last week these jobs ran without a hitch.
Last week we started seeing numerous retries on the jobs, with several resulting in complete failures. A review of the logs indicates the jobs fail due to a connection error with the vCenter Server.

Sample of error: 12/30/2016 10:23:33 AM :: Processing DBnode2 Error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.0.100.24:443

What is odd is that the jobs actually do connect to the vCenter Server, often stay connected for an extended period of time, and even process data (I have watched the activity on both the Veeam backup server and the target vSphere environment). Some jobs get to almost 100% complete during disk processing before failing. Even more perplexing, every job contains multiple virtual machines, and some of the machines within a job will fail while others succeed. The issue is getting progressively worse, with more and more jobs failing completely.

I have ticket # 02023509 open with Veeam. They are pointing to the vCenter Server as the culprit and so I have checked the vCenter logs, general health and even rebooted it a few times ... no luck. They also are suggesting network instability; however, we are not having any issues with the 20 or so backup copy jobs we also process across this same WAN to the same target vSphere environment using the same vCenter Server.
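For anyone who wants to check the network-instability theory independently of the job logs, here is a minimal sketch in Python that repeatedly opens a TCP connection to the endpoint named in the error (10.0.100.24:443) and records how long each attempt takes. It is not a Veeam or VMware tool, and the function names `probe_tcp`, `run_probes`, and `summarize` are hypothetical. It only tests whether the port accepts connections, so a clean result does not prove the vSphere API sessions themselves are healthy.

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Try one TCP connection; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        return False, time.monotonic() - start, str(exc)


def run_probes(host, port, count=10, interval=1.0, timeout=5.0):
    """Run `count` probes, pausing `interval` seconds between attempts."""
    results = []
    for i in range(count):
        results.append(probe_tcp(host, port, timeout))
        if i < count - 1:
            time.sleep(interval)
    return results


def summarize(results):
    """Count successful and failed attempts in a list of probe results."""
    ok = sum(1 for r in results if r[0])
    return {"ok": ok, "failed": len(results) - ok}
```

Running something like `summarize(run_probes("10.0.100.24", 443, count=60))` from the Veeam backup server while a replication job is in progress would show whether new connections to the vCenter Server start timing out at the same moments the job reports the error.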

There have been no changes made to the target vSphere environment or the vCenter Server in over a year. We are running vSphere 6.0 in the target environment and 5.5 U3d in the production environment. The only change made in the past three months has been an upgrade from Veeam Availability Suite v9.0 Enterprise Plus to v9.5.

Does anyone have any ideas, or is anyone else experiencing this same problem?

Jan 02, 2017 1:50 pm

Hi manofbronze,

Try creating a new replication job for a VM that constantly fails to replicate with this error, and check the result of the new job. Let us know how it goes.

Jan 02, 2017 11:15 pm

I saw that the case was already escalated within our support system.
In the past, when such errors occurred, switching to Network (NBD) transport mode helped until the real cause was found. Is your vCenter environment under heavy load (SOAP connection errors)?

Jan 03, 2017 10:13 am

I'd also try limiting the number of concurrently running jobs to see whether reducing the load on the virtual environment helps avoid this issue.

Post by **manofbronze** » Jan 12, 2017 2:44 pm

After numerous network "tweaks", a job recreation, vCenter Server resets, a vCenter Server resource review, etc. ... no luck. Despite all performance counters showing healthy inside and outside the backup server VM, in a fit of frustration I powered the darn thing off, gave it a "rest" and powered it back on.

And of course, replication jobs are running swimmingly again.

Not a fix, I know, but the issue is "resolved" for now ... until next time that is ...

Jan 12, 2017 2:51 pm

Interesting... Was it the ESXi host that you rebooted? Which build is it?

Post by **manofbronze** » Jan 16, 2017 1:21 am

I reset the Veeam Backup Server. It is a Windows Server 2012 R2 VMware VM with the most recent patches applied, running under vSphere 5.5 U3d.

Post by **GT-Engineer** » Jun 14, 2019 12:21 pm

I just ran into this issue myself. I am using 9.5 Update 4a and I had to reboot the Veeam server to get it working again. I am glad I found this post, because otherwise I would have spent a lot of time trying to figure this out.
