Replication Jobs Failing With Connection Errors - 02023509

by manofbronze » Fri Dec 30, 2016 4:00 pm

We have been running about a dozen replication jobs over a high-speed WAN connection for the past three months. Until last week, these jobs ran without a hitch.
Last week we started seeing numerous retries on the jobs, with several resulting in complete failures. A review of the logs indicates the jobs fail due to a connection error with the vCenter Server.

Sample of error: 12/30/2016 10:23:33 AM :: Processing DBnode2 Error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.0.100.24:443

What is odd is that the jobs actually do connect to the vCenter Server, often stay connected for an extended period of time, and even process data (I have watched the activity on both the Veeam backup server and the target vSphere environment). Some jobs get to almost 100% complete during disk processing before failing. Even more perplexing, every job contains multiple virtual machines, and some machines within a job fail while others succeed. The issue is getting progressively worse, with more and more jobs failing completely. :|

I have ticket # 02023509 open with Veeam. They are pointing to the vCenter Server as the culprit, so I have checked the vCenter logs and general health and even rebooted it a few times ... no luck. They are also suggesting network instability; however, we are not having any issues with the 20 or so backup copy jobs we also run across this same WAN to the same target vSphere environment using the same vCenter Server.
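One way to test the network-instability theory independently of Veeam is to open repeated TCP connections to the vCenter Server on port 443 and record failures and connect latency. The sketch below is a hypothetical, generic reachability check (not a Veeam or vCenter API, and not something used in this thread); the function names `probe`, `run_probes`, and `summarize` are illustrative. It only shows whether port 443 accepts connections over time; it does not exercise the vCenter SOAP API or the data path used during disk processing.

```python
import socket
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    ok: bool                     # True if the TCP handshake completed
    latency_ms: Optional[float]  # connect time in milliseconds, None on failure
    error: Optional[str]         # exception type and message, None on success


def probe(host: str, port: int = 443, timeout: float = 5.0) -> ProbeResult:
    """Attempt one TCP connection and report success, connect latency, or the error."""
    start = time.monotonic()
    try:
        # create_connection completes the TCP handshake; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = (time.monotonic() - start) * 1000.0
            return ProbeResult(True, elapsed, None)
    except OSError as exc:  # covers timeouts, refused connections, unreachable hosts
        return ProbeResult(False, None, f"{type(exc).__name__}: {exc}")


def run_probes(host: str, port: int = 443, count: int = 10,
               interval: float = 1.0, timeout: float = 5.0) -> List[ProbeResult]:
    """Probe `count` times, sleeping `interval` seconds between attempts."""
    results = []
    for i in range(count):
        results.append(probe(host, port, timeout))
        if i < count - 1:
            time.sleep(interval)
    return results


def summarize(results: List[ProbeResult]) -> dict:
    """Count failed attempts and report the worst successful connect latency."""
    latencies = [r.latency_ms for r in results if r.ok]
    return {
        "attempts": len(results),
        "failures": sum(1 for r in results if not r.ok),
        "max_latency_ms": max(latencies) if latencies else None,
    }
```

For example, `summarize(run_probes("10.0.100.24", count=60, interval=5.0))` probes the address from the error message above once every five seconds for about five minutes. Running it from the backup server while a replication job is active, and again while only backup copy jobs are running, may help show whether connection timeouts line up with the replication failures.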

There have been no changes to the target vSphere environment or vCenter Server in over a year. We are running vSphere 6.0 in the target environment and 5.5 U3d in the production environment. The only change made in the past three months was an upgrade from Veeam Availability Suite v9.0 Enterprise Plus to v9.5.

Anyone have any ideas or experiencing this same problem?
manofbronze
Service Provider

Re: Replication Jobs Failing With Connection Errors - 020235

by Dima P. » Mon Jan 02, 2017 1:50 pm

Hi manofbronze,

Try creating a new replication job for a VM that constantly fails to replicate with this error, and check the result of the new job. Let us know how it goes.
Dima P.
Veeam Software
Location: SPb
Full Name: Dmitry Popov

Re: Replication Jobs Failing With Connection Errors - 020235

by Andreas Neufert » Mon Jan 02, 2017 11:15 pm

I saw that the case was already escalated within our support system.
In the past, when such errors occurred, using Network/NBD mode helped until the real cause was found. Is your vCenter environment under heavy load (SOAP connection errors)?
Andreas Neufert
Veeam Software
Location: Germany
Full Name: @AndyandtheVMs Veeam PM

Re: Replication Jobs Failing With Connection Errors - 020235

by foggy » Tue Jan 03, 2017 10:13 am

I'd also try limiting the number of concurrently running jobs to see whether reducing the load on the virtual environment helps avoid this issue.
foggy
Veeam Software
Full Name: Alexander Fogelson

Re: Replication Jobs Failing With Connection Errors - 020235

by manofbronze » Thu Jan 12, 2017 2:44 pm

After numerous network "tweaks", recreating the job, resetting the vCenter Server, reviewing vCenter Server resources, etc., still no luck. Despite all performance counters looking healthy inside and outside the backup server VM, in a fit of frustration I powered the darn thing off, gave it a "rest", and powered it back on.

And of course, replication jobs are running swimmingly again. :?

Not a fix, I know, but the issue is "resolved" for now ... until next time that is ...
manofbronze
Service Provider

Re: Replication Jobs Failing With Connection Errors - 020235

by Andreas Neufert » Thu Jan 12, 2017 2:51 pm

Interesting... Was it the ESXi host that you rebooted? Which build is it?
Andreas Neufert
Veeam Software
Location: Germany
Full Name: @AndyandtheVMs Veeam PM

Re: Replication Jobs Failing With Connection Errors - 020235

by manofbronze » Mon Jan 16, 2017 1:21 am

I reset the Veeam backup server. It is a Windows Server 2012 R2 VMware VM with the most recent patches applied, running on vSphere 5.5 U3d.
manofbronze
Service Provider
