ChrisGundry
Veteran
Posts: 258
Liked: 40 times
Joined: Aug 26, 2015 2:56 pm
Full Name: Chris Gundry

Replication job failing with connection errors

Post by ChrisGundry »

Hi all

We have a support case open for this (case #04727832), but so far we have not got anywhere with a resolution.

The error we see is as follows:
19/04/2021 02:15:45 :: Error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond VEEAMPROXYSERVERIP:2523

About 95% of the time we see this error in a single replication job, which runs daily and sends replicas of certain VMs offsite. The job has 45 VMs in it. The errors always occur within the first 10 VMs in the job, never in any of the later VMs. If we change the order of the VMs so that the first 10 move to the end, the new top 10 start to show the error, even though they never failed while they were in the 10-20 slots in the job.

Support are saying they think it is a problem with the number of connections being attempted at the start of the job. My issue with this is that it is only 45 VMs, which is not a crazy number, and we have always had 40+ VMs in this job, so why is this now a problem? The Veeam servers are not stretched in terms of resources, the network is not busy at the time of the job start.

There have been no recent changes to VMware, vCenter, the VMs within the job, the number of VMs in the job or the job settings.

I only recently noticed this error. It seems to have been happening for a while, but unfortunately it was not reported to me, so I am not 100% sure when it started. I do know that we were not seeing it late last year or early this year. I don't believe we have made any changes that would have caused it to start, and we have not made any Veeam changes or applied any updates either.

Has anyone else seen this and worked out what the cause was?

Thanks!
HannesK
Product Manager
Posts: 14322
Liked: 2890 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria

Re: Replication job failing with connection errors

Post by HannesK »

Hello,
hmm, port 2523 is undocumented... that definitely needs to be checked by support. To me it sounds like a network / load issue. The proxy might be the issue. Just guessing.

Support has to figure out the reason. They have the logs, so they can tell. You can escalate the case via the "talk to manager" button if the answers are not satisfactory.

Best regards,
Hannes
ChrisGundry

Re: Replication job failing with connection errors

Post by ChrisGundry »

Surely it falls under the "Communication with Backup Proxies" port range, TCP 2500 to 3300? The documentation describes it as: "Default range of ports used as transmission channels for replication jobs. For every TCP connection that a job uses, one port from this range is assigned."
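For what it's worth, here is that check as a minimal Python sketch. The constant and function names are my own, not anything from Veeam; the range itself is taken from the documentation text quoted above and is assumed to be inclusive at both ends.

```python
# Port range quoted from the documentation: TCP 2500 to 3300.
# Assumption: both endpoints are inclusive.
DATA_CHANNEL_PORTS = range(2500, 3301)


def in_data_channel_range(port: int) -> bool:
    """Return True if the port falls inside the quoted 2500-3300 range."""
    return port in DATA_CHANNEL_PORTS


# The port from the error message in the first post.
print(in_data_channel_range(2523))
```

So 2523 sits near the bottom of the range, which would fit with it being one of the first ports handed out when the job starts.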

As I said, there is no network or load issue. We have two proxies and it happens to both of them equally.

Support are saying they have no real ideas. I will likely escalate the case this week as I don't feel I am getting anywhere.

I have posted this to see if others have had this issue.
HannesK

Re: Replication job failing with connection errors

Post by HannesK »

Ah sorry, yes, the port is correct :-)

I was talking more about CPU and RAM load. Yes, please escalate.
ChrisGundry

Re: Replication job failing with connection errors

Post by ChrisGundry »

Well as I said "The Veeam servers are not stretched in terms of resources, the network is not busy at the time of the job start."
PetrM
Veeam Software
Posts: 3264
Liked: 528 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic

Re: Replication job failing with connection errors

Post by PetrM »

Hello,

In fact, the error is pretty generic: many different factors can provoke such an issue, so the best action plan is to continue working with our support team. It's definitely not something that we can easily address over forum posts. Perhaps a network traffic dump analysis would be helpful, or it would make sense to check the logs to see whether the corresponding process is still running on the proxy server when the error is thrown. Anyway, I believe that we should let our engineers determine the direction of research.
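As a rough illustration of the kind of check that could sit alongside a traffic dump, here is a minimal Python sketch that times a plain TCP connect to a given host and port and reports whether it succeeded. It is not a Veeam tool and does not reproduce what the job itself does; `probe_tcp` and the 5-second default timeout are assumptions made for illustration only. Note also that a port in the data channel range will presumably only accept connections while a job has it open, so a refusal outside a job run is not by itself a fault.

```python
import socket
import time


def probe_tcp(host: str, port: int, timeout: float = 5.0):
    """Attempt one TCP connect and return (ok, elapsed_seconds, error_text).

    A timeout here corresponds to the "connected party did not properly
    respond after a period of time" message; a refusal means nothing was
    listening on the port at that moment.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers timeouts, refusals, and name resolution failures.
        return False, time.monotonic() - start, str(exc)
```

Called as, for example, `probe_tcp("VEEAMPROXYSERVERIP", 2523)` with the real proxy address substituted, from the machine that reports the error and around the time the job starts, the elapsed time helps separate a slow or dropped connect from an immediate refusal.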

Thanks!