-
- Veteran
- Posts: 259
- Liked: 40 times
- Joined: Aug 26, 2015 2:56 pm
- Full Name: Chris Gundry
- Contact:
Replication job failing with connection errors
Hi all
We have a case open for this, case #04727832. At the moment we have not really got anywhere with a resolution.
The error we see is as follows:
19/04/2021 02:15:45 :: Error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond VEEAMPROXYSERVERIP:2523
95% of the time we see this error in a single replication job, which runs daily and sends replicas of certain VMs offsite. The job has 45 VMs in it. The errors always occur within the first 10 VMs in the job, never in any of the later VMs. If we change the order so that the first 10 VMs are moved to the end, the new top 10 start to show the error, even though they never errored while they were in the 10-20 slots in the job... Support are saying they think it is a problem with the number of connections being attempted at the start of the job. My issue with this is that it is only 45 VMs, not a crazy number, and we have always had 40+ VMs in this job, so why is this now a problem? The Veeam servers are not stretched in terms of resources, and the network is not busy at the time the job starts.
There have been no recent changes to VMware, vCenter, the VMs within the job, the number of VMs in the job or the job settings.
I only recently noticed this error. It seems to have been happening for a while, but unfortunately it was not reported to me, so I am not 100% sure when it started. I do know that we were not seeing it late last year or early this year. I don't believe we have made any changes that would have caused this to start happening, but equally we have not made any Veeam changes or done any updates either.
Has anyone else seen this and worked out what the cause was?
Thanks!
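For anyone trying to pin down when the errors started and which endpoints they hit, a minimal Python sketch for tallying error lines like the one quoted above may help. It assumes the lines follow the `DD/MM/YYYY HH:MM:SS :: Error: <message> <host>:<port>` shape seen in this post; the function names are made up for illustration and are not part of any Veeam tooling.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape, taken from the single error line quoted in the post:
# "DD/MM/YYYY HH:MM:SS :: Error: <message> <host>:<port>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) :: Error: "
    r"(?P<msg>.*?)\s+(?P<host>\S+):(?P<port>\d+)\s*$"
)


def parse_error_line(line):
    """Return (timestamp, host, port) for a matching line, else None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%m/%Y %H:%M:%S")
    return ts, m.group("host"), int(m.group("port"))


def count_by_endpoint(lines):
    """Tally parsed error lines per (host, port) pair."""
    counts = Counter()
    for line in lines:
        parsed = parse_error_line(line)
        if parsed is not None:
            _, host, port = parsed
            counts[(host, port)] += 1
    return counts
```

Feeding it the error lines exported from the job history would show whether the failures cluster on one proxy or one port, and the parsed timestamps show how soon after the job start they occur.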
-
- Product Manager
- Posts: 14840
- Liked: 3086 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
- Contact:
Re: Replication job failing with connection errors
Hello,
hmm, port 2523 is undocumented... that definitely needs to be checked by support. To me it sounds like a network / load issue. The proxy might be the issue. Just guessing.
Support has to figure out the reason. They have the logs, they can tell the reason. You can escalate the case via the "talk to manager" button if the answers are not satisfying.
Best regards,
Hannes
-
- Veteran
- Posts: 259
- Liked: 40 times
- Joined: Aug 26, 2015 2:56 pm
- Full Name: Chris Gundry
- Contact:
Re: Replication job failing with connection errors
Surely it falls under the 'Communication with Backup Proxies' TCP 2500 to 3300 port range? "Default range of ports used as transmission channels for replication jobs. For every TCP connection that a job uses, one port from this range is assigned."
As I said, there is no network or load issue, we have two proxies and it happens to both of them equally.
Support are saying they have no real ideas. I will likely escalate the case this week as I don't feel I am getting anywhere.
I have posted this to see if others have had this issue.
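As a quick sanity check on the point about the port, here is a small Python sketch that tests whether a port falls inside the TCP 2500 to 3300 range quoted above. The constant and function names are illustrative only.

```python
# Range quoted above from the documentation: TCP 2500 to 3300, inclusive.
# range() excludes its upper bound, hence 3301.
DATA_PORT_RANGE = range(2500, 3301)


def is_in_data_port_range(port):
    """Return True if the port lies in the 2500-3300 transmission range."""
    return port in DATA_PORT_RANGE


print(is_in_data_port_range(2523))
```

Port 2523 from the error message sits inside this range, which fits it being one of the per-connection transmission ports rather than an undocumented one.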
-
- Product Manager
- Posts: 14840
- Liked: 3086 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
- Contact:
Re: Replication job failing with connection errors
ah sorry, yes, the port is correct
I'm more talking about CPU & RAM load. Yes, please escalate.
-
- Veteran
- Posts: 259
- Liked: 40 times
- Joined: Aug 26, 2015 2:56 pm
- Full Name: Chris Gundry
- Contact:
Re: Replication job failing with connection errors
Well as I said "The Veeam servers are not stretched in terms of resources, the network is not busy at the time of the job start."
-
- Veeam Software
- Posts: 3626
- Liked: 608 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: Replication job failing with connection errors
Hello,
In fact, the error is pretty generic, as many different factors can provoke such an issue, so the best action plan is to continue working with our support team. It's definitely not something that we can easily address over forum posts. Perhaps a network traffic dump analysis would be helpful, or it would make sense to check the logs to see whether the corresponding process is still running on the proxy server when the error is thrown. Anyway, I believe that we should let our engineers determine the direction of research.
Thanks!
-
- Lurker
- Posts: 1
- Liked: never
- Joined: Sep 11, 2024 5:33 pm
- Full Name: Andy Turnage
- Contact:
Re: Replication job failing with connection errors
I am having the same issue with Veeam Backup and Replication V12. 17 VMs replicate fine and 1-2 do not. This started about 45 days ago with no config changes on the network. Same connection issues between Production and DR site. Opened a case with Veeam support and it has been escalated 2-3 times with no luck. The same VM fails replication at the same point every time while the other VMs continue to replicate fine. Per the moderator, he wanted a case #. So here it is... Case # 07340636. The case has been open for 30-45 days. Exhausted multiple troubleshooting steps. Did the above issue ever get resolved? And if so, how?
-
- Veeam Software
- Posts: 3626
- Liked: 608 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: Replication job failing with connection errors
Hi Andy,
I had a quick look at the case, and as far as I understand, there is a connectivity issue related to the fact that some packets are being lost between the source and target proxy. Have you had a chance to discuss this situation with your network team?
Thanks!
-
- Veteran
- Posts: 259
- Liked: 40 times
- Joined: Aug 26, 2015 2:56 pm
- Full Name: Chris Gundry
- Contact:
Re: Replication job failing with connection errors
We never got to the bottom of this issue. Our issue was intermittent on several machines each day, but retries would usually work. Support wanted us to do network traces, but as the replica jobs that had the issue were running out of hours (OOH), and running traces all night was not viable, we could not gather the traces. At that point Veeam essentially said they couldn't/wouldn't do anything else.
We disabled all firewall services, packet inspection etc between both ends of the replication and that did not help. We opened all ports and services between both ends, again no change. IIRC the issue still existed up until we stopped using Veeam last year.
Sorry I can't give you a fix!
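For anyone in the same position where all-night packet captures are not viable, a lighter option might be a small TCP connect probe that logs only the failures. The Python sketch below is not a substitute for the traces support asked for, and it rests on assumptions: the host and port are placeholders, and the target must be a port that is actually listening at the time (the ports in the 2500 to 3300 range may only be open while a job is running).

```python
import socket
import time
from datetime import datetime


def probe_once(host, port, timeout=5.0):
    """Try one TCP connect; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers timeouts, refusals and unreachable hosts
        return False, time.monotonic() - start, str(exc)


def probe_loop(host, port, attempts, interval=1.0, timeout=5.0, log=print):
    """Probe repeatedly, log only the failures, and return the failure count."""
    failures = 0
    for i in range(attempts):
        ok, elapsed, err = probe_once(host, port, timeout)
        if not ok:
            failures += 1
            log(f"{datetime.now().isoformat()} FAIL {host}:{port} "
                f"after {elapsed:.2f}s: {err}")
        if i < attempts - 1:
            time.sleep(interval)
    return failures
```

Run from the source proxy against the target proxy during the job window, something like `probe_loop("target-proxy.example", 6162, attempts=3600, interval=10)` would leave a timestamped list of failed connects to compare against the job errors. Both the hostname and the port in that call are hypothetical and need replacing with a real listening endpoint.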
-
- Veeam Software
- Posts: 3626
- Liked: 608 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: Replication job failing with connection errors
Hi Chris,
It's difficult to comment on it without a provided case ID, but in 99.9% of cases, such problems come from the infrastructure, particularly due to intermittent network outages that should be investigated by the network team. As a workaround, I propose trying to increase the number of job retries.
Thanks!