-
- Lurker
- Posts: 2
- Liked: never
- Joined: Mar 28, 2024 12:29 pm
- Full Name: Mariusz
- Contact:
Backup Copy jobs (when first time pushed or full active initiated) stop working after some time
Dear Forum,
This is our first and very recent encounter with a Veeam product, so please forgive what may be naive questions from an inexperienced user.
We have two sites in our organization, each with VM hosts and their VMs.
There is a WAN link between them. Because the circuit's bandwidth is only 100 Mbps, we have set up a Backup Proxy that handles the Backup jobs for the remote-site VMs (so that they run entirely within the remote site LAN). The local Veeam server (with its Backup Proxy role) backs up the local-site VMs.
We then want to copy the local backups to the remote site and vice versa. That is why we have created Backup Copy jobs to copy backups between our two sites.
And this is where our issue appears. When we deploy a new machine, an initial full Backup job is run by one of the Backup Proxies, depending on the site where the VM has been deployed (this works perfectly well). The Backup Copy job then follows and copies (or perhaps creates) a full backup at the other site for redundancy, in case the machine that keeps the local backups fails.
Because, as mentioned earlier, the circuit has limited bandwidth, the job takes several hours to copy a few hundred GB end to end.
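As a rough sanity check on "several hours", the expected duration can be estimated from the backup size and the link speed. The sketch below is only an illustration: the `efficiency` factor (protocol overhead) and the example sizes are assumptions, not measured values from this environment, and it ignores Veeam compression and deduplication.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.9) -> float:
    """Estimate hours needed to move size_gb (decimal GB) over a link_mbps link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually usable for payload; 0.9 is a guess, not a measured value.
    """
    bits = size_gb * 8 * 1000**3                      # payload size in bits
    usable_bps = link_mbps * 1_000_000 * efficiency   # usable bits per second
    return bits / usable_bps / 3600


# Hypothetical full-backup sizes over the 100 Mbps circuit.
for size in (100, 300, 500):
    print(f"{size} GB -> {transfer_hours(size, 100):.1f} h")
```

If a copy takes far longer than such an estimate while the link is idle, the bottleneck is probably not the raw bandwidth.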
But usually after a few hours of transfers with full link capacity (it is hard to say precisely how long after the start, because that period varies) the transfer slows down to ridiculous values so that it practically stops, even though the link is not utilized at all. When you watch the VB&R console and the Performance Monitor on the Veeam server to track the copy job's progress, you see that the job is still running, but the server sends (or receives) a burst of data once every several seconds and then drops to zero. Then comes another burst of up to 10-20 Mbps for 1-2 seconds, and it drops to zero again. Another pause for several seconds, and so on...
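To pin down when the job switches into this burst-and-pause mode, one option is to log throughput once per second and look for long idle runs. The following is a minimal sketch under assumptions: the sample list is made up to mimic the pattern described above, and the thresholds (`idle_below`, `min_seconds`) are hypothetical values to tune. How the samples are collected (for example, from an exported performance log) is left open.

```python
from typing import List, Tuple


def find_stalls(samples_mbps: List[float],
                idle_below: float = 1.0,
                min_seconds: int = 3) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of near-idle samples.

    samples_mbps holds one throughput reading per second. A run counts as a
    stall when at least min_seconds consecutive readings are below idle_below.
    """
    stalls = []
    run_start = None
    for i, rate in enumerate(samples_mbps):
        if rate < idle_below:
            if run_start is None:
                run_start = i          # a new idle run begins here
        else:
            if run_start is not None and i - run_start >= min_seconds:
                stalls.append((run_start, i - run_start))
            run_start = None
    # Close a run that is still open at the end of the data.
    if run_start is not None and len(samples_mbps) - run_start >= min_seconds:
        stalls.append((run_start, len(samples_mbps) - run_start))
    return stalls


# Made-up readings: full speed, then short bursts separated by pauses.
samples = [95, 96, 94, 0, 0, 0, 0, 0, 15, 20, 0, 0, 0, 0, 0, 0, 12, 0]
print(find_stalls(samples))  # [(3, 5), (10, 6)]
```

The timestamp of the first stall can then be compared with the job log and with firewall or switch logs from the same moment.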
As a result, the Backup Copy job takes ages and has never completed successfully. It also overlaps with other scheduled jobs, which cannot even start because concurrent-job limits are in place due to the machines' limited resources.
How can we diagnose the root cause of such slowdowns? How can we correct the situation?
Let me mention that we can easily copy very large files over Windows shares between the same machines using the same circuit. We tested this while troubleshooting.
Please do not think that I opened the case without doing any analysis first.
Currently we have the WAN accelerators switched off. When they were on (which we also tested while troubleshooting), it did not change anything. In addition, our partner (reseller) advised us not to use them at all after we had set them up ourselves while troubleshooting.
RAM and CPU utilization are normal during the jobs. In the Resource Monitor of the Veeam server, RAM stays around 50% of the assigned amount and CPU around 40%.
Veeam support (yes, I have opened a case) suggests seeding the offsite location. But that seems to be only a workaround, not a permanent solution.
Besides, there is already one workaround that we have applied. After the job stopped (or switched into the slow mode described above), we rebooted the Veeam server. This triggered a continuation of the Backup Copy jobs once the OS came back up. After several such cycles the job was finally able to complete. It required watching over it for the whole weekend. This is absolutely unacceptable and does not bode well for future machines.
Have you encountered similar issues? Can you share your working solution?
-
- Product Manager
- Posts: 15127
- Liked: 3232 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
- Contact:
Re: Backup Copy jobs (when first time pushed or full active initiated) stop working after some time
Hello,
and welcome to the forums.
"But usually after a few hours of transfers with full link capacity [...] the transfer slows down to ridiculous values so that it practically stops"
My suggestion would be to ask support to investigate why that happens. That seems to be the root cause.
Please post the support case number. Otherwise the forum post will be deleted eventually.
Best regards,
Hannes
-
- Enthusiast
- Posts: 42
- Liked: 6 times
- Joined: May 17, 2010 7:41 pm
- Full Name: Christian Moeller
- Location: Denmark
- Contact:
Re: Backup Copy jobs (when first time pushed or full active initiated) stop working after some time
Hi.
I have experienced that exact same issue, in both v11 and 12.1!
Support suggested a (Windows) reinstall of the proxies. That helped for some months, then the issue started again!
Recently I found that if I disable the proxies' (virtual) network cards (from vCenter) and re-enable them after a few seconds, data starts to flow at normal speed again. The speed of the lines is not the issue, because while Veeam is slow I can copy data at a much higher speed between the proxies involved.
-
- Novice
- Posts: 8
- Liked: never
- Joined: Sep 24, 2019 3:56 pm
- Full Name: Wayne Mery
- Contact:
Re: Backup Copy jobs (when first time pushed or full active initiated) stop working after some time
I have seen something similar with regular backup jobs (but not slow connections).
-
- Expert
- Posts: 136
- Liked: 12 times
- Joined: Nov 12, 2018 8:24 pm
- Full Name: Tim Thomas
- Contact:
Re: Backup Copy jobs (when first time pushed or full active initiated) stop working after some time
I have had similar issues too. In one case, a network device incorrectly tagged the traffic as malicious. We had to get it untagged and then it ran normally.
In another case, totally different, it went on for quite a while with no resolution. If I recall correctly, we had to completely recreate the backup copy job from scratch, and we never figured out the cause.