Host-based backup of VMware vSphere VMs.
uszy
Lurker
Posts: 2
Liked: never
Joined: Mar 28, 2024 12:29 pm
Full Name: Mariusz
Contact:

Backup Copy jobs (when first time pushed or full active initiated) stop working after some time

Post by uszy »

Dear Forum,

This is our first and very recent encounter with a Veeam product, so please forgive the perhaps naive questions from an inexperienced user...

We have two sites in our organization, each with VM hosts and their VMs.
There is a WAN link between them. Because the circuit's bandwidth is only 100 Mbps, we have set up a Backup Proxy that runs the Backup Jobs for the remote site VMs (so that this traffic stays within the remote site LAN). The local Veeam server (with its Backup Proxy role) backs up the local site VMs.

We then want to copy the local backups to the remote site and the other way around, so we have created Backup Copy jobs that copy backups across both our sites.

And now our issue appears. When we deploy a new machine, the initial full Backup job is executed by one of the Backup Proxies, depending on the site in which the VM has been deployed, and this works perfectly well. The Backup Copy job then follows and copies (or perhaps creates) a full backup at the other site for redundancy, in case the machine that keeps the local backups fails.

Because, as mentioned earlier, the circuit has limited bandwidth, the job takes several hours to copy a few hundred GB end to end.
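
As a rough sanity check on "several hours", here is a minimal sketch of the best-case transfer time. It assumes 300 GB as a stand-in for "a few hundred GB" (the exact size is not stated), decimal units, and no protocol overhead, compression, or deduplication. The function name is purely illustrative.

```python
def ideal_transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Best-case time in hours to move size_gb (decimal GB) over a link_mbps link."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    bits = size_gb * 1e9 * 8             # decimal gigabytes -> bits
    seconds = bits / (link_mbps * 1e6)   # megabits per second -> bits per second
    return seconds / 3600


# 300 GB is an assumed example size, not a figure from this thread.
print(f"{ideal_transfer_hours(300, 100):.1f} h")
```

Under these assumptions the result is roughly 6.7 hours, so a multi-hour initial copy is expected on this link. Only the later collapse in throughput is abnormal.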

But usually after a few hours of transfers with full link capacity (hard to say precisely how long after the start, because that period varies) the transfer slows down to ridiculous values so that it practically stops, even though the link is not utilized at all. When you watch the VB&R console and the Performance Monitor of the Veeam server to track the copy job's progress, you see that the job is still running, but the server sends (or receives) a burst of data only once every several seconds and then drops to zero. Then comes another burst of up to 10-20 Mbps for 1-2 seconds, and it drops to zero again. Another pause of several seconds follows, and so on...
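
To pin down when the burst-and-pause pattern begins, one option is to export per-second throughput samples (for example from Performance Monitor) and scan them for stalls. The following is only a sketch: the sample values are made up, and the function name, the 1 Mbps threshold, and the 3-sample minimum are assumptions to be tuned.

```python
from typing import List, Tuple


def find_stalls(samples_mbps: List[float],
                threshold_mbps: float = 1.0,
                min_len: int = 3) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of at least min_len
    consecutive samples below threshold_mbps."""
    stalls = []
    run_start = None
    for i, rate in enumerate(samples_mbps):
        if rate < threshold_mbps:
            if run_start is None:
                run_start = i            # a low-throughput run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                stalls.append((run_start, i - run_start))
            run_start = None
    # Close a run that extends to the end of the samples.
    if run_start is not None and len(samples_mbps) - run_start >= min_len:
        stalls.append((run_start, len(samples_mbps) - run_start))
    return stalls


# Made-up one-second samples imitating the pattern described above.
samples = [95, 96, 0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 12, 18, 0]
print(find_stalls(samples))
```

The timestamp of the first reported stall can then be compared with the job log and with network device logs for the same moment.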

As a result, the Backup Copy job takes ages and has never completed successfully. It also overlaps with other scheduled jobs, which cannot even start because we have set limits on concurrent jobs due to the machines' limited resources.

How can we diagnose the root cause of these slowdowns? How can we correct the situation?

Let me mention that we can easily copy huge files over Windows shares between the same machines using the same circuit. We tested that while troubleshooting.
Please do not think that I opened the case without any prior analysis.
Currently the WAN accelerators are switched off. When they were on, which we also tested while troubleshooting, it did not change anything. In addition, after we had set them up ourselves during our own troubleshooting, our partner (reseller) advised us not to use them at all.

RAM and CPU utilization is normal during the jobs. RAM oscillates around 50% of the assigned amount and CPU around 40% in the Resource Monitor of the Veeam server.

Veeam support (yes, I have opened a case) suggests seeding the offsite location. But that seems to be only a workaround, not a permanent solution.

Besides, there is already one workaround that we have applied. After the job stopped (or switched into the slow mode described above), we rebooted the Veeam server. The Backup Copy jobs resumed once the OS came back up. After several such cycles the job finally completed. It required watching it for the whole weekend. This is absolutely unacceptable and it bodes ill for future machines.

Have you encountered similar issues? Can you share your working solution?
HannesK
Product Manager
Posts: 14322
Liked: 2890 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: Backup Copy jobs (when first time pushed or full active initiated) stop working after some time

Post by HannesK »

Hello,
and welcome to the forums.
"But usually after a few hours of transfers with full link capacity [...] the transfer slows down to ridiculous values so that it practically stops"
My suggestion would be to ask support to investigate why that happens. That seems to be the root cause.

Please post the support case number. Otherwise the forum post will be deleted eventually.

Best regards,
Hannes
uszy
Lurker
Posts: 2
Liked: never
Joined: Mar 28, 2024 12:29 pm
Full Name: Mariusz
Contact:

Re: Backup Copy jobs (when first time pushed or full active initiated) stop working after some time

Post by uszy »

Case #07199125
chrmol
Enthusiast
Posts: 38
Liked: 2 times
Joined: May 17, 2010 7:41 pm
Full Name: Christian Moeller
Location: Denmark
Contact:

Re: Backup Copy jobs (when first time pushed or full active initiated) stop working after some time

Post by chrmol »

Hi.
I have experienced that exact same issue, in both ver. 11 and 12.1!

Support suggested a (Windows) re-install of the proxies. That helped for some months, then the issue started again!
Recently I found that if I disable the proxies' (virtual) network cards (from vCenter) and re-enable them after a few seconds, data starts to flow at normal speed again. The speed of the lines is not the issue, because while Veeam is slow I can copy data at much higher speed between the involved proxies.
wsmery
Novice
Posts: 5
Liked: never
Joined: Sep 24, 2019 3:56 pm
Full Name: Wayne Mery
Contact:

Re: Backup Copy jobs (when first time pushed or full active initiated) stop working after some time

Post by wsmery »

I have seen something similar with regular backup jobs (but not slow connections).
tthomas1@ebsco.com
Expert
Posts: 115
Liked: 10 times
Joined: Nov 12, 2018 8:24 pm
Full Name: Tim Thomas
Contact:

Re: Backup Copy jobs (when first time pushed or full active initiated) stop working after some time

Post by tthomas1@ebsco.com »

I have had similar issues too. In one case, a network device incorrectly tagged the traffic as malicious. We had to get it untagged and then it ran normally.

In another case, totally different, it went on for quite a while with no resolution. If I recall correctly, we had to completely recreate the backup copy job from scratch, and we never figured out the cause.