MCU_Networking
Enthusiast
Posts: 25
Liked: 5 times
Joined: May 02, 2016 10:21 pm
Full Name: Michael Taylor

Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by MCU_Networking » Mar 19, 2020 3:48 am

After upgrading to v10 from 9.5 Update 4, my Backup Copy jobs are painfully slow. After the upgrade the jobs were going so slow they were nowhere near finishing.
The interval was set to 1 hour previously. They used to take a couple of hours on average to copy. I am using WAN accelerators, with 8 Copy jobs that use them.
I have 3 Source WAN Accelerators and 2 Target WAN Accelerators. I have an MPLS that these jobs are going over that is 150-200 Mbps. 150 is guaranteed, but it could get up to 200.
I also use the Network Traffic Rules feature to limit traffic.

Reminder, here is the before setup, before v10 upgrade when things were fine:
-Copy Jobs intervals set to 1 day, starting at varying times, usually a few hours after the Backup Jobs
-I use Network Traffic Rules, which were set to 30-40 Mbps off hours and 5-15 Mbps during business hours
-Copy jobs took about 2-3 hours to copy their latest backup point on average
-I used WAN Accelerators: 3 Source WAN Accelerators and 2 Target WAN Accelerators (Sources had 50GB and Targets had 100GB cache each)
-Mixture of Copy Jobs where 5 are for Physical Servers and 3 are for Virtual Servers

I created a case, 04046131, and the technician thought that the cache may be corrupt. So they advised I follow these steps:
-Clear Cache for all WAN Accels
-Populate Cache on all WAN Accels (I specifically populated cache from Repositories that held the backups to be copied)
-Increase Backup Copy Job Interval (not specified how much longer)
-Technician mentioned kicking off an active full for the jobs.


I did not have space for a new active full of every job, so I deleted all "Backup (Copy)" backups to start fresh. I assumed that would cover the "Active Full" situation and free up space.
After all that and kicking off active fulls on 2 jobs, I saw no visible difference. Now I know that the WAN Acceleration will not benefit a whole lot from the Fulls because some of my servers are over or close to 1TB of data, but I do not remember it taking this long on v8 when I purchased Veeam a few years ago.

Here are some things I tried, listed roughly in order:
-Increase cache on WAN Accels to 100-150GB
-Some I increased after populating cache, fyi. They are definitely using the full GB I set though, actually over it in all cases
-Change the Network Traffic Rules to 50Mbps during business hours and 100Mbps outside of business hours
-Extended Copy Interval to 7 days on all jobs (after recommendation from another Veeam tech)
-Same "other" tech told me to try the new "High Bandwidth" mode. I switched to this between the 2nd and 3rd intervals mentioned below.




Current status, I have 2 of the Copy Jobs that have finished completely and are running ok in the incremental copies. They are both jobs that contain 1 Physical server.
Both are not small servers, one has 750GB and the other has 303GB. The 750GB server has a lot of files, it could be deduped fairly well I think, it took 16 hours at a processing rate of 15MB/s to finish the full.
The server that was 303GB took 10.5 hours at a processing rate of 15MB/s to get the first full. I have a couple of problem child jobs though.

The 1st problem child job is one that has 2 Physical servers in it. The 1st server is about 450GB and the 2nd is 1.5TB. In 3 separate interval tries (the 1st a 2 day interval, I think; the 2nd also a 2 day interval, I think; and the current one a 7 day interval), the 1st server finished its backup each night. The 2nd server, however, keeps getting interrupted and failing. Keep in mind the 1.5TB server has a lot of images on it, so it is not very dedupe-able or compressible. The first interval try ran at a 15MB/s processing rate for a duration of 55.5 hours. The first server (450GB) finished its Full in 9 hours. The rest of the time was the 2nd server, and it only seemed to get to about 900GB or so. The 2nd interval took about 33 hours and only got to about 500GB or so. The 3rd interval, the one it is on now, is about 22 hours in, with a processing rate of about 34MB/s, and has read about 55GB so far. Processing rate seems slow, but the amount of data it parsed so far makes me think it will slow down over time.


Another problem child is a copy job of Virtuals, where there are 28 virtuals in it and 10 of them have never finished the Full job in similar intervals listed above in problem child 1. Some of the virtuals have completed 3 backup points also. So this job is a hodge podge right now.





So some questions for those who still might be reading the essay above:
-Would it be recommended to restrict the Network Traffic Rules to something super low like 5-15Mbps again, or should I leave them at 50/100Mbps
-Since I didn't see a large difference between "High Bandwidth" mode and non "High Bandwidth" mode, should I take it back off?
-Combo of 1 and 2?
-I used to see moments of "effective speed", or whatever it used to be called in 9.5, where it was hundreds of MB/s in the Throughput graph, whereas now the highest I have seen has been in the 10's and low 20's.
-Does anyone else have a similar situation, with a server of about 1.5TB and a 30-100Mbps WAN/MPLS/Ethernet link, who can give me an idea of what their "Active/Full Backup" transfer time was?
-Maybe now that the copy interval is 7 days it will somehow complete?
-Anyone have any suggestions, or notice any obvious no-no's for best practice I may have broken that is causing it to be this slow?
-Anyone else seeing a slow down in the times after upgrading to version 10?
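
For the 1.5TB question above, a rough lower bound on the full transfer time can be worked out from the link speed alone. The sketch below is only an estimate under stated assumptions: decimal units (1TB = 10^12 bytes), the link fully available to the job, and no compression, deduplication, WAN acceleration savings, or protocol overhead. The function name is made up for illustration.

```python
def full_transfer_hours(size_tb: float, link_mbps: float) -> float:
    """Hours needed to move size_tb terabytes over a link of link_mbps megabits/s.

    Assumes decimal units and that every byte crosses the wire unreduced.
    """
    size_bits = size_tb * 1e12 * 8
    seconds = size_bits / (link_mbps * 1e6)
    return seconds / 3600


# The 1.5TB server at the link speeds mentioned in the question
for mbps in (30, 50, 100):
    print(f"{mbps} Mbps: {full_transfer_hours(1.5, mbps):.1f} hours")
```

Under those assumptions this works out to roughly 111 hours at 30Mbps, 67 hours at 50Mbps and 33 hours at 100Mbps, so a 1.5TB full that barely compresses is a multi-day transfer on this kind of link even when nothing is wrong.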

HannesK
Veeam Software
Posts: 5139
Liked: 681 times
Joined: Sep 01, 2014 11:46 am
Location: Austria

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by HannesK » Mar 19, 2020 6:07 am

Hello,
increasing the WAN cache usually does not help. You don't need cache at all at the source (only digest space which you cannot configure). For target, it's 10 GByte per operating system. So usually 20-40 GByte is enough.

750GB in 16 hours is about 109 MBit/s on average. That is an expected value. Totally fine for low bandwidth mode.
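
As a rough check of that figure, average throughput is just the data size divided by the duration. This is a minimal sketch that treats the 750GB as the amount of data processed; the function name and the `binary` switch are illustrative only. The exact result depends on whether GB is read as decimal (10^9 bytes) or binary (2^30 bytes), so it lands near the quoted 109 MBit/s rather than exactly on it.

```python
def average_mbit_per_s(size_gb: float, hours: float, binary: bool = False) -> float:
    """Average rate in megabits per second for size_gb processed in the given hours."""
    bytes_per_gb = 2**30 if binary else 10**9
    bits = size_gb * bytes_per_gb * 8
    return bits / (hours * 3600) / 1e6


print(f"decimal GB: {average_mbit_per_s(750, 16):.1f} MBit/s")
print(f"binary GB:  {average_mbit_per_s(750, 16, binary=True):.1f} MBit/s")
```

The two readings give about 104 and 112 MBit/s, which brackets the value above.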

High bandwidth mode helps above 100 MBit up to 1 GBit.

MCU_Networking wrote:
where it was hundreds of MB/s in the Throughput graph
That's hard to check for values in the past. I would ignore that graph for now and only look at the transfer times.

MCU_Networking wrote:
Maybe now that the copy interval is 7 days it will somehow complete?
For the initial full backup copy job, the long interval only helps to get fewer warnings. For the initial full, the BCJ can continue the transfer even if the interval expired. I have even seen it take more than two weeks at customers with a 2 MBit/s WAN connection and a 1 day interval... A long interval is only there to "fix" an incremental overload situation (incremental runs cannot continue; they start from the latest valid point).


I only have my "standard guess": do you have enough RAM for the WAN accelerators? If you combine roles on one system, then you need to have resources for each role (the recommendation for a WAN accelerator is at least 8 GByte RAM with 4 vCPUs).

Best regards,
Hannes

PetrM
Veeam Software
Posts: 362
Liked: 49 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by PetrM » Mar 19, 2020 3:41 pm

Hi Michael,

Which "bottleneck" do you see in job statistics? And was it the same before upgrade?
This information could be critically important when we're trying to find the root cause of any performance issue because it helps us to quickly detect the slowest processing stage.

I think one more option for you is to request an escalation of the support case; deep technical analysis would be required if Hannes's idea about RAM does not help.

Thanks!

MCU_Networking
Enthusiast
Posts: 25
Liked: 5 times
Joined: May 02, 2016 10:21 pm
Full Name: Michael Taylor

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by MCU_Networking » Mar 19, 2020 5:31 pm

The large copy job with the 1.5TB server finally finished its full! Supposedly, we will see in the next couple of days how quickly the incrementals go.
The bottleneck is usually Target WAN Accel. On one job right now, however, it says throttling and all percentages are lower than 10%. I don't believe the throttling is causing an issue, though: the Read is in the low hundreds of KB/s, the transfer speed is sitting at 0KB/s quite often, and the processing rate shows 2MB/s. The bandwidth setting is sitting at 50Mbps right now.



The source WAN Accelerators have the following:
Veeam Server(Physical): 32GB mem, 8/16 core processor, Raid 10 of 15K SAS drives
Extra Accel 1(Virtual): 6GB memory, 1/2 core processor, on SSD on all SSD array
Extra Accel 2(Virtual): 6GB memory, 1/1 core processor, on SSD on all SSD array

The target WAN Accelerators have the following:
Veeam DR Server(Physical): 32GB mem, 8/16 core processor, Raid 10 of NLSAS drives
Extra Accel 1 DR(Virtual): 6GB memory, 1/1 core processor, on SSD on all SSD array


So it sounds like I need to bump up resources on the Extra WAN Accelerators at the very least. Do the physicals seem good enough? The DR physical is only a repository and accelerator; it doesn't handle any jobs. The Extra WAN Accelerators are only WAN Accelerators, nothing else.

MCU_Networking
Enthusiast
Posts: 25
Liked: 5 times
Joined: May 02, 2016 10:21 pm
Full Name: Michael Taylor

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by MCU_Networking » Mar 19, 2020 5:33 pm

I upped the mem on the extra WAN Accels to 12GB. I cannot change the CPU count without shutting down the virtual though, which will disrupt jobs.

HannesK
Veeam Software
Posts: 5139
Liked: 681 times
Joined: Sep 01, 2014 11:46 am
Location: Austria

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by HannesK » Mar 20, 2020 8:24 am

For the repository it depends on the number of tasks you configured on the repository. 1 core & 4 GB RAM per task are recommended for a repository + 8 GB for Windows. So if you go with 4 tasks (16 GB) + OS (8 GB) + WAN accelerator (8 GB), then you are at 32 GB.
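
The sizing arithmetic above can be sketched as a small helper. The constants are only the figures quoted in this thread (4 GB per repository task, 8 GB for Windows, 8 GB for the WAN accelerator role), not values checked against current Veeam documentation, and the function name is hypothetical.

```python
RAM_PER_TASK_GB = 4         # per repository task, figure quoted in this thread
RAM_OS_GB = 8               # Windows, figure quoted in this thread
RAM_WAN_ACCELERATOR_GB = 8  # WAN accelerator role, figure quoted in this thread


def combined_server_ram_gb(repository_tasks: int, wan_accelerator: bool = True) -> int:
    """RAM in GB for a server that is a repository and, optionally, a WAN accelerator."""
    total = repository_tasks * RAM_PER_TASK_GB + RAM_OS_GB
    if wan_accelerator:
        total += RAM_WAN_ACCELERATOR_GB
    return total


print(combined_server_ram_gb(4))  # 4 tasks + OS + WAN accelerator -> 32
```

With 4 tasks this reproduces the 32 GB total, which matches the memory in both physical servers listed earlier; more concurrent tasks would push the estimate past what they have.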

lethallynx
Influencer
Posts: 19
Liked: 6 times
Joined: Aug 17, 2009 3:47 am
Full Name: Justin Kirkby

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by lethallynx » Mar 25, 2020 1:07 pm

Interestingly I saw the same thing happen after I upgraded from 9.5 update 4 to version 10.

All of my backup copy jobs were just running so slow after the Update to version 10.

I have tried tweaking the streams and also enabling High bandwidth mode but it doesn't really seem to have much of an effect.
Last thing I tried was clearing the cache on all accelerators then bumping all wan accelerators on the source side up to 2CPU and 8GB ram but they are barely using any CPU?

We are mainly trying to use them for the bandwidth savings.
For now I have had to disable the wan accelerators and just let it hammer our wan link.

Here is a screenshot of a job using WAN accelerators with freshly cleared caches.
I wish there was a way to see the percentage status on "Fingerprints are missing at source, loading them from the target..."
Image

HannesK
Veeam Software
Posts: 5139
Liked: 681 times
Joined: Sep 01, 2014 11:46 am
Location: Austria

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by HannesK » Mar 25, 2020 1:49 pm

Hello,
could you please open a case and upload logs? and please post the case number here for reference.

Thanks,
Hannes

lethallynx
Influencer
Posts: 19
Liked: 6 times
Joined: Aug 17, 2009 3:47 am
Full Name: Justin Kirkby

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by lethallynx » Mar 26, 2020 12:56 am

Case logged with support: 04082974

MCU_Networking
Enthusiast
Posts: 25
Liked: 5 times
Joined: May 02, 2016 10:21 pm
Full Name: Michael Taylor

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by MCU_Networking » Mar 30, 2020 3:17 am

lethallynx, looks like yours is doing the same thing as mine, just stalling. I say this based on the speed of the reading compared to the duration, they don't match in my opinion.

When you look at your sync log, found on the Veeam server in the C:\program data\Veeam\Backup\Copy_JobName\, do you see the following error repeated from about midway through the copy job on:
"Info [RemoteBackupTaskBuilder] Skip tasks full check due to no OIBs with specified objIds were changed."


And when you look in the ***.source log file, do you see repeated lines of:
"cli| Number of sessions: 7. Interval: 11996 sec." <<< seconds increment


Just curious if you see similar behavior and if a Veeam tech recognizes these errors, assuming they are errors. They pile up shortly after the copy job mysteriously just stops working behind the scenes and nothing shows on the front end of it.
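
For anyone who wants to check their own logs for the same pattern, here is a minimal sketch that counts the two messages quoted above and collects the "Interval" values so you can see whether they keep growing. The matching is based only on the two lines in this post; the function name and the second sample interval are made up for illustration, and real log lines may carry extra prefixes such as timestamps.

```python
import re
from collections import Counter

# Patterns taken from the two log lines quoted in this post
SKIP_FULL_CHECK = re.compile(r"Skip tasks full check due to no OIBs")
SESSION_INTERVAL = re.compile(r"Number of sessions: (\d+)\. Interval: (\d+) sec\.")


def summarize(lines):
    """Count the stall-related messages and collect the reported intervals."""
    counts = Counter()
    intervals = []
    for line in lines:
        if SKIP_FULL_CHECK.search(line):
            counts["skip_full_check"] += 1
        match = SESSION_INTERVAL.search(line)
        if match:
            counts["session_interval"] += 1
            intervals.append(int(match.group(2)))
    return counts, intervals


# Sample input: the first two lines are from this post, the third is invented
sample = [
    "Info [RemoteBackupTaskBuilder] Skip tasks full check due to no OIBs with specified objIds were changed.",
    "cli| Number of sessions: 7. Interval: 11996 sec.",
    "cli| Number of sessions: 7. Interval: 12056 sec.",
]
counts, intervals = summarize(sample)
print(dict(counts), intervals)
```

To run it against a real file, pass an open file object instead of the sample list, for example `summarize(open(path, errors="replace"))` with `path` pointing at your own log. A long run of session lines with steadily increasing intervals and no other activity would be consistent with the stall described here.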

HannesK
Veeam Software
Posts: 5139
Liked: 681 times
Joined: Sep 01, 2014 11:46 am
Location: Austria

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by HannesK » Mar 30, 2020 6:32 am

Hello,
please open a case and post the case number here for reference.

I just got feedback from one of "my" customers that got a hotfix (case #04067239) and everything is working fine now.

Best regards,
Hannes

Gostev
SVP, Product Management
Posts: 25868
Liked: 3986 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by Gostev » Mar 30, 2020 10:12 am

MCU_Networking wrote:
Mar 30, 2020 3:17 am
lethallynx, looks like yours is doing the same thing as mine, just stalling.
If the jobs are literally stalling, then our support has a hotfix for this already (this is caused by a race condition bug in v10).

If they are just slower than before upgrade, then this is not currently a known issue, so it needs to be investigated with support more closely. Because in our labs, v10 WAN accelerators are much faster than in the previous versions.

firefox15
Novice
Posts: 9
Liked: never
Joined: Aug 25, 2016 1:02 am
Full Name: Alex Bahret
Contact:

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by firefox15 » Mar 30, 2020 4:12 pm

I'm definitely seeing the same issue. My jobs have basically stalled out with no progress. I had to switch to direct mode to get anything to transfer. My support case is 04065475.

NickLittleNZ
Lurker
Posts: 2
Liked: 1 time
Joined: Apr 23, 2018 4:37 am
Full Name: Nick Little

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by NickLittleNZ » Mar 30, 2020 8:57 pm

Also seem to be seeing this issue, case# 04092588.

NickLittleNZ
Lurker
Posts: 2
Liked: 1 time
Joined: Apr 23, 2018 4:37 am
Full Name: Nick Little

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by NickLittleNZ » Mar 31, 2020 1:24 am 1 person likes this post

Have applied supplied hotfix from Veeam and initial testing is positive. :D

lethallynx
Influencer
Posts: 19
Liked: 6 times
Joined: Aug 17, 2009 3:47 am
Full Name: Justin Kirkby

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by lethallynx » Mar 31, 2020 12:05 pm

They got me to do an active full on my backup copy jobs....
I have been waiting for them to complete for the past few days!

I doubt it's going to fix anything; hopefully when I go back to support they offer me the hotfix!

obwielnls
Lurker
Posts: 2
Liked: never
Joined: Nov 21, 2018 2:00 am
Full Name: Bill Owens

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by obwielnls » Apr 06, 2020 12:16 am

Is this stalling hotfix included in the CU Patch 1?

HannesK
Veeam Software
Posts: 5139
Liked: 681 times
Joined: Sep 01, 2014 11:46 am
Location: Austria

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by HannesK » Apr 06, 2020 7:07 am

No, see Resolved Issues at www.veeam.com/kb3127. The hotfix came out too shortly before CU1 to be included. Please contact support to get it.

lethallynx
Influencer
Posts: 19
Liked: 6 times
Joined: Aug 17, 2009 3:47 am
Full Name: Justin Kirkby

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by lethallynx » Apr 06, 2020 1:33 pm 1 person likes this post

Hotfix worked a treat for my install!

Speeds are back to normal :)

Any idea if there are any issues applying CU1?
I am guessing I will need to re-apply the hotfix?
Will I need a new updated version of the hotfix that works with CU1?

Mike72677
Novice
Posts: 7
Liked: never
Joined: Nov 09, 2015 7:25 pm
Full Name: Mike T

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by Mike72677 » Apr 06, 2020 2:20 pm

Have the same issues with a Backup Copy Job using a WAN Accelerator. The job would sit on a random or the same VM and never finish. If I ran a full, it would work. I have Case # 04099901. I referred them to this forum post and they provided an updated VeeamWANSvc.exe as the hot fix. Running the job now.

Gostev
SVP, Product Management
Posts: 25868
Liked: 3986 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by Gostev » Apr 06, 2020 5:55 pm

@lethallynx the WAN hotfix is "perpendicular" to CP1. You can install it with or without CP1, since they belong to the same build, but don't overlap as far as the included product files. Thanks!

ferrus
Veeam ProPartner
Posts: 257
Liked: 32 times
Joined: Dec 03, 2015 3:41 pm
Location: UK

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by ferrus » Apr 09, 2020 2:59 pm

Is the hotfix a general one - suitable for any users, or just specific to people with the issue?

The reason I ask, is that we're just about to start using WAN acceleration, but don't have v9.5 baselines to check against.
Is it worth applying this fix anyway, or is it only for certain environments?

Gostev
SVP, Product Management
Posts: 25868
Liked: 3986 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Slow Copy Jobs w/WAN Accel after v10 upgrade

Post by Gostev » Apr 09, 2020 3:30 pm

The bug is caused by a deadlock issue due to a race condition that is specific to the particular virtual disk content pattern, so while the overall scope is relatively small - it's impossible to say which environments will be affected. Race conditions are tricky in general, for example we could never reproduce the issue in our own labs. Having said that, I think it's a good idea to install the hotfix regardless.

We're planning to include this in the Cumulative Patch 2 to be released in the next few weeks (currently waiting for enough "new" bugs to be reported, to justify creating one).
