jtupeck
Enthusiast
Posts: 76
Liked: 22 times
Joined: Aug 27, 2013 3:44 pm
Full Name: Jason Tupeck
Contact:

Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash

Post by jtupeck »

I have a customer using an S3260 (Windows Server 2016) as a target repo for Backup Jobs and then another S3260 offsite in an annex building connected via 10G fiber as a target for Backup Copy Jobs. Each S3260 has been carved into two volumes and added to a SOBR for each respective site. Everything has been working well for quite some time, but the engineer showed me something I wasn’t sure how to explain, which left me a little perplexed and concerned. Backup jobs work quite well, with CPU consumption on the production site S3260 being relatively balanced and throughput reaching 1-2 GB/s at times, but the offsite repo’s CPU does not fare well at all. Unless he throttles the bandwidth to the repo, CPU0 spikes to 100% usage and the machine becomes unresponsive. All other CPU cores are relatively unused.

Essentially, through trial and error he has found that he needs to throttle the network traffic to the offsite repository to ~225 MB/s in order to keep the target repository server from going offline. It appears that, even though it has multiple cores, CPU0 gets hammered during Backup Copy Jobs and eventually will tip the system over. It is my understanding that Backup Copy Jobs are now multithreaded and should be pushing multiple streams to the target repo, which we would anticipate could/would be load balanced for CPU consumption on the target end. The job window definitely shows ‘throttling’ as the bottleneck, and 225 MB/s is only ~1.8 Gbps by my calculations. That means they are leaving a heap of bandwidth on the table when it comes to getting their backups offsite, and they would like to eke as much performance out of this setup as possible.
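As a quick sanity check on the throttle figure against the 10G link, here is a minimal sketch of the conversion. It assumes decimal units (1 MB = 10^6 bytes, 1 Gbit = 10^9 bits) and ignores protocol overhead; the function name is just illustrative.

```python
def mbytes_to_gbits(mbytes_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mbytes_per_s * 8 / 1000


throttle_gbps = mbytes_to_gbits(225)  # throttle value from the post
link_gbps = 10.0                      # nominal 10G fiber link

print(f"throttle: {throttle_gbps:.2f} Gbit/s")
print(f"share of link used: {throttle_gbps / link_gbps:.0%}")
```

So the throttle works out to roughly a fifth of the nominal link speed.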

With it being a CPU core issue when network traffic is high, I thought maybe it would have something to do with Receive Side Scaling on the target device, but according to documentation that setting is on by default in a standalone S3260 setup, such as this one. We are going to confirm this tomorrow when working with them again, but I was hoping someone on the forums might have seen/heard of something similar and knew of a potential fix, or if this is expected behavior with a backup copy job target. Perhaps it’s even just a Windows issue where it’s not using additional cores for additional network streams? Hoping someone can help us to determine how to balance the incoming network traffic handling across more CPU and get these guys working with a bit more performance.
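To illustrate why the number of streams matters here: RSS assigns packets to receive queues (and so to cores) per connection, not per packet, so a single TCP stream always lands on one core even with RSS enabled. The toy model below is only a sketch of that idea. Real NICs use a Toeplitz hash with an indirection table; SHA-256 is a stand-in, and the IP addresses, ports, and queue count are made up for the example.

```python
import hashlib
from collections import Counter


def rss_queue(src_ip, src_port, dst_ip, dst_port, num_queues):
    """Pick a receive queue from a hash of the connection 4-tuple.

    Stand-in for the NIC's Toeplitz hash: the point is only that the
    same 4-tuple always maps to the same queue.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_queues


def spread(num_streams, num_queues=8):
    """Count how many streams land on each receive queue."""
    # Hypothetical flows: same hosts, one source port per stream.
    flows = [("10.0.0.10", 50000 + i, "10.0.1.20", 2500)
             for i in range(num_streams)]
    return Counter(rss_queue(*flow, num_queues) for flow in flows)


print("1 stream  :", dict(spread(1)))
print("16 streams:", dict(spread(16)))
```

With one stream, exactly one queue is ever used, however many cores are available. With many streams, the load has a chance to spread, which is why it is worth checking both the RSS setting and how many streams the job actually opens.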

Thanks in advance for anything you can suggest to help!
HannesK
Product Manager
Posts: 14322
Liked: 2890 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash

Post by HannesK »

Hello,
as you mention RSS... I remember from test results some years ago that Receive Side Scaling should be enabled on Cisco S3260 systems.

10 Gbit/s should be no problem for a fully loaded S3260.

Best regards,
Hannes
jtupeck
Enthusiast
Posts: 76
Liked: 22 times
Joined: Aug 27, 2013 3:44 pm
Full Name: Jason Tupeck
Contact:

Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash

Post by jtupeck »

Thank you, Hannes. We will be jumping into the customer's environment in about an hour and will check on the RSS setting then. I'll update again when I know more, but I really hope it's that easy of a fix. Thanks for the reply!
nitramd
Veteran
Posts: 297
Liked: 85 times
Joined: Feb 16, 2017 8:05 pm
Contact:

Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash

Post by nitramd » 2 people like this post

Hello Jason.

I can attest to Hannes's comment that 10 Gbit/s is not a problem for a fully loaded S3260. I have several of these in prod and they don't break a sweat CPU-wise. I'm throwing hundreds of GBs at them and they gleefully ask for more.

In addition to RSS being enabled, as Hannes mentioned, check Cisco's website to see if you can tune other BIOS parameters. If you're able, check for the number of dropped packets on the affected repo.

Also, do you have parallel processing enabled?
Gostev
Chief Product Officer
Posts: 31561
Liked: 6725 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash

Post by Gostev » 2 people like this post

Starting from v10, it is impossible to disable parallel processing :) since the corresponding check box is gone, while it is automatically enabled for all jobs when upgrading to v10.
jtupeck
Enthusiast
Posts: 76
Liked: 22 times
Joined: Aug 27, 2013 3:44 pm
Full Name: Jason Tupeck
Contact:

Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash

Post by jtupeck » 1 person likes this post

@Gostev, thank you! That's good info to know. They are still on 9.5U4 right now and we looked at the setting today...it was unchecked! @nitramd - I swear it was checked the other day when I looked, but sure enough...it wasn't. We reenabled it this morning and are ratcheting the streams number up slowly to default over the next couple days. Also verified that the RSS settings in UCS were configured correctly, so I anticipate we should be seeing better balancing tonight when the BCJs refresh. Fingers crossed.
jtupeck
Enthusiast
Posts: 76
Liked: 22 times
Joined: Aug 27, 2013 3:44 pm
Full Name: Jason Tupeck
Contact:

Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash

Post by jtupeck »

nitramd wrote: Jan 12, 2021 3:57 pm I can attest to Hannes's comment 10Gbs is not a problem for a fully loaded 3260 - I have several of these in prod, they don't break a sweat CPU-wise. I'm throwing hundreds of GBs at them and they gleefully ask for more.
Just as an aside...are you running Windows on the S3260, by any chance? Or Linux? We may need to adjust the NIC:CPU affinity if the setting changes we made today don't have any effect. A colleague was wondering if you were on Windows as well and if you had to tweak these settings at all.
nitramd
Veteran
Posts: 297
Liked: 85 times
Joined: Feb 16, 2017 8:05 pm
Contact:

Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash

Post by nitramd » 1 person likes this post

Jason, yes I'm running Windows on one of my S3260s while the others are running Linux. I also have a couple of older S3160s that also run Windows and they are due to be retired at some point.

For the Windows based repos, no they did not need tweaking, AFAIK. I tweaked the Linux repos to reduce dropped packets.

Edited for clarification.
jtupeck
Enthusiast
Posts: 76
Liked: 22 times
Joined: Aug 27, 2013 3:44 pm
Full Name: Jason Tupeck
Contact:

Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash

Post by jtupeck »

Awesome. Thank you!
jtupeck
Enthusiast
Posts: 76
Liked: 22 times
Joined: Aug 27, 2013 3:44 pm
Full Name: Jason Tupeck
Contact:

Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash

Post by jtupeck »

Gostev wrote: Jan 12, 2021 4:08 pm Starting from v10, it is impossible to disable parallel processing :) since the corresponding check box is gone, while it is automatically enabled for all jobs when upgrading to v10.
Gostev - are you talking about the 'use multiple streams' checkbox in the Global Network Traffic rules window? I just looked at v10 and v11 beta, and that checkbox is still there and I didn't see anything in the 9.5 Backup Copy Job settings for a 'per job' change, so now I am wondering what parallel processing setting you are referring to. Please educate me senpai! :)

Edit: I think I may have found the answer. Is it possible you're referring to the parallel processing setting that was removed in 9.5U4? I found a couple of other forum posts from around the time it was removed:

veeam-backup-replication-f2/parallel-pr ... 56766.html
veeam-backup-replication-f2/9-5-update- ... 56962.html
Gostev
Chief Product Officer
Posts: 31561
Liked: 6725 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash

Post by Gostev » 1 person likes this post

You are right, it looks like the change was done one version before v10.