-
- Enthusiast
- Posts: 76
- Liked: 22 times
- Joined: Aug 27, 2013 3:44 pm
- Full Name: Jason Tupeck
- Contact:
Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash
I have a customer using an S3260 (Windows Server 2016) as a target repo for Backup Jobs and another S3260 offsite in an annex building, connected via 10G fiber, as a target for Backup Copy Jobs. Each S3260 has been carved into two volumes and added to a SOBR for its respective site. Everything has been working well for quite some time, but the engineer showed me something I wasn’t sure how to explain, which left me a little perplexed and concerned. Backup jobs work quite well, with CPU consumption on the production site S3260 being relatively balanced and throughput reaching 1-2 GB/s at times, but the offsite repo’s CPU does not fare well at all. CPU0 spikes to 100% usage and causes the machine to become unresponsive if he does not throttle the bandwidth to the repo. All other CPU cores are relatively unused.
Essentially, through trial and error he has found that he needs to throttle the network traffic to the offsite repository to ~225 MB/s in order to keep the target repository server from going offline. It appears that, even though it has multiple cores, CPU0 gets hammered during Backup Copy Jobs and eventually will tip the system over. It is my understanding that Backup Copy Jobs are now multithreaded and should be pushing multiple streams to the target repo, which we would anticipate could/would be load balanced for CPU consumption on the target end. The job window definitely shows ‘throttling’ as the bottleneck, and 225 MB/s is only ~1.8 Gbps by my calculations, which means they are leaving a heap of bandwidth on the table when it comes to getting their backups offsite. They would like to eke as much performance out of this setup as possible.
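For anyone checking the arithmetic, here is a minimal sketch of the unit conversion. It assumes decimal units (1 MB = 10^6 bytes, 1 Gbit = 10^9 bits); the function name is made up for illustration.

```python
def mb_per_s_to_gbit_per_s(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second (decimal units)."""
    # 1 byte = 8 bits; 1000 Mbit = 1 Gbit
    return mb_per_s * 8 / 1000


throttle_mb_s = 225   # throttle value from the post
link_gbit_s = 10      # nominal speed of the 10G fiber link

used_gbit_s = mb_per_s_to_gbit_per_s(throttle_mb_s)
print(f"{throttle_mb_s} MB/s = {used_gbit_s:.2f} Gbit/s")
print(f"Share of a {link_gbit_s} Gbit/s link: {used_gbit_s / link_gbit_s:.0%}")
```

So the throttle caps the copy traffic at a bit under a fifth of the nominal link speed.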
With it being a CPU core issue when network traffic is high, I thought maybe it would have something to do with Receive Side Scaling on the target device, but according to documentation that setting is on by default in a standalone S3260 setup, such as this one. We are going to confirm this tomorrow when working with them again, but I was hoping someone on the forums might have seen/heard of something similar and knows of a potential fix, or can say whether this is expected behavior with a backup copy job target. Perhaps it’s even just a Windows issue where it’s not using additional cores for additional network streams? Hoping someone can help us determine how to balance the incoming network traffic handling across more CPU cores and get these guys working with a bit more performance.
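To illustrate why RSS matters for this symptom, here is a toy model of how receive processing gets assigned to cores. It is only a sketch: real RSS uses a Toeplitz hash and an indirection table in the NIC/driver, whereas this uses `zlib.crc32` as a stand-in, and the IP addresses, ports, and function names are invented for the example.

```python
import zlib
from collections import Counter


def pick_core(flow, num_cores, rss_enabled=True):
    """Return the core that would handle receive processing for a flow.

    flow is a (src_ip, src_port, dst_ip, dst_port) tuple.
    Without RSS, everything is modeled as landing on core 0.
    """
    if not rss_enabled:
        return 0
    key = "|".join(map(str, flow)).encode()
    # crc32 is a stand-in for the real Toeplitz hash (assumption for the demo)
    return zlib.crc32(key) % num_cores


def distribute(flows, num_cores, rss_enabled=True):
    """Count how many flows land on each core."""
    return Counter(pick_core(f, num_cores, rss_enabled) for f in flows)


# 64 hypothetical TCP streams from one source host to one repository port
flows = [("10.0.0.10", 50000 + i, "10.0.1.20", 2500) for i in range(64)]

print("RSS off:", dict(distribute(flows, 8, rss_enabled=False)))
print("RSS on: ", dict(distribute(flows, 8, rss_enabled=True)))
```

Two things fall out of the model: with RSS off, every stream piles onto core 0 no matter how many streams there are, and even with RSS on, a single stream always maps to the same core, so spreading the load needs both RSS and multiple streams.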
Thanks in advance for anything you can suggest to help!
-
- Product Manager
- Posts: 14844
- Liked: 3086 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
- Contact:
Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash
Hello,
As you mention RSS: I remember from test results some years ago that Receive Side Scaling should be enabled on Cisco S3260 systems.
10 Gbit/s should be no problem for a fully loaded S3260.
Best regards,
Hannes
-
- Enthusiast
- Posts: 76
- Liked: 22 times
- Joined: Aug 27, 2013 3:44 pm
- Full Name: Jason Tupeck
- Contact:
Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash
Thank you, Hannes. We will be jumping into the customer's environment in about an hour and will check on the RSS setting then. I'll update again when I know more, but I really hope it's that easy of a fix. Thanks for the reply!
-
- Veteran
- Posts: 298
- Liked: 85 times
- Joined: Feb 16, 2017 8:05 pm
- Contact:
Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash
Hello Jason.
I can attest to Hannes's comment: 10 Gbit/s is not a problem for a fully loaded S3260. I have several of these in prod, and they don't break a sweat CPU-wise. I'm throwing hundreds of GBs at them and they gleefully ask for more.
In addition to RSS being enabled, as Hannes mentioned, check Cisco's website to see if you can tune other BIOS parameters. If you're able, check the number of dropped packets on the affected repo.
Also, do you have parallel processing enabled?
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash
Starting from v10, it is impossible to disable parallel processing since the corresponding check box is gone, while it is automatically enabled for all jobs when upgrading to v10.
-
- Enthusiast
- Posts: 76
- Liked: 22 times
- Joined: Aug 27, 2013 3:44 pm
- Full Name: Jason Tupeck
- Contact:
Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash
@Gostev, thank you! That's good info to know. They are still on 9.5U4 right now and we looked at the setting today...it was unchecked! @nitramd - I swear it was checked the other day when I looked, but sure enough...it wasn't. We reenabled it this morning and are ratcheting the streams number up slowly to default over the next couple days. Also verified that the RSS settings in UCS were configured correctly, so I anticipate we should be seeing better balancing tonight when the BCJs refresh. Fingers crossed.
-
- Enthusiast
- Posts: 76
- Liked: 22 times
- Joined: Aug 27, 2013 3:44 pm
- Full Name: Jason Tupeck
- Contact:
Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash
Just as an aside...are you running Windows on the S3260, by any chance? Or Linux? We may need to adjust the NIC-to-CPU affinity if the settings changes we made today don't have any effect. A colleague was wondering if you were on Windows as well and if you had to tweak these settings at all.
-
- Veteran
- Posts: 298
- Liked: 85 times
- Joined: Feb 16, 2017 8:05 pm
- Contact:
Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash
Jason, yes I'm running Windows on one of my S3260s while the others are running Linux. I also have a couple of older S3160s that also run Windows and they are due to be retired at some point.
For the Windows based repos, no they did not need tweaking, AFAIK. I tweaked the Linux repos to reduce dropped packets.
Edited for clarification.
-
- Enthusiast
- Posts: 76
- Liked: 22 times
- Joined: Aug 27, 2013 3:44 pm
- Full Name: Jason Tupeck
- Contact:
Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash
Awesome. Thank you!
-
- Enthusiast
- Posts: 76
- Liked: 22 times
- Joined: Aug 27, 2013 3:44 pm
- Full Name: Jason Tupeck
- Contact:
Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash
Gostev - are you talking about the 'use multiple streams' checkbox in the Global Network Traffic rules window? I just looked at v10 and v11 beta, and that checkbox is still there and I didn't see anything in the 9.5 Backup Copy Job settings for a 'per job' change, so now I am wondering what parallel processing setting you are referring to. Please educate me senpai!
Edit: I think I may have found the answer. Is it possible you're referring to the parallel processing setting that was removed in 9.5U4? I found a couple of other forum posts about it from when it was removed:
veeam-backup-replication-f2/parallel-pr ... 56766.html
veeam-backup-replication-f2/9-5-update- ... 56962.html
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Backup Copy Job Processing - CPU Usage on Target Repo Causes Crash
You are right, it looks like the change was done one version before v10.