-
- Enthusiast
- Posts: 85
- Liked: 8 times
- Joined: Jun 11, 2012 3:17 pm
- Contact:
Sizing Veeam for large environments
I did a little noodling on the boards, and couldn't find a good answer to this question, so I figured I'd throw this out there.
What are the best practices for sizing a Veeam server? System in question:
Virtual Machine - Windows 2008 R2 Standard
4 vCPU
32GB RAM
Locally attached R60 storage on LSI 9280 RAID controller (48 1TB drives, 10+2 RAID60 stripe configuration)
Remote SQL 2008 Cluster
4 Proxies, 1vCPU, 4GB RAM, Hot-Add backup mode.
Reverse Incremental backup
Currently backing up ~130 VMs, 17.76TB utilized on storage.
The storage is slow, and is the current bottleneck, but that is being solved with a second deployment of Veeam, and using 48 drives in RAID10, rather than RAID60.
Question is - how large can I theoretically grow 1 Veeam master server, and what are the recommendations around Storage Repositories? Right now there are two storage repositories that are configured in the master server that reside on the local RAID array. Memory utilization during the backup window is high, along with CPU utilization (95% spike at beginning of backup window, and flattening to 60% until the end of the backup window).
I'm worried that continuing to add backup jobs will continue to add load to the primary Veeam server. I've considered simply adding a second physical server just as a Windows CIFS share, and adding it as a second repository, but I worry that adding more jobs will continue to compound the CPU utilization on the primary Veeam server. Is it typical to see that type of CPU utilization on the primary Veeam server, even when using proxies?
How are people who back up a large number of VMs handling these questions, and how are you sizing your systems?
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Sizing Veeam for large environments
So you are using your primary Veeam server as the repository? For scaling I would strongly recommend dedicating the Veeam "master" server and using a separate repository server. The repository will always have high memory utilization during backups as the number of jobs increases, because each job starts a VeeamAgent.exe process to receive data from the proxy. This process will use enough memory to hold the dedupe hashes for the entire job and can grow quite large if your backup jobs are large. Also, if your target storage is slow, Windows will use a lot of memory for the write cache, so memory usage will grow pretty much to maximum if the target device cannot keep up with flush requests.
The general sizing recommendation is 2 CPU cores + 2GB RAM per concurrent job on proxies and repositories and about 512MB of RAM per concurrent job on the Veeam manager server. If a server is running multiple roles, you'll need to add the numbers together. Unfortunately, Windows has no real way to reserve memory for specific processes so if you are running multiple roles together (for example, repository on management server) the memory and CPU usage of the repository services can "starve" the management processes and cause random and unexpected job failures. This is why I prefer separating out the roles.
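As a rough illustration of the arithmetic above, here is a minimal Python sketch that adds up the per-job figures for each role hosted on one server. The function and role names are made up for this example, it assumes each role contributes its own per-job figure, and it ignores operating system overhead. The post gives no CPU figure for the management role, so the sketch counts none for it.

```python
from dataclasses import dataclass
from typing import Iterable

# Per-concurrent-job figures quoted in the post above.
PROXY_REPO_CORES_PER_JOB = 2
PROXY_REPO_RAM_GB_PER_JOB = 2.0
MANAGER_RAM_GB_PER_JOB = 0.5  # "about 512MB"


@dataclass
class Sizing:
    cpu_cores: int
    ram_gb: float


def size_server(concurrent_jobs: int, roles: Iterable[str]) -> Sizing:
    """Sum the per-job recommendations for every role hosted on one server.

    Role names ("proxy", "repository", "manager") are hypothetical labels
    for this sketch. OS overhead is not included.
    """
    cores = 0
    ram = 0.0
    for role in roles:
        if role in ("proxy", "repository"):
            cores += PROXY_REPO_CORES_PER_JOB * concurrent_jobs
            ram += PROXY_REPO_RAM_GB_PER_JOB * concurrent_jobs
        elif role == "manager":
            # The post only gives a RAM figure for the management server.
            ram += MANAGER_RAM_GB_PER_JOB * concurrent_jobs
        else:
            raise ValueError(f"unknown role: {role}")
    return Sizing(cores, ram)


if __name__ == "__main__":
    # A combined management + repository server running 4 concurrent jobs.
    print(size_server(4, ["repository", "manager"]))
```

By this arithmetic, a combined management and repository server running 4 concurrent jobs would want 8 cores and 10GB of RAM for Veeam alone.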
I notice you only have 1 vCPU per proxy. It's generally recommended to have 2 vCPUs per concurrent job if using compression and dedupe. This may not be an issue in your case if the target is the bottleneck, but faster storage would likely move the bottleneck to the proxy CPU.
Having a storage system with battery-backed write caching is important for reverse incremental storage. Also make sure that the Windows caching settings are configured correctly on the device (under Policies on the volume settings you can disable Windows write cache buffer flushing if you have BBWC storage).
A dedicated Veeam management server generally has no issues running dozens of jobs. I know many clients that run 50 or more jobs, and some with even more than that (there is actually a hard limit of 100 concurrent jobs with 6.5), and these clients back up thousands of VMs with a single Veeam management server. Many of these are virtual machines with only 4 vCPUs and a nice chunk of memory. Some clients get by with significantly less than the 512MB per job recommendation from above for the dedicated management server, but I recommend it because it should work in pretty much any environment. You can monitor the size of your Veeam.Backup.Manager.exe processes while the backups are running to get some idea of the amount of memory needed per job. It may be significantly less in your environment, based on its total size.
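One possible way to take that measurement is sketched below in Python, using the standard Windows `tasklist` command with CSV output. The helper names are hypothetical, the script is not a Veeam tool, and the parsing assumes the usual five-column `tasklist` layout with memory reported in K. Run it a few times during the backup window and compare against the number of running jobs.

```python
import csv
import io
import re
import subprocess

PROCESS_NAME = "Veeam.Backup.Manager.exe"  # process name taken from the post


def parse_tasklist_csv(text):
    """Return {pid: memory in MB} from `tasklist /FO CSV /NH` output."""
    usage = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 5:
            continue  # e.g. the "INFO: No tasks are running..." message
        # "204,800 K" -> "204800"; also tolerates other digit separators.
        digits = re.sub(r"\D", "", row[4])
        if digits:
            usage[int(row[1])] = int(digits) / 1024
    return usage


def sample_manager_memory():
    """Query tasklist (Windows only) for the manager processes."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {PROCESS_NAME}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tasklist_csv(result.stdout)


if __name__ == "__main__":
    usage = sample_manager_memory()
    for pid, mb in sorted(usage.items()):
        print(f"PID {pid}: {mb:.0f} MB")
    if usage:
        print(f"Average per process: {sum(usage.values()) / len(usage):.0f} MB")
```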
So, in your case, I suspect your scaling issues are related more to the repository memory requirements than to the management server itself. But even there, when using reverse incremental, you will likely exceed the performance capacity of the storage long before you hit the memory limits. There's not much use in continuing to add concurrent jobs to a repository once the disk becomes the bottleneck. You can set the repository concurrent job maximum to limit the total number of jobs that the repository will service at one time. If this limit is hit, other jobs in the queue will simply wait for the repository to become available.
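To picture the queueing effect of a concurrent job maximum, here is a small Python simulation. It is only a simplified model with a hypothetical `simulate_queue` function: jobs start in list order, each takes a fixed number of minutes, and a waiting job starts as soon as any slot frees up. It does not reflect how Veeam actually schedules jobs, and it ignores the disk slowdown that extra concurrency would cause.

```python
import heapq


def simulate_queue(durations, max_concurrent):
    """Return (start, end) times per job with at most max_concurrent running.

    Jobs are started in list order; a job waits until a slot is free.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    running = []   # min-heap of end times for jobs currently holding a slot
    schedule = []
    for duration in durations:
        if len(running) < max_concurrent:
            start = 0.0                      # a slot is free immediately
        else:
            start = heapq.heappop(running)   # wait for the earliest finisher
        end = start + duration
        heapq.heappush(running, end)
        schedule.append((start, end))
    return schedule


if __name__ == "__main__":
    # Four jobs (minutes) against a repository limited to 2 concurrent jobs.
    for i, (start, end) in enumerate(simulate_queue([60, 30, 45, 20], 2)):
        print(f"job {i}: starts at {start:.0f} min, ends at {end:.0f} min")
```

With a limit of 2, the third and fourth jobs in this example wait 30 and 60 minutes respectively before they start.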
-
- Veeam Vanguard
- Posts: 26
- Liked: 1 time
- Joined: Jan 17, 2013 5:09 pm
- Full Name: Stephen Seagrave
- Contact:
Re: Sizing Veeam for large environments
I back up 250+ servers a night, and the key to it is to use physical proxies with direct attached storage.
Use one Veeam install on the VM master server to run all of the jobs, then add the physical servers as proxies and repositories. Leave automatic proxy selection enabled to load balance between them, and disable the default proxy on the VM master server.
All of the CPU and RAM load for the reverse incremental processing moves to the physical proxies. This way you can also use SAN transport mode.
-
- Enthusiast
- Posts: 85
- Liked: 8 times
- Joined: Jun 11, 2012 3:17 pm
- Contact:
Re: Sizing Veeam for large environments
Thanks for the hint - I always wondered why the master server utilized so much RAM, I never thought about it being due to the repository rather than the actual Veeam management system.
-
- Enthusiast
- Posts: 85
- Liked: 8 times
- Joined: Jun 11, 2012 3:17 pm
- Contact:
Re: Sizing Veeam for large environments
Just out of curiosity - about how large are you sizing your proxy/repository systems? Single Socket, Dual Socket, lots of RAM, little RAM, RAID5/50, RAID10, RAID60? TB per system?
-
- Enthusiast
- Posts: 36
- Liked: never
- Joined: Feb 09, 2010 8:26 pm
- Full Name: Chad
- Contact:
Re: Sizing Veeam for large environments
Our largest Veeam server is backing up ~150 VMs @ over 30TB no problem. 2x quad core, 32 GB RAM, FC attached to Compellent SAN. We have 11 jobs configured plus a couple of once-a-week jobs, and we set it to run 3 concurrent jobs. We ran into some occasional failures when running 4 concurrent jobs using the 2 cores per job formula, but I imagine with all the tasks being run on this one server it was a bit much. No problem meeting our backup windows.
-Chad
-
- Enthusiast
- Posts: 85
- Liked: 8 times
- Joined: Jun 11, 2012 3:17 pm
- Contact:
Re: Sizing Veeam for large environments
Thanks for all of the comments - I really appreciate it. This helps me with figuring out random issues that I've seen over the last year, and will definitely influence my architecture decisions in the future.
-
- Veeam Vanguard
- Posts: 26
- Liked: 1 time
- Joined: Jan 17, 2013 5:09 pm
- Full Name: Stephen Seagrave
- Contact:
Re: Sizing Veeam for large environments
One thing to keep in mind: when you see Target as the bottleneck, it does not necessarily mean the disk you are writing to. It means the repository system as a whole, and that includes the CPU and RAM usage for reverse incremental and deduplication processing.
If you have large VMs, these can take a lot of RAM and repository power to process. I did have issues going to disk and I thought that my MSA with RAID 5 was the problem; however, I just gave the repository server more RAM and CPU and the problem went away. It was just running out of resources trying to do reverse incremental and dedupe on a few large 1TB+ VMs.
My rule of thumb is that large VMs need more RAM on the proxy/repository to process, and a large number of VMs means more jobs running concurrently, so you need more CPU.
But if you have one Veeam install controlling the physical proxy/repository, then you can always just add more physical proxies and repositories and the auto selection will just load balance across them all.
-
- Enthusiast
- Posts: 62
- Liked: never
- Joined: Nov 03, 2011 2:55 pm
- Full Name: Ivor Dillen
- Contact:
Re: Sizing Veeam for large environments
I was wondering: if I can adjust the write vs. read cache, what is the best ratio for the Veeam repository?
-
- VeeaMVP
- Posts: 6166
- Liked: 1971 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
- Contact:
Re: Sizing Veeam for large environments
It depends on the chosen backup mode, you can see the I/O profiles of any mode in the paper I recently published:
http://www.veeam.com/wp-veeam-backup-re ... mance.html
Also, remember any restore operation will be 100% read
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software
@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
-
- Influencer
- Posts: 12
- Liked: never
- Joined: May 05, 2014 2:49 pm
- Full Name: Denis Ishchishin
Re: Sizing Veeam for large environments
Hey guys. I need to do RAM sizing for a repository heavily loaded with concurrent jobs. The max values from System Requirements do not work for me, because in my case I will obviously use much less and it makes a huge difference at my scale. Please advise an appropriate value for these parameters (only incremental sessions):
- only Backup Copy
- 51 concurrent jobs
- biggest increment ~13 GB (before deduplication and compression)
- retention of 15 points each
I would also appreciate it if you described the logic behind your calculation.
-
- Product Manager
- Posts: 6551
- Liked: 765 times
- Joined: May 19, 2015 1:46 pm
- Contact:
Re: Sizing Veeam for large environments
Hi,
That depends on your job block size settings and .vbk size... There's already been a discussion on repo's memory, please follow this link. Calculations there were based on 1 Job, so I believe that multiplication by 51 (number of jobs) would be valid in your case.
Thank you.