Please forgive my delay.
How much memory?
V11 is simply awesome with RAM utilization. If you are running V10, open Windows Resource Monitor on the Memory tab to monitor RAM utilization. When your server is working at its maximum speed, you will see that the orange ("Modified") portion of the bar often takes up half of the available RAM. On V11 that orange segment is practically invisible.
During my tests, running up to 45 concurrent backup streams in "per-VM file" mode, I saw memory utilization normally stay below 45GB. Only in rare situations did I see it go above 100GB. For this reason, as a precaution, I suggest installing a little more than 100GB. The HPE Apollo 4510 Gen10 gives the best performance when there are exactly 12 DIMMs (6 per CPU).
My recommended RAM configuration is 12 x 16GB = 192GB. Maybe 12 x 8GB = 96GB is enough, but the savings are not worth the risk of slowdowns.
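To make the reasoning behind that recommendation explicit, here is a minimal sketch of the arithmetic. The safety margin is an illustrative assumption of mine, not a Veeam sizing rule; the observed figures come from the tests above.

```python
# Rough RAM sizing sketch for the repository server (illustrative assumptions only):
# ~1GB per concurrent backup stream observed (45 streams -> ~45GB normal use),
# with rare peaks above 100GB, so size for a bit more than 100GB.

DIMM_SLOTS = 12             # best performance with exactly 12 DIMMs (6 per CPU)
OBSERVED_PEAK_GB = 100      # rare worst case seen during testing
SAFETY_MARGIN = 1.2         # illustrative margin on top of the observed peak

needed_gb = OBSERVED_PEAK_GB * SAFETY_MARGIN
for dimm_gb in (8, 16, 32):
    total_gb = DIMM_SLOTS * dimm_gb
    verdict = "enough" if total_gb >= needed_gb else "too tight"
    print(f"12 x {dimm_gb}GB = {total_gb}GB -> {verdict}")
```

With these assumptions, 12 x 8GB comes out too tight and 12 x 16GB is the first comfortable configuration, which matches the recommendation above.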
How many disk-array controllers?
This configuration has 2 "HPE Smart Array P408i-p SR Gen10" controllers. My tests showed that a single controller cannot write more than 3GB/s. For this reason, I installed two controllers, which gave me about 6GB/s of write throughput (remember, data is compressed and deduplicated, so the backup speed is about 2x higher). With 2 controllers, it is possible to assign 30 spindles to each one. Each controller hosts one RAID60 built on 2 RAID6 parity groups of 14 disks each, so each controller manages 29 disks: 28 for data plus one hot spare.
In total there are 58 "HPE 16TB SAS 12G 7.2K LFF" drives, for a net usable capacity of 768TB.
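As a sanity check, here is the arithmetic behind the 768TB figure, written as a small script that simply restates the layout above (nothing vendor-specific is assumed):

```python
# Net usable capacity of the layout described above: 2 controllers, each with
# one RAID60 made of 2 RAID6 parity groups of 14 disks plus 1 hot spare,
# all on 16TB drives.

DISK_TB = 16
CONTROLLERS = 2
RAID6_GROUPS_PER_CONTROLLER = 2
DISKS_PER_RAID6_GROUP = 14
PARITY_DISKS_PER_GROUP = 2          # RAID6 tolerates two failures per group
HOT_SPARES_PER_CONTROLLER = 1

data_disks_per_group = DISKS_PER_RAID6_GROUP - PARITY_DISKS_PER_GROUP      # 12
usable_tb = (CONTROLLERS * RAID6_GROUPS_PER_CONTROLLER
             * data_disks_per_group * DISK_TB)
total_disks = CONTROLLERS * (RAID6_GROUPS_PER_CONTROLLER * DISKS_PER_RAID6_GROUP
                             + HOT_SPARES_PER_CONTROLLER)

print(f"{total_disks} disks installed, {usable_tb}TB net usable")           # 58, 768TB
```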
Why RAID60 instead of 2 RAID6 per controller, or simply a larger RAID6?
A RAID60 is a hardware-managed stripe of multiple RAID6 groups.
It is as fast as the sum of its subcomponents and offers effective load balancing across more disks. With large 16TB spindles, when a disk fails it takes several hours to rebuild. A RAID6 survives 2 disk failures, so if a second disk fails during the rebuild window, there is no data loss. It is common practice not to create RAID6 groups larger than 14-16 disks, both to reduce the risk of multiple concurrent disk failures and to limit the rebuild impact on overall performance.
This configuration also includes 2 SSDs in a mirror, installed on the CPU blade and connected to a third controller (a P408i-a). The 2 SSDs are intended for the OS and for the Veeam vPower NFS cache.
What is the best RAID strip size?
This controller gives the option to set the strip size, and this value has an impact on the overall performance. I would like to say that there is a mathematical rule to find the best value, but it is influenced by so many variables that the best way to find the fastest strip size is... testing all the possible settings. Yes, it takes time, but you have someone who did it for you.
The fastest strip size is 256K, but 128K is close to it.
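If you want to repeat the comparison on your own hardware, a crude sequential-write probe like the sketch below is enough to rank strip-size settings. The target path and sizes are placeholders, and this is not the tool I used for my tests; reconfigure the array, reformat, re-run, and compare the numbers.

```python
# Crude sequential-write probe to compare controller strip-size settings.
# The target path and sizes are placeholders; point it at the repository volume.
import os
import time

TARGET = r"D:\strip_test.bin"       # placeholder path on the volume under test
CHUNK = 4 * 1024 * 1024             # 4MB writes, similar to large backup blocks
TOTAL = 8 * 1024 * 1024 * 1024      # 8GB, enough to get past the 4GB controller cache

buf = os.urandom(CHUNK)             # incompressible data, like a compressed backup stream
start = time.perf_counter()
with open(TARGET, "wb") as f:
    written = 0
    while written < TOTAL:
        f.write(buf)
        written += CHUNK
    f.flush()
    os.fsync(f.fileno())            # make sure the data really reached the array
elapsed = time.perf_counter() - start
print(f"{TOTAL / elapsed / 1024**2:.0f} MB/s sequential write")
os.remove(TARGET)
```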
What is the best configuration for the controller's battery-protected cache?
Each controller has 4GB of cache, with a battery that allows the content to be written to flash in case of power loss. This cache is actively used to optimize physical write operations to the disks and is a key element for performance. In my tests, I assigned 95% of the cache to writes and 5% to reads.
How many file systems?
Here we have 2 options:
- Option 1) 2 file systems, one per volume, grouped into a Veeam Scale-Out Backup Repository (SOBR). Each file system is formatted as ReFS with a 64KB cluster size. This option is usually preferable because it is 15% faster; it is my preferred configuration.
- Option 2) 1 file system. The two volumes are grouped into a single Windows striped volume, and the resulting volume is formatted as ReFS with a 64KB cluster size. This option is a little easier to manage, but it is slower, and the striping layer is an additional potential point of failure.
What is the best backup block size for performance?
This is controlled by the "Storage optimization" setting (Veeam backup job --> Storage section --> Advanced settings --> Storage tab --> "Storage optimization" field).
“Local target (large blocks)” is about 8% faster than standard “Local target”.
With “Local target” blocks, incremental backups usually require a little less capacity. There isn’t a clear winner on this setting, and both options are usable. The best one depends on your source data and on whether “Local target” produces a significantly smaller incremental.
My personal preference is for “Local target (large blocks)”.
Use per-VM backup files: yes or no?
Modern systems need multiple write streams to run fast. On the HPE Apollo 4510 Gen10, each VBR write stream runs at 1GB/s, which translates into a backup speed of about 2GB/s when the compression effect is 2:1. You need roughly 7-15 concurrent streams to run backups at 10GB/s.
- If you do not use "per-VM backup files", make sure your workload is distributed across multiple jobs, with about 10 jobs running concurrently.
- If you use "per-VM backup files", everything is easier because each VM backup generates its own write streams, and you just need at least 10 VMs to run at maximum speed.
Is there a maximum number of concurrent streams beyond which the throughput starts slowing down? I do not know the answer; I can say that I tested a job with 45 VMs, and it ran for 10 minutes at an average speed of 10.3GB/s with a 2.1x reduction (and then it slowed down because there were no other VMs left to process).
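To keep the stream arithmetic in one place, here is a small sketch based on the per-stream figures quoted above. It is illustrative only; the 7-15 stream guidance still stands because not every stream runs at its peak.

```python
# Back-of-the-envelope stream math for this configuration (illustrative only).
# Each write stream lands on disk at ~1GB/s; with ~2:1 data reduction,
# that corresponds to ~2GB/s of backup (source-side) speed per stream.
import math

WRITE_PER_STREAM_GBS = 1.0          # per-stream write speed observed on this box
REDUCTION = 2.0                     # typical compression + dedupe ratio
CONTROLLER_WRITE_LIMIT_GBS = 6.0    # two P408i-p controllers at ~3GB/s each

def ideal_streams_for(target_backup_gbs: float) -> int:
    """Ideal-case lower bound on concurrent streams for a given backup speed."""
    per_stream_backup_gbs = WRITE_PER_STREAM_GBS * REDUCTION
    return math.ceil(target_backup_gbs / per_stream_backup_gbs)

# 10GB/s of backup speed needs at least 5 streams in the ideal case;
# in practice 7-15 are needed because not every stream runs at peak speed.
print(ideal_streams_for(10.0))

# Cross-check against the 45-VM test: 10.3GB/s at 2.1x reduction means
# roughly 4.9GB/s actually written, still under the ~6GB/s controller ceiling.
print(round(10.3 / 2.1, 1), "GB/s written vs", CONTROLLER_WRITE_LIMIT_GBS, "GB/s limit")
```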
Are there smaller storage-optimized servers and smaller/scalable configuration options?
The 4U HPE Apollo 4510 Gen10 has a 2U brother, the Apollo 4200 Gen10. This server has 2 front drawers with 12 LFF disks each, plus a rear cage for another 4 LFF disks. In total there are 28 LFF slots.
With the Veeam V11-optimized configuration, the Apollo 4200 provides up to 320TB of net usable capacity.
There are multiple configuration options based on smaller disks (the most common are 8, 10, 12, 14, and 16TB), or with the internal disk slots only 50% populated and ready for a future upgrade.
In the next few days, I'll complete my tests on the Apollo 4200 Gen10 with V11, and I'll post an update on the performance.
A personal note: a 30% performance increase would already have been called exceptional for a solution that Gartner already ranks first for ability to execute, but here we are looking at a doubling of performance between V10 and V11, and I don't know what to call that.
P.S. Thank you for your interest in my lab results.