-
- Service Provider
- Posts: 171
- Liked: 13 times
- Joined: Jun 29, 2013 12:14 pm
- Full Name: Peter Enoch
- Contact:
New Backup hardware solution
Hi everyone,
Hoping to hear some suggestions on a new solution we need for our Veeam Backup infrastructure.
Currently we have an "older" Dell PowerEdge R730xd with about 100 TB of storage on an NTFS-formatted drive, which has Backup Copies to two other locations, also on Dell hardware. One location runs "up-to-date" Windows Server 2016 with a ReFS-formatted drive, and the other uses an NTFS-formatted drive.
So currently we only have one Backup Copy with more than 30 days of retention (the location with ReFS).
We're going to replace our main Veeam backup server with new hardware and are currently looking at an HPE Apollo 4200 with 22x12 TB 7.2K SAS drives in RAID-60, 2x480 GB SSDs in RAID-1 for the OS, and 2x1.92 TB SSDs for SmartCache. We would like to create one big ReFS volume.
For Backup Copy we will use our current location that has a ReFS-formatted drive, and for the third location we are currently looking at buying an LTO-8 tape library connected via SAS to get backups on good old plain tapes.
When Veeam v10 is launched, we could look at removing the ReFS Backup Copy location and sending that data to the cloud (Amazon, Azure, etc.).
We want at least two, or all, of the locations to have the same retention policy, so if one "system" fails we have a "backup".
Let me hear what people think about our plan, and whether anyone has another solution we should look at.
Looking forward to hearing from you all.
-
- Product Manager
- Posts: 6551
- Liked: 765 times
- Joined: May 19, 2015 1:46 pm
- Contact:
Re: New Backup hardware solution
Hi,
Looks solid; however, I would be very careful with ReFS combined with RAID, since such a combination is not supported by MS yet. Please see here and here. Also, you should check this thread.
-
- Service Provider
- Posts: 171
- Liked: 13 times
- Joined: Jun 29, 2013 12:14 pm
- Full Name: Peter Enoch
- Contact:
Re: New Backup hardware solution
Hi,
Thanks for the reply. I'm pretty sure that ReFS is supported on RAID controllers. Please check your own link: "Bottom line: ReFS is in fact fully supported on ANY storage hardware that is listed on Microsoft HCL. This includes general purpose servers with certified RAID controllers."
I think there was earlier information that ReFS was only supported on Storage Spaces Direct and not on SANs (iSCSI, etc.), but that also now seems to be supported, as I remember from a post by Gostev?
I think ReFS with RAID has been supported for a long time, and ReFS now "seems" stable with the latest Windows Server 2016 patches.
-
- Product Manager
- Posts: 6551
- Liked: 765 times
- Joined: May 19, 2015 1:46 pm
- Contact:
Re: New Backup hardware solution
Hmm, right, it seems that it is supported now, at least this MS article says so:
Deploying ReFS on basic disks is best suited for applications that implement their own software resiliency and availability solutions. Applications that introduce their own resiliency and availability software solutions can leverage integrity-streams, block-cloning, and the ability to scale and support large data sets.
Note
Basic disks include local non-removable direct-attached via BusTypes SATA, SAS, NVMe, or RAID.
So now I stand corrected. Sorry for the confusion.
Thanks
-
- Veteran
- Posts: 257
- Liked: 40 times
- Joined: May 21, 2013 9:08 pm
- Full Name: Alan Wells
- Contact:
Re: New Backup hardware solution
I created a post about our Apollo experience. In case you didn't find it, it is linked below.
https://forums.veeam.com/veeam-backup-r ... 45140.html
The recommended setup for the Apollo was to create your RAID logical drives as big as possible, so I chose to split my drive count in half and create 2 RAID-60 logical drives.
Just remember, if you are going to do any file-to-tape jobs, or if you are going to use Windows Deduplication, you cannot have drive partitions in Windows larger than 64 TB. If you do, you will not be able to use deduplication, and file-to-tape fails because Windows cannot create a VSS snapshot of larger volumes. I chose to format my large logical drives into 2 partitions of equal size, which brought them down to 40 TB each in Windows. Then I just made repositories out of each and placed them all into a new Scale-Out Repository. Everything works great for me, but I chose to go with Windows 2012 R2, so no ReFS for me.
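The 64 TB rule above turns into a simple sizing calculation. A minimal sketch (the ~80 TB logical-drive size is implied by the 2x40 TB split described in this post):

```python
import math

# Windows Deduplication and VSS snapshots (needed for file-to-tape jobs)
# require volumes of at most 64 TB, so large RAID logical drives must be
# split into several smaller partitions.
VSS_DEDUP_LIMIT_TB = 64

def partitions_needed(logical_drive_tb: float) -> int:
    """Smallest number of equal partitions that keeps each within the limit."""
    return math.ceil(logical_drive_tb / VSS_DEDUP_LIMIT_TB)

# An ~80 TB logical drive splits into 2 partitions of 40 TB each.
count = partitions_needed(80)
print(count, 80 / count)  # 2 40.0
```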
If you have the option, I highly recommend fibre-attaching your Apollo to your SAN. Not sure what your options are there, but it is way better. Same for attaching your tape drives.
As for HPE Smart Cache, I had to experiment a bit when setting that up; it is a bit odd. What I ended up with was 6x480 GB SSD drives. When you set up the cache, you create a RAID-5 set out of the drives you have. Then you can associate chunks of that logical drive to act as cache for any other logical drives you have set up. It doesn't give you amounts in GB or TB; it gives them in GiB and TiB. So I had to play around a bit to give equal amounts of cache to each of my logical disks. My recommendation is to go with more, not fewer, cache drives. Put as many as you can in there while still allowing you to hit your total disk size requirements for your repository logical drives. Using 2 TB SSDs is good; I would go with 4 of those if it were me.
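The odd GiB/TiB numbers mentioned above come down to decimal versus binary units. A quick illustration (drive sizes taken from this thread):

```python
# Drives are sold in decimal units (1 GB = 10^9 bytes), but the Smart Cache
# setup reports binary units (1 GiB = 2^30 bytes), so the figures differ.

def gb_to_gib(gb: float) -> float:
    return gb * 1e9 / 2**30

def tb_to_tib(tb: float) -> float:
    return tb * 1e12 / 2**40

print(round(gb_to_gib(480), 1))   # 447.0 -> a "480 GB" SSD shows ~447 GiB
print(round(tb_to_tib(1.92), 2))  # 1.75  -> a "1.92 TB" SSD shows ~1.75 TiB
```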
Feel free to contact me when you are getting setup or if you run into any odd issues.
-
- Service Provider
- Posts: 153
- Liked: 34 times
- Joined: Dec 18, 2017 8:58 am
- Full Name: Bill Couper
- Contact:
Re: New Backup hardware solution
We have a medium-sized Veeam setup using RAID arrays. We combine multiple smaller arrays into larger SOBRs to avoid any single volume being too large. We have 78 TB in an SSD-only SOBR and 320 TB in an HDD-only SOBR.
On top of the RAID arrays we run ReFS volumes, so I have first-hand experience with this type of configuration.
Everybody is correct in saying that Microsoft does support ReFS on RAID. It's just a file system, after all.
I wanted to clarify something about the ReFS support of RAID volumes, though, since it hasn't been mentioned.
One of the benefits of using a next-gen file system like ReFS is the self-healing capability. If ReFS detects data corruption (like bitrot), it can repair it automatically.
But not if it's on top of a RAID array. For ReFS to self-heal, it has to be running on top of a Storage Spaces pool, not a RAID array.
Microsoft released a major bugfix for ReFS earlier this year. Before that update, we had seen multiple instances of data corruption that destroyed entire backup chains (per-VM chains mitigate the risk, but it was still bad).
ReFS identified the data corruption but was unable to repair it automatically, because it had no knowledge of the underlying redundancy in the RAID array. The underlying RAID array didn't know anything was wrong: it was just writing what ReFS told it to write, and doing that perfectly. But ReFS was corrupting files and then not being able to fix its mistakes.
Thankfully, since that update, we have not seen a single corrupt file. So, fingers crossed, ReFS actually works properly now, almost 6 years after its initial release.
Edit: I should also mention this. If you are using this backup storage for hosted customer backups, or any type of large enterprise where you might need to charge back storage costs to individual departments, beware. If your billing model is based on actual disk usage (the customer gets deduplication/compression as a bonus), ReFS block-cloning makes it damn hard to find out how much space anything is actually using on disk. Look into this if it affects you. We put a large number of hours into trying to get reliable numbers out of tools like blockstat. In the end we decided to change our billing model instead.
Also... there doesn't appear to be a way to copy block-cloned files between ReFS volumes intact; the files get rehydrated. In our case we had multiple copies of the backup chains that ReFS corrupted (before it was fixed) and couldn't replace the broken chains with the good chains, because if you tried to copy a backup chain it exploded and filled the entire volume.
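To make the rehydration problem concrete, here is a rough model. All numbers are hypothetical assumptions for illustration (1 TB full, 5% daily change, 4 retained weekly synthetic fulls), not figures from this thread:

```python
# Rough model of why copying a block-cloned chain "explodes". On the source
# ReFS volume, each weekly synthetic full shares unchanged blocks with its
# predecessor, so only changed blocks consume new space; a plain file copy
# to another volume rehydrates every full to its complete logical size.

full_tb = 1.0       # assumed logical size of one full backup file
change_rate = 0.05  # assumed daily change rate
weeks = 4           # assumed retained weekly synthetic fulls beyond the first

# Physical use on the source volume: first full plus a week of changed
# blocks for each later synthetic full (assuming non-overlapping changes).
physical_tb = full_tb + weeks * 7 * change_rate * full_tb

# A naive copy writes each of the 5 fulls out at its full logical size.
rehydrated_tb = full_tb * (1 + weeks)

print(round(physical_tb, 2), rehydrated_tb)  # 2.4 5.0
```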
-
- Service Provider
- Posts: 171
- Liked: 13 times
- Joined: Jun 29, 2013 12:14 pm
- Full Name: Peter Enoch
- Contact:
Re: New Backup hardware solution
nunciate wrote: Feel free to contact me when you are getting set up or if you run into any odd issues.

If I use 21x12 TB disks with RAID-60 (3x5+2), I should get one large logical drive with 180 TB, right?
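A quick check of that arithmetic, using the 3x(5+2) layout from the question:

```python
# RAID-60 usable capacity: 3 RAID-6 spans of 7 drives (5 data + 2 parity);
# only the data drives contribute capacity.
spans, data_per_span, parity_per_span, drive_tb = 3, 5, 2, 12

assert spans * (data_per_span + parity_per_span) == 21  # all 21 drives used

usable_tb = spans * data_per_span * drive_tb
usable_tib = usable_tb * 1e12 / 2**40  # what Windows will actually report

print(usable_tb)             # 180
print(round(usable_tib, 1))  # 163.7
```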
For the HPE Smart Cache, I don't think there is room for more drives, so I'm thinking about RAID-1 for the 2x2 TB SSDs.
The "Veeam Availability Suite on HPE Apollo Servers" guide writes this about paths for the HPE Smart Cache:
On a side note, during our tests we had 2 SSDs configured for Smart Cache as mentioned above, but they were both configured to use the same single path of the 6Gb controller. It would be best practice to split and configure the SSDs to use both paths, each SSD using its own path, to achieve even better performance.
What does this mean?
-
- Service Provider
- Posts: 171
- Liked: 13 times
- Joined: Jun 29, 2013 12:14 pm
- Full Name: Peter Enoch
- Contact:
Re: New Backup hardware solution
Is VSS really used when doing Backup to Tape from a Veeam Repository?
-
- Service Provider
- Posts: 171
- Liked: 13 times
- Joined: Jun 29, 2013 12:14 pm
- Full Name: Peter Enoch
- Contact:
Re: New Backup hardware solution
billcouper wrote: We have a medium-sized Veeam setup using RAID arrays. We combine multiple smaller arrays into larger SOBRs to avoid any single volume being too large. We have 78 TB in an SSD-only SOBR and 320 TB in an HDD-only SOBR. On top of the RAID arrays we run ReFS volumes, so I have first-hand experience with this type of configuration.

We can use SOBR, but won't one large drive give more performance? Or should I split into 3x60 TB and use SOBR?
-
- Product Manager
- Posts: 6551
- Liked: 765 times
- Joined: May 19, 2015 1:46 pm
- Contact:
Re: New Backup hardware solution
enoch wrote: Is VSS really used when doing Backup to Tape from a Veeam Repository?

VSS can be used for file-to-tape jobs.
Thanks
-
- Veteran
- Posts: 257
- Liked: 40 times
- Joined: May 21, 2013 9:08 pm
- Full Name: Alan Wells
- Contact:
Re: New Backup hardware solution
Right, only file-to-tape jobs would use VSS on your repository server. We have some non-Veeam backups that are written to a share, and some data that is sent via FTP to the drives on that server. We use file-to-tape jobs to get that data offsite.
-
- Service Provider
- Posts: 153
- Liked: 34 times
- Joined: Dec 18, 2017 8:58 am
- Full Name: Bill Couper
- Contact:
Re: New Backup hardware solution
enoch wrote: We can use SOBR, but won't one large drive give more performance, or should I split into 3x60 TB and use SOBR?

You may want to limit individual volumes to <64 TB for other reasons mentioned in this thread.
If you have no such restrictions, then it's up to you how large the volumes are, I guess. The particular storage appliance we use limits our individual volumes to 16 TB, so I don't have that luxury.
Using per-VM chains, the smaller extent size we use hasn't been an issue, yet.
When using ReFS, block-cloning only works within a single volume, so if an SOBR extent fills up and a backup chain is split across multiple extents, it has to rehydrate a complete full backup file onto the new extent, in some cases taking considerable time and causing unplanned capacity usage.
It may not be the best idea, but we started with lots of small extents instead of fewer large ones. The idea was to spread the per-VM chains across as many extents as seemed practical, then monitor extent utilization and expand individual extents as required. This has worked out for us, with our environment consisting of mostly small VMs (<1 TB) and the largest VM being ~3 TB.