Moving from REFS to EMC DD for primary storage

Post by **yavor.indzhev** » Apr 28, 2020 7:47 pm

Hello,
I know it is pretty hard to get a specific answer without giving details, but I don't have all of them.

We plan to move from a scale-out repository (multiple 16TB repositories) to a DD for both backups and copy jobs.
Currently the ReFS storage is not particularly fast: a simple copy-paste runs at around 100-150MB/s. Merges often use Fast Clone only partially, or not at all. We use 2 repository servers, and some of the jobs are split between them, so merges go over the network, etc.

We have a simple setup: 30-day retention, forever incremental. In the future we plan weekly fulls, but currently we don't have enough space. We expect big space savings from implementing DD, plus the ability to keep well-deduplicated weekly fulls.

Unfortunately, without testing I have literally no clue how much space I will need or whether the backups will be fast enough. My understanding is that DDboost should handle merges at least as well as ReFS block clone. But what about backup speed? What about the deduplication rate?

What would be the best way to assess the situation BEFORE buying the DD?

Thank you

Post by **Gostev** » Apr 28, 2020 9:01 pm

Hi, Yavor

If you truly worry about performance, then keep in mind that deduplicating storage provides the worst performance. We specifically don't recommend deduplicating storage as the primary backup storage due to backup and restore performance issues, so think twice before making the move. ReFS can provide great performance with the appropriate hardware underneath, and in any case, as raw storage without any inline data processing, it will certainly be faster than any deduplicating storage that does inline deduplication.

Our recommendation is to use deduplicating storage as a secondary backup repository, where performance is not important.

yavor.indzhev wrote (Apr 28, 2020 7:47 pm):
We have simple setup - 30 days retention, incremental forever. For the future we plan weekly fulls, but currently we don't have enough space.

Actually, enabling weekly fulls does not require any extra disk space with ReFS, compared to the disk consumption with your current settings. This is why I am particularly lost as to what you're trying to achieve by moving to Data Domain.

Thanks!
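To make the "no extra disk space" point concrete, here is a toy model of block cloning in Python. It is only an illustration under simplifying assumptions (fixed-size blocks, a synthetic full that references the newest version of each block, hypothetical file names); it is not how ReFS or Veeam are actually implemented.

```python
# Toy model of ReFS-style block cloning (illustration only, not ReFS internals).
# Files map logical block positions to physical block ids; cloning adds
# references to existing physical blocks instead of copying data.


class ToyVolume:
    def __init__(self):
        self.refcount = {}  # physical block id -> number of files referencing it
        self.files = {}     # file name -> {logical position: physical block id}
        self._next_id = 0

    def _alloc(self):
        """Allocate one new physical block."""
        bid = self._next_id
        self._next_id += 1
        self.refcount[bid] = 1
        return bid

    def write_full(self, name, n_positions):
        """Active full: every position gets a freshly written block."""
        self.files[name] = {pos: self._alloc() for pos in range(n_positions)}

    def write_increment(self, name, changed_positions):
        """Incremental: only changed positions get new blocks."""
        self.files[name] = {pos: self._alloc() for pos in changed_positions}

    def synthesize_full(self, name, chain):
        """Synthetic full via cloning: newer files in the chain win, no data copied."""
        merged = {}
        for f in chain:
            merged.update(self.files[f])
        for bid in merged.values():
            self.refcount[bid] += 1
        self.files[name] = merged

    def physical_blocks(self):
        return len(self.refcount)

    def logical_blocks(self):
        return sum(len(m) for m in self.files.values())


vol = ToyVolume()
vol.write_full("full.vbk", 100)
vol.write_increment("inc1.vib", range(10))
vol.write_increment("inc2.vib", range(5, 15))
before = vol.physical_blocks()
vol.synthesize_full("synthetic.vbk", ["full.vbk", "inc1.vib", "inc2.vib"])
after = vol.physical_blocks()
print(before, after, vol.logical_blocks())  # 120 120 220
```

The physical block count does not change when the synthetic full is created; only the logical size grows. In this model that holds only because every file in the chain lives on the same volume.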

Post by **soncscy** » Apr 28, 2020 10:52 pm

Hey Yavor,

"Merge often is partially fast-clone or not fast clone at all. We use 2 repository servers"

This stands out to me. I don't fully get the wording here, but as I understand it, Veeam has three Fast Clone outcomes for synthetic fulls: full Fast Clone (all the necessary files are on the same ReFS volume), partial Fast Clone (some of the necessary files are not on the same volume), and no Fast Clone (the backups or volume aren't eligible).

As I understand it, you have the second one, and from what I've seen in client environments, this only happens with a scale-out repository. By chance, do you use SOBR and see placement policy warnings in your job report? That would explain everything, and you could save yourself a ton of $$$ just by figuring out why the violation happened. I'm fine with dedup appliances, but I think Anton has it right: except for the highest-grade models, they're only good as a secondary target. If you don't need the space or the concurrent file locks, such storage is wasted money for you; you will likely not grow into it, and Dell/HPE will just laugh all the way to the bank with your money.

I'd find out why ReFS is not meeting your needs before dropping money on a dedup appliance. Their literature is attractive, but deceptive.
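The three Fast Clone outcomes listed in this post can be sketched as a small classifier. This is a hypothetical model, not Veeam's actual logic: it assumes a source file's blocks can be cloned only if that file sits on the same ReFS volume as the new synthetic full, and the volume letters and file names are made up.

```python
# Hypothetical classifier for the three Fast Clone outcomes.
# Assumption: a source file can be cloned only if it sits on the same ReFS
# volume as the synthetic full being built.
from typing import Dict, List, Set


def fast_clone_mode(target_volume: str, source_files: List[str],
                    file_volume: Dict[str, str], refs_volumes: Set[str]) -> str:
    """Return 'full', 'partial' or 'none'."""
    if target_volume not in refs_volumes or not source_files:
        return "none"
    local = [f for f in source_files if file_volume[f] == target_volume]
    if len(local) == len(source_files):
        return "full"
    return "partial" if local else "none"


chain = ["full.vbk", "inc1.vib", "inc2.vib"]
placement = {"full.vbk": "E:", "inc1.vib": "E:", "inc2.vib": "F:"}
refs = {"E:", "F:"}
print(fast_clone_mode("E:", chain, placement, refs))  # partial
```

In a scale-out repository, a placement policy violation that puts one increment on another extent is exactly the situation that turns "full" into "partial" here.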

Post by **AlexandreD** » Apr 29, 2020 9:26 am

Hello

I am in the same situation. I often read that Data Domain arrays are slow to restore. But which models are we talking about? The new PowerProtect DD models (DD6900 and up) are equipped with SSD/NVMe, if I'm not mistaken. I guess that should improve read/write operations compared to the old models. All Dell backup software (Avamar, PowerProtect Data Manager) works only with Data Domains. Are all restores doomed to be slow, or do those products work differently?

We have around 1,300 VMs and 1.2PB of backups in forever forward incremental.

Post by **yavor.indzhev** » Apr 29, 2020 5:21 pm

Hello Gostev,
ReFS WILL take extra space if the new full lands on another LUN, because Fast Clone does not work across LUNs, am I right? Our LUNs are 20% free on average, but some of them are so full that the next backup goes to a different LUN. I spoke with a Veeam support engineer a long time ago, and he confirmed that in this situation we will start losing space fast once a particular LUN is full and Veeam starts filling a different one. As I stated, we use scale-out: imagine 25 LUNs of 16TB each. If we made them any bigger, our storage team would not be able to migrate them at the storage level, and they want to keep that possibility available.
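The capacity side of this can be put into a few lines of arithmetic. This is a rough sketch under stated assumptions: a synthetic full that can be Fast Cloned is treated as costing 0 TB (metadata ignored), one that lands on a different LUN is assumed to be written in full, and the 2 TB job size is hypothetical.

```python
# Back-of-the-envelope capacity math for the SOBR described above.
# Assumptions (hypothetical): a Fast Cloned synthetic full costs ~0 TB
# (metadata ignored); one that lands on a different LUN is written in full.


def sobr_capacity_tb(extents: int, extent_tb: float, free_fraction: float):
    """Return (total capacity, free capacity) in TB."""
    total = extents * extent_tb
    return total, total * free_fraction


def synthetic_full_cost_tb(full_tb: float, same_volume_as_chain: bool) -> float:
    return 0.0 if same_volume_as_chain else full_tb


total_tb, free_tb = sobr_capacity_tb(25, 16, 0.20)
print(total_tb, round(free_tb, 1))  # 400 80.0

# Hypothetical 2 TB job: four weekly synthetic fulls per month.
cloned = sum(synthetic_full_cost_tb(2.0, True) for _ in range(4))
spilled = sum(synthetic_full_cost_tb(2.0, False) for _ in range(4))
print(cloned, spilled)  # 0.0 8.0
```

Under these assumptions, every job whose fulls spill onto another LUN consumes its full size again each week, which is how 80TB of nominally free space can disappear quickly.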

soncscy, that's why Fast Clone is only partial for some of our backups. Yes, I see plenty of SOBR placement policy warnings.

We are also thinking about an alternative combination: short-retention backups pointed to ReFS, and 30-day-retention copy jobs on a DD (or other deduplicating storage).

Post by **Gostev** » Apr 30, 2020 5:42 pm

yavor.indzhev wrote (Apr 29, 2020 5:21 pm):
We are thinking also (as alternative) for combination - small retention backups pointed to REFS, and 30 days retention copy jobs on a DD (or other reduplicated storage)

Well, good thinking, because this is precisely our recommended reference architecture, as only this approach delivers:
1. Fastest possible backups.
2. Fastest possible restores from recent backups.
3. Compliance with the 3-2-1 rule, thanks to having two copies of recent backups on different media.
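The 3-2-1 point in the list above can be expressed as a simple check over the proposed layout. This is a minimal sketch: it assumes the production data counts as one of the three copies, and the `offsite` flags are assumptions you would set yourself, since the thread does not say where the DD would be located.

```python
# Minimal 3-2-1 check for the tiered layout (ReFS for recent backups,
# dedup appliance for 30-day backup copies).
# Assumptions: production data counts as one copy; offsite flags are set by hand.
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    media: str
    offsite: bool


def meets_3_2_1(copies) -> bool:
    """At least 3 copies, on at least 2 media types, at least 1 offsite."""
    return (len(copies) >= 3
            and len({c.media for c in copies}) >= 2
            and any(c.offsite for c in copies))


layout = [
    Copy("production VMs", "primary SAN", offsite=False),
    Copy("short-retention backups", "ReFS repository", offsite=False),
    Copy("30-day backup copies", "dedup appliance", offsite=True),
]
print(meets_3_2_1(layout))  # True
```

If the dedup appliance sits in the same room as the ReFS repository, flipping its `offsite` flag to `False` makes the check fail, which is a reminder that the "1" still has to be satisfied somewhere.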

Post by **davidrnexon** » May 04, 2020 5:12 am

Hi Yavor, on your SOBR I'm guessing you have picked the 'Performance' option for the placement policy. In that case you are right: you will lose ReFS savings across the whole chain (full plus incrementals). You will still get ReFS savings among the fulls and, separately, among the incrementals. If you want the full benefit of ReFS (combining full and incremental savings), then you need to use the 'Data locality' option for the placement policy, so the full and incrementals are located on the same extent.
The other thing to note here is that if the extent holding the full backups is unavailable, you will not be able to restore from any incrementals.
One last comment: are you using the capacity tier option to offload older backups to S3? This can also help in your scenario.
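The two placement policies and the restore dependency described in this post can be modeled as follows. This is a hypothetical sketch, not Veeam's placement engine: it assumes a forward-incremental chain where restoring an increment needs the full plus every earlier increment, and the extent names are made up.

```python
# Hypothetical model of the 'Data locality' and 'Performance' SOBR policies.
# Assumption: forward-incremental chain; restoring an increment needs the full
# plus every earlier increment. Extent names are made up.
from typing import Dict, List, Set


def place_chain(files: List[str], extents: List[str], policy: str) -> Dict[str, str]:
    if policy == "data_locality":
        # Whole chain on one extent, so Fast Clone can see every file.
        return {f: extents[0] for f in files}
    if policy == "performance":
        # Fulls (.vbk) and incrementals (.vib) on different extents.
        if len(extents) < 2:
            raise ValueError("performance policy needs at least two extents")
        return {f: extents[0] if f.endswith(".vbk") else extents[1] for f in files}
    raise ValueError(f"unknown policy: {policy}")


def restorable(point: str, chain: List[str], placement: Dict[str, str],
               online: Set[str]) -> bool:
    """A restore point is usable only if every file it depends on is online."""
    needed = chain[: chain.index(point) + 1]
    return all(placement[f] in online for f in needed)


chain = ["full.vbk", "inc1.vib", "inc2.vib"]
perf = place_chain(chain, ["extent1", "extent2"], "performance")
print(restorable("inc2.vib", chain, perf, online={"extent2"}))  # False
```

With the 'Performance' layout, taking the extent that holds the full offline makes every increment unrestorable, even though the increments themselves are still reachable.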

Post by **SE-1** » May 04, 2020 6:43 am

With older models like the DD2200 you are right: restore speed is sluggish, but that is because they have few spindles and no SSD caching.

When using a DD6800 with 120 drives, the story is different.
You also have to understand that the power comes from parallelism.

Some examples of restore time:
Restoring a VM with a single 18GB VMDK takes me 5 min: 2 min 30 s for processing and 2 min 30 s for the actual write operation.
The restore operation processes at 123MB/s.
Restoring a VM with 2 drives gives me a restore rate of 111MB/s for drive 1 and 99MB/s for drive 2.
The total restore was 100GB of data (a 340GB thin-provisioned VM); it took 14 minutes in total (2 min processing and 12 min actual restore).
(Restores are done over NBDSSL to an all-flash vSAN.)
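A quick sanity check of these restore numbers, as a Python sketch. It assumes "GB" means GiB, "MB/s" means MiB/s, and "2.30 min" means 2 min 30 s; the original post does not state its units.

```python
# Sanity check of the quoted restore figures.
# Assumptions: "GB" read as GiB, "MB/s" as MiB/s, "2.30 min" as 2 min 30 s.


def throughput_mib_s(size_gib: float, seconds: float) -> float:
    return size_gib * 1024 / seconds


print(round(throughput_mib_s(18, 150)))      # 123: write phase only
print(round(throughput_mib_s(18, 300), 1))   # 61.4: including processing
print(round(throughput_mib_s(100, 720), 1))  # 142.2: 100 GiB in 12 min
```

Under these assumptions, the 123MB/s figure matches the 18GB write phase alone; averaged over the whole 5 minutes, including processing, the effective rate is about half of that.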

We are currently using a DD6800 in combination with Veeam and DD Boost.
We have an environment of roughly 100TB of source data and roughly 1,000 VMs (about 2,000 VMDKs).
The power of global dedup of 18X is simply amazing, storing 3 PiB of logical backup data on 179 GiB physical storage

May 04, 2020 7:50 am

Hi,

We've tested restore performance for a full recovery scenario with a DD6800 and around 50 disks.
Since DD Boost gives no benefit for restores (no source-side dedupe), we mapped the backups over CIFS.
A single copy ran at around 200-300MB/s. The more copy/restore operations we started in parallel, the more throughput we got.
With about 5 concurrent copy streams (reads, of course) we got about 600MB/s, which is VERY decent for a DD6800 with 50 disks. I expect bigger/newer DDs to be even faster.
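The scaling with concurrent copies can be reproduced on any share with a small parallel-copy harness. This is a sketch under assumptions: in a real test the source files would sit on the CIFS share exported by the appliance, but here temporary local files with hypothetical names stand in so the example runs anywhere, and the reported rate will reflect local disk, not a DD.

```python
# Sketch: run several copy streams at once and report aggregate throughput.
# Assumption: real sources would be on a CIFS share from the appliance;
# temporary local files stand in here so the example is runnable.
import os
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def copy_one(src: str, dst_dir: str) -> int:
    dst = os.path.join(dst_dir, os.path.basename(src))
    shutil.copyfile(src, dst)
    return os.path.getsize(dst)


def parallel_copy(sources, dst_dir, streams=5):
    """Copy files with `streams` concurrent workers; return (bytes, MiB/s)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=streams) as pool:
        total = sum(pool.map(lambda s: copy_one(s, dst_dir), sources))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / elapsed / 2**20


src_dir, dst_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
sources = []
for i in range(5):
    path = os.path.join(src_dir, f"backup{i}.vbk")  # hypothetical file names
    with open(path, "wb") as f:
        f.write(os.urandom(2**20))
    sources.append(path)

total_bytes, rate = parallel_copy(sources, dst_dir, streams=5)
print(total_bytes, f"{rate:.0f} MiB/s")
```

Running it with `streams=1` and then `streams=5` against the same share is one way to see whether a given appliance scales with concurrent reads.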

But I wouldn't use a DD for everyday restores or as the only/primary backup source.
Hope this helps
Regards

Post by **ITP-Stan** » May 04, 2020 8:19 am

SE-1 wrote (May 04, 2020 6:43 am):
The power of global dedup of 18X is simply amazing, storing 3 PiB of logical backup data on 179 GiB physical storage

I'm guessing you mean 179 TiB, or that's some unbelievable magic.
I guess it all comes down to price/performance.
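A quick unit check supports the 179 TiB reading. Assuming binary units as written in the quote, 3 PiB over 179 TiB comes out close to the quoted 18x, whereas 179 GiB would imply a ratio in the tens of thousands.

```python
# Which physical unit matches the quoted ~18x global dedup ratio?
TIB_PER_PIB = 1024
GIB_PER_PIB = 1024 ** 2

logical_pib = 3
physical = 179

ratio_if_tib = logical_pib * TIB_PER_PIB / physical
ratio_if_gib = logical_pib * GIB_PER_PIB / physical
print(f"{ratio_if_tib:.1f}x")   # 17.2x
print(f"{ratio_if_gib:,.0f}x")  # 17,574x
```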

Zweistein · May 04, 2020 9:00 am

Hello Yavor,

We have been using a DD as primary storage for about a year. I can only advise against using the DD as primary storage. The device needs periods during which no data is read or written in order to optimize its deduplication. If it doesn't get them, it slows down.
For weekly, monthly, quarterly and yearly backups, it is a viable storage option.
If you have to use backup copy jobs for these retention backups, copying from one directory to another on the DD, the DD will be slow despite DD Boost.
Keep writing your daily backups to the REFS-HS and use a deduplication device (DD) for all other longer-term retention. You can then connect it directly to your ReFS server.
You should also upgrade your ReFS servers to Windows Server 2019; this will bring some performance gains.
Limitations of the DD with DD Boost:
- It cannot work as a target for physical Linux servers.
- It has a limitation with forever incremental.
- It should not be used as a source for backup to tape (according to our documentation).

ers_kentwick · May 04, 2020 1:08 pm

We are currently using a DD2500 (44.4 TB of usable disk for ~460 TB pre-compression). The compression factor bounces around between 10x and 16x.
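These figures also give a rough way to approach the original sizing question. The sketch below assumes "compression factor" means logical (pre-compression) TB divided by physical TB. The 100 TB logical example is hypothetical, and real ratios depend heavily on the data.

```python
# Rough dedup/compression sizing from the DD2500 figures above.
# Assumption: compression factor = logical (pre-compression) TB / physical TB.


def logical_capacity_tb(usable_tb: float, ratio: float) -> float:
    return usable_tb * ratio


def physical_needed_tb(logical_tb: float, ratio: float) -> float:
    return logical_tb / ratio


usable = 44.4
low, high = logical_capacity_tb(usable, 10), logical_capacity_tb(usable, 16)
print(f"{low:.0f}-{high:.0f} TB logical")         # 444-710 TB logical
print(f"{460 / usable:.1f}x implied by ~460 TB")  # 10.4x implied by ~460 TB
# Hypothetical: 100 TB of logical backup data at a 10x factor.
print(f"{physical_needed_tb(100, 10):.0f} TB physical")  # 10 TB physical
```

The ~460 TB figure implies the box is running near the bottom of its 10x to 16x range, so sizing on the low end of a vendor's quoted ratio is the more cautious choice.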

Backup throughput varies depending on what else is sending data to the DD (and how much). Examples from this past weekend are:
Friday night, regular VM = 227 MB/s
Friday night, file servers (varies with the type of data): lots of large files = 126 MB/s, millions of individual files = 56-58 MB/s
Saturday day time, regular VM = 333 MB/s
Saturday day time, SQL Server App Aware = 163 MB/s

Where the DD is slow for us is cloning/copying the backups to physical tape. Because the DD has to rehydrate what it has just finished writing, we're getting between 40 and 70 MB/s from the DD to LTO5 tape. Rehydration is where the DD slows down.

Our Exchange backups are written to a Linux filesystem (one-week retention), copied to the DD (56 restore points), and copied to tape from the Linux host. Backup throughput using Veeam compression comes to over 600 MB/s, backup copy from Linux to the DD runs at 170 MB/s, and tape copy from Linux to LTO5 runs at 129 MB/s.
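When the question is whether a job fits a backup window, the quoted rates translate into hours like this. It is a sketch under assumptions: decimal units (1 TB = 1,000,000 MB), a steady rate, and a hypothetical 10 TB of data; real jobs fluctuate, as the examples above show.

```python
# How long would a given amount of data take at the quoted rates?
# Assumptions: decimal units (1 TB = 1,000,000 MB), steady rate,
# hypothetical 10 TB of data.


def hours_to_move(size_tb: float, rate_mb_s: float) -> float:
    return size_tb * 1_000_000 / rate_mb_s / 3600


rates = {
    "backup to Linux (Veeam compression)": 600,
    "backup copy Linux -> DD": 170,
    "tape copy Linux -> LTO5": 129,
    "tape copy DD -> LTO5 (worst case)": 40,
}
for stage, rate in rates.items():
    print(f"{stage}: {hours_to_move(10, rate):.1f} h")
```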

So far we have rarely had to restore anything except individual small files that someone deleted unintentionally. We have done some experimental restores of entire VMs and, yes, the restore speed is slower, since the DD has to expand the data before sending it to Veeam.

End result: your experience will vary depending on exactly what you do with the DD. Lots of writes with hardly any reads = reasonably good write speed and somewhat reasonable read speed. A 50/50 mix of writes and reads = write times will still be adequate, but reads will be slower.

As another commenter stated, the DD hardware can make a difference. Higher-performance DD hardware = much better throughput. Base-model DD = adequate writes, so-so reads.

yavor.indzhev · May 05, 2020 10:52 am

Hello,
Thank you very much for all this input. It is very useful for me.
Speed is essential for us, because we have a backup window we must fit into, and the backup scope may expand.
@davidrnexon, no, it is not set to Performance, but when a LUN is full, Veeam's only option is to write to another LUN.

I believe we will implement a "small" single LUN as primary storage for 1-week-retention backups, and use some deduplicating storage (still not sure which: DD, IBM, or just Windows) as the actual main backup location.

Post by **SE-1** » May 25, 2020 12:02 pm

Coming back to this: there are several improvements on DD.

Since DD OS 6.2 there is a new feature called MSR (Multi Streams Restore), which improves restore speed.
Internally, the DD file system spawns several threads so that the file is read in parallel, which amounts to the same behavior as if the backup application itself had started reading in parallel for increased read speed.
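As a client-side analogy for what this post describes, the sketch below reads one file as several byte ranges in parallel and reassembles them in order. This is explicitly not how DD OS implements MSR internally; it only illustrates the "one file, several read streams" idea, and the function names are hypothetical.

```python
# Client-side analogy for multi-stream restore: read one file as several byte
# ranges in parallel and reassemble them in order. NOT the DD OS implementation.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_range(path: str, offset: int, length: int) -> bytes:
    with open(path, "rb") as f:  # each stream uses its own file handle
        f.seek(offset)
        return f.read(length)


def multi_stream_read(path: str, streams: int = 4) -> bytes:
    size = os.path.getsize(path)
    if size == 0:
        return b""
    chunk = -(-size // streams)  # ceiling division
    ranges = [(off, min(chunk, size - off)) for off in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=streams) as pool:
        # map() yields results in submission order, so the join is correct.
        return b"".join(pool.map(lambda r: read_range(path, *r), ranges))


data = os.urandom(2**20 + 3)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
restored = multi_stream_read(tmp.name, streams=4)
print(len(restored) == len(data))
```

Whether parallel range reads help in practice depends on the storage underneath; on a spindle-heavy appliance they let more disks work on one restore at the same time.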

Since the latest generation of DD (the DD6900, DD9400 and DD9900 models), there are onboard Intel QuickAssist cards that offload compression and decompression, which frees up CPU cycles for the dedup process.
This also benefits restore speed.
The default compression type will change from LZ to GZfast, resulting in up to 30% better capacity efficiency.

R&D Forums