mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

REFS performance issue "workaround"

Post by mkretzer »

Hello,

As stated in other threads, we still have severe ReFS performance issues in the following situations:

- When data is deleted from the volume
- Past a certain point, once a large amount of the data on the disk has been block cloned

Because of this we are in the middle of moving back to NTFS. To do this we took a temporary storage with half the capacity of our primary backup storage and created a new ReFS volume on it. As before, ReFS is extremely fast again, but it will get slow after about 4 weeks (we have been through this before on that storage as well).

I was thinking about whether there is a way to live with the ReFS limitations/bugs: would it be an option to use a scale-out repo with two backend storages, in which one extent is always in maintenance mode (without evacuation), and every 3-4 weeks this is switched? From what I understand, active fulls will be necessary then, but if we use one SOBR per backup job (in different directories on the same two drives), the switch and the active fulls can be scripted (see the sketch below).

This way fast clone is really fast for most of the month, and the necessary backup storage is the same as or less than with NTFS.
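A minimal sketch of how the rotation logic could be scheduled. The extent names and the 4-week cadence are assumptions from this post; the actual maintenance-mode toggle and the active fulls would be driven through Veeam's PowerShell interface, which is not shown here:

Code: Select all

from datetime import date

# Hypothetical extent names; a real script would look these up and toggle
# maintenance mode via Veeam PowerShell, not shown in this sketch.
EXTENTS = ["extent-A", "extent-B"]
WEEKS_PER_CYCLE = 4  # assumed rotation cadence from the post above

def active_extent(today: date) -> str:
    """Which extent should be active (out of maintenance) this cycle."""
    week = today.isocalendar()[1]
    return EXTENTS[(week // WEEKS_PER_CYCLE) % 2]

def switch_due(today: date) -> bool:
    """True on the first weekday of a new cycle: swap extents, run active fulls."""
    week = today.isocalendar()[1]
    return week % WEEKS_PER_CYCLE == 0 and today.isoweekday() == 1

if __name__ == "__main__":
    today = date.today()
    print(f"Active extent: {active_extent(today)}")
    if switch_due(today):
        print("Swap maintenance mode between extents, then trigger active fulls.")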

Markus
bjervis
Veeam Vanguard
Posts: 167
Liked: 34 times
Joined: Dec 18, 2015 1:30 pm
Full Name: Brad Jervis
Location: New York, NY
Contact:

Re: REFS performance issue "workaround"

Post by bjervis »

You're having these issues with 64K block size? I have quite a few large ReFS volumes and haven't seen any of these issues yet *knock on wood*
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS performance issue "workaround"

Post by mkretzer »

Yes, 64K. 192 TB volume size. The volume is about 70% full when the issue starts happening.
Gostev
Chief Product Officer
Posts: 31459
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS performance issue "workaround"

Post by Gostev »

Yes, your idea will work fine. Just remember to enable the option to Perform full backup when required extent is offline. Thanks!
dellock6
Veeam Software
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: REFS performance issue "workaround"

Post by dellock6 »

As we are trying to collect some field numbers, would you mind sharing yours? Specifically:
- size of the whole volume (192 TB)
- used size
- RAM size
- is the backup forever forward? What's the retention?
We have seen reports around, and can confirm, that the performance penalty is most visible during deletion operations, but we have also seen that in many situations a good amount of memory fixes many issues. If possible, we are trying to define what this amount could be. We don't have algorithms or calculators, but based on information from users we may compile a list of "safe" configurations.
If you feel more comfortable sharing this info privately, send me a private message; we may also be interested in some server configuration details, like brand, disks, RAID, etc...
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS performance issue "workaround"

Post by mkretzer »

- size of the whole volume: 192 TB
- used size: 140 TB
- RAM size: First 128 GB (with crashes), now 384 GB (extremely stable since then, but slow after 3 weeks)
- is the backup forever forward?: No, Forward with weekly synthetics
- What's the retention?: 28 Days

One more thing: in another thread I posted a screenshot where you can see how much memory is used while a synthetic is running. I believe it fluctuated by ~130 GB within a few minutes. So the RAM really seems to be what saved us from crashes.
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers
Contact:

Re: REFS performance issue "workaround"

Post by Delo123 »

For us:
Volume size: 189TB (backed by Adaptec RAID 60, 30x 8TB HGST SAS)
Used size: 149TB (277TB when calculating size on disk, so synthetic fulls effectively cost around 50% of their nominal size)
RAM size: 384GB
Weekly synthetic fulls, monthly active fulls
No retention policy (we keep all backups until the volume is full, then replace the entire JBOD)

Not a single issue so far; backup speeds are also still around 500-700MB/s.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS performance issue "workaround"

Post by mkretzer »

Our environments sound VERY similar, with the difference that we have three times the disks. Do I understand correctly that you never delete anything? That is always when problems start for us, and they never really recover until we start from scratch...
Gostev
Chief Product Officer
Posts: 31459
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS performance issue "workaround"

Post by Gostev »

Yes, the issue is most certainly connected with deleting the data from ReFS volume.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS performance issue "workaround"

Post by kubimike »

My problem is deleting large VBKs after a job runs. Everything else works. After deleting a 5TB VBK, the system slowly freezes and then won't do anything; it doesn't BSOD.
- size of the whole volume 52TB
- used size 38TB
- RAM size 192GB
- is the backup forever forward? No, 2 backups a day, one being a synthetic full.
- What's the retention? 200
- server configuration (brand, disks, RAID, etc.): HP DL380, two D3700 enclosures connected via SAS to a P841 (dual domain mode), 50x 1.2TB SFF 10k disks in RAID 6+0
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS performance issue "workaround"

Post by tsightler »

kubimike wrote: - is the backup forever forward? No, 2 backups a day, one being a synthetic full.
Synthetic full daily? Can you explain the reasons for this choice? My lab experience says that the more cloned blocks you have across files, the slower the filesystem becomes and the more likely it is to hang, so I would think this would be a scenario more prone to problems.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS performance issue "workaround"

Post by kubimike »

@tsightler, I am doing the same thing with my Exchange backups as well, except the files it's creating are vastly smaller (1TB). I have no issues with Veeam deleting the older restore points. Wouldn't every new VBK that a synthetic full creates be an independent file from the previous VBK? The reason I'm doing it is that running active fulls is time-consuming; I do that once every quarter. I also don't like keeping large backup chains open. Every time a synthetic runs, I'm closing that chain of backups, so there's less chance of corruption. Now, being a new Veeam user, this is what I've collected from reading; I could be totally wrong. Also, nowhere in the documentation does it say you can't do a synthetic full every day either.
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS performance issue "workaround"

Post by tsightler »

@kubimike, nothing you say is technically incorrect; however, most of what you are reading was written before ReFS and block clone.

With a traditional filesystem (NTFS, for example), a synthetic full was an entirely new file, so each new chain was totally independent of the others. This meant that starting a new full required enough space for the new full, but it also meant that the new full was completely separate from the old one, using completely different disk blocks; if something were to happen to the old chain, the new chain would not be impacted. But it's unlikely that you would choose to run a synthetic full every day, because if I were keeping 100 restore points, I'd need 100x more space, not to mention all of the time. So typically you would look to strike a balance between used space and chain length, which led most people to run synthetics weekly at most.

ReFS drastically changes the equation. Since we don't have to move blocks, the time is greatly reduced, but it also reduces the benefit, since the new "chain" is still 100% dependent on blocks from the prior chain, as it's sharing all of those blocks. This provides effectively no benefit from an integrity perspective, because the VBK you create today is still dependent on blocks from VBK files you created weeks/months ago.

But even worse, if you create a synthetic full every day, by day 100 you have 100 VBKs all sharing various blocks among each other, which is a massive number of block references that ReFS itself has to keep track of in its metadata. Compared to having weekly synthetics, you have 14x more cloned blocks, with no integrity benefit at all, and all evidence to this point says that ReFS problems are more likely to trigger the more block references you have. For example, in my lab, when I'm configuring to reproduce the ReFS hangs/BSODs, I configure my jobs to create a synthetic full every day, exactly to create all of these block references so that I hit the issue more quickly.

So yes, while there's nothing in the user guide that says you can't do a synthetic full every day, the user guide is about what Veeam's capabilities are. Veeam certainly has the ability to create a synthetic full every day, and there may be cases where that is useful or a good idea, but with our current knowledge of ReFS, and of what triggers problems with it, I'd say it's less than ideal for that use case.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS performance issue "workaround"

Post by kubimike »

@tsightler
Thanks for the explanation. One thing that isn't clear is below.
but it also reduces the benefit, since the new "chain" is still 100% dependent on blocks from the prior chain as it's sharing all of those blocks.
That being said, if all the VBKs are related somehow, when Veeam deletes the oldest restore point (due to retention), wouldn't that make your statement false?

**edit: would you like to do a remote session on my server tomorrow? I can give a tour.
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS performance issue "workaround"

Post by tsightler » 1 person likes this post

kubimike wrote: [tsightler] but it also reduces the benefit, since the new "chain" is still 100% dependent on blocks from the prior chain, as it's sharing all of those blocks.

That being said, if all the VBKs are related somehow, when Veeam deletes the oldest restore point (due to retention), wouldn't that make your statement false?
But that's entirely what block clone is: each VBK is sharing any unchanged blocks from the prior VBK, and when you delete the older VBK, that just means the blocks aren't shared anymore. Let's try a super simple example: three VBKs, each containing only 4 blocks, labeled A-D.

On Monday the backup runs and creates the first VBK with the first version of each block:
Mon VBK = A1 B1 C1 D1

On Tuesday the backup runs and creates a VIB, but only block A has changed, so the VIB consists only of the new block:

Tue VIB = A2

But the job was configured to create a synthetic full, so what the system does is create a VBK and "block clone" the most recent version of each block into it, so you end up with a VBK like this:

Tue VBK = A2 B1 C1 D1

Note that this new VBK is simply referencing the same blocks as the Monday VBK and the Tuesday VIB. Once the synthetic full is built, the system deletes the Tuesday VIB file, but that doesn't delete block A2, because the synthetically built VBK is still referencing it. Note that both the VIB and the VBK referenced the very same block, so deleting the file freed no space on the actual disk; the exact same content remains on disk.

This continues for Wed-Friday, with different blocks changing each day:

Wed VBK = A3 B1 C1 D2
Thu VBK = A4 B1 C2 D2
Fri VBK = A5 B1 C2 D3

So now let's imagine it's time to delete the VBK from Monday because I've hit my retention. The system deletes the Monday VBK, which contained blocks A1, B1, C1, and D1, so what can actually be freed? The Tuesday VBK still references blocks B1, C1, and D1, so those blocks cannot be freed, regardless of the fact that the file has been deleted. Only block A1 can actually be freed, because the other backup files still depend on the rest of those blocks.

In just this super simple example there are 14 block references which have to be tracked and processed by ReFS during the delete. On the other hand, imagine if this were just a regular incremental chain:

Mon VBK = A1 B1 C1 D1
Tue VIB = A2
Wed VIB = A3 D2
Thu VIB = A4 C2
Fri VIB = A5 D3

Note that this method has exactly the same blocks on disk, but exactly zero cloned blocks, for the same amount of space. When retention hits on Friday, the only thing that happens is that block A2 from the Tue VIB is cloned into the VBK, and then the VIB is deleted, which is much lighter and faster than having a bunch of blocks cloned to create a synthetic full. This is why I believe we don't see the problem with customers that are running forever forward or reverse incremental modes; at least so far I haven't found one.
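The whole example can be condensed into a toy reference-counting model. This is a sketch only: real ReFS tracks references per 64K cluster in its metadata, not per named block like this, but the arithmetic comes out the same as in the text above:

Code: Select all

from collections import Counter

def cloned_references(files):
    """Count references to blocks shared by more than one file (what ReFS must track)."""
    refs = Counter(b for blocks in files.values() for b in blocks)
    return sum(n for n in refs.values() if n > 1)

def delete_file(files, name):
    """Delete a file and return the set of blocks whose reference count hits zero."""
    victim = set(files.pop(name))
    still_referenced = set(b for blocks in files.values() for b in blocks)
    return victim - still_referenced

synthetic = {
    "Mon.vbk": ["A1", "B1", "C1", "D1"],
    "Tue.vbk": ["A2", "B1", "C1", "D1"],
    "Wed.vbk": ["A3", "B1", "C1", "D2"],
    "Thu.vbk": ["A4", "B1", "C2", "D2"],
    "Fri.vbk": ["A5", "B1", "C2", "D3"],
}
incremental = {
    "Mon.vbk": ["A1", "B1", "C1", "D1"],
    "Tue.vib": ["A2"],
    "Wed.vib": ["A3", "D2"],
    "Thu.vib": ["A4", "C2"],
    "Fri.vib": ["A5", "D3"],
}

print(cloned_references(synthetic))       # 14 -- the block references from the text
print(cloned_references(incremental))     # 0  -- same blocks on disk, zero sharing
print(delete_file(synthetic, "Mon.vbk"))  # {'A1'} -- only A1 is actually freed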

So, assuming all of that made any sense at all, let's apply some scale to it! Let's assume you have a 5TB backup file. Assuming you are using normal Veeam settings for storage optimization (local block size and optimal compression), the average Veeam block is about 512KB (assuming 2:1 compression; it could be slightly smaller if you get better compression). That means a 5TB backup file contains about 10,000,000 blocks! If you have 100 VBK files all made with block clone, and assuming your change rates are similar to observed averages to this point, probably 60% of your blocks are cloned across all 100 VBK files. Imagine the accounting that ReFS has to do to update its block reference counts for all of those blocks.

On the other hand, if you ran weekly synthetic fulls, you'd use exactly the same amount of space, but you'd have at most 15 VBKs on disk with cloned blocks, drastically reducing the reference count. Also, since it's only weekly, the number of blocks cloned between files would be smaller as well, because a lot more blocks change during a week than during a day.

Note that I'm not saying 100% that this would eliminate the issue in your case; obviously we have people with the problem even with weekly synthetic fulls, but it's certainly additional stress for ReFS to keep up with the extra accounting for so many block-cloned files.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS performance issue "workaround"

Post by kubimike »

@tom
Thanks for the explanation. I just see a problem with that logic, and here's why:
But that's entirely what block clone is: each VBK is sharing any unchanged blocks from the prior VBK, and when you delete the older VBK, that just means the blocks aren't shared anymore.
I create a new backup job set for 99 restore points. It creates its first VBK; let's call this file BACKUP1.VBK. Time warp into the future: I am now up to 99 restore points; let's call this file BACKUP99.VBK. Are you saying file BACKUP99.VBK still has references to file BACKUP1.VBK? If that's true, it could never be deleted.

BACKUP1.VBK has blocks A1 B2 C3 D4; for simplicity we can say that BACKUP99.VBK has A2 B3 C4 D4, so the only common block between BACKUP1 and BACKUP99 is D4. If BACKUP1.VBK was deleted due to retention, where would BACKUP99.VBK reference block D4?
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS performance issue "workaround"

Post by tsightler » 2 people like this post

kubimike wrote: I create a new backup job set for 99 restore points. It creates its first VBK; let's call this file BACKUP1.VBK. Time warp into the future: I am now up to 99 restore points; let's call this file BACKUP99.VBK. Are you saying file BACKUP99.VBK still has references to file BACKUP1.VBK? If that's true, it could never be deleted.

BACKUP1.VBK has blocks A1 B2 C3 D4; for simplicity we can say that BACKUP99.VBK has A2 B3 C4 D4, so the only common block between BACKUP1 and BACKUP99 is D4. If BACKUP1.VBK was deleted due to retention, where would BACKUP99.VBK reference block D4?
ReFS tracks which specific cloned blocks are referenced, down to the cluster size (64K is the current Veeam best practice for ReFS). If BACKUP1.vbk and BACKUP99.vbk each have block D4, then block D4 is referenced by every single backup 1-99. When you delete BACKUP1.vbk, ReFS uses the file metadata to determine which blocks are referenced by that file and only frees (deletes) blocks that have no more references in any other file. If you delete BACKUP1.VBK, only blocks that are totally unique to that file will actually be freed, but ReFS has to update the reference count for every block that is still used by any other backup file.

Perhaps an even simpler example:

BACKUP1 - A1 B1 C1 D1

Now I use a command line tool to block clone that file to a file called BACKUP2, so BACKUP2 is this:

BACKUP2 - A1 B1 C1 D1

In other words, both BACKUP1 and BACKUP2 are each referencing the exact same blocks on disk; ReFS simply keeps track that each block is referenced more than once. That's where the space savings for synthetic fulls come from: the files are sharing the same blocks between each other; otherwise ReFS would have no space-savings benefit for synthetic fulls over any other filesystem.

But that doesn't keep me from deleting the file BACKUP1. If I delete BACKUP1, it will free up exactly zero disk space, because BACKUP2 is still referencing all of those blocks, so ReFS cannot free them. Deleting the file simply removes the file from the directory and reduces the reference count for each block from 2 to 1 (previously 2 files referenced each block; now only 1 file does). If you then also delete BACKUP2, the reference count for those blocks drops to 0, which means no files on the system are referencing them, so ReFS can free that space.
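Reusing the delete_file helper from the toy model sketched a few posts up, the two deletes play out like this (file names are illustrative):

Code: Select all

files = {
    "BACKUP1": ["A1", "B1", "C1", "D1"],
    "BACKUP2": ["A1", "B1", "C1", "D1"],  # block-cloned copy: same on-disk blocks
}

print(delete_file(files, "BACKUP1"))  # set() -- refs drop 2 -> 1, nothing freed
print(delete_file(files, "BACKUP2"))  # {'A1', 'B1', 'C1', 'D1'} -- refs hit 0, space freed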

In simple terms, just because I use block clone to share data between backup files doesn't mean I can't delete any file at any time; but when I do, ReFS has to update the reference count for every single block that is shared with other files. Actually, it's technically at the cluster level, so that 5TB file would require updating potentially 80,000,000 clusters' worth of metadata when using 64K clusters.
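Both back-of-the-envelope figures in this thread check out; a quick sanity check, assuming binary units and the ~512KB average Veeam block mentioned earlier:

Code: Select all

TiB = 2 ** 40
backup_file = 5 * TiB

veeam_block = 512 * 2 ** 10   # ~512KB average Veeam block (local target, ~2:1 compression)
refs_cluster = 64 * 2 ** 10   # 64K ReFS cluster, current best practice

print(backup_file // veeam_block)   # 10485760 -> "about 10,000,000 blocks"
print(backup_file // refs_cluster)  # 83886080 -> "potentially 80,000,000 clusters"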
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS performance issue "workaround"

Post by kubimike »

YESSSSSSSSSSSS we are in total alignment now. A+ Now how to get all of this to work under the conditions I need :)
Now that I have my backup copies jiving, I can test out that msft refs.sys and play with 'RefsProcessedDeleteQueueEntryCountThreshold' + 'RefsDisableCachedpins' (not sure on this one). Are you using this driver during your testing?
jazzoberoi
Enthusiast
Posts: 96
Liked: 23 times
Joined: Oct 08, 2014 9:07 am
Full Name: Jazz Oberoi
Contact:

Re: REFS performance issue "workaround"

Post by jazzoberoi »

tsightler wrote: ReFS drastically changes the equation. Since we don't have to move blocks, the time is greatly reduced, but it also reduces the benefit, since the new "chain" is still 100% dependent on blocks from the prior chain, as it's sharing all of those blocks. This provides effectively no benefit from an integrity perspective, because the VBK you create today is still dependent on blocks from VBK files you created weeks/months ago.
Hi tsightler,

Is reverse incremental a better option then, since ReFS doesn't take that bad a hit on merging anymore?
ian0x0r
Veeam Vanguard
Posts: 235
Liked: 48 times
Joined: Nov 11, 2010 11:53 am
Full Name: Ian Sanderson
Location: UK
Contact:

Re: REFS performance issue "workaround"

Post by ian0x0r »

Not that you would, but I assume that manually deleting VBK files via Windows Explorer, in this example, would also respect the block clone tracking?

Ian
Check out my blog at www.snurf.co.uk :D
tdewin
Veeam Software
Posts: 1775
Liked: 646 times
Joined: Mar 02, 2012 1:40 pm
Full Name: Timothy Dewin
Contact:

Re: REFS performance issue "workaround"

Post by tdewin »

Once the block clone call is made, the filesystem is in control. So yes, if you delete a file, it is ReFS's responsibility to track block usage by the different files.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS performance issue "workaround"

Post by kubimike »

Here is something I just thought of, using the BACKUP1-BACKUP99 example again. Let's say this took 3 months to get to BACKUP99, with BACKUP1 being the active full and everything up till BACKUP99 being incrementals and synthetic fulls. Given that a majority of the blocks reference BACKUP1, what happens when another active full starts, say with BACKUP100? Going forward, will the filesystem map any new backups, i.e. BACKUP101-BACKUP1xx, to the new active full created at BACKUP100?
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS performance issue "workaround"

Post by tsightler »

kubimike wrote: Now that I have my backup copies jiving, I can test out that msft refs.sys and play with 'RefsProcessedDeleteQueueEntryCountThreshold' + 'RefsDisableCachedpins' (not sure on this one). Are you using this driver during your testing?
My testing has been focused almost exclusively on consistently reproducing the issue on standard Windows without any private hotfixes. Only once we can consistently reproduce the issue can we truly test whether the available fixes, and which specific settings, actually help the problem. Unfortunately, this has been a far more elusive goal than one would hope.
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS performance issue "workaround"

Post by tsightler »

jazzoberoi wrote: Is reverse incremental a better option then, since ReFS doesn't take that bad a hit on merging anymore?
Very limited testing with reverse incremental but, at least in theory, I would expect it to be better, simply because it only deletes a single set of restore points daily, and those points are small. To this point, the single most reliable mode for ReFS seems to be forever forward incremental, with regular maintenance scheduled. Forever forward makes the minimum use of block clone (only for accelerating the merge of the oldest VIB into the VBK), and thus has the minimum amount of data that has to be persistently tracked by the ReFS filesystem.
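Extending the toy model from earlier in the thread, a minimal sketch of why forever forward stays light: each day only the handful of blocks in the oldest VIB gets cloned into the full. The helper and file names are illustrative, and the per-block versioning is simplified:

Code: Select all

def merge_oldest_increment(chain, vbk="Full.vbk", oldest="Tue.vib"):
    """Fold the oldest VIB into the VBK via block clone, then delete the VIB."""
    merged = {b[0]: b for b in chain[vbk]}  # index block versions by letter
    for block in chain[oldest]:
        merged[block[0]] = block            # newer version replaces older
    chain[vbk] = list(merged.values())
    del chain[oldest]                       # old versions (e.g. A1) lose their last reference

chain = {
    "Full.vbk": ["A1", "B1", "C1", "D1"],
    "Tue.vib":  ["A2"],
    "Wed.vib":  ["A3", "D2"],
}
merge_oldest_increment(chain)
print(chain["Full.vbk"])  # ['A2', 'B1', 'C1', 'D1'] -- only one block was cloned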
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS performance issue "workaround"

Post by tsightler »

kubimike wrote: Here is something I just thought of, using the BACKUP1-BACKUP99 example again. Let's say this took 3 months to get to BACKUP99, with BACKUP1 being the active full and everything up till BACKUP99 being incrementals and synthetic fulls. Given that a majority of the blocks reference BACKUP1, what happens when another active full starts, say with BACKUP100? Going forward, will the filesystem map any new backups, i.e. BACKUP101-BACKUP1xx, to the new active full created at BACKUP100?
Since the active full contains new copies of every block, future synthetics will reference blocks in the new active full. While it may seem obvious, I want to make sure to note that running an active full requires additional space to store the new full, since it will not share any blocks with the prior backups on disk.
DaveWatkins
Veteran
Posts: 370
Liked: 97 times
Joined: Dec 13, 2015 11:33 pm
Contact:

Re: REFS performance issue "workaround"

Post by DaveWatkins »

This is something I've wondered about a few times. It's good to know an active full doesn't use block cloning at all.
dellock6
Veeam Software
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: REFS performance issue "workaround"

Post by dellock6 »

Back from vacation. Thanks to everyone who posted their repository setup, keep them coming!

Thanks,
Luca
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers
Contact:

Re: REFS performance issue "workaround"

Post by Delo123 »

We do synthetic fulls every week and active fulls every month; maybe that's why we haven't seen any performance issues so far, not even when we delete files. Currently 180TB is allocated on the ReFS repository.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS performance issue "workaround"

Post by mkretzer »

@Delo123
Very interesting. Do you use per-VM backup files? What is your biggest backup file?
I find it interesting that such a big repo does not show the same issues we have with a similarly big repo...

Especially regarding your active fulls: do you have merges running at the same time as the active fulls? In our case that led to active fulls running at single-digit MB/s.
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers
Contact:

Re: REFS performance issue "workaround"

Post by Delo123 »

Yes, per-VM jobs. The biggest VBK is 6.8TB. I scheduled everything so active fulls and synthetic fulls never run on the same day. Then again, we only have 250 VMs and thus only 8 jobs. Speeds vary; especially on incremental runs without much changed data, throughput is average, but always over 200MB/s.
Yesterday I saw a "Backup files health check has been completed" with a red X and a duration of 3 hours on one of the big backup files, the error being "ChannelError: ConnectionReset"; currently investigating what caused that... This morning, however, the same backup file health check completed in 6 hours.