enoch
Service Provider
Posts: 162
Liked: 11 times
Joined: Jun 29, 2013 12:14 pm
Full Name: Peter Enoch
Contact:

REFS - to defrag or not to defrag... that's the question?

Post by enoch »

Hi all,

The Defrag tool in Windows Server 2016 pretty quickly reports 70%+ fragmentation and says defragmentation is needed.

Should this be done on a daily or weekly basis?

With "large" REFS 3.1 volumes 40 TB+, will a defragmentation run ever be "finished"?
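
For what it's worth, an analysis-only pass can be scripted so it just reports fragmentation without moving anything; a minimal sketch (the drive letter is only an example for the repository volume):

# Report current fragmentation on the repository volume without defragmenting it
Optimize-Volume -DriveLetter E -Analyze -Verbose
# Same thing with the classic command-line tool
defrag.exe E: /A /V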
dellock6
Veeam Software
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by dellock6 »

I have some doubts about defragging a ReFS filesystem, especially if block clone is used. If a block is used by more than one file, how is defrag going to choose where to place that block? Say the blocks of file1 are 1-2-3-4 and the blocks of file2 are 5-2-6-7. If I defrag the blocks to optimize file1, chances are the shared block will not be optimally placed for file2.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
enoch
Service Provider
Posts: 162
Liked: 11 times
Joined: Jun 29, 2013 12:14 pm
Full Name: Peter Enoch
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by enoch » 1 person likes this post

Thanks for the reply. It would be nice to have a complete "best practice" guide for ReFS and Veeam.

So should I deactivate defrag for all ReFS volumes being used with Veeam and block cloning?
enoch
Service Provider
Posts: 162
Liked: 11 times
Joined: Jun 29, 2013 12:14 pm
Full Name: Peter Enoch
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by enoch »

Luca, can Veeam test and report back whether running defrag on ReFS with Veeam (using block cloning) is recommended or not?

I tried the blockstat.exe / blockcomparerefs.ps1 script, and even after running defrag it still shows "savings".

Link: http://dewin.me/refs/
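
For anyone without that script handy, a rough sanity check (only a sketch, not the script from the link above) is to dump the physical extents of two files and look for overlapping LCN ranges, assuming your ReFS build supports extent queries; the file paths below are just examples:

# Identical LCN ranges appearing in both outputs indicate block-cloned (shared) clusters
fsutil file queryextents "E:\Backups\JobA\full.vbk"
fsutil file queryextents "E:\Backups\JobA\increment.vib"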
tsightler
VP, Product Management
Posts: 6009
Liked: 2842 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by tsightler » 2 people like this post

Luca's point was that defrag is unlikely to be useful for ReFS when block cloning is in use simply because it's impossible for a single block to be in an optimal location for more than one file. Simple example:

File1 -- |Block1|Block2|Block3|Block4|
File2 -- |Block5|Block2|Block6|Block4|

In this example, File1 has four blocks of 64K each (the cluster size) laid out in order on the disk, so 0% fragmentation. File2 shares two blocks with File1, so it is immediately 50% fragmented. There's simply no way to avoid this when cloning the same blocks between files, and there's nothing defragmentation can do about it: if I defragmented File2, then File1 would be fragmented.

At this point I'm not recommending file system defragmentation runs on ReFS until more testing is complete (I haven't found it to cause any issues, but so far also no benefit, just wasted I/O). However, if you are using one of the forever modes, such as reverse incremental or forward incremental without synthetic/active fulls, then using the Veeam settings for defrag and compact can still be useful to clean unused space and reorganize the metadata within the VBK file.
enoch
Service Provider
Posts: 162
Liked: 11 times
Joined: Jun 29, 2013 12:14 pm
Full Name: Peter Enoch
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by enoch »

Thanks for the answer :D
b.vanhaastrecht
Service Provider
Posts: 833
Liked: 154 times
Joined: Aug 26, 2013 7:46 am
Full Name: Bastiaan van Haastrecht
Location: The Netherlands
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by b.vanhaastrecht »

tsightler wrote: At this point I'm not recommending file system defragmentation runs on ReFS until more testing is complete
Has Veeam come to a conclusion on how to optimize a ReFS volume when it comes to fragmentation?
======================================================
Veeam ProPartner, Service Provider and a proud Veeam Legend
andreash
Enthusiast
Posts: 46
Liked: 7 times
Joined: Dec 04, 2013 8:13 am
Full Name: Andreas Holzhammer
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by andreash »

I'm not convinced that ReFS doesn't need defragmentation, but I can follow Luca's explanation that this is going to be difficult.

In my setup we have a seven-disk RAID 6 array of 7.2k SATA drives for daily backups on a Core i3-powered NAS. After more than a year at ~75% usage, the read speed has severely degraded: a simple file copy maxes out at ~40 MB/s now, while it hit 120 MB/s in the beginning.
We are moving the repository to a new device anyway, so I'm planning to run a few tests again after defragging the volume.
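
If anyone wants to reproduce the comparison, this is roughly how I'd measure the raw sequential read speed of a single backup file. A sketch only: the path is an example, and the file should be larger than RAM so the OS cache doesn't skew the number.

# Read one backup file in 1 MB chunks and report the effective MB/s
$path  = "E:\Backups\JobA\full.vbk"
$buf   = New-Object byte[] (1MB)
$fs    = [System.IO.File]::OpenRead($path)
$sw    = [System.Diagnostics.Stopwatch]::StartNew()
$total = 0
while (($n = $fs.Read($buf, 0, $buf.Length)) -gt 0) { $total += $n }
$fs.Close()
"{0:N0} MB read at {1:N0} MB/s" -f ($total / 1MB), ($total / 1MB / $sw.Elapsed.TotalSeconds)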

Any real world experience greatly appreciated.
b.vanhaastrecht
Service Provider
Posts: 833
Liked: 154 times
Joined: Aug 26, 2013 7:46 am
Full Name: Bastiaan van Haastrecht
Location: The Netherlands
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by b.vanhaastrecht »

I'm experiencing the exact same thing. Veeam data that is just a week old already has heavily degraded read speed, while other non-Veeam files keep their read speed over time.
======================================================
Veeam ProPartner, Service Provider and a proud Veeam Legend
tsightler
VP, Product Management
Posts: 6009
Liked: 2842 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by tsightler »

There could be some advantages to defragmentation, especially when space utilization gets above, say, 70%, just by keeping the areas for new data more consolidated and allowing newly added data to be less fragmented. However, if synthetic fulls are in use, the overall benefit is not very large, as defragmenting one file simply fragments the other files that share those same blocks. Unfortunately, fragmentation is the natural side effect of any solution that shares data segments between multiple references, so some degradation over time is certainly expected, especially if you are keeping multiple synthetic fulls in backup chains.

However, results from defragmentation in real-world environments are still pretty limited, and the limited results I do have are very mixed, with a few people reporting minor/moderate improvements, others showing basically no improvement at all, and one even reporting a small decrease in performance after days of defragmentation. It's difficult to draw conclusions across the limited results I've seen due to dramatic differences in volume size, amount of data, retention, and hardware type, so anything you learn and share here would be a welcome addition.
b.vanhaastrecht
Service Provider
Posts: 833
Liked: 154 times
Joined: Aug 26, 2013 7:46 am
Full Name: Bastiaan van Haastrecht
Location: The Netherlands
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by b.vanhaastrecht »

OK, thanks for your answer. I'm starting a defrag on an 11 TB volume and will see how that goes.

We have two different types of repositories.

SOBR 1 with local disks: each node has about 35 TB of ReFS backed by 12 x 4 TB SATA drives.
SOBR 2 with datastores via NFS backed by NetApp SATA pools: each node (VM) has 12 TB of ReFS.

We store just 14 restore points, with synthetic fulls on Saturday. Just small chains, but the read speed on all VIB and VBK files is terrible.

Both SOBRs show the following symptoms:
- Copying a non-Veeam file from the network to a ReFS volume runs at 200 MB/s.
- Copying this file to other extents runs at the same speed, even after the file has sat there for a couple of days or weeks. This seems logical, as ReFS is not doing anything to the file.
- A Veeam backup job writing to an extent runs at the same 200 MB/s.
- Copying the most recent .vib file to another extent runs at just 40 MB/s :?:
- Copying the most recent .vbk file to another extent: the same 40 MB/s.
- Copying a VBK or VIB file somewhere else on the same disk runs at 40 MB/s, yet copying that copy to the same or another extent then runs at 200 MB/s.

So what I find strange:
- Why is reading a fresh VIB file degraded? It's not touched by ReFS in any way, correct?
- 200 vs 40 MB/s is a very big degradation for such a small chain, isn't it?
- There must be something else in play here.
======================================================
Veeam ProPartner, Service Provider and a proud Veeam Legend
AlexHeylin
Veeam Legend
Posts: 561
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by AlexHeylin »

To pick up on what performance we should expect, because I've got some evidence that aligns with what @b.vanhaastrecht sees:

Can someone from Veeam please clarify whether a fresh VIB is entirely composed of new, unshared blocks, even if by some magic all the blocks in the VIB are already on the disk from another file?
If so, why do we routinely see VIBs written with very high fragmentation levels (at least one fragment per 2.4 MB), even though the repo is only 50% full and the free space is fairly contiguous?
If not, are any of those blocks shared with other files in their own chain?

Are any blocks shared outside a backup chain?
For example, take two backup jobs (chains) of identical virgin Windows servers, written to the repo one after the other: would we expect most of those blocks to be shared between the chains, or kept separate because they're separate chains? Does the "per-VM file" setting affect this?

When the oldest VIB is merged into the VBK (using Fast Clone), the blocks are just left in place, mappings are added to the VBK, and then the VIB file referencing them is removed, leaving the blocks in exactly the same place, right?
So ensuring those blocks are defragged (to remove FS fragments within a Veeam block) before merging into the VBK avoids adding some fragmentation to the VBK?

I'm working on some code to defrag ReFS repos, and I need to verify my expectations so I can have the code optimise the files properly. So far it does seem to help, but it's slow and inefficient due to the shared-block problem highlighted above. "Fragmentation" due to shared blocks can't be avoided: only one file can see those blocks as defragmented. However, Veeam on ReFS also appears to suffer badly from genuine file system fragmentation (a contiguous Veeam block written to disk in more than one fragment). That we can improve, I think.
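
For the measurement side I'm leaning on Sysinternals contig in analyse-only mode, something like the sketch below (the contig.exe location and repo path are examples; -a only analyses and doesn't move anything):

# Report per-file fragment counts for every VBK/VIB in the repo, without touching data
Get-ChildItem "E:\Backups" -Recurse -Include *.vbk, *.vib |
    ForEach-Object { & "C:\Tools\contig.exe" -a $_.FullName }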

Thanks
AlexHeylin
Veeam Legend
Posts: 561
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by AlexHeylin »

@tsightler are you able to confirm / answer my expectations above please?
Thanks
tsightler
VP, Product Management
Posts: 6009
Liked: 2842 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by tsightler » 1 person likes this post

AlexHeylin wrote: Can someone from Veeam please clarify whether a fresh VIB is entirely composed of new, unshared blocks, even if by some magic all the blocks in the VIB are already on the disk from another file?
This should be true, yes.
AlexHeylin wrote: If so, why do we routinely see VIBs written with very high fragmentation levels (at least one fragment per 2.4 MB), even though the repo is only 50% full and the free space is fairly contiguous?
I'm not sure, but my guess would be something to do with how ReFS allocates space during writes and the fact that there's normally not a single VIB being written, but multiple VIB files in parallel. When writing a file the filesystem has no way to know how much space that file will use. If you are writing a large number of files in parallel, every filesystem will "fragment" to some extent, although different filesystems use different heuristics to decide how to allocate in this case. Unfortunately, I have no idea what allocator optimizations ReFS has available or how it deals with this.

XFS allows you to manually override the pre-allocator and set a fixed size to combat this, but I'm a little worried that the fact we are doing unbuffered I/O in v11 will cause this problem to appear there as well since I believe pre-allocation only works for buffered I/O.
AlexHeylin wrote: Are any blocks shared outside a backup chain?
No
AlexHeylin wrote: When the oldest VIB is merged into the VBK (using Fast Clone), the blocks are just left in place, mappings are added to the VBK, and then the VIB file referencing them is removed, leaving the blocks in exactly the same place, right?
They are left in place.
AlexHeylin wrote: So ensuring those blocks are defragged (to remove FS fragments within a Veeam block) before merging into the VBK avoids adding some fragmentation to the VBK?
That's more questionable to me. If Veeam blocks themselves are truly fragmented, maybe, but I'm not so convinced based on your statement above of at least one fragment per 2.4 MB, as that's bigger than the average Veeam block size by 5x. There also seems to be an assumption that the 5 blocks in a VIB represent 5 contiguous blocks, but it's very unlikely that they do. If I have 100 blocks in the VBK and there's a VIB with 5 changed blocks, they are more likely to represent blocks 17, 32, 55, 73, and 91, so having them in order on the disk isn't that useful from a Veeam perspective. Note also that the new block 17 may be bigger than the old block 17 and not fit where the old block was stored within the VBK, so we'll have to map it to a new place in the VBK, which means it still won't be in sequential order inside the VBK. Basically, focusing entirely on the VBK container does not guarantee that the blocks within the VBK are in any particular order.

I'm really struggling to see the benefit of defragmenting in this way. I could maybe see a case where, immediately after a VBK compact operation, the file is then defragmented, but even then I'd need some evidence that shows the benefit. Personally, I work with clients every day that back up PBs of data to ReFS and, overall, performance degradation after years of operation with thousands or millions of VIBs and merges has been moderate at best.

That being said, the examples above are all pretty small, so perhaps it's worse in the small cases somehow, or perhaps with low-end, non-cached RAID or something? I'm struggling to match what I'm reading in this thread to what I see every day.
AlexHeylin
Veeam Legend
Posts: 561
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by AlexHeylin »

Thanks tsightler.

Just to pick up on a couple of points & give you more background on our environment
tsightler wrote: When writing a file the filesystem has no way to know how much space that file will use.
If you use preallocation it does. Veeam does (or certainly could, in certain circumstances) know the size of a VIB it's sending, so it could preallocate a segment of disk that's as contiguous as possible. Now, to pick up on another point you made, it might not make sense to preallocate the whole file, but it might make a lot of sense to preallocate each "Veeam chunk" in the file (i.e. the discrete chunks of data that will later be mapped into the VBK). Then the overall file fragmentation wouldn't matter, because each "Veeam chunk" would only be fragmented as a last resort (because otherwise it could not be written to disk at all). As all reads of the data are per "Veeam chunk", a logical read of a full and all its incrementals then only swings the drive heads to deal with "application level" fragmentation (from the full and an overlaid sequence of incrementals), rather than with fragmentation within a "Veeam chunk".
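
To be clear, I'm not claiming this is what VBR does internally; it's just a sketch of what "reserve the size first, then write" looks like at the Windows API level (path and size are arbitrary examples, and whether the allocator actually returns one contiguous run is up to the filesystem):

# Illustration only: declare the final size before writing any data
$fs = [System.IO.File]::Open("E:\scratch\prealloc-test.bin", [System.IO.FileMode]::Create, [System.IO.FileAccess]::ReadWrite)
$fs.SetLength(10GB)      # reserve the logical size up front (10GB is an arbitrary example)
# ... the actual payload would be written here ...
$fs.Close()
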
tsightler wrote: so perhaps it's worse in the small cases somehow, or perhaps with low-end, non-cached RAID or something?
While they're not monster servers, we're talking something like 64 TB of RAID 6 spread over 12 spindles, sitting on a decent Dell PERC controller with cache, in a server with two recent 12-core Xeons and 64 GB of RAM. This isn't a low-end / no-cache environment.
tsightler wrote: I'm struggling to match what I'm reading in this thread to what I see every day.
We're also struggling to see how anyone could run this at large scale without needing insane IOPS performance to get sensible speed from Veeam, particularly at a high tenant / repo extent ratio. I'm not slagging Veeam off here; we just don't understand why VBR seems to so often ignore most of the hardware available to it.

It's worth mentioning that since v11 we're now seeing a new problem as well. Synthetic fulls using block clone are running at less than 5% of the speed they could, based on testing with the block clone speed test tool that support provided. We've seen a 1.5 TB synthetic full take 24 hours, and right now I'm looking at one from 8am today which took 7h 53m to do a fast clone synthetic full on 1.5 TB of per-VM VBKs. The whole backup job took 8h 09m, which means the incremental took 16 minutes, followed by 473 minutes of "fast clone" merging. During this time repo disk time, RAM, CPU, and network load all remained effectively zero. As this is an index-only / metadata-only operation, we don't understand what on earth VBR is doing for all this time, given the measured block clone performance of the storage. It gives the appearance of doing absolutely nothing when we look at the repo and the VCCG. Even the tenant VBR might take two hours to process 1%.
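
For anyone wanting to check the same thing, this is roughly how we watched the repo while the synthetic full ran. A sketch using standard Windows performance counters; the output path is an example, and 720 samples at 5 seconds covers one hour:

# Log disk, queue and CPU activity on the repository server during the synthetic full
Get-Counter -Counter '\PhysicalDisk(_Total)\% Disk Time',
                     '\PhysicalDisk(_Total)\Avg. Disk Queue Length',
                     '\Processor(_Total)\% Processor Time' -SampleInterval 5 -MaxSamples 720 |
    Export-Counter -Path 'C:\Temp\synthfull.blg' -FileFormat BLG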

It's clear that Veeam's use of block clone is "suboptimal" (for us, currently), and potentially that VBR using ReFS also has some objection to operating with sensible performance on what we consider to be ample hardware for the load we place on it.

It's possible that prior to finding the obvious block clone issue in VBR v11, we misinterpreted what we were seeing with VBR v10 and went off chasing fragmentation because it's something we understand - rather than the inner workings in VBR which obviously we don't. I don't feel that adequately explains what we think we've seen - but I'll certainly try to disambiguate the two issues from now on.

In fairness to us, not long before that we'd seen VBR restore a <1 TB VM into many hundreds of thousands of fragments on a completely empty, freshly formatted disk (10x the size of the VM). Having been told VBR preallocates in this situation to avoid fragmentation, we don't have 100% faith in what Veeam thinks VBR does, because we've seen it do things very differently, and support has been unable to explain what we've seen. The result of that restore was that the VM was insufferably slow and we had to take it offline and use contig to make the VHDX contiguous before we could return the VM to operation. Definitely suboptimal.
tsightler
VP, Product Management
Posts: 6009
Liked: 2842 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by tsightler » 1 person likes this post

"Veeam chunks" are typically <1 MB: by default we read 1 MB blocks, then compress them and store them block-aligned. That's part of the reason you don't need massive IOPS; the Veeam chunks are pretty big. It only takes 200 of these reads per second to deliver 200 MB/s of restore throughput, which is not a ton of IOPS.

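As a back-of-the-envelope check of that arithmetic (the numbers below are just the example values from the paragraph above):

# Read operations per second needed for a given restore throughput and block size
$targetMBps  = 200   # desired restore throughput in MB/s (example)
$blockSizeMB = 1     # typical on-disk Veeam block is ~1 MB (before compression)
"{0} reads per second needed" -f ($targetMBps / $blockSizeMB)
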
Writes to VBK/VIB files are not 100% just a stream. I always say our format is more like a database of blocks: it has sections for various internal structures, backup metadata, and the database blocks. Blocks are written in batches and, for example, after we write a batch of new blocks we update the associated metadata and flush those writes, so it's not just a 100% sequential operation. That's likely what leads to some of the fragmentation you are seeing. But this is the way it has always worked, it's not new or unique to ReFS, and, in general, fragmentation is not something that our long-term customers suffer with today.

Note that I'm not saying I've never seen a case where fragmentation has impacted performance, of course I have, and performance of reading VBK/VIB files will degrade somewhat, especially for sync reads, but this isn't the way restores are performed.

The issue you are describing above with v11 and ReFS certainly sounds unexpected; it feels a lot like the behavior we saw in the old days with Windows 2016 and all its various issues. It's certainly not something I've seen across the large stretch of the user community I work with, but these are mostly larger customers and most have not started migrating to v11 yet.

When I spoke about size possibly having an impact, I was basically just wondering aloud whether the difference might be related to the size, and more specifically to the amount of free space available on average. Whether something is big or small is mostly relative to what you work with every day; it is exceptionally rare for me to work with customers that have anything smaller than ~250 TB of space in an extent, and the majority are closer to 500 TB or larger. These systems typically have free space that measures between 50-100 TB, so the average free space per repo is larger than the entire volumes discussed earlier in this thread. My thought was that having so much free space might work in favor of avoiding fragmentation.

I'd suggest you continue to work any cases around performance issues with support, but feel free to share your case numbers here as well.
DonZoomik
Service Provider
Posts: 368
Liked: 120 times
Joined: Nov 25, 2016 1:56 pm
Full Name: Mihkel Soomere
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by DonZoomik » 1 person likes this post

I've stated it in other threads but I'll repeat it.
With defragmentation you will lose block clone space savings, so only defragment if you have one long chain per VM/job; synthetic fulls will get rehydrated.
Also, you'll get Veeam job failures during the analysis phase, as defrag locks filesystem metadata (per folder, it seems). On large file systems, the analysis can take days.
You can do per-file defrag with Sysinternals contig.exe (shorter metadata lock times), but I'm not sure it's worth the effort.
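
If you do try the per-file route, the loop is simple enough; just bear in mind the caveat above that defragmenting block-cloned files rehydrates them. A sketch only, with the contig.exe location and repo path as examples:

# Defragment backup files one at a time to keep metadata lock windows short.
# WARNING: this rehydrates block-cloned data and costs you the space savings.
Get-ChildItem "E:\Backups" -Recurse -Include *.vbk |
    ForEach-Object { & "C:\Tools\contig.exe" $_.FullName }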
AlexHeylin
Veeam Legend
Posts: 561
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by AlexHeylin »

Thanks for the info tsightler. That's food for thought.

We made progress today on the issue of synthetic fulls taking far, far too long. By setting CloudConnectQuotaAllocationMode = 1 we've seen our 1.5 TB test backup go from ~8 hours for a synthetic full to under 10 minutes (I think that equates to about 75% of the raw storage speed for block clone).

With that issue out of the way, perhaps we'll be able to get a clearer idea of what's going on with this one and if it's still / really an issue, and if fragmentation looks like a suspect. Thanks all for your help.
AlexHeylin
Veeam Legend
Posts: 561
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

Re: REFS - to defrag or not to defrag... that's the question?

Post by AlexHeylin » 1 person likes this post

Update: CloudConnectQuotaAllocationMode = 1 causes some other issues, but CloudConnectQuotaAllocationMode = 2 seems to just work as it should have in the first place. Since setting that I've had no issue with repo speed, so I suspect that, as long as disk queue remains reasonable, fragmentation was not the cause of our issue; the huge quota allocation bottleneck caused by CloudConnectQuotaAllocationMode = 0 was.
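
For reference, this is how a VBR server registry value like that is usually applied. Please confirm the exact value and key with Veeam support for your version before changing anything; the key path and service name below are the usual ones, but treat them as assumptions:

# Set the value on the VBR server and restart the backup service so it takes effect
New-ItemProperty -Path 'HKLM:\SOFTWARE\Veeam\Veeam Backup and Replication' `
    -Name 'CloudConnectQuotaAllocationMode' -Value 2 -PropertyType DWord -Force
Restart-Service -Name 'VeeamBackupSvc'   # service name assumed; adjust if yours differs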