Comprehensive data protection for all workloads
TheOnlyWizard17
Service Provider
Posts: 25
Liked: 12 times
Joined: Nov 15, 2016 3:38 pm
Full Name: Bart van de Beek

ReFS Repository and Veeam

Post by TheOnlyWizard17 » 5 people like this post

Hi,

I've got a question about the use of a ReFS repository.
When you create a backup job and enable "per-VM backup files" on the repository, Veeam nicely creates a folder per job and places the backup files, one per VM, in there...
According to this paper from Microsoft, however: https://docs.microsoft.com/en-us/window ... ted-parity
there's an important remark:
"We recommend placing write-heavy VHDs in different subdirectories. This is because ReFS writes metadata changes at the level of a directory and its files. So if you distribute write-heavy files across directories, metadata operations are smaller and run in parallel, reducing latency for apps."

So, following this logic, wouldn't it be better for Veeam to also place the per-VM files in per-VM folders, so that all metadata operations like merges, transformations, fast clone operations etc. can run in parallel instead of sequentially?
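To make the suggestion concrete, here's a minimal sketch of the two directory layouts (paths and file names are made up for illustration; Veeam's actual naming scheme differs):

```python
# Sketch of the two layouts: all per-VM files in one per-job folder
# versus one subdirectory per VM. Paths and names are invented.
from pathlib import Path
import tempfile

root = Path(tempfile.mkdtemp())

# Current behavior: per-VM backup files, all in a single per-job folder,
# so ReFS directory-level metadata updates land on one directory.
flat = root / "JobA"
flat.mkdir()
for vm in ("vm01", "vm02", "vm03"):
    (flat / f"{vm}.vbk").touch()

# Suggested behavior: a subdirectory per VM, so directory-level
# metadata operations are smaller and can run in parallel.
nested = root / "JobA-perVM"
for vm in ("vm01", "vm02", "vm03"):
    (nested / vm).mkdir(parents=True)
    (nested / vm / f"{vm}.vbk").touch()

print(sorted(p.relative_to(root).as_posix() for p in nested.rglob("*.vbk")))
```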

PetrM
Veeam Software
Posts: 731
Liked: 100 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic

Re: ReFS Repository and Veeam

Post by PetrM »

Hello,

Many thanks for the idea! I think it makes sense to analyze this approach a bit more deeply, in order to estimate the potential performance gain and decide on this improvement depending on the results of the research.

Thanks!

Steve-nIP
Service Provider
Posts: 72
Liked: 21 times
Joined: Feb 06, 2018 10:08 am
Full Name: Steve

Re: ReFS Repository and Veeam

Post by Steve-nIP »

Very interesting indeed. I really hope this gets tested thoroughly, and implemented quickly if it gives gains. ReFS on 2019 can use all the help it can get.

Gostev
SVP, Product Management
Posts: 26876
Liked: 4356 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: ReFS Repository and Veeam

Post by Gostev »

I have just checked with the ReFS development team on this. While they confirmed the recommendation is correct, this approach will make a difference only if the bottleneck is flushing dirty ReFS metadata, which is not currently known to be a bottleneck for the type of workload Veeam creates on ReFS, even in the largest of environments. Nevertheless, we will keep that in mind if we see this operation becoming an issue in certain scenarios. So, thanks for bringing this to our attention!

dimaslan
Service Provider
Posts: 66
Liked: 8 times
Joined: Jul 01, 2017 8:02 pm
Full Name: Dimitris Aslanidis

Re: ReFS Repository and Veeam

Post by dimaslan »

Also, that would only be an issue when the backup job contains multiple VMs.

Gostev
SVP, Product Management

Re: ReFS Repository and Veeam

Post by Gostev »

This is 99.99% of cases though ;) only the tiniest customers do one VM per job, and for them the bottleneck will be their low-end hardware anyway.

TheOnlyWizard17
Service Provider

Re: ReFS Repository and Veeam

Post by TheOnlyWizard17 »

"this approach will make difference only if the bottleneck is flushing dirty ReFS metadata, which I not currently known to be a bottleneck for the type of workload Veeam creates on ReFS, even in the largest of environments"

Are there ways of testing/validating if/when this (the dirty metadata flush not keeping up) would be happening?

I've set up monitoring on the ReFS counters, one of which sounds interesting in this regard: dirty metadata pages. During normal operation this reaches somewhere around ~7,000, but it indeed seems to get flushed instantly by ReFS, as the value immediately drops back down and just keeps fluctuating during live backups. However (I know it's probably a bit off-topic), whenever our cluster (5 nodes, 5 jobs) is doing health checks, this counter absolutely skyrockets (over 256,000 is the highest I've seen) and won't go back under some 20,000 or so for the entire health check. Mind you, that is the result of only 2 jobs doing health checks at the same time: because health checks already take so tremendously long, I've spread them out per job over several days (max 2 jobs at the same time). Running all 5 jobs' health checks simultaneously would grind them to a near complete stop for over 72 hours.

Gostev
SVP, Product Management

Re: ReFS Repository and Veeam

Post by Gostev »

ReFS developers can get this information from a system dump taken during block cloning activities. We recently had to do quite a lot of troubleshooting with them in one of the largest ReFS deployments we know of, so a lot of dumps were taken, and this particular metric was never a concern for them.

Keep in mind that Veeam uses fairly large block sizes, almost two orders of magnitude larger than those of other Microsoft apps that ReFS is designed to support, which translates into much less ReFS metadata to handle. For example, Exchange and SQL Server, which are fully supported on ReFS, use 8KB blocks.
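As a back-of-the-envelope illustration of the "two orders of magnitude" point, assuming a 1 MB backup block size (Veeam's default for a local target) versus the 8 KB pages used by Exchange and SQL Server:

```python
# Fewer, larger blocks mean far fewer extents for ReFS metadata to
# track. 1 MB for the backup block size is an assumption (Veeam's
# default for a local target); 8 KB is the Exchange/SQL page size.
GiB = 1024 ** 3

def block_count(data_bytes, block_bytes):
    """Number of fixed-size blocks needed to cover `data_bytes`."""
    return data_bytes // block_bytes

backup = 100 * GiB
small = block_count(backup, 8 * 1024)      # 8 KB blocks
large = block_count(backup, 1024 * 1024)   # 1 MB blocks

print(small, large, small // large)  # 13107200 102400 128
```

128x fewer blocks to track for the same 100 GB of data, which is indeed roughly two orders of magnitude.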

And yes, being a read-only workload, health checks do not result in changes to ReFS metadata.

mkretzer
Expert
Posts: 678
Liked: 159 times
Joined: Dec 17, 2015 7:17 am

Re: ReFS Repository and Veeam

Post by mkretzer »

Gostev, since I believe you are talking about our environment: keep in mind that our bigger jobs only have ~300 VMs per job.

But I concur that it is unlikely this would help a lot - in the article they talk about VHDs, not backup files. For VMs every ms counts, but not so much for backups. Also, a workaround would be to split the VMs into multiple jobs.
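That workaround is trivial to reason about: chunk the VM list into several smaller jobs, each with its own folder. A minimal sketch (VM names and chunk size are made up):

```python
# Sketch of the workaround: split one large VM list into several
# smaller jobs. Names and the per-job limit are invented.
def split_into_jobs(vms, per_job):
    """Chunk `vms` into lists of at most `per_job` entries."""
    return [vms[i:i + per_job] for i in range(0, len(vms), per_job)]

vms = [f"vm{i:03d}" for i in range(300)]
jobs = split_into_jobs(vms, 100)

print(len(jobs), [len(j) for j in jobs])  # 3 [100, 100, 100]
```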

But since MS has such nice dumps from our system, perhaps they want to check that? :-)

Gostev
SVP, Product Management

Re: ReFS Repository and Veeam

Post by Gostev »

Yes, I was talking about your environment.

Maybe at some point flushing dirty ReFS metadata will become a bottleneck, and this can be revisited then, but as of right now we seem to be too far from it. Consider that just between the block cloning performance improvements you're seeing in SAC version 2004 (which should come standard in the next LTSC) and one simple change we can make to our code, based on some unrelated findings from troubleshooting your recent issue, synthetic full performance can be accelerated up to 10x. If anything, this gives an idea of how relatively little we load the ReFS block cloning engine today...
