TheOnlyWizard17
Service Provider
Posts: 50
Liked: 15 times
Joined: Nov 15, 2016 3:38 pm
Full Name: Bart van de Beek
Contact:

ReFS Repository and Veeam

Post by TheOnlyWizard17 » 5 people like this post

Hi,

I've got a question about the use of a ReFS repository.
When you create a backup job and enable "per-VM backup files" on the repository, Veeam nicely creates a folder per job and places the backup files in there, one file per VM...
According to this paper from Microsoft however, https://docs.microsoft.com/en-us/window ... ted-parity
There's an important remark:
"We recommend placing write-heavy VHDs in different subdirectories. This is because ReFS writes metadata changes at the level of a directory and its files. So if you distribute write-heavy files across directories, metadata operations are smaller and run in parallel, reducing latency for apps."

So, following this logic, wouldn't it be better for Veeam to place the per-VM files in per-VM folders as well, so that metadata operations like merges, transformations, fast clone operations etc. can all run in parallel instead of sequentially?
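To make the suggestion concrete, here is a minimal sketch of the two layouts (plain Python with made-up repository path, job and VM names; purely an illustration of the directory structures, not how Veeam names things internally):

```python
from pathlib import Path

# Hypothetical repository root, job and VM names, for illustration only.
REPO = Path(r"E:\Backups")
JOB = "Job-Prod"
VMS = ["vm-sql01", "vm-exch01", "vm-file01"]

def current_layout(repo: Path, job: str, vms: list[str]) -> list[Path]:
    """Per-VM backup files, but all of them in a single folder per job."""
    return [repo / job / f"{vm}.vbk" for vm in vms]

def proposed_layout(repo: Path, job: str, vms: list[str]) -> list[Path]:
    """Per-VM backup files, each in its own subfolder, so directory-level
    ReFS metadata updates (merges, transforms, fast clone) hit separate
    directories and can proceed in parallel."""
    return [repo / job / vm / f"{vm}.vbk" for vm in vms]

if __name__ == "__main__":
    print("Current layout (one folder per job):")
    for p in current_layout(REPO, JOB, VMS):
        print(" ", p)
    print("Proposed layout (one subfolder per VM):")
    for p in proposed_layout(REPO, JOB, VMS):
        print(" ", p)
```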
PetrM
Veeam Software
Posts: 3229
Liked: 520 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic
Contact:

Re: ReFS Repository and Veeam

Post by PetrM »

Hello,

Many thanks for the idea! I think it makes sense to analyze this approach a bit deeper to estimate the potential performance gain, and then decide on this improvement based on the results of that research.

Thanks!
Steve-nIP
Service Provider
Posts: 117
Liked: 49 times
Joined: Feb 06, 2018 10:08 am
Full Name: Steve
Contact:

Re: ReFS Repository and Veeam

Post by Steve-nIP »

Very interesting indeed. I really hope this gets tested thoroughly, and implemented quickly if it gives gains. ReFS on 2019 can use all the help it can get.
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: ReFS Repository and Veeam

Post by Gostev »

I have just checked with the ReFS development team on this. While they confirmed the recommendation is correct, this approach will make a difference only if the bottleneck is flushing dirty ReFS metadata, which is not currently known to be a bottleneck for the type of workload Veeam creates on ReFS, even in the largest of environments. Nevertheless, we will keep it in mind if we see this operation becoming an issue in certain scenarios. So, thanks for bringing this to our attention!
dimaslan
Service Provider
Posts: 99
Liked: 9 times
Joined: Jul 01, 2017 8:02 pm
Full Name: Dimitris Aslanidis
Contact:

Re: ReFS Repository and Veeam

Post by dimaslan »

Also, that would only be an issue when the backup job contains multiple VMs.
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: ReFS Repository and Veeam

Post by Gostev »

This is 99.99% of cases though ;) only the tiniest customers do one VM per job, and for them the bottleneck will be their low-end hardware anyway.
TheOnlyWizard17
Service Provider
Posts: 50
Liked: 15 times
Joined: Nov 15, 2016 3:38 pm
Full Name: Bart van de Beek
Contact:

Re: ReFS Repository and Veeam

Post by TheOnlyWizard17 »

"this approach will make difference only if the bottleneck is flushing dirty ReFS metadata, which I not currently known to be a bottleneck for the type of workload Veeam creates on ReFS, even in the largest of environments"

Are there ways of testing/validating if/when this (the dirty metadata flush not keeping up) is actually happening?
I've set up monitoring on the ReFS performance counters, and one of them sounds interesting in this regard: Dirty metadata pages. During normal operation it reaches somewhere around ~7,000, but it does indeed seem to get flushed instantly by ReFS, as the value immediately drops back down and just keeps fluctuating like that during live backups.

However, and I know this is probably a bit off-topic, whenever our cluster (5 nodes, 5 jobs) is doing health checks, this counter absolutely skyrockets (over 256,000 is the highest I've seen) and won't come back down below roughly 20,000 for the entire health check. Mind you, because health checks already take so tremendously long, that is with only 2 jobs doing a health check at the same time: I've spread the per-job health checks over several days so they don't all run at once (max 2 jobs simultaneously), which would otherwise grind the health checks to a near complete stop for 72+ hours (i.e. with all 5 jobs doing a health check at the same time).
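In case anyone wants to reproduce this, here is a rough polling sketch (Python wrapping the built-in Windows typeperf tool, run on the repository server). The counter path below is my assumption of how the ReFS object/counter is named and may need adjusting to whatever perfmon actually shows on your system:

```python
import subprocess
import time

# Assumed counter path; verify the exact object/counter/instance name in
# perfmon on your repository server (e.g. under the "ReFS" object).
COUNTER = r"\ReFS(*)\Dirty metadata pages"

def sample_counter(counter: str) -> list[float]:
    """Take one typeperf sample and return the values of all instances."""
    out = subprocess.run(
        ["typeperf", counter, "-sc", "1"],
        capture_output=True, text=True, check=True,
    ).stdout
    values = []
    for line in out.splitlines():
        # Skip the CSV header row and any non-data output.
        if line.startswith('"') and not line.startswith('"(PDH'):
            fields = [f.strip('"') for f in line.split('","')]
            for field in fields[1:]:          # fields[0] is the timestamp
                try:
                    values.append(float(field))
                except ValueError:
                    pass
    return values

if __name__ == "__main__":
    peak = 0.0
    for _ in range(60):                       # poll every 5 s for ~5 minutes
        vals = sample_counter(COUNTER)
        if vals:
            peak = max(peak, max(vals))
            print(f"current={max(vals):.0f}  peak={peak:.0f}")
        time.sleep(5)
```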
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: ReFS Repository and Veeam

Post by Gostev »

ReFS developers can get this information from a system dump taken during block cloning activity. We recently had to do quite a lot of troubleshooting with them in one of the largest ReFS deployments we know of, so many dumps were taken, and this particular metric was never a concern for them.

Keep in mind that Veeam uses fairly large block sizes, almost two orders of magnitude larger than the other Microsoft apps ReFS is designed to support, which translates into much less ReFS metadata to handle. For example, Exchange and SQL Server, which are fully supported on ReFS, use 8KB blocks.
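As a rough back-of-the-envelope illustration (assuming the 1MB default block size of Veeam's "Local target" storage optimization; a specific job may of course be configured differently), the per-file metadata difference looks like this:

```python
# Back-of-the-envelope comparison of how many blocks ReFS has to track
# for a 1 TiB file at 8 KiB (Exchange/SQL Server) vs 1 MiB block size
# (assumed Veeam "Local target" default).
KIB = 1024
FILE_SIZE = 1 * 1024**4                      # 1 TiB backup file

for name, block in [("8 KiB (Exchange/SQL)", 8 * KIB),
                    ("1 MiB (Veeam local target)", 1024 * KIB)]:
    blocks = FILE_SIZE // block
    print(f"{name:>28}: {blocks:>12,} blocks")

# ~134 million blocks at 8 KiB vs ~1 million at 1 MiB, i.e. roughly
# 128x less per-file metadata for ReFS to manage during block cloning.
```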

And yes, being a read-only workload, health checks do not result in changes to ReFS metadata.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: ReFS Repository and Veeam

Post by mkretzer »

Gostev, since I believe you are talking about our environment, keep in mind that our bigger jobs only have ~300 VMs per job.

But I concur that it is unlikely this would help a lot - the article talks about VHDs, not backup files. For VMs every ms counts, but not so much for backups. Also, a workaround would be to split the VMs across multiple jobs.

But since MS has such nice dumps from our system, perhaps they want to check that? :-)
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: ReFS Repository and Veeam

Post by Gostev »

Yes, I was talking about your environment.

Maybe at some point flushing dirty ReFS metadata will become a bottleneck and this can be revisited, but as of right now it seems we're far from that. Just between the block cloning performance improvements you're seeing in SAC version 2004, which should come standard in the next LTSC, and one simple change we can make to our code based on some unrelated findings from troubleshooting your recent issue, synthetic full performance can be accelerated up to 10x. If anything, that gives an idea of how relatively little we load the ReFS block cloning engine today...