vt_lyndon
Lurker
Posts: 2
Liked: never
Joined: Jul 26, 2021 7:30 pm
Full Name: Lyndon Lapierre

Feature Request: BTRFS or OpenZFS Backup Repository

Post by vt_lyndon »

Hi,

We're in a scenario with our XFS backup repositories where our ideal chain looks like this:

Primary Site Repository -> Backup Site Edge Repository -> Backup Site Bubble Repository

We do semi-frequent DR testing and like to keep backup assurance during these tests. To this end, we'd like to send backups to one repository and then from there to the next. This isn't very well supported within Veeam itself, so we originally looked at rsync, but rsync uses far more space because we lose block cloning.

An ideal solution is to create the repositories using ZFS or BTRFS, and have a frequent block-level send over the network.

As I understand it, Veeam currently uses reflink for block cloning. BTRFS already supports reflink and has the efficient send/receive capabilities we're looking for. My personal preference would be to have Veeam use ZFS snapshots or clone-copies on the backup repository rather than reflink, but any solution that solves our problem would be welcome.

Thanks for reading!

Lyndon

tsightler
VP, Product Management
Posts: 5905
Liked: 2759 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Feature Request: BTRFS or OpenZFS Backup Repository

Post by tsightler » 1 person likes this post

Hi @vt_lyndon.

While it would be great if we could just support everything, it's a very difficult and expensive proposition to support all possible options out there for Linux. Just supporting multiple distros is a significant QA burden, and adding even a single additional filesystem effectively doubles the existing QA effort.

When we were looking at supporting block clone on Linux, there were several filesystem options available, but only XFS had the combination of stability and wide support. XFS is proven to scale to volumes up to 1PB, provides a very solid foundation on which to build a repository and, most importantly, is supported on all major distros.

BTRFS is great, but so far it has simply failed to reach the same level of reliability as XFS, although I do believe it continues to get closer and closer. However, one big strike against it is that it is no longer considered a supported filesystem on RHEL-based distros. That's a huge issue: since such a high percentage of customers run RHEL-based distros exclusively, the QA effort there would only ever apply to a smaller subset of customers.

ZFS, well, that's another issue entirely. As far as I know it is only officially included and supported on Ubuntu, although certainly you can get it to work on others as well. It does not have the required block clone features implemented and, while it is somewhat popular with users, it currently has effectively zero percent chance of being a first class citizen in the Linux kernel, which will mean that it will always be this external thing supported by a subset of distros.

However, if you really like ZFS, then it may very well still be the best answer to your requirement, as it's completely possible to use XFS on top of ZFS via ZVOLs. Basically, this is what storage appliances like TrueNAS, which leverage ZFS underneath, actually do: you create a ZFS pool as normal, then create ZVOLs and format those ZVOLs with XFS. You can then use snapshots and incremental send/receive to replicate the volumes at the block level.
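As a rough illustration of the workflow described above, here is a minimal Python sketch that only builds the command lines involved; it does not execute anything. The pool, volume, snapshot and host names are hypothetical, the helper function names are made up for this example, and the XFS options (4K blocks, reflink and CRC enabled) are the ones commonly used for reflink-capable XFS repositories. Treat it as a sketch under those assumptions, not a tested procedure.

```python
import shlex
from typing import List, Tuple


def zvol_xfs_commands(pool: str, volume: str, size: str) -> List[List[str]]:
    """Build the commands that create a ZVOL and format it with XFS.

    All names passed in are placeholders chosen by the caller.
    """
    dataset = f"{pool}/{volume}"
    device = f"/dev/zvol/{dataset}"  # block device exposed for the ZVOL on Linux
    return [
        # Create a block volume (ZVOL) of the given size inside the pool.
        ["zfs", "create", "-V", size, dataset],
        # Format the ZVOL with XFS, enabling reflink and metadata CRCs.
        ["mkfs.xfs", "-b", "size=4096", "-m", "reflink=1,crc=1", device],
    ]


def replication_commands(dataset: str, prev_snap: str, new_snap: str,
                         remote_host: str, remote_dataset: str) -> Tuple[str, str]:
    """Build a snapshot command and an incremental send/receive pipeline."""
    src_prev = shlex.quote(f"{dataset}@{prev_snap}")
    src_new = shlex.quote(f"{dataset}@{new_snap}")
    snapshot = f"zfs snapshot {src_new}"
    # Send only the blocks that changed between the two snapshots.
    send = (
        f"zfs send -i {src_prev} {src_new} | "
        f"ssh {shlex.quote(remote_host)} zfs receive {shlex.quote(remote_dataset)}"
    )
    return snapshot, send


if __name__ == "__main__":
    for cmd in zvol_xfs_commands("tank", "veeam-repo", "10T"):
        print(" ".join(cmd))
    for line in replication_commands("tank/veeam-repo", "prev", "cur",
                                     "dr-host", "backup/veeam-repo"):
        print(line)
```

The first incremental send assumes the receiving side already holds the earlier snapshot from an initial full send.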

I'm also curious what issues you are having with Veeam and this setup, as I believe it should work. At least, I know I've helped customers set up this scenario in the past, but it's admittedly been quite a while.

vt_lyndon
Lurker
Posts: 2
Liked: never
Joined: Jul 26, 2021 7:30 pm
Full Name: Lyndon Lapierre

Re: Feature Request: BTRFS or OpenZFS Backup Repository

Post by vt_lyndon »

Good to know - thank you for the detailed response! I had considered XFS on a ZVOL initially but ultimately decided against it to ensure we're on a supported configuration. Perhaps I should revisit this if it's actually being used in the wild.

Re: my issues, the "backup copy job from a backup copy job" logic doesn't seem to work too well. I can configure it and it kicks off, but it doesn't seem to stay healthy. If, for example, we rename or restore a server, it won't find its way into the final repository.

dandav
Influencer
Posts: 21
Liked: 4 times
Joined: Jan 15, 2021 2:53 am
Full Name: Daniel Davis

Re: Feature Request: BTRFS or OpenZFS Backup Repository

Post by dandav »

@tsightler,

I understand your hesitancy to rely on BTRFS or ZFS due to their limited support, but I think some things need to be explained to customers before they implement XFS reflink. XFS has no data rot detection mechanism built in, so when forever incremental encrypted backups are stored on an XFS volume there is risk that a single flipped bit can result in an entire backup chain becoming unusable. Because the initial data is never rewritten and some blocks in the chain may NEVER be changed, the chance of this increases with time. Without reflink enabled, synthetic fulls are rewritten to disk fairly regularly, so the data is always fairly fresh and you increase the chance of a drive error being picked up before it causes issues. A flipped bit will also only affect a small subset of the backups in the chain. Suggesting XFS on top of ZVOLS with regular scrubbing is a good way to mitigate the issue, however then you may as well just go all out with ZFS reflink support.

Obviously everyone should have other copies of their backup data. However, data rot happens. Usually it's not a big problem: a Word document ends up with bad formatting or a JPEG looks a bit weird in one corner. But when we're talking about storing deduplicated, forever-forward, compressed, encrypted backups, it is a real problem.

mkretzer
Veeam Legend
Posts: 875
Liked: 263 times
Joined: Dec 17, 2015 7:17 am

Re: Feature Request: BTRFS or OpenZFS Backup Repository

Post by mkretzer »

Isn't the CRC option that is needed for XFS repos + Veeam health checks exactly what you mean by "bit rot detection"?

dandav
Influencer
Posts: 21
Liked: 4 times
Joined: Jan 15, 2021 2:53 am
Full Name: Daniel Davis

Re: Feature Request: BTRFS or OpenZFS Backup Repository

Post by dandav »

Sorry, I worded it incorrectly; that should have read "data rot correction". Yes, it can be detected, but the filesystem has no way of correcting it. That is why I think the user guide needs some kind of warning that users who store data on a repository with reflinks enabled MUST use backup file health checks.

mkretzer
Veeam Legend
Posts: 875
Liked: 263 times
Joined: Dec 17, 2015 7:17 am

Re: Feature Request: BTRFS or OpenZFS Backup Repository

Post by mkretzer »

The same goes for ReFS on simple volumes. Scrubbing does nothing in such situations!

soncscy
Veeam Legend
Posts: 503
Liked: 246 times
Joined: Aug 04, 2019 2:57 pm
Full Name: Harvey Carel

Re: Feature Request: BTRFS or OpenZFS Backup Repository

Post by soncscy »

Block cloning technology, for me at least, inherently implies redundancy elsewhere. While it's amazingly fast and allows you to size smaller servers overall, I believe it's plain as day that it's very much an "all eggs in one basket" situation. Anyone who walks away from reading about reflinks/block cloning and just understands "free space" needs to read again until they understand the cost involved.

To be clear, I'm heavily an advocate to my clients for XFS/ReFS, but we also push redundancy. Even without block cloning the same risks apply; I'm willing to commit to any maths on it, but it's basically the same risk in my opinion -- will a given backup be hit by data corruption?

For me, since the risk is roughly the same, the difference is that without block cloning you spend far more space and have much slower random IO operations. If the risk is roughly the same but you get benefits with block cloning, I'd say why not? Proper administrators will be performing the correct checks and ensuring they have proper redundancy as well. Those who are not able to afford it can at least make an informed decision and understand that they will be unprotected in certain situations.
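To put a rough shape on the "same risk" argument above, here is a small Python sketch under a deliberately simple assumption: every stored block is corrupted independently with the same small probability. The function name, block count and probability are hypothetical illustration values, not measurements. Under that model, the chance that a given restore point is hit depends only on how many blocks it references, not on whether those blocks are shared with other restore points.

```python
def restore_point_hit_probability(blocks_referenced: int, p_block: float) -> float:
    """Chance that at least one block a restore point needs is corrupted.

    Assumes independent, identically likely corruption per block,
    which is a simplification made for this illustration.
    """
    return 1.0 - (1.0 - p_block) ** blocks_referenced


if __name__ == "__main__":
    blocks = 1_000_000   # hypothetical blocks referenced by one full restore point
    p = 1e-9             # hypothetical per-block corruption probability

    # The same restore point references the same number of blocks whether
    # they are reflinked (shared) or physically duplicated on disk.
    with_cloning = restore_point_hit_probability(blocks, p)
    without_cloning = restore_point_hit_probability(blocks, p)
    print(with_cloning == without_cloning)
```

What the model does not capture is correlation: with block cloning, one bad shared block affects every restore point that references it, whereas without cloning the copies fail independently. That is the "all eggs in one basket" cost, and it is why redundancy elsewhere matters.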

tsightler
VP, Product Management
Posts: 5905
Liked: 2759 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Feature Request: BTRFS or OpenZFS Backup Repository

Post by tsightler » 2 people like this post

dandav wrote: "XFS has no data rot detection mechanism built in, so when forever incremental encrypted backups are stored on an XFS volume there is risk that a single flipped bit can result in an entire backup chain becoming unusable."

While I agree that it is possible for this to happen, I do think the risks are very low if you also combine XFS with RAID scrubbing and health checks. The reason is that, even if Veeam itself never reads the block again, disk scrubbing will still read the block from disk. All modern drives that I know of detect and correct single-bit errors themselves, including rewriting the impacted block, so as long as you have a scrubber process reading the data consistently, single-bit errors will be corrected at the drive level before there is even a detectable problem. Now, perhaps there are multiple bit flips in a single block, which would produce an unrecoverable read error, but then RAID kicks in and reads the data from another drive.

RAID scrubbers will also detect minor inconsistencies and, in some cases, rewrite stripes (exact capabilities vary per vendor), but note that most won't actually correct the data; they are simply correcting the parity. Linux RAID has both "check", which just reports stripe inconsistencies, and "repair", which just rewrites the stripe with new parity (or using the first readable data in the case of RAID1). Note that, if using Linux-based RAID6, you should almost certainly not use "repair", but rather perform a "check" scrub and, if you receive any errors, use a tool like raid6check to actually determine which component has the bad data.
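For Linux software RAID (md), the "check" scrub mentioned above is triggered through sysfs. The following is a minimal Python sketch, assuming an array named md0 (a hypothetical name) and root privileges on a real system. The helper function names and the sysfs_root parameter are made up for this example; the parameter exists only so the helpers can be pointed at a test directory. The sketch starts a check and reads the mismatch counter; it deliberately never issues "repair".

```python
from pathlib import Path


def md_dir(array: str, sysfs_root: str = "/sys") -> Path:
    """Directory holding the md control files for one array, e.g. md0."""
    return Path(sysfs_root) / "block" / array / "md"


def start_check(array: str, sysfs_root: str = "/sys") -> None:
    """Request a read-only consistency check of the array."""
    (md_dir(array, sysfs_root) / "sync_action").write_text("check\n")


def current_action(array: str, sysfs_root: str = "/sys") -> str:
    """Return the current sync action, e.g. 'idle' or 'check'."""
    return (md_dir(array, sysfs_root) / "sync_action").read_text().strip()


def mismatch_count(array: str, sysfs_root: str = "/sys") -> int:
    """Return the mismatch counter reported by the last check."""
    return int((md_dir(array, sysfs_root) / "mismatch_cnt").read_text().strip())


if __name__ == "__main__":
    start_check("md0")  # hypothetical array name
    print(current_action("md0"))
```

Once the action returns to idle, a non-zero mismatch_count is the cue to investigate, for example with raid6check on RAID6, rather than to run "repair" blindly.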

Now, don't get me wrong, I'm not saying that the data protection features in ZFS/BTRFS are not nice. Indeed, having automatic recovery from such issues is really nice. But, to be fair, both ZFS and BTRFS recommend scrubs as well, because that's where 99% of errors will be fixed, no different than with basic RAID. Basically, my suggestion is always "scrub early, scrub often, scrub using all available methods". Oh, and always have a second copy (or more)! :D

dandav wrote: "Suggesting XFS on top of ZVOLS with regular scrubbing is a good way to mitigate the issue, however then you may as well just go all out with ZFS reflink support."

But, unless something has changed very recently, ZFS doesn't support reflink, because ZFS maintainers seem to continue to think that reflink is the same as dedupe. So XFS on ZVOL is the only way to get reflink with ZFS data checks, as far as I know. Feel free to provide references otherwise. And don't get me wrong, I know you can do filesystem-style snapshots of files, which provides a somewhat similar level of functionality to reflink (basically, snapshot the file, then modify the file in place), but this is a completely different approach that introduces its own share of challenges.

Now, BTRFS is different, it does support reflink, so it's really about supportability. BTRFS continues to improve in stability and is really looking quite good in modern kernels, but supportability is still an issue today. We may support BTRFS one day, but it needs to be a supported and recommended choice by our Linux partners.
