ashleyw
Service Provider
Posts: 207
Liked: 42 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw » 1 person likes this post

BTW, I've opened a defect report on the OpenZFS issue tracker, as I can reproduce a deadlock at the file system layer on bare metal (taking Veeam and VMware out of the mix).
I suspect some of the issues I'm having may be related to certain load patterns when dealing with many large files containing a heavy proportion of cloned blocks. Certainly, from my testing, it doesn't take many large file deletes to cause a ZFS deadlock, which can last for up to 30 minutes or so. If anyone is interested: https://github.com/openzfs/zfs/issues/16680
My guess at this stage is that the deadlocks occur randomly, causing timeouts during synthetic backup creation and leading to the errors I'm seeing in Veeam.
HannesK
Product Manager
Posts: 14830
Liked: 3079 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by HannesK »

thanks for sharing!

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw »

For anyone interested in experimenting with ZFS under large Veeam backup workloads, I recommend for now disabling synthetic full backups, disabling GFS on copy jobs, and switching to regular active full backups (which are obviously space-intensive but fast and reliable, depending on your zpool/vdev configuration).

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw »

For anyone still following this, I've been running many tests under 2.3 RC3 and have reached the following conclusions.
- The message "Agent: Failed to process method {Transform.CompileFIB}: Resource temporarily unavailable" still occurs when synthetics are enabled and block cloning is enabled on ZFS.
- There is currently no agreed root cause, or agreement on which side of the fence the issue sits, but Veeam suspects it is related to OpenZFS. However, using the OpenZFS block cloning reliability tests, I've been unable to isolate the issue to OpenZFS.
- The following changes don't eliminate the errors but seem to affect the stage of the job at which they occur.
-> Changing the block size from 4MB to 8MB (or even downwards).
-> Changing the compression level at the job level from optimal to none.
-> Reducing the number of concurrent jobs hitting the repository.
-> The number of jobs hitting the repositories simultaneously in the transformation stage does not appear to correlate with the failures.
- There appear to be long-standing issues with synthetic backups and the load they place on the target storage device, even when block cloning is in use, so even on commercial solutions many people seem to run without synthetics.
- As a backup target with synthetic backups disabled, an OpenZFS appliance (in our case running Rocky 9 on commodity tin with a standard HBA controller and 23 spindles split over 4 vdevs) can easily saturate a 10Gb link for the entire duration of an Active Full backup, so we are in the process of upgrading our interconnects to 25Gb.
- I'm working with the Veeam and OpenZFS teams on figuring out the best way forward. (Thanks to @hannesk.)
- This is what active fulls look like with this configuration, which appears to be healthy throughput:
Image
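
As a rough back-of-envelope check on what moving from 10Gb to 25Gb interconnects could mean for an active full, the ideal transfer time can be estimated from the link speed alone. This is only a sketch: transfer_hours is a made-up helper name, the 20 TB size is an illustrative figure that does not come from this thread, and the calculation assumes decimal units, a fully saturated link and no protocol overhead.

```shell
#!/bin/sh
# transfer_hours is a hypothetical helper for a back-of-envelope estimate.
# Assumes decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bits/s), a fully
# saturated link and no protocol overhead, so real jobs will take longer.
transfer_hours() {
    size_tb=$1      # amount of data to move, in TB
    link_gbps=$2    # link speed, in Gb/s
    awk -v tb="$size_tb" -v gbps="$link_gbps" 'BEGIN {
        seconds = (tb * 1e12 * 8) / (gbps * 1e9)
        printf "%.2f\n", seconds / 3600
    }'
}

transfer_hours 20 10   # 20 TB (made-up size) over a 10Gb link
transfer_hours 20 25   # the same data over a 25Gb link
```

The two calls print the ideal duration in hours for each link speed. Whether the pool itself can keep up at 25Gb is a separate question that only testing on the actual vdev layout can answer.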

Re: OpenZFS 2.2 support for reflinks now available

Post by HannesK »

Hello,
Thank you, Ashley, for doing all the tests and working with the OpenZFS team. The ZFS team is working on improvements.

From the Veeam side, ZFS will stay unsupported for now.

Best regards
Hannes

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw » 1 person likes this post

For anyone following this, there has been a breakthrough over the last week.
The OpenZFS team have made a number of changes/improvements to the block cloning logic in OpenZFS.
These changes are currently targeted for 2.3 RC4, which hasn't been tagged yet, so it's likely to be quite a while before they hit the standard package repos. For now, the only way to get them is to build OpenZFS from source.

The other change required is that the parameter "zfs_bclone_wait_dirty" needs to be set to 1; otherwise the load patterns of Veeam synthetic fulls can trigger the old message "Agent: Failed to process method {Transform.CompileFIB}: Resource temporarily unavailable"
https://openzfs.github.io/openzfs-docs/ ... wait_dirty

We noticed another issue: even with the ZFS module parameters set as follows,

Code:

# vi /etc/modprobe.d/zfs.conf
options zfs zfs_bclone_enabled=1
options zfs zfs_bclone_wait_dirty=1

the setting was still zero after a reboot:

Code:

# cat /sys/module/zfs/parameters/zfs_bclone_wait_dirty
0

So, for now, getting the value of 1 to persist means reapplying the setting at every boot. The simplest way I could find on Rocky 9 was to use the @reboot option in crontab:

Code:

# crontab -u root -e
@reboot echo 1 > /sys/module/zfs/parameters/zfs_bclone_wait_dirty
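
A slightly more defensive variant of the @reboot workaround above would check the parameter before writing it, so the boot log shows whether anything was changed. This is only a sketch under stated assumptions: ensure_param is a made-up helper name (not part of OpenZFS), and it assumes the script runs as root after the zfs module has been loaded.

```shell
#!/bin/sh
# Sketch: make sure a ZFS module parameter holds the wanted value.
# ensure_param is a hypothetical helper name, not part of OpenZFS.
ensure_param() {
    param_file=$1   # e.g. /sys/module/zfs/parameters/zfs_bclone_wait_dirty
    wanted=$2       # e.g. 1

    # The sysfs entry only exists once the zfs module is loaded.
    if [ ! -e "$param_file" ]; then
        echo "missing: $param_file (is the zfs module loaded?)" >&2
        return 1
    fi

    current=$(cat "$param_file")
    if [ "$current" != "$wanted" ]; then
        # Write the new value; fail loudly if the write is rejected.
        echo "$wanted" > "$param_file" || return 1
        echo "changed $param_file from $current to $wanted"
    else
        echo "$param_file already $wanted"
    fi
}

# Example call (run as root on the repository host):
# ensure_param /sys/module/zfs/parameters/zfs_bclone_wait_dirty 1
```

Saved as a script with the example call uncommented, it could replace the echo in the @reboot crontab line. The script location is up to you; nothing in this thread prescribes one.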

We ran a number of tests and could not get a synthetic job to fail, and the stats are looking great:

Code:

# zpool get all |grep clone
VeeamBackup  bcloneused                     18.6T                          -
VeeamBackup  bclonesaved                    18.6T                          -
VeeamBackup  bcloneratio                    2.00x                          -
Image
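
For anyone wanting to sanity-check those numbers: as I read the zpoolprops man page, bcloneratio is (bcloneused + bclonesaved) / bcloneused, which matches the 2.00x shown for 18.6T used and 18.6T saved. A minimal sketch of that arithmetic follows. bclone_ratio is a made-up helper name, and it expects raw byte counts such as those printed by zpool get with the -p flag.

```shell
#!/bin/sh
# bclone_ratio is a hypothetical helper; it takes two byte counts.
# Formula assumed from zpoolprops: (bcloneused + bclonesaved) / bcloneused
bclone_ratio() {
    used=$1    # bytes occupied by cloned blocks (bcloneused)
    saved=$2   # extra bytes that would be needed without cloning (bclonesaved)
    awk -v u="$used" -v s="$saved" 'BEGIN {
        # Avoid dividing by zero on a pool with no cloned blocks.
        if (u <= 0) { print "n/a"; exit }
        printf "%.2fx\n", (u + s) / u
    }'
}

# Example on a live pool (-H drops headers, -p prints exact byte values):
# used=$(zpool get -Hp -o value bcloneused VeeamBackup)
# saved=$(zpool get -Hp -o value bclonesaved VeeamBackup)
# bclone_ratio "$used" "$saved"
```

A ratio of 2.00x means that, on average, each cloned block on the pool is referenced twice.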

Big shout out to Hannes/OpenZFS team - especially Alex.
ZFS rocks!
Gostev
Chief Product Officer
Posts: 31789
Liked: 7294 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by Gostev » 1 person likes this post

@ashleyw thank you for keeping us posted. Please let us know once this version hits the standard package repos. If you're still confident in its long-term reliability then, I guess it will be a good time for me to ask QA to perform full regression testing on their end so we can remove the experimental support clause from this integration.

My only concern is that if making this part of the standard package repos takes too long, we will already be quite close to V13 and QA will rightfully refuse any and all unplanned tasks. But in that case maybe I can agree with them to move only a few customers like yourself off experimental support, by marking your account accordingly in the Customer Support database.

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw » 3 people like this post

@Gostev, thanks for that. I believe the OpenZFS team were originally hoping to ship 2.3 before fall of 2024, but in view of the block cloning optimisations and other functionality currently being worked on, I wouldn't want to speculate on a planned date for 2.3 to ship. From where I'm sitting there is likely to be a 2.3 rc4 prior to the final release, so my gut feel is quarter 1 of 2025, but Hannes could potentially ask iXsystems for their view.

So far we have hit no issues with 2.3 rc3 provided that the appropriate tuning parameters are used as described in this thread.
The only issue now is that our Veeam backups run too fast with too much reliability :D

Within our company I represent the development services we supply to our own commercial development teams, so we are often able to take a slightly more experimental approach than our external customer-centric divisions, and this often allows us to test out innovative approaches well before they hit our wider group.

So it would be fantastic if 2.3 could be planned to be supported by Veeam at some stage in the future, but the "experimental" tag doesn't really impact us directly at this stage, as long as we can continue some technical dialogue from time to time with Hannes and the OpenZFS team.

It really is refreshing to see how dedicated Veeam is towards supporting their customers and helping to drive innovation - so massive thanks to all.