-
- Service Provider
- Posts: 206
- Liked: 39 times
- Joined: Oct 28, 2010 10:55 pm
- Full Name: Ashley Watson
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
BTW, I've opened a defect report on the OpenZFS GitHub issue tracker, as I can reproduce a deadlock at the file system layer on bare metal (removing Veeam and VMware from the mix).
I suspect some of the issues I'm having are related to certain load patterns when dealing with many large files that contain a high proportion of cloned blocks. Certainly from my testing, it doesn't take many large file deletes to cause a ZFS deadlock, which can last for up to 30 minutes or so. If anyone is interested: https://github.com/openzfs/zfs/issues/16680
My guess at this stage is that the deadlocks occur randomly, cause timeouts during synthetic backup creation, and lead to the errors I'm seeing in Veeam.
-
- Product Manager
- Posts: 14807
- Liked: 3067 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
Thanks for sharing!
-
- Service Provider
- Posts: 206
- Liked: 39 times
- Joined: Oct 28, 2010 10:55 pm
- Full Name: Ashley Watson
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
For anyone interested in experimenting with ZFS under large Veeam backup workloads, I recommend for now disabling synthetic full backups, disabling GFS on copy jobs, and switching to regular active full backups (which are obviously space-intensive but are fast and reliable, depending on your zpool/vdev configuration).
-
- Service Provider
- Posts: 206
- Liked: 39 times
- Joined: Oct 28, 2010 10:55 pm
- Full Name: Ashley Watson
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
For anyone still following this, I've been running many tests under 2.3 RC3 and have the following conclusions.
- The message "Agent: Failed to process method {Transform.CompileFIB}: Resource temporarily unavailable" still occurs when synthetics are enabled and block cloning is enabled on ZFS.
- There is currently no agreed root cause, or agreement on which side of the fence the issue sits, but Veeam suspects it is related to OpenZFS. However, using the OpenZFS block cloning reliability tests, I've been unable to isolate the issue to OpenZFS.
- The following don't eliminate the errors but seem to affect the stage of the job at which they occur.
-> Changing the block size from 4MB to 8MB (or even downwards).
-> Changing the compression level at a job level from optimal to none.
-> Reducing the number of concurrent jobs hitting the repository.
-> The number of jobs hitting the repositories simultaneously in the transformation stage does not appear to correlate with the failures.
- There appear to be long-standing issues with synthetic backups and the load they place on the target storage device, even when block cloning is in use, so even on commercial solutions many people seem to run without synthetics.
- As a backup target with synthetic backups disabled, an OpenZFS appliance (in our case running Rocky 9 on commodity tin with a standard HBA controller, with 23 spindles split over 4 vdevs) can easily saturate a 10Gb link throughout the entire period of an Active Full backup, so we are in the process of upgrading our interconnects to 25Gb.
- I'm working with Veeam and the OpenZFS team on figuring out the best way forward with this (thanks to @hannesk).
- With this configuration, active fulls show what appears to be healthy throughput [screenshot not preserved].
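For a rough sense of the link figures mentioned in the list above, here is a back-of-the-envelope sketch. It assumes decimal units (1 Gbit/s = 1000 Mbit/s), ignores protocol overhead and any parity layout in the vdevs, and treats all 23 spindles as sharing the stream evenly, which is a simplification. The function names are made up for illustration.

```shell
#!/bin/sh
# Theoretical line rate of a link in MB/s.
# $1 = link speed in Gbit/s
link_mb_per_s() {
    awk -v g="$1" 'BEGIN { printf "%.0f\n", g * 1000 / 8 }'
}

# Average write rate each spindle would need to sustain to fill the link,
# assuming the load is spread evenly (parity and overhead ignored).
# $1 = link speed in Gbit/s, $2 = number of spindles
per_spindle_mb_per_s() {
    awk -v g="$1" -v n="$2" 'BEGIN { printf "%.1f\n", g * 1000 / 8 / n }'
}

# Example:
#   link_mb_per_s 10            # 10Gb link
#   per_spindle_mb_per_s 10 23  # 10Gb link over 23 spindles
#   per_spindle_mb_per_s 25 23  # 25Gb link over 23 spindles
```

On these assumptions a 10Gb link tops out at about 1250 MB/s, or roughly 54 MB/s per spindle, while a 25Gb link would ask for about 136 MB/s per spindle, so the disks rather than the network may become the limit after the upgrade.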
-
- Product Manager
- Posts: 14807
- Liked: 3067 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
Hello,
Thank you, Ashley, for doing all the tests and working with the OpenZFS team. The ZFS team is working on improvements.
From the Veeam side, ZFS will stay unsupported for now.
Best regards
Hannes
-
- Service Provider
- Posts: 206
- Liked: 39 times
- Joined: Oct 28, 2010 10:55 pm
- Full Name: Ashley Watson
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
For anyone following this, there has been a breakthrough over the last week.
The OpenZFS team have made a number of changes/improvements to the block cloning logic in OpenZFS.
This is currently targeted for 2.3 RC4, which hasn't been tagged yet, so it's likely to be quite a while before it hits the standard package repos. For now, the only way of accessing this is to build OpenZFS from source.
The other required change is that the parameter "zfs_bclone_wait_dirty" needs to be set to 1; otherwise the load patterns of Veeam synthetic fulls can still trigger the old message "Agent: Failed to process method {Transform.CompileFIB}: Resource temporarily unavailable"
https://openzfs.github.io/openzfs-docs/ ... wait_dirty
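As a minimal sketch, the parameter can be inspected and changed on a running system through the module's sysfs entry. The path below is the one quoted in this thread; `ensure_param` is a hypothetical helper name, and the sketch assumes it is run as root on a host where the zfs module is loaded.

```shell
#!/bin/sh
# Sketch only: sysfs path as quoted in this thread.
PARAM_FILE=/sys/module/zfs/parameters/zfs_bclone_wait_dirty

# ensure_param FILE VALUE
# Writes VALUE to FILE unless it already holds that value, then reports
# what happened. Returns 2 if FILE does not exist, 1 on read/write failure.
ensure_param() {
    file=$1
    want=$2
    if [ ! -e "$file" ]; then
        echo "missing: $file" >&2
        return 2
    fi
    current=$(cat "$file") || return 1
    if [ "$current" = "$want" ]; then
        echo "unchanged ($current)"
        return 0
    fi
    # A runtime write like this needs root and is lost at the next reboot.
    echo "$want" > "$file" || return 1
    echo "changed $current -> $(cat "$file")"
}

# Example (as root):
#   ensure_param "$PARAM_FILE" 1
```

Reading the value back after the write makes it obvious if the change did not take effect.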
We noticed another issue: even when we set the ZFS module parameters like the following,
Code:
# vi /etc/modprobe.d/zfs.conf
options zfs zfs_bclone_enabled=1
options zfs zfs_bclone_wait_dirty=1
the setting was still zero after a reboot:
Code:
# cat /sys/module/zfs/parameters/zfs_bclone_wait_dirty
0
So, to currently persist the value of 1 across reboots, the setting has to be reapplied at boot, and the simplest way I could find on Rocky 9 was to use the @reboot option in crontab:
Code:
# crontab -u root -e
@reboot echo 1 > /sys/module/zfs/parameters/zfs_bclone_wait_dirty
We ran a number of tests and were unable to get a failure on a synthetic job, and the stats are looking great:
Code:
# zpool get all | grep clone
VeeamBackup bcloneused 18.6T -
VeeamBackup bclonesaved 18.6T -
VeeamBackup bcloneratio 2.00x -
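To track these numbers over time, the clone properties can be pulled in a script-friendly form. The sketch below assumes the ratio is (used + saved) / used, which matches the 2.00x shown for 18.6T used and 18.6T saved; `bclone_summary` is a hypothetical helper name, and it reads tab-separated "property value" lines on stdin.

```shell
#!/bin/sh
# bclone_summary: reads lines of "property<TAB>value" (values in bytes) and
# prints the logical size of cloned data plus the derived clone ratio.
# Assumption: ratio = (bcloneused + bclonesaved) / bcloneused.
bclone_summary() {
    awk -F'\t' '
        $1 == "bcloneused"  { used = $2 }
        $1 == "bclonesaved" { saved = $2 }
        END {
            if (used > 0)
                printf "logical=%.0f ratio=%.2fx\n", used + saved, (used + saved) / used
            else
                print "no cloned blocks"
        }'
}

# Example against the pool named in this thread (-H drops headers and uses
# tabs, -p prints exact byte values, -o selects the output columns):
#   zpool get -Hp -o property,value bcloneused,bclonesaved VeeamBackup | bclone_summary
```

A ratio that stays near 1.00x after synthetic fulls have run would suggest block cloning is not actually being used by the jobs.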
Big shout out to Hannes/OpenZFS team - especially Alex.
ZFS rocks!
-
- Chief Product Officer
- Posts: 31746
- Liked: 7250 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
@ashleyw thank you for keeping us posted. Please let us know once this version hits the standard package repos. If you're still confident in its long-term reliability then, I guess it will be a good time for me to ask QA to perform full regression testing on their end so that we can remove the experimental support clause from this integration.
My only concern is that if getting this into the standard package repos takes too long, we will already be quite close to V13, and QA will rightfully refuse any and all unplanned tasks. But in that case, maybe I can agree with them to move only a few customers like yourself off experimental support by marking your account accordingly in the Customer Support database.