-
- Service Provider
- Posts: 208
- Liked: 43 times
- Joined: Oct 28, 2010 10:55 pm
- Full Name: Ashley Watson
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
BTW, I've opened up a defect report on the ZFS forums as I can simulate a deadlock at the file system layer on tin (removing Veeam and VMware from the mix).
I suspect some of the issues I'm having may be related to certain load patterns when dealing with many large files containing a high proportion of cloned blocks. Certainly from my testing it doesn't take many large file deletes to cause a ZFS deadlock, which can last for up to 30 minutes or so. If anyone is interested: https://github.com/openzfs/zfs/issues/16680
My guess at this stage is that the deadlocks are occurring randomly and causing timeouts at the synthetic backup creation and leading to the errors in Veeam I'm seeing.
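For anyone wanting to watch for the stall on their own box, here is a rough sketch of how a slow delete could be timed. It is not taken from the GitHub issue: the function name `timed_rm` and the 60 second threshold are placeholders I made up, and it only measures how long the `rm` call itself blocks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete a file and report how long the rm call took.
# STALL_SECS is an arbitrary threshold, not a value from the ZFS issue.
STALL_SECS="${STALL_SECS:-60}"

timed_rm() {
    local target="$1" start end elapsed
    start=$(date +%s)
    rm -f -- "$target" || return 1
    end=$(date +%s)
    elapsed=$((end - start))
    # Flag anything that blocked for at least STALL_SECS seconds.
    if [ "$elapsed" -ge "$STALL_SECS" ]; then
        echo "STALL ${elapsed}s ${target}"
    else
        echo "ok ${elapsed}s ${target}"
    fi
}
```

It would be called as `timed_rm /path/to/old-backup-file` against a file on the pool (the path is illustrative), ideally while the pool is otherwise idle so a long delete stands out.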
-
- Product Manager
- Posts: 14881
- Liked: 3098 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
thanks for sharing!
-
- Service Provider
- Posts: 208
- Liked: 43 times
- Joined: Oct 28, 2010 10:55 pm
- Full Name: Ashley Watson
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
For anyone interested in experimenting with ZFS with large Veeam backup workloads, for now I recommend disabling synthetic full backups, disabling GFS on copy jobs, and switching to regular active full backups (which are obviously space intensive but are fast and reliable, depending on your zpool/vdev configuration).
-
- Service Provider
- Posts: 208
- Liked: 43 times
- Joined: Oct 28, 2010 10:55 pm
- Full Name: Ashley Watson
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
For anyone still following this, I've been running many tests under 2.3 RC3 and have the following conclusions.
- The message "Agent: Failed to process method {Transform.CompileFIB}: Resource temporarily unavailable" still occurs when synthetics are enabled and block cloning is enabled on ZFS.
- There is currently no agreed root cause, or agreement on which side of the fence the issue sits, but Veeam suspects this is related to OpenZFS. However, using the OpenZFS block cloning reliability tests I've been unable to isolate the issue to OpenZFS.
- The following don't eliminate the errors but seem to affect the stage in the job at which they occur.
-> Changing the block size from 4MB to 8MB (or even downwards).
-> Changing the compression level at a job level from optimal to none.
-> Reducing the number of concurrent jobs hitting the repository.
-> The number of jobs hitting the repositories simultaneously in the transformation stage does not appear to correlate to the failures.
- There appear to be long-standing issues with synthetic backups and the load they place on the target storage device even when block cloning is in use, so even on commercial solutions many people seem to run without synthetics.
- As a backup target with synthetic backups disabled, an OpenZFS appliance (in our case running Rocky9 on commodity tin with a standard HBA controller, with 23 spindles split over 4 vdevs) can easily saturate a 10Gb link throughout the entire period of an Active Full backup, so we are in the process of upgrading our interconnects to 25Gb.
- I'm working with Veeam and the OpenZFS team on figuring out the best way of moving forwards with this (thanks to @hannesk).
- This is what active fulls look like with this configuration, which appears to be a healthy throughput (screenshot not preserved).
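On the point about trying to isolate the problem to OpenZFS: a very basic clone smoke test could look something like the sketch below. This is not the OpenZFS test suite, just an illustration using GNU `cp --reflink` and `cmp`. The function name `clone_and_verify` and the `CLONE_MODE` variable are my own inventions.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test, not part of the OpenZFS test suite.
# CLONE_MODE=always makes cp fail if the filesystem cannot clone blocks;
# CLONE_MODE=auto lets cp fall back to a normal copy instead.
CLONE_MODE="${CLONE_MODE:-always}"

clone_and_verify() {
    local src="$1" dst="$2"
    cp --reflink="$CLONE_MODE" -- "$src" "$dst" || { echo "clone failed"; return 1; }
    # cmp -s is silent and returns 0 only when both files are byte-identical.
    if cmp -s -- "$src" "$dst"; then
        echo "match"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

Running it in a loop over a few large files on the dataset would at least show whether plain clones succeed and read back correctly, although it does not reproduce the Veeam synthetic full load pattern.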
-
- Product Manager
- Posts: 14881
- Liked: 3098 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
Hello,
thank you Ashley for doing all the tests and working with the OpenZFS team. The ZFS team is working on improvements.
From the Veeam side, ZFS will stay unsupported for now.
Best regards
Hannes
-
- Service Provider
- Posts: 208
- Liked: 43 times
- Joined: Oct 28, 2010 10:55 pm
- Full Name: Ashley Watson
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
For anyone following this, there has been a breakthrough over the last week.
The OpenZFS team have made a number of changes/improvements to the block cloning logic in OpenZFS.
These changes are currently targeted for 2.3RC4, which hasn't been tagged yet, so it's likely to be quite a while before they hit the standard package repos. For now the only way of accessing them is to build OpenZFS from source.
The other change required is that the parameter "zfs_bclone_wait_dirty" needs to be set to 1, otherwise the load patterns of Veeam synthetic fulls can trigger the old message "Agent: Failed to process method {Transform.CompileFIB}: Resource temporarily unavailable"
https://openzfs.github.io/openzfs-docs/ ... wait_dirty
We noticed another issue in that even when we set the ZFS module parameters like the following:
Code: Select all
# vi /etc/modprobe.d/zfs.conf
options zfs zfs_bclone_enabled=1
options zfs zfs_bclone_wait_dirty=1

after a reboot, the setting was still zero:
Code: Select all
# cat /sys/module/zfs/parameters/zfs_bclone_wait_dirty
0

So to persist the value of 1 across reboots we currently have to set it after boot, and the simplest way I could find on Rocky9 was to use the @reboot option in crontab:
Code: Select all
# crontab -u root -e
@reboot echo 1 > /sys/module/zfs/parameters/zfs_bclone_wait_dirty

We ran a number of tests and were unable to get a failure on a synthetic job, and the stats are looking great:
Code: Select all
# zpool get all |grep clone
VeeamBackup bcloneused 18.6T -
VeeamBackup bclonesaved 18.6T -
VeeamBackup bcloneratio 2.00x -

Big shout out to Hannes/OpenZFS team - especially Alex.
ZFS rocks!
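Since the value silently came back as 0 after a reboot, it may be worth checking the parameter rather than just writing to it. Below is a minimal sketch of that idea. The function name `ensure_zfs_param` and the `ZFS_PARAM_DIR` override are hypothetical and only exist so the logic can be tried without the zfs module loaded. The only real path it relies on is /sys/module/zfs/parameters as used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure a ZFS module parameter has the wanted value.
# ZFS_PARAM_DIR is overridable only so the logic can be exercised elsewhere.
ZFS_PARAM_DIR="${ZFS_PARAM_DIR:-/sys/module/zfs/parameters}"

ensure_zfs_param() {
    local name="$1" want="$2" file current
    file="${ZFS_PARAM_DIR}/${name}"
    [ -r "$file" ] || { echo "missing ${name}"; return 1; }
    current=$(cat "$file")
    if [ "$current" = "$want" ]; then
        echo "${name} already ${want}"
    else
        # Writing the value needs root on a real system.
        echo "$want" > "$file" || return 1
        echo "${name} changed ${current} -> ${want}"
    fi
}
```

Calling `ensure_zfs_param zfs_bclone_wait_dirty 1` from a root @reboot script would then log whether the value had to be changed, which makes it easier to notice if a future package update starts honouring the modprobe.d setting.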
-
- Chief Product Officer
- Posts: 31835
- Liked: 7325 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
@ashleyw thank you for keeping us posted. Please let us know once this version hits the standard package repos. If you're still confident in its long-term reliability at that point, I guess it will be a good time for me to ask QA to perform full regression testing on their end so we can remove the experimental support clause from this integration.
My only concern is that if getting this into the standard package repos takes too long, we will already be quite close to V13 and QA will rightfully be refusing any and all unplanned tasks. But in that case maybe I can agree with them to move only a few customers like yourself off experimental support, by marking your account accordingly in the Customer Support database.
-
- Service Provider
- Posts: 208
- Liked: 43 times
- Joined: Oct 28, 2010 10:55 pm
- Full Name: Ashley Watson
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
@Gostev, thanks for that. I believe the OpenZFS team were originally hoping to ship 2.3 before fall of 2024, but in view of the block cloning optimisations and other functionality currently being worked on, I wouldn't want to speculate on a planned date for 2.3 to ship. From where I'm sitting there is likely to be a 2.3 rc4 prior to the final release, so my gut feel is quarter 1 2025, but Hannes could potentially ask iXsystems for their view.
So far we have hit no issues with 2.3 rc3 provided that the appropriate tuning parameters are used as described in this thread.
The only issue now is that our Veeam backups run too fast with too much reliability.
Within our company I represent the development services we supply to our own commercial development teams, so we are often able to take a slightly more experimental approach than our external customer-centric divisions, and often this allows us to test out innovative approaches well before they hit our wider group.
So it would be fantastic if 2.3 could be planned to be supported by Veeam at some stage in the future, but the "experimental" tag doesn't really impact us directly at this stage as long as we can continue some technical dialogue from time to time with Hannes and the OpenZFS team.
It really is refreshing to see how dedicated Veeam is towards supporting their customers and helping to drive innovation - so massive thanks to all.
-
- Service Provider
- Posts: 208
- Liked: 43 times
- Joined: Oct 28, 2010 10:55 pm
- Full Name: Ashley Watson
- Contact:
Re: OpenZFS 2.2 support for reflinks now available
OpenZFS 2.3rc4 has just been released - I believe this is likely the last RC release before the 2.3 final version is released sometime next year.
We have had no issues running 2.3rc3, but 2.3rc4 introduced some further optimisations around block cloning, hence the reason we are rolling with it now rather than waiting.
The approach we took to get this up and running on rocky9 was the following;
Code: Select all
# dnf config-manager --set-enabled crb
# dnf install --skip-broken epel-release gcc make autoconf automake libtool rpm-build kernel-rpm-macros libtirpc-devel libblkid-devel libuuid-devel libudev-devel openssl-devel zlib-devel libaio-devel libattr-devel elfutils-libelf-devel kernel-devel-$(uname -r) kernel-abi-stablelists-$(uname -r | sed 's/\.[^.]\+$//') python3 python3-devel python3-setuptools python3-cffi libffi-devel
# dnf install --skip-broken --enablerepo=epel python3-packaging dkms
# cd /root
# wget https://github.com/openzfs/zfs/releases/download/zfs-2.3.0-rc4/zfs-2.3.0-rc4.tar.gz
# tar xvf zfs-2.3.0-rc4.tar.gz
# cd zfs-2.3.0-rc4
# sh autogen.sh
# ./configure
# make -j1 rpm-utils rpm-dkms
# yum localinstall *.$(uname -p).rpm *.noarch.rpm
# reboot
# zfs version
zfs-2.3.0-rc4
zfs-kmod-2.3.0-rc4
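After a source/DKMS build like this it seems worth confirming that the userland tools and the loaded kernel module really are the same version, since an older kmod could still be the one loaded. A small sketch of that check is below. The function name `zfs_versions_match` is made up, and it assumes `zfs version` prints one `zfs-...` line and one `zfs-kmod-...` line as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical check: reads `zfs version` output on stdin and compares the
# userland line (zfs-X) with the kernel module line (zfs-kmod-X).
zfs_versions_match() {
    local line userland="" kmod=""
    while IFS= read -r line; do
        case "$line" in
            # The kmod pattern must come first, as zfs-* would also match it.
            zfs-kmod-*) kmod="${line#zfs-kmod-}" ;;
            zfs-*)      userland="${line#zfs-}" ;;
        esac
    done
    if [ -n "$userland" ] && [ "$userland" = "$kmod" ]; then
        echo "match ${userland}"
    else
        echo "MISMATCH userland=${userland} kmod=${kmod}"
        return 1
    fi
}
```

Usage would be `zfs version | zfs_versions_match`, for example right after the reboot step.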