Comprehensive data protection for all workloads
Andreas Neufert
VP, Product Management
Posts: 7098
Liked: 1517 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by Andreas Neufert » 2 people like this post

For ZFS behind an SMB filer: if the vendor supports FSCTL_DUPLICATE_EXTENTS_TO_FILE and FSCTL_SET_INTEGRITY_INFORMATION on the SMB side, backed by ZFS fast cloning in the backend, then it would work out of the box with Veeam as well. I am not aware of any storage vendor that has implemented this combination, but if a vendor wants to implement it that way, they can reach out to me and my team.
HannesK
Product Manager
Posts: 14881
Liked: 3098 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by HannesK »

DonZoomik wrote:But no generic enablement without filesystem filter (btrfs, non-ReFS SMB)? :(
It's generic enablement for block storage. BTRFS reflink works, but was never tested by QA. I also expect bcachefs to work, but I did not test that.

NFS reflink needs changes in the gateway server role (a different component).
zbe
Novice
Posts: 6
Liked: 2 times
Joined: May 23, 2024 5:35 pm
Full Name: zbe
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by zbe »

Hi,

Perhaps a stupid question: how and where can this be enabled and tested, exactly? At the moment I have an SMB repository on a Proxmox VM (Linux), but apparently this won't work? If anyone would be so kind as to give me the basic steps to make this work from Veeam B&R on a Windows Server VM (Hyper-V), I'd be grateful.

Cheers.
HannesK
Product Manager
Posts: 14881
Liked: 3098 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by HannesK » 1 person likes this post

Hello,
For SMB on top of ZFS, please see Andreas' answer above. That's very likely to fail.

1. A Linux machine is required (you have that, as far as I see). We did not test/check ZFS on Windows (but I think that's not your question)
2. Create the file /etc/veeam/EnableZFSBlockCloning on the Linux machine (see the sketch below)
3. Add the Linux machine as a Linux repository and keep the XFS checkbox checked
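
For step 2, a minimal sketch from the shell (the marker path is exactly as above; an empty file is enough, since only its existence is checked, as discussed further down this thread):

Code: Select all

# on the Linux repository host, as root
mkdir -p /etc/veeam                      # in case the directory does not exist yet
touch /etc/veeam/EnableZFSBlockCloning   # empty marker file is sufficient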

Best regards,
Hannes
ashleyw
Service Provider
Posts: 208
Liked: 43 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw »

Hi, just for the benefit of anyone looking at this without the background: I'm assuming the OpenZFS support is dependent on upgrading to Veeam 12.1.2 (which was released a couple of days ago)? Nice work by the Veeam team - we will be upgrading our deployment later this morning!
HannesK
Product Manager
Posts: 14881
Liked: 3098 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by HannesK »

Official OpenZFS support depends on how stable reflink is. The first step is that the OpenZFS project decides to enable it by default. Once we see it's stable, we would change to "experimental support" (explained earlier by Gostev). 12.1.2 is the minimum version from the Veeam side, but that could change in the future.
ashleyw
Service Provider
Posts: 208
Liked: 43 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw »

We've noticed routine problems specifically with synthetic fulls, OpenZFS 2.2.4-2 (with reflinks enabled), and vCenter 8 update 3a / ESXi 8 update 3.
I fully understand that 8u3a/8u3 isn't supported yet, but I just wondered if Veeam has come across any of these types of issues in its labs before I dig deeper?
In the meantime, we've switched off synthetic fulls and now run active fulls once per week and the performance is great.
HannesK
Product Manager
Posts: 14881
Liked: 3098 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by HannesK »

Hello,
I would assume that the vCenter version is completely irrelevant for synthetic fulls. Or are you saying that these problems have only existed since 8.0U3?

If you see general issues with synthetic fulls, then I recommend reporting that to the OpenZFS bug tracker.

To answer your question: no, we did not test that combination.

Doing active full makes reflink irrelevant...

Best regards,
Hannes
ashleyw
Service Provider
Posts: 208
Liked: 43 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw »

Thanks Hannes,
As you can imagine, before I can engage with the OpenZFS team I'm going to need to identify precisely when the issue first occurred.
The only reason I've switched back to active fulls is that we can continue to get the performance benefits out of OpenZFS while buying time to investigate and isolate the problem. I'll confirm here when I've had a chance to run some tests to identify the source of the issue. Thanks.
ashleyw
Service Provider
Posts: 208
Liked: 43 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw »

For anyone tracking this, OpenZFS 2.2.5 was released 2 days ago.
https://github.com/openzfs/zfs/releases/tag/zfs-2.2.5
Yet another solid release from the OpenZFS team.

It hasn't hit the el8 zfs-testing repos yet (Rocky8/RHEL8); https://github.com/openzfs/zfs/issues/16430 , but I'm assuming it will in the next few days.
ashleyw
Service Provider
Posts: 208
Liked: 43 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw » 2 people like this post

OpenZFS 2.2.5 is up on the el9 (Rocky 9/RHEL 9) testing repos. So we reinstalled our backup repositories as Rocky 9 earlier today and are now testing our backup workload (about 60 TB, with primary storage on an EMC Unity 680F all-flash array and the backup target on 24x spinning rust running OpenZFS 2.2.5). All running smoothly as expected.
zbe
Novice
Posts: 6
Liked: 2 times
Joined: May 23, 2024 5:35 pm
Full Name: zbe
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by zbe »

I'm getting the following error when the job tries to perform a synthetic full:

Code: Select all

Agent: Failed to process method {Transform.CompileFIB}: Operation not permitted Failed to clone file extent. Source offset: 1835008, target offset: 1703936, length: 131072
Processing finished with errors at <time>
Not sure what is not permitted; for example,

Code: Select all

cp file1 file2
creates a clone just fine.
HannesK
Product Manager
Posts: 14881
Liked: 3098 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by HannesK »

creates a clone just fine.
Hmm, not really... it should be

Code: Select all

cp --reflink=always file1 file2
to test reflink.

Please remember that reflink is still disabled by default on ZFS and needs to be enabled manually.
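
To spell that out, a sketch of enabling it manually (zfs_bclone_enabled is the OpenZFS 2.2 module parameter that comes up again later in this thread; the pool name is a placeholder):

Code: Select all

# check whether block cloning is active in the module (0 = disabled)
cat /sys/module/zfs/parameters/zfs_bclone_enabled

# enable it for the running module
echo 1 > /sys/module/zfs/parameters/zfs_bclone_enabled

# make it persistent across reboots
echo "options zfs zfs_bclone_enabled=1" >> /etc/modprobe.d/zfs.conf

# the pool feature flag must be enabled as well
zpool get feature@block_cloning <pool>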
zbe
Novice
Posts: 6
Liked: 2 times
Joined: May 23, 2024 5:35 pm
Full Name: zbe
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by zbe »

Well, I see `--reflink=always` fails for some reason in an LXC container. I'll have to figure out why.
But plain `cp file1 file2` does create a clone. (It defaults to --reflink=auto, which uses copy_file_range().)
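
A quick way to see which path cp actually takes on a given system (a sketch; requires strace, and the file names are just examples):

Code: Select all

# make a small test file on the dataset
dd if=/dev/urandom of=file1 bs=1M count=4

# --reflink=always goes through the FICLONE ioctl and fails hard if cloning is refused
strace -f -e trace=ioctl,copy_file_range cp --reflink=always file1 file2

# the default (--reflink=auto) can fall back to copy_file_range()
strace -f -e trace=ioctl,copy_file_range cp file1 file3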
tdewin
Veeam Software
Posts: 1818
Liked: 655 times
Joined: Mar 02, 2012 1:40 pm
Full Name: Timothy Dewin
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by tdewin »

Could be that you have limited capabilities, e.g. the container is running in an unprivileged state because you don't want it to be able to escape the container in the first place.
zbe
Novice
Posts: 6
Liked: 2 times
Joined: May 23, 2024 5:35 pm
Full Name: zbe
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by zbe »

The container does indeed run in unprivileged mode, but I don't see why anything other than having rw access would matter. I can't find anything of substance documented with regard to this, though.
ashleyw
Service Provider
Posts: 208
Liked: 43 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw »

zbe wrote: Aug 17, 2024 7:57 pm I'm getting the following error when the job tries to perform a synthetic full:

Code: Select all

Agent: Failed to process method {Transform.CompileFIB}: Operation not permitted Failed to clone file extent. Source offset: 1835008, target offset: 1703936, length: 131072
Processing finished with errors at <time>
Not sure what is not permitted; for example,

Code: Select all

cp file1 file2
creates a clone just fine.
Sorry for the late reply - I've been stuck on other stuff recently.

I can confirm we are seeing the same issue with synthetic fulls on OpenZFS. In our case, on one of our main jobs, we see something like 266 successes and 80 failures. The retry picked up the failures and processed them correctly. Surprisingly, most of the 80 failures occur at exactly the same source/target offsets as what you are seeing.

Code: Select all

Agent: Failed to process method {Transform.CompileFIB}: Resource temporarily unavailable
Failed to clone file extent. Source offset: 1835008, target offset: 1703936, length: 131072

Agent: Failed to process method {Transform.CompileFIB}: Resource temporarily unavailable
Failed to clone file extent. Source offset: 2228224, target offset: 2097152, length: 131072
Because we have plenty of space and our throughput on OpenZFS is extremely high, we've just switched to active fulls once per week temporarily. We're then going to revisit the situation once 12.2 drops and once we've verified that v2.2.5 of OpenZFS is all bedded in as expected (we've been running it for nearly 3 weeks now without any known issues).
zbe
Novice
Posts: 6
Liked: 2 times
Joined: May 23, 2024 5:35 pm
Full Name: zbe
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by zbe »

ashleyw, are you using an unprivileged container too? The problem I'm having is probably with the capabilities of the unprivileged container: ioctl(FICLONE) apparently needs CAP_SYS_RAWIO. It would work if Veeam would use, or fall back to, copy_file_range() instead.
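
For anyone who wants to check the capability theory in their own container, something like this may help (a sketch; capsh comes with the libcap tools):

Code: Select all

# inside the container: show the current capability sets
capsh --print

# or read the effective set of the running shell and decode it
grep CapEff /proc/self/status
capsh --decode=<value from CapEff>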
birdwaffle
Influencer
Posts: 12
Liked: 3 times
Joined: Jun 19, 2019 9:55 am
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by birdwaffle »

zbe wrote: Aug 17, 2024 7:57 pm I'm getting following error when job tries to perform synthetic full:

Code: Select all

Agent: Failed to process method {Transform.CompileFIB}: Operation not permitted Failed to clone file extent. Source offset: 1835008, target offset: 1703936, length: 131072
Processing finished with errors at <time>
Not sure what is not permitted, for example

Code: Select all

cp file1 file2
creates a clone just fine.
I'm getting the same error. No containers, just Ubuntu Server 24.04.1 LTS. It's kind of hard to troubleshoot right now, as I don't have a good way of forcing a synthetic full.

I tried setting

Code: Select all

Set-VBRJobAdvancedBackupOptions -Job 'test' -Algorithm 'Incremental' -TransformFullToSyntethic $True
but it didn't result in a synthetic full.
zbe
Novice
Posts: 6
Liked: 2 times
Joined: May 23, 2024 5:35 pm
Full Name: zbe
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by zbe » 2 people like this post

Well, all I can say is that after switching to a privileged container, it all works as expected.
As your Veeam user, you can try to block-clone a test file directly on your server:

Code: Select all

cp --reflink=always testfile testfile2
ashleyw
Service Provider
Posts: 208
Liked: 43 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw »

We aren't using containers - just OpenZFS 2.2.5 on Rocky 9 (latest 9.4 release).
Our primary backup unit is hyperconverged under VMware (using PCI passthrough of the HBA controller).

Code: Select all

Agent: Failed to process method {Transform.CompileFIB}: Resource temporarily unavailable Failed to clone file extent. Source offset: 1835008, target offset: 1703936, length: 131072
The error appears random, and the first error occurred approximately 3 hours after the initial start of the backup copy jobs.

Code: Select all

Creating synthetic full backup (xx% done) [fast clone]
We have just upgraded to OpenZFS 2.2.6 but were seeing this issue on 2.2.5 and before.
I was checking some of the I/O throughput stats, though, and noticed that when we last recreated the zpool we forgot to disable atime, so I've just disabled it. (We also run with sync disabled in our case, for performance.)
I noticed dozens of z_wr_int processes in the process list while the synthetic fulls were running.

Code: Select all

# zpool status VeeamBackup
# zfs get atime VeeamBackup
# zfs set atime=off VeeamBackup
# zfs get atime VeeamBackup
I'll see in a week's time if our failure rate reduces.
birdwaffle
Influencer
Posts: 12
Liked: 3 times
Joined: Jun 19, 2019 9:55 am
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by birdwaffle »

zbe wrote: Sep 08, 2024 6:05 pm Well, all I can say is that after switching to a privileged container, it all works as expected.
As your Veeam user, you can try to block-clone a test file directly on your server:

Code: Select all

cp --reflink=always testfile testfile2
That made it a lot easier to track down than waiting for an elusive synthetic full backup. :)

Apparently I never had block cloning enabled properly (or permanently).

Though my pool showed:

Code: Select all

$ zpool get feature@block_cloning
NAME       PROPERTY               VALUE                  SOURCE
repopool1  feature@block_cloning  enabled                local
$
ZFS wasn't utilizing the feature.

Code: Select all

$ cat /sys/module/zfs/parameters/zfs_bclone_enabled
0
$
and I needed to edit /etc/modprobe.d/zfs.conf and reboot.

Code: Select all

options zfs zfs_bclone_enabled=1
Veeam currently doesn't check whether either of these two requirements is met. It seems Veeam only checks if the /etc/veeam/EnableZFSBlockCloning file exists when the repository is created.
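
Until the product checks these itself, a quick pre-flight script along these lines can verify a repository before relying on fast clone (a sketch assembled from the requirements above; pool name and test directory are placeholders):

Code: Select all

#!/bin/sh
# pre-flight checks for ZFS fast clone on a Veeam Linux repository (sketch)
POOL=repopool1             # adjust to your pool
DIR=/repopool1/clonetest   # any writable directory on that pool

[ -f /etc/veeam/EnableZFSBlockCloning ] || echo "marker file missing"
[ "$(cat /sys/module/zfs/parameters/zfs_bclone_enabled)" = "1" ] || echo "zfs_bclone_enabled is off"
zpool get -H -o value feature@block_cloning "$POOL"   # should print: enabled or active

mkdir -p "$DIR"
dd if=/dev/urandom of="$DIR/t1" bs=1M count=1 2>/dev/null
cp --reflink=always "$DIR/t1" "$DIR/t2" && echo "reflink OK" || echo "reflink FAILED"
rm -f "$DIR/t1" "$DIR/t2"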
HannesK
Product Manager
Posts: 14881
Liked: 3098 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by HannesK » 1 person likes this post

Hello,
It seems Veeam only checks if the /etc/veeam/EnableZFSBlockCloning file exists when the repository is created.
Correct. That file disables all reflink checks, and technically it allows using reflink on any file system that supports reflink (bcachefs, btrfs, ocfs2...).

Best regards,
Hannes
Gostev
Chief Product Officer
Posts: 31835
Liked: 7325 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by Gostev » 1 person likes this post

@david.domask please clean up this thread later and split the non-ZFS-specific discussions (like how compact full works) into a separate thread. As it stands right now, the original topic is completely hijacked :) and every future reader will have to wade through the off-topic material to find the latest updates/experiences around actual ZFS support.
ashleyw
Service Provider
Posts: 208
Liked: 43 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw »

I'm finally revisiting this. We are running the latest version of OpenZFS (2.2.6) on Rocky 9.
Due to reliability concerns over the synthetic roll-ups failing, we've switched our backup job to run active full backups once per week instead - this is working without issue.
However, our backup copy jobs still generate synthetic tasks that fail at random times.
E.g. on a backup copy job of 6 VMs, 3 of the VMs fail in this style:

Code: Select all

14/10/2024 1:21:13 PM :: Agent: Failed to process method {Transform.CompileFIB}: Resource temporarily unavailable
Failed to clone file extent. Source offset: 1809317888, target offset: 2890792960, length: 1048576
On a different backup copy job with 322 VMs in it, 29 fail with a similar message:

Code: Select all

14/10/2024 1:21:25 PM :: Agent: Failed to process method {Transform.CompileFIB}: Resource temporarily unavailable
Failed to clone file extent. Source offset: 1808007168, target offset: 2999058432, length: 2359296
14/10/2024 1:21:25 PM :: Processing finished with errors at 14/10/2024 1:21:25 PM  
One of the backup objects in the failing job is a machine called VeeamProxy3, and if I export a dump of the log information, I see this error in the file Agent.Transform.Target.VeeamProxy3.vm-620369.log:

Code: Select all

[14.10.2024 13:20:57.859] <139869015758400> alg      |   Transmission pipe configuration, CPU number: 48, digests calc: enabled, crypto policy: Policy: 'enNoCryptoProcessing', target keyset ID: 'null'
[14.10.2024 13:20:57.859] <139869015758400> cli      |   --pex:0;4194304;4194304;0;4194304;0;0;0;0;82;3;62;133733388578598370
[14.10.2024 13:20:57.859] <139869015758400> cli      |     - 0%
[14.10.2024 13:21:09.629] <139869015758400> cli      |   --pex:49;6392119296;6392119296;0;3988783104;0;0;0;0;0;8;99;133733388696290450
[14.10.2024 13:21:09.629] <139869015758400> cli      |     - 49%
[14.10.2024 13:21:19.946] <139869015758400> cli      |   --pex:56;7335837696;7335837696;0;4932501504;0;0;0;0;0;3;99;133733388799465590
[14.10.2024 13:21:19.946] <139869015758400> cli      |     - 56%
[14.10.2024 13:21:24.036] <139869531645504> cli      | ERR |Command 'FibWrite(0x000x7f35cc07f7c0)' has failed.
[14.10.2024 13:21:24.036] <139869531645504> cli      | >>  |Resource temporarily unavailable
[14.10.2024 13:21:24.036] <139869531645504> cli      | >>  |Failed to clone file extent. Source offset: 1808007168, target offset: 2999058432, length: 2359296
[14.10.2024 13:21:24.036] <139869531645504> cli      | >>  |--tr:Failed to duplicate file extent from HostFS:///VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T060045_79DE.vib:1808007168 to /VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T132043_8D80.vbk:2999058432. Extent length: 2359296
[14.10.2024 13:21:24.036] <139869531645504> cli      | >>  |--tr:Failed to process de-duplicated block
[14.10.2024 13:21:24.036] <139869531645504> cli      | >>  |--tr:Failed to write block to blocks store.
[14.10.2024 13:21:24.036] <139869531645504> cli      | >>  |--tr:Failed to append block. FIB index: [2060]. FIB: [VeeamProxy3-flat.vmdk].
[14.10.2024 13:21:24.036] <139869531645504> cli      | >>  |--tr:Failed to process area [Data block. Start offset: [8640266240], Length: [4194304], Area ID: [1289].].
[14.10.2024 13:21:24.036] <139869531645504> cli      | >>  |An exception was thrown from thread [139869531645504].
[14.10.2024 13:21:24.036] <139868524877376> cli      | ERR |Send thread (channel) has failed.
[14.10.2024 13:21:24.036] <139868524877376> cli      | >>  |Resource temporarily unavailable
[14.10.2024 13:21:24.036] <139868524877376> cli      | >>  |Failed to clone file extent. Source offset: 1808007168, target offset: 2999058432, length: 2359296
[14.10.2024 13:21:24.036] <139868524877376> cli      | >>  |--tr:Failed to duplicate file extent from HostFS:///VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T060045_79DE.vib:1808007168 to /VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T132043_8D80.vbk:2999058432. Extent length: 2359296
[14.10.2024 13:21:24.036] <139868524877376> cli      | >>  |--tr:Failed to process de-duplicated block
[14.10.2024 13:21:24.036] <139868524877376> cli      | >>  |--tr:Failed to write block to blocks store.
[14.10.2024 13:21:24.036] <139868524877376> cli      | >>  |--tr:Failed to append block. FIB index: [2060]. FIB: [VeeamProxy3-flat.vmdk].
[14.10.2024 13:21:24.036] <139868524877376> cli      | >>  |--tr:Failed to process area [Data block. Start offset: [8640266240], Length: [4194304], Area ID: [1289].].
[14.10.2024 13:21:24.036] <139868524877376> cli      | >>  |--tr:Failed to wait for recorder invoke.
[14.10.2024 13:21:24.036] <139868524877376> cli      | >>  |--tr:Failed to check queue and pending invoke status. Wait flag 'true'.
[14.10.2024 13:21:24.036] <139868524877376> cli      | >>  |--tr:FIB proxy was unable to process write operation, recorder '0x000x7f35cc07f7c0', area offset '8778678272', FIB 'Backup of the FIB VeeamProxy3-flat.vmdk'.
[14.10.2024 13:21:24.036] <139868524877376> cli      | >>  |--tr:Unable to asynchronously write data block. Block identity: [Data block. Start offset: [8778678272], Length: [4194304], Area ID: [1318].].
[14.10.2024 13:21:24.036] <139868524877376> cli      | >>  |An exception was thrown from thread [139869531645504].
[14.10.2024 13:21:24.036] <139868524877376> cli      | Thread finished. Role: 'Area writer: Backup of the FIB VeeamProxy3-flat.vmdk'.
[14.10.2024 13:21:24.036] <139869015758400> alg      |   Stopping processing threads....
[14.10.2024 13:21:24.036] <139868076234304> cli      | Thread finished. Role: 'Processing thread: Digests calculator.'.
[14.10.2024 13:21:24.036] <139868516484672> cli      | Thread finished. Role: 'Area processing dispatcher: Digests calculator.'.
[14.10.2024 13:21:24.036] <139868550055488> cli      | Thread finished. Role: 'area reader : VeeamProxy3-flat.vmdk'.
[14.10.2024 13:21:24.036] <139869015758400> alg      |   Processing threads were stopped.
[14.10.2024 13:21:24.036] <139869015758400> alg      |   Processing conveyored task... Failed.
[14.10.2024 13:21:24.036] <139869015758400> alg      | ERR |Transmission pipe error.
[14.10.2024 13:21:24.036] <139869015758400> alg      | >>  |Resource temporarily unavailable
[14.10.2024 13:21:24.036] <139869015758400> alg      | >>  |Failed to clone file extent. Source offset: 1808007168, target offset: 2999058432, length: 2359296
[14.10.2024 13:21:24.036] <139869015758400> alg      | >>  |--tr:Failed to duplicate file extent from HostFS:///VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T060045_79DE.vib:1808007168 to /VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T132043_8D80.vbk:2999058432. Extent length: 2359296
[14.10.2024 13:21:24.036] <139869015758400> alg      | >>  |--tr:Failed to process de-duplicated block
[14.10.2024 13:21:24.036] <139869015758400> alg      | >>  |--tr:Failed to write block to blocks store.
[14.10.2024 13:21:24.036] <139869015758400> alg      | >>  |--tr:Failed to append block. FIB index: [2060]. FIB: [VeeamProxy3-flat.vmdk].
[14.10.2024 13:21:24.036] <139869015758400> alg      | >>  |--tr:Failed to process area [Data block. Start offset: [8640266240], Length: [4194304], Area ID: [1289].].
[14.10.2024 13:21:24.036] <139869015758400> alg      | >>  |--tr:Failed to wait for recorder invoke.
[14.10.2024 13:21:24.036] <139869015758400> alg      | >>  |--tr:Failed to check queue and pending invoke status. Wait flag 'true'.
[14.10.2024 13:21:24.036] <139869015758400> alg      | >>  |--tr:FIB proxy was unable to process write operation, recorder '0x000x7f35cc07f7c0', area offset '8778678272', FIB 'Backup of the FIB VeeamProxy3-flat.vmdk'.
[14.10.2024 13:21:24.036] <139869015758400> alg      | >>  |--tr:Unable to asynchronously write data block. Block identity: [Data block. Start offset: [8778678272], Length: [4194304], Area ID: [1318].].
[14.10.2024 13:21:24.036] <139869015758400> alg      | >>  |--tr:Processing of asynchronous write requests has failed. Output file: [Backup of the FIB VeeamProxy3-flat.vmdk].
[14.10.2024 13:21:24.036] <139869015758400> alg      | >>  |An exception was thrown from thread [139869531645504].
[14.10.2024 13:21:24.037] <139869015758400> stg      |   Updating item in local backup. Item details: [Internal FIB 'VeeamProxy3-flat.vmdk']. Is incremental CTK is available: [false]. Failed.
[14.10.2024 13:21:24.037] <139869015758400> dsk      |   Unbuffered file I/O statistics for '/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T060045_79DE.vib': 417792 bytes read (2 requests), 1052672 bytes read unaligned (3 requests)
[14.10.2024 13:21:24.037] <139869015758400> stg      |   Closing storage file [HostFS:///VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T060045_79DE.vib].
[14.10.2024 13:21:24.037] <139869015758400> stg      |     Removing lock on file [/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T060045_79DE.vib].
[14.10.2024 13:21:24.037] <139869015758400> dsk      |   Unbuffered file I/O statistics for '/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-11T060019_1FAE.vib': 417792 bytes read (2 requests), 1052672 bytes read unaligned (3 requests)
[14.10.2024 13:21:24.037] <139869015758400> stg      |   Closing storage file [HostFS:///VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-11T060019_1FAE.vib].
[14.10.2024 13:21:24.037] <139869015758400> stg      |     Removing lock on file [/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-11T060019_1FAE.vib].
[14.10.2024 13:21:24.037] <139869015758400> dsk      |   Unbuffered file I/O statistics for '/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-10T060024_1ED1.vib': 417792 bytes read (2 requests), 1052672 bytes read unaligned (3 requests)
[14.10.2024 13:21:24.037] <139869015758400> stg      |   Closing storage file [HostFS:///VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-10T060024_1ED1.vib].
[14.10.2024 13:21:24.037] <139869015758400> stg      |     Removing lock on file [/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-10T060024_1ED1.vib].
[14.10.2024 13:21:24.037] <139869015758400> dsk      |   Unbuffered file I/O statistics for '/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-09T060051_59B8.vib': 417792 bytes read (2 requests), 1052672 bytes read unaligned (3 requests)
[14.10.2024 13:21:24.037] <139869015758400> stg      |   Closing storage file [HostFS:///VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-09T060051_59B8.vib].
[14.10.2024 13:21:24.037] <139869015758400> stg      |     Removing lock on file [/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-09T060051_59B8.vib].
[14.10.2024 13:21:24.037] <139869015758400> dsk      |   Unbuffered file I/O statistics for '/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-08T060222_64E0.vbk': 417792 bytes read (2 requests), 1052672 bytes read unaligned (3 requests)
[14.10.2024 13:21:24.037] <139869015758400> stg      |   Closing storage file [HostFS:///VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-08T060222_64E0.vbk].
[14.10.2024 13:21:24.037] <139869015758400> stg      |     Removing lock on file [/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-08T060222_64E0.vbk].
[14.10.2024 13:21:24.037] <139869015758400> bkp      | Building new FIB "VBK: 'veeamfs:a:747fb8ab-a505-4ee1-aa0e-c28b12257900 (vm-620369)\VeeamProxy3-flat.vmdk@/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T132043_8D80.vbk'". Failed.
[14.10.2024 13:21:24.037] <139869015758400> cli      | ERR |Failed to process method {Transform.CompileFIB}
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |Resource temporarily unavailable
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |Failed to clone file extent. Source offset: 1808007168, target offset: 2999058432, length: 2359296
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to duplicate file extent from HostFS:///VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T060045_79DE.vib:1808007168 to /VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T132043_8D80.vbk:2999058432. Extent length: 2359296
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to process de-duplicated block
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to write block to blocks store.
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to append block. FIB index: [2060]. FIB: [VeeamProxy3-flat.vmdk].
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to process area [Data block. Start offset: [8640266240], Length: [4194304], Area ID: [1289].].
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to wait for recorder invoke.
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to check queue and pending invoke status. Wait flag 'true'.
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:FIB proxy was unable to process write operation, recorder '0x000x7f35cc07f7c0', area offset '8778678272', FIB 'Backup of the FIB VeeamProxy3-flat.vmdk'.
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Unable to asynchronously write data block. Block identity: [Data block. Start offset: [8778678272], Length: [4194304], Area ID: [1318].].
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Processing of asynchronous write requests has failed. Output file: [Backup of the FIB VeeamProxy3-flat.vmdk].
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to process conveyored task.
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to update item in local backup. Item details: [Internal FIB 'VeeamProxy3-flat.vmdk'].
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to build new FIB "VBK: 'veeamfs:a:747fb8ab-a505-4ee1-aa0e-c28b12257900 (vm-620369)\VeeamProxy3-flat.vmdk@/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T132043_8D80.vbk'" from backup chain.
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |An exception was thrown from thread [139869531645504].
[14.10.2024 13:21:24.037] <139869015758400> cli      | ERR |Failed to process {Invoke} command
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |Duration: [00:00:26.533]
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |Details:
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |Agent: Failed to process method {Transform.CompileFIB}: Resource temporarily unavailable
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |Failed to clone file extent. Source offset: 1808007168, target offset: 2999058432, length: 2359296
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to duplicate file extent from HostFS:///VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T060045_79DE.vib:1808007168 to /VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T132043_8D80.vbk:2999058432. Extent length: 2359296
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to process de-duplicated block
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to write block to blocks store.
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to append block. FIB index: [2060]. FIB: [VeeamProxy3-flat.vmdk].
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to process area [Data block. Start offset: [8640266240], Length: [4194304], Area ID: [1289].].
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to wait for recorder invoke.
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to check queue and pending invoke status. Wait flag 'true'.
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:FIB proxy was unable to process write operation, recorder '0x000x7f35cc07f7c0', area offset '8778678272', FIB 'Backup of the FIB VeeamProxy3-flat.vmdk'.
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Unable to asynchronously write data block. Block identity: [Data block. Start offset: [8778678272], Length: [4194304], Area ID: [1318].].
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Processing of asynchronous write requests has failed. Output file: [Backup of the FIB VeeamProxy3-flat.vmdk].
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to process conveyored task.
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to update item in local backup. Item details: [Internal FIB 'VeeamProxy3-flat.vmdk'].
[14.10.2024 13:21:24.037] <139869015758400> cli      | >>  |--tr:Failed to build new FIB "VBK: 'veeamfs:a:747fb8ab-a505-4ee1-aa0e-c28b12257900 (vm-620369)\VeeamProxy3-flat.vmdk@/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T132043_8D80.vbk'" from backup chain.
[14.10.2024 13:21:24.039] <139868990580288> cli      | Next client command: [closeStg].
[14.10.2024 13:21:24.039] <139868990580288> cli      |       :>  /VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T132043_8D80.vbk
[14.10.2024 13:21:24.039] <139868990580288> cli      | Closing storage '/VeeamBackup/XFSBackups/backups/copyjob:Orbit_Critical Management VMs/Orbit_Critical Management VMs/VeeamProxy3.vm-620369D2024-10-14T132043_8D80.vbk'.
Just to avoid confusion: the target system has the Veeam backups in a folder called /VeeamBackup/XFSBackups, but this is not an XFS file system; it is a ZFS file system (the naming dates from when we were experimenting with overlaying file systems on ZFS, and I kept the name the same to ease switching over to ZFS).
Below is what I think is relevant about our target backup copy server.
The source backup copy server has a similar configuration.

Code: Select all

# cat /proc/mounts|grep VeeamBackup
VeeamBackup /VeeamBackup zfs rw,seclabel,noatime,xattr,noacl,casesensitive 0 0
# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
VeeamBackup  98.5T  61.0T  98.4T  /VeeamBackup
# zpool status
  pool: VeeamBackup
 state: ONLINE
config:

        NAME                                       STATE     READ WRITE CKSUM
        VeeamBackup                                ONLINE       0     0     0
          raidz1-0                                 ONLINE       0     0     0
            sdc                                    ONLINE       0     0     0
            scsi-35000039ae873d3e5                 ONLINE       0     0     0
            scsi-35000039ae873c119                 ONLINE       0     0     0
            scsi-35000039ae873d3c9                 ONLINE       0     0     0
            scsi-35000039ae878d31d                 ONLINE       0     0     0
            sdf                                    ONLINE       0     0     0
          raidz1-1                                 ONLINE       0     0     0
            sdg                                    ONLINE       0     0     0
            sdh                                    ONLINE       0     0     0
            sdi                                    ONLINE       0     0     0
            sdj                                    ONLINE       0     0     0
            sdk                                    ONLINE       0     0     0
            sdm                                    ONLINE       0     0     0
          raidz1-2                                 ONLINE       0     0     0
            sdn                                    ONLINE       0     0     0
            sdo                                    ONLINE       0     0     0
            sdp                                    ONLINE       0     0     0
            sdq                                    ONLINE       0     0     0
            sdr                                    ONLINE       0     0     0
            sds                                    ONLINE       0     0     0
          raidz1-3                                 ONLINE       0     0     0
            sdt                                    ONLINE       0     0     0
            sdu                                    ONLINE       0     0     0
            sdv                                    ONLINE       0     0     0
            sdw                                    ONLINE       0     0     0
        spares
          scsi-STOSHIBA_MG06SCA800EY_51X0A13GF1GF  AVAIL

errors: No known data errors
One thing I think I've noticed in the past is that while the jobs are running, the non-transformation steps count toward the number of parallel tasks hitting the storage unit, but I believe the transformation steps might not.
Could it be something like this that is overloading the storage, because the number of parallel tasks hitting it isn't fully represented in the total task count used for throttling?
As I said, it's strange that this error is intermittent, and it's strange that 50% of the transformation stages on a 6-VM copy job are failing, whereas about 9% of the transformation stages are failing on a 322-VM copy job.
I suspect that whatever causes the copy job failures is also the cause of our synthetic transformation stages failing when we enable synthetic fulls.

Any thoughts?
I've attached the log files to the ticket I opened a while back (Hannes knows the ticket number: 07451905).
Thanks,
Ashley
ashleyw
Service Provider
Posts: 208
Liked: 43 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw »

Just to make it clearer what's going on here, I've pasted a screenshot of one of our smaller copy jobs, which had a 50% failure rate as discussed above. All failures were at the transformation stage.
[Screenshot: copy job session statistics showing half the VMs failing at the transformation stage]
ashleyw
Service Provider
Posts: 208
Liked: 43 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw »

For anyone interested here: I think the challenge is to be able to consistently reproduce the issue outside of Veeam (assuming this is a ZFS issue).
It appears there are essentially two scripts out there, zhammer.sh and reproducer.sh, which were used to investigate block cloning issues early on, when the OpenZFS team introduced reflink support into ZFS.
https://raw.githubusercontent.com/numin ... zhammer.sh
https://gist.github.com/tonyhutter/d ... roducer.sh

reproducer.sh appears to write 1000 files of 1 MB each, so we'll use the GNU utility parallel to spin up 8 instances of it concurrently, to simulate concurrent load patterns on the storage repository.

Code: Select all

# yum install parallel
# cd /VeeamBackup/Testing
# wget https://gist.github.com/tonyhutter/d69f305508ae3b7ff6e9263b22031a84/raw/c543f1c60eec6c115bd3d1d97fa45c4bb8b3e573/reproducer.sh
# chmod +x reproducer.sh
# echo "will cite"|parallel --citation
# seq 8 | parallel -n0 /VeeamBackup/Testing/reproducer.sh
I can't cause that to fail, so let's move on to zhammer (which appears to run in parallel already).

Code: Select all

# wget https://raw.githubusercontent.com/numinit/nixpkgs/refs/heads/zhammer/nixos/tests/zhammer.sh
# chmod +x zhammer.sh
# ./zhammer.sh
I can't cause that to fail either, so this would indicate to me that there is no issue occurring at the ZFS layer.
I'll now attempt to reproduce the issue from a Veeam perspective with the smallest job and proxy/storage throttles that I can, in the absence of a script (or a version of those scripts) that can simulate Veeam's traffic patterns.

My other concern is that the synthetic full process is by nature heavy (in terms of RAM and everything else)... If we are running multiple proxies (as we do), what is stopping the scheduler from allocating multiple synthetic processes to the same proxy and therefore overloading it (the reasoning here being that the synthetic transformation process is a very different workload pattern from an incremental)?
ashleyw
Service Provider
Posts: 208
Liked: 43 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw »

To proceed with the testing, we need to be able to run synthetic transformation jobs on demand.
My understanding is that we can't do this (even via PowerShell) multiple times in one day - so it becomes very hard to attempt to replicate a transient error.
HannesK
Product Manager
Posts: 14881
Liked: 3098 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by HannesK »

Hello,
there is a registry key that enforces transforms at every job run. That's a global setting. The key can be provided via support (it's a "restricted" key, so I cannot post it in public here).

Best regards,
Hannes
ashleyw
Service Provider
Posts: 208
Liked: 43 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: OpenZFS 2.2 support for reflinks now available

Post by ashleyw »

Thanks for that key, Hannes.

It enabled me to consistently reproduce the issue using a single proxy (with 8 parallel processes) and a storage device running ZFS on Rocky 9 with a throttle limit of 4.
Both the proxy and the storage unit run on the same physical piece of tin under ESXi 8. The Rocky 9 VM uses PCI passthrough of the HBA controller.
We are running a 23-disk RAID set, striped across 4 raidz1 vdevs, with each spindle a 16 TB disk (SEAGATE ST16000NM006) - not that any of that is relevant.

The Veeam job crashes out with the same messages as before:
17/10/2024 2:20:40 PM :: Agent: Failed to process method {Transform.CompileFIB}: Resource temporarily unavailable
Failed to clone file extent. Source offset: 3345874944, target offset: 14280163328, length: 1835008
In our lab that is about 12 minutes into a large job.

The error causes subsequent VM backups within the job to fail.
I've attached a full set of log files to the ticket (including /var/log).
If I retry the job afterwards, the job restarts OK, and the VM that was previously failing on the transformation stage then goes through a successful transformation - indicating to me that this is not a ZFS block cloning/corruption issue. (Note: without the registry key from support, it's not possible to force the transformation stage on reruns of the job within the same day, and the only reason I needed this key was to attempt to reproduce the issue.)

Not sure where I go from here, but for now we are continuing to use ZFS with synthetics disabled; we are able to ingest quickly enough, and have sufficient storage, that we can run full backups once per week.

Code: Select all

# zpool status
  pool: VeeamBackup
 state: ONLINE
config:

        NAME         STATE     READ WRITE CKSUM
        VeeamBackup  ONLINE       0     0     0
          raidz1-0   ONLINE       0     0     0
            sdc      ONLINE       0     0     0
            sdd      ONLINE       0     0     0
            sde      ONLINE       0     0     0
            sdf      ONLINE       0     0     0
            sdg      ONLINE       0     0     0
            sdh      ONLINE       0     0     0
          raidz1-1   ONLINE       0     0     0
            sdi      ONLINE       0     0     0
            sdj      ONLINE       0     0     0
            sdk      ONLINE       0     0     0
            sdl      ONLINE       0     0     0
            sdm      ONLINE       0     0     0
            sdn      ONLINE       0     0     0
          raidz1-2   ONLINE       0     0     0
            sdo      ONLINE       0     0     0
            sdp      ONLINE       0     0     0
            sdq      ONLINE       0     0     0
            sdr      ONLINE       0     0     0
            sds      ONLINE       0     0     0
            sdt      ONLINE       0     0     0
          raidz1-3   ONLINE       0     0     0
            sdu      ONLINE       0     0     0
            sdv      ONLINE       0     0     0
            sdw      ONLINE       0     0     0
            sdx      ONLINE       0     0     0
            sdy      ONLINE       0     0     0
        spares
          sdz        AVAIL

errors: No known data errors
And an example of one of our disks:

Code: Select all

diskseq/18, /dev/disk/by-id/scsi-SSEAGATE_ST16000NM006J_???????
  Device Number: block 8:240-8:255 (char 21:17)
  Geometry (Logical): CHS 1945332/255/63
  Size: 31251759104 sectors a 512 bytes
  Capacity: 14902 GB (16000900661248 bytes)
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #14 (Serial Attached SCSI controller)
  
Happy to help in any way possible.