This is my first post and the bold red text is telling me to include a case ID, so here it is: #07483853
I think this is worth having on the forum anyway so Google can index it. Has anyone else tried this combo? I need to describe our setup before I describe what I'm seeing, so please bear with me.

We are using hardened repositories with XFS reflinking, backed by LVM thin-provisioned volumes. We have about 15 repositories, each in the 10-18TB range, and we try to keep around 1-2TB free per repository to absorb any unexpected incremental size bursts. The storage comes from a cloud provider and is quite expensive, so we use LVM thin provisioning to pool the available burst space; instead of paying for 20+TB of unused space, we only carry <5TB.
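For context, the layout is roughly the following (a minimal sketch; the VG/LV names and sizes are illustrative, not our exact values, and the 1MB chunk size choice is explained below):
Code:
# one thin pool in the "repos" VG (sizes are made up for illustration)
lvcreate --type thin-pool -L 180T -c 1m -n repo-pool repos
# each repository is an overprovisioned thin volume carved from that pool
lvcreate -V 13T --thinpool repo-pool -n foo repos
mount /dev/repos/foo /repos/foo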
We originally started out on plain virtual disks. XFS was formatted with the following parameters:
mkfs.xfs -b size=4096 -m reflink=1,crc=1,bigtime=1 -L <volname> <dev>
This used the default sunit/swidth values of 0. When we decided to convert to LVM thin volumes, we settled on a 1MB chunk size. To migrate the volumes we used xfs_copy, which auto-detected the correct sunit and swidth values (256 each) and set them on the destination. However, I don't believe the existing data was rewritten in any way, so it is not laid out according to that setting. On the Veeam side we are using copy jobs, which do not have a configurable block size; the parent job is using a block size of 512KB.
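Back on the filesystem side, the geometry xfs_copy set on the destination can be confirmed with xfs_info (path is our example repo):
Code:
# sunit/swidth here are reported in filesystem blocks (bsize=4096)
xfs_info /repos/foo
# the data line should show sunit=256 swidth=256 blks, i.e. 1MiB, matching the pool chunk size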
We have one repository that holds the vast majority of our VMs, 95 to be exact. We are using per-machine backup files with 31-point retention. This repository is 13TB in size with 2.1TB of free space.
What we're seeing is that the majority of that free space is not being released to the pool:
Code:
$ df -h /repos/foo
Filesystem       Size  Used Avail Use% Mounted on
/dev/mapper/foo   13T   11T  2.1T  85% /repos/foo
$ fstrim -v /repos/foo
/repos/foo: 536.3 GiB (575894441984 bytes) trimmed
$ lvs -a repos/foo
  LV   VG    Attr       LSize  Pool      Origin Data%  Meta%  Move Log Cpy%Sync Convert
  foo  repos Vwi-aotz-- 13.00t repo-pool        96.47
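For completeness, here is how the thin pool's discard handling can be checked (pool/VG names as in the output above); I'm not claiming this is the cause, just documenting the check:
Code:
# the pool's discard mode should be "passdown" for fstrim to actually return chunks
lvs -o lv_name,discards repos/repo-pool
# the default can be found/overridden in lvm.conf (allocation/thin_pool_discards)
grep -n thin_pool_discards /etc/lvm/lvm.conf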
To try to figure out the problem, I checked fragmentation on this volume:
Code:
# xfs_db -r /dev/repos/foo
xfs_db> frag -f
actual 45993872, ideal 3178, fragmentation factor 99.99%
Note that the fragmentation factor itself is largely meaningless; the more telling number is that files on this filesystem average 14472.58 extents per file.
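My understanding is that the pool can only reclaim chunks that are entirely free and discarded, so free-space fragmentation probably matters as much as file fragmentation here. xfs_db can summarize the free extent size distribution (read-only, same device as above):
Code:
# histogram of free extents by size, in filesystem blocks
xfs_db -r -c "freesp -s" /dev/repos/foo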
XFS does have a defragmentation tool (xfs_fsr), but in order to function it requires contiguous free space the size of the largest file. It's also hard to tell whether it supports reflink, as it is common for XFS tools not to (xfsdump, for example, does not). That free space would mean adding around 8TB to each repo, and since XFS does not support shrinking, we would be stuck with the extra space.
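For reference, if we did go that route, I believe the invocation would look roughly like this (device is our example repo; I have not run it against these repos):
Code:
# reorganize files on the mounted filesystem, verbose, run for at most 2 hours
xfs_fsr -v -t 7200 /dev/repos/foo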
I've considered using Veeam's defragment and compact option, but I'm a little unclear on the details. How much free space does it need when using per-machine backup files? Is it the full size of the backup job, or just the largest VM in the job? How does this interact with immutability?
What is the block size of a copy job? Is it fixed? Is it inherited from the parent? Is it configurable via CLI?
What other considerations would you recommend to ensure data is written and contained in appropriately sized and aligned blocks? E.g. would the XFS mount option "swalloc" make any sense?
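For reference, this is all I imagine swalloc would amount to if it is recommended (label/mount point are from the example above; as I understand it, it only affects new allocations):
Code:
# /etc/fstab
LABEL=foo  /repos/foo  xfs  defaults,swalloc  0 0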
Thanks!