Comprehensive data protection for all workloads
MrGrim
Novice
Posts: 3
Liked: never
Joined: Oct 29, 2024 5:39 pm
Full Name: Michael Kreitzer
Contact:

XFS and Thin LVM Volumes

Post by MrGrim »

Hello,

This is my first post, and the bold red text is telling me to include a case ID, so here it is: #07483853

I think this is worth having on the forum, though, for Google to index. Has anyone else tried this combo? I need to describe our setup before I describe what I'm seeing, so please bear with me. :)

We are using hardened repositories with XFS reflinking backed by LVM thin provisioned volumes. We have about 15 repositories, each in the 10-18TB range. We try to keep around 1-2TB free per repository to accommodate any unexpected incremental size bursts. The storage is from a cloud provider and quite expensive, so we use LVM thin provisioned volumes to pool the available burst space; instead of paying for 20+TB of unused space, we only carry <5TB.
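
For context, the LVM layout looks roughly like the following; the VG/LV names and sizes are placeholders for illustration, not exact commands from our build notes:

Code: Select all

# One thin pool per volume group, 1MB chunk size (see below), sized for pooled burst space
$ lvcreate --type thin-pool -n repo-pool -L 50T --chunksize 1m repos
# Overcommitted thin volumes, one per repository
$ lvcreate --type thin -n foo -V 13T --thinpool repo-pool repos
$ lvcreate --type thin -n bar -V 14T --thinpool repo-pool repos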

We originally started out on plain virtual disks. XFS was formatted with the following parameters:

mkfs.xfs -b size=4096 -m reflink=1,crc=1,bigtime=1 -L <volname> <dev>

This used the default sunit/swidth values of 0. When we decided to convert to LVM thin volumes, we settled on a 1MB chunk size. To migrate the volumes we used xfs_copy, which auto-detected the correct values for sunit and swidth (256 each) and set them on the destination. However, I don't believe the existing data was rewritten in any way, so it does not take that setting into account. On the Veeam side we are using copy jobs, which do not have a configurable block size. The parent job is using a block size of 512KB.
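
For anyone checking their own setup, the current geometry is easy to confirm; the paths below are from our layout, and the chunk_size field name may vary by LVM version:

Code: Select all

# Alignment hints the filesystem currently advertises (xfs_info reports these in filesystem blocks)
$ xfs_info /repos/foo | grep -E 'sunit|swidth'
# Thin pool chunk size for comparison
$ lvs -o lv_name,chunk_size repos/repo-pool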

We have one repository that holds the vast majority of our VMs, 95 to be exact. We are using per-machine backup files with 31-point retention. This repository is 13TB in size with 2.1TB of free space.

What we're seeing is the majority of the free space not being released to the pool:

Code: Select all

$ df -h /repos/foo
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/foo                     13T   11T  2.1T  85% /repos/foo
$ fstrim -v /repos/foo
/repos/foo: 536.3 GiB (575894441984 bytes) trimmed
$ lvs -a repos/foo
LV            VG    Attr       LSize  Pool      Origin Data%  Meta%  Move Log Cpy%Sync Convert
foo           repos Vwi-aotz-- 13.00t repo-pool        96.47
This is actually an improvement; it was only 17GB a few days ago. This problem may slowly resolve with time...

To try to figure out the problem, I checked fragmentation on this volume:

Code: Select all

# xfs_db -r /dev/repos/foo
xfs_db> frag -f
actual 45993872, ideal 3178, fragmentation factor 99.99%
Note, this number is largely meaningless.
Files on this filesystem average 14472.58 extents per file
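
For anyone who wants to look at an individual backup file rather than the whole filesystem, something like the following works; the .vbk name is just an example:

Code: Select all

# Rough extent count for a single file (the listing also includes holes)
$ xfs_bmap -v /repos/foo/SomeVM.vbk | tail -n +3 | wc -l
# filefrag gives a one-line summary
$ filefrag /repos/foo/SomeVM.vbk
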
I suspect the high fragmentation, combined with the bulk of the data having been written while sunit/swidth were 0, is the reason so little data can be trimmed. This will likely resolve to a degree as time goes on, but a lot of the data remains fairly static.

XFS does have a defragment tool (xfs_fsr), but to function it requires contiguous free space the size of the largest file. It's also hard to tell whether it supports reflink, as it is common for some XFS tools not to (e.g. xfsdump does not). This would require adding around 8TB to each repo, and since XFS does not support shrinking, we would be stuck with that.
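
For reference, checking how much contiguous free space is actually available looks like this, and xfs_fsr can be pointed at a single file. I have not verified how it behaves with reflinked extents, so treat it as something to test carefully first (the file name is an example):

Code: Select all

# Free space summarized by contiguous extent size
$ xfs_db -r -c "freesp -s" /dev/repos/foo
# Reorganize a single file, verbosely (caution: rewriting data may unshare reflinked blocks)
$ xfs_fsr -v /repos/foo/SomeVM.vbk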

I've considered using the Veeam defragment option, but I'm a little unclear on the details. How much free space does it need when using per-machine backup files? Is it the full size of the backup job, or just the largest VM in the job? How does this combine with immutability?

What is the block size of a copy job? Is it fixed? Is it inherited from the parent? Is it configurable via CLI?

What other considerations would you recommend to ensure data is written and contained in appropriately sized and aligned blocks? E.g. would the XFS mount option "swalloc" make any sense?

Thanks!
mkretzer
Veteran
Posts: 1253
Liked: 443 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: XFS and Thin LVM Volumes

Post by mkretzer »

Do you have discard enabled in the filesystem? I am not sure, but discarding unused blocks might be necessary to regain LVM thin space.
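
E.g. (the mount point below is a placeholder) either the online mount option or a periodic trim should push the discards down to the thin pool:

Code: Select all

# Online discards on every block free (fstab entry)
/dev/mapper/foo  /repos/foo  xfs  defaults,discard  0  0

# Or periodic trims via the util-linux systemd timer
$ systemctl enable --now fstrim.timer
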
MrGrim
Novice
Posts: 3
Liked: never
Joined: Oct 29, 2024 5:39 pm
Full Name: Michael Kreitzer
Contact:

Re: XFS and Thin LVM Volumes

Post by MrGrim »

fstrim and the discard mount option are two means to the same end; one does not require the other. E.g. the fstrim command I ran above did reclaim what space it could. The difference is between scheduled or on-demand full trims and continuous online discards, each with its own performance trade-offs. For example, here is the output for a repo that contains only one very large VM with very little churn:

Code: Select all

$ df -h /repos/bar
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/bar                     14T   13T  1.7T  89% /repos/bar
$ fstrim -v /repos/bar
/repos/bar: 1.7 TiB (1865565347840 bytes) trimmed
Edit: Actually, now that I check, this one has a parent job with a block size of 1MB. Do copy jobs inherit the parent's block size?
tdewin
Veeam Software
Posts: 1838
Liked: 661 times
Joined: Mar 02, 2012 1:40 pm
Full Name: Timothy Dewin
Contact:

Re: XFS and Thin LVM Volumes

Post by tdewin »

When you use XFS with reflinks, we will use "fast cloning" to make synthetic fulls. This means that a new full will refer to chunks of data in the old VBK files. Whenever that previous file is deleted, you inherently get fragmentation. The problem with defragmenting in this situation is always: for which file? If you have 10 VBKs, should it be the first one, the second, ...the nth? Which one should XFS optimise for?
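
If you're curious, you can actually see that sharing on any given backup file: extents shared via reflink carry a flag in the fiemap output (the file name below is just an example):

Code: Select all

# Per-extent map; the FLAGS column marks extents that are shared via reflink
$ xfs_io -r -c "fiemap -v" /repos/foo/SomeVM.vbk | head -n 20
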
MrGrim
Novice
Posts: 3
Liked: never
Joined: Oct 29, 2024 5:39 pm
Full Name: Michael Kreitzer
Contact:

Re: XFS and Thin LVM Volumes

Post by MrGrim »

Indeed, I suspected the fragmentation was unavoidable. With reflink, the very nature of what is "contiguous" becomes murky. So I think the goal should be to work with the various alignment and block size settings to still allow effective discards. My focus is on repair options for when those get mismatched and you don't have a lot of space to throw around on fresh full backups. My secondary goal is simply to ask what others' experience is and whether there are other gotchas I should be looking out for. :) My tertiary goal is to get as much of this info as possible into a place indexable by search engines.

What I need to learn is:

* How do you set the block size for a copy job?
* Does the Veeam maintenance/compaction/defrag operation work with per-machine chains in a way that minimizes the balloon space required? If I have 50 VMs in a job, do I need room for a full backup of all 50, or just the largest of the 50?

Appreciate the feedback!
tdewin
Veeam Software
Posts: 1838
Liked: 661 times
Joined: Mar 02, 2012 1:40 pm
Full Name: Timothy Dewin
Contact:

Re: XFS and Thin LVM Volumes

Post by tdewin »

I'm not R&D here, just pre-sales, so take this with a grain of salt; it's just my experience from analyzing XFS.

It is hard to say what the blocks will align on. Since we use compression, the 1MB block size might not end up as 1MB on disk. E.g. a block can be compressed to a smaller amount, sometimes maybe half, but it is not necessarily a fixed alignment (just like a zip file is not necessarily half the size of your original file). The block size you select in the job is how we read the source (the VMDK if you use vSphere). We then take that block and compress it. So of course, the bigger the block size you select at the source, the less fragmentation you get at the target, but also the bigger your incrementals will be and the less sharing you potentially get. 1MB seems to be an optimal point between savings and performance.

The block size for a copy job is inherited from the source job, because otherwise we would have to repack blocks (e.g. reading multiple blocks from the source chain, potentially from different files) to create new blocks.

The compacting maintenance job was originally made for reverse-incremental and forever-forward-incremental jobs, where data could get stale (we cannot shrink files). I don't think it does a lot for forward chains with weekly fulls, as we build a full only with the blocks we need (presumably, if you use XFS, you have weekly fulls enabled). This is also mentioned in the helpcenter ("If you schedule periodic full backups, the Defragment and compact full backup file check box does not apply." https://helpcenter.veeam.com/docs/backu ... ml?ver=120)

Ultimately, what we store is already quite optimised, so thin provisioning might only get you so far.
tsightler
VP, Product Management
Posts: 6036
Liked: 2863 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: XFS and Thin LVM Volumes

Post by tsightler » 2 people like this post

MrGrim wrote: When we decided to convert to LVM thin volumes, we settled on a 1MB chunk size.
It feels like this is the core of the problem. If you provisioned 1MB chunks, then you need entire 1MB-aligned chunks to be free for them to be returned to the pool. If even 1 byte is used from a 1MB chunk, then it must stay allocated. Because of the way block cloning works, and the overall used space on the volume, it is quite unlikely that there are significant amounts of 1MB-aligned chunks to be freed; I'm actually surprised it's as high as you are seeing.

To have a higher chance of free chunks that can be returned to the pool, you would need a significantly smaller chunk size, probably something like 64K. However, that would likely have a negative impact at the LVM layer, potentially adding a second layer of fragmentation with the potential for a significant performance penalty, especially during restores that require granular access, such as FLR.
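
A quick way to double-check what the pool is actually set to and how much each thin LV is pinning (the VG name below is a placeholder):

Code: Select all

# Pool chunk size plus per-LV allocation; fstrim can only hand back whole free chunks
$ lvs -o lv_name,chunk_size,data_percent repos
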
dejan.ilic
Enthusiast
Posts: 42
Liked: 5 times
Joined: Apr 11, 2019 11:37 am
Full Name: Dejan Ilic
Contact:

Re: XFS and Thin LVM Volumes

Post by dejan.ilic »

Also check this (from the LVM documentation) regarding the thin volumes:

Create a thin pool with a specific discards mode:

Code: Select all

$ lvcreate --type thin-pool -n ThinPool -L Size --discards ignore|nopassdown|passdown VG

Change the discards mode of an existing thin pool:

Code: Select all

$ lvchange --discards ignore|nopassdown|passdown VG/ThinPool
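
And to see what an existing pool is currently set to (the VG/pool name is a placeholder):

Code: Select all

$ lvs -o lv_name,discards repos/repo-pool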

mkretzer wrote: Oct 29, 2024 7:41 pm Do you have discard enabled in the filesystem? I am not sure but discarding unused blocks might be necessary to regain LVM thin space.
