DonZoomik
Expert
Posts: 109
Liked: 26 times
Joined: Nov 25, 2016 1:56 pm

Feature request: sparse file support to avoid the need for compact operations

Post by DonZoomik » Feb 18, 2019 3:53 pm

Most file systems (NTFS, ReFS, XFS, ext4, ZFS...) support sparse files, and most operating systems provide functions to "punch holes" in files (also over SMB).
If Veeam supported punching holes in VBKs, it would remove one of the main reasons for compact operations: forever-growing VBK files.
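
The whole request rests on the gap between a file's logical size and the disk space actually allocated to it. Below is a minimal Python sketch that reports both, assuming a Linux host: `st_blocks` is counted in 512-byte units there and is not available on Windows. The helper name `logical_and_physical_size` is hypothetical. Extending a file with `truncate()` without writing anything should leave a hole on file systems that support sparse files, so the allocated size stays far below the logical size.

```python
import os
import tempfile


def logical_and_physical_size(path):
    """Return (logical_bytes, allocated_bytes) for a file on Linux.

    st_size is the apparent (logical) size; st_blocks counts allocated
    512-byte units, so a sparse file reports fewer allocated bytes than
    its logical size.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


# Usage: extend an empty file to 64 MiB without writing any data.
with tempfile.NamedTemporaryFile() as f:
    f.truncate(64 * 1024 * 1024)  # the new range is a hole where sparse files are supported
    f.flush()
    print(logical_and_physical_size(f.name))
```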

Use case:
A VM has a VBK of 1TB
500GB of data is added to the VM, resulting in a 500GB VIB (VIB1)
750GB of data is deleted within the VM, captured in a VIB (VIB2)
When VIB1 is merged, the VBK grows to 1.5TB
When VIB2 is merged, the VBK stays at 1.5TB, wasting 750GB of disk space.

Recovering the wasted space requires either an expensive compact operation, which needs buffer disk space (unless ReFS is used), or a new full backup followed by the old chain expiring under retention.
If Veeam used sparse files, these 750GB could be "punched out" of the file, leaving a file with a logical size of 1.5TB but only 750GB of physical disk space used.
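
The arithmetic of the use case can be written down as toy accounting. This is a sketch only: it assumes sizes in GB with 1TB = 1000GB, as the example does. It ignores metadata, compression and block alignment, and does not describe Veeam's actual VBK layout. The function name `merge_sizes_gb` is hypothetical.

```python
def merge_sizes_gb(vbk_gb, added_gb, deleted_gb):
    """Toy accounting for the use case above (sizes in GB, 1 TB = 1000 GB).

    Returns (logical, physical_without_holes, physical_with_holes) after
    merging VIB1 (adds `added_gb` of new blocks) and VIB2 (frees `deleted_gb`).
    """
    logical = vbk_gb + added_gb                  # merging VIB1 appends new blocks to the VBK
    physical_without_holes = logical             # merging VIB2 frees blocks inside the file, but it never shrinks
    physical_with_holes = logical - deleted_gb   # punched-out ranges are deallocated in place
    return logical, physical_without_holes, physical_with_holes


print(merge_sizes_gb(1000, 500, 750))  # (1500, 1500, 750)
```

The difference between the last two values (750GB) is exactly the space that today only a compact or a new full can recover.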

In my real-life case I had a few huge (10TB+) file servers that were refactored into smaller ones. Disk space on the backup target was not released as data was migrated, so a lot of hand-holding and careful timing was needed just to avoid running out of disk space.

I found an old thread with some discussion about it: veeam-backup-replication-f2/remove-clie ... t8947.html
It mentions that tape drives will still see the full logical size. However, tape compression should mitigate that, as the punched-out regions read back as zeroes, which should compress to almost nothing (a naive impression). I'm not sure whether deduplication appliances and other more exotic devices support this.
It also mentions that most large customers run regular fulls. I've seen it mentioned in other threads that nowadays most customers just use forever-incremental, so this may be worth reconsidering after ~6-7 years.
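
The "zeroes compress to almost nothing" intuition is easy to check in software. This is a rough sketch: zlib (DEFLATE) is used purely as a stand-in, since tape drives compress in hardware with their own algorithms. Exact ratios will differ, but long runs of zeros are the best case for any LZ-style compressor.

```python
import zlib

# A punched-out region reads back as zeros; model 1 MiB of it.
hole = bytes(1024 * 1024)

# zlib is only a stand-in for the drive's hardware compression.
compressed = zlib.compress(hole)

print(len(hole), len(compressed), len(hole) // len(compressed))
```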

ejenner
Expert
Posts: 313
Liked: 43 times
Joined: Mar 23, 2018 4:43 pm
Full Name: EJ
Location: London

Re: Feature request: sparse file support to avoid the need for compact operations

Post by ejenner » Feb 19, 2019 10:18 am

In your real-life case, didn't you want the historical restore points?

What if something had gone wrong with your migration to the new servers and you wanted to go back to the old data?

Just saying this as I'm trying to work out what would cause the large empty sections in the first place. I'm thinking it's a fairly exceptional occurrence rather than something that happens all the time?

DonZoomik
Expert
Posts: 109
Liked: 26 times
Joined: Nov 25, 2016 1:56 pm

Re: Feature request: sparse file support to avoid the need for compact operations

Post by DonZoomik » Feb 19, 2019 10:39 am

I simplified my case a bit. Of course there are more than one or two restore points, let's say 14. But the point remains: once the big delete gets merged into the VBK after 14 new restore points, I'm still stuck with a VBK that is as big as it was originally.
This is an extreme case, but generally I restart chains once in a while. As VMs run, new data gets written to previously unused blocks (which causes VBK growth) and deleted from others (with no reclamation), for example by log rotation, patching, etc. So even if the VM size stays relatively constant, VBKs keep growing. I have no exact measurements, but I'd estimate maybe 10% per quarter. Growth slows down over time as blocks already described in the VBK get reused, but the point remains: there is no way to reclaim space without an expensive compact or restarting the chain (both require buffer space and may be unfeasible for huge VMs).
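
The growth pattern described here (new addresses grow the file, reused addresses do not, deletes are never reclaimed) can be illustrated with a small model. This is a hypothetical toy model, not Veeam's real VBK format: it assumes the full backup holds one block per guest block address it has ever seen, and that merging overwrites known addresses in place. The class name `ToyVbk` and the block numbers are made up for illustration.

```python
class ToyVbk:
    """Hypothetical toy model, NOT Veeam's real VBK format.

    The full backup keeps one stored block per guest block address it has
    ever seen. Merging an increment rewrites addresses it already holds in
    place and appends addresses it has not seen before; guest-side deletes
    never shrink the file.
    """

    def __init__(self, live_blocks):
        self.stored = set(live_blocks)  # guest addresses held in the VBK
        self.live = set(live_blocks)    # addresses the guest still uses

    def merge(self, written, deleted):
        """Merge one increment: `written` addresses changed, `deleted` freed."""
        self.stored |= set(written)     # only unseen addresses grow the file
        self.live |= set(written)
        self.live -= set(deleted)       # the VBK keeps these blocks anyway

    def file_blocks(self):
        return len(self.stored)         # file size without hole punching

    def reclaimable_blocks(self):
        return len(self.stored - self.live)  # what hole punching could free


vbk = ToyVbk(range(100))
# Log rotation: new log written to fresh addresses, oldest log deleted.
vbk.merge(written=range(100, 110), deleted=range(0, 10))
print(vbk.file_blocks(), vbk.reclaimable_blocks())  # 110 10
# Guest reuses the freed addresses: the file does not grow this time.
vbk.merge(written=range(0, 10), deleted=range(10, 20))
print(vbk.file_blocks(), vbk.reclaimable_blocks())  # 110 10
```

In both rounds the guest only uses 100 blocks, yet the file holds 110, and the 10 dead blocks stay allocated until a compact or a new chain.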

ejenner
Expert
Posts: 313
Liked: 43 times
Joined: Mar 23, 2018 4:43 pm
Full Name: EJ
Location: London

Re: Feature request: sparse file support to avoid the need for compact operations

Post by ejenner » Feb 19, 2019 12:02 pm

I suppose, with it being an occasional thing rather than something that always happens, you could avoid running out of disk space by copying your old backup chain off the repository and starting a new chain. It wouldn't really have to be automated, as it's occasional and you're aware that it is happening, so you can manage the process. It can be a bit frustrating, I know; I've had to do similar things with large file servers... but I can't see the case for automating it, for the reasons stated above.

DonZoomik
Expert
Posts: 109
Liked: 26 times
Joined: Nov 25, 2016 1:56 pm

Re: Feature request: sparse file support to avoid the need for compact operations

Post by DonZoomik » Feb 19, 2019 12:44 pm

Copying data off the repository again comes down to buffer space.
I still see sparse files as a silver bullet in these cases. Considering that on the Windows side, files on deduplicated ReFS/NTFS volumes are already sparse, it's an easy win. It also shouldn't be that hard to implement on Windows: mark the file as sparse with FSCTL_SET_SPARSE, then on merge clear data that no longer exists with FSCTL_SET_ZERO_DATA (information on cleared blocks has to be kept in the chain, or compact wouldn't know what to skip). I'm not sure about Linux, but I presume there are similar syscalls/IOCTLs/commands (fallocate?). Maybe as an experimental feature controlled by a registry flag (like ReFSDedupeBlockClone)?
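
On Linux, fallocate(2) is indeed the usual route: calling it with `FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE` deallocates a byte range without changing the file size (the punch flag is only accepted together with the keep-size flag). Python's `os` module does not expose these flags, so this minimal sketch calls libc through `ctypes`. It assumes a 64-bit Linux host where libc exports `fallocate` and `off_t` is 64 bits. The helper name `punch_hole` is hypothetical. On Windows the rough counterpart is `DeviceIoControl` with FSCTL_SET_SPARSE once per file and FSCTL_SET_ZERO_DATA per range, which is not shown here.

```python
import ctypes
import os

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

# Linux-only assumption: libc exports fallocate(2) and off_t is 64 bits.
_fallocate = ctypes.CDLL(None, use_errno=True).fallocate
_fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
_fallocate.restype = ctypes.c_int


def punch_hole(fd, offset, length):
    """Deallocate [offset, offset + length) while keeping the file size.

    Afterwards the range reads back as zeros. Raises OSError (for example
    EOPNOTSUPP) if the file system cannot punch holes.
    """
    # PUNCH_HOLE is only accepted together with KEEP_SIZE.
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if _fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

A merge step would call this once per block range it has just marked as unreferenced; the chain would still need to record which ranges were cleared.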

Gostev
SVP, Product Management
Posts: 24293
Liked: 3330 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Feature request: sparse file support to avoid the need for compact operations

Post by Gostev » Feb 19, 2019 9:57 pm

You can just back up to ReFS with synthetic fulls enabled?

The exact same benefits with the GA version of the product, using by-now well-stabilized functionality delivered 3 years ago!

Sounds so much better than betting your data integrity on some rarely used extended file system controls that have God knows how many bad data corruption bugs :D

DonZoomik
Expert
Posts: 109
Liked: 26 times
Joined: Nov 25, 2016 1:56 pm

Re: Feature request: sparse file support to avoid the need for compact operations

Post by DonZoomik » Feb 19, 2019 11:40 pm

ReFS does help, but it can't be used everywhere. On one site I have to use a ZFS box, on another NTFS, plus backup copies to a NAS, etc. Also, some clients just hate Windows with a passion (barely putting up with the VBR server, but storing data on Linux).
Sparse file support has existed for nearly 20 years on Windows alone. I can't find any dates for XFS (but being extent-based, it has likely had it from inception), so I doubt that there are a huge number of bugs left.

HannesK
Veeam Software
Posts: 3425
Liked: 409 times
Joined: Sep 01, 2014 11:46 am
Location: Austria

Re: Feature request: sparse file support to avoid the need for compact operations

Post by HannesK » Feb 20, 2019 6:58 am

DonZoomik wrote: "so I doubt that there are a huge number of bugs left"
I believed the same in various situations. I worked a lot with Linux in the past and asked at Veeam from time to time why we do specific things. I often got the answer "we tested it and it broke or created several thousand support cases". A deeper integration into specific file systems costs a lot of R&D resources (especially QA) that could be better spent on new features.

DonZoomik
Expert
Posts: 109
Liked: 26 times
Joined: Nov 25, 2016 1:56 pm
Contact:

Re: Feature request: sparse file support to avoid the need for compact operations

Post by DonZoomik » Feb 20, 2019 8:14 am

I can't argue with the risk of hidden bugs, but this implementation seems like such low-hanging fruit.
This integration is not file-system-specific, as these syscalls are abstracted by the kernel. If the file system doesn't support the syscall, you get an error back and life goes on. Two implementations (Windows and Linux) should cover all use cases where the underlying file system supports sparse files.
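
The "you get an error back and life goes on" behaviour can be sketched as a fallback. The sketch tries to punch a hole first. If the file system answers EOPNOTSUPP/ENOSYS, or libc has no `fallocate` at all, it overwrites the range with zeros instead, which keeps the merged data correct but frees no space. This is a minimal Python sketch that assumes Linux/POSIX, a 64-bit `off_t`, and a range lying inside the existing file. `discard_range` is a hypothetical name.

```python
import ctypes
import errno
import os

FALLOC_FL_KEEP_SIZE = 0x01   # values from <linux/falloc.h>
FALLOC_FL_PUNCH_HOLE = 0x02
_UNSUPPORTED = {errno.EOPNOTSUPP, errno.ENOSYS}

try:
    _fallocate = ctypes.CDLL(None, use_errno=True).fallocate
    _fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
    _fallocate.restype = ctypes.c_int
except (AttributeError, OSError):  # no fallocate symbol (non-Linux libc)
    _fallocate = None


def discard_range(fd, offset, length, chunk=1 << 20):
    """Make [offset, offset + length) read as zeros; return True if space was freed.

    Tries to punch a hole first; if the file system reports that the
    operation is unsupported, falls back to overwriting the range with
    zeros, which keeps the data correct but frees no space.
    """
    if _fallocate is not None:
        if _fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, length) == 0:
            return True
        err = ctypes.get_errno()
        if err not in _UNSUPPORTED:
            raise OSError(err, os.strerror(err))
    # Fallback: write zeros chunk by chunk, advancing by what was actually written.
    end = offset + length
    while offset < end:
        offset += os.pwrite(fd, bytes(min(chunk, end - offset)), offset)
    return False
```

Returning a flag rather than raising lets the caller keep merging on any repository and only track, per repository, whether space is actually being reclaimed.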
