https://community.veeam.com/script-libr ... older-1205
https://community.veeam.com/blogs-and-p ... tories-244
https://www.virtualtothecore.com/calcul ... ast-clone/
To solve this, we wrote a script that uses filefrag to read each file's extents (physical block start/end) and then works out where those extents overlap, which gives the actual on-disk usage of a subset of files.
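The overlap calculation can be illustrated with a small sketch. This is not the code from the utility; it is a minimal example of the idea, assuming each extent is an inclusive (first_block, last_block) pair of physical block numbers. The names merge_extents and blocks_used are made up for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: inclusive physical block range of one extent.
Extent = Tuple[int, int]


def merge_extents(extents: Iterable[Extent]) -> List[Extent]:
    """Collapse overlapping or adjacent physical ranges into disjoint ranges."""
    merged: List[Extent] = []
    for start, end in sorted(extents):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps (shared blocks) or touches the previous range: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def blocks_used(extents: Iterable[Extent]) -> int:
    """Number of distinct physical blocks covered by the given extents."""
    return sum(end - start + 1 for start, end in merge_extents(extents))


# Two files of 150 blocks each that share some physical blocks via reflink.
file_a = [(100, 199), (500, 549)]
file_b = [(150, 249), (500, 549)]

apparent = sum(end - start + 1 for start, end in file_a + file_b)
on_disk = blocks_used(file_a + file_b)
print(f"apparent: {apparent} blocks, on disk: {on_disk} blocks")
```

Shared blocks are counted once because they fall inside the same merged range. Multiply the block counts by the filesystem block size to get bytes.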
The utility can also cache the usage per scope, for example per client. It is written in Python, but the inner loop uses numba, which JIT-compiles the function to native machine code for as much speed as possible (the calculation is heavy and can take several minutes to run).
Once the cache is built, it is very fast and gives _accurate_ disk usage and data size figures per client, without needing a filesystem per client.
This is something that should be a native function, but it doesn't appear to exist anywhere. Because it uses filefrag, it should support any filesystem that uses reflink to share blocks.
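For anyone wanting to experiment, here is a rough sketch of pulling physical extents out of `filefrag -v`. It is not taken from the utility. It assumes the usual e2fsprogs table layout (ext, logical_offset, physical_offset, length, expected, flags), and the sample text below is invented for illustration; the helper names are hypothetical, and output on your system may differ.

```python
import re
import subprocess
from typing import List, Tuple

# Matches one extent row, e.g.
#    0:        0..     999:      34816..     35815:   1000:   shared
# Groups: logical start, logical end, physical start, physical end, length.
_EXTENT_RE = re.compile(
    r"^\s*\d+:\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+):"
)


def parse_filefrag(text: str) -> List[Tuple[int, int]]:
    """Return inclusive (physical_start, physical_end) block pairs."""
    extents = []
    for line in text.splitlines():
        match = _EXTENT_RE.match(line)
        if match:
            extents.append((int(match.group(3)), int(match.group(4))))
    return extents


def physical_extents(path: str) -> List[Tuple[int, int]]:
    """Run filefrag on one file (needs e2fsprogs and read access to the file)."""
    result = subprocess.run(
        ["filefrag", "-v", path], check=True, capture_output=True, text=True
    )
    return parse_filefrag(result.stdout)


# Invented sample in the assumed output format, so the parser can be tried
# without a reflink filesystem at hand.
SAMPLE = """\
Filesystem type is: 58465342
File size of /example_dir/a.vbk is 8192000 (2000 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..     999:      34816..     35815:   1000:             shared
   1:     1000..    1999:      90000..     90999:   1000:      35816: last,shared,eof
/example_dir/a.vbk: 2 extents found
"""

print(parse_filefrag(SAMPLE))
```

The header, "File size" and summary lines are skipped because they do not start with an extent index followed by a colon.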
Code:
# ./reflink_size.py /example_dir/
/example_dir/MDD2023-09-04T141514_99FF.vbk : Total: 130.43 GB, Unique: 9.1%, Shared (new/others): 90.9%/0.0% Time: 0.36
/example_dir/MDD2023-09-06T000007_9944.vib : Total: 8.89 GB, Unique: 81.7%, Shared (new/others): 18.2%/0.1% Time: 0.02
/example_dir/MDD2023-09-07T000007_7FD2.vib : Total: 7.93 GB, Unique: 82.3%, Shared (new/others): 17.6%/0.1% Time: 0.02
/example_dir/MDD2023-09-08T014933_00BD.vbk : Total: 130.55 GB, Unique: 5.8%, Shared (new/others): 1.1%/93.2% Time: 1.81
/example_dir/MDD2023-09-10T011650_3D5C.vbk : Total: 130.54 GB, Unique: 5.8%, Shared (new/others): 12.2%/82.0% Time: 2.22
/example_dir/MDD2023-09-11T000020_714C.vib : Total: 7.85 GB, Unique: 100.0%, Shared (new/others): 0.0%/0.0% Time: 0.00
/example_dir/MD_139A6.vbm : Total: 364.0 KB, Unique: 98.9%, Shared (new/others): 0.0%/1.1% Time: 0.00
Files 9 : Disk Used/size: 196.42 GB/440.93 GB, Avg file: 21.82 GB/48.99 GB
https://gitlab.com/cyber-secure-public/ ... type=heads