The four disks used for the OS were pulled from other servers, most likely from hardware RAID5 configurations, so I expected a bit of a grey area with the OS volume in terms of dirty blocks. The agent proceeded to read and process 251GB and transferred about 90GB to the repository. Knowing that subsequent backups of this volume would be minuscule, this large initial backup was forgivable.
The data volume is a little different though. The disks were new and unused, were configured in RAID1, and were initialised by the controller before being presented to the OS for formatting. Only 7% of the volume's space is used. The files stored on it haven't changed much over the years: new ones get added and some are occasionally overwritten. However, when the agent came to the data volume it thought every block was dirty and went about processing the whole thing. I stopped it about 40% of the way through because I didn't see the point in filling up the repository with a backup that had very little actual data.
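One way to check whether the "dirty" blocks actually hold anything is to sample the volume and count how many blocks are not all zeros. The following is only a rough sketch: the function name, the 1 MiB block size, and the `limit_bytes` cap are illustrative assumptions, and it says nothing about how the backup agent itself decides that a block is dirty.

```python
import os


def count_nonzero_blocks(path, block_size=1024 * 1024, limit_bytes=None):
    """Return (nonzero_blocks, total_blocks) for a file or block device.

    path        -- file or device to read (reading a device needs root)
    block_size  -- bytes per block examined (assumed value, not the agent's)
    limit_bytes -- stop after roughly this many bytes, or None to read it all
    """
    zero_block = bytes(block_size)  # reference block of all zeros
    nonzero = 0
    total = 0
    read_so_far = 0
    with open(path, "rb") as f:
        while limit_bytes is None or read_so_far < limit_bytes:
            chunk = f.read(block_size)
            if not chunk:
                break  # end of file or device
            read_so_far += len(chunk)
            total += 1
            # The last chunk may be short, so compare against a same-length slice.
            if chunk != zero_block[:len(chunk)]:
                nonzero += 1
    return nonzero, total
```

Run against the first few gigabytes of the data volume (read-only), a result where most blocks are non-zero would suggest the controller's initialisation did not leave the free space zeroed.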
I'm going to experiment with zero-filling the free space on the volumes, but that very act will make every free block dirty, so the next backup will be just as large. Or will dedupe and compression counteract this? Is there a way to zero-fill the free space without everything being marked dirty?
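For reference, a minimal sketch of the zero-fill idea: write zeros into a temporary file until the free space is nearly used up, then delete the file. The file name `zerofill.tmp`, the `reserve_bytes` safety margin, and the `max_bytes` cap are hypothetical choices for illustration. The second function only shows how small a block of zeros becomes under zlib; whether a particular backup agent's dedupe and compression behave the same way is an open question, not something this code answers.

```python
import os
import zlib


def zero_fill_free_space(directory, reserve_bytes, chunk_size=4 * 1024 * 1024,
                         max_bytes=None):
    """Fill free space under `directory` with zeros, then remove the file.

    Stops when free space would drop to about `reserve_bytes`, or after
    `max_bytes` have been written (if given). Returns bytes written.
    """
    path = os.path.join(directory, "zerofill.tmp")  # hypothetical file name
    chunk = bytes(chunk_size)
    written = 0
    try:
        with open(path, "wb") as f:
            while max_bytes is None or written < max_bytes:
                st = os.statvfs(directory)
                free = st.f_bavail * st.f_frsize
                if free <= reserve_bytes + chunk_size:
                    break  # keep a safety margin so the volume never fills up
                n = chunk_size
                if max_bytes is not None:
                    n = min(chunk_size, max_bytes - written)
                f.write(chunk[:n])
                # Flush to disk so the next statvfs call sees the real usage.
                f.flush()
                os.fsync(f.fileno())
                written += n
    finally:
        if os.path.exists(path):
            os.remove(path)  # free the space again; the blocks now hold zeros
    return written


def compressed_size_of_zeros(block_size=1024 * 1024):
    """Return the zlib-compressed size in bytes of one all-zero block."""
    return len(zlib.compress(bytes(block_size)))
```

Calling `zero_fill_free_space("/mnt/data", reserve_bytes=1024**3)` would leave roughly 1GB free while it runs (the mount point here is an assumed example). The fsync after every chunk is slow but keeps the free-space check honest; a larger `chunk_size` reduces the overhead.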
Specs
Source "Cleveland"
- CentOS 6.8 x64, Xeon 5450 with 32GB RAM
Kernel 2.6.32-642.3.1.el6.x86_64
Intel SR2520SAXSR with S5000VSA, BIOS S5000.86B.12.00.0098.062320091136 06/23/2009
Intel Embedded Server RAID Technology using the LSI MegaSR RAID5 driver, version v15.04.2013.1016, built on Oct 16 2013 at 19:20:04
4x Seagate ST3146356SS in RAID10, 2x Seagate ST2000DL003 in RAID1
Note: one of the 2TB drives died and has been replaced with an ST3000DM001; rebuild status unknown (next job)
- CIFS on Windows Server 2008 R2 x64