The processing rate is roughly 1 TB per hour when active full backups are in full swing; we schedule the active fulls for the first Friday of every month. (We are exploring different strategies to increase throughput, but our test time is limited without destroying existing data sets.) We are using 4 MB block sizes and other tuning to reduce the RAM overhead of the Veeam dedupe tables and prevent out-of-memory errors like the one below:
Code:
Out of memory: Killed process <x> (veeamagent) total-vm: <a>kB, anon-rss: <b>kB, file-rss: <c>kB, shmem-rss: <d>kB, UID:0 pgtables: <e>kB oom_score_adj:-100
Sometimes, depending on the load on our storage array, the active full backups overlap the daily schedules, so some of the daily incrementals queue up, slot in during wait states, and interrupt the jobs in the full backup cycle.
How do we deal with this scenario, other than abandoning the Veeam schedules, triggering the daily incremental, monthly active full and weekly synthetic full jobs via PowerShell from Windows Task Scheduler, and reducing the number of concurrent tasks hitting the repository? Or, alternatively, disabling the daily jobs at the start of the active full cycle and re-enabling them at the end of the cycle/chain?
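If we do end up scripting it, a rough sketch of the disable/re-enable approach could look like the following. This assumes the v11+ Veeam.Backup.PowerShell module and made-up job names ("Daily-*", "Monthly Active Full"); the cmdlets exist, but the wiring is illustrative, not tested in our environment:
Code:
# Sketch only: assumes the Veeam B&R v11+ PowerShell module and hypothetical
# job names ("Daily-*", "Monthly Active Full"); adjust to your environment.
Import-Module Veeam.Backup.PowerShell

# Daily incremental jobs that should not collide with the active full cycle
$dailyJobs = Get-VBRJob | Where-Object { $_.Name -like "Daily-*" }

# Pause the daily schedules while the active full chain runs
$dailyJobs | ForEach-Object { Disable-VBRJob -Job $_ }

try {
    # Start the monthly active full; Start-VBRJob blocks until the job finishes
    $fullJob = Get-VBRJob -Name "Monthly Active Full"
    Start-VBRJob -Job $fullJob -FullBackup
}
finally {
    # Re-enable the daily jobs even if the active full fails partway through
    $dailyJobs | ForEach-Object { Enable-VBRJob -Job $_ }
}
The try/finally is there so the daily jobs come back even if the active full run fails partway through.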
Also, in terms of the threading model, our throughput is obviously far higher during the incremental and synthetic full stages (we are running XFS layered on top of a ZFS zpool to get the advantages of copy-on-write and the reliability of ZFS).
Is there any chance that specific task types could be weighted differently, with the weights feeding into the scheduling algorithm?
e.g.
an incremental task has trivial workload impact and a weighting factor of 1.
an active full backup task has high workload impact and a weighting factor of 4.
The repository could then have a configurable load factor of, say, 16, so the number of tasks hitting the repository would depend on the types of workloads in flight (e.g. 16 in-flight incrementals, or 4 active fulls, or a combination of both up to the load factor limit).
So on a standard incremental run a higher number of parallel tasks would run, but when running active fulls a lower number of concurrent tasks would be allowed, preventing the repository from being overloaded.
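To make the idea concrete, here is a sketch of the admission check such a weighting scheme implies (plain PowerShell, not Veeam code; the function name, weights and load factor are just the illustrative numbers from above):
Code:
# Illustration of the proposed weighting (not an existing Veeam feature):
# each task type carries a weight, and the repository admits a new task
# only while the sum of in-flight weights stays within its load factor.
$weights = @{ Incremental = 1; ActiveFull = 4 }
$repositoryLoadFactor = 16

function Test-CanStartTask {
    param(
        [string]$TaskType,        # e.g. "Incremental" or "ActiveFull"
        [string[]]$InFlightTasks  # task types currently running on the repository
    )
    $currentLoad = ($InFlightTasks | ForEach-Object { $weights[$_] } |
                    Measure-Object -Sum).Sum
    if (-not $currentLoad) { $currentLoad = 0 }
    return ($currentLoad + $weights[$TaskType]) -le $repositoryLoadFactor
}

# 16 incrementals fit (16 x 1 = 16); a fifth active full does not (5 x 4 = 20)
Test-CanStartTask -TaskType "Incremental" -InFlightTasks (,"Incremental" * 15)  # True
Test-CanStartTask -TaskType "ActiveFull"  -InFlightTasks (,"ActiveFull" * 4)    # False
With a load factor of 16 this admits 16 concurrent incrementals but only 4 concurrent active fulls, which is the behaviour described above.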
Any thoughts would be appreciated - thanks.