Comprehensive data protection for all workloads
AlexHeylin
Veteran
Posts: 563
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

[Enhancement Request] Replication to apply retention BEFORE creating more RPs

Post by AlexHeylin »

Now, I know in an ideal world servers would never run short / out of disk space, but in the real world it can happen. Sometimes you come online on Monday to discover that a server that had plenty of space on Friday now has no space left at all.

I've just found that VBR replication ran one of the disks on our Hyper-V target completely out of space. I removed 9 GB of non-VM files to free space on the replication target and reduced the retention period (to cut the RP count and the space used). When I retried the replication job, instead of applying the retention policy, reducing the RP count, and freeing up some space, it promptly copied more data onto the target and then complained about low disk space - because it had used ALL the disk space (180 KB free), forcing the job to fail. As no more space can be freed on the target, the only option at this point would be to use Hyper-V Manager to remove RPs / checkpoints. Or at least it would be - but VBR has filled the disk so full that the checkpoints can't merge, and the whole thing is now completely block-locked. No free space can be created, because that requires merging checkpoints, which requires free space.

That leaves only bad options. I chose to compress a checkpoint file to free up 800 MB and told Hyper-V to remove the three oldest checkpoints. The merges failed, but Hyper-V removed the checkpoints from the GUI while leaving their data on disk. ARGH! While I can't blame Veeam for that particular stupidity, I think VBR could have helped avoid creating the scenario in the first place. Now the only option appears to be to compress most of the AVHDX files to create some working space, and hope that VBR can copy in all the data it insists on ramming into that already-full disk while still leaving enough space to then apply the retention policy and remove the old checkpoints (the ones Hyper-V now says don't exist, though their files still do).

Now VBR complains it can't merge an RP / CP, presumably because Hyper-V has dropped several CPs but left their files in place. Even now that there's plenty of free space, the replication task just fails.
The only reasonable resolution appears to be to remove the whole target VM and the VBR replication job, and start all over again.
This approach and situation appear to be "highly suboptimal". I'd imagine this isn't an outcome Veeam wants - it's certainly not one we want.


I suggest two changes that Veeam could make to help customers avoid this situation in future.
1. Check and apply retention before copying more data to a target, especially if the target is already short on free space.
2. Don't fill any target that requires merges so full that the merges can't happen - leave at least 100-500 MB free. If this were done, the job might still fail, but it could apply the retention policy, remove some old RPs, and free up space without external intervention.

Please, before anyone says "you should have done xyz, and not done abc - here's where it says that in the manual": yes, you're probably correct. I probably haven't handled this as well as it could have been handled - but my actions were reasonable, and consistent with 8 years' experience of running Hyper-V on a sizeable estate across many customers. My point isn't about how to get out of this situation - it's that VBR backed this system into a corner when it didn't need to, and then, once it was in the corner, started kicking it to death. As far as VBR is concerned, the only way out of this situation is more free disk space.
HannesK
Product Manager
Posts: 14840
Liked: 3086 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: [Enhancement Request] Replication to apply retention BEFORE creating more RPs

Post by HannesK » 2 people like this post

Hello,
our idea is that retention should be applicable without running a job. The workflow would then be: reduce retention, apply retention, and only then run the job. In V12, that will be possible for orphaned backups with time-based retention. For your scenario there is no timeline available yet, but we are aware of it.

Automatically violating a configured retention would upset other customers.

Best regards,
Hannes
AlexHeylin
Veteran
Posts: 563
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

Re: [Enhancement Request] Replication to apply retention BEFORE creating more RPs

Post by AlexHeylin »

Hi HannesK,

For sure, I'm not asking for automatic violation of the configured retention! I'm asking for the configured retention to be applied before more data is copied to a target that's already short on space. That would give a VBR admin an easy path out of the situation above, which is currently either non-trivial or outright impossible to resolve without deleting the VM and the VBR replication job and starting over. The current near-suicidal approach could REALLY put customers off using VBR for replication.

The ability to apply retention without running the job would achieve the same thing - trim the data on the target without more data being copied in and creating a block-jam where the situation becomes unresolvable.

I also think having the data copy stop with a few hundred MB of disk space left, rather than ~180 KB, would help avoid the block-jam. Otherwise it may not be possible to apply the retention at all, due to the extreme shortage of disk space.

Thanks
micoolpaul
Veeam Software
Posts: 219
Liked: 111 times
Joined: Jun 29, 2015 9:21 am
Full Name: Michael Paul
Contact:

Re: [Enhancement Request] Replication to apply retention BEFORE creating more RPs

Post by micoolpaul » 1 person likes this post

I’d prefer the second suggestion from Alex: allowing out-of-job retention adjustments and processing. If someone underestimated delta changes and ran out of space, they could shorten the retention and reprocess. Otherwise I’m happy that retention = minimum retention, with Veeam processing afterwards - much safer!
-------------
Michael Paul
Veeam Data Cloud: Microsoft 365 Solution Engineer