Host-based backup of VMware vSphere VMs.
Daniel2
Enthusiast
Posts: 45
Liked: 21 times
Joined: Nov 25, 2019 8:16 am
Full Name: Daniel
Contact:

Feature request: Backup jobs that exceed the backup window should IDLE not FAIL

Post by Daniel2 »

Backup jobs that exceed the backup window should IDLE until the next backup window instead of simply FAILing.

First example:

Imagine having an RPO of 24 hours, starting at 0:00.
You have a backup window defined from 0:00-06:00 and 18:00-23:59, due to work hours and performance impact.

At 0:00 the job starts, and if it is not done by 6:00, it fails. VMs that have not been processed by that time will not be backed up until 0:00 the following day. This means the RPO is not met, because those VMs end up with 48 hours between their restore points.

Instead, the job should cancel all operations at 6:00, IDLE until 18:00, and then retry the cancelled VMs and resume the remaining backup operations in the job.
If the job is still not finished when the next backup interval starts (at 0:00), it FAILS like it normally would.
This is exactly how Backup Copy jobs already work.
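
To make the proposed behaviour concrete, here is a minimal sketch of the job state I have in mind. It is only an illustration: every name in it is hypothetical and none of it is a Veeam API. It also assumes the 18:00-23:59 window runs through the end of the day.

```python
# Hypothetical model of the proposal; these names are not Veeam APIs.

def hm(hours, minutes=0):
    """Minutes since midnight, so times compare as plain integers."""
    return hours * 60 + minutes


# Backup windows from the first example as (start, end) pairs, end exclusive.
# Assumption: 18:00-23:59 is treated as running to the end of the day.
WINDOWS = [(hm(0), hm(6)), (hm(18), hm(24))]


def in_window(now, windows=WINDOWS):
    """Return True if `now` falls inside any permitted backup window."""
    return any(start <= now < end for start, end in windows)


def proposed_state(now, finished, next_interval_started):
    """State of a job under the proposed IDLE-instead-of-FAIL behaviour."""
    if finished:
        return "SUCCESS"
    if next_interval_started:
        # The next backup interval has begun: fail as the job does today.
        return "FAILED"
    # Outside a window the job waits instead of failing.
    return "RUNNING" if in_window(now) else "IDLE"
```

With these windows, an unfinished job would be RUNNING at 03:00, IDLE from 06:00 until 18:00, RUNNING again at 18:00, and FAILED only once the next interval has started.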

Second example:

RPO 4 hours.
The backup window is 0:00-11:59 and 13:00-23:59.
The interval starts at 11:00; the job runs for an hour and FAILS. The RPO will be missed.
If the job IDLEd and resumed at 13:00, it would still have until 14:59 to complete successfully.
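
The arithmetic of the second example can be checked with a short sketch. The function names are made up for illustration, and window ends are rounded up to the full hour (11:59 becomes 12:00, 14:59 becomes 15:00), which is an assumption on my side.

```python
# Hypothetical helpers for the second example; not part of any Veeam product.

def hm(hours, minutes=0):
    """Minutes since midnight."""
    return hours * 60 + minutes


# (start, end) pairs with exclusive ends; 11:59 and 23:59 rounded up.
WINDOWS = [(hm(0), hm(12)), (hm(13), hm(24))]
INTERVAL = (hm(11), hm(15))  # 4-hour RPO interval starting at 11:00


def minutes_fail_on_close(interval, windows):
    """Run time available today: the job fails when its first window closes."""
    start, end = interval
    for w_start, w_end in windows:
        if w_start <= start < w_end:
            return min(end, w_end) - start
    return 0


def minutes_idle_and_resume(interval, windows):
    """Run time available if the job idles and resumes in later windows."""
    start, end = interval
    return sum(
        max(0, min(end, w_end) - max(start, w_start))
        for w_start, w_end in windows
    )
```

For the values above, `minutes_fail_on_close(INTERVAL, WINDOWS)` gives 60 minutes, while `minutes_idle_and_resume(INTERVAL, WINDOWS)` gives 180 minutes: the one hour before 12:00 plus the two hours from 13:00 to the end of the interval.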
Egor Yakovlev
Veeam Software
Posts: 2537
Liked: 683 times
Joined: Jun 14, 2013 9:30 am
Full Name: Egor Yakovlev
Location: Prague, Czech Republic
Contact:

Re: Feature request: Backup jobs that exceed the backup window should IDLE not FAIL

Post by Egor Yakovlev »

Hi Daniel!
I have a few thoughts here:
1. A Backup Copy Job works with passive secondary data (it copies backup files), whereas a Backup Job works with production data (a running production VM). If we keep it [idle] for 12 production hours, we have to keep the backup snapshot on those VMs for 12 production hours, exactly when they are under heavy load and when most daily changes are generated. Snapshot growth over 12 hours and the increased I/O might be devastating for a production environment if we keep jobs [idle].
2. What if you set the backup window to 3 hours per day, say 12am to 3am? Shall we [idle] until tomorrow if the job doesn't fit? You will miss the daily RPO anyway in this case. However, with the current [failed] status you will know you missed the RPO, whereas with [idle] you will miss it without being aware that the job only finished a day later...

We have had the same feature request for physical backups with the agent, where it might be a slightly better idea since volume snapshots are in action there. Will keep track of it.

/Cheers!
Daniel2
Enthusiast
Posts: 45
Liked: 21 times
Joined: Nov 25, 2019 8:16 am
Full Name: Daniel
Contact:

Re: Feature request: Backup jobs that exceed the backup window should IDLE not FAIL

Post by Daniel2 »

Hi Egor,

Thanks for your points. I don't see any problems with them, as there are very logical steps by which your concerns can be addressed. I believe my feature request handles this in a very natural way and is an easy concept to understand. :)

1. You could abort the VM backup process and remove the snapshot when the backup window is exceeded; this is no change from the status quo. When the next backup window in the backup interval begins, only the aborted VMs are retried, and all VMs that have not yet been backed up are processed. That would solve the problem. In other words, you would IDLE the job, not the individual VM backup process itself.

2. If there are no more backup windows available before the next backup interval, you can fail the backup normally, as you already do today. This would be an expected failure, as the backup admin decided not to define more backup windows during that interval.
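
As a rough sketch of point 1, idling the job rather than the individual VM could look like the following. The state names and functions are invented for illustration and do not reflect how the product tracks VMs; an aborted VM is assumed to start over from a fresh snapshot when it is retried.

```python
# Hypothetical per-VM bookkeeping; state names are made up for this sketch.

def on_window_close(vm_states):
    """Abort in-flight VMs (their snapshots would be removed); keep the rest."""
    return {
        vm: ("ABORTED" if state == "IN_PROGRESS" else state)
        for vm, state in vm_states.items()
    }


def resume_queue(vm_states):
    """VMs to process when the next window opens: retries first, then pending."""
    aborted = [vm for vm, state in vm_states.items() if state == "ABORTED"]
    pending = [vm for vm, state in vm_states.items() if state == "PENDING"]
    return aborted + pending


# Example: one VM finished, one was mid-transfer, one had not started yet.
states = on_window_close(
    {"vm1": "DONE", "vm2": "IN_PROGRESS", "vm3": "PENDING"}
)
queue = resume_queue(states)
```

Here `queue` contains `vm2` and `vm3`: the finished VM is left alone, and no snapshot stays open while the job is idle.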

Not sure I understand what you mean about the Agent, but installing an Agent on a VM to achieve this sounds overly complicated.
PetrM
Veeam Software
Posts: 3264
Liked: 528 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic
Contact:

Re: Feature request: Backup jobs that exceed the backup window should IDLE not FAIL

Post by PetrM »

Hi Daniel,

One more argument in favor of failing a job: a consistent backup implies that the state of all VM disks corresponding to a specific point in time is fully transferred and saved into the backup file. If some external signal (user/backup window) aborts the job during data transfer, we will have an inconsistent backup.
If we remove the snapshot, we will lose this specific state, and the data already processed will be useless anyway.

Thanks!