Over 40 VM backups deleted in backup job craziness

Post by **bbricker** » Mar 13, 2012 11:45 pm

This weekend I was modifying a backup job and during the process, VB&R deleted every retention point for every VM in the job, all 43 or so. I still don't understand why this happened and am hoping there is a logical explanation (or maybe just a bug?).

For starters, this is a job I have set up to run on about 43 VMs every Saturday, and it is set to keep 4 retention points.

I first edited this job back on Wednesday or Thursday to remove a single VM from the selection that was no longer needed. I then changed the setting for how long to keep deleted VM data to 1 day, since I didn't care to keep that 1 old VM's data at all. I expected it would get removed from the backup on Saturday.

On Saturday before the scheduled time for the job, I again edited it, this time to add a new VM to the selection list. Next, I manually started the job.

Immediately I realized that I had forgotten to set an exclusion on that new VM for its 3rd vmdk disk. So I right-click and stop the job so I can go in and fix that in the job properties. But the job just sits there saying "stopping" for a really long time. I go off to take care of some other things and come back to check on it a few minutes later. It is *still* saying "stopping", so I right-click to get the real-time statistics, and to my horror it is going through every single VM saying, "VM 'xxxxxx' is outdated and will be deleted".

It finishes a few seconds later before I can do anything about it (not that I could have), and sure enough, all retention points for every VM in the job are deleted. I go to the repository folder for that job, which confirms it, as it is basically an empty folder minus a small .vbm. That's compared to the 2TB+ of VBK/VBR files that were there previously.

Anyone have any clues? I have attached a screenshot of it deleting all the VMs. The machine "XPTemp2" at the top of the list is the only VM that I removed from the job selection list. And I know I didn't accidentally remove all 43 VMs or something, because as soon as it was finished nuking all of my data, I just started it right up again and it ran fine.

Post by **Gostev** » Mar 14, 2012 12:13 am

Generally speaking, you do not want to set the deleted VM retention period to 1 day. This setting exists, with a large default, precisely to prevent immediate deletion and to give you time to react if a VM is accidentally deleted from the infrastructure. Or consider the other case: just 2 glitches in a row from network/vCenter/ESXi resulting in the complete infrastructure tree not being returned properly (happens quite often), and the VM will be considered removed from the infrastructure and deleted accordingly because of the 1-day retention for deleted VMs.
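To make the risk concrete, here is a minimal sketch of how a "deleted VM retention" check could behave. It is not VB&R's actual logic: the function name `vms_to_purge`, the `last_seen` bookkeeping, the strict "older than the cutoff" comparison, and the 14-day comparison value are all assumptions made for illustration.

```python
from datetime import date, timedelta


def vms_to_purge(last_seen, today, retention_days):
    """Return the VMs whose backup data would be purged.

    last_seen maps a VM name to the last date it appeared in the
    infrastructure listing (hypothetical bookkeeping, not VB&R's).
    A VM is purged once it has been missing for more than
    retention_days (assumed rule).
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(name for name, seen in last_seen.items() if seen < cutoff)


# Every VM was last listed on Mar 10; the listing then fails twice in a row.
last_seen = {"vm-a": date(2012, 3, 10), "vm-b": date(2012, 3, 10)}

after_one_glitch = vms_to_purge(last_seen, date(2012, 3, 11), retention_days=1)
after_two_glitches = vms_to_purge(last_seen, date(2012, 3, 12), retention_days=1)
with_long_window = vms_to_purge(last_seen, date(2012, 3, 12), retention_days=14)
```

Under these assumptions, a single failed listing purges nothing, two failed listings in a row purge every VM when the window is 1 day, and a longer window leaves time to notice the problem before anything is removed.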

As to what exactly happened here, you should open a support case and let our engineers look at the job's log files. I am sure there is a simple explanation.

Post by **bbricker** » Mar 14, 2012 5:06 pm

Thanks Gostev, I will definitely open a case. I've just been really busy, and it's easy to get all my thoughts typed out on here.

So if I am understanding you correctly, you are saying that the "deleted VMs data retention period" option not only has to do with me manually removing a VM from the backup job selection list that I don't want backed up any more, but it also has to do with a VM that has *actually* been deleted from my vSphere infrastructure? (I guess that is obvious now that I think about it).

If that's the case, and the glitch you are talking about occurred, and is known to occur, then that seems pretty dangerous. Wouldn't it be a good idea then to not allow the user to pick 1 day if this is a known issue with VB&R just "thinking" that the VMs were deleted because of a communication failure from vSphere or the user's network? It seems there need to be more fail-safes here. And yes, I understand that you are saying the "fail-safe" should just be me not setting the days to 1.

In reality, my jobs are all normally set to 3 days. I had just set this one to 1 day because I had a bunch of data tied up in that old VM that I wanted removed on the next backup job.

Post by **foggy** » Mar 15, 2012 10:59 am

bbricker wrote: I had just set it to 1 day because I had a bunch of data tied up in that old VM that I wanted removed on the next backup job.

Note that in reversed incremental backup mode this won't decrease the VBK file size. It just marks all the blocks inside the file that belong to deleted/removed VMs as unused, so that they can be reused by other data. NTFS does not allow "shrinking files", so in order to reduce the VBK file size, you need to perform an active full backup instead.
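A toy model may help show why the file does not get smaller. This is only a sketch: the `BackupFile` class and its methods are hypothetical and say nothing about the real VBK format, and the active full is modelled simply as writing a fresh file that contains only live blocks.

```python
class BackupFile:
    """Toy backup file: a list of fixed-size blocks tagged with an owner."""

    def __init__(self):
        self.blocks = []  # owner name per block, None means "unused"

    def write(self, owner, count):
        """Store count blocks, reusing unused blocks before growing the file."""
        for _ in range(count):
            if None in self.blocks:
                self.blocks[self.blocks.index(None)] = owner
            else:
                self.blocks.append(owner)

    def remove_vm(self, owner):
        """Mark a VM's blocks as unused; the file keeps its length."""
        self.blocks = [None if b == owner else b for b in self.blocks]

    def size(self):
        return len(self.blocks)

    def used(self):
        return sum(b is not None for b in self.blocks)

    def active_full(self):
        """Write a fresh file holding only the blocks still in use."""
        fresh = BackupFile()
        fresh.blocks = [b for b in self.blocks if b is not None]
        return fresh


vbk = BackupFile()
vbk.write("old-vm", 5)
vbk.write("kept-vm", 3)
vbk.remove_vm("old-vm")          # 5 blocks become unused, size is unchanged
size_after_removal = vbk.size()
used_after_removal = vbk.used()
vbk.write("new-vm", 4)           # fits into the unused blocks
size_after_reuse = vbk.size()
fresh_size = vbk.active_full().size()
```

In this model, removing the old VM frees space inside the file for later backups, but only the fresh file produced by the active full is actually smaller on disk.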

Post by **Gostev** » Mar 15, 2012 11:55 am

We actually had it limited to a 7-day minimum originally (in v5), and users were complaining about that because they wanted to set 1 day in some cases.
