backup admin's nightmare

mcz · Sep 20, 2017 12:14 pm

Hi everybody,

these days are really tough for me because a lot of things went wrong and it just got worse and worse... I know that this is a perfect example of why you should implement the 3-2-1-0 rule, but that won't help now.
So everything started with database corruption and several attempts to solve the problem, which of course cost a few hours of data. I always took a backup of the VM first to make sure that I could restore later and grab the lost data for manual recovery while I was creating a new database.

At the end of last week I was able to get the new database to a state where it was running fine, and all I needed to be satisfied was a fresh backup. What I didn't realize at that moment was that the NAS I'm using for the backups was slowly running out of disk space (the warnings from Veeam ONE arrived at night when I was sleeping). In the end, several operations (replication, etc.) and a transformation of the backup chain (synthetic full) left the NAS with 0 bytes available and everything failed. I was able to free up some space, but the next attempt to take a backup failed with the message "metadata corrupted". Unfortunately somebody else had connected the USB device, to which the chain had been copied before, to this NAS, and everything was overwritten (so my copy of the chain was corrupted too). The second USB device failed a week earlier, so you can forget that one too. I tried to use a backup of the database server which I took a month ago (from another USB device) and tried to roll forward all transaction logs, but then I realized that one log file was missing (the database server didn't copy the file and of course I can't get it back).

So here we are: replication failed, backup chain metadata corrupted and stuck in the transformation process, copy of the chain corrupted too, rotated drive broken, roll-forward failed because of a missing log file in the middle - sounds fantastic!
The most important thing at that point was to have a valid backup, so I copied the existing chain to another location, which took several hours, and of course I had to remove the chain from the configuration in the Veeam backup console. Later I imported the copied chain, which was successful, but when I tried to do an Instant Recovery or a file-level restore of the VM on which the database was stored, it failed...

So why am I writing this post? First of all, I would like to warn everybody out there who thinks that the 3-2-1-0 rule sounds nice but isn't that important to implement. I always knew that it's better to use Veeam copy jobs for copying a chain to another location, but for various reasons I ended up doing what I didn't want to do... Of course the fact that I wasn't in the office and couldn't notice that somebody was connecting the USB device to the NAS was the point where things started getting worse, but that's exactly why Murphy's law exists... The best way would be to have a backup copy job into the cloud...

The next reason for writing this post is that I want to understand why this chain isn't usable anymore. I know that if a transformation fails or stops (e.g. when the connection gets lost), Veeam will try again the next time and continue from the point where it stopped. So basically, those transformations are managed in some kind of transaction which makes sure that there is no data loss or corruption. I know that it's a different situation when there is no disk space left, but the chain should still be fine, right? Wouldn't it be possible to write the metadata to a second location (e.g. %programdata%) for exactly the case where something goes wrong on the storage side (I assume the metadata file will never be that big)? I know that there is a PowerShell script to rebuild the metadata, but is this still possible when the chain isn't imported in the Backup & Replication console? Any ideas on how to repair the chain?

I know that there is also the option to create a support ticket, but I wanted to share my thoughts with the community and get a feeling for whether it's worth creating a ticket or not.
Thanks very much!

Post by **foggy** » Sep 20, 2017 3:30 pm

Hi Michael, sorry to hear the story. You're right that backup metadata is updated in a transactional manner and is stored twice for redundancy. Most likely in your case all copies of the metadata were corrupted due to the space issues, and without it the backup chain is basically useless. You can ask support to verify whether this was the case.
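For readers wondering what "updated in a transactional manner" can look like in general, here is a minimal Python sketch of the common write-then-rename pattern. It is a generic illustration only and not Veeam's actual implementation; the function name `atomic_write` is made up for this example. The new content goes to a temporary file first and only replaces the old file once it has been written and flushed completely, so a failure in the middle (for instance a full disk) leaves the previous version untouched.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data`, or leave the old content untouched on failure.

    Hypothetical helper for illustration; not taken from any Veeam component.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)          # may fail, e.g. when no space is left
            tmp.flush()
            os.fsync(tmp.fileno())   # push the bytes to stable storage
        os.replace(tmp_path, path)   # swap the new file in as the final step
    except BaseException:
        # Something went wrong before the swap: drop the partial file
        # and keep the previous version as it was.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Note that this pattern needs free space for the temporary copy, and it does not help when the update has to happen in place inside one large file.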

Post by **mcz** » Sep 20, 2017 3:42 pm

Thank you for this update, foggy. Just a short question for better understanding: when we talk about backup metadata, we are talking about the metadata stored within the .vbk, .vrb and .vib files, and not the .vbm file which holds the metadata representing the whole chain, or did I get it wrong? Thanks for the clarification.

Post by **foggy** » Sep 20, 2017 3:58 pm

Correct. Metadata describing data blocks that represent the VM state is stored twice inside the backup file itself. VBM is an auxiliary XML file containing information about backup files in the chain, etc.
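To illustrate the idea of keeping metadata twice, below is a small Python sketch in which each copy carries a checksum and the reader uses the first copy that still verifies. The layout (SHA-256 digest followed by a JSON payload) and the names `pack_copy`, `unpack_copy` and `load_metadata` are assumptions made up for this example; the real on-disk format of the backup files is not described in this thread.

```python
import hashlib
import json


def pack_copy(meta: dict) -> bytes:
    """Serialize metadata and prefix it with its SHA-256 digest (hypothetical layout)."""
    payload = json.dumps(meta, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).digest() + payload


def unpack_copy(blob: bytes):
    """Return the metadata if the checksum matches, otherwise None."""
    digest, payload = blob[:32], blob[32:]
    if len(digest) != 32 or hashlib.sha256(payload).digest() != digest:
        return None
    return json.loads(payload)


def load_metadata(copies):
    """Return the first copy that passes verification.

    Raises ValueError when every copy is damaged, which corresponds to
    the situation where nothing is left to describe the data blocks.
    """
    for blob in copies:
        meta = unpack_copy(blob)
        if meta is not None:
            return meta
    raise ValueError("all metadata copies are corrupt")
```

With one damaged copy the data is still readable; once both copies fail verification there is nothing left to fall back on.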

Post by **Gostev** » Sep 20, 2017 4:40 pm

Simply put, having VBM is not essential to be able to perform the restore.
