mcz
Veeam Legend
Posts: 945
Liked: 221 times
Joined: Jul 19, 2016 8:39 am
Full Name: Michael
Location: Rheintal, Austria

backup admin's nightmare

Post by mcz

Hi everybody,

These days are really tough for me, because a lot of things went wrong and it kept getting worse and worse... I know this is a textbook example of why you should implement the 3-2-1-0 rule, but that won't help me now.
It all started with database corruption and several attempts to fix it, which of course caused a few hours of data loss. Before each attempt I took a backup of the VM so that I could restore it later and recover the lost data manually while I was building a new database.

By the end of last week I had the new database running fine, and all I needed to be satisfied was a fresh backup. What I didn't realize at that moment was that the NAS I use for backups was slowly running out of disk space (the Veeam ONE warnings arrived at night while I was asleep). In the end, several operations (replication, etc.) plus a transformation of the backup chain (synthetic full) left 0 bytes free on the NAS, and everything failed. I was able to free up some space, but the next backup attempt failed with the message "metadata corrupted". Unfortunately, somebody else had connected the USB device to which the chain had previously been copied to this NAS, and everything on it was overwritten, so my copy of the chain was corrupted too. The second USB device had failed a week earlier, so that one was no help either. I then tried to use a backup of the database server that I had taken a month ago (on another USB device) and to roll forward all transaction logs, but I suddenly realized that one log file was missing (the database server hadn't copied it, and of course I can't get it back).

So here we are: replication failed, backup chain metadata corrupted and stuck in the transformation process, copy of the chain corrupted too, rotated drive broken, and rollforward failed because of a missing log file in the middle. Sounds fantastic!
The most important thing at that point was to have a valid backup, so I copied the existing chain to another location, which took several hours, and of course I had to remove the chain from the configuration in the Veeam Backup & Replication console. Later I imported the copied chain, which succeeded, but when I tried an Instant Recovery or a file-level restore of the VM that hosted the database, it failed...

So why am I writing this post? First, I'd like to warn everybody out there who thinks the 3-2-1-0 rule sounds nice but isn't that important to implement. I always knew it's better to use Veeam backup copy jobs to copy a chain to another location, but for various reasons I ended up doing what I didn't like to do... Of course, the fact that I wasn't in the office and couldn't notice that somebody was connecting the USB device to the NAS was where things really started going wrong, but that's exactly why Murphy's law exists... The best option would be a backup copy job to the cloud...
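The 3-2-1-0 rule is commonly stated as: at least 3 copies of the data, on 2 different media, 1 of them off-site, and 0 errors when the backups are verified. As a rough illustration (not a Veeam feature), here is a minimal Python sketch that audits a hypothetical inventory of copies against those four conditions. The `BackupCopy` fields and the `check_3210` function are made up for this example, and it assumes the common reading in which the production data counts as one of the three copies.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    """One copy of the protected data (the production data counts as a copy)."""
    name: str
    media: str          # illustrative labels, e.g. "disk", "nas", "usb", "cloud"
    offsite: bool
    verified_ok: bool   # did the last health check / test restore succeed?


def check_3210(copies: list[BackupCopy]) -> list[str]:
    """Return the 3-2-1-0 conditions that are violated (empty list = compliant)."""
    problems = []
    # Only copies that passed verification are counted toward 3, 2 and 1.
    good = [c for c in copies if c.verified_ok]
    if len(good) < 3:
        problems.append(f"need 3 good copies, have {len(good)}")
    if len({c.media for c in good}) < 2:
        problems.append("need at least 2 different media types")
    if not any(c.offsite for c in good):
        problems.append("need at least 1 off-site copy")
    failed = [c.name for c in copies if not c.verified_ok]
    if failed:
        problems.append(f"0 errors expected, failed verification: {', '.join(failed)}")
    return problems


if __name__ == "__main__":
    # Roughly the situation described above, after everything went wrong.
    copies = [
        BackupCopy("production VM", "disk", offsite=False, verified_ok=True),
        BackupCopy("NAS backup chain", "nas", offsite=False, verified_ok=False),
        BackupCopy("USB copy of chain", "usb", offsite=False, verified_ok=False),
    ]
    for problem in check_3210(copies):
        print("-", problem)
```

Applied to the situation described in this post, every one of the four conditions fails, which is exactly what the rule is meant to prevent: no single accident should be able to take out all copies at once.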

The next reason for writing this post is that I want to understand why this chain isn't usable anymore. I know that if a transformation fails or stops (e.g. when the connection is lost), Veeam will retry it the next time and continue from the point where it stopped. So basically, those transformations are managed in some kind of transaction that ensures there is no data loss or corruption. I know it's a different situation when there is no disk space left, but the chain should still be fine, right? Wouldn't it be possible to write the metadata to a second location (e.g. %programdata%) precisely for the case where something goes wrong on the storage side (I assume the metadata file won't ever be that big)? I know there is a PowerShell script to rebuild the metadata, but does that still work when the chain isn't imported into the Backup & Replication console? Any ideas on how to repair the chain?
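The resume-where-it-stopped behaviour described above is what a journaled transformation provides. The following is a generic Python sketch of that idea, not Veeam's actual implementation: the blocks of an increment are merged into a full backup one at a time, and after each block the progress is saved to a small journal file so that an interrupted run can pick up where it left off. The function `merge_increment`, the JSON journal format and the `fail_after` switch used to simulate an interruption are all hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, Optional


def merge_increment(full: Dict[str, str], increment: Dict[str, str],
                    journal_path: str, fail_after: Optional[int] = None) -> None:
    """Merge increment blocks into a full backup, resuming from a journal if one exists.

    `full` is mutated in place and stands in for the full backup file on disk.
    `fail_after` simulates an interruption (e.g. a lost connection) after N blocks.
    """
    done = set()
    if os.path.exists(journal_path):
        # An earlier run was interrupted: reload the blocks it already committed.
        with open(journal_path) as f:
            done = set(json.load(f))
    applied = 0
    for block_id in sorted(increment):
        if block_id in done:
            continue  # merged by the interrupted run, skip it
        if fail_after is not None and applied == fail_after:
            raise RuntimeError("transformation interrupted")
        full[block_id] = increment[block_id]  # idempotent: safe to redo after a crash
        done.add(block_id)
        applied += 1
        # Record progress atomically: write a temp file, then rename it over the journal.
        tmp_path = journal_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(sorted(done), f)
        os.replace(tmp_path, journal_path)
    if os.path.exists(journal_path):
        os.remove(journal_path)  # transformation complete, nothing left to resume


if __name__ == "__main__":
    journal = os.path.join(tempfile.mkdtemp(), "merge.journal")
    full = {"b1": "old", "b2": "old", "b3": "old"}
    increment = {"b2": "new", "b3": "new"}
    try:
        merge_increment(full, increment, journal, fail_after=1)
    except RuntimeError as exc:
        print("first run:", exc)
    merge_increment(full, increment, journal)  # resumes and merges only b3
    print(full)  # {'b1': 'old', 'b2': 'new', 'b3': 'new'}
```

Two design choices make the sketch safe to resume: each step is idempotent (re-writing a block that was already merged does no harm), and the journal is swapped in with `os.replace`, which is atomic on POSIX file systems, so a crash leaves either the old or the new journal, never a half-written one.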

I know there's also the option of opening a support ticket, but I wanted to share my thoughts with the community and get a feeling for whether it's worth opening a ticket or not.
Thanks very much!
foggy
Veeam Software
Posts: 21139
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: backup admin's nightmare

Post by foggy

Hi Michael, sorry to hear the story. You're right that backup metadata is updated in a transactional manner and is stored twice for redundancy. Most likely, in your case all copies of the metadata were corrupted due to the space issues, and without the metadata the backup chain is basically useless. You can ask support to verify whether this was the case.
mcz
Veeam Legend

Re: backup admin's nightmare

Post by mcz

Thank you for the update, foggy. Just a short question for better understanding: when we talk about backup metadata, we mean the metadata stored within the .vbk, .vbr and .vib files, not the .vbm file that holds the metadata describing the whole chain, or did I get that wrong? Thanks for the clarification.
foggy
Veeam Software

Re: backup admin's nightmare

Post by foggy

Correct. Metadata describing data blocks that represent the VM state is stored twice inside the backup file itself. VBM is an auxiliary XML file containing information about backup files in the chain, etc.
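foggy's description (metadata updated transactionally and kept in two copies inside the backup file) resembles a common "A/B slot" pattern. The exact on-disk format Veeam uses isn't described in this thread, so the following Python sketch is only a generic illustration under that assumption: each write goes to the older of two slots together with a sequence number and a CRC32 checksum, and a read returns the newest slot whose checksum still validates. The functions `encode`, `decode`, `write_metadata` and `read_metadata` are hypothetical.

```python
import json
import zlib
from typing import List, Optional, Tuple

Slot = Optional[bytes]  # serialized metadata record, or None if never written


def encode(seq: int, metadata: dict) -> bytes:
    """Serialize a metadata record, prefixed with a CRC32 of its body."""
    body = json.dumps({"seq": seq, "meta": metadata}, sort_keys=True).encode()
    return zlib.crc32(body).to_bytes(4, "big") + body


def decode(raw: Slot) -> Optional[Tuple[int, dict]]:
    """Return (seq, metadata) if the record is intact, else None."""
    if raw is None or len(raw) < 4:
        return None
    crc, body = int.from_bytes(raw[:4], "big"), raw[4:]
    if zlib.crc32(body) != crc:
        return None  # torn or corrupted write
    record = json.loads(body)
    return record["seq"], record["meta"]


def write_metadata(slots: List[Slot], metadata: dict) -> None:
    """Overwrite the older (or invalid) slot so the other valid copy survives a failed write."""
    seqs = [state[0] if state else -1 for state in (decode(s) for s in slots)]
    target = seqs.index(min(seqs))
    slots[target] = encode(max(seqs) + 1, metadata)


def read_metadata(slots: List[Slot]) -> dict:
    """Return the newest intact copy; fail if no copy validates."""
    valid = [state for state in (decode(s) for s in slots) if state]
    if not valid:
        raise ValueError("all metadata copies are corrupt; the backup file cannot be read")
    return max(valid, key=lambda state: state[0])[1]


if __name__ == "__main__":
    slots: List[Slot] = [None, None]
    write_metadata(slots, {"blocks": 10})
    write_metadata(slots, {"blocks": 12})
    slots[1] = slots[1][:-1] + b"X"  # simulate a corrupted write on the newer copy
    print(read_metadata(slots))      # falls back to the older copy: {'blocks': 10}
    slots[0] = slots[0][:-1] + b"X"  # now the second copy is damaged too
    try:
        read_metadata(slots)
    except ValueError as exc:
        print(exc)
```

The last step shows the failure mode discussed in this thread: once both copies fail validation, there is nothing left to fall back to, even though the data blocks themselves may still be on disk.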
Gostev
Chief Product Officer
Posts: 31814
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: backup admin's nightmare

Post by Gostev

Simply put, having VBM is not essential to be able to perform the restore.