All instances of the storage metadata are corrupted.

Posted: Nov 19, 2009 7:48 am
by olofc
Hi,
I am getting an error that first occurred yesterday, after the Veeam backup server rebooted during a backup. We now get this message on every backup, and we cannot do any restores from the existing backup job. New backup jobs work fine, however.

Dir failed, storageFileName "F:\Backup\Backup.vbk"
All instances of the storage metadata are corrupted.

How can I recover from this?

Re: All instances of the storage metadata are corrupted.

Posted: Nov 19, 2009 8:13 am
by Vitaliy S.
Hello Olof,

Please contact our support (support@veeam.com) regarding this question, and do not forget to provide all the logs from Help->Support Information. It seems like your backups got corrupted due to that reboot, but let us look at the logs to see if we can help you restore from them.

Thank you.

Re: All instances of the storage metadata are corrupted.

Posted: Nov 19, 2009 11:20 am
by Gostev
Also, please include information about the storage you are backing up to. We have specifically tested all available backup storage types in the scenario of hard-resetting the Veeam Backup server while a backup job is running, so the chance of such corruption should be extremely low. We should be able to investigate why this happened from the job logs.

Re: All instances of the storage metadata are corrupted.

Posted: Sep 06, 2013 10:52 pm
by zak2011
I encountered the same error while trying to import backups, copied using the backup copy job in v7, from an offsite external drive. The job statistics say the backup copy job completed successfully in Veeam. However, during the import the same error comes up.

Re: All instances of the storage metadata are corrupted.

Posted: Sep 07, 2013 3:08 pm
by Vitaliy S.
What is the external drive? Is it a regular disk? Do you have a support case opened on this?

Re: All instances of the storage metadata are corrupted.

Posted: Sep 09, 2013 11:26 am
by zak2011
The external drive is a Western Digital My Book Essentials. I have a support case opened on this: Case # 00439056. The backup copy job statistics say that the job completed successfully and the full transformation was completed.
But the import does not succeed. By default the number of restore points is set to 2.
So once the backup job completed for this particular job, I could see a vbk and a vbm file on the target repository (external drive). However, for other jobs that also copied to the same drive with similar retention settings, there is a vbk, a vib and a vbm file. So does this mean the next restore point was not copied? If so, strangely there were no warnings or indications about this.
Thanks

Re: All instances of the storage metadata are corrupted.

Posted: Sep 09, 2013 1:17 pm
by Vitaliy S.
Arun,
zak2011 wrote: However for other jobs also that copied to the same drive with similar retention settings
So only 1 backup copy job is affected by this? Or are the other jobs just regular backup jobs?

Thanks!

Re: All instances of the storage metadata are corrupted.

Posted: Sep 09, 2013 2:17 pm
by zak2011
Hi Vitaliy,

Two backup copy jobs were affected. All other backup copy jobs seem to be fine.

Thanks.

Re: All instances of the storage metadata are corrupted.

Posted: Sep 09, 2013 2:27 pm
by Vitaliy S.
OK, we need to understand what the difference between these jobs is. Keep us updated on the support team's research.

Re: All instances of the storage metadata are corrupted.

Posted: Sep 10, 2013 10:58 am
by zak2011
Yes, will do Vitaliy.

Re: All instances of the storage metadata are corrupted.

Posted: Sep 11, 2013 3:24 am
by mongie
I've had this with a v6.5 backup chain. Unfortunately, the response from support was eventually "Oh dear, looks like they're un-recoverable".

Re: All instances of the storage metadata are corrupted.

Posted: Oct 01, 2013 7:05 pm
by electricd7
Just had the same experience. Nothing could be done and we had to abandon the backup chain and begin a new one. Support suggested it was a storage problem and told us to check the storage. We are writing to a 4-head distributed Gluster filesystem with over 200TB. That will prove to be difficult, to say the least.

Re: All instances of the storage metadata are corrupted.

Posted: Oct 01, 2013 7:18 pm
by Gostev
You can catch issues like this very early on by running a periodic SureBackup job with the full scan option enabled (a new v7 feature). This physically reads all data blocks holding the tested restore point and ensures that the content of each block matches the CRC stored with the block.
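
To illustrate the idea of a per-block CRC scan (this is only a sketch and not Veeam's actual file format or code; the 3-byte block size and the function names `write_blocks` and `full_scan` are made up for the example), each block is stored together with its CRC32, and a full scan re-reads every block and recomputes the checksum:

```python
import zlib

BLOCK_SIZE = 3  # hypothetical, tiny on purpose so the example is easy to follow


def write_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks and keep a CRC32 alongside each block."""
    blocks = []
    for offset in range(0, len(data), block_size):
        payload = data[offset:offset + block_size]
        blocks.append((payload, zlib.crc32(payload)))
    return blocks


def full_scan(blocks):
    """Re-read every block and return the indices whose CRC no longer matches."""
    return [
        index
        for index, (payload, stored_crc) in enumerate(blocks)
        if zlib.crc32(payload) != stored_crc
    ]


blocks = write_blocks(b"MOMMOMMOMMOM")
clean_result = full_scan(blocks)       # nothing damaged yet

# Simulate silent corruption of block 2: the payload changes, the stored CRC does not.
_, original_crc = blocks[2]
blocks[2] = (b"MAM", original_crc)
damaged_result = full_scan(blocks)     # the scan now reports block 2
```

The point of the sketch is that the scan only catches damage because it actually reads the data back; a success code returned at write time tells you nothing about block 2.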

Re: All instances of the storage metadata are corrupted.

Posted: Oct 18, 2013 1:30 pm
by zak2011
I have had the same experience, and support says nothing can be done and there is no way to determine at which step and on which drive the data got corrupted.
If the Veeam job statistics say the job completed successfully and the backup was copied, why does this happen?

Re: All instances of the storage metadata are corrupted.

Posted: Oct 21, 2013 8:35 am
by Gostev
Most likely it is some issue with the storage. In past years we have mostly seen this with low-end NAS devices featuring questionable performance optimizations. For example, some storage will not handle the FLUSH command correctly, returning success before the data hits the disks in order to achieve better performance numbers. Or it can be as simple as the storage writing some other data instead of the data we asked it to write, due to faulty hardware (in 6 years and with 80,000+ customers, we've seen it all by now). It is very important that the storage does not "cheat" us.

The Veeam backup file format includes two identical records of storage metadata for redundancy. They are never updated at the same time, so that at least one remains "good" in case of a crash or data corruption occurring during the file update.

So, it is simply impossible to break both metadata instances when:
1) Storage writes the data we asked it to write (not the case with faulty RAID controllers).
2) Storage medium stores the data correctly (not the case with silent corruption issues, aka bit rot).
3) Storage properly processes the FLUSH command. This is a synchronous call for us, meaning we issue the FLUSH command and wait for the data to hit the disks before moving on. If storage returns success but the flush does not really happen, we start updating the other instance of metadata while the first one is still unsaved, and this is one way to run into the said issue.
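
The "two copies, never updated together" scheme can be sketched roughly as follows. This is an assumption-laden illustration, not the real backup file layout: the 64-byte slot size, the sequence-number header and the helper names (`encode`, `decode`, `update`, `load`) are all invented for the example. Each update overwrites only the older slot and then calls `os.fsync` before returning, and a reader picks the newest slot that still passes its CRC check:

```python
import os
import struct
import tempfile
import zlib

SLOT_SIZE = 64                 # hypothetical fixed size of one metadata slot
HEADER = struct.Struct("<QI")  # sequence number (starts at 1), payload length


def encode(seq: int, payload: bytes) -> bytes:
    body = HEADER.pack(seq, len(payload)) + payload
    record = body + struct.pack("<I", zlib.crc32(body))
    if len(record) > SLOT_SIZE:
        raise ValueError("payload too large for one slot")
    return record.ljust(SLOT_SIZE, b"\0")


def decode(raw: bytes):
    """Return (seq, payload), or None if the slot is empty or fails its CRC."""
    seq, length = HEADER.unpack_from(raw)
    end = HEADER.size + length
    if seq == 0 or end + 4 > len(raw):
        return None
    (stored_crc,) = struct.unpack_from("<I", raw, end)
    if zlib.crc32(raw[:end]) != stored_crc:
        return None
    return seq, raw[HEADER.size:end]


def read_slots(path: str):
    with open(path, "rb") as f:
        data = f.read(2 * SLOT_SIZE).ljust(2 * SLOT_SIZE, b"\0")
    return [decode(data[i * SLOT_SIZE:(i + 1) * SLOT_SIZE]) for i in range(2)]


def update(path: str, payload: bytes) -> None:
    """Overwrite only the older (or invalid) slot, then wait for the flush."""
    seqs = [slot[0] if slot else 0 for slot in read_slots(path)]
    target = 0 if seqs[0] <= seqs[1] else 1
    with open(path, "r+b") as f:
        f.seek(target * SLOT_SIZE)
        f.write(encode(max(seqs) + 1, payload))
        f.flush()
        os.fsync(f.fileno())   # relies on the storage honoring the flush


def load(path: str) -> bytes:
    valid = [slot for slot in read_slots(path) if slot is not None]
    if not valid:
        raise OSError("All instances of the storage metadata are corrupted.")
    return max(valid, key=lambda slot: slot[0])[1]


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "meta.bin")
    with open(path, "wb") as f:
        f.write(b"\0" * (2 * SLOT_SIZE))
    update(path, b"restore points: 1")      # lands in slot 0
    update(path, b"restore points: 2")      # lands in slot 1
    with open(path, "r+b") as f:            # damage the newest slot
        f.seek(SLOT_SIZE + HEADER.size)
        f.write(b"XXXX")
    recovered = load(path)                  # falls back to the older slot
    with open(path, "r+b") as f:            # now damage the other slot as well
        f.seek(HEADER.size)
        f.write(b"XXXX")
    try:
        load(path)
        all_corrupted = False
    except OSError:
        all_corrupted = True
```

With one slot damaged, `load` still returns the previous metadata. The error only appears once both slots are bad, which in this sketch requires damage outside the normal update path, or an `fsync` that reports success without persisting anything.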

In plain language, the issues look like this:
1) We ask storage to write "MOM", but it writes "DAD" instead and returns success.
2) We ask storage to write "MOM", and it writes "MOM" and returns success, but if you try to read the data block, you will get "MAM".
3) We ask storage to commit the write of "MOM" to disks, and it returns success but does not actually write data to disks, keeping it in buffer for a short period of time for performance optimization purposes.
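
The three cases above can be mimicked with a toy in-memory "disk" (purely illustrative; the class names and the `write_and_check` helper are invented for this sketch and do not model any real device or API). In every case the storage reports success, and only reading the data back after a simulated power loss shows the difference:

```python
class Disk:
    """Honest toy storage: write() buffers the data, flush() persists it."""

    def __init__(self):
        self.buffer = None    # volatile cache, lost on power failure
        self.platter = None   # persistent medium

    def write(self, data: bytes) -> bool:
        self.buffer = data
        return True

    def flush(self) -> bool:
        if self.buffer is not None:
            self.platter = self.buffer
            self.buffer = None
        return True

    def power_loss(self) -> None:
        self.buffer = None

    def read(self):
        return self.buffer if self.buffer is not None else self.platter


class WrongDataDisk(Disk):
    """Case 1: asked to write "MOM", writes "DAD" and returns success."""

    def write(self, data: bytes) -> bool:
        return super().write(b"DAD")


class BitRotDisk(Disk):
    """Case 2: stores "MOM" correctly, but reading it back yields "MAM"."""

    def read(self):
        stored = super().read()
        return None if stored is None else stored.replace(b"O", b"A")


class LyingFlushDisk(Disk):
    """Case 3: flush() returns success but leaves the data in the buffer."""

    def flush(self) -> bool:
        return True


def write_and_check(disk: Disk, data: bytes = b"MOM"):
    """Return (what the storage reported, what a read-back actually finds)."""
    reported_ok = disk.write(data) and disk.flush()
    disk.power_loss()
    return reported_ok, disk.read() == data


results = {
    cls.__name__: write_and_check(cls())
    for cls in (Disk, WrongDataDisk, BitRotDisk, LyingFlushDisk)
}
```

Every entry in `results` has `True` as its first element, so the writer sees four successful writes; only the read-back in the second element separates the honest disk from the other three.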

Answering your question, we can only judge by these reported successes, so we mark the job as successful. This is why it is very important to use SureBackup to verify that what was written into the backup file is what we asked for, especially once you have seen this error at least once and your backup storage becomes a suspect.