'Corrupt' Backup Job

Post by **bgbGsy** » Nov 14, 2009 9:09 am

Has anyone experienced anything similar using v4?

I have been using v4 for just over a week now (previously on version 3.x). All backup jobs use the Storage API with block tracking (via SAN). All backup jobs run once a day and have completed successfully. I did some test restores yesterday. Two domain controllers (Windows 2008) both restored fine. Then I restored a Win 2003 x64 Exchange 2007 server. I have done this exercise many times using 3.x. When I restored the VM, on bootup after the Win2003 logo, I got a blue screen with a terminal session manager error. I went back to the previous day's rollback, restored, and the same error occurred... right back to the sixth (and last) rollback, all failed in the same way.

So I changed the properties of the job from Storage API to Consolidated Backup (via SAN) and disabled block tracking in the job properties. I re-ran the backup and the restore, and the machine restored fine.

During all the backups VSS has been working fine, with all the expected ESE messages in the logs. My worry is that if there was a problem in the original backup, using block tracking will not 'repair' this on subsequent backups (as seems to be the case). For confidence I have reverted my jobs to Consolidated Backup mode until I can find out under what circumstances this can happen. This is the first time I have had a job not restore properly.

No errors were reported during any backups, and the check box to verify the backup file is selected, so if it was backup file corruption, I would have expected a new full backup to be run, but this did not happen on the nightly backup.

Any feedback appreciated.

Regards,
Brendan.

Post by **bgbGsy** » Nov 17, 2009 12:01 pm

Update from testing this:

I managed to repair the VM restored from the damaged backup by booting from the Windows 2003 CD and doing a repair installation. It then booted into Windows. The Exchange database appeared to be okay and a DB check showed no corruption. Somehow the original backup didn't work correctly (although there were no error messages)... Subsequent block-change-enabled backups obviously did not update the part of the original backup which contained the corruption, but running a full backup (consolidated type) did.

I have experienced this situation only once, but as you can imagine, unless you are constantly restoring VMs for testing, you might not spot this sort of error.

I have reverted my live backups to consolidated until I have done lots more confidence testing of this new backup type. I also think that best practice would dictate doing a full backup periodically as well (there is a good new feature in advanced settings to do this).

Can anyone at Veeam throw any light on how this might happen?

Regards,
Brendan.

Post by **Gostev** » Nov 18, 2009 1:07 pm

Brendan, I believe this is a software issue with the Windows OS that is unrelated to backup. I would not expect a corrupted backup to lead to a BSOD.

The actual backup data was not corrupted, because the integrity checks would have noticed corruption. The snapshot of your source VM was captured and stored in the backup file properly, but there was some issue with the system state that was captured. It could have resulted from a system freeze, for instance.

If you can reproduce the error and provide us with some additional info such as crash dumps, we should be able to tell you for sure what is causing this. Our dev team has great Windows expertise, and we also have a developer who is good at investigating crashes/BSODs.

As for the backup testing that you are performing, we are actually planning to add some automation around this in future versions.

Post by **bgbGsy** » Nov 18, 2009 10:33 pm

I take your point, but can you see how this sort of issue can occur when using block tracking? The fact that an 'issue' of some sort occurred and was stored in the first full backup meant that subsequent incrementals also contained the issue, because only the changed blocks were written (I tested 4 out of 6 backups, including the first and last). When I ran a 'non block change backup' after discovering the error, the issue went away. If I had not noticed this problem in the original backup, it may very well still be in the backup job now. Currently I am re-enabling some of my backup jobs with Storage API (without block tracking), and I am also making sure that a full backup is run once a week. Performance is still very good even without block tracking.

Actually, for some virtual machines where there are lots of changes during the day (like my XenApp servers, where user profiles are stored in lots of small files), the difference in time between a block change backup and a non block change backup is hardly anything at all.

Thanks,
Brendan.

Post by **Gostev** » Nov 18, 2009 10:45 pm

Brendan, the thing is, if some source VMDK block contained an issue initially, but it got fixed later in the source VM or block, and the issue on this block is no longer present in the new snapshot, this block will be classified as "changed" and will be overwritten in the VBK with a "fresh" one. So there is really no difference between running the job with or without changed block tracking. In one case, our code looks for changed blocks by comparing the previous and current block state. In the other case, VMware tells us which blocks had at least 1 bit changed.
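
To make the two detection approaches described above concrete, here is a minimal sketch in Python. It is a toy model only: `BLOCK_SIZE`, `changed_by_comparison` and the `TrackedDisk` class are hypothetical names invented for illustration, and nothing here reflects how Veeam or VMware actually implement change detection.

```python
import hashlib

BLOCK_SIZE = 4  # toy block size in bytes (assumption; real block sizes are far larger)


def split_blocks(disk: bytes) -> list[bytes]:
    """Cut a disk image into fixed-size blocks."""
    return [disk[i:i + BLOCK_SIZE] for i in range(0, len(disk), BLOCK_SIZE)]


def changed_by_comparison(previous: bytes, current: bytes) -> set[int]:
    """Approach 1: hash every block of both states and compare the digests."""
    pairs = zip(split_blocks(previous), split_blocks(current))
    return {
        i for i, (p, c) in enumerate(pairs)
        if hashlib.sha256(p).digest() != hashlib.sha256(c).digest()
    }


class TrackedDisk:
    """Approach 2: a hypothetical disk that records which blocks were written."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.dirty: set[int] = set()

    def write(self, offset: int, payload: bytes) -> None:
        # Apply the write, then mark every block the write touched as dirty.
        self.data[offset:offset + len(payload)] = payload
        first = offset // BLOCK_SIZE
        last = (offset + len(payload) - 1) // BLOCK_SIZE
        self.dirty.update(range(first, last + 1))


original = b"AAAABBBBCCCCDDDD"
disk = TrackedDisk(original)
disk.write(4, b"XXXX")  # rewrites block 1
disk.write(13, b"Z")    # touches one byte of block 3

by_comparison = changed_by_comparison(original, bytes(disk.data))
by_tracking = disk.dirty
print(sorted(by_comparison), sorted(by_tracking))
```

In this toy run both approaches report the same blocks. Note that a write tracker of this kind could also flag a block whose content was rewritten with identical bytes, whereas the comparison approach would not; either way, the decision is driven by the state of the source disk.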

Does this make sense?

Really, I think the actual BSOD needs to be investigated before we can draw any conclusions about what has happened. By the way, do you have the "corrupted" backup file so we can take a look at it? Maybe a WebEx will be enough to start with. Also, will you be able to provide an OS crash dump?

Post by **bgbGsy** » Nov 19, 2009 8:47 am

Thanks for the reply, Gostev. Actually, I did log a call initially. The engineer referred me to an MS document on Exchange restore best practice, but I don't think whatever happened has anything to do with Exchange specifically.

Ironically, once I had repaired the problem with this VM (which contained some NTFS-level errors as well that are definitely not in the source machine, because I double-checked), the VSS replay worked as expected and the Exchange DB was mounted... it also passed a full verification using ESEUTIL.

Also, please remember that I performed a consolidated backup of this source VM using the same job that had been run for six nights and contained the error. Running that backup cleared the issue in the restore straight away.

My main concern was not the error in the original backup because, as you say, many things could have caused this. My concern was that without performing a full backup where all blocks were compared and recorded as necessary, the bit of the backup which contained the anomaly was not 'refreshed' in subsequent incrementals. Perhaps this is a general best practice issue with block tracking and not specifically to do with Veeam.

I would be happy for someone to WebEx in and look at the backup if that would help, but it's more of a confidence thing with a new technology.
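
The concern in the paragraph above can be pictured with a toy model in Python. This is only a sketch under a stated assumption: that the anomaly exists in the stored copy of a block while the same block in the source is healthy and never changes. Whether that is what actually happened in this case is not established in the thread, and `full_backup` and `incremental_backup` are hypothetical helpers, not Veeam functions.

```python
def full_backup(source: list[bytes]) -> list[bytes]:
    """Copy every block from the source into a fresh backup."""
    return list(source)


def incremental_backup(backup: list[bytes], source: list[bytes], changed: set[int]) -> None:
    """Overwrite only the blocks that the source reports as changed."""
    for i in changed:
        backup[i] = source[i]


source = [b"boot", b"ntfs", b"data"]
backup = full_backup(source)

# Assumption: block 1 is damaged in the stored copy only; the source is healthy.
backup[1] = b"n?fs"

# Three nightly incrementals in which only block 2 changes in the source.
for night in range(1, 4):
    source[2] = b"dat" + str(night).encode()
    incremental_backup(backup, source, changed={2})

damaged_after_incrementals = backup[1] != source[1]

# A new full backup rereads every block, including the unchanged one.
backup = full_backup(source)
damaged_after_full = backup[1] != source[1]

print(damaged_after_incrementals, damaged_after_full)
```

Under that assumption, the damaged block survives every incremental because the source never reports it as changed, and only the full run replaces it.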

Thanks.

Post by **Gostev** » Nov 19, 2009 11:11 am

Brendan, I agree with your points. While we have not seen any issues with vStorage API and changed block tracking internally, having run it for half a year now without periodic full backups, I agree that backup corruption may happen occasionally. Most commonly this is not due to the backup technology used, but due to disk corruption on the actual backup file storage device, or a software glitch due to the fact that this is a hot backup via snapshot of a running VM, and so on. The corrupt backup problem is as old as backup itself; legacy agent-based backup is affected by all the same problems. Periodic backup testing is really the best option, as even doing periodic fulls (or doing fulls constantly) will not guarantee recoverability if your backup file storage or transport has some issues.

Overall, as actual Veeam Backup users ourselves, we are very confident in forever-incremental backup with changed block tracking. As for periodic fulls, we did not even have a periodic fulls option before 4.0, but as you can see from the forums, backup corruption reports have been extremely rare over the past few years, given that we have thousands of customers. Moreover, Veeam Backup 4.0 features automated backup integrity checks during each job run and is now able to detect some types of backup file corruption automatically.
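
As a rough illustration of why integrity checks and restore testing complement each other, here is a minimal Python sketch of a per-block checksum check. It is an assumption about how such checks could work in general, not a description of the checks in Veeam Backup 4.0; `write_backup` and `verify` are hypothetical names.

```python
import hashlib


def write_backup(blocks: list[bytes]) -> list[tuple[bytes, str]]:
    """Store each block together with a SHA-256 digest taken at write time."""
    return [(b, hashlib.sha256(b).hexdigest()) for b in blocks]


def verify(stored: list[tuple[bytes, str]]) -> list[int]:
    """Return the indexes of blocks whose content no longer matches its digest."""
    return [
        i for i, (block, digest) in enumerate(stored)
        if hashlib.sha256(block).hexdigest() != digest
    ]


stored = write_backup([b"boot", b"ntfs", b"data"])
clean = verify(stored)

# Damage on the storage device after the write: the digest no longer matches.
stored[1] = (b"n?fs", stored[1][1])
damaged = verify(stored)

# Data that was already inconsistent when captured: the digest matches the
# bad content, so a checksum check alone cannot flag it.
captured_bad = write_backup([b"boot", b"n?fs", b"data"])
undetected = verify(captured_bad)

print(clean, damaged, undetected)
```

A check of this kind catches damage that occurs after the block is written, but it cannot tell whether the captured state was bootable in the first place, which is why periodic test restores remain useful.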
