Reverse Incremental - what happens when backup fails?

Post by **dividedsky319** » Jan 27, 2011 1:47 pm

If a reverse incremental backup fails does that make every "rollback" useless since the most recent, "full" backup is corrupt? Or does the backup procedure do the backup, confirm that it's valid, THEN create the "full" backup by which it would apply the rollbacks?

Maybe a stupid question, but I just want to make sure it's set up safely... Thanks.

Post by **tsightler** » Jan 27, 2011 3:52 pm

Well, Veeam tries to write to the VBK file in a very secure way, so typically a failure during a backup would simply mean that the file would need some "repair" before the next backup. Veeam performs this automatically at the next backup run, or if you attempt a restore, so it's largely transparent.

Now, it's technically possible that a major corruption could occur during backup that would cause the VBK to become corrupt and not repairable (although Veeam support could probably still extract the data). Back in the "old" Veeam days (say V3 and earlier), you would see reports of this on occasion. With the modern VBK format I haven't seen this issue reported in quite a while and I would suspect it's a rare event these days.

That being said, backups aren't something to take lightly, so I would suggest you protect yourself from the possibility of this happening by one of the following methods:

1. Make a backup of the VBK to an alternate location after each nightly run. This can be either by spooling the VBK to tape, or simply copying it to another folder. If the VBK was damaged you can always restore and import the previous VBK.

2. Use incrementals and synthetic fulls. One of the greatest things about the new incremental backup mode is that, once a backup is created, it is never touched again except for reads, so the likelihood of corruption is minimal. Of course the disadvantage is that you need more space since you end up with multiple full backups.
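For option 1, a copy is only useful if it actually matches the original, so it is worth verifying it before it replaces the previous good copy. Here is a minimal sketch of that idea in Python. The function names and the `.partial` suffix are made up for illustration, and nothing here is a Veeam feature or API; it is just a generic post-job script.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a large backup file need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, dest_dir: Path) -> Path:
    """Copy source into dest_dir and keep the copy only if its hash matches."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / source.name
    tmp = dest_dir / (source.name + ".partial")  # hypothetical temp name
    shutil.copyfile(source, tmp)
    if sha256_of(source) != sha256_of(tmp):
        tmp.unlink()
        raise IOError(f"copy of {source} does not match the original")
    # Replace the previous copy only after the new one has been verified.
    tmp.replace(dest)
    return dest
```

Note that a read done right after the copy may be served from the operating system's cache rather than from the disk, so this catches copy errors but is not proof that the data landed safely on the second device.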

Post by **Gostev** » Jan 27, 2011 4:59 pm

True, this could have happened in v3 and earlier versions; there was something like a 10% chance of data loss when the server crashed during backup (as far as I remember, Tom came up with this number empirically by specifically testing this).

But since then, we have completely revamped the backup storage (this happened back in 2009, in the Backup v4 release), and made it transactional (just like NTFS, or databases). So, starting with Veeam Backup v4, server or job crashes will never be able to cause data loss. There is some level of redundancy in the storage, so we always know where the "good" data is located - even if the backup job crashes at the worst possible moment (during a storage metadata update). And because the "last known good state" is available to us, every rollback will be restorable as well (even before the VBK is repaired by the next job run).
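To make the "last known good state" idea concrete, here is a toy sketch of one common way transactional stores get this property: keep two metadata slots with checksums and only ever overwrite the older one. This is purely illustrative and is not the actual VBK format; the class, the slot layout, and the `crash_midway` flag are all assumptions invented for the example.

```python
import json
import zlib
from typing import Optional


def encode_slot(seq: int, state: dict) -> bytes:
    """Serialize a metadata record with a CRC32 so torn writes are detectable."""
    payload = json.dumps({"seq": seq, "state": state}, sort_keys=True).encode()
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def decode_slot(raw: bytes) -> Optional[dict]:
    """Return the record if its checksum matches, otherwise None."""
    if len(raw) < 4:
        return None
    crc, payload = int.from_bytes(raw[:4], "big"), raw[4:]
    if zlib.crc32(payload) != crc:
        return None
    return json.loads(payload)


class TwoSlotMetadata:
    """Keeps two metadata slots; an update never overwrites the good record."""

    def __init__(self) -> None:
        self.slots = [encode_slot(0, {}), b""]

    def last_known_good(self) -> dict:
        # The valid record with the highest sequence number wins.
        records = [r for r in map(decode_slot, self.slots) if r is not None]
        return max(records, key=lambda r: r["seq"])

    def commit(self, state: dict, crash_midway: bool = False) -> None:
        current = self.last_known_good()
        # Pick the slot that does NOT hold the current good record.
        target = 0 if decode_slot(self.slots[0]) != current else 1
        new = encode_slot(current["seq"] + 1, state)
        # crash_midway simulates a torn write: only half of the bytes land.
        self.slots[target] = new[: len(new) // 2] if crash_midway else new
```

A quick walk-through: commit one state normally, then simulate a crash during the next metadata update. The torn slot fails its checksum, so the previous record is still returned.

```python
meta = TwoSlotMetadata()
meta.commit({"restore_points": 1})
meta.commit({"restore_points": 2}, crash_midway=True)
print(meta.last_known_good()["state"])  # still the first committed state
```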

Post by **tsightler** » Jan 27, 2011 6:22 pm

Well, if it's a "well behaved" crash. Sometimes hardware related crashes are not well behaved, so you could have partially written blocks or even zeroed blocks written to your file. I've seen RAID controllers write random data to dozens of blocks during a crash, and undetected filesystem corruption cause files to be corrupted even though the application itself was writing data correctly. It's questionable if the VBK would be readable/importable in that case (I actually had a non-importable VBK file from a crash in October that I could send you as proof, but I just deleted it yesterday). But I agree the scenario is unlikely and VBK files are far more robust in recent versions.

Post by **Gostev** » Jan 27, 2011 9:14 pm

It is true that we cannot deal with hardware malfunction on our backup storage layer. If we provided the storage device with a data block to write, and heard back that it wrote this data block successfully, then we trust the block is indeed there, and with the content that we provided. There is simply no reasonable way to handle "untrusted" storage (that would basically take reading each written block back, so the backup window would go through the roof). I agree it does happen, very unlikely on enterprise-grade storage, but we did confirm this with some cheap consumer NAS storage once, which had questionable "performance optimizations" with delayed writes that did not account for unexpected power loss.
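To show why reading every block back is so costly, here is a minimal sketch of what "write, then verify" looks like for a single block. The function name is hypothetical and this is not how any Veeam component works; it only illustrates that each write turns into a write, a flush, and a read.

```python
import os


def write_block_verified(path: str, offset: int, block: bytes) -> bool:
    """Write one block at offset, force it to disk, then read it back."""
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(offset)
        f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
        f.seek(offset)
        read_back = f.read(len(block))
    # True only if what we read matches what we asked the storage to write.
    return read_back == block
```

Even this roughly doubles the I/O per block, and the read may still be answered from a cache instead of the physical media, so it would not reliably catch a device that acknowledges writes it has not really completed.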

Anyway, for any silent corruptions caused by storage hardware malfunctions, we provide two layers of protection. First, a quick built-in integrity check is done each time the job runs (designed to spot obvious corruptions quickly, without having to read the full backup file). And for all more complex issues, there is SureBackup with an application integrity check script - this is our solution for dealing with "untrusted" storage without affecting your backup window.

I should also mention that with partially written blocks, or blocks stuffed with random content by controller, we will never hear "success" from the storage (with the exception of bad storage design of course), so backups will remain restorable. Your October backup may not have necessarily been corrupted... could have been a bug with Import Backup functionality instead, for example. We have fixed quite a few of those in 5.0.1 actually, when perfectly good backups would not import, for example because of issue with one helper XML file formatting included in the backup file.

Bottom line - in our experience, the actual low-level backup storage logic we implemented back in v4 is solid and handles crashes well... higher level functionality (import, transform) may have bugs - there is no software without bugs. For example, I know there is currently one issue with the transform algorithm that may cause a single VM from the backup to be unrestorable if the job crashes during a certain phase of transform. And again, the backup data is still good - the fix involves running a SQL query to fix the configuration in our DB (because this is where the bug is). See what I mean?

Post by **tsightler** » Jan 27, 2011 9:50 pm

I agree that the current implementation is quite robust. I was only pointing out that there are things that are beyond Veeam's reasonable control that could lead to problems. Database systems like Oracle use advanced techniques to guarantee transactional integrity during crashes, and have been optimizing and developing these methods for years, yet it still occasionally happens that a database file requires a restore due to silent corruption or a crash. Veeam is subject to the same problems.

I personally don't trust storage. I've seen data loss events from small-time NAS devices, to "big boys" such as EMC, and the mid-tier players like Equallogic. The only way to be sure you have good backups is to test them and have multiple copies. With tapes, this was common and well understood, but somehow with the move to disk backup the requirement for a second copy seems to be fading from memory. We replicate all of our backups to either tape, or at least a separate disk array. I would never suggest someone rely solely on a single VBK on a single array, not because Veeam doesn't write data safely, but because Murphy's Law applies to backups as well as anything else.

That October backup was pretty corrupt. The reason it was removed from Veeam in the first place is that the job crashed with some nasty errors during a nightly backup run, and the next time the job ran it attempted "recovery" only to eventually end with an "end of communication channel" error. After that, the backup wouldn't run anymore. We basically just killed the job and recreated it from scratch.

Post by **Gostev** » Jan 27, 2011 10:09 pm

Agreed 100%
