helman
Influencer
Posts: 18
Liked: never
Joined: Oct 22, 2010 10:29 am

I suspect a corrupt backup repository - what do I do?

Post by helman »

Yesterday I tried an Other-OS FLR on one VM and received the error "'<-', hexadecimal value 0x1B, is an invalid character. Line 39, position 182."

In the file system selection screen that came next, all logical volumes were missing. I was only able to access the boot partition and nothing else.

Next I tried the FTP access of the FLR appliance. This gave me access to the logical volumes (dm-1 to dm-8), but the FLR appliance's console showed file system errors when I tried to access some of those volumes, and the corresponding directories were empty.

Now concerned, I did a full VM restore of this VM (it completed without errors). But the file system on the restored VM was severely damaged (e2fsck just dumped hundreds of files into lost+found). Now I am really concerned about the consistency of the backup store, because the original VM's file system is 100% OK; only the restored file systems are damaged. How can I check the repository for consistency?

How is it possible that I can restore the VM without errors, yet the restored VM has damaged file systems while the original VM is still OK and running? I tried several backups of this VM, and all of them seem to be damaged.

I already opened a case for this -> 00152954
helman
Influencer
Posts: 18
Liked: never
Joined: Oct 22, 2010 10:29 am

Re: I suspect a corrupt backup repository - what do I do?

Post by helman »

Just to test the backup process, I created a quick VeeamZIP backup of the VM and test-restored it: it worked 100% OK.

But last night's backup, which ran into the big repository, will not restore correctly. Veeam is happily restoring it, but the resulting VM is damaged.
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: I suspect a corrupt backup repository - what do I do?

Post by Gostev »

helman wrote: "Veeam is happily restoring it, but the resulting VM is damaged."

This means the VM was already damaged before it was backed up. When a backup file becomes corrupted at rest, it simply will not restore (there is a dedicated CRC check to catch exactly this). In other words, the CRC check makes sure that what is restored is exactly what was backed up. Even if a single bit changes, the corresponding block becomes unrestorable, and a full VM restore will fail on it with a decompression error.
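To make the CRC argument concrete, here is a minimal Python sketch of a simplified model, assuming each block is stored zlib-compressed next to a CRC32 of its uncompressed data. The block size, `backup_block`, `restore_block` and `CorruptBlockError` are hypothetical names for illustration, not Veeam's actual VBK format or code. It shows both halves of the argument: a bit flip at rest makes the block unrestorable, but a block that was already wrong when it was read gets a matching CRC and restores without any error.

```python
import zlib
from typing import Tuple

BLOCK_SIZE = 4096  # hypothetical block size for this toy model


class CorruptBlockError(Exception):
    """Raised when a stored block fails an integrity check during restore."""


def backup_block(data: bytes) -> Tuple[bytes, int]:
    # The CRC covers the data exactly as it was read from the source disk.
    return zlib.compress(data), zlib.crc32(data)


def restore_block(stored: bytes, crc: int) -> bytes:
    try:
        data = zlib.decompress(stored)  # zlib also verifies its own Adler-32 trailer
    except zlib.error as exc:
        raise CorruptBlockError(f"decompression error: {exc}") from exc
    if zlib.crc32(data) != crc:
        raise CorruptBlockError("CRC mismatch")
    return data


good = bytes(range(256)) * (BLOCK_SIZE // 256)

# Case 1: one bit flips in the stored block while it sits in the repository.
stored, crc = backup_block(good)
damaged = bytearray(stored)
damaged[len(damaged) // 2] ^= 0x01
try:
    restore_block(bytes(damaged), crc)
    at_rest_detected = False
except CorruptBlockError:
    at_rest_detected = True

# Case 2: the block was already wrong when it was read during the backup.
bad_read = b"\x00" * BLOCK_SIZE
stored, crc = backup_block(bad_read)
restored = restore_block(stored, crc)  # no error: the CRC matches the bad data

print(at_rest_detected)   # True
print(restored == good)   # False
```

The second case is why a clean restore only proves the backup matches what was read at backup time; it cannot tell you whether what was read was correct.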

That said, your FLR issue definitely does not look like it is caused by a corrupt backup. It might simply be an FLR engine bug triggered by unexpected file system configuration settings, or something similar. Let's see what support finds out.
helman
Influencer
Posts: 18
Liked: never
Joined: Oct 22, 2010 10:29 am

Re: I suspect a corrupt backup repository - what do I do?

Post by helman »

I assumed that too, and it makes sense, but:
I did a forcefsck boot on the original VM and every file system is OK. So I can assume that the original VM is definitely running fine.
Then I did an instant restore of last night's backup of this VM into a new VM and tried to boot it. It failed with file system errors; one LV was even missing its superblock, so it's FUBAR.

The only explanation I can imagine is that the backup data somehow got damaged, or was not backed up correctly in the first place. There's no CRC error, but the restore won't give me back a working VM.
cby
Expert
Posts: 109
Liked: 6 times
Joined: Feb 24, 2009 5:02 pm

Re: I suspect a corrupt backup repository - what do I do?

Post by cby »

If you have defined several hard disks on the VM and spread the volume group across them, I wonder how the Veeam restore process handles them. I ask because VM disks are seen as discrete entities, and the concept of a volume group is alien, let alone LVs. Just look at the option to restore individual disks in Veeam; that must be interesting in an LVM environment ;)

Hmm, is it the backup that is corrupt, even though Veeam thinks it's sound, especially at restore time? Or does Veeam screw up during the restore of a sound backup without reporting errors?

Perhaps a simple LVM backup/restore test is in order.
dellock6
Veeam Software
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy

Re: I suspect a corrupt backup repository - what do I do?

Post by dellock6 »

Another possibility: a file got corrupted in the backup during a previous run, and since you are running incremental or reverse incremental, that file never changed after the first backup. So it stays corrupted in the backup, and you only find out when you do a restore, while all backups complete without errors.
Usually the way to correct this is to run an active full backup, so that every block of the virtual disk is read again and saved to the backup, even the blocks that have not changed since the first backup.
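A toy Python model of this effect, assuming a forever-incremental chain where each incremental stores only the blocks that changed block tracking reports as modified on the source. `changed_blocks`, `restore_from_chain` and the dict-based "disk" are hypothetical simplifications, not Veeam's storage logic. A block stored badly in the first full keeps being served from the chain, because nothing ever re-reads it, until an active full copies every block from the source again.

```python
from typing import Dict, List, Set

Disk = Dict[int, bytes]  # toy model: block index -> block contents


def changed_blocks(old: Disk, new: Disk) -> Set[int]:
    # Stand-in for changed block tracking: only blocks modified on the source.
    return {i for i in new if old.get(i) != new[i]}


def restore_from_chain(chain: List[Disk]) -> Disk:
    # Start from the full backup and apply each incremental in order.
    image: Disk = {}
    for restore_point in chain:
        image.update(restore_point)
    return image


source: Disk = {0: b"boot", 1: b"data-A", 2: b"data-B"}

# The first full backup ends up with a bad copy of block 1.
full = dict(source)
full[1] = b"GARBAGE"
chain: List[Disk] = [full]

# Nightly incrementals: only block 2 ever changes on the source VM.
for night in range(1, 4):
    new_source = dict(source)
    new_source[2] = f"data-B-v{night}".encode()
    chain.append({i: new_source[i] for i in changed_blocks(source, new_source)})
    source = new_source

before = restore_from_chain(chain)
print(before[1])         # b'GARBAGE' (block 1 is never re-read, so never repaired)

# An active full re-reads every block from the source, changed or not.
chain = [dict(source)]
after = restore_from_chain(chain)
print(after == source)   # True
```

Reverse incremental behaves the same way in this respect: the badly stored, never-changed block simply stays in the full backup file.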

Luca.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
helman
Influencer
Posts: 18
Liked: never
Joined: Oct 22, 2010 10:29 am

Re: I suspect a corrupt backup repository - what do I do?

Post by helman »

cby: VGs should not be a problem, even across several virtual HDs. AFAIK Veeam takes a snapshot of the whole VM, so that includes all HDs at the same point in time. I think that, especially with quiescing enabled, you should be fine in that case.

dellock6: I suspect something similar, but not a corrupt file inside the VM. The original VM is OK and very small (a 10 GB, single-disk Linux VM). I believe bad data somehow got into my backup repository, and the "forever incremental" approach keeps that bad data in the backup until I do an active full backup.

So there's two questions left:
* How do I confirm there's actually bad data in the backup repository, maybe even how do I find out how it got in there (and prevent it in the future).
* How do I prevent that in the future and maybe proactively and regularly check the repository. I don't think Veeam has the tools for that at the moment - the only thing would be testing via Surebackup - but this way you still won't find subtle data mismatch.

I'm seriously concerned that my backups might be all broken.
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: I suspect a corrupt backup repository - what do I do?

Post by Gostev »

helman wrote: "So there's two questions left:
* How do I confirm there's actually bad data in the backup repository, maybe even how do I find out how it got in there (and prevent it in the future).
* How do I prevent that in the future and maybe proactively and regularly check the repository. I don't think Veeam has the tools for that at the moment - the only thing would be testing via Surebackup - but this way you still won't find subtle data mismatch."

No matter what backup product you are using, you need to perform recovery testing regularly if you want to be sure your backups are restorable. With most solutions, such testing can only be done manually; Veeam, however, provides an automated backup recoverability testing facility with the SureBackup feature. SureBackup is designed to answer both of the questions quoted above.

Even the most basic SureBackup test would have caught the unbootable OS issue you experienced. More advanced tests, which run a disk scan inside the SureBackup temporary VM, can find even more "hidden" issues (although such tests obviously take much longer than the basic/default ones).
helman
Influencer
Posts: 18
Liked: never
Joined: Oct 22, 2010 10:29 am

Re: I suspect a corrupt backup repository - what do I do?

Post by helman »

Actually, as we are a small shop, there's normally no need to automate this testing. I do recovery tests regularly, but not every day. That's how I found out that Veeam-restored VMs can't be exported with VMware Converter (which still baffles me).

I have another backup product (Acronis) that's so bad that I have to babysit it every day, and it still destroys data (hopefully I'll get rid of that mess soon).

I'm just curious how the bad data got into the backup, because once I find that out, it could be prevented in the future. And I still wonder why there's no simple tool (like fsck or "tar -t") to simply check repository consistency.
dellock6
Veeam Software
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy

Re: I suspect a corrupt backup repository - what do I do?

Post by dellock6 »

Maybe the corruption is "inside" the guest VM, in which case CBT-based backups, whichever product they come from, have little ability to check what is inside the VMDK. Regular restore tests are really the only, and the most secure, way to test backups.

Luca.
helman
Influencer
Posts: 18
Liked: never
Joined: Oct 22, 2010 10:29 am
Contact:

Re: I suspect a corrupt backup repository - what do I do?

Post by helman »

If the corruption were in the guest VM, I would be able to detect it, wouldn't I?

At least the guest VM runs, and fsck reports it 100% OK.
dellock6
Veeam Software
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy

Re: I suspect a corrupt backup repository - what do I do?

Post by dellock6 »

Uhm, weird. Could it be a vmdk corruption?
zoltank
Expert
Posts: 229
Liked: 41 times
Joined: Feb 18, 2011 5:01 pm

Re: I suspect a corrupt backup repository - what do I do?

Post by zoltank »

helman wrote: "Actually, as we are a small shop, there's normally no need to automate this testing. I do recovery tests regularly, but not every day. That's how I found out that Veeam-restored VMs can't be exported with VMware Converter (which still baffles me).

I have another backup product (Acronis) that's so bad that I have to babysit it every day, and it still destroys data (hopefully I'll get rid of that mess soon).

I'm just curious how the bad data got into the backup, because once I find that out, it could be prevented in the future. And I still wonder why there's no simple tool (like fsck or "tar -t") to simply check repository consistency."

Sorry to change the subject slightly, but automating SureBackup to check your backup jobs would make a lot of sense precisely because you're a small shop. It's one less thing you need to do and remember, and it frees up your time. It'll even email you the results when it's done.
helman
Influencer
Posts: 18
Liked: never
Joined: Oct 22, 2010 10:29 am

Re: I suspect a corrupt backup repository - what do I do?

Post by helman »

dellock6 wrote: "Uhm, weird. Could it be a vmdk corruption?"

That would be bad. How do I check that?

I deleted the backup repository and started a new active full, and now the backup of this VM is OK.

I guess I need to schedule more active fulls to at least minimize this problem.
dellock6
Veeam Software
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy

Re: I suspect a corrupt backup repository - what do I do?

Post by dellock6 »

Well, you would have checked by trying a restore, but since you deleted the corrupted backup... :)

Active fulls are often good practice to break long incremental chains and create a new, completely independent backup set.

Luca.
chrisdearden
Veteran
Posts: 1531
Liked: 226 times
Joined: Jul 21, 2010 9:47 am
Full Name: Chris Dearden

Re: I suspect a corrupt backup repository - what do I do?

Post by chrisdearden »

dellock6 wrote: "Well, you would have checked by trying a restore, but since you deleted the corrupted backup... :)

Active fulls are often good practice to break long incremental chains and create a new, completely independent backup set."

Agreed - once every couple of months is probably a good idea. It helps prevent VBK fragmentation too.
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: I suspect a corrupt backup repository - what do I do?

Post by Gostev »

VBK fragmentation is certainly an interesting point to consider. In our environment, we observed a significant impact on full VM restore speed from a VBK containing a single Microsoft Exchange VM that had been transforming for almost a year, compared to a newly created one, specifically due to fragmentation. The performance difference was up to 10 times, if I remember correctly!

Granted, the storage we did the testing on was not very powerful (just a single spindle). Nevertheless, it was a little unexpected, and it first sent us on a wild goose chase for possible performance bottlenecks in the product's code, until someone tried to restore a newly created backup from the same storage.
helman
Influencer
Posts: 18
Liked: never
Joined: Oct 22, 2010 10:29 am
Contact:

Re: I suspect a corrupt backup repository - what do I do?

Post by helman »

Support directed me to the (as yet undocumented) Verify tool in Veeam 6.5. It doesn't help in my case, though: the backup is consistent in itself, it's just that the data inside is wrong.
I guess the only thing that would have helped in my case is some kind of "verify" mode that checks the contents of the VM against the backup snapshot immediately after the backup is done. Of course, performance-wise, such a feature would defeat all the benefits of incremental backups, but a few "spot tests" here and there might be an idea worth considering.
SureBackup is a really nice feature, but this kind of corruption might even slip under the radar of automated tests. My problem VM's restore booted fine; just some of the file systems in the VG were gone, and one of those happened to be important. If the corruption had been more subtle (with no metadata damaged), nobody would have noticed anything until the data was needed.
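A minimal sketch of the "spot test" idea in Python, assuming you can read the same block offsets from a source disk image and from a restored copy. `spot_check`, the 1 MiB block size, the sample count and the file names are hypothetical, and this is not a Veeam feature. The source must be read at the same point in time the backup captured (for example a powered-off VM, or the snapshot the backup used); comparing against a running VM would flag every block changed since. Unlike the per-block CRC, this compares against the source, so it can catch backup data that is internally consistent but wrong.

```python
import hashlib
import os
import random
import tempfile
from typing import List, Optional

BLOCK_SIZE = 1024 * 1024  # hypothetical 1 MiB comparison unit


def spot_check(source_path: str, restored_path: str,
               samples: int = 32, seed: Optional[int] = None) -> List[int]:
    """Hash randomly chosen blocks of two disk images; return indices that differ."""
    size = os.path.getsize(source_path)
    if os.path.getsize(restored_path) != size:
        raise ValueError("source and restored images differ in size")
    total_blocks = (size + BLOCK_SIZE - 1) // BLOCK_SIZE
    rng = random.Random(seed)
    picks = sorted(rng.sample(range(total_blocks), min(samples, total_blocks)))
    mismatches = []
    with open(source_path, "rb") as src, open(restored_path, "rb") as dst:
        for index in picks:
            src.seek(index * BLOCK_SIZE)
            dst.seek(index * BLOCK_SIZE)
            # Comparing digests mirrors how this would work if each side were hashed on a different host.
            if (hashlib.sha256(src.read(BLOCK_SIZE)).digest()
                    != hashlib.sha256(dst.read(BLOCK_SIZE)).digest()):
                mismatches.append(index)
    return mismatches


if __name__ == "__main__":
    # Demo: a 4-block image whose "restored" copy has one damaged byte in block 2.
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "source.img")
        dst_path = os.path.join(tmp, "restored.img")
        data = bytearray(os.urandom(4 * BLOCK_SIZE))
        with open(src_path, "wb") as f:
            f.write(data)
        data[2 * BLOCK_SIZE + 10] ^= 0xFF
        with open(dst_path, "wb") as f:
            f.write(data)
        print(spot_check(src_path, dst_path, samples=4))  # [2]
```

With fewer samples than blocks, an empty result only means the sampled blocks matched; that is the trade-off between a cheap spot test and a full comparison.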