Host-based backup of VMware vSphere VMs.
rawtaz
Expert
Posts: 100
Liked: 15 times
Joined: Jan 27, 2012 4:42 pm
Contact:

Question on backup integrity

Post by rawtaz »

Hi,

I spoke to a technician at Veeam a couple of days ago and asked him what happens if the backup storage experiences bit rot or similar corruption. In short, I asked him how Veeam deals with the situation where, say, you have a very big VMDK for a file server that is backed up, and there is some unnoticed corruption in the data areas of that file (the backup copy of the VMDK, i.e. Veeam's files).

More specifically, let's say the VM has a 2 TB VMDK with an NTFS volume in it. The volume holds a lot of data, as it belongs to a file server. Now let's assume that somewhere in this backed-up VMDK (i.e. in the backup storage) some corruption occurs: a byte or two are corrupted at the "place" of an important file on the file server. What means does one have to detect this when using Veeam to back up this VMDK? I am not sure if Veeam has any checksumming in its backup storage.

The technician said the following: What you have to work with is essentially the SureBackup, or as we specifically talked about, the Instant Restore way of testing it. He summed it up as "You can fire up the VM right from the backup storage, and what we do/Veeam does is to simply check if all parts of the VM mounts and can be used successfully. You can also run some additional checks inside the VM via scripting". This is all to my understanding as well, so far nothing weird.

However, when I asked about confirmation that Veeam won't "scan" or checksum the entire VMDK in the backup storage to find potential block/bit corruption further down the VMDK (i.e. in the pure data areas, which no ESXi/VM OS/application/whatever reads until the VM actually runs and someone asks for the file that is specifically stored at that place in the VMDK/filesystem), he didn't give a clear answer to this. He kept insisting that "if we can mount it and it starts up successfully, then everything is alright", which I think sounds very off.

I don't know the internals of VMDK or how Veeam works, but isn't it true that just firing up the VM using for example Instant VM Recovery and seeing that it runs is not an indicator that there isn't any corruption in the data areas of the corresponding VMDKs? I fail to see how without having checksumming and scanning the entire backed up data, Veeam would be able to determine that the backups are indeed intact.

The indirect question, apart from the above, is of course your opinion on what means there are to make sure that Veeam backups are indeed healthy. Is there something one can do using Veeam, or is it entirely up to features in the underlying storage to detect corruption (or alternatively doing a full check on the data inside the VM when doing test restores)? One could always run it atop ZFS, but there are definitely a lot of people who don't do that.

Please let me know if I need to clarify. Thanks! :)
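To make the question concrete, here is a minimal, storage-independent sketch of what "checksumming the backup storage" could look like: record a SHA-256 per backup file, then re-read the files later and compare. This is not a Veeam feature. The function names and the manifest layout are assumptions made for illustration, and the approach only makes sense for files that are not expected to change between the two passes (a job that rewrites its full backup file on each run would need a fresh manifest after every run).

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large backup files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(backup_dir: Path, manifest: Path) -> dict:
    """Record a checksum for every file currently in backup_dir.

    Keep the manifest outside backup_dir so it is not hashed itself.
    """
    sums = {p.name: sha256_of(p)
            for p in sorted(backup_dir.iterdir()) if p.is_file()}
    manifest.write_text(json.dumps(sums, indent=2))
    return sums


def verify_manifest(backup_dir: Path, manifest: Path) -> list:
    """Return the names of files that are missing or no longer match."""
    recorded = json.loads(manifest.read_text())
    damaged = []
    for name, expected in recorded.items():
        candidate = backup_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            damaged.append(name)
    return damaged
```

A check like this reads every byte, so it catches silent changes in rarely read data areas that merely booting the VM would never touch. It cannot say whether the data was already wrong when it was written.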
rawtaz
Expert
Posts: 100
Liked: 15 times
Joined: Jan 27, 2012 4:42 pm
Contact:

Re: Question on backup integrity

Post by rawtaz »

I should add that I've read the docs about the backup job setting "Enable automatic backup integrity checks", but it's not clear exactly what it does. It says it detects, for example, when something cannot be read, but I'm not sure whether that implies checksumming or just read failures. What I'm mostly thinking about here is the silent type of corruption that could happen on disk.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Question on backup integrity

Post by tsightler » 1 person likes this post

So my answer is pretty simple: certainly, if your storage experiences "bit rot", then there is the potential that a file might have corruption that Veeam is completely unaware of. If you feel this is a significant risk, then you must store your backups on media that is likely to be safe from this.

Note that this is not unique to Veeam, or really any backup software. Backups to tape have long experienced "bit rot" during storage. Disks are typically much more resilient to such issues than tape has ever been. Of course, since we are backing up at the VMDK level, it's also possible that your disk might already have file system corruption that is silent. For example, many years ago I lost a directory containing 30-40 documents that were backed up using traditional backups to tape. These were archived test reports that were several years old when it was discovered that they couldn't be opened. We had 6 months' worth of backups on tape, and we could restore the backups, but the files on the backups were still corrupt. It was obvious that the corruption had happened many months previously (perhaps years) but had simply gone undiscovered.

The best strategies to avoid this are the same as they pretty much always have been: basically, having more than one copy of your backups. Of course you can script SureBackup to run things like CHKDSK on such volumes and report any filesystem corruption at the MFT level, but this is still unlikely to detect "rotten bits" in the backup. This is also why we typically suggest running a "real" full backup at least every few months to help protect against this possibility. You can always perform this more often.

That being said, "bit rot" is much less likely with modern storage systems. All modern drives have ECC correction capability and will redirect blocks when there are failures, most RAID systems have background scan capabilities, and RAID6 provides added protections against single bit errors since there are two available checksums. Honestly, saving backups to reliable disk storage is likely to be far safer than tapes have ever been.
rawtaz
Expert
Posts: 100
Liked: 15 times
Joined: Jan 27, 2012 4:42 pm
Contact:

Re: Question on backup integrity

Post by rawtaz »

Thanks Tom, good summary there.

Indeed it's not unique to Veeam in any way. There is no checksumming going on in Veeam backups then, as I understand you.

I was mostly puzzled by what the technician said as I didn't think it made much sense, and I guess it didn't. Somehow we apparently failed to communicate.

Thanks again for clarifying and commenting.
chrisrd
Service Provider
Posts: 8
Liked: never
Joined: May 03, 2013 1:22 am
Full Name: Chris Dunlop
Contact:

Re: Question on backup integrity

Post by chrisrd »

Per the thread above, as of early 2012 Veeam provided no method to ensure the integrity of your backups, e.g. checksums of vbk, vbr files etc.

Can anyone confirm that this is still the case, e.g. in V7?

And, if this facility is still not provided, are there any plans to provide some method to guarantee backup integrity?
foggy
Veeam Software
Posts: 21139
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Question on backup integrity

Post by foggy »

Chris, no changes in this regard so far. You can also review some considerations regarding that here. To be 100% sure that your backups work, please use SureBackup functionality.

Btw, the "Enable automatic backup integrity checks" setting referred to above ensures physical data integrity of the full backup file.
veremin
Product Manager
Posts: 20414
Liked: 2302 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Question on backup integrity

Post by veremin »

Additionally, if for some reason you can’t perform test restores, you can use a small utility called the backup validator to check that the content of the backup file is unchanged.

Thanks.
rawtaz
Expert
Posts: 100
Liked: 15 times
Joined: Jan 27, 2012 4:42 pm
Contact:

Re: Question on backup integrity

Post by rawtaz »

v.Eremin wrote: Additionally, if for some reason you can’t perform test restores, you can use a small utility called the backup validator to check that the content of the backup file is unchanged.
If I click the above link I arrive at a message from this forum software saying "You are not authorised to read this forum". Is this expected? I'm curious about the utility, sounds useful.

Thanks!
veremin
Product Manager
Posts: 20414
Liked: 2302 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Question on backup integrity

Post by veremin »

Is this expected?
Fixed it already. Below is a short description of this tool, which appeared in the forum digest several months ago:
If you cannot perform test restores (for example, there is no infrastructure where offsite media is stored), you can use the backup validator support tool instead (included in 6.5 installation directory). This tool merely reads all blocks from the backup file, and ensures that each block's content matches the corresponding block's CRC that we include to ensure backup file modification or corruption is detected during restore. While the backup validator is a very basic tool, and does not perform full blown recoverability testing like the SureBackup functionality, it may still be useful in scenarios when you simply want to ensure that your backup file's contents are unchanged. For example, consider using this tool after transferring the backup files over a WAN, or after a storage disaster involving malfunctioning RAID controllers.
Hope this helps.
Thanks.
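The mechanism described in the digest text above (read every block, compare it against the CRC stored with it) can be sketched as follows. The record layout here is a made-up toy format, not the real backup file format, and `write_blocks` and `validate` are hypothetical names. The sketch only shows why a full read with per-block CRCs catches a flipped bit anywhere in the file.

```python
import io
import struct
import zlib

# Hypothetical record header: payload length and CRC-32, both little-endian.
HEADER = struct.Struct("<II")


def write_blocks(stream, blocks):
    """Store each block together with the CRC-32 of its content."""
    for block in blocks:
        stream.write(HEADER.pack(len(block), zlib.crc32(block)))
        stream.write(block)


def validate(stream):
    """Read every record and return the indexes of blocks that fail the check."""
    bad = []
    index = 0
    while True:
        header = stream.read(HEADER.size)
        if not header:
            break  # clean end of file
        if len(header) < HEADER.size:
            bad.append(index)  # truncated header
            break
        length, expected = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) != length or zlib.crc32(payload) != expected:
            bad.append(index)
        index += 1
    return bad
```

Note what this kind of check does and does not prove: it shows that the stored blocks are unchanged since they were written, not that the data was correct at the time of the backup.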
rawtaz
Expert
Posts: 100
Liked: 15 times
Joined: Jan 27, 2012 4:42 pm
Contact:

Re: Question on backup integrity

Post by rawtaz »

Now it works, thanks man!
veremin
Product Manager
Posts: 20414
Liked: 2302 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Question on backup integrity

Post by veremin »

You’re welcome. Should any additional help be needed, feel free to contact us. Thanks.
dr.Koen
Influencer
Posts: 19
Liked: 1 time
Joined: Jul 23, 2013 12:11 pm
Full Name: Koen Gryspeerdt
Contact:

[MERGED] Not possible to restore VM although SureBackup is O

Post by dr.Koen »

Hi

We use Veeam B&R 7.0.0.746 on a VMware cluster. One of the backup jobs, which runs daily, has 20 VMs and uses a reverse incremental scheme. This job has been running for months without any errors. I have a SureBackup job that I run manually now and then to verify whether there are any problems, and it shows no errors either. I can initiate an instant recovery or restore individual guest files without problems.

Recently I wanted to do a full restore of one of these VMs after an update that went wrong on that particular VM. To my surprise, the restore was not possible due to “Client error: Failed to decompress LZ4 block: Incorrect decompression result or length”. I tried several restore points, but all of them failed. Finally, I deployed a completely new VM and recovered what I needed from the Backup Browser (which was luckily not so difficult in this particular case).

After further investigation it turns out that none of the VMs in this backup job can be restored. There is definitely something wrong with the backup files. The backup repository is only a few months old (HP StoreEasy) and there are absolutely no indications of storage errors.
What I find very worrying here is that there was no indication at all that something is wrong with this backup set. I know that I should do an active full backup regularly and I will start doing that, but that is not enough to take my worries away. Surely, Veeam should be able to signal a problem like this.

Is there something that I can improve in our setup to avoid situations like this in the future?

Thanks in advance

Koen
Vitaliy S.
VP, Product Management
Posts: 27377
Liked: 2800 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Question on backup integrity

Post by Vitaliy S. »

Hi Koen,

You may want to enable data block verification of the backup files in the SureBackup job wizard; this will detect this kind of backup file corruption. This is a new feature we added in v7.

Thanks!
dr.Koen
Influencer
Posts: 19
Liked: 1 time
Joined: Jul 23, 2013 12:11 pm
Full Name: Koen Gryspeerdt
Contact:

Re: Question on backup integrity

Post by dr.Koen »

Thanks Vitaliy. I must have overlooked this.
Rumple
Service Provider
Posts: 81
Liked: 14 times
Joined: Mar 10, 2010 7:50 pm
Full Name: Mark Hodges
Contact:

[MERGED] : LZ4 Error during Restore

Post by Rumple »

I have a serious concern with the reliability of the backup chains at the moment.
I have an Exchange server with about 1.5 TB of data on it doing a nightly backup. Backups have all been completing successfully for months. I keep about 9 restore points or so (basically 2 fulls) and then some.
I wanted to use the backups as a seed for a new replica but kept getting errors, so I manually tried doing a restore. Unfortunately, the OS disk (0:0) is corrupt, with LZ4 errors when I try to restore. I can do an FLR no problem, but I can't restore the VMDK. The other 14 disks seem to be fine.

THAT'S A PROBLEM (case # 00523694), especially when the request from support is to perform an active full. Yeah... I either spend about 20 hours doing one drive at a time each time I run the backup, or I leave the system in snapshot mode for a week or more. You can imagine how that's going to go.

I can understand my backups failing with an LZ4 error if something has gone wrong... but how the hell are the backups completing successfully when they are corrupt? If nothing else, the next full backup should have started the chain of errors.
How many of my other backups are corrupt and there is no indication until I need them?
The check disk on the underlying storage comes back fine... and I am OK if it's a storage problem... but I should find that out during the next backup cycle, not the next restore.
veremin
Product Manager
Posts: 20414
Liked: 2302 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Question on backup integrity

Post by veremin »

I'm wondering what type of device you're using as a backup repository. The underlying storage might have experienced the notorious "bit rot" problem, which resulted in backup data corruption and went completely unnoticed by VB&R.

The best way to protect against such issues is to have more than one copy of your backups and to test the recoverability of the backup data. For instance, SureBackup should be able to catch such problems.

Anyway, please keep working with the support team. They will be able to shed more light on the root cause of this behavior.

Thanks.
Rumple
Service Provider
Posts: 81
Liked: 14 times
Joined: Mar 10, 2010 7:50 pm
Full Name: Mark Hodges
Contact:

Re: Question on backup integrity

Post by Rumple »

The backup server is a Dell server with a PERC 6 RAID controller. There are no issues with the drives according to chkdsk.
I've worked with support and they found the backup validator for me. I am running it against all my jobs now, and so far it appears I have multiple jobs with bad backups.
I am OK with the explanation that something on the storage is doing it, I really am... but someone wrote a validator tool, so why are you not using it as part of the backup process?

However, that still doesn't fix the problem that Veeam is happily doing synthetic rollups of my jobs and happily completing each backup job, essentially on top of useless backups.

I'm sorry, but that's unacceptable. Sure, I can spend my time doing restores of every job every night to make sure they are working (which is how I found the issue, doing a monthly test), but shouldn't the program be able to pick that up at some point?

Multiple copies of my backups wouldn't have helped, now would they, since all I would have done is replicate the error to my other copy (garbage in, garbage out).

The fact of the matter is...everyone should be very concerned about the state of their backups if you can only find the problem during a restore or a manual validation of each job.
Vitaliy S.
VP, Product Management
Posts: 27377
Liked: 2800 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Question on backup integrity

Post by Vitaliy S. »

I agree that multiple backup copies might not help here, but you can use SureBackup and enable data block verification of the backup files in the SureBackup job wizard; this will detect this kind of backup file corruption. This is a new feature we added in v7.
davidb1234
Expert
Posts: 162
Liked: 15 times
Joined: Nov 15, 2011 8:47 pm
Full Name: David Borden
Contact:

Re: Question on backup integrity

Post by davidb1234 »

I hate to say it but I recently had the same issue.

A forever reverse incremental backup of our production SQL VM was reporting success. However, when we booted this VM from the backup file, one of the databases was corrupted. This database was fine in production and corrupted only in the backup file.

Running a full active backup with CBT enabled still produced a corrupted backup file. It wasn't until we disabled CBT and did a backup that the database was not corrupted in the backup file. We ultimately reset CBT data and now appear to be fine.

However, it is EXTREMELY SCARY that Veeam can go for weeks or months thinking it is backing up fine while there is corruption in the file, and the only way to notice is to literally validate the data somehow. Sometimes this is easy (DBCC CHECKDB); sometimes it is much harder (file system or Exchange).
davidb1234
Expert
Posts: 162
Liked: 15 times
Joined: Nov 15, 2011 8:47 pm
Full Name: David Borden
Contact:

Re: Question on backup integrity

Post by davidb1234 »

Vitaliy S. wrote: I agree that multiple backup copies might not help here, but you can use SureBackup and enable data block verification of the backup files in the SureBackup job wizard; this will detect this kind of backup file corruption. This is a new feature we added in v7.
This feature did not detect the corruption in our backup.

Running an active full also did not create a valid backup file.

The only thing that resolved our corruption was disabling CBT in the backup job or resetting CBT data.

At no point was Veeam able to tell us that the file was corrupt or unusable. We had to stumble upon the issue when we needed to restore the data and found it to be bad.

We have very good fiber channel storage end to end so our storage cannot be to blame.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Question on backup integrity

Post by tsightler » 1 person likes this post

It sounds like CBT was to blame in this case. I've certainly seen cases where it appears that CBT data doesn't change from one day to the next, which results in corruption of the backup. This isn't really something Veeam could detect, because the problem is that the CBT mechanism is not returning the complete/correct list of blocks to be backed up. I'm not sure what exactly leads to this issue, but I've seen it in two different environments.

However, SureBackup can indeed be used to detect this, but it would require creating a custom script that validates the databases as part of the verification process. By default it simply checks that the DB starts correctly. The same could be done with Exchange or, perhaps to a limited extent, files.
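As a rough idea of what such a custom validation script might look like for SQL Server, the sketch below runs `DBCC CHECKDB` through `sqlcmd` and reports failure through its return value. Everything about how it would be wired into a SureBackup job is an assumption: the function names are hypothetical, the server address and database names are supplied by the caller, Windows authentication (`-E`) is assumed, and treating any output as a failure is a deliberately conservative choice. Database names are assumed to be trusted input.

```python
import subprocess
import sys


def checkdb_command(server: str, database: str) -> list:
    """Build a sqlcmd invocation that runs DBCC CHECKDB against one database."""
    query = f"DBCC CHECKDB ([{database}]) WITH NO_INFOMSGS, ALL_ERRORMSGS"
    # -S: server, -E: Windows authentication,
    # -b: return a non-zero exit code on error, -Q: run the query and exit
    return ["sqlcmd", "-S", server, "-E", "-b", "-Q", query]


def database_is_consistent(server: str, database: str,
                           run=subprocess.run) -> bool:
    """True only if sqlcmd exits with 0 and prints nothing.

    With NO_INFOMSGS a clean check is expected to be silent, so any
    output is treated as a problem (an assumption of this sketch).
    """
    result = run(checkdb_command(server, database),
                 capture_output=True, text=True)
    return result.returncode == 0 and not result.stdout.strip()


def main(argv, run=subprocess.run) -> int:
    """argv is [server, database, ...]; returns 0 if every database passes."""
    server, databases = argv[0], argv[1:]
    failed = [db for db in databases
              if not database_is_consistent(server, db, run=run)]
    for db in failed:
        print(f"DBCC CHECKDB reported problems in {db}", file=sys.stderr)
    return 1 if failed else 0
```

Used as a standalone script, it would end with `sys.exit(main(sys.argv[1:]))` so that the calling job can treat a non-zero exit code as a failed verification. Unlike a check that the database is merely online, this reads the database pages themselves, at the cost of a much longer run time on large databases.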
Gostev
Chief Product Officer
Posts: 31814
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Question on backup integrity

Post by Gostev »

A SureBackup script can validate just about anything. Some customers have shared really cool scripts with us, and I believe we are planning to enhance the default application verification scripts in v8 based on them (at least for SQL). Of course, you don't have to wait: just create your own script that validates whatever you feel is necessary.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Question on backup integrity

Post by tsightler »

Any product can back up garbage if the OS from which the data is being read doesn't provide good data. Many years ago we had an issue where some important files were discovered to be corrupt on a file server. Unfortunately, while these files were important, they were not accessed particularly often, and when it was discovered they were unreadable we found that our file-level backups going back months were still damaged. We were able to recover some of the oldest files from archive tapes, but most of the newer files that were only on monthly backups were still corrupt. It appeared to be corruption within the NTFS filesystem, as even the metadata on the files was damaged. However, the OS would copy the files, bad metadata and all, so the backup software happily backed up the corrupt files and metadata information. Verifying integrity of files is quite difficult.

In the case of Veeam, we are trusting that VMware provides us valid information about the blocks to be backed up. If for some reason this isn't the case, which certainly sounds like it was the problem if resetting CBT fixed the issue, then it would be very difficult for Veeam to detect this. Even a backup validation wouldn't check this, and it would impact any product that uses the VMware APIs for backup. SureBackup technology with custom scripting may very well be the only solution on the market that offers any method of detecting this type of issue.
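A toy model may help show why a block-level validation cannot see this class of problem. The sketch below is not how Veeam or CBT is implemented. It assumes a fixed 4-byte block size, a source that does not change length, and a change tracker that simply hands over a set of block indexes. If the tracker omits a changed block, the incremental backup silently keeps the old content, yet every stored block still matches its CRC.

```python
import zlib

BLOCK = 4  # toy block size in bytes


def blocks(data: bytes) -> list:
    """Split data into fixed-size blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def incremental(previous_backup: list, source: bytes, changed: set) -> list:
    """Copy only the blocks that the change tracker reports as changed."""
    current = blocks(source)
    return [current[i] if i in changed else previous_backup[i]
            for i in range(len(current))]


def crc_table(backup: list) -> list:
    """Per-block CRCs, computed at the time the backup is written."""
    return [zlib.crc32(block) for block in backup]


def passes_crc_check(backup: list, crcs: list) -> bool:
    """Storage-level check: does each stored block still match its CRC?"""
    return all(zlib.crc32(b) == c for b, c in zip(backup, crcs))


day1 = b"AAAABBBBCCCCDDDD"
day2 = b"AAAAXXXXCCCCYYYY"  # blocks 1 and 3 changed since day1

full = blocks(day1)
good = incremental(full, day2, changed={1, 3})  # tracker reports both blocks
bad = incremental(full, day2, changed={1})      # tracker misses block 3
```

Here `good` reassembles to `day2`, while `bad` still contains the stale `DDDD` block, and `passes_crc_check(bad, crc_table(bad))` is nevertheless true: the backup is internally consistent but does not match the source. Only a check that understands the content, such as a database consistency check run inside the restored VM, can notice the difference.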
davidb1234
Expert
Posts: 162
Liked: 15 times
Joined: Nov 15, 2011 8:47 pm
Full Name: David Borden
Contact:

Re: Question on backup integrity

Post by davidb1234 »

Gostev wrote: A SureBackup script can validate just about anything. Some customers have shared really cool scripts with us, and I believe we are planning to enhance the default application verification scripts in v8 based on them (at least for SQL). Of course, you don't have to wait: just create your own script that validates whatever you feel is necessary.

Support provided me a Sqlchecker script. Unfortunately it would not have caught these issues as the database was corrupted but still online/mounted. Only a manual DBCC CHECKDB caught the corruption.

The SQLchecker script that Veeam support is handing out as a fix only checks the databases to see that they are all online and mounted, not whether there is block corruption in them. The only way to do that, as far as I know, is manually.

It appears that CBT is more prone to corruption than file-level backups, so identifying these bad blocks and replacing or repairing them is very important. Veeam and VMware need to start working on this. My perception is that this is happening more often than people realize, and that they just don't notice it until they need the data on those backups and find out that CBT was corrupted and therefore the data is corrupted.

I've never run into corrupted backups in my life until I started using products that bank on CBT, like Veeam. Now it comes up from time to time, and it can really ruin your day.
larry
Veteran
Posts: 387
Liked: 97 times
Joined: Mar 24, 2010 5:47 pm
Full Name: Larry Walker
Contact:

Re: Question on backup integrity

Post by larry »

Vitaliy S. wrote: I agree that multiple backup copies might not help here, but you can use SureBackup and enable data block verification of the backup files in the SureBackup job wizard; this will detect this kind of backup file corruption. This is a new feature we added in v7.
Is this the checkbox "Enable automatic backup integrity checks"?
veremin
Product Manager
Posts: 20414
Liked: 2302 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Question on backup integrity

Post by veremin »

I believe Vitaliy was talking about the "Validate consistency of virtual machines' backup files" option, which can be found in the settings of the SureBackup job (SureBackup Job -> Settings -> Job validation). Thanks.