bhendin
Influencer
Posts: 24
Liked: 1 time
Joined: Oct 10, 2013 3:37 am
Full Name: Ben H
Contact:

Best strategy for fault tolerance with failed backups.

Post by bhendin »

I had a prior problem with Veeam not being able to restore a large file.

I posted the basic issue here:
http://forums.veeam.com/veeam-backup-re ... 26249.html

In short I was getting:

Code: Select all

Exception from server: Failed to decompress LZ4 block: Incorrect decompression result or length (result: '-865272', expected length: '1048576').
                              Unable to receive file.
This seemed to be an error in the actual file itself. In the end, I was not able to use Veeam to restore this data, and fortunately was able to piece together the majority of the data from other sources.
(https://www.veeam.com/kb1795)

Since Veeam performs an integrity check on its own backups, I can only assume that Veeam believed this file to be a healthy backup file. As such, it is possible that a disk issue corrupted this backup at some point after verification.

What I want to prevent is this happening again, so I am looking to do one or both of the following:

1) Validate that backup files are good without having to actually restore them. I can't believe that an actual restore is required to validate the ability to properly read/decompress a file - there must be a way to stream this through without actually restoring. How can this be done?

2) I need a good strategy to prevent this from happening again - or at least mitigate this as much as possible with the existing infrastructure we have.

At present we have two on-site storage locations that can each hold a full backup of all VMs. Let's call them StorageA and StorageB. For ease of discussion, let's just say they are both 1TB and that a standard full backup of all data is approximately 500GB.

Ideally, we don't want to use both StorageA and StorageB fully for these backups, but approximately half.

I was thinking that I could do a 500GB backup to StorageA and then do a copy job to StorageB. However, this would seemingly not prevent the issue where the file itself is corrupt - I believe the copy job would simply continue to copy the corrupt file.

Then I thought of setting up two independent backup jobs on staggered days: one that backs up to StorageA and the other to StorageB. However, I'm not sure how, say, an incremental from one job is going to affect the next incremental from the other, since I'm not sure how Veeam marks the data that has already been backed up.

The other possibility is keeping more than one full backup from a single job stored at any given time. Doing this, I could utilize the entirety of StorageA (but of course this won't mitigate a full disk failure).
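
For a rough sense of how these options fit the space described above, here is a quick back-of-the-envelope calculation. This is a throwaway Python sketch; the daily change rate is my own assumption, not a figure from this thread.

Code: Select all

# Rough sizing for the three options above, using the figures in this thread
# (1 TB per storage location, ~500 GB per full backup). The daily incremental
# size assumes a ~5% change rate, purely for illustration.
FULL_GB = 500
DAILY_INC_GB = 25      # assumption, not from the thread
STORAGE_GB = 1000

options = {
    "backup to StorageA + copy job to StorageB (per side)": FULL_GB + 6 * DAILY_INC_GB,
    "two staggered jobs, one per storage (per side)":       FULL_GB + 3 * DAILY_INC_GB,
    "two fulls from one job, all on StorageA":              2 * FULL_GB + 6 * DAILY_INC_GB,
}
for name, used in options.items():
    print(f"{name}: ~{used} GB of {STORAGE_GB} GB ({used / STORAGE_GB:.0%})")
Under these assumptions the first two options leave plenty of headroom, while keeping two fulls plus a week of incrementals on StorageA alone would not fit in 1TB.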

I realize that there are best practices, but I'm confined by the available resources.

What technique is best so that I am more likely to recover from another source if I have the above issue again?

Thanks.
Gostev
Chief Product Officer
Posts: 31455
Liked: 6646 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by Gostev »

bhendin wrote:What I want to prevent is this happening again, so I am looking to do one or both of the following:

1) Validate that backup files are good without having to actually restore them. I can't believe that an actual restore is required to validate the ability to properly read/decompress a file - there must be a way to stream this through without actually restoring. How can this be done?

2) I need a good strategy to prevent this from happening again - or at least mitigate this as much as possible with the existing infrastructure we have.
In fact, these are exactly the requirements that our SureBackup functionality is designed to address - efficiently and in a fully automated fashion :wink:
bhendin
Influencer
Posts: 24
Liked: 1 time
Joined: Oct 10, 2013 3:37 am
Full Name: Ben H
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by bhendin »

Gostev wrote:In fact, these are exactly the requirements that our SureBackup functionality is designed to address - efficiently and in a fully automated fashion :wink:
Thanks for the sales pitch. I think it's a bit strange that one has to buy a program to verify that the backups taken and stored by another program are in good order.
That to me seems like something the main program should do itself.
Additionally, I still don't see why it is necessary to mount the backup, or indeed how a mount would necessarily indicate that the entire backup is good. It would seem that streaming the restore, but actually discarding the data on destination disk (or not actually committing the write) would be more efficient/definitive. The reason I can't restore to test isn't the time - it is that there isn't enough destination space to restore to.

Nonetheless, I'll accept that there is no solution short of paying you more money to accomplish #1. That's not going to happen for most, if not all, of my customers who are using the product.

What I need therefore is the best backup job strategy as discussed in #2 that can mitigate the risk.

Thanks.
veremin
Product Manager
Posts: 20270
Liked: 2252 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by veremin »

Not sure why you call SureBackup a side product. It is not a side product, but rather functionality included in Veeam Backup & Replication. Or are you talking about product editions?

And since this functionality uses vPower NFS Service, VMs are run directly from backup files without being restored to the production datastore.

As to the best strategy, we've always recommended the following approach: backup jobs (primary backup) -> SureBackup (data validation) -> backup copy job (secondary location).

Thanks.
Amyd
Influencer
Posts: 11
Liked: 1 time
Joined: Dec 14, 2012 6:42 pm
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by Amyd »

SureBackup is great for some usage scenarios; however, it can be extremely dangerous to rely on it if you have large VMs which are used as data stores (file servers, etc.). It is perfectly possible for the VM to start just fine in SureBackup and pass all the relevant tests, and *STILL* be corrupt, if the corrupted block only stored some file somewhere that isn't touched by default by the OS (possibly not even on the same virtual disk, since most will use a two-disk layout for file servers). As far as I understood from the OP, this is actually what happened to him, and I don't believe SureBackup would necessarily have caught this in their case.

So, it would be great if Veeam marketing would put a bit more nuance on what SureBackup can and, more importantly, can't catch.

But I believe that Veeam is well aware that there are cases where SureBackup does not work, since the new Health Check Option for primary backup jobs in V9 looks like exactly the right tool to identify such problems.
Shestakov
Veteran
Posts: 7328
Liked: 781 times
Joined: May 21, 2014 11:03 am
Full Name: Nikita Shestakov
Location: Prague
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by Shestakov »

Of course, the bigger the VM and its backup, the more likely data corruption is, but SureBackup tests only the main scenarios.
The Health Check option for backup jobs is intended primarily as an analog of SureBackup for those on Standard edition, but using them jointly also makes sense.
Thanks for the feedback, by the way; we will take it into account.
bhendin
Influencer
Posts: 24
Liked: 1 time
Joined: Oct 10, 2013 3:37 am
Full Name: Ben H
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by bhendin » 1 person likes this post

v.Eremin wrote:Not sure why you call SureBackup a side product. It is not a side product, but rather functionality included in Veeam Backup & Replication. Or are you talking about product editions?

And since this functionality uses vPower NFS Service, VMs are run directly from backup files without being restored to the production datastore.

As to the best strategy, we've always recommended the following approach: backup jobs (primary backup) -> SureBackup (data validation) -> backup copy job (secondary location).

Thanks.
You're correct. However I must say that the marketing/documentation of this feature leaves a lot to be desired.

Following the link from Gostev brings you to a page for the features of "Veeam Availability Suite v8."
If you look on the feature page for just "Veeam Backup & Replication" there is no mention of SureBackup. This makes it look like it is a feature only available in Availability suite.

I eventually found this, which describes how to set it up. Additionally, since the SureBackup job is hidden until you set up the app groups and lab, it is not very intuitive at all that the feature exists. Personally, I think the icon should always show, and when you click on it, it should tell you that you need to set up the labs first.

So, I apologize to Gostev for the "sales pitch" comments - the feature is indeed there. However, please take note of my comments regarding your marketing and GUI implementation.

I have some more questions/comments which I will include in my next response.
bhendin
Influencer
Posts: 24
Liked: 1 time
Joined: Oct 10, 2013 3:37 am
Full Name: Ben H
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by bhendin »

Amyd wrote:SureBackup is great for some usage scenarios; however, it can be extremely dangerous to rely on it if you have large VMs which are used as data stores (file servers, etc.). It is perfectly possible for the VM to start just fine in SureBackup and pass all the relevant tests, and *STILL* be corrupt, if the corrupted block only stored some file somewhere that isn't touched by default by the OS (possibly not even on the same virtual disk, since most will use a two-disk layout for file servers). As far as I understood from the OP, this is actually what happened to him, and I don't believe SureBackup would necessarily have caught this in their case.

So, it would be great if Veeam marketing would put a bit more nuance on what SureBackup can and, more importantly, can't catch.

But I believe that Veeam is well aware that there are cases where SureBackup does not work, since the new Health Check Option for primary backup jobs in V9 looks like exactly the right tool to identify such problems.
Yes, Amyd is correct: this is exactly what happened. The file server data disk (which is configured as independent) was corrupted and could not be restored.

I'm looking at SureBackup, and I'm not sure it fits my needs. Firstly, it looks like an overly complex solution for verifying a backup. I really don't want to configure virtual networks and application tests.
What is needed is a way to verify that the data read from the source disk makes it over the network, is compressed, and is written intact to the destination disk. I'm pretty sure that Veeam already does most of these things, with integrity checks along the way. If not, I'm not sure why a modern backup product can't do these things. Computing file integrity isn't something new; one just has to make sure it is being done. Sure, it takes longer, but that's why such checks are made optional.

Additionally, it appears that SureBackup works by restoring the VM to a virtual lab on an ESX box. To be quite honest, I'm not sure how this differs from any of the other "live" restore options available (e.g. mounting on an NFS store, configuring a replication job, etc.). It all seems extremely similar. It does appear that SureBackup provides automated testing of services and ports, but as Amyd states, this doesn't verify that the entire server's file integrity is valid (or at least as valid as the source server was at the time of backup).

Also, as stated, I don't have the space to duplicate large data volumes just for validation. One shouldn't have to bring a system online to verify that the file content of the server matches what was originally taken from the source. If Veeam can validate 100% that a backup is good at the time of backup, why can't it also continue to keep tabs on those backups to verify that further disk events don't corrupt them?
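
To illustrate the "keep tabs on verified backups" idea (this is not a Veeam feature - just a minimal Python sketch with placeholder paths, and it only makes sense for backup files Veeam is no longer writing to):

Code: Select all

import hashlib, json, pathlib

REPO = pathlib.Path(r"D:\Backups\StorageA")      # hypothetical repository path
STATE = pathlib.Path(r"D:\Backups\hashes.json")  # where known-good hashes are kept

def sha256(path, chunk=8 * 1024 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

known = json.loads(STATE.read_text()) if STATE.exists() else {}

for f in sorted(REPO.glob("*.vbk")):
    digest = sha256(f)
    if f.name not in known:
        known[f.name] = digest   # first sighting: record the hash while the file is known-good
    elif known[f.name] != digest:
        print(f"POSSIBLE BIT ROT: {f.name} no longer matches its recorded hash")

STATE.write_text(json.dumps(known, indent=2))
Run on a schedule, something like this would at least flag a backup file that changed on disk after it was verified.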

I do note, however, that the SureBackup job has the option "Validate entire virtual disk contents (detects silent data corruption)." Will this option do anything like what I need? Remember, I need this validation for the (secondary) data disks, not just the (primary) system disk.

Anyway, please consider the critique above as constructive feedback on customer needs. I realize that my resources somewhat constrain what I can do - but expecting customers to effectively double/triple/quadruple their storage for the "best practices" is not feasible for many.

SureBackup aside - I'm still awaiting some guidance on how to best configure some type of duplicate backup to minimize loss due to bit rot.
I will once again post a strategy in the next post (to break this up) for comments.

thanks
bhendin
Influencer
Posts: 24
Liked: 1 time
Joined: Oct 10, 2013 3:37 am
Full Name: Ben H
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by bhendin »

So - back to strategies:

My initial thoughts are to do the following:

1) Job1 - Incremental backup of all systems to StorageA with synthetic fulls on Sunday. Run on M,W,F,Su
2) Job2 - Incremental backup of all systems to StorageB with synthetic fulls on Saturday. Run on T,R,Sa

I realize there is the possibility of losing one day of data if the last backup is corrupted and I need to resort to the alternate job. I can live with that.

My question is will these even work? Can you have two separate jobs backing up the same systems incrementally? Or will one interfere with another?
If this won't work (or if there is a better solution) can someone recommend?

Thanks.
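
As a quick sanity check of this rotation (a throwaway Python sketch, not Veeam configuration), the following prints how old the newest restore point on each storage location would be at the end of any given day:

Code: Select all

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
JOB_A_DAYS = {"Mon", "Wed", "Fri", "Sun"}   # Job1 -> StorageA
JOB_B_DAYS = {"Tue", "Thu", "Sat"}          # Job2 -> StorageB

def age_of_last_run(day_index, run_days):
    """Days since the given job last ran, counting back from day_index."""
    return next(back for back in range(7)
                if DAYS[(day_index - back) % 7] in run_days)

for i, day in enumerate(DAYS):
    a = age_of_last_run(i, JOB_A_DAYS)
    b = age_of_last_run(i, JOB_B_DAYS)
    print(f"{day}: StorageA is {a} day(s) old, StorageB is {b} day(s) old")
This shows the fallback copy on the other storage is one day behind on most days and two days behind on a Monday (StorageB last written on Saturday).
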
Amyd
Influencer
Posts: 11
Liked: 1 time
Joined: Dec 14, 2012 6:42 pm
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by Amyd »

Somehow I had missed the option for CRC-checking backup contents for a SureBackup Job :oops:. So I take it back: with that option on, it should catch even such corruption errors.

Anyhow, it's great that Veeam will be providing this option even for Standard users in v9, because it is definitely a dangerous issue.

By the way, Backup Copy jobs should catch such errors when the main backup becomes corrupted, at least if they are not just copying the incremental changes. Whenever a Backup Copy job copies something from the primary repository to the secondary, it seems to decompress and then recompress the data; at least in our system it caught such a silent corruption in the primary backups.
bhendin
Influencer
Posts: 24
Liked: 1 time
Joined: Oct 10, 2013 3:37 am
Full Name: Ben H
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by bhendin »

Amyd wrote:Somehow I had missed the option for CRC-checking backup contents for a SureBackup Job :oops:. So I take it back: with that option on, it should catch even such corruption errors.

Anyhow, it's great that Veeam will be providing this option even for Standard users in v9, because it is definitely a dangerous issue.

By the way, Backup Copy jobs should catch such errors when the main backup becomes corrupted, at least if they are not just copying the incremental changes. Whenever a Backup Copy job copies something from the primary repository to the secondary, it seems to decompress and then recompress the data; at least in our system it caught such a silent corruption in the primary backups.
Well that's good news on both fronts. I am leaning more towards the Backup Copy job then, since we have more space outside our ESX storage environment.
Is there any confirmation from the Veeam team that a backup copy does a re-verification and to what level?

If so, what is the best method to get this benefit? Do I have to do a backup copy only on a standard full backup, or will it work on incremental/synthetic?

thanks.
Shestakov
Veteran
Posts: 7328
Liked: 781 times
Joined: May 21, 2014 11:03 am
Full Name: Nikita Shestakov
Location: Prague
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by Shestakov »

bhendin wrote:My question is will these even work? Can you have two separate jobs backing up the same systems incrementally? Or will one interfere with another?
If this won't work (or if there is a better solution) can someone recommend?
Yes, it will, but expect double the amount of backup data. Also, if you don't trust SureBackup and want extra assurance, you can use an active full rather than a synthetic one.
Thanks!
bhendin
Influencer
Posts: 24
Liked: 1 time
Joined: Oct 10, 2013 3:37 am
Full Name: Ben H
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by bhendin »

Shestakov wrote: Yes, it will, but expect double the amount of backup data. Also, if you don't trust SureBackup and want extra assurance, you can use an active full rather than a synthetic one.
Thanks!
1) My understanding is that synthetic merges the incrementals into a Full when instructed, so that backups take less time and restore can be done from a single file. Is this correct?


2) Apart from an active full just taking longer to back up, why would you say it is inherently "safer" than synthetics? Isn't a large active full file just as much at risk of corruption as smaller incrementals?

3) Can you provide any information hinted at earlier that a copy job will actually re-verify the backup files?

4) Can you confirm that SureBackup requires space on an ESX datastore to perform? Or can we do the SureBackup test to Veeam NFS storage off the backup infrastructure? If the latter, can you confirm that the verification options listed above for SureBackup truly will confirm the data integrity of the secondary data drives?

thanks.
VladV
Expert
Posts: 224
Liked: 25 times
Joined: Apr 30, 2013 7:38 am
Full Name: Vlad Valeriu Velciu
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by VladV »

bhendin wrote: Additionally, it appears that SureBackup is working by restoring the VM to a virtual lab on an ESX box. To be quite honest, I'm not sure how this is different than any of the other "live" restore options available (e.g. mounting on an NFS store, configuring a replication job, etc) It all seems extremely similar. It does appear that the SureBackup provides automated testing of services and ports - but as Amyd states this doesn't verify that the entire server's file integrity is valid (or at least as valid as the source server was at time of backup).

Also, as stated - I don't have the space to be duplicating large data volumes just for validation. One shouldn't have to bring a system online to verify that the file content of the server match what was originally taken from the source. If Veeam can validate 100% that a backup is verified to be good at time of backup, why can't it also continue to keep tabs on those backups to verify that further disk events don't corrupt them?
One thing to mention about SureBackup that I think you got wrong is that you do not need any additional space to run a backed-up VM. As with Instant VM Recovery, Veeam registers and runs the VM right from the backup file. The only resources needed are ESXi host RAM and some CPU.
Shestakov
Veteran
Posts: 7328
Liked: 781 times
Joined: May 21, 2014 11:03 am
Full Name: Nikita Shestakov
Location: Prague
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by Shestakov »

1) Yes, a restore can be done from a single file, just as with an active full backup. The difference is that an active full is taken directly from the production host, while a synthetic full is built from the previous full and the incrementals.

2) In general, an active full is considered safer because, if one of the incremental files contains a corrupted data block, that block can be carried into the synthetic full (see the toy sketch after this list).

3) I believe we are talking about this link. And here is a link to a related discussion.

4) Yes, SureBackup verification requires additional resources. But in the case of vPower NFS, performance can be much lower because, instead of a direct connection between the ESX(i) host and the backup repository, the connection is split into two parts: ESX(i) host to NFS server, and NFS server to backup repository.
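
To illustrate point 2) with a toy model (this is not Veeam's actual file format, just the gist): a synthetic full is assembled from blocks already sitting on the repository, so a block that went bad there is carried forward, while an active full re-reads every block from the production VM.

Code: Select all

# Toy model only: blocks are strings, backup files are lists/dicts of blocks.
source_vm     = ["A", "B2", "C2", "D"]   # current state of the VM
previous_full = ["A", "B",  "C",  "D"]   # full already on the repository
incrementals  = [{1: "B2"},              # block 1 changed, stored fine
                 {2: "??corrupt??"}]     # block 2 changed, but the stored copy rotted

def synthetic_full(full, incs):
    """Build a new full purely from repository files: old full + changed blocks."""
    blocks = list(full)
    for inc in incs:
        for idx, data in inc.items():
            blocks[idx] = data
    return blocks

def active_full(vm):
    """Read every block from the production VM again."""
    return list(vm)

print(synthetic_full(previous_full, incrementals))  # ['A', 'B2', '??corrupt??', 'D']
print(active_full(source_vm))                       # ['A', 'B2', 'C2', 'D']
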
bhendin
Influencer
Posts: 24
Liked: 1 time
Joined: Oct 10, 2013 3:37 am
Full Name: Ben H
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by bhendin »

I'm unsure why I am unable to get a direct answer to most of these questions.

Apart from "gut" feelings - how is an active full safer? If I have one 1GB file stored on a disk, or ten 100MB files stored on the same disk, one bad block in any of my 10 files will ruin the entire backup. But by the same token, I am just as likely to get one bad block in my 1GB file, as both take up the same amount of space. There has to be something else fundamentally different about how synthetics work that makes them riskier than this.
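
A quick numerical check of that intuition (assuming each 1 MB block has the same small, independent chance of going bad on disk, which is an idealised model):

Code: Select all

# Probability that at least one block is bad, for the same total data split
# one way or the other. p is an arbitrary assumed per-block failure rate.
p = 1e-9
layouts = {"one 1 GB file": 1024, "ten 100 MB files": 10 * 100}
for label, blocks in layouts.items():
    print(f"{label}: {1 - (1 - p) ** blocks:.3e}")
Under this model the split makes essentially no difference; the argument for active fulls made earlier in the thread is about a corrupt repository block being re-used when the next synthetic full is built, not about file count.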

The link you provided to the discussion on the health check in the Backup Copy job states: "Backup Copy job has built-in integrity check that ensure the copy is bit-identical."
So all that does is verify that the destination matches the source. If the source is somehow corrupted, that means the copy job won't validate that the backup can actually be restored (since the source backup is already corrupt).

I've got a really simple question here that a backup program the caliber of what Veeam advertises should be able to address.

What are the methods i can use to ensure that my backup can be restored without having to actually restore the backup?

SureBackup looks like it isn't going to be an option, as it is way too complex for simple verification. The idea that I should actually need to boot a backup to verify it is good is ludicrous. We are talking bits and bytes here. If the data is good on the original VM, and that data is backed up, verified, and continually re-verified against corruption, then we can be sure that what was a 1 is still a 1 and what was a 0 is still a 0. As long as those are in place, it should restore as original.

When I say it is overly complex: this environment consists of multiple virtual switches which are all routed together by a software (VM-based) router. I have spent over an hour looking through documentation attempting to figure out how to properly validate a server that is on a different subnet/switch than my Veeam server. I'm not sure this is even possible - and if it is, it seems I would probably have to manually hack the "Virtual Lab" switch to get my routing device onto it.
To do all of this just to verify that my backup can be restored is over the top.

I can't justify to my clients investing in software that can't offer some degree of reliability without enormous expenditures to double or triple their storage requirements.

What is the solution?
VladV
Expert
Posts: 224
Liked: 25 times
Joined: Apr 30, 2013 7:38 am
Full Name: Vlad Valeriu Velciu
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by VladV »

bhendin wrote: SureBackup looks like it isn't going to be an option, as it is way too complex for simple verification. The idea that I should actually need to boot a backup to verify it is good is ludicrous. We are talking bits and bytes here. If the data is good on the original VM, and that data is backed up, verified, and continually re-verified against corruption, then we can be sure that what was a 1 is still a 1 and what was a 0 is still a 0. As long as those are in place, it should restore as original.
From what I know, Veeam ensures that your backup is 1-to-1 and 0-to-0 while it is transporting the data, so you know that each sector written is true. But after the data is on disk and the source has its snapshot removed, what is there to compare it to? How can backup software guarantee that the backup you just made is in a good state and not affected by file system or HDD corruption? The term "continually verified", in my experience, is not found in any backup software in relation to its restore points.

For added assurance that your VM was quiesced properly and that your apps were captured in a consistent way, there is SureBackup. With this tool you know exactly what state your VM will be in if you recover it.

For backup integrity there is the Backup file integrity check, an option in Surebackup.

If I remember correctly, there should also be a standalone tool that you can run manually to check the files against silent data corruption. If someone knows exactly, please post; I will too if I find the relevant info.

But for what you are asking, continually verified restore points, there is no feasible solution. Running integrity checks constantly against restore points will take a lot of time and resources. It is better to ensure that you have multiple copies (3-2-1 rule) and that your storage media (array) is in good health.

LE: Here you can find the details regarding manual verification without Surebackup. You can schedule it to run at your convenience: http://www.virtualtothecore.com/en/veea ... kup-files/
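
For reference, the standalone tool that link describes is Veeam.Backup.Validator.exe, which ships with the product. Below is a minimal Python sketch of wrapping it in a scheduled check; the install path, the /backup switch, and the exit-code behaviour are from memory of that article, and the job names are placeholders, so verify all of them against your installation.

Code: Select all

import subprocess
from pathlib import Path

VALIDATOR = Path(r"C:\Program Files\Veeam\Backup and Replication\Backup"
                 r"\Veeam.Backup.Validator.exe")
JOBS = ["Job1 - StorageA", "Job2 - StorageB"]   # placeholder backup job names

for job in JOBS:
    result = subprocess.run([str(VALIDATOR), f"/backup:{job}"],
                            capture_output=True, text=True)
    # Assumption: a non-zero exit code indicates a failed validation.
    if result.returncode != 0:
        print(f"VALIDATION FAILED for '{job}':\n{result.stdout}{result.stderr}")
    else:
        print(f"'{job}' validated OK")
Scheduled via Windows Task Scheduler, something along these lines would at least surface a bad backup file before a restore is attempted.
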
bhendin
Influencer
Posts: 24
Liked: 1 time
Joined: Oct 10, 2013 3:37 am
Full Name: Ben H
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by bhendin »

Thanks for that feedback VladV. Again, the original issue I had experienced was outlined here:

https://www.veeam.com/kb1795

In short, during the restore Veeam was unable to decompress the file; as I had eliminated all other network issues, the root cause was apparently a corrupted file.
My point is that I shouldn't have to wait until I try to restore to learn that the software can't restore it. This type of validation should be able to occur on a set schedule and, in fact, trigger actions based on the result.

I wonder if the Validator tool you linked above would have caught this issue. Funny that it wasn't even suggested by the Veeam team at the time as a way to verify whether the file was good or not.
For something as critical as backups, it would be a nominal programming task to have the application perform these validation steps automatically on some set schedule and, if an issue is found, remediate it immediately (such as by performing a new full backup). It seems all the pieces may already be in the product, but no one has sought to chain them together into something that makes sense from a peace-of-mind standpoint.

I can't believe what I am asking to do is so "out-of-this world" that I should be looked at like I have three heads.
For smaller businesses, time is often more available than resources, and being able to verify that a backup can actually be restored would be preferable to actually restoring it.
Because honestly, even if I keep 10 copies of a backup they are all totally worthless unless I have validated that at least one can be restored. So why even keep 10? Or 5, or 3?
Extra copies of backups at multiple locations are fine for recovery objectives and off-site security, but a single good backup is all you need. How can you make sure it is really good?

To have Veeam sit there chugging along day after day, performing incrementals with synthetic fulls, never once complaining of an issue until one actually tries to restore, seems a little smoke-and-mirrors for an "it just works" solution.

Again - I realize there is no one perfect solution, especially given constraints - all I'm asking for is the best strategy for backup and verification based on how the software operates to minimize this from happening again.

Right now it appears that I should either do two completely separate backups with weekly fulls to different storage, or one job keeping two full copies at all times.

Even then, my risk is reduced, but I have no way to confirm that either backup is actually good. SureBackup is overkill, and the Validator has to be run manually?
bhendin
Influencer
Posts: 24
Liked: 1 time
Joined: Oct 10, 2013 3:37 am
Full Name: Ben H
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by bhendin »

On the SureBackup note - I think (though I'm not entirely sure) that I might have got SureBackup to properly check a backup of a system that was on a different subnet.

I got a message that said " Network adapter 1: IP address 172.23.10.1, failed - destination host unreachable" followed 15 seconds later by a success - so I don't know if that is normal.

The bigger issue is that I did not see any sort of file verification take place. When I looked at the job settings, there is the following option under the integrity check:

"Skip validation for application group VMs"

It is checked and grayed out so you can't change it.

This makes zero sense to me based on all the docs and tutorials I am reading for SureBackup. SureBackup requires you to create an application group and then choose that application group for the job.

So if you need an application group, but you can't verify anything in an application group...what's the point?
VladV
Expert
Posts: 224
Liked: 25 times
Joined: Apr 30, 2013 7:38 am
Full Name: Vlad Valeriu Velciu
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by VladV »

I understand what you are saying, but think about it a little. What does it mean to check a restore point? It means going through it and validating its consistency. This will take a lot of time, depending on the number and size of your backup jobs. That is one reason it is not used that much. But if you need it, you can script it and run it as a scheduled task.

The probability of underlying storage corruption occurring without your knowledge is low, and the probability of having to rely on a restore point corrupted by filesystem corruption is even lower. It happened to you, and I imagine it was a nightmare. But the fact is that backup software is backup software and storage is storage. You should keep your backup storage in good health and check it first. If you stumble on an error, then recheck the restore points that the filesystem checker reported as affected.

My advice is to keep short chains and not use synthetic full backups. In that case, if this low-probability situation strikes you twice, you only lose a chain. If you decide to use backup copy, or perhaps a simple full backup to different media, then you can further limit the damage.

Again, in my opinion, even though it happened to you, if you keep your storage in check you should not see this problem. Veeam ensures that the restore point is in good health once it is created. After that, there is the Validator (which can be run weekly if you choose to use weekly synthetic backups) or SureBackup, which you can use to test specific apps.
VladV
Expert
Posts: 224
Liked: 25 times
Joined: Apr 30, 2013 7:38 am
Full Name: Vlad Valeriu Velciu
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by VladV »

bhendin wrote: "Skip validation for application group VMs"

It is checked and grayed out so you can't change it.
That's because you either did not add an application group (you are not required to do so) or you did not check the Validate entire..... option. An application group is used for VMs that need to be powered on for the entirety of the SureBackup job. A good case is when you test an Exchange server or other servers that need DCs. You add the DCs to the app group and the other VMs to the linked jobs. That way, SureBackup will power on the DCs, verify them, and keep them up and running until the last VM from the linked jobs category is tested.

I agree that SureBackup is not that easy to use, but it is that way because it can manage complex networking scenarios. My advice is to read the manual about SureBackup, then go through the forum, and finally ask support for assistance. This is a useful tool to have in many situations.
Gostev
Chief Product Officer
Posts: 31455
Liked: 6646 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by Gostev »

Wow, looks like I missed quite a discussion! Vlad, thanks a lot for your help.

So, have all of the negative earlier comments and concerns regarding SureBackup functionality been "taken back" by now? ;)

Or do some still stand?
bhendin
Influencer
Posts: 24
Liked: 1 time
Joined: Oct 10, 2013 3:37 am
Full Name: Ben H
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by bhendin »

VladV wrote: That's because you either did not add an application group (you are not required to do so) or you did not check the Validate entire..... option. An application group is used for VMs that need to be powered on for the entirety of the SureBackup job. A good case is when you test an Exchange server or other servers that need DCs. You add the DCs to the app group and the other VMs to the linked jobs. That way, SureBackup will power on the DCs, verify them, and keep them up and running until the last VM from the linked jobs category is tested.

I agree that SureBackup is not that easy to use, but it is that way because it can manage complex networking scenarios. My advice is to read the manual about SureBackup, then go through the forum, and finally ask support for assistance. This is a useful tool to have in many situations.
One of the reasons for the complexity is the very poor documentation that you suggest I read...

http://helpcenter.veeam.com/backup/80/v ... iw_vm.html - Specifically states:

To perform VM verification, you need to create the following entities:

Of which an Application group is one of them.

You are correct, of course, that an application group is not needed. After some experimentation I found this to be the case.
You can apparently use the linked jobs option instead of the application group.

I did a linked job, and it did successfully verify. Ironically enough, the application group - meant to allow you to start multiple machines together - is also the only way you can test a single machine, as the linked job forces you to do all machines from a single backup.

The additional issue was that I had checked the option to keep the VMs booted (as I wanted to investigate how they were being accessed). I did eventually find this last piece documented.
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by Delo123 »

Did you have a look at Veeam.Backup.Validator? We also don't really trust SureBackup for big VMs, since not all data is actually scanned by default (not sure if there is an option at all).
Anyway, the Backup Validator at least seems to read all files, so I assume it can be trusted :) We regularly scan our backup files with it!

EDIT: Sorry, didn't refresh this morning, backup validator was already mentioned... At least it actually seems to read everything within the backup file, so in case of actual hardware / filesystem issues at least they will show up!

PS: We also had some decompression issues in the past; they were 100% related to the underlying RAID (a parity RAID 5 that was not regularly scanned by the RAID controller).
bhendin
Influencer
Posts: 24
Liked: 1 time
Joined: Oct 10, 2013 3:37 am
Full Name: Ben H
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by bhendin »

Yeah... it was mentioned, as you said. I'd like to get some clarification from you or the Veeam team regarding the statement that SureBackup doesn't scan all data.

Assuming the "verify" option is on in Surebackup, and it passes these tests, is there any reason that it would fail Validator?
IOW - is Validator a more robust check that will catch things Surebackup + Verify will not?

(I still don't have confirmation that SureBackup will verify all VM disks and not just the OS disk.)

As far as the underlying issue being RAID/storage... I don't doubt that. I also agree with the other posters who state that it is very difficult to track integrity on ongoing backups. However, it is not impossible (and actually trivial) to validate that an application can read its own data files - which is really what we are talking about with a backup program. I'm still a bit bewildered as to why the Validator is hidden away as opposed to being a big button on the job options screen.
Gostev
Chief Product Officer
Posts: 31455
Liked: 6646 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by Gostev »

bhendin wrote:Assuming the "verify" option is on in Surebackup, and it passes these tests, is there any reason that it would fail Validator?
IOW - is Validator a more robust check that will catch things Surebackup + Verify will not?
It's the same type of check, so neither is more robust in catching bit rot.

Generally speaking, I don't recommend relying on the Validator, because it gives you very little value beyond a false sense of protection. The Validator is only able to catch bit rot issues, which are 100 times less likely than the recoverability issues that SureBackup is able to catch. In other words, SureBackup can catch all the rare data corruption issues the Validator can - and many more of the much more common types of issues.
bhendin
Influencer
Posts: 24
Liked: 1 time
Joined: Oct 10, 2013 3:37 am
Full Name: Ben H
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by bhendin »

Thanks for that confirmation, Gostev. Can you confirm the last (still open) question: will SureBackup validate all disks in the backup, not just the OS disk?
And that there are no qualifications as to whether that disk is configured as independent or not?
chrisdearden
Veteran
Posts: 1531
Liked: 226 times
Joined: Jul 21, 2010 9:47 am
Full Name: Chris Dearden
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by chrisdearden » 1 person likes this post

no qualifications as to whether that disk is configured as independent or not?
Independent disks can't be part of the snapshot, so they don't get backed up.
VladV
Expert
Posts: 224
Liked: 25 times
Joined: Apr 30, 2013 7:38 am
Full Name: Vlad Valeriu Velciu
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by VladV » 1 person likes this post

Delo123 wrote: At least it actually seems to read everything within the backup file
It doesn't read anything within the backup file; it checks the integrity of the backup file itself. You will not be able to catch in-guest problems, like VMDK corruption, guest NTFS corruption, or application startup issues. Maybe you meant to say the same thing and it is just a matter of expression.

Like Gostev said, SureBackup is the way to go, and if you don't trust the underlying storage that hosts your backup files, you should enable the option within SureBackup to check backup file integrity. It will scan the whole backup (of course excluding items that were never included in the backup in the first place, like independent disks).
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers
Contact:

Re: Best strategy for fault tolerance with failed backups.

Post by Delo123 »

Hi Vlad,

Thanks! At the time we had underlying storage issues there was no option to check backup file integrity in SureBackup, so the Backup Validator was the way to go, I guess.... But yes, I agree it doesn't check anything within the guest VM...
Our main reason for testing backup files was/is primarily that we run Windows dedupe on our repositories, so we actually "want" to just scan the files for bit rot / dedupe chunk issues :)