B&R 9 : Health check of VM Backup Files needs so long time

Post by **WinstonWolf** » Jan 28, 2016 8:22 am

Hello,

I activated the option "Perform Backup Files Health Check", but it takes 10 hours on some backup jobs.
Why does this option take so long?

Michael

Post by **PTide** » Jan 28, 2016 10:23 am

Hi,

During a health check, VBR calculates CRC values for backup metadata and hash values for the VM disk data blocks in the backup file, then compares them with the CRC and hash values already stored in the backup file. The process can take a long time for large restore points. What is the average restore point size for the backup jobs in question?

Thank you.
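The comparison described in the post above can be illustrated with a minimal sketch. It is not Veeam's implementation: the block size, the choice of SHA-256 and CRC-32, and the function names `build_index`, `health_check`, and `metadata_ok` are all assumptions made for illustration. It only shows why the cost scales with the amount of data read, since every block has to be read and hashed again.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # tiny, illustrative block size (assumption); real backup blocks are far larger


def build_index(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks and record one SHA-256 digest per block."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [hashlib.sha256(block).hexdigest() for block in blocks]


def health_check(data: bytes, stored_hashes: list, block_size: int = BLOCK_SIZE) -> list:
    """Re-hash every block and return the indexes of blocks whose hash changed."""
    current = build_index(data, block_size)
    if len(current) != len(stored_hashes):
        raise ValueError("block count differs from the stored index")
    return [i for i, (now, then) in enumerate(zip(current, stored_hashes)) if now != then]


def metadata_ok(metadata: bytes, stored_crc: int) -> bool:
    """Compare a freshly computed CRC-32 of the metadata with the stored value."""
    return zlib.crc32(metadata) == stored_crc


# Hypothetical data: one byte flipped in the second block.
original = b"abcdefghijkl"
index = build_index(original)
corrupted = b"abcdXfghijkl"
print(health_check(original, index))   # no mismatching blocks
print(health_check(corrupted, index))  # the second block (index 1) is reported
```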

Post by **WinstonWolf** » Jan 28, 2016 12:10 pm

We have a retention time of 30 days.
The problem is that a tape job runs after the VM backup job, and it had to wait a long time for the backup file health check to finish.

I think it should be possible, maybe as a feature, to create separate jobs for the health check option.

Post by **PTide** » Jan 28, 2016 12:50 pm

So, how big is your average restore point?

Post by **WinstonWolf** » Jan 28, 2016 2:03 pm

What do you mean? The .vib files or the .vbk files?
The .vbk file is 6 TB, and the .vib files average 100 GB.

Post by **PTide** » Jan 28, 2016 3:22 pm

So you have a 6 TB .vbk plus 29 .vibs of about 100 GB each. To make sure that your last restore point is valid and can be used for recovery, Veeam needs to check all .vibs since your last full. In your case, if no intermediate fulls are present, that is 9 TB of data to process. 10 hours looks good for a CRC check of 9 TB of data. To reduce the time needed for the health check, consider configuring periodic fulls.

UPDATE at 3:46 PM (EST) 29/01/2016:

I checked with the devs and it turned out that we actually check only the blocks that compose the last restore point. Since those blocks can be spread across the whole backup chain, that can be a lot of data to check, depending on many factors. Either way, 6 TB of data (even if compressed) is still a lot of work.

Thank you.
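The figures quoted in this post can be checked with some quick arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a constant read rate for the whole 10 hours; the function name `required_throughput_mb_s` is made up for illustration. It computes the chain size and the sustained read rate that a 10-hour run would imply for each of the two data volumes mentioned.

```python
TB = 10**12  # decimal terabyte (assumption)
GB = 10**9   # decimal gigabyte (assumption)


def required_throughput_mb_s(bytes_to_read: int, hours: float) -> float:
    """Average read rate in MB/s needed to read bytes_to_read within the given hours."""
    return bytes_to_read / (hours * 3600) / 10**6


# Whole chain from the thread: one 6 TB full plus 29 increments of about 100 GB.
chain_bytes = 6 * TB + 29 * 100 * GB

print(chain_bytes / TB)                                      # chain size in TB
print(round(required_throughput_mb_s(chain_bytes, 10), 1))   # MB/s if the whole chain is read
print(round(required_throughput_mb_s(6 * TB, 10), 1))        # MB/s if only about 6 TB is read
```

Under these assumptions the chain comes to 8.9 TB, which the post rounds to 9 TB, and a 10-hour run corresponds to a sustained rate of roughly 250 MB/s for the full chain or about 170 MB/s for 6 TB.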

Post by **Gostev** » Jan 28, 2016 9:56 pm

Pasha, this is not correct as only the latest VM state is checked for consistency. As such, we do not read the entire content of each and every backup file in the chain. As such, periodic fulls will make no difference to the health check performance (as this will not change the amount of data that we will need to read from the backup storage). Please, double-check with the devs. Thanks!

Post by **WinstonWolf** » Jan 29, 2016 11:59 am

OK, then I can forget the health check option. That is too bad, because its run time counts toward the backup job's run time.
And after this backup job comes a tape job.

As I said before, I think it would be a good feature to be able to create a separate "Health Check Job".

I have the feeling that something is wrong with all the new features in v9, and that other necessary features are missing.

Post by **lennis40** » Feb 12, 2016 12:57 pm

We've had a few lengthy health checks as well, which has brought up some questions on what might be best practice. Health checks by default are disabled on backup jobs, and enabled on copy jobs. There must be a reason why the option is available on both, even though they're using the same backup files for the health check process. Other than spreading out the times when health checks run, I assume Veeam recommends to run on copy job, as that's the default setting? Without parallel processing to cloud repository on copy jobs, would it make more sense to run a health check on the backup job where parallel processing is available?

When we run into a health check taking several hours, it's holding up the other copy jobs from running any tasks. Even running them on backup files may delay the time the copy job starts transferring during the interval. I'm just curious to get the different opinions on what might be the best way to go about alleviating some of the wait time on health checks. Is it also safe to say that if SureBackup is being used, with the backup file verification option enabled, that health checks can be disabled on both backup jobs and copy jobs? Thanks for any input!

Post by **PTide** » Feb 12, 2016 3:47 pm

lennis40 wrote: There must be a reason why the option is available on both, even though they're using the same backup files for the health check process. Other than spreading out the times when health checks run, I assume Veeam recommends to run on copy job, as that's the default setting?

The health check option was added recently and is disabled by default so that it does not change the behaviour of backup jobs that were already configured.

lennis40 wrote: even though they're using the same backup files for the health check process

That's not correct. Backup files produced by a backup job and backup files produced by a backup copy job are different sets of files. The backup copy health check protects your backup copy chain from storage corruption, whereas the backup health check protects your main backup chain from storage corruption.

lennis40 wrote: would it make more sense to run a health check on the backup job where parallel processing is available?

No, it would not. However, some health check improvements are planned for future releases.

lennis40 wrote: Is it also safe to say that if SureBackup is being used, with the backup file verification option enabled, that health checks can be disabled on both backup jobs and copy jobs?

Yes, if you run SureBackup on both backup and backup copy jobs. Did you mean the backup validator when you said "backup file verification option"?

Thank you.

Post by **lennis40** » Feb 12, 2016 4:09 pm

The backup file integrity check option, in the settings menu of the SureBackup wizard, is the validation I was talking about.

So if the copy job is checking the copy job chain, that health check is actually happening on the target repository, or in this case the cloud repository?

Thanks for the information. We certainly look forward to improvements in future releases.

Post by **bkc** » Feb 15, 2016 4:57 pm

Hi,

Our backup copy job started a health check at the end of January and today I finally killed it off with only 61% complete. During the past 2 weeks the backup copy job didn't copy any jobs. (hardware details below)

I've disabled the health check feature in the job and restarted it.

After restarting I see that it is processing 4 out of 31 VMs simultaneously.

Would disabling parallel data processing reduce disk fragmentation on the target backup repository?

Would enabling use per-vm backup files possibly reduce the health check time?

infrastructure notes..

We have a 3-node vSphere 6 cluster (with CBT disabled) and a SAN. The Veeam backup software runs on a VM and stores data on an external physical Linux server with RAID 1 via a 1 Gbit Ethernet connection. The main backup repository is currently using 1.2 TB of space.

The backup copy job target repository is a ReadyNAS 6 box with 6 hard drives in RAID 5, also reachable via 1 Gbit Ethernet. Yeah, the ReadyNAS isn't great, but it's not terrible. Those 4 VMs are currently showing "x% completed at 3MB/sec".

The main backup repository machine is currently running 6 VeeamAgents, 0% load and 0.1 I/O wait state.

suggestions?

Post by **PTide** » Feb 15, 2016 5:48 pm

Hi,

bkc wrote: Would disabling parallel data processing reduce disk fragmentation on the target backup repository?

No, it would not. If you want to reduce fragmentation, please use the compact full backup file feature.

bkc wrote: Would enabling use per-vm backup files possibly reduce the health check time?

No, as the amount of data to be checked stays the same.

Please provide your bottleneck statistics.

Thank you.

Post by **bkc** » Feb 15, 2016 9:56 pm

Can you explain how fragmentation would not be reduced by having only one VM processed at a time? With 4 VMs all writing to the target at the same time their data blocks will be interleaved with each other as the target allocates free space to write to. Simultaneous writes is a classic cause of file fragmentation.

If I enable compact full, and all 4 vms start compacting their fulls at the same time, I'll have the same problem ... simultaneous writes to the same data store = fragmented files.
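The interleaving argument in the two paragraphs above can be illustrated with a toy model. This is a sketch under strong assumptions: a single allocator that always hands out the next free block, four hypothetical files written either one after another or in strict round-robin, and no preallocation or delayed allocation of the kind real filesystems use. The names `allocate` and `count_fragments` are made up for illustration.

```python
def allocate(write_order):
    """Give each write the next free block number; return {file name: [block numbers]}."""
    layout = {}
    for block_no, name in enumerate(write_order):
        layout.setdefault(name, []).append(block_no)
    return layout


def count_fragments(blocks):
    """Count contiguous runs in an ascending list of block numbers."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


files = ["vm1", "vm2", "vm3", "vm4"]  # hypothetical VM backup files
blocks_per_file = 8

# One VM at a time: each file writes all of its blocks before the next one starts.
sequential = [name for name in files for _ in range(blocks_per_file)]
# Four VMs at once: writes arrive in strict round-robin order.
parallel = [name for _ in range(blocks_per_file) for name in files]

for label, order in (("sequential", sequential), ("parallel", parallel)):
    layout = allocate(order)
    print(label, {name: count_fragments(blocks) for name, blocks in layout.items()})
```

In this model each file ends up as one contiguous run when written alone and as eight separate runs when written alongside three others. Real storage will behave less extremely, but the direction of the effect is the one described in the post.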

regarding bottleneck stats, the backup copy job just finished the first VM, 30 vms to go.

stats are very bad: 2/15/2016 4:31:13 PM :: Busy: Source 0% > Proxy 0% > Network 0% > Target 99%

The backup copy job doesn't report much for stats.. 7.4GB read at 533Kb/sec
Would be nice to have more stats regarding the target repository .. avg write speed vs avg read speed.

I guess this ReadyNas is a dog.

Post by **bkc** » Feb 15, 2016 10:21 pm

wait.. 2 ethernet interfaces on the Readynas box.

eth0 - rsync speed is about 600kb/sec (this is the interface veeam is using)

eth1 - rsync speed 19 MB/sec

I see lots of framing errors on eth0.. well that's hopeful, something I can actually troubleshoot

--

after reconfiguring veeam to use the good interface, I'm now seeing processing rate around 35MB/sec. yeah!
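For anyone wanting to run the same kind of check, the per-interface error counters mentioned in this post can be read on a Linux host from sysfs. The sketch below assumes a Linux system that exposes `/sys/class/net/<interface>/statistics/rx_frame_errors`; the function names and the `base` parameter (there so the code can be pointed at a test directory) are made up for illustration, and the interface names are simply the ones from the post.

```python
from pathlib import Path


def read_counter(iface: str, counter: str, base: str = "/sys/class/net") -> int:
    """Read one integer counter from <base>/<iface>/statistics/<counter>."""
    path = Path(base) / iface / "statistics" / counter
    return int(path.read_text().strip())


def frame_errors(ifaces, base: str = "/sys/class/net") -> dict:
    """Return {interface: rx_frame_errors} for the given interfaces."""
    return {iface: read_counter(iface, "rx_frame_errors", base) for iface in ifaces}


# Example call on a host that has both interfaces (not executed here):
# frame_errors(["eth0", "eth1"])
```

A counter that keeps climbing on one interface while staying flat on the other points at the cable, port, or NIC on that path rather than at the disks.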

Post by **PTide** » Feb 16, 2016 10:09 am

bkc wrote: Can you explain how fragmentation would not be reduced by having only one VM processed at a time? With 4 VMs all writing to the target at the same time their data blocks will be interleaved with each other as the target allocates free space to write to. Simultaneous writes is a classic cause of file fragmentation.

My mistake. I was sure that something must have been invented to reduce fragmentation during parallel writes. My apologies.

bkc wrote: If I enable compact full, and all 4 vms start compacting their fulls at the same time, I'll have the same problem ... simultaneous writes to the same data store = fragmented files.

Your logic is correct; that's why there is no parallel processing for compact operations.

bkc wrote: after reconfiguring veeam to use the good interface, I'm now seeing processing rate around 35MB/sec. yeah!

Good to hear that. Feel free to contact us if any issues arise.

Thank you.

Post by **bkc** » Feb 17, 2016 2:45 am

so the copy job has now been running for 28 hours and about 26 of those hours it's been at 99% complete state.

it seems that after the vms are copied over there's a bunch of house-keeping to perform, creating some kind of fulls or something and now it seems to be creating a GFS restore point.

I think it would be very useful if these house-keeping steps appeared after the named list of vms with clear details of what's happening, what % is completed and the I/O performance the system sees such as read or write performance of the target backup repository.

likewise during the monthly health check it would be good to see more details about what is happening, how far along it is and what the I/O performance is.

showing only 99% complete for 26 hours w/o any other performance info isn't very informative..

thanks

Post by **bkc** » Feb 17, 2016 2:28 pm

Job has been running 40 hours, still creating a GFS restore point, still at 99%.

I really think more feedback would be helpful..

Post by **andriktr** » Feb 29, 2016 8:43 am

Hello,
We recently migrated our backup copy jobs from StoreOnce CIFS shares to StoreOnce Catalyst stores. As recommended, I cloned the old BC jobs and configured them with the new Catalyst repository, which means completely new backup chains were started (GFS retention is used). Job performance is several times better than it was with shares, but one thing still looks strange to me. We have enabled the health check for these jobs (executed once per month), and it takes a long time, about 5-8 hours per job. Note that these are fairly new jobs, configured a few days ago, and each VM has only 2-3 recovery points. StoreOnce Catalyst requires per-VM backup files, which means we will have many more .vbk files in the repository. Could this be the reason for such a long health check? What else can affect health check performance?
Thank you.

Post by **foggy** » Feb 29, 2016 2:52 pm

Depending on the backup size, this might be expected (see above). Are you saying it used to complete faster before migration to Catalyst?

Post by **andriktr** » Feb 29, 2016 3:18 pm

No, on CIFS shares it was not faster either; it also needed a long time to complete. I expected that this time would be reduced somewhat after the migration to Catalyst.
I also hope that the completion time will not grow hugely in the future due to the larger number of backup files.

Post by **lando_uk** » Feb 29, 2016 5:21 pm

On this subject - If a health check of a copy job is fine, could one presume the last restore point of the main backup job that it was sourced from is also fine? Or could the same restore point be knackered on the backup job, but ok on the copy job?

Post by **PTide** » Feb 29, 2016 6:01 pm

Hi,

lando_uk wrote: On this subject - If a health check of a copy job is fine, could one presume the last restore point of the main backup job that it was sourced from is also fine?

No.

lando_uk wrote: Or could the same restore point be knackered on the backup job, but ok on the copy job?

Yes. For example, corruption of your primary storage can occur after the backup copy sync has completed. In that case your backup copy restore point will be OK whereas your primary backup will not.

Please refer to helpcenter:

When a new synchronization interval starts, Veeam Backup & Replication performs a health check for the most recent restore point in the backup chain. Veeam Backup & Replication calculates checksums for data blocks in the backup file on the target backup repository and compares them with the checksums that are already stored in the backup file.

Your backup copy job is an independent set of files so Backup Copy health check makes sure that your secondary restore point is not corrupted.

Thank you.

Post by **foggy** » Mar 01, 2016 12:54 pm

andriktr wrote: I also hope that the completion time will not grow hugely in the future due to the larger number of backup files.

Per-VM backup chains option should not affect health check performance (actually, it could even be faster with per-VM, since less metadata is kept within each backup file and they are less fragmented).

Post by **andriktr** » Mar 07, 2016 7:53 am

Are there any recommendations for StoreOnce catalyst repository maximum concurrent tasks?

Post by **Gostev** » Mar 07, 2016 1:52 pm

Kindly please do not hijack this topic. This question is best directed into Catalyst support topic (and the number depends on Catalyst model anyway). Thanks!

Post by **robvs** » Aug 07, 2017 7:01 am

Hello,

I have some questions about the Veeam health check on backup copy jobs.

We have a VMware environment with 120 VMs and a total backup size of around 10 TB. These VMs are split across 11 Veeam jobs (forever incremental).
All of these Veeam jobs have copy jobs to an offsite location.
The connection between the locations is around 200 Mbit.
The current problem is that a health check on the offsite backup can take around 1 week for some jobs. This causes problems because the copy job will not run until the health check is finished. When it actually starts, it has to copy such a large amount of data that it also takes a long time to complete.

Is there a way to offload the health checks to an offsite server? That would eliminate the bandwidth problem that is making the health checks take so long.
If not, what would be a good way to speed this up?

Kind regards
Rob
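As a rough sanity check on the bandwidth concern in this post, the sketch below estimates how long it would take to move the quoted 10 TB across the quoted 200 Mbit link. It assumes decimal units, a fully saturated link with no protocol overhead, and that all of the data actually has to cross the link; whether a health check reads over the WAN at all depends on where the data is processed, which the post does not state. The function name `transfer_hours` is made up for illustration.

```python
def transfer_hours(size_bytes: int, link_bits_per_s: int) -> float:
    """Hours needed to move size_bytes over a link running at full line rate."""
    return size_bytes * 8 / link_bits_per_s / 3600


total_backup = 10 * 10**12  # about 10 TB, as quoted in the post
link = 200 * 10**6          # about 200 Mbit/s, as quoted in the post

hours = transfer_hours(total_backup, link)
print(round(hours, 1), "hours, or about", round(hours / 24, 1), "days")
```

Under those assumptions the lower bound is a little over 111 hours, or between four and five days, which is the same order of magnitude as the week reported above.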

Post by **DGrinev** » Aug 07, 2017 1:18 pm

Hi Rob and welcome to the community!

Health check performance depends on the read speed of the storage and on the backup file size. It cannot be delegated to a remote server, as it is part of the backup/copy job.
As an alternative, you can set up a SureBackup job for the source backups, as that is the best approach for recovery verification.
Please review this thread for additional information. Thanks!

Post by **egrutman** » Oct 27, 2017 3:08 pm

Hello,

I have backups running and the health check takes days. We are backing up about 85 machines with an average size of 125 GB, so the total size is about 10 TB. The backups are stored on a DD2200. Is there any way to speed up the health check? 12 hours have gone by and I am only at 5%. At this rate it will take 10 days to run a health check. It is not acceptable for critical production machines to be without backups, and I am trying to find any way possible to increase resources so the job completes faster.
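The 10-day figure in this post follows from a simple linear extrapolation, sketched below. The assumption is that progress continues at the same rate as in the first 12 hours, which a real job may not do; the function name `estimated_total_hours` is made up for illustration.

```python
def estimated_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linear extrapolation: total run time if progress continues at the same rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100 / percent_done


total_hours = estimated_total_hours(12, 5)  # 12 hours elapsed at 5%
print(total_hours / 24)                     # projected duration in days

# Data volume quoted in the post: 85 machines at about 125 GB each.
print(85 * 125, "GB")
```

The second figure comes to 10,625 GB, which the post rounds to 10 TB.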

Post by **DGrinev** » Oct 27, 2017 3:30 pm

Hi Yevgeniy,

Please review this discussion about the health check performance as it contains plenty of useful considerations. Thanks!
