perjonsson1960
Veteran
Posts: 463
Liked: 47 times
Joined: Jun 06, 2018 5:41 am
Full Name: Per Jonsson
Location: Sweden
Contact:

Health check on large backups

Post by perjonsson1960 »

Folks,

We have a file server cluster that we recently began backing up with B&R. The cluster consists of two physical servers, and is backed up as a failover cluster. The amount of data right now is approx. 17 TB, and the number of files is just over 18 million. I don't know if that is to be considered a large backup, or if it perhaps is small compared to some other installations.

The initial full backup took around seven hours, which is okay. An incremental backup takes about 15 minutes, which of course is also okay. And the Merge operation takes around 30 minutes, also okay. However, on Monday the job started a health check, and that process took 26 hours and 28 minutes. And that ruins the backup window on that day of the month when the health check is performed, even if we run the backup job only once every 24 hours. But we are considering running the job twice, once during the night and once mid-day at lunch time.

What is the health check doing that takes so long? Is the long time due to the amount of data or the large number of files?

Right now the job is forever forward incremental. If we change the job to forward incremental with synthetic fulls, can we then safely skip the health check since there are new fulls created regularly? Or are the errors in the latest full backup file, if any, copied to the new full backup file? I can imagine that creating a synthetic full is much faster than doing a health check... We have not done a defrag and compact, so I don't know how long that takes, but if we change to forward incremental with synthetic fulls, we don't need to do a defrag and compact.

Regards,
PJ
Mildur
Product Manager
Posts: 8735
Liked: 2296 times
Joined: May 13, 2017 4:51 pm
Full Name: Fabian K.
Location: Switzerland
Contact:

Re: Health check on large backups

Post by Mildur »

I would recommend skipping the health check only if you are running SureBackup jobs.
A synthetic full helps if you are using Fast Clone with ReFS or XFS, but it reuses the existing blocks on disk. If those blocks are corrupted, your backups cannot be restored.

You can read more about the health check and what it does here. I think it's the amount of data in your case.
https://helpcenter.veeam.com/docs/backu ... ml?ver=110

What are you using as backup storage? If it's too slow, the health check can take a long time for your amount of data.
Product Management Analyst @ Veeam Software
perjonsson1960
Veteran
Posts: 463
Liked: 47 times
Joined: Jun 06, 2018 5:41 am
Full Name: Per Jonsson
Location: Sweden
Contact:

Re: Health check on large backups

Post by perjonsson1960 »

Thanks for your reply! So, we cannot skip the health check. Right.

The backup storage: We have two identical HPE DL380 servers with a ridiculously large amount of RAM, and two disk enclosures each of about 90 TB. So, four backup repositories included in a scale-out rep. of approx. 360 TB. Each disk enclosure is a RAID 5 array with 11 "live" disks and one spare. Our supplier recommended RAID 5 over RAID 6 with the motivation that RAID 5 is faster. The file system is ReFS with dedup. and all the fancy stuff. ;-)
foggy
Veeam Software
Posts: 21073
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Health check on large backups

Post by foggy »

RAM doesn't really matter here; the storage's random I/O capability is what matters most. The health check is heavily random, as it reads all the blocks required to build the latest restore point, and those blocks are scattered across multiple files.
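As a rough back-of-envelope from the numbers in this thread: assuming the check reads the full ~17 TB and a ~1 MiB backup block size (an assumption; Veeam's default for local targets, and fragmented chains may read even smaller chunks), the observed 26 h 28 min run implies an effective random-read rate of only a couple hundred blocks per second across the whole array:

```python
# Back-of-envelope: what the observed 26h28m health check implies about
# effective random-read rate. The 1 MiB block size is an assumption;
# actual reads may be smaller due to fragmentation.
data_bytes = 17 * 10**12            # ~17 TB of backed-up data
block_bytes = 1 * 2**20             # assumed ~1 MiB backup block
elapsed_s = 26 * 3600 + 28 * 60     # 26 h 28 min observed

blocks = data_bytes / block_bytes               # ~16.2 million blocks
effective_iops = blocks / elapsed_s             # blocks read per second
throughput_mb_s = data_bytes / elapsed_s / 1e6  # average MB/s

print(f"{blocks / 1e6:.1f}M blocks, ~{effective_iops:.0f} blocks/s, "
      f"~{throughput_mb_s:.0f} MB/s average")
```

Roughly 170 random reads per second for the whole array is on the order of what one or two 7.2k spindles deliver, which supports the seek-bound diagnosis.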
perjonsson1960
Veteran
Posts: 463
Liked: 47 times
Joined: Jun 06, 2018 5:41 am
Full Name: Per Jonsson
Location: Sweden
Contact:

Re: Health check on large backups

Post by perjonsson1960 »

Is there anything that can be done with the controller cache settings in order to speed up things in general, and health check in particular? For example, the cache ratio is set to 10% read and 90% write, which apparently is the default, because I don't remember ever changing it. There are also a number of other controller parameters that can be tweaked:

Selected Performance Profile - Default Settings
Parity RAID Degraded Mode Performance Optimization - Disabled
Physical Drive Request Elevator Sort - Enabled
Maximum Drive Request Queue Depth - Automatic
Monitor and Performance Analysis Delay - 60
HDD Flexible Latency Optimization - Disabled

The controllers are "HPE Smart Array P408e-p SR Gen10" and "P408i-a SR Gen10". For some reason the internal controller has only 2048 MB cache, while the external controller has 4096 MB. I don't know if that is normal.
PetrM
Veeam Software
Posts: 3264
Liked: 528 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic
Contact:

Re: Health check on large backups

Post by PetrM »

Hello,

Basically, you should follow the vendor's recommendations to improve random I/O. The only idea that comes to mind is to experiment with different block sizes and compression levels to see how health check speed depends on them. You may review this page on our help center.

Thanks!
DonZoomik
Service Provider
Posts: 368
Liked: 120 times
Joined: Nov 25, 2016 1:56 pm
Full Name: Mihkel Soomere
Contact:

Re: Health check on large backups

Post by DonZoomik »

veeam-backup-replication-f2/backup-file ... 62680.html This thread also covered the same problem (maybe merge?). Not sure what your version is, but v11 was supposed to have improvements; personally, I don't see the difference. Nobody answered my question about increasing the readahead window either...
The problem is real with spinning media (especially with very large backup files; 17 TB is quite big), and it gets worse over time with fragmentation. You really don't have many options other than improving latency, and the only way to do that is faster media like 10k/15k disks or SSD. Or maybe an option to increase the async readahead even further to keep the storage busy.
The file system is ReFS with dedup
Don't do dedupe, it makes it even worse with more random IO.
perjonsson1960
Veteran
Posts: 463
Liked: 47 times
Joined: Jun 06, 2018 5:41 am
Full Name: Per Jonsson
Location: Sweden
Contact:

Re: Health check on large backups

Post by perjonsson1960 »

Don't do dedupe, you say. But it saves diskspace. Bigtime. There are pros and cons with pretty much everything. ;-)
Mildur
Product Manager
Posts: 8735
Liked: 2296 times
Joined: May 13, 2017 4:51 pm
Full Name: Fabian K.
Location: Switzerland
Contact:

Re: Health check on large backups

Post by Mildur »

I'd prefer to do ReFS Fast Clone without dedupe. The space savings and synthetic full runs are very good.
Product Management Analyst @ Veeam Software
perjonsson1960
Veteran
Posts: 463
Liked: 47 times
Joined: Jun 06, 2018 5:41 am
Full Name: Per Jonsson
Location: Sweden
Contact:

Re: Health check on large backups

Post by perjonsson1960 »

And how do I turn off dedupe? Is that the "Enable inline data deduplication" checkbox in the backup job? It says "recommended" there... It is probably even the default setting. Can I turn that off for a job that has an existing backup chain?
DonZoomik
Service Provider
Posts: 368
Liked: 120 times
Joined: Nov 25, 2016 1:56 pm
Full Name: Mihkel Soomere
Contact:

Re: Health check on large backups

Post by DonZoomik »

I thought you meant ReFS level deduplication (the one with experimental support). Veeam's integrated one is fine.
perjonsson1960
Veteran
Posts: 463
Liked: 47 times
Joined: Jun 06, 2018 5:41 am
Full Name: Per Jonsson
Location: Sweden
Contact:

Re: Health check on large backups

Post by perjonsson1960 »

I really don't know what I mean, but I guess that the ReFS level dedupe is not something that you can turn off, and live to talk about it? ;-)
Mildur
Product Manager
Posts: 8735
Liked: 2296 times
Joined: May 13, 2017 4:51 pm
Full Name: Fabian K.
Location: Switzerland
Contact:

Re: Health check on large backups

Post by Mildur »

ReFS level dedupe = Windows Server Data Deduplication
ReFS level dedupe = FAST Clone (ReFS Block Cloning API)

Which of these two are you referring to?

Disabling "Enable inline data deduplication" is only needed if you are saving your backup job or backup copy job to a deduplication appliance.
Product Management Analyst @ Veeam Software
foggy
Veeam Software
Posts: 21073
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Health check on large backups

Post by foggy »

(and not even to any deduplication appliance)
perjonsson1960
Veteran
Posts: 463
Liked: 47 times
Joined: Jun 06, 2018 5:41 am
Full Name: Per Jonsson
Location: Sweden
Contact:

Re: Health check on large backups

Post by perjonsson1960 »

This is very confusing... So the "inline data deduplication" referred to in the backup job configuration is not the same as in the Windows ReFS filesystem? What I know is that Fast Clone is used, and that "inline data deduplication" is activated in the backup jobs. And I am pretty sure that the Windows Server (ReFS) dedupe function is also activated. I don't know how to check if it is, but if it is, then I suppose that it was activated back in the day when the servers and the RAID arrays were installed, right? I guess that the Windows (ReFS) dedupe cannot be turned off without losing all the data on the repositories?
perjonsson1960
Veteran
Posts: 463
Liked: 47 times
Joined: Jun 06, 2018 5:41 am
Full Name: Per Jonsson
Location: Sweden
Contact:

Re: Health check on large backups

Post by perjonsson1960 »

I am looking in the Server Manager under File and Storage Services -> Volumes, and the columns "Deduplication Rate" and "Deduplication Savings" are both empty. Does that mean that deduplication is not used? I cannot find any reference to deduplication anywhere else.
foggy
Veeam Software
Posts: 21073
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Health check on large backups

Post by foggy »

Yes, this is the right place to check the status and enable/disable the Windows Data Deduplication feature. As for Veeam B&R inline deduplication and the Fast Clone functionality, please review the referenced user guide sections to better understand them.
perjonsson1960
Veteran
Posts: 463
Liked: 47 times
Joined: Jun 06, 2018 5:41 am
Full Name: Per Jonsson
Location: Sweden
Contact:

Re: Health check on large backups

Post by perjonsson1960 »

I have checked now. The Data Deduplication feature is NOT installed in our repository servers. I guess that is a good thing? So, only the integrated dedup in B&R is used.
foggy
Veeam Software
Posts: 21073
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Health check on large backups

Post by foggy » 1 person likes this post

Yes, that's the recommended configuration Fabian was talking about.
perjonsson1960
Veteran
Posts: 463
Liked: 47 times
Joined: Jun 06, 2018 5:41 am
Full Name: Per Jonsson
Location: Sweden
Contact:

Re: Health check on large backups

Post by perjonsson1960 »

Great!

Perhaps a semi-related question: what is generally the recommended read/write ratio in the RAID controllers? I have now changed it to 30% read and 70% write, to see if I can detect any difference in the speed of various operations. But I guess any differences will be minor at best. For example, I guess that a health check is mainly random reads?
foggy
Veeam Software
Posts: 21073
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Health check on large backups

Post by foggy »

But for example, I guess that a health check is mainly random reads?
That's correct.
DonZoomik
Service Provider
Posts: 368
Liked: 120 times
Joined: Nov 25, 2016 1:56 pm
Full Name: Mihkel Soomere
Contact:

Re: Health check on large backups

Post by DonZoomik »

My recommendation would be to set it to 100% write or 10/90 read/write. With modern amounts of memory, it doesn't make much sense to cache reads on the controller. A small amount of read cache *may* enable controller-side read-ahead, but I haven't seen any noticeable difference.
perjonsson1960
Veteran
Posts: 463
Liked: 47 times
Joined: Jun 06, 2018 5:41 am
Full Name: Per Jonsson
Location: Sweden
Contact:

Re: Health check on large backups

Post by perjonsson1960 »

Okay, thanks! It was set at 10/90 read/write before. I thought that maybe 30/70 would speed up a health check somewhat, since that is virtually only reads. But if the difference in performance is so slim, perhaps it will not show any significant change in the time needed, even if the health check took over 26 hours with the 10/90 setting?
DonZoomik
Service Provider
Posts: 368
Liked: 120 times
Joined: Nov 25, 2016 1:56 pm
Full Name: Mihkel Soomere
Contact:

Re: Health check on large backups

Post by DonZoomik » 2 people like this post

Read cache mostly exists to keep already-read data in cache so that repeated accesses are served from it. However, with modern amounts of memory and heavy OS-side caching, IMHO this effect is practically nonexistent. RAID controllers also do read-ahead, but I doubt those windows are very large, and it has no effect on random I/O. The OS (and Veeam itself) also does read-ahead, so there is little point in doing it on the controller side, especially as the OS/application actually knows what it needs next.

IMHO RAID caching only makes sense for writes, especially on parity RAIDs, where it helps a lot. Uncached parity RAID writes are painful.
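A sketch of why uncached parity writes hurt, using illustrative numbers (the per-spindle IOPS figure is a generic 7.2k-RPM assumption, not measured on this hardware): an uncached small write on RAID 5 costs four disk operations (read old data, read old parity, write new data, write new parity), while a write cache that coalesces writes into full stripes pays only one parity write per stripe:

```python
# RAID 5 small-write penalty, illustrative numbers only.
spindles = 11          # live disks per enclosure in this thread
disk_iops = 150        # assumed random IOPS for one 7.2k-RPM disk

# Uncached random small write: read data + read parity + write data +
# write parity = 4 disk I/Os per host write.
uncached_write_iops = spindles * disk_iops / 4

# Cached full-stripe write: parity is computed in controller RAM, so
# every disk I/O carries payload (10 data + 1 parity per 11-disk stripe).
full_stripe_write_iops = spindles * disk_iops * (spindles - 1) / spindles

print(f"uncached: ~{uncached_write_iops:.0f} IOPS, "
      f"full-stripe: ~{full_stripe_write_iops:.0f} IOPS")
```

Under these assumptions the cached full-stripe path is roughly 3-4x faster, which is why the write side of the cache ratio earns its keep on parity RAID.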
perjonsson1960
Veteran
Posts: 463
Liked: 47 times
Joined: Jun 06, 2018 5:41 am
Full Name: Per Jonsson
Location: Sweden
Contact:

Re: Health check on large backups

Post by perjonsson1960 » 1 person likes this post

Thanks! I might just as well go back to the 10/90 setting then. :-)
garrettt12
Novice
Posts: 3
Liked: 1 time
Joined: Oct 24, 2019 3:29 pm
Full Name: Garrett
Contact:

Re: Health check on large backups

Post by garrettt12 »

If you can spare the space, I would consider experimenting with something not mentioned here yet: periodic active fulls. These are more of a sequential write workload than a random I/O one, which may be very beneficial for your spinning-disk configuration.

I know this isn't a universally agreed good practice, but you may be able to forgo health checks with this method too, since your window of exposure to bit rot and similar corruption would only be the time between active fulls. Once health checks get so long that you start missing restore points, it becomes a serious consideration IMHO. I'd be willing to hear what someone more knowledgeable on the corruption side says about this, however.
mkh
Service Provider
Posts: 64
Liked: 18 times
Joined: Apr 20, 2018 6:17 am
Full Name: Michael Høyer
Contact:

Re: Health check on large backups

Post by mkh »

perjonsson1960 wrote: May 19, 2021 8:42 am Thanks for your reply! So, we cannot skip the health check. Right.

The backup storage: We have two identical HPE DL380 servers with a ridiculously large amount of RAM, and two disk enclosures each of about 90 TB. So, four backup repositories included in a scale-out rep. of approx. 360 TB. Each disk enclosure is a RAID 5 array with 11 "live" disks and one spare. Our supplier recommended RAID 5 over RAID 6 with the motivation that RAID 5 is faster. The file system is ReFS with dedup. and all the fancy stuff. ;-)
Hi Per

unrelated to the health check/performance etc part, with disks of this size please do RAID 6 instead of RAID 5

one link to info about raid 5 rebuilds with large disks, see the tables at the end - https://www.digistor.com.au/the-latest/ ... e-in-2019/
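The rebuild-risk argument can be put in numbers. Assuming ~9 TB disks (to match the ~90 TB enclosures described above) and a vendor unrecoverable-read-error spec of 1 in 10^14 bits, a common figure for SATA disks (NL-SAS is often specced at 1 in 10^15, which improves the odds considerably), the chance of hitting at least one URE while rebuilding a failed RAID 5 member is:

```python
import math

# RAID 5 rebuild must read every bit of all surviving disks.
disk_tb = 9                 # assumed disk size (~90 TB / 10 data disks)
surviving_disks = 10
ure_rate = 1e-14            # assumed URE spec: 1 error per 1e14 bits read

bits_read = surviving_disks * disk_tb * 1e12 * 8
expected_ures = bits_read * ure_rate
p_at_least_one = 1 - math.exp(-expected_ures)   # Poisson approximation

print(f"expected UREs during rebuild: {expected_ures:.1f}, "
      f"P(>=1) = {p_at_least_one:.3f}")
```

Under these assumptions a URE during rebuild is near certain, and on RAID 5 that means data loss; RAID 6 can still correct it from the second parity, which is the point of the linked article.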
ejenner
Veteran
Posts: 636
Liked: 100 times
Joined: Mar 23, 2018 4:43 pm
Full Name: EJ
Location: London
Contact:

Re: Health check on large backups

Post by ejenner »

Had to turn off health checking on our large file server jobs for the same reason. I don't think our file server jobs are quite so large either. Big enough, but maybe only 80% of what you're backing up. We also have the HPE servers with direct attached disk.

Simple choice for us. It was either a backup without a health check or no backup at all.
bryanvaneeden
Novice
Posts: 8
Liked: never
Joined: Jan 15, 2021 10:45 am
Full Name: Bryan van Eeden
Contact:

Re: Health check on large backups

Post by bryanvaneeden »

We actually have the same issue with a similarly large 17 TB VM. The backup always succeeds within minutes, but the health check takes days. This is unacceptable. Like the other guys here, we have a very large, high-performing repository server with ReFS and all the bells and whistles. Fast Clone is being used and the backups run at over 2 GB/s. Not quite sure what else we can do to fix the health check speed.
perjonsson1960
Veteran
Posts: 463
Liked: 47 times
Joined: Jun 06, 2018 5:41 am
Full Name: Per Jonsson
Location: Sweden
Contact:

Re: Health check on large backups

Post by perjonsson1960 »

mkh wrote: May 25, 2021 6:55 am unrelated to the health check/performance etc part, with disks of this size please do RAID 6 instead of RAID 5
I am not sure if it is possible to convert without losing all the data? I have googled a little, and some say that it is possible, and some that it is not. And isn't RAID 6 slower than RAID 5 due to the parity calculations in all writes?