-
- Veteran
- Posts: 527
- Liked: 58 times
- Joined: Jun 06, 2018 5:41 am
- Full Name: Per Jonsson
- Location: Sweden
- Contact:
Health check on large backups
Folks,
We have a file server cluster that we recently began backing up with B&R. The cluster consists of two physical servers, and is backed up as a failover cluster. The amount of data right now is approx. 17 TB, and the number of files is just over 18 million. I don't know if that is to be considered a large backup, or if it perhaps is small compared to some other installations.
The initial full backup took around seven hours, which is okay. An incremental backup takes about 15 minutes, which of course is also okay. And the Merge operation takes around 30 minutes, also okay. However, on Monday the job started a health check, and that process took 26 hours and 28 minutes. And that ruins the backup window on that day of the month when the health check is performed, even if we run the backup job only once every 24 hours. But we are considering running the job twice, once during the night and once mid-day at lunch time.
What is the health check doing that takes so long? Is the long time due to the amount of data or the large number of files?
Right now the job is forever forward incremental. If we change the job to forward incremental with synthetic fulls, can we then safely skip the health check since there are new fulls created regularly? Or are the errors in the latest full backup file, if any, copied to the new full backup file? I can imagine that creating a synthetic full is much faster than doing a health check... We have not done a defrag and compact, so I don't know how long that takes, but if we change to forward incremental with synthetic fulls, we don't need to do a defrag and compact.
Regards,
PJ
-
- Product Manager
- Posts: 9846
- Liked: 2606 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Health check on large backups
I recommend skipping the health check only if you are running SureBackup jobs.
A synthetic full will help you if you are using Fast Clone with ReFS or XFS, but it will reuse the existing blocks on disk. If those blocks are corrupted, the new full cannot be restored either.
You can read more about the health check and what it does here. I think it's the amount of data in your case.
https://helpcenter.veeam.com/docs/backu ... ml?ver=110
What are you using as backup storage? If it's too slow, then the health check can take some time for your amount of data.
Product Management Analyst @ Veeam Software
-
- Veteran
- Posts: 527
- Liked: 58 times
- Joined: Jun 06, 2018 5:41 am
- Full Name: Per Jonsson
- Location: Sweden
- Contact:
Re: Health check on large backups
Thanks for your reply! So, we cannot skip the health check. Right.
The backup storage: We have two identical HPE DL380 servers with a ridiculously large amount of RAM, and two disk enclosures each of about 90 TB. So, four backup repositories included in a scale-out rep. of approx. 360 TB. Each disk enclosure is a RAID 5 array with 11 "live" disks and one spare. Our supplier recommended RAID 5 over RAID 6 with the motivation that RAID 5 is faster. The file system is ReFS with dedup. and all the fancy stuff.
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Health check on large backups
RAM doesn't really matter here; storage random I/O capability is what matters most. A health check is heavily random, as it reads all the blocks required to build the latest restore point, and those blocks are scattered across multiple files.
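Conceptually, that access pattern looks like the sketch below: read every block referenced by the latest restore point from wherever it sits in the backup chain and recompute its checksum. This is a hypothetical illustration of the I/O pattern only, not Veeam's actual on-disk format or algorithm; the block-descriptor layout is invented for the example.

```python
import hashlib

def verify_restore_point(blocks):
    """Verify each block of a restore point against its stored checksum.

    blocks: list of dicts with 'path', 'offset', 'size', and 'checksum'
    (hex SHA-256). Returns the list of blocks that failed verification.
    """
    corrupt = []
    # Blocks belonging to one restore point are scattered across many
    # backup files, so reading them in logical order turns into random
    # I/O on the repository disks: the seeks, not the hashing, dominate.
    for b in blocks:
        with open(b["path"], "rb") as f:
            f.seek(b["offset"])          # one random seek per block
            data = f.read(b["size"])
        if hashlib.sha256(data).hexdigest() != b["checksum"]:
            corrupt.append(b)
    return corrupt
```

On spinning disks each of those seeks costs milliseconds, which is why the wall-clock time scales with block count rather than throughput.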
-
- Veteran
- Posts: 527
- Liked: 58 times
- Joined: Jun 06, 2018 5:41 am
- Full Name: Per Jonsson
- Location: Sweden
- Contact:
Re: Health check on large backups
Is there anything that can be done with the controller cache settings in order to speed up things in general, and health check in particular? For example, the cache ratio is set to 10% read and 90% write, which apparently is the default, because I don't remember ever changing it. There are also a number of other controller parameters that can be tweaked:
Selected Performance Profile - Default Settings
Parity RAID Degraded Mode Performance Optimization - Disabled
Physical Drive Request Elevator Sort - Enabled
Maximum Drive Request Queue Depth - Automatic
Monitor and Performance Analysis Delay - 60
HDD Flexible Latency Optimization - Disabled
The controllers are "HPE Smart Array P408e-p SR Gen10" and "P408i-a SR Gen10". For some reason the internal controller has only 2048 MB cache, while the external controller has 4096 MB. I don't know if that is normal.
-
- Veeam Software
- Posts: 3624
- Liked: 608 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: Health check on large backups
Hello,
Basically, you should follow vendor recommendations to improve random I/O. The only idea that comes to mind is to experiment with different block sizes and compression levels to see how health check speed depends on these parameters. You may review this page on our help center.
Thanks!
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Health check on large backups
veeam-backup-replication-f2/backup-file ... 62680.html This thread also covered the same problem (maybe merge them?). Not sure what your version is, but v11 was supposed to have improvements; personally I don't see a difference. Nobody answered my question there about increasing the read-ahead window either...
The problem is real with spinning media (especially with very large backup files; 17 TB is quite big), and it gets worse and worse over time with fragmentation. You really don't have many options other than improving latency, and the only way to do that is to use faster media like 10k/15k disks or SSDs. Or maybe get an option to increase async read-ahead even more and keep the storage busy.
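The read-ahead idea can be sketched as follows: instead of issuing one random read and waiting for it to complete, keep a window of several reads in flight so the storage queue stays full and the controller/drives can reorder seeks. This is a generic illustration of the technique, not how Veeam's reader is actually implemented; the window size of 8 is an arbitrary example value.

```python
from concurrent.futures import ThreadPoolExecutor

def read_block(path, offset, size):
    # One random read: open, seek, read the requested range.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

def read_blocks(blocks, window=8):
    """Read (path, offset, size) tuples with up to `window` requests in
    flight. A deeper request queue hides per-seek latency on spinning
    disks, because the storage stack can service seeks out of order."""
    with ThreadPoolExecutor(max_workers=window) as pool:
        futures = [pool.submit(read_block, *b) for b in blocks]
        # Results come back in submission order even though the
        # underlying reads may complete out of order.
        return [f.result() for f in futures]
```

The same effect is what a larger OS or application read-ahead window buys you: the disks are never idle waiting for the next request.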
The file system is ReFS with dedup
Don't do dedupe: it makes it even worse with more random I/O.
-
- Veteran
- Posts: 527
- Liked: 58 times
- Joined: Jun 06, 2018 5:41 am
- Full Name: Per Jonsson
- Location: Sweden
- Contact:
Re: Health check on large backups
Don't do dedupe, you say. But it saves diskspace. Bigtime. There are pros and cons with pretty much everything.
-
- Product Manager
- Posts: 9846
- Liked: 2606 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Health check on large backups
I'd prefer to do ReFS Fast Clone without dedup. The space savings and synthetic full runs are very good.
Product Management Analyst @ Veeam Software
-
- Veteran
- Posts: 527
- Liked: 58 times
- Joined: Jun 06, 2018 5:41 am
- Full Name: Per Jonsson
- Location: Sweden
- Contact:
Re: Health check on large backups
And how do I turn off dedupe? Is that the "Enable inline data deduplication" checkbox in the backup job? It says "recommended" there... It is probably even the default setting. Can I turn that off for a job that has an existing backup chain?
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Health check on large backups
I thought you meant ReFS level deduplication (the one with experimental support). Veeam's integrated one is fine.
-
- Veteran
- Posts: 527
- Liked: 58 times
- Joined: Jun 06, 2018 5:41 am
- Full Name: Per Jonsson
- Location: Sweden
- Contact:
Re: Health check on large backups
I really don't know what I mean, but I guess that the ReFS level dedupe is not something that you can turn off, and live to talk about it?
-
- Product Manager
- Posts: 9846
- Liked: 2606 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Health check on large backups
ReFS level dedupe = Windows Server Data Deduplication?
ReFS level dedupe = Fast Clone (ReFS Block Cloning API)?
Which of these two are you referring to?
Disabling "Enable inline data deduplication" is only needed if you are saving your backup job or backup copy job to a deduplication appliance.
Product Management Analyst @ Veeam Software
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Health check on large backups
(and not even to any deduplication appliance)
-
- Veteran
- Posts: 527
- Liked: 58 times
- Joined: Jun 06, 2018 5:41 am
- Full Name: Per Jonsson
- Location: Sweden
- Contact:
Re: Health check on large backups
This is very confusing... So the "inline data deduplication" referred to in the backup job configuration is not the same as in the Windows ReFS filesystem? What I know is that Fast Clone is used, and that "inline data deduplication" is activated in the backup jobs. And I am pretty sure that the Windows Server (ReFS) dedupe function is also activated. I don't know how to check if it is, but if it is, then I suppose that it was activated back in the day when the servers and the RAID arrays were installed, right? I guess that the Windows (ReFS) dedupe cannot be turned off without losing all the data on the repositories?
-
- Veteran
- Posts: 527
- Liked: 58 times
- Joined: Jun 06, 2018 5:41 am
- Full Name: Per Jonsson
- Location: Sweden
- Contact:
Re: Health check on large backups
I am looking in the Server Manager under File and Storage Services -> Volumes, and the columns "Deduplication Rate" and "Deduplication Savings" are both empty. Does that mean that deduplication is not used? I cannot find any reference to deduplication anywhere else.
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Health check on large backups
Yes, this is the right place to check the status and enable/disable the Windows Data Deduplication feature. As for Veeam B&R inline deduplication and Fast Clone functionality, please review the referenced user guide sections for a better understanding of those.
-
- Veteran
- Posts: 527
- Liked: 58 times
- Joined: Jun 06, 2018 5:41 am
- Full Name: Per Jonsson
- Location: Sweden
- Contact:
Re: Health check on large backups
I have checked now. The Data Deduplication feature is NOT installed in our repository servers. I guess that is a good thing? So, only the integrated dedup in B&R is used.
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Health check on large backups
Yes, that's the recommended configuration Fabian was talking about.
-
- Veteran
- Posts: 527
- Liked: 58 times
- Joined: Jun 06, 2018 5:41 am
- Full Name: Per Jonsson
- Location: Sweden
- Contact:
Re: Health check on large backups
Great!
Perhaps a semi-related question: what is generally the recommended read/write ratio in the RAID controllers? I have now changed it to 30% read and 70% write, in order to see if I can detect any difference in the speed of various operations. But I guess that any differences will be minor at best. And for example, I guess that a health check is mainly random reads?
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Health check on large backups
But for example, I guess that a health check is mainly random reads?
That's correct.
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Health check on large backups
My recommendation would be to set it to 100% write, or 10/90 read/write. With modern amounts of memory, it doesn't make much sense to cache reads on the controller. A small amount of read cache *may* enable controller-side read-ahead, but I haven't seen any noticeable difference.
-
- Veteran
- Posts: 527
- Liked: 58 times
- Joined: Jun 06, 2018 5:41 am
- Full Name: Per Jonsson
- Location: Sweden
- Contact:
Re: Health check on large backups
Okay, thanks! It was set at 10/90 read/write before. I thought that maybe 30/70 would speed up a health check somewhat, since that is virtually only reads. But if the difference in performance is so slim, perhaps it will not show any significant change in the time needed, even if the health check took over 26 hours with the 10/90 setting?
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Health check on large backups
Read cache mostly exists to keep already-read data in cache; if an application tries to access it again, it is served from cache. However, with modern amounts of memory and heavy OS-side caching, IMHO this effect is practically nonexistent. RAID controllers also do read-ahead, but I doubt those windows are very large, and it has no effect with random I/O. The OS (and Veeam itself) also does read-ahead, so there's little point in doing it on the controller side, especially as the OS/application actually knows what it needs next.
IMHO RAID caching only makes sense for writes especially on parity RAIDs where it helps a lot. Uncached parity RAID writes are painful.
-
- Veteran
- Posts: 527
- Liked: 58 times
- Joined: Jun 06, 2018 5:41 am
- Full Name: Per Jonsson
- Location: Sweden
- Contact:
Re: Health check on large backups
Thanks! I might just as well go back to the 10/90 setting then.
-
- Novice
- Posts: 3
- Liked: 1 time
- Joined: Oct 24, 2019 3:29 pm
- Full Name: Garrett
- Contact:
Re: Health check on large backups
If you can spare the space, I would consider experimenting with something not mentioned here yet: periodic active fulls. These would be more of a sequential write workload than a random I/O one, which may be very beneficial for your spinning-disk configuration.
I know this isn't a universally agreed good practice, but you may be able to forgo health checks with this method too, as your window of exposure to backup corruption from bit rot, etc., would only be the time between each active full. Once you hit health checks that are so long you start missing restore points, it becomes a serious consideration IMHO. I'd be curious what someone more knowledgeable on the corruption side would say about this, however.
-
- Service Provider
- Posts: 64
- Liked: 18 times
- Joined: Apr 20, 2018 6:17 am
- Full Name: Michael Høyer
- Contact:
Re: Health check on large backups
Hi,
perjonsson1960 wrote: ↑May 19, 2021 8:42 am
Thanks for your reply! So, we cannot skip the health check. Right.
The backup storage: We have two identical HPE DL380 servers with a ridiculously large amount of RAM, and two disk enclosures each of about 90 TB. So, four backup repositories included in a scale-out rep. of approx. 360 TB. Each disk enclosure is a RAID 5 array with 11 "live" disks and one spare. Our supplier recommended RAID 5 over RAID 6 with the motivation that RAID 5 is faster. The file system is ReFS with dedup. and all the fancy stuff.
Unrelated to the health check/performance part: with disks of this size, please do RAID 6 instead of RAID 5.
Here is one link with info about RAID 5 rebuilds with large disks; see the tables at the end: https://www.digistor.com.au/the-latest/ ... e-in-2019/
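The rebuild-risk argument can be made concrete with back-of-the-envelope arithmetic. The sketch below uses illustrative numbers only (a 10^-15 unrecoverable-read-error rate per bit, and ten surviving 9 TB disks, roughly matching an 11-disk RAID 5 group after one failure); the real figures depend on the specific drives.

```python
def rebuild_failure_probability(surviving_disks, disk_bytes, ure_per_bit=1e-15):
    """Chance of hitting at least one unrecoverable read error (URE)
    while reading every surviving disk end-to-end during a RAID 5
    rebuild. Assumes independent errors at a constant per-bit rate."""
    bits_read = surviving_disks * disk_bytes * 8
    return 1.0 - (1.0 - ure_per_bit) ** bits_read
```

With those example numbers the probability comes out around one in two, which is why RAID 6 (which can survive a second error during a rebuild) is usually preferred for large-capacity disks.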
-
- Veteran
- Posts: 636
- Liked: 100 times
- Joined: Mar 23, 2018 4:43 pm
- Full Name: EJ
- Location: London
- Contact:
Re: Health check on large backups
Had to turn off health checking on our large file server jobs for the same reason. I don't think our file server jobs are quite so large either. Big enough, but maybe only 80% of what you're backing up. We also have the HPE servers with direct attached disk.
Simple choice for us. It was either a backup without a health check or no backup at all.
-
- Novice
- Posts: 8
- Liked: never
- Joined: Jan 15, 2021 10:45 am
- Full Name: Bryan van Eeden
- Contact:
Re: Health check on large backups
We actually have the same issue with a similarly large 17 TB VM. The backup always succeeds within minutes, but the health check takes days. This is unacceptable. Like the other guys here, we have a very large, high-performing repository server with ReFS and all the bells and whistles. Fast Clone is being used, and the backups run through at over 2 GB/s. Not quite sure what else we can do to fix the health check speed.
-
- Veteran
- Posts: 527
- Liked: 58 times
- Joined: Jun 06, 2018 5:41 am
- Full Name: Per Jonsson
- Location: Sweden
- Contact:
Re: Health check on large backups
I am not sure if it is possible to convert without losing all the data. I have googled a little, and some say it is possible and some say it is not. And isn't RAID 6 slower than RAID 5, due to the parity calculations on all writes?