-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Health check on large backups
AFAIK, this is being implemented within v12.
-
- Product Manager
- Posts: 14840
- Liked: 3086 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
- Contact:
Re: Health check on large backups
Just to clarify: what foggy meant is a health check process that is separate from the job itself.
For async read: yes, that should make it into V11a. Internal tests showed up to a 5x performance improvement for a 15TB backup file.
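For context, "async read" generally means keeping several read requests in flight against the backup file instead of asking for one block at a time, so the repository always has work queued. The following is a minimal, generic Win32 overlapped-I/O sketch of that idea; it only illustrates the technique and is not Veeam's actual implementation, and the file path, block size and queue depth are made-up example values.
Code: Select all
/* Generic illustration of asynchronous ("overlapped") reads: keep
 * QUEUE_DEPTH requests outstanding instead of reading one block at a
 * time, so the repository disks always have work queued.
 * NOTE: this is not Veeam's code; path, block size and depth are
 * arbitrary values chosen for the example. */
#include <windows.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE  (1024 * 1024)   /* 1 MB per request        */
#define QUEUE_DEPTH 8               /* requests kept in flight */

static char buf[QUEUE_DEPTH][BLOCK_SIZE];

int main(void)
{
    HANDLE f = CreateFileA("D:\\Backups\\sample.vbk", GENERIC_READ,
                           FILE_SHARE_READ, NULL, OPEN_EXISTING,
                           FILE_FLAG_OVERLAPPED, NULL);
    if (f == INVALID_HANDLE_VALUE) return 1;

    OVERLAPPED ov[QUEUE_DEPTH];
    memset(ov, 0, sizeof(ov));
    LONGLONG offset = 0;

    /* Prime the queue: submit QUEUE_DEPTH reads before waiting for any. */
    for (int i = 0; i < QUEUE_DEPTH; i++) {
        ov[i].hEvent     = CreateEventA(NULL, TRUE, FALSE, NULL);
        ov[i].Offset     = (DWORD)offset;
        ov[i].OffsetHigh = (DWORD)(offset >> 32);
        if (!ReadFile(f, buf[i], BLOCK_SIZE, NULL, &ov[i]) &&
            GetLastError() != ERROR_IO_PENDING)
            return 1;
        offset += BLOCK_SIZE;
    }

    /* Complete requests in submission order and immediately refill each
     * slot, so the effective queue depth stays at QUEUE_DEPTH, not 1. */
    for (;;) {
        for (int i = 0; i < QUEUE_DEPTH; i++) {
            DWORD got = 0;
            if (!GetOverlappedResult(f, &ov[i], &got, TRUE) || got == 0)
                goto done;            /* end of file or error */
            /* ... verify/checksum buf[i] here ... */
            ov[i].Offset     = (DWORD)offset;
            ov[i].OffsetHigh = (DWORD)(offset >> 32);
            if (!ReadFile(f, buf[i], BLOCK_SIZE, NULL, &ov[i]) &&
                GetLastError() != ERROR_IO_PENDING)
                goto done;
            offset += BLOCK_SIZE;
        }
    }
done:
    CloseHandle(f);
    return 0;
}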
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
-
- Novice
- Posts: 8
- Liked: 1 time
- Joined: Apr 23, 2020 11:10 pm
- Full Name: Thomas Ng
- Contact:
Re: Health check on large backups
Can the health check and the backup job run simultaneously?
-
- Veeam Software
- Posts: 50
- Liked: 12 times
- Joined: Oct 21, 2010 8:54 am
- Full Name: Dmitry Vedyakov
- Contact:
Re: Health check on large backups
The health check is currently part of the running backup job. First all VMs are processed, then so-called "post-processing" starts, which does all the work regarding retention, health checks, etc.
-
- Novice
- Posts: 8
- Liked: 1 time
- Joined: Apr 23, 2020 11:10 pm
- Full Name: Thomas Ng
- Contact:
Re: Health check on large backups
Does the backup window restriction setting in Schedule apply to the "post-processing" phase? I don't want any VM backups running during production hours, but I'm OK with merges, health checks, etc. on the job.
-
- Product Manager
- Posts: 9848
- Liked: 2607 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Health check on large backups
Yes, it does.
Example: the health check will be cancelled if it takes longer than the configured allowed window.
Product Management Analyst @ Veeam Software
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Health check on large backups
Actually, that would be a bug if so, because the allowed window was supposed to restrict only activities that touch the production environment, while all of the above-mentioned activities are isolated to the backup repository.
-
- Product Manager
- Posts: 9848
- Liked: 2607 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Health check on large backups
Anton, I had that on V9 and V10 with some customers.
The backup window restriction setting cancelled the backup job.
OK, only the transport and health check. Good to know.
https://helpcenter.veeam.com/docs/backu ... ml?ver=110
The backup window affects only the data transport process and health check operations. Other transformation processes can be performed in the target repository outside the backup window.
Product Management Analyst @ Veeam Software
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Health check on large backups
Bugs can also be documented, but it is not right that the health check process is included, as, just like the other transformation processes, it does not touch the production environment.
The logic by the devs was probably that the health check process MAY result in a job retry at the end to obtain data for corrupted blocks from the source. But since corruptions happen so rarely, there's actually no point in restricting the health check from running outside of the allowed window merely based on this possibility. It should be allowed to run, but if a job retry is needed, then it can be failed with the corresponding error.
@Egor Yakovlev FYI, this is especially important as we uncouple the health check process from backup jobs, since the chances that they will be scheduled outside of the backup window are pretty high.
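To make that behaviour concrete, here is a small hypothetical sketch of the decision logic described above: the repository-only health check always runs, and only the production-facing retry is gated by the backup window and failed with an error when the window does not allow it. All names and the structure below are invented for illustration; this is not the product's actual code.
Code: Select all
/* Hypothetical sketch only: the health check reads the repository, so it
 * runs regardless of the backup window; only the retry that would re-read
 * production data is gated by the window. Names are invented. */
#include <stdbool.h>
#include <stdio.h>

/* Stub: scan backup blocks in the repository and report corruption. */
static bool verify_backup_blocks(void) { return true; }

/* Stub: is the current time inside the allowed backup window? */
static bool inside_backup_window(void) { return false; }

int main(void)
{
    /* Repository-only work: always allowed to run. */
    bool corruption_found = verify_backup_blocks();

    if (!corruption_found) {
        puts("Health check passed, nothing to retry.");
        return 0;
    }

    /* Only the retry touches production storage, so only the retry is
     * restricted by the backup window. */
    if (!inside_backup_window()) {
        puts("Error: corrupted blocks found, but the retry that would "
             "re-read them from production is outside the backup window.");
        return 1;
    }

    puts("Retrying job to re-read corrupted blocks from the source...");
    return 0;
}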
-
- Enthusiast
- Posts: 50
- Liked: 4 times
- Joined: Jun 03, 2015 8:32 am
- Full Name: Stephan
- Contact:
Re: Health check on large backups
A health check prevents tape jobs from running, which can mess with the schedule: when the tape job is delayed so much that it can't finish before the source job starts again the next day, the tape job fails. That happens once a month here. In that case it would be helpful to respect the backup window restrictions, right?
But I'm hoping the other changes will be great for me in that case.
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Health check on large backups
Having tested v11a...
When verification is running alone, it can easily hit 1.5 GB/s+, about 4-5x faster than before. That's actually much faster than reading data from a quite good SAN (after a disk extension, for example).
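For a rough sense of what such a rate means for large backup files, here is a small back-of-the-envelope calculation; the file sizes and the assumed "before" rate of ~0.3 GB/s are illustrative values, not measurements from this thread.
Code: Select all
/* Rough arithmetic only: time for a full read of a backup file at an
 * assumed ~0.3 GB/s (old, queue depth 1) vs the ~1.5 GB/s reported above.
 * File sizes are examples, not figures from this thread. */
#include <stdio.h>

int main(void)
{
    const double sizes_tb[]   = { 5.0, 15.0, 25.0 };
    const double old_rate_gbs = 0.3;   /* assumed pre-11a rate       */
    const double new_rate_gbs = 1.5;   /* rate reported in the post  */

    for (int i = 0; i < 3; i++) {
        double gb = sizes_tb[i] * 1024.0;
        printf("%4.0f TB backup: %5.1f h before vs %4.1f h after\n",
               sizes_tb[i],
               gb / old_rate_gbs / 3600.0,
               gb / new_rate_gbs / 3600.0);
    }
    return 0;
}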
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Health check on large backups
Haha!! Thanks for sharing, sounds like you have some pretty decent backup storage there.
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Health check on large backups
It's not *that* beefy (24x16TB SAS RAID60 on some MegaRAID, SSD cache)... One of my customers recently bought a Dell XE7100 that will have a 48-disk RAID60; that might show some more interesting numbers, but it's not delivered yet.
-
- VeeaMVP
- Posts: 1007
- Liked: 314 times
- Joined: Jan 31, 2011 11:17 am
- Full Name: Max
- Contact:
Re: Health check on large backups
That sounds promising. We do have some customers who experience very long health checks, some with RAID60 arrays, so I'm looking forward to seeing their results.
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Health check on large backups
The improved verification throughput seems to hit performance badly enough to cause some backup semi-failures.
After enabling verification on some very large jobs, I've seen many errors on other jobs while verification runs:
Code: Select all
Error: Failed to call RPC function 'FcWriteFileEx': The supplied user buffer is not valid for the requested operation. Failed to write data to the file [<always a temporary VBM file path>].
However, the jobs seem to actually succeed, as the last retry has status "Nothing to process...". So far I've tried reducing the number of tasks on the repository, with no visible improvement. I've got a busy week ahead, but I'll try to find time to create a support case...
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Health check on large backups
Judging by the error, I'm not sure this is related. If the error were due to the backup storage now being too busy, I would expect timeouts as opposed to buffer errors. Actually, you would see "repository is too busy" warnings in the action log first, even before those I/O timeout errors start to appear. But let's see what support finds out.
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Health check on large backups
While I'm getting around to creating a support case (soon!), I noticed that DirectSAN is quite slow and went to reread some docs (a bit off-topic, but the same case of missing ADF).
Advanced Data Fetcher is still not used for DirectSAN mode, right? Ironically, reads (more specifically a re-read after extending a 25TB VMDK) from a powerful hybrid SAN (with DirectSAN) are now much slower than verification (same symptom, a queue depth of exactly 1).
Seems like low-hanging fruit: for Storage Snapshots, Veeam would also have to query the snapshotted VMDK layout in VMFS (or parse it from the snapshot file system) to perform ADF reads, and it's almost the same for DirectSAN. It's also a bit weird that DirectSAN has higher priority than HotAdd in that case; I'm not sure it has any benefits over HotAdd at all from a throughput perspective (ignoring CPU/memory usage in VMware).
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Health check on large backups
This is not going to work for DirectSAN (of course we tried this when we developed ADF). But let's not hijack the thread with this off-topic discussion.
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Health check on large backups
#05083160
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Health check on large backups
The support engineer found an interesting remark in the Win32 WriteFile API documentation: "The WriteFile function may fail with ERROR_INVALID_USER_BUFFER or ERROR_NOT_ENOUGH_MEMORY whenever there are too many outstanding asynchronous I/O requests."
Unsure why it is happening only on our largest and highest-performance repository though; investigation in progress.
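For anyone hitting the same error: the quoted remark is about overlapped (asynchronous) I/O, where a WriteFile submission can fail immediately with ERROR_INVALID_USER_BUFFER or ERROR_NOT_ENOUGH_MEMORY instead of being queued when too many requests are already outstanding. The generic sketch below shows one way a writer can bound its in-flight requests and back off on that error; it is only an illustration of the documented failure mode, not Veeam's code, and the file name, block size and limits are arbitrary example values.
Code: Select all
/* Generic illustration of the WriteFile remark quoted above: with
 * overlapped I/O, a submission can fail with ERROR_INVALID_USER_BUFFER or
 * ERROR_NOT_ENOUGH_MEMORY when too many asynchronous requests are already
 * outstanding. Bounding the queue depth and backing off on that error is
 * one way to cope. Not Veeam's code; all values here are arbitrary. */
#include <windows.h>
#include <stdio.h>
#include <string.h>

#define BLOCK        (1024 * 1024)
#define MAX_INFLIGHT 4
#define TOTAL_BLOCKS 64

static char buf[MAX_INFLIGHT][BLOCK];

int main(void)
{
    HANDLE f = CreateFileA("async_write_test.bin", GENERIC_WRITE, 0, NULL,
                           CREATE_ALWAYS, FILE_FLAG_OVERLAPPED, NULL);
    if (f == INVALID_HANDLE_VALUE) return 1;

    OVERLAPPED ov[MAX_INFLIGHT];
    memset(ov, 0, sizeof(ov));
    for (int i = 0; i < MAX_INFLIGHT; i++)
        ov[i].hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);

    LONGLONG offset = 0;
    for (int b = 0; b < TOTAL_BLOCKS; b++) {
        int slot = b % MAX_INFLIGHT;
        /* Cap the number of in-flight writes: wait for the previous
         * request in this slot before reusing its buffer. */
        if (b >= MAX_INFLIGHT) {
            DWORD done;
            if (!GetOverlappedResult(f, &ov[slot], &done, TRUE)) return 1;
        }
        ov[slot].Offset     = (DWORD)offset;
        ov[slot].OffsetHigh = (DWORD)(offset >> 32);
        for (;;) {
            if (WriteFile(f, buf[slot], BLOCK, NULL, &ov[slot]))
                break;                              /* finished at once  */
            DWORD err = GetLastError();
            if (err == ERROR_IO_PENDING)
                break;                              /* queued, fine      */
            if (err == ERROR_INVALID_USER_BUFFER ||
                err == ERROR_NOT_ENOUGH_MEMORY) {
                Sleep(50);                          /* too many requests */
                continue;                           /* in flight: retry  */
            }
            fprintf(stderr, "WriteFile failed: %lu\n", err);
            return 1;
        }
        offset += BLOCK;
    }

    /* Drain whatever is still outstanding before closing. */
    for (int i = 0; i < MAX_INFLIGHT; i++) {
        DWORD done;
        GetOverlappedResult(f, &ov[i], &done, TRUE);
        CloseHandle(ov[i].hEvent);
    }
    CloseHandle(f);
    return 0;
}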
-
- Product Manager
- Posts: 9848
- Liked: 2607 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Health check on large backups
I have to say, I'm really impressed by the new health-check speeds.
Some examples from one of our infrastructures:
Job with 16 VMs and 13.7TB Source Data:
Before V11a: 8.5h
With V11a: 1.5h
Job with 55 VMs and 12.8TB Source Data:
Before V11a: 5.5h
With V11a: 1.5h
Job with 15 VMs and 8.6TB Source Data:
Before V11a: 8.5h
With V11a: 1h
Product Management Analyst @ Veeam Software
-
- Expert
- Posts: 245
- Liked: 58 times
- Joined: Apr 28, 2009 8:33 am
- Location: Strasbourg, FRANCE
- Contact:
Re: Health check on large backups
Impressive!!! What kind of backend storage?
-
- Product Manager
- Posts: 9848
- Liked: 2607 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Health check on large backups
HPE Apollo as a Linux Hardened Repo.
We are really happy with that product.
I am sure that other vendors' systems will show the same results after updating to V11a.
Product Management Analyst @ Veeam Software
-
- Veteran
- Posts: 389
- Liked: 54 times
- Joined: Sep 05, 2011 1:31 pm
- Full Name: Andre
- Contact:
Re: Health check on large backups
I can confirm, health check is much faster in V11a.
Backup File Size (vbk): ~5TB
Check Duration before 11a: ~6h
Check Duration with 11a: ~50min
Thanks Veeam Team
-
- Enthusiast
- Posts: 50
- Liked: 4 times
- Joined: Jun 03, 2015 8:32 am
- Full Name: Stephan
- Contact:
Re: Health check on large backups
Unfortunately, I cannot confirm those high rates. For one particular job it went from 12h to 8h, and other jobs still fail because of timeouts.
Edit: Just checked again and the 12h was an outlier; I was getting 8-9h even before the upgrade. So no noticeable change at all.
-
- Product Manager
- Posts: 9848
- Liked: 2607 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Health check on large backups
Are you still using a NetApp E-Series as a backup target over FC?
Unfortunately, I cannot confirm those high rates. For one particular job it went from 12h to 8h, and other jobs still fail because of timeouts.
Edit: Just checked again and the 12h was an outlier; I was getting 8-9h even before the upgrade. So no noticeable change at all.
As far as I understand this implementation (system cache bypass), it only works with enterprise-grade RAID controllers with direct-attached disks and not with iSCSI or FC-connected LUNs.
But I'm not 100 percent sure.
Product Management Analyst @ Veeam Software
-
- Veteran
- Posts: 389
- Liked: 54 times
- Joined: Sep 05, 2011 1:31 pm
- Full Name: Andre
- Contact:
Re: Health check on large backups
Stephan, how is the volume/array configured on which the backup files are stored? We have a volume with 50x6TB disks.
-
- Product Manager
- Posts: 14840
- Liked: 3086 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
- Contact:
Re: Health check on large backups
The performance gain for the health check comes from "asynchronous read". That is expected to improve performance on all kinds of storage systems.
I mean, if the storage is completely overloaded with other tasks, then the impact might be irrelevant. In general, storage systems connected via FC / iSCSI or whatever protocol also profit from async read. Maybe there are other bottlenecks involved (task configuration, other limits applied to the repository, or some compute resource shortage).
-
- Product Manager
- Posts: 9848
- Liked: 2607 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Health check on large backups
Thanks Hannes for the clarification.
Product Management Analyst @ Veeam Software