-
- Novice
- Posts: 3
- Liked: never
- Joined: Jun 21, 2017 11:23 am
- Full Name: Marvin
Veeam Storage Repositories - high failure rate for hard drives?
Hi everyone,
We use Veeam B&R Enterprise 10 to back up our vSphere VMs and physical servers to a Veeam storage repository - a Supermicro 6048R-E1CR60N with 43 Seagate Exos 12TB enterprise drives. The drives are configured in a RAID10 volume with 3 dedicated hot spares (using the built-in MegaRAID controller).
To ensure the consistency of the backups, we do the following scheduled checks:
-run monthly RAID consistency checks (via MegaRAID)
-run weekly patrol reads/scrubs (via MegaRAID)
-run weekly Veeam health checks
-run test restores
Some of the checks do run simultaneously which can put a high IO load on the repository particularly when backups are running.
Backups run daily as forever incrementals (since we don't have the capacity to do synthetic fulls). The repository is running Windows Server 2016 and NTFS (I wasn't comfortable implementing ReFS when we set it up 2 years ago). Otherwise, the repository performs quite well with good throughput and low disk latency.
The issue: we are experiencing drive failure rates (a mix of media/SMART errors or complete failures) of one drive per month. Thankfully, a replacement drive can be remirrored within 13 hours with our setup.
To other admins:
1) Is the failure rate I am experiencing normal? I understand that I may have received a bad batch of disks, which might be a contributing factor.
2) Am I running TOO many checks? Should they be run LESS frequently?
3) How do you test new replacement drives? In my case, I do use Seagate Tools and run several long generic test scans but I'm thinking this might not be sufficient.
Thanks.
-
- Veteran
- Posts: 298
- Liked: 85 times
- Joined: Feb 16, 2017 8:05 pm
- Contact:
Re: Veeam Storage Repositories - high failure rate for hard drives?
Hello Marvin.
1.) No. This seems to me to be a high failure rate. I would lean toward a bad batch of drives.
2.) I'd suggest reducing the number of checks. MegaRAID is good at detecting problems, in my experience.
3.) I don't bother testing new drives.
Options as I see it:
- Check to see if there's a firmware update for the disk drive model in use; applying an update to a drive will be tricky.
- Explore using SureBackup to help test backups. Follow this link: https://helpcenter.veeam.com/docs/backu ... ml?ver=100
- Longer term, if the HD issues persist consider switching to a different drive manufacturer.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Veeam Storage Repositories - high failure rate for hard drives?
Aren't those 12TB Seagate Exos drives some of the worst from a reliability perspective according to Backblaze drive reliability reports? I'm quite sure I remember them being in the 2.5-3% for annualized failure rate at one point, although I think the failure rate stabilized after 15-18 months (i.e. drives that lasted past 12 months had a tendency to fail less). Here's a link to the end of 2019 report which I think has these very drives in it:
https://www.backblaze.com/blog/wp-conte ... _Chart.png
Admittedly, 1 per month is pretty high when you only have 43 drives, but my experience says drives have a tendency to fail in batches, and this is more common for less reliable drives. I guess my opinion is that it's mostly just bad luck. I personally wouldn't cut down on checks: these are enterprise drives, so they should be able to handle the load of simple checks.
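For a rough sense of scale, the numbers in this thread can be turned into an annualized failure rate (AFR) and set against the 2.5-3% figure recalled above. This is only a back-of-the-envelope sketch: it assumes all 43 drives (hot spares included) are in service for a full year, it assumes "one per month" means 12 failures in that year, and it uses a simplified failures-per-drive-year definition rather than Backblaze's exact drive-days method. The function name is made up for illustration.

```python
def annualized_failure_rate(failures: int, drives: int, months: float) -> float:
    """Failures per drive-year, returned as a fraction (0.03 means 3%)."""
    drive_years = drives * months / 12.0
    return failures / drive_years


# Assumption: about 1 failure per month across 43 drives, observed for 12 months.
observed = annualized_failure_rate(failures=12, drives=43, months=12)
print(f"Observed AFR: {observed:.1%}")

# How many failures per year 43 drives would see at the recalled 2.5-3% AFR.
for afr in (0.025, 0.03):
    print(f"At {afr:.1%} AFR: about {43 * afr:.1f} expected failures per year")
```

Under those assumptions the observed rate comes out at roughly 28% per year, about ten times the recalled fleet figure, which fits the "bad batch" reading better than ordinary wear.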
-
- Chief Product Officer
- Posts: 31816
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Veeam Storage Repositories - high failure rate for hard drives?
Yes, most likely it is the quality of hard drives you are using. For other possible reasons, consider building vibrations due to nearby machinery, excessive noise level in the server room, etc.
-
- Veteran
- Posts: 298
- Liked: 85 times
- Joined: Feb 16, 2017 8:05 pm
- Contact:
Re: Veeam Storage Repositories - high failure rate for hard drives?
tsightler wrote: ↑Feb 04, 2021 6:36 pm Aren't those 12TB Seagate Exos drives some of the worst from a reliability perspective according to Backblaze drive reliability reports? I'm quite sure I remember them being in the 2.5-3% for annualized failure rate at one point, although I think the failure rate stabilized after 15-18 months (i.e. drives that lasted past 12 months had a tendency to fail less).
This had completely slipped my mind.
tsightler wrote: ↑Feb 04, 2021 6:36 pm Admittedly, 1 per month is pretty high when you only have 43 drives, but my experience says drives have a tendency to fail in batches, and this is more common for less reliable drives. I guess my opinion is that it's mostly just bad luck. I personally wouldn't cut down on checks as these are enterprise drives, they should be able to handle the load of simple checks.
Certainly agree with no cutbacks in checks!
-
- Novice
- Posts: 3
- Liked: never
- Joined: Jun 21, 2017 11:23 am
- Full Name: Marvin
Re: Veeam Storage Repositories - high failure rate for hard drives?
The drives that initially came with the server are ST12000NM0027, but they are slowly being replaced with ST12000NM0038. Hopefully these newer drives will have lower failure rates, based on the latest Backblaze blog: https://www.backblaze.com/blog/backblaz ... s-for-2020 (the ST12000NM0008 is the SATA equivalent).
I'll keep my checks on the same schedule (anyways probably a good idea to weed out these faulty drives).
Thanks!
-
- Service Provider
- Posts: 129
- Liked: 27 times
- Joined: Apr 01, 2016 5:36 pm
- Full Name: Olivier
- Contact:
Re: Veeam Storage Repositories - high failure rate for hard drives?
The situation was so bad with the Exos 12TB that they are not even in Seagate's product catalogue anymore! Today, you jump from 10 to 14TB.
Oli
-
- Lurker
- Posts: 1
- Liked: 1 time
- Joined: Feb 03, 2017 7:53 pm
- Full Name: IT Aerium
- Contact:
Re: Veeam Storage Repositories - high failure rate for hard drives?
I have some experience with that Supermicro box, but running 4TB Seagate drives... I can't remember when it last had a disk fail. Taken over almost 5 years, you can probably count the failures on one hand. The combination was very reliable.
-
- Veeam Legend
- Posts: 945
- Liked: 221 times
- Joined: Jul 19, 2016 8:39 am
- Full Name: Michael
- Location: Rheintal, Austria
- Contact:
Re: Veeam Storage Repositories - high failure rate for hard drives?
I'm not a mathematician, but I wonder whether, under these circumstances, it is likely that enough drives fail at once, or shortly after one another, for the whole RAID to become unrecoverable... Without also having the data on another repository, I wouldn't feel good about it...
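One way to put a rough number on that worry is to estimate the chance that the mirror partner of a failed drive also fails during the 13-hour rebuild mentioned in the first post. The sketch below rests on simplifying assumptions that are not from the thread: two-way mirrors, independent failures, a constant failure rate, and an AFR of 12/43 taken from "one drive per month out of 43". Since drives reportedly tend to fail in batches, the independence assumption is optimistic and the real risk is likely higher. The function names are hypothetical.

```python
import math


def p_partner_fails_during_rebuild(afr: float, rebuild_hours: float) -> float:
    """Chance that one specific drive fails within the rebuild window.

    Assumes a constant hazard rate derived from the annualized failure rate.
    """
    rate_per_hour = -math.log(1.0 - afr) / (365 * 24)
    return 1.0 - math.exp(-rate_per_hour * rebuild_hours)


def p_array_loss_per_year(afr: float, rebuild_hours: float, rebuilds_per_year: int) -> float:
    """Chance that at least one rebuild in a year loses its mirror partner."""
    p = p_partner_fails_during_rebuild(afr, rebuild_hours)
    return 1.0 - (1.0 - p) ** rebuilds_per_year


# Assumptions: AFR of 12/43, 13-hour rebuilds, 12 rebuilds per year.
afr = 12 / 43
per_rebuild = p_partner_fails_during_rebuild(afr, rebuild_hours=13)
per_year = p_array_loss_per_year(afr, rebuild_hours=13, rebuilds_per_year=12)
print(f"Per rebuild: {per_rebuild:.4%}")
print(f"Per year:    {per_year:.4%}")
```

Under these assumptions the risk per rebuild is on the order of 0.05%, and somewhat over half a percent per year. That is small but not negligible, so a second copy of the backups on another repository still seems like the safer choice.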