quork147
Novice
Posts: 3
Liked: never
Joined: Jun 21, 2017 11:23 am
Full Name: Marvin

Veeam Storage Repositories - high failure rate for hard drives?

Post by quork147 »

Hi everyone,

We use Veeam B&R Enterprise 10 to back up our vSphere VMs and physical servers to a Veeam storage repository - a Supermicro 6048R-E1CR60N with 43 Seagate Exos 12TB enterprise drives. The drives are configured in a RAID10 volume with 3 dedicated hot spares (using the built-in MegaRAID controller).

To ensure the consistency of the backups, we do the following scheduled checks:

-run monthly RAID consistency checks (via MegaRAID)
-run weekly patrol reads/scrubs (via MegaRAID)
-run weekly Veeam health checks
-run test restores

Some of the checks do run simultaneously, which can put a high IO load on the repository, particularly when backups are running.

Backups run daily as forever incrementals (since we don't have the capacity to do synthetic fulls). The repository is running Windows Server 2016 with NTFS (when I set it up 2 years ago, I wasn't comfortable implementing ReFS). Otherwise, the repository performs quite well, with good throughput and low disk latency.

The issue: we are experiencing drive failure rates (a mix of media/SMART errors or complete failures) of one drive per month. Thankfully, a replacement drive can be remirrored within 13 hours with our setup.

To other admins:

1) Is the failure rate I am experiencing normal? I understand that I may have received a bad batch of disks, which might be a contributing factor.
2) Am I running TOO many checks? Should they be run LESS frequently?
3) How do you test new replacement drives? In my case, I do use Seagate Tools and run several long generic test scans but I'm thinking this might not be sufficient.

Thanks.
nitramd
Veteran
Posts: 298
Liked: 85 times
Joined: Feb 16, 2017 8:05 pm

Re: Veeam Storage Repositories - high failure rate for hard drives?

Post by nitramd »

Hello Marvin.

1.) No. This seems to me to be a high failure rate. I would lean toward a bad batch of drives.
2.) I'd suggest reducing the number of checks. MegaRAID is good at detecting problems, in my experience.
3.) I don't bother testing new drives.

Options as I see it:
- Check to see if there's a firmware update for the disk drive model in use; applying an update to a drive will be tricky.

- Explore using SureBackup to help test backups. Follow this link: https://helpcenter.veeam.com/docs/backu ... ml?ver=100

- Longer term, if the HD issues persist consider switching to a different drive manufacturer.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Veeam Storage Repositories - high failure rate for hard drives?

Post by tsightler » 1 person likes this post

Aren't those 12TB Seagate Exos drives some of the worst from a reliability perspective according to Backblaze drive reliability reports? I'm quite sure I remember them being in the 2.5-3% range for annualized failure rate at one point, although I think the failure rate stabilized after 15-18 months (i.e. drives that lasted past 12 months had a tendency to fail less). Here's a link to the end of 2019 report, which I think has these very drives in it:

https://www.backblaze.com/blog/wp-conte ... _Chart.png

Admittedly, 1 per month is pretty high when you only have 43 drives, but my experience says drives have a tendency to fail in batches, and this is more common for less reliable drives. I guess my opinion is that it's mostly just bad luck. I personally wouldn't cut down on checks; these are enterprise drives, so they should be able to handle the load of simple checks.
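
For a rough sense of scale, the observed rate can be put on the same annualized footing as the Backblaze figures. The snippet below is only back-of-the-envelope arithmetic in Python: the function name is made up for illustration, it counts all 43 drives (hot spares included), and it assumes the one-per-month rate would hold steady for a full year.

```python
def annualized_failure_rate(failures: int, drive_count: int, months: float) -> float:
    """Failures per drive-year over the observation window.

    Hypothetical helper for illustration; assumes a constant failure
    rate and a constant number of drives in service.
    """
    drive_years = drive_count * months / 12.0
    return failures / drive_years


# Figures from this thread: 43 drives, about one failure per month.
observed_afr = annualized_failure_rate(failures=1, drive_count=43, months=1)

# Upper end of the 2.5-3% range recalled from the Backblaze reports.
reference_afr = 0.03

ratio = observed_afr / reference_afr

print(f"Observed AFR: {observed_afr:.1%}")
print(f"Ratio to a 3% reference AFR: {ratio:.1f}x")
```

On those assumptions, one failure a month across 43 drives works out to roughly 28% annualized, about nine times the 3% figure.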
Gostev
Chief Product Officer
Posts: 31816
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Veeam Storage Repositories - high failure rate for hard drives?

Post by Gostev »

Yes, most likely it is the quality of the hard drives you are using. Other possible causes to consider are building vibrations due to nearby machinery, excessive noise levels in the server room, etc.
nitramd
Veteran
Posts: 298
Liked: 85 times
Joined: Feb 16, 2017 8:05 pm

Re: Veeam Storage Repositories - high failure rate for hard drives?

Post by nitramd »

tsightler wrote: Feb 04, 2021 6:36 pm Aren't those 12TB Seagate Exos drives some of the worst from a reliability perspective according to Backblaze drive reliability reports? I'm quite sure I remember them being in the 2.5-3% range for annualized failure rate at one point, although I think the failure rate stabilized after 15-18 months (i.e. drives that lasted past 12 months had a tendency to fail less).
This had completely slipped my mind.
tsightler wrote: Feb 04, 2021 6:36 pm Admittedly, 1 per month is pretty high when you only have 43 drives, but my experience says drives have a tendency to fail in batches, and this is more common for less reliable drives. I guess my opinion is that it's mostly just bad luck. I personally wouldn't cut down on checks; these are enterprise drives, so they should be able to handle the load of simple checks.
Certainly agree with no cutbacks in checks!
quork147
Novice
Posts: 3
Liked: never
Joined: Jun 21, 2017 11:23 am
Full Name: Marvin

Re: Veeam Storage Repositories - high failure rate for hard drives?

Post by quork147 »

The drives that initially came with the server are ST12000NM0027, but they are slowly being replaced with ST12000NM0038. Hopefully these newer drives will have lower failure rates, based on the latest Backblaze blog: https://www.backblaze.com/blog/backblaz ... s-for-2020 (the ST12000NM0008 is the SATA equivalent).

I'll keep my checks on the same schedule (probably a good idea anyway, to weed out these faulty drives).

Thanks!
orb
Service Provider
Posts: 129
Liked: 27 times
Joined: Apr 01, 2016 5:36 pm
Full Name: Olivier

Re: Veeam Storage Repositories - high failure rate for hard drives?

Post by orb » 2 people like this post

The situation was so bad with the Exos 12TB that they are not even in Seagate's product catalogue! Today, you jump from 10 to 14TB :)

Oli
it.aquelle
Lurker
Posts: 1
Liked: 1 time
Joined: Feb 03, 2017 7:53 pm
Full Name: IT Aerium
Contact:

Re: Veeam Storage Repositories - high failure rate for hard drives?

Post by it.aquelle » 1 person likes this post

I have some experience with that Supermicro box, but running 4TB Seagate drives... I can't remember when it last had a disk fail. Over almost 5 years, you could probably count the failures on one hand. The combination was very reliable.
mcz
Veeam Legend
Posts: 945
Liked: 221 times
Joined: Jul 19, 2016 8:39 am
Full Name: Michael
Location: Rheintal, Austria

Re: Veeam Storage Repositories - high failure rate for hard drives?

Post by mcz » 1 person likes this post

I'm not a mathematician, but I wonder whether, under these circumstances, it would be likely that enough drives fail at once, or shortly after one another, that the whole RAID becomes unrecoverable... Without having the data on another repository as well, I wouldn't feel good about it...
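
A rough way to put a number on that worry, as a sketch only: in RAID10 the array is lost if the mirror partner of a failed drive also dies before the rebuild finishes. The Python below assumes drives fail independently at a constant rate (the roughly 12-in-43 annualized rate implied by one failure a month), uses the 13-hour rebuild time mentioned earlier in the thread, and ignores unreadable sectors and the extra stress of the rebuild itself. The function names are made up for illustration.

```python
import math

HOURS_PER_YEAR = 8760.0


def partner_failure_probability(afr: float, rebuild_hours: float) -> float:
    """Chance that one specific drive fails within the rebuild window.

    Assumes a constant, independent failure rate (exponential model),
    with afr treated as failures per drive-year.
    """
    rate_per_hour = afr / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * rebuild_hours)


def yearly_loss_probability(afr: float, rebuild_hours: float,
                            rebuilds_per_year: int) -> float:
    """Chance that at least one rebuild in a year loses its mirror partner."""
    p = partner_failure_probability(afr, rebuild_hours)
    return 1.0 - (1.0 - p) ** rebuilds_per_year


# Thread figures: ~1 failure per month across 43 drives, 13-hour rebuild.
afr = 12 / 43
per_rebuild = partner_failure_probability(afr, rebuild_hours=13)
per_year = yearly_loss_probability(afr, rebuild_hours=13, rebuilds_per_year=12)

print(f"Risk per rebuild: {per_rebuild:.4%}")
print(f"Risk per year:    {per_year:.3%}")
```

Under those assumptions the risk per rebuild comes out well below 0.1%, and below 1% over a year of monthly rebuilds. The independence assumption is the weak point: if the drives come from a bad batch and fail in clusters, the real risk is higher than this model suggests, which is a good argument for keeping a second copy of the backups elsewhere.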