Topic: Hard Drive SMART Stats - from the BackBlaze Blog

4wd

  • Supporting Member
  • Joined in 2006
  • Posts: 5,641
Hard Drive SMART Stats - from the BackBlaze Blog
« on: November 19, 2014, 06:28 PM »
Hard Drive SMART Stats

I’ve shared a lot of Backblaze data about hard drive failure statistics. While our system handles a drive failing, we prefer to predict drive failures, and we use the hard drives’ built-in SMART metrics to help. The dirty industry secret? SMART stats are inconsistent from hard drive to hard drive.

With nearly 40,000 hard drives and over 100,000,000 GB of data stored for customers, we have a lot of hard-won experience. Below, see which 5 of the SMART stats are good predictors of drive failure, and see the data we have started to analyze from all of the SMART stats to find out which other ones predict failure.

From experience, we have found the following 5 SMART metrics indicate impending disk drive failure:

    SMART 5 – Reallocated_Sector_Count.
    SMART 187 – Reported_Uncorrectable_Errors.
    SMART 188 – Command_Timeout.
    SMART 197 – Current_Pending_Sector_Count.
    SMART 198 – Offline_Uncorrectable.
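To make the list concrete, here is a minimal Python sketch that checks a set of SMART raw values against those five attributes. It assumes the readings have already been pulled out of a tool such as smartctl or Hard Disk Sentinel into a plain mapping of attribute ID to raw value; the function name `warning_attributes` and that input format are hypothetical. It uses the simple rule that any non-zero raw value on one of the five is worth a closer look. Raw values are vendor-specific (the "inconsistent from drive to drive" problem above), so treat this as a starting point, not a verdict.

```python
# Hypothetical sketch: flag the five SMART attributes listed above when their
# raw value is non-zero. The input format (attribute ID -> raw value) is an
# assumption; real tools report SMART data in their own formats.

WATCHED_ATTRIBUTES = {
    5: "Reallocated_Sector_Count",
    187: "Reported_Uncorrectable_Errors",
    188: "Command_Timeout",
    197: "Current_Pending_Sector_Count",
    198: "Offline_Uncorrectable",
}


def warning_attributes(raw_values):
    """Return (id, name, raw) for each watched attribute whose raw value is above zero."""
    flagged = []
    for attr_id, name in sorted(WATCHED_ATTRIBUTES.items()):
        raw = raw_values.get(attr_id, 0)  # attributes the drive doesn't report count as 0
        if raw > 0:
            flagged.append((attr_id, name, raw))
    return flagged


if __name__ == "__main__":
    # Made-up readings for illustration (194 is temperature and is ignored here).
    sample = {5: 0, 187: 2, 188: 0, 194: 38, 197: 0, 198: 0}
    for attr_id, name, raw in warning_attributes(sample):
        print(f"SMART {attr_id} ({name}): raw value {raw}")
```

With the made-up sample readings, only SMART 187 is flagged.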
« Last Edit: November 19, 2014, 08:46 PM by 4wd »

ewemoa

  • Honorary Member
  • Joined in 2008
  • Posts: 2,922
« Reply #1 on: November 19, 2014, 10:26 PM »
Thanks for updating the post with the metrics they pay attention to.

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,540
  • @Slartibartfarst
« Reply #2 on: November 20, 2014, 06:48 AM »
Interesting. Based on sound observations. That tallies pretty closely with what HDS (Hard Disk Sentinel) was reporting about the deteriorating state of my laptop hard drive a while back. I shall make a note of those for future reference.
This was a Seagate ST9500420AS 2½" laptop drive:

[Image: HDS Seagate ST9500420AS - failing HP ENVY laptop hard disk.png]
« Last Edit: November 20, 2014, 07:22 AM by IainB, Reason: Added image. »

4wd

  • Supporting Member
  • Joined in 2006
  • Posts: 5,641
« Reply #3 on: November 20, 2014, 08:02 PM »
^Those stats indicate that it's still a good working HDD AFAIAC  :)

Remember this one:

[Image: 2012-02-03_16-18-18.jpg]

Almost three years later:

[Image: 2014-11-21 12_55_41.png]

SeraphimLabs

  • Participant
  • Joined in 2012
  • Posts: 497
  • Be Ready
« Reply #4 on: November 20, 2014, 09:54 PM »
Because the drive had not run out of spare sectors, and was able to remap every one of its bad sectors to the spare area.

I've salvaged quite a few 'bad' devices that way, simply overwriting them a few times to brute-force the remapping process.

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,540
  • @Slartibartfarst
« Reply #5 on: November 21, 2014, 05:03 AM »
@4wd: Blimey. What did you do to get that result? Why does it have the Plus and Minus next to the 100% Health report? I've never seen that before.
Did you somehow set the offset to -1639, or did HDS do that? I haven't thrown away that "bad" drive of mine. If it still has life and is not deteriorating, then maybe I should put a new image on it?

Is that (writing a new image) the sort of thing @SeraphimLabs means where he writes:
...I've salvaged quite a few 'bad' devices that way, simply overwriting them repeatedly a few times to brute force trigger the remapping sequence.
??
Overwriting them repeatedly is what SpinRite does, I think, except that it didn't work on that particular drive of mine:
...The software was unable to run on my hardware (disk drive) - for the simple technical reason that it was not possible to effect a BIOS switch change to enable it. ...

4wd

  • Supporting Member
  • Joined in 2006
  • Posts: 5,641
« Reply #6 on: November 21, 2014, 05:26 AM »
Quote from: IainB on November 21, 2014, 05:03 AM
@4wd: Blimey. What did you do to get that result? Why does it have the Plus and Minus next to the 100% Health report? I've never seen that before.
Did you somehow set the offset to -1639, or did HDS do that?

The +/- appears when you put in an offset - which you can do next to any of the S.M.A.R.T. values.

That drive is over 5 years old and still spinning its wheels.

You can run a low-level format a couple of times to see whether the sectors get remapped (that's what I usually do when an HDD starts getting flaky), or use something like MHDD. Or fill up the HDD with big files a few times: at some point that may trigger the remap, if the bad sector gets hit enough times and produces errors.
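The "fill it with big files a few times" approach can be sketched in Python. This is a hypothetical helper (`fill_with_files`, `fill_passes`, the `fill_NNNNN.bin` file names and the 0xAA fill pattern are all made up for illustration), not MHDD and not a low-level format: it only writes to whatever free space the filesystem hands out, so it is not guaranteed to hit any particular sector. Writing to a sector the drive has marked as pending is what typically gives the firmware a chance to rewrite or remap it, which is why each file is flushed with `os.fsync` instead of being left in the OS cache.

```python
import errno
import os


def fill_with_files(directory, total_bytes, file_size=256 * 1024 * 1024,
                    pattern=b"\xaa", chunk_size=1024 * 1024):
    """Hypothetical helper: write fixed-pattern files into `directory` until
    `total_bytes` have been written or the filesystem runs out of space.
    Returns (paths, bytes_written); bytes_written is approximate if the disk fills."""
    block = (pattern * chunk_size)[:chunk_size]
    paths = []
    written = 0
    try:
        while written < total_bytes:
            path = os.path.join(directory, f"fill_{len(paths):05d}.bin")
            paths.append(path)
            remaining = min(file_size, total_bytes - written)
            with open(path, "wb") as f:
                while remaining > 0:
                    n = f.write(block[:min(chunk_size, remaining)])
                    remaining -= n
                    written += n
                f.flush()
                os.fsync(f.fileno())  # ask the OS to push the data out to the drive
    except OSError as exc:
        if exc.errno != errno.ENOSPC:  # "disk full" just ends the pass; anything else is a real error
            raise
    return paths, written


def fill_passes(directory, total_bytes, passes=2, **kwargs):
    """Run `passes` fill passes, deleting the fill files after each one.
    Returns the number of bytes written in each pass."""
    per_pass = []
    for _ in range(passes):
        paths, written = fill_with_files(directory, total_bytes, **kwargs)
        per_pass.append(written)
        for path in paths:
            if os.path.exists(path):
                os.remove(path)
    return per_pass
```

For example, `fill_passes("/mnt/olddisk", 400 * 10**9, passes=3)` (a hypothetical mount point) would write up to about 400 GB of fill data three times, stopping a pass early if the disk fills up. Afterwards, re-check SMART 5 and 197 to see whether anything was remapped.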

IainB

  • Supporting Member
  • Joined in 2008
  • **
  • Posts: 7,540
  • @Slartibartfarst
« Reply #7 on: July 16, 2017, 10:57 AM »
@4wd: Would you still recommend that:
Quote from: 4wd on November 21, 2014, 05:26 AM
...You can run a Low Level Format a couple of times to see if the sectors get remapped, (what I usually do when a HDD starts getting flakey), or use something like MHDD.  Fill up the HDD with big files a few times, at some point it may trigger the remap if the sector gets hit enough times and produces errors.
____________________________________
Is that worth the time/effort, especially when the disk is no longer necessarily reliable?

I am asking because one of my drives logged its first #187 error on 2016-06-06 and a second one on 2017-07-10, with a warning that the hard disk status had "degraded", yet HDS still reports its Performance and Health as 100%.    :tellme:

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • Posts: 40,896
« Reply #8 on: July 16, 2017, 11:32 AM »
With hard drives as cheap as they are, my opinion these days is that at the first sign of even the smallest trouble from a drive, the data gets migrated off it and the drive gets put into retirement.

4wd

  • Supporting Member
  • Joined in 2006
  • Posts: 5,641
« Reply #9 on: July 16, 2017, 01:19 PM »
Quote from: IainB on July 16, 2017, 10:57 AM
@4wd: Would you still recommend that:
...You can run a Low Level Format a couple of times to see if the sectors get remapped, (what I usually do when a HDD starts getting flakey), or use something like MHDD.  Fill up the HDD with big files a few times, at some point it may trigger the remap if the sector gets hit enough times and produces errors.
____________________________________
Is that worth the time/effort, especially when the disk is no longer necessarily reliable?

Depends what you mean by reliable. That particular drive is still working fine in my main computer, on for 8+ hours a day (although not at the moment, since I'm overseas), and it's now ~8+ years old.

I regard all data on any medium as ephemeral since you have absolutely no idea when that medium is going to fail, whether the 1st or the 10,000th time it's used/switched on ... hence afaiac, every medium is unreliable.

All I can do is ensure I have spread the risk by backing up my important data across a variety of media.

Quote from: IainB on July 16, 2017, 10:57 AM
I am asking because one of my drives got it's first #187 on 2016-06-06 and a second one on 2017-07-10, with a warning that the hard disk status had "degraded", yet it still says its Performance and Health are 100%.    :tellme:

Only you can make that determination, I use a drive until it stops working.
You have to weigh up time vs money, I'll always spend the time since I have a lot of it ... money, not so much  ;D

You also have to consider what computer you're talking about here, mine are desktops with multiple drives ... system and data reside on different drives.
If you're talking about a laptop, where you generally only have one drive, then if it were me I'd image it, back up the data (not the OS), probably wipe the drive a couple of times, restore the image and see what happens ... but like I said, I have the time to do it and other machines I can use in the meantime.

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,540
  • @Slartibartfarst
« Reply #10 on: July 16, 2017, 01:19 PM »
@mouser: Yes, that's what I would have intuitively thought as well. I asked the Q of @4wd because of his experience - which I don't have.
His comment was 3 years ago, and with the subsequent fall in prices of hard drives and SSDs, I wondered about the relative economics/benefits.
BackBlaze's view seems to be "toss it out" at the first sign of a #187 error, but then they may see it as simply cheaper than the (for them) false economy of expending labour on recovering a drive.
Interestingly, the first #187 error on this drive was a few months after I had bought the laptop new (shop-soiled at 50% discount in a closing-down sale, with 2 months of its warranty already used up). The HP support people didn't see it as a valid warranty claim (under the terms of the warranty) at the time, so I left the drive in the laptop. I've since extended the warranty, but it rather looks as though I shall have to foot the bill for a new drive myself. I don't want to wait for it to fail.
« Last Edit: July 16, 2017, 01:52 PM by IainB »

wraith808

  • Supporting Member
  • Joined in 2006
  • Posts: 11,186
« Reply #11 on: July 16, 2017, 01:28 PM »
Quote from: mouser on July 16, 2017, 11:32 AM
With the price of hard drives my opinion these days is that the first sign of the smallest amount of trouble from a drive means the data gets migrated off it and it gets put into retirement.

I'm getting to that point.  It vies with the fact that I really don't like to work on computers anymore, however.  Just disconnecting everything, removing the box from the mount, opening it up and making the change fills me with dread now.

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,540
  • @Slartibartfarst
« Reply #12 on: July 16, 2017, 01:50 PM »
@4wd: Hey, many thanks for your response and advice.

Fortunately, with HDSentinel's relatively early warning, I should have time to think about this and plan accordingly.

« Last Edit: July 16, 2017, 03:15 PM by IainB »

Shades

  • Member
  • Joined in 2006
  • Posts: 2,922
« Reply #13 on: July 16, 2017, 01:51 PM »
Quote from: mouser on July 16, 2017, 11:32 AM
With the price of hard drives my opinion these days is that the first sign of the smallest amount of trouble from a drive means the data gets migrated off it and it gets put into retirement.

While I agree with the point being made here, you can still put those drives to some use. By using software like MHDD you get a clear idea of where the bad sectors on the failing disk are located. If they occur near the beginning or the end, you can partition the disk so that the space you actually use won't "touch" these bad sectors at all. That extends the life of the disk considerably.

For example, you have a hard disk with a capacity of 1 TByte and errors occur in the first 200 GByte of the disk. You can then use partition management software from companies like MiniTool or EaseUS to create 2 partitions on that disk. The first partition will be 250 GByte in size (covering the bad area plus some margin), the other partition 750 GByte. The first partition should not have a drive letter, just a label stating it contains errors.
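The partition arithmetic in that example can be written down as a small Python sketch. `plan_partitions` is a hypothetical helper, not a feature of MHDD, MiniTool or EaseUS: it takes the disk size and the (start, end) offsets in GB where a surface scan reported errors, and fences them off in one sacrificial partition. The 50 GB safety margin is an assumption read off the example (errors up to 200 GB, first partition 250 GB), and the sketch only handles errors clustered towards one end of the disk; errors in the middle would need a different layout.

```python
# Hypothetical sketch of the "fence off the bad area" partition layout.
# Sizes are in GB; the default 50 GB margin is an assumption taken from the example.

BAD_LABEL = "errors - do not use"
GOOD_LABEL = "usable"


def plan_partitions(disk_gb, bad_regions_gb, margin_gb=50):
    """Return (start, end, label) partitions that put every bad region, plus a
    safety margin, into one sacrificial partition at the nearer end of the disk."""
    if not bad_regions_gb:
        return [(0, disk_gb, GOOD_LABEL)]
    first_bad = min(start for start, _ in bad_regions_gb)
    last_bad = max(end for _, end in bad_regions_gb)
    if first_bad <= disk_gb - last_bad:
        # Errors sit nearer the start: sacrificial partition first.
        fence = min(disk_gb, last_bad + margin_gb)
        layout = [(0, fence, BAD_LABEL), (fence, disk_gb, GOOD_LABEL)]
    else:
        # Errors sit nearer the end: usable partition first.
        fence = max(0, first_bad - margin_gb)
        layout = [(0, fence, GOOD_LABEL), (fence, disk_gb, BAD_LABEL)]
    # Drop zero-sized partitions (e.g. when the bad area plus margin covers the whole disk).
    return [p for p in layout if p[1] > p[0]]


if __name__ == "__main__":
    # The example from the post: a 1 TB disk with errors in the first 200 GB.
    for start, end, label in plan_partitions(1000, [(0, 200)]):
        print(f"{start}-{end} GB ({end - start} GB): {label}")
```

For the example above, `plan_partitions(1000, [(0, 200)])` yields a 250 GB partition labelled as containing errors, followed by a 750 GB usable one.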

Now the disk could be used as a backup disk (for non-essential data) in a USB cradle. If you have a few of these faulty disks, you can make several copies of such backups. Afterwards, disconnect them and store these disks properly. Then these disks will serve you for quite some time still.

Heck, even if the disk is completely toast, you can take it apart and use the platter(s) for a wind chime, the neodymium magnets (strong!) for whatever, and the motor that spun the platter(s) can be re-purposed too. That spindle motor is actually a very well-manufactured brushless DC motor with extremely precise tolerances. You will be hard-pressed to find better ones anywhere.