Monday, January 18, 2010

Be SMART, get advance warning that your hard drive is starting to fail

All modern hard drives support S.M.A.R.T., in which the hard drive (HD) runs self-tests and reports on the status of internal metrics (seek errors, block errors, temperature, being dropped, etc.). If a HD starts to do poorly on some of these metrics, it's one sign the drive might be starting to fail. Unfortunately, predicting when the drive will actually fail is impossible right now, akin to predicting earthquakes. But it does give you some advance warning to redouble your backup efforts or to swap out that drive if it holds valuable data.

There exist many tools to access the SMART interface depending on your OS. The key thing is that the HD itself does all the work. The tools just provide access to the HD.

The beauty of SMART is that
  • you can run the SMART tests on a running system on a mounted HD in use! Since the HD does the SMART scheduling internally, it can figure out when and how to continue with its own test while servicing normal OS requests.
  • it's pretty easy to do a manual scan. For Linux, install smartctl (part of the smartmontools package) and then start issuing commands. There's a lot of documentation on the web.
  • with a bit of work you can set up regular SMART background tests and have alerts sent when a HD falls below the built-in "failure" thresholds.
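
To make the manual scan a bit more concrete, here is a rough Python sketch that wraps a few common smartctl invocations and reads the overall health verdict. The function names (smartctl_command, parse_health, check_health) and the device path /dev/sda are my own placeholders, and the parser only looks for the result line that smartctl prints for ATA/SATA drives, so treat it as a starting point rather than a finished monitoring tool.

```python
import subprocess


def smartctl_command(device, action):
    """Build the argument list for a few common smartctl actions."""
    actions = {
        "health": ["-H"],               # overall health self-assessment
        "attributes": ["-A"],           # table of the internal metrics
        "short-test": ["-t", "short"],  # quick self-test, a few minutes
        "long-test": ["-t", "long"],    # thorough self-test, can take hours
        "test-log": ["-l", "selftest"], # results of earlier self-tests
    }
    return ["smartctl", *actions[action], device]


def parse_health(output):
    """Return True for PASSED, False for anything else, None if no verdict found.

    Assumes the ATA-style result line; other drive types word it differently.
    """
    marker = "SMART overall-health self-assessment test result:"
    for line in output.splitlines():
        if line.startswith(marker):
            return line[len(marker):].strip() == "PASSED"
    return None


def check_health(device):
    """Run `smartctl -H` on the device (usually needs root) and parse the verdict."""
    # smartctl reports problems through its exit status bits, so a non-zero
    # exit code is not treated as an error here.
    result = subprocess.run(
        smartctl_command(device, "health"), capture_output=True, text=True
    )
    return parse_health(result.stdout)


if __name__ == "__main__":
    print(check_health("/dev/sda"))  # /dev/sda is a placeholder device
```

Starting a test with the "short-test" or "long-test" action returns right away, since the drive runs the test on its own; the "test-log" action shows the result once it has finished.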
The main drawbacks are
  • a thorough "long" test takes several hours. On a 1.5TB drive, it takes 4-8 hours.
  • you need to have the HD hooked up directly via its native interface, namely PATA or SATA. If a drive is hooked up via a USB enclosure, smartctl will claim the drive does not support SMART. Ugh. (I assume eSATA will work, as this is fundamentally SATA).
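
One thing that may be worth trying with a USB enclosure: smartctl has a -d option to select the device type, and some USB bridges pass SMART commands through when asked with -d sat. Whether it works depends entirely on the enclosure's chipset. The sketch below is an assumption-laden helper (the names smart_info_command and smart_available are mine, and /dev/sdb is a placeholder) that builds the `smartctl -i` command with an optional device type and checks the "SMART support is:" line in the output.

```python
import subprocess


def smart_info_command(device, device_type=None):
    """Build a `smartctl -i` command, optionally forcing a device type via -d."""
    cmd = ["smartctl", "-i"]
    if device_type is not None:
        cmd += ["-d", device_type]
    return cmd + [device]


def smart_available(info_output):
    """Return True if the info output reports SMART support as available."""
    for line in info_output.splitlines():
        if line.startswith("SMART support is:"):
            value = line.split(":", 1)[1].strip()
            if value.startswith("Available"):
                return True
    return False


def probe(device):
    """Try the default device type first, then SAT pass-through."""
    for device_type in (None, "sat"):
        result = subprocess.run(
            smart_info_command(device, device_type),
            capture_output=True,
            text=True,
        )
        if smart_available(result.stdout):
            return device_type or "default"
    return None  # neither attempt reported SMART support


if __name__ == "__main__":
    print(probe("/dev/sdb"))  # placeholder device, usually needs root
```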
I'm not going to repeat all the information out there. But here are some decent links.
