One thing I've observed that does have a direct effect on service life is heat. Drives in cases packed with multiple hard drives, with inadequate airflow, or sitting in "hot room" environments do fail more often than those in single-drive PCs in normal office or home environments.
-40hz
IIRC Google's big hard drive failure report claimed heat wasn't that important w.r.t. drive death, although heat killing drives has certainly been my experience as well - but perhaps Google measured ambient case temperature in setups that didn't have a lot of drives crammed close together, so the pretty hefty hotspots you get in a packed case never showed up in their data?
Maybe you can give me a little more detail. Like xkcd's jokes about graphs with no axis labels, the "Caution" drive lists reallocated sectors at "97" (of what?) with a "threshold of 36" (of what?), while the data drive is listed as "Good" with reallocated sectors at "100" (of what?) and a "threshold of 5". Why so different - and how is one Caution and the other Good?
-TaoPhoenix
The CrystalDiskInfo program sucks, IMHO, since it shows the raw S.M.A.R.T. numbers - and those are damn close to meaningless, because each vendor scales and interprets the attributes differently, so a "97" on one drive and a "100" on another simply aren't comparable. You need an application that has knowledge of specific brands and can translate the values into something meaningful. (Sigh, "standards" - when can they ever get anything right?)
As soon as the re-allocated sector count (from a program that can
correctly display SMART data) goes non-zero, replace the drive. Sure, I've had drives that lasted years after a few reallocated sectors, but it's an indication that you're getting disk errors - at best you risk minor data corruption, at worst the drive goes CLUNK from one day to the next.
It's also worth keeping in mind that drives only reallocate sectors on disk writes - just attempting to read a bad sector will not cause it to get reallocated. Thus, the reallocated-sector count isn't the only stat you need to look at - another interesting one is the number of DMA errors. Those can be an indication of bad SATA cables (which is also worrisome), but can definitely also be a sign of a drive that is about to die.
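For what it's worth, smartmontools' smartctl can pull both of those counters on most drives. Here's a rough Python sketch of the kind of check I mean - the device path is a placeholder and the parsing assumes smartctl's usual text attribute table, so treat it as a starting point rather than a finished tool:

#!/usr/bin/env python3
"""Rough sketch: warn when the SMART attributes mentioned above go non-zero.
Needs smartmontools installed and root privileges; device path is an example."""
import re
import subprocess
import sys

# Attribute 5 = Reallocated_Sector_Ct, 199 = UDMA_CRC_Error_Count (cable/interface CRC errors)
WATCHED = {5: "Reallocated_Sector_Ct", 199: "UDMA_CRC_Error_Count"}

def check(device):
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    suspicious = False
    for line in out.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue                      # not an attribute row
        attr_id = int(fields[0])
        if attr_id not in WATCHED or len(fields) < 10:
            continue
        # RAW_VALUE is the 10th column; keep only the leading integer,
        # since some drives append extra info in parentheses.
        m = re.match(r"\d+", fields[9])
        raw = int(m.group()) if m else 0
        print(device, WATCHED[attr_id], "raw value =", raw)
        if raw > 0:
            suspicious = True
    return suspicious

if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/sda"   # example device
    sys.exit(1 if check(dev) else 0)

Run it as root and point it at whichever drive you care about; a non-zero exit status means one of the watched attributes has started counting.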
Personally, I don't switch out drives before they show signs of being about to die - whether that's the two aforementioned stats or "stuff feeling wonky" (the machine being slower or even stalling on disk I/O, or drives making noises they don't usually make). Be sure to RAID-mirror your important stuff, and also do backups.
Don't even consider other RAID forms than mirroring. Yes, you "waste" a lot of space with mirroring compared to RAID-5 or RAID-6 modes, but rebuilding a mirror is a simple linear copy, whereas rebuilding the more complex forms of RAID has more points of failure and is more intensive on the disks involved. I've heard more than one story of people losing their entire RAID-5 arrays... and not having backups (they built the arrays for ZOMG HUGE SIZE, and thought that RAID-5 really couldn't fail, thus treated it *as* backup...)
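If you're running a Linux software-RAID mirror (md), a degraded array is easy to spot in /proc/mdstat - a healthy two-disk mirror shows [UU], a broken one shows [U_]. A minimal Python sketch, assuming that setup (hardware RAID and other OSes report this differently):

#!/usr/bin/env python3
"""Rough sketch: report Linux md arrays that are running degraded."""
import re
import sys

def degraded_arrays(mdstat_path="/proc/mdstat"):
    degraded = []
    current = None
    with open(mdstat_path) as f:
        for line in f:
            # Array headers look like: "md0 : active raid1 sdb1[1] sda1[0]"
            header = re.match(r"^(md\d+)\s*:", line)
            if header:
                current = header.group(1)
                continue
            # Status lines contain e.g. "[2/2] [UU]"; an underscore in the
            # second bracket means a missing or failed member ("[2/1] [U_]").
            status = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
            if status and current and "_" in status.group(1):
                degraded.append(current)
    return degraded

if __name__ == "__main__":
    bad = degraded_arrays()
    for name in bad:
        print("WARNING:", name, "is degraded - replace the failed member")
    sys.exit(1 if bad else 0)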
Also, a quick mention on SSDs... back up those things even more vigilantly than mechanical drives. Yes, in theory those flash cells should wear out gracefully, and even the MLC variants should last quite a bit longer under normal use than a mechanical disk. Funny thing is, though, that in practice they don't. Or rather, the flash cells don't wear out, but either the firmware bricks the drive (known to happen frequently with SandForce-based drives), or other parts of the electronics just go frizzle. And then you're SOL. Really, bigtime SOL. At least with mechanical drives, you can send them off to a data recovery service if the data was important enough... you're much less likely to be able to do that with SSDs, especially the ones that have hardware encryption.
A classmate and I had our Vertex2 SSDs die a few weeks apart, after... what, a month or so of use? And my Intel X25-E (their ENTERPRISE SLC-based drive) died last month, after a few years of non-intensive use... I'm sure the SLC cells would have had several years of lifetime left, so it's probably some electronics that went fizzle. Scary that an enterprise drive dies like that :-(
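And to make the "also do backups" advice a bit more concrete, here's a bare-bones Python sketch of mirroring a directory to a second physical drive with rsync. The paths are placeholders, and this is a starting point rather than a full backup strategy (no versioning, no off-site copy):

#!/usr/bin/env python3
"""Rough sketch: copy important data to another physical drive with rsync.
Paths below are placeholders - point them at whatever actually matters to you."""
import subprocess
import sys

SOURCE = "/home/"                   # hypothetical: the data worth keeping
DESTINATION = "/mnt/backup/home/"   # hypothetical: mount point of a separate physical drive

def run_backup(src, dst):
    # -a preserves permissions/ownership/timestamps; --delete keeps the copy
    # in sync with the source (drop it if you want deleted files to survive
    # in the backup).
    return subprocess.run(["rsync", "-a", "--delete", src, dst]).returncode

if __name__ == "__main__":
    sys.exit(run_backup(SOURCE, DESTINATION))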