01-13-2019 08:54 AM
I've got an old 240GB M500 drive. It's been used heavily over the years but has been pretty reliable. It's currently being caned by various caches, logs and indexes, so it's time to move it off the front line and either put it out to pasture or just bin it.
Anyway, it's got a Wear Leveling Count of 2976 (so it's 99% 'used'). It was 96% used just a couple of months ago (an Azure Sync failure went a bit mad on it, but I've dealt with that now). Total writes are about 50TB (I think this SSD is rated for 75TB).
First question: can someone please confirm the Wear Leveling Count is based on 'usage' data rather than 'error' data? Is it just estimating the life I have left based on some sort of average use for the technology? Assuming it went down to 0, could my drive still be fine for months or years?
Second question: if the Wear Leveling Count is just based on writes, not actual wear, what other metrics should I be monitoring now that its life is 'officially' over, even though it's still working? Reallocated NAND Blocks maybe? Reallocation Event Count? Unused Reserve NAND Blocks? Am I even on the right page here? Anything else?
01-13-2019 09:58 AM
Yeah, it's based on average erases (which occur before re-writing) as a percentage of the rated number of erases for the NAND. So usage rather than errors. It could indeed last much longer yet.
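If it helps to see the arithmetic, here's a minimal sketch. The 3,000 erase-cycle rating is my assumption for the M500's MLC NAND (check the datasheet for the exact figure), but it does line up with your 2976 count reading as 99%:

```python
# Wear as a percentage of the rated erase cycles.
# RATED_ERASE_CYCLES is an assumption (~3,000 P/E cycles for MLC NAND);
# substitute the figure from your drive's datasheet.
RATED_ERASE_CYCLES = 3000

def percent_used(avg_erase_count, rated_cycles=RATED_ERASE_CYCLES):
    return 100.0 * avg_erase_count / rated_cycles

print(f"{percent_used(2976):.1f}% used")  # 99.2% -> reported as 99%
```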
It's a little hard to predict drive failure from SMART data. As you say, reallocations are usually a good indicator, and it has a few there already. So are uncorrectable errors and read/write errors. Generally, if the numbers are stable you've just had a small part fail that the drive has mapped out; if the numbers are climbing with use, then you're in trouble.
In theory, the reserve block count hitting 0 would be game over wear-wise, but I'm not sure I've ever seen that happen before the drive has been retired or has failed in some other way.
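If you want to watch for climbing numbers rather than eyeballing them, something along these lines would do it. This is just a minimal sketch: it assumes smartmontools' smartctl is installed, that /dev/sda is your M500, and the snapshot filename is a placeholder; it usually needs to run as root/Administrator.

```python
import json
import subprocess
from pathlib import Path

DEVICE = "/dev/sda"                     # assumption: adjust to your drive
SNAPSHOT = Path("smart_snapshot.json")  # placeholder filename

def read_attrs():
    """Return {attribute_id: raw_value} from smartctl's attribute table."""
    out = subprocess.run(["smartctl", "-A", DEVICE],
                         capture_output=True, text=True, check=True).stdout
    attrs = {}
    for line in out.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; the 10th column is RAW_VALUE.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[0]] = fields[9]
    return attrs

current = read_attrs()
if SNAPSHOT.exists():
    previous = json.loads(SNAPSHOT.read_text())
    for attr_id, raw in sorted(current.items(), key=lambda kv: int(kv[0])):
        if previous.get(attr_id) != raw:
            print(f"Attribute {attr_id} changed: {previous.get(attr_id)} -> {raw}")
SNAPSHOT.write_text(json.dumps(current))
```

Run it once to save a baseline, then again periodically: stable raw values mean the drive has mapped out a bad spot and moved on; values that keep changing between runs are the warning sign.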
01-13-2019 12:38 PM
You may want to read this series of articles where they stressed their SSDs 24/7 for several years. I found it very interesting how the various SSDs held up & eventually failed.
It is certainly a good idea to monitor the SMART attributes of the SSD, but as @targetbsp mentions, they don't always alert you to failures. Almost every critical SSD failure I've seen has been the drive suddenly disappearing from the system without any warning. Even so, since your SSD is so worn, you should at least monitor the following attributes: 05, B4, BB, C4, C5, C6 & CA. Attributes B4 & CA are probably the most important, as they signify the actual end of life for the drive. BB will alert you to filesystem & data corruption. Because your SSD is so worn & near end of life, values other than these may also become important.
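For completeness, here's a minimal sketch that pulls just those attributes out of smartctl's output (same assumptions as the sketch above: smartmontools installed and /dev/sda pointed at the right drive; the decimal IDs in the set are the equivalents of the hex IDs listed):

```python
import subprocess

DEVICE = "/dev/sda"  # assumption: point this at the M500

# Decimal equivalents of the attribute IDs above:
# 05=5, B4=180, BB=187, C4=196, C5=197, C6=198, CA=202
WATCH_IDS = {5, 180, 187, 196, 197, 198, 202}

out = subprocess.run(["smartctl", "-A", DEVICE],
                     capture_output=True, text=True, check=True).stdout

for line in out.splitlines():
    fields = line.split()
    if fields and fields[0].isdigit() and int(fields[0]) in WATCH_IDS:
        print(line)
```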
FYI, the "current" & "worst" columns will never go below "1".
I'd really be interested in hearing how this SSD eventually fails (and how long it takes) if you continue using it.
01-13-2019 01:40 PM - edited 01-13-2019 01:47 PM
I'm definitely moving all the caching/tempfile/database duties off it. I think I'll be demoting it to a Dropbox/OneDrive drive. I was toying with using it for Azure Sync, but either something didn't install properly or Azure Sync is really heavy on writes to keep the store synced/indexed properly. At the least, the Wear Level Percentage went from 96% to 99% in about two months, which seems really high to me.
We'll see what I actually do with it, but I'll definitely keep it doing something non-critical until it goes, if nothing else because I'm also curious whether it's going to be a slow degradation or a quick death.