M500 - End of useful life (or not)

SOLVED
Kilobyte Kid

I've got an old 240GB M500 drive. It's old and has been used heavily but has been pretty reliable over the years. It's currently being caned by various caches, logs and indexes so it's time to move it off the front-line and either into pasture or to just bin it.

Anyway, it's got a Wear Leveling Count of 2976 (so it's 99% 'used'). It was 96% used just a couple of months ago (some Azure Sync failure went a bit mad on it, but I've dealt with that now). Total writes are about 50TB (I think this SSD is rated for 75TB).

 

First question. Can someone please confirm that the Wear Leveling Count is based on 'usage' data rather than 'error' data? Is it just estimating the life I have left based on some sort of average use for the technology? Assuming it went down to 0, could my drive still be fine for months or years?

Second question. If the Wear Leveling Count is just based on writes, not actual wear, what other metrics should I be monitoring now that its life is 'officially' over, even though it's still working? Reallocated NAND Blocks maybe? Reallocation Event Count? Unused Reserve NAND Blocks? Am I even on the right page here? Anything else?

 

----------------------------------------------------------------------------
 (1) Crucial_CT240M500SSD1
----------------------------------------------------------------------------
           Model : Crucial_CT240M500SSD1
        Firmware : MU05
   Serial Number : ************
       Disk Size : 240.0 GB (8.4/137.4/240.0/----)
     Buffer Size : Unknown
     Queue Depth : 32
    # of Sectors : 468862128
   Rotation Rate : ---- (SSD)
       Interface : Serial ATA
   Major Version : ACS-2
   Minor Version : ATA8-ACS version 6
   Transfer Mode : SATA/300 | SATA/600
  Power On Hours : 34935 hours
  Power On Count : 1805 count
     Host Writes : 51346 GB
Wear Level Count : 2976
     Temperature : 32 C (89 F)
   Health Status : Caution (1 %)
        Features : S.M.A.R.T., APM, 48bit LBA, NCQ, TRIM, DevSleep
       APM Level : 00FEh [ON]
       AAM Level : ----
    Drive Letter : V:
-- S.M.A.R.T. --------------------------------------------------------------
ID Cur Wor Thr RawValues(6) Attribute Name
01 100 100 __0 0000000000C0 Raw Read Error Rate
05 100 100 __0 000000000003 Reallocated NAND Blocks
09 100 100 __0 000000008877 Power On Hours
0C 100 100 __0 00000000070D Power Cycle Count
AB 100 100 __0 000000000000 Program Fail Count
AC 100 100 __0 000000000000 Erase Fail Count
AD __1 __1 __0 000000000BA0 Average Block-Erase Count
AE 100 100 __0 000000000679 Unexpected Power Loss Count
B4 __0 __0 __0 000000000FDF Unused Reserve NAND Blocks
B7 100 100 __0 000000000001 SATA Interface Downshift
B8 100 100 __0 000000000000 Error Correction Count
BB 100 100 __0 000000000000 Reported Uncorrectable Errors
C2 _68 _44 __0 003800140020 Temperature
C4 100 100 __0 000000000003 Reallocation Event Count
C5 100 100 __0 000000000000 Current Pending Sector Count
C6 100 100 __0 000000000000 Smart Off-line Scan Uncorrectable Error Count
C7 100 100 __0 000000000002 Ultra DMA CRC Error Rate
CA __1 __1 __0 000000000063 Percent Lifetime Used
CE 100 100 __0 000000000000 Write Error Rate
D2 100 100 __0 000000000003 Successful RAIN Recovery Count
F6 100 100 __0 001912561DAF Total Host Sector Writes
F7 100 100 __0 0000C99D20F8 Host Program Page Count
F8 100 100 __0 00007001834D Background Program Page Count
4 Replies
JEDEC Jedi

Re: M500 - End of useful life (or not)

Yeah, it's based on average erases (which occur before re-writing) as a percentage of the rated number of erases for the NAND. So it reflects usage rather than errors. It could indeed last much longer yet.
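
As a rough illustration of that calculation, here is a minimal Python sketch using the raw values from the dump above. The rated figure of 3000 erase cycles is an assumption, not something stated in this thread; it is used only because it reproduces the 99% the drive reports. The function name is made up for the example.

```python
# Assumption: rated erase cycles per block. Not stated in the thread;
# chosen because it reproduces the reported "Percent Lifetime Used".
RATED_ERASE_CYCLES = 3000


def percent_lifetime_used(avg_erase_count, rated_cycles=RATED_ERASE_CYCLES):
    """Average block-erase count as a whole percentage of the rated cycles."""
    return avg_erase_count * 100 // rated_cycles


# Raw values copied from the S.M.A.R.T. table (hexadecimal).
avg_block_erase_count = int("000000000BA0", 16)  # attribute AD
reported_percent_used = int("000000000063", 16)  # attribute CA

print(avg_block_erase_count)                         # 2976
print(percent_lifetime_used(avg_block_erase_count))  # 99
print(reported_percent_used)                         # 99
```

Nothing here looks at error counters, which is the point: the percentage only counts erases against a rating.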

 

It's a little hard to predict drive failure based on SMART data. As you say, reallocations are usually a good indicator, and it has a few there already. So are uncorrectable errors and read/write errors. Generally, if the numbers are stable, a small part has failed and been mapped out. If the numbers are climbing with use, then you're in trouble.

 

In theory, the reserve block count hitting 0 would be game over wear-wise, but I'm not sure I've ever seen that happen before the drive has been retired or failed in some other way.

Kilobyte Kid

Re: M500 - End of useful life (or not)

Cool. Happy to hear I was on the right track there.

JEDEC Jedi

Re: M500 - End of useful life (or not)

You may want to read this series of articles where they stressed their SSDs 24/7 for several years.  I found it very interesting how the various SSDs held up & eventually failed.

 

It is certainly a good idea to monitor the SMART attributes of the SSD, but as @targetbsp mentions, they don't always alert you to failures. Almost every critical SSD failure I've seen has been the drive suddenly disappearing from the system without any warning. Even so, since your SSD is so worn, you should at least monitor the following attributes: 05, B4, BB, C4, C5, C6 & CA. Attributes B4 & CA are probably the most important, as they signify the actual end of life for the drive. "BB" will alert you to filesystem & data corruption. Because your SSD is so worn & near end of life, values other than these may become important.
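
One way to keep an eye on those attributes is to save the S.M.A.R.T. table now and then and compare the raw values between two snapshots. Below is a minimal Python sketch that assumes the text layout of the dump posted above. The function names and the "later" snapshot are hypothetical, invented purely to show the comparison.

```python
import re

# Attribute IDs suggested above for a heavily worn drive.
WATCHED = {"05", "B4", "BB", "C4", "C5", "C6", "CA"}

# Matches rows like: "05 100 100 __0 000000000003 Reallocated NAND Blocks"
ROW = re.compile(
    r"^([0-9A-Fa-f]{2})\s+\S+\s+\S+\s+\S+\s+([0-9A-Fa-f]{12})\s+(.+)$"
)


def parse_smart(dump):
    """Return {attribute ID: (raw value as int, attribute name)}."""
    attrs = {}
    for line in dump.splitlines():
        match = ROW.match(line.strip())
        if match:
            attr_id, raw, name = match.groups()
            attrs[attr_id.upper()] = (int(raw, 16), name.strip())
    return attrs


def changed(old, new, watched=WATCHED):
    """Return {ID: (before, after)} for watched attributes whose raw value moved."""
    result = {}
    for attr_id in sorted(watched):
        if attr_id in old and attr_id in new:
            before, after = old[attr_id][0], new[attr_id][0]
            if before != after:
                result[attr_id] = (before, after)
    return result


# Rows copied from the dump in the first post.
EARLIER = """\
ID Cur Wor Thr RawValues(6) Attribute Name
05 100 100 __0 000000000003 Reallocated NAND Blocks
B4 __0 __0 __0 000000000FDF Unused Reserve NAND Blocks
BB 100 100 __0 000000000000 Reported Uncorrectable Errors
CA __1 __1 __0 000000000063 Percent Lifetime Used
"""

# Hypothetical later snapshot, made up for illustration only.
LATER = """\
ID Cur Wor Thr RawValues(6) Attribute Name
05 100 100 __0 000000000005 Reallocated NAND Blocks
B4 __0 __0 __0 000000000FDD Unused Reserve NAND Blocks
BB 100 100 __0 000000000000 Reported Uncorrectable Errors
CA __1 __1 __0 000000000063 Percent Lifetime Used
"""

diff = changed(parse_smart(EARLIER), parse_smart(LATER))
for attr_id, (before, after) in diff.items():
    print(f"{attr_id}: {before} -> {after}")
```

The sketch reports any movement rather than only increases, because B4 (unused reserve blocks) counts down while the others count up.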

 

FYI, the "current" & "worst" columns will never go below "1".

 

I'd really be interested in hearing how this SSD eventually fails (and how long it takes) if you continue using it.

Kilobyte Kid

Re: M500 - End of useful life (or not)

I'm definitely moving all the caching/tempfile/database duties off it. I think I'll be demoting it to a Dropbox/OneDrive drive. I was toying with using it for Azure Sync, but either something didn't install properly or Azure Sync is really heavy on writes to keep the store synced/indexed properly. At least, the Wear Level Percentage went from 96% to 99% in about two months, which to me seems really high.
We'll see what I actually do with it, but I'll definitely keep it doing something non-critical until it goes, if nothing else because I'm also curious whether it's going to be a slow degradation or a quick death.