Currently unreadable (pending) sectors

Kilobyte Kid

Currently unreadable (pending) sectors

I'm on CentOS 7 x64 and I have a whole bunch of Crucials as my main drives: M4, M500, M550 and MX100.

 

Recently drives have been reported as 'failing' by smartctl. The error looks like this: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors

I've been having this error on several drives; on one of my boxes it's as many as 3 out of the 4 Crucials in there. Sometimes I try to fix it by dropping the drive from mdadm (software RAID) and overwriting the sector, or by just doing an ATA secure erase. This clears the error, but sometimes it comes back.

 

My question now: does this error mean an actual failing drive? I'm unsure whether data from smartctl can be interpreted the same way for SSDs, as it was designed more with HDDs in mind. If so, 3 drives failing out of 4 seems quite high ;/

9 Replies
JEDEC Jedi

Re: Currently unreadable (pending) sectors

That doesn't match any of the attribute descriptions for a Crucial SSD.  What attribute is it?  C5?  That's current pending sector count.

 

The more modern drives show reserved NAND under B4.  If that's greater than 0 you should be good for covering any worn out NAND anyway.

_______________________________________
How do I know what memory to buy?
Shop for your region: US | UK | EU | France |
I think my memory is bad. What do I do now?
FAQs and Top Forum Solutions
Did a user help you? Say thanks by giving Kudos!
Still need help? Contact Customer Service
Want to be a Super User?
Kilobyte Kid

Re: Currently unreadable (pending) sectors

Thank you for your response. Smartctl reports this as ID 197.

"197 Current_Pending_Sector  0x0032" 

 

I don't see any C5 or B4 with smartctl, only numbers.
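
smartctl prints attribute IDs in decimal, while vendor documentation often lists them in hex, so C5 and B4 should be the same attributes as 197 and 180. A minimal shell sketch of the conversion (the helper names hex_to_dec and dec_to_hex are made up for illustration):

```shell
# Hypothetical helpers: convert SMART attribute IDs between hex and decimal.
hex_to_dec() {
    # printf treats a 0x-prefixed argument as a hexadecimal constant
    printf '%d\n' "0x$1"
}

dec_to_hex() {
    # two-digit uppercase hex, as used in attribute documentation
    printf '%02X\n' "$1"
}

hex_to_dec C5    # prints 197
hex_to_dec B4    # prints 180
dec_to_hex 197   # prints C5
```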

JEDEC Jedi

Re: Currently unreadable (pending) sectors

Could you post your smartctl output(s)? Please just remove the drive serial number(s). It is weird that 3 drives out of 4 in your system seem to show issues.

Kilobyte Kid

Re: Currently unreadable (pending) sectors

I can't give you one with a pending sector, because just a few days before posting about it I ATA secure erased all of the drives that had this issue. On one drive it came back, but that one is currently lying outside of the case at the datacenter for me to pick up. Here is an example from an M500 drive that had the error, which disappeared after a secure erase. Not sure if this helps.

 

http://pastebin.com/VBEhVnSJ

Dual Channel Surfer

Re: Currently unreadable (pending) sectors

I'm not sure what the problem is, but to help with diagnostics, you need to update your smartmontools drive-db. Then you won't be seeing those "Unknown_Attribute" messages.

 

If your machine has an internet connection, you should be able to run:

/usr/sbin/update-smart-drivedb

 

If the machine doesn't have a direct internet connection, it might be possible to update the db manually.

In Linux machines the drive-db is probably stored in some path like:

/var/lib/smartmontools/drivedb/

 

Perhaps it could be a different path on RedHat/CentOS than Debian and others.
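
To find out where the db lives on a given machine, a small sketch like the following could help. The helper name first_existing is made up, and both candidate paths are assumptions (the second one is a guess at a typical RedHat/CentOS location), so adjust them for your system:

```shell
# Hypothetical helper: print the first existing file among the given paths.
first_existing() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions, not confirmed for every distribution.
first_existing \
    /var/lib/smartmontools/drivedb/drivedb.h \
    /usr/share/smartmontools/drivedb.h \
    || echo "drivedb.h not found in the assumed locations"
```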

 

====

 

From your 960GB M500 drive:

 

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       6
  5 Reallocated_Sector_Ct   0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12401
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       92
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   093   093   000    Old_age   Always       -       231
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       58
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       16523
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   073   060   000    Old_age   Always       -       27 (Min/Max 23/40)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Unknown_SSD_Attribute   0x0031   093   093   000    Pre-fail  Offline      -       7
206 Unknown_SSD_Attribute   0x000e   100   100   000    Old_age   Always       -       0
210 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       4
246 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       18489378380
247 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       742313758
248 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       4360715557

 

====

 

The following is from a MacBook Pro running OS X 10.9.5 with an M500 240GB, running smartmontools with a recently updated db.

 

smartctl 6.3 2014-07-26 r3976 [x86_64-apple-darwin13.3.0] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       41
  5 Reallocated_Sector_Ct   0x0033   100   100   000    Pre-fail  Always       -       1
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       4643
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2646
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Wear_Leveling_Count     0x0032   099   099   000    Old_age   Always       -       32
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       15
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       4065
183 SATA_Iface_Downshift    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   067   058   000    Old_age   Always       -       33 (Min/Max 20/42)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       815
202 Percent_Lifetime_Used   0x0031   099   099   000    Pre-fail  Offline      -       1
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_Host_Sector_Write 0x0032   100   100   ---    Old_age   Always       -       9641250755
247 Host_Program_Page_Count 0x0032   100   100   ---    Old_age   Always       -       284843737
248 Bckgnd_Program_Page_Cnt 0x0032   100   100   ---    Old_age   Always       -       224563013

 

====

 

You can see there are no more "Unknown_Attribute" messages.

 

Your drive has had more writes than mine and a higher Unexpected Power Loss count, but fewer Raw Read Errors and fewer UDMA CRC Errors. There really doesn't appear to be any sign that the drive was due to die.
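
For comparing drives like this without eyeballing the tables, here is a minimal sketch that pulls the RAW_VALUE column out of a smartctl attribute table. The helper name smart_raw is made up, and it assumes the 10-column layout shown in the tables above; the sample rows are copied from those tables:

```shell
# Hypothetical helper: read a smartctl attribute table on stdin and print
# the RAW_VALUE (10th column) of the attribute whose ID is given as $1.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10; exit }'
}

# Sample rows copied from the two tables above.
server='  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       6
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0'
macbook='  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       41
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       815'

for id in 1 199; do
    printf '%s: server=%s macbook=%s\n' "$id" \
        "$(printf '%s\n' "$server" | smart_raw "$id")" \
        "$(printf '%s\n' "$macbook" | smart_raw "$id")"
done
```

On a live system the same helper could be fed directly, for example smartctl -A /dev/sda | smart_raw 197.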

 

JEDEC Jedi

Re: Currently unreadable (pending) sectors

Indeed, it is hard to tell. Nothing stands out from the crowd. There are no errors logged and the drive passes self-tests. There are some small numbers for attributes 1 and 210, but on the other hand I think I have seen drives with much higher values there that were working fine.

197 is C5(hex), and I believe that 05(hex), C5(hex) and C6(hex) are critical on most HDDs. But SSDs are supposed to handle some errors in a different way... see B4(hex) - 180 Unused_Rsvd_Blk_Cnt_Tot.

Dual Channel Surfer

Re: Currently unreadable (pending) sectors

On the drive where you get the "currently unreadable (pending) sectors" message, you could try running a smartctl long test to see if that gives more information.

 

smartctl -t long /dev/sda

Then check the self-test log to see whether it completed, and look at the attributes again.

smartctl -l selftest /dev/sda

smartctl -A /dev/sda

 

 

Kilobyte Kid

Re: Currently unreadable (pending) sectors

So am I supposed to ignore 197 unless it keeps growing? One pending sector is not dangerous for my data? What if it keeps coming back even after a reset? In 2 weeks I'll be picking up the SSD where the pending sector kept coming back. I was planning on RMAing it, but if you say it's not a critical thing I might just keep it, I guess.

 

@alex486

I updated my smart tools the way you described and the unknown attributes did indeed go away. Cool trick, I'll remember to do this once in a while on my servers. New pastebin: http://pastebin.com/W62Uu65y

 

Also, about the self-test: I haven't tried a long one, but when I did a short one it just hung at 90% remaining for an hour, so I killed it.

JEDEC Jedi

Re: Currently unreadable (pending) sectors

I don't know. A possible RMA would be up to you. At this point I don't think I can tell you whether you have actual failing drive(s). You can try asking Crucial Customer Service about that (the contact link is in my signature).

 

Monitoring of SMART attributes is a good idea for sure. I would probably run long self-tests too.
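
One way to watch whether attribute 197 keeps growing is to save the last seen raw value and compare against it on each run. The following is only a sketch: the helper name check_pending and the state file path are made up, and it does not call smartctl itself.

```shell
# Hypothetical helper: compare the current pending-sector count ($1) with the
# count saved in a state file ($2), report any increase, then save the new count.
check_pending() {
    current=$1
    state_file=$2
    previous=0
    if [ -f "$state_file" ]; then
        previous=$(cat "$state_file")
    fi
    printf '%s\n' "$current" > "$state_file"
    if [ "$current" -gt "$previous" ]; then
        echo "pending sectors grew from $previous to $current"
        return 1
    fi
    echo "pending sectors not growing ($current)"
    return 0
}

# Example use (assumes smartctl is installed and /dev/sda is the drive):
#   count=$(smartctl -A /dev/sda | awk '$1 == 197 { print $10; exit }')
#   check_pending "$count" /var/tmp/sda.pending
```

Run from cron, the non-zero return on growth could be used to trigger a mail or alert.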

 

Most likely the following attribute summary applies:

AttribID: 197

HexID: C5h

Name: Current Pending Sector Count

SMART Trip: No

Implementation: Will always be 0, as error handling will be done at the field

Description: -

 

I would probably lean towards ignoring small values in SMART if the drive works properly, meaning no data corruption and no file/disk/IO errors reported by the OS. There is also the Crucial Storage Executive tool, which in theory should show you the overall health of the drive (for M500 and newer models). Furthermore, there is Crucial's knowledge base article about SMART.

 

Personally, I am the kind of user who would back up the data (which should be done on a regular basis anyway) and keep the drive out of curiosity, to see whether it is failing and how the SMART attributes and the drive's condition develop over time.

If your drives are in a mission-critical system, you can probably discuss the reported SMART values directly with Crucial.
