09-05-2018 02:32 PM - edited 09-05-2018 02:34 PM
I bought an MX500 1TB from Amazon last week, but the drive kept throwing the following SMART errors: #197 Current Pending Sector Count 1 -> 0 and #197 Current Pending Sector Count 0 -> 1. This happened 15 times in 3 days, so I returned the drive for a replacement. Yesterday I noticed that the replacement drive was also showing errors, so I checked online and found a Reddit post suggesting it may be a firmware error. Based on that, I updated the replacement drive to the newest firmware, but that doesn't seem to have fixed the issue and I am now getting the errors again.
My question is: is this a firmware error, or is it something I should worry about? Pics below.
09-05-2018 02:58 PM
Please post all of the SMART Attributes as one of the other values may provide more insight.
Have you tried connecting the SSD to another computer to see if the errors continue? I'm wondering if you might have a power supply issue.
09-05-2018 03:05 PM - edited 09-05-2018 05:53 PM
Thanks for the reply. I haven't tried it in another PC, but I doubt it's a PSU issue: my current power supply is an 850W Super Flower Leadex Platinum running an overclocked 8700K and a 1080 Ti with no other errors, and the PC has passed numerous synthetic stress tests running for days at a time. The SSD replaced an old 1TB WD Black HDD, which didn't display any power-supply-related issues, and neither do the other four SSDs connected to the PC. SMART attributes are below, along with a link to the Reddit post in question, where other users have the same issue.
unRAID forum, same issue, multiple users: https://forums.unraid.net/topic/69771-solved-current-pending-sector-is-1on-new-ssd-cache/
For further reference, I'm on Windows 10, unlike the users in that post, who seem to be mostly on Linux, so the issue is present across multiple operating systems.
09-05-2018 07:25 PM
Those SMART attributes all look good. If it were a real error, attribute #5 and/or #196 would be non-zero after the Pending Sector count resets to zero. None of the other internal attributes show any signs of errors either. FYI, the Host Writes are very high, but this may be because you received a refurbished SSD.
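The reasoning above can be written as a small check. This is a minimal sketch, assuming you have already read the raw values of attributes #5, #196 and #197 (for example from a SMART tool's report) before and after a pending event; SmartSnapshot and classify_pending_event are hypothetical names used only for illustration, not part of any Crucial or SMART tool API. It compares #5 and #196 against their earlier values, which on a new drive (where both start at zero) is the same as checking whether they became non-zero.

```python
from dataclasses import dataclass


@dataclass
class SmartSnapshot:
    """Hypothetical container for the raw values of three SMART attributes."""
    reallocated_blocks: int   # attribute #5
    reallocation_events: int  # attribute #196
    pending: int              # attribute #197


def classify_pending_event(before: SmartSnapshot, after: SmartSnapshot) -> str:
    """Classify what happened to a pending (#197) count between two readings."""
    if after.pending > 0:
        return "still pending"
    # #197 is back to zero; a genuinely bad block should leave a trace in #5 or #196.
    reallocated = (after.reallocated_blocks > before.reallocated_blocks
                   or after.reallocation_events > before.reallocation_events)
    return "reallocated" if reallocated else "transient"


if __name__ == "__main__":
    before = SmartSnapshot(reallocated_blocks=0, reallocation_events=0, pending=1)
    after = SmartSnapshot(reallocated_blocks=0, reallocation_events=0, pending=0)
    print(classify_pending_event(before, after))  # transient
```

A "transient" result matches the pattern described in this thread: #197 toggles back to zero with no reallocation recorded in #5 or #196.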
I would perform a Sanitize/Secure Erase on this SSD which will erase all your data on the drive. This will reset the SSD to factory defaults and also reset all the NAND cells. I've had the Secure Erase fix blocks that would not reallocate on their own so I think it is worth a try.
While your system may be perfectly stable, if the Secure Erase does not help it wouldn't hurt to see whether the issue happens when the SSD is connected to a different system and power source. The MX500 uses a different controller than the older Crucial SSDs, so perhaps it is a bit more sensitive, especially in an overclocked system under heavy load. Or you may have a bad connector on the power supply, or one that doesn't fit as snugly on the MX500 as it does on other drives. This is the first I've heard of this problem. Connecting a problematic device to another system is always a good troubleshooting step, as it helps confirm whether the issue is truly internal to the drive. You would be very surprised at some of the things I've encountered that caused failures, which is why I'm suggesting you try it.
09-05-2018 11:17 PM - edited 09-06-2018 10:53 AM
The reason I doubt the power supply is the problem is that I have double-checked all the physical connections, and the previous drive showed the same errors while connected to an AC-powered USB 3 docking device plugged into a different wall outlet (used to transfer data to the new drive). That setup was entirely separate from my power supply, and it's unlikely I would have two drives exhibiting exactly the same issue on two power supplies previously known to be good. Regarding blocks reallocating themselves, that is not the problem: the issue always fixes itself within the hour (which is why people believe it to be a bug). The users on the forums I posted are all sysadmins and/or data hoarders, so while I accept that if it were just me, system instability or a faulty power supply could well be at fault, I find it hard to believe that the other posters all happen to have unstable overclocks and faulty power supplies or loose connectors too. Bear in mind those users are likely to have stock systems and/or ECC RAM, as stability is paramount in those use cases.
I spoke with the poster of the Reddit thread above via PM and asked whether the issue was ever fixed for him. His response:
"Nope, unfortunately it has yet to be fixed, although it doesn't seem to happen as often anymore. The only time I notice it is when I'm dumping several hundred gigs to the cache drive.
When I reached out to them, they basically told me not to RAID 0 them, which makes no sense. Smh. After a couple of emails and proof of it not being just me (quite a few on the Limetech forums have the same issue), they said they would be looking into it. That was when I made the post, and I haven't heard back since.
Despite the error, I still trust the drive, as it's given me no indication of being faulty or on the fritz."
I agree the SMART data looks fine. It's the same SMART data the returned drive showed, which again is why we think it's a bug: the drive passes all SMART tests but will randomly show the pending sector (#197) error and fix itself within minutes or hours. For reference, here are the SMART errors from the returned drive (as you can see, the drive is back to 100% and never has more than one current pending sector before fixing itself, either within the hour or on the next boot).
Further users with the exact same problem:
Considering how many other people are having the same issue, I think Occam's razor applies here regarding the SSD.
09-06-2018 10:40 AM
With the extra information you just provided, I tend to agree with your assessment that it is most likely a drive issue. I would still suggest performing a Secure Erase on the SSD.
While Crucial does monitor these forums, it is easy to miss a post, so I suggest contacting Crucial Support directly to open an official support ticket. Since this is the second drive with the same problem, they can look more closely at it. Hopefully the other people in the links you provided have also contacted Crucial to alert them to the issue.
09-06-2018 12:30 PM - edited 09-06-2018 03:05 PM
@opalfruit the SMART data on your drive looks perfectly fine, so I wouldn't worry about it suddenly dying or anything.
This certainly appears to be something odd between the drive and the unRAID health test. It is strange that it flags attribute 197 as Current Pending Sector Count, because on Micron drives that attribute is Current Pending ECC Count; there is no such thing as a "pending sector count" for Micron drives. So unRAID is getting confused for some reason.
This is my first time hearing of this, but I don't always get included in every engineering discussion, so I'll send them an email to see if there are any updates on this matter. I'll reply as soon as I get new information.
09-06-2018 12:42 PM - edited 09-06-2018 12:45 PM
Hi, thanks for the response. I just want to point out that I'm not using unRAID myself; I'm running a Windows 10 machine, using the drive as a standard data drive connected via the SATA ports on my motherboard.
The software showing the errors is Hard Disk Sentinel (https://www.hdsentinel.com/), a SMART monitoring tool, which I would imagine reads the same or very similar SMART data as unRAID, so whatever is confusing unRAID seems to be confusing SMART monitoring in general. The people on the unRAID forums seem to believe it's a Linux-only issue, but I can attest to it being an issue on Windows too.
I'll await updates. Thanks again.
09-06-2018 03:11 PM
@opalfruit thank you for that clarification. It does appear the SMART checks used by HD Sentinel and unRAID have some similarities. The odd thing is that smartmontools, the package used by almost every Linux distro to check SMART health on drives, apparently doesn't have this issue, so it's strange that unRAID in particular, being a Linux-based environment, would have the problem.
I'm interested to see what engineering has to say.
01-19-2019 05:45 AM
I have purchased two CT250MX500SSD1 units, and they are updated to the M3CR023 firmware.
I have them configured in RAID 0 on the motherboard's Intel controller.
The following messages have started appearing in the Hard Disk Sentinel logs for both drives:
01/19/2019 12:38:54, #197 Current Pending Sector Count 1 -> 0
01/19/2019 12:33:54, #197 Current Pending Sector Count 0 -> 1
That seems to be the same as what happened to opalfruit.
I was wondering whether the fault was eventually solved in software, whether you had to replace the SSD, or whether you still have the problem.
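The two Hard Disk Sentinel entries quoted in this post describe a single episode: #197 went 0 -> 1 at 12:33:54 and back 1 -> 0 at 12:38:54. To measure how often and how long this happens over a longer log, here is a minimal sketch. It assumes every entry follows exactly the format of those two lines (a month/day/year timestamp followed by the attribute change); the regular expression and the pending_episodes function are illustrative only and are not part of Hard Disk Sentinel.

```python
import re
from datetime import datetime

# Assumed entry format, based on the two log lines quoted above, e.g.
# "01/19/2019 12:33:54, #197 Current Pending Sector Count 0 -> 1"
LINE_RE = re.compile(
    r"(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}),\s*#\s*197 Current Pending Sector Count "
    r"(?P<old>\d+) -> (?P<new>\d+)"
)


def pending_episodes(lines):
    """Pair each 0 -> N change of attribute #197 with the next change back to 0.

    Returns a list of (start, end) datetimes. Entries are sorted by timestamp
    first, because the log may list the newest entry first.
    """
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%m/%d/%Y %H:%M:%S")
            events.append((ts, int(m["old"]), int(m["new"])))
    events.sort()

    episodes, start = [], None
    for ts, old, new in events:
        if old == 0 and new > 0:
            start = ts  # a pending count appeared
        elif new == 0 and start is not None:
            episodes.append((start, ts))  # it cleared again
            start = None
    return episodes


if __name__ == "__main__":
    log = [
        "01/19/2019 12:38:54, #197 Current Pending Sector Count 1 -> 0",
        "01/19/2019 12:33:54, #197 Current Pending Sector Count 0 -> 1",
    ]
    for start, end in pending_episodes(log):
        print(start, "->", end, (end - start).total_seconds(), "s")
```

For the two entries in this post, pending_episodes returns one episode lasting five minutes (300 seconds).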