Unfortunately, so far I haven't been able to identify any clear software cause for the very high write amplification observed. On the other hand, after collecting SMART attributes once per minute for a few days, there appears to be a correlation between write amplification and the Current Pending ECC Count. It seems unlikely to me that this would be directly caused by software.

Summing up what I have observed so far:

- Power On Hours Count appears to increase faster (closer to real time) when a certain amount of load (even read-only load) is put on the SSD.
- A continuous high read load appears to immediately defer the write amplification phenomenon for a while, but does not completely eliminate it.
- In the long term, the high write amplification seems to be correlated with the appearance of pending ECC errors.
- Turning off Aggressive Link Power Management (DIPM/HIPM) from Windows or the chipset AHCI driver does not seem to affect the behavior of the SSD.
- This might or might not be related, but my Crucial MX500 came from the factory with M3CR020 firmware, which I understand was not publicly released on the Crucial website. The currently installed firmware is the latest (M3CR023).

As a bonus, here are the latest SMART attributes from Crucial Storage Executive:

Wear leveling count increase for the past few logged days:

Timestamp            Block wear-leveling count  Host GiB written  GiB delta  Days delta  GiB/day
2019-03-15 01:36:06  78                         4643.4
2019-03-17 08:49:13  79                         4675.7            32.31      2.30        14.041
2019-03-18 16:11:52  80                         4706.2            30.46      1.31        23.296
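The GiB/day figures in the wear-leveling log above can be recomputed from the timestamps and host-write totals. The following is a minimal sketch of that arithmetic; the function name `daily_write_rates` is made up for illustration, and the input values are the rounded ones shown in the log, so the results may differ slightly in the last digits.

```python
from datetime import datetime

# Rows copied from the log above:
# (timestamp, block wear-leveling count, host GiB written)
samples = [
    ("2019-03-15 01:36:06", 78, 4643.4),
    ("2019-03-17 08:49:13", 79, 4675.7),
    ("2019-03-18 16:11:52", 80, 4706.2),
]


def daily_write_rates(rows):
    """Return (gib_delta, days_delta, gib_per_day) for each consecutive pair of rows."""
    results = []
    for (t0, _, gib0), (t1, _, gib1) in zip(rows, rows[1:]):
        elapsed = datetime.fromisoformat(t1) - datetime.fromisoformat(t0)
        days = elapsed.total_seconds() / 86400  # seconds per day
        gib_delta = gib1 - gib0
        results.append((gib_delta, days, gib_delta / days))
    return results


for gib_delta, days, rate in daily_write_rates(samples):
    print(f"{gib_delta:6.2f} GiB in {days:.2f} days = {rate:.1f} GiB/day")
```

Each row pair corresponds to one increment of the wear-leveling count, so the GiB delta is the amount of host data written while the drive consumed one average P/E cycle.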
Intel RST (the AHCI driver provided by Intel) versions other than 12.9 would not install on Windows 10 on the previously mentioned Intel H77 chipset. After installing version 12.9, I disabled the "Link Power Management" option in its control panel and rebooted as instructed there. Link power management could possibly be the reason why Power On Hours didn't rise in step with real-time hours, and an indirect reason why I was having issues with the drive's write amplification and consequently excessive NAND wear for the amount of writes performed by the host. I have not made other changes for now (the P2P program is still active in "seeding" mode), but I've disabled the benchmarking program which was keeping the SSD actively working. In the next few hours it should become clearer whether this solves the problem. If it does, this is still something that should be taken care of by the SSD (firmware), though.

EDIT: Unfortunately, that option did not result in any improvement in the behavior of the SSD.
1) I will try later today to have it disabled for a few days (the user of that PC seems adamant about having it "seeding"/sharing data) and observe any difference.

2) I thought of that too. So far the "High performance" power plan has been used. PCI Express power management in Advanced power options has been set to "disabled" and storage devices have been set to never go to sleep (0 minutes = "never"). I tried checking in the BIOS whether "Aggressive Link Power Management" (ALPM), which is sometimes known to cause issues and added latencies, was enabled, but I couldn't find any related option. I found that the SATA ports' "Hot plug" option was enabled, but disabling it didn't seem to bring any positive change.

3) For what it's worth, the motherboard is relatively dated. It's based on the Intel H77 chipset. It's an MSI H77MA-G43 manufactured in 2012 and it has the latest BIOS version installed (from 2013). Intel does not produce Windows 10-supported AHCI drivers for this chipset. I haven't checked in detail (although I do recall that recent ones definitely won't install), but the latest supported driver for this chipset should be Intel RST 12.9 from 2013 (with no official Win10 support): https://downloadcenter.intel.com/download/23496/Intel-Rapid-Storage-Technology-Intel-RST-User-Interface-and-Driver
I will try checking whether this or newer ones can be installed later this afternoon, anyway.

* * *

As a possible further data point for the Crucial firmware team, in the past few hours it appears that the write amplification increased again (although not as much as on previous occasions) while the benchmark program was still running and performing reads, and Power On Hours increased as expected (1 power on hour = 1 real-time hour). As I found this unusual and a departure from previous observations, I looked into it in more detail and found that it seemed to be related to a temporary increase in "Pending ECC Count".
I checked prior data and there seems to be a very loose correlation between periods of high write amplification and the appearance of pending ECC counts. However, as this count resets to zero after a while (likely after the pending ECC errors have been processed) and I only sample SMART attributes every 30 minutes, I cannot know whether there have been more of them in the past. This could be completely expected behavior (perhaps due to the phenomenon known as "Read Disturb"), but it might also point to the nature of this issue.
> 5TB on a 500GB drive in six months is a lot. That is about 28GB/day which is a lot for a drive with just 500GB of storage. With a drive 50% full you are effectively writing to the whole other half in one week. Keep in mind these are consumer grade drives using TLC NAND

According to the Crucial MX500 product flier, the 500GB model should be capable of a 180TBW endurance, which should be equivalent to 98GB/day for 5 years, at the least. 28 GB/day would be about 1/4 of what it could support over its warranty period. Since my SSD has reached 5% indicated wear at just 4.4 TiB (4495 GiB), 100% wear would be reached at only 88 TBW. This seems too fast for the kind of usage it sees, for the stated endurance, and for what I have experienced with several other SSDs installed on this and other PCs in the past years. The wearout indicator on this Crucial MX500 SSD appears to be based on a 1500 P/E cycles endurance (5% reached at 75 cycles).

> Even if your P2P program is using a hard drive, it might still be using the SSD as a temporary staging area. It will likely hold its tracking information on the SSD even if all of the shared files (and pieces of files) are completely stored on the hard drive.

I am reasonably confident that it isn't the P2P program because it hasn't been actively downloading anything and has been operating at a low level just by "seeding" (uploading data). I'm using Tixati as a P2P program. The roughly 3,000,000,000 I/O bytes written could possibly be attributed to cache, state management and a staging area for its background activity, but with about 18,000 write operations this averages about 166,000 bytes/write, which I feel would be unlikely to cause a high write amplification.

> Each model SSD is different and can utilize different types of NAND. There are even different manufacturing of TLC NAND. You would need to be comparing two identical drives

There certainly are bound to be differences.
Normal write amplification on the MLC-type SSD I'm using right now on this PC under Linux is about 3.5-4.0x on average, while on other TLC-type SSDs from a different brand it tends to hover around 1.5x. On the other hand, I've never before seen mid-term values (calculated over the course of an entire P/E cycle) on the order of 8-10 or more.

> Perhaps this activity is preventing TRIM and the Garbage Collection from doing anything. It may also be interrupting whatever is causing all of the writes. Are short term assessments even that valid or accurate? I didn't check the math as I'm too tired, but are the calculations correct?

Unusually high wear caused by excessive write amplification can be observed through several metrics. Below are approximate calculations disregarding the slight difference between GB/GiB (or TB/TiB).

1) 5% wear with 4.4 TBW = 100% wear with 4.4/5*100 = 88 TBW (write endurance is supposed to be at least 180TBW for the 500GB model)

2) Current average P/E count = 77 cycles.
77 cycles * 500 GB = 38500 GB written to the NAND
Current Host Writes = 4626 GiB
Global write amplification = 38500 / 4626 = 8.3x

3) From this Micron document on how to calculate the write amplification factor with the provided attributes: https://www.micron.com/-/media/client/global/documents/products/technical-note/solid-state-storage/tnfd23_m500_smart_attributes_calc_waf.pdf
With my SSD, at the moment:
Host Program Page count = 170,707,030
Background Program Page count = 863,145,684
WAF (global) = 1+(863,145,684 / 170,707,030) = 6.06x (this is slightly different from the calculation using P/E cycles. Shorter-term calculations can give much higher values)

4) The WAF could also be calculated differentially over the span of an entire P/E cycle. I haven't kept track of wear continuously over the past few months, but I have some data showing that the calculation still seems to be consistent overall.
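The first three calculations above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic as stated in the post: the function names are invented for illustration, the page-count formula is the one from the linked Micron technical note, and the P/E-based estimate assumes (as the post does) that one average P/E cycle equals one full 500 GB of NAND writes and ignores the GB/GiB difference.

```python
def projected_tbw(host_tb_written, wear_percent):
    """Extrapolate host writes at 100% indicated wear from the current wear level."""
    return host_tb_written / wear_percent * 100


def waf_from_pe_cycles(avg_pe_cycles, capacity_gb, host_gb_written):
    """Estimate WAF as (P/E cycles x capacity) / host writes.

    Assumes one average P/E cycle corresponds to writing the full nominal capacity.
    """
    return avg_pe_cycles * capacity_gb / host_gb_written


def waf_from_page_counts(host_program_pages, background_program_pages):
    """WAF = 1 + background pages / host pages (formula from the Micron note)."""
    return 1 + background_program_pages / host_program_pages


# Figures quoted in the post above.
print(f"Projected endurance: {projected_tbw(4.4, 5):.0f} TBW")
print(f"WAF from P/E cycles: {waf_from_pe_cycles(77, 500, 4626):.1f}x")
print(f"WAF from page counts: {waf_from_page_counts(170_707_030, 863_145_684):.2f}x")
```

The two WAF estimates need not agree exactly, since the P/E-based one depends on how the firmware counts an average cycle and on the nominal capacity used.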
> Have you tried disabling the P2P software to see if the issue disappears? Have you run a virus scan on your system (preferably by booting a USB AV Rescue disk)?

I could try in the next few days. However, given its low activity and lack of intensive writes on the SSD, I don't think it's the problem. But even if it were, no such problem (in short, fast wear due to excessive NAND writes) was observed with previous SSDs of lower capacity installed on that same PC in the past years. This is one of the reasons why I believe this issue to be related to the SSD. I do not have virus scan software that could be used other than Windows' integrated one.

> You may want to consider backing up your system and/or cloning it to an image file on your hard drive using the Crucial Acronis software or any other software you like. You can create a bootable USB Acronis drive to do the transfer. I would then perform a Secure Erase on the SSD to reset it to factory defaults and restore its performance. Then install a clean copy of Win10 and see how the drive behaves (minus the P2P software at first). You can always restore the system to the way it is now using the Acronis USB boot drive if the behavior is the same.

This would be extremely time consuming and I hope the culprit can be identified before going through this. Furthermore, the SSD is not installed on a computer I use personally, which would make going through a clean installation complicated. I can consider secure erasing the SSD at some point if really necessary, as it would require proportionally much less time and cooperation from the family member who uses that PC.
@Crucial_Benny wrote:
> With 960 power on hours you're still writing around 120GB of data every 24 hrs, that's a very high amount erases, certainly not in the realms of regular "light" desktop usage.

A possibly related issue is that power on hours as displayed by Crucial Storage Executive or by any other SMART analysis program do not seem to correspond to actual power on hours. Given that the computer where the SSD has been installed has almost never been turned off, I can produce a graph to show this relationship. From the graph it's also apparent that Power On Hours as reported by the SSD appear to increase faster when SSD activity is higher. As I mentioned previously, in the past few days I have been running a benchmark program that performs read activity continuously. The SSD was installed on 2018-09-05 (and again, it's almost never been turned off), for a "true" Power On time of roughly 188 days or 4512 hours. Current Host writes are about 4620 GiB, which works out to 4620/188 = 24.6 GiB/day.

> How full is your drive with data? If you have drive that's mostly full, this will really limit wear leveling's ability to do it's job, couple that with the high amount of erases you're doing, and that could explain the abnormal WAF numbers.

The drive is about 50% full. Just to make sure, about two weeks ago I tried increasing over-provisioning space by 18 GB (even though this should not be necessary with TRIM enabled and ample free space available), but this did not seem to have any effect. The SSD is running behind Windows 10's built-in AHCI driver, and TRIM is supposed to be working.

EDIT: For what it's worth, here's an update on the previously reported data. The write amplification while the benchmark program was performing reads has remained very low under normal usage over the past few hours. Typical user activity involves high definition web streaming, browsing, downloads.
For the most part the writes (columns J and K could be of interest) are caused by web browser activity. These do not seem to cause a high write amplification on their own (column F):
This other graph should make it clearer that the high write amplification appears to be more or less independent of the amount of host writes (right Y axis, logarithmic values) performed during each 30-minute sample period, or at least no clear correlation is visible. If anything, there's a vague hint of anti-correlation (write amplification is high when writes are low). However, in the past couple of days this trend might have been broken by the continuous reads performed by the benchmark program as previously described.
Thanks for answering. The computer is always on, so 5 TB in 6 months seems reasonable. I have a similar rate with comparable usage on other machines with different SSDs and even a different operating system (Linux). The Windows 10 PC with a Crucial MX500 500GB has a P2P program enabled, but it's running on a hard disk drive. Besides, no such high write amplification values were recorded with a different 250GB SSD from another manufacturer that was used there before installing this new one.

From past observations, the reported extremely high write amplification values appear to fluctuate in an almost cyclical manner, with a period of 5-7 days, and do not seem to be correlated with the usage of any specific program in particular. This is a graph over a period of weeks from last year:

In the past few days I found that having a program continuously read data from the SSD appears to prevent this high write amplification phenomenon. I'm doing this with a benchmark program called FIO ("Flexible I/O Tester"). In the graph below it was enabled at about mid 2019-03-12, then stopped for a few hours, then enabled again. In the past few hours it was stopped again, which caused (indirectly) a write amplification spike. Thereafter it was enabled again and write amplification dropped again to about 1.0-1.1x (as I would normally expect with light usage).

Here is a portion of a spreadsheet of the data showing in more detail the behavior for the latest write amplification spike. Notably, Power On Hours increase slowly when the SSD is not actively engaged with user reads/writes.

Here is the latest status from SMART parameters. Since last time, Block Wear-Leveling count has already increased from 72 to 77.

Drive encryption is not enabled.

That the Write Amplification Factor can drop to about 1.0x just by actively making the drive perform work (even if just with read operations), i.e. by preventing it from idling, makes me suspect that this is either due to internal firmware operations (e.g. garbage collection or some other sort of internal "maintenance work") or something that is strictly related to Windows more than to user program activity, which would make it more difficult to track down.

This is the FIO configuration file I'm using to make the SSD perform reads continuously at a low rate and apparently prevent the write amplification from increasing while it's active. Relevant links are in the configuration comments.

; FIO https://github.com/axboe/fio
; FIO Windows builds: https://bluestop.org/fio/
; FIO Parameters: https://fio.readthedocs.io/en/latest/fio_doc.html
rate=10m ; Limit to 10MB/s
About 6 months ago I purchased a Crucial MX500 500GB for one of my PCs. I found that under normal desktop usage its write amplification appears to be very high, far higher than on any other SSD the same PC has had over the years under the same usage patterns.

At the time of writing, Total Host Sector Writes is 9254241949 (4412 GiB), and Average P/E Count is already at 72. This seems consistent with a Host Program Page Count of 162931734 and a Background Program Page Count of 803818734. Globally, the lifetime write amplification is approaching 6.0x (and rising). However, calculated differentially over relatively short periods of time (e.g. 4 hours) it can even be higher than 100x.

Even though in 6 months of usage SSD lifetime is down only about 5%, and thus it would take 10 years to wear it out completely, I am concerned that this is not normal behavior, especially for the light desktop loads it is normally used with.

Partition alignment has been checked and appears to be fine. The firmware has been updated to the latest version (M3CR023) without any improvement in this behavior.

Here is a graph of Write Amplification over the past few days (calculated every 30 minutes - blue line - and 2 hours - red line).
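To illustrate what "calculated differentially" means here, the sketch below applies the page-count formula to the change in the two counters between two SMART readings, rather than to their lifetime totals. The class and function names are hypothetical. The first sample uses the counters quoted in this post; the second uses the counters from a later reading reported elsewhere in this thread (Host Program Page Count 170,707,030 and Background Program Page Count 863,145,684).

```python
from dataclasses import dataclass


@dataclass
class SmartSample:
    """One reading of the two page counters used for the WAF calculation."""
    host_pages: int        # Host Program Page Count
    background_pages: int  # Background Program Page Count


def lifetime_waf(sample):
    """WAF over the whole life of the drive: 1 + background / host."""
    return 1 + sample.background_pages / sample.host_pages


def differential_waf(earlier, later):
    """WAF over the interval between two readings, from the counter deltas."""
    host_delta = later.host_pages - earlier.host_pages
    background_delta = later.background_pages - earlier.background_pages
    if host_delta <= 0:
        # With no host writes in the interval the ratio is undefined.
        raise ValueError("no host page writes between the two samples")
    return 1 + background_delta / host_delta


first = SmartSample(host_pages=162_931_734, background_pages=803_818_734)
later = SmartSample(host_pages=170_707_030, background_pages=863_145_684)

print(f"Lifetime WAF at first sample: {lifetime_waf(first):.2f}x")
print(f"WAF between the two samples:  {differential_waf(first, later):.2f}x")
```

Because the differential value divides by the host pages written in the interval, short intervals with very few host writes and ongoing background activity can produce very large values, which is consistent with the spikes described above.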
I recently acquired a Crucial MX500 500GB SSD (Firmware: M3CR020). I have a few questions about certain SMART attributes as reported by the drive (for example through the Linux utility smartctl/smartmontools).

1) What correlation is there exactly between Host Program Page Count and Total Host Sector Writes? I've found that the bytes/page calculation gives a value that tends to decrease over time and increase in steps whenever a large amount of sequential writes occurs, as in the graphed example. Why does it decrease over time anyway?

2) Can you confirm that the real Write Amplification value is calculated as [1 + (Background Program Page Count / Host Program Page Count)] as described here for the older Crucial/Micron M500 SSD? https://www.micron.com/~/media/documents/products/technical-note/solid-state-storage/tnfd23_m500_smart_attributes_calc_waf.pdf

3) Does "Power on Hours" really show what the name implies? In my testing I noticed that rather than the amount of time the SSD has been kept powered on, it seems to show the "active time" of its controller. The more intensively the SSD is used, the faster it increases, as this graph shows. Note that the SSD has never been turned off since it was purchased. At about "real-time" 172 hours I performed some intensive read tests, which caused "Power On Hours" to increase faster.

4) This is a sort of follow-up to question 2. What short-term Write Amplification Factor should one expect with these drives under normal (light Windows desktop) usage? I found it tends to vary quite a lot, peaking at very high values, as the following graph shows (calculations with the formula given in the link in question 2; values sampled every 30 minutes). The overall lifetime WAF is currently about 3.1x.

More questions to follow...!
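The post does not spell out the bytes/page calculation from question 1. One plausible reading, sketched below, is total host bytes written divided by Host Program Page Count. This assumes Total Host Sector Writes counts 512-byte sectors; the function names are invented for illustration, and the input figures are the ones quoted in the post above (9254241949 sectors, 162931734 host pages).

```python
SECTOR_BYTES = 512  # assumption: the SMART attribute counts 512-byte sectors


def host_gib_written(total_host_sectors, sector_bytes=SECTOR_BYTES):
    """Convert Total Host Sector Writes to GiB."""
    return total_host_sectors * sector_bytes / 2**30


def bytes_per_host_page(total_host_sectors, host_program_pages,
                        sector_bytes=SECTOR_BYTES):
    """Average number of host bytes written per host-programmed NAND page."""
    return total_host_sectors * sector_bytes / host_program_pages


sectors = 9_254_241_949
host_pages = 162_931_734

print(f"Host writes: {host_gib_written(sectors):.1f} GiB")
print(f"Host bytes per programmed page: {bytes_per_host_page(sectors, host_pages):.0f}")
```

Under this reading, the ratio would fall when many small writes each consume a page without filling it, and rise after large sequential writes that fill pages completely, which would match the stepwise behavior described in question 1. That interpretation is a guess and is not confirmed anywhere in the thread.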