MX500, very high write amplification

Kilobyte Kid


About 6 months ago I purchased a Crucial MX500 500GB for one of my PCs. I found out that under normal desktop usage its write amplification appears to be very high, far higher than that of any other SSD the same PC has had over the years under the same usage patterns.

 

As of writing, Total Host Sector Writes is 9254241949 (4412 GiB), and Average P/E Count is already at 72. This seems consistent with a Host Program Page Count of 162931734 and a Background Program Page Count of 803818734. Globally, the lifetime write amplification is approaching 6.0x (and rising). However, when calculated differentially over relatively short periods (e.g. 4 hours), it can be even higher than 100x.
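The lifetime figures above can be reproduced from the raw SMART values. A minimal Python sketch, assuming that Total Host Sector Writes counts 512-byte sectors and using the page-count formula from Micron's WAF technical note (WAF = 1 + background pages / host pages); the function names are hypothetical:

```python
SECTOR_BYTES = 512  # assumption: the raw value counts 512-byte sectors


def host_writes_gib(total_host_sector_writes: int) -> float:
    """Convert the Total Host Sector Writes raw value to GiB (2**30 bytes)."""
    return total_host_sector_writes * SECTOR_BYTES / 2**30


def lifetime_waf(host_program_pages: int, background_program_pages: int) -> float:
    """NAND pages programmed per page the host asked to program."""
    return (host_program_pages + background_program_pages) / host_program_pages


print(f"Host writes: {host_writes_gib(9254241949):.1f} GiB")
print(f"Lifetime WAF: {lifetime_waf(162931734, 803818734):.2f}x")
```

With the values above it prints about 4412.8 GiB and 5.93x. The 4412 figure is in binary units; in decimal units the same sector count is about 4738 GB.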

Even though after 6 months of use the SSD's lifetime is down only about 5%, which means it would take about 10 years to wear it out completely, I am concerned that this is not normal behavior, especially for the light desktop loads it normally handles.

 

Partition alignment has been checked and appears to be fine. The firmware has been updated to the latest version (M3CR023), but this has not improved the behavior.

 

 

[Image: write amplification graph for the past few days]

 

Here is a graph of write amplification over the past few days (blue line: calculated every 30 minutes; red line: calculated every 2 hours).
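The short-term values are differential: the same formula applied to the change in each counter between two samples. A minimal sketch, assuming samples of the Host and Background Program Page Counts taken at a fixed interval; `Snapshot`, `window_waf` and `rolling_waf` are hypothetical names, and mapping the 2-hour line to `step=4` at 30-minute sampling is an assumption about how the graph was made:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One SMART sample (hypothetical container for raw attribute values)."""
    host_pages: int        # Host Program Page Count
    background_pages: int  # Background Program Page Count


def window_waf(before: Snapshot, after: Snapshot) -> float | None:
    """WAF for the interval between two samples; None if the host wrote nothing."""
    d_host = after.host_pages - before.host_pages
    d_background = after.background_pages - before.background_pages
    if d_host <= 0:
        return None  # the ratio is undefined without host writes
    return 1 + d_background / d_host


def rolling_waf(samples: list[Snapshot], step: int) -> list[float | None]:
    """WAF over windows spanning `step` sampling intervals (step=4 -> 2 h at 30-min sampling)."""
    return [window_waf(samples[i - step], samples[i]) for i in range(step, len(samples))]
```

When the host writes almost nothing during a window, the denominator is tiny, so even a modest amount of background programming produces values above 100x. For example, 1,000 host pages against 120,000 background pages (illustrative numbers) give 121x.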

 

Crucial Employee

Re: MX500, very high write amplification

Very interesting.

What sort of background processes do you have running? Any P2P programs, or game clients that use anti-piracy features like Denuvo? Almost 5 TB of erased data over a six-month period is pretty high for "light desktop" use. Try opening Windows Resource Monitor and see whether any errant programs are constantly writing to the drive.

You aren't using any form of software encryption on the drive, are you?





Crucial_Benny, Micron CPG Support, US


How do I know what memory to buy?
Shop for your region: US | UK | EU | France |
I think my memory is bad. What do I do now?
FAQs and Top Forum Solutions
Did a user help you? Say thanks by giving Kudos!
Still need help? Contact Customer Service
Want to be a Super User?
Kilobyte Kid

Re: MX500, very high write amplification

Thanks for answering.

 

The computer is always on, so 5 TB in 6 months seems reasonable. I see a similar rate with comparable usage on other machines with different SSDs and even a different operating system (Linux).

 

The Windows 10 PC with a Crucial MX500 500GB has a P2P program enabled, but it's running on a hard disk drive. Besides, no such high write amplification values were recorded with a different 250GB SSD from another manufacturer that was used there before installing this new one.

 

From past observations, the extremely high write amplification values appear to fluctuate in an almost cyclical manner, with a period of 5-7 days, and do not seem to correlate with the use of any particular program. This is a graph over a period of weeks from last year:

 

[Image: write amplification graph over a period of weeks]
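One way to check the apparent 5-7 day cycle numerically, rather than by eye, is an autocorrelation scan over the write amplification series. A minimal Python sketch, assuming an evenly sampled series (for example one value every 30 minutes, so 48 samples per day) with gaps already removed or filled; the function names are hypothetical:

```python
import statistics


def autocorrelation(series: list[float], lag: int) -> float:
    """Sample autocorrelation of `series` at the given lag (biased estimator)."""
    mean = statistics.fmean(series)
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(len(series) - lag)
    )
    return covariance / variance


def dominant_period_days(
    series: list[float], samples_per_day: int, min_days: float = 1, max_days: float = 10
) -> float:
    """Return the lag (in days) with the highest autocorrelation in [min_days, max_days]."""
    first = int(min_days * samples_per_day)
    last = min(int(max_days * samples_per_day), len(series) - 2)
    best_lag = max(range(first, last + 1), key=lambda lag: autocorrelation(series, lag))
    return best_lag / samples_per_day
```

If the series comes from a windowed WAF calculation, windows without host writes (where WAF is undefined) have to be dropped or interpolated first, which slightly bends the even-sampling assumption.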

 

In the past few days I found that having a program continuously read data from the SSD appears to prevent this high write amplification phenomenon. I'm doing this with a benchmark program called FIO ("Flexible I/O Tester").

 

In the graph below, FIO was started around the middle of 2019-03-12, stopped for a few hours, then started again. In the past few hours it was stopped again, which (indirectly) caused a write amplification spike. After it was restarted, write amplification dropped again to about 1.0-1.1x (as I would normally expect with light usage).

 

[Image: write amplification graph with FIO reads enabled and disabled, from 2019-03-12]

 

Here is a portion of a spreadsheet of the data showing the behavior during the latest write amplification spike in more detail. Notably, Power On Hours increase slowly when the SSD is not actively serving user reads/writes.

 

[Image: spreadsheet excerpt covering the latest write amplification spike]

 

Here is the latest status from SMART parameters. Since last time, Block Wear-Leveling count has already increased from 72 to 77.

 

[Image: latest SMART attribute values]

 

Drive encryption is not enabled.

 

The fact that the Write Amplification Factor drops to about 1.0x simply by keeping the drive busy (even with reads only) and preventing it from idling makes me suspect that this is due either to internal firmware operations (e.g. garbage collection or some other internal "maintenance work") or to something specific to Windows rather than to user program activity, which would make it more difficult to track down.

 

This is the FIO configuration file I'm using to make the SSD perform reads continuously at a low rate, which apparently keeps write amplification from increasing while it's active. Relevant links are in the configuration comments.

 

; FIO https://github.com/axboe/fio
;
; FIO Windows builds: https://bluestop.org/fio/
; FIO Parameters: https://fio.readthedocs.io/en/latest/fio_doc.html

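[global]
; testfile.out is created in the current working directory, which must be on the SSD
filename=testfile.out
filesize=32m
; bypass the OS cache so that the reads actually reach the SSD
direct=1
; limit throughput to 10 MB/s
rate=10m

[Continuous Reads]
rw=read
bsrange=64k-512k
; keep re-reading the file until the runtime expires
time_based=1
runtime=14d
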
Kilobyte Kid

Re: MX500, very high write amplification

This other graph should make it clearer that the high write amplification appears to be more or less independent of the amount of host writes (right Y axis, logarithmic scale) performed during each 30-minute sample period; at least, no clear correlation is visible.

 

[Image: write amplification vs. host writes per 30-minute sample (host writes on logarithmic right Y axis)]

If anything, there's a vague hint of anti-correlation (write amplification is high when writes are low). However, in the past couple of days this trend might have been broken by the continuous reads performed by the benchmark program, as described previously.

Crucial Employee

Re: MX500, very high write amplification

With 960 power-on hours you're still writing around 120GB of data every 24 hours. That's a very high amount of erases, certainly not in the realm of regular "light" desktop usage. 

How full is your drive with data?

If you have a drive that's mostly full, this will really limit wear leveling's ability to do its job. Couple that with the high amount of erases you're doing, and that could explain the abnormal WAF numbers.





Crucial_Benny, Micron CPG Support, US


Kilobyte Kid

Re: MX500, very high write amplification


@Crucial_Benny wrote:

With 960 power-on hours you're still writing around 120GB of data every 24 hours. That's a very high amount of erases, certainly not in the realm of regular "light" desktop usage. 

A possibly related issue is that Power On Hours, as displayed by Crucial Storage Executive or by any other SMART analysis program, do not seem to correspond to actual power-on time. Given that the computer where the SSD is installed has almost never been turned off, I can produce a graph showing this relationship.

 

From the graph it's also apparent that Power On Hours as reported by the SSD appear to increase faster when SSD activity is higher. As I mentioned previously, in the past few days I have been running a benchmark program that performs read activity continuously.

 

[Image: SSD-reported Power On Hours vs. real elapsed time]

 

The SSD was installed on 2018-09-05 (and again, it has almost never been turned off), for a "true" power-on time of roughly 188 days, or 4512 hours. Current host writes are about 4620 GiB, which works out to 4620/188 = 24.6 GiB/day.
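The two daily-write figures in this exchange differ mainly in the time base. A minimal sketch plugging in the numbers quoted in the thread (4620 GiB of host writes, the 960 reported Power On Hours from the reply above, and the estimated 4512 wall-clock hours), treating them as if they were read at the same moment, which is an approximation; the function name is hypothetical:

```python
def gib_per_day(host_gib: float, power_on_hours: float) -> float:
    """Average host writes per 24 hours of the given time base."""
    return host_gib / (power_on_hours / 24)


HOST_GIB = 4620          # host writes quoted above
REPORTED_POH = 960       # Power On Hours figure from Crucial's reply
WALL_CLOCK_HOURS = 4512  # ~188 days since installation, drive almost never off

print(f"Using reported POH: {gib_per_day(HOST_GIB, REPORTED_POH):.1f} GiB/day")
print(f"Using wall clock:   {gib_per_day(HOST_GIB, WALL_CLOCK_HOURS):.1f} GiB/day")
print(f"POH / wall clock:   {REPORTED_POH / WALL_CLOCK_HOURS:.0%}")
```

It prints 115.5 GiB/day, 24.6 GiB/day and 21%: the roughly fivefold gap between the two rates comes entirely from the reported Power On Hours covering about a fifth of the real elapsed time.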

 

How full is your drive with data?

If you have a drive that's mostly full, this will really limit wear leveling's ability to do its job. Couple that with the high amount of erases you're doing, and that could explain the abnormal WAF numbers.

 

The drive is about 50% full. Just to make sure, about two weeks ago I tried increasing over-provisioning space by 18 GB (even though this should not be necessary with TRIM enabled and ample free space available), but this did not seem to have any effect.

 

The SSD is running on Windows 10's built-in AHCI driver, and TRIM is supposed to be working.

 

 

EDIT: For what it's worth, here's an update on the previously reported data. While the benchmark program was performing reads, write amplification remained very low during normal usage over the past few hours. Typical user activity involves high-definition web streaming, browsing and downloads. Most of the writes (columns J and K could be of interest) are caused by web browser activity, and these do not seem to cause high write amplification on their own (column F):

 

[Image: spreadsheet of recent samples; column F write amplification, columns J and K writes]

JEDEC Jedi

Re: MX500, very high write amplification

The computer is always on, so 5 TB in 6 months seems reasonable

5TB on a 500GB drive in six months is a lot.  That is about 28GB/day which is a lot for a drive with just 500GB of storage.   With a drive 50% full you are effectively writing to the whole other half in one week.  Keep in mind these are consumer grade drives using TLC NAND.

 

The Windows 10 PC with a Crucial MX500 500GB has a P2P program enabled, but it's running on a hard disk drive. 

Even if your P2P program is using a hard drive,  it might still be using the SSD as a temporary staging area.  It will likely hold its tracking information on the SSD even if all of the shared files (and pieces of files) are completely stored on the hard drive.

 

Besides, no such high write amplification values were recorded with a different 250GB SSD from another manufacturer that was used there before installing this new one.

Each SSD model is different and can use different types of NAND. There are even different manufacturing processes for TLC NAND. You would need to compare two identical drives.

 

In the past few days I found that having a program continuously read data from the SSD appears to prevent this high write amplification phenomenon.

Perhaps this activity is preventing TRIM and the Garbage Collection from doing anything.  It may also be interrupting whatever is causing all of the writes.  Are short term assessments even that valid or accurate?  I didn't check the math as I'm too tired, but are the calculations correct?

 

Have you tried disabling the P2P software to see if the issue disappears?  Have you run a virus scan on your system (preferably by booting a USB AV Rescue disk)?

 

You may want to consider backing up your system and/or cloning it to an image file on your hard drive using the Crucial Acronis software or any other software you like.  You can create a bootable USB Acronis drive to do the transfer.  I would then perform a Secure Erase on the SSD to reset it to factory defaults and restore its performance.  Then install a clean copy of Win10 and see how the drive behaves (minus the P2P software at first).   You can always restore the system to the way it is now using the Acronis USB boot drive if the behavior is the same.

 

 

 

Kilobyte Kid

Re: MX500, very high write amplification

5TB on a 500GB drive in six months is a lot. That is about 28GB/day which is a lot for a drive with just 500GB of storage. With a drive 50% full you are effectively writing to the whole other half in one week. Keep in mind these are consumer grade drives using TLC NAND

According to the Crucial MX500 product flier, the 500GB model should be capable of at least 180TBW of endurance, which is equivalent to about 98GB/day for 5 years. 28 GB/day would be a little over 1/4 of what it could sustain over its warranty period.

 

[Image: Crucial MX500 product flier endurance specifications]
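The comparison between rated and observed daily writes can be checked quickly. A minimal sketch, assuming decimal units (1 TB = 1000 GB), 365-day years and "6 months" taken as half a year; the function name is hypothetical:

```python
def gb_per_day(total_gb: float, days: float) -> float:
    """Average GB written per day over the given period."""
    return total_gb / days


RATED_TBW = 180      # endurance rating for the 500GB model, per the flier
WARRANTY_YEARS = 5

rated = gb_per_day(RATED_TBW * 1000, WARRANTY_YEARS * 365)
observed = gb_per_day(5 * 1000, 365 / 2)  # ~5 TB of host writes in ~6 months

print(f"rated {rated:.1f} GB/day, observed {observed:.1f} GB/day "
      f"({observed / rated:.0%} of the rating)")
```

It prints about 98.6 GB/day rated against 27.4 GB/day observed, or 28% of the rating. This compares host writes only; it says nothing about how much of the NAND endurance those writes consume once write amplification is included.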

 

Since my SSD already shows 5% wear after just 4.4 TiB (4495 GiB) of host writes, 100% wear would be reached at only about 88 TBW. This seems too fast for the kind of usage it gets, for the stated endurance, and compared with what I have experienced with several other SSDs installed in this and other PCs over the past years.

 

The wear indicator on this Crucial MX500 SSD appears to be based on an endurance of 1500 P/E cycles (5% reached at 75 cycles).
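Both wear figures follow from simple proportions. A minimal sketch, assuming the 1500-cycle rating inferred above (not a published specification) and a linear projection in which write amplification stays as it has been; the function names are hypothetical:

```python
RATED_PE_CYCLES = 1500  # inferred from "5% at 75 cycles"; an assumption, not a spec


def wear_percent(avg_pe_cycles: float) -> float:
    """Wear indicator as a percentage of the assumed P/E cycle rating."""
    return 100 * avg_pe_cycles / RATED_PE_CYCLES


def tbw_at_full_wear(host_tb_written: float, wear_pct: float) -> float:
    """Linear projection of host TB written when the wear indicator reaches 100%."""
    return host_tb_written * 100 / wear_pct


print(f"{wear_percent(75):.1f}% wear at 75 cycles")
print(f"{tbw_at_full_wear(4.4, 5):.1f} TB projected at 100% wear")
```

It prints 5.0% and 88.0 TB. If write amplification rises or falls from here, the real figure at 100% wear would move with it.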

 

Even if your P2P program is using a hard drive, it might still be using the SSD as a temporary staging area. It will likely hold its tracking information on the SSD even if all of the shared files (and pieces of files) are completely stored on the hard drive.

I am reasonably confident that it isn't the P2P program because it hasn't been actively downloading anything and has been operating at a low level just by "seeding" (uploading data).

 

[Image: I/O write statistics for the P2P program]

I'm using Tixati as the P2P program. The roughly 3,000,000,000 bytes written could be attributed to caching, state management and staging for its background activity, but with about 18,000 write operations that works out to about 166,000 bytes per write on average, which I think is unlikely to cause high write amplification.
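The average-size argument is a single division. A minimal sketch using the rounded figures quoted above (the function name is hypothetical); keep in mind that a mean hides the size distribution, so a few large writes mixed with many small ones would produce the same average:

```python
def mean_bytes_per_write(bytes_written: int, write_ops: int) -> float:
    """Average size of a write operation."""
    return bytes_written / write_ops


avg = mean_bytes_per_write(3_000_000_000, 18_000)
print(f"{avg:,.0f} bytes/write (~{avg / 1024:.0f} KiB)")
```

It prints 166,667 bytes/write (about 163 KiB).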

 

Each SSD model is different and can use different types of NAND. There are even different manufacturing processes for TLC NAND. You would need to compare two identical drives.

There are certainly bound to be differences. Normal activity on the MLC-type SSD I'm using right now on this PC (on Linux) gives about 3.5-4.0x on average, while on other TLC-type SSDs from a different brand it tends to hover around 1.5x. On the other hand, I have never before seen mid-term values (calculated over an entire P/E cycle) on the order of 8-10x or more.

 

Perhaps this activity is preventing TRIM and the Garbage Collection from doing anything. It may also be interrupting whatever is causing all of the writes. Are short term assessments even that valid or accurate? I didn't check the math as I'm too tired, but are the calculations correct?


Unusually high wear caused by excessive write amplification can be observed through several metrics. Below are approximate calculations that disregard the slight difference between GB and GiB (or TB and TiB).

 

1)
5% wear with 4.4 TBW = 100% wear with 4.4/5*100 = 88 TBW

(write endurance is supposed to be at least 180TBW for the 500GB model)

 

2)

Current average P/E count = 77 cycles.

77 cycles * 500 GB = 38500 GB written to the NAND

Current Host Writes = 4626 GiB

Global write amplification = 38500 / 4626 = 8.3x

 

3)

From this Micron document on how to calculate the write amplification factor with the provided attributes: https://www.micron.com/-/media/client/global/documents/products/technical-note/solid-state-storage/t...

 

[Image: excerpt from the Micron technical note showing the WAF formula]

 

With my SSD, at the moment:

Host Program Page count = 170,707,030

Background Program Page count = 863,145,684

WAF (global) = 1+(863,145,684 / 170,707,030) = 6.06x

 

(This differs from the calculation based on P/E cycles. Shorter-term calculations can give much higher values.)

 

4)

The WAF could also be calculated differentially over the span of an entire P/E cycle. I haven't tracked wear continuously over the past few months, but the data I do have indicates that the calculation still seems broadly consistent.

 

[Image: wear and write data used for the per-P/E-cycle WAF calculation]
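The estimates in items 2 and 3 can be reproduced side by side. A minimal sketch, taking the nominal 500 GB capacity as the amount written per P/E cycle and mixing GB with GiB exactly as the calculations above do (both simplifications); the function names are hypothetical:

```python
def waf_from_pe(avg_pe_cycles: float, capacity_gb: float, host_writes_gb: float) -> float:
    """NAND writes estimated as cycles x capacity, divided by host writes."""
    return avg_pe_cycles * capacity_gb / host_writes_gb


def waf_from_pages(host_pages: int, background_pages: int) -> float:
    """WAF from the Host and Background Program Page Counts."""
    return 1 + background_pages / host_pages


pe_based = waf_from_pe(77, 500, 4626)
page_based = waf_from_pages(170_707_030, 863_145_684)
print(f"P/E-based:  {pe_based:.2f}x")
print(f"page-based: {page_based:.2f}x")
```

It prints 8.32x and 6.06x. The P/E-based figure treats every cycle as a full pass over 500 GB; since the physical NAND capacity and the way the firmware averages erase counts are not known here, this arithmetic alone cannot say which estimate is closer to the truth.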

 

 

Have you tried disabling the P2P software to see if the issue disappears? Have you run a virus scan on your system (preferably by booting a USB AV Rescue disk)?

I could try in the next few days. However, given its low activity and lack of intensive writes to the SSD, I don't think it's the problem. And even if it were, no such problem (in short, fast wear due to excessive NAND writes) was observed with the lower-capacity SSDs previously installed in that same PC over the past years. This is one of the reasons why I believe this issue is related to the SSD.

 

I do not have any virus scanning software other than Windows' built-in one.

 

You may want to consider backing up your system and/or cloning it to an image file on your hard drive using the Crucial Acronis software or any other software you like. You can create a bootable USB Acronis drive to do the transfer. I would then perform a Secure Erase on the SSD to reset it to factory defaults and restore its performance. Then install a clean copy of Win10 and see how the drive behaves (minus the P2P software at first). You can always restore the system to the way it is now using the Acronis USB boot drive if the behavior is the same.

This would be extremely time-consuming, and I hope the culprit can be identified before going through it. Furthermore, the SSD is not installed in a computer I use personally, which would make a clean installation complicated. I can consider secure erasing the SSD at some point if really necessary, as it would require much less time and cooperation from the family member who uses that PC.

JEDEC Jedi

Re: MX500, very high write amplification

Based on the information you have provided, I would probably try 3 things:

 

1/ As suggested, try disabling this P2P software. That shouldn't be a problem, I guess. It's no secret that programs are sometimes written carelessly and may have bugs. A good example is Spotify, which quietly wore out users' SSDs for months by writing massive amounts of data over and over, even when the service was idle.

 

Turning off such kind of software is worth giving a try.

 

2/ Your workaround with the FIO script and the information about power-on hours make me think of a power-saving mode the drive may have been put into, perhaps DIPM or DevSleep. Which Windows power plan is in use? Did you try switching it to 'High performance'?

 

3/ While Windows' built-in AHCI driver may work better on some systems, in my experience proprietary AHCI drivers (particularly Intel's) work better and more reliably on most systems. One driver for all is not necessarily the best for all. I would check whether an Intel driver is available for this PC.

Kilobyte Kid

Re: MX500, very high write amplification

1) Later today I will try disabling it for a few days (the user of that PC seems adamant about keeping it "seeding"/sharing data) and observe any difference.

 

2) I thought of that too. So far the "High performance" power plan has been in use. PCI Express power management in the Advanced power options has been set to "disabled", and storage devices have been set to never sleep (0 minutes = "never"). I checked the BIOS for an "Aggressive Link Power Management" (ALPM) setting, which is sometimes known to cause issues and added latency, but I couldn't find any related option. I found that the SATA ports' "Hot plug" option was enabled, but disabling it didn't seem to bring any positive change.

 

3) For what it's worth, the motherboard is relatively dated: an MSI H77MA-G43 based on the Intel H77 chipset, manufactured in 2012, with the latest BIOS version installed (from 2013). Intel does not provide Windows 10-supported AHCI drivers for this chipset.

 

I haven't checked in detail (although I do recall that recent versions definitely won't install), but the latest supported driver for this chipset should be Intel RST 12.9 from 2013 (without official Win10 support):

https://downloadcenter.intel.com/download/23496/Intel-Rapid-Storage-Technology-Intel-RST-User-Interf...

 

Anyway, I will check later this afternoon whether this or newer versions can be installed.

 

* * *

 

As a possible further data point for the Crucial firmware team: in the past few hours, write amplification appears to have increased again (although not as much as on previous occasions) while the benchmark program was still running and performing reads, and Power On Hours increased as expected (1 power-on hour = 1 real-time hour). As this was unusual and departed from previous observations, I looked into it in more detail and found that it seemed to be related to a temporary increase in "Pending ECC Count".

 

[Image: SMART data showing a temporary increase in Pending ECC Count]

 

I checked prior data, and there seems to be a very loose correlation between high write amplification periods and the appearance of pending ECC counts. However, since this count resets to zero after a while (likely once the pending ECC events have been processed) and I only sample SMART attributes every 30 minutes, I cannot tell whether there were more of them in the past.

 

[Image: prior data comparing high write amplification periods with Pending ECC counts]

 

This could be a completely expected behavior (perhaps due to the phenomenon known as "Read Disturb"), but it might also potentially indicate the nature of this issue.
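With 30-minute sampling, a Pending ECC Count that rises and returns to zero between samples is never seen. A minimal logging sketch that samples more often, assuming smartmontools 7.0 or later (for `smartctl -j` JSON output), administrator privileges, and the Crucial/Micron attribute IDs listed in the code, which should be verified against the drive's own SMART table; the function names and field labels are hypothetical:

```python
from __future__ import annotations

import json
import subprocess
import time

# Assumed attribute IDs for the MX500 (verify with `smartctl -A` on the actual drive).
ATTRIBUTES = {
    173: "average_block_erase_count",
    197: "pending_ecc_count",
    246: "total_host_sector_writes",
    247: "host_program_page_count",
    248: "background_program_page_count",
}


def parse_attributes(smartctl_json: str) -> dict[str, int]:
    """Extract the raw values of the attributes above from `smartctl -A -j` output."""
    table = json.loads(smartctl_json)["ata_smart_attributes"]["table"]
    return {
        ATTRIBUTES[row["id"]]: row["raw"]["value"]
        for row in table
        if row["id"] in ATTRIBUTES
    }


def sample(device: str) -> dict[str, int]:
    """Run smartctl once; its exit status is a bit mask, so it is not treated as an error."""
    result = subprocess.run(
        ["smartctl", "-A", "-j", device], capture_output=True, text=True
    )
    return parse_attributes(result.stdout)


def log_forever(device: str, interval_s: float = 60) -> None:
    """Print one timestamped sample per interval until interrupted."""
    while True:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), sample(device), flush=True)
        time.sleep(interval_s)
```

For example, `log_forever("/dev/sda", 60)` would log once a minute; smartctl uses the same /dev/sdX style names on Windows. Redirecting the output to a file gives a series that the same WAF calculations can be run on at a finer time resolution.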