Disabling the Intel X-25 E write cache
In my last posting on using SSD with Oracle, I said how impressed I was with the X-25 E SSD write performance. However, at the O'Reilly MySQL conference last month, I attended a talk by Vadim Tkachenko and Morgan Tocker from Percona on "An Overview of Flash Storage for Databases". It was a great talk overall, but one important thing I learned is that the X-25 E has a volatile 64MB write cache. What this means is that the X-25 can report that a block is written to disk when it is still within a RAM buffer within the device. If the drive lost power or failed between the write to RAM and the write to flash, then the data could be lost.
We’d normally regard this data loss as an unacceptable risk, so you would think that the best thing to do would be to turn the write cache off. This can be done with the following command:
hdparm -W 0 /dev/sdb
(assuming that /dev/sdb is the flash SSD).
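It is worth checking the setting before and after changing it. The sketch below is a minimal example, assuming that `hdparm -W /dev/sdb` (with no value) reports the current state on a line such as ` write-caching =  1 (on)`. The helper name `write_cache_state` is my own invention for illustration, not part of hdparm.

```shell
# Hypothetical helper: read `hdparm -W <device>` output on stdin and print
# 1 if the write cache is reported as enabled, 0 if disabled.
# Assumes the output contains a line like " write-caching =  1 (on)".
write_cache_state() {
    sed -n 's/^[[:space:]]*write-caching[[:space:]]*=[[:space:]]*\([01]\).*/\1/p'
}

# Example session (needs root and a real device; /dev/sdb as above):
#   hdparm -W /dev/sdb | write_cache_state   # current state
#   hdparm -W 0 /dev/sdb                     # disable the write cache
#   hdparm -W /dev/sdb | write_cache_state   # confirm the change took effect
```

On some systems the setting may not survive a reboot or power cycle, so it is probably safest to re-check it after a restart rather than assume it has stuck.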
Turning off the write cache – as you’d expect – reduces the write IO capacity of the device. Below we see the effect on two identical workloads:
These workloads involve SELECT and UPDATE operations on a table which is stored on a datafile directly on the SSD. There’s no db flash cache involved in this simulation.
The datafile write rate drops substantially and the work takes longer to complete, as we expect. But why does the read IO rate drop as well? The reason is free buffer waits.
As described in this post, when an IO subsystem has a higher read bandwidth than write bandwidth, sessions may be able to add and update blocks in the buffer cache faster than the DBWR can clear them out. When this happens, free buffer waits occur as sessions wait for buffers to be cleared.
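The mechanism can be illustrated with a toy model. This is only a sketch and not how Oracle actually manages the buffer cache: the function name, the notion of a "tick", and all the numbers are assumptions made up for illustration. In each tick a stand-in for the DBWR cleans a fixed number of dirty buffers, then sessions try to dirty a fixed number of buffers. If there are not enough free buffers, the tick counts as one in which sessions had to wait.

```shell
# Toy model only; every name and number here is hypothetical.
# Usage: simulate_free_buffer_waits CACHE_SIZE DIRTY_PER_TICK CLEAN_PER_TICK TICKS
# Prints the number of ticks in which sessions needed more free buffers
# than the cache could offer.
simulate_free_buffer_waits() {
    cache=$1 dirty_rate=$2 clean_rate=$3 ticks=$4
    dirty=0 waits=0 t=0
    while [ "$t" -lt "$ticks" ]; do
        # The writer cleans up to clean_rate dirty buffers per tick.
        if [ "$dirty" -gt "$clean_rate" ]; then
            dirty=$((dirty - clean_rate))
        else
            dirty=0
        fi
        # Sessions then try to dirty dirty_rate buffers.
        free=$((cache - dirty))
        if [ "$dirty_rate" -gt "$free" ]; then
            waits=$((waits + 1))   # not enough free buffers: sessions wait
            dirty=$cache           # every buffer is now dirty
        else
            dirty=$((dirty + dirty_rate))
        fi
        t=$((t + 1))
    done
    echo "$waits"
}

simulate_free_buffer_waits 100 10 10 10   # writer keeps up: prints 0
simulate_free_buffer_waits 100 30 10 10   # writer falls behind: prints 6
```

In the second call the cache fills after a few ticks, and from then on every tick involves a wait. In this toy model, sessions that are stuck waiting for free buffers also stop issuing reads, which is consistent with the read rate falling along with the write rate.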
We can see the free buffer waits in Spotlight's event wait chart:
Disabling the write cache slows down disk performance somewhat, but it’s still a lot faster than a spinning disk. Furthermore, most workloads are not as update intensive as in my simulation, so a lot of the time you won’t hit this problem. Nevertheless, it's important to realize that the X-25 has this write cache and that it may be artificially increasing write throughput at the cost of write safety.
One word of caution: I met a guy from Percona who told me that Intel doesn’t actually support the X-25 with the write cache disabled. This is a bit disturbing, since it implies that you can choose data safety or vendor support but not both!
Also, note that the write cache can be left enabled if the SSD is only being used for the 11g R2 database flash cache. In that configuration, writes to the cache that are lost in a disk failure will cause no harm: Oracle will detect that the cache has failed and will bypass the cache completely.