[zfs-devel] writing to slog on SSDs
bprotopopov at hotmail.com
Wed Jun 24 18:32:00 EDT 2015
I am using a fairly recent clone of zfs-0.6.3 for some performance benchmarks, and I've recently observed the following on a pool with an SSD provisioned as SLOG. Running a simple single-stream synchronous write benchmark against a zvol in that pool, I found that throughput was limited by the write latency to the SSD, and that according to iostat, the latency was pretty bad - between 2 and 3 msec.
So, I took the SSD out of the pool and re-ran the benchmark against the raw device (using O_SYNC|O_DIRECT). This time, the latency was about 10x lower, and the throughput went up accordingly.
Experimenting more, I found that it is not the SSD taking its time, but rather the Linux block layer plugging the queue and unplugging it on a timer (which happens to be 3 msec by default). I also found that the direct I/O path in Linux takes care to unplug the queue explicitly right away, which is why the SSD performs well in my raw-device benchmark. I used blktrace/blkparse for this analysis, and I also tried Brendan Gregg's iolatency from his Linux perf-tools collection (based on perf, ftrace, etc.) - an excellent tool for disk latency analysis.
I run a somewhat older Linux (CentOS 6.6, more specifically, kernel 504.23.4). I did check that /sys/block/[dev]/queue/rotational is 0 and set the scheduler to [none] for the SSDs, but it made no difference. Yet it would seem that the scheduler should have understood that synchronous writes should be unplugged right away.
However, I also took a quick look at vdev_disk.c, and I did not find any attempt to distinguish between synchronous and asynchronous writes when setting the bio flags. And it appears that in my kernel, blk_queue_bio() will not take care to unplug the queue right away unless BIO_RW_UNPLUG is set - and that flag is included in WRITE_SYNC but not in WRITE, as defined in include/linux/fs.h.
I would really appreciate someone familiar with the code double-checking my analysis. Are writes to SLOGs really _not_ marked as synchronous when dispatched to the device queues? Otherwise I must be confused, or missing something else that is misconfigured on my system.
Perhaps someone has seen this and might give me a hint?
Does anyone see bad single-stream synchronous write performance with SLOGs on SSDs?
Best regards,
Boris Protopopov