[zfs-devel] writing to slog on SSDs

Boris bprotopopov at hotmail.com
Wed Jun 24 18:32:00 EDT 2015

Hi, guys,
I am using a fairly recent clone on zfs-0.6.3 for some performance benchmarks, and I've recently observed the following on a pool with an SSD provisioned as SLOG. Trying a simple single stream synchronous write benchmark against a zvol in that pool, I found that  the throughput was limited by the write latency to the SSD, and that according to the iostat, the latency was pretty bad - between 2 and 3 msec. 
So I took the SSD out of the pool and re-ran the benchmark against the raw device (using O_SYNC|O_DIRECT). This time, the latency was 10x lower, and the throughput went up accordingly.
Experimenting further, I found that it is not the SSD that is slow, but rather the Linux block layer plugging the queue and unplugging it on a timer (which happens to fire after 3 msec by default). I also found that the direct I/O path in Linux takes care to unplug the queue explicitly right away, which is why the SSD performs well in my raw device benchmark. I used blktrace/blkparse for this analysis, and I also tried Brendan Gregg's iolatency from his Linux perf-tools collection (based on ftrace, perf, etc.) - an excellent tool for disk latency analysis.
I am running a somewhat older Linux (CentOS 6.6, kernel 2.6.32-504.23.4). I did check that /sys/block/[dev]/queue/rotational is 0 and set the scheduler to [none] for the SSDs, but that made no difference. Yet it would seem the scheduler should have understood that synchronous writes ought to be unplugged right away.
However, I also took a quick look at vdev_disk.c, and I did not find any attempt to distinguish between synchronous and asynchronous writes when setting the bio flags. And it appears that in my kernel, blk_queue_bio() will not unplug the queue right away unless BIO_RW_UNPLUG is set (which seems to be included in WRITE_SYNC but not in WRITE, as defined in include/linux/fs.h).
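To illustrate the point: in a 2.6.32-era kernel, WRITE and WRITE_SYNC differ by exactly the flags in question. The flag definitions below reflect include/linux/fs.h of that vintage; the vdev_disk.c fragment is a hypothetical sketch of where such a distinction could be made (ZIO_PRIORITY_SYNC_WRITE and the variable names are assumed), not the actual ZoL code:

```c
/* include/linux/fs.h (2.6.32-era): WRITE_SYNC carries BIO_RW_UNPLUG,
 * plain WRITE does not, so only WRITE_SYNC bios would force an
 * immediate queue unplug on submission. */
#define WRITE_SYNC_PLUG (WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
#define WRITE_SYNC      (WRITE_SYNC_PLUG | (1 << BIO_RW_UNPLUG))

/* Hypothetical sketch of a fix in vdev_disk.c's dispatch path:
 * mark sync (ZIL) writes so the block layer does not hold them
 * on the 3 msec plug timer. */
int rw = (zio->io_priority == ZIO_PRIORITY_SYNC_WRITE) ? WRITE_SYNC : WRITE;
submit_bio(rw, bio);
```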
I would really appreciate someone familiar with the code double-checking my analysis. Are writes to SLOGs really _not_ marked as synchronous when dispatched to the device queues? Otherwise, I must be confused or missing something else that is misconfigured on my system.
Perhaps someone has seen this and can give me a hint?
Does anyone else see poor single-stream synchronous write performance with SLOGs on SSDs?
Best regards,
Boris Protopopov