[zfs-discuss] Low read performance on ZFS based on iSCSI disk

Chris Siebenmann cks at cs.toronto.edu
Tue Jan 12 10:58:52 EST 2016


> On the iSCSI initiator I get the following results:
> 
> 1) IO on file in ZFS (based on iSCSI device or multipath device)
> 
> write:
>  'dd if=/dev/zero bs=1M count=102400 of=/pool1/test/testfile'
>  -> 2400-6600 IOPS, 300-825 MB/s
> 
> read:
>  'dd if=/pool1/test/testfile bs=1M of=/dev/null'
>  -> 1600-2100 IOPS, 200-270 MB/s
> 
> When using 'iostat' you can see that 'avgrq-sz' ("The average size (in
> sectors) of the requests that were issued to the device.") is only 256
> (Linux default).
> This seems to cause low read performance.

 ZFS normally issues 128 KB IOs, so an average request size of 256 is
about what you'd expect (avgrq-sz is expressed in 512-byte sectors, and
256 * 512 bytes = 128 KB). This comes about because 128 KB is normally
the largest ZFS block size; sufficiently large write IOs (eg, for
sequentially written files) are chunked into 128 KB blocks and then read
back in those same chunks.

(ZFS must read back files in whole blocks because the ZFS checksum is
done over the whole block and must be verified on read.)
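
 (If you want to confirm what block size a dataset will use, 'zfs get'
will tell you; I'm assuming here that the dataset in your dd paths is
named pool1/test:

   # recordsize is the upper bound on ZFS file block size; 128K is the default
   zfs get recordsize pool1/test
 )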

 Normally ZFS readahead and perhaps IO aggregation will hide any delays
from this. If there is a performance hit here, it sort of suggests that
ZFS is not able to issue all that many concurrent IOs to the 'device'
(here the iSCSI target) at once. You may want to look into this,
although I don't know how to see eg the queue depth that the iSCSI
initiator is advertising and supports.
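
 (Assuming the Linux open-iscsi initiator, a few places I would start
looking, with no guarantee they show the whole picture:

   # average number of in-flight requests per device (avgqu-sz / aqu-sz column)
   iostat -x 1
   # negotiated parameters for each iSCSI session
   iscsiadm -m session -P 3
   # initiator-side limits on outstanding commands per session and per LUN
   grep -E 'node.session.cmds_max|node.session.queue_depth' /etc/iscsi/iscsid.conf
 )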

 ZFS's own readahead is very aggressive and very smart if it can issue
IOs at all[*]. For a sequential file read, it is basically going to
flood the devices with as much IO as it can.
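
 (If you want to see whether prefetching is actually happening, ZFS on
Linux exposes the prefetcher's counters as a kstat; the location below
is Linux-specific:

   # hit/miss counters for the file-level prefetcher
   cat /proc/spl/kstat/zfs/zfetchstats
 )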

	- cks
[*: this is kind of a pain if you want a repeatable, consistent pattern
    of 'random' IO for performance measurement. If your IO is actually
    issued in some clever pattern, like say 'six separate streams of
    130 KB reverse strides through separate areas of a data file', well,
    ZFS is smarter than you and will actually readahead/prefetch on that.
    Been there, beat my head against the issue.
]
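
 (If you need repeatable results, the blunt instrument is to turn
 file-level prefetch off entirely while you benchmark. Assuming ZFS on
 Linux, the module parameter is zfs_prefetch_disable; other platforms
 expose the same tunable differently:

   # disable file-level prefetch; set back to 0 when you're done
   echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable
 )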

