There is a lot of good info in this thread already, but I'd like to draw your attention to prefetching...

> On Mar 29, 2018, at 5:43 AM, Alex Vodeyko via zfs-discuss <zfs-discuss at list.zfsonlinux.org> wrote:
> I use "recordsize=1M" because we have big files and sequential I/O.
> 1) iozone (write = 2.7GB/s, read = 1GB/s)
>   "arcstat" during reads:
> # arcstat.py 5 (100% pm and 50+% miss)
>    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
> 14:13:17  1.9K  1.0K     55     1    0  1.0K  100     2   42    62G   62G
> 14:13:22  2.1K  1.1K     54     0    0  1.1K  100     0   13    63G   62G
> 14:13:27  1.8K   980     54     1    0   979  100     2   42    62G   62G
> 14:13:32  1.6K   880     55     1    0   879  100     2   40    62G   62G

"pmis" is the number of prefetch misses, covering both data and metadata.
"pm%" is the prefetch miss ratio: prefetch misses as a percentage of prefetch accesses.

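As a quick sanity check on the sample above, the share of all ARC misses that
come from the prefetcher can be computed straight from an arcstat row. This is
only a sketch: the helper names are mine, and it assumes that the K/M/G suffixes
arcstat prints on counters are powers of 1000.

```python
# Multipliers for arcstat's abbreviated counters (assumed to be base 1000).
SUFFIXES = {"K": 1e3, "M": 1e6, "G": 1e9}


def parse_count(text):
    """Convert an arcstat counter such as '1.8K' or '980' to a float."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def prefetch_share_of_misses(header, row):
    """Percentage of all misses in one arcstat row that are prefetch misses."""
    fields = dict(zip(header.split(), row.split()))
    miss = parse_count(fields["miss"])
    pmis = parse_count(fields["pmis"])
    return 100.0 * pmis / miss if miss else 0.0


header = "time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c"
row = "14:13:27  1.8K   980     54     1    0   979  100     2   42    62G   62G"
print(f"{prefetch_share_of_misses(header, row):.1f}")
```

For the 14:13:27 row this prints 99.9, so nearly every miss in that interval is
a prefetch miss rather than a demand or metadata miss.
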
First, check that prefetching is enabled (it is by default): zfs_prefetch_disable = 0
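
One way to check the value is to read the module parameter directly. The sketch
below assumes the usual ZFS on Linux location, /sys/module/zfs/parameters, and
the function names are made up for illustration.

```python
from pathlib import Path

# Where ZFS on Linux normally exposes its module parameters (assumed location).
PARAM_DIR = Path("/sys/module/zfs/parameters")


def read_tunable(name, param_dir=PARAM_DIR):
    """Return a module parameter as an int, or None if it cannot be read."""
    try:
        return int((Path(param_dir) / name).read_text().strip())
    except (OSError, ValueError):
        return None


def prefetch_enabled(param_dir=PARAM_DIR):
    """True if zfs_prefetch_disable is 0, False if nonzero, None if unknown."""
    value = read_tunable("zfs_prefetch_disable", param_dir)
    return None if value is None else value == 0
```

Calling prefetch_enabled() on the server should return True if prefetching is
on. The same read_tunable() helper can be pointed at the other tunables named
in this message.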

For a sequential read operation, we expect the prefetcher to be prefetching,
and thus do not expect pm%=100%.

The zfetch_array_rd_sz tunable parameter limits the size of the blocks that are
prefetched. Basically, if a block is larger than zfetch_array_rd_sz, then it is not
prefetched. However, the default is zfetch_array_rd_sz = 1,048,576, so it should be
fine with your recordsize=1M. Be sure to check its value.
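
The rule is simple enough to write down. This is just the comparison described
above, not the actual kernel code, and the function name is my own.

```python
DEFAULT_ZFETCH_ARRAY_RD_SZ = 1_048_576  # default quoted above, in bytes


def would_prefetch(block_size, zfetch_array_rd_sz=DEFAULT_ZFETCH_ARRAY_RD_SZ):
    """Blocks larger than the limit are skipped; anything up to it is eligible."""
    return block_size <= zfetch_array_rd_sz


one_mib = 1024 * 1024
print(would_prefetch(one_mib))      # a 1 MiB block against the default limit
print(would_prefetch(2 * one_mib))  # a larger block against the same limit
```

A 1 MiB block is exactly at the default limit, so it is still eligible. If the
tunable has been lowered on your system, 1 MiB blocks would be skipped.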

A related tunable is zfetch_max_distance (default = 8MiB), the maximum number of
bytes to prefetch per stream. This might be too small for recordsize=1M.
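
To put a number on that, here is the back-of-the-envelope arithmetic. It treats
the limit as a simple window divided by the block size, which is a
simplification of what the prefetcher really does, and the 128 KiB figure is
the stock default recordsize, included only for comparison.

```python
MIB = 1024 * 1024


def blocks_ahead(zfetch_max_distance, block_size):
    """Whole blocks that fit inside the per-stream prefetch window."""
    return zfetch_max_distance // block_size


print(blocks_ahead(8 * MIB, 1 * MIB))     # 1 MiB blocks
print(blocks_ahead(8 * MIB, 128 * 1024))  # 128 KiB blocks, for comparison
```

With 1 MiB blocks the 8 MiB window covers only 8 blocks per stream, versus 64
blocks at 128 KiB.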

To help visualize the zfetch activity, I usually do the data collection with Prometheus' 
node_exporter or influxdb's telegraf. But if you are a CLI fan, then I pushed a Linux
version of zfetchstat to
https://github.com/richardelling/zfs-linux-tools

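
If you only want a quick look without installing anything, the raw counters can
be read by hand. This sketch is not the zfetchstat tool itself. It assumes the
counters live in /proc/spl/kstat/zfs/zfetchstats, that the file uses the usual
"name type data" kstat layout, and that it has "hits" and "misses" rows. Check
those assumptions against your ZFS version.

```python
ZFETCHSTATS = "/proc/spl/kstat/zfs/zfetchstats"  # assumed kstat location


def parse_kstat(text):
    """Parse the 'name type data' rows of a kstat file into {name: int}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields ending in a number; this skips
        # the numeric preamble line and the column header.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def zfetch_hit_pct(stats):
    """Prefetch stream hit percentage from the cumulative counters."""
    hits = stats.get("hits", 0)
    misses = stats.get("misses", 0)
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


# Usage on a live system:
#   with open(ZFETCHSTATS) as f:
#       print(zfetch_hit_pct(parse_kstat(f.read())))
```

The counters are cumulative, so for a rate during a test run, sample before and
after and work from the differences.
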
> "top" shows only 8 "z_rd_int" processes during reads (and only one
> "z_rd_int" running), while there were 32 running z_wr_iss processes
> during writes.

This could be another clue about prefetching not being enabled or not working as
desired. However, in my experience, it is better to observe the detailed back-end I/O
distribution and classification with "zpool iostat -q" where prefetches are often, but
not always, in the asyncq_read category.
 -- richard

