[zfs-discuss] Slow read performance

Alex Vodeyko alex.vodeyko at gmail.com
Sun Apr 1 14:57:37 EDT 2018


Richard, thanks a lot for the info and for your "zfetchstat" tool!

I've checked the zfs module parameters - they are all at their defaults.
I'm now working with a zpool of two raidz3 (12+3) vdevs (30 drives total),
ashift=9 (read results were still not good with ashift=12).
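
For reference, the parameter defaults and the vdev ashift can be checked
roughly like this (a sketch; the sysfs path assumes the ZFS-on-Linux module
parameters under /sys/module/zfs/parameters, and "z30" is the pool name from
the output below):

# grep . /sys/module/zfs/parameters/zfetch_* /sys/module/zfs/parameters/zfs_prefetch_disable
# zdb -C z30 | grep ashift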

"zfetch_max_distance" seems to be indeed very important:
though I was not able to improve "dd" read performance, I found some
interesting results with "fio":
- I run "fio" with 4 readers and 4 writers (1M block), and

1) with the default "zfetch_max_distance" = 8 MiB, the read/write bandwidth ratio was about 1:3:

# zpool iostat -q 5
              capacity     operations     bandwidth    syncq_read    syncq_write   asyncq_read  asyncq_write   scrubq_read
pool        alloc   free   read  write   read  write   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
z30         1.81T   270T  7.31K  21.6K   621M  1.71G      0      0      0      0      0     13    434     92      0      0
z30         1.81T   270T  7.25K  19.9K   616M  1.47G      0      0      0      0      0     17    377     84      0      0
z30         1.81T   270T  7.72K  21.2K   656M  1.58G      0      0      0      0      0      8    181     95      0      0

# ./zfetchstat 5
                      time    hits/sec   misses/sec     hit%  max_streams/sec
2018-04-01T20:57:13.355509     1854.34        66.13    96.56            64.54
2018-04-01T20:57:18.360642     1968.63       150.85    92.88           150.05
2018-04-01T20:57:23.364488     1984.01         3.20    99.84             2.40
2018-04-01T20:57:28.369600     2094.30       123.48    94.43           122.68
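
Between runs I bumped the prefetch distance at runtime, roughly like this (a
sketch; the path assumes the ZFS-on-Linux module parameters under
/sys/module/zfs/parameters and that the parameter is writable at runtime -
the same pattern was used for the 32 MiB run):

# echo $((16 * 1024 * 1024)) > /sys/module/zfs/parameters/zfetch_max_distance
# cat /sys/module/zfs/parameters/zfetch_max_distance
16777216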

2) with "zfetch_max_distance" = 16 MiB, the read/write bandwidth ratio was about 1:2:

# zpool iostat -q 5
              capacity     operations     bandwidth    syncq_read    syncq_write   asyncq_read  asyncq_write   scrubq_read
pool        alloc   free   read  write   read  write   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
z30         1.81T   270T  10.6K  19.1K   898M  1.50G      0      0      0      0      2     12    114     41      0      0
z30         1.81T   270T  11.7K  19.9K   999M  1.45G      0      0      0      0      0     39    407     86      0      0
z30         1.81T   270T  10.3K  19.5K   877M  1.42G      0      0      0      0     13     88      0      0      0      0
z30         1.81T   270T  10.8K  19.5K   917M  1.52G      0      0      0      0      2     28    269     53      0      0

# ./zfetchstat 5
                      time    hits/sec   misses/sec     hit%  max_streams/sec
2018-04-01T20:50:35.023493     2175.17         3.60    99.83             2.80
2018-04-01T20:50:40.028728     2166.35        38.96    98.23            37.36
2018-04-01T20:50:45.033515     2330.16        34.57    98.54            33.77

3) with "zfetch_max_distance" = 32 MiB, the read/write bandwidth ratio was about 1:1:

# zpool iostat -q 5
              capacity     operations     bandwidth    syncq_read    syncq_write   asyncq_read  asyncq_write   scrubq_read
pool        alloc   free   read  write   read  write   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
z30         1.81T   270T  14.2K  14.6K  1.18G  1.00G      0      0      0      0     14     36     70     95      0      0
z30         1.81T   270T  14.3K  16.9K  1.19G  1.30G      0      0      0      0     27     78    178     46      0      0
z30         1.81T   270T  15.5K  16.4K  1.29G  1.26G      0      0      0      0      7     24    438     60      0      0
z30         1.81T   270T  16.1K  15.8K  1.34G  1.20G      0      0      0      0     45     74    233     68      0      0

# ./zfetchstat 5
                      time    hits/sec   misses/sec     hit%  max_streams/sec
2018-04-01T20:54:53.798833      384.28       104.85    78.56           104.62
2018-04-01T20:54:58.799497     2198.31        38.99    98.26            37.39
2018-04-01T20:55:03.804498     2150.45        33.97    98.45            33.17
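
If the larger prefetch distance keeps helping, I will probably make it
persistent via modprobe options, roughly like this (a sketch; the file name
/etc/modprobe.d/zfs.conf is just a common convention):

# cat /etc/modprobe.d/zfs.conf
options zfs zfetch_max_distance=33554432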

I will continue testing and keep you updated.

Thanks,
Alex




2018-04-01 20:05 GMT+03:00 Richard Elling <richard.elling at richardelling.com>:
> There is a lot of good info in this thread already, but I'd like to draw
> your attention to prefetching...
>
> On Mar 29, 2018, at 5:43 AM, Alex Vodeyko via zfs-discuss
> <zfs-discuss at list.zfsonlinux.org> wrote:
>
> ...
>
> I use "recordsize=1M" because we have big files and sequential I/O.
>
> ...
>
> 1) iozone (write = 2.7GB/s, read = 1GB/s)
>
> ...
>
>   "arcstat" during reads:
> # arcstat.py 5 (100% pm and 50+% miss)
>    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
> 14:13:17  1.9K  1.0K     55     1    0  1.0K  100     2   42    62G   62G
> 14:13:22  2.1K  1.1K     54     0    0  1.1K  100     0   13    63G   62G
> 14:13:27  1.8K   980     54     1    0   979  100     2   42    62G   62G
> 14:13:32  1.6K   880     55     1    0   879  100     2   40    62G   62G
>
>
> "pmis" is the number of prefetch misses: both data and metadata
> "pm%" is the prefetch miss ratio to the number of total accesses.
>
> First, check that prefetching is enabled (it is by default):
> zfs_prefetch_disable = 0
>
> For a sequential read operation, we expect the prefetcher to be
> prefetching, and thus we do not expect pm% = 100%.
>
> The zfetch_array_rd_sz tunable parameter is a limit on the size of the
> blocks that get prefetched. Basically, if a block is larger than
> zfetch_array_rd_sz, then it is not prefetched. However, the default
> zfetch_array_rd_sz = 1,048,576, so it should be fine if your
> volblocksize=1m. Be sure to check its value.
>
> A related tunable is zfetch_max_distance, default = 8 MiB, the maximum
> number of bytes to prefetch per stream. This might be too small for
> volblocksize=1m.
>
> To help visualize the zfetch activity, I usually do the data collection
> with Prometheus' node_exporter or influxdb's telegraf. But if you are a
> CLI fan, then I pushed a Linux version of zfetchstat to
> https://github.com/richardelling/zfs-linux-tools
>
>
> "top" shows only 8 "z_rd_int" processes during reads (and only one
> "z_rd_int" running), while there were 32 running z_wr_iss processes
> during writes.
>
>
> This could be another clue about prefetching not being enabled or not
> working as desired. However, in my experience, it is better to observe
> the detailed back-end I/O distribution and classification with
> "zpool iostat -q", where prefetches are often, but not always, in the
> asyncq_read category.
>  -- richard
>

