a difficult-to-describe problem involving long write times and enormous load levels

Daniel Brooks dlb48x at gmail.com
Sun May 22 05:47:13 EDT 2011


On Sat, May 21, 2011 at 9:03 PM, Fajar A. Nugraha <list at fajar.net> wrote:

> I'm guessing that:
> - you're disk IOPS bound, or
>

Oh, I certainly am IO bound here, but I didn't expect that to run away and
lock up zfs.


> - you hit some unknown bug in zfs memory management (the last one was
> in arc reclaim, but it should be fixed in the version you use)
>
> Can you try:
> - "iostat -mx 3" (or some other method that can show disk i/o
> utilization) when the load gets big
>

This I've done; I'll include a sample.
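
For reference, the numbers below came from something along these lines (the
device list is just the five pool members; exact flags reconstructed from the
column headers, so take the invocation as approximate):

# extended per-device stats in MB, sampled every 3 seconds
iostat -mx sdc sdd sde sdf sdg 3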


> - limit max arc size using something like this
> # cat /etc/modprobe.d/zfs.conf
> options zfs zfs_arc_max=134217728
>

This I haven't done yet.
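
If I end up needing it, I assume the recipe is roughly the one you gave, plus
a kick to the running module. The /sys path and the arcstats file are my
guesses at where zfsonlinux exposes these; 134217728 bytes is 128 MB:

# persist the cap across module reloads
echo "options zfs zfs_arc_max=134217728" > /etc/modprobe.d/zfs.conf

# try to apply it to the running module without a reboot
echo 134217728 > /sys/module/zfs/parameters/zfs_arc_max

# see what the ARC is actually sized at
grep -E '^(size|c_max) ' /proc/spl/kstat/zfs/arcstats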

The load average right now isn't particularly high (it's only 4), and I'm not
having any noticeable trouble, but here's a snippet of iostat output before I
hit the sack:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.70    0.00    4.26    0.00    0.00   93.04

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc              41.33     5.00  170.00   12.33     0.73     0.62    15.19     2.81   19.24   4.90  89.40
sde              51.00     5.33  174.00   12.00     0.65     0.62    13.98     2.11   15.06   3.86  71.77
sdf              48.00     5.67  180.00   11.00     0.69     0.62    14.09     2.05   13.66   3.86  73.77
sdg              61.33     5.00  164.33   12.67     0.68     0.62    15.12     1.18    8.20   3.21  56.83
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd              43.33     6.67  174.00   12.00     0.76     0.62    15.28     3.15   20.72   4.82  89.67

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.09    0.00    5.41    0.08    0.00   91.42

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.33    0.00    0.67     0.00     0.00    12.00     0.00    6.50   6.50   0.43
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc             109.00     0.00  435.33    0.00     1.14     0.00     5.35     2.24    5.15   2.13  92.87
sde             116.67     0.00  430.00    0.00     1.07     0.00     5.08     2.09    4.89   2.09  89.73
sdf             124.33     0.00  423.33    0.00     1.06     0.00     5.13     1.97    4.67   2.12  89.60
sdg             125.33     0.00  478.00    0.00     1.00     0.00     4.29     1.53    3.19   1.65  78.93
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd             140.33     0.00  415.33    0.00     1.08     0.00     5.33     1.97    4.76   2.16  89.90

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.98    0.00    3.88    0.00    0.00   93.14

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc              55.67     6.67  119.33   40.00     0.34     0.82    14.95     2.54   16.00   5.64  89.83
sde              34.33     5.67  136.67   37.67     0.32     0.40     8.47     2.70   15.41   4.66  81.27
sdf              30.33     8.00  137.67   35.33     0.35     0.52    10.31     2.46   13.90   5.00  86.57
sdg              36.67     6.67  131.67   42.33     0.32     0.82    13.42     1.11    6.48   2.83  49.23
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd              56.33     4.33  114.33   41.00     0.33     0.82    15.12     2.41   15.54   6.02  93.43

As you can see, the five disks that I've dedicated to my pool are pretty
well-utilized. The only anomaly that I can see is that sdg is consistently
faster than the others; its service time is always lower.
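
If it would help, I can also grab a per-vdev view straight from zpool to see
whether sdg is genuinely getting less work or just servicing it faster (pool
name substituted, obviously):

# per-vdev bandwidth and operation counts, every 3 seconds
zpool iostat -v <pool> 3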

If the filesystem is wedged when I wake up, then I'll try it again with a
maximum arc size and see what happens. iotop might have interesting things
to say then too.
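
If it does wedge, I'll also try to capture which tasks are stuck in
uninterruptible sleep, since that should say more about where zfs is hanging
than iotop can (this is just a generic blocked-task dump, nothing
zfs-specific):

# list processes stuck in D state and what they're waiting on
ps axo pid,stat,wchan:32,comm | awk '$2 ~ /D/'

# or have the kernel log a stack trace for every blocked task
echo w > /proc/sysrq-trigger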

db48x