a difficult-to-describe problem involving long write times and enormous load levels

Daniel Brooks dlb48x at gmail.com
Sun May 22 15:08:32 EDT 2011


On Sat, May 21, 2011 at 9:03 PM, Fajar A. Nugraha <list at fajar.net> wrote:

> On Sun, May 22, 2011 at 3:55 AM, Daniel Brooks <dlb48x at gmail.com> wrote:
> > my only clues are that the
> > load average was 180 this morning when I logged in via ssh. The box was
> > still operating normally for the root user, as long as I didn't interact
> > with a zfs filesystem. The box was even still acting as a gateway for the
> > rest of my network. In fact, there was essentially zero cpu usage, and
> > although most of my 8 gigs of ram was in use, that wasn't unexpected.
> There
> > was essentially no swap usage at all, which is normal. Is there any way I
> > can try to narrow down the actual cause?
>
> I'm guessing that:
> - you're disk IOPS bound, or
> - you hit some unknown bug in zfs memory management (the last one was
> in arc reclaim, but it should be fixed in the version you use)
>
> Can you try:
> - "iostat -mx 3" (or some other method that can show disk i/o
> utilization) when the load gets big
> - limit max arc size using something like this
> # cat /etc/modprobe.d/zfs.conf
> options zfs zfs_arc_max=134217728
>
>
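(For reference, a quick way to see what the ARC is actually using before capping it -- assuming this build exposes the ZoL kstats at /proc/spl/kstat/zfs/arcstats -- might be something like:

# print the current ARC size and its ceiling, in MiB
# (kstat path assumed; check that the file exists on this build first)
awk '$1 == "size" || $1 == "c_max" { printf "%-6s %d MiB\n", $1, $3 / 1048576 }' \
    /proc/spl/kstat/zfs/arcstats

The suggested 134217728 bytes works out to a 128 MiB cap, and the zfs_arc_max option in /etc/modprobe.d/zfs.conf only takes effect the next time the zfs module is loaded, so it needs a module reload or a reboot to apply.)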

OK, this morning the load is only 19, but everything that interacts with a
ZFS filesystem is locked up. iostat says there is zero activity on those
disks:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.08   11.77    0.00   88.15

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.04    0.00    0.08   11.39    0.00   88.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.48   13.09    0.00   85.43

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.33     0.00     0.00     8.00     0.00    3.00   3.00   0.10
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Although, as you can see, there is still a trickle of I/O on sda, which isn't
used by ZFS. The drives themselves still respond; I can interrogate them with
hdparm, for instance.
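For what it's worth, a way to narrow down where the hung processes are actually blocked -- a rough sketch, assuming a stock procps ps and SysRq support in this kernel -- might be:

# list tasks stuck in uninterruptible sleep (state D), along with the
# kernel function each one is waiting in
ps -eo state,pid,wchan:30,cmd | awk '$1 == "D"'

# ask the kernel to dump the stacks of all blocked tasks into its log
# (writing to sysrq-trigger works for root even if keyboard SysRq is off)
echo w > /proc/sysrq-trigger
dmesg | tail -n 100

If those stacks all end in ZFS/SPL functions, that would point at the ZFS side rather than the hardware.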