[zfs-discuss] Catastrophic hangs after upgrading to 0.6.5.3-1~trusty

J David j.david.lists at gmail.com
Tue Dec 15 15:15:21 EST 2015


Thanks for the responses.  There are several open reports of >120
second hangs inside ZFS, but I’m not sure how similar they are.  Issue
#3835 seems to be the closest match. (
https://github.com/zfsonlinux/zfs/issues/3835 )

Here is some more relevant information from our end.

The problem happens most frequently (but not always) while a “zfs
send” is in progress.

The problem may be related to ARC.  After the update, we have observed
that our machines spend huge amounts of CPU time (often 25-50% CPU,
continuously, all day long) in arc_reclaim.  The #1 and #2 overall CPU
users on all of these systems are arc_reclaim and arc_prune:

top - 19:16:17 up 1 day,  6:15,  3 users,  load average: 3.54, 3.34, 3.10
Tasks: 622 total,   5 running, 617 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 47.7 sy,  0.0 ni, 23.1 id, 27.8 wa,  0.0 hi,  1.5 si,  0.0 st
KiB Mem:  16434404 total, 11624292 used,  4810112 free,    90988 buffers
KiB Swap: 16775164 total,        0 used, 16775164 free.   613288 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  287 root      20   0       0      0      0 S  40.5  0.0 406:16.44 arc_reclaim
  286 root      20   0       0      0      0 S   1.7  0.0  22:38.91 arc_prune
  285 root      20   0       0      0      0 S   2.3  0.0  22:33.80 arc_prune
 1176 root       1 -19       0      0      0 S   3.0  0.0  10:59.38 z_wr_iss
 2420 root      20   0       0      0      0 S   0.0  0.0   8:52.39 nfsd
[… several more nfsd … ]
 1411 root      20   0       0      0      0 D   1.0  0.0   7:48.41 txg_sync

Note that this system has only been up about 30 hours (since the last
crash) and has already accumulated roughly 7.5 CPU hours in arc_reclaim
and arc_prune (about 406 + 23 + 23 minutes of TIME+ in the output above).
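
For reference, the cumulative CPU time of those threads can be
re-checked at any point using the PIDs from the top output above:

$ ps -o pid,comm,time -p 285,286,287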

The ARC is full, almost entirely with metadata; in fact arc_meta_used
(8614223880) is slightly over arc_meta_limit (8589934592):

$ cat /proc/spl/kstat/zfs/arcstats
6 1 0x01 91 4368 5482325839 108618481083244
name                            type data
hits                            4    2815595479
misses                          4    60454956
demand_data_hits                4    1628072
demand_data_misses              4    3350321
demand_metadata_hits            4    360415850
demand_metadata_misses          4    16091005
prefetch_data_hits              4    109714
prefetch_data_misses            4    452230
prefetch_metadata_hits          4    2453441843
prefetch_metadata_misses        4    40561400
mru_hits                        4    31368149
mru_ghost_hits                  4    3261639
mfu_hits                        4    414647348
mfu_ghost_hits                  4    354
deleted                         4    54673830
mutex_miss                      4    4292592
evict_skip                      4    124738616331
evict_not_enough                4    94475981
evict_l2_cached                 4    0
evict_l2_eligible               4    281063656960
evict_l2_ineligible             4    679277349888
evict_l2_skip                   4    0
hash_elements                   4    409705
hash_elements_max               4    946155
hash_collisions                 4    9777170
hash_chains                     4    35052
hash_chain_max                  4    6
p                               4    7983257088
c                               4    12884901888
c_min                           4    33554432
c_max                           4    12884901888
size                            4    8617992712
hdr_size                        4    173910480
data_size                       4    3768832
metadata_size                   4    5350089728
other_size                      4    3090223672
anon_size                       4    38669312
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    825356288
mru_evictable_data              4    0
mru_evictable_metadata          4    28147712
mru_ghost_size                  4    0
mru_ghost_evictable_data        4    0
mru_ghost_evictable_metadata    4    0
mfu_size                        4    4489832960
mfu_evictable_data              4    0
mfu_evictable_metadata          4    4418103808
mfu_ghost_size                  4    0
mfu_ghost_evictable_data        4    0
mfu_ghost_evictable_metadata    4    0
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_cdata_free_on_write          4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
l2_compress_successes           4    0
l2_compress_zeros               4    0
l2_compress_failures            4    0
memory_throttle_count           4    0
duplicate_buffers               4    0
duplicate_buffers_size          4    0
duplicate_reads                 4    0
memory_direct_count             4    0
memory_indirect_count           4    0
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    492529197
arc_meta_used                   4    8614223880
arc_meta_limit                  4    8589934592
arc_meta_max                    4    8646130024
arc_meta_min                    4    16777216
arc_need_free                   4    0
arc_sys_free                    4    262946816
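
For anyone who wants to watch just the counters that seem most relevant
to the metadata pressure, something like this should do (same kstat
file as above):

$ grep -E 'arc_prune|arc_meta_used|arc_meta_limit|evict_skip|mutex_miss' \
    /proc/spl/kstat/zfs/arcstats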

The zpools in question are *very* small:

$ zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data   992G   714G   278G         -    68%    71%  1.00x  ONLINE  -

It seems like the rest of the system just gets tied up behind endless
ARC metadata cleanup and eventually stalls out.  (Note that the stack
trace in my original message shows the hanging z_wr_int_4 process
inside arc_write_done, waiting on a mutex.)

This is definitely a new issue with 0.6.5.3.  (We upgraded from 0.6.3,
which was incredibly stable.)

What is our best course of action here?

- Add RAM?

- Set ARC to metadata only?  (See the sketch after this list.)

- Disable ARC?  (Permanently or during ZFS send.)

- Downgrade to an earlier version of ZFSonLinux?
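
For the ARC-related options above, I assume the relevant knobs are
something along these lines (values are placeholders, not
recommendations; corrections welcome):

# Cache only metadata, or nothing at all, for the pool's datasets:
$ sudo zfs set primarycache=metadata data
$ sudo zfs set primarycache=none data

# Raise the ARC metadata limit at runtime, e.g. to 10 GiB:
$ echo 10737418240 | sudo tee /sys/module/zfs/parameters/zfs_arc_meta_limit

# Or persistently, via /etc/modprobe.d/zfs.conf:
options zfs zfs_arc_meta_limit=10737418240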

All of the above courses of action are workarounds.  What can we do to
help identify and resolve the underlying issue, which is happening
*very* frequently?
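
If it would help, here is roughly what we could capture the next time a
hang occurs and attach to the GitHub issue (assuming sysrq is enabled;
the thread names are the ones from the output above):

# ARC counters at the time of the hang:
$ cat /proc/spl/kstat/zfs/arcstats > arcstats-$(date +%s).txt

# Kernel stacks of the ZFS threads that look stuck:
$ sudo cat /proc/$(pgrep -x arc_reclaim)/stack
$ sudo cat /proc/$(pgrep -x txg_sync)/stack

# Dump stacks of all blocked tasks to the kernel log:
$ echo w | sudo tee /proc/sysrq-trigger
$ dmesg | tail -n 200 > hung-stacks.txt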

Thanks!


On Tue, Dec 15, 2015 at 6:50 AM, Aneurin Price via zfs-discuss
<zfs-discuss at list.zfsonlinux.org> wrote:
> On 15 December 2015 at 03:00, Manfred Braun via zfs-discuss
> <zfs-discuss at list.zfsonlinux.org> wrote:
>> Hi !
>>
>> I'm seeing a similar thing, which I reported on 2015-12-05 but,
>> sadly, got no response. It was a fresh install on Arch, with
>> zfs 0.6.5.3_r0_g9aaf60b_4.2.5_1-1 and kernel 4.2.5-1.
>>
>> For me, it happened when shutting down the box. The reason for my
>> shutdown was that a "dd if=/dev/zero of=/dev/zvol/somevol" never
>> finished. It was the first time I saw such strange behavior, and I
>> need ZFS working at boot or I'll have to switch to BTRFS.
>
> I saw the same thing a couple of weeks ago - specifically, "dd
> if=/dev/zero of=/empty-file" within a VM backed by a zvol, which is
> something I do on occasion to get around the lack of discard. In my
> case I noticed within a few seconds that it had stalled after writing
> very little; a few seconds later the VM locked up solid and I killed
> it immediately. It stuck around as a zombie, with anything touching
> that volume getting stuck in uninterruptible sleep, until unlocking
> after about half an hour.
>
> I didn't try again until I'd sent the volume to another pool, so I'm
> not sure if it would have happened consistently - to be honest I
> didn't want to find out.

