Problem with ZFS, or have I just run out of memory?

Matthew saturnreturn at gmail.com
Mon May 16 16:36:04 EDT 2011


Hi,

I'm using ZFS for linux via the PPA for Ubuntu 10.10 - latest version
(0.6.0.14-0ubuntu1~maverick1) on amd64. 500GB SATA drive with single
zpool, version 28.

I don't know which version this started happening with, but about 3
weeks ago I started getting frequent lock-ups soon after mounting
zpools. I wasn't able to find anything pertinent in kern.log, but
yesterday I decided to ssh in to the system to see if the ssh session
stopped responding at the same time that my desktop froze.

I found that when my desktop froze, the ssh session on the other
computer became non-responsive for about a minute. Then, it came back
to life (but my desktop remained frozen - I guess xorg had crashed). I
was able to do a 'top' and saw that the highest consumer of CPU was
arc_reclaim, consistently using about 17% CPU (it's an 2.8GHz Athlon).

kern.log contained this info:

May 14 17:48:01 matt-desktop kernel: [  324.003939] SPL: Loaded module
v0.6.0.14, using hostid 0xa8c00300
May 14 17:48:01 matt-desktop kernel: [  324.406956] ZFS: Loaded module
v0.6.0.14, ZFS pool version 28, ZFS filesystem version 5
May 14 19:48:37 matt-desktop kernel: [ 7560.380073] INFO: task java:
4738 blocked for more than 120 seconds.
May 14 19:48:37 matt-desktop kernel: [ 7560.380078] "echo 0 > /proc/
sys/kernel/hung_task_timeout_secs" disables this message.
May 14 19:48:37 matt-desktop kernel: [ 7560.380081] java          D
0000000000000001     0  4738      1 0x00000000
May 14 19:48:37 matt-desktop kernel: [ 7560.380086]  ffff880009263e08
0000000000000086 ffff880009263f38 0000000000015980
May 14 19:48:37 matt-desktop kernel: [ 7560.380091]  ffff880009263fd8
0000000000015980 ffff880009263fd8 ffff88007ab92dc0
May 14 19:48:37 matt-desktop kernel: [ 7560.380095]  0000000000015980
0000000000015980 ffff880009263fd8 0000000000015980
May 14 19:48:37 matt-desktop kernel: [ 7560.380099] Call Trace:
May 14 19:48:37 matt-desktop kernel: [ 7560.380109]
[<ffffffff8158b10d>] rwsem_down_failed_common+0x9d/0x210
May 14 19:48:37 matt-desktop kernel: [ 7560.380114]
[<ffffffff8158b2a3>] rwsem_down_write_failed+0x23/0x30
May 14 19:48:37 matt-desktop kernel: [ 7560.380119]
[<ffffffff812c3323>] call_rwsem_down_write_failed+0x13/0x20
May 14 19:48:37 matt-desktop kernel: [ 7560.380123]
[<ffffffff8158a522>] ? down_write+0x32/0x40
May 14 19:48:37 matt-desktop kernel: [ 7560.380127]
[<ffffffff81127e66>] sys_mprotect+0xe6/0x250
May 14 19:48:37 matt-desktop kernel: [ 7560.380132]
[<ffffffff81066b25>] ? sys_gettimeofday+0x45/0x90
May 14 19:48:37 matt-desktop kernel: [ 7560.380137]
[<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
May 14 19:48:37 matt-desktop kernel: [ 7560.380141] INFO: task java:
4745 blocked for more than 120 seconds.
May 14 19:48:37 matt-desktop kernel: [ 7560.380143] "echo 0 > /proc/
sys/kernel/hung_task_timeout_secs" disables this message.
May 14 19:48:37 matt-desktop kernel: [ 7560.380145] java          D
0000000000000001     0  4745      1 0x00000000
May 14 19:48:37 matt-desktop kernel: [ 7560.380149]  ffff8800092b7de0
0000000000000086 ffffffff81c60ee0 0000000000015980
May 14 19:48:37 matt-desktop kernel: [ 7560.380154]  ffff8800092b7fd8
0000000000015980 ffff8800092b7fd8 ffff880037a98000
May 14 19:48:37 matt-desktop kernel: [ 7560.380158]  0000000000015980
0000000000015980 ffff8800092b7fd8 0000000000015980
May 14 19:48:37 matt-desktop kernel: [ 7560.380161] Call Trace:
May 14 19:48:37 matt-desktop kernel: [ 7560.380214]
[<ffffffff8158b10d>] rwsem_down_failed_common+0x9d/0x210
May 14 19:48:37 matt-desktop kernel: [ 7560.380218]
[<ffffffff8158b2d6>] rwsem_down_read_failed+0x26/0x30
May 14 19:48:37 matt-desktop kernel: [ 7560.380222]
[<ffffffff812c32f4>] call_rwsem_down_read_failed+0x14/0x30
May 14 19:48:37 matt-desktop kernel: [ 7560.380226]
[<ffffffff8158a554>] ? down_read+0x24/0x30
May 14 19:48:37 matt-desktop kernel: [ 7560.380230]
[<ffffffff8158f257>] do_page_fault+0x347/0x350
May 14 19:48:37 matt-desktop kernel: [ 7560.380234]
[<ffffffff8158bbb5>] page_fault+0x25/0x30

I also noticed that the RAM was nearly 100% full - it's got 2GB. I
know this is the lowest recommended amount of RAM, but I thought ZFS
was able to handle manage its use of RAM according to the amount
available?

I don't know if it's relevant, but the java process mentioned in the
logs above is a backup tool - crashplan - and it makes sense that this
would be using the disk a lot.

The thing that puzzles me is that zfs was working fine on this
hardware for a few months, and all of a sudden seems to have started
causing pretty bad crashes. Can anyone cast any illumination on the
matter?

Thanks in advance for any help,
Matt



More information about the zfs-discuss mailing list