Problem with ZFS, or have I just run out of memory?

Brian Behlendorf behlendorf1 at llnl.gov
Mon May 16 17:17:30 EDT 2011


Matthew,

I suspect the problems your observing are related to issue #218.
Several weeks ago the code was updated to properly enforce the ARC cache
limits via the arc_reclaim thread.  Since then I've observed a flaw in
that VM integration work which I'm looking in to.  It may explain your
problem.  We'll certainly get it fixed before tagging a stable 0.6.0
release.  Thanks for the feedback, this is exactly the sort of thing we
need to hear about so we get it right.

https://github.com/behlendorf/zfs/issues/218

-- 
Thanks,
Brian 

On Mon, 2011-05-16 at 13:36 -0700, Matthew wrote:
> Hi,
> 
> I'm using ZFS for linux via the PPA for Ubuntu 10.10 - latest version
> (0.6.0.14-0ubuntu1~maverick1) on amd64. 500GB SATA drive with single
> zpool, version 28.
> 
> I don't know which version this started happening with, but about 3
> weeks ago I started getting frequent lock-ups soon after mounting
> zpools. I wasn't able to find anything pertinent in kern.log, but
> yesterday I decided to ssh in to the system to see if the ssh session
> stopped responding at the same time that my desktop froze.
> 
> I found that when my desktop froze, the ssh session on the other
> computer became non-responsive for about a minute. Then, it came back
> to life (but my desktop remained frozen - I guess xorg had crashed). I
> was able to do a 'top' and saw that the highest consumer of CPU was
> arc_reclaim, consistently using about 17% CPU (it's an 2.8GHz Athlon).
> 
> kern.log contained this info:
> 
> May 14 17:48:01 matt-desktop kernel: [  324.003939] SPL: Loaded module
> v0.6.0.14, using hostid 0xa8c00300
> May 14 17:48:01 matt-desktop kernel: [  324.406956] ZFS: Loaded module
> v0.6.0.14, ZFS pool version 28, ZFS filesystem version 5
> May 14 19:48:37 matt-desktop kernel: [ 7560.380073] INFO: task java:
> 4738 blocked for more than 120 seconds.
> May 14 19:48:37 matt-desktop kernel: [ 7560.380078] "echo 0 > /proc/
> sys/kernel/hung_task_timeout_secs" disables this message.
> May 14 19:48:37 matt-desktop kernel: [ 7560.380081] java          D
> 0000000000000001     0  4738      1 0x00000000
> May 14 19:48:37 matt-desktop kernel: [ 7560.380086]  ffff880009263e08
> 0000000000000086 ffff880009263f38 0000000000015980
> May 14 19:48:37 matt-desktop kernel: [ 7560.380091]  ffff880009263fd8
> 0000000000015980 ffff880009263fd8 ffff88007ab92dc0
> May 14 19:48:37 matt-desktop kernel: [ 7560.380095]  0000000000015980
> 0000000000015980 ffff880009263fd8 0000000000015980
> May 14 19:48:37 matt-desktop kernel: [ 7560.380099] Call Trace:
> May 14 19:48:37 matt-desktop kernel: [ 7560.380109]
> [<ffffffff8158b10d>] rwsem_down_failed_common+0x9d/0x210
> May 14 19:48:37 matt-desktop kernel: [ 7560.380114]
> [<ffffffff8158b2a3>] rwsem_down_write_failed+0x23/0x30
> May 14 19:48:37 matt-desktop kernel: [ 7560.380119]
> [<ffffffff812c3323>] call_rwsem_down_write_failed+0x13/0x20
> May 14 19:48:37 matt-desktop kernel: [ 7560.380123]
> [<ffffffff8158a522>] ? down_write+0x32/0x40
> May 14 19:48:37 matt-desktop kernel: [ 7560.380127]
> [<ffffffff81127e66>] sys_mprotect+0xe6/0x250
> May 14 19:48:37 matt-desktop kernel: [ 7560.380132]
> [<ffffffff81066b25>] ? sys_gettimeofday+0x45/0x90
> May 14 19:48:37 matt-desktop kernel: [ 7560.380137]
> [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
> May 14 19:48:37 matt-desktop kernel: [ 7560.380141] INFO: task java:
> 4745 blocked for more than 120 seconds.
> May 14 19:48:37 matt-desktop kernel: [ 7560.380143] "echo 0 > /proc/
> sys/kernel/hung_task_timeout_secs" disables this message.
> May 14 19:48:37 matt-desktop kernel: [ 7560.380145] java          D
> 0000000000000001     0  4745      1 0x00000000
> May 14 19:48:37 matt-desktop kernel: [ 7560.380149]  ffff8800092b7de0
> 0000000000000086 ffffffff81c60ee0 0000000000015980
> May 14 19:48:37 matt-desktop kernel: [ 7560.380154]  ffff8800092b7fd8
> 0000000000015980 ffff8800092b7fd8 ffff880037a98000
> May 14 19:48:37 matt-desktop kernel: [ 7560.380158]  0000000000015980
> 0000000000015980 ffff8800092b7fd8 0000000000015980
> May 14 19:48:37 matt-desktop kernel: [ 7560.380161] Call Trace:
> May 14 19:48:37 matt-desktop kernel: [ 7560.380214]
> [<ffffffff8158b10d>] rwsem_down_failed_common+0x9d/0x210
> May 14 19:48:37 matt-desktop kernel: [ 7560.380218]
> [<ffffffff8158b2d6>] rwsem_down_read_failed+0x26/0x30
> May 14 19:48:37 matt-desktop kernel: [ 7560.380222]
> [<ffffffff812c32f4>] call_rwsem_down_read_failed+0x14/0x30
> May 14 19:48:37 matt-desktop kernel: [ 7560.380226]
> [<ffffffff8158a554>] ? down_read+0x24/0x30
> May 14 19:48:37 matt-desktop kernel: [ 7560.380230]
> [<ffffffff8158f257>] do_page_fault+0x347/0x350
> May 14 19:48:37 matt-desktop kernel: [ 7560.380234]
> [<ffffffff8158bbb5>] page_fault+0x25/0x30
> 
> I also noticed that the RAM was nearly 100% full - it's got 2GB. I
> know this is the lowest recommended amount of RAM, but I thought ZFS
> was able to handle manage its use of RAM according to the amount
> available?
> 
> I don't know if it's relevant, but the java process mentioned in the
> logs above is a backup tool - crashplan - and it makes sense that this
> would be using the disk a lot.
> 
> The thing that puzzles me is that zfs was working fine on this
> hardware for a few months, and all of a sudden seems to have started
> causing pretty bad crashes. Can anyone cast any illumination on the
> matter?
> 
> Thanks in advance for any help,
> Matt



More information about the zfs-discuss mailing list