zfs locking up my machine

Brian Behlendorf behlendorf1 at llnl.gov
Wed May 25 12:56:32 EDT 2011


We absolutely need to get additional stress and correctness testing
running on ZFS.  In fact, until this happens I'm loath to tag anything
other than the current development release candidates (0.6.0-rcX).  I'm
not considering an official release tag until all the known stability
issues are resolved.  First and foremost this implementation has to be
stable.

That said, concerning your hard lockups, there are a few likely
candidates.  Two of them are already addressed in master, but you're
likely hitting the third.

* Issue #218: ARC memory reclaim accounting bug.
  Fixed by commit 3fd70ee post 0.6.0-rc4

* Issue #232: Deadlock in the TXG processing.
  Fixed by commit 21ade34 post 0.6.0-rc4

* Issue #214: Stack overflow during 'zpool scrub'.
  Holding up the 0.6.0-rc5 tag.  This is the biggie people are hitting.
  
-- 
Thanks,
Brian 

On Wed, 2011-05-25 at 08:59 -0700, devsk wrote:
> Folks,
> 
> I am on 0.6.0-rc4 on a system with 12GB RAM and 2GB dedicated to ARC.
> There have been 3 occasions in the past week when the machine has
> locked up doing something with ZFS: on two occasions it was an
> overnight scrub (on a RAIDZ backup pool with dedup and compression on)
> and on one occasion it was while restoring a VM stored on ZFS (my
> RAIDZ2 data pool with just compression on).
> 
> In all cases it's a hard lockup and it leaves no trace in
> /var/log/messages. My system had been super stable, with uptimes
> running into months, before I moved to zfs on Linux (I was running
> zfs-fuse).
> 
> We need to get the stress test going for ZFS. I think the scrub code
> path is super unstable, particularly for deduped pools (or maybe it's
> just the duration: the non-dedup pools finish their scrub faster, so
> they never run into this lockup, which looks memory related).
> 
> -devsk
> PS: another lockup happened while I was typing this message and a
> scrub was running in the background. Thanks to Firefox session
> restore, I had my message text exactly where it locked up.


