zfs locking up my machine

devsk devsku at gmail.com
Wed May 25 15:03:34 EDT 2011


Brian,

The thing is I don't see anything in /var/log/messages (which is on a
different disk and is ext4). Wouldn't I see an Oops message if it
were one of the bugs you mentioned?

-devsk


On May 25, 9:56 am, Brian Behlendorf <behlendo... at llnl.gov> wrote:
> We absolutely need to get additional stress and correctness testing
> running on ZFS.  In fact, until this happens I'm loath to tag anything
> other than the current development release candidates (0.6.0-rcX).  I'm
> not considering an official release tag until all the known stability
> issues are resolved.  First and foremost this implementation has to be
> stable.
>
> That said, concerning your hard lockups, there are a few likely
> candidates.  Two of them are already addressed in master, but you're
> likely hitting the third.
>
> * Issue #218: ARC memory reclaim accounting bug.
>   Fixed by commit 3fd70ee post 0.6.0-rc4
>
> * Issue #232: Deadlock in the TXG processing.
>   Fixed by commit 21ade34 post 0.6.0-rc4
>
> * Issue #214: Stack overflow during 'zpool scrub'.
>   Holding up the 0.6.0-rc5 tag, this is the biggie people are hitting
>
> --
> Thanks,
> Brian
>
> On Wed, 2011-05-25 at 08:59 -0700, devsk wrote:
> > Folks,
>
> > I am on 0.6.0-rc4 on a system with 12GB RAM and 2GB dedicated to ARC.
> > There have been 3 occasions in the past week when the machine has
> > locked up doing something with ZFS: on two occasions it was an overnight
> > scrub (on a RAIDZ backup pool with dedup and compression on), and on
> > one occasion it was while restoring a VM stored on ZFS (my RAIDZ2 data
> > pool, with just compression on).
>
> > In all cases it's a hard lockup and it leaves no trace in
> > /var/log/messages. My system has been super stable with uptimes running
> > into months before I moved to ZFS on Linux (I was running zfs-fuse).
>
> > We need to get the stress test going for ZFS. I think the scrub code path
> > is super unstable, particularly for deduped pools (or maybe it's just the
> > duration: the non-dedup pools finish scrubbing faster, so they never run
> > into this lockup, which looks memory related).
>
> > -devsk
> > PS: another lockup happened while I was typing this message and scrub
> > was running in the BG. Thanks to Firefox session restore, I had my
> > message text exactly as it was when the machine locked up.



More information about the zfs-discuss mailing list