zfs locking up my machine

devsk devsku at gmail.com
Wed May 25 12:40:09 EDT 2011


This is reproducible with a simple test case:

1. Create a RAIDZ pool with dedup and compression on.
2. Populate it with a large number of files (for example, copy the root fs).
3. Mark one of the devices offline and replace it with another device.
4. Resilvering triggers a hard lockup within minutes, every time.
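
The steps above can be sketched as a dry-run command generator. This is
only an illustration, not the exact commands I ran: the pool name, the
device paths, and the helper name repro_commands are hypothetical, and it
assumes the pool is mounted at the default /<pool> mountpoint. It prints
the commands rather than executing them, since they are destructive.

```python
import shlex


def repro_commands(pool, devices, victim, replacement):
    """Build the command lines for the test case as argv lists (nothing is run).

    All arguments are placeholders chosen by the caller; 'victim' is the
    pool member that gets taken offline and replaced.
    """
    if victim not in devices:
        raise ValueError("victim must be one of the pool devices")
    return [
        # Step 1: RAIDZ pool with dedup and compression enabled.
        ["zpool", "create", pool, "raidz", *devices],
        ["zfs", "set", "dedup=on", pool],
        ["zfs", "set", "compression=on", pool],
        # Step 2: populate with many files (assumes default mountpoint /<pool>).
        ["cp", "-ax", "/.", f"/{pool}/"],
        # Step 3: offline one device and replace it, which starts a resilver.
        ["zpool", "offline", pool, victim],
        ["zpool", "replace", pool, victim, replacement],
        # Step 4: watch the resilver progress.
        ["zpool", "status", pool],
    ]


if __name__ == "__main__":
    # Hypothetical pool and device names, for illustration only.
    for argv in repro_commands(
        "testpool", ["/dev/sdb", "/dev/sdc", "/dev/sdd"], "/dev/sdc", "/dev/sde"
    ):
        print(shlex.join(argv))
```

Review the printed commands and substitute scratch devices before running
any of them by hand.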

I had a script running in the background during the last lockup. The
script saved meminfo, arcstats, and dmesg to a file on rootfs every 5
seconds. Nothing useful in there. The system looks as healthy as it can
be right up to the lockup. So, it's really a hard deadlock caused by ZFS.
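
For anyone who wants to collect the same data, here is a minimal sketch of
such a logger. It is not the script I used. It assumes arcstats is exposed
at /proc/spl/kstat/zfs/arcstats, and the log path and function names are
hypothetical. Each snapshot is fsync'ed so the last one has a chance of
reaching disk before a hard lockup.

```python
import os
import subprocess
import time

# Assumed locations; adjust if your system exposes them elsewhere.
SOURCES = ("/proc/meminfo", "/proc/spl/kstat/zfs/arcstats")


def read_text(path):
    """Return the file contents, or a marker line if it cannot be read."""
    try:
        with open(path, "r", errors="replace") as f:
            return f.read()
    except OSError as exc:
        return f"<could not read {path}: {exc}>\n"


def dmesg_tail(lines=50):
    """Return the last few lines of the kernel ring buffer."""
    try:
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, errors="replace", check=False
        )
    except OSError as exc:
        return f"<could not run dmesg: {exc}>\n"
    return "\n".join(result.stdout.splitlines()[-lines:]) + "\n"


def write_snapshot(log_path, sources=SOURCES, include_dmesg=True):
    """Append one timestamped snapshot to log_path and force it to disk."""
    parts = [f"===== {time.strftime('%Y-%m-%d %H:%M:%S')} =====\n"]
    for path in sources:
        parts.append(f"--- {path} ---\n{read_text(path)}")
    if include_dmesg:
        parts.append(f"--- dmesg (tail) ---\n{dmesg_tail()}")
    with open(log_path, "a") as log:
        log.write("".join(parts))
        log.flush()
        os.fsync(log.fileno())  # do not leave the snapshot in the page cache


def monitor(log_path="/root/zfs-lockup.log", interval=5):
    """Take a snapshot every 'interval' seconds until interrupted."""
    while True:
        write_snapshot(log_path)
        time.sleep(interval)
```

Call monitor() as root and leave it running in the background. Keep the
log on a non-ZFS filesystem so that writing it does not itself depend on
the pool that is locking up.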

-devsk


On May 25, 8:59 am, devsk <dev... at gmail.com> wrote:
> Folks,
>
> I am on 0.6.0-rc4 on a system with 12GB RAM and 2GB dedicated to ARC.
> There have been 3 occasions in the past week when the machine has
> locked up doing something with ZFS: on two occasions it was an overnight
> scrub (on a RAIDZ backup pool with dedup and compression on) and on
> one occasion it was while restoring a VM stored on ZFS (my RAIDZ2 data
> pool with just compression on).
>
> In all cases it's a hard lockup that leaves no trace in
> /var/log/messages. My system had been super stable, with uptimes running
> into months, before I moved to zfs on Linux (I was running zfs-fuse).
>
> We need to get the stress test going for ZFS. I think the scrub code path
> is super unstable, particularly for deduped pools (or maybe it's just the
> duration: the non-dedup pools just finish scrub faster, so they never run
> into this lockup, which looks memory related).
>
> -devsk
> PS: another lockup happened while I was typing this message and a scrub
> was running in the background. Thanks to Firefox session restore, I had my
> message text exactly where it locked up.


