Hung processes during copy operations

Brian Behlendorf behlendorf1 at llnl.gov
Mon May 23 17:12:11 EDT 2011


I took a look at the debugging you posted and there's good and bad news.
While the symptoms of this issue appear the same as before, it looks
like a different root cause.  That's the good news.  The bad news is the
lockup is caused by the kernel not being able to allocate some memory.
From my initial reading of the debugging you posted I'm not quite sure
why this is the case.

I've opened issue #251 for this problem and included the relevant
portions of the logs.  If you set the zfs_arc_max module option to
some smaller fraction of your total system memory, you could probably
avoid this issue until it's fixed.
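
As a rough sketch of that workaround (not something taken from the
logs): zfs_arc_max is specified in bytes, so one way to pick a value is
to read MemTotal from /proc/meminfo and take a fraction of it.  The 25%
default, the function names, and the /etc/modprobe.d/zfs.conf location
mentioned in the comments are illustrative assumptions, not tested
recommendations for this particular lockup.

```python
import re


def mem_total_bytes(meminfo_text):
    """Return MemTotal, in bytes, from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    # /proc/meminfo reports this value in kB (units of 1024 bytes).
    return int(match.group(1)) * 1024


def arc_max_option(meminfo_text, fraction=0.25):
    """Build a modprobe 'options' line capping the ARC at a fraction of RAM.

    The fraction is an assumption chosen for illustration; choose one
    that suits the machine.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    limit = int(mem_total_bytes(meminfo_text) * fraction)
    return "options zfs zfs_arc_max=%d" % limit


# Hypothetical usage on the affected machine:
#   with open("/proc/meminfo") as f:
#       print(arc_max_option(f.read()))
# The printed line would typically go in a file such as
# /etc/modprobe.d/zfs.conf (assumed location) before the zfs module loads.
```

The option only takes effect when the module is loaded, so the zfs
module would have to be reloaded (or the machine rebooted) afterwards.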

https://github.com/behlendorf/zfs/issues/251

-- 
Thanks,
Brian 

On Mon, 2011-05-23 at 04:45 -0700, Ulrich Petri wrote:
> One thing I forgot to add:
> After the last reboot I had to zpool import -f the pool in order to
> get it back as zpool claimed the pool was from a foreign system.
> 
> On May 23, 1:33 pm, Ulrich Petri <u.pe... at gmail.com> wrote:
> > Hi Brian,
> >
> > On May 16, 11:26 pm, Brian Behlendorf <behlendo... at llnl.gov> wrote:
> >
> > > Have you tried the latest source from the zfs master branch?  I recently
> > > committed a fix which might address this issue.  It's hard to be certain
> > > from the stacks included in your email, but it's certainly possible.
> > > Commit 21ade34 fixed issue #232, which was very similar to the problem
> > > you're describing.
> >
> > I updated spl to 372c2572336 and zfs to d9bfe0f57 and at first it
> > seemed that this solved the problems. There were still some long
> > periods of rsync being in state D but it came out of it eventually.
> >
> > Unfortunately this weekend the system hung again.
> >
> > This time I had a script running that logged the memory usage every 10
> > seconds (I thought maybe that was a clue to what is going on) and the
> > memory usage definitely looks unusual to me.
> >
> > You can see the graph here [1] (takes a while to load).
> >
> > There is nothing running on this machine besides ZFS, rsync copying
> > data over and the usual system daemons present on a freshly installed
> > Ubuntu system.
> >
> > When the system was in the hung state everything that tried to
> > interact with the zfs volumes (e.g. zpool / zfs commands and updatedb)
> > also hung.
> >
> > Here is the complete dmesg output [2]. The hung-task reporting eventually
> > stops because it reached the maximum number of hung-task reports.
> >
> > Also look at the output of "ps fax" [3], taken shortly before I rebooted
> > the machine. Notice that the txg_sync process is also in state D.
> >
> > I just saw another thread on the list [4] that seems to describe the
> > same (or at least very similar) problem. In both cases many small
> > files seem to be involved.
> >
> > Bye
> > Ulrich
> >
> > [1]http://ulo.pe/misc/zfs_rsync_mem.html
> > [2]http://cl.ly/3F2M0v3h3t3S422E0H3S
> > [3]http://cl.ly/2J2d3W1G0k003y092t1S
> > [4]http://groups.google.com/a/zfsonlinux.org/group/zfs-discuss/browse_th...
