Hung processes during copy operations
behlendorf1 at llnl.gov
Mon May 23 17:12:11 EDT 2011
I took a look at the debugging you posted and there's good and bad news.
While the symptoms of this issue appear the same as before, it looks
like a different root cause. That's the good news. The bad news is the
lockup is caused by the kernel not being able to allocate some memory.
From my initial reading of the debugging you posted I'm not quite sure
why this is the case.
I've opened issue #251 for this problem and included the relevant
portions of the logs. If you were to set the zfs_arc_max module option to
some smaller fraction of your total system memory you could probably
avoid this issue until it's fixed.
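As a rough illustration of the zfs_arc_max suggestion, the sketch below derives a value from MemTotal in /proc/meminfo and prints a modprobe options line. The value is in bytes. The 25% fraction and the helper names are assumptions made for this example, not figures recommended in this thread, and /etc/modprobe.d/zfs.conf is only the conventional place for such a line.

```python
import os


def parse_mem_total_bytes(meminfo_text):
    """Return MemTotal in bytes from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:        8167848 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def arc_max_option(total_bytes, fraction=0.25):
    """Build a modprobe options line capping the ARC at `fraction` of RAM.

    The default fraction is an arbitrary assumption for illustration.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return "options zfs zfs_arc_max=%d" % int(total_bytes * fraction)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        total = parse_mem_total_bytes(f.read())
    # Prints a line that could be placed in /etc/modprobe.d/zfs.conf
    print(arc_max_option(total))
```

The printed line only takes effect the next time the zfs module is loaded.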
On Mon, 2011-05-23 at 04:45 -0700, Ulrich Petri wrote:
> One thing I forgot to add:
> After the last reboot I had to zpool import -f the pool in order to
> get it back as zpool claimed the pool was from a foreign system.
> On May 23, 1:33 pm, Ulrich Petri <u.pe... at gmail.com> wrote:
> > Hi Brian,
> > On May 16, 11:26 pm, Brian Behlendorf <behlendo... at llnl.gov> wrote:
> > > Have you tried the latest source from the zfs master branch? I recently
> > > committed a fix which might address this issue. It's hard to be certain
> > > from the stacks included in your email, but it's certainly possible.
> > > Commit 21ade34 fixed issue #232, which was very similar to the problem
> > > you're describing.
> > I updated spl to 372c2572336 and zfs to d9bfe0f57 and at first it
> > seemed that this solved the problems. There were still some long
> > periods of rsync being in state D but it came out of it eventually.
> > Unfortunately this weekend the system hung again.
> > This time I had a script running that logged the memory usage every 10
> > seconds (I thought maybe that was a clue to what is going on) and the
> > memory usage definitely looks unusual to me.
> > You can see the graph here (takes a while to load).
> > There is nothing running on this machine besides ZFS, rsync copying
> > data over and the usual system daemons present on a freshly installed
> > Ubuntu system.
> > When the system was in the hung state everything that tried to
> > interact with the zfs volumes (e.g. zpool / zfs commands and updatedb)
> > also hung.
> > Here is the complete dmesg output. The hung task reporting eventually
> > stops because it reached the maximum number of hung task reports.
> > Also look at the output of "ps fax" shortly before I rebooted the
> > machine. Notice that the txg_sync process is also in state D.
> > I just saw another thread on the list that seems to describe the
> > same (or at least very similar) problem. In both cases many small
> > files seem to be involved.
> > Bye
> > Ulrich
> > http://ulo.pe/misc/zfs_rsync_mem.html
> > http://cl.ly/3F2M0v3h3t3S422E0H3S
> > http://cl.ly/2J2d3W1G0k003y092t1S
> > http://groups.google.com/a/zfsonlinux.org/group/zfs-discuss/browse_th...
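The quoted message mentions a script that logged memory usage every 10 seconds, but the script itself was not posted. A minimal sketch of such a logger is given here for anyone who wants to collect similar data. It reads /proc/meminfo, and the field selection, function names and log format are all assumptions made for this example.

```python
import time

# Fields to record; this selection is an assumption for illustration.
FIELDS = ("MemTotal", "MemFree", "Buffers", "Cached")


def parse_meminfo(text):
    """Map each /proc/meminfo key to its numeric value (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def format_sample(timestamp, values, fields=FIELDS):
    """Build one log line: a timestamp followed by name=value pairs."""
    cols = ["%s=%d" % (name, values.get(name, 0)) for name in fields]
    return "%d %s" % (timestamp, " ".join(cols))


def log_memory(out_path, interval=10, count=None, source="/proc/meminfo"):
    """Append one sample every `interval` seconds.

    With count=None the loop runs until interrupted.
    """
    taken = 0
    while count is None or taken < count:
        with open(source) as f:
            values = parse_meminfo(f.read())
        # Reopen and close the log for each sample so every line is
        # written out before the next sleep.
        with open(out_path, "a") as out:
            out.write(format_sample(int(time.time()), values) + "\n")
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)
```

Calling log_memory("mem.log") would append a line every 10 seconds until interrupted. Writing the log to a filesystem that is not on the affected pool would avoid the logger itself blocking on ZFS.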