Hung processes during copy operations
u.petri at gmail.com
Mon May 23 07:33:04 EDT 2011
On May 16, 11:26 pm, Brian Behlendorf <behlendo... at llnl.gov> wrote:
> Have you tried the latest source from the zfs master branch? I recently
> committed a fix which might address this issue. It's hard to be certain
> from the stacks included in your email, but it's certainly possible.
> Commit 21ade34 fixed issues #232 which was very similar to the problem
> your describing.
I updated spl to 372c2572336 and zfs to d9bfe0f57, and at first it
seemed that this solved the problems. rsync still spent some long
periods in state D, but it came out of it eventually.
Unfortunately, this weekend the system hung again.
This time I had a script running that logged the memory usage every 10
seconds (I thought maybe that was a clue to what is going on) and the
memory usage definitely looks unusual to me.
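For anyone who wants to collect comparable numbers, a logger along these lines can be sketched in Python. This is not the script used for the graph, only a minimal illustration that samples /proc/meminfo at a fixed interval. The function names, the log format and the choice of fields are all assumptions made for the example.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def format_sample(info, fields=("MemTotal", "MemFree", "Buffers", "Cached", "Slab")):
    """Build one timestamped log line; the field selection is illustrative."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    values = " ".join(f"{name}={info.get(name, 0)}" for name in fields)
    return f"{stamp} {values}"


def log_memory(path, interval=10, samples=None):
    """Append one sample to `path` every `interval` seconds.

    With samples=None the loop runs until interrupted.
    """
    count = 0
    while samples is None or count < samples:
        with open("/proc/meminfo") as src:
            info = parse_meminfo(src.read())
        # Reopen the log for each sample so every line reaches the file
        # promptly, which matters if the machine later hangs.
        with open(path, "a") as out:
            out.write(format_sample(info) + "\n")
        count += 1
        if samples is None or count < samples:
            time.sleep(interval)
```

Calling log_memory("mem.log") would append one line every 10 seconds; the resulting file can then be plotted with any tool.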
You can see the graph here (takes a while to load).
There is nothing running on this machine besides ZFS, rsync copying
data over and the usual system daemons present on a freshly installed
When the system was in the hung state everything that tried to
interact with the zfs volumes (e.g. zpool / zfs commands and updatedb)
Here is the complete dmesg output. The hung task reporting eventually
stops because it reached the maximum number of hung task reports.
Also look at the output of "ps fax" taken shortly before I rebooted the
machine. Notice that the txg_sync process is also in state D.
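If it helps others check for the same symptom, the processes in state D can also be listed directly from /proc. The following is only a sketch, assuming a Linux /proc layout; the helper names parse_stat and blocked_processes are made up for the example and are not part of any ZFS tooling.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first "(" and the last ")".
    """
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)
```

Running blocked_processes() periodically and logging the result would show whether txg_sync or rsync stays blocked across several samples, rather than only passing through state D briefly.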
I just saw another thread on the list that seems to describe the
same (or at least very similar) problem. In both cases many small
files seem to be involved.