[zfs-discuss] Stalled scrub / txg_sync

Steven M Wilson stevew at purdue.edu
Wed May 2 22:18:35 EDT 2012


Thanks for the advice.  But before I can do anything, I need to get the pool stabilized.  This afternoon the scrub ran for several hours, and zpool status showed that it had only scrubbed one byte.  Then the server became unresponsive; I'll need to reboot it again tomorrow when I'm back in the office.

Steve


----- Original Message -----
> From: "Gregor Kopka" <gregor at kopka.net>
> To: zfs-discuss at zfsonlinux.org
> Sent: Wednesday, May 2, 2012 5:31:42 PM
> Subject: Re: [zfs-discuss] Stalled scrub / txg_sync
> A) Stop the scrub, then zfs send/recv the data into a new pool which never
> had (and never will have) dedup enabled.
> B) Add CACHE SSDs, this might help a bit.
> C) Add more RAM to the machine (this could well be, depending on pool
> contents, several hundred GB) so that the dedup tables and metadata
> fit completely into the ARC.
> 
> Stay away from dedup unless you have enough RAM and/or are willing to
> accept the performance impact.
> Compression is fine; it might even speed ZFS up (if your usage
> pattern is limited by disk I/O rather than CPU bound).
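Gregor's option A, migrating the data to a pool that never had dedup enabled, can be sketched roughly as follows. This is a dry-run illustration only: the pool and snapshot names (newtank, migrate1) and the device path are hypothetical, and each command is printed rather than executed.

```shell
#!/bin/sh
# Sketch of option A: replicate a dedup'ed pool into a fresh pool.
# Dry run -- run() only prints the command; drop the echo to execute.
# Pool/snapshot names (newtank, tank@migrate1) are hypothetical.
run() { echo "+ $*"; }

# Create the new pool with dedup off from the start; compression is safe.
run zpool create newtank raidz2 /dev/disk/by-id/...
run zfs set dedup=off newtank
run zfs set compression=on newtank

# Snapshot the source recursively, then replicate the full stream.
run zfs snapshot -r tank@migrate1
run sh -c 'zfs send -R tank@migrate1 | zfs recv -F newtank'
```

Since `zfs send -R` preserves properties, checking `zfs get dedup` on the received datasets afterwards (and overriding with `zfs recv -o dedup=off` if needed) would be prudent.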
> 
> Gregor
> 
> Am 02.05.2012 21:03, schrieb Steve Wilson:
> 
> Hi,
> 
> Yesterday, I was running a scrub on one of our backup servers while
> also trying to delete some snapshots. I did have dedup enabled on this
> pool but decided to turn it off yesterday and eventually rebuild the
> pool with it disabled. This morning nothing had progressed (the scrub
> or the snapshot deletion) so I decided to reboot the server and start
> again. After the reboot, the zpool status command shows the scrub is
> still active but it's not making any progress (1 byte per second???)
> and attempts to stop the scrub just hang. Trying to look at the zpool
> history also hangs. So now I have three processes in uninterruptible
> sleep state:
> 
> 
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 465 1.0 0.0 0 0 ? D< 11:30 2:00 [txg_sync]
> root 3983 0.0 0.0 30580 1572 pts/0 D+ 13:40 0:00 zpool scrub -s tank
> root 6107 0.0 0.0 30624 1572 pts/1 D+ 14:38 0:00 zpool history
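Processes stuck like the three above can be found by filtering on the `D` (uninterruptible sleep) state flag; a minimal sketch using only standard ps and awk, nothing ZFS-specific:

```shell
# List tasks currently in uninterruptible sleep (STAT begins with D),
# e.g. a stalled txg_sync thread or a hung zpool command.
# Keeps the header line (NR == 1) for readability.
ps axo pid,stat,comm | awk 'NR == 1 || $2 ~ /^D/'
```

Such processes cannot be killed; they only clear when the kernel operation they are blocked in completes (or after a reboot), which matches the behavior described in this thread.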
> Here's what zpool status shows:
> 
> 
> root at mckinley:~# zpool status
> pool: tank
> state: ONLINE
> scan: scrub in progress since Tue May 1 10:00:01 2012
> 193G scanned out of 5.43T at 1/s, (scan is slow, no estimated time)
> 0 repaired, 3.47% done
> config:
> 
> NAME STATE READ WRITE CKSUM
> tank ONLINE 0 0 0
> raidz2-0 ONLINE 0 0 0
> wwn-0x600050e026ed9000357700004b260000 ONLINE 0 0 0
> wwn-0x600050e026edd600b333000067be0000 ONLINE 0 0 0
> wwn-0x600050e026edef003273000088420000 ONLINE 0 0 0
> wwn-0x600050e026edf9003d87000026b50000 ONLINE 0 0 0
> wwn-0x600050e026edfe00178300003b520000 ONLINE 0 0 0
> wwn-0x600050e026ee03002edb000018ce0000 ONLINE 0 0 0
> wwn-0x600050e026ee0d00085b000026b50000 ONLINE 0 0 0
> wwn-0x600050e026ee120068fb0000af920000 ONLINE 0 0 0
> 
> errors: No known data errors
> My environment:
> Ubuntu Lucid 10.04 but running 3.0.0-19 kernel from the Oneiric
> backports
> latest ZFS modules from PPA for Lucid stable
> using all default ZFS module parameters
> 
> UPDATE:
> Just as I was ready to send this I noticed that the scrub was finally
> cancelled after about 45 minutes. I'm now attempting to start another
> scrub but that command hasn't returned to the prompt yet and I'm
> seeing it stuck in the uninterruptible sleep state also:
> root 6592 0.0 0.0 30580 1572 pts/0 D+ 14:52 0:00 zpool scrub tank
> 
> Any ideas of what I should do to get this ZFS pool operational again?
> 
> Thanks,
> Steve
