[zfs-discuss] Stalled scrub / txg_sync

Steve Wilson stevew at purdue.edu
Wed May 2 15:03:11 EDT 2012


Yesterday, I was running a scrub on one of our backup servers while also 
trying to delete some snapshots.  I had dedup enabled on this pool but 
decided yesterday to turn it off and eventually rebuild the pool without 
it.  This morning neither the scrub nor the snapshot deletion had 
progressed, so I decided to reboot the server and start again.  After 
the reboot, zpool status shows the scrub as still active but not making 
any progress (1 byte per second???), and attempts to stop the scrub just 
hang.  Trying to look at the zpool history also hangs.  So now I have 
three processes in uninterruptible sleep (D) state:

    root       465  1.0  0.0      0     0 ?        D<   11:30   2:00
    root      3983  0.0  0.0  30580  1572 pts/0    D+   13:40   0:00 zpool scrub -s tank
    root      6107  0.0  0.0  30624  1572 pts/1    D+   14:38   0:00 zpool history
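To pick out stuck processes like these from a longer listing, here is a minimal sketch that filters `ps aux`-style rows for a STAT field beginning with D (uninterruptible sleep). It assumes the standard BSD-style eleven-column layout shown above; `d_state_processes` is a hypothetical helper, not part of any ZFS tooling.

```python
def d_state_processes(ps_lines):
    """Return (pid, stat, command) for every `ps aux`-style row whose STAT
    column starts with 'D', i.e. the process is in uninterruptible sleep."""
    stuck = []
    for line in ps_lines:
        # Assumed columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
        # maxsplit=10 keeps a multi-word COMMAND together in the last field.
        fields = line.split(None, 10)
        if len(fields) < 10 or not fields[1].isdigit():
            continue  # header line or anything that is not a process row
        pid, stat = int(fields[1]), fields[7]
        # The COMMAND column may be missing, as in the first row below.
        command = fields[10].rstrip() if len(fields) > 10 else ""
        if stat.startswith("D"):
            stuck.append((pid, stat, command))
    return stuck


# The three rows quoted in this message.
SAMPLE = [
    "root       465  1.0  0.0      0     0 ?        D<   11:30   2:00",
    "root      3983  0.0  0.0  30580  1572 pts/0    D+   13:40   0:00 zpool scrub -s tank",
    "root      6107  0.0  0.0  30624  1572 pts/1    D+   14:38   0:00 zpool history",
]

for pid, stat, command in d_state_processes(SAMPLE):
    print(pid, stat, command or "(no command shown)")
```

On a live system the same function could be fed `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout.splitlines()`; the header row is skipped because its PID column is not numeric.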

Here's what zpool status shows:

    root at mckinley:~# zpool status
       pool: tank
      state: ONLINE
      scan: scrub in progress since Tue May  1 10:00:01 2012
         193G scanned out of 5.43T at 1/s, (scan is slow, no estimated time)
         0 repaired, 3.47% done

         NAME                                        STATE     READ WRITE CKSUM
         tank                                        ONLINE       0     0     0
           raidz2-0                                  ONLINE       0     0     0
             wwn-0x600050e026ed9000357700004b260000  ONLINE       0     0     0
             wwn-0x600050e026edd600b333000067be0000  ONLINE       0     0     0
             wwn-0x600050e026edef003273000088420000  ONLINE       0     0     0
             wwn-0x600050e026edf9003d87000026b50000  ONLINE       0     0     0
             wwn-0x600050e026edfe00178300003b520000  ONLINE       0     0     0
             wwn-0x600050e026ee03002edb000018ce0000  ONLINE       0     0     0
             wwn-0x600050e026ee0d00085b000026b50000  ONLINE       0     0     0
             wwn-0x600050e026ee120068fb0000af920000  ONLINE       0     0     0

    errors: No known data errors
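As a sanity check on the scan line, the arithmetic behind "193G scanned out of 5.43T ... 3.47% done", and why no estimate is shown at 1/s, can be sketched as below. It assumes zpool status uses 1024-based size suffixes and that the displayed sizes are rounded, so the recomputed percentage is only approximate; `to_bytes` and `scrub_progress` are hypothetical helpers.

```python
# Assumed 1024-based suffixes, as printed by zpool status.
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40, "P": 1 << 50}
SECONDS_PER_YEAR = 365 * 24 * 3600


def to_bytes(size: str) -> float:
    """Convert a size such as '193G' or '5.43T' to bytes."""
    suffix = size[-1].upper()
    if suffix in UNITS:
        return float(size[:-1]) * UNITS[suffix]
    return float(size)  # plain byte count, no suffix


def scrub_progress(scanned: str, total: str, rate_bytes_per_s: float):
    """Return (percent done, seconds remaining at the given scan rate)."""
    done, whole = to_bytes(scanned), to_bytes(total)
    if rate_bytes_per_s <= 0:
        eta_s = float("inf")  # no progress at all: no finite estimate
    else:
        eta_s = (whole - done) / rate_bytes_per_s
    return 100.0 * done / whole, eta_s


pct, eta = scrub_progress("193G", "5.43T", 1)
print(f"{pct:.2f}% scanned; about {eta / SECONDS_PER_YEAR:,.0f} years to go at 1 B/s")
```

At the reported rate the remaining ~5.2T would take on the order of a hundred thousand years, which is effectively a stall rather than a slow scrub.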

My environment:
     Ubuntu Lucid 10.04, but running the 3.0.0-19 kernel from Oneiric
     latest ZFS modules from the stable PPA for Lucid
     using all default ZFS module parameters

Just as I was about to send this, I noticed that the scrub had finally 
been cancelled after about 45 minutes.  I'm now attempting to start 
another scrub, but that command hasn't returned to the prompt yet and I 
see it stuck in uninterruptible sleep as well:

    root      6592  0.0  0.0  30580  1572 pts/0    D+   14:52   0:00 zpool scrub tank

Any ideas of what I should do to get this ZFS pool operational again?

