[zfs-discuss] Suspiciously frequent (and spurious) resilvering

Jason Marshall jasonm at katalystdm.com
Thu May 17 12:16:51 EDT 2018


Hi All, long-time user, first-time poster.

For the first time ever, we're seeing behavior from ZFS on Linux that we 
can't explain.  Sorry in advance if this is common knowledge or really 
obvious for some reason; I really did try to find answers by googling.

Anyway, we have zfs v0.7.6 installed on CentOS 6.9.  The machine has 48x 
10 TB disks in it, and here's the pool config.  As you can see, it's 
resilvering now:

[root@houdm-pool12 ~]# zpool status
  pool: datapool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat May 12 23:25:11 2018
	142T scanned out of 392T at 385M/s, 189h15m to go
	2.77T resilvered, 36.23% done
config:

	NAME                            STATE     READ WRITE CKSUM
	datapool                        DEGRADED     0     0     0
	  raidz2-0                      DEGRADED     0     0     0
	    wwn-0x5000cca25111ba00      ONLINE       0     0     0
	    spare-1                     DEGRADED     0     0     0
	      wwn-0x5000cca251a9bb00    FAULTED      2     0     0  too many errors
	      wwn-0x5000cca266b61ac0    ONLINE       0     0     0  (resilvering)
	    wwn-0x5000cca251b6dd34      ONLINE       0     0     0
	    wwn-0x5000cca251ce675c      ONLINE       0     0     0
	    wwn-0x5000cca26c1a23ac      ONLINE       0     0     0
	    wwn-0x5000cca26b5a7718      ONLINE       0     0     0
	    wwn-0x5000cca266b50800      ONLINE       0     0     0
	    wwn-0x5000cca26c14975c      ONLINE       0     0     0
	    wwn-0x5000cca26c17417c      ONLINE       0     0     0
	    wwn-0x5000cca26c18af50      ONLINE       0     0     0
	    wwn-0x5000cca26c18cf7c      ONLINE       0     0     0
	    wwn-0x5000cca26c18e0c0      ONLINE       0     0     0
	  raidz2-1                      ONLINE       0     0     0
	    wwn-0x5000cca26c190a14      ONLINE       0     0     0
	    wwn-0x5000cca26c198e64      ONLINE       0     0     0
	    wwn-0x5000cca26c19c82c      ONLINE       0     0     0
	    wwn-0x5000cca26c19c890      ONLINE       0     0     0
	    wwn-0x5000cca26c19c934      ONLINE       0     0     0
	    wwn-0x5000cca26c19cec0      ONLINE       0     0     0
	    wwn-0x5000cca26c19db0c      ONLINE       0     0     0
	    wwn-0x5000cca26c19dd3c      ONLINE       0     0     0
	    wwn-0x5000cca26c19df80      ONLINE       0     0     0
	    wwn-0x5000cca26c19e1e0      ONLINE       0     0     0
	    wwn-0x5000cca26c1a2114      ONLINE       0     0     0
	  raidz2-4                      ONLINE       0     0     0
	    wwn-0x5000cca26b020d2c      ONLINE       0     0     0
	    wwn-0x5000cca26b0374e0      ONLINE       0     0     0
	    wwn-0x5000cca26b03ed68      ONLINE       0     0     0
	    wwn-0x5000cca26b06f314      ONLINE       0     0     0
	    wwn-0x5000cca26b071e20      ONLINE       0     0     0
	    wwn-0x5000cca26b076b90      ONLINE       0     0     0
	    wwn-0x5000cca26b0776d8      ONLINE       0     0     0
	    wwn-0x5000cca26b094ae4      ONLINE       0     0     0
	    wwn-0x5000cca26b09c1fc      ONLINE       0     0     0
	    wwn-0x5000cca26b09e438      ONLINE       0     0     0
	    wwn-0x5000cca26b09fa2c      ONLINE       0     0     0
	    wwn-0x5000cca26b09fc84      ONLINE       0     0     0
	  raidz2-5                      ONLINE       0     0     0
	    wwn-0x5000cca2663488c8      ONLINE       0     0     0
	    wwn-0x5000cca266b80414      ONLINE       0     0     0
	    wwn-0x5000cca26b51d00c      ONLINE       0     0     0
	    wwn-0x5000cca26c014f20      ONLINE       0     0     0
	    wwn-0x5000cca26c017b94      ONLINE       0     0     0
	    wwn-0x5000cca26c019b98      ONLINE       0     0     0
	    wwn-0x5000cca26c01bbdc      ONLINE       0     0     0
	    wwn-0x5000cca26c01c2e8      ONLINE       0     0     0
	    wwn-0x5000cca26c01ddcc      ONLINE       0     0     0
	    wwn-0x5000cca26c01e2d4      ONLINE       0     0     0
	    wwn-0x5000cca26c020fb8      ONLINE       0     0     0
	logs
	  wwn-0x5002538c408ce8d0-part7  ONLINE       0     0     0
	  wwn-0x5002538c408ce8d6-part7  ONLINE       0     0     0
	cache
	  wwn-0x5002538c408ce8d0-part8  ONLINE       0     0     0
	  wwn-0x5002538c408ce8d6-part8  ONLINE       0     0     0
	spares
	  wwn-0x5000cca266b61ac0        INUSE     currently in use
	  wwn-0x5000cca266d2e504        AVAIL   

errors: No known data errors

I'll apologize up front for this being a bit on the rambling side; I'm 
not entirely sure how to describe what we're seeing.

First of all, I'm not sure WHY it is resilvering at all, since autoreplace 
is disabled:

[root@houdm-pool12 ~]# zpool get all | grep autoreplace
datapool  autoreplace                    off                            default

Yes, zed is running, and clearly it's identifying a problem and kicking 
off the resilver.  Normally we love it when this happens.  But SHOULD it 
be happening??

On this machine, it seems to have been resilvering for no reason we can 
determine, and that has happened several times since it was placed into 
service a few months ago.  The above zpool status output indicates two 
read errors, but there are no corresponding media errors in any of the 
logs, which leads me to believe the errors are really checksum 
mismatches.  Yet if a scrub runs, those errors seem to vanish.
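
(For what it's worth, "checking the logs" here amounts to roughly the 
following; /dev/sdX is just a stand-in for whichever device node the 
faulted wwn maps to.)

# SMART error counters on the suspect disk (smartmontools)
smartctl -a /dev/sdX | egrep -i 'realloc|pending|uncorrect'

# kernel-side I/O errors (CentOS 6, so plain syslog rather than journald)
dmesg | grep -i 'i/o error'
grep -i 'i/o error' /var/log/messages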

Is there a way to identify what ZFS thinks has gone wrong?  Or what zed 
has seen that makes it think something has gone haywire?  I'm having 
trouble making sense of the zed.d config files.  Maybe I just need to 
uncomment something in there?
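
In case it matters, this is where I've been looking so far (or plan to 
look next) for zed's side of the story -- corrections welcome if there's 
a better source of truth:

zpool events -v                  # the event stream zed reacts to (vdev state changes, io/cksum errors)
zpool history -i datapool        # internal pool history, including when each resilver was started
grep -i zed /var/log/messages    # whatever zed has logged via syslog
vi /etc/zfs/zed.d/zed.rc         # zed's settings file, sourced by the zed.d scripts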

On another machine with the same config, we are ending up with this kind 
of thing:

[...]
	    wwn-0x5000cca251b79b98      ONLINE       0     0     0
	    spare-8                     ONLINE       0     0     0
	      wwn-0x5000cca251c7b9d8    ONLINE       0     0     0
	      wwn-0x5000cca2568314fc    ONLINE       0     0     0
	    wwn-0x5000cca251ca10b0      ONLINE       0     0     0
[...]

It had been resilvering, but it must not have finished?  Or something?  
It may have been rebooted while it was resilvering -- this machine is 
rarely NOT resilvering, and we probably couldn't wait 300+ hours for it 
to finish.

So which disk is it actually using in this scenario?  Or is it being 
treated as a mirror, with writes going to both?  Are both disks subjected 
to the scrub process?  A scrub has completed and identified no errors on 
the original device, which leads me to believe the problem was never 
really there in the first place, despite it somehow being detected.  So 
now both the original disk and the spare that was partially(?) resilvered 
in appear to be healthy.
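
Assuming the answer is that spare-8 is effectively a small mirror of the 
original disk and the hot spare, I'm guessing the way to unwind it (once 
we trust the original again) is simply to detach the spare so it returns 
to the spares list -- something like this, taking the second device in 
spare-8 to be the hot spare:

zpool detach datapool wwn-0x5000cca2568314fc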

Because these machines are severe-duty workhorses that run at ~1.5 
GB/sec continuously to do the work we ask of them, the constant 
resilvering causes some serious slowdowns (and annoyed users).  We are 
about to remove the spares from the pool so that it cannot resilver onto 
anything until we manually add a spare back and run the zpool replace 
sequence ourselves (roughly as sketched below).  What I'd love is some 
way to dig into the problem ZFS believes it has, and avoid the long, 
long resilver.
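
Concretely, the plan looks something like this -- the wwn is the 
currently-AVAIL spare from the status output above, and <failed-wwn> is 
a placeholder for whichever disk actually dies:

# pull the unused hot spare out of the pool so nothing can auto-resilver onto it
# (the one that's INUSE would have to be detached from spare-1 first)
zpool remove datapool wwn-0x5000cca266d2e504

# later, when a disk genuinely fails, do the replacement by hand
zpool replace datapool <failed-wwn> wwn-0x5000cca266d2e504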

PS.  I know now that this configuration is not optimal - the resilvers 
run for far too long.  We're looking at a bunch of mirrors next time 
instead of raidz-anything.  But I'm not sure that will prevent what seem 
to be spurious resilvering events; it might just make them run a lot 
shorter...
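
(For completeness, the layout we're considering is just striped mirror 
pairs plus spares, roughly like this -- the device names here are made 
up, and a real pool would have many more pairs.)

zpool create newpool \
    mirror wwn-A1 wwn-A2 \
    mirror wwn-B1 wwn-B2 \
    mirror wwn-C1 wwn-C2 \
    spare  wwn-S1 wwn-S2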

Thanks in advance for any guidance here, folks...

---
Jason Marshall
Director, Information Technology
Katalyst Data Management
www.katalystdm.com | www.seismiczone.com

