[zfs-devel] self-healing IOs stats in zpool status

Tony Hutter hutter2 at llnl.gov
Tue Aug 1 21:01:06 EDT 2017


 > To be clear here, I think that what you're seeing is specifically
 > that readahead IO does not cause these updates. I don't believe that
 > regular reads are issued as speculative ZIOs, only readahead ones.
 > Since you're reading your test files with sequential IO, I believe that
 > most of the reads will be initially issued as speculative readaheads
 > and then the real user-level read()s will be satisfied from the cache.

Thanks for the clarification on that - yep, it's the speculative 
("read-ahead") IOs that are not getting recorded, not the regular ones.  
I can also confirm that self-healed writes don't increment the error 
counters either, but they do generate events.  That can lead to some 
funny results if you happen to have zed running, since zed faults the 
device off the events even though every counter in 'zpool status' still 
reads zero:

   pool: mypool
  state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
     Sufficient replicas exist for the pool to continue functioning in a
     degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
     repaired.
   scan: none requested
config:

     NAME        STATE     READ WRITE CKSUM
     mypool      DEGRADED     0     0     0
       mirror-0  DEGRADED     0     0     0
         md100   FAULTED      0     0     0  too many errors
         md101   ONLINE       0     0     0




It seems that self-healed writes don't get counted because they're EIO + 
no retry:

vdev_stat_update(zio_t *zio, uint64_t psize)
...
         /*
          * If this is an I/O error that is going to be retried, then ignore the
          * error.  Otherwise, the user may interpret B_FAILFAST I/O errors as
          * hard errors, when in reality they can happen for any number of
          * innocuous reasons (bus resets, MPxIO link failure, etc).
          */
         if (zio->io_error == EIO &&
             !(zio->io_flags & ZIO_FLAG_IO_RETRY)) {
                 return;
         }
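
As a rough sketch (not the actual patch, which may end up looking quite 
different), one option would be to let self-heal repair writes fall 
through to the error accounting instead of being swallowed by that 
early return, keying off the existing ZIO_FLAG_SELF_HEAL flag:

         /*
          * Sketch only -- not the actual patch.  Let self-heal repair
          * writes fall through to the error accounting below instead
          * of being dropped as a non-retried EIO.
          */
         if (zio->io_error == EIO &&
             !(zio->io_flags & ZIO_FLAG_IO_RETRY) &&
             !(zio->io_flags & ZIO_FLAG_SELF_HEAL)) {
                 return;
         }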


 > I can't think of a good reason why these shouldn't be counted in the
 > current read and checksum errors counts. I'm actively surprised (and
 > slightly disturbed) that they aren't already

Brian was able to get me an answer on the historical reasons:

"It turns out the way the speculative IOs are issued at the moment 
bypasses the interlock in the ZIO pipeline.  That means it's possible 
you could issue a speculative IO for a block which was freed and and 
then potentially reallocated with new data. That'll get caught an 
treated as a checksum error and we don't want to report it as a false 
positive. Normal demand read have a waiter and honor the interlock so 
this can't happen in that case."


So for any fix we do, we'll need to be careful to not include any 
speculative false-positive checksum errors.
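
Roughly speaking (this is a paraphrase, not the real stats code), the 
accounting in any fix would want to keep skipping speculative ZIOs when 
bumping the per-vdev counters, so a checksum failure on a block that 
was freed and reused never shows up as a real error:

         /*
          * Sketch only, reusing the existing counter names from
          * vdev_stat_t; the real change may be structured differently.
          * Speculative read-ahead ZIOs are skipped so a block that was
          * freed and reallocated doesn't get counted as a checksum
          * error.
          */
         if (type == ZIO_TYPE_READ &&
             !(zio->io_flags & ZIO_FLAG_SPECULATIVE)) {
                 if (zio->io_error == ECKSUM)
                         vs->vs_checksum_errors++;
                 else
                         vs->vs_read_errors++;
         }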


 > If there is a good reason not to count these in the current
 > read and checksum stats, I would definitely support adding
 > additional stats counters for them. I would even go so far
 > as having 'zpool status' automatically add them to the real
 > read/checksum counts unless you give it a flag to turn this
 > off.

I agree, especially with the part about adding the self-healed IOs into 
the regular totals *by default*, since I think that's what people 
expect.  I'm already working on a patch. Here's some sample output 
showing 40 self-healed read errors, plus 5 regular read errors ('-s' = 
"show self-healed"):

$ zpool status -s

   pool: mypool
  state: ONLINE
   scan: none requested
config:
                                                Non-Fatal
     NAME        STATE     READ WRITE CKSUM  READ WRITE CKSUM
     mypool      ONLINE       0     0     0     0     0     0
       mirror-0  ONLINE       0     0     0     0     0     0
         md100   ONLINE      45     0     0    40     0     0
         md101   ONLINE       0     0     0     0     0     0
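
Under the hood the patch will need somewhere to keep the non-fatal 
counts per vdev.  Purely as an illustration (these field names are 
hypothetical, not what the patch actually uses), that could be a few 
extra counters alongside the existing vs_read_errors / vs_write_errors 
/ vs_checksum_errors:

         /*
          * Hypothetical names, for illustration only -- not the actual
          * patch.  One counter per column in the "Non-Fatal" group
          * shown above.
          */
         uint64_t vs_healed_read_errors;     /* read errors repaired from a good copy */
         uint64_t vs_healed_write_errors;    /* self-heal repair writes that failed */
         uint64_t vs_healed_checksum_errors; /* checksum errors repaired from a good copy */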
