[zfs-discuss] Permanent errors in older snapshots

Håkan Johansson h_t_johansson at fastmail.fm
Sun Dec 11 04:21:19 EST 2016


As the old snapshots share the same data blocks on disk, they would all
have / show the same error.  Even older snapshots, as well as newer ones,
would not have the problem if they reference another version of the
file, i.e. one where at least the now-bad block was different.
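
A quick way to see which snapshots still reference the bad block is to
try to read the file from each snapshot through the hidden .zfs
directory.  Only a rough sketch; the mountpoint /backup/zfs-backup and
the file path are just guessed from the zpool status output quoted
further down:

  # try the file in every snapshot of the dataset
  for snap in /backup/zfs-backup/.zfs/snapshot/*; do
      if cat "$snap/Installs/clonezilla/live/filesystem.squashfs" >/dev/null 2>&1; then
          echo "OK   $snap"    # this snapshot holds a readable (different) copy
      else
          echo "BAD  $snap"    # damaged (or missing) in this snapshot
      fi
  done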


Deleting all the bad snapshots should then make the problem disappear.
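
For example (only a sketch; the snapshot name is the one reported in the
zpool status quoted further down, and every snapshot showing the error
would need the same treatment):

  # destroy one affected snapshot, then re-scrub to see
  # whether the error moves on to the next snapshot or is gone
  zfs destroy backup/zfs-backup@move_backup_to_4tb_external_sep25_2014
  zpool scrub backup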

If one wants to retain some snapshot, one would have to first clone it
(to get a writable instance), and then overwrite or delete the affected
file in that clone.
(But I'm not really a ZFS expert, so better get confirmation on this
from someone else before proceeding!)
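
Roughly like this (again just a sketch; the clone name, its mountpoint
and the source of the known-good file are made up):

  # make a writable clone of the snapshot one wants to keep
  zfs clone backup/zfs-backup@move_backup_to_4tb_external_sep25_2014 \
            backup/zfs-backup-keep
  # replace (or simply remove) the damaged file in the clone
  cp /path/to/known-good/filesystem.squashfs \
     /backup/zfs-backup-keep/Installs/clonezilla/live/filesystem.squashfs
  # the origin snapshot cannot be destroyed while the clone depends on it;
  # 'zfs promote backup/zfs-backup-keep' removes that dependency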


For some expert: I am a bit curious as to what may have happened.  Since
snapshots cannot be written, it cannot be metadata suffering some
bitflip while read into memory to apply some other change and then
written back with a faulty checksum.  It should rather have been the
block on disk that changed.  Assuming the disks are not lying when they
say that they return good data (no disk I/O errors), most likely
something has by accident overwritten some sector(s) of the disk, so
the contents no longer match the checksums.  As it is a raidz2, which
can reconstruct a block even when up to two of the devices holding its
stripe return bad data, corresponding sectors on several disks must
have suffered this accidental overwrite.


Best regards,

Håkan





On Sat, Dec 10, 2016, at 11:37 PM, devsk via zfs-discuss wrote:

> Hi,
>
>  I think this might have been discussed here before because I am very
>  sure I myself ran into this issue several years ago.
>
> One fine day, after the update to v0.6.5.8-r0-gentoo, a scrub found a
> file with a permanent error (I guess it means that it can't correct the
> blocks in error) in an old snapshot. The file is unreadable
> (Input/Output error) in all snapshots taken after that one, spread over
> the years since.
> 

>  # zpool status -v
>    pool: backup
>   state: DEGRADED
>  status: One or more devices has experienced an error resulting in data
>          corruption.  Applications may be affected.
>  action: Restore the file in question if possible.  Otherwise restore the
>          entire pool from backup.
>     see: http://zfsonlinux.org/msg/ZFS-8000-8A
>    scan: scrub repaired 0 in 9h16m with 1 errors on Sat Dec 10 23:19:10 2016
>  config:
> 
>          NAME                                            STATE     READ WRITE CKSUM
>          backup                                          DEGRADED     0     0     1
>            raidz2-0                                      DEGRADED     0     0     2
>              ata-WDC_WD4001FAEX-00MJRA0_WD-WCCxxxxxxxxx  ONLINE       0     0     0
>              ata-ST4000VN000-1H4168_Z302C80M             ONLINE       0     0     0
>              ata-WDC_WD4001FAEX-00MJRA0_WD-WCCxxxxxxxxx  ONLINE       0     0     0
>              /mnt/serviio/4tbFile                        OFFLINE      0     0     0
> 
>  errors: Permanent errors have been detected in the following files:
> 
>          backup/zfs-backup@move_backup_to_4tb_external_sep25_2014:/Installs/clonezilla/live/filesystem.squashfs
> 

>  I tried to read the file in all subsequent snapshots (using the
>  .zfs/snapshot folder) since Sept 2014, and it's unreadable in all of
>  them. I can copy the correct file over and take new snapshots, and
>  those are all fine. I can keep deleting snapshots, and the scrub keeps
>  pointing to the next snapshot.
> 

>  None of the 3 disks in there shows any pending, uncorrectable or
>  reallocated sectors. Overall disk health is fine, and a scrub has
>  never failed on them for as many months back as I can remember (and
>  as far back as zpool history shows).
> Any ideas? I remember I had to restore from backup (of the backup, in
> this case) last time I ran into this. Is there any other way? It's a
> pain to start over.
> Also, I want to add a 4TB disk to replace 4tbFile, but I am wondering
> whether the resilver will even succeed in this state. I am afraid it
> will fail at this snapshot and be a waste of time.
> Thanks

>  -devsk




