[zfs-discuss] Permanent errors in older snapshots

Richard Elling richard.elling at richardelling.com
Fri Dec 23 15:46:59 EST 2016


> On Dec 23, 2016, at 10:38 AM, Håkan Johansson via zfs-discuss <zfs-discuss at list.zfsonlinux.org> wrote:
> 
> I have been thinking about the problem below (original mail) for a while.
> 
> The issue in short:
> 
> if a scrub finds a file which has permanent errors (i.e. no good copies can be found or reconstructed), there usually is one way to recover/clear the error without having to recreate the entire filesystem: delete the file in question.  If the error is in a file which is part of a snapshot, the snapshot cannot be fixed as it cannot be changed.  Thus the only way of clearing the error from the pool is to destroy the entire snapshot.
> 
> Destroying an entire snapshot is sometimes a rather heavy-handed solution.
> 
> Would it make sense to introduce a list of known (and acknowledged) damaged blocks that scrub would ignore and not report during checking?  Acknowledgement from the user would be by issuing some zpool/zfs command to add the blocks to the list.  Normal reads would still generate I/O errors.  The blocks could be stored as offset+expected checksum pairs, and would thus not allow any other brokenness to pass.
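To make the proposal concrete, here is a toy model in Python of an acknowledged-damage list stored as (offset, expected checksum) pairs. Everything in it is hypothetical: `ToyPool`, `acknowledge` and `scrub` are illustrative names, not zpool/zfs commands or ZFS internals; the "pool" is just a dict mapping offsets to (on-disk data, recorded checksum); and SHA-256 stands in for the block checksum (ZFS uses fletcher4 by default for data).

```python
import errno
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> bytes:
    # Stand-in for the per-block checksum (assumption: SHA-256 instead of fletcher4).
    return hashlib.sha256(data).digest()


@dataclass
class ToyPool:
    # offset -> (data currently on disk, checksum recorded when the block was written)
    blocks: dict = field(default_factory=dict)
    # Hypothetical acknowledged-damage list of (offset, expected checksum) pairs.
    acked: set = field(default_factory=set)

    def write(self, offset: int, data: bytes) -> None:
        self.blocks[offset] = (data, checksum(data))

    def corrupt(self, offset: int, garbage: bytes) -> None:
        # Simulate unrecoverable damage: the data changes, the recorded checksum does not.
        _, expected = self.blocks[offset]
        self.blocks[offset] = (garbage, expected)

    def acknowledge(self, offset: int) -> None:
        # Hypothetical admin command: remember this exact damaged block.
        _, expected = self.blocks[offset]
        self.acked.add((offset, expected))

    def read(self, offset: int) -> bytes:
        # Normal reads ignore the list and still fail with EIO on a mismatch.
        data, expected = self.blocks[offset]
        if checksum(data) != expected:
            raise OSError(errno.EIO, "checksum mismatch")
        return data

    def scrub(self) -> list:
        # Report damaged offsets, skipping only pairs that were acknowledged.
        errors = []
        for offset, (data, expected) in sorted(self.blocks.items()):
            if checksum(data) != expected and (offset, expected) not in self.acked:
                errors.append(offset)
        return errors
```

Because each entry keys on the expected checksum as well as the offset, acknowledging block 0 silences scrub only for that exact damaged block: `read(0)` still raises an EIO error, and if offset 0 is later reused for different data that also gets damaged, `scrub()` reports it again.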

Why would we want to make a list of the things we don’t want listed?

Asking this another way, why would you want to destroy good data, unique to the snapshot,
because some of the data is lost?

IMNSHO, the best approach is to acknowledge that errors exist and be happy that you 
might not care anymore. If you think of this like the defect lists on a disk drive, then 
you know that defects exist and are known. There is no need to destroy any snapshot
just because it contains a defect, just like there is no need to try and buy a disk with zero
defects.
 — richard 

> 
> Best regards,
> Håkan
> 
> 
> On Sat, Dec 10, 2016, at 11:37 PM, devsk via zfs-discuss wrote:
>> 
>> Hi,
>> 
>> I think this might have been discussed here before because I am very sure I myself ran into this issue several years ago.
>> 
>> One fine day after the update to v0.6.5.8-r0-gentoo, a scrub found a permanent error (I guess that means it can't correct the blocks in error) in a file in an old snapshot. The file is also unreadable (Input/Output error) in all snapshots taken after that one, even though they were taken over the years since.
>> 
>> # zpool status -v
>>   pool: backup
>>  state: DEGRADED
>> status: One or more devices has experienced an error resulting in data
>>         corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
>>         entire pool from backup.
>>    see: http://zfsonlinux.org/msg/ZFS-8000-8A
>>   scan: scrub repaired 0 in 9h16m with 1 errors on Sat Dec 10 23:19:10 2016
>> config:
>> 
>>         NAME                                            STATE     READ WRITE CKSUM
>>         backup                                          DEGRADED     0     0     1
>>           raidz2-0                                      DEGRADED     0     0     2
>>             ata-WDC_WD4001FAEX-00MJRA0_WD-WCCxxxxxxxxx  ONLINE       0     0     0
>>             ata-ST4000VN000-1H4168_Z302C80M             ONLINE       0     0     0
>>             ata-WDC_WD4001FAEX-00MJRA0_WD-WCCxxxxxxxxx  ONLINE       0     0     0
>>             /mnt/serviio/4tbFile                        OFFLINE      0     0     0
>> 
>> errors: Permanent errors have been detected in the following files:
>> 
>>         backup/zfs-backup@move_backup_to_4tb_external_sep25_2014:/Installs/clonezilla/live/filesystem.squashfs
>> 
>> 
>> I tried to read the file in all subsequent snapshots since Sept 2014 (using the .zfs/snapshot folder), and it's unreadable in all of them. I can copy the correct file over and take snapshots, and they are all fine. I can keep deleting snapshots, but the scrub keeps pointing to the next snapshot.
>> 
>> None of the 3 disks in there shows any pending, uncorrectable or reallocated sectors. Overall health of the disks is fine, and scrub has not failed on them for as many months as I can remember (and as far back as zpool history shows).
>> 
>> Any ideas? I remember I had to restore from backup (of backup in this case) last time I ran into this. Is there any other way? It's a pain to start over.
>> 
>> Also, I want to add a 4TB disk to replace 4tbFile but I am wondering if the resilver will even succeed in this state. I am afraid it will fail at this snapshot and it will be a waste of time.
>> 
>> 
>> Thanks
>> -devsk
>> 
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss at list.zfsonlinux.org
>> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss
> 

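devsk's manual check (reading the damaged file through `.zfs/snapshot` in every later snapshot) can be scripted. The sketch below is a hypothetical helper, not a ZFS tool: `check_snapshots` is an illustrative name, the mountpoint `/backup/zfs-backup` in the example comment assumes the dataset's default mountpoint, and snapshots are visited in name order rather than creation order. It reads every byte because ZFS verifies checksums per block on read, so a partial read could miss the damaged block.

```python
import errno
import os
import sys


def check_snapshots(mountpoint: str, relpath: str, chunk: int = 1 << 20) -> dict:
    """Try to read relpath in every snapshot under mountpoint/.zfs/snapshot.

    Returns {snapshot_name: status}, where status is "ok", "missing" or "EIO".
    """
    snapdir = os.path.join(mountpoint, ".zfs", "snapshot")
    relpath = relpath.lstrip("/")  # keep os.path.join from discarding the prefix
    results = {}
    for snap in sorted(os.listdir(snapdir)):  # name order, not creation order
        path = os.path.join(snapdir, snap, relpath)
        try:
            with open(path, "rb") as f:
                # Read the whole file so every block gets checksummed.
                while f.read(chunk):
                    pass
            results[snap] = "ok"
        except FileNotFoundError:
            results[snap] = "missing"
        except OSError as e:
            # Unrecoverable checksum failures surface to applications as EIO;
            # any other error is unexpected here, so re-raise it.
            if e.errno != errno.EIO:
                raise
            results[snap] = "EIO"
    return results


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example (paths are assumptions):
    #   python check_snaps.py /backup/zfs-backup /Installs/clonezilla/live/filesystem.squashfs
    for name, status in check_snapshots(sys.argv[1], sys.argv[2]).items():
        print(f"{name}: {status}")
```

A snapshot that reports EIO contains at least one block of the file that could not be read or repaired; "missing" means the file does not exist in that snapshot.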