[zfs-discuss] zfs-discuss Digest, Vol 20, Issue 25

devsk internet_everyone at yahoo.com
Sun Dec 11 13:39:35 EST 2016


Hakan,

 > I am a bit curious as to what may have happened?

Yes, same here. I suspect this may be a ZFS bug, something like an 
off-by-one error that only triggers in very rare circumstances.


 > If one wants to retain some snapshot, one would have to first clone it
 > (to get a writable instance), and then overwrite or delete the affected
 > file in that clone.

Do you think this will work given that the same file is in that bad 
state in pretty much every snapshot? Is there an order I should follow 
when creating clones from the snapshots (oldest to newest)? I am ready 
to put in the effort because I want to retain some of the snapshots.
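
To make the plan concrete, here is a rough, untested sketch of the 
per-snapshot sequence as I understand the suggestion. It is a small 
Python helper that only builds and prints the commands; it runs nothing. 
The dataset names, the "keep-" clone naming, the file path, and the 
assumption that a clone mounts at /<clone name> are all made up for 
illustration, and I have not confirmed that this actually clears the 
reported error.

```python
import shlex


def clone_cleanup_commands(snapshot, bad_file):
    """Build (but do not run) the commands for one snapshot.

    snapshot: full snapshot name, e.g. "tank/home@2016-01-01" (hypothetical).
    bad_file: path of the damaged file relative to the dataset root.
    """
    if "@" not in snapshot:
        raise ValueError("expected a snapshot name of the form dataset@snap")

    dataset, snapname = snapshot.split("@", 1)
    pool = dataset.split("/", 1)[0]

    # Hypothetical naming scheme for the writable clone.
    clone = "{}/keep-{}".format(pool, snapname)

    # Assumption: the clone is mounted at the default location /<clone name>.
    path_in_clone = "/{}/{}".format(clone, bad_file)

    return [
        # 1. Create a writable clone of the read-only snapshot.
        "zfs clone {} {}".format(shlex.quote(snapshot), shlex.quote(clone)),
        # 2. Delete the affected file inside the clone.
        "rm -- {}".format(shlex.quote(path_in_clone)),
        # 3. Check which files the pool still lists as having errors.
        "zpool status -v {}".format(shlex.quote(pool)),
    ]


if __name__ == "__main__":
    # Snapshots listed oldest first, which is the order I was asking about.
    for snap in ["tank/home@2016-01-01", "tank/home@2016-06-01"]:
        for cmd in clone_cleanup_commands(snap, "docs/bad.bin"):
            print(cmd)
```

Printing the commands rather than executing them lets me review each 
step before touching the pool.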

-devsk


> Date: Sun, 11 Dec 2016 01:21:19 -0800
> From: Håkan Johansson <h_t_johansson at fastmail.fm>
> To: zfs-discuss at list.zfsonlinux.org
> Subject: Re: [zfs-discuss] Permanent errors in older snapshots
> Message-ID:
> 	<1481448079.1853349.815186785.0F5D27AE at webmail.messagingengine.com>
> Content-Type: text/plain; charset="utf-8"
>
> As the old snapshots share the same data blocks on disk, they would all
> have / show the same error.  Other snapshots, whether older or newer,
> would not have the problem if they are logical 'copies' of another
> version of the file, i.e. one where at least the now-bad block was
> different.
>
>
> Deleting all the bad snapshots should then make the problem disappear.
>
> If one wants to retain some snapshot, one would have to first clone it
> (to get a writable instance), and then overwrite or delete the affected
> file in that clone.
> (But I'm not really a ZFS expert, so better get some other confirmation
> on this before proceeding!)
>
>
> For some expert: I am a bit curious as to what may have happened.  Since
> snapshots cannot be written, it cannot be metadata that suffered a
> bitflip while read into memory to apply some other change and was then
> written back with a faulty checksum.  It should rather have been the
> block on disk that changed.  Assuming the disks are not lying when they
> say that they return good data (no disk I/O errors), most likely
> something has accidentally overwritten some sector(s) of the disk, so
> the contents no longer match the checksums.  As it is a raidz2,
> corresponding sectors on multiple disks must actually have suffered
> this accidental overwrite.
>
>
> Best regards,
>
> Håkan


