[zfs-discuss] Recovering data from a mirror

Stephan Stachurski ses1984 at gmail.com
Sat Dec 16 11:11:28 EST 2017


Sorry, I'm not quite sure how to reply in the right thread -- I think
something was off with my mailing list settings and I didn't receive
Bernd Z's last message:

> What is the exact output of "zpool import"?

> Are you sure you created a mirror in the first place?

Here's the output of zpool import with a single disk plugged in:

   pool: homer1
     id: 1662440015038107701
  state: DEGRADED
 status: One or more devices were being resilvered.
 action: The pool can be imported despite missing or damaged devices.  The
	fault tolerance of the pool may be compromised if imported.
 config:

	homer1                             DEGRADED
	  mirror-0                         DEGRADED
	    4133932251356825588            UNAVAIL
	    ata-TP06001GB_TPW160115510226  UNAVAIL
	    ata-TP06001GB_TPW160115510257  ONLINE


The confusing part was that the output said "The pool can be
imported," yet the import actually failed. I tried each of the four
disks individually, and it turned out that only the first one I tried
had this problem: it would not import even though `zpool import` told
me it could.
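
For reference, this is roughly how I've been invoking the import (the
by-id directory is just where my disks show up on this box):

    # scan the by-id directory for importable pools
    zpool import -d /dev/disk/by-id

    # then attempt a read-only import of the pool it reports
    zpool import -d /dev/disk/by-id -o readonly=on homer1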

Each of the three importable disks contains a dataset that is
completely toasted. When I `ls` the dataset, some files show ???
instead of attributes like permissions and atime; attempting to read
those files returns "Invalid exchange". Other files do show
attributes, but attempting to read them returns "Input/output error".
Every time I touch the toasted dataset, the checksum error counters
climb like crazy.
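
The per-file damage shows up in zpool status as well; the documented
-v flag lists the files with permanent errors:

    zpool status -v homer1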

For now I have my two new 8TB drives imported in read-only mode,
serving the good datasets, and the two original 6TB drives are set
aside until I figure out what to do with them.

Another mistake I should probably mention: I think I used `detach` a
few times when I should have used `replace` (see the sketch just
below).
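
As I understand it, the difference matters a lot here (the 8TB device
name below is a placeholder, not my actual disk id):

    # detach permanently removes a member from the mirror; the detached
    # disk is no longer a self-contained, importable copy of the pool
    zpool detach homer1 ata-TP06001GB_TPW160115510226

    # replace resilvers the new device in place of the old one while
    # the rest of the mirror keeps providing redundancy
    zpool replace homer1 ata-TP06001GB_TPW160115510226 <new-8TB-disk>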

Anyway, I'm still hoping that I can recover something from the bad
dataset, which is why I have the 8TB drives imported read-only.

I'm quite certain that the 6TB drives must hold most, if not all, of
the original dataset intact, but ZFS is being conservative and not
letting me read that data at all.

There is one reason the data might be gone: because I used detach
instead of replace, the resilver that started when I plugged the
drives back in might have overwritten good data with garbage.
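
Before letting anything write to the 6TB drives again, I plan to peek
at their on-disk labels to see which txg they last saw (zdb -l reads
the vdev labels; the device path is from my setup):

    zdb -l /dev/disk/by-id/ata-TP06001GB_TPW160115510226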

How can I ask ZFS to return whatever data is there, ignoring checksum errors?
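
As far as I can tell there is no supported "ignore checksums on read"
switch (the checksum property only affects newly written blocks), so
the closest documented option I've found is a rewind import. A dry run
would look something like:

    # -F rewinds to an earlier txg if needed; -n only reports whether
    # that would succeed, without actually importing
    zpool import -o readonly=on -F -n homer1

There is apparently also an extreme-rewind -X variant, but I'm wary of
running that against my only copies.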


On Fri, Dec 15, 2017 at 9:20 AM, Stephan Stachurski <ses1984 at gmail.com>
wrote:

> I'm not quite certain how to import just a single drive; everything
> I've tried results in "cannot import: one or more devices is
> currently unavailable".
>
> On Tue, Dec 12, 2017 at 3:49 AM, Sam Van den Eynde <svde.tech at gmail.com>
> wrote:
>
>> The first thing I would try is to connect only the last original 6TB
>> drive and import it read-only. That one should still be consistent.
>>
>> On Dec 12, 2017 7:03 AM, "Stephan Stachurski via zfs-discuss" <
>> zfs-discuss at list.zfsonlinux.org> wrote:
>>
>> Hi
>>
>> Long story short: I tried to grow a 2x6TB mirror onto 2x8TB drives. I
>> saw a lot of checksum errors, so I tried to roll back, and ended up
>> with corrupted data. Now I'm trying to figure out how to recover as
>> much as possible, because restoring from backup is pretty
>> inconvenient.
>>
>> I'm using ECC memory, and the CPU and memory tests come back OK. I
>> think the cables and other associated hardware are fine too, because
>> I've been swapping disks between the slots of this bad array and
>> those of another, good array, and the good array keeps running
>> smoothly with zero checksum errors.
>>
>> I started with 2x6TB drives in a mirror configuration. I wanted to
>> replace these drives with 8TB ones, so I first attached an 8TB disk
>> to the 2x6TB mirror. It resilvered overnight, and I saw 150 checksum
>> errors.
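>>
>> The attach itself was the stock command, something like this (the 8TB
>> device name is a placeholder, not my actual disk id):
>>
>>     zpool attach homer1 ata-TP06001GB_TPW160115510226 <new-8TB-disk>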
>>
>> I didn't think much of it. I thought maybe the drive was bad, or
>> there was some hiccup in the system. I shut down the system so I
>> could pull one 6TB drive and add the second 8TB drive. I added it,
>> and it also started to resilver.
>>
>> I think my first mistake was adding the second drive to a potentially
>> unhealthy mirror in read/write mode instead of read-only. But we can
>> come back to this later.
>>
>> After the second 8TB drive resilvered, the two 8TB drives each showed
>> about 2k checksum errors.
>>
>> I shut down, pulled both 8TB drives, and added the pulled 6TB drive
>> back to the system. I booted, ran zpool import, and saw bad things:
>> half my data was gone. When I ls, I can see the file names, but the
>> attributes are question marks, and when I try to stat those files I
>> get "Invalid exchange".
>>
>> I shut down and pulled both 6TB drives. This time I added the 8TB
>> drives and ran zpool import read-only; things look better, but ~15%
>> of my data is still messed up.
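>>
>> The read-only import is nothing exotic, just the documented option:
>>
>>     zpool import -o readonly=on homer1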
>>
>> I'm not sure what I did wrong, but I did not expect this kind of
>> corruption just from adding and removing disks in a mirror. I knew I
>> had checksum errors, but I thought ZFS would have me covered; I did
>> not think there was even a slim chance of corrupting data at rest
>> merely by adding or removing drives. I also knew I had a potentially
>> degraded mirror that I did not import read-only, but I did not think
>> that would cause old data to go bad, because ZFS checksums and
>> redundancy should stop that from happening.
>>
>> So, if nothing else, let this be a lesson: ZFS doesn't magically save
>> you from being stupid and/or from bad hardware.
>>
>> Anyway, I was wondering if there is a process I could follow to
>> recover as much as possible from the array. I still have a hard time
>> believing that data I wrote to the 2x6TB disks a year ago somehow
>> went bad. Between the original 2x6TB disks and the new 2x8TB disks
>> that resilvered successfully, I should probably be able to recover
>> most if not all of the data. Maybe I'm being naively optimistic. It
>> would be a pain in the butt to restore a couple of TB from optical
>> media.
>>
>> Thanks,
>> Steve
>>
>>
>>
>>
>
>
> --
> Stephan E Stachurski
> 773-315-1684
> ses1984 at gmail.com
>



-- 
Stephan E Stachurski
773-315-1684
ses1984 at gmail.com