[zfs-discuss] Recovering data from a mirror

Stephan Stachurski ses1984 at gmail.com
Tue Dec 12 01:03:19 EST 2017


Long story short: I tried to grow a 2x6TB mirror to 2x8TB drives. I saw a
lot of checksum errors so I tried to roll back, and ended up with corrupted
data. I'm trying to figure out how to recover as much as possible, because
restoring from backup is pretty inconvenient.

I'm using ECC memory, and CPU and memtests are OK. I think cables and other
associated bits of hardware are OK because I've been swapping slots of
disks in this bad array with disks in another good array, and the other good
array is running smoothly, zero checksum errors.

I started with 2x6TB drives in a mirror configuration. I wanted to replace
these drives with 8TB drives so I first attached an 8TB disk to the 2x6TB
mirror. It resilvered overnight and I saw 150 checksum errors.

I didn't think much of it. I thought maybe the drive was bad, or there was
some hiccup in the system. I shut down the system so I could pull 1x 6TB
drive and add the second 8TB drive. I added it and it also started to
resilver.

I think my first mistake was adding the 2nd drive to a potentially
unhealthy mirror using read/write mode instead of read only. But we can get
back to this later.

After the 2nd 8TB drive resilvered, the two 8TB drives now have 2k checksum
errors each.

I shut down, pulled both 8TB drives, and added the pulled 6TB drive back
to the system. I booted, ran zpool import, and saw bad things. Half my data
is gone. When I ls, I can see the file names there, but the attrs are
question marks, and when I try to stat those files, I see "Invalid exchange".

I shut down and pulled both 6TB drives. This time I added the 8TB drives and
imported the pool read-only, and things look better, but still ~15% of my
data is messed up.
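
To put a real number on "messed up" instead of eyeballing ls output, one
option is to walk the read-only mount and record every path whose stat
fails. Below is a minimal Python sketch of that idea, not a ZFS tool. The
function name find_unreadable and the mount point /tank in the usage note
are made up for illustration, and it assumes a damaged entry shows up as an
OSError from stat, which is what the "Invalid exchange" message suggests.

```python
import os


def find_unreadable(root, stat=os.lstat):
    """Walk root and return (readable, unreadable) lists of paths.

    An entry is unreadable if stat() on it raises OSError, or if it is
    a directory whose contents cannot be listed.
    """
    status = {}  # path -> True if it looks readable, False otherwise

    def on_walk_error(err):
        # os.walk could not list this directory; mark it as bad even if
        # a plain stat on it succeeded earlier.
        status[err.filename] = False

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                stat(path)
            except OSError:
                status[path] = False
            else:
                status[path] = True

    readable = sorted(p for p, ok in status.items() if ok)
    unreadable = sorted(p for p, ok in status.items() if not ok)
    return readable, unreadable
```

Calling find_unreadable("/tank") once per set of disks would give two lists
of bad paths that can be compared. Note that this only checks metadata; a
file can stat fine and still fail when its contents are read.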

I'm not sure what I did wrong, but I did not expect this kind of corruption
just from adding and removing disks from a mirror. I know I had checksum
errors, but I thought zfs would have me covered. I did not think there was
even a slim chance that I could have corrupted data at rest from just
adding or removing drives from a mirror. I know I had a potentially
degraded mirror that I did not mount read only, but I did not think doing
so would cause old data to go bad, because zfs checksums and redundancy
would stop that from happening.

So, if nothing else, let this be a lesson. ZFS doesn't magically save you
from being stupid and/or from bad hardware.

Anyway, I was wondering if there was a process I could follow to attempt to
recover as much as possible from the array. I still have a hard time
believing that data I wrote to the 2x6TB disks a year ago somehow went bad.
I think between the original 2x6TB disks, and the new 2x8TB disks that
resilvered successfully, I should probably be able to recover most if not
all of the data. Maybe I'm being naively optimistic. It would be a pain in
the butt to restore a couple TB from optical media.
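
The kind of process I have in mind is a salvage copy: import one set of
disks read-only, copy out every file that reads cleanly, then repeat with
the other set into the same destination so that only the files that failed
both times need to come from backup. A rough Python sketch under those
assumptions follows. The name salvage_tree and the paths in the usage note
are hypothetical, it assumes a read of a damaged file fails with an OSError
rather than returning bad data, and it does nothing to repair the pool.

```python
import os
import shutil


def salvage_tree(src_root, dst_root, copy=shutil.copy2):
    """Copy every file under src_root that can be read into dst_root.

    Files already present in dst_root are skipped, so the function can be
    run again against a second source. Returns the relative paths of the
    files that could not be copied on this pass.
    """
    failed = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if os.path.exists(dst):
                continue  # salvaged on an earlier pass
            try:
                copy(src, dst)
            except OSError:
                # Drop any partial copy so a later pass can retry it.
                if os.path.lexists(dst):
                    os.remove(dst)
                failed.append(os.path.relpath(src, src_root))
    return failed
```

For example, salvage_tree("/tank", "/mnt/rescue") with the 8TB pair
imported, then the same call with the 6TB pair imported. This is only safe
if nothing was written to the pool between the two states, since a file
skipped on the second pass is never compared against the first copy.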
