[zfs-discuss] cannot import 'home': I/O error Destroy and re-create the pool from a backup source
anton.gubarkov at iits.ru
Fri Apr 27 07:06:12 EDT 2018
my copying has completed. I could save each and every file from the video
archive dataset (I was most anxious to save it).
I could not save any of my lab VM images on zvols :-(. They are lab
machines anyway, I can rebuild them in some time.
Here are the final stats on checksum errors:
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
scan: scrub repaired 0B in 0 days 09:58:16 with 0 errors on Wed Mar 14 09:21:55 2018
NAME                              STATE     READ WRITE CKSUM
home                              ONLINE       0     0 1.80K
  raidz2-0                        ONLINE       0     0 9.03K
    wwn-0x5000c500a41a0a00        ONLINE       0     0     0
    wwn-0x5000c500a41ae340        ONLINE       0     0     0
    wwn-0x5000c500a41b4c57        ONLINE       0     0 38.0K
    wwn-0x5000c500a41b7572        ONLINE       0     0     0
    wwn-0x5000c500a41ba99c        ONLINE       0     0     0
    wwn-0x5000c500a41babe8        ONLINE       0     0 37.5K
  wwn-0x30000d1700d9d40f-part2    ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
home/VM/WIN10PRO-1@MSI-enable:<0x1>
The procedure I used to get to this point:
1. zpool import -F -o readonly=on home - failure
2. zpool import -FX -o readonly=on home - failure
3. many attempts of zpool import -T <txg> -o readonly=on home - failure
4. discovery of broken ZIL and attempt to import -m with all above
combinations - failure
5. Side suggestion from Richard to offline the cache device; I removed the
device file from /dev - failure
6. following another thread, I dared to build zfs from
https://github.com/zfsonlinux/zfs/pull/7459 (the branch link) and
set the zfs_dbgmsg_enable=1 module parameter
7. Following /proc/spl/kstat/zfs/dbgmsg suggested that I had some txgs
with very low metadata corruption (1-2 items), but didn't have any txg
8. Chris suggested a way to ignore metadata corruption and try pool
import anyway - echo 0 >/sys/module/zfs/parameters/spa_load_verify_metadata
9. I used zdb -d -e home to find out txg data for the snapshots I had in
my pool. I made a list of txgs for snapshots in my video dataset and
started to do import -T <txg> -m -o readonly=on -R <mountpoint>. Corrupted
txgs resulted in zfs kernel thread oopses and the host had to be rebooted.
The 3rd tried txg resulted in a successful import and mounting of datasets.
10. I started copying the files from the datasets. I used rsync rather
than zfs send/receive so I could see which files I could or couldn't
salvage. I tried using dd to copy the zvols to image files, but couldn't
due to I/O errors. I could copy all files from my video dataset.
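For readers landing here from a search, the successful part of the procedure (steps 8-10) condenses into roughly the following command sequence. This is a dry-run sketch: the pool name "home" is from this thread, but the mountpoint /mnt/rescue and the txg numbers are placeholders, and the function only prints the commands rather than running them, since the real thing needs root and the damaged pool.

```shell
# Hypothetical dry-run sketch of steps 8-10 above; it echoes each
# command instead of executing it.
recovery_commands() {
    POOL=home
    MNT=/mnt/rescue

    # Step 8: skip metadata verification during import (Chris's hint).
    echo "echo 0 > /sys/module/zfs/parameters/spa_load_verify_metadata"

    # Step 9: dump dataset/snapshot info from the exported pool to
    # collect candidate txg numbers, then try each one, newest first.
    echo "zdb -d -e $POOL"
    for txg in 4151200 4150100 4149000; do   # placeholder txg values
        echo "zpool import -T $txg -m -o readonly=on -R $MNT $POOL"
    done

    # Step 10: once an import sticks, salvage file data with rsync
    # and zvols with dd.
    echo "rsync -aHAX $MNT/video/ /backup/video/"
    echo "dd if=/dev/zvol/$POOL/VM/WIN10PRO-1 of=/backup/WIN10PRO-1.img bs=1M"
}
recovery_commands
```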
Thanks everyone for the helpful suggestions. I do hope this thread can
help others in despair.
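One aside on the dd part of step 10 (my suggestion, not something reported in the thread): plain dd aborts on the first read error, which is likely why the zvol copies failed outright. GNU dd's conv=noerror,sync options skip unreadable blocks and pad them with zeros, which can salvage the readable parts of an image:

```shell
# Hypothetical helper, not from the thread: copy a block device (or
# file) while skipping unreadable blocks instead of aborting.
# conv=noerror continues past read errors; conv=sync pads each short
# or failed read out to the full block size with zeros, so offsets in
# the output image stay aligned with the source.
salvage() {
    dd if="$1" of="$2" bs=4k conv=noerror,sync 2>/dev/null
}

# Against a zvol it would look like (device path from this pool):
#   salvage /dev/zvol/home/VM/WIN10PRO-1 /backup/WIN10PRO-1.img
```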
On Thu, Apr 26, 2018 at 11:01 PM Anton Gubar'kov <anton.gubarkov at iits.ru> wrote:
> Dear friends,
> I used zdb -d to display data about the snapshots I have in the pool's
> datasets. I checked the txg numbers of the snapshots in the dataset I'm
> most anxious to recover and started my import attempts from the most recent
> working through to the past. The 3rd one proved to be working. I had to
> restart my host after every unsuccessful attempt due to a zfs freeze.
> The copy is still running and will continue for at least 6 more hours.
> The current zpool status -v looks like:
> pool: home
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
> see: http://zfsonlinux.org/msg/ZFS-8000-8A
> scan: scrub repaired 0B in 0 days 09:58:16 with 0 errors on Wed Mar 14
> 09:21:55 2018
> NAME                              STATE     READ WRITE CKSUM
> home                              ONLINE       0     0   122
>   raidz2-0                        ONLINE       0     0   514
>     wwn-0x5000c500a41a0a00        ONLINE       0     0     0
>     wwn-0x5000c500a41ae340        ONLINE       0     0     0
>     wwn-0x5000c500a41b4c57        ONLINE       0     0     7
>     wwn-0x5000c500a41b7572        ONLINE       0     0     0
>     wwn-0x5000c500a41ba99c        ONLINE       0     0     0
>     wwn-0x5000c500a41babe8        ONLINE       0     0     8
>   wwn-0x30000d1700d9d40f-part2    ONLINE       0     0     0
> errors: Permanent errors have been detected in the following files:
> I don't really care about the home/users/anrdey dataset where permanent
> errors are reported. I don't understand the error stats, though. What do
> checksum errors at the pool and raidz2-0 vdev levels mean? They keep
> growing while the device-level checksum errors stay constant.
> There have been no read errors reported to the copying process so far
> (around 1TB of data has been copied already). There have been no messages
> in the zfs debug log since the import completed.
> On Thu, Apr 26, 2018 at 6:23 PM Raghuram Devarakonda via zfs-discuss <
> zfs-discuss at list.zfsonlinux.org> wrote:
>> On Thu, Apr 26, 2018 at 11:12 AM, Anton Gubar'kov via zfs-discuss
>> <zfs-discuss at list.zfsonlinux.org> wrote:
>> > Chris, thank you very much for the hint! After a couple of panics, I
>> > found the intact txg and imported the pool, rewinding it to one of the
>> > snapshots' txgs. I'm copying the contents now. I understand that I may
>> > not be able to copy everything, but this is better than losing everything.
>> That's great. Can you please describe how you figured out the valid txg?