[zfs-discuss] cannot import 'home': I/O error Destroy and re-create the pool from a backup source

Bryn Hughes zfs at nashira.ca
Fri Apr 27 11:54:35 EDT 2018


Glad you got it working!!

Do we have any hints as to the cause of the initial corruption?

Bryn

On 2018-04-27 04:06 AM, Anton Gubar'kov via zfs-discuss wrote:
> Hi, friends
>
> My copying has completed. I was able to save each and every file from
> the video archive dataset (the one I was most anxious to save).
> I could not save any of my lab VM images on zvols :-(. They are only
> lab machines anyway; I can rebuild them in time.
>
> Here are the final stats on checksum errors:
>   pool: home
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://zfsonlinux.org/msg/ZFS-8000-8A
>   scan: scrub repaired 0B in 0 days 09:58:16 with 0 errors on
>         Wed Mar 14 09:21:55 2018
> config:
>
>         NAME                              STATE     READ WRITE CKSUM
>         home                              ONLINE       0     0 1.80K
>           raidz2-0                        ONLINE       0     0 9.03K
>             wwn-0x5000c500a41a0a00        ONLINE       0     0     0
>             wwn-0x5000c500a41ae340        ONLINE       0     0     0
>             wwn-0x5000c500a41b4c57        ONLINE       0     0 38.0K
>             wwn-0x5000c500a41b7572        ONLINE       0     0     0
>             wwn-0x5000c500a41ba99c        ONLINE       0     0     0
>             wwn-0x5000c500a41babe8        ONLINE       0     0 37.5K
>         logs
>           wwn-0x30000d1700d9d40f-part2    ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         home/VM/WIN10PRO-1@MSI-enable:<0x1>
>         home/VM/WIN10PRO-1:<0x1>
>         home/users/anrdey:<0x0>
>
> The procedure I used to get to this point:
>
>  1. zpool import -F -o readonly=on home - failure
>  2. zpool import -FX -o readonly=on home - failure
>  3. many attempts of zpool import -T <txg> -o readonly=on home - failure
>  4. discovery of a broken ZIL and attempts to import with -m added to
>     all the combinations above - failure
>  5. a side suggestion from Richard to offline the cache device; I
>     removed its device file from /dev - failure
>  6. following another thread, I dared to build zfs from
>     https://github.com/zfsonlinux/zfs/pull/7459 (the branch link) and
>     set the zfs_dbgmsg_enable=1 module parameter
>  7. watching /proc/spl/kstat/zfs/dbgmsg suggested that some txgs had
>     very little metadata corruption (1-2 items), but no txg was
>     completely clean
>  8. Chris suggested a way to ignore metadata corruption and try the
>     pool import anyway: echo 0
>     >/sys/module/zfs/parameters/spa_load_verify_metadata
>  9. I used zdb -d -e home to find out the txg data for the snapshots
>     in my pool. I made a list of txgs for the snapshots in my video
>     dataset and started doing import -T <txg> -m -o readonly=on -R
>     <mountpoint>. Corrupted txgs resulted in oopses in the zfs kernel
>     threads, and the host had to be rebooted. The 3rd txg I tried
>     resulted in a successful import and mounting of the datasets. Bingo!
> 10. I started copying the files from the datasets. I used rsync
>     rather than zfs send/receive so I could see which files I
>     could/couldn't salvage, and dd to copy the zvols to image files.
>     I couldn't copy the zvols due to I/O errors, but I could copy all
>     files from my video dataset. A condensed command sketch follows
>     this list.
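>
> For reference, here is the whole sequence condensed into shell
> commands. The txg number and the destination paths below are
> illustrative, not the exact ones I used:
>
>     # enable the internal debug log and watch it during import attempts
>     echo 1 >/sys/module/zfs/parameters/zfs_dbgmsg_enable
>     cat /proc/spl/kstat/zfs/dbgmsg
>
>     # allow the import to proceed despite metadata corruption
>     echo 0 >/sys/module/zfs/parameters/spa_load_verify_metadata
>
>     # list datasets/snapshots of the exported pool to pick candidate txgs
>     zdb -d -e home
>
>     # rewind import, read-only, ignoring the broken ZIL (-m), mounted
>     # under an alternate root; retry with an older txg after every oops
>     zpool import -T 2845213 -m -o readonly=on -R /mnt/rescue home
>
>     # salvage plain files; rsync reports per-file failures
>     rsync -av /mnt/rescue/home/video/ /backup/video/
>
>     # salvage zvols to image files; conv=noerror,sync skips unreadable
>     # blocks (it did not help for my VM images, but it may for others)
>     dd if=/dev/zvol/home/VM/WIN10PRO-1 of=/backup/WIN10PRO-1.img \
>         bs=1M conv=noerror,sync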
>
>
> Thanks, everyone, for the helpful suggestions. I do hope this thread
> can help others in despair.
>
> On Thu, Apr 26, 2018 at 11:01 PM Anton Gubar'kov
> <anton.gubarkov at iits.ru> wrote:
>
>     Dear friends,
>     I used zdb -d to display data about the snapshots in the pool's
>     datasets. I checked the txg numbers of the snapshots in the
>     dataset I was most anxious to recover and started my import
>     attempts from the most recent one, working back into the past.
>     The 3rd one proved to work. I had to restart my host after every
>     unsuccessful attempt due to a zfs freeze.
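>
>     In case it helps others: the txg numbers come straight out of the
>     zdb listing - every snapshot line carries a cr_txg (creation txg)
>     field. The output looks roughly like this (the names and numbers
>     below are made up for illustration):
>
>         zdb -d -e home | grep '@'
>         Dataset home/video@2018-03-01 [ZPL], ID 142, cr_txg 2845213, ...
>         Dataset home/video@2018-03-08 [ZPL], ID 167, cr_txg 2903477, ...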
>
>     The copy still runs and will do for at least 6 more hours.
>     The current zpool status -v output looks like:
>       pool: home
>      state: ONLINE
>     status: One or more devices has experienced an error resulting in data
>             corruption. Applications may be affected.
>     action: Restore the file in question if possible.  Otherwise
>             restore the entire pool from backup.
>        see: http://zfsonlinux.org/msg/ZFS-8000-8A
>       scan: scrub repaired 0B in 0 days 09:58:16 with 0 errors on
>             Wed Mar 14 09:21:55 2018
>     config:
>
>             NAME                              STATE     READ WRITE CKSUM
>             home                              ONLINE       0     0   122
>               raidz2-0                        ONLINE       0     0   514
>                 wwn-0x5000c500a41a0a00        ONLINE       0     0     0
>                 wwn-0x5000c500a41ae340        ONLINE       0     0     0
>                 wwn-0x5000c500a41b4c57        ONLINE       0     0     7
>                 wwn-0x5000c500a41b7572        ONLINE       0     0     0
>                 wwn-0x5000c500a41ba99c        ONLINE       0     0     0
>                 wwn-0x5000c500a41babe8        ONLINE       0     0     8
>             logs
>               wwn-0x30000d1700d9d40f-part2    ONLINE       0     0     0
>
>     errors: Permanent errors have been detected in the following files:
>
>             home/users/anrdey:<0x0>
>
>     I don't really care about the home/users/anrdey dataset where the
>     permanent errors are reported, but I don't understand the error
>     stats. What do checksum errors at the pool and raidz2-0 vdev
>     levels mean? They keep growing while the device-level checksum
>     errors stay flat. No read error has been reported to the copying
>     process so far (around 1TB of data is copied already), and there
>     have been no messages in the zfs debug log since the import
>     completed.
>     thanks
>
>
>     On Thu, Apr 26, 2018 at 6:23 PM Raghuram Devarakonda via
>     zfs-discuss <zfs-discuss at list.zfsonlinux.org> wrote:
>
>         On Thu, Apr 26, 2018 at 11:12 AM, Anton Gubar'kov via
>         zfs-discuss <zfs-discuss at list.zfsonlinux.org> wrote:
>         > Chris, thank you very much for the hint! After a couple of
>         > panics, I could find an intact txg and import the pool,
>         > rewinding it to one of the snapshots' txgs. I'm copying the
>         > contents now. I understand that I may not be able to copy
>         > everything, but this is better than losing everything.
>
>         That's great. Can you please describe how you figured out the
>         valid txg?
