[zfs-discuss] cannot import 'home': I/O error Destroy and re-create the pool from a backup source

Anton Gubar'kov anton.gubarkov at gmail.com
Sun Apr 22 08:56:37 EDT 2018


Thanks a lot for the advice. The rig is not bootable, so I am using a
SystemRescueCd live system to inspect and repair it; there is no
auto-import in that environment.
I ran dd of=/dev/null against all of my pool's devices (all 6 raidz2
members, the ZIL and the cache) overnight. All reads were successful, with
no dmesg messages whatsoever related to storage/SCSI/ZFS. The overnight dd
run produced only:

[11271.659763] perf: interrupt took too long (2513 > 2500), lowering
kernel.perf_event_max_sample_rate to 79000
[15930.168897] perf: interrupt took too long (3153 > 3141), lowering
kernel.perf_event_max_sample_rate to 63000

I don't believe this signifies any problems with my disks/controllers.
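
For reference, the overnight read check was simply something along these
lines, run once per device (the block size is approximate):

# read the whole device end to end and discard the data; repeated for the
# other 5 raidz2 members, the ZIL and the cache device
dd if=/dev/disk/by-id/wwn-0x5000c500a41a0a00 of=/dev/null bs=1M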

All 6 devices and the ZIL show an identical txg in zdb -l output.
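
For example (assuming the usual whole-disk -part1 partitions):

# print the ZFS labels of one member and pull out the txg lines; the value
# is the same across all 6 drives and the ZIL
zdb -l /dev/disk/by-id/wwn-0x5000c500a41a0a00-part1 | grep txg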

Replacing all 6 drives is overkill for my budget at the moment, and it
defeats the very purpose I built this pool for: reliability, fault
tolerance and backup.

Is there a way to get more information about the import error? Are there
command-line switches or environment variables I could set? When I run
zpool import, neither dmesg nor the system log shows any messages, and the
diagnostic message produced by the zpool import command is not very helpful.
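
The only candidate I can think of is the internal ZFS debug log, though I'm
not sure it is the right knob, or that it is available in 0.7.7:

echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable   # enable the in-kernel debug message buffer
zpool import -N -f home                                 # retry the import
cat /proc/spl/kstat/zfs/dbgmsg                          # dump whatever ZFS logged internally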

The ZFS-8000-4J message referenced in the zpool import output relates to
missing/failed devices, but there are no OFFLINE/REMOVED/UNAVAIL devices
in the pool config. They are all present in /dev/disk/by-id and /dev, and
successful reads have been confirmed.

Thanks for your support.
KR
Anton.



On Sat, Apr 21, 2018 at 10:24 PM Daniel Armengod via zfs-discuss <
zfs-discuss at list.zfsonlinux.org> wrote:

> *Wait for someone more knowledgeable to provide advice*
>
> I was in a similar situation just yesterday: a 4-drive RAIDZ1, one drive
> completely dead and another one acting up.
>
> What I did (again: wait for someone else to provide input on this) was to:
>
> * Disable ZFS automatic import on boot (in my case, systemctl disable
> zfs-import.target). In fact, boot the system stripped of as many
> non-essential processes and services as you can.
>
> * Check dmesg for error messages. Disks acting up will leave a very
> recognizable pattern there.
>
> * Make a full non-destructive, read-only badblocks pass on each device.
> This will tell you whether they can withstand reads. If any disks are
> not-yet-dead-but-dying, the stress will leave traces in the kernel ring
> buffer; check dmesg regularly. Identify how many flaky drives you have,
> and pray you don't break the redundancy threshold. (A command sketch
> follows this list.)
>
> * Check the ZFS data structures with zdb. With the pool unmounted, for
> each member drive, run zdb -l /dev/<path_to_drive>. Make sure to provide
> the partition/slice number, even if you gave ZFS the whole disks to build
> the pool. Of particular note is the txg number: it should be the same in
> all devices. I believe it should be the same in *at least* n-2 devices for
> a raidz2.
>
> * Go back and read the zdb manual, it's quite interesting.
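>
> As a rough sketch of that read pass (the device name is a placeholder;
> plain badblocks defaults to a read-only test):
>
> badblocks -sv /dev/sdX     # -s shows progress, -v reports bad blocks as they are found
> dmesg | tail               # then check the ring buffer for fresh I/O errors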
>
> In my case, 2 of the remaining RAIDZ1 disks - the healthy ones - showed a
> txg number higher than the faulty one.
>
> zpool import tank -XF did the trick for me. I was lucky and able to
> recover all the data I thought I'd lose (if anything was lost, I've yet to
> notice; it mostly contains anime :P).
>
> After a successful recovery, re-import the pool with -o readonly=on and
> zfs send all the datasets you care about somewhere safe and reliable. Then
> you can reshuffle disks until you can trust your pool again.
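>
> Roughly something like this (the dataset, snapshot and destination names
> are placeholders, and it assumes a snapshot already exists to send):
>
> zpool import -o readonly=on home
> # -R sends the snapshot along with descendant datasets; -u leaves the copy unmounted
> zfs send -R home/data@latest | ssh backuphost zfs receive -u rescue/home-data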
>
> Best of luck,
>
> On 2018-04-21 19:36, Anton Gubar'kov via zfs-discuss wrote:
>
> Hi,
>
> My recent backup server freeze ended up with a non-importable pool. Since
> it is the backup server, I have no further backups, so following the
> diagnostic message is not an option for me. I would like to recover this
> pool, as it contains some valuable data I can never reproduce (a video
> archive).
>
> So my status today is:
>
> root at sysresccd /root % zpool import -N -f  home
> cannot import 'home': I/O error
>         Destroy and re-create the pool from
>         a backup source.
>
> root at sysresccd /root % zpool import -N
>    pool: home
>      id: 4810743847386909334
>   state: ONLINE
>  status: One or more devices contains corrupted data.
>  action: The pool can be imported using its name or numeric identifier.
>    see: http://zfsonlinux.org/msg/ZFS-8000-4J
>  config:
>
>         home                            ONLINE
>           raidz2-0                      ONLINE
>             wwn-0x5000c500a41a0a00      ONLINE
>             wwn-0x5000c500a41ae340      ONLINE
>             wwn-0x5000c500a41b4c57      ONLINE
>             wwn-0x5000c500a41b7572      ONLINE
>             wwn-0x5000c500a41ba99c      ONLINE
>             wwn-0x5000c500a41babe8      ONLINE
>         cache
>           sdj3
>         logs
>           wwn-0x30000d1700d9d40f-part2  ONLINE
>
> I tried import -F and import -FX too - no luck :-[
> I have reviewed all the similar cases that a Google search returned. I'm
> really confused, as I built the 6-drive raidz2 precisely for resilience,
> and now I face availability issues.
>
> Can someone experienced provide advice on recovery?
>
> My ZFS versions:
> v0.7.7-r0-gentoo
>
> This is also my root pool, so I can't boot my normal rig; I am booting a
> recovery environment based on the SystemRescueCd live system from
> https://wiki.gentoo.org/wiki/User:Fearedbliss.
>
>
> Thanks in advance.
> Anton.
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at list.zfsonlinux.org
> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss
>