[zfs-discuss] cannot import 'home': I/O error Destroy and re-create the pool from a backup source
anton.gubarkov at gmail.com
Sun Apr 22 08:56:37 EDT 2018
Thanks a lot for the advice. The rig is not bootable, so I am using a
rescuecd live system to inspect and repair it. There is no auto-import in it.
I ran dd with of=/dev/null against all of my pool's devices (all 6 raidz2
members, ZIL and cache) overnight. All reads were successful, with no
messages in dmesg whatsoever related to storage/scsi/zfs. The overnight dd
run produced only:
[11271.659763] perf: interrupt took too long (2513 > 2500), lowering
kernel.perf_event_max_sample_rate to 79000
[15930.168897] perf: interrupt took too long (3153 > 3141), lowering
kernel.perf_event_max_sample_rate to 63000
I don't believe this signifies any problems with my disks/controllers.
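A minimal sketch of such a read test, for anyone who wants to repeat it. The
helper name read_test and the device path in the usage comment are
placeholders for illustration, not the exact commands I typed:

```shell
#!/bin/sh
# read_test (hypothetical helper): read each given device or file end to
# end, discard the data, and report whether dd finished without an error.
read_test() {
    rc=0
    for dev in "$@"; do
        # bs is given in bytes (1 MiB) so the syntax works with any dd.
        if dd if="$dev" of=/dev/null bs=1048576 2>/dev/null; then
            echo "OK   $dev"
        else
            echo "FAIL $dev"
            rc=1
        fi
    done
    return $rc
}

# Usage (placeholder path, substitute every pool member, log and cache device):
# read_test /dev/disk/by-id/wwn-0x5000c500a41a0a00
```

A clean dd pass only shows that every sector could be read. It does not show
that the data read back is what ZFS expects, so dmesg should still be checked
afterwards.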
All 6 devices and the ZIL show an identical txg in the zdb -l output.
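A rough sketch of that comparison, assuming the labels printed by zdb -l
contain lines of the form "txg: <number>". The helper name label_txgs and the
device path in the usage comment are made up for illustration:

```shell
#!/bin/sh
# label_txgs (hypothetical helper): read zdb -l output on stdin and print
# each distinct value of the "txg:" field, one per line. Matching on the
# whole first field skips other keys such as "create_txg:".
label_txgs() {
    awk '$1 == "txg:" { print $2 }' | sort -u
}

# Usage (placeholder path, repeat for every member device or partition):
# zdb -l /dev/disk/by-id/wwn-0x5000c500a41a0a00-part1 | label_txgs
```

If every device prints the same single number, the labels agree on the last
committed transaction group.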
Replacing all 6 drives would be overkill for my budget at the moment and
would completely defeat the purpose I built this pool for: reliability,
fault tolerance and backup.
Is there a way to get more information about the import error? Are there
command line switches or environment variables I can set? When I run zpool
import, neither dmesg nor the system log shows any messages. The diagnostic
message produced by the zpool import command is not very helpful.
The ZFS-8000-4J message that is referenced in the zpool import output relates
to missing/failed devices, but there are no OFFLINE/REMOVED/UNAVAIL devices
in the pool config. They are all present in /dev/disk/by-id and /dev, and
successful reads are confirmed.
Thanks for your support.
On Sat, Apr 21, 2018 at 10:24 PM Daniel Armengod via zfs-discuss <
zfs-discuss at list.zfsonlinux.org> wrote:
> *Wait for someone more knowledgeable to provide advice*
> I was in a similar situation just yesterday. 4-drive RAIDz1, 1 drive
> completely dead, the other one had acted up.
> What I did (again: wait for someone else to provide input on this) was to:
> * Disable ZFS automatic import on boot (in my case, systemctl disable
> zfs-import.target). Actually, boot the system as stripped of non-essential
> processes and services as you can.
> * Check dmesg for error messages. Disks acting up will leave a very
> recognizable pattern there.
> * Make a full non-destructive read-only badblocks pass on each device.
> This will tell you if they can withstand reads. If any disks are
> not-yet-dead-but-dying the stress will leave logs in the kernel ring
> buffer; check dmesg regularly. Identify how many flaky drives you have.
> Pray you don't break the redundancy threshold.
> * Check the ZFS data structures with zdb. With the pool unmounted, for
> each member drive, run zdb -l /dev/<path_to_drive>. Make sure to provide
> the partition/slice number, even if you gave ZFS the whole disks to build
> the pool. Of particular note is the txg number: it should be the same in
> all devices. I believe it should be the same in *at least* n-2 devices for
> a raidz2.
> * Go back and read the zdb manual, it's quite interesting.
> In my case, 2 of the remaining RAIDZ1 disks - the healthy ones - showed a
> txg number higher than the faulty one.
> zpool import tank -XF did the trick for me. I was lucky and able to
> recover all the data (if anything was lost I've yet to notice, it mostly
> contains anime :P) I thought I'd lose.
> After a successful recovery, re-import it with -o readonly=on and zfs send
> all the datasets you care about somewhere safe and reliable. Then you can
> do disk reshuffling until you can trust your pool again.
> Best of luck,
> On 2018-04-21 19:36, Anton Gubar'kov via zfs-discuss wrote:
> My recent backup server freeze ended up with a non-importable pool. Since
> it's a backup server, I have no further backups, so following the diag
> message is not an option for me. I would like to recover this pool as it
> contains some valuable data I can never reproduce (video archive).
> So my status today is:
> root at sysresccd /root % zpool import -N -f home
> cannot import 'home': I/O error
> Destroy and re-create the pool from
> a backup source.
> root at sysresccd /root % zpool import -N
> pool: home
> id: 4810743847386909334
> state: ONLINE
> status: One or more devices contains corrupted data.
> action: The pool can be imported using its name or numeric identifier.
> see: http://zfsonlinux.org/msg/ZFS-8000-4J
> home ONLINE
> raidz2-0 ONLINE
> wwn-0x5000c500a41a0a00 ONLINE
> wwn-0x5000c500a41ae340 ONLINE
> wwn-0x5000c500a41b4c57 ONLINE
> wwn-0x5000c500a41b7572 ONLINE
> wwn-0x5000c500a41ba99c ONLINE
> wwn-0x5000c500a41babe8 ONLINE
> wwn-0x30000d1700d9d40f-part2 ONLINE
> I tried import -F and import -FX too - no luck :-[
> I have reviewed all the similar cases that a Google search returned. I'm
> really confused as I built the 6-drive raidz2 just for the resilience
> and now I face availability issues.
> Can someone experienced provide advice on recovery?
> My ZFS versions:
> This is also my root pool, so I can't boot my normal rig and am booting the
> recovery environment using the https://wiki.gentoo.org/wiki/User:Fearedbliss
> systemrescuecd-based live system.
> Thanks in advance.
> zfs-discuss mailing list
> zfs-discuss at list.zfsonlinux.org
> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss