[zfs-discuss] cannot import 'home': I/O error Destroy and re-create the pool from a backup source

Jeff Johnson jeff.johnson at aeoncomputing.com
Tue Apr 24 10:49:09 EDT 2018


Anton,

If you run `zdb -lu <blockdevice> | grep "txg =" | sort -nr -k3` you will
get a reverse-chronological list of transaction group (txg) IDs. You can
start with the most recent txg and work backwards, trying to import the pool
at an earlier state from before the event that created the issue.

`zpool import -T <txg#> -o readonly=on <poolname>`
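
For example (the txg values below are made up for illustration; your own labels
will show different numbers), run it against one of the member disks:

    zdb -lu /dev/disk/by-id/wwn-0x5000c500a41a0a00-part1 | grep "txg =" | sort -nr -k3
        txg = 4183921
        txg = 4183920
        txg = 4183919
        ...

Then walk backwards from the newest txg, importing read-only and without mounting:

    zpool import -T 4183921 -o readonly=on -N home
    zpool import -T 4183920 -o readonly=on -N home
    ...

(-T may not be documented in the zpool man page on your version.) If an import
at some txg succeeds read-only, copy your data off before attempting anything
read-write.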

--Jeff

On Tue, Apr 24, 2018 at 7:42 AM, Антон Губарьков <anton.gubarkov at iits.ru>
wrote:

> Hi, Jeff.
> Thanks for the heads-up. I've always wanted to understand what I should put
> as the txg.
>
> How do I arrive at a sound number?
>
> Tue, Apr 24, 2018, 17:23 Jeff Johnson <jeff.johnson at aeoncomputing.com>:
>
>> Anton,
>>
>> I don't know whether your environment is dual-host or not, but your error
>> presents like a split-brain scenario. I also noticed that your ZFS version
>> is 0.7.7; this version has been identified as having a data corruption
>> issue ( https://github.com/zfsonlinux/zfs/issues/7401 ). 7401 doesn't
>> present like your error, but it is an additional facet to consider. Perhaps
>> build 0.7.8 and use $buildroot/cmd/zpool/zpool to try the import with -F
>> -X or -T <txg> to get it imported.
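>>
>> Roughly something like this (the tag name and in-tree paths are from memory,
>> so double-check them against the repo):
>>
>>     git clone https://github.com/zfsonlinux/zfs.git
>>     cd zfs && git checkout zfs-0.7.8
>>     ./autogen.sh && ./configure && make -j$(nproc)
>>
>> Then try the freshly built tool in place, dry-run first (-n) and read-only,
>> before resorting to the extreme rewind:
>>
>>     ./cmd/zpool/zpool import -o readonly=on -F -n home
>>     ./cmd/zpool/zpool import -o readonly=on -F -X home
>>
>> The built binary can run from the source tree without installing, though the
>> matching kernel modules may also need to be built and loaded first.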
>>
>> --Jeff
>>
>> On Tue, Apr 24, 2018 at 6:44 AM, Anton Gubar'kov via zfs-discuss <
>> zfs-discuss at list.zfsonlinux.org> wrote:
>>
>>> Here is the output of:
>>> home64 ~ # zdb -scL -eG home
>>>
>>> Traversing all blocks to verify metadata checksums ...
>>>
>>> 44.4M completed (   4MB/s) estimated time remaining: 623hr 36min 21sec
>>>     zdb_blkptr_cb: Got error 52 reading <0, 32, 0, 1a7>  -- skipping
>>> 1.04G completed (   9MB/s) estimated time remaining: 261hr 50min 42sec
>>>     zdb_blkptr_cb: Got error 52 reading <0, 51, 1, 3>  -- skipping
>>> 2.91G completed (   9MB/s) estimated time remaining: 255hr 55min 16sec
>>>     zdb_blkptr_cb: Got error 52 reading <0, 153, 1, 0>  -- skipping
>>> 4.46G completed (  10MB/s) estimated time remaining: 235hr 32min 31sec
>>>     zdb_blkptr_cb: Got error 52 reading <0, 292, 1, 3>  -- skipping
>>> 6.18G completed (  11MB/s) estimated time remaining: 217hr 02min 56sec
>>>     zdb_blkptr_cb: Got error 52 reading <0, 396, 1, 0>  -- skipping
>>> 87.9G completed (   8MB/s) estimated time remaining: 306hr 45min 20sec
>>>     zdb_blkptr_cb: Got error 52 reading <515, 0, 1, 0>  -- skipping
>>> 6.59T completed ( 509MB/s) estimated time remaining: 1hr 13min 14sec
>>>     zdb_blkptr_cb: Got error 52 reading <16097, 0, 1, 0>  -- skipping
>>>     zdb_blkptr_cb: Got error 52 reading <16097, 0, 1, 1>  -- skipping
>>>     zdb_blkptr_cb: Got error 52 reading <16097, 0, 1, 2>  -- skipping
>>>     zdb_blkptr_cb: Got error 52 reading <16097, 0, 1, 6>  -- skipping
>>> 6.59T completed ( 509MB/s) estimated time remaining: 1hr 13min 14sec
>>>     zdb_blkptr_cb: Got error 52 reading <16175, 0, 1, 0>  -- skipping
>>>     zdb_blkptr_cb: Got error 52 reading <16175, 0, 1, 2>  -- skipping
>>>     zdb_blkptr_cb: Got error 52 reading <16175, 0, 1, 5>  -- skipping
>>> 8.87T completed ( 603MB/s) estimated time remaining: 4294718932hr 4294967257min 4294967243sec
>>> Error counts:
>>>
>>>         errno  count
>>>            52  13
>>> block traversal size 9597670125568 != alloc 9597792563200 (unreachable 122437632)
>>>
>>>         bp count:       143491768
>>>         ganged count:           0
>>>         bp logical:    6202627303424      avg:  43226
>>>         bp physical:   5973284168192      avg:  41628     compression:  1.04
>>>         bp allocated:  9751113109504      avg:  67955     compression:  0.64
>>>         bp deduped:    153442983936    ref>1: 2536737   deduplication:  1.02
>>>         SPA allocated: 9597791084544     used: 53.72%
>>>
>>>         additional, non-pointer bps of type 0:    1416603
>>>         Dittoed blocks on same vdev: 3677859
>>>
>>>                                                  capacity   operations   bandwidth   ---- errors ----
>>> description                                     used avail   read write   read write   read write cksum
>>> home                                           8.73T 7.52T    774     0  11.2M     0      0     0    25
>>>   raidz2                                       8.73T 7.52T    774     0  11.2M     0      0     0   122
>>>     /dev/disk/by-id/wwn-0x5000c500a41a0a00-part1              130     0  1.88M     0      0     0     0
>>>     /dev/disk/by-id/wwn-0x5000c500a41ae340-part1              125     0  1.84M     0      0     0   258
>>>     /dev/disk/by-id/wwn-0x5000c500a41b4c57-part1              130     0  1.87M     0      0     0  137K
>>>     /dev/disk/by-id/wwn-0x5000c500a41b7572-part1              130     0  1.88M     0      0     0     0
>>>     /dev/disk/by-id/wwn-0x5000c500a41ba99c-part1              125     0  1.84M     0      0     0   266
>>>     /dev/disk/by-id/wwn-0x5000c500a41babe8-part1              130     0  1.87M     0      0     0  136K
>>>   log /dev/disk/by-id/wwn-0x30000d1700d9d40f-part2 1.41M 7.94G    0     0     25     0      0     0     0
>>>
>>> ZFS_DBGMSG(zdb):
>>> home64 ~ #
>>>
>>> I would really appreciate someone's experienced opinion.
>>>
>>> On Tue, Apr 24, 2018 at 11:38 AM Anton Gubar'kov <anton.gubarkov at iits.ru>
>>> wrote:
>>>
>>>> I tried to experiment with zdb today and ran the following to verify the
>>>> integrity of the pool's metadata:
>>>> home64 ~ # zdb -sc -eG home
>>>>
>>>> Traversing all blocks to verify metadata checksums and verify nothing
>>>> leaked ...
>>>>
>>>> loading vdev 0 of 2, metaslab 3 of 130 ...
>>>> space_map_load(msp->ms_sm, msp->ms_tree, SM_ALLOC) == 0 (0x5 == 0x0)
>>>> ASSERT at zdb.c:3502:zdb_leak_init_ms()
>>>> Aborted (core dumped)
>>>>
>>>> It ended with a core dump very quickly.
>>>> I then started `zdb -scL -eG home` to skip verification of the space maps.
>>>> It is still running. I'll post the outcome as soon as it completes.
>>>>
>>>>
>>>> On Mon, Apr 23, 2018 at 2:18 PM Anton Gubar'kov <anton.gubarkov at iits.ru>
>>>> wrote:
>>>>
>>>>> Another note: I configured the cache device via its /dev/disk/by-id path,
>>>>> but the current config shows the (correct) link via /dev.
>>>>>
>>>>> On Mon, Apr 23, 2018 at 2:10 PM Anton Gubar'kov <
>>>>> anton.gubarkov at iits.ru> wrote:
>>>>>
>>>>>> Dear friends
>>>>>>
>>>>>> May I ask you to start a separate discussion on zfs recv -f usage?
>>>>>>
>>>>>> I've upgraded to kernel 4.16.3 and ZFS v0.7.8-r0-gentoo and tried
>>>>>> `zpool import -FX`. The result was:
>>>>>> cannot import 'home': one or more devices is currently unavailable
>>>>>>
>>>>>> `zpool import -N`, run immediately after the above message, shows:
>>>>>> home64 /usr/src/linux # zpool import -N
>>>>>>    pool: home
>>>>>>      id: 4810743847386909334
>>>>>>   state: ONLINE
>>>>>>  status: Some supported features are not enabled on the pool.
>>>>>>  action: The pool can be imported using its name or numeric identifier,
>>>>>>          though some features will not be available without an explicit
>>>>>>          'zpool upgrade'.
>>>>>>  config:
>>>>>>
>>>>>> home                            ONLINE
>>>>>>   raidz2-0                      ONLINE
>>>>>>     wwn-0x5000c500a41a0a00      ONLINE
>>>>>>     wwn-0x5000c500a41ae340      ONLINE
>>>>>>     wwn-0x5000c500a41b4c57      ONLINE
>>>>>>     wwn-0x5000c500a41b7572      ONLINE
>>>>>>     wwn-0x5000c500a41ba99c      ONLINE
>>>>>>     wwn-0x5000c500a41babe8      ONLINE
>>>>>> cache
>>>>>>   sdg3
>>>>>> logs
>>>>>>   wwn-0x30000d1700d9d40f-part2  ONLINE
>>>>>>
>>>>>>
>>>>>>
>>> _______________________________________________
>>> zfs-discuss mailing list
>>> zfs-discuss at list.zfsonlinux.org
>>> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss
>>>
>>>
>>
>>
>> --
>> ------------------------------
>> Jeff Johnson
>> Co-Founder
>> Aeon Computing
>>
>> jeff.johnson at aeoncomputing.com
>> www.aeoncomputing.com
>> t: 858-412-3810 x1001   f: 858-412-3845
>> m: 619-204-9061
>>
>> 4170 Morena Boulevard, Suite D - San Diego, CA 92117
>>
>> High-Performance Computing / Lustre Filesystems / Scale-out Storage
>>
>


-- 
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson at aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage

