[zfs-discuss] Help with debugging pool import problem

Raghuram Devarakonda draghuram at gmail.com
Tue Apr 24 09:50:06 EDT 2018


BTW, here is a more detailed output (from "zdb -llluu /dev/sdb1"). I
would appreciate any insights from this data.

-----
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'bdgpool'
    state: 1
    txg: 35
    pool_guid: 9382196665123140658
    errata: 0
    hostname: 'tcmutest'
    top_guid: 12205973504118410794
    guid: 12205973504118410794
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 12205973504118410794
        path: '/dev/sdb1'
        devid: 'scsi-360014051e093766c0fa4d54aacf7b2dc-part1'
        whole_disk: 1
        metaslab_array: 68
        metaslab_shift: 24
        ashift: 16
        asize: 1042808832
        is_log: 0
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3


ZFS Label NVList Config Stats:
  1072 bytes used, 113576 bytes free (using  0.9%)

   integers:   17    632 bytes (58.96%)
    strings:    5    224 bytes (20.90%)
   booleans:    2     92 bytes ( 8.58%)
    nvlists:    3    124 bytes (11.57%)


    Uberblock[0]
magic = 0000000000bab10c
version = 5000
txg = 32
guid_sum = 3141426095531999836
timestamp = 1524524570 UTC = Mon Apr 23 23:02:50 2018
mmp_magic = 00000000a11cea11
mmp_delay = 0
        labels = 0
    Uberblock[1] invalid
    Uberblock[2] invalid
    Uberblock[3]
magic = 0000000000bab10c
version = 5000
txg = 35
guid_sum = 3141426095531999836
timestamp = 1524524570 UTC = Mon Apr 23 23:02:50 2018
mmp_magic = 00000000a11cea11
mmp_delay = 0
        labels = 0
    Uberblock[4]
magic = 0000000000bab10c
version = 5000
txg = 4
guid_sum = 3141426095531999836
timestamp = 1524524465 UTC = Mon Apr 23 23:01:05 2018
mmp_magic = 00000000a11cea11
mmp_delay = 0
        labels = 0 1
    Uberblock[5]
magic = 0000000000bab10c
version = 5000
txg = 5
guid_sum = 3141426095531999836
timestamp = 1524524465 UTC = Mon Apr 23 23:01:05 2018
mmp_magic = 00000000a11cea11
mmp_delay = 0
        labels = 0
    Uberblock[6]
magic = 0000000000bab10c
version = 5000
txg = 6
guid_sum = 3141426095531999836
timestamp = 1524524465 UTC = Mon Apr 23 23:01:05 2018
mmp_magic = 00000000a11cea11
mmp_delay = 0
        labels = 0
    Uberblock[7] invalid
    Uberblock[8] invalid
    Uberblock[9] invalid
    Uberblock[10] invalid
    Uberblock[11] invalid
    Uberblock[12] invalid
    Uberblock[13] invalid
    Uberblock[14]
magic = 0000000000bab10c
version = 5000
txg = 14
guid_sum = 3141426095531999836
timestamp = 1524524496 UTC = Mon Apr 23 23:01:36 2018
mmp_magic = 00000000a11cea11
mmp_delay = 0
        labels = 0
    Uberblock[15]
magic = 0000000000bab10c
version = 5000
txg = 15
guid_sum = 3141426095531999836
timestamp = 1524524496 UTC = Mon Apr 23 23:01:36 2018
mmp_magic = 00000000a11cea11
mmp_delay = 0
        labels = 0 1 2 3
------------------------------------
LABEL 1
------------------------------------
    version: 5000
    name: 'bdgpool'
    state: 1
    txg: 35
    pool_guid: 9382196665123140658
    errata: 0
    hostname: 'tcmutest'
    top_guid: 12205973504118410794
    guid: 12205973504118410794
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 12205973504118410794
        path: '/dev/sdb1'
        devid: 'scsi-360014051e093766c0fa4d54aacf7b2dc-part1'
        whole_disk: 1
        metaslab_array: 68
        metaslab_shift: 24
        ashift: 16
        asize: 1042808832
        is_log: 0
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3


ZFS Label NVList Config Stats:
  1072 bytes used, 113576 bytes free (using  0.9%)

   integers:   17    632 bytes (58.96%)
    strings:    5    224 bytes (20.90%)
   booleans:    2     92 bytes ( 8.58%)
    nvlists:    3    124 bytes (11.57%)


    Uberblock[0] invalid
    Uberblock[1] invalid
    Uberblock[2] invalid
    Uberblock[4] invalid
    Uberblock[5] invalid
    Uberblock[6] invalid
    Uberblock[7] invalid
    Uberblock[8] invalid
    Uberblock[9] invalid
    Uberblock[10] invalid
    Uberblock[11] invalid
    Uberblock[12] invalid
    Uberblock[13] invalid
    Uberblock[14] invalid
------------------------------------
LABEL 2
------------------------------------
    version: 5000
    name: 'bdgpool'
    state: 1
    txg: 35
    pool_guid: 9382196665123140658
    errata: 0
    hostname: 'tcmutest'
    top_guid: 12205973504118410794
    guid: 12205973504118410794
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 12205973504118410794
        path: '/dev/sdb1'
        devid: 'scsi-360014051e093766c0fa4d54aacf7b2dc-part1'
        whole_disk: 1
        metaslab_array: 68
        metaslab_shift: 24
        ashift: 16
        asize: 1042808832
        is_log: 0
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3


ZFS Label NVList Config Stats:
  1072 bytes used, 113576 bytes free (using  0.9%)

   integers:   17    632 bytes (58.96%)
    strings:    5    224 bytes (20.90%)
   booleans:    2     92 bytes ( 8.58%)
    nvlists:    3    124 bytes (11.57%)


    Uberblock[0] invalid
    Uberblock[1] invalid
    Uberblock[2] invalid
    Uberblock[3] invalid
    Uberblock[4] invalid
    Uberblock[5] invalid
    Uberblock[6] invalid
    Uberblock[7] invalid
    Uberblock[8] invalid
    Uberblock[9] invalid
    Uberblock[10] invalid
    Uberblock[11] invalid
    Uberblock[12] invalid
    Uberblock[13] invalid
    Uberblock[14] invalid
------------------------------------
LABEL 3
------------------------------------
    version: 5000
    name: 'bdgpool'
    state: 1
    txg: 35
    pool_guid: 9382196665123140658
    errata: 0
    hostname: 'tcmutest'
    top_guid: 12205973504118410794
    guid: 12205973504118410794
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 12205973504118410794
        path: '/dev/sdb1'
        devid: 'scsi-360014051e093766c0fa4d54aacf7b2dc-part1'
        whole_disk: 1
        metaslab_array: 68
        metaslab_shift: 24
        ashift: 16
        asize: 1042808832
        is_log: 0
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3


ZFS Label NVList Config Stats:
  1072 bytes used, 113576 bytes free (using  0.9%)

   integers:   17    632 bytes (58.96%)
    strings:    5    224 bytes (20.90%)
   booleans:    2     92 bytes ( 8.58%)
    nvlists:    3    124 bytes (11.57%)


    Uberblock[0] invalid
    Uberblock[1] invalid
    Uberblock[2] invalid
    Uberblock[3] invalid
    Uberblock[4] invalid
    Uberblock[5] invalid
    Uberblock[6] invalid
    Uberblock[7] invalid
    Uberblock[8] invalid
    Uberblock[9] invalid
    Uberblock[10] invalid
    Uberblock[11] invalid
    Uberblock[12] invalid
    Uberblock[13] invalid
    Uberblock[14] invalid
-----

On Mon, Apr 23, 2018 at 5:58 PM, Raghuram Devarakonda
<draghuram at gmail.com> wrote:
> I built from the PR branch and after enabling debug, I see the
> following error messages (in /proc/spl/kstat/zfs/dbgmsg):
>
> ---
> 1524519363   spa.c:5174:spa_tryimport(): spa_tryimport: importing bdgpool
> 1524519363   spa_misc.c:403:spa_load_note(): spa_load($import, config
> untrusted): LOADING
> 1524519363   vdev.c:92:vdev_dbgmsg(): disk vdev '/dev/sdb1':
> vdev_validate: failed reading config
> 1524519363   spa_misc.c:388:spa_load_failed(): spa_load($import,
> config untrusted): FAILED: no valid uberblock found
> 1524519363   spa_misc.c:403:spa_load_note(): spa_load($import, config
> untrusted): UNLOADING
> -----
>
> So the patch already helped in some way. As the messages are
> suggesting that uberblocks could not be found, I ran "zdb -lu
> /dev/sdb1" which does show some uberblocks. Here is its output in
> full:
>
> -----
> ------------------------------------
> LABEL 0
> ------------------------------------
>     version: 5000
>     name: 'bdgpool'
>     state: 1
>     txg: 14
>     pool_guid: 14332205498532543363
>     errata: 0
>     hostname: 'tcmutest'
>     top_guid: 2351111970084592093
>     guid: 2351111970084592093
>     vdev_children: 1
>     vdev_tree:
>         type: 'disk'
>         id: 0
>         guid: 2351111970084592093
>         path: '/dev/sdb1'
>         devid: 'scsi-3600140594ac7dd267914357a480b465d-part1'
>         whole_disk: 1
>         metaslab_array: 68
>         metaslab_shift: 24
>         ashift: 16
>         asize: 1042808832
>         is_log: 0
>         create_txg: 4
>     features_for_read:
>         com.delphix:hole_birth
>         com.delphix:embedded_data
>     labels = 0 1 2 3
>     Uberblock[4]
>         magic = 0000000000bab10c
>         version = 5000
>         txg = 4
>         guid_sum = 16683317468617135456
>         timestamp = 1524499235 UTC = Mon Apr 23 16:00:35 2018
>         mmp_magic = 00000000a11cea11
>         mmp_delay = 0
>         labels = 0
>     Uberblock[5]
>         magic = 0000000000bab10c
>         version = 5000
>         txg = 5
>         guid_sum = 16683317468617135456
>         timestamp = 1524499235 UTC = Mon Apr 23 16:00:35 2018
>         mmp_magic = 00000000a11cea11
>         mmp_delay = 0
>         labels = 0
>     Uberblock[6]
>         magic = 0000000000bab10c
>         version = 5000
>         txg = 6
>         guid_sum = 16683317468617135456
>         timestamp = 1524499235 UTC = Mon Apr 23 16:00:35 2018
>         mmp_magic = 00000000a11cea11
>         mmp_delay = 0
>         labels = 0
>     Uberblock[8]
>         magic = 0000000000bab10c
>         version = 5000
>         txg = 8
>         guid_sum = 16683317468617135456
>         timestamp = 1524499237 UTC = Mon Apr 23 16:00:37 2018
>         mmp_magic = 00000000a11cea11
>         mmp_delay = 0
>         labels = 0
>     Uberblock[9]
>         magic = 0000000000bab10c
>         version = 5000
>         txg = 9
>         guid_sum = 16683317468617135456
>         timestamp = 1524499237 UTC = Mon Apr 23 16:00:37 2018
>         mmp_magic = 00000000a11cea11
>         mmp_delay = 0
>         labels = 0
>     Uberblock[10]
>         magic = 0000000000bab10c
>         version = 5000
>         txg = 10
>         guid_sum = 16683317468617135456
>         timestamp = 1524499242 UTC = Mon Apr 23 16:00:42 2018
>         mmp_magic = 00000000a11cea11
>         mmp_delay = 0
>         labels = 0
>     Uberblock[11]
>         magic = 0000000000bab10c
>         version = 5000
>         txg = 11
>         guid_sum = 16683317468617135456
>         timestamp = 1524499244 UTC = Mon Apr 23 16:00:44 2018
>         mmp_magic = 00000000a11cea11
>         mmp_delay = 0
>         labels = 0
>     Uberblock[14]
>         magic = 0000000000bab10c
>         version = 5000
>         txg = 14
>         guid_sum = 16683317468617135456
>         timestamp = 1524499244 UTC = Mon Apr 23 16:00:44 2018
>         mmp_magic = 00000000a11cea11
>         mmp_delay = 0
>         labels = 0
> -----
>
> Any ideas?
>
> Thanks,
> Raghu
>
>
> On Mon, Apr 23, 2018 at 5:15 PM, Raghuram Devarakonda
> <draghuram at gmail.com> wrote:
>> Thanks Chris. I will build the PR and see whether it helps.
>>
>> Thanks,
>> Raghu
>>
>> On Mon, Apr 23, 2018 at 2:00 PM, Chris Siebenmann <cks at cs.toronto.edu> wrote:
>>>> I am looking for some help in debugging a problem with pool import and
>>>> any suggestions are highly appreciated.
>>>>
>>>> First off, this is what I see when I try to import:
>>> [...]
>>>> An important point here is that the device "sdb" is a block device
>>>> backed by my own driver. [...]
>>>>
>>>> I completely understand that the real problem could be in my code but
>>>> I am looking for suggestions on how to debug pool import issues. If I
>>>> can get more specific error (something beyond "corrupted data"), that
>>>> would help me in narrowing down the root cause.
>>>
>>>  At the moment there are no better diagnostics that are readily
>>> available from the user level code, as you've noticed. This is
>>> very annoying in a number of situations, but the good news is
>>> that improvements are coming.
>>>
>>>  First, I believe that this error ultimately emerges from the kernel,
>>> but you should check the user-level code involved in 'zpool import'
>>> to be sure. Modifying the user-level code to spit out additional
>>> information (or simply running it under gdb with appropriate breakpoints
>>> set) will likely be a lot easier than the alternatives.
>>>
>>>  As far as the kernel code goes, a change to split the previously
>>> monolithic kernel code for importing a ZFS pool into multiple functions
>>> was recently added to the development version of ZFS on Linux (this
>>> is the commit for 'OpenZFS 7638 - Refactor spa_load_impl into several
>>> functions'). If you can build the development ZFS on Linux and then
>>> instrument which of these new functions are called and succeed (versus
>>> fail), you can probably gather much more information about the specific
>>> failing operation. Possible avenues for instrumentation are various
>>> passive kernel function tracing methods like ftrace[*], or simply hand
>>> modifying the code to stick printk()s in.
>>>
>>>  This split of spa_load_impl is the prequel for a second change that
>>> will add better error reporting to 'zpool import' itself, but that
>>> change is not yet ported to ZFS on Linux. However, there is an open
>>> pull request with an initial integration of this code:
>>>         https://github.com/zfsonlinux/zfs/pull/7459
>>>
>>> A sufficiently daring person could build a version of ZFS on Linux
>>> with this pull request integrated and then give it a try. Obviously
>>> you want to do this on a scratch system, but if this is a test system
>>> for a storage driver, it may qualify.
>>>
>>>         - cks
>>> [*: see eg https://jvns.ca/blog/2017/03/19/getting-started-with-ftrace/
>>> ]


More information about the zfs-discuss mailing list