[zfs-discuss] Help with debugging pool import problem

Raghuram Devarakonda draghuram at gmail.com
Mon Apr 23 17:58:19 EDT 2018


I built from the PR branch, and after enabling debug I see the
following error messages (in /proc/spl/kstat/zfs/dbgmsg):

-----
1524519363   spa.c:5174:spa_tryimport(): spa_tryimport: importing bdgpool
1524519363   spa_misc.c:403:spa_load_note(): spa_load($import, config
untrusted): LOADING
1524519363   vdev.c:92:vdev_dbgmsg(): disk vdev '/dev/sdb1':
vdev_validate: failed reading config
1524519363   spa_misc.c:388:spa_load_failed(): spa_load($import,
config untrusted): FAILED: no valid uberblock found
1524519363   spa_misc.c:403:spa_load_note(): spa_load($import, config
untrusted): UNLOADING
-----
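
For reference, the messages only appear once debugging is enabled.
Roughly, it goes like this (zfs_dbgmsg_enable is the module parameter
that gates these messages in recent ZFS on Linux, so treat this as a
sketch and check your build):

-----
# enable the in-kernel ZFS debug message buffer
echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable

# clear old messages, retry the import, then read the log
echo 0 > /proc/spl/kstat/zfs/dbgmsg
zpool import bdgpool
cat /proc/spl/kstat/zfs/dbgmsg
-----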

So the patch has already helped in some way. Since the messages suggest
that no valid uberblock could be found, I ran "zdb -lu /dev/sdb1",
which does show several uberblocks. Here is its output in full:

-----
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'bdgpool'
    state: 1
    txg: 14
    pool_guid: 14332205498532543363
    errata: 0
    hostname: 'tcmutest'
    top_guid: 2351111970084592093
    guid: 2351111970084592093
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 2351111970084592093
        path: '/dev/sdb1'
        devid: 'scsi-3600140594ac7dd267914357a480b465d-part1'
        whole_disk: 1
        metaslab_array: 68
        metaslab_shift: 24
        ashift: 16
        asize: 1042808832
        is_log: 0
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3
    Uberblock[4]
        magic = 0000000000bab10c
        version = 5000
        txg = 4
        guid_sum = 16683317468617135456
        timestamp = 1524499235 UTC = Mon Apr 23 16:00:35 2018
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
        labels = 0
    Uberblock[5]
        magic = 0000000000bab10c
        version = 5000
        txg = 5
        guid_sum = 16683317468617135456
        timestamp = 1524499235 UTC = Mon Apr 23 16:00:35 2018
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
        labels = 0
    Uberblock[6]
        magic = 0000000000bab10c
        version = 5000
        txg = 6
        guid_sum = 16683317468617135456
        timestamp = 1524499235 UTC = Mon Apr 23 16:00:35 2018
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
        labels = 0
    Uberblock[8]
        magic = 0000000000bab10c
        version = 5000
        txg = 8
        guid_sum = 16683317468617135456
        timestamp = 1524499237 UTC = Mon Apr 23 16:00:37 2018
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
        labels = 0
    Uberblock[9]
        magic = 0000000000bab10c
        version = 5000
        txg = 9
        guid_sum = 16683317468617135456
        timestamp = 1524499237 UTC = Mon Apr 23 16:00:37 2018
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
        labels = 0
    Uberblock[10]
        magic = 0000000000bab10c
        version = 5000
        txg = 10
        guid_sum = 16683317468617135456
        timestamp = 1524499242 UTC = Mon Apr 23 16:00:42 2018
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
        labels = 0
    Uberblock[11]
        magic = 0000000000bab10c
        version = 5000
        txg = 11
        guid_sum = 16683317468617135456
        timestamp = 1524499244 UTC = Mon Apr 23 16:00:44 2018
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
        labels = 0
    Uberblock[14]
        magic = 0000000000bab10c
        version = 5000
        txg = 14
        guid_sum = 16683317468617135456
        timestamp = 1524499244 UTC = Mon Apr 23 16:00:44 2018
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
        labels = 0
-----

Any ideas?

Thanks,
Raghu


On Mon, Apr 23, 2018 at 5:15 PM, Raghuram Devarakonda
<draghuram at gmail.com> wrote:
> Thanks Chris. I will build the PR and see whether it helps.
>
> Thanks,
> Raghu
>
> On Mon, Apr 23, 2018 at 2:00 PM, Chris Siebenmann <cks at cs.toronto.edu> wrote:
>>> I am looking for some help in debugging a problem with pool import and
>>> any suggestions are highly appreciated.
>>>
>>> First off, this is what I see when I try to import:
>> [...]
>>> An important point here is that the device "sdb" is a block device
>>> backed by my own driver. [...]
>>>
>>> I completely understand that the real problem could be in my code, but
>>> I am looking for suggestions on how to debug pool import issues. If I
>>> can get a more specific error (something beyond "corrupted data"), that
>>> would help me narrow down the root cause.
>>
>>  At the moment there are no better diagnostics that are readily
>> available from the user-level code, as you've noticed. This is
>> very annoying in a number of situations, but the good news is
>> that improvements are coming.
>>
>>  First, I believe that this error ultimately emerges from the kernel,
>> but you should check the user-level code involved in 'zpool import'
>> to be sure. Modifying the user-level code to spit out additional
>> information (or simply running it under gdb with appropriate breakpoints
>> set) will likely be a lot easier than the alternatives.
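>>
>>  For example, a minimal gdb sketch (the breakpoint is an assumption;
>> zpool_import() is one of the libzfs entry points, but the exact
>> function your failure goes through may differ, and you'll want a
>> build with debug symbols):
>>
>>         $ gdb --args zpool import bdgpool
>>         (gdb) break zpool_import
>>         (gdb) run
>>         (gdb) bt    # once stopped, inspect how the error surfaces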
>>
>>  As far as the kernel code goes, a change to split the previously
>> monolithic kernel code for importing a ZFS pool into multiple functions
>> was recently added to the development version of ZFS on Linux (this
>> is the commit for 'OpenZFS 7638 - Refactor spa_load_impl into several
>> functions'). If you can build the development ZFS on Linux and then
>> instrument which of these new functions are called and succeed (versus
>> fail), you can probably gather much more information about the specific
>> failing operation. Possible avenues for instrumentation are various
>> passive kernel function tracing methods like ftrace[*], or simply hand
>> modifying the code to stick printk()s in.
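>>
>>  As a sketch of the ftrace route (assuming debugfs is mounted at
>> /sys/kernel/debug and your kernel can trace the zfs module's
>> functions; a hand-placed printk() in spa_load_impl works too if
>> tracing is unavailable):
>>
>>         cd /sys/kernel/debug/tracing
>>         echo 'spa_load*' > set_ftrace_filter
>>         echo function_graph > current_tracer
>>         echo 1 > tracing_on
>>         # now run 'zpool import' in another shell, then:
>>         cat trace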
>>
>>  This split of spa_load_impl is the precursor to a second change that
>> will add better error reporting to 'zpool import' itself, but that
>> change has not yet been ported to ZFS on Linux. However, there is an
>> open pull request with an initial integration of this code:
>>         https://github.com/zfsonlinux/zfs/pull/7459
>>
>> A sufficiently daring person could build a version of ZFS on Linux
>> with this pull request integrated and then give it a try. Obviously
>> you want to do this on a scratch system, but if this is a test system
>> for a storage driver, it may qualify.
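>>
>>  For instance, fetching the pull request as a local branch might look
>> like this (the build steps are the usual autotools flow for ZFS on
>> Linux; depending on the branch you may also need a matching SPL
>> build):
>>
>>         git clone https://github.com/zfsonlinux/zfs.git
>>         cd zfs
>>         git fetch origin pull/7459/head:pr-7459
>>         git checkout pr-7459
>>         sh autogen.sh && ./configure && make -s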
>>
>>         - cks
>> [*: see e.g. https://jvns.ca/blog/2017/03/19/getting-started-with-ftrace/]

