[zfs-discuss] zfs-discuss Digest, Vol 7, Issue 59

hunter morgan automaticgiant at gmail.com
Thu Jan 7 12:04:56 EST 2016


Hahaha. Try zpconfig on your pool with a cache disk! (laughing because
I just tried that.) Now, if only I could get it working on a pool that
I can't open.

On 7 January 2016 at 11:35, Chris Siebenmann <cks at cs.toronto.edu> wrote:
>> >  Here is the thing: cache devices are *not* listed in the vdev tree.
>>
>> Why is that? I'd like a source for that claim, partly because of the
>> output I'll provide shortly.
>
>  This is what I see when I do a zdb dump of a pool with a L2ARC device
> (on ZFS on Linux):
>
> # zdb maindata
> Cached configuration:
>         vdev_children: 1
> [...]
>         vdev_tree:
>             type: 'root'
>             id: 0
>             guid: 2318438208971947125
>             children[0]:
>                 type: 'mirror'
>                 id: 0
>                 guid: 8223432651739159300
> [...]
>                 children[0]:
>                     type: 'disk'
>                     id: 0
>                     guid: 3188774935843120330
>                     path: '/dev/disk/by-id/ata-ST31000340AS_5QJ10BRT-part7'
> [...]
>                 children[1]:
>                     type: 'disk'
>                     id: 1
>                     guid: 12137634640557355462
>                     path: '/dev/disk/by-id/ata-ST31000333AS_6TE0D4T6-part7'
> [...]
>         features_for_read:
> [...]
>
>  There's no L2ARC device listed, just the main mirrored pair of disks
> (e.g. the root has 'vdev_children: 1' for the single mirror vdev).
> 'zpool status' reports the L2ARC:
>
> config:
>         NAME
>         maindata
>           mirror-0
>             ata-ST31000340AS_5QJ10BRT-part7
>             ata-ST31000333AS_6TE0D4T6-part7
>         cache
>           ata-Samsung_SSD_850_EVO_250GB_S21NNXCGA46445H
>
> I assume that L2ARC devices are not recorded in the vdev tree exactly
> so that they can be missing without causing the pool to fail to import.
>
>  My understanding is that the on-disk ZFS vdev configuration tree is
> not explicitly stored anywhere accessible and instead is basically
> reconstructed on the fly and verified based on the checksum matching.
> This unfortunately leaves one up the creek for detailed information if
> devices are missing; unlike a conventional RAID system, there is no
> accessible source of what the missing stuff is supposed to be. Instead
> all ZFS can say is 'there are some bits missing and the checksum doesn't
> verify, but I have no idea what exactly I'm supposed to be looking for'.
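>
> (Each device's own labels do carry a configuration nvlist, which
> 'zdb -l <device>' will dump, e.g.
> 'zdb -l /dev/disk/by-id/ata-ST31000340AS_5QJ10BRT-part7' here, but as
> far as I know that config only describes the top-level vdev that the
> device itself belongs to; if an entire top-level vdev is gone, none of
> the surviving labels can tell you what it was.)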
>
>  I believe ZFS doesn't even know the GUID(s) of the missing components,
> which is why the GUID of the 'missing' child is reported as 'guid: 0'
> in your original email. It might be possible to reconstruct a single
> missing GUID given an understanding of how the pool GUID sum is formed
> and all of the other GUIDs. I think the GUID sum is literally just the
> (wrapped-around) sum of all the GUIDs; if I'm doing the math right, this
> would make your single missing GUID value be 14707329497279011883.
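>
> (As a rough sketch of that arithmetic, assuming the sum really is just
> plain wrap-around uint64_t addition and that every GUID zdb still
> shows you participates in it, something like the following would do;
> the zero placeholder values are yours to fill in from zdb:
>
>     #include <stdint.h>
>     #include <stdio.h>
>
>     /*
>      * Sketch only: recover a single missing GUID from the pool GUID
>      * sum, assuming the sum is plain wrap-around uint64_t addition of
>      * every vdev GUID.  Fill in the real numbers from zdb.
>      */
>     int
>     main(void)
>     {
>             uint64_t guid_sum = 0;       /* the pool's GUID sum */
>             uint64_t known[] = { 0, 0 }; /* GUIDs of the vdevs you can still see */
>             uint64_t missing = guid_sum;
>             size_t i;
>
>             for (i = 0; i < sizeof (known) / sizeof (known[0]); i++)
>                     missing -= known[i]; /* unsigned, so it wraps as needed */
>
>             printf("missing guid: %llu\n", (unsigned long long)missing);
>             return (0);
>     }
>
> Subtracting the known GUIDs back out of the sum is just the addition
> run in reverse, so whatever is left over should be the one GUID you
> can't see.)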
>
>> > Your zdb output here says that your pool is supposed to have two
>> > vdevs ('vdev_children:2') and you can find one (the raidz vdev).
>> > However, the second child vdev, id 1 and type 'missing', is not
>> > there; the 'missing' type is a special vdev type that ZFS fills in
>> > when the pool data says there should be a child vdev but it can't be
>> > found.
>>
>> I don't know why zdb says that or which OS it was from, but zpool
>> import doesn't give quite the same information. I guess I'd have to
>> look through the nvlists to really understand what it was referring
>> to, but ZFS on Linux 0.6.5-pve6~jessie (Proxmox) says:
>
>  'zpool import' is unfortunately uninformative here; it is explicitly
> coded to stop reporting details after the first problem it finds, which
> here is 'The pool was last accessed by another system'. An Illumos bug
> has been filed to fix this at some point:
>
>         https://www.illumos.org/issues/6478
>
> It also turns out that there's an Illumos bug for a potential underlying
> cause of your situation:
>
>         https://www.illumos.org/issues/6477
>
>  zdb does not report the 'pool was last accessed by another system'
> issue, so it dumps the underlying on-disk pool information as best
> it can reconstruct it.
>
>> I haven't quite gotten around to tracing the import through user and
>> kernel space to find the exact error path, but I did replace the GUID
>> of a new cache disk with this one in all 4 copies of the vdev label,
>> with the same results. I think I'm going to try to grok the nvlists
>> on that cache disk to see if a vdev tree is present that I need to
>> modify, and/or grok the original pool5 vdev tree.
>
>  The 'missing' child is a vdev type of VDEV_TYPE_MISSING. The whole
> configuration reconstruction process seems to be done primarily in
> lib/libzfs/libzfs_import.c's get_configs(); see especially the bit
> that says:
>                 /*
>                  * Look for any missing top-level vdevs.  If this is
>                  * the case, create a faked up 'missing' vdev as a
>                  * placeholder.  We cannot simply compress the child
>                  * array, because the kernel performs certain checks to
>                  * make sure the vdev IDs match their location in the
>                  * configuration.
>                  */
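>
> Purely as an illustration of what that faked-up placeholder looks like
> (this is not the actual libzfs code, and the real get_configs() uses
> the ZPOOL_CONFIG_* and VDEV_TYPE_MISSING macros rather than the bare
> strings I'm using here), it is roughly an nvlist built like so:
>
>     #include <stdint.h>
>     #include <stdio.h>
>     #include <libnvpair.h>
>
>     /*
>      * Sketch only: fake up a 'missing' child vdev for slot 'id',
>      * roughly what get_configs() inserts for an absent top-level vdev.
>      * The key names are the literal values behind ZPOOL_CONFIG_TYPE,
>      * ZPOOL_CONFIG_ID and ZPOOL_CONFIG_GUID; the guid of 0 matches
>      * what showed up in your zdb output.
>      */
>     static nvlist_t *
>     make_missing_vdev(uint64_t id)
>     {
>             nvlist_t *nvl = NULL;
>
>             if (nvlist_alloc(&nvl, NV_UNIQUE_NAME, 0) != 0)
>                     return (NULL);
>             if (nvlist_add_string(nvl, "type", "missing") != 0 ||
>                 nvlist_add_uint64(nvl, "id", id) != 0 ||
>                 nvlist_add_uint64(nvl, "guid", 0) != 0) {
>                     nvlist_free(nvl);
>                     return (NULL);
>             }
>             return (nvl);
>     }
>
>     int
>     main(void)
>     {
>             nvlist_t *nvl = make_missing_vdev(1);
>
>             if (nvl != NULL) {
>                     nvlist_print(stdout, nvl);
>                     nvlist_free(nvl);
>             }
>             return (0);
>     }
>
> (Build it against libnvpair if you want to poke at it; the point is
> just that the placeholder is an ordinary child entry with type
> 'missing' and a zero guid, slotted in at the child index that can't be
> found.)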
>
>  There is also an interesting and potentially relevant 'XXX' comment
> in module/zfs/spa.c's spa_config_valid():
>                                 /*
>                                  * XXX - once we have 'readonly' pool
>                                  * support we should be able to handle
>                                  * missing data devices by transitioning
>                                  * the pool to readonly.
>                                  */
>
>  I believe that in theory it would be possible to hack
> spa_config_valid() et al to force a pool with a missing vdev
> to be considered valid and thus to import. I believe that the
> kernel ZFS code will immediately fail IO to such a vdev, possibly
> causing pool explosions if it actually needs data from there; see
> module/zfs/vdev_missing.c.
>
> (You might want to do this using zfs-fuse, or at least on a
> sacrificial machine in case this has side effects on other pools.
> Obviously such hackery is only an emergency measure to get the data
> off the pool.)
>
>         - cks

