[zfs-devel] Why are vdevs immediately closed after opening them?

Chris Siebenmann cks at cs.toronto.edu
Thu Aug 24 17:11:23 EDT 2017

> The actual question:
> So, for us it would be great to combine zfs with DRBD. The /dev/drbdX
> would be used instead of the actual disks when creating the zpool and
> we would end up with a replicated zfs. The problem is that it looks
> like zfs opens the /dev/drbdX and immediately closes it again while
> holding/using an internal in kernel reference. This immediate close
> switches DRBD back to secondary and further writes then fail because
> the device is not "primary" anymore.
> I did not dive too deep into the zfs code, but "vdev_disk_rrpart()"
> looks like a candidate where the device is opened and then immediately
> closed again.
> Closing the device while it actually is still in use looks quite
> uncommon. Is there a reason why it is done that way? Is that
> fundamental to the zfs design? Or could that be changed to close the
> device later when it actually is not used anymore?

 I think you need to look at the code flow in more detail to understand
what you're seeing, and it will very much depend on what operations are
being performed.

 For instance, a 'zpool import' normally opens a bunch of disk devices
at user level so that it can read the disk labels and find which ones of
them belong to which ZFS pools; after it's read the labels, it closes
the devices again. Then once it has assembled the necessary information
about a particular pool to import, it passes all of this into the kernel
where the kernel re-opens the devices in vdev_disk_open() (possibly
calling vdev_disk_rrpart() just before it does this, although the
comments before vdev_disk_rrpart() suggest that this is rare).
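
 As a rough illustration of the user-level half of that flow, here is a
minimal Python sketch of "open, read a label, close". It is not the
actual ZFS userland code; the helper name read_front_label() is made up
for this example, and it only reads the first label region rather than
parsing it. The 256 KiB label size is from the ZFS on-disk format.

```python
import os

# Size of one ZFS vdev label in the on-disk format (256 KiB).
VDEV_LABEL_SIZE = 256 * 1024


def read_front_label(path):
    """Open path, read the first label-sized region, and close it again.

    Hypothetical helper for illustration only; the real import scan also
    parses the label contents and looks at more than one label.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread reads at an explicit offset without moving the file position.
        return os.pread(fd, VDEV_LABEL_SIZE, 0)
    finally:
        # The descriptor is released here, before any pool configuration
        # is handed to the kernel. This is the user-level open/close pair
        # that something watching the device would see.
        os.close(fd)
```

 Something like read_front_label("/dev/drbd0") would show up to the
device as a short-lived open followed by a close, entirely separate from
the later kernel-side open.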

 I don't think it's very likely that the ZFS code will be restructured to
drastically change the user-to-kernel interface to do things like pass
already-opened file descriptors for disks into the kernel along with the
pool configuration information. It would be a big change that touches a
great many things (and causes more divergence from upstream ZFS).

(To the best of my ability to see from the code, the ZFS kernel module
always holds an open reference to disks while they're in use in a
ZFS pool. This reference is a kernel-level reference obtained by
blkdev_get_by_path() in exclusive-access mode.)
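
 If you want to check from user space whether a block device is
currently claimed this way, one possible probe on Linux relies on the
behaviour documented in open(2): opening a block device with O_EXCL (and
without O_CREAT) fails with EBUSY if the device is in use by the system.
The sketch below assumes Linux, the helper name is_exclusively_held() is
hypothetical, and I have not verified how it interacts with DRBD. Note
that when the device is not held, the probe itself briefly opens and
closes it.

```python
import errno
import os
import stat


def is_exclusively_held(path):
    """Return True if an O_EXCL open of the block device fails with EBUSY.

    Hypothetical helper; Linux-specific. Refuses anything that is not a
    block device, because O_EXCL without O_CREAT is only defined there.
    """
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{path} is not a block device")
    try:
        fd = os.open(path, os.O_RDONLY | os.O_EXCL)
    except OSError as e:
        if e.errno == errno.EBUSY:
            # Someone (for example a kernel subsystem) has claimed it.
            return True
        raise
    # The exclusive open succeeded, so nobody else held the device.
    os.close(fd)
    return False
```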

	- cks
