[zfs-devel] Why are vdevs immediately closed after opening them?
roland.kammerer at linbit.com
Wed Aug 23 11:03:48 EDT 2017
Hi zfs devs!
Before the question makes sense, I should probably give a short
introduction to what we do and what we want to achieve. If you already
know DRBD, jump to "The actual question" below.
We develop DRBD, which replicates block devices over the network to a
given number of peer nodes. You define your DRBD resource on all nodes,
and all nodes have a local backing-device (/dev/sdXY). You then get a
/dev/drbdX, and data written to that device gets replicated to the peers:
"RAID-1 over network". Only one node is allowed to write data; we call
that node "primary". Previously, promotion was a manual process: the
admin or the cluster management software promoted one node to "primary",
and all others stayed "secondary".
Nowadays we have something called "auto-promote". If all cluster nodes
are "secondary", and /dev/drbdX gets opened for writing, we can promote
that to "primary" without any further action. On close we demote it to
"secondary". This works quite well for use cases like creating a file
system on /dev/drbdX. mkfs opens it, we switch the device to primary,
mkfs writes its data, we replicate it, mkfs closes the device and we
demote the DRBD resource to "secondary".
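
To make the open/close semantics concrete, here is a toy model in Python.
It is only a sketch and not DRBD code: the class and method names are made
up, and "demote when the last opener closes" is my simplification of the
behaviour described above.

```python
class ToyDrbdDevice:
    """Toy model of auto-promote. Hypothetical names, not the DRBD API."""

    def __init__(self):
        self.role = "secondary"
        self.openers = 0  # number of handles currently open

    def open(self, writable=False):
        # First open for writing promotes the node (assumes all peers
        # are "secondary", as in the description above).
        if writable and self.role == "secondary":
            self.role = "primary"
        self.openers += 1

    def close(self):
        self.openers -= 1
        # Assumption: demote once nobody holds the device open anymore.
        if self.openers == 0:
            self.role = "secondary"


dev = ToyDrbdDevice()
dev.open(writable=True)      # mkfs opens /dev/drbdX for writing
role_while_open = dev.role   # node was promoted
dev.close()                  # mkfs closes the device
role_after_close = dev.role  # node was demoted again
```
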
The actual question:
So, for us it would be great to combine zfs with DRBD. The /dev/drbdX
would be used instead of the actual disks when creating the zpool and
we would end up with a replicated zfs. The problem is that it looks like
zfs opens the /dev/drbdX and immediately closes it again while
holding/using an internal in-kernel reference. This immediate close
switches DRBD back to secondary and further writes then fail because the
device is not "primary" anymore.
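
The sequence we believe we are seeing can be sketched like this. Again a
toy model in Python, not zfs or DRBD code: all names are hypothetical, and
PermissionError merely stands in for whatever error the real write path
returns on a "secondary" device.

```python
class ToyReplicatedDevice:
    """Hypothetical stand-in for /dev/drbdX with auto-promote."""

    def __init__(self):
        self.role = "secondary"
        self.blocks = {}

    def open_rw(self):
        self.role = "primary"    # auto-promote on open for writing

    def close(self):
        self.role = "secondary"  # demote on close

    def write(self, lba, data):
        if self.role != "primary":
            raise PermissionError("device is secondary")
        self.blocks[lba] = data


def open_close_then_write(dev):
    """Pattern we suspect: open, close at once, keep using a reference."""
    dev.open_rw()
    ref = dev    # stands in for the in-kernel reference kept past close
    dev.close()  # DRBD sees the close and demotes
    try:
        ref.write(0, b"label")
        return "ok"
    except PermissionError as exc:
        return f"failed: {exc}"


def write_while_open(dev):
    """Pattern that would work for us: close only after the last write."""
    dev.open_rw()
    try:
        dev.write(0, b"label")
        return "ok"
    finally:
        dev.close()


early_close_result = open_close_then_write(ToyReplicatedDevice())
late_close_result = write_while_open(ToyReplicatedDevice())
```

In this model the first pattern fails on the write and the second one
succeeds, which is the difference we care about.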
I did not dive too deep into the zfs code, but "vdev_disk_rrpart()"
looks like a candidate where the device is opened and then immediately
closed again.
Closing the device while it is actually still in use looks quite
uncommon. Is there a reason why it is done that way? Is that fundamental
to the zfs design? Or could that be changed to close the device later
when it actually is not used anymore?