[zfs-discuss] Integrating better with systemd
aarcane at aarcane.org
Mon Mar 26 02:31:46 EDT 2012
It's impossible to predict with 100% certainty the device node where the
needed disk will appear. It's usually safe to guess that it will be
/dev/sd(x+1)(partnum), but it isn't always. So either we need a new
interface, or we need to wait until we can at least import the pool
degraded. Perhaps we should examine the output of zpool import, and only
auto-import the pool when all disks are present? Perhaps some variable:
HOTPLUG_ZPOOLS='degraded', ='othervalue', or =''? If 'degraded', it
auto-imports the pool as soon as possible; if any other non-empty value,
it auto-imports only once the pool is completely available; and if
empty, it doesn't import at all. Perhaps a user property to specify
whether to auto-import the pool? That seems useful mainly for backup
volumes and user drives. Perhaps we should zpool set zol:autoimport=yes
on auto-import pools?
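
Something like this helper could implement that policy (an untested
sketch: HOTPLUG_ZPOOLS and zol:autoimport are the knobs proposed above,
not existing ones, and since zpool has no user properties today the
sketch reads the flag off the pool's root dataset with zfs get instead):

    #!/bin/sh
    # Hypothetical hotplug helper, called with a pool name once udev
    # has seen a new member device.
    [ -r /etc/default/zfs ] && . /etc/default/zfs
    pool="$1"

    # `zpool import` with no pool argument lists importable pools and
    # their state; ONLINE means every member device has been seen.
    state=$(zpool import 2>/dev/null | awk -v p="$pool" '
        $1 == "pool:" && $2 == p { found = 1 }
        found && $1 == "state:" { print $2; exit }')

    case "${HOTPLUG_ZPOOLS:-}" in
        '')       exit 0 ;;                            # never auto-import
        degraded) case "$state" in ONLINE|DEGRADED) ;; *) exit 0 ;; esac ;;
        *)        [ "$state" = "ONLINE" ] || exit 0 ;; # only when complete
    esac

    zpool import "$pool" || exit 1

    # The opt-in lives inside the pool, so it can only be read after
    # import; back out if the pool didn't ask to be auto-imported.
    if [ "$(zfs get -H -o value zol:autoimport "$pool" 2>/dev/null)" != "yes" ]
    then
        zpool export "$pool"
    fi

The import-then-export dance is ugly, but I don't see a way to read the
flag before the pool is imported.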
On 3/25/2012 23:25, Manuel Amador (Rudd-O) wrote:
> I don't agree with the configuration. We need to get it right by
> default, and THEN provide unbreak-me options.
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
> Christ Schlacta <aarcane at aarcane.org> wrote:
> I think we should be able to import only a single disk of a pool, and if
> "HOTPLUG_ZPOOLS='Y'" is set in /etc/default/zfs, we should be able to
> import a device as soon as it's attached. But the problem is: what
> command can we use to tell ZFS about the device node of each disk as it
> becomes available? Currently I don't think it's possible, and zfs needs
> to be extended somehow. It should be possible to tell ZFS that the
> missing disks are missing and have it give us a symbolic name, so we can
> run zpool online poolname /symbolic/name /dev/realdisk when the disk
> becomes available.
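>
> A session with that interface might look like this (entirely
> hypothetical: neither the symbolic placeholder names nor the
> three-argument form of zpool online exist today):
>
>     # Pool imported degraded; ZFS names the absent member symbolically
>     # (made-up output):
>     $ zpool status backup
>       ...
>       missing-0    UNAVAIL    was /dev/sdc1
>
>     # Proposed extension: bind the symbolic name to whatever device
>     # node the disk finally showed up at.
>     $ zpool online backup missing-0 /dev/sdd1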
> On 3/25/2012 23:04, Manuel Amador wrote:
> > Ideally, hotplugged pools will not be imported until all component parts are
> > available. Alternatively, pools could be imported in a degraded state and
> > late-plugged components can be left to "catch up" via zpool online,
> > using the same udev mechanism.
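> >
> > A helper along these lines, run from a udev RUN rule on new
> > zfs_member devices, could do the catching up (a sketch, with error
> > handling omitted):
> >
> >     #!/bin/sh
> >     # Hypothetical udev helper; $1 is the newly appeared device node.
> >     # blkid identifies ZFS members and reports the pool name as LABEL.
> >     dev="$1"
> >     pool=$(blkid -o value -s LABEL "$dev")
> >     [ -n "$pool" ] || exit 0
> >     if zpool status "$pool" >/dev/null 2>&1; then
> >         # Pool already imported (perhaps degraded): let it catch up.
> >         zpool online "$pool" "$dev"
> >     fi
> >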
> > I don't really have an answer to this. What do you think we should do?
> > On Sunday, March 25, 2012 22:12:00 Christ Schlacta wrote:
> >> On 3/25/2012 21:14, Manuel Amador (Rudd-O) wrote:
> >>> For a while now, I've been experiencing a few woes related to systemd,
> >>> booting with ZFS on the root, underlaid by dm-crypt, and my
> >>> generator-based approach.
> >>> Here is the ideal situation I would like to have:
> >>> During initramfs:
> >>> - decrypt all available initial volumes (done by Fedora)
> >>> - import ZFS pools (done by our scripts)
> >>> - mount root file system (done by our scripts)
> >>> During early boot:
> >>> - discover all mountable file systems and schedule them for mount
> >>> (done in my branch)
> >>> - perform late block storage initialization and decryption (done by
> >>> Fedora)
> >>> - import any newly available ZFS pools (not done)
> >>> - schedule any new file systems available for mount (not done)
> >>> During shutdown:
> >>> - unmount everything (systemd does it)
> >>> - export pools cleanly (not done)
> >>> because, if they are imported, the crypt FSes are never closed
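> >>>
> >>> For the export step, the hook itself would be trivial (a sketch;
> >>> the hard part is ordering it after all unmounts but before dm-crypt
> >>> teardown):
> >>>
> >>>     #!/bin/sh
> >>>     # Export every imported pool once its file systems are
> >>>     # unmounted, so the dm-crypt devices underneath can be closed.
> >>>     zpool list -H -o name | while read -r pool; do
> >>>         zpool export "$pool"
> >>>     done
> >>>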
> >>> The "not done" parts are really ticking me off.
> >>> It is supposedly possible that we can accomplish most of this stuff simply
> >>> by using udev rules. Udev rules could possibly announce "hey, I found a
> >>> ZFS array component", and communicate to a userspace program that says
> >>> "OK, this a
> rray is
> now ready to be assembled and imported, and it really
> >>> belongs to this system, so import it now". Or "hey I found a bunch of
> >>> ZFS file systems after importing this array, and these file systems have
> >>> a policy of mounting on import, so mount them in the right order now".
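> >>>
> >>> The userspace side could start out as simple as this (a sketch; the
> >>> udev rule matching ENV{ID_FS_TYPE}=="zfs_member" that would invoke
> >>> it is left out):
> >>>
> >>>     #!/bin/sh
> >>>     # Hypothetical "assemble when ready" program, given a pool name.
> >>>     pool="$1"
> >>>     # `zpool import` with no arguments lists visible pools; ONLINE
> >>>     # means every member has shown up.  zpool import also refuses
> >>>     # pools last used by another host unless forced, which covers
> >>>     # "it really belongs to this system".
> >>>     zpool import 2>/dev/null | grep -A 2 "pool: $pool" \
> >>>         | grep -q "state: ONLINE" && zpool import "$pool"
> >>>
> >>> (zpool import already mounts the pool's datasets in the right
> >>> nesting order, so mount-on-import mostly comes for free.)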
> >>> Of course, the question of "which file systems are essential for the
> >>> system to boot up" is MUCH TRICKIER that way, because ZFS does not
> >>> specify them on /etc/fstab. So it might be worthwhile to keep my
> >>> generator-based approach at least for early boot, and use the udev-based
> >>> approach for pools that are made available later on during or after boot.
> >>> This would also give us hotplug and automount for free.
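> >>>
> >>> The generator itself would be tiny; something like this sketch,
> >>> which punts on unit-name escaping and on deciding which file
> >>> systems are boot-essential:
> >>>
> >>>     #!/bin/sh
> >>>     # systemd runs generators very early and picks up any units
> >>>     # written into the directory passed as the first argument.
> >>>     outdir="$1"
> >>>     zfs list -H -o name,mountpoint -t filesystem 2>/dev/null |
> >>>     while read -r fs mp; do
> >>>         case "$mp" in /|legacy|none|-) continue ;; esac
> >>>         # Real unit names need systemd's path escaping; this naive
> >>>         # translation breaks on dashes in path components.
> >>>         unit="$(echo "${mp#/}" | tr / -).mount"
> >>>         printf '[Unit]\nBefore=local-fs.target\n\n[Mount]\nWhat=%s\nWhere=%s\nType=zfs\nOptions=zfsutil\n' \
> >>>             "$fs" "$mp" > "$outdir/$unit"
> >>>         mkdir -p "$outdir/local-fs.target.wants"
> >>>         ln -sf "../$unit" "$outdir/local-fs.target.wants/$unit"
> >>>     done
> >>>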
> >>> Can anyone give me more input and ideas? I'd appreciate it a lot. I'm
> >>> crossposting to systemd-devel because I really have no idea, and this part
> >>> is really not well documented.
> >>> Is anyone even using my branch of ZFS with systemd support?
> >> What happens when you plug in three of five disks of a raid-z array,
> >> one per minute over three minutes, then walk away to have coffee? Will
> >> the system know that a disk is missing from the pool and not try to
> >> import it? When the fourth disk comes in, will it know that it's the
> >> last needed piece, and bring the pool online degraded? Will the pool
> >> come to a non-degraded state when the fifth disk comes live? I know
> >> this is a bit of an exaggeration, but often two or more disks can't be
> >> plugged in at the same time; the kernel initializes disks one at a
> >> time, and that can take several seconds to complete. I plugged in a
> >> USB hub with 4 USB keys on it, and they came up in an unpredictable
> >> order with different device nodes each time; it took like 20 seconds
> >> for the last disk to show up. While my example is a bit extreme, it is
> >> a plausible test case.