[zfs-discuss] Integrating better with systemd

Christ Schlacta aarcane at aarcane.org
Mon Mar 26 02:31:46 EDT 2012


It's impossible to predict with 100% certainty the device node where the
needed disk will appear.  It's usually safe to guess that it will be
/dev/sd(x+1)(partnum), but not always.  So either we need a new interface,
or we need to wait at least until the pool can be imported degraded.
Perhaps we should examine the output of zpool import and only auto-import
a pool once all of its disks are present?  Perhaps a variable, say
"HOTPLUG_ZPOOLS='degraded'", "='othervalue'", or "=''": if 'degraded', the
pool is auto-imported as soon as possible; if any other nonempty value, it
is auto-imported only once it is completely available; and if empty, it is
never auto-imported.  Perhaps a user property on the pool to specify
whether to auto-import it?  It does seem useful mainly for backup volumes
and user drives.  Perhaps we should zpool set zol:autoimport=yes on pools
that should be auto-imported?
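
Roughly what I have in mind, as a pure sketch (the variable, the hook that
would call this, and the parsing of zpool import output are all just my
proposal; nothing here exists today):

    # Hypothetical hotplug helper; HOTPLUG_ZPOOLS would live in /etc/default/zfs.
    [ -r /etc/default/zfs ] && . /etc/default/zfs

    autoimport_pool() {
        pool="$1"
        # What state does `zpool import` report for this exported pool?
        state=$(zpool import 2>/dev/null | awk -v p="$pool" '
            $1 == "pool:"              { cur = $2 }
            $1 == "state:" && cur == p { print $2; exit }')

        case "$HOTPLUG_ZPOOLS" in
            '')                          # empty: never auto-import
                ;;
            degraded)                    # import as soon as it is importable at all
                case "$state" in
                    ONLINE|DEGRADED) zpool import "$pool" ;;
                esac ;;
            *)                           # any other value: only when complete
                case "$state" in
                    ONLINE) zpool import "$pool" ;;
                esac ;;
        esac
    }
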
On 3/25/2012 23:25, Manuel Amador (Rudd-O) wrote:
> I don't agree with the configuration. We need to get it right by 
> default, and THEN provide unbreak-me options.
> -- 
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>
> Christ Schlacta <aarcane at aarcane.org> wrote:
>
>     I think we should be able to import only a single disk of a pool, and if
>     "HOTPLUG_ZPOOLS='Y'" is set in /etc/default/zfs, we should be able to
>     import a device as soon as it's attached.  The problem is what command
>     we can use to tell zfs about the device node of each disk as it becomes
>     available.  Currently I don't think that's possible, and zfs needs to be
>     extended somehow.  It should be possible to tell ZFS that the missing
>     disks are missing and have it give us a symbolic name, so we can
>     zpool online poolname /symbolic/name /dev/realdisk when the disk becomes
>     available.
>
>     On 3/25/2012 23:04, Manuel Amador wrote:
>     >  Ideally, hotplugged pools will not be imported until all component parts are
>     >  available.  Alternatively, pools could be imported in a degraded state and
>     >  late-plugged components can be left to "catch up" by zpool online using
>     >  the same udev mechanism.
>     >
>     >  I don't really have an answer to this.  What do you think we should do?
>     >
>     >  On Sunday, March 25, 2012 22:12:00 Christ Schlacta wrote:
>     >>  On 3/25/2012 21:14, Manuel Amador (Rudd-O) wrote:
>     >>>  For a while now, I've been experiencing a few woes related to systemd,
>     >>>  booting with ZFS on the root, underlied by dm-crypt, and my
>     >>>  generator-based approach.
>     >>>
>     >>>  Here is the ideal situation I would like to have:
>     >>>
>     >>>  During initramfs:
>     >>>
>     >>>  - decrypt all available initial volumes (done by Fedora)
>     >>>  - import ZFS pools (done by our scripts)
>     >>>  - mount root file system (done by our scripts)
>     >>>
>     >>>  During early boot:
>     >>>
>     >>>  - discover all mountable file systems and schedule them for mount (done in
>     >>>  my branch)
>     >>>  - perform late block storage initialization and decryption (done by
>     >>>  Fedora)
>     >>>  - import any newly available ZFS pools (not done)
>     >>>  - schedule any new file systems available for mount (not done)
>     >>>
>     >>>  During shutdown:
>     >>>
>     >>>  - unmount everything (systemd does it)
>     >>>  - export pools cleanly (not done)
>     >>>
>     >>>        because, if they are imported, the crypt FSes are never closed
>     >>>
>     >>>  The "not done" parts are really ticking me off.
>     >>>
>     >>>  It is supposedly possible that we can accomplish most of this stuff simply
>     >>>  by using udev rules.  Udev rules could possibly announce "hey, I found a
>     >>>  ZFS array component", and communicate to a userspace program that says
>     >>>  "OK, this a
>       rray is
>     now ready to be assembled and imported, and it really
>     >>>  belongs to this system, so import it now".  Or "hey I found a bunch of
>     >>>  ZFS file systems after importing this array, and these file systems have
>     >>>  a policy of mounting on import, so mount them in the right order now".
>     >>>
>     >>>  Of course, the question of "which file systems are essential for the
>     >>>  system to boot up" is MUCH TRICKIER that way, because ZFS does not
>     >>>  specify them on /etc/fstab.  So it might be worthwhile to keep my
>     >>>  generator-based approach at least for early boot, and use the udev-based
>     >>>  approach for pools that are made available later on during or after boot.
>     >>>
>     >>>  This would also give us hotplug and automount for free.
>     >>>
>     >>>  Can anyone give me more input and ideas?  I'd appreciate it a lot.  I'm
>     >>>  crossposting to systemd-devel because I really have no idea, and this part
>     >>>  is really not well documented.
>     >>>
>     >>>  Is anyone even using my branch of ZFS with systemd support?
>     >>  what happens when you plug in three of five disks of a raid-z array, one
>     >>  per minute over three minutes, then walk away to have coffee?  Will the
>     >>  system know that a disk is missing from the pool and not try to import
>     >>  it?  When the fourth disk comes in, will it know that it's the last
>     >>  needed piece, and bring the pool online degraded?  Will the pool come to
>     >>  non-degraded state when the fifth disk comes live?    I know this is a
>     >>  bit of an exaggeration, but often two or more disks can't be plugged in
>     >>  at the same time, and the kernel initializes disks one at a time, and
>     >>  can take several seconds to complete.  I plugged in a USB hub with 4 USB
>     >>  keys on it, and they came live in an unpredictable order with different
>     >>  device nodes each time; it took about 20 seconds for the last disk to show
>     >>  up.  While my example is a bit extreme, it is a plausible test case.
>
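
For the udev angle discussed above, something like the following is what I
picture.  Again only a sketch: the rule file, the script path
/usr/local/sbin/zfs-hotplug.sh, and the autoimport_pool helper (from my
sketch at the top of this mail) are all made up for illustration.

    # /etc/udev/rules.d/90-zfs-hotplug.rules (hypothetical)
    # blkid, run by the stock persistent-storage rules, tags ZFS vdevs with
    # ID_FS_TYPE=zfs_member; hand the device node to our handler script.
    SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="zfs_member", RUN+="/usr/local/sbin/zfs-hotplug.sh $env{DEVNAME}"

    #!/bin/sh
    # /usr/local/sbin/zfs-hotplug.sh (hypothetical), called once per device node.
    dev="$1"

    # Ask the ZFS label on the new device which pool it belongs to.
    pool=$(zdb -l "$dev" 2>/dev/null | sed -n "s/^ *name: '\(.*\)'$/\1/p" | head -n 1)
    [ -n "$pool" ] || exit 0

    if zpool list -H -o name 2>/dev/null | grep -qx "$pool"; then
        # Pool already imported (possibly degraded): let the late disk catch up.
        zpool online "$pool" "$dev"
    else
        # Pool not imported yet: defer to the HOTPLUG_ZPOOLS policy sketched earlier.
        autoimport_pool "$pool"
    fi

In practice the RUN handler would probably just signal a long-running
service (or a systemd unit) rather than call zpool directly, since udev
kills RUN programs that take too long, but the flow would be the same.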
