[zfs-discuss] Help with initial ZFS setup

Richard Laager rlaager at wiktel.com
Fri Jul 26 12:35:39 EDT 2013


On Fri, 2013-07-26 at 08:48 -0300, Durval Menezes wrote:


> In defense of the (default) "no normalization" approach, I found a
> blog entry by one of the old Sun developers

I don't think that blog entry defends normalization=none. It compares
always normalizing the stored text against the alternative of
normalizing only on comparison, which is what ZFS does when
normalization is enabled.

> Richard, do you care to explain why the formD setting is better, not
> just better than "none" but better than the other formN options? 

As you'd expect, normalization=none does no normalization. This means
you can create two files that appear to have the same name. Try this:

    touch $(echo -e "caf\xC3\xA9") $(echo -e "cafe\xCC\x81") ; ls

On ext4 (or ZFS with normalization=none), you get two files that appear
to have the same name, because one is NFC (uses LATIN SMALL LETTER E
WITH ACUTE) and the other is NFD (LATIN SMALL LETTER E, COMBINING ACUTE
ACCENT). If you use any of the other normalization options on ZFS, they
will be treated as the same file. This seems like a desirable behavior
to me, and should help avoid interoperability problems across systems
which generally use different normal forms (e.g. Linux vs. OS X).
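
To see why the two names collide visually but not bytewise, here is a
minimal sketch using Python's standard unicodedata module. It only
illustrates the idea of normalizing on comparison; it is not ZFS's code,
and the helper name same_name is made up for this example.

```python
import unicodedata

# The same two names as in the touch command above.
nfc_name = "caf\u00e9"    # LATIN SMALL LETTER E WITH ACUTE
nfd_name = "cafe\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT


def same_name(a, b, form="NFD"):
    """Hypothetical helper: compare two names after normalizing both.

    The inputs are never modified, which mirrors the
    normalize-on-comparison behavior described above.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Different byte sequences on disk...
print(nfc_name.encode("utf-8"))
print(nfd_name.encode("utf-8"))

# ...so a plain comparison (normalization=none) sees two distinct names,
print(nfc_name == nfd_name)

# while a normalizing comparison treats them as the same name.
print(same_name(nfc_name, nfd_name))
```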

Since ZFS only normalizes on comparison (and never changes the stored
value), formC and formD are equivalent to each other and formKC and
formKD are equivalent to each other. I have no idea why Sun implemented
all four, instead of just two (either formC and formKC OR formD and
formKD). The composed variants should be ever so slightly slower, as
they involve decomposition first followed by composition. Thus, I
suggest you choose from formD and formKD only. It's possible in theory
that formC could be faster when the input strings are already in NFC (as
they typically are on Linux), but I don't believe ZFS's implementation
has that optimization in practice. And in any case, the computational
cost of normalization is pretty small (and it's zero or nearly so if
your filenames are all ASCII).
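
The equivalence claim can be checked with a rough sketch, again using
Python's unicodedata rather than anything from ZFS. The names
equal_under, samples and agree are hypothetical, and the ASCII shortcut
is only my illustration of why ASCII names cost nothing (ASCII text is
unchanged by every normal form); I am not claiming ZFS does exactly
this.

```python
import itertools
import unicodedata


def equal_under(form, a, b):
    """Hypothetical comparison-only normalization: inputs are not changed."""
    # ASCII strings are already in every normal form, so skip the work.
    if a.isascii() and b.isascii():
        return a == b
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A few sample names: composed, decomposed, and compatibility characters.
samples = [
    "cafe",
    "caf\u00e9",     # precomposed e with acute
    "cafe\u0301",    # e + combining acute accent
    "\u212b",        # ANGSTROM SIGN
    "\u00c5",        # LATIN CAPITAL LETTER A WITH RING ABOVE
    "A\u030a",       # A + combining ring above
    "\ufb01le",      # starts with the "fi" ligature
    "file",
]


def agree(form_x, form_y):
    """True if both forms give the same verdict for every pair of samples."""
    return all(
        equal_under(form_x, a, b) == equal_under(form_y, a, b)
        for a, b in itertools.combinations(samples, 2)
    )


print(agree("NFC", "NFD"))     # canonical forms: same equality results
print(agree("NFKC", "NFKD"))   # compatibility forms: same equality results
print(agree("NFD", "NFKD"))    # these two differ (e.g. the ligature name)
```

When the stored name is never rewritten, only the verdict of the
comparison matters, so the composed and decomposed variants of each pair
are interchangeable.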

Finally, formKD is (by design) less strict in what it considers equal.
In some cases, that behavior is what I'd prefer (e.g. the ligature of
the characters "f" and "i" combined being equivalent to the two separate
characters in sequence), but in other cases it's not ("2" being equal to
the superscript squared character). So I don't use that personally and I
don't recommend it, since that might create results that are surprising
in the opposite direction (things you expect to be different being
treated as the same). But a reasonable person could want formKD.
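
The two cases above can be reproduced with a short sketch using Python's
unicodedata, assuming its NFD and NFKD stand in for formD and formKD.
The helper equal_under and the example names are made up for
illustration.

```python
import unicodedata


def equal_under(form, a, b):
    """Hypothetical helper: compare two names under a given normal form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Example name pairs (invented for this sketch).
pairs = {
    "ligature": ("\ufb01le", "file"),   # U+FB01 LATIN SMALL LIGATURE FI
    "superscript": ("x\u00b2", "x2"),   # U+00B2 SUPERSCRIPT TWO
}

# For each pair, record the verdict under (NFD, NFKD).
results = {
    label: (equal_under("NFD", a, b), equal_under("NFKD", a, b))
    for label, (a, b) in pairs.items()
}

for label, (canonical, compat) in results.items():
    print(f"{label}: NFD equal={canonical}, NFKD equal={compat}")
```

Both pairs stay distinct under NFD and collapse under NFKD, which is the
trade-off: the ligature match is probably welcome, the superscript match
probably is not.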

-- 
Richard