[zfs-announce] Important: 0.6.5 zvol data loss regression on unaligned discard operations

Richard Yao ryao at gentoo.org
Fri Sep 18 10:26:23 EDT 2015


Dear Everyone,

I regret to say that ZoL 0.6.5 has the distinction of being the first tagged
production release to have a data loss regression (excluding xattr corruption
from xattr=sa, which was/is non-default and only affected metadata) in the
history of the project.

Unaligned non-secure discard commands on zvols on 0.6.5 will cause the zvol code
to discard past the requested region by the amount of the misalignment. For
example, if volblocksize is 4k and you request that the data from LBAs
[2048,18432) be discarded, the discard will become [4096,20480), causing data in
[18432,20480) to be lost.
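
The arithmetic in that example can be reproduced with a small shell sketch. It
assumes the faulty behavior amounts to rounding the start up to the block size
while leaving the length unchanged, which matches the numbers above but is not
copied from the zvol code. The function names are hypothetical, and the "safe"
range (whole blocks that lie inside the request) is likewise an assumption
about what a correct implementation would discard.

```shell
#!/bin/sh
# Illustration only: hypothetical helpers, not the actual zvol code.

# Round $1 up to the next multiple of $2.
round_up() {
	echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Round $1 down to the previous multiple of $2.
round_down() {
	echo $(( $1 / $2 * $2 ))
}

# Range discarded under the assumed faulty behavior:
# start is aligned up, length is left unchanged.
# Arguments: blocksize start end
buggy_discard() {
	bs=$1 start=$2 end=$3
	new_start=$(round_up "$start" "$bs")
	echo "$new_start $(( new_start + end - start ))"
}

# Range made of whole blocks that lie entirely inside the request.
# Arguments: blocksize start end
safe_discard() {
	bs=$1 start=$2 end=$3
	echo "$(round_up "$start" "$bs") $(round_down "$end" "$bs")"
}

buggy_discard 4096 2048 18432   # prints "4096 20480"
safe_discard 4096 2048 18432    # prints "4096 16384"
```

The difference between the two end offsets, [18432,20480) in this case, is the
data that lies outside the request and is lost.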

At this time, only Windows virtual machine guests are known to be affected, but
other zvol consumers that use the discard command could also be affected.

A workaround is to disable discard on anything using zvols. A patch is
available:

https://github.com/zfsonlinux/zfs/pull/3798.patch

It will be included in the 0.6.5.1 point release that will almost certainly be
done on Monday at the latest. Brian is the one who tags releases and he is away
on a trip. I have sent him a message through Skype to try to get 0.6.5.1 tagged
sooner.
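
As a starting point for the workaround of disabling discard, the following
sketch lists filesystems mounted directly from zvol device nodes with a
discard mount option. It assumes input in the /proc/mounts format and only
covers host-side mounts; it does not detect discard issued by virtual machine
guests or other consumers, which have to be checked in their own
configuration. The function name is hypothetical.

```shell
#!/bin/sh
# Illustration only: reads /proc/mounts-style lines on stdin and prints the
# device and mount point of every zvol-backed mount that has a discard option.
# zvols appear as /dev/zdN or under /dev/zvol/.
zvol_discard_mounts() {
	awk '$1 ~ /^\/dev\/(zd[0-9]|zvol\/)/ && $4 ~ /(^|,)discard(,|=|$)/ {
		print $1, $2
	}'
}
```

It can be run as `zvol_discard_mounts </proc/mounts`; an empty result says
nothing about guests or other zvol consumers.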

The patch has already been backported to Gentoo's packaging through the
sys-fs/zfs-kmod-0.6.5-r1 update as of an hour ago and it should become available
on the mirrors within the next few hours. Users of the Funtoo and Pentoo
distributions, which rely on Gentoo's packaging, should also receive the update
when their mirrors update within the next 24 hours.

I am notifying the maintainers of the NixOS, EPEL, Ubuntu and Debian packaging
on BCC so that this patch can be immediately backported ahead of the 0.6.5.1
release. Given that Brian is the maintainer of the EPEL packaging, the EPEL
packaging will likely first see this patch with the 0.6.5.1 update.

People using zvols in production environments should apply the patch
immediately. The following commands will apply the patch and rebuild the module
on distributions that use DKMS, such as EPEL distributions and Debian/Ubuntu:

curl -L https://github.com/zfsonlinux/zfs/pull/3798.patch \
	| sudo patch -p1 -d /var/lib/dkms/zfs/0.6.5/source

sudo dkms remove --all zfs/0.6.5
sudo dkms install zfs/0.6.5

If an initramfs archive loads the module, it will need to be updated. Systems
that use root on ZFS will certainly use an initramfs archive that needs to be
updated while others might or might not. On distributions that use initramfs
archives generated by dracut such as EPEL distributions, the initramfs can be
updated by the `dracut -f` command.

On distributions that use initramfs archives generated by Debian's
initramfs-tools such as Debian and Ubuntu, this can be done with
`update-initramfs -u`.

On distributions that use initramfs archives generated by Gentoo's genkernel,
the initramfs archive should be rebuilt by the original command that was used to
build it, but it can be rebuilt by `genkernel initramfs --zfs`. It should be
noted that these distributions do not use DKMS and will receive the backported
patch as part of normal system updates. However, the patch can be applied
independently of the normal process by running the following commands as root:

mkdir -p /etc/portage/patches/sys-fs/zfs-kmod-0.6.5
curl -L https://github.com/zfsonlinux/zfs/pull/3798.patch \
	>/etc/portage/patches/sys-fs/zfs-kmod-0.6.5/3798.patch
emerge --nodeps --oneshot --ask =sys-fs/zfs-kmod-0.6.5

As a precaution, care should be taken to ensure that initramfs archives are
updated for all kernels rather than just the primary kernel so that the use of
an older kernel as a fallback does not risk bypassing the patch, but the
immediate risk will be eliminated by the above instructions. See your
distribution documentation for details.
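
As a rough sketch of covering every installed kernel on a dracut-based system,
the following prints (but does not run) one `dracut -f --kver` command per
kernel version found in the module tree. It assumes kernels are installed
under /lib/modules and that the default initramfs paths are in use; the
function names are hypothetical. On initramfs-tools systems,
`update-initramfs -u -k all` serves the same purpose.

```shell
#!/bin/sh
# Illustration only: hypothetical helpers that print commands for review.

# List installed kernel versions, one per line, from the module tree.
# Argument: module directory (defaults to /lib/modules).
list_kernels() {
	moddir=${1:-/lib/modules}
	for d in "$moddir"/*/; do
		if [ -d "$d" ]; then
			basename "$d"
		fi
	done
}

# Print one dracut rebuild command per installed kernel version.
print_dracut_cmds() {
	list_kernels "$1" | while read -r kver; do
		echo "dracut -f --kver $kver"
	done
}
```

Reviewing the printed commands before running them as root avoids rebuilding
an initramfs for a kernel whose module has not been rebuilt yet.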

After the module has been rebuilt, the module will need to be reloaded. The
easiest way to reload the module is through a reboot. The alternative is to stop
everything using zfs filesystems, umount all zfs filesystems, run `modprobe -r
zfs`, verify that ZFS was unloaded with `lsmod`, run `modprobe zfs`, remount the
zfs filesystems and restart services.
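
The manual reload can be sketched as a shell function. This is an outline of
the steps above under the assumption that every service using ZFS has already
been stopped and that it is run as root; the function names are hypothetical,
and `reload_zfs` is only defined here, not called.

```shell
#!/bin/sh
# Illustration only: hypothetical helpers following the steps in the text.

# Succeeds if the lsmod output on stdin lists a module named exactly "zfs".
zfs_loaded() {
	awk '$1 == "zfs" { found = 1 } END { exit !found }'
}

# Unmount, unload, verify, reload and remount. Stops at the first failure.
reload_zfs() {
	zfs umount -a || return 1
	modprobe -r zfs || return 1
	if lsmod | zfs_loaded; then
		echo "zfs module is still loaded" >&2
		return 1
	fi
	modprobe zfs || return 1
	zfs mount -a
}
```

If `modprobe -r zfs` fails, something is still holding the module, and a
reboot remains the simpler route.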

Those who do not use zvols need not take any action.


The project has gone to great effort to catch data loss regressions before they
were included in HEAD, much less a release, but having one get past us at some
point was unfortunately inevitable. This is the 6th data/metadata corruption
regression to enter the repository, the 2nd to enter a tagged production release
(the first being xattr=sa corrupting xattrs, which was fixed in 0.6.4) and the
first in a tagged production release that caused data loss. However, a
regression being news as opposed to a normal occurrence is a very good thing and
the project's track record is still stellar.

My apologies to everyone who is either affected by this or will have the
inconvenience of an unscheduled update. We do everything in our power to catch and prevent
such issues as early as possible and our track record reflects that. The
project's regression tests will be tightened so that this specific type of
regression (unaligned discard operations on zvols discarding past the end of the
requested region) never happens again. The project cannot promise that we will
never have another regression of a different kind, but we can promise that we
will do our best to avoid it.

Yours truly,
Richard Yao

P.S. To anyone on BCC, a couple of earlier emails were rejected by the mailing list
daemon, so I had to resend this after having talked with the mailing list
administrator.

