[zfs-discuss] Kernel panic when operating on ZFS Datasets

Martin Ritchie ritchiem at apache.org
Wed Dec 13 23:45:41 EST 2017


Thanks for the suggestions, guys. I did feel it was a bit risky going with a
non-LTS Ubuntu, but I had hoped for a newer version of ZFS as a result.

I was tempted to build a new machine with ECC RAM for the storage, but
ended up using my 'old' dev i7 desktop. I never had issues with it when it
was under heavy compilation or testing loads, but a good Prime95-style burn
test couldn't hurt.
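
Something like this ought to cover both the RAM and the CPU side of that
burn test (memtester and stress-ng are in the Ubuntu repos; the sizes, pass
count and duration below are just placeholders):

    sudo apt install memtester stress-ng
    sudo memtester 4G 3                                    # lock 4 GiB and run 3 passes of memory patterns
    stress-ng --cpu 8 --vm 2 --vm-bytes 4G --timeout 8h    # sustained CPU + memory load overnight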

Could a bit flip really cause such issues?

I did scrub the pool, hoping that might fix things, but the two datasets
still can't be read.

The boot volume is fresh, so after the festivities I'll drop back to LTS and
upgrade ZFS to 0.7.x (I'll try 0.7.3 first).

Thanks for the tips. I've been running hardware RAID 6 for the last 9 years,
and now that I need to upgrade the whole pool I thought moving to ZFS would
free me from some of the hardware foibles.

Guess I just swapped them for some new ones :) Perhaps now is the time for
that new Supermicro Denverton board (A2SDi-4C-HLN4F): keep the i7 for compute
and isolate the storage. Guess I need to get a backup of this data pronto.
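
If the pools will still import, a snapshot-and-send to another box is
probably the quickest way to get that backup; a rough sketch (the snapshot
name, 'otherbox' and the receiving 'backup/earth' dataset are just
placeholders):

    zfs snapshot -r earth@rescue-20171214
    zfs send -R earth@rescue-20171214 | ssh otherbox zfs receive -u backup/earth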

Cheers
Martin


On 13 December 2017 at 12:06, Chris Siebenmann <cks at cs.toronto.edu> wrote:

> > Now the real problem is that I can't do any filesystem operations on
> > pools earth, remote, remote/TM; they all hang, processes blocked on IO.
> > A quick ps grep:
>
>  Since you've had a kernel panic, all bets are off. It's quite likely
> that the kernel panic has caused subsequent problems that are blocking
> IO to some or all pools.
>
> > The only sign of what might be going wrong is a kernel panic stack
> > trace in dmesg:
>
>  This kernel panic and stack trace is a big red flag, especially
> because of what it is. I'm going to quote selected pieces of it:
>
>         Dec 11 21:35:40 earth kernel: [  140.258196] kernel BUG at /build/linux-tt6jd0/linux-4.13.0/lib/string.c:985!
>         [...]
>         Dec 11 21:35:40 earth kernel: [  140.258923] RIP: 0010:fortify_panic+0x13/0x22
>
> This is what the kernel panicked in. fortify_panic() in lib/string.c
> is an internal kernel routine that is used to panic if the kernel
> detects that kernel code is using string functions in an unsafe way
> that would lead to buffer overflows or the like. This is obviously
> not supposed to happen; if it does, something bad has happened and
> the system is unstable from that point onward.
>
>         Dec 11 21:35:40 earth kernel: [  140.259291] Call Trace:
>         Dec 11 21:35:40 earth kernel: [  140.259363]  zfs_acl_node_read.constprop.16+0x31a/0x320 [zfs]
>
>  This is probably the function where the error happened; the source code
> is in module/zfs/zfs_acl.c. This function does call one thing that could
> call fortify_panic() (bcopy(), for people following along); however,
> clearly this code should not be creating a buffer overflow.
>
>  This code has experienced a number of changes since ZFS 0.6.5, although
> none of them have change messages that are of the nature 'fixed
> potential buffer overflow'. If you can, I would try the most recent
> version of ZFS (either the latest development version or at least 0.7.4)
> to see if that makes a difference.
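>
> (If you want to double-check what you're actually running right now,
> the loaded module reports its own version; two quick ways to look,
> assuming the stock Ubuntu zfsutils-linux packaging:)
>
>         cat /sys/module/zfs/version
>         dpkg -l zfsutils-linux | tail -1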
>
>  It's possible that this is because of a hardware issue, such as a
> flipped RAM bit. It's also possible that you've found a genuine issue in
> the code.  If this happens again after rebooting and updating to 0.7.4,
> I'd suggest reporting it as a bug in the issue tracker.
>
>  If you can, I would also scrub the pool. Unfortunately it's possible
> that you've wound up with on-disk data that now causes this ZFS panic
> when it gets touched; I'm not sure if this can be fixed. But the first
> step is to scrub the pool after a reboot, and then try reading all data
> with something like 'tar -cf /dev/null ....'.
>
> (Technically you don't have to read the data, but you do have to somehow
> trigger looking at the potential ACLs for every file. Reading everything
> is the easy way to do this.)
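>
> For concreteness, the sequence might look roughly like this (using
> 'earth' as the example pool with its default /earth mountpoint;
> substitute the pools and paths that are actually hanging):
>
>         zpool scrub earth
>         zpool status -v earth          # wait for the scrub to finish, note any errors
>         tar -cf /dev/null /earth       # walk every file so each one's ACL gets looked at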
>
>         - cks
>
