[zfs-discuss] Help with ZFS bad kernel Oops
Niels de Carpentier
niels at decarpentier.com
Mon Mar 26 01:07:12 EDT 2012
> On Sun, Mar 25, 2012 at 18:18, Niels de Carpentier
> <niels at decarpentier.com> wrote:
>> I had a quick look at the code, error 5 is EIO, txtype 9 is WRITE.
>> It looks like this will be caused by dbuf_findbp returning an error.
>> I don't think it's easy to determine why it failed without more info.
> I get this assertion when I set `--enable-debug` on any recent ZoL build:
> SPLError: 19196:0:(dbuf.c:806:dbuf_unoverride())
> ASSERTION(dr->dt.dl.dr_override_state != DR_IN_DMU_SYNC) failed
> More detail in this ticket:
> I did some bisecting, but the problem does not seem particular to
> 0.6.0.54, 0.6.0.55, or any combination of kernels. The Ubuntu
> 3.0.0-16 kernel in Oneiric seems particularly sensitive and crashes
> earlier than other versions, often at import or the first write.
> Solaris 11 is also complaining that all of the pool labels are bad
> although the pool is still recognized by ZoL. In my case, the
> underlying glitch is now on disk.
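The two codes quoted above (error 5, txtype 9) can be checked with a short script. This is only a sketch: the errno lookup uses Python's standard library, but the transaction-type table is a hand-copied subset recalled from zil.h and should be verified against your own source tree. The names ZIL_TX_TYPES and describe are made up for this example.

```python
import errno
import os

# Subset of ZIL transaction types. ASSUMPTION: copied by hand from memory of
# zil.h; check the values against the ZFS source you are actually running.
ZIL_TX_TYPES = {
    1: "TX_CREATE",
    2: "TX_MKDIR",
    3: "TX_MKXATTR",
    4: "TX_SYMLINK",
    5: "TX_REMOVE",
    6: "TX_RMDIR",
    7: "TX_LINK",
    8: "TX_RENAME",
    9: "TX_WRITE",
    10: "TX_TRUNCATE",
    11: "TX_SETATTR",
}


def describe(error, txtype):
    """Turn a numeric errno and ZIL txtype into a readable string."""
    err_name = errno.errorcode.get(error, "unknown")
    tx_name = ZIL_TX_TYPES.get(txtype, "unknown")
    return "error %d = %s (%s), txtype %d = %s" % (
        error, err_name, os.strerror(error), txtype, tx_name)


if __name__ == "__main__":
    print(describe(5, 9))
```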
In my view the root cause is corruption of the ZIL. This in turn causes
problems when replaying the ZIL, which is what we are hitting. The problem
is identifying which version can cause ZIL corruption, which is a pain to
I didn't get a panic, just the warning in the log. The pool seemed to be
working properly. Now I don't even get the warning anymore. Maybe you have
a different kind of corruption? I have a separate log device, do you?
I think I might have fixed the issue by accident, by importing the pool
read-only. This skips replay of the ZIL, avoiding the problem. Either
this or the switch back to read-write must also clear or invalidate the
ZIL, since the problem was gone afterwards. I cannot think of any other
way for the error to disappear. Is there a command to clear the ZIL? I
guess you can just remove the log device if you use one?
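The read-only import described above could be scripted roughly as follows. This is a sketch under assumptions, not a tested recovery procedure: the pool name "tank" is a placeholder, the export/import pair is one assumed way of switching back to read-write, and it assumes your zpool supports "-o readonly=on". The helper names are made up for this example, and by default it only prints the commands.

```python
import subprocess


def readonly_import_cycle(pool):
    """Build the zpool commands for a read-only import, then a normal one."""
    return [
        # Read-only import; per the message above this skips ZIL replay.
        ["zpool", "import", "-o", "readonly=on", pool],
        # ASSUMPTION: going back to read-write via export plus re-import.
        ["zpool", "export", pool],
        ["zpool", "import", pool],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # "tank" is a hypothetical pool name; this only prints the commands.
    run(readonly_import_cycle("tank"))
```

Keeping dry_run=True lets you review the exact commands before touching a pool that may have on-disk damage.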
The problem should only show up if the filesystems were not unmounted on
shutdown, since a clean shutdown wouldn't need a log replay. On my system
the filesystems were indeed not unmounted cleanly.
I'll see if I can dig up some old logs to see the exact version I was
running before the reboot.