Migration from the KQ Implementation

Gordan Bobic gordan.bobic at gmail.com
Tue May 3 14:54:13 EDT 2011


On 05/03/2011 05:51 PM, Brian Behlendorf wrote:
> Hi Gordan,
>
> I would suggest you try the latest 0.6.0-rc source from Github or
> Darik's PPA.  This will become the 0.6.0-rc4 tag shortly if it passes
> the needed testing.  I suspect you'll be pleasantly surprised with the
> stability of this implementation.
>
> github:
>    git clone git://github.com/behlendorf/spl.git
>    git clone git://github.com/behlendorf/zfs.git
>
> PPA:
>    https://launchpad.net/~dajhorn/+archive/zfs
>
> As for when an official 0.6.0 tag will be released, it primarily depends
> on resolving the remaining 14 open bugs.  We don't want to tag a stable
> release with any known stability/correctness issues.
>
>    http://github.com/behlendorf/zfs/issues

Indeed, I understand that.
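
(For what it's worth, when I get around to trying the Github source, I 
assume the build is still the usual autotools sequence, with spl built 
and installed before zfs - roughly:

    cd spl
    ./configure && make && sudo make install
    cd ../zfs
    ./configure --with-spl=../spl && make && sudo make install

Correct me if rc3/rc4 changed the --with-spl arrangement.)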

> As for your dedup+compression issue I would suggest trying to reproduce
> it with this code base.

The problem is that the issue is quite nebulous. Once it occurs, the 
kernel crashes hard, and the stack trace scrolls off the screen. The only 
thing I know for sure is that txg_sync is listed in the trace, and that 
the crash happens as soon as the zfs module is loaded (if zpool.cache 
exists), or as soon as the zpool is imported (if there is no zpool.cache).
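
(The only way I can see to at least keep the box bootable in that state 
is to move the cache file aside, so that the import - and hence the 
crash - only happens on demand rather than at module load time. A 
sketch, assuming the default cache location and a pool named, say, tank:

    # stop the zfs module auto-importing the pool at load time
    mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bad
    # ...then trigger the import, and the crash, deliberately when ready
    zpool import tank
)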

In retrospect I should probably have made sure that the zpool was 
created as v23, so that I could use the fuse implementation to scrub the 
pool and still have the data accessible, but I hadn't thought of that in 
time - and I really want to avoid having to do another 4 TB restore (you 
never quite realize just how much data 4 TB is until you actually have 
to copy it all).
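
(For reference, pinning the pool version is just a create-time property 
- a sketch with made-up device names:

    # create the pool at v23 so the fuse implementation can still import it
    zpool create -o version=23 tank raidz /dev/sda /dev/sdb /dev/sdc

It can always be brought up to the current version later with 
"zpool upgrade tank".)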

> As you probably know the KQ implementation was
> originally derived from this project.  However, we have fixed numerous
> issues which never made it back into the KQ version.  It wouldn't
> surprise me too much if your issue has already been addressed.

I tend not to put much faith in issues being fixed by accident. OTOH, I'd 
love to be pointed at a closed bug report that sounds like it might have 
been responsible for my crash. I'm not sure whether the issue is caused 
by merely having file systems with dedup+compress set to on, or whether 
it is caused by twiddling those two flags on a fs while a large file is 
being copied to it. But it happened to me twice in 4 days in exactly the 
same way, just when I had got the fs restored back to how it was.
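
(If someone with disposable hardware wants to try it, the pattern that 
bit me should amount to roughly the following - pool/fs names made up, 
and I can't swear the mid-copy twiddling is the actual trigger:

    zfs create tank/test
    # start a large sequential write in the background
    dd if=/dev/zero of=/tank/test/bigfile bs=1M count=100000 &
    # flip both flags while the copy is still running
    zfs set compression=on tank/test
    zfs set dedup=on tank/test

Zeros will of course dedup and compress to almost nothing, so real data 
may be needed to actually hit it.)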

So - the issue should be re-creatable, but unfortunately I'm not 
prepared to go through the re-creation effort right now: it takes me 
about 2 days to recover from it, and I can't do without the hardware 
I'm running it on for the next few days.

> If you
> are able to recreate it please open a bug on the issue tracker with the
> crash details.  Just think of it as one more blocker for the 0.6.0
> release!

That's another problem - extracting crash details may prove challenging, 
since the machine locks up very hard and the stack trace is longer than 
the screen buffer, so the best I could easily do is a photo of the part 
of the kernel stack trace that fits on the screen.
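
(If I do attempt it again, something like netconsole might capture the 
full trace where the screen can't - a sketch with made-up addresses, 
source port@ip/interface first, then target port@ip/MAC of a second 
machine running a UDP listener:

    # on the victim machine
    modprobe netconsole netconsole=6665@192.168.0.2/eth0,6666@192.168.0.1/00:11:22:33:44:55
    # on the capture machine (nc flags vary by netcat flavour)
    nc -u -l -p 6666

That assumes the NIC driver survives long enough during the panic; a 
serial console would be the other option.)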

Thanks for your input, I appreciate it. If for some reason another 
restore becomes inevitable, I'll try to re-create the problem with a 
v23 zpool so that I can fall back on the fuse implementation if need 
be, and try rc4 when it comes out.

Gordan


