Native zfs vs zfs-fuse

Brian Behlendorf behlendorf1@llnl.gov
Sat May 7 19:05:35 EDT 2011


Very cool!  This is a great start.  I'd love to see us using a framework
like this to ensure we always maintain the highest quality of
development.  In my ideal world the CI system would minimally provide
the following:

* Every commit to master validates the spl/zfs build on N distributions.
* Nightly runs of the full ZFS Test Suite on N distributions.
* Nightly/weekly performance testing on real hardware (not VMs).
* All results should be publicly available.
* Developers can manually trigger builds for development branches.

Concerning the kernel autotest framework: the more I look at it, the
more I agree with you that it may not be suited for this, particularly
since I see they haven't posted results for kernels newer than 2.6.36.

However, there's another existing CI candidate called Buildbot whose
manual I've been reading.  It looks like it might be a really nice fit
to augment what you're already working on.  It's a GPL project written
in Python and is used by the Python, Mozilla, Chromium, Wireshark,
Subversion, etc. projects for their CI needs.

http://trac.buildbot.net/

It looks like it already covers the mundane stuff on your TODO list
(web interface, logging, scheduling, etc).  And since it's written in
Python I'm sure it could be easily extended to integrate into your
scheme for quickly setting up distributions on real hardware.  In fact,
they have a section in the manual called 'Latent Buildslaves' for
exactly this sort of thing, although their examples are for EC2 and
libvirt.

http://buildbot.net/buildbot/docs/current/Latent-Buildslaves.html#Latent-Buildslaves

The Latent Buildslaves probably only make sense for the
performance-critical testing where VMs aren't appropriate.  For the
quick build testing, setting up a bunch of long-lived build slave VMs
would be nice.
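
For example, a master.cfg for this might look something like the
following.  This is a rough sketch cribbed from the 0.8 manual and
completely untested; the slave names, passwords, repo URL and AMI are
all made up:

```python
# master.cfg sketch -- based on the buildbot 0.8 manual, untested;
# slave names, passwords, repo URL and AMI are all illustrative.
from buildbot.buildslave import BuildSlave
from buildbot.ec2buildslave import EC2LatentBuildSlave
from buildbot.config import BuilderConfig
from buildbot.process.factory import BuildFactory
from buildbot.steps.source import Git
from buildbot.steps.shell import ShellCommand

c = BuildmasterConfig = {}

# Long-lived VM slaves for the quick per-commit build testing.
c['slaves'] = [
    BuildSlave("debian-squeeze-vm", "sekrit"),
    BuildSlave("centos56-vm", "sekrit"),
    # A latent slave is only started when a build is queued for it; the
    # manual's example is EC2, but the same hooks could presumably drive
    # a PXE/power-strip setup for real hardware.
    EC2LatentBuildSlave("perf-ec2", "sekrit", "m1.large",
                        ami="ami-00000000"),
]
c['slavePortnum'] = 9989

# One build factory: check out the tree, configure, build.
f = BuildFactory()
f.addStep(Git(repourl="git://github.com/behlendorf/zfs.git"))
f.addStep(ShellCommand(command=["./configure"]))
f.addStep(ShellCommand(command=["make", "-j4"]))

c['builders'] = [
    BuilderConfig(name="zfs-debian-squeeze",
                  slavenames=["debian-squeeze-vm"], factory=f),
    BuilderConfig(name="zfs-centos56",
                  slavenames=["centos56-vm"], factory=f),
]
```

The long-lived slaves would cover the quick per-commit builds, while a
latent slave only gets powered up when a job actually needs it.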

One nice feature is that it's designed to support distributed build
slaves.  If someone wants to ensure we're testing on their
distribution, they just need to offer up the hardware/VM for a build
slave.  They can administer the host and we can easily get the
automated testing results.

Anyway, great work so far!  It's looking very promising.  If you get a
little time you might skim the Buildbot manual.  Maybe this is
something we can leverage, maybe not.

Thanks,
Brian 


On Sat, 2011-05-07 at 07:42 -0700, Gunnar Beutner wrote: 
> Autotest looks quite interesting.  However, there seems to be a bit of
> a lack of documentation - or maybe I just didn't look at it thoroughly
> enough.
> 
> For now I've continued with writing my own scripts, which by now can
> automatically build images for multiple distributions (just Debian
> squeeze, CentOS 5.6 and OpenSUSE 11.x so far) with various kernels
> (2.6.26 - 2.6.39) and continuously run tests in a loop.
> 
> There are scripts to build jobs for all the possible combinations of the
> input parameters (distribution, kernel, git branch/commit, filesystem
> type, etc.):
> 
> root@charon:~/zfsci# ./zfsci-build-jobs
> Successfully built 294 job descriptions.
> 
> root@charon:~/zfsci# python -mjson.tool
> build/jobs/c96bd1d61225c9e8c87d3c5abd93150f.json
> {
>     "input": {
>         "distribution": "debian",
>         "fs-type": "zfs",
>         "kernel-version": "2.6.30",
>         "zfs-git-commit": 1304532000,
>         "zfs-git-repo": {
>             "branch": "master",
>             "spl": "behlendorf-spl",
>             "zfs": "behlendorf-zfs"
>         }
>     },
>     "job_id": "c96bd1d61225c9e8c87d3c5abd93150f"
> }
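
Incidentally, those job ids look like md5 sums of the canonical JSON
encoding of the inputs.  A rough sketch of how such deterministic job
descriptions could be generated -- pure guesswork on my part, and the
parameter values below are made up:

```python
# Illustrative sketch only -- not the actual zfsci-build-jobs code.
# A deterministic job id is derived by hashing the canonical JSON
# form of each input-parameter combination.
import hashlib
import itertools
import json

distributions = ["debian", "centos", "opensuse"]  # assumed values
kernels = ["2.6.26", "2.6.32", "2.6.39"]
fs_types = ["zfs", "zfs-fuse", "ext3"]

jobs = []
for dist, kernel, fs in itertools.product(distributions, kernels, fs_types):
    params = {"distribution": dist, "kernel-version": kernel, "fs-type": fs}
    # sort_keys gives a canonical encoding, so identical inputs always
    # map to the same job id
    job_id = hashlib.md5(
        json.dumps(params, sort_keys=True).encode("utf-8")).hexdigest()
    jobs.append({"input": params, "job_id": job_id})

print("Successfully built %d job descriptions." % len(jobs))
```

With three values per parameter this prints "Successfully built 27 job
descriptions."; identical inputs always hashing to the same id is also
what would make de-duplicating results easy.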
> 
> The PXE-bootable "master" image picks one of those job descriptions from
> an NFS share and installs the selected distribution, kernel and build
> tools and sets up the node to automatically run the tests on boot.
> 
> Once the tests are completed the node reboots and the master image tries
> to extract the test results:
> 
> root@charon:~/zfsci# python -mjson.tool
> /var/lib/zfsci-data/results/result-1304780569/job.json
> {
>     "input": {
>         "distribution": "debian",
>         "fs-type": "zfs",
>         "kernel-version": "2.6.35",
>         "rootpart": "/dev/sda1",
>         "swappart": "/dev/sda2",
>         "testpart": "/dev/sda3",
>         "zfs-git-commit": 1304712000,
>         "zfs-git-repo": {
>             "branch": "master",
>             "spl": "behlendorf-spl",
>             "zfs": "behlendorf-zfs"
>         }
>     },
>     "job_id": "cd4da6f9e1f659e590c3de206e863dc8",
>     "output": {
>         "<class '__main__.BuildKernelTask'>": {
>             "status": "SKIPPED"
>         },
>         "<class '__main__.BuildSPLTask'>": {
>             "run_end": 1304780359.8432441,
>             "run_start": 1304780318.95593,
>             "status": "PASSED"
>         },
>         "<class '__main__.BuildZFSTask'>": {
>             "run_end": 1304780441.2562261,
>             "run_start": 1304780364.5518751,
>             "status": "PASSED"
>         },
>         "<class '__main__.CreateExtFSTask'>": {
>             "run_end": 1304780443.9385149,
>             "run_start": 1304780443.938489,
>             "status": "SKIPPED"
>         },
>         "<class '__main__.CreateZPoolTask'>": {
>             "run_end": 1304780442.927438,
>             "run_start": 1304780442.339977,
>             "status": "PASSED"
>         },
>         "<class '__main__.DepmodTask'>": {
>             "run_end": 1304780242.4898851,
>             "run_start": 1304780242.0290351,
>             "status": "PASSED"
>         },
>         "<class '__main__.InstallZFSFuseTask'>": {
>             "run_end": 1304780360.88288,
>             "run_start": 1304780360.882858,
>             "status": "SKIPPED"
>         },
>         "<class '__main__.SaveCrashDumpsTask'>": {
>             "run_end": 1304780569.548317,
>             "run_start": 1304780569.535913,
>             "status": "PASSED"
>         },
>         "<class '__main__.ZFSDepDebianTask'>": {
>             "run_end": 1304780315.426764,
>             "run_start": 1304780296.448957,
>             "status": "PASSED"
>         },
>         "<class '__main__.ZFSDepsCentOSTask'>": {
>             "run_end": 1304780316.4739571,
>             "run_start": 1304780316.4739261,
>             "status": "SKIPPED"
>         },
>         "<class '__main__.ZFSDepsOpenSUSETask'>": {
>             "run_end": 1304780317.4897759,
>             "run_start": 1304780317.489748,
>             "status": "SKIPPED"
>         }
>     }
> }
> 
> (And possibly other files that can be used to diagnose problems in more
> detail.)
> 
> An average test run takes about 5 minutes (installing the system,
> building spl/zfs, creating a test zpool and post-processing the results)
> - which means I'm getting about 50 test results per hour using my
> current test setup (http://charon.shroudbox.net/public/zfsci-cluster).
> 
> There are still quite a few items on my TODO list though:
> 
> * capturing stdout/stderr output (especially for 'early' failures, e.g.
> during the install process) and system information (IP address, etc.)
> * integrating the watchdog script with my remote-controlled power strip
> (the watchdog script would've worked just fine, but after a couple dozen
> reboots the on-board Realtek NIC isn't detected anymore - and the test
> boxes need to be power-cycled to fix that)
> * conditionally building job descriptions (right now the script creates
> jobs even when we already have results for the same input parameters)
> * web thingie/visualization to make the results more user-friendly
> * support for more distributions (Ubuntu should be trivial to add; we
> should probably also support Fedora)
> * making the scripts more robust and cleaning them up
> * modifying the job scripts so they automatically pick up new git branches
> * documentation
> * benchmarks (sort of low priority, considering all the other things
> that need to be fixed first)
> * and a few other things
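
For the conditional job building, one cheap approach might be to key
off those deterministic job ids and skip any combination that already
has a result on disk.  An untested sketch -- the directory layout is
borrowed from the job.json example above, and the helper names are
made up:

```python
# Untested sketch: filter out job descriptions that already have
# results on disk, keyed by the deterministic job_id.
import json
import os

def have_results(job_id, results_dir="/var/lib/zfsci-data/results"):
    """Return True if some result directory already recorded this job_id."""
    if not os.path.isdir(results_dir):
        return False
    for entry in os.listdir(results_dir):
        job_file = os.path.join(results_dir, entry, "job.json")
        try:
            with open(job_file) as f:
                if json.load(f).get("job_id") == job_id:
                    return True
        except (IOError, ValueError):
            # missing or corrupt job.json -- treat as no result
            continue
    return False

def build_missing_jobs(jobs, results_dir="/var/lib/zfsci-data/results"):
    """Keep only the jobs whose input combination has not been tested yet."""
    return [job for job in jobs
            if not have_results(job["job_id"], results_dir)]
```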
> 
> Regards,
> Gunnar
> 
> On 28.04.2011 19:49, Brian Behlendorf wrote:
> > Good idea.  We absolutely need something like this.  It's really the
> > best way to ensure the quality of the port.  I've had similar thoughts
> > myself, but for the moment I am still making do with a dozen or so VMs.
> > I'd love to have the infrastructure in place to automatically run a
> > development branch through N different distributions checking for build
> > failures and performance regressions.
> > 
> > A while ago I came across a project at kernel.org called Autotest.  It
> > is a framework which was designed for fully automated Linux kernel
> > testing.  It appears to be under active development and quite far along.
> > Plus since it's designed for kernel testing I would think extending it
> > to kernel module testing might not be too hard.  In fact it may already
> > support this.  It might be worth spending the time to see what they've
> > done and if it's something we can use.
> > 
> >   http://autotest.kernel.org/
> >   http://autotest.kernel.org/wiki/WhitePaper
> > 




More information about the zfs-discuss mailing list