Can someone explain this result?

Brian Behlendorf behlendorf1 at llnl.gov
Mon May 9 16:02:29 EDT 2011


A couple of thoughts about this.  The second 'find' may, counterintuitively,
run slower than the first if you're walking a large enough number of files
that they cannot all be cached.  I believe this fits with your observation
that you don't see this for small numbers of files.

The reason is that on the first pass you simply need to allocate enough
memory to cache each new inode.  On the second pass, you not only need to
allocate memory for the new inode, but you may also need to pick an inode
to drop from the cache and have ZFS free the resources associated with it.
Right now we rely on Linux, which uses a simple LRU to determine which
inode should be dropped.  That means cache hits will be rare if the
metadata for all the files doesn't fit in the cache.  And remember, we are
strictly honoring the zfs_meta_limit now.
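To make that concrete, here's a toy user-space model of the two passes.
It is purely illustrative: the struct, the LRU list, CACHE_LIMIT, and
FILE_COUNT are all stand-ins I made up, not the real Linux inode cache or
the real ZFS code paths.

/*
 * Toy model of the two 'find' passes.  Illustrative only.
 */
#include <stdio.h>
#include <stdlib.h>

#define CACHE_LIMIT 1000        /* stand-in for the metadata limit  */
#define FILE_COUNT  1500        /* more files than fit in the cache */

struct toy_inode {
        long ino;
        char meta[256];                 /* pretend per-inode metadata */
        struct toy_inode *lru_next;     /* oldest-first LRU list      */
};

static struct toy_inode *lru_head, *lru_tail;
static long cached;

static void cache_insert(struct toy_inode *ip)
{
        if (lru_tail)
                lru_tail->lru_next = ip;
        else
                lru_head = ip;
        lru_tail = ip;
        cached++;
}

static void evict_oldest(void)
{
        struct toy_inode *victim = lru_head;

        lru_head = victim->lru_next;
        if (lru_head == NULL)
                lru_tail = NULL;
        /* In ZFS this is also where the znode's resources get torn
         * down, which is the expensive part. */
        free(victim);
        cached--;
}

static void walk(const char *label)
{
        long i, evictions = 0;

        for (i = 0; i < FILE_COUNT; i++) {
                /* Every lookup misses: the walk touches more files
                 * than fit, and the LRU order matches the walk order. */
                if (cached >= CACHE_LIMIT) {
                        evict_oldest();
                        evictions++;
                }
                struct toy_inode *ip = calloc(1, sizeof (*ip));
                ip->ino = i;
                cache_insert(ip);
        }
        printf("%s: %ld allocations, %ld evictions\n", label, i, evictions);
}

int main(void)
{
        walk("first pass ");    /* cache starts empty: 500 evictions   */
        walk("second pass");    /* cache starts full:  1500 evictions  */
        return 0;
}

Both passes allocate the same number of inodes, but the second pass starts
with a full cache and pays an eviction (and, in the real code, a ZFS
resource teardown) for every single lookup instead of only after the cache
first fills.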

Ideally, the overhead for this will be low, but I haven't had a chance to
profile it.  There may well be some lock contention here too, but how bad
it is I can't say until it's been profiled.

Concerning gzip compression performing worse, I have a hunch about that as
well.  The gzip algorithm (unlike lzjb) requires a fair bit of scratch
working space.  That memory needs to be contiguously mapped, which means
we need to use vmalloc().  Now, vmalloc() has always been notoriously slow
in the kernel, and ZFS already puts considerable pressure on it.  So the
cost of vmalloc()'ing the needed buffer space may be slowing things down
considerably here.
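The allocation pattern I have in mind looks roughly like this.  Again,
this is only a sketch, not the actual ZFS gzip code path, and
GZIP_WORKSPACE_SIZE is just an invented placeholder:

/*
 * Sketch only.  The point is that a buffer this large has to be
 * virtually contiguous, so it comes from vmalloc(); building that
 * mapping page by page is far more expensive than the small slab
 * allocations lzjb can get away with.
 */
#include <linux/vmalloc.h>

#define GZIP_WORKSPACE_SIZE     (256 * 1024)    /* placeholder size */

static void *gzip_workspace_get(void)
{
        return vmalloc(GZIP_WORKSPACE_SIZE);
}

static void gzip_workspace_put(void *ws)
{
        vfree(ws);
}

If that hunch turns out to be right, the obvious thing to try would be
caching or pre-allocating the workspace rather than paying for a
vmalloc()/vfree() pair on every compression call.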

Anyway, those are my best guesses as to what's going on.  And they are
all things that need to be carefully looked at once serious work gets
underway to improve the performance.  The good news is there's no real
reason this can't all be improved.

-- 
Thanks,
Brian 

On Sun, 2011-05-08 at 09:11 -0700, devsk wrote:
> As I said, the same issue happened when I was using the rootfs WITHOUT
> the L2ARC cache device. RAM as a cache device is definitely faster than
> disk, unless our cache implementation has races and is blocking on
> things other than the typical physical access latency. That's a better
> explanation.
> 
> The L2ARC device (Indilinx-based Vertex 1) I have in there is fine. It's
> capable of much faster random access than an HDD.
> 
> One more angle I have on this: so far it has been reproduced only on
> gzip-compressed FSs. I can't reproduce it on FSs with no compression or
> default compression. Some intricate CPU/memory race, perhaps.
> 
> Also noteworthy is the fact that two successive invocations of find on
> FSs with default or no compression take the same time. There is
> absolutely no cache advantage!
> 
> And you have to do the find on a fairly large FS with lots of files
> for it to be an effective test.
> 
> -devsk
> 
> 
> 
> On May 8, 12:24 am, Christ Schlacta  <aarc... at aarcane.org> wrote:
> > Only thing I can think of is a crappy cache device.
> >
> > devsk <dev... at gmail.com> wrote:
> > ># time find rsynced/|wc -l
> > >384736
> >
> > >real    0m50.615s
> > >user    0m0.350s
> > >sys     0m19.404s
> >
> > ># time find rsynced/|wc -l
> > >384736
> >
> > >real    2m5.708s
> > >user    0m0.315s
> > >sys     0m5.185s
> >
> > ># time find rsynced/|wc -l
> > >384736
> >
> > >real    1m53.062s
> > >user    0m0.350s
> > >sys     0m4.418s
> >
> > >The FS is idle at this time and so is the pool. This is not rootfs.
> > >It's a RAIDZ1 pool with dedup and compression on. There is an SSD
> > >configured as a cache device.
> >
> > >How is it possible that the first run is the fastest while subsequent
> > >runs are more than twice as slow? I am baffled. And this is not the
> > >first time ZFS on Linux has done this: I have seen it multiple times
> > >now, because it did the same thing when I was running a ZFS rootfs on
> > >my laptop.
> >
> > >-devsk


