[zfs-discuss] Re: ZFS memory usage

Prakash Surya surya1 at llnl.gov
Wed Sep 4 19:25:21 EDT 2013

On Wed, Sep 04, 2013 at 08:34:16PM +0200, Niels de Carpentier wrote:
> > Feature.. Bug.. That's picking nits; either way it's not preferable.
> > But, yea, you seem to understand the issue precisely. I _think_ that's
> > happening with objects such as dnodes (which have their own slab), but
> > again, I haven't checked.
> >
> > If that _is_ happening with dnodes, it _might_ be part of the reason why
> > metadata-heavy workloads are a sore spot for the Linux port at the
> > moment.
> >
> > If you have some cycles to look into this, it'd be useful to get an idea
> > of how much space we're wasting, why, and for what objects. That's the
> > first step to fixing it.
> I already spent some time looking at this, and the problem is mainly with
> the zio buffers. These have alignment requirements that cause high
> overhead, especially for the smaller buffers. I think metadata-heavy
> workloads are a problem because they mainly use small blocks (= high
> overhead), while data mainly uses larger blocks.
> The worst ones are the bonus buffers, which need 320B, use a 512B zio
> buffer and so actually use 1024B, while only 320B use is registered by the
> ARC. I don't think these buffers need to be 512B aligned, so a separate
> slab for these would be useful.

Sigh, that's not good at all.
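
To put numbers on the bonus buffer case quoted above, here is a minimal
back-of-the-envelope sketch in Python. The three sizes (320B needed, 512B
zio buffer, 1024B actually consumed) come from the quoted message; the
function name `overhead` is made up for illustration, and the split of the
waste into "rounding" and "beyond the buffer size" is only one reading of
that message, not something derived from the allocator itself.

```python
def overhead(accounted: int, consumed: int) -> float:
    """Bytes really consumed per byte that the ARC accounts for."""
    if accounted <= 0:
        raise ValueError("accounted must be positive")
    return consumed / accounted

# Figures taken from the quoted message (not measured here).
BONUS_NEEDED = 320       # bytes a bonus buffer needs, as registered by the ARC
ZIO_BUF_SIZE = 512       # zio buffer size the request is served from
ACTUAL_FOOTPRINT = 1024  # bytes the buffer reportedly ends up occupying

ratio = overhead(BONUS_NEEDED, ACTUAL_FOOTPRINT)
wasted = ACTUAL_FOOTPRINT - BONUS_NEEDED
# Assumed breakdown: part of the waste is rounding 320B up to 512B,
# the remainder is whatever the allocator adds on top of the 512B buffer.
rounding_waste = ZIO_BUF_SIZE - BONUS_NEEDED
beyond_buffer_waste = ACTUAL_FOOTPRINT - ZIO_BUF_SIZE

print(f"{ratio:.1f}x footprint, {wasted}B wasted per bonus buffer")
print(f"rounding: {rounding_waste}B, beyond buffer size: {beyond_buffer_waste}B")
```

With those figures, each bonus buffer occupies 3.2 times what the ARC
counts for it, so 704 of every 1024 bytes are overhead.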

> Another issue is that not all memory used by ZFS is accounted for in the
> SLAB, and so the actual memory usage is not just dependent on arc_max.
> Some of the memory usage is actually not visible at all, as it uses
> kmalloc, and isn't registered anywhere. (I need to turn on memory
> debugging, as it should be visible then). Others use the slab and so are
> visible, but there are lots of users of the zio buffers, and so it's very
> hard to see what they are actually used for.
> Actual ARC memory use can easily be 50% over ARC size, and total memory
> use can easily be 100% over ARC size. Metadata is mainly a problem because

Yes, I can easily believe that.

> of the small block sizes, and the znode cache which can grow quite big as
> well.
> A quick solution might be to use the kernel slab allocator for the small
> (non vmem backed) zio buffers, and a separate slab for the bonus buffers.
> These are the main problem, and since they don't require virtual memory
> and don't have destructors, using the kernel allocator shouldn't be a
> problem. And SLUB can pack the buffers tightly, avoiding the overhead.
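
The packing argument above can be illustrated with a deliberately
simplified model. The sketch below assumes a 4096B page, ignores per-slab
metadata and any higher-order slabs an allocator may choose, and uses a
hypothetical helper `objects_per_page`; it is not how SLUB or the existing
slab implementation actually computes its layout.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def objects_per_page(slot_size: int, page_size: int = PAGE_SIZE) -> int:
    """How many fixed-size slots fit in one page, ignoring all metadata."""
    if slot_size <= 0:
        raise ValueError("slot_size must be positive")
    return page_size // slot_size

# Sizes from the quoted message: a 320B bonus buffer that currently
# occupies 1024B, versus the same object packed at its real size.
current = objects_per_page(1024)
packed = objects_per_page(320)

print(f"current: {current} per page, tightly packed: {packed} per page")
```

Under these assumptions a page holds 4 bonus buffers today and 12 when
packed at 320B, which is where the expected saving would come from.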

I believe that idea has been kicked around in the past, but hasn't
gained much traction since that's really just a workaround; and time
spent doing that would be better spent on the real fix (i.e. remove the
dependence on vmalloc and integrate with the Linux page cache).

Cheers, Prakash

> Niels
> To unsubscribe from this group and stop receiving emails from it, send an email to zfs-discuss+unsubscribe at zfsonlinux.org.

