[zfs-discuss] Re: no activity, uses all memory then crashes

Mikko Tanner mikko.tanner at gmail.com
Mon Jun 4 09:51:57 EDT 2012


Hey again,

RCU stalls often happen on ZoL (presently) because min_free_kbytes is
too low. Try setting that to 256MB or so, that should alleviate the
stalls.
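
For reference, here is a minimal sketch of the unit conversion, since the
sysctl is expressed in kilobytes. The helper names are made up for
illustration. It only reads the current value and does not change anything.

```python
from pathlib import Path

# Standard location of the sysctl on Linux.
MIN_FREE_PATH = Path("/proc/sys/vm/min_free_kbytes")


def mib_to_kbytes(mib):
    """Convert a size in MiB to the kB unit used by min_free_kbytes."""
    return mib * 1024


def read_min_free_kbytes(path=MIN_FREE_PATH):
    """Return the current setting as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    target = mib_to_kbytes(256)
    current = read_min_free_kbytes()
    print("suggested vm.min_free_kbytes:", target)
    print("current vm.min_free_kbytes:", current)
```

Applying the value is left to sysctl or a write to the proc file as root.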

If at all possible, upgrade that system's RAM to at least 32GB-48GB
(even if only temporarily). Memory is cheap, and that should get you
over the issue at hand. It's clear now that memory starvation is causing
your issues. The behaviour you describe is consistent with how ZFS
handles dedup tables, especially if there is an outstanding destroy
operation waiting in the queue: first the system needs to read in ALL of
the DDTs to be able to decrement the counts as data is removed from the
pool. If you run out of ARC metadata allotment during this read, you will
start to spill out to L2ARC, BUT those pointers also take up some
metadata. In the end you will run out of memory either way, and
probably will have to wait a long time for the system to settle down
enough to be able to issue zfs commands, if you don't crash before
that.

Basically, 4GB of metadata is way too low in my estimation. I'd try to
get that up to 16GB, that _should_ be enough for your dataset. I
wouldn't even turn on dedup on a pool without at least 48GB RAM,
personally.
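
A rough way to sanity-check such a figure is sketched below. It assumes
the commonly cited rule of thumb of about 320 bytes of RAM per in-core DDT
entry, which is not a number from this thread, and the pool size and
average block size in the example are hypothetical. The real entry count
for your pool is what zdb -DD reports.

```python
# Assumption: ~320 bytes of RAM per in-core DDT entry (rule of thumb,
# not measured on the pool discussed in this thread).
BYTES_PER_DDT_ENTRY = 320


def unique_blocks_for(data_bytes, avg_block_bytes):
    """Estimate the number of unique blocks for a given amount of data."""
    return data_bytes // avg_block_bytes


def ddt_ram_bytes(unique_blocks, bytes_per_entry=BYTES_PER_DDT_ENTRY):
    """Estimate the RAM needed to hold the whole dedup table in core."""
    return unique_blocks * bytes_per_entry


if __name__ == "__main__":
    # Hypothetical example: 4 TiB of unique data, 64 KiB average block.
    blocks = unique_blocks_for(4 * 2**40, 64 * 2**10)
    ram = ddt_ram_bytes(blocks)
    print("unique blocks:", blocks)
    print("estimated DDT RAM (GiB):", ram / 2**30)
```

Smaller average block sizes push the estimate up proportionally, which is
why the metadata limit matters so much here.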

-M

On Jun 4, 3:35 pm, Stephane Chazelas <stephane.chaze... at gmail.com>
wrote:
> 2012-06-04 13:53:42 +0200, Niels de Carpentier:
> [...]
>
> > > For instance, shortly before a crash:
> > > zio_buf_4096 0x10040 9942073344 4795518976   262144     4096  37926 37879 37926  1175706 1170781 1175677
>
> > > 9GB with arc_max=4GB. I also monitored the arc size with arcstat
> > > and it wasn't going over the limit.
>
> > I've raised this before: because of the alignment requirements and the
> > header used before each entry, almost half the memory is wasted by the
> > slab for these objects. The arc only counts the objects' use for its
> > limit, so a lot more memory can be used than arc_max implies. With
> > fragmentation this becomes even worse. I'm working on a patch to address
> > this, but have been extremely busy with work the last few months so
> > haven't been able to make much progress.
>
> I'm not sure it's the same thing here. We do end up with almost
> 16GB being used with an arc_max of 4GB. I don't think
> fragmentation can cause that.
>
> > zdb -DD should tell you the size of the DDT tables.
>
> Thanks.
>
> > Have you tried increasing /proc/sys/vm/min_free_kbytes to a high value to
> > prevent oom related crashes?
>
> [...]
>
> It's not oom here, it's sequences of "failure to allocate a tage
> (84)" (sic), or
>
> [15181.312533] SLUB: Unable to allocate memory on node -1 (gfp=0xd0)
> [15181.312536]   cache: kmalloc-64, object size: 64, buffer size: 64, default order: 0, min order: 0
> [15181.312538]   node 0: slabs: 23811, objs: 1523904, free: 0
> [15181.312540]   node 1: slabs: 7956, objs: 509184, free: 9
> [15181.312568] z_wr_iss/6: page allocation failure: order:0, mode:0xd0
>
> or
>
> [20845.640000] INFO: rcu_sched_state detected stall on CPU 7 (t=375242 jiffies)
>
> or combinations thereof.
>
> Also the pattern of memory consumption suggests something else:
> usage is constant, then increases slowly, then increases
> quickly (a few GB in a few seconds) before the crash.
>
> Having said that, the system has now been up for 5 hours with no
> crash. Disks busy 60%, memory usage stable around 8GB.
>
> A difference this time is that the FS are not mounted, possibly
> because there's a "zpool export" running (which hasn't returned
> yet after a few hours). zpool status doesn't return either.
>
> Something is definitely happening under the hood. Hopefully
> something will come out of it (though not a single byte has been
> written to the disks yet according to dstat).
>
> Again, there's nothing accessing the pool, the system is totally
> idle.
>
> I've got 60MB worth of serial console capture from a few months
> back including the output of sysrq-w run most of the time when
> the system crashes in case somebody can make sense out of it.
> zfs events go on the serial console as well.
>
> --
> Stephane
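
For what it's worth, the zio_buf_4096 line quoted above can be checked
with a few lines of Python. This is only a sketch: the column meanings
(name, flags, total size, allocated bytes, slab size, object size, then
slab counts and object counts) are an assumption, although the numbers
multiply out consistently with that reading.

```python
# The line as quoted in the thread, joined back into one row.
LINE = ("zio_buf_4096 0x10040 9942073344 4795518976 262144 4096 "
        "37926 37879 37926 1175706 1170781 1175677")


def slab_usage(line):
    """Parse one slab row; the column layout is assumed, see lead-in."""
    fields = line.split()
    size, alloc, slab_size, obj_size = (int(x) for x in fields[2:6])
    slabs_total = int(fields[6])
    objs_total, objs_alloc = int(fields[9]), int(fields[10])
    return {
        "name": fields[0],
        "size_bytes": size,
        "alloc_bytes": alloc,
        # Fraction of slab memory actually holding allocated objects.
        "utilization": alloc / size,
        "objs_per_slab": objs_total // slabs_total,
        # Consistency checks for the assumed column meanings.
        "payload_check": objs_alloc * obj_size == alloc,
        "slab_check": slabs_total * slab_size == size,
    }


if __name__ == "__main__":
    usage = slab_usage(LINE)
    print(usage["name"], "utilization:", round(usage["utilization"], 3))
    print("objects per slab:", usage["objs_per_slab"])
    print("columns consistent:",
          usage["payload_check"] and usage["slab_check"])
```

Under that reading, a bit less than half of the slab memory holds
allocated 4096-byte objects, which matches Niels's remark that almost half
is lost to alignment and headers.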


