[zfs-discuss] Memory congestion, high txg_sync/txg_quiesce CPU utilization, and very bad performance
cpumichael at gmail.com
cpumichael at gmail.com
Thu Dec 18 13:40:05 EST 2014
On Thursday, December 18, 2014 10:17:13 AM UTC-8, Tren Blackburn wrote:
>
> Hi Michael;
>
> On December 18, 2014 at 9:58:20 AM, cpumi... at gmail.com (cpumi... at gmail.com) wrote:
>
> I've got the latest ZoL installed on a box with 64 GB RAM, 72 x 6 TB data disks
> in the zpool, and 2 SSDs partitioned to be used as ZIL and L2ARC.
>
> Can you post a zpool status so we can see the layout of your pool? As
> well, by "latest" do you mean from the site, or from github?
>
It's the 0.6.3 zfsonlinux release, not the GitHub one. zpool status is
attached.
>
>
> The ARC is set to be metadata only as this machine is an rsync backup box.
> All zfs filesystems have lz4 compression enabled. I haven't tuned any
> other ARC parameters.
>
> You might want to tune your ARC usage; perhaps set the ARC to 50% of RAM?
> Remember that rsync will eat up a lot of memory as well if you're dealing
> with many small files. As well, the only thing your L2ARC will ever hold is
> metadata due to ARC being set to metadata only, so it may not be useful in
> this scenario. L2ARC can only ever hold things that lived in ARC. The ZIL
> is of questionable benefit as well, since I don't think rsync does
> synchronous writes (someone please correct me if I'm mistaken).
>
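(For reference, capping the ARC at 50% of the 64 GB as suggested would be
roughly the following; zfs_arc_max takes a byte count, and the modprobe.d
line is the usual way to make it persistent. This is just a sketch, not
something I have applied yet.)

    # runtime change: limit the ARC to 32 GiB (half of 64 GB)
    echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_max
    # persist across module reloads/reboots: add to /etc/modprobe.d/zfs.conf
    # options zfs zfs_arc_max=34359738368
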
Yes, the ZIL does nothing right now, and the L2ARC seems to be OK. This is
a D2D disaster-recovery filesystem that can be flipped into production in
the event of a disaster; at that point we would switch the ARC to cache both
metadata and data, and the ZIL would see more use after the flip.
rsync does not use RAM directly, only indirectly through Linux's VM system,
and that is what I would like to tune. It's also worth noting that the
filesystem being backed up is GPFS, which, like ZFS, does its own caching as
well.
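For the flip itself the plan is just to change the caching properties at the
pool root and let the child filesystems inherit them; assuming the
metadata-only setup was done via the primarycache property, something like:

    # DR/backup mode: cache metadata only in ARC and L2ARC
    zfs set primarycache=metadata tank
    zfs set secondarycache=metadata tank
    # after flipping into production: cache data as well
    zfs set primarycache=all tank
    zfs set secondarycache=all tank
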
> Performance and reliability have been quite good; however, recently we've
> noticed that ZFS performance will plummet to nearly zero, the txg_sync
> and txg_quiesce threads spin quite hard, and normal commands (e.g.,
> catting a small text file) get stuck in a D wait state. This continues
> until the txg_sync and txg_quiesce threads calm down and the Linux memory
> buffers increase in size, and then everything is fine until it happens
> again. This seems to be happening about once per day, maybe more often,
> as I've just enabled more monitoring of the machine.
>
> Perhaps it's the ARC evicting headers to L2ARC? Can you also post
> /proc/spl/kstat/zfs/arcstats for us? This will show where memory is being
> used, as well as information about L2ARC usage.
>
I've attached it, but the system is working great right now. I didn't
touch anything, it just recovered on its own.
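Next time it happens I'll also try to capture the txg pipeline and anything
stuck in D state while it's going on. A rough sketch, assuming this build
exposes the zfs_txg_history parameter and the per-pool txgs kstat:

    # keep timing history for the last 100 txgs
    echo 100 > /sys/module/zfs/parameters/zfs_txg_history
    # per-txg time spent in the open/quiescing/syncing stages for the pool
    cat /proc/spl/kstat/zfs/tank/txgs
    # processes stuck in uninterruptible (D) sleep, with their wait channel
    ps -eo state,pid,wchan:32,comm | awk '$1 == "D"'
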
> Attached are some graphs of the ARC and Linux memory that are correlated
> with the performance drop, but I'm not sure which is the ultimate cause. I
> believe it's Linux's memory buffers getting exhausted, but I don't know
> how to prevent this from happening. I've added 25 MB of reserved memory
> via the vm.extra_free_kbytes sysctl setting, but this does not seem to
> solve the problem.
>
> I'd recommend using the vm.min_free_kbytes setting, as this will ensure
> that much RAM is kept free. As well, I'd recommend setting it to 1 GB, not
> 25 MB.
>
Great. That sounds good. I will do that. Thanks.
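Concretely, I expect that to look something like this (assuming this box
keeps its sysctls in /etc/sysctl.conf; the value is in KiB):

    # reserve roughly 1 GB for the kernel's free-page watermarks
    sysctl -w vm.min_free_kbytes=1048576
    # persist it across reboots
    echo 'vm.min_free_kbytes = 1048576' >> /etc/sysctl.conf
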
>
> In my monitoring of the last occurrence, I found that something
> interesting happened at around 08:22.
>
> The ARC target size and ARC size dropped drastically, as did the memory
> buffers. The amount of free memory actually increased, but I'm concerned
> about the lack of memory buffers.
>
> I've done echo 3 > /proc/sys/vm/drop_caches, and I don't know if this
> fixes the problem or if the problem fixes itself over time.
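One thing I plan to check during the next event is how much memory is held
by the SPL slab caches, since (as I understand it) ZFS buffers live there
rather than in Linux's "Buffers"/"Cached", and drop_caches only asks the
registered shrinkers to reap. A rough look:

    # per-cache usage of the SPL slab (first few lines for a peek)
    head -20 /proc/spl/kmem/slab
    # headline ARC numbers to compare against /proc/meminfo
    awk '$1 ~ /^(size|c|arc_meta_used)$/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats
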
>
> This seems to be a tuning issue either in ZFS or Linux, but I'm not sure
> what knobs to turn, and searching this list and the web doesn't seem to
> provide any insight.
>
> Anybody have any ideas what can be done to alleviate these issues?
>
> We need more information, but I'm sure you can tune most of these issues
> away.
>
Yes, me too; as I said, I'm just not sure what to tune. Thanks for the
tips, I'll bump up the min_free_kbytes value.
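To make the next event easier to line up with the graphs, I'm also thinking
of logging the Linux and ARC numbers side by side. A minimal sketch (the log
path is just an example):

    #!/bin/bash
    # Sample Linux memory and ARC sizes once a minute.
    while sleep 60; do
        ts=$(date '+%F %T')
        mem=$(awk '/^(MemFree|Buffers|Cached):/ {gsub(":","",$1); printf "%s=%dMB ", $1, $2/1024}' /proc/meminfo)
        arc=$(awk '$1 ~ /^(size|c|arc_meta_used)$/ {printf "%s=%dMB ", $1, $3/1048576}' /proc/spl/kstat/zfs/arcstats)
        echo "$ts $mem$arc" >> /var/log/arc-vs-vm.log
    done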
-------------- next part --------------
[root@zbackup backup]# zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            A0        ONLINE       0     0     0
            B0        ONLINE       0     0     0
            C0        ONLINE       0     0     0
            A3        ONLINE       0     0     0
            B3        ONLINE       0     0     0
            C3        ONLINE       0     0     0
            A6        ONLINE       0     0     0
            B6        ONLINE       0     0     0
            C6        ONLINE       0     0     0
            A9        ONLINE       0     0     0
            B9        ONLINE       0     0     0
            C9        ONLINE       0     0     0
            A12       ONLINE       0     0     0
            B12       ONLINE       0     0     0
            C12       ONLINE       0     0     0
            A15       ONLINE       0     0     0
            B15       ONLINE       0     0     0
            C15       ONLINE       0     0     0
            A18       ONLINE       0     0     0
            B18       ONLINE       0     0     0
            C18       ONLINE       0     0     0
            A21       ONLINE       0     0     0
            B21       ONLINE       0     0     0
            C21       ONLINE       0     0     0
          raidz2-1    ONLINE       0     0     0
            A1        ONLINE       0     0     0
            B1        ONLINE       0     0     0
            C1        ONLINE       0     0     0
            A4        ONLINE       0     0     0
            B4        ONLINE       0     0     0
            C4        ONLINE       0     0     0
            A7        ONLINE       0     0     0
            B7        ONLINE       0     0     0
            C7        ONLINE       0     0     0
            A10       ONLINE       0     0     0
            B10       ONLINE       0     0     0
            C10       ONLINE       0     0     0
            A13       ONLINE       0     0     0
            B13       ONLINE       0     0     0
            C13       ONLINE       0     0     0
            A16       ONLINE       0     0     0
            B16       ONLINE       0     0     0
            C16       ONLINE       0     0     0
            A19       ONLINE       0     0     0
            B19       ONLINE       0     0     0
            C19       ONLINE       0     0     0
            A22       ONLINE       0     0     0
            B22       ONLINE       0     0     0
            C22       ONLINE       0     0     0
          raidz2-2    ONLINE       0     0     0
            A2        ONLINE       0     0     0
            B2        ONLINE       0     0     0
            C2        ONLINE       0     0     0
            A5        ONLINE       0     0     0
            B5        ONLINE       0     0     0
            C5        ONLINE       0     0     0
            A8        ONLINE       0     0     0
            B8        ONLINE       0     0     0
            C8        ONLINE       0     0     0
            A11       ONLINE       0     0     0
            B11       ONLINE       0     0     0
            C11       ONLINE       0     0     0
            A14       ONLINE       0     0     0
            B14       ONLINE       0     0     0
            C14       ONLINE       0     0     0
            A17       ONLINE       0     0     0
            B17       ONLINE       0     0     0
            C17       ONLINE       0     0     0
            A20       ONLINE       0     0     0
            B20       ONLINE       0     0     0
            C20       ONLINE       0     0     0
            A23       ONLINE       0     0     0
            B23       ONLINE       0     0     0
            C23       ONLINE       0     0     0
        logs
          mirror-3    ONLINE       0     0     0
            Z1-part1  ONLINE       0     0     0
            Z2-part1  ONLINE       0     0     0
        cache
          Z1-part2    ONLINE       0     0     0
          Z2-part2    ONLINE       0     0     0

errors: No known data errors
-------------- next part --------------
[root@zbackup backup]# cat /proc/spl/kstat/zfs/arcstats
5 1 0x01 85 4080 19916089778 3103641190879975
name type data
hits 4 13077926999
misses 4 238905974
demand_data_hits 4 0
demand_data_misses 4 8642060
demand_metadata_hits 4 8232799182
demand_metadata_misses 4 164521642
prefetch_data_hits 4 0
prefetch_data_misses 4 0
prefetch_metadata_hits 4 4845127817
prefetch_metadata_misses 4 65742272
mru_hits 4 586246032
mru_ghost_hits 4 31688880
mfu_hits 4 8164451047
mfu_ghost_hits 4 10453118
deleted 4 766763472
recycle_miss 4 34282709
mutex_miss 4 600363
evict_skip 4 19014338669
evict_l2_cached 4 768085074944
evict_l2_eligible 4 521208647680
evict_l2_ineligible 4 367544011776
hash_elements 4 30555651
hash_elements_max 4 30789778
hash_collisions 4 812773059
hash_chains 4 1048576
hash_chain_max 4 60
p 4 897688064
c 4 13266024448
c_min 4 4194304
c_max 4 33759293440
size 4 13207869480
hdr_size 4 193075488
data_size 4 1277440
meta_size 4 851442688
other_size 4 2281313440
anon_size 4 23845376
anon_evict_data 4 0
anon_evict_metadata 4 0
mru_size 4 706918400
mru_evict_data 4 0
mru_evict_metadata 4 52140032
mru_ghost_size 4 12559056384
mru_ghost_evict_data 4 12163015680
mru_ghost_evict_metadata 4 396040704
mfu_size 4 121956352
mfu_evict_data 4 0
mfu_evict_metadata 4 719872
mfu_ghost_size 4 303316480
mfu_ghost_evict_data 4 0
mfu_ghost_evict_metadata 4 303316480
l2_hits 4 98359057
l2_misses 4 140546895
l2_feeds 4 3103669
l2_rw_clash 4 41779
l2_read_bytes 4 80056634880
l2_write_bytes 4 60669944320
l2_writes_sent 4 563713
l2_writes_done 4 563713
l2_writes_error 4 0
l2_writes_hdr_miss 4 2756
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_free_on_write 4 376077
l2_abort_lowmem 4 34
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 223125421056
l2_asize 4 38743075328
l2_hdr_size 4 9880760424
l2_compress_successes 4 19736778
l2_compress_zeros 4 0
l2_compress_failures 4 2323
memory_throttle_count 4 0
duplicate_buffers 4 0
duplicate_buffers_size 4 0
duplicate_reads 4 0
memory_direct_count 4 295
memory_indirect_count 4 7446740
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 1948
arc_meta_used 4 13206592040
arc_meta_limit 4 25319470080
arc_meta_max 4 25445701984