[zfs-discuss] dedup enabled and bad write performance (although not short on RAM)

Richard Elling richard.elling at richardelling.com
Sun Apr 29 18:56:01 EDT 2018


> On Apr 29, 2018, at 1:12 PM, BeetRoot ServAdm via zfs-discuss <zfs-discuss at list.zfsonlinux.org> wrote:
> 
> Hi all!
> 
> I have dedup enabled (+lz4 compression) on a 28T pool (on hw RAID-6 SAS), with SLOG and 100 GB SSD cache. My problem is bad write performance. On datasets without dedup, I get >300MB/s writes, but on datasets with dedup enabled, I only get ~25MB/s.

The checksum algorithm is, by default, much more computationally expensive when
dedup is enabled (dedup requires a cryptographically strong checksum such as sha256,
whereas the non-dedup default is fletcher4). To isolate this variable's impact on
performance, set the checksum algorithm to the same value with and without dedup.
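For example, a minimal comparison sketch, assuming placeholder dataset names
tank/dedup and tank/nodedup:

    # dedup implies a strong checksum (sha256 by default); use the same one on
    # the non-dedup dataset so the write test isolates the DDT overhead
    zfs set checksum=sha256 tank/nodedup
    zfs get checksum,dedup tank/nodedup tank/dedup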

> 
> I suspect that this is due to DDT sitting (partly) on disk.

This is the common case. Portions of the DDT are loaded on demand, so unless there
was previous demand for them, you'll see metadata misses in the arcstats.
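The raw counters behind those percentages can be read directly from the kstats
(names as shipped with ZFS on Linux):

    # demand metadata hits vs. misses; a miss means a metadata/DDT block had to
    # be read from the pool rather than served from the ARC
    grep -E '^demand_metadata_(hits|misses)' /proc/spl/kstat/zfs/arcstats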

> 
> As far as I understand things, DDT is 15G
> ( 
>    dedup: DDT entries 106492891, size 291 on disk, 151 in core
>     => 106492891 * 151 / 1024^3 = 15G
> )
> 
> Server's RAM is 32G ECC and I have set zfs_arc_max to 22G.

... and what are zfs_arc_meta_limit_percent or zfs_arc_meta_limit set to?
hint: /proc/spl/kstat/zfs/arcstats shows the current values of:
+ arc_meta_used
+ arc_meta_limit
+ arc_meta_max

You can also consider setting zfs_arc_meta_min, which is reported there as:
+ arc_meta_min
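
For example (the value below is illustrative only, not a recommendation):

    # current metadata accounting in the ARC
    grep -E '^arc_meta_(used|limit|max|min)' /proc/spl/kstat/zfs/arcstats

    # reserve ~16 GiB of ARC for metadata at runtime; to persist it, add
    # "options zfs zfs_arc_meta_min=17179869184" to /etc/modprobe.d/zfs.conf
    echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_meta_min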

> 
> During a test write (server free from other activities) with random data for quite some time, arcsz only gets to around 13~14G and metadata misses do not fall below ~45%.
> 
> So, I don't understand these numbers. Shouldn't the metadata (which contains the DDT) all be sitting in RAM? Why are metadata misses so high? And why doesn't the ARC grow to its full allowed size (22G) during this procedure? How can I keep the whole DDT in RAM, since it fits (15G << 22G)?
> 
> Below are the arcstats after the write test has been running for a while.

Percentages are useless without the rate. For example, if I have 2 accesses,
one hit + one miss, then miss% = 50%, but that condition is not comparable
to the same miss% at an access rate of 1M accesses per second.
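A rough way to see the rates alongside the percentages, using the arcstat utility
shipped with ZFS on Linux (field names may differ slightly across versions, and the
tool may be named arcstat.py on older releases):

    # print raw reads, hits and misses per interval next to miss%
    arcstat -f time,read,hits,miss,miss%,mmis,arcsz 1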

> 
> Last note: while trying to fix my issue, I set my pool's cache properties as follows:
> primarycache => metadata
> secondarycache => all     

If it isn't in the primary cache, it won't make it to the secondary cache.
Also, your arcstats show a high data miss rate. If this is a write-only workload, then
the high data miss rate indicates the recordsize/volblocksize is not well tuned to the
workload. In these situations, it is possible to get better performance by setting
primarycache=all. YMMV, tune accordingly.
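For example (the dataset name is a placeholder):

    # compare recordsize to the application's typical write size, then try
    # caching data as well as metadata on the dedup'd dataset
    zfs get recordsize,primarycache,secondarycache tank/dedup
    zfs set primarycache=all tank/dedup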
 -- richard

> 
> Thanks for all possible help!
> B.
> 
> time  miss%  dm%  pm%  mm%  arcsz  l2asize  l2miss%  l2read  l2bytes
> 22:53:41     34   35    7   32    13G     8.4G       57     806     874K
> 22:53:42     42   42    0   40    13G     8.4G       52     751     909K
> 22:53:43     46   46    0   46    13G     8.4G       45     599     830K
> 22:53:44     47   47    0   47    13G     8.4G       41     655     976K
> 22:53:45     45   45    0   45    12G     8.4G       40     718     1.1M
> 22:53:46     33   44    0   33    12G     8.4G       37     659     1.0M
> 22:53:47     43   44    0   43    12G     8.4G       33     648     1.1M
> 22:53:48     42   42    0   42    12G     8.4G       38     625     987K
> 22:53:49      4    4    9    4    13G     8.4G       16    1.1K     2.3M
> 22:53:50      6    4  100    6    13G     8.4G       35     838     1.3M
> 22:53:51     47   47    0   47    13G     8.4G       24     344     646K
> 22:53:52     44   44    0   44    14G     8.4G       30     581     1.0M
> 22:53:53     47   47    0   47    14G     8.4G       33     555     940K
> 22:53:54     46   46    0   46    14G     8.4G       39     427     659K
> 22:53:55     48   48    0   48    14G     8.4G       36     557     890K
> 22:53:56     49   49    0   49    14G     8.4G       34     679     1.1M
> 22:53:57     47   47    0   47    14G     8.4G       37     659     1.0M
> 22:53:58     48   48    0   48    14G     8.4G       35     655     1.1M
> 22:53:59     41   48    0   41    14G     8.4G       34     659     1.1M
> 22:54:00     45   45    0   45    14G     8.4G       32     691     1.2M
> 22:54:01     45   45    0   45    14G     8.4G       34     687     1.1M
> 22:54:02     46   46    0   46    14G     8.4G       35     689     1.1M
> 22:54:03     45   46    0   45    14G     8.4G       35     648     1.0M
> 22:54:04     49   49    0   49    14G     8.4G       36     652     1.0M
> 22:54:05     48   48    0   48    14G     8.4G       37     653     1.0M
> 22:54:07     47   49    0   47    14G     8.4G       33     682     1.1M
> 22:54:08     47   47    0   47    14G     8.4G       36     616     982K
> 22:54:09     48   49    0   48    14G     8.4G       30     707     1.2M
> 22:54:10     43   47    0   43    14G     8.4G       32     708     1.2M
> 22:54:11     46   46  100   43    14G     8.4G       47     794     1.0M
> 22:54:12     56   56    0   53    14G     8.4G       50     814    1012K
> 22:54:13     45   51    1   43    14G     8.4G       46     766     1.0M
> 22:54:14     47   47    0   47    14G     8.4G       33     692     1.2M
> 22:54:15     46   46    0   46    14G     8.4G       37     631     995K
> 22:54:16     47   47    0   47    14G     8.4G       34     682     1.1M
> 22:54:17     48   48    0   48    14G     8.4G       35     663     1.1M
> 22:54:18     47   48    0   47    14G     8.4G       35     661     1.1M
> 22:54:19     46   47    0   46    14G     8.4G       34     664     1.1M
> 22:54:20     45   46    2   45    14G     8.4G       35     664     1.1M
> 22:54:21     49   49    0   49    14G     8.4G       36     666     1.0M
> 22:54:22     50   50   55   50    14G     8.4G       38     706     1.1M
> 22:54:23     49   49    0   49    14G     8.4G       35     775     1.2M
> 22:54:24     48   48    0   48    13G     8.4G       35     713     1.1M
> 22:54:25     47   48    0   47    13G     8.4G       35     715     1.1M
> 22:54:26     48   48    0   48    13G     8.4G       37     690     1.1M
> 22:54:27     49   49   50   49    13G     8.4G       37     709     1.1M
> 22:54:28     46   46    0   46    13G     8.4G       37     782     1.2M
> 22:54:29     48   48    0   48    13G     8.4G       38     717     1.1M
> 22:54:30     37   41    0   37    13G     8.4G       34     687     1.1M
> 22:54:31     41   41    0   41    13G     8.4G       39     610     955K
> 22:54:32     43   44   28   43    13G     8.4G       35     639     1.0M
> 22:54:33     46   46    0   46    13G     8.4G       36     698     1.1M
> 22:54:34     45   45    0   45    13G     8.4G       34     727     1.2M
> 22:54:35     46   46    0   46    13G     8.4G       38     634     979K
> 22:54:36     43   43    0   43    13G     8.4G       36     679     1.1M
> 22:54:37     44   44   11   44    13G     8.4G       34     719     1.2M
> 
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at list.zfsonlinux.org
> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss


