[zfs-discuss] FW: dedup enabled and bad write performance (although not short on RAM)

Richard Jahnel Richard.Jahnel at RealPage.com
Mon Apr 30 13:16:04 EDT 2018


I suspect that this is mainly because the DDT metadata shares L1ARC space with all the non-DDT things that live there, so portions of the DDT metadata get aged out. My back-of-napkin math suggests you would want in the ballpark of 150 GB of RAM for a 30 TB pool, and preferably twice that.

I would be happy for someone to correct me on that.

1024 KB / 64 KB = 16 blocks per MB (assuming a 64 KB average block size)
16 * 1024 = 16,384 blocks per GB
16,384 * 1024 = 16,777,216 blocks per TB
16,777,216 blocks * 320 bytes = 5,368,709,120 bytes per TB (each DDT entry is about 320 bytes)
5,368,709,120 / 1024 = 5,242,880 KB per TB
5,242,880 / 1024 = 5,120 MB per TB

This works out to 5 GB of RAM per TB of deduplicated pool data,
so a 30 TB pool needs roughly 150 GB of RAM just to hold the entire dedup table.
You would probably want to double that to leave L1ARC room for data as well as the DDT metadata.
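
For what it's worth, here is the same napkin math as a quick shell calculation. The 64 KB average block size and ~320 bytes per entry are assumptions you would tune for your own pool:

  # rough DDT RAM estimate, assuming ~64 KB average block size
  # and ~320 bytes per DDT entry
  POOL_TB=30
  BLOCKS_PER_TB=$(( 1024 * 1024 * 1024 / 64 ))      # 16,777,216 blocks per TB
  DDT_BYTES=$(( POOL_TB * BLOCKS_PER_TB * 320 ))
  echo "$(( DDT_BYTES / 1024 / 1024 / 1024 )) GB"   # -> 150 GB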

Richard Jahnel
Backups Team
2201 Lakeside Blvd, Richardson, Tx 75007
Office: (972) 810-2527

From: zfs-discuss [mailto:zfs-discuss-bounces at list.zfsonlinux.org] On Behalf Of BeetRoot ServAdm via zfs-discuss
Sent: Sunday, April 29, 2018 3:12 PM
To: zfs-discuss at list.zfsonlinux.org
Subject: [zfs-discuss] dedup enabled and bad write performance (although not short on RAM)

Hi all!
I have dedup enabled (plus lz4 compression) on a 28T pool (on hardware RAID-6 SAS), with a SLOG and a 100 GB SSD cache. My problem is bad write performance: on datasets without dedup I get >300 MB/s writes, but on datasets with dedup enabled I only get ~25 MB/s.
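
For reference, dedup and compression are enabled per dataset in the usual way; the names below are just examples:

  zfs set compression=lz4 tank/data
  zfs set dedup=on tank/data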

I suspect that this is due to DDT sitting (partly) on disk.

As far as I understand things, the DDT is about 15G in core:

  dedup: DDT entries 106492891, size 291 on disk, 151 in core
  => 106492891 * 151 / 1024^3 ≈ 15G
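
(That dedup line is, if I am not mistaken, what zpool status -D prints for the pool; the in-core footprint is just the entry count times the in-core entry size. Pool name below is a placeholder.)

  zpool status -D tank | grep dedup
  #   dedup: DDT entries 106492891, size 291 on disk, 151 in core
  awk 'BEGIN { printf "%.1f GiB in core\n", 106492891 * 151 / 1024^3 }'
  #   -> 15.0 GiB in core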
The server has 32G of ECC RAM and I have set zfs_arc_max to 22G (see the snippet below).
During a prolonged test write of random data (with the server otherwise idle), arcsz only reaches around 13-14G and metadata misses do not fall below ~45%.
So I don't understand these numbers. Shouldn't the metadata (which contains the DDT) all be sitting in RAM? Why are metadata misses so high? And why doesn't the ARC grow to its full allowed size (22G) during this run? How can I keep the whole DDT in RAM at all times, since it fits (15G << 22G)?
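
A sketch of how zfs_arc_max is set here, assuming the usual module-parameter approach on Linux:

  # /etc/modprobe.d/zfs.conf (22 GiB = 23622320128 bytes)
  options zfs zfs_arc_max=23622320128
  # or at runtime:
  # echo 23622320128 > /sys/module/zfs/parameters/zfs_arc_max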
Below is arcstat output taken some time into the write test.

Last note: while trying to fix the issue, I have set the following cache properties on the pool (commands below):
primarycache => metadata
secondarycache => all
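
(Set with plain zfs set; the pool name is a placeholder:)

  zfs set primarycache=metadata tank
  zfs set secondarycache=all tank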

Thanks for all possible help!
B.

time  miss%  dm%  pm%  mm%  arcsz  l2asize  l2miss%  l2read  l2bytes
22:53:41     34   35    7   32    13G     8.4G       57     806     874K
22:53:42     42   42    0   40    13G     8.4G       52     751     909K
22:53:43     46   46    0   46    13G     8.4G       45     599     830K
22:53:44     47   47    0   47    13G     8.4G       41     655     976K
22:53:45     45   45    0   45    12G     8.4G       40     718     1.1M
22:53:46     33   44    0   33    12G     8.4G       37     659     1.0M
22:53:47     43   44    0   43    12G     8.4G       33     648     1.1M
22:53:48     42   42    0   42    12G     8.4G       38     625     987K
22:53:49      4    4    9    4    13G     8.4G       16    1.1K     2.3M
22:53:50      6    4  100    6    13G     8.4G       35     838     1.3M
22:53:51     47   47    0   47    13G     8.4G       24     344     646K
22:53:52     44   44    0   44    14G     8.4G       30     581     1.0M
22:53:53     47   47    0   47    14G     8.4G       33     555     940K
22:53:54     46   46    0   46    14G     8.4G       39     427     659K
22:53:55     48   48    0   48    14G     8.4G       36     557     890K
22:53:56     49   49    0   49    14G     8.4G       34     679     1.1M
22:53:57     47   47    0   47    14G     8.4G       37     659     1.0M
22:53:58     48   48    0   48    14G     8.4G       35     655     1.1M
22:53:59     41   48    0   41    14G     8.4G       34     659     1.1M
22:54:00     45   45    0   45    14G     8.4G       32     691     1.2M
22:54:01     45   45    0   45    14G     8.4G       34     687     1.1M
22:54:02     46   46    0   46    14G     8.4G       35     689     1.1M
22:54:03     45   46    0   45    14G     8.4G       35     648     1.0M
22:54:04     49   49    0   49    14G     8.4G       36     652     1.0M
22:54:05     48   48    0   48    14G     8.4G       37     653     1.0M
22:54:07     47   49    0   47    14G     8.4G       33     682     1.1M
22:54:08     47   47    0   47    14G     8.4G       36     616     982K
22:54:09     48   49    0   48    14G     8.4G       30     707     1.2M
22:54:10     43   47    0   43    14G     8.4G       32     708     1.2M
22:54:11     46   46  100   43    14G     8.4G       47     794     1.0M
22:54:12     56   56    0   53    14G     8.4G       50     814    1012K
22:54:13     45   51    1   43    14G     8.4G       46     766     1.0M
22:54:14     47   47    0   47    14G     8.4G       33     692     1.2M
22:54:15     46   46    0   46    14G     8.4G       37     631     995K
22:54:16     47   47    0   47    14G     8.4G       34     682     1.1M
22:54:17     48   48    0   48    14G     8.4G       35     663     1.1M
22:54:18     47   48    0   47    14G     8.4G       35     661     1.1M
22:54:19     46   47    0   46    14G     8.4G       34     664     1.1M
22:54:20     45   46    2   45    14G     8.4G       35     664     1.1M
22:54:21     49   49    0   49    14G     8.4G       36     666     1.0M
22:54:22     50   50   55   50    14G     8.4G       38     706     1.1M
22:54:23     49   49    0   49    14G     8.4G       35     775     1.2M
22:54:24     48   48    0   48    13G     8.4G       35     713     1.1M
22:54:25     47   48    0   47    13G     8.4G       35     715     1.1M
22:54:26     48   48    0   48    13G     8.4G       37     690     1.1M
22:54:27     49   49   50   49    13G     8.4G       37     709     1.1M
22:54:28     46   46    0   46    13G     8.4G       37     782     1.2M
22:54:29     48   48    0   48    13G     8.4G       38     717     1.1M
22:54:30     37   41    0   37    13G     8.4G       34     687     1.1M
22:54:31     41   41    0   41    13G     8.4G       39     610     955K
22:54:32     43   44   28   43    13G     8.4G       35     639     1.0M
22:54:33     46   46    0   46    13G     8.4G       36     698     1.1M
22:54:34     45   45    0   45    13G     8.4G       34     727     1.2M
22:54:35     46   46    0   46    13G     8.4G       38     634     979K
22:54:36     43   43    0   43    13G     8.4G       36     679     1.1M
22:54:37     44   44   11   44    13G     8.4G       34     719     1.2M