[zfs-discuss] FW: dedup enabled and bad write performance (although not short on RAM)

Richard Yao ryao at gentoo.org
Mon Apr 30 13:22:53 EDT 2018



> On Apr 30, 2018, at 1:16 PM, Richard Jahnel via zfs-discuss <zfs-discuss at list.zfsonlinux.org> wrote:
> 
> I suspect that this is mainly because the DDT metadata shares space in the L1ARC with non-DDT data, so portions of the DDT metadata are being aged out. My back-of-napkin math suggests that I would want in the ballpark of 150 GB of RAM for a 30 TB pool, and preferably twice that.
>  
> I would be happy for someone to correct me on that.
>  
> 1024 KB / 64 KB = 16 blocks per MB (assuming 64 KB records)
> 16 * 1024 = 16,384 blocks per GB
> 16,384 * 1024 = 16,777,216 blocks per TB
> 16,777,216 blocks * 320 = 5,368,709,120 bytes per TB (each DDT entry is about 320 bytes)
> 5,368,709,120 / 1024 = 5,242,880 KB per TB
> 5,242,880 / 1024 = 5,120 MB per TB
>  
> This means about 5 GB of RAM per TB of deduplicated pool data.
> So a 30 TB pool requires at most 150 GB of RAM to hold the dedup table in RAM.
> You would probably want to double that to leave room in the L1ARC for data as well as the DDT metadata.

It depends on the record size and the number of unique records. I recall calculating that you needed an absurdly small amount of space when using 16 MB records. Something like 50 MB per 1 TB of unique records, but don't quote me on that.

This memory-per-data metric is fairly misleading, though. I could easily "store" a 1 PB file with a single DDT entry, provided that the 1 PB file is, say, all 1s. What matters is the number of unique records, and that is harder to calculate. zdb can do it if I recall correctly, provided that the DDT that would be needed if the pool's data were deduplicated is not so large that zdb runs out of memory.
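
To put the two estimates side by side, here is a minimal Python sketch of the arithmetic (an illustration added here, not from either poster): the per-TB worst case from the quoted back-of-napkin math, assuming 64 KB records and roughly 320 bytes of core memory per DDT entry, versus an estimate driven by the number of unique records, which is the quantity that actually matters.

# Rough DDT RAM estimates, assuming ~320 bytes of core memory per DDT entry
# (figure taken from the quoted post; the real per-entry size varies by version).

BYTES_PER_DDT_ENTRY = 320          # assumption from the quoted back-of-napkin math
TIB = 1024 ** 4

def naive_estimate(pool_tib, record_size=64 * 1024):
    """Worst case: every record in the pool is unique."""
    records = pool_tib * TIB // record_size
    return records * BYTES_PER_DDT_ENTRY

def unique_record_estimate(unique_records):
    """What actually matters: only unique records get a DDT entry."""
    return unique_records * BYTES_PER_DDT_ENTRY

gib = 1024 ** 3
# 30 TiB pool, 64 KiB records, all unique -> roughly the 150 GB figure above.
print(f"30 TiB, all unique: {naive_estimate(30) / gib:.0f} GiB")
# A 1 PiB file made of identical records deduplicates to a single DDT entry.
print(f"1 PiB of identical records: {unique_record_estimate(1)} bytes")

Running it reproduces the ~150 GB worst-case figure for a 30 TB pool, while a petabyte of identical records still costs only a single entry.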
>  
> Richard Jahnel
> Backups Team
> 2201 Lakeside Blvd, Richardson, Tx 75007
> Office: (972) 810-2527
>  
> From: zfs-discuss [mailto:zfs-discuss-bounces at list.zfsonlinux.org] On Behalf Of BeetRoot ServAdm via zfs-discuss
> Sent: Sunday, April 29, 2018 3:12 PM
> To: zfs-discuss at list.zfsonlinux.org
> Subject: [zfs-discuss] dedup enabled and bad write performance (although not short on RAM)
>  
> Hi all!
> 
> I have dedup enabled (+lz4 compression) on a 28T pool (on hw RAID-6 SAS), with SLOG and 100 GB SSD cache. My problem is bad write performance. On datasets without dedup, I get >300MB/s writes, but on datasets with dedup enabled, I only get ~25MB/s.
> 
> I suspect this is due to the DDT sitting (partly) on disk.
> 
> As far as I understand things, the DDT takes about 15G in core:
> ( 
>    dedup: DDT entries 106492891, size 291 on disk, 151 in core
>     => 106492891 entries * 151 bytes/entry / 1024^3 ≈ 15G
> )
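
As an aside, both footprints can be pulled straight out of that summary line; a small illustrative Python sketch (my addition, not part of the original post), using the line exactly as quoted above:

import re

# Parse the "dedup: DDT entries ..." summary line quoted above and compute
# the on-disk and in-core DDT footprints.
line = "dedup: DDT entries 106492891, size 291 on disk, 151 in core"

m = re.search(r"DDT entries (\d+), size (\d+) on disk, (\d+) in core", line)
entries, on_disk, in_core = (int(x) for x in m.groups())

gib = 1024 ** 3
print(f"DDT on disk: {entries * on_disk / gib:.1f} GiB")   # ~28.9 GiB
print(f"DDT in core: {entries * in_core / gib:.1f} GiB")   # ~15.0 GiB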
> 
> The server has 32G of ECC RAM and I have set zfs_arc_max to 22G.
> 
> During a test write with random data running for quite some time (with the server free from other activity), arcsz only gets to around 13-14G and metadata misses do not fall below ~45%.
> 
> So, I don't understand these numbers. Shouldn't the metadata (which contains the DDT) all be sitting in RAM? Why are the metadata misses so high? And why doesn't the ARC grow to its full allowed size (22G) during this test? How can I keep the whole DDT in RAM at all times, since it fits (15G << 22G)?
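
One way to watch this from the ARC side is to read the kstat counters directly. Here is a minimal sketch, assuming ZFS on Linux where the counters are exported at /proc/spl/kstat/zfs/arcstats; the field names used below (size, c_max, arc_meta_used, arc_meta_limit) are what 0.7-era releases expose and may differ in other versions:

# Read selected ARC counters from the ZFS on Linux kstat interface and report
# total ARC size vs. limit, and metadata usage vs. metadata limit.

def read_arcstats(path="/proc/spl/kstat/zfs/arcstats"):
    stats = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            # Data rows look like: "<name> <type> <value>"; skip the header lines.
            if len(parts) == 3 and parts[2].isdigit():
                stats[parts[0]] = int(parts[2])
    return stats

if __name__ == "__main__":
    gib = 1024 ** 3
    s = read_arcstats()
    for key in ("size", "c_max", "arc_meta_used", "arc_meta_limit"):
        if key in s:
            print(f"{key:16s} {s[key] / gib:6.1f} GiB")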
> 
> Below are the arcstats after the write test has been running for some time.
>  
> Last note: while trying to fix this issue, I have set the pool's cache properties as follows:
> primarycache => metadata
> secondarycache => all
> 
> Thanks for all possible help!
> B.
> 
> time  miss%  dm%  pm%  mm%  arcsz  l2asize  l2miss%  l2read  l2bytes
> 22:53:41     34   35    7   32    13G     8.4G       57     806     874K
> 22:53:42     42   42    0   40    13G     8.4G       52     751     909K
> 22:53:43     46   46    0   46    13G     8.4G       45     599     830K
> 22:53:44     47   47    0   47    13G     8.4G       41     655     976K
> 22:53:45     45   45    0   45    12G     8.4G       40     718     1.1M
> 22:53:46     33   44    0   33    12G     8.4G       37     659     1.0M
> 22:53:47     43   44    0   43    12G     8.4G       33     648     1.1M
> 22:53:48     42   42    0   42    12G     8.4G       38     625     987K
> 22:53:49      4    4    9    4    13G     8.4G       16    1.1K     2.3M
> 22:53:50      6    4  100    6    13G     8.4G       35     838     1.3M
> 22:53:51     47   47    0   47    13G     8.4G       24     344     646K
> 22:53:52     44   44    0   44    14G     8.4G       30     581     1.0M
> 22:53:53     47   47    0   47    14G     8.4G       33     555     940K
> 22:53:54     46   46    0   46    14G     8.4G       39     427     659K
> 22:53:55     48   48    0   48    14G     8.4G       36     557     890K
> 22:53:56     49   49    0   49    14G     8.4G       34     679     1.1M
> 22:53:57     47   47    0   47    14G     8.4G       37     659     1.0M
> 22:53:58     48   48    0   48    14G     8.4G       35     655     1.1M
> 22:53:59     41   48    0   41    14G     8.4G       34     659     1.1M
> 22:54:00     45   45    0   45    14G     8.4G       32     691     1.2M
> 22:54:01     45   45    0   45    14G     8.4G       34     687     1.1M
> 22:54:02     46   46    0   46    14G     8.4G       35     689     1.1M
> 22:54:03     45   46    0   45    14G     8.4G       35     648     1.0M
> 22:54:04     49   49    0   49    14G     8.4G       36     652     1.0M
> 22:54:05     48   48    0   48    14G     8.4G       37     653     1.0M
> 22:54:07     47   49    0   47    14G     8.4G       33     682     1.1M
> 22:54:08     47   47    0   47    14G     8.4G       36     616     982K
> 22:54:09     48   49    0   48    14G     8.4G       30     707     1.2M
> 22:54:10     43   47    0   43    14G     8.4G       32     708     1.2M
> 22:54:11     46   46  100   43    14G     8.4G       47     794     1.0M
> 22:54:12     56   56    0   53    14G     8.4G       50     814    1012K
> 22:54:13     45   51    1   43    14G     8.4G       46     766     1.0M
> 22:54:14     47   47    0   47    14G     8.4G       33     692     1.2M
> 22:54:15     46   46    0   46    14G     8.4G       37     631     995K
> 22:54:16     47   47    0   47    14G     8.4G       34     682     1.1M
> 22:54:17     48   48    0   48    14G     8.4G       35     663     1.1M
> 22:54:18     47   48    0   47    14G     8.4G       35     661     1.1M
> 22:54:19     46   47    0   46    14G     8.4G       34     664     1.1M
> 22:54:20     45   46    2   45    14G     8.4G       35     664     1.1M
> 22:54:21     49   49    0   49    14G     8.4G       36     666     1.0M
> 22:54:22     50   50   55   50    14G     8.4G       38     706     1.1M
> 22:54:23     49   49    0   49    14G     8.4G       35     775     1.2M
> 22:54:24     48   48    0   48    13G     8.4G       35     713     1.1M
> 22:54:25     47   48    0   47    13G     8.4G       35     715     1.1M
> 22:54:26     48   48    0   48    13G     8.4G       37     690     1.1M
> 22:54:27     49   49   50   49    13G     8.4G       37     709     1.1M
> 22:54:28     46   46    0   46    13G     8.4G       37     782     1.2M
> 22:54:29     48   48    0   48    13G     8.4G       38     717     1.1M
> 22:54:30     37   41    0   37    13G     8.4G       34     687     1.1M
> 22:54:31     41   41    0   41    13G     8.4G       39     610     955K
> 22:54:32     43   44   28   43    13G     8.4G       35     639     1.0M
> 22:54:33     46   46    0   46    13G     8.4G       36     698     1.1M
> 22:54:34     45   45    0   45    13G     8.4G       34     727     1.2M
> 22:54:35     46   46    0   46    13G     8.4G       38     634     979K
> 22:54:36     43   43    0   43    13G     8.4G       36     679     1.1M
> 22:54:37     44   44   11   44    13G     8.4G       34     719     1.2M
> 
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at list.zfsonlinux.org
> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss

