[zfs-discuss] dedup enabled and bad write performance (although not short on RAM)

Richard Yao ryao at gentoo.org
Sun Apr 29 18:00:25 EDT 2018


I am not sure why zpool status claims that the in-core size is smaller than the on-disk size. The traditional figure for the space a DDT entry takes in RAM is 320 bytes. That number was misleading, because in practice there are never more than, say, a few hundred entries of that size in RAM at once: entries are fetched through the ARC and kept around only for a short period. That behavior was a surprise to me when I noticed it, but it made sense; integrating the DDT into the ARC any other way would have been far messier.

Now that compressed ARC is a thing, DDT entries likely use less space in cache than they previously did, but they should not use less space than they do on disk. I have not scrutinized the accounting, but it sounds like there is a bug somewhere, or else we are missing an optimization opportunity. Your observations would make more sense if the ARC were double-caching the DDT because of ditto blocks, the on-disk number included the ditto copies, and the in-core number ignored them, but I would need to find time to look into it to be certain. It is just a hypothesis at this point.
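
To make the accounting concrete, here is a minimal sketch of the arithmetic, using the entry count and per-entry sizes from the zpool status -D line quoted below; the 320 bytes/entry figure is only the traditional rule of thumb, and as noted above the whole table is rarely resident in RAM at once:

    # Back-of-envelope DDT sizing from the quoted "zpool status -D" line:
    #   dedup: DDT entries 106492891, size 291 on disk, 151 in core
    entries       = 106492891
    on_disk_bytes = entries * 291    # per-entry size reported "on disk"
    in_core_bytes = entries * 151    # per-entry size reported "in core"
    rule_of_thumb = entries * 320    # traditional 320 bytes/entry estimate

    GiB = 1024.0 ** 3
    print("on disk:       %.1f GiB" % (on_disk_bytes / GiB))  # ~28.9 GiB
    print("in core:       %.1f GiB" % (in_core_bytes / GiB))  # ~15.0 GiB
    print("rule of thumb: %.1f GiB" % (rule_of_thumb / GiB))  # ~31.7 GiB

The oddity is exactly that the reported in-core total comes out smaller than the on-disk total.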

I am CCing Matt on this one. He might know offhand how the dedup statistics are calculated.

> On Apr 29, 2018, at 4:12 PM, BeetRoot ServAdm via zfs-discuss <zfs-discuss at list.zfsonlinux.org> wrote:
> 
> Hi all!
> 
> I have dedup enabled (plus lz4 compression) on a 28T pool (on hardware RAID-6 SAS), with a SLOG and a 100 GB SSD cache device. My problem is poor write performance: on datasets without dedup I get >300 MB/s writes, but on datasets with dedup enabled I only get ~25 MB/s.
> 
> I suspect that this is due to DDT sitting (partly) on disk.
> 
> As far as I understand things, the DDT is about 15G:
> 
>    dedup: DDT entries 106492891, size 291 on disk, 151 in core
>    => 106492891 * 151 / 1024^3 = 15G
> 
> The server has 32G of ECC RAM and I have set zfs_arc_max to 22G.
> 
> During a write test with random data that has been running for quite some time (the server is otherwise idle), arcsz only gets to around 13-14G and metadata misses do not fall below ~45%.
> 
> So, I don't understand these numbers. Shouldn't metadata (which contains the DDT) all be sitting in RAM? Why are metadata misses so high? And why doesn't the ARC grow to its full allowed size (22G) during this test? How can I keep the whole DDT in RAM at all times, since it fits (15G << 22G)?
> 
> Below is the arcstats output after the write test has been running for some time.
> 
> Last note: while trying to fix my issue, my pool's cache settings are the following:
> primarycache => metadata
> secondarycache => all
> 
> Thanks for all possible help!
> B.
> 
> time  miss%  dm%  pm%  mm%  arcsz  l2asize  l2miss%  l2read  l2bytes
> 22:53:41     34   35    7   32    13G     8.4G       57     806     874K
> 22:53:42     42   42    0   40    13G     8.4G       52     751     909K
> 22:53:43     46   46    0   46    13G     8.4G       45     599     830K
> 22:53:44     47   47    0   47    13G     8.4G       41     655     976K
> 22:53:45     45   45    0   45    12G     8.4G       40     718     1.1M
> 22:53:46     33   44    0   33    12G     8.4G       37     659     1.0M
> 22:53:47     43   44    0   43    12G     8.4G       33     648     1.1M
> 22:53:48     42   42    0   42    12G     8.4G       38     625     987K
> 22:53:49      4    4    9    4    13G     8.4G       16    1.1K     2.3M
> 22:53:50      6    4  100    6    13G     8.4G       35     838     1.3M
> 22:53:51     47   47    0   47    13G     8.4G       24     344     646K
> 22:53:52     44   44    0   44    14G     8.4G       30     581     1.0M
> 22:53:53     47   47    0   47    14G     8.4G       33     555     940K
> 22:53:54     46   46    0   46    14G     8.4G       39     427     659K
> 22:53:55     48   48    0   48    14G     8.4G       36     557     890K
> 22:53:56     49   49    0   49    14G     8.4G       34     679     1.1M
> 22:53:57     47   47    0   47    14G     8.4G       37     659     1.0M
> 22:53:58     48   48    0   48    14G     8.4G       35     655     1.1M
> 22:53:59     41   48    0   41    14G     8.4G       34     659     1.1M
> 22:54:00     45   45    0   45    14G     8.4G       32     691     1.2M
> 22:54:01     45   45    0   45    14G     8.4G       34     687     1.1M
> 22:54:02     46   46    0   46    14G     8.4G       35     689     1.1M
> 22:54:03     45   46    0   45    14G     8.4G       35     648     1.0M
> 22:54:04     49   49    0   49    14G     8.4G       36     652     1.0M
> 22:54:05     48   48    0   48    14G     8.4G       37     653     1.0M
> 22:54:07     47   49    0   47    14G     8.4G       33     682     1.1M
> 22:54:08     47   47    0   47    14G     8.4G       36     616     982K
> 22:54:09     48   49    0   48    14G     8.4G       30     707     1.2M
> 22:54:10     43   47    0   43    14G     8.4G       32     708     1.2M
> 22:54:11     46   46  100   43    14G     8.4G       47     794     1.0M
> 22:54:12     56   56    0   53    14G     8.4G       50     814    1012K
> 22:54:13     45   51    1   43    14G     8.4G       46     766     1.0M
> 22:54:14     47   47    0   47    14G     8.4G       33     692     1.2M
> 22:54:15     46   46    0   46    14G     8.4G       37     631     995K
> 22:54:16     47   47    0   47    14G     8.4G       34     682     1.1M
> 22:54:17     48   48    0   48    14G     8.4G       35     663     1.1M
> 22:54:18     47   48    0   47    14G     8.4G       35     661     1.1M
> 22:54:19     46   47    0   46    14G     8.4G       34     664     1.1M
> 22:54:20     45   46    2   45    14G     8.4G       35     664     1.1M
> 22:54:21     49   49    0   49    14G     8.4G       36     666     1.0M
> 22:54:22     50   50   55   50    14G     8.4G       38     706     1.1M
> 22:54:23     49   49    0   49    14G     8.4G       35     775     1.2M
> 22:54:24     48   48    0   48    13G     8.4G       35     713     1.1M
> 22:54:25     47   48    0   47    13G     8.4G       35     715     1.1M
> 22:54:26     48   48    0   48    13G     8.4G       37     690     1.1M
> 22:54:27     49   49   50   49    13G     8.4G       37     709     1.1M
> 22:54:28     46   46    0   46    13G     8.4G       37     782     1.2M
> 22:54:29     48   48    0   48    13G     8.4G       38     717     1.1M
> 22:54:30     37   41    0   37    13G     8.4G       34     687     1.1M
> 22:54:31     41   41    0   41    13G     8.4G       39     610     955K
> 22:54:32     43   44   28   43    13G     8.4G       35     639     1.0M
> 22:54:33     46   46    0   46    13G     8.4G       36     698     1.1M
> 22:54:34     45   45    0   45    13G     8.4G       34     727     1.2M
> 22:54:35     46   46    0   46    13G     8.4G       38     634     979K
> 22:54:36     43   43    0   43    13G     8.4G       36     679     1.1M
> 22:54:37     44   44   11   44    13G     8.4G       34     719     1.2M
> 
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at list.zfsonlinux.org
> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss


