[zfs-discuss] zfs receive performance

zfs-discuss at use.startmail.com zfs-discuss at use.startmail.com
Fri Feb 2 13:20:53 EST 2018

Hi list,

I was hoping someone on this list can give me some pointers: we're 
replacing a 5-year-old Nexenta cluster with a zfsonlinux-based one, and 
the performance of the new setup is somewhat disappointing.

The setup is roughly as follows:

 - clients access the server using nfs4 (the files used by these 
clients are encrypted ext4 images, with 4k block size, so any writes 
should be 4k)
 - the contents of the volumes are replicated using zfs send/recv to a 
second and third zfsonlinux machine for having a standby setup and an 
off-site backup
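
For context, one of these client images is built roughly like this (the 
paths, sizes, and names below are illustrative placeholders, not our 
actual configuration):

```shell
# Backing file lives on the NFS mount (hypothetical path)
truncate -s 100G /mnt/nfs/images/client1.img
losetup /dev/loop0 /mnt/nfs/images/client1.img

# Client-side encryption on top of the loop device
cryptsetup luksFormat /dev/loop0
cryptsetup open /dev/loop0 client1

# ext4 with an explicit 4k block size, so the filesystem issues 4k writes
mkfs.ext4 -b 4096 /dev/mapper/client1
mount /dev/mapper/client1 /srv/data
```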

Each storage server is a Dell R730xd filled with 10 HGST Helium 10T 
(7200rpm, SATA, 4k) disks. All disks sit on a controller in JBOD mode, 
no fancy stuff like write-back caching.
These disks are encrypted using LUKS, and then form a single pool with 
5 mirrors. Because of this LUKS layer, zfs does not change the I/O 
scheduler to noop, so that is something we're doing manually. We've 
also set ashift=12 manually, so the pool works properly with the 4k 
sectors. Finally, we've set atime=off on the volume. Usage is currently 
hovering around 40%.
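
A minimal sketch of that layer stack (device and pool names are 
placeholders; note that 4k sectors correspond to ashift=12, since 
2^12 = 4096):

```shell
# Open a LUKS container on each raw disk (repeat for all 10 disks)
cryptsetup open /dev/sda crypt0
cryptsetup open /dev/sdb crypt1

# Build the pool from mirror pairs of the dm-crypt devices;
# ashift=12 matches the 4k physical sectors, atime=off as described
zpool create -o ashift=12 -O atime=off tank \
    mirror /dev/mapper/crypt0 /dev/mapper/crypt1
# add the remaining mirror pairs the same way, e.g.:
zpool add tank mirror /dev/mapper/crypt2 /dev/mapper/crypt3

# zfs skips the scheduler change on dm-crypt devices,
# so set noop on each underlying disk by hand
echo noop > /sys/block/sda/queue/scheduler
```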
Another tweak was to increase the ARC size to 90% of RAM, as these 
machines are dedicated to serving NFS.
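
Concretely, we compute 90% of RAM in bytes and persist it as a module 
option (the numbers below assume our 96GB machines):

```shell
# 90% of 96 GiB, in bytes (zfs_arc_max is specified in bytes)
RAM_BYTES=$((96 * 1024 * 1024 * 1024))
ARC_MAX=$((RAM_BYTES * 90 / 100))
echo "zfs_arc_max = $ARC_MAX bytes"

# Persist across reboots (requires root; takes effect after reload/reboot):
#   echo "options zfs zfs_arc_max=$ARC_MAX" >> /etc/modprobe.d/zfs.conf
# Or apply immediately on a running system:
#   echo "$ARC_MAX" > /sys/module/zfs/parameters/zfs_arc_max
```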

These machines have 96GB of RAM and a single E5-2603 CPU each.
Software wise, these are running Ubuntu 16.04 with zfsonlinux from the 
jonathonf ppa, now at version 0.7.5-0york1~16.04.

Besides that, there is a pair of NVMe drives that are used for the ZIL, 
but according to zpool iostat those are not used when doing zfs receive.
Finally, the machines are connected to a pair of switches using a LACP 
4*1 Gbps connection.

So what is the issue? The client load generates a steady stream of 
writes of about 5 MBps. For some bizarre reason this ends up causing 50 
MBps of writes to the disks, but the disks manage that just fine.
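
For anyone wanting to reproduce the observation, something along these 
lines shows the inflation at each layer (pool name tank is a 
placeholder; -r needs zfs 0.7.x):

```shell
# Per-vdev bandwidth and iops, sampled every 5 seconds
zpool iostat -v tank 5

# Request size histograms: shows whether the 4k client writes are
# being aggregated or inflated before reaching the vdevs
zpool iostat -r tank 5

# And the view from below the LUKS layer (iostat is in sysstat)
iostat -x 5
```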

Our zfs send/receive setup is as follows:
 - we create snapshots on the source at 5 minute intervals using zfsnap 
(removing them after they are 3 days old);
 - we try to fire off a zfs send/receive continuously, with zfs send -I 
(capital i) so we get all snapshots on the receiving side;
 - on the send side, we can saturate the network link (at 1Gbps, there 
is only a single ssh connection) when pumping to /dev/null on the 
receiving side;
 - on the receive side, the system cannot keep up. Looking at zpool 
iostat again shows all 10 disks continuously grinding away at about 
200 iops, and replication is not quick enough to keep up with the 
stream of incoming snapshots.
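
For reference, the replication loop boils down to the following 
(snapshot names @previous/@latest and the host standby are 
placeholders; in reality zfsnap creates and expires the snapshots):

```shell
# Source side: a snapshot every 5 minutes (zfsnap also handles
# the 3-day expiry; plain zfs shown for clarity)
zfs snapshot tank/vol@replica-$(date +%Y%m%d-%H%M)

# -I (capital i) sends every intermediate snapshot between the two,
# so the receiving side keeps the full snapshot history
zfs send -I tank/vol@previous tank/vol@latest \
    | ssh standby zfs receive tank/vol

# -i (lowercase) sends only the direct delta between two snapshots,
# skipping intermediates -- noticeably less data on the wire
zfs send -i tank/vol@previous tank/vol@latest \
    | ssh standby zfs receive tank/vol
```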

When performing zfs send -i (lowercase i) it can keep up, though! There 
is probably a lot of data being modified between snapshots.

This zfs recv behaviour is completely unexpected: did we miss a setting 
somewhere? If more information or numbers are needed, I can try to look 
them up, of course.

(Oh, and the reason for mentioning Nexenta: the old setup is quite 
similar architecturally, with the main differences being that it runs a 
Solaris-like zfs, obviously doesn't have LUKS, and the disks inside it 
are 512-byte native. Also, it holds more disks, forming 8 mirrors in 
total. There, the zfs send/recv pair can keep up with the load quite 
easily.)
Thanks for any help!

Jasper Spaans

