[zfs-discuss] zfs receive performance
zfs-discuss at use.startmail.com
Fri Feb 2 13:20:53 EST 2018
I was hoping someone on this list can give me some pointers: we're
replacing a 5 year old Nexenta cluster with a zfsonlinux based one, and
the performance of the new setup is somewhat disappointing.
The setup is roughly as follows:
- clients access the server using nfs4 (the files used by these
clients are encrypted ext4 images, with 4k block size, so any writes
should be 4k)
- the contents of the volumes are replicated using zfs send/recv to a
second and third zfsonlinux machine, giving us a standby setup.
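Roughly, each client image looks something like this (the paths, size,
and names below are made up for illustration, not taken from our
actual setup):

```shell
# Hypothetical sketch of one client-side image: an encrypted
# container holding an ext4 filesystem with a 4k block size,
# so any write to the filesystem becomes a 4k write to the image.
truncate -s 10G /srv/nfs/client1.img
cryptsetup luksFormat /srv/nfs/client1.img
cryptsetup open /srv/nfs/client1.img client1
mkfs.ext4 -b 4096 /dev/mapper/client1
```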
Each storage server is a Dell R730xd filled with 10 HGST Helium 10T
(7200rpm, SATA, 4k) disks. All disks sit on a controller doing JBOD, no
fancy stuff like write back.
These disks are encrypted using LUKS, and then form a single pool with
5 mirrors. Because of this LUKS layer, zfs is not changing the
scheduler to noop, so that is something we're doing manually. We've
also set ashift=12 manually, so it matches the 4k sectors. Finally,
we've set atime=off on the volume. Usage is hovering
around 40% currently.
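For reference, the whole setup boils down to something like the
following (device names are illustrative; note that ashift=12, i.e.
2^12 = 4096, is the value that matches 4k sectors):

```shell
# Open each disk through LUKS first; the pool is then built
# from the resulting /dev/mapper devices, paired into 5 mirrors.
for d in sda sdb sdc sdd sde sdf sdg sdh sdi sdj; do
    cryptsetup open /dev/$d crypt_$d
done

# ashift=12 matches the 4k physical sectors; atime=off avoids
# a metadata write on every read.
zpool create -o ashift=12 -O atime=off tank \
    mirror /dev/mapper/crypt_sda /dev/mapper/crypt_sdb \
    mirror /dev/mapper/crypt_sdc /dev/mapper/crypt_sdd \
    mirror /dev/mapper/crypt_sde /dev/mapper/crypt_sdf \
    mirror /dev/mapper/crypt_sdg /dev/mapper/crypt_sdh \
    mirror /dev/mapper/crypt_sdi /dev/mapper/crypt_sdj

# Because ZFS sits on dm devices, it does not switch the
# underlying disks to the noop scheduler itself, so do it by hand:
for d in sda sdb sdc sdd sde sdf sdg sdh sdi sdj; do
    echo noop > /sys/block/$d/queue/scheduler
done
```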
Another tweak was to increase the ARC size to 90% of RAM, as these
machines are dedicated to serving NFS.
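Concretely, that tweak is just the zfs_arc_max module parameter; the
byte value below assumes 86 GiB, roughly 90% of 96 GB:

```shell
# Persist the ARC ceiling across reboots (value is in bytes;
# 86 GiB = 86 * 1073741824 = 92341796864):
echo "options zfs zfs_arc_max=92341796864" > /etc/modprobe.d/zfs.conf

# Or apply it at runtime without reloading the module:
echo 92341796864 > /sys/module/zfs/parameters/zfs_arc_max
```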
These machines have 96GB of RAM and a single E5-2603 CPU each.
Software wise, these are running Ubuntu 16.04 with zfsonlinux from the
jonathonf ppa, now at version 0.7.5-0york1~16.04.
Besides that, there is a pair of NVMe drives that are used for the ZIL,
but according to zpool iostat those are not touched during zfs receive.
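That check is simply the per-vdev view (the pool name "tank" stands in
for ours); during a receive the "logs" section stays idle:

```shell
# Per-vdev statistics, refreshed every 5 seconds; the NVMe log
# devices appear under "logs", so any SLOG traffic shows up here.
zpool iostat -v tank 5
```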
Finally, the machines are connected to a pair of switches using a LACP
4*1 Gbps connection.
So what is the issue? The client load generates a steady stream of
writes of about 5 MBps. For some bizarre reason this ends up causing 50
MBps of writes to the disks, but the disks manage that just fine.
Our zfs send/receive setup is as follows:
- we create snapshots on the source at 5 minute intervals using zfsnap
(removing them after they are 3 days old);
- we try to fire off a zfs send/receive continuously, with zfs send -I
(capital i) so we get all intermediate snapshots on the receiving side;
- on the send side, we can saturate the network link (at 1 Gbps, there
is only a single ssh connection) when pumping the stream to /dev/null
on the receiving end;
- on the receive side, the system cannot keep up. Looking at zpool
iostat again shows all 10 disks continuously straining at about
200 IOPS, and replication cannot keep pace with the incoming snapshots.
When performing zfs send -i (lowercase i) it can keep up, though! There
is probably a lot of data being modified.
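For clarity, the replication step is essentially the following (pool,
dataset, snapshot, and host names are placeholders, not our real ones):

```shell
# Incremental stream between two snapshots. -I (capital i) includes
# every intermediate snapshot; -i (lowercase) sends only the delta
# between the two named snapshots.
PREV=tank/vols@auto-2018-02-02_12.00
LAST=tank/vols@auto-2018-02-02_13.00

# Single ssh connection carrying the whole stream to the standby.
zfs send -I "$PREV" "$LAST" | ssh standby1 zfs receive -F tank/vols
```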
This zfs recv behaviour is completely unexpected: did we miss a setting
somewhere? If more information or numbers are needed, I can of course
look them up.
(Oh, and the reason for mentioning Nexenta: the old setup is
architecturally quite similar, with the main differences being that it
runs a Solaris-like ZFS, obviously doesn't have LUKS, and its disks are
512-byte native. Also, it holds more disks, forming 8 mirrors in total.
There, the zfs send/recv pair keeps up with the load just fine.)
Thanks for any help!