[zfs-discuss] zfs receive performance

Alex Chekholko alex at calicolabs.com
Mon Feb 5 12:51:07 EST 2018


While I'm not very familiar with zfs internals, I could imagine a situation
where you are doing pure random 4K I/O (which it sounds like you are,
writing to an encrypted volume that has ext4 inside), and those
transactions get nicely reordered locally because of the ARC and how
transaction groups work, etc., but then when you send the transactions over
to the other host, they get replayed in a suboptimal order.  But I am only
speculating.  Is there a way for you to compare the order of transactions
on the two systems, or something like that?
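
One crude way to compare might be the request-size and latency histograms
that zpool iostat grew in 0.7, sampled on both machines while replication is
running (pool name is illustrative):

  # request size histograms per vdev, 10 second samples
  zpool iostat -r tank 10
  # latency histograms
  zpool iostat -w tank 10

If the receiver shows mostly tiny scattered writes where the sender showed
larger aggregated ones, that would at least support the reordering theory.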

On Mon, Feb 5, 2018 at 2:13 AM, <zfs-discuss at use.startmail.com> wrote:

> Hi Alex,
>
> Thanks for the reply. Those 1k IOPS saturating the disks are not what
> surprises me - what does surprise me is that there are 1k IOPS in the first
> place.
>
> I don't really understand how performing an incremental zfs send/receive
> would lead to far more IOPS on the receiving side than the original client
> writes caused on the server; if anything, I would have expected fewer
> changes to have to be transferred.
>
> (When looking at our disk stats, the production machine fluctuates between
> 250 and 1250 writes per second across the day with an average of about 500,
> whereas the replicas are constantly doing above 2000.)
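>
> (Figures like these can be sampled with something along the lines of
>
>   zpool iostat tank 60
>
> or iostat -dxk 60 summed over the member disks; the pool name here is
> illustrative and the exact collection method doesn't matter much.)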
>
> We are experimenting with ripping out LUKS on one of our machines to see
> if that might help, but filling the machine with testing data takes about 3
> days.
>
> Jasper
>
> On Friday, February 2, 2018 at 7:27 PM, Alex Chekholko <
> alex at calicolabs.com> wrote:
>
>
> At first glance, it sounds to me like your bottleneck is the 1k IOPS max
> at your disk backend.
>
> 1k IOPS is not a lot if that is your total I/O available to serve many
> VMs.  On your primary host you mask that with some caching.
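>
> Rough back-of-envelope, assuming a 7200rpm SATA disk is good for somewhere
> around 100-200 random write IOPS: with 5 mirror vdevs, every write lands on
> both sides of a mirror, so the pool tops out at roughly 5 x 150 = ~750-1000
> random write IOPS, which is about where your disks seem to be pegged.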
>
> On Fri, Feb 2, 2018 at 10:20 AM, zfs-discuss--- via zfs-discuss <
> zfs-discuss at list.zfsonlinux.org> wrote:
>
> Hi list,
>
> I was hoping someone on this list could give me some pointers: we're
> replacing a 5-year-old Nexenta cluster with a zfsonlinux-based one, and the
> performance of the new setup is somewhat disappointing.
>
> The setup is roughly as follows:
>
>  - clients access the server using nfs4 (the files used by these clients
> are encrypted ext4 images, with 4k block size, so any writes should be 4k)
>  - the contents of the volumes are replicated using zfs send/recv to a
> second and a third zfsonlinux machine, giving us a standby setup and an
> off-site backup
>
> Each storage server is a Dell R730xd filled with 10 HGST Helium 10T
> (7200rpm, SATA, 4k) disks. All disks sit on a controller doing JBOD, with
> no fancy stuff like write-back caching.
> These disks are encrypted using LUKS and then form a single pool of 5
> mirrors. Because of this LUKS layer, zfs does not change the scheduler to
> noop itself, so that is something we do manually. We've also set ashift=11
> manually, so it should work properly with the 4k sectors. Finally, we've
> set atime=off on the volume. Usage is currently hovering around 40%.
> Another tweak was to increase the ARC size to 90% of RAM, as these machines
> are dedicated to providing NFS.
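>
> For reference, the relevant tweaks look roughly like this (device names,
> pool name and the exact ARC figure are illustrative, not our literal
> config):
>
>   # for each backing disk, since zfs won't do it itself through the LUKS layer
>   echo noop > /sys/block/sdX/queue/scheduler
>
>   # pool creation (the remaining mirror pairs are omitted here)
>   zpool create -o ashift=11 tank mirror /dev/mapper/luks0 /dev/mapper/luks1
>   zfs set atime=off tank
>
>   # cap the ARC at ~86 GiB (roughly 90% of 96GB) via /etc/modprobe.d/zfs.conf
>   options zfs zfs_arc_max=92341796864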
>
> These machines each have 96GB of RAM and a single E5-2603 CPU.
> Software-wise, they are running Ubuntu 16.04 with zfsonlinux from the
> jonathonf PPA, currently at version 0.7.5-0york1~16.04.
>
> Besides that, there is a pair of NVMe drives used for the ZIL, but
> according to zpool iostat those are not touched during zfs receive.
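> (The per-vdev view is what shows the log devices sitting idle during a
> receive, e.g. something like
>
>   zpool iostat -v tank 5
>
> with the pool name again being illustrative.)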
> Finally, the machines are connected to a pair of switches using a LACP 4*1
> Gbps connection.
>
> So what is the issue? The client load generates a steady stream of writes
> of about 5 MBps. For some bizarre reason this ends up causing 50 MBps of
> writes to the disks, but the disks manage just fine.
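> (For scale: 5 MBps of 4k writes works out to roughly 1,250 write operations
> per second before ZFS aggregates anything.)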
>
> Our zfs send/receive setup is as follows:
>  - we create snapshots on the source at 5-minute intervals using zfsnap
> (removing them once they are 3 days old);
>  - we try to fire off a zfs send/receive continuously, using zfs send -I
> (capital i) so that all intermediate snapshots end up on the receiving side
> (sketched below);
>  - on the send side, we can saturate the network link (at 1 Gbps, since
> there is only a single ssh connection) when pumping the stream to /dev/null
> on the receiving side;
>  - on the receive side, the system cannot keep up. Looking at zpool iostat
> again shows all 10 disks continuously busy at about 200 IOPS, and
> replication is not quick enough to keep up with the changes.
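>
> Concretely, one replication cycle is more or less the following (hostnames
> and dataset names are illustrative, and SNAP_PREV / SNAP_NEW stand for the
> newest snapshot already on the replica and the newest one zfsnap just
> created):
>
>   zfs send -I tank/vols@SNAP_PREV tank/vols@SNAP_NEW | \
>       ssh replica1 zfs receive tank/vols
>
> plus the same stream towards the off-site machine.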
>
> When performing zfs send -i (lowercase i), it can keep up though! There is
> probably a lot of data being modified.
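>
> (For clarity, that's roughly the difference between
>
>   zfs send -i tank/vols@old tank/vols@new   # only the net delta between the two
>   zfs send -I tank/vols@old tank/vols@new   # the full chain of intermediate snapshots
>
> so with -I the same records may be sent several times over if they kept
> changing between the 5-minute snapshots. Dataset and snapshot names here
> are illustrative.)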
>
> This zfs recv behaviour is completely unexpected: did we miss a setting
> somewhere? If more information or numbers are needed, I can of course try
> to look them up.
>
> (Oh, and the reason for mentioning Nexenta: the old setup is
> architecturally quite similar, with the main differences being that it runs
> a Solaris-like zfs, obviously doesn't have LUKS, and its disks are 512-byte
> native.  Also, those machines hold more disks, forming 8 mirrors in total.
> There, the zfs send/recv pair can keep up with the load quite easily.)
>
>
>