[zfs-discuss] ZFS zvols for KVM virtual machines storage
gmm at csdoc.com
Thu Dec 28 13:44:50 EST 2017
On 28.12.2017 19:21, Gionatan Danti wrote:
>> As far as I know, XFS by default always uses a 4096-byte block size
>> for I/O, so read-modify-write with XFS inside a VM will never occur.
> I stand partially corrected: recent xfs versions (at least the one provided
> with RHEL 7.4+) default to a 4k block size *with* read-modify-write
> behavior for smaller-than-blocksize writes, unless you use O_DIRECT. I
> just tested it by writing 512 bytes to a 4k xfs filesystem: when not using
> O_DIRECT, 4k reads/writes were issued to the disk.
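For reference, one way to reproduce such a test (mount point and disk are
just placeholders) is to issue the small writes with dd and watch the
request sizes that actually reach the disk:
# buffered 512-byte write (goes through the page cache, so XFS does 4k RMW)
dd if=/dev/zero of=/mnt/xfstest/file bs=512 count=1 conv=notrunc
# O_DIRECT 512-byte write (bypasses the page cache)
dd if=/dev/zero of=/mnt/xfstest/file bs=512 count=1 oflag=direct conv=notrunc
# observe the average request size issued to the block device
iostat -x 1 /dev/sdX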
In our virtual machines the only thing that generates O_DIRECT I/O is
Percona Server with innodb_flush_method = O_DIRECT, but Percona Server
uses 16k pages.
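Just to illustrate what I mean (values are only an example), this is the
relevant part of my.cnf for Percona Server:
[mysqld]
# InnoDB bypasses the guest page cache for data files
innodb_flush_method = O_DIRECT
# default InnoDB page size; can only be changed when the datadir is initialized
innodb_page_size = 16384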
> However, start using O_DIRECT and no more read-modify-write occurs on
> the filesystem data portion; on the other hand, journal metadata updates
> seem to always be full 4k aligned blocks. As you are using Qemu/KVM
> with disabled host caching, which implies O_DIRECT, I suggest you
> check your actual I/O behavior.
Yes, I disable host caching at the QEMU/KVM level;
this is recommended for Linux virtual machines.
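In libvirt this is done with cache='none' on the disk driver, something
like this (the zvol path is just an example):
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/zvol/tank/vm1-disk0'/>
  <target dev='vda' bus='virtio'/>
</disk>
cache='none' makes QEMU open the backing device with O_DIRECT on the host.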
Right now I do not disable the ARC and keep primarycache=all for the zvols,
and as I can see the ARC is used, and it takes half of the server RAM:
ARC Size: 99.74% 125.58 GiB
Target Size: (Adaptive) 100.00% 125.90 GiB
Min Size (Hard Limit): 6.25% 7.87 GiB
Max Size (High Water): 16:1 125.90 GiB
ARC Total accesses: 110.71m
Cache Hit Ratio: 23.53% 26.04m
Cache Miss Ratio: 76.47% 84.66m
Actual Hit Ratio: 21.46% 23.76m
Data Demand Efficiency: 28.66% 63.52m
Data Prefetch Efficiency: 66.15% 1.11m
But the cache miss ratio is high, so the ARC does not look very efficient to me.
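(The numbers above are from an arc_summary report; the raw counters can
also be read directly, e.g.:
grep -E '^(hits|misses|size|c_max) ' /proc/spl/kstat/zfs/arcstats
or watched live with the arcstat tool.)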
> Anyway, the basic reasoning stands: a 4k volblocksize is quite slow on
> mechanical HDDs (unless your workload fits it perfectly) and is also
> sub-optimal for modern SSDs (due to the flash page cache being 8/16K).
IMHO a 4k volblocksize on HDD zvols is good to avoid write amplification.
Writing one block is faster than a read-modify-write with volblocksize > 4k.
For Percona Server a separate pool on SSD should be used, and its zvol
should be created with volblocksize=16k for maximum performance.
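For example (pool names and sizes are placeholders), keeping in mind that
volblocksize must be chosen at zvol creation time:
# zvol for Percona Server on the SSD pool, matching the 16k InnoDB page
zfs create -V 200G -o volblocksize=16k ssdpool/percona
# zvol for a generic VM on the HDD pool, matching 4k guest filesystem I/O
zfs create -V 200G -o volblocksize=4k hddpool/vm1-disk0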
>> It is fast, but by default ZFS uses half of the server RAM for the ARC.
>> And it will be double caching - inside the VM and at the ZFS layer.
>> Probably it is better to use the ZFS ARC only for ZFS metadata
>> and to cache only inside the VMs if the server has little free memory.
> Sure, but it *should* deflate itself under memory pressure from the OS.
> My plan B is to use primarycache=metadata and secondarycache=data, but
> for now it is not needed.
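For reference, that would be something like (dataset name is an example):
zfs set primarycache=metadata tank/vm1-disk0
zfs set secondarycache=all tank/vm1-disk0
As far as I know secondarycache only accepts all|none|metadata (there is
no "data" value), and since L2ARC is filled from buffers in the ARC,
primarycache=metadata may also keep data blocks out of L2ARC.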
I see very low L2ARC usage on servers with KVM in the output of the
"zpool iostat -v" command (see the alloc and free capacity of the cache device).
Probably L2ARC is almost unused on your servers too.
I even set in /etc/modprobe.d/zfs.conf
options zfs l2arc_noprefetch=0
options zfs l2arc_write_boost=838860800
options zfs l2arc_write_max=838860800
but this did not help; L2ARC usage was still very low.
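L2ARC efficiency can also be checked directly from the kernel counters:
grep -E '^l2_(hits|misses|size) ' /proc/spl/kstat/zfs/arcstats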
So for all new servers I decided to use the SSDs for a dedicated
ZFS pool for Percona Server and not to use ZFS L2ARC at all.
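Something like this is what I have in mind for the new servers (device
names are just an example):
# dedicated SSD pool for the database, ashift=12 for 4k flash sectors
zpool create -o ashift=12 ssdpool mirror /dev/nvme0n1 /dev/nvme1n1
and then the Percona zvol is created on it with volblocksize=16k as shown above.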