[zfs-discuss] about zol 0.6.2 mirror vdev read performance
gordan.bobic at gmail.com
Fri Oct 25 05:17:10 EDT 2013
On Fri, Oct 25, 2013 at 9:23 AM, Gregor Kopka <gregor at kopka.net> wrote:
> On 25.10.2013 at 10:07, Gordan Bobic wrote:
>> This isn't unusual. ZFS maximum block size is 128KB. A 7200rpm disk will
>> top out at about 120 IOPS. Multiply those two out and you get a figure of
>> about 15MB/s per disk. But the current generation of disks is capable of
>> about 150MB/s on linear transfers. That means that to max out the disk you
>> need to be reading data in sequential blocks of at least 1.25MB (this
>> number will be larger in practice due to squeezing in a seek at the
>> beginning of an operation).
>> When you have fewer disks, there is a greater chance of operations being
>> linear on each disk even after round-robin rotation. Say you have 2 disks
>> and want to read a large file. You'll read block 1 from disk 1, block 2
>> from disk 2, block 3 from disk 1, etc. But blocks 1 and 3 have a pretty
>> good chance of having been written in adjacent locations on disk (unless
>> the disk was getting very full). So block 3 will be getting prefetched by
>> the disk following the read of block 1, while block 2 is being read.
>> As you add disks, you are reducing the number of sequential operations on
>> each disk, so you are getting closer to the 15MB/s/disk figure. The IOPS
>> capacity will go up linearly, but the throughput on sequential transfers
>> will not.
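
The arithmetic quoted above can be sketched as a back-of-envelope model. This is only an illustration using the round numbers from the message (128KB blocks, 120 IOPS, 150MB/s linear); the function names are hypothetical, and treating 1/IOPS as the cost of one seek is a simplifying assumption, not a measurement of any real disk.

```python
# Back-of-envelope model of the figures quoted above.
# All constants are the round numbers from the message, not measurements.

BLOCK_SIZE_KB = 128   # ZFS maximum block size cited above
RANDOM_IOPS = 120     # approximate ceiling for a 7200rpm disk
LINEAR_MB_S = 150     # approximate linear transfer rate per disk


def random_throughput_mb_s(block_kb=BLOCK_SIZE_KB, iops=RANDOM_IOPS):
    """Throughput when every block read is a separate random operation."""
    return block_kb * iops / 1024


def min_sequential_run_mb(linear_mb_s=LINEAR_MB_S, iops=RANDOM_IOPS):
    """Data per operation needed to reach the linear rate, ignoring seek time."""
    return linear_mb_s / iops


def throughput_mb_s(run_blocks, block_kb=BLOCK_SIZE_KB,
                    iops=RANDOM_IOPS, linear_mb_s=LINEAR_MB_S):
    """Per-disk throughput when each seek is followed by run_blocks
    contiguous blocks. Assumes (hypothetically) that one seek costs
    1/iops seconds and the transfer runs at the linear rate."""
    run_mb = run_blocks * block_kb / 1024
    seek_s = 1 / iops
    transfer_s = run_mb / linear_mb_s
    return run_mb / (seek_s + transfer_s)


if __name__ == "__main__":
    print(random_throughput_mb_s())   # the ~15MB/s per-disk figure
    print(min_sequential_run_mb())    # the 1.25MB figure
    for blocks in (1, 10, 100):
        print(blocks, round(throughput_mb_s(blocks), 1))
```

Under these assumptions a 1.25MB run (10 blocks) only reaches about half the linear rate, because the seek takes as long as the transfer. That is the sense in which the number "will be larger in practice".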
> Andrew is currently porting a patch from FreeBSD which will take
> locality into account; with it you should be able to tune for this.
Are you saying that this will potentially make ZFS decide to put file
blocks on a single vdev rather than striping them across all vdevs in a
pool? I guess this already happens in some cases anyway, e.g. when you have
a mostly full vdev in a pool and you add a new vdev to it. What is less
clear, however, is how to determine at write time whether it would be
better to stripe the file or store it on a single vdev. You could do this
by performing some statistical analysis on the file over time, but that would
mean relocating the file transparently after it's been written.
Or to put it differently, I don't see how exactly this might be implemented
to yield a worthwhile performance boost in the general case without