[zfs-discuss] about zol 0.6.2 mirror vdev read performance

Gordan Bobic gordan.bobic at gmail.com
Fri Oct 25 06:27:26 EDT 2013


Oh, I see what you mean. On reads we can tell whether it is more efficient
to just keep streaming a read off a single disk that has already completed
its seek, instead of making another disk seek for a read that the first
disk is already positioned to serve.

I get it now. Thanks for explaining.
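
Just to check I have the idea right, something like this (a rough Python
sketch, all names made up purely for illustration):

    # Sketch only: keep a sequential read on the mirror member whose head is
    # already positioned, or fall back to another member and pay the seek.
    SEQ_WINDOW = 1024 * 1024   # made-up "close enough to be sequential" window

    def pick_member(members, offset):
        for m in members:
            # A member that just finished reading right before this offset
            # can keep streaming without an extra seek.
            if m.last_end <= offset <= m.last_end + SEQ_WINDOW:
                return m
        # Otherwise accept the seek on the least busy member.
        return min(members, key=lambda m: m.pending_ios)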

Gordan



On Fri, Oct 25, 2013 at 11:24 AM, Gregor Kopka <gregor at kopka.net> wrote:

>
> On 25.10.2013 11:31, Gordan Bobic wrote:
>
>  On Fri, Oct 25, 2013 at 10:25 AM, Andreas Dilger <adilger at dilger.ca> wrote:
>
>>
>> On Oct 25, 2013, at 3:17 AM, Gordan Bobic <gordan.bobic at gmail.com> wrote:
>> > On Fri, Oct 25, 2013 at 9:23 AM, Gregor Kopka <gregor at kopka.net> wrote:
>> >
>> >> Andrew is currently porting a patch from FreeBSD which takes locality
>> >> into account; with it you should be able to tune for this.
>> >>
>> >> https://github.com/zfsonlinux/zfs/issues/1803
>> >
>> > Are you saying that this will potentially make ZFS decide to put file
>> blocks on a single vdev rather than striping them across all vdevs in a
>> pool?  I guess this already happens in some cases anyway, e.g. when you
>> have a mostly full vdev in a pool and you add a new vdev to it. What is
>> less clear, however, is how to determine at write time whether it would be
>> better to stripe the file or store it onto a single vdev. You could do this
>> per file by performing some statistical analysis on it over time, but that
>> would mean relocating the file transparently after it's been written.
>>
>> Since this is about mirror VDEVs, the blocks will be allocated on all
>> VDEVs anyway.  However, if reads are distributed round-robin during a
>> linear read, there will be IOPS on all of the disks, but each disk will
>> actually see a strided read pattern that wastes disk bandwidth without
>> improving performance.
>>
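>> To put toy numbers on it (not the actual ZFS code, just the access pattern):
>>
>>     # Naive round-robin over a 2-way mirror for a linear read of 16 x 128K
>>     # records: each disk touches the whole range but only transfers every
>>     # other record, so its readahead is largely wasted.
>>     RECORD = 128 * 1024
>>     offsets = {0: [], 1: []}
>>     for i in range(16):
>>         offsets[i % 2].append(i * RECORD)   # member chosen round-robin
>>     print(offsets[0])   # 0, 256K, 512K, ...  seen by member 0
>>     print(offsets[1])   # 128K, 384K, ...     seen by member 1
>>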
>> It only really makes sense to round-robin reads with very large or
>> disjoint read requests, since modern disks have large readahead track
>> buffers that are beneficial even if they are not “visible” to the
>> kernel.
>>
>>
>  Right, so what you are saying is that the allocator should favour
> writing multiple consecutive blocks to a single vdev rather than striping
> them across all vdevs.
>
> This would make sense up to a point (disk track length), as long as it is
> a streaming, single-user, write-once workload (due to the CoW nature of ZFS
> it *will* fragment on partial rewrites) with streaming, single-user reads.
>
> The problem with being smart about this is that disks don't expose their
> inner geometry anymore (just LBA, so you won't know if you're writing within
> a track), drive-internal reallocation of bad sectors might play tricks on
> you, etc. But if you have ideas I'm sure a patch would be welcome.
>
>
>  Would the same thing not be (better?) achieved by porting the patch that
> increases the maximum block size to 1MB?
>
>
> The idea behind the vdev mirror patchset (which only touches the read code
> path) is that for a read ZFS will pick the mirror vdev member which is likely
> to be the fastest to serve that read, taking load, locality and whether the
> drive is rotational into consideration (and making this tuneable) - which
> according to some tests gives a nice speedup, especially if you throw
> non-rotating media into the mirror. See
> http://svnweb.freebsd.org/base?view=revision&revision=256956
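>
> Very roughly, the selection amounts to something like this (pseudocode with
> invented names and weights - the real logic is in the patch above):
>
>     # Sketch of load/locality-aware member selection for a single read.
>     SEEK_UNIT = 16 * 1024 * 1024   # made-up scale for the seek-distance penalty
>
>     def select_member(members, offset):
>         def cost(m):
>             c = m.pending_ios                    # current load (queue depth)
>             if m.rotational:
>                 # penalize long seeks on spinning media, favour locality
>                 c += abs(offset - m.last_offset) // SEEK_UNIT
>             return c
>         return min(members, key=cost)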
>
>
> Regarding the OP: zfs_top_maxinflight (default: 32) might need to be raised
> when there is a large number of mirror members; with 7 members ZFS would
> throttle to an average of about 4.5 inflight requests per drive.
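>
> Back-of-the-envelope, assuming the limit is simply spread evenly over the
> members:
>
>     # zfs_top_maxinflight default vs. a 7-way mirror
>     maxinflight, members = 32, 7
>     print(maxinflight / float(members))   # ~4.57 inflight requests per drive
>     print(members * 32)                   # 224, to allow ~32 in flight per drive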
>
> Gregor
>
>


