[zfs-discuss] Slow performance need some suggestions for tuning

Christ Schlacta aarcane at aarcane.org
Wed Jun 22 00:05:04 EDT 2011

On 6/21/2011 20:52, Andreas Dilger wrote:
> On 2011-06-21, at 5:19 PM, Christ Schlacta wrote:
>> On 6/21/2011 13:37, Brian Behlendorf wrote:
>>>> Can this be detected from /sys/block/{dev}/queue/physical_block_size
>>>> automatically?  Newer versions of libblkid also have the ability to
>>>> query this kind of information without having to dig around in /sys
>>>> for it.
>>> Currently we do use the physical_block_size reported by the drive.  If
>>> the drive reports 4k sectors we do correctly set the ashift to 12.  But
>>> it turns out some drives report 512 and internally use 4k sectors.  That
>>> makes for a lot of read-modify-write operations internal to the drive.
> Sad, really.  The drive vendors have just broken any chance of userspace
> being able to work properly with these drives.  The whole point of the
> physical_block_size, logical_block_size, and alignment_offset (and the
> code page that they are extracted from) was for the kernel/userspace to
> be able to figure out the _right_ blocksize/alignment to use instead of
> having to guess (and get it wrong).
>>> I've flirted with adding a white list of known offending drive models.
>>> This information is in /sys/devices/pci*/host*/target*/*/model which I
>>> believe is symlinked off /dev/block/ somewhere.  This is the only way I
>>> can think of to automatically handle this.
>>> Does anyone happen to have a list of offending drive models?
> I imagine the list is very long, and will continue to grow, like the list
> of drives that pretend to sync data but don't so they win the benchmarks.
>> Can we use a greylist to identify models known to lie about block size,
>> to improve accuracy going forward?  Perhaps we should default to
>> ashift=12; will it seriously degrade performance on 512 drives?  Is
>> there some other criterion we can use to predict this?  Is there a lower
>> limit to AF drive capacity that could serve as a cutoff for triggering a
>> greylist lookup, performance probing, or some other more intelligent
>> means of making the decision?
> I was going to suggest this as well.  The negative performance impact on
> 512-byte drives probably isn't even noticeable, due to PAGE_SIZE allocations
> and much higher seek overhead vs. the cost of doing a slightly larger data
> transfer.  The performance improvement on 4096-byte drives is significant,
> so long as the IO is aligned on the 4096-byte boundaries.  Newer fdisk will
> warn about creating misaligned partitions (if people are actually using them),
> so hopefully not too many people will get it wrong.
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer
> Whamcloud, Inc.
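
To make the ashift discussion concrete, here is a rough sketch in Python of
the selection logic described above: take the reported
physical_block_size, bump it to 4k for models on a list of known
offenders, and optionally enforce a floor such as ashift=12.  This is only
an illustration and not how the ZFS code does it.  The function names, the
KNOWN_4K_LIARS set, and the "minimum" parameter are all made up for this
example, and the set is left empty because no offending models have been
named in this thread.

```python
from pathlib import Path

# Hypothetical list of model strings that report 512 but use 4k internally.
# Intentionally empty: the thread does not name any offending models.
KNOWN_4K_LIARS = set()


def ashift_for(block_size):
    """Return log2(block_size), e.g. 512 -> 9 and 4096 -> 12."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a positive power of two")
    return block_size.bit_length() - 1


def choose_ashift(reported_physical, model="", liars=KNOWN_4K_LIARS, minimum=9):
    """Pick an ashift from the reported size, the model list, and a floor."""
    ashift = ashift_for(reported_physical)
    if model.strip() in liars:
        # The drive is assumed to lie about its sector size; treat it as 4k.
        ashift = max(ashift, 12)
    return max(ashift, minimum)


def read_physical_block_size(dev, sysfs="/sys/block"):
    """Read /sys/block/{dev}/queue/physical_block_size as an integer."""
    path = Path(sysfs) / dev / "queue" / "physical_block_size"
    return int(path.read_text().strip())
```

Calling choose_ashift(read_physical_block_size("sda"), minimum=12) would
correspond to the "default to ashift=12" idea; leaving minimum at 9 and
filling in the list corresponds to the white list / greylist idea.
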
No one uses fdisk; the default tools are cfdisk and parted, both of which
warn about boundaries, and neither of which will suggest alternatives
that ARE aligned, for some ungodly reason.  That's an issue that
should probably be raised with both of those tools' maintainers.
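
For what it's worth, suggesting an aligned alternative is only a little
arithmetic.  A minimal sketch in Python, assuming 512-byte logical sectors
and a 4096-byte alignment target (the function names are made up for
illustration and are not taken from cfdisk or parted):

```python
def is_aligned(start_sector, logical_sector_size=512, alignment=4096):
    """True if the partition's starting byte offset is a multiple of alignment."""
    return (start_sector * logical_sector_size) % alignment == 0


def next_aligned_sector(start_sector, logical_sector_size=512, alignment=4096):
    """Return the first sector at or after start_sector that is aligned."""
    if alignment % logical_sector_size != 0:
        raise ValueError("alignment must be a multiple of the sector size")
    sectors_per_block = alignment // logical_sector_size
    # Round up to the next multiple of sectors_per_block.
    return -(-start_sector // sectors_per_block) * sectors_per_block
```

With these defaults, a partition starting at sector 63 is misaligned and
the suggested start would be sector 64.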
