[zfs-discuss] ashift confusion

Gordan Bobic gordan.bobic at gmail.com
Wed Jan 15 09:26:03 EST 2014


You haven't addressed the other point I made, about asynchronous writes. At
the moment we haven't excluded the possibility that the first write is always
fastest simply because it gets buffered into RAM, and subsequent runs go
slower because that RAM is already exhausted and the data has to be flushed
out to disk. What happens if you run the tests in reverse order? Does the
performance boost follow the first test, or does it follow the ashift
value? Re-test with oflag=sync conv=fdatasync and see what happens.
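For example, something along these lines (the path, block size and input file
just mirror your last test and are only illustrative) takes the write-back
cache out of the picture; oflag=sync forces O_SYNC on every write, while
conv=fdatasync forces a flush of the output file before dd exits:

for i in 12 9 0; do { dd if=/dev/shm/111.gz of=/mount$i/hello bs=16k oflag=sync conv=fdatasync ; echo $i ; } & done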



On Wed, Jan 15, 2014 at 2:04 PM, Andrew Holway <andrew.holway at gmail.com> wrote:

> You're right. Using /dev/zero for this kind of shenanigans is quite
> stupid. This curious effect remains, however, when using less
> compressible data.
>
> [root at localhost shm]# for i in 12 9 0; do { dd if=/dev/shm/111.gz
> of=/mount$i/hello bs=16k count=10000000 ; echo $i ; } & done
>
> 0 - 1564463614 bytes (1.6 GB) copied, 7.27561 s, 215 MB/s
> 9 - 1564463614 bytes (1.6 GB) copied, 14.3979 s, 109 MB/s
> 12 - 1564463614 bytes (1.6 GB) copied, 14.4101 s, 109 MB/s
>
>
> On 15 January 2014 13:30, Gordan Bobic <gordan.bobic at gmail.com> wrote:
> > You seem to have used oflag=direct in some tests and no sync spec in
> > others. IIRC ZFS doesn't support O_DIRECT, so you should probably re-test
> > with conv=fdatasync and/or oflag=sync.
> >
> > Also note that most recent SSDs will ZLE 0-filled blocks (a.k.a. "let's not
> > and say we did") as well as silently dedupe the sector contents, so testing
> > with a data set generated from /dev/zero is probably not going to yield the
> > most meaningful and consistent results.
> >
> >
> >
> > On Wed, Jan 15, 2014 at 1:12 PM, Andrew Holway <andrew.holway at gmail.com>
> > wrote:
> >>
> >> Hello,
> >>
> >> I have configured 3 raidz pools.
> >>
> >> [root at localhost ~]# zpool list
> >>
> >> NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> >> tank0        2.72T  2.29G  2.72T     0%  1.00x  ONLINE  -
> >> tank12       2.72T  1.16M  2.72T     0%  1.00x  ONLINE  -
> >> tank9        2.72T  4.58G  2.71T     0%  1.00x  ONLINE  -
> >>
> >> [root at localhost ~]# zpool get all | grep ashift
> >> tank0        ashift                 0                      default
> >> tank12       ashift                 12                     local
> >> tank9        ashift                 9                      local
> >>
> >>
> >> I am a little confused by these results (truncated for clarity):
> >>
> >> [root at localhost ~]# for i in 12 0 9; do { dd if=/dev/zero
> >> of=/mount$i/hello bs=16k count=100000 ; echo $i ; } & done
> >>
> >> tank0 - 1638400000 bytes (1.6 GB) copied, 7.69338 s, 213 MB/s
> >> tank12 - 1638400000 bytes (1.6 GB) copied, 14.0659 s, 116 MB/s
> >> tank9 - 1638400000 bytes (1.6 GB) copied, 14.2219 s, 115 MB/s
> >>
> >> tank0 - 16384000000 bytes (16 GB) copied, 82.6222 s, 198 MB/s
> >> tank9 - 16384000000 bytes (16 GB) copied, 167.183 s, 98.0 MB/s
> >> tank12 - 16384000000 bytes (16 GB) copied, 167.199 s, 98.0 MB/s
> >>
> >> ashift=0 seems to be way faster than ashift=9 or ashift=12, although we
> >> can see in zdb (output at the bottom of this post) that tank0 is set
> >> to ashift: 9. I tested with some SSDs (results below) and got the
> >> expected result: ZOL was incorrectly detecting the disks as 9 (512),
> >> and setting them to 12 (4096) sped things up considerably.
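> >>
> >> For reference, the ashift actually in use by each vdev can be read from
> >> the cached pool configuration, e.g. (assuming the default zpool.cache):
> >>
> >> zdb | grep ashift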
> >>
> >> Thanks,
> >>
> >> Andrew
> >>
> >>
> >> ### Some command history ###
> >>
> >> zpool create tank0 -f -o ashift=0 raidz sda sdb sdc
> >> zpool create tank9 -f -o ashift=9 raidz sdd sde sdf
> >> zpool create tank12 -f -o ashift=12 raidz sdg sdh sdi
> >>
> >> zfs create -o mountpoint=/mount0 tank0/filesystem0
> >> zfs create -o mountpoint=/mount9 tank9/filesystem9
> >> zfs create -o mountpoint=/mount12 tank9/filesystem12
> >>
> >> zpool create tank0 -f -o ashift=0 raidz sda sdb sdc
> >> zpool create tank9 -f -o ashift=0 raidz sdd sde sdf
> >> zpool create tank12 -f -o ashift=0 raidz sdg sdh sdi
> >>
> >> zpool create tank0 -f -o ashift=0 raidz sda sdd sdg
> >> zpool create tank9 -f -o ashift=9 raidz sdb sde sdh
> >> zpool create tank12 -f -o ashift=12 raidz sdc sdf sdi
> >>
> >> ### Testing raw devices ###
> >>
> >> for i in a b c d e f g h i; do {  dd if=/dev/zero of=/dev/sd$i bs=16k
> >> count=100000 oflag=direct; echo $i; } & done
> >>
> >> ### Testing on SATA SSD ###
> >>
> >> [root at localhost ~]# for i in m n r p ; do {  dd if=/dev/zero
> >> of=/dev/sd$i bs=16k count=100000 oflag=direct; echo $i; } & done
> >>
> >> sdn - 1638400000 bytes (1.6 GB) copied, 11.4111 s, 144 MB/s
> >> sdm - 1638400000 bytes (1.6 GB) copied, 11.5415 s, 142 MB/s
> >> sdp - 1638400000 bytes (1.6 GB) copied, 11.8209 s, 139 MB/s
> >> sdr - 1638400000 bytes (1.6 GB) copied, 11.888 s, 138 MB/s
> >>
> >> zpool create ssd-tank0 -f -o ashift=0 sdm
> >> zpool create ssd-tank9 -f -o ashift=9 sdn
> >> zpool create ssd-tank12 -f -o ashift=12 sdr
> >>
> >> zfs create -o mountpoint=/ssd-mount0 ssd-tank0/filesystem0
> >> zfs create -o mountpoint=/ssd-mount9 ssd-tank9/filesystem9
> >> zfs create -o mountpoint=/ssd-mount12 ssd-tank12/filesystem12
> >>
> >> [root at localhost ~]# for i in 12 0 9; do { dd if=/dev/zero
> >> of=/ssd-mount$i/hello bs=16k count=100000 ; echo $i ; } & done
> >>
> >> ssd-tank12 - 1638400000 bytes (1.6 GB) copied, 4.75403 s, 345 MB/s
> >> ssd-tank0 - 1638400000 bytes (1.6 GB) copied, 6.5122 s, 252 MB/s
> >> ssd-tank9 - 1638400000 bytes (1.6 GB) copied, 6.84761 s, 239 MB/s
> >>
> >> zdb output: https://gist.github.com/mooperd/21780947b82940f3ac9f
> >>