[zfs-discuss] Sudden write() + fsync() performance drop

Gregor Kopka zfs-discuss at kopka.net
Fri Feb 5 05:28:36 EST 2016


The storage is still in the cloud, which can be exposed to some strange
winds that are not under your control.
What I found in a quick search is this:
http://hatim.eu/2014/05/24/leveraging-ssd-ephemeral-disks-in-ec2-part-1/
see the section on pre-warming. It might be connected.
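
Pre-warming there basically means touching every block of the ephemeral
volume once before putting it into service. A minimal sketch of a read
pass (the device name /dev/xvdb is only an example, check your instance;
some guides also recommend a full write pass, which destroys any data on
the device):

    # read every block once so first-touch penalties are paid up front
    sudo dd if=/dev/xvdb of=/dev/null bs=1M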

Gregor

On 04.02.2016 at 15:28, Miguel Wang via zfs-discuss wrote:
> Gregor,
>
> Although I am using the Amazon cloud, I am using the local ephemeral
> disks, as indicated by the pool name "local0" in the initial output I
> provided.
>
>     On Thu, Feb 4, 2016 at 9:20 AM, Gregor Kopka <zfs-discuss at kopka.net> wrote:
>
>     Miguel,
>
>     writes are expected to be faster with sync=disabled, since that
>     turns all writes into async ones and lets ZFS return from the write
>     call immediately. ZFS blocks on sync writes until the drive reports
>     completion, so if the latency toward the drives goes up, throughput
>     on sync writes goes down.
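>
>     To check or flip that per dataset (dataset name taken from your
>     earlier output, adjust as needed), something along these lines works:
>
>         zfs get sync,logbias local0/mysqldata/data
>         # sync=disabled makes everything async: fast, but the last few
>         # seconds of "committed" data can be lost on a crash or power
>         # failure -- use it for testing, not for production
>         zfs set sync=disabled local0/mysqldata/data
>         zfs set sync=standard local0/mysqldata/data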
>
>     Since you are using some cloud (which is an important piece of
>     information, since it means that underlying hardware changes are out
>     of your control and out of your sight), the latency between your
>     server and the drives might simply have gone up. So the cloud seems
>     to be to blame, which fits nicely with several servers going bad at
>     the same time.
>
>     Gregor
>
>
>     On 04.02.2016 at 13:17, Miguel Wang via zfs-discuss wrote:
>>     Gregor -
>>
>>     Sorry for the confusion, it is indeed confusing. Let me explain:
>>
>>     I set up two new servers: one with recordsize 16K and one with
>>     recordsize 128K. The fsync test and the real-use test show they are
>>     both "good" servers. The one with the 16K recordsize achieved a
>>     compression ratio of 1.81 and leaves very little free space; the one
>>     with the 128K recordsize achieved a compression ratio of 2.50 and
>>     has plenty of free space.
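>>
>>     (As a rough check of sync-write latency on the dataset itself,
>>     independent of the exact fsync test, something like this can be run;
>>     the path and sizes here are only examples:)
>>
>>         cd /srv/mysqldata/data
>>         # 1000 x 16K writes, each one followed by a synchronous flush
>>         dd if=/dev/zero of=fsync-probe bs=16k count=1000 oflag=dsync
>>         rm fsync-probe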
>>
>>     We know that a 16K recordsize is the optimal size for the MySQL
>>     application because most of the writes are 16K. But because of the
>>     space savings (1.81 compression ratio vs 2.50 compression ratio) and
>>     because the server with the 128K recordsize performs well, we
>>     dynamically changed the server with the 16K recordsize to a 128K
>>     recordsize; during the restore and initial use it was still at the
>>     16K recordsize.
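>>
>>     For reference, that change is a plain property set (dataset name as
>>     in the earlier output); note that recordsize only applies to blocks
>>     written after the change, so data restored while it was 16K keeps
>>     its 16K blocks:
>>
>>         zfs set recordsize=128k local0/mysqldata/data
>>         zfs get recordsize local0/mysqldata/data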
>>
>>     The underlying disks are Amazon EC2 ephemeral SSDs. These facts:
>>
>>     [a] Multiple old servers became bad at the same time [I switched
>>     servers and they still do not perform; fsync shows the same bad results]
>>     [b] Amazon confirmed there are no hardware issues
>>     [c] A reboot does not help
>>     [d] There were no software or configuration changes on the newly
>>     built servers, and they perform well
>>
>>     led me to believe there is something wrong with the pool. I wanted
>>     to believe it is the fragmentation and the free space left, but the
>>     good server has the same frag number, 86%, and less free space [406G
>>     versus 977G]. Maybe the frag number is not a true indication of how
>>     fragmented the data filesystem [/srv/mysqldata/data] is, or maybe
>>     something else changed after recreating the pool and restoring [we
>>     copy the mysql backup at the file level with cp and rsync].
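>>
>>     (For anyone following along, those numbers can be read straight off
>>     the pool and dataset, names assumed from the earlier output. Note
>>     that the FRAG column reports fragmentation of the pool's free space,
>>     not how fragmented the files inside a dataset are:)
>>
>>         zpool list -o name,size,allocated,free,fragmentation,capacity,health local0
>>         zfs get compressratio,recordsize,used,available local0/mysqldata/data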
>>
>>     Another piece of information: after I disabled sync on the bad
>>     servers, they started to perform well.
>>
>>     Thanks for your reply, I will check ARC. Thanks!
>>
>>     On Thu, Feb 4, 2016 at 3:24 AM, Gregor Kopka <zfs-discuss at kopka.net> wrote:
>>
>>
>>
>>         On 03.02.2016 at 19:03, Miguel Wang via zfs-discuss wrote:
>>>         The good:
>>>             local0/mysqldata/data  recordsize  128K  local
>>
>>>         The bad:
>>>             local0/mysqldata/data  recordsize  16K   local
>>
>>>         NB: The "bad" server used to have the 128K recordsize; that is
>>>         why it has a better compression ratio and more free space. The
>>>         "good" server had the 16K recordsize on the volume from the
>>>         start, when we rebuilt the server.
>>         Now I am confused.
>>         So you set the good server from 16k to 128k, and the bad the
>>         other way around?
>>
>>         Apart from that:
>>         Anything in dmesg, smartctl or zpool status -v indicating that
>>         a drive might have problems (which could affect only certain
>>         areas that are unused after the restore from backup)? Enough
>>         memory in the systems; maybe the bad one was swapping when the
>>         * happened (maybe because of the ARC growing too large)?
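>>
>>         A compact way to check all of that (device and pool names are
>>         only examples, adjust to the actual system):
>>
>>             zpool status -v local0       # read/write/checksum errors?
>>             dmesg | tail -n 50           # I/O errors, resets, OOM killer?
>>             smartctl -a /dev/xvdb        # drive health, if exposed at all
>>             awk '$1 == "size" || $1 == "c_max"' /proc/spl/kstat/zfs/arcstats
>>                                          # current ARC size vs. its cap, in bytes
>>             free -m && swapon -s         # memory pressure / swap in use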
>>
>>         Gregor
>>
>>
>>
>>
>
>
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at list.zfsonlinux.org
> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss
