zpool scrub speed

Jean-Michel Bruenn jean.bruenn at ip-minds.de
Sat May 7 18:49:35 EDT 2011


Devsk,

I thought I had written this several times already, which is why I said
"once again". There's a similar bug with software RAID on Linux: as soon
as you're running RAID5 together with an X11 environment, the box will
lock up. The suggested fix is separating I/O from the desktop environment
(using cgroups) - I never got that working correctly, though. For me,
switching the I/O scheduler from CFQ to Deadline helped a lot (a quick
sketch of how to do that is below). I've seen the same behaviour with ZFS
several times; now I'm running ZFS on a spare box with no desktop
environment and everything is fine.
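
For reference, here's a minimal sketch of that scheduler switch (sda is
just an example device name; repeat for each disk backing the pool):

  # show the available schedulers for a disk; the active one is in brackets
  cat /sys/block/sda/queue/scheduler
  # switch this queue to deadline at runtime (does not survive a reboot)
  echo deadline > /sys/block/sda/queue/scheduler
  # or boot with elevator=deadline on the kernel command line to make
  # deadline the default for every disk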

So this bug is related to CFQ, I/O scheduling and the desktop environment.

I asked a kernel dev about this some time ago (January); let me quote him:

"scheduler issue, it's known and kills desktop performance

enabling cgroups helps, see the various cgroups issues around. in
short: segmenting workloads helps (as in, separating user desktop stuff
from IO stuff and allowing them different scheduler resources)."
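
In case it's useful, here's a rough sketch of what that segmentation can
look like with the blkio cgroup controller. This is my own example, not
something the dev gave me; the mount point, the group name "slowio" and
the weight value are placeholders, and the blkio proportional weight only
takes effect while CFQ is the scheduler:

  # mount the blkio controller (if the distro hasn't done so already)
  mkdir -p /sys/fs/cgroup/blkio
  mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
  # create a group for the heavy I/O work and give it a low weight
  # (valid range 100-1000, default 500)
  mkdir /sys/fs/cgroup/blkio/slowio
  echo 100 > /sys/fs/cgroup/blkio/slowio/blkio.weight
  # move the I/O-heavy process into the group (one PID per write),
  # e.g. the zfs-fuse daemon
  echo "$(pidof zfs-fuse)" > /sys/fs/cgroup/blkio/slowio/tasks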

So, I hope this is of some help to you.

Jean


On Sat, 7 May 2011 15:29:38 -0700 (PDT) devsk
<devsku at gmail.com> wrote:

> Jean,
> 
> I don't remember you asking me this, but yes, this is my desktop+server
> system, so X is running. But if you are trying to conclude that it's X
> that's killing my box, rest assured that my box is extremely stable
> with the Nvidia driver and Xorg version I run. I have never had
> lock-ups that I could attribute to X or the graphics driver (Nvidia
> do a decent job on Linux). And the scariest part is that SysRq did not
> work, which has never happened in any lock-up outside of this ZFS
> related one.
> 
> -devsk
> 
> On May 7, 3:17 pm, Jean-Michel Bruenn <jean.bru... at ip-minds.de> wrote:
> > May I ask (once again): are you using that ZFS together with a running
> > X11 environment?
> >
> > On Sat, 7 May 2011 15:09:54 -0700 (PDT) devsk
> > <dev... at gmail.com> wrote:
> > > Ok, the scrub actually killed my box. Basically, while it was 80%
> > > done, the machine locked up hard. I have no idea what happened. The
> > > machine had a lot of free memory and was mostly idle apart from the
> > > scrub. Nothing showed up in /var/log/messages after I booted back up.
> >
> > > I think we still have some work to do with respect to lock-ups, but
> > > without any info in /var/log/messages and no way to debug the system
> > > directly, I think we are pretty much dependent on the ZFS
> > > stress-testing suite, which we should make an effort to get going.
> >
> > > The good thing is that the scrub continued from where it left off,
> > > and it finished faster than before (86 mins vs. 73 mins (43 mins
> > > before the lockup, 30 mins after)). So it looks like scrub speed is
> > > also determined by how fragmented memory is, i.e. on a freshly booted
> > > box things are generally faster, and as time goes on and ZFS
> > > allocates and arc_reclaim repeatedly reclaims memory, things begin
> > > to slow down.
> >
> > > -devsk
> >
> > > On May 7, 8:28 am, devsk <dev... at gmail.com> wrote:
> > > > Just throwing my RAIDZ2 numbers in here. This is a zfs-fuse pool
> > > > which I upgraded to version 26 today. The numbers seem better with
> > > > version 26 compared to 23 (I recently compared version 23 numbers
> > > > between zfs-fuse and native ZFS).
> >
> > > > # zpool status -v
> > > >   pool: mydata
> > > >  state: ONLINE
> > > >    see: http://www.sun.com/msg/ZFS-8000-EY
> > > >  scan: scrub in progress since Sat May  7 08:07:12 2011
> > > >     182G scanned out of 1.42T at 365M/s, 0h59m to go
> > > >     0 repaired, 12.44% done
> >
> > > > I guess this will slow down as time passes; it has typically taken
> > > > about 3 hours in the past.
> >
> > > > -devsk
> >
> > > > On May 6, 8:29 am, "Jason J. W. Williams" <jasonjwwilli... at gmail.com>
> > > > wrote:
> >
> > > > > It depends on how fragmented the volume is and how much data is in the pool (it only scrubs the data in the pool, as opposed to scrubbing free space). I had to scrub an onv_131 system the other night... 22x 7200rpm disks with 247GB of data took about 19 minutes.
> >
> > > > > I've seen a 14-drive (7200rpm) pool with 1.8TB of data (RAID-Z2) take over a day to scrub due to failing disks. If you start to see lots of checksum errors on the drives, you'll see a long scrub.
> >
> > > > > -J
> >
> > > > > Sent via iPhone
> >
> > > > > On May 6, 2011, at 2:07, Gordan Bobic <gordan.bo... at gmail.com> wrote:
> >
> > > > > > What is the general performance expected from a zfs scrub? I would have expected it to be in the region of the combined read speed of the data-bearing disks (i.e. excluding the parity disks), but my 13-disk array only gets about 170MB/s on a scrub. Is this normal, i.e. explained by the seeking required?
> >
> > > > > > Gordan
> >
> > --
> > Jean-Michel Bruenn <jean.bru... at ip-minds.de>


-- 
Jean-Michel Bruenn <jean.bruenn at ip-minds.de>


