[zfs-discuss] Current state of the PPA builds? How safe? How fast?

Daniel Smedegaard Buus danielbuus at gmail.com
Wed Jul 27 04:52:42 EDT 2011

Thanks, Brian and Chris, I'll be reporting back with benchmark data shortly
(and/or later - I have a lunch date with my girlfriend in an hour).

Couple of questions to Steve:

On Wed, Jul 27, 2011 at 06:18, Steve Costaras <stevecs at chaven.com> wrote:

> I am also using the PPA release here with a decent sized deployment in a
> 'semi' production mode (lots of backups just to be safe).    About 4 million
> files of various sizes, over ~200TB of raw space (~128TB usable in the
> configuration I have).

Heh :D Biiiiig!!!

>     As Brian states, I don't see any corruption issues.   Stability of ZFS
> in itself is good, with some minor issues like TXG_SYNC really sucking up
> CPU/threads whenever a delete happens, which slows down the system for long
> periods (1-3 minutes), but it does recover.

Like every time a delete happens, or some corner case that happens once in a
while?

>   Also lack of direct subsystem reporting (have to check syslog/messages
> as opposed to zpool status for transient hard/soft errors).

Let me see if I understand you correctly: ZFS will not report errors at all
in zpool status? No "ONLINE/UNAVAIL" or chksum counts or known data errors
for files? What about triggering events on pool state changes? ATM I get
emails from ZFS-FUSE via postfix if my pool goes into a degraded state.
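
In the meantime I suppose I could fake those notifications with a cron job
around zpool status -x, which prints "all pools are healthy" when nothing is
wrong - a rough sketch (the address and subject are placeholders, obviously):

```shell
# Hypothetical cron check: mail the full status if any pool is unhealthy.
# `zpool status -x` prints "all pools are healthy" when all is well.
if ! zpool status -x | grep -q "all pools are healthy"; then
    zpool status | mail -s "zpool problem on $(hostname)" admin@example.com
fi
```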

>     Also with large memory systems (>~16GiB) you should manually limit your
> arc size; ZFS seems to take more by default than is healthy for
> itself or the system.   I would say the default should be to leave either
> 4GiB free or 1/8th of the system, whichever is less.

Hmm... Wondering if I'm underpowered for my 19 * 2TB RAIDZ3 pool here...
This is a 4GB system. It's just my system, though: it'll be serving apps on
the local machine, via Gigabit ether to a secondary desktop computer, and
wifi/100 Mbit ether to my girlfriend's Mac or my laptop. On the other hand,
until recently I had that pool up on a 2GB system... There's no
deduplication or other fancies in my setup - shouldn't 4GB be fine, you
think? I'll be snapshotting regularly, but only for potentially rolling back
a backup of another computer in case of disaster; I should always be
mounting the "current" FS state only...
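
If I do end up capping the ARC as you suggest, I gather the knob on the
Linux builds is the zfs_arc_max module parameter (value in bytes) -
something like this in /etc/modprobe.d/zfs.conf, where 1 GiB is just a
number I picked for my 4GB box, not a recommendation:

```shell
# /etc/modprobe.d/zfs.conf
# Cap the ARC at 1 GiB (value in bytes; 1073741824 = 2^30).
# The figure is a guess for a 4GB desktop - tune to leave room for your apps.
options zfs zfs_arc_max=1073741824
```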

> I have not tried taking snapshots to extremes; I hope someone here can
> expound on the implementation's usability in that regard.   As for speed, I
> find it pretty good; it is reaching upwards of 85% of what I can get with
> the same drives under Solaris 11.    Scrub performance (to me a big
> indicator when dealing with large amounts of space, to make sure it can be
> done in less than 1 day) is very good, around 3.5GB/s over 96 drives
> (2x X5680 CPUs).  Bottleneck here is the drives themselves (low-end SATA
> ST2000DL03's).  These are AF (4K-sector) drives; I have found that
> ashift=12 is the minimum for proper alignment, but ashift=13 seems to give
> a little more of a boost (+10% over ashift=12) at the cost of a minimum
> file size of 8KiB (so you can lose space if you have very small files).

Oookay, I think there's something I don't quite understand about this ashift
parameter. I had somehow understood it to be some kind of offset/alignment
value expressed as a number of (512-byte) blocks, so that 9*512 = 4608
bytes, i.e. an eighth of a 4k block "too far". Come to think of it, by that
logic ashift=12 would be half a 4k sector off... So ashift=8 or ashift=16
would be properly "aligned" to 4k blocks. Clearly I've completely missed the
point. Do you have links or more info on this ashift value, explained for
someone like me?
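
Actually, writing it out, the one reading that makes your numbers line up is
ashift being the base-2 exponent of the sector size rather than a block
count - quick sanity check:

```python
# If ashift is the base-2 log of the sector size, then sector = 2 ** ashift.
for ashift in (9, 12, 13):
    print(f"ashift={ashift} -> {2 ** ashift}-byte sectors")
# ashift=9  -> 512-byte sectors (classic drives)
# ashift=12 -> 4096-byte sectors (4K "Advanced Format")
# ashift=13 -> 8192-byte sectors (matches the 8KiB minimum you mention)
```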

> Memory is King; I would say that without any other features, having
> 1GiB/TiB is a baseline.   With compression/dedup you probably want
> 2GiB/TiB.

But is this true also for a system where "it's just me" reading and writing?
I would expect that to be more of a rule of thumb for real servers serving
many users simultaneously.
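
Running that baseline against my own pool, just to scare myself (rough
arithmetic, ignoring the TB-vs-TiB distinction):

```python
# 1 GiB of RAM per TiB of storage, per the rule of thumb quoted above.
drives, size_tb = 19, 2              # my 19 x 2TB RAIDZ3 pool
raw_tb = drives * size_tb
print(f"{raw_tb} TB raw -> ~{raw_tb} GiB RAM by the baseline rule")
# 38 TB raw -> ~38 GiB RAM by the baseline rule
```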

> Outside of ZFS itself, even though the 'labeling on the package' seems to
> indicate that it is a good match for low-end hardware
> (drives/sub-systems/interconnects), that's not really the case, or the
> definitions are not the same.    You have to be very conscious of the types
> of drives, how they're attached, etc., as any drive on the same
> backplane/controller can take down other drives on the same controller if
> you have a problem with it.   Degradation of performance can also happen
> across the entire pool if one device is behaving badly.