[zfs-discuss] Improving read IOPS on RAIDZ

Phil Harman phil.harman at gmail.com
Fri Apr 29 05:55:23 EDT 2016


ZFS came from a culture at Sun that stated: "correctness is a constraint; performance is a goal".

ZFS's number one feature is that it has checksums. The second is that ZFS is self-healing (if you allow it to be). Yes, there's a lot more (snapshots, clones, compression, hybrid pools, send/receive, etc.), but these were the primary design goals.
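
To make that concrete, here's a toy sketch of the idea in Python (purely illustrative, nothing like the real ZFS code): the checksum of every block is kept with its parent, so corruption is caught on read, and if a redundant copy exists the bad copy is rewritten from a good one.

    import hashlib

    def checksum(data):
        # ZFS uses fletcher/sha256 per block; sha256 is just illustrative here.
        return hashlib.sha256(bytes(data)).digest()

    def self_healing_read(copies, expected):
        # 'copies' stands in for the redundant copies of one block,
        # e.g. the two sides of a mirror.
        good = next((bytes(c) for c in copies if checksum(c) == expected), None)
        if good is None:
            raise IOError("unrecoverable: no copy matches its checksum")
        for c in copies:                    # heal any copy that was corrupted
            if checksum(c) != expected:
                c[:] = good
        return good

    block = b"some application data"
    cksum = checksum(block)                 # stored in the parent block pointer
    mirror = [bytearray(block), bytearray(block)]
    mirror[0][0] ^= 0xff                    # silently corrupt one side
    assert self_healing_read(mirror, cksum) == block
    assert bytes(mirror[0]) == block        # the bad side was repaired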

When it comes to RAIDZ, ZFS's combined volume manager / filesystem and copy-on-write architecture also neatly sidesteps the RAID5 "write hole" issue.
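
For anyone who hasn't met the write hole: RAID5 parity is the XOR of the data columns, and updating a stripe means at least two separate device writes (data plus parity). Lose power between them and the stripe is silently inconsistent, so a later reconstruction hands back garbage. A toy Python sketch of the failure (hypothetical 3-disk layout, not real md or ZFS code):

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    # A RAID5-style stripe: two data columns plus their XOR parity.
    d0, d1 = b"AAAA", b"BBBB"
    parity = xor(d0, d1)

    # Rewrite d0, then lose power *before* the parity write reaches the disk.
    d0 = b"CCCC"                        # the new data made it to disk
    # parity = xor(d0, d1)              # ...but this write never happened

    # Later the disk holding d1 dies and is "reconstructed" from stale parity:
    print(xor(d0, parity) == b"BBBB")   # False -- silent corruption

Because ZFS writes new data and parity to fresh locations and only flips the block pointers (ultimately the uberblock) once everything is safely on disk, a crash just leaves the old, consistent version of the stripe in place.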

But here's a "fact": quick, safe, cheap - pick any two.

If you want lots of quick, safe 4K random IO, you should probably choose 3-way mirrors, though multiple small RAIDZn vdevs and/or L2ARC and/or ZIL and/or an all-SSD pool might be suitable alternatives.
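
To put rough numbers on that (back-of-envelope only; the ~150 random-read IOPS per 7200rpm spindle is an assumed rule of thumb, not a measurement): for small random reads a RAIDZ vdev behaves roughly like one disk, because every read pulls in and checksums a whole record, whereas with mirrors every spindle can be serving a different read.

    # Hypothetical 12-disk pool, ~150 random-read IOPS per spindle (assumed).
    DISKS, IOPS_PER_DISK = 12, 150

    # One wide 12-disk RAIDZ2 vdev: a small read touches every data disk in
    # the vdev, so the vdev delivers roughly one disk's worth of read IOPS.
    one_raidz2 = 1 * IOPS_PER_DISK                 # ~150

    # Two 6-disk RAIDZ2 vdevs: roughly two disks' worth.
    two_raidz2 = 2 * IOPS_PER_DISK                 # ~300

    # Four 3-way mirrors: every spindle can serve an independent read.
    mirrors = DISKS * IOPS_PER_DISK                # ~1800

    print(one_raidz2, two_raidz2, mirrors)

Streaming and write workloads follow different arithmetic, which is why the two posts referenced below are worth reading in full.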

As cost is obviously more important to you than correctness, perhaps you should just layer ZFS over a RAID controller or metadisk driver? At least then ZFS can still tell you when your data has been corrupted (even if you don't want it to fix it for you).


> On 29 Apr 2016, at 10:10, Hans Henrik Happe via zfs-discuss <zfs-discuss at list.zfsonlinux.org> wrote:
> 
> On 29-04-2016 04:52, Edward Ned Harvey (zfsonlinux) wrote:
>>> From: zfs-discuss [mailto:zfs-discuss-bounces at list.zfsonlinux.org] On Behalf
>>> Of Hans Henrik Happe via zfs-discuss
>>> 
>>> The intention of my question was to find out if ZFS could be changed to
>>> only access one disk when doing small reads on RAIDZ.
>> 
>> If my understanding is correct that raidz distributes data more like raid-1e, then you don't have to change anything to get the desired behavior. It is already that way by default.
> 
> It is a bit like RAID1E, but even more complicated. Read [1]. The layout is not the problem. It's that the checksum (not parity) is per block (stripe). Reading data that sits on only one disk still results in reading from all data disks for that block, in order to verify the checksum.
> 
>> Do you have any reason to believe it's not already that way?
> 
> Again, read [1] and [2].
> 
>>> It's a fact that, due to the RAIDZ design, ZFS cannot perform as well as
>>> regular RAID5/6, where reads basically work like RAID0. This can be a
>> 
>> I mean... You say "it's a fact," but the thing you're asserting to be fact completely disagrees with everything I know and believe. I'm still not seeing any reason to believe your presumption is correct.
> 
> I'm sorry that my facts, which are not presumptions, do not match your beliefs. Where are your references?
> 
> Cheers,
> Hans Henrik
> 
> [1] http://blog.delphix.com/matt/2014/06/06/zfs-raidz-stripe-width/
> [2] https://blogs.oracle.com/roch/entry/when_to_and_not_to
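
To illustrate Hans Henrik's point about per-record checksums with a toy model (not the real RAIDZ map code, which interleaves columns sector by sector and stores parity as well): the checksum covers the whole record, so even a 4K read has to fetch that record's column from every data disk in the vdev before it can be verified.

    import hashlib

    RECORDSIZE = 128 * 1024                    # one ZFS record (128K default)

    def read_4k(disks, offset, record_cksums):
        # Toy layout: each record split evenly across the data disks
        # (parity columns omitted), with one checksum stored per record.
        rec = offset // RECORDSIZE
        col = RECORDSIZE // len(disks)
        # The checksum is over the whole record, so every data column is
        # needed even though the caller only asked for 4K of it.
        record = b"".join(d[rec * col:(rec + 1) * col] for d in disks)
        assert hashlib.sha256(record).digest() == record_cksums[rec]
        start = offset % RECORDSIZE
        return record[start:start + 4096], len(disks)   # data, disks touched

    data = bytes(range(256)) * 512             # one 128K record
    disks = [data[i * 32768:(i + 1) * 32768] for i in range(4)]
    cksums = [hashlib.sha256(data).digest()]
    chunk, touched = read_4k(disks, 8192, cksums)
    print(touched)                             # 4 -- every data disk was read

With a mirror the same 4K comes off a single disk, which is essentially the IOPS argument made in [1] and [2].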

