[zfs-discuss] ZFS, DRBD, GFS and Clustering

Gordan Bobic gordan.bobic at gmail.com
Wed Oct 16 07:37:24 EDT 2013

On Wed, Oct 16, 2013 at 12:10 PM, Uncle Stoat <stoatwblr at gmail.com> wrote:

> On 16/10/13 10:13, Micky wrote:
>> GFS is indeed very slow, mainly due to its FUSE nature.
> Untrue.
> GFS on a single node with cluster locking disabled (lockproto=lock_nolock) is
> blisteringly fast, even with multiple TB FSes or tens of millions of files
> onboard.

But at that point it's not a cluster FS but a local FS. :)

> The slowness comes from the clustered part of operation, in particular the
> need for all nodes to agree if any given node wants to set a write lock,
> resulting in a 75-200ms penalty for each file open (even when reading, with
> atime disabled).  A particular pain point is metadata related -
> directories containing more than 512 entries become exponentially slower to
> open as you pass each power of 2 in entry numbers. (opening a directory
> with 16385+ entries can take several minutes.)

Yup. A while back I had to deal with the fallout of somebody having tried
to use GFS for maildir storage on a large system behind load balancers. The
part that killed it was round-robin load balancing between the nodes.
Source IP hash based load balancing cured the problem (at least for users
that weren't behind multi-homed transparent proxies, e.g. AOL at the time).
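
To illustrate the difference between the two balancing schemes, here is a
minimal Python sketch. It is not how any particular load balancer implements
it; the node names and both function names are made up for the example. The
point is only that round-robin sends successive requests from one client to
different nodes (so the same maildir gets locked from several nodes), while a
source IP hash pins a client to one node.

```python
import hashlib
import ipaddress
from itertools import cycle

# Hypothetical backend names, purely for illustration.
NODES = ["node-a", "node-b", "node-c"]


def pick_node_by_source_ip(client_ip: str, nodes=NODES) -> str:
    """Map a client IP to a fixed node, so repeat requests from the
    same address always land on the same cluster member."""
    packed = ipaddress.ip_address(client_ip).packed
    digest = hashlib.sha256(packed).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def round_robin(nodes=NODES):
    """Hand out nodes in rotation, ignoring who the client is."""
    return cycle(nodes)


if __name__ == "__main__":
    rr = round_robin()
    client = "192.0.2.10"  # documentation-range address
    for _ in range(4):
        # Left column changes on every request, right column never does.
        print(next(rr), pick_node_by_source_ip(client))
```

A client behind a multi-homed proxy shows up with varying source addresses,
so the hash no longer pins it to one node, which is the exception noted above.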

>> Since I will be
>> running the nodes on Xen hypervisor in active/active mode, the GFS
>> partition will be very small and will only be used for sharing VM
>> configuration files and/or provisioning ISO images.
> This (IMO) is the best use for GFS. It's not intended for general or large
> scale file handling.

It's actually not too bad for general purpose FS tasks, as long as you
ensure you minimize as much as possible the situations where different
nodes access the same directory subtrees. That way lies lock-bouncing.

> I've looked into using ZFS as a replacement for GFS in cluster
> environments and decided that due to a lack of absolute pool locking, it's
> simply not safe enough to entrust 2-400Tb of data to. All the data
> integrity in the world is no good if several cluster nodes attempt to mount
> a pool simultaneously - and restoring that amount of data from backups
> takes _weeks_

Backups are an increasingly contentious issue. Frequently the data churn
rate is such (I have witnessed OLTP rates well in excess of a petabyte/month)
that full backups simply aren't plausible to have, from the point of view
of storage space, restore time, and server capacity. It becomes an
intractable problem, at least without an orders-of-magnitude increase in
hardware. I'm not condoning the notion of running anything without backups,
but sometimes it just isn't possible between the requirements
and the constraints. :(
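
For a rough sense of scale, here is a back-of-the-envelope calculation in
Python. The petabyte/month churn and the 400 TB pool size come from the
thread; the 300 MB/s restore throughput is purely an assumed figure for
illustration, as are the function names, and decimal units (1 TB = 10^12
bytes) are assumed throughout.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # decimal terabyte, an assumption
PB = 10**15  # decimal petabyte, an assumption


def sustained_rate_mb_s(total_bytes: float, days: float) -> float:
    """Average throughput in MB/s needed to move total_bytes in `days`."""
    return total_bytes / (days * SECONDS_PER_DAY) / 1e6


def days_to_transfer(total_bytes: float, rate_mb_s: float) -> float:
    """Days needed to move total_bytes at a sustained rate in MB/s."""
    return total_bytes / (rate_mb_s * 1e6) / SECONDS_PER_DAY


if __name__ == "__main__":
    # Keeping up with 1 PB of churn over a 30-day month.
    print(f"{sustained_rate_mb_s(PB, 30):.1f} MB/s sustained")
    # Restoring a 400 TB pool at an assumed 300 MB/s.
    print(f"{days_to_transfer(400 * TB, 300):.1f} days to restore")
```

That works out to roughly 386 MB/s sustained, around the clock, just to keep
pace with a petabyte/month, and a bit over two weeks to restore 400 TB at the
assumed rate.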
