[zfs-discuss] ZFS in the Cloud on the Cheap - ZFS on top of Object Storage

Fajar A. Nugraha list at fajar.net
Tue Dec 27 09:45:13 EST 2016

On Tue, Dec 27, 2016 at 8:17 PM, Gordan Bobic via zfs-discuss <
zfs-discuss at list.zfsonlinux.org> wrote:

> I have just had what is arguably a _crazy_ idea, but I think it makes
> sense on some levels, so please do bear with me.
> Cloud ZFS storage already exists, e.g. from rsync.net, but it is
> prohibitively expensive at between $40 and $80 per TB per month, compared
> to all-you-can-eat Amazon Cloud Drive for £55/year.

AWS has more options, ranging from about $25/TB/month for EBS Cold HDD to
about $100/TB/month for General Purpose SSD. Combine that with a t2.micro on
the free tier (1GB of memory, enough for limited zfs use) or with spot
instances, and it should be significantly cheaper than rsync.net for certain
use cases.

> So what I have been thinking is how ZFS could be implemented on top of
> cloud-based object storage: one block per file, WORM per file, no edits, no
> appends, any edit means an entire file's worth of RMW (deeply impractical
> for, e.g. VMs if for a change to a single 4KB block you have to download
> 10GB, edit 4KB, and upload 10GB).

https://en.wikipedia.org/wiki/GmailFS comes to mind :D
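
To put the quoted read-modify-write concern in numbers, here is a rough
sketch of the transfer cost of changing 4KB inside an immutable object. The
function name is made up for illustration, and the 128KiB figure is only an
assumption (the default ZFS recordsize) standing in for "one object per ZFS
block"; it ignores request overhead and latency entirely.

```python
KIB = 1024
GIB = 1024 ** 3


def rmw_transfer_bytes(object_size: int, edit_size: int) -> int:
    """Bytes moved to change edit_size bytes inside one write-once object.

    The object cannot be edited in place, so the whole thing is downloaded
    and then uploaded again, regardless of how small the edit is.
    """
    if edit_size > object_size:
        raise ValueError("edit cannot be larger than the object")
    return 2 * object_size


edit = 4 * KIB

# One 10GB file stored as a single object (the VM image case in the quote).
whole_file = rmw_transfer_bytes(10 * GIB, edit)

# One object per ZFS block, assuming a 128KiB block (illustrative value).
per_block = rmw_transfer_bytes(128 * KIB, edit)

print("whole file:", whole_file // edit, "bytes moved per byte changed")
print("per block: ", per_block // edit, "bytes moved per byte changed")
```

Under those assumptions the per-block layout moves 64 bytes per byte
changed, against roughly 5 million for the single 10GB object.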

> So what I'm toying with is something borderline insane here:
> Use object storage file per ZFS block, which should allow for reasonably
> efficient I/O while minimizing the RMW overheads.
> In the ideal world, there would be a way to make ZFS use the object
> storage API directly and avoid the overheads on that. This isn't trivial
> because this would inevitably rely on a userspace library (I might have to
> ponder on whether this might be meaningfully doable in zfs-fuse, but I
> expect it would be a mammoth task since it probably never occurred to
> anyone to use ZFS on top of such a storage stack).
> So, I'm thinking in the direction that is usually precisely the wrong
> thing to do with ZFS, specifically running it on top of RAID, in this case
> the Linux MD RAID variety.

I'm not familiar enough with Amazon Cloud Drive to comment on how it would
interact with MD/zfs, but aside from the obvious possible terms-of-use
violation, I'd like to share my experience using S3 as an HDFS replacement
for hadoop, which might be relevant in some areas.

S3 is an interesting candidate for hdfs because:
- it offloads storage processing power (usually you'd need to dedicate
significant CPU and memory for namenode and datanodes)
- no need to pre-assign disk space (which might be wasted, like in the case
of EBS)
- support for S3 in vanilla hadoop has significantly improved. It's
good-enough that you can use S3 as the default fs (using s3a scheme) for
some uses (e.g. hive/spark + yarn + orc/parquet), completely replacing hdfs
- it's significantly cheaper compared to EBS, when you consider that S3
already includes redundancy

It all looks good in theory and in initial testing (e.g. processing 1GB of
data), until you feed it a bigger data set (e.g. 100GB). Then you begin to
see problems:
- writes to S3 are REALLY slow. More than 10x slower compared to EBS.
- the generic hive/spark insert pattern creates a temporary directory and
then renames it. S3 doesn't support rename, so it has to copy every object
to the new name and delete the original, which slows things down even more
- eventual consistency for overwrites
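
The rename point above can be sketched with a toy in-memory store. This is
not the s3a code or any real S3 client; the class and function names are
made up for illustration. It only shows why renaming a "directory" on a flat
key/value store costs one copy plus one delete per object, instead of a
single metadata update.

```python
class FlatObjectStore:
    """Toy key -> bytes store with no directories and no rename call."""

    def __init__(self):
        self.objects = {}
        self.bytes_copied = 0  # counts data rewritten by copy()

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dst):
        data = self.objects[src]
        self.objects[dst] = data
        self.bytes_copied += len(data)

    def delete(self, key):
        del self.objects[key]


def rename_prefix(store, src_prefix, dst_prefix):
    """Emulate a directory rename: copy each object, then delete the original."""
    keys = [k for k in store.objects if k.startswith(src_prefix)]
    for key in keys:
        store.copy(key, dst_prefix + key[len(src_prefix):])
        store.delete(key)
    return len(keys)


store = FlatObjectStore()
store.put("tmp/job1/part-0", b"a" * 1000)
store.put("tmp/job1/part-1", b"b" * 2000)

moved = rename_prefix(store, "tmp/job1/", "table/")
print(moved, "objects moved,", store.bytes_copied, "bytes rewritten")
```

The cost of the "rename" grows with the amount of data under the prefix,
which is why it hurts more as the job output gets bigger.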

Is it usable for my use case? No.
Is it usable for SOME use cases? Definitely. Archiving old hdfs files/tables
comes to mind.

Assuming you can write some kind of low-level driver for
amazon-cloud-storage that would do for zfs what s3a does for hadoop, then my
GUESS is that the write slowness and eventual consistency would be the
deal-breakers for most use cases.

What MIGHT be possible in the short term is using it to store the result
of "zfs send" (either full or incremental), compressed and split into
multiple files (to keep their size manageable).
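
As a rough sketch of that idea, the following compresses an arbitrary byte
stream and cuts it into fixed-size parts that could each be uploaded as one
file. The function names and the part size are made up for illustration, and
zlib is just one possible compressor. Nothing here talks to zfs or to any
cloud API; the stream is a stand-in for the output of "zfs send".

```python
import io
import zlib
from typing import BinaryIO, Iterable, Iterator


def compress_and_split(stream: BinaryIO, part_size: int,
                       read_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the zlib-compressed stream as parts of at most part_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    compressor = zlib.compressobj()
    buf = bytearray()
    while True:
        data = stream.read(read_size)
        if not data:
            break
        buf += compressor.compress(data)
        # Emit every full part that has accumulated so far.
        while len(buf) >= part_size:
            yield bytes(buf[:part_size])
            del buf[:part_size]
    # Flush the compressor and emit whatever is left, possibly a short part.
    buf += compressor.flush()
    while buf:
        yield bytes(buf[:part_size])
        del buf[:part_size]


def join_and_decompress(parts: Iterable[bytes]) -> bytes:
    """Reassemble the parts in order and recover the original stream."""
    return zlib.decompress(b"".join(parts))


# Stand-in for a send stream: 1MiB of repetitive data.
payload = bytes(range(256)) * 4096
parts = list(compress_and_split(io.BytesIO(payload), part_size=256))
restored = join_and_decompress(parts)
print(len(parts), "parts,", "round trip ok" if restored == payload else "MISMATCH")
```

In real use the stream would be the stdout pipe of a "zfs send" process, and
the parts have to be restored in order before being piped to "zfs receive".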
