[zfs-devel] Efficient offsite snapshots of zfs filesystems/volumes

Struan Bartlett struan.bartlett at NewsNow.co.uk
Sun Oct 2 15:25:45 EDT 2016


Hi

Amazon EC2 EBS disks and Google Compute Engine disks both support
snapshots to the cloud, through a mechanism that stores only incremental
changes but always retains a sufficient set of changes to allow a disk
to be reconstructed from any snapshot. The underlying snapshot creation
logic must: identify unstored blocks; store them while keeping a record
of stored blocks and their digests; and create an index for the snapshot
that identifies how to reconstruct it from stored blocks. The underlying
snapshot deletion logic must: delete the snapshot index; then identify
and delete stored blocks that are no longer needed. The methodology is
likely to be similar (in principle) to that of Brad Fitzpatrick's
brackup tool (a forked version of which can be found here:
https://github.com/NewsNow/brackup-nn).
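
To make the creation and deletion logic above concrete, here is a
minimal sketch in Python. It is only an illustration of the general
idea, not a description of how EBS, Google or brackup actually
implement it. The BlockStore class, its method names and the in-memory
dictionaries are all hypothetical stand-ins for a real remote store, and
the fixed 4096-byte block size and SHA-256 digests are assumptions.

```python
import hashlib


class BlockStore:
    """Hypothetical in-memory stand-in for a remote, content-addressed store."""

    def __init__(self):
        self.blocks = {}   # digest -> block bytes (each block stored once)
        self.indexes = {}  # snapshot name -> ordered list of block digests

    def create_snapshot(self, name, data, block_size=4096):
        """Store any unstored blocks of `data` and record an index for it.

        Returns the number of blocks that actually had to be stored.
        """
        index = []
        stored = 0
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # identify unstored blocks
                self.blocks[digest] = block
                stored += 1
            index.append(digest)
        self.indexes[name] = index
        return stored

    def restore_snapshot(self, name):
        """Rebuild the full image directly from the snapshot's index."""
        return b"".join(self.blocks[d] for d in self.indexes[name])

    def delete_snapshot(self, name):
        """Delete the index, then any blocks no other snapshot refers to.

        Returns the number of blocks removed.
        """
        del self.indexes[name]
        live = {d for index in self.indexes.values() for d in index}
        dead = [d for d in self.blocks if d not in live]
        for digest in dead:
            del self.blocks[digest]
        return len(dead)
```

The point of the design is that every snapshot can be restored on its
own from its index, without replaying a chain of diffs, while blocks
shared between snapshots are stored only once.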

I've been considering how similar functionality could be provided for
zfs filesystems or volumes, which I imagine would be generally useful
for organisations requiring off-site backups of their snapshots, who do
not want to duplicate blocks in common between entire filesystem
snapshots (which would be costly in upload time and data storage) and
who do not want to store traditional daily full backups with incremental
snapshot diffs (from which it can be slow and onerous to reproduce a
complete filesystem in a disaster recovery scenario).

I would appreciate any feedback on whether this would be useful, and/or
suggestions or observations on how this could be done.

In the case of volumes, it would be simple to logically divide the zvol
into chunks and checksum each chunk to determine which chunks have
changed. I could write a process to do this, as it would require no
zfs-specific knowledge, but it would be costly in disk I/O. I imagine
zfs already knows which blocks of one zvol snapshot differ from those of
another, so if this data could be accessed the process could be made
highly efficient.
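
For illustration, the naive chunk-and-checksum approach for volumes
might look something like the Python sketch below. It assumes each zvol
snapshot can be read as an ordinary file or block device at some path
(how snapshots are exposed depends on platform and configuration), and
the function names, the 1 MiB chunk size and the use of SHA-256 are my
own assumptions. It reads every byte of the volume, which is exactly the
I/O cost I would like to avoid.

```python
import hashlib


def chunk_digests(path, chunk_size=1024 * 1024):
    """Return one SHA-256 hex digest per fixed-size chunk of the file at `path`.

    `path` is assumed to be a readable image or block device of a snapshot.
    """
    digests = []
    with open(path, "rb") as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:
                break
            digests.append(hashlib.sha256(chunk).hexdigest())
    return digests


def changed_chunks(old_digests, new_digests):
    """Return the indexes of chunks in the new snapshot that need storing.

    A chunk counts as changed if its digest differs from the chunk at the
    same index in the old snapshot, or if the old snapshot had no such chunk.
    """
    changed = [
        i for i, (old, new) in enumerate(zip(old_digests, new_digests))
        if old != new
    ]
    # Chunks beyond the end of the old snapshot (the volume grew) are all new.
    common = min(len(old_digests), len(new_digests))
    changed.extend(range(common, len(new_digests)))
    return changed
```

Only the chunks whose indexes are returned by changed_chunks() would
then need to be uploaded, with the full list of digests serving as the
snapshot's index.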

In the case of filesystems, I can imagine how a modified zfs send
process might identify the changed blocks between snapshots, and
potentially produce a snapshot index. But what I imagine would be ideal
is: (a) access to an internal zfs map or index of snapshots to data
blocks, which could be compared against the record of stored block
digests to identify unstored blocks; (b) use of the same map to generate
an index for the snapshot; (c) direct access to those data blocks so
they could be stored.

You may have guessed my knowledge of zfs internals is minimal, so
any/all pointers appreciated.

Kind regards

Struan

-- 

Struan Bartlett
NewsNow.co.uk

