[zfs-discuss] Tape Backup of ZVOLs
Edward Ned Harvey (zfsonlinux)
zfsonlinux at nedharvey.com
Sun Apr 1 10:15:02 EDT 2018
To back up a zvol, yes, it's valid to take a snapshot of the zvol, zfs send the snapshot, and record the zfs send stream to tape. If a single bit is bad inside the stream, the whole stream is invalid, but that's arguably desirable, because the bad bit could be in the guest OS kernel or some other catastrophic location. For people who wish to force-restore a zfs send stream despite corruption, I think a way to do that was created some years ago (but maybe not; I think I remember hearing about it, but I never used it).

You can efficiently send incrementals, with the aforementioned reliability caveats, and with the limitation that you can only restore whole VMs, not individual files.

The recommendation would be to occasionally (say once a month) do a full zfs send of the zvol, optionally with nightly incrementals of the zvol, and additionally run a nightly file-level backup agent inside the guest. The agent gives you file-level granularity, as well as a way to recover data if the zfs send stream is corrupt. Better yet, have people inside the guests store all their non-OS files on an NFS or Samba share outside the VM, or in a file sync utility, so you can benefit from filesystem snapshots on those volumes (and not care about backing up files inside the VM).
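The snapshot-then-send workflow described above could be scripted roughly as follows. This is only a sketch, not a tested backup tool: the dataset name `tank/vm1`, the snapshot names, and the tape device path `/dev/nst0` are all made up for illustration, and the helper function names are hypothetical. It relies only on `zfs snapshot`, `zfs send`, and `zfs send -i` for incrementals.

```python
import subprocess


def snapshot_cmd(zvol, snap):
    """Command line that snapshots the zvol, e.g. tank/vm1@2018-04-01."""
    return ["zfs", "snapshot", f"{zvol}@{snap}"]


def send_cmd(zvol, snap, base=None):
    """Command line for a full send, or an incremental send if base is given."""
    cmd = ["zfs", "send"]
    if base is not None:
        # -i sends only the changes between the base snapshot and snap
        cmd += ["-i", f"{zvol}@{base}"]
    cmd.append(f"{zvol}@{snap}")
    return cmd


def backup_to_device(zvol, snap, device, base=None, chunk=1024 * 1024):
    """Snapshot the zvol and copy the send stream to device (assumed path).

    Real tape drives often want a specific block size, and tools such as
    dd or mbuffer are commonly used for that. This sketch ignores it.
    """
    subprocess.run(snapshot_cmd(zvol, snap), check=True)
    with open(device, "wb") as out:
        proc = subprocess.Popen(send_cmd(zvol, snap, base),
                                stdout=subprocess.PIPE)
        while True:
            block = proc.stdout.read(chunk)
            if not block:
                break
            out.write(block)
        if proc.wait() != 0:
            raise RuntimeError("zfs send failed")
```

A monthly full backup would then be something like backup_to_device("tank/vm1", "2018-04-01", "/dev/nst0"), and a nightly incremental would pass base="2018-04-01" with a new snapshot name. Restoring means feeding the streams back to zfs receive in order, full first, then each incremental.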
A few times, FEC and similar encodings have been mentioned, but it's important to understand the implications. First of all, every storage medium, including disks and tapes, already uses something of that nature built into hardware. They can use FEC, ECC, and other error detection and correction techniques. (Exceptions are very cheap junky devices such as $1 USB fobs and SD cards.)

Adding a layer of FEC increases the size of the stored data; it's a form of redundancy without making a full second copy. It operates on probabilities and assumptions about the random distribution of errors. Sometimes it can't detect errors, and when it does detect them, sometimes it can't recover. But it's likely to recover from a single bit error. I don't have enough knowledge to say whether tapes usually fail with a single bit error, or a 1MB chunk, or a random distribution of errors...

By the time data has passed the hardware FEC, I don't know how often another layer of software FEC will be effective. For example, if the disk or tape fails the hardware FEC, the result is an unrecoverable (hard) IO error, and a layer of software FEC on top isn't going to help. If the hardware FEC detects an error but is able to correct it, it's reported as a soft error and you can see it in SMART. Software FEC can help with some cases that slip past the hardware, but it definitely can't help with a 1MB chunk gone bad, or an unrecoverable IO error.
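To make the "likely to recover from a single bit error, but not from more" point concrete, here is a toy Hamming(7,4) code. I picked it only because it is small enough to read; it is not what any disk, tape drive, or backup tool actually uses, and real FEC schemes are much stronger. It stores 4 data bits in 7 bits, corrects any single flipped bit, and silently returns wrong data when two bits are flipped.

```python
def encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming codeword (list of 0/1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]   # parity over positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # parity over positions 3, 6, 7
    p3 = d[1] ^ d[2] ^ d[3]   # parity over positions 5, 6, 7
    # Codeword positions 1..7: p1 p2 d0 p3 d1 d2 d3
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]


def decode(code):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # 0 means "looks clean", else 1-based position
    if pos:
        c[pos - 1] ^= 1          # flip the bit the syndrome points at
    data = [c[2], c[4], c[5], c[6]]
    return sum(bit << i for i, bit in enumerate(data))


def flip(code, *positions):
    """Return a copy of code with the given 0-based bit positions flipped."""
    c = list(code)
    for p in positions:
        c[p] ^= 1
    return c
```

With this code, decode(flip(encode(n), i)) gives back n for any single position i, but flipping any two positions makes decode "correct" a third bit and return a different value without any indication of failure. The cost is also visible: 7 bits stored for every 4 bits of data. A damaged 1MB chunk is far beyond what a code like this can repair.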