[zfs-discuss] Clarifying questions
lists at benjamindsmith.com
Tue Jan 7 14:07:35 EST 2014
On 01/07/2014 02:02 AM, Gordan Bobic wrote:
> 1a) If (1) is correct, we could in theory have a perpetual
> loop on the master creating snapshots, sending them, then
> destroying the snapshots before repeating, creating a
> near-real-time off-site copy? We're thinking disaster recovery
> You would be better off with lsyncd for continuous asynchronous replication.
My concerns with lsyncd are:
A) The long duration of the initial rsync before the "near real-time"
behavior kicks in. In the best of situations this already takes over a
day, *just to crawl through all the files*, so applying security updates
and rebooting means that async replication is effectively down for at
least 24 hours.
B) Simple resource exhaustion. Our first test of lsyncd in a very
limited-size scenario required large increases to kernel parameters. (I
forget which ones at the moment.)
I'm hoping ZFS + snapshots + send/receive would perform significantly
better, since my understanding is that ZFS would "already know" the
delta between a snapshot and the previous one.
> 1b) For the case where we are having to fail to the off site
> disaster recovery data set: although it's not openly said as such,
> it would seem that we would want to zfs rollback to the most
> recent snapshot sent to the DR ZFS server.
> You could roll back to any snapshot you sent to it. Or you could fail
> over to it and use the data set that is as current as was plausibly
> possible if you are using lsyncd.
Although you confirm that I could expect something like rsync's "send
only the changes" behavior from ZFS send/receive (via the "-i"
incremental option), you then immediately suggest an rsync-based tool
instead. Rsync is an extremely valuable tool, but it has the caveats
mentioned above (see lsyncd issue A) for routine offsite backups and/or
near-real-time replication.
Why do you prefer rsync here?
> 2) Is there any problem sending a filesystem from a larger pool to
> a smaller pool if the receiving pool has plenty of disk space in
> its pool?
> Not a problem, as long as what you are sending will fit.
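One way to check "will it fit" up front: `zfs send -n` (dry run) with `-P` prints a parseable estimate of the stream size, which can be compared against the destination's free space. A minimal sketch; the dataset and pool names are placeholders:

```shell
#!/bin/sh
# Sketch: verify the estimated send stream fits before sending into a
# smaller pool. Names below are placeholders, not from the thread.

fits() {
    # $1 = estimated stream size in bytes, $2 = bytes available
    [ "$1" -le "$2" ]
}

# In practice (not run here):
#   est=$(zfs send -nP tank/files@snap1 | awk '/^size/ {print $2}')
#   avail=$(zfs get -Hp -o value available backuppool)
#   fits "$est" "$avail" && zfs send tank/files@snap1 | ssh drhost zfs receive backuppool/files
```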
> 3) Scrubbing weekly is often enough? How do I get notified if it
> fails? I'm thinking of mdadm monitor for a RAID 1/5 array, where
> it sends an overly polite email message that the array is degraded...
> You write a monitoring script to do whatever you want based on zpool
> status output.
Can do; I was hoping there was something built in.
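A minimal sketch of such a monitoring script, in the spirit of mdadm's monitor mode. The pool name and mail address are placeholders, and the parsing relies only on the "state:" line that `zpool status` prints:

```shell
#!/bin/sh
# Hedged sketch of a cron-driven zpool health check. Placeholder pool
# name and address; wire up however your site sends mail.

pool_state() {
    # Read `zpool status` output on stdin and print the pool state
    awk -F': *' '/^ *state:/ {print $2; exit}'
}

# In cron (not run here):
#   state=$(zpool status tank | pool_state)
#   [ "$state" = "ONLINE" ] || \
#       zpool status tank | mail -s "zpool tank is $state" admin@example.com
```

Running `zpool scrub` from cron weekly and this check daily gives roughly the mdadm-monitor experience: silence while healthy, an overly polite email when degraded.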
> 5) Since fragmentation is an issue with COW filesystems, am I
> right in guessing that SSDs suffer much less in comparison to
> spinning rust in performance degradation as fragmentation
> increases? I don't see any way to "defrag" a ZFS pool / filesystem.
> Yes. For example, dedupe feature (which massively increases
> fragmentation) is only really usable on SSDs if you expect more than
> about 10MB/s/vdev (more realistically 2.5-3MB/s)
Dedupe seems like a massive pain in the neck. It would offer virtually
no benefit for our use case anyway, even if it worked perfectly.
> 6) After purchasing hardware, I've read that it's best to get a
> "power of two" + 1 for RAIDZ1 performance. In other words, 3, or 5
> drives. (I don't have 9 bays) I chose 4 because we have 8 bays in
> the 2U server purchased, and I want to be able to add capacity
> without downtime. In this case, add 4x $BIGGER TB drives, then
> replace the current 4 TB drives. Is this "sweet spot" for
> performance at multiples of 2 enough for me to change my plans?
> Depends on what you are storing. The smaller the transaction commit
> size (i.e. the block size that will be committed, between ashift and
> 128KB) the more space overheads and performance degradation you are
> likely to experience. With spinning rust, you probably need all the
> help you can get.
> With 4TB disks, 3+1 is, IMO, a woefully inadequate level of
> redundancy, to the point where statistically you are likely to lose
> some data during resilvering every time you have a disk failure. 2+2
> RAIDZ2 is a much saner option.
Our file stores are redundant at the application level. The ZFS server
in question would be implemented as a third store, augmenting two file
store servers currently running EXT4 on RAID, so if the ZFS pool
actually did fail during a resilver, the effect would be minimal.
I'm more interested in maintaining write performance today without
sabotaging future expandability. 2x 4 drives is easy; 2x 5 drives is
only doable with eSATA or something.
Following your lead, I could do RAIDZ2 with 6 drives total instead of 4,
but then to add another vdev of similar proportions I'd have to go with
an eSATA solution or something. Clumsy...
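For what it's worth, the "replace the current drives with $BIGGER ones" path mentioned earlier needs no extra bays: each disk in the vdev is swapped and resilvered in turn, and with autoexpand on, the vdev grows once the last one finishes. A print-only sketch with placeholder device names:

```shell
#!/bin/sh
# Sketch of growing a RAIDZ vdev in place by replacing each member
# disk with a larger one. Device names are placeholders; the function
# only prints the plan, so it is safe to run anywhere.

print_growth_plan() {
    pool=$1; shift
    # autoexpand lets the vdev use the new capacity once all disks are larger
    echo "zpool set autoexpand=on $pool"
    for pair in "$@"; do          # pairs like sda:sde (old:new)
        echo "zpool replace $pool ${pair%%:*} ${pair##*:}"
        echo "# wait for resilver to complete before the next swap"
    done
}

print_growth_plan tank sda:sde sdb:sdf sdc:sdg sdd:sdh
```

The obvious caveat is that each swap triggers a full resilver, which is exactly the window where the thinner RAIDZ1 redundancy is under stress.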
Looks like it's time to do some performance testing...