[zfs-discuss] L2ARC and SLOG on HW RAID with writeback cache

Edward Ned Harvey (zfsonlinux) zfsonlinux at nedharvey.com
Sun Apr 29 14:50:15 EDT 2018


> From: zfs-discuss <zfs-discuss-bounces at list.zfsonlinux.org> On Behalf 
> Of Gandalf Corvotempesta via zfs-discuss
> 
> This bitrot myth is totally nonsense today

I have seen both cases - I've seen environments like Gandalf describes, where bitrot simply never occurs, and I've seen environments like Gordon, Steve, Richard, and Durval describe, where it occurs. I've also seen environments where if it occurs, it could result in millions of dollars lost, and environments where if it occurs, nobody cares.

It certainly is related to the hardware, and to the price of the hardware, but price is not a reliable indicator. You can't blindly assume expensive SAS hardware won't do it, nor can you assume cheap SATA disks will. It partly comes down to manufacturer specifics: particular disk models, even particular factories. It also comes down to the climate in the datacenter, cable management within the system chassis (interference and cross-talk), and various other factors.

There's no absolute guarantee ("if you buy this type of hardware, you won't be affected"), so the easiest and cheapest thing to do is simply use filesystems that provide data integrity. Poof, problem solved.

To emphasize this point (you can't make assumptions based on the hardware), search for Intel errata. Even in ubiquitous, enterprise-standard hardware, errors occur, and flaws get designed in, not to mention manufacturing imperfections. I once had a CPU where one instruction (a single instruction, related to multitasking) was flawed. The CPU passed all diagnostics and could run the OS installer (which was single-threaded), but it still could not boot the OS: the system crashed every time it tried to start the first multitasked process. And I've seen other hardware that would do weird shit like this... but only sometimes. That's what we call "flaky" hardware. Enterprise or commodity, it can happen to any of it, though less often with enterprise gear. It's just a random probability distribution.

