Benjamin - scrub of death

This is based on my understanding.

The blog you linked to glosses over several fine points about ECC and ZFS.

1) Matthew Ahrens stated that ZFS is fine without ECC "if you enable the unsupported ZFS_DEBUG_MODIFY flag (zfs_flags=0x10). This will checksum the data while at rest in memory."

2) Without checksumming the data in memory, which is off by default and "unsupported", ZFS is free to write bad data out to the pool because it thinks the pool is wrong. ZFS was architected and designed on the assumption that memory will never be corrupted. The pseudo-axiom is "if you can't trust the memory, you can't trust anything".

3) Not using ECC is "fine" because ZFS will eventually detect the data is corrupt and you can "just restore from backup". I forget which co-founder said this, but you can feel the sarcastic warning.
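
To make points 1 and 2 concrete, here is a toy sketch in Python. It is not how ZFS is implemented. ToyPool, write_back and flip_bit are made-up names for illustration, and CRC32 only stands in for a real ZFS checksum. It just shows why a checksum taken at write time cannot catch a bit that flipped while the data sat in memory, and why a checksum kept over the in-memory copy can.

```python
import zlib


def checksum(data: bytes) -> int:
    # Assumption: CRC32 is a stand-in for a real ZFS checksum algorithm.
    return zlib.crc32(data)


def flip_bit(buf: bytearray, bit_index: int) -> None:
    # Simulate a single-bit memory error in a buffer.
    buf[bit_index // 8] ^= 1 << (bit_index % 8)


class ToyPool:
    """Hypothetical pool: each block is stored with a checksum computed at write time."""

    def __init__(self):
        self.blocks = {}

    def write(self, key: str, data: bytes) -> None:
        self.blocks[key] = (bytes(data), checksum(data))

    def read(self, key: str) -> bytes:
        data, stored = self.blocks[key]
        if checksum(data) != stored:
            raise IOError("checksum mismatch in pool")
        return data


def write_back(pool: ToyPool, key: str, buf: bytearray, in_memory_sum=None) -> None:
    # If a checksum was kept while the buffer was at rest in memory,
    # verify it before anything is written out to the pool.
    if in_memory_sum is not None and checksum(bytes(buf)) != in_memory_sum:
        raise RuntimeError("buffer changed while at rest in memory")
    pool.write(key, bytes(buf))


if __name__ == "__main__":
    pool = ToyPool()
    original = b"important data"
    pool.write("a", original)

    buf = bytearray(pool.read("a"))
    mem_sum = checksum(bytes(buf))   # checksum taken while the data is still good
    flip_bit(buf, 3)                 # memory error with no ECC to catch it

    # Unprotected path: the bad data gets a fresh, valid checksum.
    write_back(pool, "a", buf)
    print("pool read succeeds:", pool.read("a") != original)

    # Protected path: the in-memory checksum catches the flip before the write.
    try:
        write_back(pool, "b", buf, in_memory_sum=mem_sum)
    except RuntimeError as err:
        print("caught:", err)
```

In the unprotected path the pool happily verifies the corrupted block, since its checksum was computed after the flip. That is the sense in which the on-disk checksums cannot help once memory has lied.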

The article completely glosses over all of these warnings and just says "ZFS is fine without ECC" as a simplistic and dangerous summary. Yes, ZFS will eventually detect corruption caused by not having ECC, but there is a real chance of it destroying everything before this happens. Pray you have a backup, because unlike other file systems, ZFS is not meant to be recoverable once corrupted.

Again, based on my understanding.

Nearly every part of the system has ECC/FEC/Parity. Kind of odd that the most important part of the system, which also happens to be the most sensitive to corruption, has nothing. Nearly all data that gets to the CPU is via the memory, no matter where it comes from. It is the gatekeeper for all data moving in and out of the CPU, and having zero protection is just asking for trouble. I am ignoring certain aspects of DMA, but there is a lot of cross-device IO buffering in memory.