Excellent deep dive into the amplification tradeoffs. The size-tiered vs leveled comparison makes the RUM conjecture super tangible because seeing the order-of-magnitude differences in write amp really clarifies why systems like Cassandra and RocksDB make such different choices. I debugged a production issue once where understanding that leveled compaction was rewriting data ~30x helped us shift to a more suitable engine.
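For anyone wondering where a ~30x figure can come from: in leveled compaction each level is roughly T times larger than the one above (T = 10 by default in RocksDB), and a byte is rewritten about T times at each level it passes through, while size-tiered rewrites it roughly once per tier. A rough back-of-envelope sketch (the formulas are simplified upper-bound estimates, not exact):

```python
def leveled_write_amp(num_levels: int, size_ratio: int = 10) -> int:
    """Rough estimate for leveled compaction: one initial flush,
    then ~size_ratio rewrites at each level a byte passes through."""
    return 1 + num_levels * size_ratio

def size_tiered_write_amp(num_levels: int) -> int:
    """Rough estimate for size-tiered compaction: one initial flush,
    then ~one rewrite per merge into a larger tier."""
    return 1 + num_levels

# With 3 levels and the default size ratio of 10, leveled compaction
# rewrites data ~31x while size-tiered stays around ~4x.
print(leveled_write_amp(3))      # 31
print(size_tiered_write_amp(3))  # 4
```

That order-of-magnitude gap is exactly the write-amp difference the post illustrates, and why write-heavy deployments often prefer size-tiered despite its worse read and space amplification.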
I'm glad you found the post helpful! I've had my share of horror stories with Cassandra and RocksDB caused by improper compaction choices as well...
Read amplification and write amplification must also be measured in number of IO requests and not just in relative data volumes.
Thanks for the comment! I agree, but since those are "typically" correlated I papered over that to make the core concept simpler. I'll add a footnote to clarify.
Relative data volumes and relative numbers of IO requests are not correlated at all. Take a close look at how file systems work. Extending writes in sparse allocation file systems may result in multiple small IOs to space maps and to inodes for even one byte of data written.
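The divergence between the two metrics can be made concrete with a toy model (block size and block counts are hypothetical illustration, not measurements from any particular file system): a one-byte extending append can dirty the data block, a space-map (allocation bitmap) block, and the inode block, so the IO-request count and the byte-volume ratio tell very different stories.

```python
# Toy model: a 1-byte extending append to a sparse file touches
# filesystem metadata in addition to the data block itself.
logical_bytes = 1
block_size = 4096  # hypothetical 4 KiB blocks

# Each dirtied block becomes a separate small IO request.
blocks_written = ["data block", "space map block", "inode block"]

io_requests = len(blocks_written)
physical_bytes = io_requests * block_size

print(f"IO requests: {io_requests}")                       # 3
print(f"physical bytes written: {physical_bytes}")         # 12288
print(f"byte amplification: {physical_bytes / logical_bytes}")
```

On a seek-bound device those three small IOs can cost as much as one large sequential write, which is why amplification measured only in relative data volumes can badly misestimate the actual device load.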