5 Comments
The AI Architect's avatar

Excellent deep dive into the amplification tradeoffs. The size-tiered vs. leveled comparison makes the RUM conjecture super tangible: seeing the order-of-magnitude differences in write amplification really clarifies why systems like Cassandra and RocksDB make such different choices. I once debugged a production issue where understanding that leveled compaction was rewriting data ~30x helped us shift to a more suitable engine.

almog gavra's avatar

I'm glad you found the post helpful! I have plenty of horror stories of my own about Cassandra and RocksDB running with improper compaction choices...

Dan Koren's avatar

Read amplification and write amplification must also be measured in number of IO requests and not just in relative data volumes.

almog gavra's avatar

Thanks for the comment! I agree, but since those are "typically" correlated I papered over that to make the core concept simpler. I'll add a footnote to clarify.

Dan Koren's avatar

Relative data volumes and relative numbers of IO requests are not correlated at all. Take a close look at how file systems work. Extending writes in sparse allocation file systems may result in multiple small IOs to space maps and to inodes for even one byte of data written.