Cassandra denormalization datamodel


"Yes," for the most part. Query-based data modeling really is the best way to approach it.

  1. It is still a good idea, because the faster query times make it worth it. Yes, there's a little more housekeeping to do. I haven't had to execute hundreds of deletes across other column families, but occasionally there is some complicated clean-up. Then again, you shouldn't be doing a whole lot of deleting in Cassandra anyway (it's an anti-pattern).

  2. No. Client-side JOINs are just as bad as distributed JOINs. The whole idea is to create a table that returns the data for each specific query (denormalized and/or replicated), which negates the need for a JOIN at all. The exception is OLAP queries for analysis, where you can use a tool like Apache Spark to execute an ad-hoc, distributed JOIN. But that's definitely not something you'd want to do on a production system.

  3. A few articles I can recommend:
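
To make points 1 and 2 concrete, here is a minimal Python sketch of the table-per-query idea. It only builds CQL strings and does not talk to a cluster. The table names (`users_by_id`, `users_by_email`), the columns, and the helper functions are all hypothetical, chosen purely for illustration, and the `?` bind markers assume the statements would be prepared before execution.

```python
from typing import List, Tuple

# Hypothetical query tables, one per access pattern. Each holds the same
# columns; only the partition key differs.
QUERY_TABLES = {
    "users_by_id": "user_id",
    "users_by_email": "email",
}

Statement = Tuple[str, tuple]


def write_statements(user_id: str, email: str, name: str) -> List[Statement]:
    """One INSERT per query table, so every read is a single-table lookup."""
    cql = "INSERT INTO {table} (user_id, email, name) VALUES (?, ?, ?)"
    return [(cql.format(table=t), (user_id, email, name)) for t in QUERY_TABLES]


def delete_statements(user_id: str, email: str) -> List[Statement]:
    """The clean-up cost of denormalizing: one DELETE per query table."""
    keys = {"user_id": user_id, "email": email}
    return [
        (f"DELETE FROM {table} WHERE {key} = ?", (keys[key],))
        for table, key in QUERY_TABLES.items()
    ]


if __name__ == "__main__":
    for cql, params in write_statements("u1", "ann@example.com", "Ann"):
        print(cql, params)
```

Each read then hits exactly one table by its partition key, so no JOIN is needed on either side. The price is that every write (and every delete) fans out to all the query tables. In practice you would probably send those statements together, for example in a logged batch, so the copies do not drift apart.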


It is worth adding that Cassandra 3.0 introduced Materialized Views, which do this denormalization automatically, including the housekeeping needed to keep the data in sync. They are most likely not suitable for every situation, but they're worth a look.

Example from DataStax

Cassandra documentation
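
As a rough illustration of what a materialized view definition looks like, the Python sketch below builds a `CREATE MATERIALIZED VIEW` statement for a base table that needs a second lookup key. It is not taken from either link above. The helper name and the `users` / `users_by_email` tables are hypothetical, and it assumes the usual constraints: the view's primary key must include every primary key column of the base table, and each of those columns needs an `IS NOT NULL` filter.

```python
from typing import Sequence


def materialized_view_cql(view: str, base_table: str,
                          new_partition_key: str,
                          base_primary_key: Sequence[str]) -> str:
    """Build a CREATE MATERIALIZED VIEW statement keyed by a different column.

    The view's primary key starts with the new partition key and then repeats
    the base table's primary key columns; each gets an IS NOT NULL filter.
    """
    key_columns = [new_partition_key] + [
        c for c in base_primary_key if c != new_partition_key
    ]
    not_null = " AND ".join(f"{c} IS NOT NULL" for c in key_columns)
    return (
        f"CREATE MATERIALIZED VIEW {view} AS "
        f"SELECT * FROM {base_table} "
        f"WHERE {not_null} "
        f"PRIMARY KEY ({', '.join(key_columns)})"
    )


if __name__ == "__main__":
    # Hypothetical base table `users` with primary key (user_id).
    print(materialized_view_cql("users_by_email", "users", "email", ["user_id"]))
```

With a view like this, the application writes only to the base table and Cassandra maintains the second copy itself, in place of the manual fan-out of writes and deletes.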