Column oriented database vs row oriented database Column oriented database vs row oriented database database database

Column oriented database vs row oriented database


I am having trouble with your terminology. You are describing an EAV structure (standing for Entity-Attribute-Value).

Aside: A "column-oriented" database usually refers to a database that stores each column separately from others (when I learned about databases, this was called "vertical partitioning", but I don't think that caught on). Examples include Paracel and Vertica.

An entity-attribute-value database is storing each attribute for an entity as a separate row.

The first problem that you have with your particular structure is typing. Some of the attributes are strings and some are numbers. This becomes a management nightmare in an EAV world. Either you store everything as strings (losing the ability to type check values and to guarantee that arithmetic words) or you include multiple columns for different types with a type column (making queries much more complicated).

Similarly, constraints and foreign key references are much harder to implement. Also, because you are repeating the entity id and attribute id on each row, the data often takes up more space. NULL values are typically quite space efficient.

On the OLTP side, you have another problem. When you want to insert an entity, you typically want to insert a bunch of attributes as well. One insert has now turned into many inserts, and you'll want to start wrapping these in transactions, affecting performance.

Given all these shortcomings, you might think never use EAV models. There is a place for them. They are particularly useful when attributes are changing over time. Say, if you have an application where users can put in their own information with tags. In such cases, a hybrid approach is the best solution. Use a regular relational table with many columns for the common information. Use an EAV table for optional information for each entity.


Source: WIKI

  1. Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
  2. Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
  3. Row-oriented organizations are more efficient when many columns of a single row are required at the same time, and when row-size is relatively small, as the entire row can be retrieved with a single disk seek.
  4. Row-oriented organizations are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek.

In practice, row-oriented storage layouts are well-suited for OLTP-like workloads which are more heavily loaded with interactive transactions. Column-oriented storage layouts are well-suited for OLAP-like workloads (e.g., data warehouses) which typically involve a smaller number of highly complex queries over all data (possibly terabytes).


In addition to the problems Gordon Linoff mentions, EAV data models are also fiendishly hard to query - find all cars where the make is BMW and the months between 12 and 24 and the cost < 10000 becomes a huge jumble of nasty SQL, especially if you're doing string comparison on numbers...