In what way does denormalization improve database performance?


Denormalization is generally used to either:

  • Avoid a certain number of queries
  • Remove some joins

The basic idea of denormalization is that you add redundant data, or group some together, so that the data you need can be fetched more easily and at a lower cost -- which is better for performance.


A quick example?

  • Consider a "Posts" and a "Comments" table, for a blog
    • For each post, you'll have several rows in the "Comments" table
    • This means that to display a list of posts with the associated number of comments, you'd have to:
      • Run one query to list the posts
      • Run one query per post to count its comments (yes, these can be merged into a single query that returns the counts for all posts at once)
      • Which means several queries.
  • Now, if you add a "number of comments" field to the Posts table:
    • You only need one query to list the posts
    • There's no need to query the Comments table: the number of comments is already denormalized into the Posts table.
    • And one query that returns one extra field beats several queries (see the sketch just after this list).
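
Here is a minimal sketch of the two approaches, assuming hypothetical posts and comments tables where posts carries an id, a title, and a denormalized comment_count column:

    -- Normalized: derive the count with a join and an aggregation.
    SELECT p.id, p.title, COUNT(c.id) AS comment_count
    FROM posts p
    LEFT JOIN comments c ON c.post_id = p.id
    GROUP BY p.id, p.title;

    -- Denormalized: the count is already stored on each post.
    SELECT id, title, comment_count
    FROM posts;

The LEFT JOIN matters in the normalized version: an inner join would silently drop posts that have no comments yet.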

Now, there are some costs, yes:

  • First, this takes some space, both on disk and in memory, as you have redundant information:
    • The number of comments is stored in the Posts table
    • And that same number can be derived by counting rows in the Comments table
  • Second, each time someone adds or removes a comment, you have to:
    • Save or delete the comment, of course
    • But also update the corresponding counter in the Posts table (see the sketch just after this list).
    • Still, if your blog has far more people reading than writing comments, this is probably not so bad.
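
One way to keep that counter in sync -- still assuming the hypothetical schema above -- is to wrap both writes in a single transaction:

    -- Insert a comment and bump the denormalized counter atomically.
    BEGIN;
    INSERT INTO comments (post_id, author, body)
    VALUES (42, 'alice', 'Nice post!');
    UPDATE posts
    SET comment_count = comment_count + 1
    WHERE id = 42;
    COMMIT;

A trigger on the Comments table could enforce the same invariant automatically, at the cost of making the write path less explicit.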


Denormalization is a time-space trade-off. Normalized data takes less space, but may require joins to construct the desired result set, hence more time. Denormalized data is replicated in several places: it takes more space, but the desired view of the data is readily available.

There are other time-space optimizations, such as

  • denormalized views
  • precomputed columns

As with any such approach, this improves reads (because the data is readily available), but makes updates more costly (because you need to update the replicated or precomputed data).
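
As an illustrative sketch -- the table and column names are hypothetical, and the GENERATED ALWAYS syntax depends on the engine (PostgreSQL 12+ and MySQL support it) -- a precomputed column computes a value once at write time, and a view bakes in a common join, reusing the posts and comments tables from the earlier example:

    -- Precomputed column: line_total is computed and stored on write.
    CREATE TABLE order_lines (
        id         INTEGER PRIMARY KEY,
        quantity   INTEGER NOT NULL,
        unit_price NUMERIC(10,2) NOT NULL,
        line_total NUMERIC(12,2) GENERATED ALWAYS AS (quantity * unit_price) STORED
    );

    -- Denormalized view: readers get the joined shape without writing the join.
    CREATE VIEW posts_with_counts AS
    SELECT p.id, p.title, COUNT(c.id) AS comment_count
    FROM posts p
    LEFT JOIN comments c ON c.post_id = p.id
    GROUP BY p.id, p.title;

Note that a plain view only hides the join; a materialized view, where the database supports one, actually stores the result, which is the time-space trade-off described above.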


The word "denormalizing" leads to confusion of the design issues. Trying to get a high performance database by denormalizing is like trying to get to your destination by driving away from New York. It doesn't tell you which way to go.

What you need is a good design discipline, one that produces a simple and sound design, even if that design sometimes conflicts with the rules of normalization.

One such design discipline is star schema. In a star schema, a single fact table serves as the hub of a star of tables. The other tables are called dimension tables, and they are at the rim of the schema. The dimensions are connected to the fact table by relationships that look like the spokes of a wheel. Star schema is basically a way of projecting multidimensional design onto an SQL implementation.
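
As a hedged sketch (all table and column names here are hypothetical), a minimal star schema for sales could look like this:

    -- Dimension tables at the rim: descriptive attributes to slice by.
    CREATE TABLE date_dim    (date_id    INTEGER PRIMARY KEY, full_date DATE, month TEXT, year INTEGER);
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE store_dim   (store_id   INTEGER PRIMARY KEY, city TEXT, region TEXT);

    -- Fact table at the hub: one row per measured event, keyed by its dimensions.
    CREATE TABLE sales_fact (
        date_id    INTEGER NOT NULL REFERENCES date_dim(date_id),
        product_id INTEGER NOT NULL REFERENCES product_dim(product_id),
        store_id   INTEGER NOT NULL REFERENCES store_dim(store_id),
        units_sold INTEGER NOT NULL,
        revenue    NUMERIC(12,2) NOT NULL
    );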

Closely related to star schema is snowflake schema, which is a little more complicated.

If you have a good star schema, you will be able to get a huge variety of combinations of your data with no more than a three-way join, involving two dimensions and one fact table. Not only that, but many OLAP tools will be able to decipher your star design automatically, and give you point-and-click, drill-down, and graphical analysis access to your data with no further programming.
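
Against the hypothetical tables sketched above, a typical report stays within that bound -- two dimensions plus the fact table:

    -- Revenue by product category and region: a three-way join.
    SELECT p.category, s.region, SUM(f.revenue) AS total_revenue
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN store_dim   s ON s.store_id   = f.store_id
    GROUP BY p.category, s.region;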

Star schema design occasionally violates second and third normal forms, but it results in more speed and flexibility for reports and extracts. It's most often used in data warehouses, data marts, and reporting databases. You'll generally get much better results from star schema or some other retrieval-oriented design than from haphazard "denormalization".