Database Normalization Database Normalization database database

Database Normalization


Database Normalization is all about constructing relations (tables) that maintain certain functionaldependencies among the facts (columns) within the relation (table) and among the various relations (tables)making up the schema (database). Bit of a mouth-full, but that is what it is all about.

A Simple Guide to Five Normal Forms in Relational Database Theoryis the classic reference for normal forms. This paper defines in simple terms what the essence of each normal form isand its significance with respect to database table design. This is a very good "touch-stone" reference.

To answer your specific question properly requires additional information. Some critical questions you have to askare:

  • Is an Address a simple fact (e.g. blob of text) or a composite fact (e.g.composed of multiple attributes: Address line, City Name, Postal Code etc.)
  • What are the other "facts" relating to "Accommodation","Airport" and "Train Station"?
  • What sets of "facts" uniquely and minimally identify an "Airport", an "Accommodation"and a "Train Station" (these facts are typically called a key or candidate key)?
  • What functional dependencies exist among Address facts and the factscomposing each relations key?

All this to say, the answer to your question is not as straight forward as one might hope for!

Is there such a thing as "over normalization"? Maybe. This depends on whether thefunctional dependencies you have identified and used to build your tables areof significance to your application domain.

For example, suppose it was determined that an addresswas composed of multiple attributes; one of which is postal code. Technically a postalcode is a composite item too (at least Canadian Postal Codes are). Further normalizing yourdatabase to recognize these facts would probably be an over-normalization. This is becausethe components of a postal code are irrelevant to your application and therefore factoringthem into the database design would be an over-normalization.


For addresses, I would almost always create a separate address table. Not only for normalization but also for consistency in fields stored.

As for such a thing as over-normalization, absolutely there is! It's hard to give you guidance on what is and isn't over-normalization as I think it mostly comes from experience. However, follow the books on each level of normalization and then once it starts to get difficult to see where things are you've probably gone too far.

Look at all the sample/example databases you can as well. They will give you a good indication on when you should be splitting out data and when you shouldn't.

Also, be well aware of the type and amount of data you're storing, along with the speed of access, etc. A lot of modern web software is going fully de-normalized for many performance and scalability reason. It's worth looking into those for reason why and when you should and shouldn't de-normalize.


Would I have address columns in each table or an address table that is referenced by the other tables?

Can airports, train stations and accommodation each have a different address format?

A single ADDRESS table minimizes the work necessary dealing with addresses - suite, RR, postal/zip code, state/province...

Is there such a thing as over-normalization?

There are different levels of normalization. I've only encountered what I'd consider poor design rather than normalization.