What exactly is the problem with hierarchal and network models of databases?

database database-design relational-database non-relational-database

The answers so far cover a lot of the practical reasons why the network and hierarchical models were eventually displaced by the relational model (including SQL database systems). Codd's 1970 paper explains why a new model is needed, in detail. It's a great read. Indeed, before Codd, the term "data model" was practically unheard of. So he coined the terms "hierachical model" and "network model" in order to describe database systems that had been constructed with no precise model in mind.

The hierarchical and network models can be collected into a general term, called the "graph model". The essential feature of the graph model of data is that data items are referenced by stating their location. If you understand pointers, you understand everything fundamental about the graph model.

There are two very powerful advantages to the graph model of data. The first is that it's very easy for programmers to grasp. Novice programmers go through a certain learning curve coming to grips with pointers, but once they've done that, they are ready to understand graph data easily.

The second advantage is that pointers are extremely fast, provided that the navigation path to be followed was anticipated at the time the data was written.

There are several disadvantages to using pointers to identify data. One is that the data becomes "pinned". That is, when the data is to be shuffled all of the pointers that reference the data have to be located and updated. Or a "forwarding address" has to be left at the old location. If you've ever been in the web and clicked on a button that has always worked, only to be greeted with the infamous "page not found" error, you've probably come across the pitfall of shuffling pinned data without updating references to it.

A second one is that navigating data along unplanned access paths can be downright disastrous, both in terms of performance, and in terms of logical correctness. This is one of the reasons why ad hoc reporting is so difficult with graph databases.

A third drawback of graph data is that there may be logical relationships in the graph data that are not inherent in the data as given. The fundamental advantage of the relational model is that all the relationships are inherent in the data itself. The reason why this is an advantage is complex. I refer you again to the 1970 paper.

In all the "relational DBMSes" that you and I are likely to use, there is a bridge between using data to identify data and using pointers to locate data. It's called an index. The index relates two items: an index key (one or more columns from a table), and a pointer (that locates a row containing the index key). I'm glossing over all the details about indexes.

Anyway an index allows the SQL engine to translate a query that states what data is being sought into where to look for that data. Data that is pointed to by indexes can still be shuffled, but the index has to be rebuilt as part of the process.

This is an overview.

database database-design relational-database non-relational-database

The basic problem was the inability to support ad hoc queries. These databases were very fast, but only if you queried them in the ways their original designers expected you to. If you came up with another type of query, they could either be very slow or at worst require that the database schema be changed to support the query.

I actually worked on both kinds of these in the 80s (Codasyl and Nomad/2) and was very glad when SQL became more widely available.

database database-design relational-database non-relational-database

no easy way to generate reports
rigid schemas
the network model was too similar to the hierarchical, so it didn't really get the benefits from using a network model. instead of having a single parent like in the hierarchical model a record could have multiple parents. so everything is still thought of in terms of parent/child relationships.

What was good in these models was performance, and that's what I think made it take so long for RDBMS to become dominant (they performed really bad in the beginning).

If you want to dig deeper into history this interview with Charles Bachman is highly recommended reading! He's an interesting person as well, actually he coded the first automated data modeling tool for RDBMS!

BTW, hierarchical/network databases are still in use at least in mainframe settings.

CodeHunter

What exactly is the problem with hierarchal and network models of databases?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last