Why in the world would I have_many relationships? Why in the world would I have_many relationships? postgresql postgresql

Why in the world would I have_many relationships?


Embedding a data structure in a field can work for simple cases but it prevents you from taking advantage of relational databases. Relational databases are designed to find, update, delete and protect your data. With an embedded field containing its own wad-o-data (array, JSON, xml etc), you wind up writing all the code to do this yourself.

There are cases where the embedded field might be more suitable, but for this question as an example I will use a case that highlights the advantages of a related table approch.

Imagine a User and Post example for a blog.

For an embedded post solution, you would have a table something like this (psuedocode - these are probably not valid ddl):

create table Users {id int auto_increment,name varchar(200)post text[][],}

With related tables, you would do something like

create table Users {id int auto_increment,name varchar(200)}create table Posts {id auto_increment,user_id int,content text}

Object Relational Mapping (ORM) tools: With the embedded post, you will be writing the code manually to add posts to a user, navigate through existing posts, validate them, delete them etc. With the separate table design, you can leverage the ActiveRecord (or whatever object relational system you are using) tools for this which should keep your code much simpler.

Flexibility: Imagine you want to add a date field to the post. You can do it with an embedded field, but you will have to write code to parse your array, validate the fields, update the existing embedded posts etc. With the separate table, this is much simpler. In addition, lets say you want to add an Editor to your system who approves all the posts. With the relational example this is easy. As an example to find all posts edited by 'Bob' with ActiveRecord, you would just need:

Editor.where(name: 'Bob').posts

For the embedded side, you would have to write code to walk through every user in the database, parse every one of their posts and look for 'Bob' in the editor field.

Performance: Imagine that you have 10,000 users with an average of 100 posts each. Now you want to find all posts done on a certain date. With the embedded field, you must loop through every record, parse the entire array of all posts, extract the dates and check agains the one you want. This will chew up both cpu and disk i/0. For the database, you can easily index the date field and pull out the exact records you need without parsing every post from every user.

Standards: Using a vendor specific data structure means that moving your application to another database could be a pain. Postgres appears to have a rich set of data types, but they are not the same as MySQL, Oracle, SQL Server etc. If you stick with standard data types, you will have a much easier time swapping backends.

These are the main issues I see off the top. I have made this mistake and paid the price for it, so unless there is a super-compelling reason do do otherwise, I would use the separate table.


what if users John and Ann have the same thingies? the records will be duplicated and if you decide to change the name of thingie you will have to change two or more records. If thingie is stored in the separate table you have to change only one record. FYI https://en.wikipedia.org/wiki/Database_normalization


Benefits of one to many:

  1. Easier ORM (Object Relational Mapping) integration. You can use it either way, but you have to define your tables with native sql. Having distinct tables is easier and you can make use of auto-generated mappings.
  2. Your space limitation of 10,000 rows will go further with the one to many relationship in the case that 2 or more people can have the same "thingies."
  3. Handle users and thingies separately. In some cases, you might only care about people or thingies, not their relationship with each other. Some examples, updating a username or thingy description, getting a list of all thingies (or all users). Selecting from the single table can make it harding to work with.
  4. Maintenance and manipulation is easier. In the case that a user or a thingy is updated (name change, email address update, etc), you only need to update 1 record in their table instead of writing update statements "where user_id=?".
  5. Enforceable database constraints. What if a thingy is not owned by anyone? Is the user column now nillable? It would have to be in the single table case, so you could not enforce a simple "not nillable" username, for example.

There are a lot of reasons of course. If you are using a relational database, you should make use of the one to many by separating your objects (users and thingies) as separate tables. Considering your limitation on number of records and that the size of your dataset is small (under 10,000), you shouldn't feel the down side of normalized data.

The short truth is that there are benefits of both. You could, for example, get faster read times from the single table approach because you don't need complicated joins.

Here is a good reference with the pros/cons of both (normalized is the multiple table approach and denormalized is the single table approach).http://www.ovaistariq.net/199/databases-normalization-or-denormalization-which-is-the-better-technique/