Recommended SQL database design for tags or tagging [closed] Recommended SQL database design for tags or tagging [closed] sql sql

Recommended SQL database design for tags or tagging [closed]


Three tables (one for storing all items, one for all tags, and one for the relation between the two), properly indexed, with foreign keys set running on a proper database, should work well and scale properly.

Table: ItemColumns: ItemID, Title, ContentTable: TagColumns: TagID, TitleTable: ItemTagColumns: ItemID, TagID


Normally I would agree with Yaakov Ellis but in this special case there is another viable solution:

Use two tables:

Table: ItemColumns: ItemID, Title, ContentIndexes: ItemIDTable: TagColumns: ItemID, TitleIndexes: ItemId, Title

This has some major advantages:

First it makes development much simpler: in the three-table solution for insert and update of item you have to lookup the Tag table to see if there are already entries. Then you have to join them with new ones. This is no trivial task.

Then it makes queries simpler (and perhaps faster). There are three major database queries which you will do: Output all Tags for one Item, draw a Tag-Cloud and select all items for one Tag Title.

All Tags for one Item:

3-Table:

SELECT Tag.Title   FROM Tag   JOIN ItemTag ON Tag.TagID = ItemTag.TagID WHERE ItemTag.ItemID = :id

2-Table:

SELECT Tag.TitleFROM TagWHERE Tag.ItemID = :id

Tag-Cloud:

3-Table:

SELECT Tag.Title, count(*)  FROM Tag  JOIN ItemTag ON Tag.TagID = ItemTag.TagID GROUP BY Tag.Title

2-Table:

SELECT Tag.Title, count(*)  FROM Tag GROUP BY Tag.Title

Items for one Tag:

3-Table:

SELECT Item.*  FROM Item  JOIN ItemTag ON Item.ItemID = ItemTag.ItemID  JOIN Tag ON ItemTag.TagID = Tag.TagID WHERE Tag.Title = :title

2-Table:

SELECT Item.*  FROM Item  JOIN Tag ON Item.ItemID = Tag.ItemID WHERE Tag.Title = :title

But there are some drawbacks, too: It could take more space in the database (which could lead to more disk operations which is slower) and it's not normalized which could lead to inconsistencies.

The size argument is not that strong because the very nature of tags is that they are normally pretty small so the size increase is not a large one. One could argue that the query for the tag title is much faster in a small table which contains each tag only once and this certainly is true. But taking in regard the savings for not having to join and the fact that you can build a good index on them could easily compensate for this. This of course depends heavily on the size of the database you are using.

The inconsistency argument is a little moot too. Tags are free text fields and there is no expected operation like 'rename all tags "foo" to "bar"'.

So tldr: I would go for the two-table solution. (In fact I'm going to. I found this article to see if there are valid arguments against it.)


If you are using a database that supports map-reduce, like couchdb, storing tags in a plain text field or list field is indeed the best way. Example:

tagcloud: {  map: function(doc){     for(tag in doc.tags){       emit(doc.tags[tag],1)     }  }  reduce: function(keys,values){    return values.length  }}

Running this with group=true will group the results by tag name, and even return a count of the number of times that tag was encountered. It's very similar to counting the occurrences of a word in text.