Social web application database design: how can I improve this schema? Social web application database design: how can I improve this schema? database database

Social web application database design: how can I improve this schema?


In general, is there anything I'm doing wrong or can improve upon?

Overall, I don't see any big flaws in your current setup or schema.

What I'm wonderng is your split into 3 User* tables. I get what you want your intendtion was (having different user-related things seperate) but I don't know if I would go with the exact same thing. If you plan on displaying only data from the User table on the site, this is fine, since the other info is not needed multiple times on the same page but if users need to use their real name and display their real name (like John Doe instead of doe55) than this will slow down things when the data gets bigger since you may require joins. Having the Preferences seperate seems like a personal choice. I have no argument in favor of nor against it.

Your many-to-many tables would not need an addtional PK (e.g PostFavoriteID). A combined primary of both PostID and UserID would be enough since PostFavoriteID is never used anywhere else. This goes for all join tables

Is there any reason why I shouldn't combine the ExternalAccounts table into the UserProfiles table?

As withe the prev. answer, I don't see a advatanage or disadvantage. I may put both in the same table since the NULL (or maybe better -1) values would not bother me.

Is there any reason why I shouldn't combine the PostStats table into the Posts table?

I would put them into the same table using a trigger to handle the increment of the ViewCount table

Should I expand the design to include the features we are doing in the second version just to ensure that the initial schema can support it?

You are using a normalsied schema so any additions can be done at any time.

Is there anything I can do to optimize the DB design for Solr indexing/performance/whatever?

Can't tell you, haven't done it yet but I know that Solr is very powerfull and flexible so I think you should be doing fine.

Should I be using more natural primary keys, like Username instead of UserID, or zip/area code instead of a surrogate LocationID in the Locations table?

There are many threads here on SO discussing this. Personally, I like a surrogate key better (or another unique number key if available) since it makes queries more easier and faster since an int is looked up easier. If you allow a change of username/email/whatever-your-PK-is than there are massive updates required. With the surrogate key, you don't need to bother.

What I would also do is to add things like created_at, last_accessed at (best done via triggers or procedures IMO) to have some stats already available. This can realy give you valuable stats

Further strategies to increate the performance would be things like memcache, counter cache, partitioned tables,... Such things can be discussed when you are really overrun by users because there may be things/technologies/techniques/... that are very specific to your problem.


I'm not clear what's going on with your User* tables - they're set up as if they're 1:1 but the diagram reflects 1-to-many (the crow's foot symbol).

The ExternalAccounts and UserSettings could be normalised further (in which case they would then be 1-to-many!), which will give you a more maintainable design - you wouldn't need to add further columns to your schema for additional External Account or Notification Types (although this may be less scalable in terms of performance).

For example:

ExternalAccounts    UserId int,    AccountType varchar(45),      AccountIdentifier varchar(45)

will allow you to store LinkedIn, Google, etc. accounts in the same structure. Similarly, further Notification Types can be readily added using a structure like:

UserSettings    UserId int,      NotificationType varchar(45),      NotificationFlag ENUM('on','off')

hth