Nosql database design - MongoDB Nosql database design - MongoDB mongodb mongodb

Nosql database design - MongoDB


You can do the second design you suggested but it all comes down to how you want to use the data. I assume you’re going to be using it for a website.

If you want the comments to be clickable, in such that clicking on the topic name will redirect to the topic’s page or clicking the username will redirect to the user’s page where you can see all his comments, i suggest you keep them as IDs. Since you can later use .populate(“field1 field2”) and you can select the fields you would like to get from that ID.

Alternatively you can store both the topic_name and username and their IDs in the same document to reduce queries, but you would end up storing more redundant data.


Revised design:

The three queries (in the question post) are likely to be like this (pseudo-code):

  • select all topics from comments, where date is today, group by topic and count comments, order by count (desc)
  • select topics from comments, where topic matches search, group by topic.
  • select all from comments, where topic matches topic_param, order by comment_date (desc).

So, as you had intended (in your question post) it is likely there will be one main collection, comments.

comments:

  date  author  text  topic

The user and topic collections with one field each, are optional, to maintain uniqueness.

Note the group-by queries will be aggregation queries, for example, the main query will be like this:

db.comments.aggregate( [  { $match: { date: ISODate("2019-11-15") } },  { $group: { _id: "$topic", count: { $sum: 1 } } },  { $sort: { count: -1 } }] )

This will give you all the topics names, today and with highest counted topics first.


You could also take a bit different approach. Storing information redundant is not a bad thing in all cases.

1. Homepage query: Listing all the topics, which have received the most comments today (will run very often)

You could implement this as two extra fields in your Topic entity. One describing the last date a comment was added and the second to count the amount of comments added that day. By doing so you do not need to join but can write a query that only looks at the Topic collection.

You could also store these statistics independently of the other data and update it when required. Think of this as having a document that describes your database its current state (at least those parts relevant to you).

This might give you a time penalty on storing information but it improves reading times.

2. Auto suggestion list for search field: Listing all the topics, whose title contains string "X"

Far as I understand this one you only need the topic title. Meaning you can query the database once and retrieve all titles. If the collection grows so big this becomes slow you could trigger a refresh of the retrieval query that only returns a subset (a user is not likely to go through 100 possible topics).

3. Main page of a topic query: Listing all the comments of a topic, with their authors' username.

This is actually the tricky one. If this is really what it is you want to do then you are most likely best off storing all data in one document. However I would ask you: what is the problem making more than one query? I doubt you will be showing all comments at once when there are thousands (as you say). Instead of storing each in a separate document or throwing all in one document, you could also bucket them and retrieve only the 20 most recent ones (if you would create buckets of size 20). Read more about the bucket pattern here and update the ones shown when required.