
Pros and Cons of using MongoDB instead of MS SQL Server [closed]


I am a beginner with NoSQL databases myself, so I am answering this at the risk of downvotes, but it will be a great learning experience for me.

Before trying my best to answer your questions, I should say that if MS SQL Server is working well for you, you should stick with it. You have not mentioned any valid reason why you want to use MongoDB, other than having learned that it is a document-oriented database. Moreover, you seem to capture almost the same set of metadata for each camera, i.e. your schema is not dynamic.

  • to tell if MongoDB is good for holding such data, which eventually will be queried against time ranges (e.g. retrieve all images of a particular camera between a specified hour)? Any suggestions about Document Based schema design for my case?

Being a document-oriented database, MongoDB is good at querying within an aggregate (what you call a document). Since you are already storing each camera's data in its own table, in MongoDB you would create a separate collection for each camera. Date range queries use the $gte and $lt (or $lte) comparison operators on a date field.
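A minimal Python sketch of such a time-range query, assuming a per-camera collection whose documents store the capture time in a field named `taken_at` (a hypothetical name). It builds the filter for one hour of one day as a half-open interval; the result is a plain dict of the kind pymongo's `find()` accepts.

```python
from datetime import datetime, timedelta, timezone


def hour_range_filter(day_start: datetime, hour: int, field: str = "taken_at") -> dict:
    """Filter matching documents whose `field` lies in the given hour of the day.

    `day_start` is assumed to be midnight of the day of interest.
    `field` defaults to the hypothetical name "taken_at".
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    start = day_start + timedelta(hours=hour)
    end = start + timedelta(hours=1)
    # $gte/$lt form a half-open interval [start, end), so an image stamped
    # exactly on the hour boundary belongs to one hour only.
    return {field: {"$gte": start, "$lt": end}}


if __name__ == "__main__":
    day = datetime(2013, 5, 1, tzinfo=timezone.utc)
    # Images taken between 14:00 and 15:00 UTC on 1 May 2013.
    print(hour_range_filter(day, 14))
```

With pymongo this would be used roughly as `db["camera_42"].find(hour_range_filter(day, 14))`, where `camera_42` is a hypothetical per-camera collection name.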

  • What should be the specs of server (CPU, RAM, Disk)? any suggestion?

NoSQL databases are generally built to scale out on commodity hardware. But from the way you have asked the question, you might be thinking of improving performance by scaling up. You can start with a reasonable machine and, as the load increases, keep adding more servers (scaling out). You don't need to plan for and buy a high-end server.

  • Should I consider Sharding/Replication for this scenario (while considering the performance in writing to synch replica sets)?

MongoDB (at the time of writing) locks the entire database for a single write, though the lock yields to other operations, and it is meant for systems that have more reads than writes. So this depends on how your system behaves. There are multiple ways to shard, and the choice should be domain specific, so a generic answer is not possible. Some examples, however, are sharding by geography or by branch.

Also read "A Plain English Introduction to CAP Theorem".

Update: answer to the comment on sharding

According to the MongoDB documentation, you should consider deploying a sharded cluster if:

  • your data set approaches or exceeds the storage capacity of a single node in your system.
  • the size of your system’s active working set will soon exceed the capacity of the maximum amount of RAM for your system.
  • your system has a large amount of write activity, a single MongoDB instance cannot write data fast enough to meet demand, and all other approaches have not reduced contention.

So, based on the last point, yes. The auto-sharding feature is built to scale writes; in that case you have a write lock per shard, not per database. But mine is a theoretical answer, so I suggest consulting the 10gen.com group.


to tell if MongoDB is good for holding such data, which eventually will be queried against time ranges (e.g. retrieve all images of a particular camera between a specified hour)?

This question is too subjective for me to answer. From personal experience with numerous SQL solutions (ironically not MS SQL), I would say they are equally good, if done right.

Also:

What should be the specs of server (CPU, RAM, Disk)? any suggestion?

It depends on too many variables that only you know; however, a small cluster of commodity hardware works quite well. I cannot really give a factual answer to this question; it will come down to your own testing.

As for the schema, I would go for documents with the following structure:

```javascript
{
    _id: {},
    camera_name: "my awesome camera",
    images: [
        {
            url: "http://I_like_S3_here.amazons3.com/my_image.png",
            // All your other fields per image
        }
    ]
}
```

This should be quite easy to maintain and update, as long as you do not embed much deeper, since that could become a bit of a pain; however, that depends on your queries.
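To illustrate the "easy to update" point, here is a minimal sketch of appending one image to a camera document with the `$push` update operator. The filter and update are plain dicts; the collection name `cameras` mentioned below is a hypothetical choice.

```python
def camera_selector(camera_name: str) -> dict:
    """Filter that selects one camera document by its `camera_name`."""
    return {"camera_name": camera_name}


def add_image_update(image: dict) -> dict:
    """Update document appending one image sub-document to `images`.

    $push appends a single element to an array field and creates the
    array if the document does not have one yet.
    """
    return {"$push": {"images": image}}


if __name__ == "__main__":
    image = {"url": "http://I_like_S3_here.amazons3.com/my_image.png"}
    print(camera_selector("my awesome camera"), add_image_update(image))
```

With pymongo this would be something like `db.cameras.update_one(camera_selector("my awesome camera"), add_image_update(image))`. Note that every push grows the camera document, and MongoDB caps a single document at 16 MB, so the number of images per camera matters for this design.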

Not only that, but this should be good for sharding, since all the data you need is in one document. If you were to shard on _id, you could probably get a near-perfect setup here.

Should I consider Sharding/Replication for this scenario (while considering the performance in writing to synch replica sets)?

Possibly. Many people assume they need to shard when in reality they just need to be more intelligent about how they design the database. MongoDB is very free-form, so there are a lot of ways to do it wrong, but there are also a lot of ways to do it right. I personally would keep sharding in mind. Replication can be very useful too.

Are there any benefits of using multiple databases on same machine, so that one database will hold images of current day for all cameras, and the second one will be used to archive previous day images?

Even though MongoDB's write lock is (currently) at the database level, I would say no. The right document structure and the right sharding/replication (if needed) should be able to handle this in a single document-based collection (or a few) under a single database. Not only that, but you can direct reads and writes within a cluster to particular servers, spreading concurrent load across the machines in your cluster. I would favor the correct use of MongoDB's concurrency features over separating databases.
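For the read side, one concrete mechanism is a read preference, which tells the driver which replica set members may serve queries. A minimal sketch, assuming a replica set named `rs0` on hypothetical hosts `db1` and `db2`; writes still go to the primary whatever this setting says.

```python
from urllib.parse import urlencode

READ_PREFERENCES = {"primary", "primaryPreferred", "secondary",
                    "secondaryPreferred", "nearest"}


def replica_set_uri(hosts: list[str], replica_set: str,
                    read_preference: str = "secondaryPreferred") -> str:
    """Build a connection string that lets reads go to secondaries.

    Writes are always sent to the replica set primary; the read
    preference only affects where queries are routed.
    """
    if read_preference not in READ_PREFERENCES:
        raise ValueError(f"unknown read preference: {read_preference}")
    options = urlencode({"replicaSet": replica_set,
                         "readPreference": read_preference})
    return "mongodb://" + ",".join(hosts) + "/?" + options


if __name__ == "__main__":
    print(replica_set_uri(["db1:27017", "db2:27017"], "rs0"))
```

The resulting string can be passed to `pymongo.MongoClient`. With `secondaryPreferred`, reads fall back to the primary when no secondary is available.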

Edit

After reading the question again, I realized my solution overlooked the fact that you are inserting 80k+ images per camera per day. Given that, instead of the embedded option I would store one document per image in a collection called images, plus a cameras collection, and query the two as you would in SQL.
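A minimal sketch of that two-collection layout and the two-step lookup, assuming hypothetical field names `name` (on cameras) and `camera_id`, `url`, `taken_at` (on images). The functions only build documents and filters; they do not talk to a server.

```python
from datetime import datetime


def image_document(camera_id, url: str, taken_at: datetime) -> dict:
    """One document per image; `camera_id` refers to a document in the
    cameras collection, much like a foreign key column in SQL."""
    return {"camera_id": camera_id, "url": url, "taken_at": taken_at}


def camera_filter(camera_name: str) -> dict:
    """Step 1: find the camera document by name in `cameras`."""
    return {"name": camera_name}


def camera_images_filter(camera_id, start: datetime, end: datetime) -> dict:
    """Step 2: that camera's images in `images` within [start, end)."""
    return {"camera_id": camera_id, "taken_at": {"$gte": start, "$lt": end}}
```

With pymongo the two steps would look like `cam = db.cameras.find_one(camera_filter("my awesome camera"))` followed by `db.images.find(camera_images_filter(cam["_id"], start, end))`.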

Sharding the images collection on camera_id should be just as easy.

Also make sure you take your working set into consideration when sizing your server.


to tell if MongoDB is good for holding such data, which eventually will be queried against time ranges (e.g. retrieve all images of a particular camera between a specified hour)? Any suggestions about Document Based schema design for my case?

MongoDB can do this. For better performance, you can set an index on your time field.
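A minimal sketch of such an index specification in pymongo's list-of-`(field, direction)` form, assuming hypothetical fields `camera_id` and `taken_at` in a single images collection. The equality field comes first and the range field second, so one index serves "one camera between two times"; with one collection per camera, `[("taken_at", 1)]` alone would do.

```python
# 1 means ascending. "camera_id" and "taken_at" are hypothetical names.
IMAGE_TIME_INDEX = [("camera_id", 1), ("taken_at", 1)]


def default_index_name(keys) -> str:
    """Name MongoDB assigns when none is given: field_direction pairs joined by '_'."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)


if __name__ == "__main__":
    print(default_index_name(IMAGE_TIME_INDEX))
```

`db.images.create_index(IMAGE_TIME_INDEX)` would create the index and return that same default name.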

What should be the specs of server (CPU, RAM, Disk)? any suggestion?

I think RAM and Disk would be important.

  • If you don't want to shard to scale out, you will need a disk large enough to store all of your data.
  • Your hot data should fit into RAM. If it doesn't, consider adding more RAM, because MongoDB's performance depends mainly on RAM.
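A rough way to check whether hot data fits in RAM is to multiply out the write volume. This sketch assumes the images themselves live outside MongoDB (e.g. the S3 URLs in the schema suggested earlier), so each document holds only metadata; the 100 cameras, 500-byte documents and 7 hot days are hypothetical inputs, while 80,000 images per camera per day is the figure from the question.

```python
GIB = 1024 ** 3


def hot_data_gib(cameras: int, images_per_camera_per_day: int,
                 avg_doc_bytes: int, days_hot: int) -> float:
    """Approximate size in GiB of the documents written in the last `days_hot` days.

    Indexes and storage overhead come on top of this, so treat the
    result as a lower bound on the RAM the hot data needs.
    """
    total_bytes = cameras * images_per_camera_per_day * avg_doc_bytes * days_hot
    return total_bytes / GIB


if __name__ == "__main__":
    # Hypothetical inputs: 100 cameras, 500-byte metadata documents, 7 hot days.
    print(round(hot_data_gib(100, 80_000, 500, 7), 1))  # → 26.1
```

If the images were stored inside MongoDB instead, the average document size, and therefore the RAM estimate, would be far larger.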

Should I consider Sharding/Replication for this scenario (while considering the performance in writing to synch replica sets)?

I don't know how many cameras you have, but even 1,000 inserts/second across 1,000 cameras should still be easy for MongoDB. If you are concerned about insert performance, I don't think you need to shard (unless the data is so big that you have to split it across several machines).
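The 1,000 inserts/second figure lines up with the question's numbers when worked out as an average rate, assuming one document per image and 80,000 images per camera per day. Real traffic will have bursts above this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def inserts_per_second(cameras: int, images_per_camera_per_day: int) -> float:
    """Average insert rate when every image is stored as one document."""
    return cameras * images_per_camera_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # 80,000 images per camera per day (from the question), 1,000 cameras.
    print(round(inserts_per_second(1000, 80_000)))  # → 926
```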

Another consideration is your application's read frequency. If it is very high, you can consider sharding or replication. And you can use (timestamp + camera_id) as your shard key if your queries only target one camera in a time range.
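A minimal sketch of the admin command that sets such a compound shard key, assuming a hypothetical `camdb.images` collection with fields `camera_id` and `taken_at`. It puts `camera_id` first, so queries for one camera can be routed to the shards holding that camera's data; a shard key that starts with a steadily increasing timestamp tends to send all new inserts to the same shard.

```python
def shard_collection_command(namespace: str, key_fields: list[str]) -> dict:
    """Build the shardCollection admin command with an ascending compound key.

    The command name must be the first key of the document; Python dicts
    keep insertion order, so it stays first when sent by the driver.
    """
    return {"shardCollection": namespace,
            "key": {field: 1 for field in key_fields}}


if __name__ == "__main__":
    print(shard_collection_command("camdb.images", ["camera_id", "taken_at"]))
```

The document would be sent through a mongos with `client.admin.command(...)`, after sharding has been enabled for the database. An index that starts with the shard key must exist; MongoDB can create it when the collection is still empty.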

Are there any benefits of using multiple databases on same machine, so that one database will hold images of current day for all cameras, and the second one will be used to archive previous day images?

You can split the data into two collections (archive and current) and create the time index only on archive, if you only query by date on the archive. Without the overhead of maintaining that index, inserts into the current collection should be faster.

And you can write a daily job that moves the current data into the archive.
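A minimal sketch of what that daily job could compute, assuming a hypothetical `taken_at` field and a MongoDB version with the `$merge` aggregation stage (4.2 or later). It copies everything older than the start of the current UTC day into `archive`, then the same filter removes those documents from the current collection.

```python
from datetime import datetime, timezone


def archive_cutoff(now: datetime) -> datetime:
    """Start of `now`'s day in UTC; documents older than this get archived."""
    return now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)


def archive_plan(now: datetime, field: str = "taken_at",
                 archive: str = "archive"):
    """Return (pipeline, delete_filter) for one daily archive run.

    The pipeline copies documents older than the cutoff into `archive`
    with $merge (MongoDB 4.2+); the same filter is then used to delete
    them from the current collection.
    """
    older = {field: {"$lt": archive_cutoff(now)}}
    pipeline = [{"$match": older}, {"$merge": {"into": archive}}]
    return pipeline, older
```

With pymongo, one run would be `pipeline, older = archive_plan(datetime.now(timezone.utc))`, then `db.current.aggregate(pipeline)`, and only after that succeeds, `db.current.delete_many(older)`. Because the cutoff is fixed for the run, images inserted meanwhile are not touched.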