Elasticsearch vs. MongoDB for filtering application [closed]


First off, there is an important distinction to make here: MongoDB is a general-purpose database, while Elasticsearch is a distributed text search engine backed by Lucene. People have been talking about using Elasticsearch as a general-purpose database, but know that this was not its original design. I think that general-purpose NoSQL databases and search engines are headed for consolidation, but as it stands, the two come from very different camps.

We are using both MongoDB and Elasticsearch in my company. We store our data in MongoDB and use Elasticsearch exclusively for its full-text search capabilities. We only send to Elastic the subset of the Mongo data fields that we need to query. Our use case differs from yours in that our Mongo data changes all the time: a record, or a subset of the fields of a record, can be updated several times a day, and each update can call for re-indexing that record in Elastic. For that reason alone, using Elastic as the sole data store is not a good option for us, as we can't update select fields; we would need to re-index a document in its entirety. This is not an Elastic limitation; it is how Lucene, the underlying search engine behind Elastic, works. In your case, the fact that records won't be changed once stored saves you from having to make that choice. Having said that, if data safety is a concern, I would think twice about using Elasticsearch as the only storage mechanism for your data. It may get there at some point, but I'm not sure it's there yet.

In terms of speed, not only is Elastic/Lucene on par with the querying speed of Mongo, in your case where there is "very little constant in terms of which fields are used for the filtering at any moment", it could be orders of magnitude faster, especially as the datasets become larger. The difference lies in the underlying query implementations:

  • Elastic/Lucene use the Vector Space Model and inverted indexes for Information Retrieval, which are highly efficient ways of comparing record similarity against a query. When you query Elastic/Lucene, it already knows the answer; most of its work lies in ranking the results for you by how likely they are to match your query terms. This is an important point: search engines, as opposed to databases, can't guarantee you exact results; they rank results by how close they get to your query. It just so happens that most of the time, the results are close to exact.
  • Mongo's approach is that of a more general-purpose data store; it compares JSON documents against one another. You can get great performance out of it by all means, but you need to carefully craft your indexes to match the queries you will be running. Specifically, if you have multiple fields by which you will query, you need to order the keys of your compound indexes so that they reduce the dataset to be scanned as fast as possible. E.g. your first key should filter out the majority of your dataset, your second should further filter down what is left, and so on. If your queries don't match the keys, and the order of those keys, in the defined indexes, your performance will drop quite a bit. On the other hand, Mongo is a true database, so if accuracy is what you need, the answers it gives will be spot on.
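
To make the contrast above concrete, here is a minimal pure-Python sketch. It is a toy model under simplifying assumptions, not how Lucene or MongoDB are actually implemented: `build_inverted_index`, `filter_ids` and `usable_prefix` are hypothetical helper names, and the sample documents are made up. The first part shows why an inverted index does not care which fields a filter uses; the second part shows the leading-prefix rule that makes key order matter in a compound index.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each (field, value) pair to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            index[(field, value)].add(doc_id)
    return index

def filter_ids(index, **criteria):
    """Intersect the posting sets of every (field, value) pair in the filter."""
    postings = [index.get(pair, set()) for pair in criteria.items()]
    if not postings:
        return set()
    # Start from the smallest set so the intersection shrinks quickly.
    postings.sort(key=len)
    return set.intersection(*postings)

def usable_prefix(index_keys, query_fields):
    """Count the leading compound-index keys that the query filters on.

    Simplified model: only an unbroken leading run of keys counts.
    """
    count = 0
    for key in index_keys:
        if key not in query_fields:
            break
        count += 1
    return count

# Made-up sample data for illustration only.
docs = {
    1: {"color": "red", "size": "L", "brand": "acme"},
    2: {"color": "red", "size": "M", "brand": "acme"},
    3: {"color": "blue", "size": "L", "brand": "zeta"},
}
index = build_inverted_index(docs)

# Any combination of fields is answered the same way by the inverted index.
red_large = filter_ids(index, color="red", size="L")
acme_only = filter_ids(index, brand="acme")

# With a compound index on (brand, color, size), a query that skips "brand"
# cannot use the leading key at all in this simplified model.
with_brand = usable_prefix(["brand", "color", "size"], {"brand", "color"})
without_brand = usable_prefix(["brand", "color", "size"], {"color", "size"})
```

In this toy model every field is indexed independently, so no single filter ordering has to be planned up front, whereas the compound-index helper returns 0 as soon as the query omits the first key. Real query planners are more nuanced than this, so treat it as an intuition aid only.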

For expiring old records, Elastic has a built-in TTL feature. Mongo introduced its own TTL support as of version 2.2, I think.
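
As a rough illustration of what TTL-based expiry does, here is a plain-Python sketch of the effect. In MongoDB the behaviour is configured by creating a TTL index on a date field with the `expireAfterSeconds` option; the function below only mimics the outcome in memory. The names `expire_old` and `createdAt`, and the sample records, are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

def expire_old(records, date_field, expire_after_seconds, now=None):
    """Return only the records whose date_field is younger than the TTL."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=expire_after_seconds)
    return [r for r in records if r[date_field] > cutoff]

# Fixed "current time" so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"id": 1, "createdAt": now - timedelta(hours=2)},     # older than one hour
    {"id": 2, "createdAt": now - timedelta(minutes=10)},  # still fresh
]

# Keep only records created within the last hour (3600 seconds).
kept = expire_old(records, "createdAt", 3600, now=now)
```

With a real TTL feature the database removes expired records in the background, so the application never has to run this kind of cleanup itself.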

Since I don't know your other requirements such as expected data size, transactions, accuracy or what your filters will look like, it's hard to make any specific recommendations. Hopefully, there is enough here to get you started.