
Using elasticsearch as central data repository


As is the case with all database deployments, it really depends on your specific application.

Elasticsearch is a great open source search engine built on top of Apache Lucene. Its features allow it to function much like a schema-less JSON datastore that can be accessed using both search-specific methods and regular database CRUD-like commands.

Despite all the advantages that Elasticsearch brings, there are still some major disadvantages:

  • Security - Elasticsearch does not provide any authentication or access control functionality. (Update: this has been supported since Shield was introduced.)

  • Transactions - There is no support for transactions or for processing during data manipulation. (Update: data manipulation can now be handled with Logstash.)

  • Durability - ES is distributed and fairly stable, but backups and durability are not as high a priority as in other data stores.

  • Maturity of tools - ES is still relatively new and has not had time to develop mature client libraries and third-party tools, which can make development much harder. (Update: it can now be considered quite mature, with a variety of connectors and tools around it, such as Kibana.) It is still not suited to large computations, though: its search commands are not designed for "large" scans of data or advanced computation on the database side.

  • Data Availability - ES makes data available in "near real-time", which may require additional considerations in your application (e.g. on a comments page where a user adds a new comment, refreshing the page might not show the new post because the index is still updating).
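
To make the "near real-time" point concrete, here is a minimal sketch of one way to handle the comments-page case: ask Elasticsearch to refresh the index as part of the write, using the `refresh` query parameter of the index API. The host, index name, document ID and helper name below are all hypothetical, and the path layout is an assumption that depends on your version. The request is only built here, not sent.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_id, document, refresh="true"):
    """Build (but do not send) an HTTP request that indexes one document.

    Hypothetical helper for illustration only.
    """
    # Assumption: the typeless /<index>/_doc/<id> path of recent versions.
    # Older 1.x/2.x clusters use /<index>/<type>/<id> instead.
    url = f"{base_url}/{index}/_doc/{doc_id}?refresh={refresh}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, index and document.
req = build_index_request(
    "http://localhost:9200", "comments", "42", {"text": "First!"}
)
print(req.get_method(), req.full_url)
```

Forcing a refresh on every write costs indexing throughput, so this is probably only worth doing on paths where a user expects to see their own write immediately.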

If you can deal with these issues then there's certainly no reason why you can't use Elasticsearch as your primary data store. It can actually lower complexity and improve performance by not having to duplicate your data but again this depends on your specific use case.

As always, weigh the benefits, do some experimentation and see what works best for you.

DISCLAIMER: This answer was written a while ago for the Elasticsearch 1.x series. These criticisms still largely hold for the 2.x series, but Elastic is working on them: the 2.x series comes with more mature tools, APIs and plugins, for example Shield for security, or data shippers like Logstash and Beats.


I'd highly discourage most users from using elasticsearch as their primary datastore. It will work great until your cluster melts down due to a network partition. Even settings such as minimum_master_nodes that the ES pros always set won't save you. See this excellent analysis by Aphyr in his Call Me Maybe series: http://aphyr.com/posts/317-call-me-maybe-elasticsearch
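
For context, minimum_master_nodes is conventionally set to a majority quorum of the master-eligible nodes, (N / 2) + 1. The sketch below only shows that arithmetic; the function name is made up for illustration. As noted above, getting this value right did not prevent data loss in the linked analysis.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Majority quorum for a cluster with the given number of
    master-eligible nodes (hypothetical helper, illustration only)."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division, then add one: strictly more than half the nodes.
    return master_eligible_nodes // 2 + 1


for n in (1, 3, 5):
    print(n, "master-eligible nodes -> quorum of", minimum_master_nodes(n))
```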

eliasah is right, it depends on your use case, but if your data (and job) is important to you, stay away.

Keep the golden record of your data stored in something really focused on persistence, and sync your data out to search from there. It adds extra complexity and resources, but will result in a better night's rest :)

There are plenty of ways to go about this. If elasticsearch does everything you need, you can look into Kafka for persisting all the events going into a cluster, which would allow replaying them if things go wrong. I like this approach as it provides an async ingestion pipeline into elasticsearch that also handles the persistence.
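
The replay idea can be sketched roughly as follows. This is an in-memory stand-in, not the Kafka or Elasticsearch client API: `EventLog` plays the role of a durable topic, a plain dict plays the role of the search index, and every name here is hypothetical. The point is only that if the index is lost, it can be rebuilt by reapplying the persisted events in order.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only log; an in-memory stand-in for a durable Kafka topic."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1  # offset of the appended event

    def read_from(self, offset=0):
        return iter(self.events[offset:])


def apply_event(index, event):
    """Apply one (op, doc_id, doc) event to the dict standing in for the index."""
    op, doc_id, doc = event
    if op == "index":
        index[doc_id] = doc
    elif op == "delete":
        index.pop(doc_id, None)


def rebuild_index(log, from_offset=0):
    """Replay the log from an offset into a fresh, empty index."""
    index = {}
    for event in log.read_from(from_offset):
        apply_event(index, event)
    return index


log = EventLog()
log.append(("index", "1", {"text": "hello"}))
log.append(("index", "2", {"text": "world"}))
log.append(("delete", "1", None))

# Simulate losing the search cluster: rebuild it purely from the log.
search_index = rebuild_index(log)
print(search_index)
```

In a real pipeline the log would be retained durably and a consumer would write each event to elasticsearch asynchronously. Tracking the last applied offset lets a replay start from a known point instead of from the beginning.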