Database needed with elasticsearch?


I faced a similar problem before, on an Elasticsearch setup backed by a MySQL database that held the data. The solution was to store in Elasticsearch only the data that needed to be searched, along with a reference to the record in the relational database. If the data in Elasticsearch was enough for the request, I returned only the Elasticsearch record. If it wasn't, I went to the relational database and returned that record instead.
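The lookup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: a plain dict stands in for the Elasticsearch index, SQLite stands in for MySQL, and the `products` table, its columns, and the `fetch` helper are all hypothetical names.

```python
import sqlite3

# Stand-in for the Elasticsearch index (assumption): id -> searchable fields only.
# In a real setup this would be a query against the search cluster.
search_index = {
    1: {"id": 1, "title": "Blue kettle", "price": 25.0},
    2: {"id": 2, "title": "Red toaster", "price": 40.0},
}

# Stand-in for the relational database, which holds the complete records.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, title TEXT, price REAL, description TEXT, supplier TEXT)"
)
db.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Blue kettle", 25.0, "A 1.5 litre kettle", "Acme"),
        (2, "Red toaster", 40.0, "A two-slice toaster", "Globex"),
    ],
)


def fetch(doc_id, wanted_fields):
    """Return (record, source), preferring the search index when it has every field."""
    doc = search_index.get(doc_id)
    if doc is not None and all(field in doc for field in wanted_fields):
        return {field: doc[field] for field in wanted_fields}, "search"
    # The indexed document is missing or too thin: follow the reference to the database.
    row = db.execute("SELECT * FROM products WHERE id = ?", (doc_id,)).fetchone()
    if row is None:
        return None, "missing"
    return {field: row[field] for field in wanted_fields}, "database"
```

Calling `fetch(1, ["title", "price"])` is answered from the index alone, while `fetch(1, ["title", "supplier"])` falls through to the database because `supplier` was never indexed.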

I split the work into these two paths because of the latency that the relational database introduced (it was an API for a high-demand web service, and Elasticsearch was faster). That introduced a synchronization problem, but it was not critical for my application: we periodically pulled the data from the relational database and reindexed only the changed records in Elasticsearch. Elasticsearch can reindex just a subset of records.
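One way to picture that periodic sync is a "watermark" job that only picks up rows changed since the previous run. The sketch below is an assumption about how such a job might look, not the setup described above: the `articles` table, the integer `updated_at` column, and the `sync_changed` function are hypothetical, and a dict again stands in for the Elasticsearch index.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, updated_at INTEGER)")
db.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, "First", 100), (2, "Second", 100)],
)

search_index = {}  # stand-in for the Elasticsearch index (assumption)


def sync_changed(last_sync):
    """Reindex rows modified after last_sync; return (rows reindexed, new watermark)."""
    rows = db.execute(
        "SELECT id, title, updated_at FROM articles WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()
    for row_id, title, updated_at in rows:
        # Overwrite the indexed copy with the current database state.
        search_index[row_id] = {"id": row_id, "title": title}
        last_sync = max(last_sync, updated_at)
    return len(rows), last_sync


# The first run indexes everything; later runs only touch what changed.
count, watermark = sync_changed(0)
db.execute("UPDATE articles SET title = 'Second (edited)', updated_at = 200 WHERE id = 2")
count2, watermark = sync_changed(watermark)
```

Note that a job like this does not notice deleted rows; those would need a separate mechanism such as soft deletes or an occasional full rebuild.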

We considered not using a database and storing everything in the search engine, but whether that is acceptable depends on how important your data is. If you can't risk losing any part of your data, don't store it only in Elasticsearch. We always treated the data in Elasticsearch as perishable, on the assumption that the search indexes could be reconstructed from the database.


Coming from the Hibernate Search school of thought, I'm confused about whether you're supposed to store your entire data model in Elasticsearch and do away with the traditional database, or whether you're supposed to store your search data in the indexes and, as with Hibernate Search, return primary keys to pull complete records from your relational database.

You could store everything, but you're going to get better scalability if you just store the fields that need to be searched. The smaller the records, the smaller the index and the more that can fit into a given amount of RAM.
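As a rough illustration of keeping indexed documents small, the sketch below projects a full record down to its primary key plus the searchable fields. The field names and the `to_search_doc` helper are hypothetical, and JSON length is used only as a crude proxy for document size.

```python
import json

# Hypothetical choice of fields that users actually search on.
SEARCHABLE_FIELDS = ("title", "tags")


def to_search_doc(record):
    """Keep the primary key plus searchable fields; everything else stays in the database."""
    doc = {"id": record["id"]}
    doc.update({field: record[field] for field in SEARCHABLE_FIELDS if field in record})
    return doc


record = {
    "id": 7,
    "title": "Quarterly report",
    "tags": ["finance", "2015"],
    "body_html": "<p>" + "lorem ipsum " * 200 + "</p>",
    "audit_log": ["created", "reviewed", "published"],
}

doc = to_search_doc(record)
full_size = len(json.dumps(record))
slim_size = len(json.dumps(doc))
```

The `id` kept in the slim document is what lets you pull the complete record from the relational database after a search hit.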

If you're using the indexes with a database, should you be maintaining them manually during transactions? I've seen a JDBC project called river, but it looks to be deprecated and not recommended for production use. Is there a library out there capable of automatically handling your transactions for you?

I'm using Spring transaction synchronization for this, basically triggering asynchronous reindexing after the transaction has been successfully committed.
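The general pattern (queue index updates during the transaction, fire them only after a successful commit) can be sketched without any framework. The code below is not Spring's API; it only mirrors the idea behind an after-commit callback. `UnitOfWork`, `reindex`, and the `docs` table are hypothetical names, SQLite stands in for the database, and a dict stands in for the search index.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

search_index = {}  # stand-in for the Elasticsearch index (assumption)
executor = ThreadPoolExecutor(max_workers=1)


def reindex(doc_id, title):
    """Runs on a worker thread, outside the database transaction."""
    search_index[doc_id] = {"id": doc_id, "title": title}


class UnitOfWork:
    """Collects index updates and submits them only after the commit succeeds."""

    def __init__(self, conn):
        self.conn = conn
        self._after_commit = []
        self.futures = []

    def save(self, doc_id, title):
        self.conn.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)", (doc_id, title))
        self._after_commit.append((doc_id, title))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Rolled back: the search index must never see these changes.
            self.conn.rollback()
            self._after_commit.clear()
            return False
        self.conn.commit()
        # Committed: hand the queued updates to the background worker.
        self.futures = [executor.submit(reindex, *args) for args in self._after_commit]
        self._after_commit.clear()
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")
conn.commit()

with UnitOfWork(conn) as uow:
    uow.save(1, "committed")
for future in uow.futures:
    future.result()  # wait for the asynchronous reindex to finish

try:
    with UnitOfWork(conn) as uow:
        uow.save(2, "rolled back")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

Because reindexing happens after the commit and on another thread, a crash between the two steps can still leave the index stale, so a periodic catch-up job remains useful alongside this approach.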

What would be the drawback of not using a database for anything search related?

ES isn't a database and doesn't support transactional operations across documents.


Note that the Hibernate Search / Elasticsearch integration is almost ready now, and making progress quickly: