
Indexing data from postgres to solr/elasticsearch


At the risk of someone marking this question as a duplicate, here's the link to setting up postgres-to-elasticsearch in another StackOverflow thread. There's also this blog post on Atlassian that talks about how to get real-time updates from PostgreSQL into Elasticsearch.

For the tl;dr crowd: the Atlassian post uses PostgreSQL stored procedures to copy updated/inserted data to a staging table, then processes the staging table separately. It's a nice approach that would work for either ES or Solr. Unfortunately, it's a roll-your-own solution unless you are familiar with Clojure.


For Solr, a common approach is to use the Data Import Handler (DIH for short). Configure the full-import and delta-import SQL queries properly; a delta-import pulls only the rows that have changed since the last import, which it determines by comparing timestamps (so you need to design your schema with a suitable last-modified timestamp column).
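The timestamp comparison behind a delta-import can be sketched outside Solr as follows. This is only an illustration of the selection logic, again using SQLite as a stand-in for PostgreSQL; the item table and its last_modified column are hypothetical. In an actual DIH config the cutoff would come from the handler's own variable (${dataimporter.last_index_time}) inside the delta query rather than from a Python argument.

```python
import sqlite3


def fetch_delta(conn, last_index_time):
    """Return rows modified after the previous import (ISO-8601 text timestamps)."""
    return conn.execute(
        "SELECT id, title, last_modified FROM item "
        "WHERE last_modified > ? ORDER BY last_modified",
        (last_index_time,),
    ).fetchall()
```

Without a reliably maintained last_modified column there is nothing to compare against, which is why the schema has to be designed with it in mind.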

A delta-import can be triggered in two ways, which can be used separately or combined:

  • Run delta-import on a timer (e.g. every 5 minutes).
  • After each update in the database, make a call to delta-import.
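Both trigger styles come down to an HTTP request against the DIH endpoint with command=delta-import. The sketch below assumes the handler is registered at /dataimport (the path is set in solrconfig.xml, so yours may differ) and that Solr listens at a base URL such as http://localhost:8983/solr; the core name and the helper names are hypothetical.

```python
import time
import urllib.parse
import urllib.request


def delta_import_url(base_url, core, handler="dataimport"):
    """Build the DIH URL for a delta-import; the handler path is an assumption."""
    query = urllib.parse.urlencode(
        {"command": "delta-import", "clean": "false", "commit": "true"}
    )
    return f"{base_url.rstrip('/')}/{core}/{handler}?{query}"


def trigger_delta_import(base_url, core, timeout=10):
    """Style 2: call this right after a database update."""
    with urllib.request.urlopen(delta_import_url(base_url, core), timeout=timeout) as resp:
        return resp.status


def run_every(interval_seconds, action, iterations=None):
    """Style 1: run an action on a fixed interval (forever if iterations is None)."""
    done = 0
    while iterations is None or done < iterations:
        action()
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_seconds)
```

For example, run_every(300, lambda: trigger_delta_import("http://localhost:8983/solr", "mycore")) would poll every 5 minutes, while application code could also call trigger_delta_import directly after a write to combine the two styles. In practice a cron job or scheduler may be a better fit than a long-running loop.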

Refer to https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler for details on DIH.