Sync elasticsearch & cassandra with postgres database Sync elasticsearch & cassandra with postgres database elasticsearch elasticsearch

Sync elasticsearch & cassandra with postgres database


Basically, you'll need to use method described here https://qafoo.com/blog/086_how_to_synchronize_a_database_with_elastic_search.html and insert & select data from one database table. But make sure you limit the number of selects when selection data in "updates" eg: LIMIT 100.

Here's the workflow:

  1. save data to "updates" table during insert/update (if delete make sure you mark it as deleted in a column)
  2. then run this process ->

    • select you last insert: sequence_id from elasticsearch or Cassandra
    • use it to select data from "updates" table like so: id > :sequence_id

You can then insert data (into elasticsearch or cassandra) or do whatever. Make sure you insert data into "updates" table before dependent databases. And there is no need to duplicate document_id so replace them with new one. This gives consistency and allows you to choose between running cron job or sync it during a specific action all at once. Then update your sequence_id to the last one.

I choose to sync data straight after insert/update/delete to "updates", Then I do res.end() (or whatever to finish response) and use sync() function to select 100 new records in ascending order. I also run a cron job every 24 hours (without LIMIT 100) to make sure any data that was left out will be synced. Ohh yeah, and if updates were successful for all databases, then you might as well delete records from "updates" unless you use it for some other reason. But note that elasticsearch can loose data in memory

Good luck :) And I am opened to suggestions