Integrate Elasticsearch with PostgreSQL while using Sails.js with Waterline ORM

Here you can find a fairly straightforward package (sails-elastic). It is configured directly with Elasticsearch's own client options.

See the Elasticsearch docs, and index creation in particular.


There are many approaches to solving this issue. The recommended way is to use Logstash (by Elastic), which I have described in detail. I will list most of the approaches I know here:

  1. Using Logstash

    curl https://download.elastic.co/logstash/logstash/logstash-2.3.2.tar.gz > logstash.tar.gz
    tar -xzf logstash.tar.gz
    cd logstash-2.3.2

    Install the jdbc input plugin:

    bin/logstash-plugin install logstash-input-jdbc

    Then download postgresql jdbc driver.

    curl https://jdbc.postgresql.org/download/postgresql-9.4.1208.jre7.jar > postgresql-9.4.1208.jre7.jar

    Now create a configuration file for logstash to use jdbc input as input.conf:

    input {
      jdbc {
        jdbc_driver_library => "/Users/khurrambaig/Downloads/logstash-2.3.2/postgresql-9.4.1208.jre7.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/khurrambaig"
        jdbc_user => "khurrambaig"
        jdbc_password => ""
        schedule => "* * * * *"
        statement => 'SELECT * FROM customer WHERE "updatedAt" > :sql_last_value'
        type => "customer"
      }
      jdbc {
        jdbc_driver_library => "/Users/khurrambaig/Downloads/logstash-2.3.2/postgresql-9.4.1208.jre7.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/khurrambaig"
        jdbc_user => "khurrambaig"
        jdbc_password => ""
        schedule => "* * * * *"
        statement => 'SELECT * FROM employee WHERE "updatedAt" > :sql_last_value'
        type => "employee"
      }
      # add more jdbc inputs to suit your needs
    }
    output {
      elasticsearch {
        index => "khurrambaig"
        document_type => "%{type}"   # <- use the type from each input
        document_id => "%{id}"       # <- to avoid duplicates
        hosts => "localhost:9200"
      }
    }

    Now run logstash using the above file:

    bin/logstash -f input.conf

    For every model that you want to index as a document type (table) in an index (database; khurrambaig here), use an appropriate SQL statement (SELECT * FROM employee WHERE "updatedAt" > :sql_last_value here). I use sql_last_value so that only updated rows are fetched. Logstash also supports scheduling and much more; here it runs every minute. For more details, refer to this.

    To see the documents which has been inserted into index for a particular type:

    curl -XGET 'http://localhost:9200/khurrambaig/customer/_search?pretty=true'

    This lists all the documents of the customer type in my case. Look into the Elasticsearch search API and use that, or use the official Node.js client.
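    To consume the indexed data from Node.js, a minimal sketch with the official `elasticsearch` npm client could look like the following. Only the request object is built here (the `name` field in the query is hypothetical); you would pass it to `client.search()`:

```javascript
// Sketch: build the search request that the official Node.js client
// (`elasticsearch` npm package) expects. Constructing it separately
// keeps it easy to inspect before handing it to client.search().
function buildSearch(index, type, text) {
  return {
    index: index,
    type: type,
    body: {
      // `name` is a hypothetical field on the customer table
      query: text ? { match: { name: text } } : { match_all: {} }
    }
  };
}

// e.g. everything of type "customer" in the "khurrambaig" index:
const params = buildSearch('khurrambaig', 'customer', null);

// const es = require('elasticsearch');
// const client = new es.Client({ host: 'localhost:9200' });
// client.search(params).then(res => console.log(res.hits.hits));
```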

  2. Using jdbc input

    https://github.com/jprante/elasticsearch-jdbc

    You can read its README; it's quite straightforward. But it doesn't provide scheduling and many of the other things that Logstash provides.

  3. Using sails-elastic

    You need to use multiple adapters as given in README.

    But this isn't recommended, because it will slow down your requests: for every create, update, and delete, you will be calling two databases, Elasticsearch and PostgreSQL.
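    In practice the dual write looks something like this sketch (the helper and client names are assumptions for illustration, not sails-elastic's actual API):

```javascript
// Hypothetical sketch of the dual-write pattern described above: after
// Waterline persists a record in PostgreSQL, the same record is written
// to Elasticsearch, costing a second round-trip on every request.
function mirrorToElastic(esClient, record, cb) {
  esClient.index({
    index: 'khurrambaig',     // same index name as in the Logstash example
    type: 'customer',
    id: String(record.id),    // reuse the SQL primary key to avoid duplicates
    body: record
  }, cb);
}

// In a Sails model this would hang off a lifecycle callback, e.g.:
// afterCreate: function (record, cb) { mirrorToElastic(esClient, record, cb); }
```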

    In Logstash, indexing of documents is independent of your application's requests. This approach is used by many, including Wikipedia. You also remain independent of the framework: today you are using Sails, tomorrow you might use something else, but with Logstash you don't need to change anything as long as you still use PostgreSQL. (Even if you change databases, inputs exist for many of them, and moving from one SQL RDBMS to another only requires swapping the JDBC driver.)

There's also ZomboDB, but it currently works only with pre-2.0 Elasticsearch (support for ES > 2.0 is on the way).