How to avoid inserting a duplicate document to ElasticSearch


Are you using your ID as the document _id? If so, this is easy: use the create operation type, which specifies that a document with a given ID should only be created, never overwritten. If a document with that ID already exists, the request fails with a 409 conflict instead of replacing it:

PUT your-index/your-type/123456/_create
{
  "foo": "bar"
}
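
From Node.js, the same idea might look roughly like the sketch below. It assumes a client object whose `create` method returns a promise and rejects with a 409 status when the ID is already taken. The helper name `createIfAbsent` is hypothetical, and whether the status lives on `statusCode` or `status` depends on the client library you use, so the sketch checks both.

```javascript
// Hypothetical helper: try to create a document and treat a
// 409 Conflict ("document already exists") as a normal outcome.
// Assumes client.create(params) returns a promise.
function createIfAbsent(client, params) {
  return client.create(params).then(
    function () {
      return true; // the document was created
    },
    function (err) {
      // The status field name varies between client libraries (assumption).
      var status = err && (err.statusCode || err.status);
      if (status === 409) {
        return false; // a document with this _id already exists
      }
      throw err; // any other failure is a real error
    }
  );
}
```

With this, a call such as `createIfAbsent(esClient, { index: 'your-index', type: 'your-type', id: '123456', body: { foo: 'bar' } })` resolves to `false` for a duplicate rather than overwriting the stored document.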


When you push data to Elasticsearch with the bulk API, you can use the index action and set _id to your source data's ID. In that case Elasticsearch will create the document, or replace it if a document with the same ID already exists, so resending the same data never produces a second copy. Here is an example that builds the bulk body:

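var _ = require('lodash');

// Build the bulk request body: one action line followed by one source line per item.
function createBulkBody(items, indexName) {
  var result = [];
  _.forEach(items, function(item) {
    result.push({
      index: {
        _index: indexName,
        _type: item.type,
        // Reusing the source ID as _id makes a resend overwrite instead of duplicate.
        _id: item.ID
      }
    });
    result.push(item);
  });
  return result;
}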

Then push the data with the bulk API:

var body = createBulkBody(items, indexName);
esClient.bulk({
  body: body
}, function(err, resp) {
  if (err) {
    console.log(err);
  } else {
    console.log(resp);
  }
});
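
If you would rather skip documents that already exist than replace them, the bulk API also accepts a `create` action, which fails for that single item with status 409 when the `_id` is taken. A minimal sketch, assuming the same item shape as above (`type` and `ID` fields); the function names `createBulkBodyNoOverwrite` and `countDuplicates` are hypothetical:

```javascript
// Build a bulk body that uses "create" instead of "index", so an
// existing document with the same _id is left untouched.
function createBulkBodyNoOverwrite(items, indexName) {
  var result = [];
  items.forEach(function (item) {
    result.push({
      create: { _index: indexName, _type: item.type, _id: item.ID }
    });
    result.push(item);
  });
  return result;
}

// Count the items in a bulk response that were rejected as duplicates.
// Each response item is keyed by its action name ("create" here).
function countDuplicates(bulkResponse) {
  return bulkResponse.items.filter(function (entry) {
    return entry.create && entry.create.status === 409;
  }).length;
}
```

The body is passed to `esClient.bulk` exactly as before. A conflict on one item does not fail the whole request, so the per-item statuses in the response are what tell you which documents were skipped.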

Hope this helps


If you want to check whether an item exists before trying to insert it, you can query the index for that document first. If the result is not empty, a document with this ID already exists.

You can use a term query for that:

q = {'term': {'id': '123456'}}

I suppose it will be quite time-consuming, since it costs an extra query per document, and it only prevents duplicates if nothing else writes the same ID between the check and the insert.
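
A rough sketch of that existence check in Node.js, assuming a client whose `search` method returns a promise resolving to the raw search response, and documents that carry an `id` field as in the term query above. The name `documentExists` is hypothetical. `hits.total` is a plain number in older Elasticsearch versions and an object with a `value` field in newer ones, so the sketch handles both:

```javascript
// Hypothetical helper: resolve to true if at least one document
// in the index has the given value in its "id" field.
// Assumes client.search(params) returns a promise of the response body.
function documentExists(client, indexName, id) {
  return client
    .search({
      index: indexName,
      // size: 0 asks only for the hit count, not the documents themselves.
      body: { size: 0, query: { term: { id: id } } }
    })
    .then(function (resp) {
      var total = resp.hits.total;
      var count = typeof total === 'number' ? total : total.value;
      return count > 0;
    });
}
```

A caller would then insert only when `documentExists(esClient, 'your-index', '123456')` resolves to `false`.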