How to avoid inserting a duplicate document into Elasticsearch
Are you using your own ID as the document _id? If so, this is easy with the operation type, which lets you specify that a document with a given ID should only be created, never overwritten:
PUT your-index/your-type/123456/_create
{
  "foo": "bar"
}
When you push data to Elasticsearch with the bulk API, you can use the index action and pass your source data's ID as the _id. In that case Elasticsearch will create the document, or replace it if a document with the same ID already exists. Here is an example that builds the bulk body:
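var _ = require('lodash');

// Build the bulk body: one action line followed by one source line per item.
function createBulkBody(items, indexName) {
  var result = [];
  _.forEach(items, function(item) {
    // Action line: index the document under the item's own ID.
    result.push({ index: { _index: indexName, _type: item.type, _id: item.ID } });
    // Source line: the document itself.
    result.push(item);
  });
  return result;
}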
Then push the data with the bulk API:
var body = createBulkBody(items, indexName);
esClient.bulk({ body: body }, function(err, resp) {
  if (err) {
    console.log(err);
  } else {
    console.log(resp);
  }
});
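If you would rather skip documents that already exist than replace them, the bulk API also accepts a create action, which rejects an item with status 409 when its ID is already taken. Here is a minimal sketch, assuming the same item shape as above (type and ID fields); createBulkCreateBody and findConflicts are hypothetical helper names, not part of any client library:

```javascript
// Build a bulk body that uses the "create" action instead of "index",
// so an existing document with the same _id is never overwritten.
function createBulkCreateBody(items, indexName) {
  var result = [];
  items.forEach(function (item) {
    result.push({ create: { _index: indexName, _type: item.type, _id: item.ID } });
    result.push(item);
  });
  return result;
}

// Given a bulk response, collect the IDs that were rejected with
// HTTP status 409 (the document already existed).
function findConflicts(bulkResponse) {
  return (bulkResponse.items || [])
    .filter(function (entry) {
      return entry.create && entry.create.status === 409;
    })
    .map(function (entry) {
      return entry.create._id;
    });
}
```

You would pass the result of createBulkCreateBody to esClient.bulk exactly as before, then call findConflicts(resp) in the callback to see which items were duplicates.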
Hope this helps
If you want to check whether an item exists before trying to insert it, you can simply query the index for that document. If the result is not empty, a document with this id already exists.
You can use a term query for that:
q = {'term': {'id': '123456'}}
I suppose it will be quite time-consuming, since it costs an extra query for every document, but it is a way to be sure that no duplicate will be inserted.
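The check described above could be sketched roughly like this, assuming a callback-style client with a search method and a field named id that is indexed as an exact value. The names buildIdQuery, totalHits and existsById are hypothetical helpers made up for illustration:

```javascript
// Build a search body matching documents whose "id" field equals the value.
// size: 0 asks only for the hit count, not the documents themselves.
function buildIdQuery(id) {
  return { size: 0, query: { term: { id: id } } };
}

// hits.total is a plain number in older Elasticsearch versions and an
// object with a "value" property in 7.x and later; handle both shapes.
function totalHits(resp) {
  var total = resp.hits.total;
  return typeof total === 'number' ? total : total.value;
}

// Call back with true if at least one document with this id exists.
function existsById(esClient, indexName, id, callback) {
  esClient.search({ index: indexName, body: buildIdQuery(id) }, function (err, resp) {
    if (err) {
      return callback(err);
    }
    callback(null, totalHits(resp) > 0);
  });
}
```

Note that another writer can still insert the same document between the check and your insert, so for strict guarantees the create operation type is the safer option.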