
How to Ignore Duplicate Key Errors Safely Using insert_many


You can deal with this by inspecting the errors carried by the BulkWriteError exception. The exception object has several properties; the interesting parts are in its details attribute:

import pymongo
from pymongo import MongoClient

client = MongoClient()
db = client.test
collection = db.duptest

docs = [{'_id': 1}, {'_id': 1}, {'_id': 2}]

try:
    result = collection.insert_many(docs, ordered=False)
except pymongo.errors.BulkWriteError as e:
    print(e.details['writeErrors'])

On a first run, this will give the list of errors under e.details['writeErrors']:

[
  {
    'index': 1,
    'code': 11000,
    'errmsg': 'E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }',
    'op': {'_id': 1}
  }
]

On a second run, you see three errors because all items existed:

[  {    "index": 0,    "code": 11000,    "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",     "op": {"_id": 1}   },    {     "index": 1,     "code": 11000,     "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",     "op": {"_id": 1}   },   {     "index": 2,     "code": 11000,     "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 2 }",     "op": {"_id": 2}   }]

So all you need to do is filter the array for entries with "code": 11000, and only "panic" when something else is in there:

panic = list(filter(lambda x: x['code'] != 11000, e.details['writeErrors']))
if len(panic) > 0:
    print("really panic")

That gives you a mechanism for ignoring duplicate key errors while still paying attention to anything that is actually a problem.
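One way to package this check is a small helper that returns only the non-duplicate errors. This is a sketch, not a pymongo API; the name non_duplicate_errors is my own:

```python
DUPLICATE_KEY_CODE = 11000  # MongoDB's error code for duplicate keys

def non_duplicate_errors(details):
    """Return the entries from a BulkWriteError details dict that are NOT
    duplicate-key errors, i.e. the ones you should actually worry about."""
    return [err for err in details.get('writeErrors', [])
            if err['code'] != DUPLICATE_KEY_CODE]

# Usage inside the except clause from above:
# except pymongo.errors.BulkWriteError as e:
#     if non_duplicate_errors(e.details):
#         raise  # something other than a duplicate went wrong

# Standalone demonstration with a mocked details dict:
mock = {'writeErrors': [
    {'code': 11000, 'errmsg': 'E11000 duplicate key error'},
    {'code': 121, 'errmsg': 'Document failed validation'},
]}
print(non_duplicate_errors(mock))  # only the code-121 validation error remains
```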


Adding more to Neil's solution.

Passing ordered=False lets the remaining insertions proceed even after a duplicate-key exception; bypass_document_validation=True additionally skips server-side schema validation (it does not affect duplicate handling).

from pymongo import MongoClient, errors

DB_CLIENT = MongoClient()
MY_DB = DB_CLIENT['my_db']
TEST_COLL = MY_DB.dup_test_coll

doc_list = [
    {
        "_id": "82aced0eeab2467c93d04a9f72bf91e1",
        "name": "shakeel"
    },
    {
        "_id": "82aced0eeab2467c93d04a9f72bf91e1",  # duplicate error: 11000
        "name": "shakeel"
    },
    {
        "_id": "fab9816677774ca6ab6d86fc7b40dc62",  # this new doc gets inserted
        "name": "abc"
    }
]

try:
    # inserts new documents even on error
    TEST_COLL.insert_many(doc_list, ordered=False, bypass_document_validation=True)
except errors.BulkWriteError as e:
    print(f"Articles bulk insertion error {e}")
    panic_list = list(filter(lambda x: x['code'] != 11000, e.details['writeErrors']))
    if len(panic_list) > 0:
        print(f"these are not duplicate errors {panic_list}")
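If you want a quick summary of what a failed batch actually did, the details dict also carries an nInserted count alongside writeErrors. A minimal sketch; the summarize_bulk_details helper is hypothetical, not part of pymongo:

```python
DUPLICATE_KEY = 11000  # MongoDB's duplicate key error code

def summarize_bulk_details(details):
    """Hypothetical helper: count successful inserts, duplicate-key errors,
    and any other errors in a BulkWriteError details dict."""
    write_errors = details.get('writeErrors', [])
    duplicates = [e for e in write_errors if e['code'] == DUPLICATE_KEY]
    others = [e for e in write_errors if e['code'] != DUPLICATE_KEY]
    return {
        'inserted': details.get('nInserted', 0),
        'duplicates': len(duplicates),
        'other_errors': len(others),
    }

# Example with a details dict shaped like the output shown earlier:
details = {
    'nInserted': 1,
    'writeErrors': [
        {'index': 1, 'code': 11000,
         'errmsg': 'E11000 duplicate key error', 'op': {'_id': 1}},
    ],
}
print(summarize_bulk_details(details))  # {'inserted': 1, 'duplicates': 1, 'other_errors': 0}
```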

And since we are talking about duplicates, it's worth checking this solution as well.


The correct solution is to use a WriteConcern with w=0: the inserts become unacknowledged, so duplicate key errors are never raised (note that this also silences every other write error):

from pymongo.write_concern import WriteConcern

mongodb_connection[db][collection].with_options(
    write_concern=WriteConcern(w=0)
).insert_many(messages)