How to Ignore Duplicate Key Errors Safely Using insert_many
You can deal with this by inspecting the errors reported by the BulkWriteError exception that insert_many raises. The exception object has several properties; the interesting parts are in its details attribute:
import pymongo
from pymongo import MongoClient

client = MongoClient()
db = client.test
collection = db.duptest

docs = [{'_id': 1}, {'_id': 1}, {'_id': 2}]

try:
    # ordered=False: keep inserting the remaining documents after an error
    result = collection.insert_many(docs, ordered=False)
except pymongo.errors.BulkWriteError as e:
    print(e.details['writeErrors'])
On a first run, this gives the following list of errors under e.details['writeErrors']:
[
    {
        'index': 1,
        'code': 11000,
        'errmsg': 'E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }',
        'op': {'_id': 1}
    }
]
On a second run, you see three errors because all items existed:
[
    {
        "index": 0,
        "code": 11000,
        "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
        "op": {"_id": 1}
    },
    {
        "index": 1,
        "code": 11000,
        "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
        "op": {"_id": 1}
    },
    {
        "index": 2,
        "code": 11000,
        "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 2 }",
        "op": {"_id": 2}
    }
]
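Because every entry carries an "index" into the list that was passed to insert_many, you can also work out which documents actually went in. A minimal sketch, assuming an unordered insert that raised only write errors (so anything not listed in writeErrors was written); split_by_outcome is a hypothetical helper name, not part of pymongo, and it works on the plain details data so it runs without a server:

```python
def split_by_outcome(docs, write_errors):
    """Split the docs passed to insert_many(ordered=False) into (accepted, rejected).

    Assumes every document not listed in writeErrors was written, which holds
    for an unordered bulk insert that raised only write errors.
    """
    rejected_positions = {err["index"] for err in write_errors}
    accepted = [doc for i, doc in enumerate(docs) if i not in rejected_positions]
    rejected = [doc for i, doc in enumerate(docs) if i in rejected_positions]
    return accepted, rejected


docs = [{"_id": 1}, {"_id": 1}, {"_id": 2}]

# First-run writeErrors from above, trimmed to the fields used here.
first_run_errors = [{"index": 1, "code": 11000, "op": {"_id": 1}}]

accepted, rejected = split_by_outcome(docs, first_run_errors)
print(accepted)  # [{'_id': 1}, {'_id': 2}]
print(rejected)  # [{'_id': 1}]
```

Using the index rather than the "op" field matters when two input documents are identical, as the first two are here.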
So all you need to do is filter the list for entries whose "code" is not 11000 (the duplicate key error), and only "panic" when something else is in there:
# inside the except block: keep only the errors that are NOT duplicate keys
panic = [err for err in e.details['writeErrors'] if err['code'] != 11000]
if panic:
    print("really panic")
That lets you ignore duplicate key errors while still paying attention to anything that is actually a problem.
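The same filtering can be wrapped in a small reusable function. This is a sketch, not a pymongo API: real_problems is a hypothetical name, and unlike the snippet above it also returns anything under writeConcernErrors, which BulkWriteError.details reports separately from writeErrors. The details dicts below are made-up examples shaped like the output shown earlier (121 is used as an example of a non-duplicate code, document validation failure):

```python
DUPLICATE_KEY_ERROR = 11000


def real_problems(details):
    """Return every error in a BulkWriteError.details dict that is not a duplicate key.

    Write errors with code 11000 are dropped; any other write error, plus any
    write concern errors, is returned so the caller can still react to it.
    """
    write_errors = [
        err for err in details.get("writeErrors", [])
        if err.get("code") != DUPLICATE_KEY_ERROR
    ]
    return write_errors + list(details.get("writeConcernErrors", []))


# Hypothetical details dicts, shaped like the output shown above.
only_duplicates = {
    "writeErrors": [{"index": 1, "code": 11000, "errmsg": "E11000 duplicate key error"}],
    "writeConcernErrors": [],
}
mixed = {
    "writeErrors": [
        {"index": 0, "code": 11000, "errmsg": "E11000 duplicate key error"},
        {"index": 2, "code": 121, "errmsg": "Document failed validation"},
    ],
    "writeConcernErrors": [],
}

print(real_problems(only_duplicates))  # []
print(len(real_problems(mixed)))       # 1
```

Inside the except block you would then write something like "if real_problems(e.details): raise", so duplicates are swallowed and everything else still propagates.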
Adding to Neil's solution:
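Passing ordered=False lets the remaining inserts go ahead even after a duplicate key error. The extra bypass_document_validation=True only skips any collection-level schema validation rules; it does not affect unique index checks, so it is not what lets the inserts continue.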
from pymongo import MongoClient, errors

DB_CLIENT = MongoClient()
MY_DB = DB_CLIENT['my_db']
TEST_COLL = MY_DB.dup_test_coll

doc_list = [
    {
        "_id": "82aced0eeab2467c93d04a9f72bf91e1",
        "name": "shakeel"
    },
    {
        "_id": "82aced0eeab2467c93d04a9f72bf91e1",  # duplicate error: 11000
        "name": "shakeel"
    },
    {
        "_id": "fab9816677774ca6ab6d86fc7b40dc62",  # this new doc gets inserted
        "name": "abc"
    }
]

try:
    # inserts new documents even on error
    TEST_COLL.insert_many(doc_list, ordered=False, bypass_document_validation=True)
except errors.BulkWriteError as e:
    print(f"Articles bulk insertion error {e}")
    panic_list = list(filter(lambda x: x['code'] != 11000, e.details['writeErrors']))
    if len(panic_list) > 0:
        print(f"these are not duplicate errors {panic_list}")
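To see why it is ordered=False, not bypass_document_validation, that lets the third document through, here is a small in-memory model of the two modes. It is a simplification, assuming a unique _id index is the only constraint; simulate_insert_many is a hypothetical name and nothing here talks to MongoDB:

```python
DUPLICATE_KEY_ERROR = 11000


def simulate_insert_many(existing_ids, docs, ordered=True):
    """Model how insert_many treats duplicate _id values (no server involved).

    Returns (inserted_ids, write_errors). With ordered=True the batch stops at
    the first error; with ordered=False every document is attempted.
    """
    seen = set(existing_ids)
    inserted_ids, write_errors = [], []
    for index, doc in enumerate(docs):
        if doc["_id"] in seen:
            write_errors.append({"index": index, "code": DUPLICATE_KEY_ERROR, "op": doc})
            if ordered:
                break  # ordered bulk writes abort the remaining operations
            continue
        seen.add(doc["_id"])
        inserted_ids.append(doc["_id"])
    return inserted_ids, write_errors


docs = [{"_id": "a"}, {"_id": "a"}, {"_id": "b"}]

print(simulate_insert_many(set(), docs, ordered=True)[0])   # ['a']
print(simulate_insert_many(set(), docs, ordered=False)[0])  # ['a', 'b']
```

In both modes the duplicate at index 1 is reported; the difference is only whether "b" is attempted afterwards.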
And since we are talking about duplicates, it's worth checking this solution as well.