Do bulk inserts/updates in MongoDB with PyMongo
You get an error because, on the second and subsequent `insert_many` calls, you try to insert documents whose `_id` values conflict with those of documents already in the collection. You correctly inferred that this may be due to your setting `_id` explicitly, which then collides with the existing `_id` values in the collection.
MongoDB automatically creates a unique index on `_id`, which forbids duplicate values.
On every call after the first one (which inserted the initial version of the documents), you need to update or replace your documents instead of inserting them. MongoDB's "upsert" option handles exactly this: it inserts documents that do not yet exist in the collection and updates (or replaces) those that already do.
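To see why a second `insert_many` fails while an upsert does not, here is a pure-Python model with no MongoDB involved: a dict keyed by `_id` stands in for the unique `_id` index. `ToyCollection`, `DuplicateIdError` and `upsert_many` are hypothetical names for illustration only; with real PyMongo, a duplicate `_id` in `insert_many` is reported as a `BulkWriteError`.

```python
class DuplicateIdError(Exception):
    """Hypothetical stand-in for MongoDB's duplicate-key error."""


class ToyCollection:
    """In-memory model: a dict keyed by _id plays the role of the unique index."""

    def __init__(self):
        self.docs = {}

    def insert_many(self, docs):
        for doc in docs:
            # Stops at the first duplicate, as an ordered insert_many does
            if doc["_id"] in self.docs:
                raise DuplicateIdError(doc["_id"])
            self.docs[doc["_id"]] = dict(doc)

    def upsert_many(self, docs):
        for doc in docs:
            # Upsert: replace when the _id exists, insert otherwise
            self.docs[doc["_id"]] = dict(doc)


batch = [{"_id": 1, "price": 10}, {"_id": 2, "price": 20}]
updated = [{"_id": 1, "price": 11}, {"_id": 3, "price": 30}]

coll = ToyCollection()
coll.insert_many(batch)  # first run: no _id exists yet, so this succeeds

second_insert_failed = False
try:
    coll.insert_many(updated)  # second run: _id 1 is already present
except DuplicateIdError:
    second_insert_failed = True

coll.upsert_many(updated)  # replaces _id 1, inserts _id 3, leaves _id 2 alone
```

Running the upsert again with the same data is harmless, which is why it suits repeated imports.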
Your options:
Most efficient: `pymongo.collection.Collection.bulk_write`
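```python
import json

import pymongo

# dfOut is the JSON string from your question; db is your pymongo Database
operations = [
    pymongo.operations.ReplaceOne(
        filter={"_id": doc["_id"]},
        replacement=doc,
        upsert=True,
    )
    for doc in json.loads(dfOut)
]
result = db["test"].bulk_write(operations)
# handle results
```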
Note that its efficiency also depends on whether the filter field is indexed in the collection, which happens to be the case for `_id` (see also `pymongo.operations.ReplaceOne`).
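As one way to fill in the `# handle results` step, the sketch below condenses the `BulkWriteResult` returned by `bulk_write` into a dict, using its `matched_count`, `modified_count`, `upserted_count` and `upserted_ids` attributes (`upserted_ids` maps each operation's index to the `_id` it created). The function name is hypothetical, and a `SimpleNamespace` with the same attribute names stands in for a real result so the snippet runs without a server.

```python
from types import SimpleNamespace


def summarize_bulk_result(result):
    """Condense a BulkWriteResult from an upsert-only bulk_write into a dict."""
    return {
        # Existing documents matched by a ReplaceOne filter
        "matched": result.matched_count,
        # Matched documents whose content actually changed
        "modified": result.modified_count,
        # Documents created because no existing _id matched (upsert=True)
        "upserted": result.upserted_count,
        # upserted_ids maps operation index -> _id; list them in operation order
        "upserted_ids": [result.upserted_ids[i] for i in sorted(result.upserted_ids)],
    }


# Hypothetical stand-in with the same attribute names as BulkWriteResult
fake_result = SimpleNamespace(
    matched_count=2, modified_count=1, upserted_count=1, upserted_ids={2: "c"}
)
summary = summarize_bulk_result(fake_result)
```

With a real call, you would pass the `result` from `db["test"].bulk_write(operations)` instead of `fake_result`.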
Less efficient, because the writes are not sent in bulk: loop over your documents and call `pymongo.collection.Collection.update_one` or `pymongo.collection.Collection.replace_one`.

```python
import json

results = []
for doc in json.loads(dfOut):
    # One separate write command per document
    result = db["test"].replace_one(
        filter={"_id": doc["_id"]},
        replacement=doc,
        upsert=True,
    )
    results.append(result)
# handle results
```
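For the loop variant, each `replace_one` returns an `UpdateResult`, whose `upserted_id` is `None` when an existing document matched the filter. A minimal sketch of handling the collected `results`, assuming you only want to know how many documents were replaced and which `_id`s were newly inserted; the function name is hypothetical and `SimpleNamespace` objects stand in for real results.

```python
from types import SimpleNamespace


def summarize_replace_results(results):
    """Split replace_one(..., upsert=True) results into replaced vs. inserted."""
    # upserted_id is set only when no document matched and one was inserted
    inserted_ids = [r.upserted_id for r in results if r.upserted_id is not None]
    # matched_count is 1 for each call that found an existing document
    replaced = sum(r.matched_count for r in results)
    return {"replaced": replaced, "inserted_ids": inserted_ids}


# Hypothetical stand-ins with the same attribute names as UpdateResult
fake_results = [
    SimpleNamespace(matched_count=1, modified_count=1, upserted_id=None),
    SimpleNamespace(matched_count=0, modified_count=0, upserted_id=7),
]
loop_summary = summarize_replace_results(fake_results)
```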
Note: `pymongo.collection.Collection.update_many` seems unsuitable for your needs, since you are not trying to set the same value on all documents matching a given filter.
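To make the `update_many` point concrete, here is a pure-Python model (no MongoDB) of `update_many(filter, {"$set": ...})`: every document that matches the filter receives the same field values. `model_update_many` and the sample documents are hypothetical. Per-document values, as in your data, would need one filter/update pair per document, which is what `ReplaceOne` inside `bulk_write` already gives you.

```python
def model_update_many(docs, matches, set_fields):
    """Pure-Python model of collection.update_many(filter, {"$set": set_fields})."""
    matched = 0
    for doc in docs:
        if matches(doc):
            # Every matching document receives the *same* field values
            doc.update(set_fields)
            matched += 1
    return matched


docs = [
    {"_id": 1, "region": "eu", "price": 10},
    {"_id": 2, "region": "eu", "price": 20},
    {"_id": 3, "region": "us", "price": 30},
]
# Roughly what db["test"].update_many({"region": "eu"}, {"$set": {"price": 0}}) does
n_matched = model_update_many(docs, lambda d: d["region"] == "eu", {"price": 0})
```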
The batch operation error may be caused by duplicate `_id` values, so delete the documents with the same `_id` that are already in MongoDB before inserting.
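A minimal sketch of the delete-then-insert approach, assuming `collection` is a pymongo `Collection` (it only calls `delete_many` with an `$in` filter and then `insert_many`). `delete_then_insert` is a hypothetical helper name, and `StandInCollection` is a hypothetical in-memory stand-in that supports just those two calls so the snippet runs without a server.

```python
def delete_then_insert(collection, docs):
    """Remove documents whose _id appears in docs, then insert docs."""
    ids = [doc["_id"] for doc in docs]
    # One query removes every existing document that would collide
    collection.delete_many({"_id": {"$in": ids}})
    # No remaining document shares an _id with the batch
    return collection.insert_many(docs)


class StandInCollection:
    """Hypothetical in-memory stand-in supporting only the two calls above."""

    def __init__(self, docs):
        self.docs = {d["_id"]: dict(d) for d in docs}

    def delete_many(self, query):
        for _id in query["_id"]["$in"]:
            self.docs.pop(_id, None)

    def insert_many(self, docs):
        for d in docs:
            if d["_id"] in self.docs:
                raise ValueError(f"duplicate _id {d['_id']}")
            self.docs[d["_id"]] = dict(d)


stand_in = StandInCollection([{"_id": 1, "v": "old"}, {"_id": 2, "v": "keep"}])
delete_then_insert(stand_in, [{"_id": 1, "v": "new"}, {"_id": 3, "v": "added"}])
```

Unlike an upsert, these are two separate operations: between the delete and the insert, other readers can see the documents missing, and if the insert fails the deleted documents are gone.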
Or use `update_many`:
https://api.mongodb.com/python/current/api/pymongo/collection.html?highlight=update#pymongo.collection.Collection.update_many
https://docs.mongodb.com/manual/reference/method/db.collection.updateMany/