Do bulk inserts/update in MongoDB with PyMongo


You get an error because the second and subsequent insert_many calls try to insert documents whose fields conflict with those of documents already in the collection. You correctly inferred that the cause may be your explicit _id values, which then collide with the _id values already stored.

MongoDB automatically creates a unique index on _id, which forbids duplicate values.

On every call after the first one (which inserted the initial version of the documents), you need to update or replace the documents instead. MongoDB does have the concept of an "upsert", which inserts documents that do not yet exist in the collection and updates those that do.
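To illustrate the difference between a plain insert and an upsert, here is a toy in-memory model. It is only a sketch of the semantics, not PyMongo: ToyCollection, DuplicateIdError and the sample documents are hypothetical names made up for this illustration, and the real driver raises its own exception types.

```python
class DuplicateIdError(Exception):
    """Hypothetical stand-in for the error raised on a repeated _id."""


class ToyCollection:
    """Hypothetical in-memory model of a collection with a unique _id index."""

    def __init__(self):
        self._docs = {}  # maps _id -> stored document

    def insert_many(self, docs):
        # A plain insert refuses any _id that is already present.
        for doc in docs:
            if doc["_id"] in self._docs:
                raise DuplicateIdError(f"duplicate _id: {doc['_id']!r}")
            self._docs[doc["_id"]] = dict(doc)

    def replace_one(self, doc_id, replacement, upsert=False):
        # Replace an existing document; with upsert=True, insert it if missing.
        if doc_id in self._docs or upsert:
            self._docs[doc_id] = dict(replacement)

    def find_one(self, doc_id):
        return self._docs.get(doc_id)


coll = ToyCollection()
coll.insert_many([{"_id": 1, "price": 10}])  # first run: works

# Second run with the same _id: a plain insert is rejected.
try:
    coll.insert_many([{"_id": 1, "price": 12}])
    second_insert_failed = False
except DuplicateIdError:
    second_insert_failed = True

# Upserting instead updates _id 1 and inserts the new _id 2.
for doc in [{"_id": 1, "price": 12}, {"_id": 2, "price": 7}]:
    coll.replace_one(doc["_id"], doc, upsert=True)
```

The upsert path is what lets you run the same load repeatedly without caring whether a given _id has been seen before.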

Your options:

  • Most efficient: pymongo.collection.Collection.bulk_write

    import json

    import pymongo

    # Build one upsert per document, matched on its explicit _id
    operations = [
        pymongo.operations.ReplaceOne(
            filter={"_id": doc["_id"]},
            replacement=doc,
            upsert=True,
        )
        for doc in json.loads(dfOut)
    ]
    result = db["test"].bulk_write(operations)
    # handle results

Note that its efficiency also depends on whether the filtered field is indexed in the collection, which incidentally is the case for _id. (Also see pymongo.operations.ReplaceOne.)
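The "# handle results" placeholder in the snippet above could be filled in roughly as follows. This is a minimal sketch: summarize_bulk_result is a hypothetical helper name, and it assumes only the upserted_count, matched_count and modified_count attributes of the BulkWriteResult returned by bulk_write.

```python
def summarize_bulk_result(result):
    """Return a small dict describing what a bulk_write call did.

    `result` is expected to be the BulkWriteResult returned by
    Collection.bulk_write; only three of its counters are read here.
    """
    return {
        # documents that did not exist yet and were inserted by the upsert
        "upserted": result.upserted_count,
        # existing documents that matched a filter
        "matched": result.matched_count,
        # matched documents whose content actually changed
        "modified": result.modified_count,
    }


# Usage with the snippet above (needs a running MongoDB, so left as a comment):
# summary = summarize_bulk_result(db["test"].bulk_write(operations))
# print(summary)
```

On the first run you would expect every document to be counted as upserted; on later runs most should show up as matched instead, with the modified count reflecting how many documents really changed.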

Note: pymongo.collection.Collection.update_many seems unsuitable for your needs since you are not trying to set the same value on all matches of a given filter.