How optimize adding new nodes in `django-mptt`? How optimize adding new nodes in `django-mptt`? database database

How optimize adding new nodes in `django-mptt`?


Firstly, don't use insert_at. It's not the reason for slow performance, but it's unnecessary and looks ugly. Just set node.parent:

for new_record in new_records:    new_node = MyMPTTModel(..., parent=get_parent(new_record))    new_node.save()

Now for the performance question. If you're using the latest mptt (git master, not 0.5.4), there's a context manager called delay_mptt_updates to prevent mptt from doing a lot of these updates until you've added all the nodes:

with transaction.atomic():    with MyMPTTModel.objects.delay_mptt_updates():        for new_record in new_records:            new_node = MyMPTTModel(..., parent=get_parent(new_record))            new_node.save()

Alternatively if you're touching almost the entire tree, you can speed things up even more by using disable_mptt_updates and rebuild the whole tree at the end:

with transaction.atomic():    with MyMPTTModel.objects.disable_mptt_updates():        for new_record in new_records:            new_node = MyMPTTModel(..., parent=get_parent(new_record))            new_node.save()    MyMPTTModel.objects.rebuild()


Django-MPTT maintains a tree structure for you. So at each insert_at, it will modify all the nodes to the right of the inserted one - which is why you are experiencing performance issues.

One way is to manually construct the tree structure without django-mptt.

So you will have to take the new records, and according to them, figure out how old nodes in the tree have to be modified. Since you are only inserting data, only left and right attributes will change, but not level, so that should make it a little easier. Once you will know what nodes to modify, you can then modify them using one update transaction (edit).

Then, you can start to insert new data. Again, the fastest way is to calculate the left, right, and level values for each new entry and then do one bulk_insert (Django>=1.4). Doing this will result in only two db operations which obviously should be much faster in terms of db transactions.

This method however will require some smart way to figure out how to change old nodes in the tree. The easiest way is to dump all of the tree into python structure, and then figure out the changes on that structure. That however will not be feasible if your tree is very large due to memory limitation.

Right now not sure if there is any more efficient way of doing this. Maybe somebody else on StackOverflow has some cool ideas...

EDIT

Sorry about the update confusion. I meant one transaction. In cases like this I usually do raw sql query where I do update tbname set ... where id=1; update tbname set ... where id=2; So I do multiple sql statements in one sql query. From my experience, the expensive part of the db is not execution of a statement, but the transaction itself since there is network latency, db locks, etc. So having one transaction allows the db to be fast as possible. Not sure however how to do that in django using querysets. I usually do raw sql query.