How to write a Pandas Dataframe to existing Django model


To answer my own question (I now import data into Django with pandas quite often): my mistake was using pandas' built-in SQLAlchemy-based database support, which was modifying the underlying database table definition. In the context above, you can simply use the Django ORM to connect and insert the data:

    import pandas as pd
    from django.core.management.base import BaseCommand

    from myapp.models import Agency


    class Command(BaseCommand):
        def handle(self, *args, **options):
            # Process data with pandas
            agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})

            # Iterate over the DataFrame rows and create one object per row
            for row in agencies.itertuples():
                Agency.objects.create(name=row.name)

However, you may often want to import data from an external script rather than from a management command (as above) or Django's shell. In that case you must first initialise the Django ORM by calling django.setup():

    import os
    import sys

    import django
    import pandas as pd

    sys.path.append('../..')  # add path to project root dir
    os.environ["DJANGO_SETTINGS_MODULE"] = "myproject.settings"

    # for more sophisticated setups, if you need to change connection settings (e.g. when using django-environ):
    # os.environ["DATABASE_URL"] = "postgres://myuser:mypassword@localhost:54324/mydb"

    # Connect to Django ORM
    django.setup()

    # process data
    from myapp.models import Agency
    Agency.objects.create(name='MyAgency')

  • Here I have assigned my settings module, myproject.settings, to the DJANGO_SETTINGS_MODULE environment variable so that django.setup() can pick up the project settings.

  • Depending on where you run the script from, you may need to add the project root to the system path so Django can find the settings module. In this case, I run my script two directories below my project root.

  • You can modify any settings before calling setup, for example if your script needs to connect to the DB differently from what's configured in settings, such as when running a script locally against Django/Postgres Docker containers.

Note that the commented-out DATABASE_URL line in the example above assumes django-environ is used to specify the DB settings.
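The path and environment steps described in the bullets can be wrapped in a small helper. The following is only a sketch: the function names prepare_django_env and connect are hypothetical, and the DATABASE_URL handling assumes your settings module actually reads that variable (for example through django-environ). Resolving the project root to an absolute path avoids depending on the directory the script is started from.

```python
import os
import sys
from pathlib import Path


def prepare_django_env(settings_module, project_root, database_url=None):
    """Set sys.path and the environment variables that django.setup() relies on.

    Hypothetical helper: all arguments are placeholders for your own project.
    """
    root = str(Path(project_root).resolve())
    # Make the project package importable regardless of the current directory.
    if root not in sys.path:
        sys.path.insert(0, root)
    os.environ["DJANGO_SETTINGS_MODULE"] = settings_module
    # Only has an effect if settings.py reads DATABASE_URL (assumption).
    if database_url is not None:
        os.environ["DATABASE_URL"] = database_url
    return root


def connect(settings_module, project_root, database_url=None):
    """Prepare the environment, then initialise Django."""
    prepare_django_env(settings_module, project_root, database_url)
    # Imported here so prepare_django_env stays usable on its own.
    import django

    django.setup()
```

For a script that lives two directories below the project root, a call might look like connect("myproject.settings", Path(__file__).resolve().parents[2]); model imports such as from myapp.models import Agency must come after that call.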


For those looking for a more performant and up-to-date solution, I would suggest instantiating the Django model instances without saving them, and then passing them all to the manager's bulk_create method.

    model_instances = [Agency(name=agency.name) for agency in agencies.itertuples()]
    Agency.objects.bulk_create(model_instances)

Note that bulk_create does not call the model's save() method or send the pre_save and post_save signals, so any custom saving logic or signal hooks on the Agency model will not be triggered. The full list of caveats is in the documentation linked below.

Documentation: https://docs.djangoproject.com/en/3.0/ref/models/querysets/#bulk-create
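When a DataFrame has several columns, it can be convenient to build the unsaved instances from row dictionaries instead of naming each field by hand. The sketch below is one possible way to do that; clean_record and build_instances are hypothetical helpers, and it assumes the DataFrame column names match the model field names exactly. It also replaces float NaN values (pandas' usual marker for missing data) with None, which is an assumption about how you want missing values stored.

```python
import math


def clean_record(record):
    """Return a copy of one row dict with float NaN replaced by None."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }


def build_instances(model_cls, records):
    """Instantiate (but do not save) one model object per row dict."""
    return [model_cls(**clean_record(record)) for record in records]
```

With the models from this answer, usage might look like instances = build_instances(Agency, agencies.to_dict("records")) followed by Agency.objects.bulk_create(instances, batch_size=500). The batch size of 500 is an arbitrary illustration, not a recommendation.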