
Periodic tasks in Django on Elastic Beanstalk (possibly with celery beat)


If you're using redis as your broker, look into installing RedBeat as the celery beat scheduler: https://github.com/sibson/redbeat

This scheduler uses locking in redis to make sure only a single beat instance is running. With this you can enable beat on each node's worker process and remove the use of leader_only=True.

celery worker -B -S redbeat.RedBeatScheduler

Let's say you have Worker A with beat lock and Worker B. If Worker A dies, Worker B will attempt to acquire the beat lock after a configurable amount of time.
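As a rough sketch of the relevant settings (the Redis URLs and the 300-second timeout are assumptions, not requirements), the configuration might look like this:

# celeryconfig.py -- a minimal sketch; the Redis URLs here are assumptions
broker_url = "redis://localhost:6379/0"

# Tell beat to use RedBeat's Redis-backed, lock-aware scheduler
beat_scheduler = "redbeat.RedBeatScheduler"

# Where RedBeat stores its schedule and lock (defaults to broker_url if unset)
redbeat_redis_url = "redis://localhost:6379/1"

# How long the beat lock lives; a surviving worker can take over roughly
# this many seconds after the lock holder dies
redbeat_lock_timeout = 300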


I would suggest making a management command that runs with cron.

Using this method, you have the full Django ORM, all your methods, etc. to work with. By wrapping your script in a try/except, you can log failures in any way you wish: email notifications, external logging systems like Sentry, straight to the DB, etc.
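A bare-bones sketch of such a command (the command name and query are hypothetical placeholders):

# app/management/commands/process_queue.py -- a hypothetical sketch
import logging

from django.core.management.base import BaseCommand

logger = logging.getLogger(__name__)


class Command(BaseCommand):
    help = "Process pending work; safe to run from cron"

    def handle(self, *args, **options):
        try:
            # Full ORM access here, e.g. Item.objects.filter(processed=False)
            pass
        except Exception:
            # Route failures anywhere your logging config points:
            # email to admins, Sentry, the DB, etc.
            logger.exception("Periodic job failed")
            raise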

I use supervisord to run cron and it works well. It relies on time-tested tools that won't let you down.

Finally, using a database singleton to keep track of whether a batch job has been run or is currently running, in an environment where you have multiple load-balanced Django instances, isn't bad practice, even if you feel a little icky about it. The DB is a very reliable means of telling you whether a job is being processed, as sketched below.
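For example (a sketch assuming a JobLock model you define yourself, with a unique CharField name and a BooleanField running):

# A sketch of a DB-backed "singleton" lock; JobLock is a hypothetical model
from django.db import transaction

from myapp.models import JobLock


def try_acquire(name):
    """Return True if this instance won the right to run the job."""
    with transaction.atomic():
        # select_for_update serializes concurrent attempts across instances
        lock, _ = JobLock.objects.select_for_update().get_or_create(name=name)
        if lock.running:
            return False
        lock.running = True
        lock.save(update_fields=["running"])
        return True

The job itself is then responsible for clearing running when it finishes (ideally in a finally block), so a crashed run doesn't block the next one forever.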

The one annoying thing about cron is that it doesn't import environment variables you may need for Django. I solved this with a simple Python script.

It writes the crontab on startup with the needed environment variables included. This example is for Ubuntu on Elastic Beanstalk but should be broadly applicable.

#!/usr/bin/env python
# run-cron.py
# sets environment variable crontab fragments and runs cron

import os
from subprocess import call

from master.settings import IS_AWS

# read django's needed environment variables and set them in the appropriate crontab fragment
eRDS_HOSTNAME = os.environ["RDS_HOSTNAME"]
eRDS_DB_NAME = os.environ["RDS_DB_NAME"]
eRDS_PASSWORD = os.environ["RDS_PASSWORD"]
eRDS_USERNAME = os.environ["RDS_USERNAME"]
try:
    eAWS_STAGING = os.environ["AWS_STAGING"]
except KeyError:
    eAWS_STAGING = None
try:
    eAWS_PRODUCTION = os.environ["AWS_PRODUCTION"]
except KeyError:
    eAWS_PRODUCTION = None
eRDS_PORT = os.environ["RDS_PORT"]

if IS_AWS:
    fto = '/etc/cron.d/stortrac-cron'
else:
    fto = 'test_cron_file'

with open(fto, 'w+') as file:
    file.write('# Auto-generated cron tab that imports needed variables and runs a python script')
    file.write('\nRDS_HOSTNAME=')
    file.write(eRDS_HOSTNAME)
    file.write('\nRDS_DB_NAME=')
    file.write(eRDS_DB_NAME)
    file.write('\nRDS_PASSWORD=')
    file.write(eRDS_PASSWORD)
    file.write('\nRDS_USERNAME=')
    file.write(eRDS_USERNAME)
    file.write('\nRDS_PORT=')
    file.write(eRDS_PORT)
    if eAWS_STAGING is not None:
        file.write('\nAWS_STAGING=')
        file.write(eAWS_STAGING)
    if eAWS_PRODUCTION is not None:
        file.write('\nAWS_PRODUCTION=')
        file.write(eAWS_PRODUCTION)
    file.write('\n')
    # Process queue of gobs
    file.write('\n*/8 * * * * root python /code/app/manage.py queue --process-queue')
    # Every 5 minutes, double-check thing is done
    file.write('\n*/5 * * * * root python /code/app/manage.py thing --done')
    # Every 4 hours, do this
    file.write('\n8 */4 * * * root python /code/app/manage.py process_this')
    # etc.
    file.write('\n3 */4 * * * root python /code/app/manage.py etc --silent')
    file.write('\n\n')

if IS_AWS:
    args = ["cron", "-f"]
    call(args)

And in supervisord.conf:

[program:cron]
command = python /my/directory/runcron.py
autostart = true
autorestart = false