Periodic tasks in Django on Elastic Beanstalk (possibly with celery beat)
If you're using redis as your broker, look into installing RedBeat as the celery beat scheduler: https://github.com/sibson/redbeat
This scheduler uses a lock in Redis to ensure that only a single beat instance is running. With it you can enable beat on each node's worker process and drop the use of leader_only=True:

celery worker -B -S redbeat.RedBeatScheduler
Let's say you have Worker A with beat lock and Worker B. If Worker A dies, Worker B will attempt to acquire the beat lock after a configurable amount of time.
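The handover time is governed by RedBeat's lock timeout. A minimal Celery configuration sketch is below; the setting names come from the RedBeat project, and the Redis URLs and timeout value are placeholder assumptions you would adapt to your environment.

```python
# Celery configuration sketch for RedBeat (new-style lowercase settings).
broker_url = "redis://localhost:6379/0"  # assumed broker location

# RedBeat stores the schedule and the beat lock in Redis.
redbeat_redis_url = "redis://localhost:6379/1"  # assumed; defaults to broker_url

# How long a dead beat's lock lingers before another node can acquire it.
# Defaults to 5 * beat_max_loop_interval; lower it for faster failover.
redbeat_lock_timeout = 120  # seconds, illustrative value
```

With this in place, the scenario above plays out automatically: when Worker A stops renewing the lock, Worker B acquires it after at most redbeat_lock_timeout seconds.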
I would suggest making a management command that runs with cron.
Using this method, you have the full Django ORM, all of your models and methods, etc. to work with. By wrapping the job in a try/except, you can log failures any way you wish: email notifications, external logging systems like Sentry, straight to the DB, and so on.
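The wrapping pattern can be sketched independently of Django; inside a real management command this body would live in handle(). The function and parameter names here (run_batch_job, on_failure) are illustrative, not Django APIs.

```python
import logging

logger = logging.getLogger(__name__)

def run_batch_job(job, on_failure=None):
    """Run `job` (any callable), reporting failures instead of letting
    cron swallow them. `on_failure` is an optional hook for email,
    Sentry, a DB audit row, etc."""
    try:
        return job()
    except Exception as exc:  # broad on purpose: we want to report anything
        logger.exception("batch job failed")
        if on_failure is not None:
            on_failure(exc)
        return None
```

A usage example: run_batch_job(my_import_task, on_failure=lambda exc: notify_admins(exc)) runs the task and calls your hook only when it raises.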
I use supervisord to run cron and it works well. It relies on time-tested tools that won't let you down.
Finally, using a database singleton to track whether a batch job has been run or is currently running, in an environment with multiple load-balanced Django instances, isn't bad practice, even if you feel a little icky about it. The DB is a very reliable means of telling you whether the job is being processed.
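The singleton idea boils down to one atomic "flip the flag only if it is clear" update. Here is a minimal sketch using stdlib sqlite3 rather than the Django ORM (with the ORM you'd express the same UPDATE via a queryset .update()); the table and column names are illustrative.

```python
import sqlite3

def try_acquire(conn, job_name):
    """Atomically mark `job_name` as running; return False if another
    instance already holds it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_lock "
        "(name TEXT PRIMARY KEY, running INTEGER NOT NULL)"
    )
    with conn:  # one transaction: the UPDATE flips 0 -> 1 or matches nothing
        conn.execute(
            "INSERT OR IGNORE INTO job_lock (name, running) VALUES (?, 0)",
            (job_name,),
        )
        cur = conn.execute(
            "UPDATE job_lock SET running = 1 WHERE name = ? AND running = 0",
            (job_name,),
        )
        return cur.rowcount == 1  # exactly one instance wins

def release(conn, job_name):
    with conn:
        conn.execute(
            "UPDATE job_lock SET running = 0 WHERE name = ?", (job_name,)
        )
```

The guarded UPDATE is the key: because the database serializes it, two instances racing on the same job name can never both see running = 0.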
The one annoying thing about cron is that it doesn't inherit the environment variables Django may need. I solved this with a simple Python script that writes the crontab at startup with the needed environment variables included. This example is for Ubuntu on Elastic Beanstalk but should carry over to similar setups.
#!/usr/bin/env python
# run-cron.py
# Writes a crontab fragment with the environment variables Django needs,
# then runs cron in the foreground.
import os
from subprocess import call

from master.settings import IS_AWS

# Read Django's required environment variables.
eRDS_HOSTNAME = os.environ["RDS_HOSTNAME"]
eRDS_DB_NAME = os.environ["RDS_DB_NAME"]
eRDS_PASSWORD = os.environ["RDS_PASSWORD"]
eRDS_USERNAME = os.environ["RDS_USERNAME"]
eRDS_PORT = os.environ["RDS_PORT"]
try:
    eAWS_STAGING = os.environ["AWS_STAGING"]
except KeyError:
    eAWS_STAGING = None
try:
    eAWS_PRODUCTION = os.environ["AWS_PRODUCTION"]
except KeyError:
    eAWS_PRODUCTION = None

if IS_AWS:
    fto = '/etc/cron.d/stortrac-cron'
else:
    fto = 'test_cron_file'

with open(fto, 'w+') as file:
    file.write('# Auto-generated crontab that imports needed variables and runs python scripts\n')
    file.write('RDS_HOSTNAME=' + eRDS_HOSTNAME + '\n')
    file.write('RDS_DB_NAME=' + eRDS_DB_NAME + '\n')
    file.write('RDS_PASSWORD=' + eRDS_PASSWORD + '\n')
    file.write('RDS_USERNAME=' + eRDS_USERNAME + '\n')
    file.write('RDS_PORT=' + eRDS_PORT + '\n')
    if eAWS_STAGING is not None:
        file.write('AWS_STAGING=' + eAWS_STAGING + '\n')
    if eAWS_PRODUCTION is not None:
        file.write('AWS_PRODUCTION=' + eAWS_PRODUCTION + '\n')
    file.write('\n')
    # Process queue of jobs
    file.write('*/8 * * * * root python /code/app/manage.py queue --process-queue\n')
    # Every 5 minutes, double-check thing is done
    file.write('*/5 * * * * root python /code/app/manage.py thing --done\n')
    # Every 4 hours, do this
    file.write('8 */4 * * * root python /code/app/manage.py process_this\n')
    # etc.
    file.write('3 */4 * * * root python /code/app/manage.py etc --silent\n')
    file.write('\n')

if IS_AWS:
    call(["cron", "-f"])
And in supervisord.conf:
[program:cron]
command = python /my/directory/runcron.py
autostart = true
autorestart = false