How to run a celery worker with Django app scalable by AWS Elastic Beanstalk?


This is how I set up celery with Django on Elastic Beanstalk, with scalability working fine.

Please keep in mind that the 'leader_only' option for container_commands works only on environment rebuild or deployment of the app. If the service runs long enough, the leader node may be removed by Elastic Beanstalk. To deal with that, you may have to apply instance protection to your leader node. Check: http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html#instance-protection-instance
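
For example, you can pin the current leader by hand with scale-in protection from the AWS CLI. A minimal sketch; the instance ID and Auto Scaling group name below are placeholders you would look up for your own environment:

# Protect the leader instance from scale-in.
# i-0123456789abcdef0 and awseb-my-env-asg are hypothetical values;
# find the real ones in the EC2 / Auto Scaling console.
aws autoscaling set-instance-protection \
  --instance-ids i-0123456789abcdef0 \
  --auto-scaling-group-name awseb-my-env-asg \
  --protected-from-scale-in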

Add a bash script for celery worker and beat configuration.

Add file root_folder/.ebextensions/files/celery_configuration.txt:

#!/usr/bin/env bash

# Get django environment variables
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g' | sed 's/%/%%/g'`
celeryenv=${celeryenv%?}

# Create celery configuration script
celeryconf="[program:celeryd-worker]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A django_app --loglevel=INFO

directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-worker.log
stderr_logfile=/var/log/celery-worker.log
autostart=true
autorestart=true
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600

; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true

; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998

environment=$celeryenv

[program:celeryd-beat]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery beat -A django_app --loglevel=INFO --workdir=/tmp -S django --pidfile /tmp/celerybeat.pid

directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-beat.log
stderr_logfile=/var/log/celery-beat.log
autostart=true
autorestart=true
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600

; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true

; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998

environment=$celeryenv"

# Create the celery supervisord conf script
echo "$celeryconf" | tee /opt/python/etc/celery.conf

# Add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
  then
  echo "[include]" | tee -a /opt/python/etc/supervisord.conf
  echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
fi

# Reread the supervisord config
supervisorctl -c /opt/python/etc/supervisord.conf reread

# Update supervisord in cache without restarting all services
supervisorctl -c /opt/python/etc/supervisord.conf update

# Start/Restart celeryd through supervisord
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker

Take care of script execution during deployment, but only on the main node (leader_only: true). Add file root_folder/.ebextensions/02-python.config:

container_commands:
  04_celery_tasks:
    command: "cat .ebextensions/files/celery_configuration.txt > /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true
  05_celery_tasks_run:
    command: "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true
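
After a deployment you can confirm that supervisord picked both programs up by connecting to the leader instance. This check is not part of the original setup, just a quick sanity test:

eb ssh
# on the instance: celeryd-worker and celeryd-beat should report RUNNING
supervisorctl -c /opt/python/etc/supervisord.conf status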

File requirements.txt:

celery==4.0.0
django_celery_beat==1.0.1
django_celery_results==1.0.1
pycurl==7.43.0 --global-option="--with-nss"

Configure celery for the Amazon SQS broker (get your desired endpoint from the list: http://docs.aws.amazon.com/general/latest/gr/rande.html) in root_folder/django_app/settings.py:

...
CELERY_RESULT_BACKEND = 'django-db'
CELERY_BROKER_URL = 'sqs://%s:%s@' % (aws_access_key_id, aws_secret_access_key)

# Due to an error in the lib, the N. Virginia region is used temporarily;
# please set it to Ireland "eu-west-1" after the fix.
CELERY_BROKER_TRANSPORT_OPTIONS = {
    "region": "eu-west-1",
    'queue_name_prefix': 'django_app-%s-' % os.environ.get('APP_ENV', 'dev'),
    'visibility_timeout': 360,
    'polling_interval': 1
}
...
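
The aws_access_key_id and aws_secret_access_key variables used above are assumed to be defined earlier in settings.py. One common approach (an assumption on my part, not shown in the original answer) is to read them from environment variables:

import os

# Hypothetical convention: credentials come from the environment,
# e.g. set via `eb setenv`. Keys containing special characters may
# need URL-quoting before being embedded in the broker URL.
aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID', '')
aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY', '')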

Celery configuration for the django_app Django app

Add file root_folder/django_app/celery.py:

from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'django_app.settings')
app = Celery('django_app')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()

Modify file root_folder/django_app/__init__.py:

from __future__ import absolute_import, unicode_literals

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from django_app.celery import app as celery_app

__all__ = ['celery_app']
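
With that wiring in place, autodiscover_tasks() will find a tasks.py module in any installed app. A minimal sketch to test the setup; the app name some_app and the add task are illustrative, not part of the original answer:

# root_folder/some_app/tasks.py
from __future__ import absolute_import, unicode_literals

from celery import shared_task

@shared_task
def add(x, y):
    # Trivial example task; replace with real work.
    return x + y

Calling add.delay(2, 3) from a view or a Django shell then pushes a message onto the SQS queue for the worker to pick up.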

This is how I extended the answer by @smentek to allow for multiple worker instances and a single beat instance - the same thing applies where you have to protect your leader (I don't have an automated solution for that yet).

Please note that environment variable updates made via the EB CLI or the web interface are not reflected by celery beat or the workers until an app server restart has taken place. This caught me off guard once.
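
In practice that means after something like "eb setenv SOME_VAR=new_value" (SOME_VAR being a placeholder), the celery processes keep running with the old environment. One way to refresh them, assuming the hook script described below is in place, is to re-run it over ssh so it regenerates the supervisord config from the fresh environment, then restart the programs:

eb ssh
# on the instance: regenerate the supervisord conf files from the new env
sudo /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh
# then restart the programs so they pick up the new variables
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker celeryd-beat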

A single celery_configuration.sh file outputs two scripts for supervisord; note that celery-beat has autostart=false, otherwise you end up with many beats after an instance restart:

# get django environment variables
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g' | sed 's/%/%%/g'`
celeryenv=${celeryenv%?}

# create celery beat config script
celerybeatconf="[program:celeryd-beat]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery beat -A lexvoco --loglevel=INFO --workdir=/tmp -S django --pidfile /tmp/celerybeat.pid

directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-beat.log
stderr_logfile=/var/log/celery-beat.log
autostart=false
autorestart=true
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 10

; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true

; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998

environment=$celeryenv"

# create celery worker config script
celeryworkerconf="[program:celeryd-worker]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A lexvoco --loglevel=INFO

directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-worker.log
stderr_logfile=/var/log/celery-worker.log
autostart=true
autorestart=true
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600

; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true

; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=999

environment=$celeryenv"

# create files for the scripts
echo "$celerybeatconf" | tee /opt/python/etc/celerybeat.conf
echo "$celeryworkerconf" | tee /opt/python/etc/celeryworker.conf

# add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
  then
  echo "[include]" | tee -a /opt/python/etc/supervisord.conf
  echo "files: celerybeat.conf celeryworker.conf" | tee -a /opt/python/etc/supervisord.conf
fi

# reread the supervisord config
/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf reread

# update supervisord in cache without restarting all services
/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf update

Then in container_commands we only restart beat on the leader:

container_commands:
  # create the celery configuration file
  01_create_celery_beat_configuration_file:
    command: "cat .ebextensions/files/celery_configuration.sh > /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && sed -i 's/\r$//' /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
  # restart celery beat if leader
  02_start_celery_beat:
    command: "/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat"
    leader_only: true
  # restart celery worker
  03_start_celery_worker:
    command: "/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker"


If someone is following smentek's answer and getting the error:

05_celery_tasks_run: /usr/bin/env bash does not exist.

know that, if you are working on Windows, the problem might be that the celery_configuration.txt file has Windows line endings (CRLF) when it should have Unix line endings (LF). If using Notepad++, open the file and click "Edit > EOL Conversion > Unix (LF)". Save, redeploy, and the error is gone.
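
If you don't use Notepad++, the same conversion can be done from a shell; this mirrors the sed call already used in the container_commands above:

# strip Windows CR characters so the file uses Unix (LF) line endings
sed -i 's/\r$//' .ebextensions/files/celery_configuration.txt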

Also, a couple of warnings for real amateurs like me:

  • Be sure to include "django_celery_beat" and "django_celery_results" in INSTALLED_APPS in your settings.py file (see the snippet after this list).

  • To check celery errors, connect to your instance with "eb ssh" and then run "tail -n 40 /var/log/celery-worker.log" and "tail -n 40 /var/log/celery-beat.log" (where "40" is the number of lines you want to read from the end of the file).
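
For the first warning, the relevant settings.py fragment looks like this (the surrounding apps are illustrative):

INSTALLED_APPS = [
    # ... your other Django and project apps ...
    'django_celery_beat',
    'django_celery_results',
]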

Hope this helps someone; it would've saved me some hours!