Airbnb Airflow using all system resources
I have also tried everything I could to get the CPU usage down and Matthew Housley's advice regarding MIN_FILE_PROCESS_INTERVAL was what did the trick.
At least until airflow 1.10 came around... then the CPU usage went through the roof again.
So here is everything I had to do to get Airflow to work well on a standard DigitalOcean droplet with 2 GB of RAM and 1 vCPU:
1. Scheduler File Processing
Prevent airflow from reloading the dags all the time and set:
AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL=60
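Airflow treats an environment variable named AIRFLOW__{SECTION}__{KEY} as an override for the matching option in airflow.cfg. As a minimal sketch of that naming convention (the helper env_to_cfg is hypothetical and not part of Airflow), this shows which option the variable above ends up setting:

```python
def env_to_cfg(name):
    """Split an AIRFLOW__SECTION__KEY variable name into (section, key).

    Hypothetical helper for illustration only; it mirrors the naming
    convention and is not an Airflow API.
    """
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        raise ValueError("not an Airflow config override: %r" % name)
    # The section and key are separated by a double underscore.
    section, sep, key = name[len(prefix):].partition("__")
    if not sep or not section or not key:
        raise ValueError("expected AIRFLOW__SECTION__KEY, got %r" % name)
    return section.lower(), key.lower()


print(env_to_cfg("AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL"))
# ('scheduler', 'min_file_process_interval')
```

In other words, the variable is equivalent to setting min_file_process_interval = 60 under [scheduler] in airflow.cfg.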
2. Fix airflow 1.10 scheduler bug
The AIRFLOW-2895 bug in airflow 1.10 causes high CPU load because the scheduler keeps looping without a break.
It's already fixed in master and will hopefully be included in airflow 1.10.1, but it could take weeks or months until it's released. In the meantime, this patch solves the issue:
--- jobs.py.orig    2018-09-08 15:55:03.448834310 +0000
+++ jobs.py 2018-09-08 15:57:02.847751035 +0000
@@ -564,6 +564,7 @@
 
         self.num_runs = num_runs
         self.run_duration = run_duration
+        self._processor_poll_interval = 1.0
 
         self.do_pickle = do_pickle
         super(SchedulerJob, self).__init__(*args, **kwargs)
@@ -1724,6 +1725,8 @@
             loop_end_time = time.time()
             self.log.debug("Ran scheduling loop in %.2f seconds",
                            loop_end_time - loop_start_time)
+            self.log.debug("Sleeping for %.2f seconds", self._processor_poll_interval)
+            time.sleep(self._processor_poll_interval)
 
             # Exit early for a test mode
             if processor_manager.max_runs_reached():
Apply it with:
patch -d /usr/local/lib/python3.6/site-packages/airflow/ < af_1.10_high_cpu.patch
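The idea behind the patch is simply to pause at the end of every scheduling pass. A simplified stand-in for that loop, assuming nothing about Airflow's internals (run_loop, do_work and the injectable sleep parameter are all hypothetical names), could look like this:

```python
import time


def run_loop(do_work, max_runs, poll_interval=1.0, sleep=time.sleep):
    """Call do_work() max_runs times, pausing poll_interval seconds after each pass.

    Simplified illustration of a throttled polling loop, not Airflow code.
    Returns the total number of seconds requested from sleep().
    """
    slept = 0.0
    for _ in range(max_runs):
        do_work()
        # Without this pause the loop would spin as fast as the CPU allows.
        sleep(poll_interval)
        slept += poll_interval
    return slept


if __name__ == "__main__":
    calls = []
    run_loop(lambda: calls.append(1), max_runs=3, poll_interval=0.01)
    print(len(calls))  # 3
```

With a poll interval of 1.0, as in the patch, the loop body runs at most about once per second instead of continuously.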
3. RBAC webserver high CPU load
If you upgraded to use the new RBAC webserver UI, you may also notice that the webserver is using a lot of CPU persistently.
For some reason the RBAC interface uses a lot of CPU on startup. On a low-powered server, this can cause very slow webserver startup and permanently high CPU usage.
I have documented this bug as AIRFLOW-3037. To solve it you can adjust the config:
AIRFLOW__WEBSERVER__WORKERS=2 # 2 * NUM_CPU_CORES + 1
AIRFLOW__WEBSERVER__WORKER_REFRESH_INTERVAL=1800 # Restart workers every 30min instead of 30 seconds
AIRFLOW__WEBSERVER__WEB_SERVER_WORKER_TIMEOUT=300 # Kill workers if they don't start within 5min instead of 2min
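The comment on the first setting quotes the rule of thumb 2 * NUM_CPU_CORES + 1. A small sketch of that arithmetic follows (suggested_workers is a hypothetical helper, not an Airflow or gunicorn function). Note that for a single core the formula gives 3, so the value of 2 used above is a bit more conservative than the rule itself:

```python
import os


def suggested_workers(cpu_cores=None):
    """Return 2 * cores + 1, the rule of thumb quoted in the config comment."""
    if cpu_cores is None:
        # os.cpu_count() can return None if the count cannot be determined.
        cpu_cores = os.cpu_count() or 1
    return 2 * cpu_cores + 1


print(suggested_workers(1))  # 3
```

On a small machine it is reasonable to treat the result as an upper bound rather than a target, since every worker costs memory and startup CPU.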
With all of these tweaks, my Airflow uses only a few percent of CPU when idle on a standard DigitalOcean droplet with 1 vCPU and 2 GB of RAM.
I just ran into an issue like this. Airflow was consuming roughly a full vCPU in a t2.xlarge instance, with the vast majority of this coming from the scheduler container. Checking the scheduler logs, I could see that it was processing my single DAG more than once a second even though it only runs once a day.
I found that MIN_FILE_PROCESS_INTERVAL was set to the default value of 0, so the scheduler was looping over the DAG. I changed the process interval to 65 seconds, and Airflow now uses less than 10 percent of a vCPU in a t2.medium instance.
Try changing the settings below in airflow.cfg:
# after how much time a new DAGs should be picked up from the filesystem
min_file_process_interval = 0
# How many seconds to wait between file-parsing loops to prevent the logs from being spammed.
min_file_parsing_loop_time = 1
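To check what a given airflow.cfg currently uses, something like the following sketch can help. It assumes both options live in the [scheduler] section; SAMPLE_CFG and scheduler_interval are made-up names for illustration, and in practice you would read the text from your own airflow.cfg:

```python
import configparser

# Stand-in for the relevant part of airflow.cfg (assumed [scheduler] section).
SAMPLE_CFG = """
[scheduler]
min_file_process_interval = 0
min_file_parsing_loop_time = 1
"""


def scheduler_interval(cfg_text):
    """Return min_file_process_interval from airflow.cfg text as an int."""
    # interpolation=None so literal '%' characters elsewhere in the file
    # are not treated as configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.getint("scheduler", "min_file_process_interval")


interval = scheduler_interval(SAMPLE_CFG)
if interval == 0:
    print("min_file_process_interval is 0: DAG files are re-parsed continuously")
```

If the value comes back as 0, raising it (the answers above use 60 or 65 seconds) is the first thing to try.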