EMR Cluster Creation using an Airflow DAG run; once the task is done, the EMR cluster will be terminated



Absolutely, that would be the most efficient use of resources. A word of warning: there are a lot of details involved; I'll try to list enough of them to get you going. I encourage you to add your own comprehensive answer listing any problems you encountered and their workarounds once you are through this.


Regarding cluster creation / termination
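As a starting point, here is a minimal sketch of the job-flow definition you would hand to Airflow's `EmrCreateJobFlowOperator` (via `job_flow_overrides`) or to boto3's `emr.run_job_flow(**...)`. The bucket, key names, and instance types are placeholders; substitute your own. The important detail for a transient cluster is `KeepJobFlowAliveWhenNoSteps` together with an explicit terminate task at the end of the DAG.

```python
# Minimal EMR job-flow definition for a transient cluster.
# All names (log bucket, instance types, release label) are placeholders.
JOB_FLOW_OVERRIDES = {
    "Name": "airflow-transient-emr",
    "ReleaseLabel": "emr-6.10.0",
    "Applications": [{"Name": "Spark"}, {"Name": "Hadoop"}],
    "LogUri": "s3://my-bucket/emr-logs/",  # placeholder bucket
    "Instances": {
        "InstanceGroups": [
            {
                "Name": "Primary node",
                "Market": "ON_DEMAND",
                "InstanceRole": "MASTER",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 1,
            },
            {
                "Name": "Core nodes",
                "Market": "ON_DEMAND",
                "InstanceRole": "CORE",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 2,
            },
        ],
        # Keep the cluster alive between steps; the DAG's final task
        # terminates it explicitly once all work is done.
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
    },
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ServiceRole": "EMR_DefaultRole",
}
```

Alternatively, you can drop `KeepJobFlowAliveWhenNoSteps` (i.e. set it to `False`) and let EMR auto-terminate after the last step, at the cost of losing the explicit terminate task in the DAG.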


Regarding job submission
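For job submission, the usual pattern is to run `spark-submit` through EMR's `command-runner.jar` as a cluster step, in the shape expected by `EmrAddStepsOperator` (`steps=...`) or boto3's `emr.add_job_flow_steps(Steps=...)`. The S3 paths below are placeholders for your own script and data.

```python
# A Spark job submitted as an EMR step via command-runner.jar.
# Script and data paths are placeholders.
SPARK_STEPS = [
    {
        "Name": "spark-job-against-s3-data",
        # Fail fast: on a transient cluster there is no point keeping
        # the hardware running after a failed step.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                "s3://my-bucket/jobs/etl_job.py",    # placeholder script
                "--input", "s3://my-bucket/input/",  # placeholder data path
                "--output", "s3://my-bucket/output/",
            ],
        },
    }
]
```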



Check my implementation: the DAG will create an EMR cluster, run a Spark job against the data in S3, and terminate the cluster automatically once done.

https://beyondexperiment.com/vijayravichandran06/aws-emr-orchestrate-with-airflow/


The best way to do this is probably to have a node at the root of your Airflow DAG that creates the EMR cluster, and then another node at the very end of the DAG that spins the cluster down after all of the other nodes have completed.
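That shape can be sketched as follows, assuming the `apache-airflow-providers-amazon` package and a recent Airflow 2.x; the DAG id, config dicts, and S3 paths are illustrative placeholders, not a drop-in implementation. The create task sits at the root, the terminate task at the end with `trigger_rule="all_done"` so the cluster is torn down even when a step fails.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import (
    EmrAddStepsOperator,
    EmrCreateJobFlowOperator,
    EmrTerminateJobFlowOperator,
)
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

# Placeholder cluster config: one primary node, kept alive until the
# DAG's final task terminates it.
JOB_FLOW_OVERRIDES = {
    "Name": "airflow-transient-emr",
    "ReleaseLabel": "emr-6.10.0",
    "Applications": [{"Name": "Spark"}],
    "Instances": {
        "InstanceGroups": [
            {
                "Name": "Primary node",
                "Market": "ON_DEMAND",
                "InstanceRole": "MASTER",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 1,
            },
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
    },
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ServiceRole": "EMR_DefaultRole",
}

# Placeholder Spark step run through command-runner.jar.
SPARK_STEPS = [
    {
        "Name": "spark-job-against-s3-data",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     "s3://my-bucket/jobs/etl_job.py"],
        },
    }
]

with DAG(
    dag_id="transient_emr_spark",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Root of the DAG: create the cluster.
    create_cluster = EmrCreateJobFlowOperator(
        task_id="create_cluster",
        job_flow_overrides=JOB_FLOW_OVERRIDES,
    )

    # Submit the Spark step to the cluster just created.
    add_step = EmrAddStepsOperator(
        task_id="add_step",
        job_flow_id=create_cluster.output,
        steps=SPARK_STEPS,
    )

    # Block until the step has finished (or failed).
    wait_for_step = EmrStepSensor(
        task_id="wait_for_step",
        job_flow_id=create_cluster.output,
        step_id="{{ ti.xcom_pull(task_ids='add_step')[0] }}",
    )

    # End of the DAG: terminate even if the step failed, so the
    # cluster never lingers.
    terminate_cluster = EmrTerminateJobFlowOperator(
        task_id="terminate_cluster",
        job_flow_id=create_cluster.output,
        trigger_rule="all_done",
    )

    create_cluster >> add_step >> wait_for_step >> terminate_cluster
```

The DAG definition above is effectively configuration and needs a running Airflow environment with AWS credentials (an `aws_default` connection) to actually execute.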