
How to call a REST end point using Airflow DAG


You need to consider both the Operator you are using and the underlying Hook it uses to connect. The Hook fetches connection information from an Airflow Connection, which is just a container for credentials and other connection details. You can configure Connections in the Airflow UI (Admin -> Connections).

So in this case, you need to first configure your HTTP Connection.

From the http_hook documentation:

http_conn_id (str) – connection that has the base API url i.e https://www.google.com/

It so happens that for the HttpHook, you should configure the Connection by setting its host field to the base URL of your endpoint: http://localhost:8084/.

Since your operator uses the default http_conn_id, the hook will use the Airflow Connection called "http_default" in the Airflow UI. If you don't want to change the default one, you can create another Airflow Connection using the Airflow UI and pass the new conn_id to your operator's http_conn_id argument.
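
As a rough illustration (not part of the original answer), this is roughly what happens under the hood once the Connection is configured; the import path is the pre-2.0 one, and the conn_id and endpoint are assumptions based on your setup:

```python
from airflow.hooks.http_hook import HttpHook

# Minimal sketch: the hook looks up the "http_default" Connection and joins its
# host (http://localhost:8084/) with the relative endpoint passed to run().
hook = HttpHook(method="GET", http_conn_id="http_default")
response = hook.run(endpoint="api/employees")
print(response.text)
```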

See the source code to get a better idea how the Connection object is used.

Lastly, according to the http_operator documentation:

endpoint (str) – The relative part of the full url. (templated)

You should pass only the relative part of your URL to the operator; the rest it gets from the underlying http_hook.

In this case, the value of endpoint for your Operator should be api/employees (not the full URL).
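Putting it together, a task along these lines should work. This is a sketch: the DAG id, start date, and schedule are placeholders, and the import path is the pre-Airflow-2.0 one (in newer versions SimpleHttpOperator lives in the apache-airflow-providers-http package):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.http_operator import SimpleHttpOperator

with DAG(dag_id="call_employees_api",
         start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:
    get_employees = SimpleHttpOperator(
        task_id="get_employees",
        http_conn_id="http_default",   # Connection whose Host is http://localhost:8084/
        endpoint="api/employees",      # only the relative part of the URL
        method="GET",
    )
```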

The Airflow project documentation is unfortunately not very clear in this case. Please consider contributing an improvement, they are always welcome :)


I think you need to set the connection string as an environment variable in your Dockerfile or in your docker run command:

ENV AIRFLOW__CORE__SQL_ALCHEMY_CONN my_conn_string

see this and this

Connections

The connection information to external systems is stored in the Airflow metadata database and managed in the UI (Menu -> Admin -> Connections). A conn_id is defined there, and hostname / login / password / schema information is attached to it. Airflow pipelines can simply refer to the centrally managed conn_id without having to hard code any of this information anywhere.

Many connections with the same conn_id can be defined and when that is the case, and when the hooks use the get_connection method from BaseHook, Airflow will choose one connection randomly, allowing for some basic load balancing and fault tolerance when used in conjunction with retries.

Airflow also has the ability to reference connections via environment variables from the operating system. The environment variable needs to be prefixed with AIRFLOW_CONN_ to be considered a connection. When referencing the connection in the Airflow pipeline, the conn_id should be the name of the variable without the prefix. For example, if the conn_id is named POSTGRES_MASTER the environment variable should be named AIRFLOW_CONN_POSTGRES_MASTER. Airflow assumes the value returned from the environment variable to be in a URI format (e.g. postgres://user:password@localhost:5432/master).
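
For example, to expose the HTTP connection from the first answer through an environment variable instead of the UI, something along these lines should work; the connection name employees_api is made up for illustration, and in practice you would set the variable in your Dockerfile or shell rather than in Python:

```python
import os

# Hypothetical example: Airflow picks this up as a Connection named "employees_api"
# because of the AIRFLOW_CONN_ prefix; the value must be a URI, whose scheme, host,
# and port determine the connection type and base URL.
os.environ["AIRFLOW_CONN_EMPLOYEES_API"] = "http://localhost:8084/"

# A task could then reference it via http_conn_id="employees_api".
```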

see this

Therefore, you are currently using the default:

Using connection to: id: http_default. Host: https://www.google.com/