A tutorial for getting started with Apache Airflow by Serps aipong Navanuraksa
In this repository, I will show you how to get started with Apache Airflow. I will use Docker to run Airflow and PostgreSQL, and I will also show you how to create a simple DAG and run it.
Apache Airflow is an open-source platform for authoring, scheduling, and monitoring data and computing workflows. First developed by Airbnb, it is now maintained under the Apache Software Foundation. Workflows are written in Python, so they can be scheduled and monitored easily. Airflow can run anything; it is completely agnostic to what you are running.
Benefits of Apache Airflow include:
- Ease of use: you only need a little Python knowledge to get started.
- Open-source community: Airflow is free and has a large community of active users.
- Integrations: ready-to-use operators allow you to integrate Airflow with cloud platforms (Google, AWS, Azure, etc.).
- Coding with standard Python: you can create flexible workflows using Python with no knowledge of additional technologies or frameworks.
- Graphical UI: monitor and manage workflows, and check the status of ongoing and completed tasks.
First, you should understand the architecture of Apache Airflow.
Apache Airflow consists of 3 main components.
- Web Server
- Scheduler
- Worker
The Airflow platform lets you build and run workflows, which are represented as Directed Acyclic Graphs (DAGs). A sample DAG is shown in the diagram below.
A DAG contains Tasks (action items) and specifies the dependencies between them and the order in which they are executed. A Scheduler handles scheduled workflows and submits Tasks to the Executor, which runs them. The Executor pushes tasks to workers.
Other typical components of an Airflow architecture include a database to store state metadata, a web server used to inspect and debug Tasks and DAGs, and a folder containing the DAG files.
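To make the idea of task dependencies concrete, here is a minimal sketch that uses only Python's standard library (`graphlib`, Python 3.9+) rather than Airflow itself. The task names and the `execution_order` helper are hypothetical and chosen only for illustration; this is not how Airflow's Scheduler is implemented, just a toy model of the ordering rule a DAG imposes.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task names. Each key maps to the set of upstream tasks
# that must finish before that task may run.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"extract"},
    "load": {"clean", "aggregate"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after its upstream tasks."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the dependency graph has no cycle, i.e. it is a valid DAG."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


order = execution_order(dependencies)
print(order)  # "extract" comes first and "load" comes last
```

The two middle tasks do not depend on each other, so either may come first. That is also why a real executor can hand such tasks to different workers at the same time. A graph with a cycle (for example, two tasks that each wait for the other) has no valid order, which is why workflows must be acyclic.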
To get started with Apache Airflow, I will use Docker to run it.
I will use Docker Compose to run Airflow and PostgreSQL. You can find the docker-compose.yaml file at this link, or in this repository at script/docker-compose.yaml, which is the newest version as of 2024-01-15.
After downloading the docker-compose.yaml file with curl, run the following command to start Airflow and PostgreSQL.
    docker-compose up -d

All of the services should then start as follows.
You can access the web UI at the following URL.
Enter airflow as both the username and the password.
Note:
- You can see the example DAG in the web UI.
- You can turn off the example DAGs by editing the docker-compose.yaml file as follows.
    airflow:
      image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.2}
      environment:
        - AIRFLOW__CORE__LOAD_EXAMPLES=False

If you want to stop the services, run the following command.

    docker-compose down

These are the known issues that I found when running Airflow with Docker Compose (I will try to fix them and add more issues later).





