airflow_tutorial


A tutorial for getting started with Apache Airflow, by Serps aipong Navanuraksa

In this repository, I will show you how to get started with Apache Airflow. I will use Docker to run Airflow and PostgreSQL, and I will also show you how to create a simple DAG and run it.

Apache Airflow is an open-source platform for authoring, scheduling and monitoring data and computing workflows. First developed by Airbnb, it is now under the Apache Software Foundation. Airflow uses Python to create workflows that can be easily scheduled and monitored. Airflow can run anything—it is completely agnostic to what you are running.

Benefits of Apache Airflow include:

  • Ease of use: you only need a little Python knowledge to get started.
  • Open-source community: Airflow is free and has a large community of active users.
  • Integrations: ready-to-use operators let you integrate Airflow with cloud platforms (Google Cloud, AWS, Azure, and others).
  • Coding with standard Python: you can create flexible workflows using Python with no knowledge of additional technologies or frameworks.
  • Graphical UI: monitor and manage workflows, and check the status of ongoing and completed tasks.

First, you should understand the architecture of Apache Airflow.

Airflow Architecture


airflow_architecture.png

Apache Airflow consists of 3 main components.

  • Web Server
  • Scheduler
  • Worker

The Airflow platform lets you build and run workflows, which are represented as Directed Acyclic Graphs (DAGs). A sample DAG is shown in the diagram below.

DAGs-example.png

A DAG contains Tasks (action items) and specifies the dependencies between them and the order in which they are executed. A Scheduler handles scheduled workflows and submits Tasks to the Executor, which runs them. The Executor pushes tasks to workers.
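The dependency ordering described above can be illustrated without Airflow itself. The following is a minimal sketch using only the Python standard library (`graphlib`, available from Python 3.9). The task names are hypothetical, and this is not how Airflow's Scheduler is implemented; it only shows what "directed acyclic" means for run order.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"extract"},
    "load": {"clean", "aggregate"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the dependency graph has no cycle, i.e. it is a valid DAG."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


order = execution_order(dependencies)
print(order)
print(is_acyclic({"a": {"b"}, "b": {"a"}}))
```

Here `clean` and `aggregate` depend only on `extract`, so either may come first in the printed order; a workflow engine is free to run such tasks in parallel. The second call reports that a graph with a cycle is not a valid DAG.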

Other typical components of an Airflow architecture include a database to store state metadata, a web server used to inspect and debug Tasks and DAGs, and a folder containing the DAG files.

airflow-architecture-2.png

To get started with Apache Airflow, I will run it with Docker.

Docker Compose


I will use Docker Compose to run Airflow and PostgreSQL. You can find the docker-compose.yaml file at this link, or in this repository at script/docker-compose.yaml, which is the newest version as of 2024-01-15.

After downloading the docker-compose.yaml file with curl, run the following command to start Airflow and PostgreSQL.

docker-compose up -d 

All the services should start, as shown below.

docker-compose
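Once the containers are up, one way to confirm that Airflow is ready is to query the web server's `/health` endpoint, which reports the status of the metadata database and the scheduler as JSON. The sketch below assumes the web server is published on `http://localhost:8080`; adjust `BASE_URL` if your port mapping differs. The function names and the sample payload are illustrative only.

```python
import json
from urllib.request import urlopen

# Assumption: the web server is reachable on localhost:8080.
BASE_URL = "http://localhost:8080"


def fetch_health(base_url=BASE_URL, timeout=5):
    """Fetch and decode the JSON document served at <base_url>/health."""
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.load(response)


def unhealthy_components(health):
    """Return the sorted names of components whose status is not 'healthy'."""
    return sorted(
        name for name, info in health.items()
        if info.get("status") != "healthy"
    )


# Illustrative payload in the shape the endpoint returns; values are made up.
sample = {
    "metadatabase": {"status": "healthy"},
    "scheduler": {"status": "unhealthy", "latest_scheduler_heartbeat": None},
}
print(unhealthy_components(sample))
```

Against a running stack, `unhealthy_components(fetch_health())` should return an empty list once everything has started. Note that this sketch treats a component that reports no status as not healthy.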

You can access the web UI at this URL.

apache-airflow-web-ui.png

Log in with the username airflow and the password airflow.

airflow-web-ui-with-example-dag.png

Note:

  • The example DAGs appear in the web UI.
  • You can turn off the example DAGs by editing the docker-compose.yaml file:
  airflow:
    image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.2}
    environment:
      - AIRFLOW__CORE__LOAD_EXAMPLES=False
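The environment entry above follows Airflow's `AIRFLOW__{SECTION}__{KEY}` naming convention: it overrides the `load_examples` option in the `[core]` section of airflow.cfg. As a small sketch, the hypothetical helpers below (not part of Airflow) build and split such names, which can help when you want to override other options the same way.

```python
def airflow_env_var(section, key):
    """Build the environment variable name that overrides [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def split_airflow_env_var(name):
    """Split an AIRFLOW__SECTION__KEY name back into (section, key)."""
    prefix, section, key = name.split("__", 2)
    if prefix != "AIRFLOW":
        raise ValueError(f"not an Airflow config variable: {name}")
    return section.lower(), key.lower()


print(airflow_env_var("core", "load_examples"))
print(split_airflow_env_var("AIRFLOW__CORE__LOAD_EXAMPLES"))
```

Note the double underscores: they separate the prefix, the section, and the key, while single underscores belong to the option name itself.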

To stop the services, run the following command.

docker-compose down

Known Issue


These are the known issues I found when running Airflow with Docker Compose. I will try to fix them and add more issues later.

It's not working well with Poetry

