This project provisions end-to-end pipeline lineage across Airbyte, Airflow, dbt, and BigQuery, with DataHub as the data catalog and lineage platform. It also ensures that sibling relationships are not duplicated (e.g., the Airbyte destination table for a given source resolves to the same entity as the dbt source table).
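As a rough illustration of the "same entity" idea, the sketch below builds a DataHub dataset URN for a BigQuery table and compares the URN expected from the Airbyte destination with the one expected from the dbt source. The helper `bigquery_urn` and the project, dataset, and table names are hypothetical, and the sketch assumes both tools are configured to point at the same BigQuery table in the same environment; it is not taken from this repository's ingestion recipes.

```shell
# Build a DataHub dataset URN for a BigQuery table.
# Arguments: project, dataset, table, optional environment (defaults to PROD).
bigquery_urn() {
  printf 'urn:li:dataset:(urn:li:dataPlatform:bigquery,%s.%s.%s,%s)\n' \
    "$1" "$2" "$3" "${4:-PROD}"
}

# Hypothetical names: the table Airbyte writes to and the table dbt declares as a source.
airbyte_destination_urn="$(bigquery_urn my-project raw customers)"
dbt_source_urn="$(bigquery_urn my-project raw customers)"

# If the two URNs differ (different project, dataset, casing, or environment),
# DataHub would treat them as separate entities and lineage would split.
if [ "$airbyte_destination_urn" = "$dbt_source_urn" ]; then
  echo "same entity: $airbyte_destination_urn"
else
  echo "mismatch: $airbyte_destination_urn vs $dbt_source_urn"
fi
```

A mismatch here usually points at a naming or environment difference between the two ingestion configurations rather than at DataHub itself.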
- Spin up DataHub

      docker compose -f datahub/compose.yaml up -d

- Spin up Airflow

      docker compose -f airflow/compose.yaml up --build --force-recreate -d

- Spin up Airbyte with abctl

      brew tap airbytehq/tap
      brew install abctl
      abctl local install

- Fetch Airbyte credentials

      abctl local credentials

- Build the dbt-bigquery Docker Image

      docker build -t dbt-bigquery:latest dbt/ --no-cache

- Build the datahub-ingest Docker Image

      docker build -t datahub-ingest:latest datahub/ --no-cache

- Terraform
Follow the instructions in the terraform folder for how to run and apply the configuration.
Refer to each component's project folder for how to start it individually.
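Before running the steps above, it may help to confirm that the command-line tools they rely on are available. The following is a minimal sketch: the helper `missing_tools` is hypothetical, and the tool list (docker, brew, terraform) is inferred from the commands in this README rather than from a documented requirements list.

```shell
# Print the name of every tool in the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed prerequisites; abctl is installed later through brew.
missing="$(missing_tools docker brew terraform)"

if [ -n "$missing" ]; then
  echo "Missing required tools:"
  echo "$missing"
else
  echo "All required tools found."
fi
```

The check only reports what is missing and does not exit, so it can be sourced or pasted into an interactive shell without closing the session.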