This project integrates FastAPI, dbt-core, dbt-spark, the Spark Thrift Server, the AWS Glue Catalog, and Amazon EMR so that dbt models can read data from Iceberg tables. It provides:
- FastAPI endpoints to trigger dbt runs
- dbt models reading from Iceberg tables via Spark Thrift Server
- AWS Glue Catalog as the metastore
- EMR cluster for Spark execution
You will need:
- An AWS account with a Glue Catalog and an S3 bucket
- An EMR cluster with Spark, the Spark Thrift Server, and Iceberg enabled
- An Iceberg table registered in the Glue Catalog
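With those pieces in place, dbt connects to the Thrift Server through a `profiles.yml` entry using dbt-spark's `thrift` method. A minimal sketch, assuming a profile named `spark_iceberg` (the profile name, target name, and retry settings here are placeholders you should adapt):

```yaml
# Hypothetical profiles.yml for dbt-spark over the Spark Thrift Server.
# Profile/target names and retry values are placeholders.
spark_iceberg:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      host: "{{ env_var('SPARK_HOST') }}"
      port: "{{ env_var('SPARK_PORT', '10000') | as_number }}"
      user: "{{ env_var('SPARK_USER') }}"
      schema: "{{ env_var('SPARK_SCHEMA') }}"
      connect_retries: 3
      connect_timeout: 30
```

Reading the values through `env_var` keeps cluster details out of version control and matches the environment variables listed below.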
Set the following environment variables for dbt:
- `SPARK_HOST`: EMR master node DNS or Thrift Server host
- `SPARK_PORT`: Thrift Server port (default: 10000)
- `SPARK_USER`: your username
- `SPARK_SCHEMA`: target schema/database
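For example, these might be exported in the shell before starting the app (all values below are placeholders; substitute your own EMR cluster details):

```shell
# Placeholder values -- replace with your cluster's host, user, and schema.
export SPARK_HOST=ec2-xx-xx-xx-xx.compute-1.amazonaws.com
export SPARK_PORT=10000
export SPARK_USER=hadoop
export SPARK_SCHEMA=default
```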
```
pip install -r requirements.txt
uvicorn app.main:app --reload
```

Use the `/dbt/run-iceberg` endpoint to run the Iceberg model.
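The endpoint can be as thin as a wrapper around the dbt CLI. A minimal sketch, assuming the model is named `sample_iceberg_model` and `dbt` is on the PATH (the helper names here are our own, not part of the project):

```python
import subprocess

def build_dbt_command(model: str) -> list:
    # dbt selects a single model with --select
    return ["dbt", "run", "--select", model]

def run_dbt_model(model: str) -> dict:
    """Run one dbt model and return a small status payload."""
    result = subprocess.run(
        build_dbt_command(model), capture_output=True, text=True
    )
    return {
        "success": result.returncode == 0,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }

# In the FastAPI app this would be exposed roughly as:
#
#   @app.post("/dbt/run-iceberg")
#   def run_iceberg():
#       return run_dbt_model("sample_iceberg_model")
```

Returning stdout/stderr in the response makes dbt failures visible to the API caller instead of being lost in server logs.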
See `models/sample_iceberg_model.sql` for a sample model reading from an Iceberg table.
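Such a model typically just selects from the Iceberg table through the Glue-backed catalog. A sketch, where the catalog, database, and table names are placeholders for your own:

```sql
-- Hypothetical dbt model; catalog/database/table names are placeholders.
{{ config(materialized='table') }}

select *
from glue_catalog.my_db.my_iceberg_table
```

Because the Thrift Server session is already configured for Iceberg on EMR, the model needs no Iceberg-specific syntax beyond the fully qualified table name.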
- Update `profiles.yml` and the dbt model with your actual Glue Catalog, S3 bucket, and Iceberg table names.
- Ensure network access from FastAPI/dbt to the Spark Thrift Server.
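One quick way to confirm that network access is a simple TCP probe of the Thrift Server port from the FastAPI host before invoking dbt. A small sketch (the helper name is our own; host and port would come from the environment variables above):

```python
import socket

def thrift_server_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failure, connection refused, and timeouts.
        return False
```

If this returns False, check EMR security groups and VPC routing before debugging dbt itself.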