This project demonstrates an end-to-end modern data engineering pipeline built using AWS, Snowflake, and dbt (Data Build Tool). It focuses on transforming raw Airbnb data into clean, analytics-ready datasets using a layered architecture (Bronze → Silver → Gold).
- ☁️ AWS – Cloud storage for raw data (Amazon S3)
- ❄️ Snowflake – Cloud data warehouse for scalable analytics
- 🔧 dbt (Data Build Tool) – Data transformation and modeling
- 🐍 Python – Environment and dependency management
- 🔁 Git & GitHub – Version control
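The dbt-to-Snowflake connection is configured in `profiles.yml` (listed in the project structure). A minimal sketch, with placeholder account, role, warehouse, and database names:

```yaml
# profiles.yml (illustrative sketch — all names below are placeholders)
airbnb_snowflake_dbt_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: <your_account>
      user: <your_user>
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"  # keep secrets out of the file
      role: TRANSFORM_ROLE
      database: AIRBNB
      warehouse: COMPUTE_WH
      schema: DEV
      threads: 4
```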
The project follows a Medallion Architecture:

Bronze Layer
- Raw ingestion from source systems
- Minimal transformations
- Tables: bronze_bookings, bronze_hosts, bronze_listings

Silver Layer
- Data cleaning and standardization
- Handling nulls, casting data types, and basic transformations
- Tables: silver_bookings, silver_hosts, silver_listings

Gold Layer
- Business-level aggregations and analytics-ready models
- Fact and dimension tables
- Tables: fact, obt
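To illustrate the layer separation, a Silver model might look like the sketch below (the column names are assumptions, not the project's actual schema):

```sql
-- models/silver/silver_bookings.sql (illustrative sketch)
{{ config(materialized='table') }}

with source as (
    select * from {{ ref('bronze_bookings') }}
)

select
    booking_id,                                  -- assumed column names
    cast(booking_date as date) as booking_date,  -- standardize data types
    coalesce(guest_count, 1) as guest_count,     -- handle nulls
    lower(trim(status)) as booking_status        -- basic cleanup
from source
where booking_id is not null
```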
- Raw Airbnb data is stored in AWS (S3)
- Data is loaded into Snowflake staging tables
- dbt transforms data through Bronze → Silver → Gold layers
- Final models are ready for BI tools and analytics
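The S3-to-Snowflake load step can be done with an external stage and `COPY INTO`; a hedged sketch in which the bucket, storage integration, and staging table names are placeholders:

```sql
-- Illustrative only: bucket, integration, and table names are placeholders
create or replace stage raw_airbnb_stage
  url = 's3://<your-bucket>/airbnb/'
  storage_integration = s3_int;  -- assumes a storage integration exists

copy into raw.bookings           -- hypothetical staging table
from @raw_airbnb_stage/bookings/
file_format = (type = csv skip_header = 1);
```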
- Modular dbt models with clear layer separation
- Reusable macros for transformations
- Source definitions and testing
- Snapshotting for historical tracking
- Scalable cloud-based architecture
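Historical tracking is done with dbt snapshot blocks; a minimal sketch (the unique key and updated-at column are assumptions):

```sql
-- snapshots/hosts_snapshot.sql (illustrative sketch)
{% snapshot hosts_snapshot %}
{{
    config(
      target_schema='snapshots',
      unique_key='host_id',      -- assumed key
      strategy='timestamp',
      updated_at='updated_at'    -- assumed audit column
    )
}}
select * from {{ ref('silver_hosts') }}
{% endsnapshot %}
```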
airbnb_snowflake_dbt_project/
│
├── models/
│   ├── bronze/
│   ├── silver/
│   ├── gold/
│   └── sources/
│
├── macros/
├── snapshots/
├── tests/
├── dbt_project.yml
└── profiles.yml
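Layer-level materializations can be declared once in `dbt_project.yml` instead of per model; a sketch assuming the model folders above:

```yaml
# dbt_project.yml (excerpt, illustrative)
models:
  airbnb_snowflake_dbt_project:
    bronze:
      +materialized: view   # cheap, always reflects staging
    silver:
      +materialized: table
    gold:
      +materialized: table
```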
dbt debug # Check connection
dbt compile # Compile models
dbt run # Run transformations
dbt test     # Run tests

This project showcases how to:
- Build scalable data pipelines
- Transform raw data into insights
- Apply best practices in modern data engineering

Future enhancements:
- Add orchestration with Apache Airflow (e.g., Amazon MWAA)
- Integrate BI tools (Power BI / Tableau)
- Implement CI/CD for dbt pipelines
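For the CI/CD item, a GitHub Actions workflow could run dbt on every pull request; a minimal sketch (the secret, target, and file names are assumptions):

```yaml
# .github/workflows/dbt_ci.yml (illustrative sketch)
name: dbt CI
on: pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-snowflake
      - env:
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}  # assumed repo secret
        run: |
          dbt deps
          dbt build --target ci   # assumed CI target in profiles.yml
```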