This project is an end-to-end data engineering pipeline that fetches real-time cryptocurrency data (BTC/USDT) from an external API, processes it using Python, stores it in Snowflake, and orchestrates the workflow using Apache Airflow.
The goal of this project is to demonstrate a practical implementation of ETL pipelines, cloud data warehousing, and workflow automation.
Binance API → Python ETL → Snowflake → Airflow Scheduler
- Python
- Snowflake (Cloud Data Warehouse)
- Apache Airflow (Workflow Orchestration)
- Pandas
- Docker
- Fetches historical crypto data from the Binance API
- Uses the REST endpoint (`/api/v3/klines`)
- Supports configurable symbol, interval, and limit
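The ingestion step can be sketched roughly as follows, using `requests` against the public Binance REST API (the function and parameter names here are illustrative, not the project's actual ones):

```python
import requests

BINANCE_BASE_URL = "https://api.binance.com"  # public Binance REST base URL


def build_kline_params(symbol: str = "BTCUSDT", interval: str = "1h",
                       limit: int = 500) -> dict:
    """Assemble query parameters for the /api/v3/klines endpoint."""
    return {"symbol": symbol, "interval": interval, "limit": limit}


def fetch_klines(symbol: str = "BTCUSDT", interval: str = "1h",
                 limit: int = 500) -> list:
    """Fetch candlestick (kline) rows; each row is a list of OHLCV fields."""
    resp = requests.get(
        f"{BINANCE_BASE_URL}/api/v3/klines",
        params=build_kline_params(symbol, interval, limit),
        timeout=10,
    )
    resp.raise_for_status()  # surface HTTP errors instead of silent failures
    return resp.json()
```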
- Converts raw API response into structured format
- Cleans and formats columns
- Converts timestamps into proper datetime format
- Ensures correct data types for analysis
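The transformation step could look like this minimal pandas sketch. It assumes the project keeps the first six OHLCV fields of each Binance kline row; the actual column selection may differ:

```python
import pandas as pd

# First six fields of a Binance kline row (assumed column subset)
KLINE_COLUMNS = ["open_time", "open", "high", "low", "close", "volume"]


def transform_klines(raw: list) -> pd.DataFrame:
    """Turn raw kline rows into a DataFrame with proper dtypes."""
    df = pd.DataFrame([row[:6] for row in raw], columns=KLINE_COLUMNS)
    # Binance timestamps are milliseconds since the Unix epoch
    df["open_time"] = pd.to_datetime(df["open_time"], unit="ms")
    # Price and volume fields arrive as strings; cast them to numeric types
    for col in ["open", "high", "low", "close", "volume"]:
        df[col] = pd.to_numeric(df[col])
    return df
```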
- Loads processed data into Snowflake
- Automatically creates the target table if it does not exist
- Uses efficient bulk loading (`write_pandas`)
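A sketch of the loading step, using the Snowflake Python connector and `write_pandas` for bulk loading. The table name and column layout are assumptions for illustration; credentials come from the environment variables described below:

```python
import os

import pandas as pd

TABLE_NAME = "BTC_KLINES"  # hypothetical table name

CREATE_TABLE_SQL = (
    f"CREATE TABLE IF NOT EXISTS {TABLE_NAME} ("
    "OPEN_TIME TIMESTAMP_NTZ, OPEN FLOAT, HIGH FLOAT, "
    "LOW FLOAT, CLOSE FLOAT, VOLUME FLOAT)"
)


def load_to_snowflake(df: pd.DataFrame) -> int:
    """Create the table if needed, then bulk-load the DataFrame. Returns row count."""
    # Imported here so the rest of the module works without the connector installed
    import snowflake.connector
    from snowflake.connector.pandas_tools import write_pandas

    conn = snowflake.connector.connect(
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
        database=os.environ["SNOWFLAKE_DATABASE"],
        schema=os.environ["SNOWFLAKE_SCHEMA"],
    )
    try:
        conn.cursor().execute(CREATE_TABLE_SQL)
        # write_pandas stages the DataFrame and COPYs it in bulk
        success, _, nrows, _ = write_pandas(conn, df, TABLE_NAME)
        return nrows if success else 0
    finally:
        conn.close()
```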
- Automates the pipeline execution
- Schedules periodic data ingestion
- Manages task dependencies
- Handles failures and retries
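A DAG wiring these steps together might look like the sketch below (Airflow 2.x syntax; the DAG id, schedule, and script paths are assumptions, not the project's actual values):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "retries": 2,                        # retry failed tasks twice
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="crypto_etl",                 # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",         # periodic data ingestion
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = BashOperator(
        task_id="ingest",
        bash_command="python /opt/airflow/src/data_ingestion.py",
    )
    load = BashOperator(
        task_id="load_to_snowflake",
        bash_command="python /opt/airflow/src/load_to_snowflake.py",
    )
    ingest >> load  # load runs only after ingestion succeeds
```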
```
FlowState_Inventory/
│
├── src/
│   ├── data_ingestion.py
│   ├── load_to_snowflake.py
│   └── utils/
│       ├── logger.py
│       └── exception.py
│
├── airflow/
│   ├── dags/
│   └── docker-compose.yaml
│
├── data/
│   └── btc_data.csv
│
├── .env
├── requirements.txt
└── README.md
```
Create a `.env` file in the root directory:

```
SNOWFLAKE_USER=your_username
SNOWFLAKE_PASSWORD=your_password
SNOWFLAKE_ACCOUNT=your_account
SNOWFLAKE_WAREHOUSE=your_warehouse
SNOWFLAKE_DATABASE=your_database
SNOWFLAKE_SCHEMA=your_schema
```
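These variables are typically read at runtime with a library such as `python-dotenv`; for illustration, a minimal hand-rolled parser (assuming simple `KEY=VALUE` lines) might look like:

```python
import os


def load_env(path: str = ".env") -> None:
    """Load KEY=VALUE lines into os.environ, skipping blanks and comments."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault keeps any variable already set in the real environment
            os.environ.setdefault(key.strip(), value.strip())
```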
```bash
git clone <your-repo-url>
cd FlowState_Inventory
python -m venv venv
venv\Scripts\activate   # Windows; use `source venv/bin/activate` on macOS/Linux
pip install -r requirements.txt
```
```bash
python src/data_ingestion.py
python src/load_to_snowflake.py
cd airflow
docker-compose up
```
- Automated crypto data ingestion
- Secure credential management using `.env`
- Scalable cloud data storage
- Workflow scheduling with Airflow
- Modular and production-like code structure
- Add support for multiple cryptocurrencies
- Implement real-time streaming pipeline
- Add data validation layer
- Integrate alerting/monitoring system
Kunal Kumar
Give it a star ⭐ on GitHub!