A Snowflake-powered end-to-end machine learning pipeline for financial signal generation and prediction.
Hackathon Runner-Up in Snowflake's Dev Premier League!
This project was developed as part of the Snowflake Hackathon and secured a Runner-Up position by building a complete data ingestion → transformation → prediction → visualization pipeline.
It leverages Snowflake's Data Cloud, Snowpark, and Python ML libraries to extract meaningful trading signals from financial and news data.
The goal: generate predictive buy/sell signals by combining market price movements and news sentiment analysis, all within a scalable Snowflake architecture.
- Runner-Up: Secured Runner-Up place in Snowflake's Dev Premier League Hackathon for innovative ML pipeline design.
- End-to-End Solution: Fully automated pipeline from data ingestion to visualization, deployed on Snowflake.
- Real-World Impact: Demonstrated predictive trading signals with explainability and backtesting.
| Feature | Description |
|---|---|
| Data Ingestion | Automated collection of stock prices via Alpha Vantage API and financial news via News API |
| Feature Engineering | Technical indicators (EMA, RSI, MACD, Volatility) + sentiment features (polarity, subjectivity, entity signals, recency) |
| Explainability | SHAP-based model explanations & feature importance for predictions and sentiment analysis |
| Backtesting | (Under Development) Event-driven engine with PnL, Sharpe ratio, and drawdown analysis |
| Orchestration | Snowflake Tasks & Streams (Airflow-compatible) for end-to-end pipeline automation (ingestion → transform → training → scoring → backtests) |
| ML Pipeline | XGBoost + Gradient Regressor with Snowpark; AI_CLASSIFY for sentiment labeling and an annotated dataset maintained for live news to improve and validate classifiers. |
| Interactive Dashboard | Streamlit app with signal explorer, explainability charts, and performance metrics |
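
The technical indicators named in the feature table can be made concrete with a short sketch. The version below is plain Python with no dependencies. The function names, the default spans (12/26/9 for MACD, 14 for RSI), and the simple-average form of RSI are illustrative assumptions, not the project's actual feature code, which runs on Snowflake SQL and Snowpark.

```python
from typing import List, Tuple


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]  # seed the average with the first observation
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes: List[float], fast: int = 12, slow: int = 26,
         signal: int = 9) -> Tuple[List[float], List[float]]:
    """Return (MACD line, signal line) for a series of closing prices."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    line = [f - s for f, s in zip(fast_ema, slow_ema)]
    return line, ema(line, signal)


def rsi(closes: List[float], period: int = 14) -> float:
    """RSI over the last `period` price changes, using simple averages.

    Assumption: plain averages of gains and losses, not Wilder's smoothing.
    """
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    window = closes[-(period + 1):]
    changes = [b - a for a, b in zip(window[:-1], window[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

In a pandas-based pipeline the same EMA is usually written as `close.ewm(span=span, adjust=False).mean()`, which follows the same recursion.
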
| Layer | Technology | Purpose |
|---|---|---|
| Data Storage | Snowflake (SIGNAL_EXTRACTION_DB) | Centralized data warehouse |
| Compute | Snowflake Warehouse (COMPUTE_WH) | Scalable compute for ETL + ML |
| Ingestion | Python, REST APIs | Pulls stock + news data |
| Transformation | Snowflake SQL, Snowpark | Data cleaning and feature creation |
| ML | Python (XGBoost, Pandas), Snowflake Libs | Model training & prediction |
| Visualization | Streamlit | Application & Reporting layer |
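
The backtesting engine is listed as under development, so its implementation is not shown here. As a rough illustration of the metrics it is meant to report (PnL as an equity curve, Sharpe ratio, drawdown), here is a minimal standard-library sketch. The function names, the 252-period annualization, and the zero default risk-free rate are assumptions made for this example, not project settings.

```python
import math
from statistics import mean, stdev
from typing import List


def equity_curve(returns: List[float], start: float = 1.0) -> List[float]:
    """Compound per-period returns into an equity curve, including the start value."""
    curve = [start]
    for r in returns:
        curve.append(curve[-1] * (1 + r))
    return curve


def sharpe_ratio(returns: List[float], periods_per_year: int = 252,
                 risk_free: float = 0.0) -> float:
    """Annualized Sharpe ratio; needs at least two returns for the sample stdev."""
    excess = [r - risk_free / periods_per_year for r in returns]
    sd = stdev(excess)
    if sd == 0:
        return 0.0  # flat returns: the ratio is undefined, report 0 by convention
    return mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(curve: List[float]) -> float:
    """Largest peak-to-trough decline as a negative fraction (0.0 if none)."""
    peak = curve[0]
    worst = 0.0
    for value in curve:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1)
    return worst
```

A call such as `max_drawdown(equity_curve(daily_returns))` would give the worst decline for a list of per-period strategy returns, where `daily_returns` is a hypothetical input.
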
Repository Structure
signal-extraction-ml-pipeline
├── src/
│   ├── 1_ingestion/
│   │   ├── 1_ingest_market_api.ipynb
│   │   ├── 1_ingest_news_api.ipynb
│   │   ├── market_config.json
│   │   └── news_config.json
│   ├── 2_transformation_and_feature_engineering/
│   │   └── 1_transformation_and_feature_engineering_market_data.sql
│   ├── 3_ml/
│   │   ├── 1_analyze_news_data.ipynb
│   │   ├── 1_predict_market_data.ipynb
│   │   ├── environment.yml
│   │   └── market_config.json
│   ├── 4_frontend/
│   │   ├── streamlit_app.py
│   │   ├── environment.yml
│   │   └── market_config.json
│   ├── infra/
│   ├── docs/
│   ├── env/ # Virtual environment (auto-generated)
│   └── utils/
├── requirements.txt
├── README.md
└── LICENSE
Visualizations
Steps
- Python 3.9+
- Snowflake account with appropriate roles
- Alpha Vantage & News API keys
- Snowflake Python Connector installed
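
With the connector installed, the `snowflake` block of `market_config.json` maps directly onto keyword arguments of `snowflake.connector.connect`. The sketch below shows one way to load and validate that block. The helper names and the `SNOWFLAKE_PASSWORD` environment override are hypothetical, added so the password need not live in the JSON file. The notebooks may load their configuration differently.

```python
import json
import os
from pathlib import Path

# Keys expected in the "snowflake" block of market_config.json / news_config.json.
REQUIRED_KEYS = ("account", "user", "password", "role",
                 "warehouse", "database", "schema")


def load_snowflake_params(config_path):
    """Read the "snowflake" block and return it as connection keyword arguments."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    params = dict(config["snowflake"])
    # Hypothetical override: prefer an environment variable for the password.
    params["password"] = os.environ.get("SNOWFLAKE_PASSWORD") or params.get("password")
    missing = [key for key in REQUIRED_KEYS if not params.get(key)]
    if missing:
        raise ValueError(f"missing Snowflake settings: {', '.join(missing)}")
    return params


def connect(config_path):
    """Open a connection; the connector is imported lazily so loading works without it."""
    import snowflake.connector
    return snowflake.connector.connect(**load_snowflake_params(config_path))
```
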
git clone https://github.com/<your-username>/signal-extraction-ml-pipeline.git
cd signal-extraction-ml-pipeline
pip install -r requirements.txt
Edit market_config.json & news_config.json:
{
  "...": "...",
  "snowflake": {
    "account": "XXXXXX",
    "user": "YOUR_USERNAME",
    "password": "YOUR_PASSWORD",
    "role": "ACCOUNTADMIN",
    "warehouse": "COMPUTE_WH",
    "database": "SIGNAL_EXTRACTION_DB",
    "schema": "<CODE>"
  }
}
# Infra Setup
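snowsql -f infra/{code}
# Ingestion Execution
# Notebooks cannot be run with `python` directly; one option is to execute them with Jupyter's nbconvert
jupyter nbconvert --to notebook --execute src/1_ingestion/1_ingest_market_api.ipynb
jupyter nbconvert --to notebook --execute src/1_ingestion/1_ingest_news_api.ipynb
# Transformation & Feature Generation Execution
snowsql -f src/2_transformation_and_feature_engineering/1_transformation_and_feature_engineering_market_data.sql
jupyter nbconvert --to notebook --execute src/2_transformation_and_feature_engineering/1_generate_news_articles_features.ipynb
# ML Execution
jupyter nbconvert --to notebook --execute src/3_ml/1_analyze_news_data.ipynb
jupyter nbconvert --to notebook --execute src/3_ml/1_predict_market_data.ipynb
jupyter nbconvert --to notebook --execute src/3_ml/2_backtest_market_data.ipynb
# Visualization Application Setup
streamlit run src/4_frontend/streamlit_app.py

- Streamlit Dashboard/Application (30 Oct 2025): View Live Application here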
- Pitch deck / PPT (30 Oct 2025): View PPT
- Demo video submission record (5 Oct 2025): Watch here
- Idea Submission PPT (5 Oct 2025): View PPT
Aravind Suresh
Data Engineer @ GE Aerospace | ML & Cloud Enthusiast
Abirami Sadasivam
SDE @ VISA | ML & Cloud Enthusiast
Sidhanth LS
Data Scientist @ Freshworks
Connect with the team on LinkedIn for collaborations!
This project is licensed under the MIT License.
Special thanks to:
- Snowflake for its developer ecosystem
- Alpha Vantage and News API for financial data sources
- Hackathon Mentors and collaborators for their support (Snowflake - The Dev Premier League)
If you like this project, give it a star on GitHub. Your support keeps it growing!









