Zolo-Hallucinators/Signal-Extraction

🚀 Signal Extraction ML Pipeline

A Snowflake-powered end-to-end machine learning pipeline for financial signal generation and prediction.




Signal Extraction ML Pipeline Cover

📖 Overview

🏆 Hackathon Runner-Up in Snowflake's Dev Premier League!

This project was developed for the Snowflake Hackathon, where it secured a Runner-Up position by building a complete data ingestion → transformation → prediction → visualization pipeline.

It leverages Snowflake's Data Cloud, Snowpark, and Python ML libraries to extract meaningful trading signals from financial and news data.

The goal: generate predictive buy/sell signals by combining market price movements and news sentiment analysis, all within a scalable Snowflake architecture.


🏆 Achievements

  • Runner-Up: Secured Runner-Up place in Snowflake's Dev Premier League Hackathon for innovative ML pipeline design.
  • End-to-End Solution: Fully automated pipeline from data ingestion to visualization, deployed on Snowflake.
  • Real-World Impact: Demonstrated predictive trading signals with explainability and backtesting.
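
Backtesting a signal usually comes down to summarizing a series of per-period returns. The sketch below shows one common way to compute an equity curve, an annualized Sharpe ratio, and maximum drawdown. It is an illustration only, not the project's backtesting engine: the function names, the 252-trading-day annualization, and the zero default risk-free rate are all assumptions.

```python
import math
from typing import List


def equity_curve(returns: List[float], start: float = 1.0) -> List[float]:
    """Compound per-period simple returns into an equity curve (hypothetical helper)."""
    curve = [start]
    for r in returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve


def sharpe_ratio(returns: List[float], periods_per_year: int = 252,
                 risk_free: float = 0.0) -> float:
    """Annualized Sharpe ratio from per-period returns.

    Assumes `risk_free` is an annual rate and 252 periods per year (daily bars).
    Returns 0.0 when there are fewer than two returns or no variation.
    """
    if len(returns) < 2:
        return 0.0
    excess = [r - risk_free / periods_per_year for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        return 0.0
    return mean / std * math.sqrt(periods_per_year)


def max_drawdown(equity: List[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

For example, `max_drawdown(equity_curve(daily_returns))` gives the worst fractional loss from a prior high for a hypothetical list `daily_returns`.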

✨ Key Features

| Feature | Description |
| --- | --- |
| 📊 Data Ingestion | Automated collection of stock prices via the Alpha Vantage API and financial news via News API |
| 🔧 Feature Engineering | Technical indicators (EMA, RSI, MACD, volatility) + sentiment features (polarity, subjectivity, entity signals, recency) |
| 🔍 Explainability | SHAP-based model explanations and feature importance for predictions and sentiment analysis |
| 📈 Backtesting (Under Development) | Event-driven engine with PnL, Sharpe ratio, and drawdown analysis |
| ⚙️ Orchestration | Snowflake Tasks & Streams (Airflow-compatible) for end-to-end pipeline automation (ingestion → transform → training → scoring → backtests) |
| 🤖 ML Pipeline | XGBoost + Gradient Regressor with Snowpark; AI_CLASSIFY for sentiment labeling, plus an annotated dataset maintained for live news to improve and validate classifiers |
| 📱 Interactive Dashboard | Streamlit app with signal explorer, explainability charts, and performance metrics |
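
The technical indicators listed above (EMA, RSI, MACD, volatility) can be sketched in a few lines of dependency-free Python. This is only an illustration of the usual definitions; the repository's feature engineering is done in its SQL transformation script, and the function names, default periods (12/26/9 for MACD, 14 for RSI), and the simple-average RSI variant used here are assumptions.

```python
import statistics
from typing import List, Optional, Tuple


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out: List[float] = []
    for i, v in enumerate(values):
        # Seed with the first observation, then blend each new value in.
        out.append(v if i == 0 else alpha * v + (1 - alpha) * out[-1])
    return out


def macd(values: List[float], fast: int = 12, slow: int = 26,
         signal: int = 9) -> Tuple[List[float], List[float]]:
    """Return (MACD line, signal line): fast EMA minus slow EMA, and its EMA."""
    line = [f - s for f, s in zip(ema(values, fast), ema(values, slow))]
    return line, ema(line, signal)


def rsi(values: List[float], period: int = 14) -> Optional[float]:
    """RSI over the last `period` price changes.

    Uses simple averages of gains and losses rather than Wilder's smoothing.
    Returns None when there are not enough prices.
    """
    if len(values) <= period:
        return None
    window = values[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def volatility(values: List[float]) -> float:
    """Sample standard deviation of simple period-over-period returns."""
    returns = [b / a - 1 for a, b in zip(values, values[1:])]
    return statistics.stdev(returns) if len(returns) > 1 else 0.0
```

Each function takes a plain list of closing prices ordered oldest to newest, so the same logic could be ported to pandas or Snowpark if needed.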

🧰 Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Data Storage | Snowflake (SIGNAL_EXTRACTION_DB) | Centralized data warehouse |
| Compute | Snowflake Warehouse (COMPUTE_WH) | Scalable compute for ETL + ML |
| Ingestion | Python, REST APIs | Pulls stock + news data |
| Transformation | Snowflake SQL, Snowpark | Data cleaning and feature creation |
| ML | Python (XGBoost, Pandas), Snowflake Libs | Model training & prediction |
| Visualization | Streamlit | Application & reporting layer |

🧩 High Level Data Pipeline (Architecture)

Architecture

🗂️ Use Case & Repository Structure

🧑‍💼 Use Case Diagram

Use Case Diagram


🗂️ Repository Structure
📦 signal-extraction-ml-pipeline
├── 📁 src/
│   ├── 1_ingestion/
│   │   ├── 1_ingest_market_api.ipynb
│   │   ├── 1_ingest_news_api.ipynb
│   │   ├── market_config.json
│   │   └── news_config.json
│   ├── 2_transformation_and_feature_engineering/
│   │   └── 1_transformation_and_feature_engineering_market_data.sql
│   ├── 3_ml/
│   │   ├── 1_analyze_news_data.ipynb
│   │   ├── 1_predict_market_data.ipynb
│   │   ├── environment.yml
│   │   └── market_config.json
│   ├── 4_frontend/
│   │   ├── streamlit_app.py
│   │   ├── environment.yml
│   │   └── market_config.json
│   ├── 📁 infra/
│   ├── 📁 docs/
│   ├── 📁 env/  # Virtual environment (auto-generated)
│   └── 📁 utils/
├── [requirements.txt]
├── [README.md]
└── LICENSE

📈 Visualization & Outputs

📈 Visualizations

📈 Price Prediction Graph

Price Prediction Graph

🔍 Price Explainability

Price Explainability

📊 Model Performance Summary

Model Performance Summary

📰 News Sentiment Ranked

News Sentiment Ranked


📊 Underlying Data Outputs

📋 Underlying Chart Data

Underlying Chart Data

📦 Indicators JSON Dump

Indicators JSON Dump


⚙️ Setup & Installation

⚙️ Steps

1️⃣ Prerequisites

  • Python 3.9+
  • Snowflake account with appropriate roles
  • Alpha Vantage & News API keys
  • Snowflake Python Connector installed

2️⃣ Clone the Repository

git clone https://github.com/<your-username>/signal-extraction-ml-pipeline.git
cd signal-extraction-ml-pipeline

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Configure Environment

Edit market_config.json & news_config.json:

{
  "...": "...",
  "snowflake": {
    "account": "XXXXXX",
    "user": "YOUR_USERNAME",
    "password": "YOUR_PASSWORD",
    "role": "ACCOUNTADMIN",
    "warehouse": "COMPUTE_WH",
    "database": "SIGNAL_EXTRACTION_DB",
    "schema": "<CODE>"
  }
}
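
As a rough sketch of how a script might consume the `snowflake` block above, the helper below loads a config file, checks that the expected keys are present, and passes them to the Snowflake Python Connector. The function names and the required-key list are assumptions made for illustration; the notebooks in this repository may read the config differently.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Keys expected in the "snowflake" block of the config file shown above.
REQUIRED_KEYS = ("account", "user", "password", "role",
                 "warehouse", "database", "schema")


def load_snowflake_params(config_path: str) -> Dict[str, Any]:
    """Read the config file and return the Snowflake connection parameters."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    params = config.get("snowflake", {})
    missing = [key for key in REQUIRED_KEYS if not params.get(key)]
    if missing:
        raise KeyError(f"Missing Snowflake settings in {config_path}: {missing}")
    return {key: params[key] for key in REQUIRED_KEYS}


def connect(config_path: str):
    """Open a connection using the Snowflake Python Connector."""
    # Imported lazily so load_snowflake_params works without the connector installed.
    import snowflake.connector

    return snowflake.connector.connect(**load_snowflake_params(config_path))
```

Keeping validation separate from the connection call means a misconfigured file fails fast with a clear message before any network request is made. For anything beyond local experiments, consider supplying the password through an environment variable or key-pair authentication instead of storing it in JSON.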

5️⃣ Run Pipeline

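# Infra setup (run the SQL script with SnowSQL)
snowsql -f infra/{code}
# Ingestion (notebooks cannot be run with `python`; execute them with Jupyter)
jupyter nbconvert --to notebook --execute src/1_ingestion/1_ingest_market_api.ipynb
jupyter nbconvert --to notebook --execute src/1_ingestion/1_ingest_news_api.ipynb
# Transformation & feature generation
snowsql -f src/2_transformation_and_feature_engineering/1_transformation_and_feature_engineering_market_data.sql
jupyter nbconvert --to notebook --execute src/2_transformation_and_feature_engineering/1_generate_news_articles_features.ipynb
# ML
jupyter nbconvert --to notebook --execute src/3_ml/1_analyze_news_data.ipynb
jupyter nbconvert --to notebook --execute src/3_ml/1_predict_market_data.ipynb
jupyter nbconvert --to notebook --execute src/3_ml/2_backtest_market_data.ipynb
# Visualization app (Streamlit apps are launched with `streamlit run`)
streamlit run src/4_frontend/streamlit_app.py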

🎬 Submission & Showcase Resources

🧑‍💻 Authors

Aravind Suresh
Data Engineer @ GE Aerospace | ML & Cloud Enthusiast
LinkedIn GitHub

Abirami Sadasivam
SDE @ VISA | ML & Cloud Enthusiast
LinkedIn GitHub

Sidhanth LS
Data Scientist @ Freshworks
LinkedIn GitHub

🤝 Connect with the team on LinkedIn for collaborations!


🪶 License

This project is licensed under the MIT License.


🏁 Acknowledgments

Special thanks to:

  • Snowflake for its developer ecosystem
  • Alpha Vantage and News API for financial data sources
  • Hackathon Mentors and collaborators for their support (Snowflake - The Dev Premier League)

⭐ If you like this project, give it a star on GitHub. Your support keeps it growing!
