Machine Learning Research Framework

Author: John Swindell

Note: This repository represents the rapid prototyping environment used to validate quantitative hypotheses before deployment.

To read the full case study for this research-focused model, see Full Research Framework Study.

For the production-grade infrastructure that scales these strategies, see the Full Data Pipeline Architecture.

1. Project Overview

This repository contains the end-to-end research framework used to develop, stress-test, and validate machine learning-driven trading strategies.

Production data pipelines are heavy: they process massive datasets and enforce rigid validation gates. This Research Framework was set up as a lightweight "sandbox" to restore agility. It supports rapid iteration on alpha factors over a concentrated universe of assets (e.g., "Blue Chips"), so that core logic can be validated before engineering hours are invested in a production implementation.

The Lab-to-Production Workflow

Hypothesis: A new signal (e.g., "Momentum Factor") is proposed.
Lab Testing (This Repo): The strategy is tested here on a small, representative universe to optimize hyperparameters and identify fatal flaws (e.g., drawdown risk) quickly.
Validation: Strategies that survive the Signal Funnel analysis are marked for promotion.
Production: Validated logic is ported to the production pipeline for full strategy construction and, eventually, full-scale deployment.

2. The Research Pipeline

The framework is organized into a modular two-part workflow designed to separate data acquisition from modeling logic.

01_get_data.ipynb (The Ingestion Module)
- A standalone pipeline that fetches historical OHLCV data from the CoinGecko Pro API.
- Design Choice: This module intentionally restricts the data to a "Small Universe" of top-tier assets. This constraint reduces processing overhead, allowing for thousands of training iterations to be run per day during the hyperparameter tuning phase.
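
As a rough illustration of the ingestion step, the sketch below builds a request for one asset's candles and parses the response. It is not the notebook's code. It assumes the CoinGecko Pro base URL, the `x-cg-pro-api-key` header, and the `/coins/{id}/ohlc` endpoint, which returns rows of `[timestamp_ms, open, high, low, close]`; volume is not part of that response and would need a separate call. The `SMALL_UNIVERSE` list and both function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed values for illustration; the real universe is not published in this README.
BASE_URL = "https://pro-api.coingecko.com/api/v3"
SMALL_UNIVERSE = ["bitcoin", "ethereum", "solana"]  # hypothetical asset ids


def build_ohlc_request(coin_id, api_key, days=30, vs_currency="usd"):
    """Build (but do not send) a request for one asset's OHLC candles."""
    query = urlencode({"vs_currency": vs_currency, "days": days})
    url = f"{BASE_URL}/coins/{coin_id}/ohlc?{query}"
    return Request(url, headers={"x-cg-pro-api-key": api_key})


def parse_ohlc(payload):
    """Convert rows of [timestamp_ms, open, high, low, close] into dicts."""
    rows = json.loads(payload) if isinstance(payload, str) else payload
    keys = ("timestamp_ms", "open", "high", "low", "close")
    return [dict(zip(keys, row)) for row in rows]
```

Looping over `SMALL_UNIVERSE` and passing each request to `urllib.request.urlopen` would fetch the data. Keeping the list short is what keeps each research iteration cheap.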

momentum-based-model.ipynb (The Research Core)
- The core laboratory notebook where the strategy logic is defined and tested.
- Feature Engineering: Vectorized generation of 13+ features (Momentum, Volatility, RSI, MACD) to create the input matrix.
- Point-in-Time Preprocessing: Implements an Expanding Window loop with Winsorization and RobustScaler, so that preprocessing statistics are derived only from data available at each point in time. This keeps future information out of the training set (avoiding lookahead bias), a critical requirement for financial ML.
- Walk-Forward Validation: Uses GridSearchCV with TimeSeriesSplit to tune the CatBoost model across multiple distinct market regimes, reducing the risk of overfitting to a single bull run.
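
The point-in-time and walk-forward ideas above can be sketched in plain Python. This is a simplified stand-in, not the notebook's implementation: `expanding_splits` imitates the default fold layout of scikit-learn's `TimeSeriesSplit`, and `fit_preprocessor` / `transform` approximate Winsorization followed by a median/IQR scaling in the spirit of `RobustScaler`. The 1st/99th percentile bounds and all function names are assumptions.

```python
from statistics import median, quantiles


def expanding_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs whose training window only grows."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    first_test = n_samples - n_splits * test_size
    for start in range(first_test, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))


def fit_preprocessor(train_values, lower_pct=1, upper_pct=99):
    """Learn winsorization bounds and scaling statistics from training data only."""
    # 99 percentile cut points; the bounds used here are hypothetical defaults.
    cuts = quantiles(train_values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    clipped = [min(max(v, lo), hi) for v in train_values]
    q1, _, q3 = quantiles(clipped, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # guard against a zero interquartile range
    return lo, hi, median(clipped), iqr


def transform(values, params):
    """Clip to the learned bounds, then center and scale with the learned statistics."""
    lo, hi, center, iqr = params
    return [(min(max(v, lo), hi) - center) / iqr for v in values]
```

For each fold, `fit_preprocessor` is called on the training slice and `transform` is applied to both slices. Because the test slice never influences the learned bounds or scale, a later outlier cannot leak backwards into training.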

3. Case Study: Momentum Factor

To demonstrate the framework's capabilities, this repository includes the code for a momentum-based strategy that was iterated upon using this system.

The Diagnostic Process

The framework identified a key weakness in pure momentum strategies: unacceptable risk during market reversals. Using the "Signal Funnel" diagnostic, we implemented and validated three risk overlays:

Market Regime Filter: Prevents long entries when the broad market (BTC) is trading below its 200-day moving average.
Extreme Momentum Guardrail: Automatically disqualifies assets whose 7-day appreciation exceeds the maximum momentum cap, avoiding entries into speculative bubbles.
Inverse Volatility Weighting: Dynamically sizes positions based on realized volatility, reducing portfolio beta.
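
A minimal sketch of the three overlays follows, assuming daily close prices and daily returns as plain lists. The function names are hypothetical, and the 50% momentum cap is a placeholder, since the real thresholds have been removed from this repository.

```python
from statistics import mean, stdev


def market_regime_ok(btc_closes, window=200):
    """True when the latest BTC close is at or above its moving average."""
    if len(btc_closes) < window:
        return False  # too little history: stay out (a conservative assumption)
    return btc_closes[-1] >= mean(btc_closes[-window:])


def passes_momentum_cap(closes, lookback=7, max_gain=0.50):
    """Reject assets whose return over `lookback` days exceeds the cap."""
    gain = closes[-1] / closes[-1 - lookback] - 1.0
    return gain <= max_gain  # max_gain is a placeholder value


def inverse_vol_weights(returns_by_asset):
    """Weight each asset by 1 / realized volatility, normalized to sum to 1."""
    # Assumes every asset has at least two returns and non-zero volatility.
    inv = {asset: 1.0 / stdev(rets) for asset, rets in returns_by_asset.items()}
    total = sum(inv.values())
    return {asset: w / total for asset, w in inv.items()}
```

A long entry would be considered only when `market_regime_ok` and `passes_momentum_cap` both return `True`; the surviving assets are then sized with `inverse_vol_weights`, so an asset with twice the volatility receives half the weight.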

Performance Validation

The framework's QuantStats integration provided the final "Go/No-Go" metrics. While the model showed positive expectancy, the diagnostics highlighted that it functions as a "High Beta" strategy, requiring further hedging before production release.

Cumulative Return: +104.62%
Sharpe Ratio: 1.03
Max Drawdown: -57.24%
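
For readers unfamiliar with these metrics, the arithmetic behind them can be sketched as below. This is not QuantStats' code and will not necessarily reproduce the figures above: the annualization factor of 365 periods per year and the zero risk-free rate are assumptions, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def cumulative_return(returns):
    """Compound a series of periodic returns into one total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, periods_per_year=365, risk_free=0.0):
    """Annualized mean excess return divided by its standard deviation."""
    excess = [r - risk_free / periods_per_year for r in returns]
    return mean(excess) / stdev(excess) * sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the equity curve (a negative number)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst
```

Read together, a positive cumulative return with a drawdown as deep as the one reported is the pattern that motivates the "High Beta" label and the call for further hedging.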

Click here to view the full QuantStats report

Validation of Risk Logic: The most important finding concerned the risk guardrails. The framework's logs showed that the Extreme Momentum Guardrail rejected 6 trade signals for assets that subsequently crashed, including one that would have incurred a -14.6% single-day loss on DOGE. This supports the hypothesis that volatility-based filtering adds alpha by reducing drawdown.

Disclaimer

Proprietary Note: This repository is a sanitized version of the internal research framework used at Tekly Studio. Specific proprietary alpha factors, signal thresholds, and the full production universe have been removed or replaced with generic placeholders to protect intellectual property.
