GitHub - kaushikd24/statistical-arbitrage-engine: This repository contains an implementation of a well-known quantitative trading strategy, "Statistical Arbitrage". For a detailed overview, please refer to the README file; to understand the algorithm in depth, visit my website below.

Pairs Trading using Statistical Arbitrage

This repository contains the implementation of a modular, production-ready strategy for exploiting statistical arbitrage between two highly cointegrated stocks, which we call a pair. The system automates the entire workflow, from collecting data from APIs to signal generation, and is extensible with machine-learning-based trade filtering and robust backtesting.

Our strategy achieves a median Compound Annual Growth Rate (CAGR) of around 30%. We used 47 Indian equities, with data ranging from 01/01/2018 to 31/03/2025.
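For readers unfamiliar with the metric, CAGR can be computed from the first and last values of an equity curve. The snippet below is a minimal sketch, not the repository's own code: the function name `cagr`, the 365.25-day year convention, and the example values are all assumptions chosen for illustration.

```python
from datetime import date


def cagr(start_value: float, end_value: float, start: date, end: date) -> float:
    """Compound annual growth rate between two dates.

    Assumes a 365.25-day year; other day-count conventions are equally valid.
    """
    years = (end - start).days / 365.25
    if years <= 0 or start_value <= 0:
        raise ValueError("need a positive start value and an end date after the start date")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Made-up endpoints (not results from this repository), evaluated over the
# data window stated in the README.
growth = cagr(100.0, 200.0, date(2018, 1, 1), date(2025, 3, 31))
print(f"CAGR: {growth:.1%}")
```

A median CAGR would then be the median of this quantity across whatever set of runs or pairs is being summarized.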

{{This is not financial advice. Users of this strategy are encouraged to explore the markets themselves before considering deployment, as stable past returns do not guarantee stable future returns.}}

The techniques used in this pipeline are Quantitative Analysis, Machine Learning and Risk Management. The pipeline broadly consists of the following steps:

Data Collection
Pair Selection
Spread and Z-score Calculation
Signal Generation
Backtesting
Machine Learning to filter trades
Risk Management
Backtesting again, this time with inputs from our Machine Learning model and Risk Management class.
Miscellaneous steps: performance optimization and trade logging.
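The spread, z-score and signal steps listed above can be sketched in a few lines. This is an illustrative sketch using only the Python standard library, not the logic in spread_calc.py or signal_generation.py: the function names, the full-sample OLS hedge ratio, the rolling window, and the entry/exit thresholds are all assumptions.

```python
from statistics import mean, pstdev


def hedge_ratio(y, x):
    """OLS slope of y regressed on x (with intercept).

    Estimated on the full sample here for brevity, which introduces
    look-ahead; a real backtest would fit it on past data only.
    """
    mx, my = mean(x), mean(y)
    var_x = sum((a - mx) ** 2 for a in x)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / var_x


def rolling_zscore(series, window):
    """Z-score of each point against the trailing window ending at it.

    Returns None until the window is full.
    """
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
            continue
        chunk = series[i + 1 - window : i + 1]
        sd = pstdev(chunk)
        out.append((series[i] - mean(chunk)) / sd if sd > 0 else 0.0)
    return out


def signals(z, entry=2.0, exit_=0.5):
    """Position in the spread per bar: +1 long, -1 short, 0 flat.

    Thresholds are hypothetical defaults, not the repository's values.
    """
    pos, out = 0, []
    for v in z:
        if v is None:
            out.append(0)
            continue
        if pos == 0:
            if v > entry:
                pos = -1  # spread unusually high: short it
            elif v < -entry:
                pos = 1   # spread unusually low: long it
        elif abs(v) < exit_:
            pos = 0       # spread has reverted: close the position
        out.append(pos)
    return out


# Made-up closing prices for two hypothetical stocks.
x = [100.0, 101.0, 102.5, 101.5, 103.0, 104.0, 103.5, 105.0]
y = [201.0, 203.5, 204.0, 204.5, 205.0, 212.0, 208.0, 210.5]
beta = hedge_ratio(y, x)
spread = [b - beta * a for a, b in zip(x, y)]
z = rolling_zscore(spread, window=5)
# With a 5-point window a population z-score cannot exceed 2,
# so the demo uses a lower entry threshold.
print(signals(z, entry=1.5))
```

The signals are stateful by design: a position opened at the entry threshold is held until the z-score falls back inside the exit band, which avoids flipping in and out on every bar.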

Overview of the Pipeline:

Step-1: Data Collection
Source: Yahoo Finance

We collected daily data (frequency = 1 day) for 47 Indian equities from Yahoo Finance. We started with OHLC (Open-High-Low-Close) data but used only "Close" as the input to our algorithm.
Output: combined_df.csv, a single CSV file combining the data for all 47 stocks.
We then dropped LODHA.NS, because its data begins only in 04/2021 and was corrupting the combined data.
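The cleaning step described above, removing a ticker whose history starts well after the beginning of the sample, can be sketched as follows. This is a standard-library illustration under stated assumptions, not the code in data_collection.py: the function names, the `tolerance_days` parameter, the placeholder tickers and the prices are all hypothetical, and the download from Yahoo Finance is left out.

```python
import csv
import io
from datetime import date


def drop_late_starters(closes, window_start, tolerance_days=10):
    """Keep tickers whose first observation is close to window_start.

    closes maps ticker -> {date: close}; each series is assumed non-empty.
    """
    kept = {}
    for ticker, series in closes.items():
        first_seen = min(series)
        if (first_seen - window_start).days <= tolerance_days:
            kept[ticker] = series
    return kept


def to_combined_csv(closes):
    """Render {ticker: {date: close}} as CSV text, one column per ticker."""
    tickers = sorted(closes)
    dates = sorted({d for series in closes.values() for d in series})
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Date", *tickers])
    for d in dates:
        # Missing observations are written as empty cells.
        writer.writerow([d.isoformat(), *(closes[t].get(d, "") for t in tickers)])
    return buf.getvalue()


# Made-up data: two full-history placeholder tickers and one late starter.
raw = {
    "AAA.NS": {date(2018, 1, 1): 100.0, date(2018, 1, 2): 101.0},
    "BBB.NS": {date(2018, 1, 1): 50.0, date(2018, 1, 2): 49.5},
    "LATE.NS": {date(2021, 4, 19): 500.0},
}
clean = drop_late_starters(raw, window_start=date(2018, 1, 1))
print(to_combined_csv(clean))
```

Filtering on the first available date, rather than hard-coding a ticker name, would catch any other late-listed stock added to the universe later.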

Stay tuned for more!

Folders and files (43 commits):

data
ml_gatekeeper
results_summary
risk_management
.gitignore
README.md
backtest_engine.py
data_collection.py
pair_selection.py
parameter_sweep.py
performance_logger.py
signal_cleaning.py
signal_generation.py
signal_logic.txt
spread_calc.py
