This repository contains reading notes and Python implementations by Silkdust for the book "Machine Learning for Factor Investing". For better compatibility and closer accordance with the original book, the repository is written in English.
The authors of this book offer some solutions as Python notebooks; see detailed info at https://www.mlfactor.com/python.html. However, they do not offer executable files (.ipynb). For your convenience, this project offers both Jupyter notebooks (.ipynb) and a PDF version produced by nbconvert with the XeLaTeX engine.
Finally, I deeply appreciate Guillaume Coqueret and Tony Guida for their devotion to this book. Published in 2020, it offers genuinely up-to-date machine learning techniques and their applications to factor investing, along with a wide range of references for interested readers. As a summary note, this project does not include all the literature reviews in the book, and I encourage interested readers to explore them in the references section or in the original book.
Hope you have fun reading this project!
| Chapter Name | Notebook Ref | PDF Download | Status |
|---|---|---|---|
| Full Version | Here | Download | ☑Finished |
| Chapter 1 - Notations and Data | Here | Download | ☑Finished |
| Chapter 2 - Introduction | Here | Download | ☑Finished |
| Chapter 3 - Factor Investing and Asset Pricing Anomalies | Here | Download | ☑Finished |
| Chapter 4 - Data Preprocessing | Here | Download | ☑Finished |
| Chapter 5 - Penalized Regressions and Sparse Hedging for MVP | Here | Download | ☑Finished¹ |
| Chapter 6 - Tree-based Methods | Here | Download | ☑Finished |
| Chapter 7 - Neural Networks | Here | Download | ☑Finished |
| Chapter 8 - Support Vector Machines | Here | Download | ☑Finished |
| Chapter 9 - Bayesian Methods | Here | Download | ☑Finished² |
| Chapter 10 - Validating and Tuning | Here | Download | ☑Finished |
| Chapter 11 - Ensemble Models | Here | Download | ☑Finished³ |
| Chapter 12 - Portfolio Backtesting | Here | Download | ☑Finished |
| Chapter 13 - Interpretability | Here | Download | ☑Finished |
| Chapter 14 - Causality and Non-stationarity | Here | Download | ☑Finished⁴ |
| Chapter 15 - Unsupervised Learning | Here | Download | ☑Finished |
| Chapter 16 - Reinforcement Learning | Here | Download | ☑Finished |
| References | Here | Download | ☑Finished |
Updated on November 21, 2023: the notes for all 16 chapters have been finished, and an integrated version has been generated with the nbmerge tool so that readers can enjoy this project more efficiently! These Python notes are, after all, humble work. However, this project also marks a milestone as my first open-source project, and I will try my best to make more contributions.
Finally, I would like to express my deep appreciation once again to the authors of this book. Hope you all enjoy it.
Dependency packages are listed in the repository as requirements.txt and will be updated regularly. Generally speaking, you first need to install Python 3 (version 3.8 or later preferred) and Jupyter Notebook on your device. To get the notebooks working properly, run the following commands:
git clone https://github.com/Silkdust/mlfactor-python.git
cd mlfactor-python/
pip install -r ./requirements.txt
If you revise the code or use additional packages, you can regenerate requirements.txt with pipreqsnb after cloning this repo as follows:
pip install pipreqs
pip install pipreqsnb
pipreqsnb --force ./ --encoding=utf-8
For better I/O speed, the data_ml object is stored in .pkl format. However, this file takes a lot of storage space and is therefore not pushed to the repo. You may run the following Python snippet to generate it under the /data/ folder:
import pandas as pd
import pyreadr
# data = pd.read_excel("./data/data_ml.xlsx") # Not Recommended. Too Slow!
result = pyreadr.read_r('./data/data_ml.RData')  # returns a dict mapping R object names to DataFrames
data = result['data_ml']
data.to_pickle("./data/data_ml.pkl")  # cache as a pickle for faster subsequent loads
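As a quick sanity check of the pickle round trip (using a small synthetic frame with made-up column names, since data_ml itself is not shipped with the repo), pickling preserves dtypes exactly and reloads much faster than re-parsing the .xlsx or .RData sources:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Synthetic stand-in for data_ml (the real file is not shipped with the repo)
df = pd.DataFrame(np.random.default_rng(0).random((1000, 5)),
                  columns=list("abcde"))

path = os.path.join(tempfile.mkdtemp(), "demo.pkl")
df.to_pickle(path)               # binary format, preserves dtypes exactly
restored = pd.read_pickle(path)
assert restored.equals(df)       # the round trip is lossless
```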
- Main reference: Coqueret, G., & Guida, T. (2020). Machine Learning for Factor Investing: R Version. Chapman and Hall/CRC.
- Other references: see here. You can also find them on their website here.
- This project is completely free and is released under the CC0-1.0 license. We encourage reproducibility. See here for details.
Footnotes

1. There are some minor differences between `ElasticNet` in `sklearn` and `glmnet` in R. See here for details. ↩
2. Two less familiar Python packages are used in this chapter to implement Bayesian linear regression and BART. For the first, we provide the source code of `conjugate_bayes` (which is messy) inside the notebook, so there is virtually no need to install it. For the second, please install the `BartPy` package with the following command to avoid issues #37 and #51: `pip install git+https://github.com/JakeColtman/bartpy.git@pytorch --upgrade`. ↩
3. You may find some model caches used in this chapter (or possibly, previous and future chapters) under the `/models/` folder. ↩
4. In this chapter, the causal additive model (`CAM`) and the `PC` algorithm are implemented in Python with the aid of the package `cdt`, which requires the R environment and the R packages `CAM`, `(k)pcalg` and `RCIT`. The configuration can be complex, so we strongly recommend that readers use the pre-trained models under the `/models/` folder (`graph_cam.pkl` and `graph_pc.pkl`). Interested readers can refer to this website for installation guides. ↩
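On the first footnote: one source of the discrepancy is simply parameterization. sklearn's `ElasticNet` takes `alpha` (overall penalty strength) and `l1_ratio` (share of the L1 penalty), which correspond roughly to glmnet's `lambda` and `alpha`. A minimal sketch on synthetic data, assuming only NumPy and sklearn:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.3]) + rng.normal(scale=0.1, size=200)

# sklearn objective: (1 / (2 n)) * ||y - Xw||^2
#                    + alpha * l1_ratio * ||w||_1
#                    + 0.5 * alpha * (1 - l1_ratio) * ||w||^2
# glmnet calls the strength lambda and the L1/L2 mixing alpha, and also
# standardizes inputs by default, hence the small numerical differences.
model = ElasticNet(alpha=0.05, l1_ratio=0.5).fit(X, y)
print(np.round(model.coef_, 2))
```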