mlfactor-python

Introduction

This repository contains Silkdust's reading notes and Python implementation for the book "Machine Learning for Factor Investing".

For better compatibility and closer agreement with the original book, this repository is written in English.

The authors of the book offer some Python solutions as notebooks; see https://www.mlfactor.com/python.html for details. However, they do not provide executable notebook files (.ipynb). For your convenience, this project offers both Jupyter notebooks (.ipynb) and a PDF version produced with nbconvert and the XeLaTeX engine.

Finally, I deeply appreciate Guillaume Coqueret and Tony Guida for their dedication to this book. Published in 2020, it presents up-to-date machine learning techniques and their applications to factor investing, together with a wide range of literature for interested readers. As summary notes, this project does not include all the literature reviews in the book, and I encourage interested readers to explore them in the references section or in the original book.

Hope you have fun reading this project!

Timelines and Contents

| Chapter Name | Notebook Ref | PDF Download | Status |
|---|---|---|---|
| Full Version | Here | Download | ☑ Finished |
| Chapter 1 - Notations and Data | Here | Download | ☑ Finished |
| Chapter 2 - Introduction | Here | Download | ☑ Finished |
| Chapter 3 - Factor Investing and Asset Pricing Anomalies | Here | Download | ☑ Finished |
| Chapter 4 - Data Preprocessing | Here | Download | ☑ Finished |
| Chapter 5 - Penalized Regressions and Sparse Hedging for MVP | Here | Download | ☑ Finished (footnote 1) |
| Chapter 6 - Tree-based Methods | Here | Download | ☑ Finished |
| Chapter 7 - Neural Networks | Here | Download | ☑ Finished |
| Chapter 8 - Support Vector Machines | Here | Download | ☑ Finished |
| Chapter 9 - Bayesian Methods | Here | Download | ☑ Finished (footnote 2) |
| Chapter 10 - Validating and Tuning | Here | Download | ☑ Finished |
| Chapter 11 - Ensemble Models | Here | Download | ☑ Finished (footnote 3) |
| Chapter 12 - Portfolio Backtesting | Here | Download | ☑ Finished |
| Chapter 13 - Interpretability | Here | Download | ☑ Finished |
| Chapter 14 - Causality and Non-stationarity | Here | Download | ☑ Finished (footnote 4) |
| Chapter 15 - Unsupervised Learning | Here | Download | ☑ Finished |
| Chapter 16 - Reinforcement Learning | Here | Download | ☑ Finished |
| References | Here | Download | ☑ Finished |

Updated on November 21, 2023: The notes for all 16 chapters are finished, and an integrated version has been generated with the nbmerge tool so that readers can go through the whole project in a single file. These Python notes are a humble piece of work, but the project marks a milestone as my first open-source project, and I will try my best to make more contributions.

Finally, I would like to express my deep appreciation once again to the authors of this book. Hope you all enjoy it.

Dependencies

The required packages are listed in requirements.txt in the repository, which will be updated regularly. You first need Python 3 (version 3.8 or later preferred) and Jupyter Notebook installed on your device. To make the notebooks work properly, clone this repo and install the requirements:

git clone https://github.com/Silkdust/mlfactor-python.git
cd mlfactor-python/
pip install -r ./requirements.txt
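
After installing, it can be handy to confirm that nothing from requirements.txt is missing in the active environment. The following is a minimal sketch using only the standard library; the helper names parse_requirement_names and missing_packages are hypothetical (they are not part of this repo), and the parser assumes simple `name==version` style lines, skipping options and VCS URLs.

```python
import re
from importlib import metadata


def parse_requirement_names(lines):
    """Extract bare distribution names from requirements-style lines.

    Assumption: simple entries such as 'pandas==1.5.3' or 'numpy>=1.20'.
    Options ('-r other.txt') and URL-based entries are skipped.
    """
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-") or "://" in line:
            continue
        # Keep only the name: cut at the first version/extras/marker character.
        names.append(re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0])
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

From the repo root, one could then call `missing_packages(parse_requirement_names(open("requirements.txt")))`; an empty list suggests every listed package is installed, although this sketch does not check version constraints.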

If you revise the code or use additional packages, you can regenerate requirements.txt with pipreqsnb as follows:

pip install pipreqs
pip install pipreqsnb
pipreqsnb --force ./ --encoding=utf-8

For better I/O speed, the data_ml object is stored in .pkl format. However, the resulting file takes a lot of storage space and is therefore not pushed to the repo. You may run the following Python code to generate it under the /data/ folder:

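import pandas as pd
import pyreadr

# data = pd.read_excel("./data/data_ml.xlsx")  # Not recommended: too slow!
# read_r returns a dict-like object keyed by the names of the R objects in the file
result = pyreadr.read_r('./data/data_ml.RData')
data = result['data_ml']  # the data_ml object as a pandas DataFrame
data.to_pickle("./data/data_ml.pkl")  # cache as a pickle for faster reloads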
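
Once the pickle exists, the notebooks can reload it quickly. Here is a minimal sketch of such a reload step, assuming the file was written with DataFrame.to_pickle as above; the helper load_data_ml is hypothetical, and the demonstration uses a tiny stand-in frame with illustrative column names rather than the real data_ml.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_data_ml(pkl_path):
    """Load the pickled dataset, failing with a hint if it has not been generated."""
    path = Path(pkl_path)
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; generate it from data_ml.RData first."
        )
    return pd.read_pickle(path)


# Demonstration on a toy frame (NOT the real data_ml; columns are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "data_ml.pkl"
    toy = pd.DataFrame({"stock_id": [1, 2], "R1M_Usd": [0.01, -0.02]})
    toy.to_pickle(demo_path)
    reloaded = load_data_ml(demo_path)
    round_trip_ok = reloaded.equals(toy)  # pickling preserves values and dtypes
```

In a notebook, the call would presumably look like `data_ml = load_data_ml("./data/data_ml.pkl")`. As with any pickle, only load files you generated yourself or otherwise trust.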

Acknowledgements

  • Main reference: Coqueret, G., & Guida, T. (2020). Machine Learning for Factor Investing: R Version. Chapman and Hall/CRC.
  • Other references: see here. You can also find them on their website here.
  • This project is completely free and is released under the CC0-1.0 license. We encourage reproducibility. See here for details.

Footnotes

  1. There are some minor differences between ElasticNet in sklearn and glmnet in R. See here for details.

  2. Two less familiar Python packages are used in this chapter for the Bayesian linear regression and BART. For the first, we provide the source code inside the notebook, so there is virtually no need to install the conjugate_bayes package (which is messy). For the second, bartpy, please install the BartPy package with the following command to avoid issues #37 and #51: pip install git+https://github.com/JakeColtman/bartpy.git@pytorch --upgrade.

  3. You may find some model caches used in this Chapter (or possibly, previous and future chapters) under the /models/ folder.

  4. In this chapter, the causal additive models (CAM) and the PC algorithm are implemented in Python with the aid of the cdt package, which requires an R environment and the R packages CAM, (k)pcalg and RCIT. The configuration can be complex, so we strongly recommend that readers use the trained models under the /models/ folder (graph_cam.pkl and graph_pc.pkl). Interested readers can refer to this website for installation guides.
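
For the cached models mentioned in footnotes 3 and 4, loading might look like the sketch below. It assumes the .pkl files were written with Python's standard pickle module, which the extension suggests but this README does not confirm. The helper load_cached_model is hypothetical, and the demonstration pickles a plain dictionary as a stand-in rather than a real trained graph.

```python
import pickle
import tempfile
from pathlib import Path


def load_cached_model(path):
    """Unpickle a cached object from disk. Only use this on files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Demonstration with a stand-in object (NOT one of the real cached models).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "graph_demo.pkl"
    stand_in = {"edges": [("x1", "x2"), ("x2", "y")]}
    with open(demo_path, "wb") as fh:
        pickle.dump(stand_in, fh)
    restored = load_cached_model(demo_path)
```

With the real files, the call would presumably be `load_cached_model("./models/graph_cam.pkl")`. Unpickling generally requires the libraries that define the stored objects to be importable, so some packages from requirements.txt may still be needed even when cdt and the R toolchain are not.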
