Reinforcement-learning portfolio allocator that outputs daily weights over a basket of ETFs. Includes clean baselines (Equal-Weight, Risk Parity, Markowitz), a deterministic backtester with costs/constraints, and PPO training via Stable-Baselines3.
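
As a rough illustration of the two simplest baselines named above, here is a minimal sketch in plain Python. The function names `equal_weight` and `inverse_vol_weights` are hypothetical, not taken from this repo, and inverse-volatility weighting is assumed here as a naive stand-in for Risk Parity; the repo's own baseline may use a full risk-contribution solver. Markowitz is omitted because it needs a covariance solve.

```python
import statistics


def equal_weight(n_assets):
    """Equal-Weight baseline: every asset gets 1/N."""
    return [1.0 / n_assets] * n_assets


def inverse_vol_weights(returns_by_asset):
    """Naive risk parity (assumption): weight each asset by 1 / volatility.

    returns_by_asset is a list with one list of periodic returns per asset.
    """
    inv = [1.0 / statistics.stdev(r) for r in returns_by_asset]
    total = sum(inv)
    return [x / total for x in inv]


# Toy example: the second asset is twice as volatile as the first,
# so it receives half the weight of the first.
toy_returns = [
    [0.01, -0.01, 0.01, -0.01],
    [0.02, -0.02, 0.02, -0.02],
]
ew = equal_weight(len(toy_returns))
rp = inverse_vol_weights(toy_returns)
```

Both functions return long-only weights that sum to 1, which makes them directly comparable to the default policy output.
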
# 1) Create venv and install deps
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# 2) Download data (adjust tickers/dates as desired)
python scripts/download_data.py --tickers SPY QQQ IWM TLT GLD EEM --start 2005-01-01 --end 2025-01-01
# 3) Train PPO agent
python train_rl.py --tickers SPY QQQ IWM TLT GLD EEM --train_start 2005-01-01 --train_end 2016-12-31 \
--val_start 2017-01-01 --val_end 2019-12-31 \
--lookback 30 --cost_bps 10 --total_timesteps 400000
# 4) Evaluate PPO vs baselines on test set (OOS)
python eval_all.py --tickers SPY QQQ IWM TLT GLD EEM --test_start 2020-01-01 --test_end 2025-01-01 \
--lookback 30 --cost_bps 10

Outputs:
- artifacts/models/ppo_policy.zip (trained policy)
- artifacts/reports/metrics_test.json (Sharpe, MDD, etc.)
- artifacts/charts/equity_curves.png (PPO vs EW/RP/MV)
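
For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute an annualized Sharpe ratio and maximum drawdown (MDD). It is an illustration, not the repo's code: the function names are hypothetical, and it assumes daily returns, 252 trading days per year, and a zero risk-free rate.

```python
import math
import statistics


def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = statistics.stdev(daily_returns)
    if sd == 0:
        return 0.0
    return math.sqrt(periods_per_year) * statistics.mean(daily_returns) / sd


def max_drawdown(equity):
    """Largest peak-to-trough loss of an equity curve, as a negative fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                 # running high-water mark
        worst = min(worst, value / peak - 1.0)  # drawdown from that mark
    return worst


sample_returns = [0.01, -0.01, 0.02, 0.0]
sample_equity = [100.0, 120.0, 90.0, 110.0]
sr = sharpe_ratio(sample_returns)
mdd = max_drawdown(sample_equity)  # the drop from 120 to 90 is a 25% drawdown
```
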
- Long-only with softmax projection by default; set allow_short=True in the env to experiment with shorting.
- Transaction costs modeled as fixed bps per unit turnover.
- Deterministic backtest; single source of truth: data/prices.parquet.
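
The softmax projection and the turnover cost model from the notes above can be sketched as follows. This is a minimal standalone illustration, not the env's actual code: `softmax_weights` and `turnover_cost` are hypothetical names, and turnover is assumed to be the sum of absolute weight changes (some conventions use half of that).

```python
import math


def softmax_weights(scores):
    """Map raw policy outputs to long-only weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def turnover_cost(w_prev, w_new, cost_bps=10.0):
    """Cost as a fraction of portfolio value: bps rate times sum of |weight change|."""
    turnover = sum(abs(b - a) for a, b in zip(w_prev, w_new))
    return cost_bps * 1e-4 * turnover


# One rebalance over six assets, starting from equal weights.
w_prev = [1.0 / 6] * 6
w_new = softmax_weights([0.2, 1.0, -0.5, 0.0, 0.3, -1.0])
cost = turnover_cost(w_prev, w_new, cost_bps=10)
```

Under these assumptions, the net return for the day would be the gross portfolio return minus `cost`, so a policy that reshuffles weights aggressively pays for it on every step.
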