Skip to content

he-yufeng/DRL-MultiFactorTrading

Repository files navigation

DRL-MultiFactorTrading

CI Python NumPy License: MIT

Deep Reinforcement Learning trading strategies combining Double DQN with Transformer Attention and Multi-Factor Models inspired by Fama-French. Features adaptive risk management and volatility targeting.

English | δΈ­ζ–‡

πŸ“Š Performance Visualizations

Excellent adaptability on growth stocks with strong momentum characteristics

Xiaomi Corporation (01810.HK) - DRL Learning in Action

Radical Strategy - Xiaomi

Tencent Holdings (00700.HK) - High Returns, Higher Volatility

Radical Strategy - Tencent

Meituan (03690.HK) - Growth Stock Performance

Radical Strategy - Meituan

πŸ“‹ Overview

This repository contains two sophisticated algorithmic trading strategies designed for quantitative trading:

Strategy Approach Risk Profile Key Technology
Conservative Multi-Factor Model Low-Medium Weighted Signal Aggregation
Radical Deep Reinforcement Learning Medium-High Double DQN + Transformer

πŸ—οΈ Architecture

Strategy 1: Conservative Multi-Factor Model

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    SIGNAL GENERATION                        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Trend Analysis ──────────────────────────────── 35%        β”‚
β”‚  Momentum Indicators ─────────────────────────── 25%        β”‚
β”‚  RSI (Relative Strength Index) ───────────────── 20%        β”‚
β”‚  MACD (Moving Average Convergence Divergence) ── 15%        β”‚
β”‚  Bollinger Bands ─────────────────────────────── 5%         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                 WEIGHTED AGGREGATION                        β”‚
β”‚                        ↓                                    β”‚
β”‚              FINAL TRADING SIGNAL                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Features:

  • Volatility Targeting: Dynamically adjusts position size based on 15% annualized volatility target
  • Drawdown Protection: Reduces exposure when drawdown exceeds 10%
  • ATR-based Stops: Stop-loss at 2x ATR, take-profit at 4x ATR
  • Time-based Exit: Maximum holding period of 150 bars

Strategy 2: Radical Deep Reinforcement Learning

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              24-DIMENSIONAL STATE VECTOR                    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  [1-6]   Multi-timeframe Momentum (2,3,5,8,13,21 periods)   β”‚
β”‚  [7-10]  Moving Average Position (5,10,20,40 periods)       β”‚
β”‚  [11-14] Technical Indicators (Vol, RSI, MACD, CCI)         β”‚
β”‚  [15-18] Volume Features (ratio, trend, correlation, vol)   β”‚
β”‚  [19-21] Breakout & Trend Strength                          β”‚
β”‚  [22-24] Acceleration, Volatility Change, Position PnL      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              TRANSFORMER SELF-ATTENTION                     β”‚
β”‚                                                             β”‚
β”‚         Q = XΒ·Wq    K = XΒ·Wk    V = XΒ·Wv                    β”‚
β”‚                                                             β”‚
β”‚         Attention(Q,K,V) = softmax(QK^T/√d)Β·V               β”‚
β”‚                                                             β”‚
β”‚         Output = X + 0.5 Γ— Attention(Q,K,V)                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 DOUBLE DQN NETWORK                          β”‚
β”‚                                                             β”‚
β”‚    Input(24) β†’ Dense(128) β†’ Dense(64) β†’ Dense(32) β†’ (9)     β”‚
β”‚                    ↓           ↓           ↓                β”‚
β”‚                  tanh        tanh        tanh               β”‚
β”‚                                                             β”‚
β”‚    Actions: [-4, -3, -2, -1, 0, +1, +2, +3, +4]             β”‚
β”‚             (Short)     (Hold)      (Long)                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           PRIORITIZED EXPERIENCE REPLAY                     β”‚
β”‚                                                             β”‚
β”‚    Priority = |TD-error|^Ξ±        (Ξ± = 0.6)                 β”‚
β”‚    Sampling = Priority / Ξ£(Priority)                        β”‚
β”‚    IS Weight = (N Γ— P(i))^(-Ξ²)    (Ξ² β†’ 1.0)                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Features:

  • Double DQN: Reduces Q-value overestimation using separate target network
  • Transformer Attention: Enhances feature representation with self-attention mechanism
  • Prioritized Replay: Samples important experiences more frequently (Ξ±=0.6, Ξ²=0.4β†’1.0)
  • Ξ΅-greedy Exploration: Starts at 25%, decays to 5% minimum
  • Dynamic Trailing Stop: 1.8x ATR with profit lock-in at 70%

πŸ“ Project Structure

DRL-MultiFactorTrading/
β”œβ”€β”€ Conservative_strategy_clean.py  # Multi-Factor strategy (streamlined)
β”œβ”€β”€ Radical_strategy_clean.py       # DRL strategy (streamlined)
β”œβ”€β”€ requirements.txt                 # Python dependencies
β”œβ”€β”€ LICENSE                          # MIT License
β”œβ”€β”€ README.md                        # This file
β”œβ”€β”€ .gitignore                       # Git ignore rules
β”œβ”€β”€ .flake8                          # Linting configuration
β”œβ”€β”€ .github/
β”‚   └── workflows/
β”‚       └── ci.yml                   # CI pipeline (Python 3.9-3.12)
β”‚
β”œβ”€β”€ radical-01810HK.png             # Performance: Xiaomi (01810.HK)
β”œβ”€β”€ radical-00700HK.png             # Performance: Tencent (00700.HK)
└── radical-03690HK.png             # Performance: Meituan (03690.HK)

πŸš€ Quick Start

Prerequisites

# Install dependencies
pip install -r requirements.txt

Usage

Both strategies are designed for the AlgoAPI backtesting framework:

from Radical_strategy_clean import AlgoEvent

# Initialize strategy
strategy = AlgoEvent()

# Configure with market event
mEvt = {
    'subscribeList': ['01810HK']  # Hong Kong Xiaomi stock
}
strategy.start(mEvt)

Strategy Parameters

Conservative Strategy

Parameter Default Description
base_position_pct 0.35 Base position size (35% of capital)
max_position_pct 0.55 Maximum position size
target_volatility 0.15 Target annualized volatility (15%)
stop_loss_atr 2.0 Stop-loss in ATR multiples
take_profit_atr 4.0 Take-profit in ATR multiples
min_gap 8 Minimum bars between trades

Radical Strategy

Parameter Default Description
base_position_pct 0.40 Base position size (40% of capital)
max_position_pct 0.70 Maximum position size
epsilon 0.25 Initial exploration rate
epsilon_min 0.05 Minimum exploration rate
gamma 0.97 Discount factor
learning_rate 0.005 Network learning rate
buffer_size 2000 Replay buffer capacity
batch_size 64 Training batch size

πŸ“Š Signal Generation

Multi-Factor Model (Conservative)

The signal is computed as a weighted sum of five independent factors:

Final_Signal = Ξ£(Factor_i Γ— Weight_i Γ— Strength_i)

where:
  - Trend:     Weight = 0.35, based on MA crossovers (8/20/40)
  - Momentum:  Weight = 0.25, based on 5/10-bar returns
  - RSI:       Weight = 0.20, oversold (<35) / overbought (>65)
  - MACD:      Weight = 0.15, histogram direction
  - Bollinger: Weight = 0.05, band breakouts

DQN Action Space (Radical)

Action Signal Strength Interpretation
0 -4 0.55 Strong Short
1 -3 0.45 Medium Short
2 -2 0.35 Weak Short
3 -1 0.25 Very Weak Short
4 0 0.00 Hold
5 +1 0.25 Very Weak Long
6 +2 0.35 Weak Long
7 +3 0.45 Medium Long
8 +4 0.55 Strong Long

πŸ›‘οΈ Risk Management

Both strategies implement comprehensive risk controls:

Position Sizing

# Volatility-adjusted position sizing
if realized_volatility > target_volatility:
    position_size *= target_volatility / realized_volatility

# Drawdown protection
if drawdown > 0.10:
    position_size *= (1 - drawdown * 0.6)

Exit Conditions

  1. Stop-Loss: ATR-based dynamic stop (2.0x for Conservative, 1.8x for Radical)
  2. Take-Profit: ATR-based target (4.0x for Conservative, 5.0x for Radical)
  3. Trailing Stop: Locks in 50-70% of maximum profit
  4. Time Stop: Maximum holding period (150 bars Conservative, 60 bars Radical)

πŸ”¬ Research Methodology

Development Process

  • 600+ iterations on Conservative Strategy (parameter optimization, factor weight tuning)
  • 400+ experiments on Radical Strategy (network architecture search, hyperparameter tuning)
  • 1000+ total backtests across multiple assets and timeframes
  • 4+ years of historical data (2020-2024) covering multiple market regimes

Testing Period Coverage

  • βœ… COVID-19 crash and recovery (2020)
  • βœ… Bull market conditions (2021)
  • βœ… Bear market stress test (2022)
  • βœ… Recovery rally (2023)
  • βœ… Recent market conditions (2024)

Instruments Tested

  • Hong Kong Equities: Tencent (00700.HK), Xiaomi (01810.HK), Meituan (03690.HK)

πŸ“š References

Academic Papers

  1. Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3-56.

  2. Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.

  3. Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-learning. AAAI Conference on Artificial Intelligence.

  4. Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

  5. Schaul, T., et al. (2015). Prioritized experience replay. arXiv preprint arXiv:1511.05952.

⚠️ Disclaimer

This software is for educational and research purposes only.

  • Past performance does not guarantee future results
  • Trading involves substantial risk of loss
  • The authors are not responsible for any financial losses
  • Always conduct thorough backtesting before live trading
  • Consult with a qualified financial advisor

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Made with ❀️ for Quantitative Trading Research

About

Deep Reinforcement Learning trading strategies: Double DQN with Transformer Attention + Multi-Factor Model (Fama-French inspired). Features adaptive risk management and volatility targeting.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages