Deep Reinforcement Learning trading strategies combining Double DQN with Transformer Attention and Multi-Factor Models inspired by Fama-French. Features adaptive risk management and volatility targeting.
Performs particularly well on growth stocks with strong momentum characteristics.
This repository contains two sophisticated algorithmic trading strategies designed for quantitative trading:
| Strategy | Approach | Risk Profile | Key Technology |
|---|---|---|---|
| Conservative | Multi-Factor Model | Low-Medium | Weighted Signal Aggregation |
| Radical | Deep Reinforcement Learning | Medium-High | Double DQN + Transformer |
```
┌───────────────────────────────────────────────────────┐
│                   SIGNAL GENERATION                   │
├───────────────────────────────────────────────────────┤
│ Trend Analysis                                   35%  │
│ Momentum Indicators                              25%  │
│ RSI (Relative Strength Index)                    20%  │
│ MACD (Moving Average Convergence Divergence)     15%  │
│ Bollinger Bands                                   5%  │
├───────────────────────────────────────────────────────┤
│                  WEIGHTED AGGREGATION                 │
│                           ↓                           │
│                  FINAL TRADING SIGNAL                 │
└───────────────────────────────────────────────────────┘
```
Key Features:
- Volatility Targeting: Dynamically adjusts position size based on 15% annualized volatility target
- Drawdown Protection: Reduces exposure when drawdown exceeds 10%
- ATR-based Stops: Stop-loss at 2x ATR, take-profit at 4x ATR
- Time-based Exit: Maximum holding period of 150 bars
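A minimal sketch of how these exit rules combine (the thresholds mirror the bullets above; the function name and signature are illustrative, not from the repository):

```python
def check_exit(entry_price, price, atr, bars_held, is_long=True):
    """Illustrative exit logic: 2x-ATR stop, 4x-ATR target, 150-bar time stop."""
    direction = 1 if is_long else -1
    pnl = (price - entry_price) * direction   # per-share profit in price units
    if pnl <= -2.0 * atr:    # stop-loss at 2x ATR
        return "stop_loss"
    if pnl >= 4.0 * atr:     # take-profit at 4x ATR
        return "take_profit"
    if bars_held >= 150:     # time-based exit
        return "time_exit"
    return None              # keep holding
```

For example, `check_exit(100.0, 95.0, 2.0, 10)` flags a stop-loss once the trade is two ATRs underwater.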
```
┌─────────────────────────────────────────────────────────────┐
│                 24-DIMENSIONAL STATE VECTOR                 │
├─────────────────────────────────────────────────────────────┤
│ [1-6]   Multi-timeframe Momentum (2,3,5,8,13,21 periods)    │
│ [7-10]  Moving Average Position (5,10,20,40 periods)        │
│ [11-14] Technical Indicators (Vol, RSI, MACD, CCI)          │
│ [15-18] Volume Features (ratio, trend, correlation, vol)    │
│ [19-21] Breakout & Trend Strength                           │
│ [22-24] Acceleration, Volatility Change, Position PnL       │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                 TRANSFORMER SELF-ATTENTION                  │
│                                                             │
│   Q = X·Wq      K = X·Wk      V = X·Wv                      │
│                                                             │
│   Attention(Q,K,V) = softmax(QK^T/√d)·V                     │
│                                                             │
│   Output = X + 0.5 × Attention(Q,K,V)                       │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                     DOUBLE DQN NETWORK                      │
│                                                             │
│   Input(24) → Dense(128) → Dense(64) → Dense(32) → (9)      │
│                    │           │           │                │
│                  tanh        tanh        tanh               │
│                                                             │
│   Actions: [-4, -3, -2, -1, 0, +1, +2, +3, +4]              │
│           (Short)          (Hold)          (Long)           │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                PRIORITIZED EXPERIENCE REPLAY                │
│                                                             │
│   Priority  = |TD-error|^α             (α = 0.6)            │
│   Sampling  = Priority / Σ(Priority)                        │
│   IS Weight = (N × P(i))^(-β)          (β → 1.0)            │
└─────────────────────────────────────────────────────────────┘
```
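The prioritized-replay formulas map to a few lines of NumPy. This is a simplified proportional scheme (production implementations typically use a sum-tree for efficient sampling); the function names are illustrative:

```python
import numpy as np

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """P(i) = |TD-error_i|^alpha / sum_j |TD-error_j|^alpha."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

def importance_weights(probs, beta=0.4):
    """IS weights w_i = (N * P(i))^(-beta), normalized by the max for stability."""
    w = (len(probs) * probs) ** (-beta)
    return w / w.max()

td = np.array([0.1, 0.5, 2.0])       # larger TD error -> sampled more often
p = per_probabilities(td)
w = importance_weights(p, beta=0.4)  # beta is annealed toward 1.0 during training
```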
Key Features:
- Double DQN: Reduces Q-value overestimation using separate target network
- Transformer Attention: Enhances feature representation with self-attention mechanism
- Prioritized Replay: Samples important experiences more frequently (α=0.6, β=0.4→1.0)
- ε-greedy Exploration: Starts at 25%, decays to 5% minimum
- Dynamic Trailing Stop: 1.8x ATR with profit lock-in at 70%
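The Double DQN idea in the first bullet — select the next action with the online network, evaluate it with the target network — can be sketched as follows (γ = 0.97 as in the defaults; array names are illustrative):

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.97):
    """y_i = r_i + gamma * (1 - done_i) * Q_target(s', argmax_a Q_online(s', a))."""
    best_actions = np.argmax(q_online_next, axis=1)   # selection: online net
    idx = np.arange(len(rewards))
    evaluated = q_target_next[idx, best_actions]      # evaluation: target net
    return rewards + gamma * (1.0 - dones) * evaluated

rewards = np.array([1.0, 0.5])
dones = np.array([0.0, 1.0])                          # second transition is terminal
q_online_next = np.array([[0.2, 0.8], [0.6, 0.1]])
q_target_next = np.array([[0.3, 0.5], [0.4, 0.2]])
targets = double_dqn_targets(rewards, dones, q_online_next, q_target_next)
```

Decoupling selection from evaluation is what curbs the Q-value overestimation of vanilla DQN.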
```
DRL-MultiFactorTrading/
├── Conservative_strategy_clean.py   # Multi-Factor strategy (streamlined)
├── Radical_strategy_clean.py        # DRL strategy (streamlined)
├── requirements.txt                 # Python dependencies
├── LICENSE                          # MIT License
├── README.md                        # This file
├── .gitignore                       # Git ignore rules
├── .flake8                          # Linting configuration
├── .github/
│   └── workflows/
│       └── ci.yml                   # CI pipeline (Python 3.9-3.12)
│
├── radical-01810HK.png              # Performance: Xiaomi (01810.HK)
├── radical-00700HK.png              # Performance: Tencent (00700.HK)
└── radical-03690HK.png              # Performance: Meituan (03690.HK)
```
```bash
# Install dependencies
pip install -r requirements.txt
```

Both strategies are designed for the AlgoAPI backtesting framework:
```python
from Radical_strategy_clean import AlgoEvent

# Initialize strategy
strategy = AlgoEvent()

# Configure with market event
mEvt = {
    'subscribeList': ['01810HK']  # Hong Kong Xiaomi stock
}
strategy.start(mEvt)
```

| Parameter | Default | Description |
|---|---|---|
| `base_position_pct` | 0.35 | Base position size (35% of capital) |
| `max_position_pct` | 0.55 | Maximum position size |
| `target_volatility` | 0.15 | Target annualized volatility (15%) |
| `stop_loss_atr` | 2.0 | Stop-loss in ATR multiples |
| `take_profit_atr` | 4.0 | Take-profit in ATR multiples |
| `min_gap` | 8 | Minimum bars between trades |
| Parameter | Default | Description |
|---|---|---|
| `base_position_pct` | 0.40 | Base position size (40% of capital) |
| `max_position_pct` | 0.70 | Maximum position size |
| `epsilon` | 0.25 | Initial exploration rate |
| `epsilon_min` | 0.05 | Minimum exploration rate |
| `gamma` | 0.97 | Discount factor |
| `learning_rate` | 0.005 | Network learning rate |
| `buffer_size` | 2000 | Replay buffer capacity |
| `batch_size` | 64 | Training batch size |
The signal is computed as a weighted sum of five independent factors:
Final_Signal = Σ(Factor_i × Weight_i × Strength_i)
where:
- Trend: Weight = 0.35, based on MA crossovers (8/20/40)
- Momentum: Weight = 0.25, based on 5/10-bar returns
- RSI: Weight = 0.20, oversold (<35) / overbought (>65)
- MACD: Weight = 0.15, histogram direction
- Bollinger: Weight = 0.05, band breakouts
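With per-factor signals normalized to [-1, +1], the aggregation is a weighted sum. A sketch (dictionary keys are illustrative, and each factor's strength is folded into its signal magnitude for brevity):

```python
WEIGHTS = {"trend": 0.35, "momentum": 0.25, "rsi": 0.20, "macd": 0.15, "bollinger": 0.05}

def aggregate_signal(factor_signals):
    """Final_Signal = sum(weight_i * signal_i), each signal in [-1, +1]."""
    return sum(WEIGHTS[name] * s for name, s in factor_signals.items())

# Bullish trend and momentum, bearish RSI -> mildly bullish composite
signal = aggregate_signal(
    {"trend": 1.0, "momentum": 0.5, "rsi": -1.0, "macd": 0.0, "bollinger": 1.0}
)
```

Because the weights sum to 1.0, the composite stays in [-1, +1] as well.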
| Action | Signal | Strength | Interpretation |
|---|---|---|---|
| 0 | -4 | 0.55 | Strong Short |
| 1 | -3 | 0.45 | Medium Short |
| 2 | -2 | 0.35 | Weak Short |
| 3 | -1 | 0.25 | Very Weak Short |
| 4 | 0 | 0.00 | Hold |
| 5 | +1 | 0.25 | Very Weak Long |
| 6 | +2 | 0.35 | Weak Long |
| 7 | +3 | 0.45 | Medium Long |
| 8 | +4 | 0.55 | Strong Long |
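Decoding a discrete action index into its (signal, strength) pair per the table is a simple lookup; a sketch:

```python
SIGNALS   = [-4, -3, -2, -1, 0, +1, +2, +3, +4]
STRENGTHS = [0.55, 0.45, 0.35, 0.25, 0.00, 0.25, 0.35, 0.45, 0.55]

def decode_action(action):
    """Map an action index (0-8) to (trade signal, position strength)."""
    return SIGNALS[action], STRENGTHS[action]
```

Note the strength schedule is symmetric around Hold: for nonzero signals it equals 0.15 + 0.10 × |signal|.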
Both strategies implement comprehensive risk controls:
```python
# Volatility-adjusted position sizing
if realized_volatility > target_volatility:
    position_size *= target_volatility / realized_volatility

# Drawdown protection
if drawdown > 0.10:
    position_size *= (1 - drawdown * 0.6)
```

- Stop-Loss: ATR-based dynamic stop (2.0x for Conservative, 1.8x for Radical)
- Take-Profit: ATR-based target (4.0x for Conservative, 5.0x for Radical)
- Trailing Stop: Locks in 50-70% of maximum profit
- Time Stop: Maximum holding period (150 bars Conservative, 60 bars Radical)
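For a long position, the trailing stop can be sketched as the tighter of an ATR trail and a profit-lock floor (1.8× ATR and 70% lock-in match the Radical defaults; the function name is illustrative):

```python
def trailing_stop_price(entry, peak, atr, atr_mult=1.8, lock_fraction=0.7):
    """Long-side trailing stop: the greater of (peak - atr_mult * ATR) and a
    floor that locks in lock_fraction of the peak unrealized profit."""
    atr_trail = peak - atr_mult * atr
    profit_lock = entry + lock_fraction * max(peak - entry, 0.0)
    return max(atr_trail, profit_lock)
```

Entered at 100 with a peak of 110 and ATR 2, the stop sits at max(106.4, 107.0) = 107.0, locking in 70% of the 10-point peak profit.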
- 600+ iterations on Conservative Strategy (parameter optimization, factor weight tuning)
- 400+ experiments on Radical Strategy (network architecture search, hyperparameter tuning)
- 1000+ total backtests across multiple assets and timeframes
- 4+ years of historical data (2020-2024) covering multiple market regimes
- ✅ COVID-19 crash and recovery (2020)
- ✅ Bull market conditions (2021)
- ✅ Bear market stress test (2022)
- ✅ Recovery rally (2023)
- ✅ Recent market conditions (2024)
- Hong Kong Equities: Tencent (00700.HK), Xiaomi (01810.HK), Meituan (03690.HK)
- Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3-56.
- Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
- Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-learning. AAAI Conference on Artificial Intelligence.
- Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Schaul, T., et al. (2015). Prioritized experience replay. arXiv preprint arXiv:1511.05952.
This software is for educational and research purposes only.
- Past performance does not guarantee future results
- Trading involves substantial risk of loss
- The authors are not responsible for any financial losses
- Always conduct thorough backtesting before live trading
- Consult with a qualified financial advisor
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Made with ❤️ for Quantitative Trading Research