Reinforcement Learning Trading Strategies for Freqtrade FreqAI — achieved ~20% returns in 15 days of live trading.
```bash
pip install freqtrade gym pandas pandas-ta talib
```

```bash
# Copy model files
cp freqaimodels/*.py ~/.freqtrade/user_data/freqaimodels/
# Copy strategy files
cp strategies/*.py ~/.freqtrade/strategies/
# Copy strategy configs
cp strategies/*.json ~/.freqtrade/strategies/
```

```json
{
  "freqai": {
    "enabled": true,
    "period": 7,
    "backtest_period_days": 45,
    "train_period_days": 15,
    "identifier": "my_rl_model",
    "save_models": true
  },
  "pairlists": {
    "method": "VolumePairList",
    "number_assets": 10
  }
}
```

```bash
# Regression Strategy (stable returns, 3x leverage)
freqtrade trade --config config.json --strategy RLStrategy_Regression --freqaimodel RL_Model_Regression --dry-run
# Trend Strategy (excess returns, 1x leverage)
freqtrade trade --config config.json --strategy RLStrategy_Trend --freqaimodel RL_Model_Trend --dry-run
```

Tip: Models are automatically trained on first run and saved to `user_data/freqaimodels/`.
~20% returns with Trend + Regression combined portfolio over 15 days
Regression Strategy:

| Metric | Value |
|---|---|
| Backtest Annual Return | ~70% |
| Target Return | 2% |
| Leverage | 3x |
| Philosophy | High-frequency micro-profit + strict risk control |

Trend Strategy:

| Metric | Value |
|---|---|
| Backtest Annual Return | ~50% |
| Target Return | 10% |
| Leverage | 1x |
| Philosophy | Trend following + generous volatility tolerance |

Strategy comparison:

| Dimension | Trend Strategy | Regression Strategy |
|---|---|---|
| Design Philosophy | Trend Following | Mean Reversion |
| Timeframe | 2h | 1h |
| Leverage | 1x | 3x |
| Target Return | 10% | 2% |
| Stoploss Threshold | -15% | -20% |
| Backtest Annual Return | ~50% | ~70% |
| Core Mechanism | Potential Shaping | Multi-factor Reward Decomposition |
| Normalization | Online Normalization | Extended Observation Space |

The Regression strategy applies mean reversion thinking — capturing short-term price reversions and accumulating gains through high-frequency micro-profits.
Total Reward = Delta Reward + Reversion Reward + Quick Profit Bonus + Time Penalty + Stoploss Penalty
Delta Reward:

```python
reward = delta_p * 2.5
```

Rewards or punishes the agent based on the change in unrealized P&L at each tick.

Reversion Reward:

```python
if unreal_pnl > 0:
    return unreal_pnl * 0.4
return 0.0
```

Extra reward only when the position is profitable, which encourages holding winning positions.

Quick Profit Bonus:

```python
if unreal_pnl > 0.008 and hold_time <= 5:
    return 0.015
```

Rewards fast profits (0.8% or more within 5 ticks), which encourages quick entries and exits.

Time Penalty:

```python
if hold_time <= min_holding_time:
    return 0.003  # Positive reward during min holding
elif hold_time <= max_holding_time:
    return -0.006 * (hold_time - min_holding_time)  # Linear penalty
else:
    return -0.10  # Heavy penalty for exceeding max holding
```

The linear time penalty ensures the model doesn't over-hold.
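The components above can be combined into a single function to see how they add up on one tick. This is a minimal sketch, not the repository's implementation: the function names, the `min_holding_time=3` default, and the `stoploss_penalty` argument are assumptions made for illustration (the README does not give the stoploss penalty value), while the coefficients and the 30-tick maximum are taken from the text.

```python
def delta_reward(delta_p: float) -> float:
    # Scales the tick-to-tick change in unrealized P&L
    return delta_p * 2.5


def reversion_reward(unreal_pnl: float) -> float:
    # Only pays out while the position is in profit
    return unreal_pnl * 0.4 if unreal_pnl > 0 else 0.0


def quick_profit_bonus(unreal_pnl: float, hold_time: int) -> float:
    # Flat bonus for reaching 0.8%+ within 5 ticks
    return 0.015 if unreal_pnl > 0.008 and hold_time <= 5 else 0.0


def time_penalty(hold_time: int, min_holding_time: int, max_holding_time: int) -> float:
    if hold_time <= min_holding_time:
        return 0.003
    if hold_time <= max_holding_time:
        return -0.006 * (hold_time - min_holding_time)
    return -0.10


def total_reward(
    delta_p: float,
    unreal_pnl: float,
    hold_time: int,
    min_holding_time: int = 3,    # assumed value, not stated in the README
    max_holding_time: int = 30,   # "max 30 ticks" per the design summary
    stoploss_penalty: float = 0.0,  # hypothetical: supplied by the caller, expected <= 0
) -> float:
    return (
        delta_reward(delta_p)
        + reversion_reward(unreal_pnl)
        + quick_profit_bonus(unreal_pnl, hold_time)
        + time_penalty(hold_time, min_holding_time, max_holding_time)
        + stoploss_penalty
    )
```

For a position that is 1% in profit after 2 ticks with a 0.2% gain on the latest tick, the four terms are 0.005, 0.004, 0.015, and 0.003. The quick profit bonus dominates early in a trade, which matches the stated micro-profit goal.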
| Feature | Range | Purpose |
|---|---|---|
| `hold_ratio` | [0, 1] | Current holding time ratio |
| `unreal_pnl_norm` | [-1, 1] | Normalized unrealized P&L |
| `drawdown_ratio` | [0, 1] | Drawdown from peak |
| `position_val` | {-1, 0, 1} | Current position direction |
| `win_rate` | [0, 1] | Episode win rate |
| `profit_loss_delta` | [-1, 1] | Consecutive profit/loss state |
| `pnl_from_entry_norm` | [-1, 1] | Return relative to entry price |
- Trailing Stoploss: Dynamic adjustment based on profit level

  ```python
  if current_profit > 0.08:
      return -0.008
  elif current_profit > 0.05:
      return -0.015
  elif current_profit > 0.015:
      return -0.04
  ```

- Action Cooldown: `action_cooldown = 3` prevents overtrading
- Whipsaw Penalty: Extra -0.025 penalty after 3 consecutive losses
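The trailing stoploss tiers can be written as a standalone function to check which stop applies at a given profit level. This is a sketch under assumptions: the function name is hypothetical, and the fallback of -0.20 below the lowest tier is assumed from the Regression stoploss threshold in the comparison table, not confirmed by the source.

```python
def regression_trailing_stoploss(current_profit: float, default_stoploss: float = -0.20) -> float:
    """Return the stoploss distance for a given open profit.

    Tiers are taken from the README; default_stoploss is an assumed
    fallback for profits at or below 1.5%.
    """
    if current_profit > 0.08:
        return -0.008
    elif current_profit > 0.05:
        return -0.015
    elif current_profit > 0.015:
        return -0.04
    return default_stoploss


# The stop tightens as profit grows
for profit in (0.0, 0.02, 0.06, 0.10):
    print(f"profit={profit:.2%} -> stoploss={regression_trailing_stoploss(profit):.2%}")
```

The stop tightens from 4% to 0.8% as profit rises, so a larger share of an open gain is protected the further the trade runs.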
The Trend strategy applies trend following thinking — using Potential Shaping to guide the model toward big trends, with relaxed volatility tolerance.
Potential Shaping is a reward shaping technique that introduces the concept of "potential energy":
```python
def _compute_potential(self) -> float:
    unreal = float(self.get_unrealized_profit() or 0.0)
    return self.potential_coef * unreal  # 0.01 * unrealized profit

def calculate_reward(self, action):
    new_potential = self._compute_potential()
    shaped = self.potential_gamma * new_potential - self.prev_potential  # γ=0.90
    self.prev_potential = new_potential
    return original_reward + shaped
```

Why Potential Shaping?
- Raw RL struggles with sparse rewards in long-term dependencies
- Potential Shaping converts unrealized P&L changes into immediate rewards
- Guides the agent to focus on the trend of P&L changes, not absolute values
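To make the shaping term concrete, the sketch below computes `gamma * new_potential - prev_potential` along a path of unrealized profits, outside of any environment class. The function name and the sample path are hypothetical; the coefficient 0.01 and γ = 0.90 come from the code comments above.

```python
def shaping_terms(unrealized_path, potential_coef=0.01, gamma=0.90):
    """Per-step shaping terms gamma * phi(s') - phi(s) for a profit path."""
    potentials = [potential_coef * u for u in unrealized_path]
    terms = []
    prev = potentials[0]
    for new in potentials[1:]:
        terms.append(gamma * new - prev)
        prev = new
    return terms


path = [0.0, 0.02, 0.05, 0.03]  # illustrative unrealized profit per tick

# Rising profit yields positive terms, a pullback yields a negative one
print(shaping_terms(path))

# With gamma = 1 the terms telescope to phi(last) - phi(first)
print(sum(shaping_terms(path, gamma=1.0)))
```

With γ = 1 the terms sum to the difference between the final and initial potential, so the shaping only redistributes reward across ticks. With γ < 1, as used here, a flat positive profit produces a slightly negative term each tick, which acts as a small holding cost.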
Trend uses a dynamic target line mechanism that divides the profit target into multiple steps:
```python
# Step reward: triggered when profit breaks through current target line
if unreal >= self._current_target:
    reward += unreal * 0.1  # One-time bonus
    self._current_target += self.profit_target  # Elevate target line

# Immediate reward/penalty
if unreal > base_line:
    reward += excess_reward_coef * (unreal - base_line)
else:
    reward -= excess_penalty_coef * (base_line - unreal)
```

Core Idea: The model must "unlock" each step to earn step-based rewards, which encourages holding through big trends.
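The step mechanism can be traced on a short profit path with a small standalone class. This is an illustrative sketch: the class name `StepTarget` is hypothetical, and starting the target line at one `profit_target` is an assumption the README does not state.

```python
from dataclasses import dataclass, field


@dataclass
class StepTarget:
    profit_target: float = 0.10  # 10% target per the Trend strategy table
    current_target: float = field(init=False)

    def __post_init__(self):
        # Assumption: the first target line sits at one profit_target
        self.current_target = self.profit_target

    def step_bonus(self, unreal: float) -> float:
        """One-time bonus when profit reaches the current target line."""
        if unreal >= self.current_target:
            self.current_target += self.profit_target  # raise the line one step
            return unreal * 0.1
        return 0.0

    @property
    def stage_num(self) -> float:
        return self.current_target / self.profit_target


target = StepTarget()
for unreal in (0.05, 0.12, 0.15, 0.21):
    bonus = target.step_bonus(unreal)
    print(f"unreal={unreal:.2f} bonus={bonus:.3f} next_target={target.current_target:.2f}")
```

Because the check is an `if` rather than a loop, at most one step is unlocked per tick, and the bonus is paid once per target line. Profit that stays between two lines earns no further step bonus.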
Unlike Regression, Trend uses cross-episode accumulated statistics for normalization:
```python
def _normalize_obs(self, obs):
    # Running mean and variance; alpha = 1/n makes this a cumulative average
    alpha = 1.0 / self.obs_count
    self.obs_mean = (1 - alpha) * self.obs_mean + alpha * obs
    self.obs_var = (1 - alpha) * self.obs_var + alpha * ((obs - self.obs_mean) ** 2)
    # Standardize + inject step number
    # (the standardization that produces normalized_obs is omitted in this excerpt)
    stage_num = self._current_target / self.profit_target
    return normalized_obs + [stage_num]
```

Key Points:
- Normalization statistics are not cleared on reset (cross-episode accumulation)
- Extra dimension injects current step number, letting the model sense "progress unlocked"
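A self-contained version of this normalization, using only the standard library, might look like the following. It is a sketch under assumptions: the class name, the `eps` term, and the z-score formula for the elided standardization step are not from the source; the mean and variance updates follow the excerpt above.

```python
import math


class OnlineNormalizer:
    """Running per-feature normalization that persists across episodes."""

    def __init__(self, dim: int, eps: float = 1e-8):
        self.mean = [0.0] * dim
        self.var = [0.0] * dim
        self.count = 0
        self.eps = eps  # assumed: avoids division by zero on early samples

    def reset(self) -> None:
        # Episode reset: statistics are intentionally NOT cleared
        pass

    def normalize(self, obs, stage_num: float):
        self.count += 1
        alpha = 1.0 / self.count
        out = []
        for i, x in enumerate(obs):
            self.mean[i] = (1 - alpha) * self.mean[i] + alpha * x
            self.var[i] = (1 - alpha) * self.var[i] + alpha * (x - self.mean[i]) ** 2
            # Assumed standardization: z-score against the running statistics
            out.append((x - self.mean[i]) / math.sqrt(self.var[i] + self.eps))
        out.append(stage_num)  # extra dimension: current step number
        return out


norm = OnlineNormalizer(dim=1)
norm.normalize([1.0], stage_num=1.0)
norm.reset()  # a new episode keeps the accumulated mean and variance
print(norm.normalize([3.0], stage_num=2.0))
```

Keeping the statistics across resets means early episodes are normalized against very few samples, while later episodes see a stable scale.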
Trend allows greater volatility:
```python
if current_profit > 0.15:
    return -0.02
elif current_profit > 0.12:
    return -0.03
# ... max allows -15% stoploss
```

Entries use the ADX indicator to gauge trend strength rather than RSI overbought/oversold signals; RSI only filters out extreme zones:

```python
long_condition = (
    (adx > buy_adx) &          # ADX > 20 indicates trend formation
    (plus_di > minus_di) &     # +DI > -DI indicates uptrend
    (close > sma_20) &         # Price above moving average
    (rsi < 70) & (rsi > 30)    # RSI not in extreme zone
)
```

Trend (Excess Return Engine) + Regression (Stable Growth Engine)
- Trend: Captures big moves in trending markets, 10% target, 1x leverage
- Regression: Mean reversion in volatile markets, 2% target, 3x leverage
- Live trading duration: ~15 days
- Combined portfolio return: ~20%
- Regression strategy: steady consistent profits
- Trend strategy: excess returns during trending markets
```
Freqai_RL/
├── freqaimodels/                  # RL Model definitions
│   ├── RL_Model.py                # Base RL model (reference implementation)
│   ├── RL_Model_Regression.py     # Regression strategy environment
│   └── RL_Model_Trend.py          # Trend strategy environment
├── strategies/                    # Freqtrade strategies
│   ├── RLStrategy.py              # Base strategy (reference implementation)
│   ├── RLStrategy_Regression.py
│   └── RLStrategy_Trend.py
├── picture/                       # Live trading screenshots
│   ├── trend.png                  # Trend strategy live trading
│   ├── regression.png             # Regression strategy live trading
│   ├── simulate_run.png           # Simulated portfolio run
│   ├── trend_log.png              # Trend training log
│   └── regression_log.png         # Regression training log
└── README.md
```
```bash
freqtrade backtesting --config config.json --strategy RLStrategy_Regression --freqaimodel RL_Model_Regression --timerange=20230101-20231231
```

```bash
freqtrade train --config config.json --strategy RLStrategy_Regression --freqaimodel RL_Model_Regression
freqtrade train --config config.json --strategy RLStrategy_Trend --freqaimodel RL_Model_Trend
```

Adjust parameters in the strategy `.json` config files:

```json
{
  "strategy_name": "RLStrategy_Regression",
  "parameters": {
    "buy_rsi": {"value": 30},
    "sell_rsi": {"value": 70}
  }
}
```

Regression Strategy:

| Design | Implementation |
|---|---|
| Goal | Micro high-frequency profits |
| Leverage | 3x |
| Holding Time | Short (max 30 ticks) |
| Stoploss | Trailing dynamic stoploss |
| Reward | Multi-factor decomposition, real-time feedback |

Trend Strategy:

| Design | Implementation |
|---|---|
| Goal | Capture big trends |
| Leverage | 1x |
| Holding Time | Long (hold until trend ends) |
| Stoploss | Relaxed (max -15%) |
| Reward | Potential Shaping + Step rewards |

MIT License — see LICENSE.




