Freqai_RL

Reinforcement learning trading strategies for Freqtrade's FreqAI module. A combined Trend + Regression portfolio returned roughly 20% over about 15 days of simulated trading.

Quick Start (5 Minutes)

1. Install Dependencies

pip install freqtrade gym pandas pandas-ta TA-Lib

2. Copy Files to Freqtrade Directory

# Copy model files
cp freqaimodels/*.py ~/.freqtrade/user_data/freqaimodels/

# Copy strategy files
cp strategies/*.py ~/.freqtrade/user_data/strategies/

# Copy strategy configs
cp strategies/*.json ~/.freqtrade/user_data/strategies/

3. Configure config.json

{
  "freqai": {
    "enabled": true,
    "period": 7,
    "backtest_period_days": 45,
    "train_period_days": 15,
    "identifier": "my_rl_model",
    "save_models": true
  },
  "pairlists": [
    {
      "method": "VolumePairList",
      "number_assets": 10
    }
  ]
}

4. Train & Run

# Dry-run mode is set with "dry_run": true in config.json, not with a command-line flag.

# Regression Strategy (stable returns, 3x leverage)
freqtrade trade --config config.json --strategy RLStrategy_Regression --freqaimodel RL_Model_Regression

# Trend Strategy (excess returns, 1x leverage)
freqtrade trade --config config.json --strategy RLStrategy_Trend --freqaimodel RL_Model_Trend

Tip: Models are trained automatically on the first run and saved under user_data/models/<identifier>/ (here, user_data/models/my_rl_model/).

Live Performance

Simulated Trading (15 Days)

~20% returns with Trend + Regression combined portfolio over 15 days

Regression Strategy

Metric	Value
Backtest Annual Return	~70%
Target Return	2%
Leverage	3x
Philosophy	High-frequency micro-profit + strict risk control

Trend Strategy

Metric	Value
Backtest Annual Return	~50%
Target Return	10%
Leverage	1x
Philosophy	Trend following + generous volatility tolerance

Strategy Overview

Dimension	Trend Strategy	Regression Strategy
Design Philosophy	Trend Following	Mean Reversion
Timeframe	2h	1h
Leverage	1x	3x
Target Return	10%	2%
Stoploss Threshold	-15%	-20%
Backtest Annual Return	~50%	~70%
Core Mechanism	Potential Shaping	Multi-factor Reward Decomposition
Normalization	Online Normalization	Extended Observation Space

Regression Strategy Deep Dive

Philosophy: High-frequency Micro-profit + Strict Risk Control

The Regression strategy applies mean reversion thinking — capturing short-term price reversions and accumulating gains through high-frequency micro-profits.

Core Reward Mechanism

Total Reward = Delta Reward + Reversion Reward + Quick Profit Bonus + Time Penalty + Stoploss Penalty

1. Delta PNL Reward

reward = delta_p * 2.5

Rewards/punishes based on unrealized P&L changes each tick.

2. Reversion Reward

if unreal_pnl > 0:
    return unreal_pnl * 0.4
return 0.0

Extra reward only when profitable — encourages holding winning positions.

3. Quick Profit Bonus

if unreal_pnl > 0.008 and hold_time <= 5:
    return 0.015
return 0.0

Rewards fast profits (more than 0.8% within the first 5 ticks), which encourages quick entries and exits.

4. Time Penalty

if hold_time <= min_holding_time:
    return 0.003          # Positive reward during min holding
elif hold_time <= max_holding_time:
    return -0.006 * (hold_time - min_holding_time)  # Linear penalty
else:
    return -0.10           # Heavy penalty for exceeding max holding

Linear time penalty ensures the model doesn't over-hold.
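
The four components above can be combined as in the following minimal sketch. It is an illustration under stated assumptions, not the repository's implementation: the function names are made up here, min_holding_time=3 is a hypothetical value, max_holding_time=30 is taken from the summary table ("max 30 ticks"), and the stoploss penalty term is left out because its form is not given.

```python
def delta_reward(delta_p: float) -> float:
    """Scaled change in unrealized P&L since the previous tick."""
    return delta_p * 2.5


def reversion_reward(unreal_pnl: float) -> float:
    """Extra reward only while the position is in profit."""
    return unreal_pnl * 0.4 if unreal_pnl > 0 else 0.0


def quick_profit_bonus(unreal_pnl: float, hold_time: int) -> float:
    """Flat bonus for exceeding +0.8% within the first 5 ticks."""
    return 0.015 if unreal_pnl > 0.008 and hold_time <= 5 else 0.0


def time_penalty(hold_time: int, min_holding_time: int, max_holding_time: int) -> float:
    """Small reward early, linear penalty later, flat penalty past the maximum."""
    if hold_time <= min_holding_time:
        return 0.003
    if hold_time <= max_holding_time:
        return -0.006 * (hold_time - min_holding_time)
    return -0.10


def total_reward(delta_p: float, unreal_pnl: float, hold_time: int,
                 min_holding_time: int = 3,     # hypothetical value, not from the source
                 max_holding_time: int = 30) -> float:
    # The stoploss penalty term is omitted: the text does not specify it.
    return (delta_reward(delta_p)
            + reversion_reward(unreal_pnl)
            + quick_profit_bonus(unreal_pnl, hold_time)
            + time_penalty(hold_time, min_holding_time, max_holding_time))


# Two ticks into a trade, +1% unrealized, +0.4% gained on this tick
example = total_reward(delta_p=0.004, unreal_pnl=0.01, hold_time=2)
```

With these assumed holding times, the linear penalty at tick 30 is -0.006 * 27 = -0.162, which is larger in magnitude than the flat -0.10 applied after the maximum. Whether that matters depends on the min_holding_time actually used.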

5. Extended Observation Space (+7 Dimensions)

Feature	Range	Purpose
`hold_ratio`	[0, 1]	Current holding time ratio
`unreal_pnl_norm`	[-1, 1]	Normalized unrealized P&L
`drawdown_ratio`	[0, 1]	Drawdown from peak
`position_val`	{-1, 0, 1}	Current position direction
`win_rate`	[0, 1]	Episode win rate
`profit_loss_delta`	[-1, 1]	Consecutive profit/loss state
`pnl_from_entry_norm`	[-1, 1]	Return relative to entry price
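
As a rough illustration of how seven such features might be produced and kept inside the ranges listed in the table, here is a sketch. The formulas, the function name, and the pnl_scale and streak_scale constants are all assumptions made for this example; the table only specifies names, ranges, and purposes.

```python
def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into [lo, hi]."""
    return max(lo, min(hi, x))


def extended_features(hold_time: int, max_holding_time: int,
                      unreal_pnl: float, peak_pnl: float,
                      position_val: int, wins: int, trades: int,
                      streak: int, entry_price: float, price: float,
                      pnl_scale: float = 0.05,     # hypothetical normalization constant
                      streak_scale: int = 5):      # hypothetical normalization constant
    """Return the 7 extra observation values, each clipped to its stated range.

    position_val is -1 (short), 0 (flat) or 1 (long); streak counts
    consecutive wins (positive) or losses (negative).
    """
    if entry_price > 0:
        pnl_from_entry = position_val * (price - entry_price) / entry_price
    else:
        pnl_from_entry = 0.0
    return [
        clip(hold_time / max_holding_time, 0.0, 1.0),         # hold_ratio
        clip(unreal_pnl / pnl_scale, -1.0, 1.0),              # unreal_pnl_norm
        clip((peak_pnl - unreal_pnl) / pnl_scale, 0.0, 1.0),  # drawdown_ratio
        float(position_val),                                  # position_val
        wins / trades if trades else 0.0,                     # win_rate
        clip(streak / streak_scale, -1.0, 1.0),               # profit_loss_delta
        clip(pnl_from_entry / pnl_scale, -1.0, 1.0),          # pnl_from_entry_norm
    ]


features = extended_features(hold_time=15, max_holding_time=30,
                             unreal_pnl=0.02, peak_pnl=0.03,
                             position_val=1, wins=3, trades=4, streak=2,
                             entry_price=100.0, price=102.0)
```

These values would be appended to the base observation vector, so the policy network's input size grows by seven.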

Risk Control Design

Trailing Stoploss: Dynamic adjustment based on profit level

if current_profit > 0.08: return -0.008
elif current_profit > 0.05: return -0.015
elif current_profit > 0.015: return -0.04

Action Cooldown: action_cooldown = 3 prevents overtrading
Whipsaw Penalty: Extra -0.025 penalty after 3 consecutive losses
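
The three risk controls can be sketched as plain Python, independent of Freqtrade. This is an illustrative sketch: TradeGuard is a hypothetical helper, the -0.20 fallback is assumed from the stoploss threshold in the overview table, and the whipsaw penalty is assumed to apply from the third consecutive loss onward.

```python
def regression_trailing_stop(current_profit: float, default_stop: float = -0.20) -> float:
    """Tighten the stop distance as open profit grows (tiers from the text)."""
    if current_profit > 0.08:
        return -0.008
    if current_profit > 0.05:
        return -0.015
    if current_profit > 0.015:
        return -0.04
    return default_stop  # assumed fallback: the -20% threshold from the overview table


class TradeGuard:
    """Tracks action cooldown and consecutive losses (hypothetical helper)."""

    def __init__(self, action_cooldown: int = 3, loss_streak_limit: int = 3,
                 whipsaw_penalty: float = -0.025):
        self.action_cooldown = action_cooldown
        self.loss_streak_limit = loss_streak_limit
        self.whipsaw_penalty = whipsaw_penalty
        self.ticks_since_action = action_cooldown  # the first action is allowed at once
        self.loss_streak = 0

    def tick(self) -> None:
        """Advance one environment step."""
        self.ticks_since_action += 1

    def can_act(self) -> bool:
        """True once the cooldown since the last action has elapsed."""
        return self.ticks_since_action >= self.action_cooldown

    def record_action(self) -> None:
        """Restart the cooldown after an entry or exit."""
        self.ticks_since_action = 0

    def record_trade(self, pnl: float) -> float:
        """Update the loss streak and return the extra penalty, if any."""
        self.loss_streak = self.loss_streak + 1 if pnl < 0 else 0
        if self.loss_streak >= self.loss_streak_limit:
            return self.whipsaw_penalty
        return 0.0


guard = TradeGuard()
penalties = [guard.record_trade(-0.01) for _ in range(3)]
```

A winning trade resets the streak, so the penalty only targets runs of losses.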

Trend Strategy Deep Dive

Philosophy: Chase Big Trends + Allow Volatility

The Trend strategy applies trend following thinking — using Potential Shaping to guide the model toward big trends, with relaxed volatility tolerance.

Core Mechanism: Potential Shaping

Potential Shaping is a reward shaping technique that introduces the concept of "potential energy":

def _compute_potential(self) -> float:
    unreal = float(self.get_unrealized_profit() or 0.0)
    return self.potential_coef * unreal  # 0.01 * unrealized profit

def calculate_reward(self, action):
    new_potential = self._compute_potential()
    shaped = self.potential_gamma * new_potential - self.prev_potential  # γ=0.90
    self.prev_potential = new_potential
    return original_reward + shaped

Why Potential Shaping?

Plain RL learns slowly when rewards are sparse and only arrive after long holding periods
Potential Shaping converts unrealized P&L changes into immediate rewards
Guides the agent to focus on the trend of P&L changes, not absolute values
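
To make the shaping term concrete, the sketch below isolates it from the environment and runs it over a short path of unrealized P&L values. The class name and the example path are made up for illustration; the coefficient 0.01 and γ = 0.90 are the values quoted in the snippet above.

```python
class PotentialShaper:
    """Computes gamma * phi(s') - phi(s) with phi = potential_coef * unrealized P&L."""

    def __init__(self, potential_coef: float = 0.01, potential_gamma: float = 0.90):
        self.potential_coef = potential_coef
        self.potential_gamma = potential_gamma
        self.prev_potential = 0.0

    def reset(self) -> None:
        """Clear the stored potential at the start of an episode."""
        self.prev_potential = 0.0

    def shaped(self, unreal_pnl: float) -> float:
        """Return the shaping term for this step and remember the new potential."""
        new_potential = self.potential_coef * unreal_pnl
        term = self.potential_gamma * new_potential - self.prev_potential
        self.prev_potential = new_potential
        return term


shaper = PotentialShaper()
path = [0.0, 0.02, 0.05, 0.03]           # hypothetical unrealized P&L per tick
terms = [shaper.shaped(p) for p in path]
# terms is approximately [0.0, 0.00018, 0.00025, -0.00023]
```

The term is positive while P&L rises and negative when it falls back. Because γ is below 1, a flat positive P&L yields a small negative term, (γ - 1) * φ, so holding an unchanged winning position is mildly penalized under this formula.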

Step-based Reward System

Trend uses a dynamic target line mechanism that divides the profit target into multiple steps:

# Step reward: triggered when profit breaks through current target line
if unreal >= self._current_target:
    reward += unreal * 0.1           # One-time bonus
    self._current_target += self.profit_target  # Elevate target line

# Immediate reward/penalty
if unreal > base_line:
    reward += excess_reward_coef * (unreal - base_line)
else:
    reward -= excess_penalty_coef * (base_line - unreal)

Core Idea: The model must "unlock" each step to earn step-based rewards — encourages holding through big trends.
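
The target-line mechanism can be exercised on its own with a small sketch. The class name, base_line=0.0, and both coefficient values are assumptions chosen for the example; profit_target=0.10 follows the 10% target stated for Trend.

```python
class StepReward:
    """Step bonus when profit crosses the target line, plus a per-tick excess term."""

    def __init__(self, profit_target: float = 0.10,
                 base_line: float = 0.0,             # hypothetical value
                 excess_reward_coef: float = 0.05,   # hypothetical value
                 excess_penalty_coef: float = 0.10): # hypothetical value
        self.profit_target = profit_target
        self.base_line = base_line
        self.excess_reward_coef = excess_reward_coef
        self.excess_penalty_coef = excess_penalty_coef
        self.current_target = profit_target

    def reset(self) -> None:
        """Return the target line to the first step."""
        self.current_target = self.profit_target

    def reward(self, unreal: float) -> float:
        reward = 0.0
        # One-time bonus, then raise the target line by one step
        if unreal >= self.current_target:
            reward += unreal * 0.1
            self.current_target += self.profit_target
        # Immediate reward above the base line, penalty at or below it
        if unreal > self.base_line:
            reward += self.excess_reward_coef * (unreal - self.base_line)
        else:
            reward -= self.excess_penalty_coef * (self.base_line - unreal)
        return reward


steps = StepReward()
r_cross = steps.reward(0.12)   # crosses the 10% line: bonus plus excess reward
r_hold = steps.reward(0.15)    # below the new 20% line: excess reward only
r_loss = steps.reward(-0.02)   # below the base line: penalty
```

Since the check is an if rather than a loop, at most one step is unlocked per tick, even if profit jumps past several target lines at once.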

Online Normalization

Unlike Regression, Trend uses cross-episode accumulated statistics for normalization:

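import numpy as np

def _normalize_obs(self, obs):
    # Running (cumulative) mean and variance: alpha = 1/n weights every sample equally,
    # so this is not an exponential moving average.
    # Assumes self.obs_count >= 1 and is incremented once per observation elsewhere.
    alpha = 1.0 / self.obs_count
    self.obs_mean = (1 - alpha) * self.obs_mean + alpha * obs
    self.obs_var = (1 - alpha) * self.obs_var + alpha * ((obs - self.obs_mean) ** 2)

    # Standardize (the 1e-8 epsilon is an assumed guard against division by zero)
    normalized_obs = (obs - self.obs_mean) / np.sqrt(self.obs_var + 1e-8)

    # Inject the current step number as an extra dimension (obs is assumed to be a NumPy array)
    stage_num = self._current_target / self.profit_target
    return np.append(normalized_obs, stage_num)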

Key Points:

Normalization statistics are not cleared on reset (cross-episode accumulation)
Extra dimension injects current step number, letting the model sense "progress unlocked"
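
A dependency-free sketch of both points follows. It swaps in Welford's algorithm for the running statistics, which is not necessarily the update used in the repository, and the class and method names are hypothetical.

```python
import math


class RunningNormalizer:
    """Per-dimension running mean/variance (Welford) kept across episodes."""

    def __init__(self, dim: int, eps: float = 1e-8):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim   # running sum of squared deviations
        self.eps = eps

    def update(self, obs) -> None:
        """Fold one observation into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs, stage_num: float):
        """Standardize obs and append the current step number as an extra value."""
        var = [m2 / self.count if self.count else 1.0 for m2 in self.m2]
        out = [(x - m) / math.sqrt(v + self.eps)
               for x, m, v in zip(obs, self.mean, var)]
        return out + [float(stage_num)]

    def reset_episode(self) -> None:
        """Deliberately a no-op: statistics survive environment resets."""


norm = RunningNormalizer(dim=2)
for sample in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    norm.update(sample)
norm.reset_episode()                       # statistics are unchanged
result = norm.normalize([3.0, 4.0], stage_num=2)
```

Keeping the statistics across resets means early episodes are normalized with noisy estimates, while later ones see a stable scale. The appended value makes the observation one element longer than the raw input.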

Relaxed Stoploss Design

Trend allows greater volatility:

if current_profit > 0.15: return -0.02
elif current_profit > 0.12: return -0.03
# ... max allows -15% stoploss

Relaxed Entry Conditions

Entries are driven by ADX trend strength rather than RSI overbought/oversold signals; RSI is only used to filter out extreme readings:

long_condition = (
    (adx > buy_adx) &          # ADX > 20 indicates trend formation
    (plus_di > minus_di) &     # +DI > -DI indicates uptrend
    (close > sma_20) &         # Price above moving average
    (rsi < 70) & (rsi > 30)    # RSI not in extreme zone
)

Using Both Strategies Together

Combined Logic

Trend (Excess Return Engine) + Regression (Stable Growth Engine)

Trend: Captures big moves in trending markets, 10% target, 1x leverage
Regression: Mean reversion in volatile markets, 2% target, 3x leverage

Live Performance

Duration: ~15 days
Combined portfolio returns: ~20%
Regression strategy: steady consistent profits
Trend strategy: excess returns during trending markets

Project Structure

Freqai_RL/
├── freqaimodels/               # RL Model definitions
│   ├── RL_Model.py            # Base RL model (reference implementation)
│   ├── RL_Model_Regression.py # Regression strategy environment
│   └── RL_Model_Trend.py      # Trend strategy environment
├── strategies/                 # Freqtrade strategies
│   ├── RLStrategy.py           # Base strategy (reference implementation)
│   ├── RLStrategy_Regression.py
│   └── RLStrategy_Trend.py
├── picture/                    # Live trading screenshots
│   ├── trend.png              # Trend strategy live trading
│   ├── regression.png         # Regression strategy live trading
│   ├── simulate_run.png        # Simulated portfolio run
│   ├── trend_log.png          # Trend training log
│   └── regression_log.png     # Regression training log
└── README.md

Advanced Usage

Backtesting

freqtrade backtesting --config config.json --strategy RLStrategy_Regression --freqaimodel RL_Model_Regression --timerange=20230101-20231231

Training Models Separately

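# Freqtrade has no separate train command; FreqAI trains models as part of a trade or backtesting run.
# Run each strategy on its own to train its model independently:
freqtrade backtesting --config config.json --strategy RLStrategy_Regression --freqaimodel RL_Model_Regression --timerange=20230101-20231231
freqtrade backtesting --config config.json --strategy RLStrategy_Trend --freqaimodel RL_Model_Trend --timerange=20230101-20231231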

Parameter Tuning

Adjust parameters in the strategy .json config files:

{
  "strategy_name": "RLStrategy_Regression",
  "parameters": {
    "buy_rsi": {"value": 30},
    "sell_rsi": {"value": 70}
  }
}

Design Philosophy Summary

Regression: High-frequency + Strict Risk Control

Design	Implementation
Goal	Micro high-frequency profits
Leverage	3x
Holding Time	Short (max 30 ticks)
Stoploss	Trailing dynamic stoploss
Reward	Multi-factor decomposition, real-time feedback

Trend: Trend Following + Generous Volatility

Design	Implementation
Goal	Capture big trends
Leverage	1x
Holding Time	Long (hold until trend ends)
Stoploss	Relaxed (max -15%)
Reward	Potential Shaping + Step rewards

License

MIT License — see LICENSE.
