Research Paper | MBA 501: Advanced Decision and Game Theory | Tunis Business School
Author: Montaha GHABRI
Supervisor: Dr. Sonia Rebai | Institution: Tunis Business School (TBS), University of Tunis
Independent reinforcement learning agents deployed in competitive markets can converge to tacitly collusive pricing without explicit coordination. While the emergence of algorithmic collusion is well-documented, its stability once learned remains an open question of direct relevance to antitrust enforcement.
This paper tests four regulatory-style interventions applied after Q-learning agents have converged to collusive pricing in a repeated Bertrand duopoly. Replicating Calvano et al. (2020), the baseline yields an equilibrium price of 1.887 and a Profit Gain Index of Δ = 0.991. All four interventions reduce equilibrium prices but none restores competitive outcomes; recovery rates range from 54.5% (forced pricing) to 81.8% (memory reset).
- Research Questions
- Background
- Methodology
- Results
- Discussion & Policy Implications
- Repository Structure
- Replication Guide
- References
| # | Question |
|---|---|
| RQ1 | How stable is algorithmic collusion against market disruptions of different types? |
| RQ2 | Does intervention duration affect the degree of disruption? |
| RQ3 | Do symmetric and asymmetric interventions differ in effectiveness? |
- First systematic comparison of four regulatory intervention types applied to converged algorithmic collusion, using full re-convergence as the measurement standard
- Evidence of duration insensitivity, consistent with a disruption-threshold interpretation for forced pricing
- Evidence that asymmetric interventions are less disruptive than symmetric ones, with implications for algorithm-replacement remedies
- A compact robustness check across a 3×3 (α, β) parameter grid showing qualitative patterns are not confined to a single parameterisation
Two symmetric firms compete in an infinitely repeated duopoly. Consumers purchase according to multinomial logit demand:

qᵢ = exp((aᵢ − pᵢ)/μ) / (Σⱼ exp((aⱼ − pⱼ)/μ) + exp(a₀/μ))
Parameters (following Calvano et al.): c = 1, a−c = 1, a₀ = 0, μ = 0.25, δ = 0.95
This yields Nash price pᴺ ≈ 1.473 and monopoly price pᴹ ≈ 1.925.
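As a sanity check on these benchmarks, the sketch below (mine, not the repository's code) computes logit demand from the parameters above and recovers pᴺ and pᴹ numerically; the grid bounds and tolerance are my choices.

```python
import numpy as np

# Logit demand for the symmetric duopoly (Calvano et al. defaults): a = c + 1 = 2
a, a0, mu, c = 2.0, 0.0, 0.25, 1.0

def demand(p1, p2):
    e1, e2 = np.exp((a - p1) / mu), np.exp((a - p2) / mu)
    denom = e1 + e2 + np.exp(a0 / mu)
    return e1 / denom, e2 / denom

def profit(p1, p2):
    q1, q2 = demand(p1, p2)
    return (p1 - c) * q1, (p2 - c) * q2

# Nash: iterate firm 1's best response on a fine grid until it stops moving
grid = np.linspace(1.0, 2.5, 3001)
p_N = 1.5
for _ in range(100):
    br = grid[np.argmax(profit(grid, p_N)[0])]
    if abs(br - p_N) < 1e-9:
        break
    p_N = br

# Monopoly: maximise joint profit at a common price
p_M = grid[np.argmax(profit(grid, grid)[0] + profit(grid, grid)[1])]

print(f"p_N ≈ {p_N:.3f}, p_M ≈ {p_M:.3f}")  # expected ≈ 1.473 and ≈ 1.925
```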
| Metric | Formula | Interpretation |
|---|---|---|
| Profit Gain Index (Δ) | (π̄ − πᴺ) / (πᴹ − πᴺ) | 0 = competitive, 1 = full monopoly |
| Recovery Rate (R) | (p̄post − pᴺ) / (p̄base − pᴺ) × 100% | 0% = collapsed to Nash, 100% = full return to collusion |
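In code, both metrics are one-liners. A minimal sketch (the function names are mine, not the repository's):

```python
def profit_gain_index(pi_bar, pi_N, pi_M):
    """Δ: 0 = competitive (Nash) profits, 1 = full monopoly profits."""
    return (pi_bar - pi_N) / (pi_M - pi_N)

def recovery_rate(p_post, p_N=1.473, p_base=1.887):
    """R: % of the collusive price margin recovered after an intervention."""
    return 100 * (p_post - p_N) / (p_base - p_N)

print(round(recovery_rate(1.699), 1))  # ≈ 54.6: the forced-pricing row, up to rounding
```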
- State space: previous-period prices sₜ = (p₁,ₜ₋₁, p₂,ₜ₋₁) → |S| = 15² = 225
- Price grid: m = 15 points on [1.2, 2.0]
- Learning rate: α = 0.15
- Exploration decay: εₜ = exp(−βt), β = 4×10⁻⁶
- Convergence: Greedy policy unchanged across all 225 states for 10⁵ consecutive periods
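Putting these pieces together, one period of the learning loop looks roughly like the sketch below. It builds on the `demand`/`profit` helpers from the earlier snippet; variable names are illustrative, not the repository's.

```python
import numpy as np

rng = np.random.default_rng(0)
m, alpha, beta, delta = 15, 0.15, 4e-6, 0.95
prices = np.linspace(1.2, 2.0, m)
Q = [np.zeros((m * m, m)) for _ in range(2)]   # one 225×15 Q-table per firm

def q_step(state, t):
    """One period: ε-greedy action choice, then a standard Q-update per firm."""
    eps = np.exp(-beta * t)                     # time-declining exploration
    acts = [rng.integers(m) if rng.random() < eps else int(np.argmax(Q[i][state]))
            for i in range(2)]
    next_state = acts[0] * m + acts[1]          # state = both firms' last prices
    rewards = profit(prices[acts[0]], prices[acts[1]])
    for i in range(2):
        target = rewards[i] + delta * Q[i][next_state].max()
        Q[i][state, acts[i]] += alpha * (target - Q[i][state, acts[i]])
    return next_state
```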
| Intervention | Mechanism | Policy Analogue |
|---|---|---|
| Forced Pricing (k=50) | Firm 0 locked to pᴺ for 50 periods; Firm 1 updates freely | Consent decree |
| Forced Pricing (k=100) | Same mechanism, 100 periods | Extended decree |
| Exploration Shock | Both agents set ε=0.5 for 100 periods | Regulatory audit |
| Memory Reset | Firm 0 Q-table reset to Q₀; Firm 1 retains memory | Algorithm replacement |
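Against the loop above, the four mechanisms reduce to small overrides. A sketch (the hook names and the choice of resetting to zeros are my assumptions; the scripts may differ in detail):

```python
NASH_IDX = int(np.argmin(np.abs(prices - 1.473)))  # grid point closest to p_N

def q_step_with_overrides(state, t, forced=None, eps_fixed=None):
    """q_step with optional regulatory overrides: `forced` pins {firm: action},
    `eps_fixed` replaces the decayed exploration rate for both firms."""
    eps = eps_fixed if eps_fixed is not None else np.exp(-beta * t)
    acts = [rng.integers(m) if rng.random() < eps else int(np.argmax(Q[i][state]))
            for i in range(2)]
    for firm, action in (forced or {}).items():
        acts[firm] = action
    next_state = acts[0] * m + acts[1]
    rewards = profit(prices[acts[0]], prices[acts[1]])
    for i in range(2):
        target = rewards[i] + delta * Q[i][next_state].max()
        Q[i][state, acts[i]] += alpha * (target - Q[i][state, acts[i]])
    return next_state

# Forced pricing (k periods): q_step_with_overrides(s, t, forced={0: NASH_IDX})
# Exploration shock:          q_step_with_overrides(s, t, eps_fixed=0.5)
# Memory reset:               Q[0][:] = 0.0   # firm 1's table stays intact
```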
Sensitivity grid across 9 parameter combinations:
- α ∈ {0.10, 0.15, 0.25}
- β ∈ {4×10⁻⁶, 10⁻⁵, 10⁻⁴}
- 3 independent sessions per cell
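The grid is small enough to enumerate directly. A sketch of the driver, where `train_session` is a hypothetical stand-in for whatever `script_03_sensitivity.py` actually calls to train one session to convergence and return its Δ:

```python
import itertools

alphas = [0.10, 0.15, 0.25]
betas = [4e-6, 1e-5, 1e-4]

results = {}
for a_, b_ in itertools.product(alphas, betas):
    # train_session is hypothetical: train with (alpha=a_, beta=b_) and
    # return the session's Profit Gain Index.
    results[(a_, b_)] = [train_session(alpha=a_, beta=b_, seed=s) for s in range(3)]
```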
| Metric | Value |
|---|---|
| Nash price (pᴺ) | 1.473 |
| Monopoly price (pᴹ) | 1.925 |
| Equilibrium price (p*) | 1.887 |
| Profit Gain Index (Δ) | 0.991 |
| Convergence (iterations) | 1.4 × 10⁶ |
The impulse response replicates Calvano et al. (2020, Figure 3): prices drop below Nash for ~3 periods (punishment phase) before returning to the collusive equilibrium by period 6.
| Intervention | p̄post | R (%) | Δp (vs. baseline) |
|---|---|---|---|
| Forced (k=50) | 1.699 | 54.5 | −0.188 |
| Forced (k=100) | 1.699 | 54.5 | −0.188 |
| Exploration shock | 1.756 | 68.2 | −0.131 |
| Memory reset | 1.812 | 81.8 | −0.075 |
| Nash (R=0%) | 1.473 | 0.0 | — |
| Baseline (R=100%) | 1.887 | 100.0 | — |
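As a consistency check, the forced-pricing row follows directly from the definition of R: (1.699 − 1.473) / (1.887 − 1.473) × 100 ≈ 54.6%, matching the reported 54.5% up to rounding of the prices.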
Key finding: All four interventions reduce collusion; none restore competitive pricing. The most disruptive intervention (forced pricing) still leaves agents at R = 54.5%, more than halfway back to collusion.
All 9 cells in the (α, β) grid yield Δ ≥ 0.63, confirming substantial collusion is not confined to a single parameterisation. The minimum is 0.63 ± 0.10 at (α=0.15, β=10⁻⁴); the maximum is 0.88 ± 0.12 at the paper baseline.
Once converged, Q-tables encode a reinforced collusive strategy. Interventions perturb this encoding but do not erase it. Even mechanistically distinct disruptions share the same partial-recovery pattern, suggesting this is a structural feature of Q-learning in repeated Bertrand games.
Forcing Nash pricing for 50 vs. 100 periods produces identical outcomes in this run. This suggests a possible threshold mechanism: once Q-value corruption crosses a critical level, extending enforcement adds little benefit.
The memory reset (asymmetric) achieves the least disruption (R = 81.8%). The intact firm anchors re-convergence toward collusion, effectively re-teaching the reset competitor. This challenges the policy intuition that replacing one firm's algorithm suffices to break coordination.
- Reactive enforcement is not futile: a decline from Δ = 0.991 to 0.55–0.82 represents a real consumer welfare gain
- Symmetric enforcement outperforms targeted enforcement: industry-wide audits or coordinated resets should produce larger disruptions than firm-specific remedies
- Preventive approaches may dominate reactive ones: algorithmic design requirements or ex-ante approval regimes could outperform post-detection enforcement
```
├── LaTeX/                             # Paper source files
├── plots/                             # Generated figures
├── replication/
│   ├── input/
│   │   ├── main.jl                    # Julia implementation
│   │   ├── main.m                     # MATLAB implementation
│   │   └── main.py                    # Python implementation
│   └── scripts/
│       ├── script_01_baseline.py      # Train baseline, save converged game
│       ├── script_02_interventions.py # Apply & measure all 4 interventions
│       └── script_03_sensitivity.py   # 3×3 (α, β) robustness grid
├── baseline_game.pkl                  # Saved converged game object
├── calvano_replication.png            # Impulse response figure
├── intervention_results.png           # Recovery rates figure
├── Paper.pdf                          # Full paper
└── draft.pdf
```
```
pip install numpy matplotlib seaborn
```

```
# Step 1: Train baseline once and save converged game
python replication/scripts/script_01_baseline.py

# Step 2: Apply all four interventions and measure recovery rates
python replication/scripts/script_02_interventions.py

# Step 3: Run sensitivity analysis across 3×3 parameter grid
python replication/scripts/script_03_sensitivity.py
```

Note: `script_01_baseline.py` saves `baseline_game.pkl`. All subsequent scripts load this same object, ensuring interventions are applied to an identical baseline. The baseline is trained exactly once.
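For downstream analysis, the saved object can presumably be reloaded the same way the intervention scripts do. A sketch, assuming only the pickle round-trip:

```python
import pickle

# Load the converged baseline so every intervention starts from the same object
with open("baseline_game.pkl", "rb") as f:
    baseline_game = pickle.load(f)
```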
```python
# Economic environment (Calvano et al. defaults)
c = 1          # Marginal cost
mu = 0.25      # Horizontal differentiation
delta = 0.95   # Discount factor
m = 15         # Price grid points on [1.2, 2.0]

# Q-learning
alpha = 0.15   # Learning rate
beta = 4e-6    # Exploration decay
```

- Headline intervention results are based on one baseline training realisation: interpret as run-specific, not population-level estimates
- Interventions applied once in isolation; repeated or coordinated interventions may produce different dynamics
- Sensitivity analysis uses only 3 sessions per cell; indicative rather than definitive
- Model abstracts from entry, asymmetric costs, demand uncertainty, and multi-product competition
| # | Citation |
|---|---|
| [1] | Abada, Lambin & Tóth (2023). Artificial Intelligence, Algorithmic Pricing, and Tacit Collusion. Journal of Industrial Economics |
| [2] | Assad et al. (2024). Algorithmic Pricing and Competition: Evidence from the German Retail Gasoline Market. Management Science |
| [3] | Baker (2021). Algorithms and Tacit Collusion: An Antitrust Analysis. Antitrust Law Journal |
| [4] | Calvano et al. (2020). Artificial Intelligence, Algorithmic Pricing, and Collusion. American Economic Review ← baseline replicated |
| [5] | Courthoud (2021). Algorithmic Collusion Replication. GitHub repository |
| [6] | Dafoe et al. (2020). Open Problems in Cooperative AI. arXiv:2012.08630 |
| [7] | Ezrachi & Stucke (2016). Virtual Competition. Harvard University Press |
| [8] | Klein (2021). Autonomous Algorithmic Collusion: Q-Learning under Sequential Pricing. RAND Journal of Economics |
| [9] | Leibo et al. (2017). Multi-Agent Reinforcement Learning in Sequential Social Dilemmas. AAMAS |
| [10] | Mas-Colell, Whinston & Green (1995). Microeconomic Theory. Oxford University Press |
| [11] | Mehra (2016). Antitrust and the Roboseller. Minnesota Law Review |
| [12] | Singh et al. (2000). Convergence Results for Single-Step On-Policy RL Algorithms. Machine Learning |
| [13] | Sutton & Barto (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press |
| [14] | Watkins & Dayan (1992). Q-Learning. Machine Learning |
Tunis Business School · University of Tunis · MBA 501: Advanced Decision and Game Theory