🎮 Universal RL Game Arena

One Brain, 10 Games | AlphaZero-Inspired Multi-Game AI

A universal reinforcement learning agent trained across 10 diverse board games with interactive gameplay

Features • Quick Start • Games • Architecture • Play Online

🎯 Overview

Universal RL Arena is an interactive platform showcasing a single AI agent that masters 10 different board games through Q-learning with minimax and MCTS enhancements. Unlike traditional game-specific AI, this universal agent learns transferable strategic patterns across games, from simple Tic-Tac-Toe to complex Ultimate Tic-Tac-Toe.

🏆 Key Features

Universal Agent: Single Q-table architecture for all 10 games
Interactive Gameplay: Human vs AI, AI vs AI battle mode
In-App Training: Train custom agents with adjustable hyperparameters
Real-Time Visualization: Dynamic game state rendering
Performance Analytics: Training stats and win-rate tracking
Model Persistence: Save/load trained agents as .zip archives

🚀 Quick Start

Installation

# Clone repository
git clone https://github.com/Devanik21/universal-rl-arena.git
cd universal-rl-arena

# Install dependencies
pip install -r requirements.txt

requirements.txt:

streamlit>=1.28.0
numpy>=1.21.0
matplotlib>=3.5.0
pandas>=1.5.0

Launch Application

streamlit run aGI.py

The app will open at http://localhost:8501

🎮 Supported Games

| Game | Complexity | Board / State Space | Strategy Type |
|---|---|---|---|
| Tic-Tac-Toe | Simple | 3×3 board (at most 3⁹ = 19,683 configurations) | Tactical |
| Connect-4 | Medium | 7×6 board | Positional |
| Nim | Simple | Exponential | Mathematical |
| Hexapawn | Simple | 3×3 board | Tactical |
| Chomp | Medium | 4×6 | Strategic |
| Sim | Medium | C(6,2) = 15 edges | Graph Theory |
| Dots & Boxes | Medium | 3×3 grid | Territory Control |
| Breakthrough | Complex | 6×6 board | Positional |
| Gomoku | Complex | 7×7 board | Pattern Recognition |
| Ultimate Tic-Tac-Toe | Very Complex | 9 boards of 3×3 | Multi-level Strategy |

Game Rules Summary

Tic-Tac-Toe: First to get 3 in a row wins
Connect-4: First to connect 4 discs vertically/horizontally/diagonally wins
Nim: Player forced to take the last object loses
Hexapawn: Reach opponent's back row or block all enemy moves
Chomp: Avoid eating the poison square (bottom-left)
Sim: First to form a triangle in their color loses
Dots & Boxes: Claim the most boxes by completing squares
Breakthrough: First to reach opponent's back row wins
Gomoku: Get exactly 5 stones in a row
Ultimate Tic-Tac-Toe: Win small boards to claim meta-board positions

🧠 Architecture

Universal Agent Design

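class UniversalAgent:
    def __init__(self, player_id, lr=0.01, gamma=0.99,
                 epsilon=1.0, mcts_sims=50, minimax_depth=2):
        self.player_id = player_id
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.mcts_sims, self.minimax_depth = mcts_sims, minimax_depth
        self.q_table = {}     # (state, action) -> value, shared across all games
        self.game_stats = {}  # per-game win/loss/draw counters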

Core Components:

State Representation: (game_name, *flattened_board_state)
Q-Table: {(state, action): value} mapping
Action Selection: Epsilon-greedy with tactical checks
Learning: Temporal Difference (TD) updates
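
The state and Q-table layout listed above can be sketched in a few lines. This is a minimal illustration, not the code in aGI.py: the helper names `make_state` and `q_value` are hypothetical, and the board is a plain nested list to keep the example dependency-free.

```python
def make_state(game_name, board):
    """Build a hashable state key: (game_name, *flattened_board_state)."""
    flat = [cell for row in board for cell in row]
    return (game_name, *flat)

def q_value(q_table, state, action, default=0.0):
    """Look up Q(state, action); unseen pairs fall back to a default."""
    return q_table.get((state, action), default)

q_table = {}
board = [[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]
state = make_state("tictactoe", board)
q_table[(state, (1, 1))] = 0.5   # value for playing the centre square

print(state)                            # ('tictactoe', 0, 0, 0, 0, 0, 0, 0, 0, 0)
print(q_value(q_table, state, (1, 1)))  # 0.5
print(q_value(q_table, state, (0, 0)))  # 0.0 (never visited)
```

Because the game name is the first element of every key, entries from different games can live in one dictionary without colliding.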

Learning Algorithm

Q-Learning Update Rule:

$$Q(s,a) \leftarrow Q(s,a) + \alpha[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$$

Where:

$\alpha$ = learning rate (default: 0.01)
$\gamma$ = discount factor (default: 0.99)
$r$ = immediate reward
$s'$ = next state
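
As a worked example of the update rule, the sketch below applies it to a dictionary-based Q-table. The function name `td_update` and the toy states are assumptions made for illustration; how aGI.py defines the "next state" in a two-player game (for instance, after the opponent's reply) is not shown in this README.

```python
def td_update(q_table, state, action, reward, next_state, next_actions,
              lr=0.01, gamma=0.99):
    """Apply one Q-learning update and return the new Q(state, action)."""
    old = q_table.get((state, action), 0.0)
    # max over a' of Q(s', a'); a terminal state has no actions, so use 0
    best_next = max((q_table.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    target = reward + gamma * best_next
    q_table[(state, action)] = old + lr * (target - old)
    return q_table[(state, action)]

q = {("s1", "a"): 0.5}
# Non-terminal step: r = 0 and the best next value is 0.5
v0 = td_update(q, "s0", "a", 0.0, "s1", ["a", "b"], lr=0.1, gamma=0.9)
print(round(v0, 4))  # 0.045
# Terminal winning step: r = 1 and there are no next actions
v1 = td_update(q, "s1", "a", 1.0, "end", [], lr=0.1, gamma=0.9)
print(round(v1, 4))  # 0.55
```

In the first call the target is 0 + 0.9 × 0.5 = 0.45, so the value moves a tenth of the way from 0 to 0.45. In the second call the target is the terminal reward 1.0, so 0.5 moves to 0.55.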

Tactical Enhancements:

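def select_action(self, state, available_actions, opponent):
    # 1. Immediate win detection
    for action in available_actions:
        if sim_move(action).winner == self.player_id:
            return action

    # 2. Block opponent wins: take the move the opponent would win with
    for action in available_actions:
        if sim_move(action, opponent).winner == opponent:
            return action

    # 3. Q-value maximization (unseen state-action pairs default to 0.0)
    return max(available_actions,
               key=lambda a: self.q_table.get((state, a), 0.0))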

Hyperparameter Configuration

| Parameter | Default | Range | Purpose |
|---|---|---|---|
| `lr` | 0.01 | 0.001-0.5 | Learning speed |
| `gamma` | 0.99 | 0.8-0.999 | Future reward weight |
| `epsilon` | 1.0→0.01 | - | Exploration rate (decays) |
| `epsilon_decay` | 0.998 | 0.95-0.999 | Exploration reduction |
| `minimax_depth` | 2 | 1-6 | Search tree depth |
| `mcts_simulations` | 50 | 10-500 | Monte Carlo rollouts |
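
To see how `epsilon` and `epsilon_decay` interact, here is a small sketch of a multiplicative decay schedule. It assumes epsilon is multiplied by `epsilon_decay` once per episode and clamped at 0.01; the function names are hypothetical and the actual schedule in aGI.py may differ.

```python
import math

def decay_epsilon(epsilon, decay=0.998, floor=0.01):
    """Multiplicative decay, clamped at a minimum exploration rate."""
    return max(floor, epsilon * decay)

def episodes_to_floor(start=1.0, decay=0.998, floor=0.01):
    """Smallest n with start * decay**n <= floor."""
    return math.ceil(math.log(floor / start) / math.log(decay))

eps = 1.0
for _ in range(200):
    eps = decay_epsilon(eps)
print(round(eps, 3))        # 0.67
print(episodes_to_floor())  # 2301
```

Under these assumptions, 200 decay steps leave epsilon near 0.67, and about 2,300 steps are needed to reach the 0.01 floor. A faster schedule (a smaller `epsilon_decay`) reaches the floor sooner at the cost of less exploration.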

📊 Training Pipeline

Multi-Game Training Loop

# Initialize agents
agent1 = UniversalAgent(player_id=1, lr=0.01, gamma=0.99)
agent2 = UniversalAgent(player_id=2, lr=0.01, gamma=0.99)

# Games to train
games = [TicTacToe(), Nim(), Connect4(), Hexapawn(),
         Chomp(), Sim(), DotsAndBoxes(), Breakthrough(),
         Gomoku(), UltimateTicTacToe()]

episodes = 200  # self-play episodes per game

# Self-play training
for game in games:
    for episode in range(episodes):
        play_game(game, agent1, agent2, training=True)
        agent1.decay_epsilon()
        agent2.decay_epsilon()

Training Results

Typical convergence after 200 episodes per game:

| Metric | Value |
|---|---|
| Total Q-States | ~50,000-100,000 |
| Training Time (10 games, 200 episodes each) | ~2-5 minutes |
| Final Epsilon | 0.01 |
| Win Rate (vs random) | >85% |

🎨 Visualization System

All games feature custom matplotlib renderers:

Tic-Tac-Toe: X/O symbols with grid
Connect-4: Colored discs with gravity
Nim: Stacked token pyramids
Hexapawn: Chess pawn symbols
Chomp: Chocolate grid with poison marker
Sim: Graph with 6 vertices
Dots & Boxes: Grid with edge highlighting
Breakthrough: Chess-like board
Gomoku: Go-style board
Ultimate TTT: 3×3 meta-board with active board highlighting

Example rendering code:

def visualize_game(env):
    if env.name == "tictactoe":
        return visualize_tictactoe(env.board)
    # ... routing for all 10 games

💾 Model Persistence

Save/Load Format

Agents are serialized to .zip archives containing:

universal_agent.zip
├── agent1.json       # Player 1 Q-table & config
├── agent2.json       # Player 2 Q-table & config
└── config.json       # Game list & metadata

JSON Structure:

{
  "q_table": {
    "[['tictactoe', 0, 0, 0, ...], '(0, 0)']": 0.85
  },
  "player_id": 1,
  "epsilon": 0.01,
  "game_stats": {
    "tictactoe": {"wins": 120, "losses": 75, "draws": 5}
  },
  "lr": 0.01,
  "gamma": 0.99
}
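
JSON object keys must be strings, so tuple-keyed Q-table entries have to be encoded before saving. The sketch below shows one way to do that round trip with `repr` and `ast.literal_eval`. It is an assumption for illustration: the helper names are hypothetical and the key format it produces is not identical to the sample above.

```python
import ast
import json

def encode_q_table(q_table):
    """Turn {(state, action): value} into a JSON-safe {str: value} dict."""
    return {repr((state, action)): value
            for (state, action), value in q_table.items()}

def decode_q_table(encoded):
    """Inverse of encode_q_table; literal_eval only parses Python literals."""
    return {ast.literal_eval(key): value for key, value in encoded.items()}

state = ("tictactoe", 0, 0, 0, 0, 1, 0, 0, 0, 0)
q_table = {(state, (0, 0)): 0.85}

payload = json.dumps({"q_table": encode_q_table(q_table), "player_id": 1})
restored = decode_q_table(json.loads(payload)["q_table"])
print(restored == q_table)  # True
```

`ast.literal_eval` is used instead of `eval` because it accepts only literals (tuples, strings, numbers and so on), which matters when loading archives from an upload.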

Usage

# Save trained agents
zip_buffer = create_universal_zip(agent1, agent2)
with open("my_agent.zip", "wb") as f:
    f.write(zip_buffer.getvalue())

# Load agents
agent1, agent2, config = load_universal_agents("my_agent.zip")
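
For readers curious how such an archive can be built, here is a minimal sketch using only the standard library. `create_zip` and `load_zip` are hypothetical stand-ins, not the `create_universal_zip` and `load_universal_agents` functions from aGI.py, and the dictionaries passed in are placeholder data.

```python
import io
import json
import zipfile

def create_zip(agent1_data, agent2_data, config):
    """Pack three JSON documents into an in-memory .zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("agent1.json", json.dumps(agent1_data))
        zf.writestr("agent2.json", json.dumps(agent2_data))
        zf.writestr("config.json", json.dumps(config))
    buffer.seek(0)
    return buffer

def load_zip(source):
    """Read the three JSON documents back from a path or file-like object."""
    with zipfile.ZipFile(source) as zf:
        return tuple(json.loads(zf.read(name))
                     for name in ("agent1.json", "agent2.json", "config.json"))

buf = create_zip({"player_id": 1, "epsilon": 0.01},
                 {"player_id": 2, "epsilon": 0.01},
                 {"games": ["tictactoe", "nim"]})
a1, a2, cfg = load_zip(buf)
print(a1["player_id"], a2["player_id"], cfg["games"])  # 1 2 ['tictactoe', 'nim']
```

Returning a `BytesIO` buffer matches the usage shown above, where the caller writes `zip_buffer.getvalue()` to disk, and also suits a browser download button that needs the bytes in memory.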

🎯 Usage Guide

1. Upload Pre-Trained Agent

Sidebar → Upload Universal Agent → Select .zip file → Load

2. Watch AI Battle

Select Game → Watch Battle → Auto-play/Step Mode

3. Play Against AI

Human vs AI → Choose Agent → Click board positions

4. Train New Agent

Training Lab → Set Hyperparameters → Start Multi-Game Training

5. Adjust AI Difficulty

Sidebar → AI Difficulty → Minimax Depth (1-6) & MCTS Sims (10-500)

🔬 Performance Optimization

State Space Reduction

Canonical Forms: Rotations/reflections mapped to single state
Pruning: Invalid actions filtered before Q-lookup
Sparse Storage: Only visited states stored in Q-table
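
The canonical-form idea can be sketched as follows for a square board with full rotational and mirror symmetry, such as Tic-Tac-Toe. The function names are hypothetical, and this covers only the state: a real agent would also have to map actions through the same symmetry. Games with gravity, such as Connect-4, have only the left-right mirror.

```python
def rotate(board):
    """Rotate a square board (tuple of tuples) 90 degrees clockwise."""
    return tuple(zip(*board[::-1]))

def reflect(board):
    """Mirror a board left-to-right."""
    return tuple(row[::-1] for row in board)

def canonical(board):
    """Return the lexicographically smallest of the 8 symmetric variants."""
    variants = []
    current = tuple(tuple(row) for row in board)
    for _ in range(4):
        variants.append(current)
        variants.append(reflect(current))
        current = rotate(current)
    return min(variants)

a = ((1, 0, 0),
     (0, 0, 0),
     (0, 0, 0))
b = ((0, 0, 0),
     (0, 0, 0),
     (0, 0, 1))
print(canonical(a) == canonical(b))  # True: all corner openings share one key
```

With this mapping, the four corner openings collapse into a single Q-table entry, as do the four edge openings.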

Computational Tricks

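import numpy as np

# Fast win detection (vectorized)
def _check_win(self, player):
    board = self.board
    # Row/column checks
    for i in range(3):
        if np.all(board[i, :] == player) or np.all(board[:, i] == player):
            return True
    # Diagonal checks (main diagonal and anti-diagonal)
    if np.all(np.diag(board) == player) or np.all(np.diag(np.fliplr(board)) == player):
        return True
    return False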

Memory Efficiency

States stored as tuples (immutable, hashable)
Actions converted to strings for Q-table keys
Numpy arrays for board representations

📈 Future Enhancements

Neural network policy (DQN/A3C)
Transfer learning metrics
Multi-agent tournament mode
Online multiplayer (WebRTC)
Performance benchmarking suite
Additional games (Chess variants, Go)

🛠️ Technical Stack

| Component | Technology |
|---|---|
| Framework | Streamlit 1.28+ |
| ML/RL | Custom Q-Learning |
| Visualization | Matplotlib |
| State Management | Streamlit Session State |
| Serialization | JSON + ZIP |
| Data | NumPy, Pandas |

🌐 Deployment

Streamlit Cloud

# Push to GitHub
git push origin main

# Deploy via Streamlit Cloud
# 1. Visit share.streamlit.io
# 2. Connect repository: Devanik21/universal-rl-arena
# 3. Set main file: aGI.py
# 4. Deploy

Local Docker

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "aGI.py"]

docker build -t universal-rl .
docker run -p 8501:8501 universal-rl

📚 Educational Use

Perfect for teaching:

Reinforcement Learning: Q-learning, exploration/exploitation
Game Theory: Minimax, Nash equilibria
Algorithm Design: State representation, search strategies
Python Programming: OOP, numpy, visualization

Example classroom exercise:

# Students implement a new game
class MyGame:
    def __init__(self): ...
    def reset(self): ...
    def get_state(self): ...
    def get_available_actions(self): ...
    def make_move(self, action): ...
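
As one possible solution to the exercise, here is a single-pile misère Nim that fills in the five methods above. It is a sketch under assumptions: the attributes `name`, `winner`, `game_over` and `current_player` are guesses at what the agent and renderer expect, and the class is not part of aGI.py.

```python
class SimpleNim:
    """Single-pile misère Nim: take 1-3 objects; taking the last one loses."""

    name = "simple_nim"

    def __init__(self, pile=7):
        self.start_pile = pile
        self.reset()

    def reset(self):
        self.pile = self.start_pile
        self.current_player = 1
        self.winner = None
        self.game_over = False
        return self.get_state()

    def get_state(self):
        # Hashable and prefixed with the game name, like the other state keys
        return (self.name, self.pile, self.current_player)

    def get_available_actions(self):
        if self.game_over:
            return []
        return [n for n in (1, 2, 3) if n <= self.pile]

    def make_move(self, action):
        if action not in self.get_available_actions():
            raise ValueError(f"illegal move: {action}")
        self.pile -= action
        if self.pile == 0:
            # The player who took the last object loses
            self.winner = 3 - self.current_player
            self.game_over = True
        else:
            self.current_player = 3 - self.current_player
        return self.get_state()

game = SimpleNim(pile=4)
game.make_move(3)   # player 1 leaves one object
game.make_move(1)   # player 2 is forced to take it
print(game.winner)  # 1
```

Rejecting illegal moves in `make_move` keeps bugs in an agent's action selection from silently corrupting the game state.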

🤝 Contributing

Contributions welcome! Areas for improvement:

New Games: Add games with get_state() interface
Visualizations: Enhance rendering quality
Algorithms: Implement A3C/PPO/DQN variants
UI/UX: Improve Streamlit interface
Documentation: Add tutorials/videos

See CONTRIBUTING.md for guidelines.

📜 License

MIT License - see LICENSE

👤 Author

Devanik

GitHub: @Devanik21
LinkedIn: linkedin.com/in/devanik
Twitter/X: @devanik2005

🙏 Acknowledgments

Inspired by:

AlphaZero (DeepMind) - Universal game-playing architecture
DQN (Mnih et al., 2015) - Deep Q-learning foundations
OpenAI Gym - Environment interface design

Built with ❤️ using Streamlit

📊 Stats

Made for Genius-Level Play 🎮

One Brain. Ten Games. Infinite Possibilities.
