One Brain, 10 Games | AlphaZero-Inspired Multi-Game AI
A universal reinforcement learning agent trained across 10 diverse board games with interactive gameplay
Universal RL Arena is an interactive platform showcasing a single AI agent that masters 10 different board games through Q-learning with minimax and MCTS enhancements. Unlike traditional game-specific AI, this universal agent learns transferable strategic patterns across games, from simple Tic-Tac-Toe to complex Ultimate Tic-Tac-Toe.
- Universal Agent: Single Q-table architecture for all 10 games
- Interactive Gameplay: Human vs AI, AI vs AI battle mode
- In-App Training: Train custom agents with adjustable hyperparameters
- Real-Time Visualization: Dynamic game state rendering
- Performance Analytics: Training stats and win-rate tracking
- Model Persistence: Save/load trained agents as .zip archives
```bash
# Clone repository
git clone https://github.com/Devanik21/universal-rl-arena.git
cd universal-rl-arena
# Install dependencies
pip install -r requirements.txt
```
requirements.txt:
```
streamlit>=1.28.0
numpy>=1.21.0
matplotlib>=3.5.0
pandas>=1.5.0
```
```bash
streamlit run aGI.py
```
The app will open at http://localhost:8501
| Game | Complexity | State Space | Strategy Type |
|---|---|---|---|
| Tic-Tac-Toe | Simple | 3×3 board (≤ 3⁹ states) | Tactical |
| Connect-4 | Medium | 7×6 board | Positional |
| Nim | Simple | Exponential | Mathematical |
| Hexapawn | Simple | 3×3 board (≤ 3⁹ states) | Tactical |
| Chomp | Medium | 4×6 | Strategic |
| Sim | Medium | C(6,2) edges | Graph Theory |
| Dots & Boxes | Medium | 3×3 grid | Territory Control |
| Breakthrough | Complex | 6×6 board | Positional |
| Gomoku | Complex | 7×7 board | Pattern Recognition |
| Ultimate Tic-Tac-Toe | Very Complex | 9 boards of 3×3 | Multi-level Strategy |
- Tic-Tac-Toe: First to get 3 in a row wins
- Connect-4: First to connect 4 discs vertically/horizontally/diagonally wins
- Nim: Player forced to take the last object loses
- Hexapawn: Reach opponent's back row or block all enemy moves
- Chomp: Avoid eating the poison square (bottom-left)
- Sim: First to form a triangle in their color loses
- Dots & Boxes: Claim the most boxes by completing squares
- Breakthrough: First to reach opponent's back row wins
- Gomoku: Get exactly 5 stones in a row
- Ultimate Tic-Tac-Toe: Win small boards to claim meta-board positions
```python
class UniversalAgent:
    def __init__(self, player_id, lr=0.01, gamma=0.99,
                 epsilon=1.0, mcts_sims=50, minimax_depth=2):
        self.q_table = {}  # Shared across all games
        self.game_stats = {}
```
Core Components:
- State Representation: `(game_name, *flattened_board_state)`
- Q-Table: `{(state, action): value}` mapping
- Action Selection: Epsilon-greedy with tactical checks
- Learning: Temporal Difference (TD) updates
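The state key and epsilon-greedy selection listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `make_state_key` and `choose_action` are hypothetical names, unseen state-action pairs are assumed to default to 0.0, and actions are kept as plain tuples for brevity.

```python
import random

def make_state_key(game_name, board):
    """Build a hashable state of the form (game_name, *flattened_board_state)."""
    flat = [cell for row in board for cell in row]
    return (game_name, *flat)

def choose_action(q_table, state, actions, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else exploit the Q-table."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Assumption of this sketch: unseen (state, action) pairs are worth 0.0.
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))

board = [[0, 0, 0], [0, 1, 0], [0, 0, 2]]
state = make_state_key("tictactoe", board)
q_table = {(state, (0, 0)): 0.85, (state, (0, 1)): 0.10}
best = choose_action(q_table, state, [(0, 0), (0, 1), (2, 0)], epsilon=0.0)
print(state)
print(best)
```

Because the game name is the first element of every key, entries from different games never collide inside the single shared dictionary.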
Q-Learning Update Rule:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

Where:
- $\alpha$ = learning rate (default: 0.01)
- $\gamma$ = discount factor (default: 0.99)
- $r$ = immediate reward
- $s'$ = next state
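As a rough illustration, the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α[r + γ max Q(s', a') − Q(s, a)], could be applied to a dictionary Q-table like this. The function name `td_update` and the 0.0 default for unseen pairs are assumptions of this sketch, not taken from the project.

```python
def td_update(q_table, state, action, reward, next_state, next_actions,
              lr=0.01, gamma=0.99):
    """Apply one Q-learning update in place and return the new Q(s, a)."""
    old = q_table.get((state, action), 0.0)
    # A terminal next state has no actions, so its future value is 0.
    future = max((q_table.get((next_state, a), 0.0) for a in next_actions),
                 default=0.0)
    new = old + lr * (reward + gamma * future - old)
    q_table[(state, action)] = new
    return new

q = {("s1", "a"): 0.5}
v = td_update(q, "s0", "a", reward=1.0, next_state="s1", next_actions=["a", "b"])
print(v)
```

With the default `lr=0.01`, each update moves the stored value only 1% of the way toward the target, which is why many episodes are needed before values stabilize.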
Tactical Enhancements:
```python
# 1. Immediate win detection
for action in available_actions:
    if sim_move(action).winner == self.player_id:
        return action
# 2. Block opponent wins
for action in available_actions:
    if sim_move(action, opponent).winner == opponent:
        return action
# 3. Q-value maximization: argmax over Q(state, action)
return max(available_actions, key=lambda a: self.q_table.get((state, a), 0.0))
```

| Parameter | Default | Range | Purpose |
|---|---|---|---|
| `lr` | 0.01 | 0.001-0.5 | Learning speed |
| `gamma` | 0.99 | 0.8-0.999 | Future reward weight |
| `epsilon` | 1.0→0.01 | - | Exploration rate (decays) |
| `epsilon_decay` | 0.998 | 0.95-0.999 | Exploration reduction |
| `minimax_depth` | 2 | 1-6 | Search tree depth |
| `mcts_simulations` | 50 | 10-500 | Monte Carlo rollouts |
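To make the three-step order from Tactical Enhancements concrete (win, then block, then Q-value), here is a self-contained sketch on a Tic-Tac-Toe board stored as a flat tuple of nine cells. The helpers `winner`, `place`, and `select_action` are hypothetical stand-ins for the project's own simulation code, and player ids are assumed to be 1 and 2.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return the player id (1 or 2) that owns a full line, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def place(board, cell, player):
    """Return a copy of the board with `player` placed on `cell`."""
    return board[:cell] + (player,) + board[cell + 1:]

def select_action(board, player, q_table):
    opponent = 3 - player
    actions = [i for i, v in enumerate(board) if v == 0]
    # 1. Take an immediate win if one exists.
    for a in actions:
        if winner(place(board, a, player)) == player:
            return a
    # 2. Otherwise block a cell where the opponent would win next move.
    for a in actions:
        if winner(place(board, a, opponent)) == opponent:
            return a
    # 3. Otherwise fall back to the highest Q-value (unseen pairs count as 0.0).
    state = ("tictactoe", *board)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))

# Player 1 can complete the top row; player 2 can complete the middle row.
board = (1, 1, 0,
         2, 2, 0,
         0, 0, 0)
print(select_action(board, 1, {}))
print(select_action(board, 2, {}))
```

Checking for a win before a block matters: in the position above, either player should finish their own row rather than stop the opponent's.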
```python
# Initialize agents
agent1 = UniversalAgent(player_id=1, lr=0.01, gamma=0.99)
agent2 = UniversalAgent(player_id=2, lr=0.01, gamma=0.99)
# Games to train
games = [TicTacToe(), Nim(), Connect4(), Hexapawn(),
         Chomp(), Sim(), DotsAndBoxes(), Breakthrough(),
         Gomoku(), UltimateTicTacToe()]
episodes = 200  # Episodes per game
# Self-play training
for game in games:
    for episode in range(episodes):
        play_game(game, agent1, agent2, training=True)
        agent1.decay_epsilon()
        agent2.decay_epsilon()
```
Typical convergence after 200 episodes per game:
| Metric | Value |
|---|---|
| Total Q-States | ~50,000-100,000 |
| Training Time (10 games, 200 eps) | ~2-5 minutes |
| Final Epsilon | 0.01 |
| Win Rate (vs random) | >85% |
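The final epsilon in the table depends on how often the decay is applied. The sketch below assumes one multiplicative decay per episode with a floor of 0.01 and a factor of 0.998 (the documented defaults); the function names are hypothetical and the real `decay_epsilon()` method may differ.

```python
import math

def decay_epsilon(epsilon, decay=0.998, floor=0.01):
    """Multiply epsilon by `decay`, never dropping below `floor`."""
    return max(floor, epsilon * decay)

def steps_to_floor(start=1.0, decay=0.998, floor=0.01):
    """Smallest number of decay steps after which epsilon sits at the floor."""
    return math.ceil(math.log(floor / start) / math.log(decay))

epsilon = 1.0
for _ in range(2000):  # e.g. 10 games x 200 episodes, one decay per episode
    epsilon = decay_epsilon(epsilon)
print(epsilon)
print(steps_to_floor())
```

Under these assumptions roughly 2,300 decay steps are needed to reach the 0.01 floor, so shorter runs would end with a slightly higher exploration rate.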
All games feature custom matplotlib renderers:
- Tic-Tac-Toe: X/O symbols with grid
- Connect-4: Colored discs with gravity
- Nim: Stacked token pyramids
- Hexapawn: Chess pawn symbols
- Chomp: Chocolate grid with poison marker
- Sim: Graph with 6 vertices
- Dots & Boxes: Grid with edge highlighting
- Breakthrough: Chess-like board
- Gomoku: Go-style board
- Ultimate TTT: 3×3 meta-board with active board highlighting
Example rendering code:
```python
def visualize_game(env):
    if env.name == "tictactoe":
        return visualize_tictactoe(env.board)
    # ... routing for all 10 games
```
Agents are serialized to .zip archives containing:
```
universal_agent.zip
├── agent1.json  # Player 1 Q-table & config
├── agent2.json  # Player 2 Q-table & config
└── config.json  # Game list & metadata
```
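The archive layout above could be produced with nothing but the standard library. This is a hedged sketch: `agent_to_json` and `build_zip` are illustrative names rather than the project's functions, and using `repr` to turn tuple keys into strings is an assumption made because JSON object keys must be strings.

```python
import io
import json
import zipfile

def agent_to_json(q_table, player_id, epsilon):
    """Serialize an agent; tuple keys become strings since JSON keys must be strings."""
    return json.dumps({
        "q_table": {repr(key): value for key, value in q_table.items()},
        "player_id": player_id,
        "epsilon": epsilon,
    })

def build_zip(agents, config):
    """Pack one JSON file per agent plus config.json into an in-memory archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in agents.items():
            archive.writestr(f"{name}.json", payload)
        archive.writestr("config.json", json.dumps(config))
    buffer.seek(0)
    return buffer

q1 = {(("tictactoe", 0, 0, 0), "(0, 0)"): 0.85}
buffer = build_zip(
    {"agent1": agent_to_json(q1, 1, 0.01), "agent2": agent_to_json({}, 2, 0.01)},
    {"games": ["tictactoe"]},
)
with zipfile.ZipFile(buffer) as archive:
    names = sorted(archive.namelist())
    restored = json.loads(archive.read("agent1.json"))
print(names)
print(restored["player_id"])
```

Building the archive in a `BytesIO` buffer keeps it in memory, which suits a Streamlit download button as well as writing to disk.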
JSON Structure:
```json
{
  "q_table": {
    "[['tictactoe', 0, 0, 0, ...], '(0, 0)']": 0.85
  },
  "player_id": 1,
  "epsilon": 0.01,
  "game_stats": {
    "tictactoe": {"wins": 120, "losses": 75, "draws": 5}
  },
  "lr": 0.01,
  "gamma": 0.99
}
```
```python
# Save trained agents
zip_buffer = create_universal_zip(agent1, agent2)
with open("my_agent.zip", "wb") as f:
    f.write(zip_buffer.getvalue())
# Load agents
agent1, agent2, config = load_universal_agents("my_agent.zip")
```
Sidebar → Upload Universal Agent → Select .zip file → Load
Select Game → Watch Battle → Auto-play/Step Mode
Human vs AI → Choose Agent → Click board positions
Training Lab → Set Hyperparameters → Start Multi-Game Training
Sidebar → AI Difficulty → Minimax Depth (1-6) & MCTS Sims (10-500)
- Canonical Forms: Rotations/reflections mapped to single state
- Pruning: Invalid actions filtered before Q-lookup
- Sparse Storage: Only visited states stored in Q-table
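One way to realize the canonical-form idea for square boards is to generate all eight rotations and reflections and keep the smallest one as the representative. The sketch below is pure Python with hypothetical helper names; it is not necessarily how the project implements it.

```python
def rotate(board):
    """Rotate a square board 90 degrees clockwise."""
    return tuple(zip(*board[::-1]))

def mirror(board):
    """Reflect a board left to right."""
    return tuple(row[::-1] for row in board)

def canonical(board):
    """Return the smallest of the 8 rotations/reflections as the representative."""
    board = tuple(tuple(row) for row in board)
    variants = []
    for _ in range(4):
        board = rotate(board)
        variants.append(board)
        variants.append(mirror(board))
    return min(variants)

corner_a = ((1, 0, 0), (0, 0, 0), (0, 0, 0))
corner_b = ((0, 0, 0), (0, 0, 0), (0, 0, 1))
edge = ((0, 1, 0), (0, 0, 0), (0, 0, 0))
print(canonical(corner_a) == canonical(corner_b))
print(canonical(corner_a) == canonical(edge))
```

If states are stored in canonical form, the chosen action has to go through the same transformation (and its inverse when played), which this sketch leaves out.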
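```python
import numpy as np

# Fast win detection on a 3x3 board (vectorized with NumPy)
def _check_win(self, player):
    board = self.board
    # Row and column checks
    for i in range(3):
        if np.all(board[i, :] == player) or np.all(board[:, i] == player):
            return True
    # Diagonal and anti-diagonal checks
    if (np.all(np.diag(board) == player)
            or np.all(np.diag(np.fliplr(board)) == player)):
        return True
    return False
```
- States stored as tuples (immutable, hashable)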
- Actions converted to strings for Q-table keys
- Numpy arrays for board representations
- Neural network policy (DQN/A3C)
- Transfer learning metrics
- Multi-agent tournament mode
- Online multiplayer (WebRTC)
- Performance benchmarking suite
- Additional games (Chess variants, Go)
| Component | Technology |
|---|---|
| Framework | Streamlit 1.28+ |
| ML/RL | Custom Q-Learning |
| Visualization | Matplotlib |
| State Management | Streamlit Session State |
| Serialization | JSON + ZIP |
| Data | NumPy, Pandas |
```bash
# Push to GitHub
git push origin main
# Deploy via Streamlit Cloud
# 1. Visit share.streamlit.io
# 2. Connect repository: Devanik21/universal-rl-arena
# 3. Set main file: aGI.py
# 4. Deploy
```
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "aGI.py"]
```
```bash
docker build -t universal-rl .
docker run -p 8501:8501 universal-rl
```
Perfect for teaching:
- Reinforcement Learning: Q-learning, exploration/exploitation
- Game Theory: Minimax, Nash equilibria
- Algorithm Design: State representation, search strategies
- Python Programming: OOP, numpy, visualization
Example classroom exercise:
```python
# Students implement a new game
class MyGame:
    def __init__(self): ...
    def reset(self): ...
    def get_state(self): ...
    def get_available_actions(self): ...
    def make_move(self, action): ...
```
Contributions welcome! Areas for improvement:
- New Games: Add games with `get_state()` interface
- Visualizations: Enhance rendering quality
- Algorithms: Implement A3C/PPO/DQN variants
- UI/UX: Improve Streamlit interface
- Documentation: Add tutorials/videos
See CONTRIBUTING.md for guidelines.
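As a starting point for a new-game contribution, here is a minimal sketch of misère Nim exposing the five methods from the classroom exercise. The class name `MiniNim`, the `winner` and `current_player` attributes, the `(pile_index, count)` action format, and the return values are all assumptions made for illustration; the interface the agent actually expects may differ.

```python
import random

class MiniNim:
    """Misère Nim: the player forced to take the last object loses."""
    name = "mini_nim"

    def __init__(self, piles=(3, 4, 5)):
        self.initial_piles = tuple(piles)
        self.reset()

    def reset(self):
        self.piles = list(self.initial_piles)
        self.current_player = 1
        self.winner = None
        return self.get_state()

    def get_state(self):
        # Hashable tuple, mirroring the (game_name, *flattened_state) convention.
        return (self.name, *self.piles)

    def get_available_actions(self):
        # An action is (pile_index, number_of_objects_to_remove).
        return [(i, n) for i, size in enumerate(self.piles)
                for n in range(1, size + 1)]

    def make_move(self, action):
        pile, count = action
        if not 1 <= count <= self.piles[pile]:
            raise ValueError(f"illegal move: {action}")
        self.piles[pile] -= count
        if sum(self.piles) == 0:
            # Taking the last object loses, so the other player wins.
            self.winner = 3 - self.current_player
        else:
            self.current_player = 3 - self.current_player
        return self.get_state()

# Play one random game to exercise the interface.
rng = random.Random(0)
game = MiniNim()
while game.winner is None:
    game.make_move(rng.choice(game.get_available_actions()))
print(game.get_state(), game.winner)
```

Keeping `get_state()` hashable and prefixed with the game name lets the same Q-table dictionary store this game's entries alongside the others.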
MIT License - see LICENSE
Devanik
- GitHub: @Devanik21
- LinkedIn: linkedin.com/in/devanik
- Twitter/X: @devanik2005
Inspired by:
- AlphaZero (DeepMind) - Universal game-playing architecture
- DQN (Mnih et al., 2015) - Deep Q-learning foundations
- OpenAI Gym - Environment interface design
Built with ❤️ using Streamlit
Made for Genius-Level Play 🎮
One Brain. Ten Games. Infinite Possibilities.