- Overview
- Objectives
- Methodology
- System Design
- Repository Structure
- Installation
- Usage
- Multi-Paper Research Hub
- License
- References
## Overview

This repository provides an implementation of the AlphaZero algorithm adapted to the ConnectX environment on Kaggle. The implementation is based on self-play reinforcement learning with Monte Carlo Tree Search (MCTS) guided by a residual neural network.
Note: This repository is structured to serve as a multi-paper research hub. Additional reimplementations of state-of-the-art papers will be added as separate subdirectories.
## Objectives

- ConnectX Adaptation: Implement the AlphaZero paradigm on the 6×7 ConnectX grid.
- Baseline Foundation: Provide a compute-efficient, reproducible implementation.
- Multi-Paper Repository: Expand the repo with further deep learning and RL research paper reimplementations.
- Extensibility: Ensure modular and documented design for easy integration of new ideas.
## Methodology

Agents generate training data by playing against themselves, using MCTS guided by neural-network priors.
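As a rough sketch of this idea (not the repository's actual code), a self-play episode records a `(state, policy, player)` triple at every ply and, once the game ends, relabels each position with the final outcome from the perspective of its player to move. The board, policy, and outcome below are toy stand-ins; the real pipeline uses the ConnectX state and MCTS visit counts as policy targets.

```python
import random

def self_play_episode(num_moves=6, num_actions=7):
    """Toy self-play episode: record (state, policy, player) at each ply,
    then relabel every position with the final outcome seen from the
    perspective of its player to move."""
    history = []
    state = [0] * num_actions          # toy stand-in for the 6x7 board
    player = 1                         # players +1 / -1 alternate each move
    for _ in range(num_moves):
        policy = [1.0 / num_actions] * num_actions  # stub for MCTS visit counts
        history.append((tuple(state), policy, player))
        state[random.randrange(num_actions)] += player
        player = -player
    outcome = 1                        # pretend player +1 won this game
    return [(s, p, outcome * who) for (s, p, who) in history]

data = self_play_episode()
```

Note how consecutive positions carry opposite value labels: the same final result is a win for one player and a loss for the other.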
## System Design

### Network Architecture

- Input: Two-channel tensor encoding the current player's and the opponent's pieces.
- Backbone: 5 residual blocks with 128 filters and batch normalization.
- Heads:
- Policy head: outputs action probabilities.
- Value head: evaluates the current board state.
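The architecture above can be sketched in PyTorch as follows. The block/filter counts (5 residual blocks, 128 filters) come from this README; the class and layer names, and the head widths, are illustrative choices, not necessarily those used in `network.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """One residual block: two 3x3 convs with batch norm and a skip connection."""
    def __init__(self, ch=128):
        super().__init__()
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.b1 = nn.BatchNorm2d(ch)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.b2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        y = F.relu(self.b1(self.c1(x)))
        y = self.b2(self.c2(y))
        return F.relu(x + y)  # skip connection

class AlphaZeroNet(nn.Module):
    """Shared residual trunk feeding a policy head and a value head."""
    def __init__(self, rows=6, cols=7, ch=128, blocks=5):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(2, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU())
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(blocks)])
        # Policy head: per-column logits (one action per ConnectX column)
        self.policy = nn.Sequential(nn.Conv2d(ch, 2, 1), nn.Flatten(),
                                    nn.Linear(2 * rows * cols, cols))
        # Value head: scalar in [-1, 1] via tanh
        self.value = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Flatten(),
                                   nn.Linear(rows * cols, 64), nn.ReLU(),
                                   nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        h = self.body(self.stem(x))
        return self.policy(h), self.value(h)

net = AlphaZeroNet()
p, v = net(torch.zeros(1, 2, 6, 7))  # two-channel 6x7 board input
```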
### Monte Carlo Tree Search

- PUCT: Balances exploration/exploitation.
- Dirichlet Noise: Injected at the root to encourage exploration.
- Value Propagation: Uses alternating signs for perspective switching.
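These three mechanisms can be sketched as below. The PUCT formula Q(s,a) + c_puct · P(s,a) · √N(s) / (1 + N(s,a)) follows the AlphaZero paper; the constants (`c_puct`, Dirichlet `alpha`, mixing weight `eps`) are illustrative defaults, not necessarily the repository's settings.

```python
import math
import random

def puct_select(priors, visit_counts, q_values, c_puct=1.5):
    """Pick the child maximizing Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    total = sum(visit_counts)
    scores = [q + c_puct * p * math.sqrt(total) / (1 + n)
              for p, n, q in zip(priors, visit_counts, q_values)]
    return max(range(len(priors)), key=scores.__getitem__)

def add_dirichlet_noise(priors, alpha=0.3, eps=0.25):
    """Mix Dirichlet noise into the root priors to encourage exploration."""
    noise = [random.gammavariate(alpha, 1.0) for _ in priors]
    s = sum(noise)
    noise = [n / s for n in noise]  # normalize gamma draws -> Dirichlet sample
    return [(1 - eps) * p + eps * n for p, n in zip(priors, noise)]

def backup(path, leaf_value):
    """Propagate the leaf value up the path, flipping sign at each ply so
    every node's Q is from its own player's perspective."""
    v = leaf_value
    for node in reversed(path):
        node['N'] += 1
        node['W'] += v
        node['Q'] = node['W'] / node['N']
        v = -v
```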
### Training

- Iterative Pipeline: Self-play → data aggregation → network training, repeated each iteration.
- Loss Function: Combined policy loss (cross-entropy), value loss (MSE), and L2 regularization.
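The combined loss can be sketched as below: cross-entropy between the network's policy logits and the MCTS-derived targets, plus MSE between the predicted value and the game outcome. In PyTorch the L2 term is typically supplied via the optimizer's `weight_decay` rather than added by hand; the function name and tensors here are illustrative.

```python
import torch
import torch.nn.functional as F

def alphazero_loss(policy_logits, value_pred, policy_target, value_target):
    """Policy cross-entropy against MCTS targets plus value MSE.
    L2 regularization is assumed to come from the optimizer's weight_decay."""
    policy_loss = -(policy_target * F.log_softmax(policy_logits, dim=1)).sum(1).mean()
    value_loss = F.mse_loss(value_pred.squeeze(1), value_target)
    return policy_loss + value_loss

logits = torch.zeros(2, 7)            # uniform policy predictions
target = torch.full((2, 7), 1.0 / 7)  # uniform MCTS policy targets
v_pred = torch.zeros(2, 1)            # neutral value predictions
v_tgt = torch.tensor([1.0, -1.0])     # game outcomes
loss = alphazero_loss(logits, v_pred, target, v_tgt)
```

With uniform predictions and targets the policy term equals log 7 and the value term equals 1, so the total is about 2.946.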
## Repository Structure

| Module | Description |
|---|---|
| `game/ConnectXState.py` | Game logic and fast win detection |
| `mcts.py` | MCTS algorithm with exploration enhancements |
| `network.py` | Residual CNN with dual heads |
| `self_play.py` | Orchestrates self-play data generation |
| `train.py` | Handles batching, loss computation, and training |
| `evaluate.py` | Elo-style evaluation against baselines |
## Installation
```bash
git clone https://github.com/Alphino1/ConnectX-RL-Research-Paper-Implementations.git
cd ConnectX-RL-Research-Paper-Implementations

# (Optional) create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate  # on Windows use: .venv\Scripts\activate

# Install all Python dependencies
pip install --upgrade pip
pip install -r requirements.txt
```
### Requirements
- Python ≥ 3.8
- PyTorch ≥ 1.9
- NumPy
- tqdm
## Usage

- Run Training (AlphaZero):

  ```bash
  python train.py --iterations 5 --self_play_games 50 --mcts_simulations 200
  ```

- Run Evaluation:

  ```bash
  python evaluate.py --checkpoint checkpoints/iter_5.pth --episodes 100
  ```

- Explore the Notebook:

  Open `notebook/alphazero_connectx.ipynb` in Jupyter for a step-by-step walkthrough and visualizations.
## Multi-Paper Research Hub

This repository will evolve into a consolidated library of multiple research paper reimplementations. Each paper will be added under its own directory, maintaining:

- Interactive Jupyter notebooks
- Modular scripts and training code
- Well-documented README files
- (Optional) Unit tests

Example future additions:

- `paper_muzero/`
- `paper_alphago/`

This design enables structured, scalable growth of the repository for both learning and contribution.
## License

This project is licensed under the MIT License.
## References

- Silver, D., Hubert, T., Schrittwieser, J., et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. Science, 2018.
- Kaggle ConnectX Competition – https://www.kaggle.com/competitions/connectx
This repository reflects an evolving body of work, shaped by deliberate effort, continuous learning, and a sustained attempt at thoughtful progress. While the current state represents only a small subset of the intended work, it remains part of a broader, ongoing journey, open to refinement, extension, and deeper understanding. As with any thoughtful pursuit, the process is dynamic, not definitive.

It serves as a reflection of ongoing exploration rather than a finished destination: a work in progress informed by every question, insight, and perspective encountered along the way. Sustained by curiosity and the steady momentum of iteration, this journey is far from complete, and in that lies its greatest potential.