- Overview
- Objectives
- Methodology
- System Design
- Repository Structure
- Installation
- Usage
- Multi-Paper Research Hub
- License
- References
## Overview

This repository provides an implementation of the AlphaZero algorithm adapted to the ConnectX environment on Kaggle. The implementation is based on self-play reinforcement learning with Monte Carlo Tree Search (MCTS) guided by a residual neural network.
Note: This repository is structured to serve as a multi-paper research hub. Additional reimplementations of state-of-the-art papers will be added as separate subdirectories.
## Objectives

- ConnectX Adaptation: Implement the AlphaZero paradigm on the 6×7 ConnectX grid.
- Baseline Foundation: Provide a compute-efficient, reproducible implementation.
- Multi-Paper Repository: Expand the repo with further deep learning and RL research paper reimplementations.
- Extensibility: Ensure modular and documented design for easy integration of new ideas.
## Methodology

Agents generate training data by playing against themselves, using MCTS guided by neural-network priors.
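As a rough sketch of this idea (not the repository's actual code), a self-play episode records a `(state, policy, player)` triple at every ply and, once the game ends, relabels each position with the final outcome from the perspective of its player to move. The board, policy, and outcome below are toy stand-ins; the real pipeline uses the ConnectX state and MCTS visit counts as policy targets.

```python
import random

def self_play_episode(num_moves=6, num_actions=7):
    """Toy self-play episode: record (state, policy, player) at each ply,
    then relabel every position with the final outcome seen from the
    perspective of its player to move."""
    history = []
    state = [0] * num_actions          # toy stand-in for the 6x7 board
    player = 1                         # players +1 / -1 alternate each move
    for _ in range(num_moves):
        policy = [1.0 / num_actions] * num_actions  # stub for MCTS visit counts
        history.append((tuple(state), policy, player))
        state[random.randrange(num_actions)] += player
        player = -player
    outcome = 1                        # pretend player +1 won this game
    return [(s, p, outcome * who) for (s, p, who) in history]

data = self_play_episode()
```

Note how consecutive positions carry opposite value labels: the same final result is a win for one player and a loss for the other.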
## System Design

### Network Architecture

- Input: Two-channel tensor encoding the current player's and the opponent's pieces.
- Backbone: 5 residual blocks with 128 filters and batch normalization.
- Heads:
- Policy head: outputs action probabilities.
- Value head: evaluates the current board state.
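The architecture above can be sketched in PyTorch as follows. The block/filter counts (5 residual blocks, 128 filters) come from this README; the class and layer names, and the head widths, are illustrative choices, not necessarily those used in `network.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """One residual block: two 3x3 convs with batch norm and a skip connection."""
    def __init__(self, ch=128):
        super().__init__()
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.b1 = nn.BatchNorm2d(ch)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.b2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        y = F.relu(self.b1(self.c1(x)))
        y = self.b2(self.c2(y))
        return F.relu(x + y)  # skip connection

class AlphaZeroNet(nn.Module):
    """Shared residual trunk feeding a policy head and a value head."""
    def __init__(self, rows=6, cols=7, ch=128, blocks=5):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(2, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU())
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(blocks)])
        # Policy head: per-column logits (one action per ConnectX column)
        self.policy = nn.Sequential(nn.Conv2d(ch, 2, 1), nn.Flatten(),
                                    nn.Linear(2 * rows * cols, cols))
        # Value head: scalar in [-1, 1] via tanh
        self.value = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Flatten(),
                                   nn.Linear(rows * cols, 64), nn.ReLU(),
                                   nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        h = self.body(self.stem(x))
        return self.policy(h), self.value(h)

net = AlphaZeroNet()
p, v = net(torch.zeros(1, 2, 6, 7))  # two-channel 6x7 board input
```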
### Monte Carlo Tree Search

- PUCT: Balances exploration/exploitation.
- Dirichlet Noise: Injected at the root to encourage exploration.
- Value Propagation: Uses alternating signs for perspective switching.
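These three mechanisms can be sketched as below. The PUCT formula Q(s,a) + c_puct · P(s,a) · √N(s) / (1 + N(s,a)) follows the AlphaZero paper; the constants (`c_puct`, Dirichlet `alpha`, mixing weight `eps`) are illustrative defaults, not necessarily the repository's settings.

```python
import math
import random

def puct_select(priors, visit_counts, q_values, c_puct=1.5):
    """Pick the child maximizing Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    total = sum(visit_counts)
    scores = [q + c_puct * p * math.sqrt(total) / (1 + n)
              for p, n, q in zip(priors, visit_counts, q_values)]
    return max(range(len(priors)), key=scores.__getitem__)

def add_dirichlet_noise(priors, alpha=0.3, eps=0.25):
    """Mix Dirichlet noise into the root priors to encourage exploration."""
    noise = [random.gammavariate(alpha, 1.0) for _ in priors]
    s = sum(noise)
    noise = [n / s for n in noise]  # normalize gamma draws -> Dirichlet sample
    return [(1 - eps) * p + eps * n for p, n in zip(priors, noise)]

def backup(path, leaf_value):
    """Propagate the leaf value up the path, flipping sign at each ply so
    every node's Q is from its own player's perspective."""
    v = leaf_value
    for node in reversed(path):
        node['N'] += 1
        node['W'] += v
        node['Q'] = node['W'] / node['N']
        v = -v
```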
### Training

- Iterative Pipeline: Self-play → data aggregation → network training, repeated each iteration.
- Loss Function: Combined policy loss (cross-entropy), value loss (MSE), and L2 regularization.
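The combined loss can be sketched as below: cross-entropy between the network's policy logits and the MCTS-derived targets, plus MSE between the predicted value and the game outcome. In PyTorch the L2 term is typically supplied via the optimizer's `weight_decay` rather than added by hand; the function name and tensors here are illustrative.

```python
import torch
import torch.nn.functional as F

def alphazero_loss(policy_logits, value_pred, policy_target, value_target):
    """Policy cross-entropy against MCTS targets plus value MSE.
    L2 regularization is assumed to come from the optimizer's weight_decay."""
    policy_loss = -(policy_target * F.log_softmax(policy_logits, dim=1)).sum(1).mean()
    value_loss = F.mse_loss(value_pred.squeeze(1), value_target)
    return policy_loss + value_loss

logits = torch.zeros(2, 7)            # uniform policy predictions
target = torch.full((2, 7), 1.0 / 7)  # uniform MCTS policy targets
v_pred = torch.zeros(2, 1)            # neutral value predictions
v_tgt = torch.tensor([1.0, -1.0])     # game outcomes
loss = alphazero_loss(logits, v_pred, target, v_tgt)
```

With uniform predictions and targets the policy term equals log 7 and the value term equals 1, so the total is about 2.946.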
## Repository Structure

| Module | Description |
|---|---|
| `game/ConnectXState.py` | Game logic and fast win detection |
| `mcts.py` | MCTS algorithm with exploration enhancements |
| `network.py` | Residual CNN with dual heads |
| `self_play.py` | Orchestrates self-play data generation |
| `train.py` | Handles batching, loss computation, and training |
| `evaluate.py` | Elo-style evaluation against baselines |
## Installation
```bash
git clone https://github.com/Alphino1/ConnectX-RL-Research-Paper-Implementations.git
cd ConnectX-RL-Research-Paper-Implementations

# (Optional) create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate  # on Windows use: .venv\Scripts\activate

# Install all Python dependencies
pip install --upgrade pip
pip install -r requirements.txt
```
### Requirements
- Python ≥ 3.8
- PyTorch ≥ 1.9
- NumPy
- tqdm
## Usage

- Run Training (AlphaZero):

  ```bash
  python train.py --iterations 5 --self_play_games 50 --mcts_simulations 200
  ```

- Run Evaluation:

  ```bash
  python evaluate.py --checkpoint checkpoints/iter_5.pth --episodes 100
  ```

- Explore the Notebook:

  Open `notebook/alphazero_connectx.ipynb` in Jupyter for a step-by-step walkthrough and visualizations.
## Multi-Paper Research Hub

This repository will evolve into a consolidated library of multiple research paper reimplementations. Each paper will be added under its own directory, maintaining:

- Interactive Jupyter notebooks
- Modular scripts and training code
- Well-documented README files
- (Optional) Unit tests

Example future additions:

- `paper_muzero/`
- `paper_alphago/`

This design enables structured, scalable growth of the repository for both learning and contribution.
## License

This project is licensed under the MIT License.
## References

- Silver, D., Hubert, T., Schrittwieser, J., et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. Science, 2018.
- Kaggle ConnectX Competition – https://www.kaggle.com/competitions/connectx
This repository reflects an evolving body of work, shaped by deliberate effort, continuous learning, and a sustained attempt at thoughtful progress. While the current state represents only a small subset of the intended work, it remains part of a broader, ongoing journey, open to refinement, extension, and deeper understanding. As with any thoughtful pursuit, the process is dynamic, not definitive.

It serves as a reflection of ongoing exploration rather than a finished destination: a work in progress informed by every question, insight, and perspective encountered along the way. Sustained by curiosity and the steady momentum of iteration, this journey is far from complete, and in that lies its greatest potential.