
AlphaZero for ConnectX and Other Research Paper Implementations


Table of Contents

  1. Overview
  2. Objectives
  3. Methodology
  4. System Design
  5. Installation
  6. Usage
  7. Repository Structure
  8. Multi-Paper Research Hub
  9. License
  10. References

Overview

This repository provides an implementation of the AlphaZero algorithm adapted to the ConnectX environment on Kaggle. The implementation is based on self-play reinforcement learning with Monte Carlo Tree Search (MCTS) guided by a residual neural network.

Note: This repository is structured to serve as a multi-paper research hub. Additional reimplementations of state-of-the-art papers will be added as separate subdirectories.


Objectives

  • ConnectX Adaptation: Implement the AlphaZero paradigm on the 6×7 ConnectX grid.
  • Baseline Foundation: Provide a compute-efficient, reproducible implementation.
  • Multi-Paper Repository: Expand the repo with further deep learning and RL research paper reimplementations.
  • Extensibility: Ensure modular and documented design for easy integration of new ideas.

Methodology (AlphaZero)

1. Self-Play Data Generation

Agents generate training data by playing against themselves using MCTS guided by neural priors.
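In outline, each episode records one (state, search policy, player-to-move) triple per move and, once the game ends, labels every stored position with the outcome from that position's player perspective. A minimal runnable sketch of that data-labeling logic (random moves and random "visit counts" stand in for the real MCTS, and the state strings are placeholders, not the repository's board encoding):

```python
import random

def self_play_episode(num_moves=7, num_actions=7):
    """Toy sketch of AlphaZero-style self-play data generation.

    Stand-ins: moves are random and the 'search policy' is a random
    distribution; a real implementation runs MCTS at every state and
    uses normalized visit counts as the policy target.
    """
    history = []  # one (state, policy_target, player) entry per move
    for ply in range(num_moves):
        player = 1 if ply % 2 == 0 else -1
        state = f"state_{ply}"                       # placeholder board encoding
        visits = [random.random() for _ in range(num_actions)]
        total = sum(visits)
        policy_target = [v / total for v in visits]  # normalized "visit counts"
        history.append((state, policy_target, player))
    winner = random.choice([1, -1, 0])               # 0 = draw
    # Label every stored position from the perspective of its player to move:
    # z = +1 if that player went on to win, -1 if they lost, 0 for a draw.
    return [(s, p, winner * player) for (s, p, player) in history]

examples = self_play_episode()
```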

2. Neural Network Architecture

  • Input: Two-channel tensor for current player and opponent.
  • Backbone: 5 residual blocks with 128 filters and batch normalization.
  • Heads:
    • Policy head: outputs action probabilities.
    • Value head: evaluates the current board state.
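The architecture above can be sketched in PyTorch as follows. The class and attribute names here are illustrative, not the repository's actual `network.py` API, and the head widths beyond what is stated (e.g. the 64-unit hidden layer in the value head) are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with batch norm and a skip connection."""
    def __init__(self, channels=128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection

class AlphaZeroNet(nn.Module):
    """Two-channel input, residual tower, policy and value heads."""
    def __init__(self, channels=128, blocks=5, rows=6, cols=7):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(2, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.tower = nn.Sequential(
            *[ResidualBlock(channels) for _ in range(blocks)])
        self.policy = nn.Sequential(            # one logit per column
            nn.Conv2d(channels, 2, 1), nn.Flatten(),
            nn.Linear(2 * rows * cols, cols))
        self.value = nn.Sequential(             # scalar value in [-1, 1]
            nn.Conv2d(channels, 1, 1), nn.Flatten(),
            nn.Linear(rows * cols, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        h = self.tower(self.stem(x))
        return self.policy(h), self.value(h)
```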

3. MCTS Enhancements

  • PUCT: Balances exploration/exploitation.
  • Dirichlet Noise: Injected at the root to encourage exploration.
  • Value Propagation: Uses alternating signs for perspective switching.
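The three enhancements can be sketched as follows. The constants (`c_puct`, the Dirichlet `alpha`, and the mixing weight `eps`) are typical AlphaZero-style defaults, not necessarily the values used in this repository:

```python
import numpy as np

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """PUCT: exploitation term Q plus an exploration bonus that is
    proportional to the prior and decays as the child is visited."""
    return q + c_puct * prior * np.sqrt(parent_visits) / (1 + child_visits)

def add_root_noise(priors, alpha=0.3, eps=0.25, rng=None):
    """Mix Dirichlet noise into the root priors to encourage
    exploration during self-play."""
    rng = rng or np.random.default_rng()
    noise = rng.dirichlet([alpha] * len(priors))
    return (1 - eps) * np.asarray(priors) + eps * noise

def backup_values(leaf_value, path_length):
    """Value propagation with alternating signs, leaf to root: a value
    good for the player at one node is equally bad at the parent."""
    return [leaf_value * (-1) ** i for i in range(path_length)]
```

For example, `backup_values(1.0, 3)` yields `[1.0, -1.0, 1.0]`, flipping the perspective at every level of the search path.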

4. Training Loop

  • Iterative: Self-play → data aggregation → training.
  • Loss Function: Combined policy (cross-entropy), value (MSE), and L2 regularization.
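The combined loss can be written out explicitly. This sketch uses NumPy in place of the framework's autograd operations, and the function name and weight `c_l2` are illustrative:

```python
import numpy as np

def alphazero_loss(pi_target, p_logits, z, v, params, c_l2=1e-4):
    """Combined loss: policy cross-entropy + value MSE + L2 penalty.

    pi_target : MCTS visit-count policy targets, shape (batch, actions)
    p_logits  : raw policy-head outputs, shape (batch, actions)
    z, v      : game outcomes and value-head predictions, shape (batch,)
    params    : iterable of weight arrays for the L2 term
    """
    # Numerically stabilized softmax over the policy logits.
    exp = np.exp(p_logits - p_logits.max(axis=-1, keepdims=True))
    p = exp / exp.sum(axis=-1, keepdims=True)
    policy_loss = -np.mean(np.sum(pi_target * np.log(p + 1e-8), axis=-1))
    value_loss = np.mean((z - v) ** 2)
    l2 = c_l2 * sum(np.sum(w ** 2) for w in params)
    return policy_loss + value_loss + l2
```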

System Design

| Module | Description |
| --- | --- |
| `game/ConnectXState.py` | Game logic and fast win detection |
| `mcts.py` | MCTS algorithm with exploration enhancements |
| `network.py` | Residual CNN with dual heads |
| `self_play.py` | Orchestrates self-play data generation |
| `train.py` | Handles batching, loss computation, and training |
| `evaluate.py` | Elo-style evaluation against baselines |
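Fast win detection, as in `game/ConnectXState.py`, is commonly implemented with bitboards; the repository's actual approach may differ. A standard sketch for a 7-column board with a 7-bit column stride (6 playable rows plus one sentinel bit per column, so shifts cannot wrap between columns):

```python
def has_won(bitboard):
    """Check four-in-a-row on one player's bitboard.

    Bit index = column * 7 + row. The shift amounts correspond to the
    four directions: 1 = vertical, 7 = horizontal, 6 and 8 = diagonals.
    """
    for shift in (1, 7, 6, 8):
        m = bitboard & (bitboard >> shift)   # pairs of adjacent stones
        if m & (m >> 2 * shift):             # pair of pairs = four in a row
            return True
    return False
```

For example, bits 0-3 (a vertical four in column 0) or bits 0, 7, 14, 21 (a horizontal four on the bottom row) both trigger a win, while any three-in-a-row does not.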

Installation



git clone https://github.com/Alphino1/ConnectX-RL-Research-Paper-Implementations.git  
cd ConnectX-RL-Research-Paper-Implementations  

# (Optional) create and activate a virtual environment
python3 -m venv .venv  
source .venv/bin/activate  # on Windows use: .venv\Scripts\activate

# install all Python dependencies
pip install --upgrade pip  
pip install -r requirements.txt

Requirements

- Python ≥ 3.8  
- PyTorch ≥ 1.9  
- NumPy  
- tqdm

Usage

  1. Run Training (AlphaZero)

python train.py --iterations 5 --self_play_games 50 --mcts_simulations 200

  2. Run Evaluation

python evaluate.py --checkpoint checkpoints/iter_5.pth --episodes 100

  3. Explore Notebook

Open notebook/alphazero_connectx.ipynb in Jupyter for a step-by-step walkthrough and visualizations.


Multi-Paper Research Hub

This repository will evolve into a consolidated library of multiple research paper reimplementations. Each paper will be added under its own directory, maintaining:

  1. Interactive Jupyter notebooks

  2. Modular scripts and training code

  3. Well-documented README files

  4. (Optional) Unit tests

Example Future Additions:

paper_muzero/

paper_alphago/

This design enables structured, scalable growth of the repository for both learning and contribution.


License

This project is licensed under the MIT License.


References

  1. Silver, D., Hubert, T., Schrittwieser, J., et al. A General Reinforcement Learning Algorithm That Masters Chess, Shogi, and Go Through Self-Play, Science, 2018.

  2. Kaggle ConnectX Competition – https://www.kaggle.com/competitions/connectx


Ongoing Work and Future Direction

This repository reflects an evolving body of work, shaped by deliberate effort and continuous learning. Its current state covers only part of what is planned, and it remains open to refinement, extension, and deeper understanding.

It is a record of ongoing exploration rather than a finished destination: a work in progress, informed by the questions and insights encountered along the way.
