GitHub - Ematrion/master-thesis-experiments: RSTT version of the experiments from chapter 4

Evaluation of Ranking & Matchmaking in Competition
A Simulation and Humanist Approach
Master Thesis
David Bucher
January 2025
University of Fribourg

Content Summary

The repository focuses on Chapter 4 — the experimental part of the thesis.

Each notebook is a standalone experiment. All start with a number matching the section of the written document. They contain the code and comments from the thesis. It is recommended to consult them in order.

4.1 Baseline: I claim we can use Kendall tau rank correlation to track Elo convergence. This notebook presents values to expect in settings relevant to the thesis.
4.2 Elo Prediction: I claim that a perfectly tuned ranking — from a predictive metric perspective — can overfit data even with proper training/validation. I build a synthetic example illustrating it.
4.3 The Snake Scheduler: I propose a simple model for skill-based matchmaking. I run a simulation where I measure game quality computed by Elo, Glicko, TrueSkill, and OpenSkill.
4.4 Random Snake: a conceptual benchmark test highlighting qualitative differences between SOTA rankings. (I am very proud of this setting.)
4.5 Convergence in the Snake Format: an extension of 4.4 showing how the conceptual test translates to a more realistic scenario.

Remark

Texts are close to the ones in the thesis PDF. They have been slightly adjusted to make the notebooks self-readable but may contain references to other chapters not available in this repo.

Extension

Some notebooks include a section Further Notes, which contains insight not presented in the thesis nor to the expert panel during Chapter 5. These are not necessary to follow the experimentation but bring valuable context and elements to anyone exploring my work.

Research Background & Motivation

My thesis started as a personal exploratory project. I wanted to verify and disprove certain frequent claims in the gaming community. I coded a first simulation prototype, then became more passionate about the project and started to investigate ranking algorithms and evaluation methodologies from the perspective of my experience in high-level competition. During the literature review, the experimental protocol and evaluation techniques of SOTA modern rating systems caught my attention.

One fundamental property of the Elo rating is systematically neglected or overlooked. Elo does not simply converge; it does not have a function that predicts game outcomes accurately. Elo is a Markov process with a unique stationary distribution (steady state). We know exactly what Elo does, which is a tremendous asset for reinforcement learning and unsupervised learning.
The data structure and the interplay between a ranking learning from games and producing recommendations for future games is not a detail and can alter competitive results. High-level chess consists of double round-robin or Swiss rounds, and as a graph, contains many cycles. Professional tennis consists of single-elimination brackets with no cycle at all. A cross-validation methodology in which a dataset is split in time for training and test sets produces the same distribution of data points. You need to split the distribution of points to claim it generalizes.
Skill-based matchmaking creates a pleasant gaming experience but produces irrelevant data for learning. There is no information in an a priori balanced game — at least according to Shannon’s theory. The expected-score baseline on such a dataset is a random predictor, yet no paper compares their system to it.

I formulated the “Matchmaking Domain Overfitting Hypothesis” and opted for simulation-based research. The rationale for it:

Only Elo has a proven unique stationary distribution.
Alternative systems have more parameters (typically ratings with a mean and a variance).
Empirical research did not address the underlying data structure nor address the naïve baseline.

Contribution

An illustration of an empirically perfectly tuned Elo rating system that does not generalize to the entire feature space.
A model for skill-based matchmaking-like competition formats.
A conceptual test highlighting differences in SOTA ranking behaviour.
A setting in which TrueSkill and OpenSkill need more than 600 gamess to correctly sort 8 players, proving that players can have a poor ranking experience despite playing at their level.

Installation

This project is written in Python and uses uv for environment and dependency management.

Clone the repository

git clone https://github.com/Ematrion/master-thesis-experiments


cd master-thesis-experiments

Install uv (if you don't have it already)

curl -LsSf https://astral.sh/uv/install.sh | sh

Or with wget

wget -qO- https://astral.sh/uv/install.sh | sh

On Windows use

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Create and sync the environment

uv will read both pyproject.toml and uv.lock to create a fully reproducible environment:

uv sync

Launch JupyterLab or Notebook

uv run jupyter lab

or

uv run jupyter notebook

Addendum

3.4.2 Access The pakcage is available upon request from the author. This is no longer relevant. The RSTT package is now publicly available with pip install rstt. User and API documentation are available. Currently the package is under review by the python open science initiative.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
tests		tests
utils		utils
.gitignore		.gitignore
.python-version		.python-version
4.1 Baseline.ipynb		4.1 Baseline.ipynb
4.2 Elo prediction.ipynb		4.2 Elo prediction.ipynb
4.3 The Snake Scheduler.ipynb		4.3 The Snake Scheduler.ipynb
4.4 Random Snake.ipynb		4.4 Random Snake.ipynb
4.5 Convergence in the Snake format.ipynb		4.5 Convergence in the Snake format.ipynb
LICENSE		LICENSE
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Content Summary

Remark

Extension

Research Background & Motivation

Contribution

Installation

Addendum

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Content Summary

Remark

Extension

Research Background & Motivation

Contribution

Installation

Addendum

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages