The aim of this project is to extract meaningful insights about champion performance, team efficiency, player rankings and game dynamics across 7 years of professional League of Legends play.
A relational database was built from Oracle's Elixir competitive dataset (2018-2025). The original data consisted of separate yearly CSV files, merged via Python into a single staging table and then normalized into a clean dimensional model. It is not uploaded due to its size.
The 2018-2025 window was chosen deliberately: it coincides with the franchising era across major leagues, ensuring more consistent competition structures and team stability.
The model follows a star schema with 1 fact table and 7 dimension tables:
| Table | Type | Description |
|---|---|---|
player_stats |
Fact | One row per player per game. Core analytics table. |
team_stats |
Dimension | Objective and macro stats per team per game. |
games_info |
Dimension | Game metadata: date, patch, split, playoffs, duration. |
teams |
Dimension | Team registry. |
players |
Dimension | Player registry, keyed by Oracle's Elixir player ID. |
leagues |
Dimension | League registry with region mapping. |
champions |
Dimension | Champion pool with role and subrole classification. |
bans |
Dimension | One row per ban, with ban order. |
├── schema.sql # Database creation, tables, PKs, FKs, constraints
├── load_data.sql # Data load from raw_data staging table
├── objects.sql # KDA function & 2 business views
├── eda.sql # Data quality checks, EDA, business questions
└── schema_diagram.JPG # ER diagram
- MySQL 9.7
- Python 3.x with
Pandas,SQLalchemyypyMYSQL: - Execute the lane below to install them:
pip install pandas sqlalchemy pymysql- Download all the yearly CSV files you wanna analyze from Oracle's Elixir.
- Place them in a
/datafolder in the same directory as the Python script. - Execute schema.sql to create the database and all tables for data import.
- After
Configuration(read below), run the Python script to merge CSVs and loads raw_data into MySQL. - Execute
load_data.sql. It may fail due to MYSQL limitations withINDEXandIF EXISTSinteractions. At the beginning of the file, in theCLEANING PATCHsection, it is explained how to proceed. - Execute
objects.sql. - Execute
eda.sql. If you understand the code, feel free to modify some filter criteria to analyze data your own way.
Before running the Python script:
- Make sure all your CSVs regardless of their names are in a
/datafolder within thejoin_files.pyscript. - Update the MYSQL connection string with your credentials. This are the actual lines 24-27:
# Push to MySQL as staging table
engine = create_engine("mysql+pymysql://root:@localhost:3306/lol_competitive")
df.to_sql("raw_data", engine, if_exists = "replace", index=False)
print(f"raw_data cargada en MySQL: {len(df)} filas")Change it to:
# Push to MySQL as staging table
engine = create_engine("mysql+pymysql://YOUR_USER:YOUR_PASSWORD@localhost:3306/lol_competitive")
df.to_sql("raw_data", engine, if_exists = "replace", index=False)
print(f"raw_data cargada en MySQL: {len(df)} filas")All these insights were obtained filtering by major leagues in order to only analyze the best gameplay possible.
- Major vs Minor leagues: Major league games last longer, have fewer kills, improved CSing and higher vision scores, which reflects a more structured, macro-oriented playstyle.
- TOP3 champions by role: Most of the TOP3 champs are off-role champs which implies that during the patch they were played off-role they were broken.
- Dragon soul impact: If we look the whole-time games since dragon soul was released, Hextech is the most porwerful one. However, when we restrict the time lapse to when Hextech and Chemtech dragons were released, to make game amount comparable between souls, the most powerful one is Cloud Soul. Regardless of the time window, Chemtech soul is the worst one eventhough it has 88%+ winrate.
- Gold advantage by role: A gold lead at 15 minutes translates to victory most consistently for Mid laners, followed by Junglers, ADC's and lastly by Toplaners. Support were excluded from this analysis due to their low gold income compared with the other 4 roles.
- Blue side advantage: Blue side maintains a consistent winrate edge across all years even when restricting games to the same region to avoid region-gap.
Limitations
- Team rebranding (e.g. SK Telecom T1 → T1) is treated as separate entities due to source data constraints. No persistent team ID was available in the dataset.
- Dragon soul wasn't recorded on original data, so it was inferred using dragon count and type.
- League of Legends is a complex game where this dataset fails to capture the big difference between teams and players. More complex data as mouse movement, camera management, map awarness or team communication would allow a better analysis.
Future Work
- Extend and upgrade the player ranking metric to all 5 roles.
- Apply logistic regression to model win probability from early-game stats.
- Analyse the data recomended in previous section to extract game changers to players training.