Machine learning model to predict winners of the first round of the 2022 NBA playoffs.
Project Summary: Predicting NBA Playoff Winners Using Machine Learning
Objective:
Predict the outcome of NBA playoff games using team performance metrics from the regular season (2022-2023).
os_helper.py - utility module to move files around, list the contents of zipped files, and unzip files in bulk.
API - the API and directions for updating or downloading data from the three sources: https://github.com/shufinskiy/nba_apiv3/tree/master
csv - file that contains all the relevant datasets.
This dataset comprises NBA play-by-play data and shot details spanning from the 1996/1997 to 2022/23 seasons, with the latest update on 2023-07-11 incorporating playoff data for all seasons. The data is sourced from three platforms:
- stats.nba.com
- data.nba.com
- pbpstats.com
Each of the datasets has a different start season, and there are minor inconsistencies throughout the datasets. For this project we use only the 2021-2022 and 2022-2023 seasons and the first round of the playoffs.
Stats that were unavailable or missing were filled in via the API.
| Field | Description |
|---|---|
| GAME_ID | Game ID |
| EVENTNUM | Sequence number of the event |
| EVENTMSGTYPE | Type of event (e.g., shot, rebound, foul) |
| EVENTMSGACTIONTYPE | Subtype or specific action of the event |
| PERIOD | Quarter number |
| WCTIMESTRING | Wall-clock time when the event occurred |
| PCTIMESTRING | Time remaining in the quarter |
| HOMEDESCRIPTION | Description of the home team action |
| NEUTRALDESCRIPTION | Description of a neutral action (e.g., start of period) |
| VISITORDESCRIPTION | Description of the visitor team action |
| SCORE | Current game score at the time of the event |
| SCOREMARGIN | Score margin at the time of the event |
| PERSON1TYPE | Type identifier for the first person involved |
| PLAYER1_ID | ID of the player who performed the main action |
| PLAYER1_NAME | Name of the player who performed the main action |
| PLAYER1_TEAM_ID | Team ID of the player who performed the main action |
| PLAYER1_TEAM_CITY | Team city of the player who performed the main action |
| PLAYER1_TEAM_NICKNAME | Team nickname of the player who performed the main action |
| PLAYER1_TEAM_ABBREVIATION | Team abbreviation of the player who performed the main action |
| PERSON2TYPE | Type identifier for the second person involved |
| PLAYER2_ID | ID of the player who performed a side action |
| PLAYER2_NAME | Name of the player who performed a side action |
| PLAYER2_TEAM_ID | Team ID of the player who performed a side action |
| PLAYER2_TEAM_CITY | Team city of the player who performed a side action |
| PLAYER2_TEAM_NICKNAME | Team nickname of the player who performed a side action |
| PLAYER2_TEAM_ABBREVIATION | Team abbreviation of the player who performed a side action |
| PERSON3TYPE | Type identifier for the third person involved |
| PLAYER3_ID | ID of the player who performed a second side action |
| PLAYER3_NAME | Name of the player who performed a second side action |
| PLAYER3_TEAM_ID | Team ID of the player who performed a second side action |
| PLAYER3_TEAM_CITY | Team city of the player who performed a second side action |
| PLAYER3_TEAM_NICKNAME | Team nickname of the player who performed a second side action |
| PLAYER3_TEAM_ABBREVIATION | Team abbreviation of the player who performed a second side action |
| VIDEO_AVAILABLE_FLAG | Indicates if a video of the event is available |
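
As a small illustration of working with these fields, the sketch below turns a `PERIOD` and `PCTIMESTRING` pair into the number of seconds elapsed since tip-off. It assumes that `PCTIMESTRING` is formatted as `MM:SS`, that regulation periods last 12 minutes, and that overtime periods last 5 minutes. The function name `seconds_elapsed` is hypothetical and not part of this repository.

```python
def seconds_elapsed(period: int, pctimestring: str) -> int:
    """Seconds of game time elapsed since tip-off.

    Assumptions (not confirmed by the dataset documentation):
    - pctimestring is "MM:SS", the time remaining in the period
    - periods 1-4 last 12 minutes, every later period is a 5-minute overtime
    """
    minutes, seconds = (int(part) for part in pctimestring.split(":"))
    remaining = minutes * 60 + seconds

    if period <= 4:
        period_length = 12 * 60
        completed = (period - 1) * period_length
    else:
        period_length = 5 * 60
        completed = 4 * 12 * 60 + (period - 5) * period_length

    return completed + (period_length - remaining)


print(seconds_elapsed(1, "12:00"))  # start of the game
print(seconds_elapsed(2, "6:30"))   # partway through the second quarter
print(seconds_elapsed(5, "0:00"))   # end of the first overtime
```

A single elapsed-time value makes it easier to order events across periods than the two separate fields.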
| Field | Description |
|---|---|
| evt | Sequence number of the event in the game |
| wallclk | Wall clock time when the event occurred |
| cl | Time until end of the quarter |
| de | Description of the action |
| locX | Coordinate of shot along the width of the court relative to its central axis |
| locY | Coordinate of shot along the length of the court relative to its central axis |
| opt1 | Points |
| opt2 | Additional option (context-dependent) |
| opt3 | Additional option (context-dependent) |
| opt4 | Additional option (context-dependent) |
| tid | Team ID |
| pid | Player ID |
| hs | Home team score |
| vs | Visitor team score |
| epid | Extra player ID (context-dependent) |
| oftid | Team ID in offense |
| ord | Order number (context-dependent) |
| pts | Points scored in the event |
| PERIOD | Quarter number |
| GAME_ID | Game ID |
| Field | Description |
|---|---|
| ENDTIME | Time until end of the quarter at time of end of possession |
| EVENTS | Description of all actions in possession |
| FG2A | Count of 2PT Field Goal attempts in possession |
| FG2M | Count of 2PT Field Goals made in possession |
| FG3A | Count of 3PT Field Goal attempts in possession |
| FG3M | Count of 3PT Field Goals made in possession |
| GAMEDATE | Game date |
| GAMEID | Game ID |
| NONSHOOTINGFOULSTHATRESULTEDINFTS | Non-shooting fouls that resulted in free throws |
| OFFENSIVEREBOUNDS | Count of offensive rebounds in possession |
| OPPONENT | Abbreviation of the team in defense |
| PERIOD | Quarter number |
| SHOOTINGFOULSDRAWN | Shooting fouls drawn |
| STARTSCOREDIFFERENTIAL | Difference in score at the start of possession |
| STARTTIME | Time until end of the quarter at time of start of possession |
| STARTTYPE | Type of start (context-dependent) |
| TURNOVERS | Turnovers |
| DESCRIPTION | Description of action |
| URL | Link to video |
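
Because each row of this table describes one possession, simple efficiency numbers can be derived from the count fields. The following is a minimal sketch that computes field-goal points per possession from `FG2M` and `FG3M`. Free throws are ignored because the table has no made-free-throw column, the sample rows are invented for illustration, and the helper names are hypothetical.

```python
def field_goal_points(possession: dict) -> int:
    """Points from made field goals in one possession (free throws excluded)."""
    return 2 * int(possession["FG2M"]) + 3 * int(possession["FG3M"])


def points_per_possession(possessions: list) -> float:
    """Average field-goal points over a list of possession rows."""
    if not possessions:
        return 0.0
    total = sum(field_goal_points(p) for p in possessions)
    return total / len(possessions)


# Invented sample rows, using only a few of the fields from the table above.
sample = [
    {"FG2A": 1, "FG2M": 1, "FG3A": 0, "FG3M": 0, "TURNOVERS": 0},
    {"FG2A": 0, "FG2M": 0, "FG3A": 1, "FG3M": 1, "TURNOVERS": 0},
    {"FG2A": 1, "FG2M": 0, "FG3A": 0, "FG3M": 0, "TURNOVERS": 0},
    {"FG2A": 0, "FG2M": 0, "FG3A": 0, "FG3M": 0, "TURNOVERS": 1},
]

ppp = points_per_possession(sample)
print(ppp)
```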
| Field | Description |
|---|---|
| evt | Sequence number of the event in the game |
| wallclk | Wall clock time when the event occurred |
| cl | Time until end of the quarter |
| de | Description of the action |
| locX | Coordinate of shot along the width of the court relative to its central axis |
| locY | Coordinate of shot along the length of the court relative to its central axis |
| opt1 | Points |
| opt2 | Additional option (context-dependent) |
| mtype | Type of the move (context-dependent) |
| etype | Type of the event (context-dependent) |
| opid | Opponent player ID (context-dependent) |
| tid | Team ID |
| pid | Player ID |
| hs | Home team score |
| vs | Visitor team score |
| epid | Extra player ID (context-dependent) |
| oftid | Team ID in offense |
| ord | Order number (context-dependent) |
| PERIOD | Quarter number |
| GAME_ID | Game ID |
| Field | Description |
|---|---|
| GRID_TYPE | Shot Chart Detail |
| GAME_ID | Game ID |
| GAME_EVENT_ID | Sequence number of the event in the game |
| PLAYER_ID | ID of player who made the shot |
| PLAYER_NAME | Name of player who made the shot |
| TEAM_ID | Team ID of player who made the shot |
| TEAM_NAME | Team name of player who made the shot |
| PERIOD | Quarter number |
| MINUTES_REMAINING | Minutes remaining until end of the quarter |
| SECONDS_REMAINING | Seconds remaining until end of the quarter |
| EVENT_TYPE | Made or Missed shot |
| ACTION_TYPE | Type of shot |
| SHOT_TYPE | 2PT or 3PT shot |
| SHOT_ZONE_BASIC | General area on the court where the shot was taken |
| SHOT_ZONE_AREA | Specific area on the court where the shot was taken |
| SHOT_ZONE_RANGE | Distance range of the shot |
| SHOT_DISTANCE | Distance to the rim |
| LOC_X | Coordinate of the shot along the width of the court relative to its central axis |
| LOC_Y | Coordinate of the shot along the length of the court relative to its central axis |
| SHOT_ATTEMPTED_FLAG | Shot execution flag (always 1) |
| SHOT_MADE_FLAG | Shot made flag (0 or 1) |
| GAME_DATE | Game date |
| HTM | Abbreviation of home team |
| VTM | Abbreviation of away team |
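
The flag columns in this table make shooting percentages straightforward to aggregate. The sketch below groups shots by `SHOT_ZONE_BASIC` and divides the sum of `SHOT_MADE_FLAG` by the sum of `SHOT_ATTEMPTED_FLAG`. The sample rows and the function name `fg_pct_by_zone` are made up for illustration; real rows would come from the shot detail CSV.

```python
from collections import defaultdict


def fg_pct_by_zone(shots):
    """Return {zone: field-goal percentage} for an iterable of shot rows."""
    attempts = defaultdict(int)
    makes = defaultdict(int)
    for shot in shots:
        zone = shot["SHOT_ZONE_BASIC"]
        attempts[zone] += int(shot["SHOT_ATTEMPTED_FLAG"])
        makes[zone] += int(shot["SHOT_MADE_FLAG"])
    # Skip zones without attempts to avoid dividing by zero.
    return {zone: makes[zone] / attempts[zone] for zone in attempts if attempts[zone]}


# Invented sample rows with only the columns this sketch needs.
sample_shots = [
    {"SHOT_ZONE_BASIC": "Restricted Area", "SHOT_ATTEMPTED_FLAG": 1, "SHOT_MADE_FLAG": 1},
    {"SHOT_ZONE_BASIC": "Restricted Area", "SHOT_ATTEMPTED_FLAG": 1, "SHOT_MADE_FLAG": 1},
    {"SHOT_ZONE_BASIC": "Restricted Area", "SHOT_ATTEMPTED_FLAG": 1, "SHOT_MADE_FLAG": 0},
    {"SHOT_ZONE_BASIC": "Mid-Range", "SHOT_ATTEMPTED_FLAG": 1, "SHOT_MADE_FLAG": 0},
    {"SHOT_ZONE_BASIC": "Mid-Range", "SHOT_ATTEMPTED_FLAG": 1, "SHOT_MADE_FLAG": 1},
    {"SHOT_ZONE_BASIC": "Above the Break 3", "SHOT_ATTEMPTED_FLAG": 1, "SHOT_MADE_FLAG": 0},
]

zone_pct = fg_pct_by_zone(sample_shots)
for zone, pct in zone_pct.items():
    print(f"{zone}: {pct:.3f}")
```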
| Team Name | team_id |
|---|---|
| Atlanta Hawks | 1610612737 |
| Boston Celtics | 1610612738 |
| Brooklyn Nets | 1610612751 |
| Charlotte Hornets | 1610612766 |
| Chicago Bulls | 1610612741 |
| Cleveland Cavaliers | 1610612739 |
| Dallas Mavericks | 1610612742 |
| Denver Nuggets | 1610612743 |
| Detroit Pistons | 1610612765 |
| Golden State Warriors | 1610612744 |
| Houston Rockets | 1610612745 |
| Indiana Pacers | 1610612754 |
| LA Clippers | 1610612746 |
| Los Angeles Lakers | 1610612747 |
| Memphis Grizzlies | 1610612763 |
| Miami Heat | 1610612748 |
| Milwaukee Bucks | 1610612749 |
| Minnesota Timberwolves | 1610612750 |
| New Orleans Pelicans | 1610612740 |
| New York Knicks | 1610612752 |
| Oklahoma City Thunder | 1610612760 |
| Orlando Magic | 1610612753 |
| Philadelphia 76ers | 1610612755 |
| Phoenix Suns | 1610612756 |
| Portland Trail Blazers | 1610612757 |
| Sacramento Kings | 1610612758 |
| San Antonio Spurs | 1610612759 |
| Toronto Raptors | 1610612761 |
| Utah Jazz | 1610612762 |
| Washington Wizards | 1610612764 |
Key Features:
- Recent Performance Metrics: Investigated the significance of a team's performance in the most recent games (`home_last5`, `vlast`, `hlast`) as predictors for upcoming playoff games.
- Statistical Analysis: Conducted exploratory data analysis (EDA) to understand the distributions, correlations, and potential predictors from the dataset.
- Feature Engineering: Developed features based on the win-loss records from the past games to encapsulate a team's recent momentum.
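
The recent-momentum idea described above can be sketched as a rolling count of wins over each team's previous games. This is an illustration under assumptions, not the project's actual feature code: the input keys (`home`, `visitor`, `home_win`), the output names (`home_last_n`, `visitor_last_n`), and the function `add_last_n_wins` are all hypothetical, and the games are assumed to be sorted by date.

```python
from collections import defaultdict, deque


def add_last_n_wins(games, n=5):
    """Add each team's win count over its previous n games to every game row.

    games: list of dicts sorted by date, each with "home", "visitor" and
    "home_win" (bool). The current game's result is never included in its
    own features, which avoids leaking the label into the inputs.
    """
    history = defaultdict(lambda: deque(maxlen=n))  # team -> recent results (1 = win)
    enriched = []
    for game in games:
        home, visitor = game["home"], game["visitor"]
        row = dict(game)
        row["home_last_n"] = sum(history[home])
        row["visitor_last_n"] = sum(history[visitor])
        enriched.append(row)
        # Record the result only after the features have been computed.
        history[home].append(1 if game["home_win"] else 0)
        history[visitor].append(0 if game["home_win"] else 1)
    return enriched


# Invented three-game sample.
games = [
    {"home": "BOS", "visitor": "MIA", "home_win": True},
    {"home": "MIA", "visitor": "BOS", "home_win": False},
    {"home": "BOS", "visitor": "MIA", "home_win": False},
]

rows = add_last_n_wins(games, n=5)
for row in rows:
    print(row["home"], row["home_last_n"], row["visitor"], row["visitor_last_n"])
```

Using a bounded `deque` per team keeps only the last `n` results, so the window slides forward automatically as games are processed.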
Machine Learning Workflow:
- Data Preprocessing: Handled missing values, outliers, and transformed data into a format suitable for machine learning.
- Model Selection: Chose the Random Forest Classifier because it handles large, high-dimensional datasets well and is relatively robust against overfitting.
- Model Training & Testing: Utilized 80% of the data for training and validated the model on the remaining 20%.
- Results: Achieved an accuracy of 54% on the test data, slightly above the 50% expected from guessing between two outcomes, which suggests that these regular season performance metrics carry only a limited signal for playoff game outcomes.
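
The workflow above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's training script: the data is synthetic and generated on the spot, the two features stand in for recent-form counts, and the hyperparameters (`n_estimators=200`, `random_state=0`) are arbitrary choices. A majority-class baseline is included because an accuracy figure is easier to interpret next to one.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_games = 400

# Synthetic stand-ins for recent-form features: wins in the last 5 games.
home_last5 = rng.integers(0, 6, size=n_games)
visitor_last5 = rng.integers(0, 6, size=n_games)
X = np.column_stack([home_last5, visitor_last5])

# Synthetic label: the home team wins more often when its recent form is better.
logit = 0.4 * (home_last5 - visitor_last5) + 0.2
y = (rng.random(n_games) < 1 / (1 + np.exp(-logit))).astype(int)

# 80% of the games for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
model_acc = accuracy_score(y_test, model.predict(X_test))

# Baseline that always predicts the most frequent class in the training set.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

print(f"random forest accuracy: {model_acc:.2f}")
print(f"majority-class baseline: {baseline_acc:.2f}")
```

With real data, `X` and `y` would be built from the engineered features and the game outcomes instead of the random generator.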