ballnet uses a GATv2TCN (Graph Attention Network + Temporal Convolutional Network) model to predict NBA statlines. It provides point estimates and calibrated probabilities for any player across the six output statistics: points, assists, rebounds, turnovers, steals, and blocks.
To switch to a new model, change one line in config.py:

```python
ACTIVE_MODEL = "v5"  # was "v3"
```

Training and evaluation scripts import config.py and will automatically use weights from models/v5/.
```
clean/
├── README.md                    # This file
├── game_embeddings.md           # ← Game outcome prediction reference (optional)
├── config.py                    # ← Single source of truth for all paths + hyperparams
├── predictor.py                 # Core predictor class (loads artifacts + runs inference)
├── update.py                    # ← Data refresh / rebuild tensors (optional)
│
├── architecture/                # Model architecture source
│   ├── gatv2tcn.py              # GATv2TCN implementation
│   └── tcn.py                   # TCN block implementation
│
├── data/                        # Runtime data (gitignored: *.pkl, *.npy, *.parquet)
│   ├── raw_boxscores.parquet    # Full NBA game log (built by 01_fetch_data.py)
│   ├── game_home_teams.parquet  # ← {GAME_ID: home_team_abbr} from LeagueGameFinder (cached)
│   ├── X_seq.pkl                # (Days, Players, 13) forward-filled stat tensor
│   ├── X_raw.pkl                # (Days, Players, 13) raw sparse stat tensor (no fill)
│   ├── G_seq.pkl                # List of networkx graphs, one per game-day
│   ├── player_ids.pkl           # Ordered list of player IDs (axis 1 of X_seq)
│   ├── game_dates.pkl           # Ordered list of date strings (axis 0 of X_seq)
│   ├── day_seasons.pkl          # Season label per day (e.g. "2024-25")
│   ├── team_temporal.pkl        # (Days, Players, n_teams) per-day team one-hot arrays
│   ├── pos_temporal.pkl         # (Days, Players, 3) per-day position arrays
│   ├── n_teams.pkl              # int — number of unique teams
│   ├── player_id2team.pkl       # {player_id: "LAL"} — most recent team abbreviation
│   ├── player_id2position.pkl   # {player_id: [G,F,C] binary array}
│   ├── mu_per_day.npy           # Causal sliding-window normalization means (Days, 1, 13)
│   └── sd_per_day.npy           # Causal sliding-window normalization std devs (Days, 1, 13)
│
├── models/                      # Trained model weights
│   ├── v5/                      # Current active model
│   │   ├── model.pth            # GATv2TCN state dict
│   │   ├── team_emb.pth         # Linear(n_teams, 2)
│   │   ├── pos_emb.pth          # Linear(3, 2)
│   │   └── conformal_residuals.pkl  # Calibration residuals
│   └── ...                      # v1-v4
│
├── scripts/                     # Setup and training scripts
│   ├── 01_fetch_data.py         # Historical NBA boxscore scrape
│   ├── 02_build_tensors.py      # Build tensors from raw data
│   ├── 03_train.py              # Training script (MPS/CUDA/Colab)
│   ├── 04_calibrate.py          # Compute conformal_residuals.pkl
│   └── prepare_colab.py         # Package for Colab training
│
└── upload/                      # Google Colab upload bundle
    ├── train.ipynb              # Colab bootstrap notebook
    ├── config.py                # Colab path shim
    ├── scripts/03_train.py      # Training script copy
    ├── gatv2tcn.py              # Model source
    ├── tcn.py                   # TCN source
    └── data/                    # Required data files
```
```shell
# 1. Fetch all historical NBA data (takes hours, uses kamikaze restart protocol)
python scripts/01_fetch_data.py

# 2. Build all tensor artifacts from raw data
python scripts/02_build_tensors.py

# 3a. Train on Google Colab (recommended — ~15-30 min on T4/A100)
python scripts/prepare_colab.py
#    → Upload clean/upload/ to Google Drive root
#    → Open upload/train.ipynb in Colab → Runtime → Run all
#    → Download clean_download/ → copy .pth files to clean/models/<ACTIVE_MODEL>/

# 3b. OR train locally on MPS/CUDA (expect 1-2+ hours)
python scripts/03_train.py

# 4. Calibrate the model (computes conformal residuals)
python scripts/04_calibrate.py
```

To refresh data and retrain:

```shell
# Fetch any new games and update all tensors
python update.py

python scripts/02_build_tensors.py  # regenerate tensors
python scripts/prepare_colab.py     # rebuild upload/ bundle with fresh data
# ... train on Colab, copy weights ...
python scripts/04_calibrate.py      # recompute residuals for new weights
```

config.py is the single import every other script depends on.
```python
ACTIVE_MODEL = "v5"  # ← Change this to switch models everywhere

ROOT      = Path(__file__).resolve().parent
DATA_DIR  = ROOT / "data"
MODEL_DIR = ROOT / "models" / ACTIVE_MODEL
GATV2_SRC = ROOT / "architecture"

FEATURE_COLS = ['PTS','AST','REB','TO','STL','BLK','PLUS_MINUS',
                'TCHS','PASS','DIST','PACE','USG_PCT','TS_PCT']  # 13 features
PREDICTION_COLS = ['PTS','AST','REB','TO','STL','BLK']           # 6 output stats
PRED_INDICES = [0,1,2,3,4,5]  # indices of PREDICTION_COLS within FEATURE_COLS
SEQ_LENGTH = 10               # days of history used as input window
```

`X_seq.pkl` is the forward-filled stat tensor. Zero rows (non-playing days) are filled forward with each player's last known stats. This is the version used as model input.
Gotcha: `X_seq.pkl` is stored un-normalized (raw stat values like PTS=24.0). Normalization happens in memory only, inside `predictor.py`, via `mu_per_day`/`sd_per_day`. Never write the normalized version back to disk — it would cause double-normalization on the next load.
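A minimal sketch of that in-memory normalization, assuming numpy arrays shaped like the files above (hypothetical helper name; the real logic lives in predictor.py):

```python
import numpy as np

def normalize_day(X_seq, mu_per_day, sd_per_day, day_idx):
    """Normalize one day's raw stats in memory; never write the result to disk."""
    sd = sd_per_day[day_idx].copy()
    sd[sd < 1e-6] = 1.0  # guard tiny std devs, per the mu/sd docs
    return (X_seq[day_idx] - mu_per_day[day_idx]) / sd

# toy shapes: 1 day, 2 players, 13 features
X  = np.arange(26, dtype=float).reshape(1, 2, 13)
mu = np.zeros((1, 1, 13))
sd = np.full((1, 1, 13), 2.0)
X_norm = normalize_day(X, mu, sd, 0)  # shape (2, 13)
```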
Same as X_seq but without forward-fill (~84% zeros). Used to detect which
players actually played on a given day (non-zero rows).
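A minimal sketch of that played-that-day check (hypothetical helper name):

```python
import numpy as np

def played_mask(X_raw_day):
    """Boolean mask over players: True where any stat is non-zero."""
    return np.any(X_raw_day != 0, axis=1)

day = np.array([
    [24.0, 5.0, 7.0] + [0.0] * 10,  # player with a real statline
    [0.0] * 13,                     # player who did not play
])
mask = played_mask(day)
```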
Maps player ID → team abbreviation string (e.g. "LAL", "BOS").
Generated by 02_build_tensors.py from the most recent team per player.
Gotcha: This stores strings, not integers. `03_train.py` and `04_calibrate.py` both apply an alphabetical string→int encoding (`sorted(all_teams)`) at runtime to get a consistent `n_teams=30` integer mapping.
Maps player ID → 3-element binary position vector (e.g. [1, 0, 0] for Forward).
Generated by 02_build_tensors.py using nba_api static player data.
Causal sliding window normalization statistics. To prevent lookahead bias in backtesting, each day leverages an expanding trailing window of up to 150 active days to compute rolling means and standard deviations using purely historical data. sd values < 1e-6 are treated as 1.0.
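An illustrative version of the windowed statistics (simplified: it windows over all prior days, while the real 02_build_tensors.py restricts the window to up to 150 active days):

```python
import numpy as np

def causal_stats(X, window=150):
    """Per-day mean/std computed from a trailing window of strictly
    earlier days, so no future information leaks into day d."""
    days, _, feats = X.shape
    mu = np.zeros((days, 1, feats))
    sd = np.ones((days, 1, feats))
    for d in range(1, days):
        hist = X[max(0, d - window):d]  # strictly before day d
        mu[d, 0] = hist.mean(axis=(0, 1))
        s = hist.std(axis=(0, 1))
        s[s < 1e-6] = 1.0               # treat tiny std as 1.0, as documented
        sd[d, 0] = s
    return mu, sd

mu, sd = causal_stats(np.ones((3, 2, 13)))
```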
```python
GATv2TCN(
    in_channels = 17,        # 13 stats + 2 team_emb + 2 pos_emb
    out_channels = 6,        # PTS, AST, REB, TO, STL, BLK
    len_input = 10,          # SEQ_LENGTH
    len_output = 1,
    temporal_filter = 64,
    out_gatv2conv = 32,
    dropout_tcn = 0.25,
    dropout_gatv2conv = 0.5,
    head_gatv2conv = 4,
)
```

Gotcha: The correct kwarg names are `len_input`, `len_output`, `out_gatv2conv`, `dropout_tcn`, `dropout_gatv2conv`, `head_gatv2conv`. Do NOT use `seq_length` or `heads` — those don't exist in the `gatv2tcn.py` constructor and will raise `TypeError: unexpected keyword argument`.
Embedding layers:
```python
team_emb = nn.Linear(n_teams, 2)  # bias=True (default)
pos_emb  = nn.Linear(3, 2)        # bias=True (default)
```

Gotcha: Always use the default `bias=True` when creating these layers to load the saved `.pth` files, which include a bias key. Using `bias=False` causes `RuntimeError: Unexpected key(s) in state_dict: "bias"`.
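A quick demonstration of why the default bias matters when loading (assumes torch is installed):

```python
import torch.nn as nn

n_teams = 30
team_emb = nn.Linear(n_teams, 2)   # bias=True by default, matches the saved .pth

# A checkpoint saved from this layer includes both 'weight' and 'bias' keys
saved = team_emb.state_dict()
keys = sorted(saved.keys())

# Recreating the layer with bias=False cannot load that checkpoint
bad = nn.Linear(n_teams, 2, bias=False)
try:
    bad.load_state_dict(saved)
    loads_without_bias = True
except RuntimeError:               # Unexpected key(s) in state_dict: "bias"
    loads_without_bias = False
```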
Input construction:
```python
x_t = cat([X_norm[day, :, :], team_emb(team_one_hot), pos_emb(pos_vec)], dim=1)
# shape: (P=805, 17)
# stacked over SEQ_LENGTH=10 days → (1, P, 17, 10)
```

The model is small (~77K parameters, 76KB .pth). However, the full training loop (300 epochs × 20-day batch × 148 val days) takes 1.5+ hours on Apple MPS but only 15-30 minutes on a Colab T4/A100.
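A runnable toy version of the input construction, to check the shapes (toy player count; the real P is ~805):

```python
import torch
import torch.nn as nn

P, n_teams = 4, 30                                  # toy sizes
X_norm_day   = torch.randn(P, 13)                   # one day's normalized stats
team_emb     = nn.Linear(n_teams, 2)
pos_emb      = nn.Linear(3, 2)
team_one_hot = torch.eye(n_teams)[:P]               # toy one-hot team rows
pos_vec      = torch.tensor([[1.0, 0.0, 0.0]] * P)  # all guards, for the toy

# concat stats + 2-d team embedding + 2-d position embedding per player
x_t = torch.cat([X_norm_day, team_emb(team_one_hot), pos_emb(pos_vec)], dim=1)
# stack 10 identical days just to show the window shape
window = torch.stack([x_t] * 10, dim=-1).unsqueeze(0)
```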
```shell
python scripts/prepare_colab.py
```

Builds upload/ (~44 MB):

- `scripts/03_train.py` — exact copy of the canonical training script (parity guarantee)
- `config.py` — auto-generated Colab path shim so `03_train.py` resolves imports correctly
- `train.ipynb` — minimal 4-cell bootstrap notebook that runs `03_train.py` via subprocess
- `gatv2tcn.py` + `tcn.py` — model source
- `data/` — the 5 required pkl files
Parity guarantee: `prepare_colab.py` copies `scripts/03_train.py` directly into the upload bundle rather than duplicating its logic. This means any changes to `03_train.py` (hyperparameters, normalization, loss function, etc.) are automatically reflected in Colab training after re-running `prepare_colab.py`. Never edit training logic in the notebook or in `prepare_colab.py` directly — always edit `03_train.py`.
After training completes, Colab saves output to clean_download/ in your Drive:
- `model.pth`, `team_emb.pth`, `pos_emb.pth` → copy to `clean/models/<ACTIVE_MODEL>/`
- Re-run `04_calibrate.py` after copying new weights

Gotcha: Run `prepare_colab.py` fresh each time you retrain with updated data — it copies the current pkl files and `03_train.py`, so stale uploads will train on stale data with stale code.
Training uses summed MSE (not averaged) over all days in the batch/val set:
```python
loss = sum(mse_per_day)  # NOT mean(mse_per_day)
```

This means raw loss values scale linearly with the number of days. Our dataset has ~7× more val days (147) than the original Colab notebook (20), so our val loss will be ~7× larger by construction. This is expected and correct. The per-day loss converges to the same ~0.032 as the original training. Divide the reported val loss by ~147 to compare.
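The comparison amounts to (hypothetical loss value for illustration):

```python
def per_day_loss(summed_val_loss, n_val_days):
    """Convert a summed-MSE validation loss into a per-day figure
    that is comparable across datasets with different val-set sizes."""
    return summed_val_loss / n_val_days

# e.g. a summed val loss of 4.704 over 147 val days
per_day = per_day_loss(4.704, 147)  # ~0.032, the original training's level
```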
The progress bar shows all four quantities every epoch:
```
Training: 35% | 105/300 [train=38.4, val=21.3, best=18.9, saved=★]
```
★ appears when a new best validation loss is saved.
The core predictor class used by inference and evaluation scripts.
```python
p = GATv2Predictor()
p.setup()  # loads all artifacts from data/ and models/<ACTIVE_MODEL>/
```

setup() loads: X_seq, G_seq, player_ids, game_dates, mu_per_day, sd_per_day, team_temporal, pos_temporal, n_teams, conformal_residuals.pkl, and all three .pth weight files.
Conformal residuals (tiered format):
After loading conformal_residuals.pkl, the predictor exposes:
- `self.val_residuals` — `dict[str, list]` keyed as `"PTS_low"`, `"PTS_mid"`, `"PTS_high"`, etc.
- `self.val_bias` — `dict[str, float]` — per-stat mean bias computed at calibration time
Key public helpers:
```python
p.get_residual_std("PTS")  # mid-tier std — used by quantile_test.py (and any tiered SD filter)

# ONE forward pass for all 805 players, cached per day
pred_matrix = p.predict_all_for_day(day_idx)     # → (P, 6) raw stat units
mc_matrix   = p.predict_all_mc_for_day(day_idx)  # → (20, P, 6) MC-dropout samples
```

Both methods are memoized by day_idx, so calling them twice for the same day is a free dict lookup. This is ~150× faster than the per-player approach for large batches. Use `_get_day_idx_for_date("2025-02-15")` to convert a date string to an index.
```python
p.predict_point_estimate(player_id, "PTS")
p.predict_conformal_probability(player_id, "PTS", 22.5)
```

These call the day-level batched methods internally, so they also benefit from caching when called multiple times for the same day.
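For intuition, a stdlib-only sketch of how such a probability could be derived from calibration residuals under a normal approximation (the dependency list notes scipy's norm.cdf for this step; the real predictor.py may differ in tier selection and bias handling):

```python
import math

def conformal_probability(pred, line, residuals, bias=0.0):
    """Estimate P(actual > line) from mean-centered residuals, assuming
    they are roughly normal. math.erf stands in for scipy's norm.cdf."""
    n = len(residuals)
    mean = sum(residuals) / n
    var = sum((r - mean) ** 2 for r in residuals) / n
    std = math.sqrt(var) if var > 0 else 1.0
    z = (line - (pred + bias)) / std
    # 1 - Phi(z), with Phi the standard normal CDF
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# toy residuals with std 2.0; prediction 24.0 PTS against a 22.5 line
p_over = conformal_probability(24.0, 22.5, [-3.0, -1.0, 0.0, 1.0, 3.0])
```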
```python
p.clear_day_cache()  # frees _day_cache and _mc_cache dicts if RAM is tight
```

Reads data/raw_boxscores.parquet and produces all artifacts in data/:
| File | Description |
|---|---|
| `X_seq.pkl` | Forward-filled stat tensor (Days, Players, 13) |
| `X_raw.pkl` | Raw sparse stat tensor (no fill) |
| `G_seq.pkl` | List of networkx graphs |
| `player_ids.pkl` | Ordered player ID list |
| `game_dates.pkl` | Ordered date string list |
| `player_id2team.pkl` | {pid: "LAL"} — most recent team |
| `player_id2position.pkl` | {pid: [G,F,C]} — position binary vector |
| `mu_per_day.npy` / `sd_per_day.npy` | Causal sliding-window normalization stats |
Gotcha: `player_id2team` stores team abbreviation strings ("LAL"), not integers. `03_train.py` and `04_calibrate.py` handle this with an alphabetical sort encoding at runtime.
Before rebuilding, the script compares the current tensor's player count to the incoming data's player count. If counts differ significantly, retraining is required — the model's graph structure is keyed to specific player indices.
```python
# player_id2team.pkl → string→int encoding
all_teams = sorted(set(team_str_values))  # alphabetical, stable
team_str2int = {t: i for i, t in enumerate(all_teams)}
```

This handles both string abbreviations (our pipeline) and integer team IDs (original Colab pipeline) automatically.
Loss is computed only on players who appear in the target day's graph (i.e., players who actually played that game-day):
```python
mask = G_out[i].unique()              # node indices in next day's edge tensor
loss = mse_loss(pred[mask], y[mask])
```

Players who didn't play are forward-filled in y but excluded from the loss. This prevents the model from wasting capacity learning the fill-forward function.
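A runnable toy version of the masked loss (toy tensors; in the real loop the mask comes from the next day's graph edge tensor):

```python
import torch
import torch.nn.functional as F

P, S = 6, 6                                        # toy: 6 players, 6 stats
pred = torch.zeros(P, S)
y    = torch.ones(P, S)                            # forward-filled targets
edge_index = torch.tensor([[0, 2, 4], [2, 4, 0]])  # only players 0, 2, 4 played
mask = edge_index.unique()

loss = F.mse_loss(pred[mask], y[mask])             # averaged over 3 players, not 6
```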
Loads model weights and runs inference on the validation set (days 50%–75% of the dataset, matching 03_train.py) to compute signed residuals against the forward-shifted target day:

```python
residual = actual[t+1] - predicted[t]
```
Residuals are stratified by predicted value magnitude into three tiers per stat (low / mid / high) and mean-centered to remove systematic model bias before saving.
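The stratify-and-center step can be sketched as follows (hypothetical helper; here each tier is centered on its own mean, which is one reading of the description, and the bounds shown are the PTS boundaries):

```python
def build_tiered_residuals(preds, actuals, stat="PTS", bounds=(12.0, 22.0)):
    """Bin signed residuals (actual - predicted) by predicted value into
    low/mid/high tiers, then mean-center each non-empty tier."""
    lo, hi = bounds
    tiers = {f"{stat}_low": [], f"{stat}_mid": [], f"{stat}_high": []}
    for p, a in zip(preds, actuals):
        tier = "low" if p < lo else ("mid" if p < hi else "high")
        tiers[f"{stat}_{tier}"].append(a - p)
    for key, rs in tiers.items():
        if rs:
            m = sum(rs) / len(rs)
            tiers[key] = [r - m for r in rs]  # remove systematic bias
    return tiers

tiers = build_tiered_residuals(preds=[8.0, 10.0, 15.0, 30.0],
                               actuals=[10.0, 8.0, 18.0, 25.0])
```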
Output format (conformal_residuals.pkl):

```python
{
    "bias": {"PTS": -0.247, "AST": -0.326, ...},  # raw mean residual per stat
    "residuals": {
        "PTS_low":  [...],   # mean-centered, for predictions < 12
        "PTS_mid":  [...],   # mean-centered, for predictions 12–22
        "PTS_high": [...],   # mean-centered, for predictions ≥ 22
        "AST_low":  [...],
        ...                  # all 6 stats × up to 3 tiers
    }
}
```

Tier boundaries:
| Stat | Low | Mid | High |
|---|---|---|---|
| PTS | <12 | 12–22 | ≥22 |
| AST | <4 | 4–8 | ≥8 |
| REB | <4 | 4–8 | ≥8 |
| STL | <1.5 | 1.5–3 | ≥3 |
| BLK | <1 | 1–2.5 | ≥2.5 |
| TO | <2 | 2–4 | ≥4 |
Tiers with fewer than 30 samples fall back to the mid tier (e.g. STL_high and TO_high typically have 0 samples — the model rarely predicts these stats that high).
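A hypothetical helper mirroring this fallback rule (the real logic lives in `predictor._get_residuals_for`):

```python
def residuals_for(residuals, stat, pred_val, bounds, min_samples=30):
    """Pick the residual tier for pred_val; fall back to the mid tier
    when the chosen tier has fewer than min_samples entries."""
    lo, hi = bounds
    tier = "low" if pred_val < lo else ("mid" if pred_val < hi else "high")
    rs = residuals.get(f"{stat}_{tier}", [])
    return rs if len(rs) >= min_samples else residuals[f"{stat}_mid"]

res = {"STL_low": [0.1] * 50, "STL_mid": [0.0] * 40, "STL_high": []}
picked = residuals_for(res, "STL", 4.0, bounds=(1.5, 3.0))  # high tier is empty
```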
Always re-run `04_calibrate.py` after copying new weights from Colab. Also re-run if tier boundaries are adjusted — the saved residuals depend on which thresholds were used to bin predictions at calibration time.
| # | Issue | Details |
|---|---|---|
| 1 | Double normalization | X_seq.pkl is raw. Normalize in-memory only. Never write X_seq_norm to disk. |
| 2 | Wrong GATv2TCN kwargs | Use len_input, len_output, out_gatv2conv, dropout_tcn, dropout_gatv2conv, head_gatv2conv. Never seq_length or heads. |
| 3 | bias=True on embeddings | nn.Linear(n_teams, 2) default is bias=True. Saved .pth files include bias. Never use bias=False. |
| 4 | team strings not ints | player_id2team.pkl stores "LAL" strings. Use alphabetical sort encoding before computing n_teams. |
| 5 | Loss scale vs Colab | Our val loss is ~7× larger by construction (147 val days vs 20). Compare per-day loss (divide by ~147). |
| 6 | CWD sensitivity | Always run scripts from clean/ or via absolute path. config.py imports fail if clean/ is not importable. |
| 7 | Upload/ freshness | Re-run prepare_colab.py every time you want to retrain with updated data. It copies fresh pkl files AND 03_train.py. |
| 8 | Colab/local parity | Never duplicate training logic in prepare_colab.py or the notebook. All training code lives in 03_train.py. Edit 03_train.py → re-run prepare_colab.py → upload. |
| 9 | Double-denormalization | The model natively outputs raw stat predictions. Never multiply by sd_per_day or add mu in predictor.py or 04_calibrate.py. That inflates predictions (24.5 PTS → 260 PTS). |
| 10 | conformal_residuals.pkl format | The file uses a tiered format, not a flat per-stat list. Access it via predictor._get_residuals_for(stat, pred_val) or predictor.get_residual_std(stat). |
| 11 | LOG_TRANSFORM semantics | predictor.py and 04_calibrate.py must have LOG_TRANSFORM set correctly based on the active model's training configuration. |
| 12 | Colab train.ipynb is NOT the canonical script | Training logic lives in scripts/03_train.py only. Re-run prepare_colab.py before every Colab upload so the bundle includes the current 03_train.py. |
```
torch torchvision           # model
torch-geometric             # GATv2Conv
networkx                    # graph construction
numpy pandas pyarrow        # data
nba_api                     # schedule data (ScoreboardV3)
scikit-learn statsmodels    # utilities & metrics
scipy                       # conformal probability (norm.cdf)
tqdm seaborn patsy xgboost  # visualization and analysis
matplotlib                  # plotting
```
The model source (gatv2tcn.py, tcn.py) lives at:
architecture/
This path is referenced in config.py as GATV2_SRC and used by 03_train.py,
04_calibrate.py, and prepare_colab.py (which copies the files into upload/).