Production-ready NBA player props prediction and grading system
A comprehensive data pipeline that:
- Scrapes NBA game data, player stats, and betting lines from multiple sources
- Processes raw data into analytics features (1000+ metrics per player/game)
- Predicts player prop outcomes using 7 ML systems (including ensemble models)
- Grades predictions against actual outcomes with 70-90% coverage
- Monitors system health with Grafana dashboards and automated alerts
Current Status: All 6 phases operational, 614 predictions generated daily across 7 systems
| I need to... | Go here |
|---|---|
| Get oriented | docs/00-start-here/README.md |
| Check system health | docs/STATUS-DASHBOARD.md |
| Daily operations | docs/00-start-here/DAILY-SESSION-START.md |
| Recent changes | docs/09-handoff/ (latest session handoffs) |
| System architecture | docs/01-architecture/quick-reference.md |
| Troubleshooting | docs/02-operations/troubleshooting-matrix.md |
All documentation lives in docs/:
docs/
├── 00-start-here/     Start here for navigation
├── 01-architecture/   System design & decisions
├── 02-operations/     Daily ops, troubleshooting
├── 03-phases/         6 pipeline phases (orchestration → publishing)
├── 04-deployment/     Deployment guides & status
├── 05-development/    How to build (patterns, testing)
├── 06-reference/      Quick lookups (processor cards, data flow)
├── 07-monitoring/     Grafana, alerts, observability
├── 08-projects/       Active work & completed projects
└── 09-handoff/        Session handoffs & status updates
Documentation Index: docs/00-PROJECT-DOCUMENTATION-INDEX.md
Phase 1: Orchestration → Daily scheduling & coordination
Phase 2: Raw Data → Scrape from NBA.com, BallDontLie, OddsAPI
Phase 3: Analytics → 1000+ features per player/game
Phase 4: Precompute → ML feature store, zone analysis
Phase 5: Predictions → 7 systems (XGBoost, CatBoost, Ensembles)
Phase 6: Publishing → API endpoints, dashboards
Tech Stack:
- Compute: Google Cloud Run, Cloud Functions, Cloud Scheduler
- Storage: BigQuery (10+ datasets), Cloud Storage
- Orchestration: Firestore-based distributed locks
- ML: XGBoost, CatBoost, custom ensemble models
- Monitoring: Cloud Monitoring, Grafana, custom alerting
Last Updated: 2026-01-19 (Session 112)
| Service | Status | Last Deploy | Notes |
|---|---|---|---|
| Prediction Worker | ✅ Operational | 2026-01-19 07:55 UTC | All 7 systems working |
| Prediction Coordinator | ✅ Operational | 2026-01-19 06:07 UTC | Fixed deployment script |
| Analytics Processors | ✅ Operational | 2026-01-19 06:23 UTC | Session 107 metrics deployed |
| Grading Function | ✅ Operational | Phase 5b | 70-90% coverage |
| Cloud Schedulers | ✅ Enabled | Multiple | Daily triggers working |
| System | Status | Performance | Volume (Jan 19) |
|---|---|---|---|
| Moving Average | ✅ | Baseline | 91 predictions |
| Zone Matchup V1 | ✅ | Matchup analysis | 91 predictions |
| Similarity Balanced V1 | ✅ | Historical | 69 predictions |
| XGBoost V1 | ✅ | ML baseline | 91 predictions |
| CatBoost V8 | ✅ | 3.40 MAE (champion) | 91 predictions |
| Ensemble V1 | ✅ | Weighted | 91 predictions |
| Ensemble V1.1 | ✅ | Performance-based (NEW) | 91 predictions |
Total: 614 predictions per day across all systems
Recent Fix (Session 112): Fixed 37-hour outage caused by missing google-cloud-firestore dependency

- ✅ Added comprehensive spot check system for data accuracy verification
- ✅ 6 automated checks: rolling averages, usage rate, minutes parsing, ML features, cache, points arithmetic
- ✅ Integrated into daily validation (5 spot checks, 95% accuracy threshold)
- ✅ Found real data quality issues: Mo Bamba (28% rolling avg error), usage rate precision issues
- ✅ Documentation: 599-line usage guide + troubleshooting
- Full guide | Handoff
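
The rolling-average check and the 95% accuracy threshold mentioned above could look roughly like the sketch below. The function names, the 5-game window, and the 2% tolerance are illustrative assumptions, not the contents of scripts/spot_check_data_accuracy.py.

```python
from statistics import mean


def rolling_average_ok(recent_points, stored_avg, window=5, rel_tol=0.02):
    """Recompute a rolling average from raw games and compare it with the stored value.

    `window` and `rel_tol` are hypothetical defaults chosen for illustration.
    """
    games = recent_points[-window:]
    if not games:
        return False  # nothing to verify against
    expected = mean(games)
    if expected == 0:
        return stored_avg == 0
    return abs(stored_avg - expected) / abs(expected) <= rel_tol


def spot_check_passes(results, threshold=0.95):
    """Return True when the share of passing checks meets the accuracy threshold."""
    if not results:
        return False
    return sum(results) / len(results) >= threshold


games = [10, 12, 8, 14, 6]                # the true 5-game average is 10
print(rolling_average_ok(games, 10.0))    # True
print(rolling_average_ok(games, 12.8))    # False: the stored value is 28% off
print(spot_check_passes([True] * 19 + [False]))  # True: 19 of 20 checks passed
```

A relative tolerance is used here so that the same check works for low-usage and high-usage players; an absolute tolerance would be an equally reasonable choice.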
- ✅ Fixed 13 critical security vulnerabilities (97+ individual issues)
- ✅ SQL injection: 47 queries converted to parameterized format
- ✅ Authentication: Added API key validation to analytics service
- ✅ Removed RCE risks: Fixed eval() and pickle deserialization
- ✅ Input validation: New validation library for all user inputs
- Security log
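
Converting a query to parameterized form means passing user-supplied values separately from the SQL text instead of interpolating them into it. The sketch below uses Python's built-in sqlite3 module as a stand-in so that it runs without GCP credentials; the table and column names are hypothetical, and the BigQuery client uses its own parameter syntax.

```python
import sqlite3


def points_for_player(conn, player_name):
    """Look up points with a bound parameter rather than string formatting."""
    # The `?` placeholder keeps player_name out of the SQL text entirely.
    cur = conn.execute(
        "SELECT points FROM player_stats WHERE player_name = ? ORDER BY points",
        (player_name,),
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical in-memory table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_stats (player_name TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO player_stats VALUES (?, ?)",
    [("Player A", 21), ("Player A", 30), ("Player B", 12)],
)

print(points_for_player(conn, "Player A"))        # [21, 30]
# An injection attempt is treated as a literal name and matches nothing.
print(points_for_player(conn, "x' OR '1'='1"))    # []
```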
- ✅ Fixed prediction pipeline outage (37+ hours down)
- ✅ Root cause: Missing `google-cloud-firestore==2.14.0` dependency
- ✅ Result: All 7 systems operational, 614 predictions generated
- Full handoff

- ✅ Deployed 7 Session 107 metrics (variance + star tracking)
- ✅ Fixed analytics processor schema evolution
- ✅ Investigated prediction failures (fixed in Session 112)

- ✅ Deployed Ensemble V1.1 with performance-based weights
- ✅ Added CatBoost V8 to ensemble (45% weight)
- ✅ Expected MAE improvement: 5.41 → 4.9-5.1 (6-9% better)
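
A weighted ensemble of this kind can be sketched as a weighted average of per-system predictions. Only the 45% CatBoost V8 weight and the MAE figures come from the notes above; the remaining weights, the system keys, and the sample predictions are made-up values for illustration, and the actual Ensemble V1.1 weighting scheme may differ.

```python
def weighted_ensemble(predictions, weights):
    """Combine per-system predictions, normalizing the weights to sum to 1."""
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(predictions[name] * weights[name] for name in predictions) / total


def relative_improvement(old_mae, new_mae):
    """Fractional MAE reduction (positive means the new model is better)."""
    return (old_mae - new_mae) / old_mae


# Hypothetical weights: only catboost_v8 = 0.45 is taken from the notes above.
weights = {
    "catboost_v8": 0.45,
    "xgboost_v1": 0.25,
    "zone_matchup_v1": 0.15,
    "moving_average": 0.15,
}
# Hypothetical point predictions for a single player.
preds = {
    "catboost_v8": 24.0,
    "xgboost_v1": 22.0,
    "zone_matchup_v1": 26.0,
    "moving_average": 20.0,
}

print(round(weighted_ensemble(preds, weights), 2))
# Improvement implied by moving from 5.41 MAE to the expected 4.9-5.1 range.
low = relative_improvement(5.41, 5.1)
high = relative_improvement(5.41, 4.9)
print(f"{low:.1%} to {high:.1%}")
```

Normalizing by the sum of the weights means a system can be dropped from `predictions` (for example when it produced no output for a player) without rescaling the weight table by hand.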
See full timeline: docs/STATUS-DASHBOARD.md
- Python 3.11+
- Google Cloud SDK
- BigQuery access
- Service account with appropriate permissions
Required (All Services):
- `GCP_PROJECT_ID` - GCP project identifier (e.g., `nba-props-platform`)
- `ENVIRONMENT` - Environment name (`dev`, `staging`, `prod`)
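
A service could validate these two variables at startup along the following lines. This is only a sketch: `load_required_config` is a hypothetical helper, not a function from this repository.

```python
import os

ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}


def load_required_config(env=None):
    """Read the required settings and fail fast if one is missing or invalid."""
    env = os.environ if env is None else env
    project_id = env.get("GCP_PROJECT_ID", "").strip()
    environment = env.get("ENVIRONMENT", "").strip()
    if not project_id:
        raise RuntimeError("GCP_PROJECT_ID is not set")
    if environment not in ALLOWED_ENVIRONMENTS:
        raise RuntimeError(
            f"ENVIRONMENT must be one of {sorted(ALLOWED_ENVIRONMENTS)}"
        )
    return {"project_id": project_id, "environment": environment}


# Passing a dict instead of os.environ keeps the example self-contained.
config = load_required_config(
    {"GCP_PROJECT_ID": "nba-props-platform", "ENVIRONMENT": "dev"}
)
print(config)
```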
Security (Week 0 - Required as of 2026-01-19):
- `VALID_API_KEYS` - Comma-separated API keys for analytics service authentication
- `BETTINGPROS_API_KEY` - BettingPros API key (previously hardcoded)
- `SENTRY_DSN` - Sentry monitoring DSN (previously hardcoded)
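
As an illustration of how a comma-separated `VALID_API_KEYS` value might be parsed and checked, here is a minimal sketch. Both helper names are hypothetical, and the analytics service's actual validation code may work differently.

```python
import hmac


def parse_api_keys(raw):
    """Split a comma-separated VALID_API_KEYS value, dropping blank entries."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_valid_api_key(candidate, valid_keys):
    """Check a presented key against the configured keys."""
    if not candidate:
        return False
    candidate_bytes = candidate.encode("utf-8")
    # Compare against every key (no early exit) using a constant-time comparison.
    results = [
        hmac.compare_digest(candidate_bytes, key.encode("utf-8"))
        for key in valid_keys
    ]
    return any(results)


# Example value; in a service this string would come from the environment.
valid_keys = parse_api_keys("key-one, key-two,")
print(valid_keys)                               # ['key-one', 'key-two']
print(is_valid_api_key("key-two", valid_keys))  # True
print(is_valid_api_key("wrong", valid_keys))    # False
```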
Optional:
- `SLACK_WEBHOOK_URL` - Slack notifications
- `GOOGLE_APPLICATION_CREDENTIALS` - Path to service account key file
See deployment guide for configuration details.
# Check system health
./monitoring/check-system-health.sh
# Run data accuracy spot checks
python scripts/spot_check_data_accuracy.py --samples 10
# Validate tonight's data
python scripts/validate_tonight_data.py
# Deploy prediction worker
bash bin/predictions/deploy/deploy_prediction_worker.sh
# Deploy analytics processors
bash bin/analytics/deploy/deploy_analytics_processors.sh
# Trigger manual predictions
curl -X POST "https://prediction-coordinator-[PROJECT].run.app/start" \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -d '{"force": true, "game_date": "2026-01-19"}'

├── bin/                  # Deployment scripts
├── data_processors/      # Analytics & precompute processors
├── predictions/          # ML prediction systems
│   ├── coordinator/      # Batch coordinator
│   └── worker/           # Prediction worker (7 systems)
├── scrapers/             # Raw data scrapers
├── shared/               # Shared utilities
├── monitoring/           # Health checks & alerts
├── schemas/              # BigQuery schemas
└── docs/                 # Documentation (main resource)
- Check recent handoffs: `docs/09-handoff/`
- Review troubleshooting guide: `docs/02-operations/troubleshooting-matrix.md`
- Check system status: `docs/STATUS-DASHBOARD.md`
Starting a new Claude Code session?
- Read `docs/09-handoff/` for latest status
- Review `docs/00-start-here/DAILY-SESSION-START.md`
- Check `docs/STATUS-DASHBOARD.md` for current health
- Prediction Coverage: 150+ players per day
- Grading Coverage: 70-90%
- Best Model: CatBoost V8 (3.40 MAE)
- Systems: 7 concurrent prediction systems
- Daily Volume: 614 predictions
- Uptime: 99%+ (after Session 112 fix)
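
For readers unfamiliar with the two headline metrics, the sketch below shows how mean absolute error (MAE) and grading coverage are commonly computed. MAE follows its standard definition; treating coverage as "graded predictions divided by all predictions" is an assumption about how this project defines it, and the player keys and values are invented.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def grading_coverage(predictions, outcomes):
    """Share of predictions that have a recorded outcome to grade against."""
    if not predictions:
        return 0.0
    graded = sum(1 for key in predictions if key in outcomes)
    return graded / len(predictions)


# Hypothetical predictions and box-score outcomes keyed by player.
predictions = {"p1": 20.5, "p2": 11.0, "p3": 7.5, "p4": 30.0}
outcomes = {"p1": 24, "p2": 10, "p3": 9}  # p4 has no outcome and stays ungraded

graded_keys = [key for key in predictions if key in outcomes]
mae = mean_absolute_error(
    [predictions[key] for key in graded_keys],
    [outcomes[key] for key in graded_keys],
)
print(grading_coverage(predictions, outcomes))  # 0.75
print(mae)                                      # 2.0
```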
Proprietary - All Rights Reserved
Project Contact: NBA Props Platform Team
GCP Project: nba-props-platform
Region: us-west2 (Los Angeles)
Documentation: docs/
Status: Ready to execute after Week 0 validation
Timeline: 4 weeks, 42 hours total
Goal: 99.7% reliability + $170/month savings + 5x performance

- Week 1-4 Master Plan - Complete roadmap
- Strategic Plan - Full strategy & ROI
- Week 1 Plan - Day-by-day execution
- Feature Flags - Safe rollout config
- Progress Tracker - Daily updates

- BigQuery optimization: $60-90/month savings
- Critical scalability fixes
- Idempotency & data integrity
- Structured logging & metrics
Next: Validate Quick Win #1 tomorrow (Jan 21, 8:30 AM ET), then begin Week 1!