Skip to content

Engineering Update: PostgreSQL Constraint Coverage + Pipeline Fixes (March 23) #156

@kevsilk597

Description

@kevsilk597

Engineering Update - Last 18 Hours

1. The Core Problem

After the hardening plan was approved and executed, the pipeline hit a cascade of failures that all traced back to one systemic issue: the SQLite-to-PostgreSQL migration transferred all the data but created zero UNIQUE constraints on any table. In SQLite, ON CONFLICT works loosely. In PostgreSQL, every ON CONFLICT(col1, col2) clause requires an actual UNIQUE index on those exact columns or the INSERT silently fails. This meant every single upsert in the entire pipeline was broken - games were not being inserted, stats were not being inserted, NHL data was not being inserted. The pipeline would "complete" with zero errors but store zero data.


2. What Was Fixed (Chronological)

Round 1: Initial PostgreSQL Compatibility (e57edb5)

  • gtd/storage/db.py: Added automatic ROUND((expr)::numeric, n) casting in the SQL converter. PostgreSQL does not allow ROUND(double precision, integer) - it requires a numeric type. This was failing the player_tiers step.
  • gtd/analysis/anomaly_detector.py: Added entity_name to a GROUP BY clause. PostgreSQL is stricter than SQLite about requiring all non-aggregated SELECT columns in GROUP BY.
  • gtd/storage/database.py: Expanded validate_schema() from 3 constraint specs to 25, covering tables used by the analysis pipeline (clutch, referee_history, b2b, rest_advantage, etc.).

Round 2: Resilient DDL (f323cdf)

  • gtd/storage/database.py: Added a _safe_ddl() helper function. In PostgreSQL, if a single DDL statement fails inside a transaction, the entire transaction aborts and all subsequent statements fail with "current transaction is aborted." The old code had ALTER TABLE injuries ADD COLUMN sport ... which would fail because the column already existed, and this cascaded to abort validate_schema() entirely. Now every DDL statement runs in its own mini-transaction with automatic rollback on failure.
  • Changed all ALTER TABLE ADD COLUMN statements to use IF NOT EXISTS.

Round 3: player_baselines constraint (5e14f24)

  • Corrected the validate_schema() entry for player_baselines from UNIQUE(player_name) to UNIQUE(player_id). The anomalies step uses ON CONFLICT(player_id), not player_name. Manually deduplicated the table on the server before adding the correct constraint.

Round 4: Event monitor and card counts (f7c49ba)

  • gtd/realtime/event_monitor.py: Fixed a SQL query that selected home_team alongside SUM() and COUNT(*) without including home_team in the GROUP BY. SQLite allows this; PostgreSQL does not. This was causing [RT] Monitor error every 15 minutes.
  • server.py: The _check_per_sport_card_counts() function queried WHERE sport = 'NBA' on the intelligence_cards table, but that table has no sport column. Replaced with a simple total card count check.

Round 5: Full Constraint Coverage + Step Ordering (186047b)

This is where the real investigation happened. After all the previous fixes, the pipeline was still producing 0 games. Deep log analysis and code tracing revealed four root causes:

(a) Deployed cli.py was massively outdated. The server had a 1,608-line version of gtd/cli.py. The current version is 3,915 lines. The old version was missing STEP_MAP entries for: nhl_stats_backfill, nhl_skater_baselines, nhl_goalie_baselines, nhl_power_play, nhl_penalty_kill, detect_cards, career_totals, all nfl_* sub-steps, all mlb_* sub-steps. Every missing step was logged as "Unknown step" and silently skipped. The full gtd/ directory was resynced to the server.

(b) Core tables had zero UNIQUE constraints. Comprehensive audit scanning every Python file for ON CONFLICT(...) patterns and cross-referencing with pg_constraint found 76 missing UNIQUE constraints across all tables. The critical ones:

  • games needed UNIQUE(game_date, home_team, away_team) - without this, zero NBA games could be inserted
  • player_stats needed UNIQUE(game_id, player_name)
  • nhl_games needed UNIQUE(game_id) - had duplicate rows that had to be cleaned first
  • nhl_skater_stats needed UNIQUE(game_id, player_id)
  • nhl_teams needed UNIQUE(abbreviation)
  • nhl_players needed UNIQUE(player_id)
  • odds needed UNIQUE(game_date, home_team, away_team, source)
  • statements needed UNIQUE(source_handle, content)
  • Plus 60+ more across NFL, MLB, NHL analysis, and feature tables

All 76 were batch-added. 63 succeeded immediately, 4 needed deduplication first (done manually), 9 referenced columns that don't exist in those tables (non-critical edge tables). validate_schema() was expanded from 25 to 47 constraint specs so these are auto-checked and auto-created on every server startup.

(c) Step ordering was wrong. CORE_DATA_STEPS in server.py started with nba_players, nfl_games, nfl_schedule, nfl_standings, nfl_odds, nfl_players, nfl_stats, nfl_team_analytics, nfl_situational, nfl_injuries, nfl_baselines, mlb_games, mlb_batting, mlb_pitching, mlb_player_baselines - 15 slow API-calling steps before the games step. A full pipeline took 45+ minutes before it even attempted to fetch NBA games. Reordered to: games, stats, player_splits, injuries, odds, props first, then NHL, then analysis, then MLB/NFL last.

(d) Two SQL bugs. Event monitor GROUP BY and card counts sport column (described in Round 4).


3. Current State

Metric Before After
Intelligence cards 84 146
Games table latest date March 21 March 23
NHL sub-steps running 0 of 6 All 6
Pipeline step coverage ~60% (old cli.py) 100% (full cli.py)
UNIQUE constraints ~25 110+
Pipeline step order NFL/MLB first (45 min to NBA) NBA first (~2 min to games)

March 23 pipeline output:

  • 12 NBA games stored
  • 10 odds records, 327 player props
  • 1 NHL game, 32 standings, 33 teams, 816 players synced
  • 77 goalie baselines updated
  • 46 cards detected

4. What Jarvis Should Review

  1. Constraint completeness: Run the audit script again to verify no new ON CONFLICT patterns have been introduced without matching constraints.
  2. Pipeline output: Trigger a manual run (curl http://cypher.178.156.223.137.nip.io/api/v1/trigger) and watch /var/log/cypher.log for any remaining "Unknown step", "ON CONFLICT", or "InFailedSqlTransaction" errors.
  3. Card quality: Check the 146 cards in the database - are they passing validation, relevant, and accurate?
  4. NHL skater stats freshness: nhl_skater_stats latest date is March 18 (5 days old). The nhl_stats_backfill step scans 30 games but stored 0 skater rows. Needs investigation.
  5. NFL games staleness: Shows Feb 8 as latest. Expected (offseason), but confirm whether CORE_BLOCKING_TABLES should include nfl_games during offseason - it flags the system as "degraded."
  6. Health endpoint: Currently returns "degraded" because of stale NFL/MLB offseason tables. Consider whether the staleness threshold should be configurable per sport/season.

5. Files Changed

File What Changed
gtd/storage/database.py validate_schema() expanded to 47 specs; _safe_ddl() helper; IF NOT EXISTS for ALTER TABLE
gtd/storage/db.py ROUND() auto-cast to ::numeric; INSERT OR REPLACE to ON CONFLICT DO UPDATE SET fix
gtd/analysis/anomaly_detector.py entity_name added to GROUP BY
gtd/realtime/event_monitor.py Removed non-aggregated home_team from SELECT with aggregates
server.py CORE_DATA_STEPS reordered; card count query fixed
gtd/cli.py Already had correct code locally, was just not deployed to server

6. Commits

186047b fix: complete PostgreSQL constraint coverage and step ordering
f7c49ba fix: event_monitor GROUP BY and card counts missing sport column
5e14f24 fix: correct player_baselines constraint to use player_id
f323cdf fix: make ensure_all_tables resilient to PostgreSQL transaction aborts
e57edb5 fix: PostgreSQL pipeline compatibility - ROUND cast, GROUP BY, missing constraints

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions