Engineering Update - Last 18 Hours
1. The Core Problem
After the hardening plan was approved and executed, the pipeline hit a cascade of failures that all traced back to one systemic issue: the SQLite-to-PostgreSQL migration transferred all the data but created zero UNIQUE constraints on any table. In SQLite, ON CONFLICT works loosely. In PostgreSQL, every ON CONFLICT(col1, col2) clause requires an actual UNIQUE index on those exact columns or the INSERT silently fails. This meant every single upsert in the entire pipeline was broken - games were not being inserted, stats were not being inserted, NHL data was not being inserted. The pipeline would "complete" with zero errors but store zero data.
2. What Was Fixed (Chronological)
Round 1: Initial PostgreSQL Compatibility (e57edb5)
gtd/storage/db.py: Added automatic ROUND((expr)::numeric, n) casting in the SQL converter. PostgreSQL does not allow ROUND(double precision, integer) - it requires a numeric type. This was failing the player_tiers step.
gtd/analysis/anomaly_detector.py: Added entity_name to a GROUP BY clause. PostgreSQL is stricter than SQLite about requiring all non-aggregated SELECT columns in GROUP BY.
gtd/storage/database.py: Expanded validate_schema() from 3 constraint specs to 25, covering tables used by the analysis pipeline (clutch, referee_history, b2b, rest_advantage, etc.).
Round 2: Resilient DDL (f323cdf)
gtd/storage/database.py: Added a _safe_ddl() helper function. In PostgreSQL, if a single DDL statement fails inside a transaction, the entire transaction aborts and all subsequent statements fail with "current transaction is aborted." The old code had ALTER TABLE injuries ADD COLUMN sport ... which would fail because the column already existed, and this cascaded to abort validate_schema() entirely. Now every DDL statement runs in its own mini-transaction with automatic rollback on failure.
- Changed all
ALTER TABLE ADD COLUMN statements to use IF NOT EXISTS.
Round 3: player_baselines constraint (5e14f24)
- Corrected the
validate_schema() entry for player_baselines from UNIQUE(player_name) to UNIQUE(player_id). The anomalies step uses ON CONFLICT(player_id), not player_name. Manually deduplicated the table on the server before adding the correct constraint.
Round 4: Event monitor and card counts (f7c49ba)
gtd/realtime/event_monitor.py: Fixed a SQL query that selected home_team alongside SUM() and COUNT(*) without including home_team in the GROUP BY. SQLite allows this; PostgreSQL does not. This was causing [RT] Monitor error every 15 minutes.
server.py: The _check_per_sport_card_counts() function queried WHERE sport = 'NBA' on the intelligence_cards table, but that table has no sport column. Replaced with a simple total card count check.
Round 5: Full Constraint Coverage + Step Ordering (186047b)
This is where the real investigation happened. After all the previous fixes, the pipeline was still producing 0 games. Deep log analysis and code tracing revealed four root causes:
(a) Deployed cli.py was massively outdated. The server had a 1,608-line version of gtd/cli.py. The current version is 3,915 lines. The old version was missing STEP_MAP entries for: nhl_stats_backfill, nhl_skater_baselines, nhl_goalie_baselines, nhl_power_play, nhl_penalty_kill, detect_cards, career_totals, all nfl_* sub-steps, all mlb_* sub-steps. Every missing step was logged as "Unknown step" and silently skipped. The full gtd/ directory was resynced to the server.
(b) Core tables had zero UNIQUE constraints. Comprehensive audit scanning every Python file for ON CONFLICT(...) patterns and cross-referencing with pg_constraint found 76 missing UNIQUE constraints across all tables. The critical ones:
games needed UNIQUE(game_date, home_team, away_team) - without this, zero NBA games could be inserted
player_stats needed UNIQUE(game_id, player_name)
nhl_games needed UNIQUE(game_id) - had duplicate rows that had to be cleaned first
nhl_skater_stats needed UNIQUE(game_id, player_id)
nhl_teams needed UNIQUE(abbreviation)
nhl_players needed UNIQUE(player_id)
odds needed UNIQUE(game_date, home_team, away_team, source)
statements needed UNIQUE(source_handle, content)
- Plus 60+ more across NFL, MLB, NHL analysis, and feature tables
All 76 were batch-added. 63 succeeded immediately, 4 needed deduplication first (done manually), 9 referenced columns that don't exist in those tables (non-critical edge tables). validate_schema() was expanded from 25 to 47 constraint specs so these are auto-checked and auto-created on every server startup.
(c) Step ordering was wrong. CORE_DATA_STEPS in server.py started with nba_players, nfl_games, nfl_schedule, nfl_standings, nfl_odds, nfl_players, nfl_stats, nfl_team_analytics, nfl_situational, nfl_injuries, nfl_baselines, mlb_games, mlb_batting, mlb_pitching, mlb_player_baselines - 15 slow API-calling steps before the games step. A full pipeline took 45+ minutes before it even attempted to fetch NBA games. Reordered to: games, stats, player_splits, injuries, odds, props first, then NHL, then analysis, then MLB/NFL last.
(d) Two SQL bugs. Event monitor GROUP BY and card counts sport column (described in Round 4).
3. Current State
| Metric |
Before |
After |
| Intelligence cards |
84 |
146 |
| Games table latest date |
March 21 |
March 23 |
| NHL sub-steps running |
0 of 6 |
All 6 |
| Pipeline step coverage |
~60% (old cli.py) |
100% (full cli.py) |
| UNIQUE constraints |
~25 |
110+ |
| Pipeline step order |
NFL/MLB first (45 min to NBA) |
NBA first (~2 min to games) |
March 23 pipeline output:
- 12 NBA games stored
- 10 odds records, 327 player props
- 1 NHL game, 32 standings, 33 teams, 816 players synced
- 77 goalie baselines updated
- 46 cards detected
4. What Jarvis Should Review
- Constraint completeness: Run the audit script again to verify no new
ON CONFLICT patterns have been introduced without matching constraints.
- Pipeline output: Trigger a manual run (
curl http://cypher.178.156.223.137.nip.io/api/v1/trigger) and watch /var/log/cypher.log for any remaining "Unknown step", "ON CONFLICT", or "InFailedSqlTransaction" errors.
- Card quality: Check the 146 cards in the database - are they passing validation, relevant, and accurate?
- NHL skater stats freshness:
nhl_skater_stats latest date is March 18 (5 days old). The nhl_stats_backfill step scans 30 games but stored 0 skater rows. Needs investigation.
- NFL games staleness: Shows Feb 8 as latest. Expected (offseason), but confirm whether
CORE_BLOCKING_TABLES should include nfl_games during offseason - it flags the system as "degraded."
- Health endpoint: Currently returns "degraded" because of stale NFL/MLB offseason tables. Consider whether the staleness threshold should be configurable per sport/season.
5. Files Changed
| File |
What Changed |
gtd/storage/database.py |
validate_schema() expanded to 47 specs; _safe_ddl() helper; IF NOT EXISTS for ALTER TABLE |
gtd/storage/db.py |
ROUND() auto-cast to ::numeric; INSERT OR REPLACE to ON CONFLICT DO UPDATE SET fix |
gtd/analysis/anomaly_detector.py |
entity_name added to GROUP BY |
gtd/realtime/event_monitor.py |
Removed non-aggregated home_team from SELECT with aggregates |
server.py |
CORE_DATA_STEPS reordered; card count query fixed |
gtd/cli.py |
Already had correct code locally, was just not deployed to server |
6. Commits
186047b fix: complete PostgreSQL constraint coverage and step ordering
f7c49ba fix: event_monitor GROUP BY and card counts missing sport column
5e14f24 fix: correct player_baselines constraint to use player_id
f323cdf fix: make ensure_all_tables resilient to PostgreSQL transaction aborts
e57edb5 fix: PostgreSQL pipeline compatibility - ROUND cast, GROUP BY, missing constraints
Engineering Update - Last 18 Hours
1. The Core Problem
After the hardening plan was approved and executed, the pipeline hit a cascade of failures that all traced back to one systemic issue: the SQLite-to-PostgreSQL migration transferred all the data but created zero UNIQUE constraints on any table. In SQLite,
ON CONFLICTworks loosely. In PostgreSQL, everyON CONFLICT(col1, col2)clause requires an actual UNIQUE index on those exact columns or the INSERT silently fails. This meant every single upsert in the entire pipeline was broken - games were not being inserted, stats were not being inserted, NHL data was not being inserted. The pipeline would "complete" with zero errors but store zero data.2. What Was Fixed (Chronological)
Round 1: Initial PostgreSQL Compatibility (
e57edb5)gtd/storage/db.py: Added automaticROUND((expr)::numeric, n)casting in the SQL converter. PostgreSQL does not allowROUND(double precision, integer)- it requires anumerictype. This was failing theplayer_tiersstep.gtd/analysis/anomaly_detector.py: Addedentity_nameto aGROUP BYclause. PostgreSQL is stricter than SQLite about requiring all non-aggregated SELECT columns in GROUP BY.gtd/storage/database.py: Expandedvalidate_schema()from 3 constraint specs to 25, covering tables used by the analysis pipeline (clutch, referee_history, b2b, rest_advantage, etc.).Round 2: Resilient DDL (
f323cdf)gtd/storage/database.py: Added a_safe_ddl()helper function. In PostgreSQL, if a single DDL statement fails inside a transaction, the entire transaction aborts and all subsequent statements fail with "current transaction is aborted." The old code hadALTER TABLE injuries ADD COLUMN sport ...which would fail because the column already existed, and this cascaded to abortvalidate_schema()entirely. Now every DDL statement runs in its own mini-transaction with automatic rollback on failure.ALTER TABLE ADD COLUMNstatements to useIF NOT EXISTS.Round 3: player_baselines constraint (
5e14f24)validate_schema()entry forplayer_baselinesfromUNIQUE(player_name)toUNIQUE(player_id). The anomalies step usesON CONFLICT(player_id), notplayer_name. Manually deduplicated the table on the server before adding the correct constraint.Round 4: Event monitor and card counts (
f7c49ba)gtd/realtime/event_monitor.py: Fixed a SQL query that selectedhome_teamalongsideSUM()andCOUNT(*)without includinghome_teamin the GROUP BY. SQLite allows this; PostgreSQL does not. This was causing[RT] Monitor errorevery 15 minutes.server.py: The_check_per_sport_card_counts()function queriedWHERE sport = 'NBA'on theintelligence_cardstable, but that table has nosportcolumn. Replaced with a simple total card count check.Round 5: Full Constraint Coverage + Step Ordering (
186047b)This is where the real investigation happened. After all the previous fixes, the pipeline was still producing 0 games. Deep log analysis and code tracing revealed four root causes:
(a) Deployed cli.py was massively outdated. The server had a 1,608-line version of
gtd/cli.py. The current version is 3,915 lines. The old version was missingSTEP_MAPentries for:nhl_stats_backfill,nhl_skater_baselines,nhl_goalie_baselines,nhl_power_play,nhl_penalty_kill,detect_cards,career_totals, allnfl_*sub-steps, allmlb_*sub-steps. Every missing step was logged as "Unknown step" and silently skipped. The fullgtd/directory was resynced to the server.(b) Core tables had zero UNIQUE constraints. Comprehensive audit scanning every Python file for
ON CONFLICT(...)patterns and cross-referencing withpg_constraintfound 76 missing UNIQUE constraints across all tables. The critical ones:gamesneededUNIQUE(game_date, home_team, away_team)- without this, zero NBA games could be insertedplayer_statsneededUNIQUE(game_id, player_name)nhl_gamesneededUNIQUE(game_id)- had duplicate rows that had to be cleaned firstnhl_skater_statsneededUNIQUE(game_id, player_id)nhl_teamsneededUNIQUE(abbreviation)nhl_playersneededUNIQUE(player_id)oddsneededUNIQUE(game_date, home_team, away_team, source)statementsneededUNIQUE(source_handle, content)All 76 were batch-added. 63 succeeded immediately, 4 needed deduplication first (done manually), 9 referenced columns that don't exist in those tables (non-critical edge tables).
validate_schema()was expanded from 25 to 47 constraint specs so these are auto-checked and auto-created on every server startup.(c) Step ordering was wrong.
CORE_DATA_STEPSinserver.pystarted withnba_players, nfl_games, nfl_schedule, nfl_standings, nfl_odds, nfl_players, nfl_stats, nfl_team_analytics, nfl_situational, nfl_injuries, nfl_baselines, mlb_games, mlb_batting, mlb_pitching, mlb_player_baselines- 15 slow API-calling steps before thegamesstep. A full pipeline took 45+ minutes before it even attempted to fetch NBA games. Reordered to:games, stats, player_splits, injuries, odds, propsfirst, then NHL, then analysis, then MLB/NFL last.(d) Two SQL bugs. Event monitor GROUP BY and card counts sport column (described in Round 4).
3. Current State
March 23 pipeline output:
4. What Jarvis Should Review
ON CONFLICTpatterns have been introduced without matching constraints.curl http://cypher.178.156.223.137.nip.io/api/v1/trigger) and watch/var/log/cypher.logfor any remaining "Unknown step", "ON CONFLICT", or "InFailedSqlTransaction" errors.nhl_skater_statslatest date is March 18 (5 days old). Thenhl_stats_backfillstep scans 30 games but stored 0 skater rows. Needs investigation.CORE_BLOCKING_TABLESshould includenfl_gamesduring offseason - it flags the system as "degraded."5. Files Changed
gtd/storage/database.pyvalidate_schema()expanded to 47 specs;_safe_ddl()helper;IF NOT EXISTSfor ALTER TABLEgtd/storage/db.pyROUND()auto-cast to::numeric;INSERT OR REPLACEtoON CONFLICT DO UPDATE SETfixgtd/analysis/anomaly_detector.pyentity_nameadded to GROUP BYgtd/realtime/event_monitor.pyhome_teamfrom SELECT with aggregatesserver.pyCORE_DATA_STEPSreordered; card count query fixedgtd/cli.py6. Commits