Skip to content

P0: Fix Pipeline Completion - MLB/NFL Type Mismatches #162

@kevsilk597

Description

@kevsilk597

Objective

The pipeline cannot complete a clean run. Every execution fails on the same type mismatch errors. Until this is fixed, we cannot generate cards reliably.

Current Failure

reason_code=step_exception step=mlb_batting attempt=2/2 
detail=operator does not exist: text = integer
LINE 1: ...away_team, game_state FROM mlb_games WHERE season = 2026 AND...
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

reason_code=step_exception step=mlb_pitching attempt=2/2 
detail=operator does not exist: text = integer
(same issue)

reason_code=step_exception step=nfl_team_analytics attempt=2/2 
detail=unsupported operand type(s) for -: 'float' and 'decimal.Decimal'

Root Cause Analysis

MLB Issue

The season column in mlb_games is stored as TEXT, but queries compare it to INTEGER 2026.

PostgreSQL is strict about type comparisons. SQLite was lenient.

Find all occurrences:

grep -rn "season = 2026" --include="*.py" .
grep -rn "season=2026" --include="*.py" .
grep -rn "WHERE season" --include="*.py" .

NFL Issue

Python float cannot do arithmetic with PostgreSQL decimal.Decimal returned from queries.

Find all occurrences:

grep -rn "decimal\|Decimal" --include="*.py" .
# Look for arithmetic operations on query results

Requirements

1. Fix MLB Type Mismatch

Option A (preferred): Fix the queries to use string comparison:

# Before
WHERE season = 2026

# After  
WHERE season = '2026'

Option B: Cast in the query:

WHERE season::integer = 2026

Option C: Fix the column type (requires migration):

ALTER TABLE mlb_games ALTER COLUMN season TYPE integer USING season::integer;

Recommend Option A - fastest, no migration needed.

2. Fix NFL Decimal Issue

When fetching numeric data from PostgreSQL, convert Decimal to float before arithmetic:

# Before
result = row[0] - row[1]  # Fails if one is Decimal, one is float

# After
result = float(row[0]) - float(row[1])

Or configure psycopg2 to return floats:

import psycopg2.extensions
psycopg2.extensions.register_type(psycopg2.extensions.new_type(
    psycopg2.extensions.DECIMAL.values,
    'DEC2FLOAT',
    lambda value, curs: float(value) if value is not None else None
))

3. Audit All Queries for Type Safety

Search the entire codebase for potential type mismatches:

# Find all raw SQL queries
grep -rn "SELECT\|INSERT\|UPDATE\|WHERE" --include="*.py" . | grep -v test

# For each query that compares to a Python variable, verify types match

Create a checklist of every table/column that stores numbers as TEXT and ensure all queries handle this.

4. Add Type Safety Tests

Create a test that runs each pipeline step individually and catches type errors:

def test_pipeline_steps_type_safe():
    for step_name, step_fn in STEP_MAP.items():
        try:
            # Run with minimal data to catch type errors
            step_fn(dry_run=True)
        except TypeError as e:
            pytest.fail(f"Step {step_name} has type error: {e}")

Validation

After fixes are deployed:

  1. Trigger a full pipeline run:
curl http://cypher.178.156.223.137.nip.io/api/v1/trigger
  1. Monitor logs for 30 minutes:
ssh root@178.156.223.137 "tail -f /var/log/cypher.log | grep -E 'step_exception|TypeError|operator does not exist'"
  1. Verify pipeline completes:
curl http://cypher.178.156.223.137.nip.io/health | jq ".pipeline.last_run"

Expected: "status": "completed", "errors": null

Definition of Done

  • mlb_batting step completes without type errors
  • mlb_pitching step completes without type errors
  • nfl_team_analytics step completes without type errors
  • Full pipeline run completes with status: completed
  • Health endpoint shows pipeline healthy
  • No type-related errors in logs for 24 hours

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions