Finding
The morning-health-check PCS plugin pilot surfaced 862 contest.bronze rows (0.00037% of 234M) with out-of-band timestamps:
- Pre-2005 rows: some 1970 epoch fallbacks, others from 1996, 1998, and 2000-2003, all before our intended 2005-2025 contest range.
- Future-dated rows (2027, 2028, 2047, 2054, 2065, 2080, 2088), likely operator typos in the Cabrillo logs.
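As a rough illustration of the two buckets above, a health check could classify each parsed timestamp against the intended 2005-2025 range. This is a sketch only: `classify_timestamp` and its bucket names are hypothetical, and treating any year-1970 value as an epoch fallback is an assumption based on the 1970 examples in this issue.

```python
from datetime import datetime

# Intended contest coverage stated in this issue (inclusive).
RANGE_START_YEAR = 2005
RANGE_END_YEAR = 2025


def classify_timestamp(ts: datetime) -> str:
    """Bucket a parsed QSO timestamp (hypothetical helper, not existing pipeline code).

    Returns "epoch_fallback", "pre_range", "future_dated", or "in_range".
    """
    if ts.year == 1970:
        # Assumption: a 1970 year means the parser fell back to the Unix epoch year.
        return "epoch_fallback"
    if ts.year < RANGE_START_YEAR:
        return "pre_range"
    if ts.year > RANGE_END_YEAR:
        return "future_dated"
    return "in_range"
```

For example, `classify_timestamp(datetime(2080, 3, 30))` returns `"future_dated"`.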
Examples of the root cause (year in the source file name vs. year of the parsed timestamp):

| Source | Parsed timestamp | Likely cause |
|---|---|---|
| cq-ww/2010cw | 1971-03-26 | "71" instead of "10" |
| cq-wpx/2008ph | 2080-03-30 | "80" instead of "08" |
| cq-ww/2007cw | 1970-03-23 | epoch fallback |
This is upstream data corruption in the original Cabrillo logs, not a pipeline bug: we faithfully ingested what operators submitted.
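The mismatch in the table can be checked mechanically by comparing the year embedded in the source identifier with the parsed QSO year. A minimal sketch, assuming source identifiers follow the `<contest>/<YYYY><mode>` shape seen in the table (e.g. `cq-ww/2010cw`); `source_year` and `typo_hint` are hypothetical helpers, and the real `contest-download` layout may differ.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed source-id shape "<contest>/<YYYY><mode>", e.g. "cq-wpx/2008ph".
_SOURCE_YEAR_RE = re.compile(r"/(\d{4})[a-z]*$")


def source_year(source: str) -> Optional[int]:
    """Return the four-digit year embedded in a source id, or None if absent."""
    m = _SOURCE_YEAR_RE.search(source)
    return int(m.group(1)) if m else None


def typo_hint(source: str, parsed: datetime) -> Optional[str]:
    """Describe a source-year vs parsed-year mismatch, or None if the years agree.

    Note: a contest straddling New Year would need a tolerance; not handled here.
    """
    expected = source_year(source)
    if expected is None or parsed.year == expected:
        return None
    if parsed.year == 1970:
        # Assumed epoch-year fallback rather than an operator typo.
        return "epoch fallback"
    # Compare two-digit years, matching how the typos appear in the table.
    return f'"{parsed.year % 100:02d}" instead of "{expected % 100:02d}"'
```

Run against the three example rows, this reproduces the last column of the table; for instance, `typo_hint("cq-wpx/2008ph", datetime(2080, 3, 30))` returns `'"80" instead of "08"'`.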
Proposed fix
Add ingest-time validation in `contest-ingest` (or the upstream parser):
- Each contest event has a known expected date range (typically a 2-day weekend window). Reject any QSO timestamp outside `[contest_start - 7d, contest_end + 7d]`.
- Log rejected rows with full row content + reason so operators can review.
- Track rejection count in the ingest summary output.
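The validation rule above could look roughly like the following filter. This is a sketch under stated assumptions: rows are dicts with a `timestamp` `datetime`, `contest_start` and `contest_end` come from a per-contest schedule, and `validate_qsos`, `IngestSummary`, and the `contest-ingest` logger name are hypothetical rather than existing code.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Iterator

log = logging.getLogger("contest-ingest")  # hypothetical logger name

# Slack on either side of the contest window, per the proposed rule.
WINDOW_SLACK = timedelta(days=7)


@dataclass
class IngestSummary:
    """Counts surfaced in the ingest summary output (hypothetical shape)."""
    accepted: int = 0
    rejected: int = 0


def validate_qsos(
    rows: Iterable[dict],
    contest_start: datetime,
    contest_end: datetime,
    summary: IngestSummary,
) -> Iterator[dict]:
    """Yield rows inside [start - 7d, end + 7d]; log and count the rest."""
    lo = contest_start - WINDOW_SLACK
    hi = contest_end + WINDOW_SLACK
    for row in rows:
        ts = row["timestamp"]
        if lo <= ts <= hi:
            summary.accepted += 1
            yield row
        else:
            summary.rejected += 1
            # Full row content plus the reason, so operators can review it.
            log.warning(
                "rejected QSO: timestamp %s outside [%s, %s]; row=%r",
                ts.isoformat(), lo.isoformat(), hi.isoformat(), row,
            )
```

Using a generator keeps memory flat on large logs; the `IngestSummary` counts are only final once the generator has been fully consumed.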
Scope
- Affects: contest-download/contest-ingest (Cabrillo parsing path)
- Bronze rows: the 862 already-ingested rows could either stay (validation is forward-only) or be removed with a one-shot `ALTER TABLE ... DELETE`. Recommend keeping them for now, since bronze is the raw archive layer.
- Downstream impact: `contest.signatures` already filters to sane date ranges, so this fix is about hygiene and observability, not downstream data quality.
Discovered by
morning-health-check pilot, first run, 2026-05-17. See finding F-1 in `KI7MT/fleet-ops/plugins/morning-health-check/README.md`.
Priority
Low: 0.00037% of rows. This is a pipeline observability gain rather than a data-quality blocker.
Labels
infra, bob, deferred
🤖 Generated with Claude Code