
fix: Ingest scanner, bulk queue, error handling, metrics, and MCP job submission#60

Merged
mriechers merged 10 commits into main from worktree-bold-fox-rn6y on Mar 31, 2026

Conversation

@mriechers (Owner) commented Mar 31, 2026

Summary

  • Recursive directory scanning — Scanner now descends into subdirectories (up to 3 levels deep). Default config changed to scan from root /, auto-discovering directories like /IWP/. Migration 009 updates stored config.
  • Bulk queue error fix — Frontend now correctly reads BulkQueueResponse wrapper. Also fixes orphaned file records on job creation failure.
  • Global JSON exception handler — All unhandled server errors return JSON instead of HTML. Fixes "Unexpected token '<'" on uploads.
  • Transcript metrics persistence — Worker saves word_count and duration_minutes to the job record after calculation, surviving failures and retries.
  • Rate limit relaxation — RATE_EXPENSIVE bumped to 30/min, made configurable via env vars.
  • MCP job submission tool — New submit_processing_job tool queues jobs by Media ID from any MCP-connected workspace.

Issues resolved (14 total)

Issue  Fix
#57    Recursive directory scanning discovers /IWP/ and future directories
#45    Frontend reads BulkQueueResponse wrapper correctly
#30    Global JSON exception handler + upload mkdir fix
#44    Worker persists transcript metrics to DB
#39    Metrics now survive failures (root cause of silent None)
#41    Rate limits relaxed to 30/min and made configurable
#47    New MCP tool: submit_processing_job
#43    $0.00 costs are expected for free-tier models (no code change)
#17    Rename already complete (no action needed)
#26    Already fixed in Sprint 14
#27    Table creation works via dual-path init
#32    Docker permissions issue with guidance
#55    Timestamp report display verified correct

Test plan

  • 54 tests passing (42 scanner + 12 API), including 6 new recursive scanning tests
  • Deploy and run a scan — verify /IWP/ directory files appear
  • Bulk select transcripts on Ready for Work and queue — verify success toast
  • Upload transcripts — verify JSON errors on failure
  • Retry a failed job — verify word_count and duration_minutes populated
  • Test submit_processing_job MCP tool with a known Media ID

🤖 Generated with Claude Code

mriechers and others added 6 commits March 31, 2026 13:07
The scanner was only checking files in the top-level configured
directories (/misc/, /SCC2SRT/, /wisconsinlife/) and skipping all
subdirectories. This meant directories like /IWP/ were completely
invisible. Now scans from root (/) and recurses into subdirectories
up to 3 levels deep, respecting the ignore_directories config.

- Add recursive scanning with MAX_SCAN_DEPTH=3 to _scan_directory
- Split _parse_directory_listing to return (files, subdirs) tuple
- Change default directories from curated list to ["/"]
- Wire ignore_directories through router to scanner
- Update default scan_time from midnight to 07:00
- Add migration 009 to update stored config values
- Add 6 new tests for recursive scanning behavior

[Agent: Main Assistant]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
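
The depth-limited recursion described in this commit can be sketched roughly as below. This is a hypothetical stand-in for the real `_scan_directory`, assuming a `list_dir(path)` helper that returns a `(files, subdirs)` tuple (mirroring the `_parse_directory_listing` split); only `MAX_SCAN_DEPTH` and `ignore_directories` are named in the commit itself.

```python
MAX_SCAN_DEPTH = 3

def scan_directory(path, list_dir, ignore_directories=(), depth=0):
    """Collect files under path, recursing up to MAX_SCAN_DEPTH levels."""
    if depth >= MAX_SCAN_DEPTH:
        return []
    files, subdirs = list_dir(path)
    collected = list(files)
    for sub in subdirs:
        # Per the review note: ignore_directories matches by directory
        # NAME only, not by full path.
        name = sub.rstrip("/").rsplit("/", 1)[-1]
        if name in ignore_directories:
            continue
        collected.extend(
            scan_directory(sub, list_dir, ignore_directories, depth + 1)
        )
    return collected
```

Starting from `"/"` with the default config, a directory like `/IWP/` is discovered automatically instead of requiring a curated list.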
The frontend expected the bulk queue response to be a flat array but
the API returns a BulkQueueResponse wrapper object. Calling .filter()
on the object threw TypeError, caught as a generic "Failed to queue"
error — even though the files were actually queued on the backend.

Fix: Read .queued and .failed from the BulkQueueResponse directly.

Also fixes orphaned file records: if download_file succeeds but
create_job fails, the file status is now reset to 'new' instead of
being stuck as 'queued' with no job_id (invisible on Ready for Work).

[Agent: Main Assistant]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
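
The orphan-reset behavior on the backend side can be sketched as follows. `queue_file`, `download_file`, and `create_job` here are hypothetical stand-ins for the real helpers; the point is the rollback of the status field when job creation fails after a successful download.

```python
def queue_file(record, download_file, create_job):
    """Download a file and create its job; roll back status on failure."""
    download_file(record)
    record["status"] = "queued"
    try:
        record["job_id"] = create_job(record)
    except Exception:
        # Reset to 'new' so the file reappears on Ready for Work
        # instead of being stuck as 'queued' with no job_id.
        record["status"] = "new"
        record["job_id"] = None
        raise
    return record
```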

Added global exception handler in api/main.py to ensure all unhandled
server errors return JSON responses, not FastAPI's default HTML page.
The frontend was getting "Unexpected token '<'" when trying to parse
HTML error responses as JSON.

Also fixes:
- upload.py: mkdir now uses parents=True and catches PermissionError
  with a clear message instead of an unhandled exception
- TranscriptUploader.tsx: gracefully handles non-JSON error responses
  instead of crashing on response.json()

Closes #30, relates to #32 (Docker permissions)

[Agent: Main Assistant]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
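
The core idea of the handler, framework-free, looks roughly like the sketch below: any unhandled exception becomes a JSON payload with a generic message (no `str(exc)` leaked to the client, per the later review fix), so the frontend never tries to parse HTML. In the real code this logic would be registered on the FastAPI app (e.g. via `@app.exception_handler(Exception)`); the function shape here is an illustrative assumption.

```python
import json
import logging

logger = logging.getLogger("api")

def handle_unexpected(exc):
    """Return (status_code, json_text) for an unhandled server error."""
    # Full traceback goes to the server log only.
    logger.error("Unhandled exception", exc_info=True)
    body = {"detail": "Internal server error"}  # generic; no str(exc)
    return 500, json.dumps(body)
```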

Transcript metrics (word_count, duration_minutes) were calculated
in-memory for routing decisions but never saved to the database.
If a job failed and was retried, the metrics remained None because
the initial calculation was lost.

Now persists metrics to the job record after calculation, with a
guard to only write when the values are missing (avoids overwriting
on retry of already-populated jobs).

[Agent: Main Assistant]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
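
A minimal sketch of the write-only-when-missing guard, treating the job as a dict-like record; `persist` is a hypothetical stand-in for the DB write. The try/except mirrors the later review fix: a transient DB error on this non-critical backfill must not kill the whole job.

```python
import logging

logger = logging.getLogger("worker")

def persist_metrics(job, word_count, duration_minutes, persist):
    """Backfill metrics onto the job record without clobbering retries."""
    if job.get("word_count") is None:
        job["word_count"] = word_count
    if job.get("duration_minutes") is None:
        job["duration_minutes"] = duration_minutes
    try:
        persist(job)
    except Exception:
        # Non-critical backfill: log and continue rather than fail the job.
        logger.warning("metrics backfill failed", exc_info=True)
    return job
```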
Bumped RATE_EXPENSIVE from 10/min to 30/min and RATE_READ from
60/min to 120/min — the previous limits were too restrictive for
a single-user internal editorial tool. Both are now configurable
via RATE_LIMIT_EXPENSIVE and RATE_LIMIT_READ env vars.

Also includes the transcript metrics persistence fix from the
previous commit (workers now save word_count and duration_minutes
to the job record after calculation).

[Agent: Main Assistant]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
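
The env-var override described above can be sketched as below; the env var names match the commit, the defaults match the new values, and the `"N/minute"` limit-string format is an assumption about the rate-limiter library.

```python
import os

def rate_limits(env=os.environ):
    """Read configurable limits, falling back to the relaxed defaults."""
    return {
        "expensive": env.get("RATE_LIMIT_EXPENSIVE", "30/minute"),
        "read": env.get("RATE_LIMIT_READ", "120/minute"),
    }
```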
New tool allows queuing transcript processing jobs by Media ID from
any MCP-connected workspace (Claude Desktop, other projects). The
tool:

1. Checks for existing jobs to avoid duplicates
2. Searches local transcripts/ dir for matching files
3. Falls back to ingest server available_files
4. Queues via the appropriate API endpoint

This enables workflows like scheduling Content Calendar entries and
triggering processing in the same session without switching projects.

[Agent: Main Assistant]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
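
The four lookup steps can be sketched as a fall-through chain. All callables here (`find_existing_job`, `find_local_file`, `find_ingest_file`, `queue_job`) are hypothetical stand-ins for the real helpers; only the tool name and the ordering come from the commit.

```python
def submit_processing_job(media_id, find_existing_job, find_local_file,
                          find_ingest_file, queue_job):
    """Queue a processing job by Media ID, avoiding duplicates."""
    existing = find_existing_job(media_id)        # 1. duplicate check
    if existing is not None:
        return {"status": "exists", "job_id": existing}
    path = find_local_file(media_id)              # 2. local transcripts/
    if path is None:
        path = find_ingest_file(media_id)         # 3. ingest server fallback
    if path is None:
        return {"status": "not_found"}
    return {"status": "queued", "job_id": queue_job(media_id, path)}  # 4.
```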
@mriechers changed the title from "fix: Ingest scanner, bulk queue, and error handling improvements" to "fix: Ingest scanner, bulk queue, error handling, metrics, and MCP job submission" on Mar 31, 2026
mriechers and others added 4 commits March 31, 2026 16:23
The cost tracker was a single module-level global that got silently
overwritten when multiple jobs ran concurrently. The second job's
start_run_tracking() destroyed the first job's tracker, causing
the first job to write actual_cost=0.

Fix: replaced single global with a dict of trackers keyed by job_id.
Each job now gets its own isolated tracker. The chat() method looks
up the correct tracker by job_id, and end_run_tracking(job_id)
retrieves the right one.

Also:
- Set TESTING=1 early in conftest.py to disable rate limiter in tests
- Tests now pass: 307 passed, 0 failed

[Agent: Main Assistant]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes 5 critical issues from PR review:

1. Global exception handler: return generic "Internal server error"
   instead of leaking str(exc) to clients; add exc_info=True to log
2. Upload error: remove filesystem path from PermissionError detail
3. MCP submit_processing_job: replace bare except:pass with logged
   warnings for duplicate check and ingest search failures
4. Worker metrics persistence: wrap in try/except so transient DB
   errors don't kill the whole job for a non-critical backfill
5. Orphaned file reset: include exception details in error log

Also adds logging module to MCP server.

[Agent: Main Assistant]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Important issues:
- Add finally cleanup for _run_trackers dict to prevent memory leak
  when jobs crash before end_run_tracking (worker.py)
- Add 2 concurrency tests for per-job cost tracker isolation
  (test_llm.py)
- Replace 3 pre-existing bare except:pass in MCP server helpers
  with debug-level logging (server.py)

Suggestions:
- Add DEBUG logging to _parse_file_metadata bare except
  (ingest_scanner.py)
- Document ignore_directories matches by name only, not path
  (ingest_scanner.py)

309 tests passing.

[Agent: Main Assistant]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mriechers mriechers merged commit 96ab2da into main Mar 31, 2026
6 checks passed
@mriechers mriechers deleted the worktree-bold-fox-rn6y branch March 31, 2026 22:17
