fix: Ingest scanner, bulk queue, error handling, metrics, and MCP job submission #60
Merged
Conversation
The scanner only checked files in the top-level configured directories (/misc/, /SCC2SRT/, /wisconsinlife/) and skipped all subdirectories, so directories like /IWP/ were completely invisible. It now scans from root (/) and recurses into subdirectories up to 3 levels deep, respecting the ignore_directories config.

- Add recursive scanning with MAX_SCAN_DEPTH=3 to _scan_directory
- Split _parse_directory_listing to return a (files, subdirs) tuple
- Change default directories from a curated list to ["/"]
- Wire ignore_directories through the router to the scanner
- Update the default scan_time from midnight to 07:00
- Add migration 009 to update stored config values
- Add 6 new tests for recursive scanning behavior

[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
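The depth-limited recursion could look like the following minimal sketch. MAX_SCAN_DEPTH and the name-based ignore_directories check come from the commit; the function signature, the `list_dir` callback, and the `(files, subdirs)` tuple shape are illustrative assumptions.

```python
# Sketch of a depth-limited recursive scan. Only MAX_SCAN_DEPTH and
# ignore_directories are from the commit; everything else is illustrative.

MAX_SCAN_DEPTH = 3

def scan_directory(list_dir, path="/", ignore_directories=(), depth=0):
    """Recursively collect files under `path`, skipping ignored directory names.

    `list_dir(path)` is a hypothetical callback returning (files, subdirs),
    mirroring the commit's split of _parse_directory_listing.
    """
    if depth >= MAX_SCAN_DEPTH:
        return []
    files, subdirs = list_dir(path)
    results = list(files)
    for sub in subdirs:
        # ignore_directories matches by directory name only, not full path
        name = sub.rstrip("/").rsplit("/", 1)[-1]
        if name in ignore_directories:
            continue
        results.extend(scan_directory(list_dir, sub, ignore_directories, depth + 1))
    return results
```

With this shape, `["/"]` as the default directory list is enough: every subdirectory such as /IWP/ is discovered automatically unless it is ignored.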
The frontend expected the bulk queue response to be a flat array, but the API returns a BulkQueueResponse wrapper object. Calling .filter() on the object threw a TypeError, which was caught as a generic "Failed to queue" error even though the files were actually queued on the backend. Fix: read .queued and .failed from the BulkQueueResponse directly.

Also fixes orphaned file records: if download_file succeeds but create_job fails, the file status is now reset to 'new' instead of being stuck as 'queued' with no job_id (invisible on Ready for Work).

[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
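The orphaned-record fix amounts to undoing the optimistic status transition when job creation fails. A minimal sketch, assuming a plain-dict file record and hypothetical `download_file` / `create_job` callables (the real ones live in the backend):

```python
# Sketch of the orphaned-file-record guard. The record shape and the
# download_file / create_job callables are illustrative assumptions.

def queue_file(record, download_file, create_job):
    """Mark a file queued, download it, create its job.

    If create_job fails after download_file succeeded, reset status to
    'new' so the file reappears on Ready for Work instead of being stuck
    as 'queued' with no job_id.
    """
    record["status"] = "queued"
    download_file(record)
    try:
        job = create_job(record)
    except Exception:
        record["status"] = "new"  # undo the optimistic transition
        raise
    record["job_id"] = job["id"]
    return job
```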
(#30) Added a global exception handler in api/main.py so all unhandled server errors return JSON responses instead of FastAPI's default HTML page. The frontend was getting "Unexpected token '<'" when trying to parse HTML error responses as JSON.

Also fixes:
- upload.py: mkdir now uses parents=True and catches PermissionError with a clear message instead of an unhandled exception
- TranscriptUploader.tsx: gracefully handles non-JSON error responses instead of crashing on response.json()

Closes #30, relates to #32 (Docker permissions)

[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
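The essence of the handler, written as a framework-agnostic sketch so it is self-contained (the real version registers a FastAPI `@app.exception_handler(Exception)` in api/main.py; the decorator form and handler names here are illustrative):

```python
import json
import logging

logger = logging.getLogger("api")

def json_error_middleware(handler):
    """Wrap a request handler so unhandled exceptions become JSON, not HTML.

    The client sees a generic body (never the exception text, per the later
    PR-review hardening); the full traceback goes to the server log.
    """
    def wrapped(request):
        try:
            return handler(request)
        except Exception:
            logger.error("Unhandled server error", exc_info=True)
            return 500, json.dumps({"detail": "Internal server error"})
    return wrapped
```

Because the error body is always valid JSON, the frontend's `response.json()` call no longer hits the "Unexpected token '<'" parse failure.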
Transcript metrics (word_count, duration_minutes) were calculated in-memory for routing decisions but never saved to the database, so if a job failed and was retried, the metrics remained None because the initial calculation was lost. The worker now persists metrics to the job record after calculation, with a guard that only writes when the values are missing (avoiding overwrites on retry of already-populated jobs).

[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
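The write-only-when-missing guard can be sketched as below; the dict-shaped job record and the `save` callback are illustrative stand-ins for the real ORM calls.

```python
# Sketch of the metrics backfill guard. The job dict and `save` callback
# are illustrative; only the field names come from the commit.

def persist_metrics(job, word_count, duration_minutes, save):
    """Backfill transcript metrics on the job record, but only when missing,
    so a retry of an already-populated job never overwrites stored values."""
    if job.get("word_count") is None:
        job["word_count"] = word_count
    if job.get("duration_minutes") is None:
        job["duration_minutes"] = duration_minutes
    save(job)
```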
Bumped RATE_EXPENSIVE from 10/min to 30/min and RATE_READ from 60/min to 120/min; the previous limits were too restrictive for a single-user internal editorial tool. Both are now configurable via the RATE_LIMIT_EXPENSIVE and RATE_LIMIT_READ env vars. Also includes the transcript metrics persistence fix from the previous commit (workers now save word_count and duration_minutes to the job record after calculation).

[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
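The env-var override with the bumped defaults is a one-liner per limit; this sketch wraps it in a function for testability (the function name and dict shape are illustrative, the env var names and defaults come from the commit):

```python
import os

def rate_limits(env=os.environ):
    """Resolve per-minute rate limits, preferring env-var overrides.

    Defaults match the bumped values: 30/min expensive, 120/min read.
    """
    return {
        "expensive": int(env.get("RATE_LIMIT_EXPENSIVE", "30")),
        "read": int(env.get("RATE_LIMIT_READ", "120")),
    }
```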
New tool that queues transcript processing jobs by Media ID from any MCP-connected workspace (Claude Desktop, other projects). The tool:

1. Checks for existing jobs to avoid duplicates
2. Searches the local transcripts/ dir for matching files
3. Falls back to the ingest server's available_files
4. Queues via the appropriate API endpoint

This enables workflows like scheduling Content Calendar entries and triggering processing in the same session without switching projects.

[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
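The lookup order above reduces to a small fallback chain. A sketch, assuming the three sources have already been fetched into plain lists (all argument names and the tuple result are illustrative, not the tool's real API):

```python
# Sketch of submit_processing_job's lookup order. Arguments are
# illustrative stand-ins for the real job-store and file lookups.

def resolve_media_source(media_id, existing_jobs, local_files, ingest_files):
    """Return (source, path) for a Media ID: duplicate check first, then
    local transcripts/, then the ingest server's available files."""
    if any(j.get("media_id") == media_id for j in existing_jobs):
        return ("duplicate", None)
    for path in local_files:
        if media_id in path:
            return ("local", path)
    for path in ingest_files:
        if media_id in path:
            return ("ingest", path)
    return ("not_found", None)
```

The caller then picks the API endpoint based on the source (or skips queuing entirely on a duplicate).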
The cost tracker was a single module-level global that got silently overwritten when multiple jobs ran concurrently. The second job's start_run_tracking() destroyed the first job's tracker, causing the first job to write actual_cost=0.

Fix: replaced the single global with a dict of trackers keyed by job_id. Each job now gets its own isolated tracker. The chat() method looks up the correct tracker by job_id, and end_run_tracking(job_id) retrieves the right one.

Also:
- Set TESTING=1 early in conftest.py to disable the rate limiter in tests
- Tests now pass: 307 passed, 0 failed

[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes 5 critical issues from PR review:

1. Global exception handler: return a generic "Internal server error" instead of leaking str(exc) to clients; add exc_info=True to the log
2. Upload error: remove the filesystem path from the PermissionError detail
3. MCP submit_processing_job: replace bare except:pass with logged warnings for duplicate-check and ingest-search failures
4. Worker metrics persistence: wrap in try/except so transient DB errors don't kill the whole job for a non-critical backfill
5. Orphaned file reset: include exception details in the error log

Also adds the logging module to the MCP server.

[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Important issues:
- Add finally cleanup for the _run_trackers dict to prevent a memory leak when jobs crash before end_run_tracking (worker.py)
- Add 2 concurrency tests for per-job cost tracker isolation (test_llm.py)
- Replace 3 pre-existing bare except:pass in MCP server helpers with debug-level logging (server.py)

Suggestions:
- Add DEBUG logging to the _parse_file_metadata bare except (ingest_scanner.py)
- Document that ignore_directories matches by name only, not path (ingest_scanner.py)

309 tests passing.

[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
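The finally-cleanup pattern for the tracker dict can be sketched as follows; every callable here is an illustrative stand-in for the real worker code.

```python
# Sketch of worker.py's finally cleanup: the tracker entry is dropped even
# when the job crashes before end_run_tracking runs, so the registry
# cannot leak. All callables are illustrative stand-ins.

def run_job(job_id, start, work, end, trackers):
    start(job_id)
    try:
        work(job_id)
        return end(job_id)
    finally:
        trackers.pop(job_id, None)  # no-op if end() already removed it
```

Using `pop(job_id, None)` keeps the cleanup idempotent: on the happy path the entry is already gone, and on a crash it is removed here.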
Summary
- Scanner now scans from /, auto-discovering directories like /IWP/. Migration 009 updates stored config.
- Bulk queue frontend now reads the BulkQueueResponse wrapper. Also fixes orphaned file records on job creation failure.
- New submit_processing_job tool queues jobs by Media ID from any MCP-connected workspace.

Issues resolved (14 total)
Test plan
- Verify /IWP/ directory files appear
- Test the submit_processing_job MCP tool with a known Media ID

🤖 Generated with Claude Code