* feat: Add recursive directory scanning to ingest scanner
The scanner was only checking files in the top-level configured
directories (/misc/, /SCC2SRT/, /wisconsinlife/) and skipping all
subdirectories. This meant directories like /IWP/ were completely
invisible. Now scans from root (/) and recurses into subdirectories
up to 3 levels deep, respecting the ignore_directories config.
- Add recursive scanning with MAX_SCAN_DEPTH=3 to _scan_directory
- Split _parse_directory_listing to return (files, subdirs) tuple
- Change default directories from curated list to ["/"]
- Wire ignore_directories through router to scanner
- Update default scan_time from midnight to 07:00
- Add migration 009 to update stored config values
- Add 6 new tests for recursive scanning behavior
[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
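The recursion described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `_scan_directory`: the `list_dir` callback, the path joining, and the exact depth semantics are assumptions, though the `(files, subdirs)` split, `MAX_SCAN_DEPTH=3`, and name-only `ignore_directories` matching come from the commit.

```python
from typing import Callable

MAX_SCAN_DEPTH = 3  # depth limit named in the commit


def scan_directory(
    list_dir: Callable[[str], tuple[list[str], list[str]]],
    path: str = "/",
    ignore_directories: frozenset[str] = frozenset(),
    depth: int = 0,
) -> list[str]:
    """Recursively collect file paths, stopping MAX_SCAN_DEPTH levels down."""
    if depth >= MAX_SCAN_DEPTH:
        return []
    # the split listing parser returns a (files, subdirs) tuple
    files, subdirs = list_dir(path)
    found = [f"{path.rstrip('/')}/{name}" for name in files]
    for sub in subdirs:
        if sub in ignore_directories:  # matched by name only, not full path
            continue
        found.extend(
            scan_directory(
                list_dir, f"{path.rstrip('/')}/{sub}", ignore_directories, depth + 1
            )
        )
    return found
```

Scanning from `/` with this shape is what makes previously skipped subdirectories such as `/IWP/` visible.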
* fix: Bulk queue shows error despite succeeding (fixes #45)
The frontend expected the bulk queue response to be a flat array but
the API returns a BulkQueueResponse wrapper object. Calling .filter()
on the object threw a TypeError, which was caught and surfaced as a
generic "Failed to queue" error, even though the files were actually
queued on the backend.
Fix: Read .queued and .failed from the BulkQueueResponse directly.
Also fixes orphaned file records: if download_file succeeds but
create_job fails, the file status is now reset to 'new' instead of
being stuck as 'queued' with no job_id (invisible on Ready for Work).
[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
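The orphaned-record fix can be sketched like this. `queue_file` and the dict-shaped record are hypothetical stand-ins; `download_file`, `create_job`, and the `'new'`/`'queued'` statuses are from the commit.

```python
def queue_file(file_record: dict, download_file, create_job) -> dict:
    """Queue one file; roll status back to 'new' if job creation fails."""
    file_record["status"] = "queued"
    download_file(file_record)
    try:
        job = create_job(file_record)
    except Exception:
        # Without this reset the record would stay 'queued' with no
        # job_id, making it invisible on the Ready for Work view.
        file_record["status"] = "new"
        raise
    file_record["job_id"] = job["id"]
    return job
```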
* fix: Return JSON errors instead of HTML for unhandled exceptions (fixes #30)
Added global exception handler in api/main.py to ensure all unhandled
server errors return JSON responses, not FastAPI's default HTML page.
The frontend was getting "Unexpected token '<'" when trying to parse
HTML error responses as JSON.
Also fixes:
- upload.py: mkdir now uses parents=True and catches PermissionError
with a clear message instead of an unhandled exception
- TranscriptUploader.tsx: gracefully handles non-JSON error responses
instead of crashing on response.json()
Closes #30, relates to #32 (Docker permissions)
[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
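The handler's contract can be sketched framework-agnostically (the real code registers a FastAPI global exception handler; the function name here is hypothetical): clients always receive parseable JSON with a generic message, while the traceback goes to the server log only.

```python
import json
import logging

logger = logging.getLogger("api")


def unhandled_to_json(exc: Exception) -> tuple[int, str]:
    """Map any unhandled exception to a (status, json_body) pair.

    The generic message avoids leaking str(exc) to clients; the full
    traceback is logged server-side.
    """
    logger.error("Unhandled exception", exc_info=exc)
    return 500, json.dumps({"detail": "Internal server error"})
```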
* fix: Persist transcript metrics to database on job processing (fixes #44)
Transcript metrics (word_count, duration_minutes) were calculated
in-memory for routing decisions but never saved to the database.
If a job failed and was retried, the metrics remained None because
the initial calculation was lost.
Now persists metrics to the job record after calculation, with a
guard to only write when the values are missing (avoids overwriting
on retry of already-populated jobs).
[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
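The write-only-when-missing guard can be sketched as below; `persist_metrics` and the dict-shaped job record are illustrative, while `word_count` and `duration_minutes` are the fields named in the commit.

```python
def persist_metrics(job: dict, word_count: int, duration_minutes: float) -> None:
    """Backfill transcript metrics onto the job record.

    Writes only when a value is missing, so retrying an
    already-populated job never overwrites stored metrics.
    """
    if job.get("word_count") is None:
        job["word_count"] = word_count
    if job.get("duration_minutes") is None:
        job["duration_minutes"] = duration_minutes
```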
* fix: Relax rate limits and make configurable (fixes #41, #44, #39)
Bumped RATE_EXPENSIVE from 10/min to 30/min and RATE_READ from
60/min to 120/min — the previous limits were too restrictive for
a single-user internal editorial tool. Both are now configurable
via RATE_LIMIT_EXPENSIVE and RATE_LIMIT_READ env vars.
Also includes the transcript metrics persistence fix from the
previous commit (workers now save word_count and duration_minutes
to the job record after calculation).
[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: Add submit_processing_job MCP tool (fixes #47)
New tool allows queuing transcript processing jobs by Media ID from
any MCP-connected workspace (Claude Desktop, other projects). The
tool:
1. Checks for existing jobs to avoid duplicates
2. Searches local transcripts/ dir for matching files
3. Falls back to ingest server available_files
4. Queues via the appropriate API endpoint
This enables workflows like scheduling Content Calendar entries and
triggering processing in the same session without switching projects.
[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
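The four-step lookup order can be sketched with injected callbacks; every helper name here is hypothetical, and only the ordering (duplicate check, local search, ingest fallback, queue) comes from the commit.

```python
def submit_processing_job(
    media_id: str,
    find_existing_job,   # step 1: duplicate check
    search_local,        # step 2: transcripts/ dir
    search_ingest,       # step 3: ingest server available_files
    queue_local,
    queue_ingest,
) -> dict:
    """Resolve a Media ID to a queued job, following the lookup order above."""
    existing = find_existing_job(media_id)
    if existing:
        return {"status": "duplicate", "job_id": existing}
    local_path = search_local(media_id)
    if local_path:
        return queue_local(local_path)
    remote = search_ingest(media_id)
    if remote:
        return queue_ingest(remote)
    return {"status": "not_found", "media_id": media_id}
```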
* fix: Per-job cost tracking prevents race condition (fixes #43)
The cost tracker was a single module-level global that got silently
overwritten when multiple jobs ran concurrently. The second job's
start_run_tracking() destroyed the first job's tracker, causing
the first job to write actual_cost=0.
Fix: replaced single global with a dict of trackers keyed by job_id.
Each job now gets its own isolated tracker. The chat() method looks
up the correct tracker by job_id, and end_run_tracking(job_id)
retrieves the right one.
Also:
- Set TESTING=1 early in conftest.py to disable rate limiter in tests
- Tests now pass: 307 passed, 0 failed
[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: Apply black formatting to ingest_scanner and MCP server
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: Address PR review findings — error handling and info leaks
Fixes 5 critical issues from PR review:
1. Global exception handler: return generic "Internal server error"
instead of leaking str(exc) to clients; add exc_info=True to log
2. Upload error: remove filesystem path from PermissionError detail
3. MCP submit_processing_job: replace bare except:pass with logged
warnings for duplicate check and ingest search failures
4. Worker metrics persistence: wrap in try/except so transient DB
errors don't kill the whole job for a non-critical backfill
5. Orphaned file reset: include exception details in error log
Also adds logging module to MCP server.
[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: Address remaining PR review findings
Important issues:
- Add finally cleanup for _run_trackers dict to prevent memory leak
when jobs crash before end_run_tracking (worker.py)
- Add 2 concurrency tests for per-job cost tracker isolation
(test_llm.py)
- Replace 3 pre-existing bare except:pass in MCP server helpers
with debug-level logging (server.py)
Suggestions:
- Add DEBUG logging to _parse_file_metadata bare except
(ingest_scanner.py)
- Document ignore_directories matches by name only, not path
(ingest_scanner.py)
309 tests passing.
[Agent: Main Assistant]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
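The `finally` cleanup for `_run_trackers` can be sketched like this (`run_job` and the injected `process` callback are illustrative, not the worker's real signature):

```python
def run_job(job_id: str, process, trackers: dict) -> None:
    """Process a job, guaranteeing its tracker entry is removed on exit."""
    trackers[job_id] = object()  # stand-in for the job's cost tracker
    try:
        process(job_id)
    finally:
        # Without this, a job that crashes before end_run_tracking()
        # leaks its tracker entry forever.
        trackers.pop(job_id, None)
```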
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>