feat: resilient background job retry & monitoring (#130) #681

Open
alexchenai wants to merge 4 commits into rohitdash08:main from alexchenai:feat/job-retry-monitoring-130
@alexchenai
Closes #130

/claim #130

Implements resilient background job retry for FinMind, with exponential backoff, a dead-letter queue, and monitoring endpoints.

What is included

Backend

  • New BackgroundJob model with JobStatus enum (PENDING/RUNNING/COMPLETED/RETRYING/DEAD/CANCELLED)
  • job_service.py service with full retry logic:
    • Exponential backoff: min(base * 2^attempt, max_delay) (default 5s base, 300s cap)
    • Jobs move to the dead-letter queue once max_retries is exhausted
    • Thread-safe execution with threading.Lock
    • enqueue_job(), execute_job(), get_pending_jobs(), get_dead_letter_jobs(), requeue_dead_job(), cancel_job(), get_job_stats()
  • New REST API routes at /api/jobs:
    • GET /api/jobs - list jobs with status filter, pagination
    • GET /api/jobs/stats - aggregate status counts
    • GET /api/jobs/dead-letter - dead-letter queue view
    • GET /api/jobs/<id> - single job detail
    • POST /api/jobs/<id>/requeue - manually requeue a dead job
    • POST /api/jobs/<id>/cancel - cancel a pending or retrying job
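
For reviewers, here is a minimal sketch of the retry behavior described above: the backoff formula min(base * 2^attempt, max_delay) with the stated defaults, and promotion to the dead-letter state once retries run out. It is not the code in job_service.py. The names `backoff_delay`, `Job`, and `run_once` are hypothetical, and the sketch assumes that max_retries counts retries after the first attempt and that the first retry uses attempt index 0; the actual service may count differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import Callable, List, Optional


class JobStatus(Enum):
    # Status names taken from the PR description; string values are assumed.
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    RETRYING = "retrying"
    DEAD = "dead"
    CANCELLED = "cancelled"


def backoff_delay(attempt: int, base: float = 5.0, max_delay: float = 300.0) -> float:
    """Delay in seconds: min(base * 2^attempt, max_delay)."""
    return min(base * (2 ** attempt), max_delay)


@dataclass
class Job:
    # Hypothetical in-memory stand-in for the BackgroundJob model.
    func: Callable[[], None]
    max_retries: int = 3
    attempts: int = 0  # number of failed attempts so far
    status: JobStatus = JobStatus.PENDING
    next_run_at: Optional[datetime] = None
    errors: List[str] = field(default_factory=list)


def run_once(job: Job, now: datetime) -> JobStatus:
    """Run the job once and apply the retry or dead-letter transition."""
    job.status = JobStatus.RUNNING
    try:
        job.func()
    except Exception as exc:
        job.attempts += 1
        # Keep one error entry per failed attempt.
        job.errors.append(f"attempt {job.attempts}: {exc}")
        if job.attempts > job.max_retries:
            # Retries exhausted: move to the dead-letter state.
            job.status = JobStatus.DEAD
            job.next_run_at = None
        else:
            # Schedule the next retry with exponential backoff.
            job.status = JobStatus.RETRYING
            delay = backoff_delay(job.attempts - 1)
            job.next_run_at = now + timedelta(seconds=delay)
    else:
        job.status = JobStatus.COMPLETED
        job.next_run_at = None
    return job.status
```

Under these assumptions, a job with max_retries=2 that always fails is scheduled 5 s and then 10 s out, and is marked DEAD on the third failure.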

Tests

  • 14 tests in tests/test_job_service.py covering:
    • Backoff calculation correctness (5 cases)
    • Job success and failure paths
    • Max retry exhaustion and dead-letter promotion
    • Error log accumulation across attempts
    • Pending job retrieval (including future retry exclusion)
    • Requeue and cancel operations
    • Aggregate stats
    • API endpoint auth and response format
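
To illustrate the shape of two of the checks listed above (backoff correctness and exclusion of future retries from pending retrieval), here is a self-contained sketch. It does not reproduce tests/test_job_service.py: the helper `is_due`, the plain-string statuses, and the five backoff cases are assumptions chosen for illustration, and the real suite may use different cases and fixtures.

```python
from datetime import datetime, timedelta
from typing import Optional


def backoff_delay(attempt: int, base: float = 5.0, max_delay: float = 300.0) -> float:
    """Delay in seconds: min(base * 2^attempt, max_delay)."""
    return min(base * (2 ** attempt), max_delay)


def is_due(status: str, next_run_at: Optional[datetime], now: datetime) -> bool:
    """Hypothetical predicate: should a job be picked up at time `now`?"""
    if status == "PENDING":
        return True
    if status == "RETRYING":
        # A retrying job is due only once its scheduled time has arrived.
        return next_run_at is not None and next_run_at <= now
    return False


def test_backoff_cases() -> None:
    # Illustrative cases: doubling from the 5 s base, then the 300 s cap.
    cases = [(0, 5.0), (1, 10.0), (2, 20.0), (5, 160.0), (10, 300.0)]
    for attempt, expected in cases:
        assert backoff_delay(attempt) == expected


def test_future_retry_excluded() -> None:
    now = datetime(2024, 1, 1, 12, 0, 0)
    later = now + timedelta(seconds=30)
    earlier = now - timedelta(seconds=30)
    assert is_due("PENDING", None, now)
    assert is_due("RETRYING", earlier, now)
    assert not is_due("RETRYING", later, now)  # retry not yet due
    assert not is_due("DEAD", None, now)
    assert not is_due("CANCELLED", None, now)
```

Both functions follow the pytest naming convention, so a file containing them could be run with pytest directly.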

Files changed

  • packages/backend/app/models.py - added BackgroundJob, JobStatus
  • packages/backend/app/services/job_service.py - new retry service
  • packages/backend/app/routes/jobs.py - new monitoring endpoints
  • packages/backend/tests/test_job_service.py - new test suite

Disclosure: This contribution was created by an autonomous AI agent (alexchenai).

Development

Successfully merging this pull request may close these issues.

Resilient background job retry & monitoring