Job Traceability, Management, and Audibility Overhaul #645
Draft
bencap wants to merge 60 commits into feature/bencap/derived-gene-name-from-mapped-output from feature/bencap/627/job-traceability
…cture

Break down 1767-line jobs.py into domain-driven modules, improving maintainability and developer experience.

- variant_processing/: Variant creation and VRS mapping
- external_services/: ClinGen, UniProt, gnomAD integrations
- data_management/: Database and view operations
- utils/: Shared utilities (state, retry, constants)
- registry.py: Centralized ARQ job configuration
- constants.py: Environment configuration
- redis.py: Redis connection settings
- lifecycle.py: Worker lifecycle hooks
- worker.py: Main ArqWorkerSettings class
- All job functions maintain identical behavior
- Registry provides BACKGROUND_FUNCTIONS/BACKGROUND_CRONJOBS lists for ARQ initialization
- Test structure mirrors source organization

This refactor ensures ARQ worker initialization is backwards compatible. The modular architecture establishes a more maintainable foundation for MaveDB's automated processing workflows while preserving all existing functionality.
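As a rough illustration of the registry idea in the commit message above, the sketch below collects job coroutines into a `BACKGROUND_FUNCTIONS` list that a settings class exposes to the worker. The list names and `ArqWorkerSettings` come from the commit message; the two job functions and their signatures are hypothetical, and the class is a plain stand-in (no ARQ import) so the snippet runs on its own.

```python
import asyncio

# Hypothetical job coroutines; in the PR the real ones live in the domain
# modules (variant_processing/, external_services/, data_management/).
async def create_variants(ctx: dict, score_set_id: int) -> dict:
    return {"job": "create_variants", "score_set_id": score_set_id}

async def refresh_views(ctx: dict) -> dict:
    return {"job": "refresh_views"}

# registry.py (sketch): one place that lists everything the worker exposes.
BACKGROUND_FUNCTIONS = [create_variants, refresh_views]
BACKGROUND_CRONJOBS: list = []  # cron definitions would be registered here

class ArqWorkerSettings:
    """Stand-in settings class; ARQ reads `functions` and `cron_jobs` attributes."""
    functions = BACKGROUND_FUNCTIONS
    cron_jobs = BACKGROUND_CRONJOBS

# The worker only needs the registry lists, not the individual modules.
registered_names = [fn.__name__ for fn in ArqWorkerSettings.functions]
result = asyncio.run(ArqWorkerSettings.functions[0]({}, 42))
```

Keeping the lists in one module means the settings class does not change when a job moves between domain packages, which is one plausible reading of the backwards-compatibility claim.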
Implement complete database foundation for pipeline-based job tracking and monitoring:

Database Tables:
• pipelines - High-level workflow grouping with correlation IDs for end-to-end tracing
• job_runs - Individual job execution tracking with full lifecycle management
• job_dependencies - Workflow orchestration with success/completion dependency types
• job_metrics - Detailed performance metrics (CPU, memory, execution time, business metrics)
• variant_annotation_status - Granular variant-level annotation tracking with success data

Key Features:
• Pipeline workflow management with dependency resolution
• Comprehensive job lifecycle tracking (pending → running → completed/failed)
• Retry logic with configurable limits and backoff strategies
• Resource usage and performance metrics collection
• Variant-level annotation status for debugging failures
• Correlation ID support for request tracing across system
• JSONB metadata fields for flexible job-specific data
• Optimized indexes for common query patterns

Schema Design:
• Foreign key relationships maintain data integrity
• Check constraints ensure valid enum values and positive numbers
• Strategic indexes optimize dependency resolution and metrics queries
• Cascade deletes prevent orphaned records
• Version tracking for audit and debugging

Models & Enums:
• SQLAlchemy models with proper relationships and hybrid properties
• Comprehensive enum definitions for job/pipeline status and failure categories
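The lifecycle named above (pending → running → completed/failed) can be sketched as a small state machine. This is only an in-memory illustration: `JobStatus`, `JobRun`, and `ALLOWED_TRANSITIONS` are hypothetical names, the real models are SQLAlchemy classes not shown here, and retry transitions are deliberately left out.

```python
from dataclasses import dataclass, field
from enum import Enum

class JobStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"

# Allowed transitions, following "pending -> running -> completed/failed".
ALLOWED_TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),
    JobStatus.FAILED: set(),
}

@dataclass
class JobRun:
    job_type: str
    correlation_id: str  # ties the run back to the originating request
    status: JobStatus = JobStatus.PENDING
    metadata: dict = field(default_factory=dict)  # stands in for a JSONB column

    def transition(self, new_status: JobStatus) -> None:
        # Reject anything not listed for the current state.
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(
                f"invalid transition: {self.status.value} -> {new_status.value}"
            )
        self.status = new_status

run = JobRun(job_type="map_variants", correlation_id="req-123")
run.transition(JobStatus.RUNNING)
run.transition(JobStatus.COMPLETED)
```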
Add comprehensive job lifecycle management with status-based completion:

* Implement convenience methods for common job outcomes:
  - succeed_job() for successful completion
  - fail_job() for error handling with exception details
  - cancel_job() for user/system cancellation
  - skip_job() for conditional job skipping
* Enhance progress tracking with increment_progress() and set_progress_total()
* Add comprehensive error handling with specific exception types
* Improve job state validation and atomic transaction handling
* Implement extensive test coverage for all job operations
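A minimal sketch of how such convenience methods might funnel into one completion routine. The method names come from the commit message; the class `InMemoryJobManager`, the argument lists, and the status strings are assumptions, and state is kept in attributes rather than a database.

```python
from typing import Optional

class InMemoryJobManager:
    """Toy stand-in for a job manager; nothing here touches a database."""

    def __init__(self) -> None:
        self.status = "running"
        self.error: Optional[str] = None
        self.progress_current = 0
        self.progress_total: Optional[int] = None

    def complete_job(self, status: str, error: Optional[Exception] = None) -> None:
        # Single choke point so every outcome is validated the same way.
        if self.status != "running":
            raise RuntimeError(f"cannot complete a job in state {self.status!r}")
        self.status = status
        self.error = repr(error) if error is not None else None

    def succeed_job(self) -> None:
        self.complete_job("succeeded")

    def fail_job(self, error: Exception) -> None:
        self.complete_job("failed", error)

    def cancel_job(self) -> None:
        self.complete_job("cancelled")

    def skip_job(self) -> None:
        self.complete_job("skipped")

    def set_progress_total(self, total: int) -> None:
        self.progress_total = total

    def increment_progress(self, amount: int = 1) -> None:
        self.progress_current += amount

manager = InMemoryJobManager()
manager.set_progress_total(3)
for _ in range(3):
    manager.increment_progress()
manager.fail_job(ValueError("bad variant"))
```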
- Created PipelineManager capable of coordinating jobs within a pipeline context.
- Introduced `construct_bulk_cancellation_result` to standardize cancellation result structures.
- Added `job_dependency_is_met` to check job dependencies based on their types and statuses.
- Created comprehensive tests for PipelineManager covering initialization, job coordination, status transitions, and error handling.
- Implemented mocks for database and Redis dependencies to isolate tests.
- Added tests for job enqueuing, cancellation, pausing, unpausing, and retrying functionalities.
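One way a dependency check of this kind could work, given the success/completion dependency types mentioned earlier in the PR. Only the function name `job_dependency_is_met` comes from the commit message; the enum, the status strings, and the signature are assumptions made for illustration.

```python
from enum import Enum

class DependencyType(str, Enum):
    SUCCESS_REQUIRED = "success"        # upstream job must have succeeded
    COMPLETION_REQUIRED = "completion"  # upstream job must merely have finished

# Hypothetical set of statuses that count as "finished".
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled", "skipped"}

def job_dependency_is_met(dependency_type: DependencyType, upstream_status: str) -> bool:
    """Return True when the upstream job's status satisfies the dependency."""
    if dependency_type is DependencyType.SUCCESS_REQUIRED:
        return upstream_status == "succeeded"
    return upstream_status in TERMINAL_STATUSES

checks = {
    "success after failure": job_dependency_is_met(DependencyType.SUCCESS_REQUIRED, "failed"),
    "completion after failure": job_dependency_is_met(DependencyType.COMPLETION_REQUIRED, "failed"),
    "completion while running": job_dependency_is_met(DependencyType.COMPLETION_REQUIRED, "running"),
}
```

Under these assumptions, a cleanup job can declare a completion dependency and still run after an upstream failure, while a job that consumes upstream output declares a success dependency.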
Adds decorators for managed jobs and pipelines. These can be applied to async ARQ functions to automatically persist their state as they execute.
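The decorator pattern described above might look roughly like the following. `with_job_tracking`, `JOB_LOG`, and the example job are hypothetical names; the PR's actual decorators persist state to the database, which a list stands in for here.

```python
import asyncio
import functools

JOB_LOG: list = []  # stands in for persisted job state

def with_job_tracking(func):
    """Hypothetical decorator: records state changes around an async job."""
    @functools.wraps(func)
    async def wrapper(ctx, *args, **kwargs):
        JOB_LOG.append((func.__name__, "running"))
        try:
            result = await func(ctx, *args, **kwargs)
        except Exception as exc:
            # Record the failure, then let the worker see the original error.
            JOB_LOG.append((func.__name__, f"failed: {exc}"))
            raise
        JOB_LOG.append((func.__name__, "succeeded"))
        return result
    return wrapper

@with_job_tracking
async def link_gnomad_variants(ctx, score_set_id):
    return {"linked": score_set_id}

output = asyncio.run(link_gnomad_variants({}, 7))
```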
In certain instances (cron jobs in particular), worker processes are invoked from contexts where we have not yet added a job run record to the database. In such cases, it is useful to first guarantee that a minimal record exists in the database so that the job run can be tracked via the existing managed job decorators. This feature adds such a decorator and associated tests.
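A sketch of what "guarantee a minimal record" could mean in practice, assuming the job identifier travels in the ARQ context dictionary under a `job_id` key. That key, `ensure_job_run_record`, and the `JOB_RUNS` dictionary (a stand-in for the job_runs table) are all illustrative assumptions.

```python
import asyncio
import functools
import uuid

JOB_RUNS: dict = {}  # job_id -> record; stands in for the job_runs table

def ensure_job_run_record(job_type: str):
    """Hypothetical decorator: create a minimal record if none exists yet."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(ctx: dict, *args, **kwargs):
            job_id = ctx.get("job_id")
            if job_id is None or job_id not in JOB_RUNS:
                job_id = job_id or str(uuid.uuid4())
                JOB_RUNS[job_id] = {"job_type": job_type, "status": "pending"}
                # Copy the context so the caller's dict is not mutated.
                ctx = {**ctx, "job_id": job_id}
            return await func(ctx, *args, **kwargs)
        return wrapper
    return decorator

@ensure_job_run_record("refresh_materialized_views")
async def refresh_materialized_views(ctx: dict) -> str:
    return ctx["job_id"]

# A cron-style invocation arrives with an empty context and still gets a record.
cron_job_id = asyncio.run(refresh_materialized_views({}))
```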
Since decorators are applied at import time, this test mode path is a pragmatic solution to run decorators without side effects during unit tests. It's more straightforward and maintainable than other solutions, and still lets us import job definitions up front to register with ARQ.
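To make the test mode idea concrete, here is a sketch in which the wrapper checks a flag at call time and skips its side effects when the flag is set. The environment variable name `JOB_DECORATOR_TEST_MODE` and the helper names are invented for this example; the PR's actual flag is not shown in this conversation.

```python
import asyncio
import functools
import os

SIDE_EFFECTS: list = []  # stands in for database writes

def is_test_mode() -> bool:
    # Hypothetical flag name, chosen only for this sketch.
    return os.environ.get("JOB_DECORATOR_TEST_MODE") == "1"

def tracked(func):
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        if is_test_mode():
            # Bypass persistence entirely and call the job body directly.
            return await func(*args, **kwargs)
        SIDE_EFFECTS.append(f"persisted state for {func.__name__}")
        return await func(*args, **kwargs)
    return wrapper

@tracked  # applied at import time, regardless of mode
async def refresh_views(ctx):
    return "refreshed"

os.environ["JOB_DECORATOR_TEST_MODE"] = "1"
in_test_mode = asyncio.run(refresh_views({}))
effects_in_test_mode = list(SIDE_EFFECTS)

os.environ["JOB_DECORATOR_TEST_MODE"] = "0"
in_normal_mode = asyncio.run(refresh_views({}))
```

Because the flag is read inside the wrapper rather than at decoration time, job modules can still be imported and registered up front, and tests can switch the behavior afterwards.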
Additionally contains some small updates to how decorator unit tests handle the new test mode flag.
…ed_job_data`

- Updated test files to use `with_populated_job_data` fixture for populating the database with sample job and pipeline data.
- Removed the `setup_worker_db` fixture from various test cases in job and pipeline management tests.
- Added new sample job and pipeline fixtures in `conftest.py` to streamline test data creation.
- Improved clarity and maintainability of tests by consolidating data setup logic.
feat(wip): upload files to S3 prior to job invocation, localstack emulation in dev environment
In certain decorator contexts, we do not want to coordinate the pipeline after starting it. Skipping coordination there prevents jobs from being enqueued twice by mistake.
…jecting in lifecycle hooks

This contextmanager method ensures sessions are closed more consistently and reliably.
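A minimal sketch of the context manager approach, using a fake session class so it runs without a database. `db_session` and `FakeSession` are hypothetical names; the real code presumably wraps a SQLAlchemy session.

```python
from contextlib import contextmanager

class FakeSession:
    """Stand-in for a database session that tracks whether it was closed."""
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

@contextmanager
def db_session(session_factory=FakeSession):
    session = session_factory()
    try:
        yield session
    finally:
        # Runs on normal exit and when the body raises.
        session.close()

opened = []
try:
    with db_session() as session:
        opened.append(session)
        raise RuntimeError("job blew up")
except RuntimeError:
    pass
```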
- Updated test assertions to check for "exception" status instead of "failed" in variant creation and mapping tests.
- Enhanced exception handling in job management decorators to return structured results with "status", "data", and "exception" fields.
- Modified job manager methods to align with new result structure, ensuring consistent handling of job outcomes across success, failure, and cancellation scenarios.
- Adjusted integration tests to validate the new result format and ensure proper job state transitions.
- Improved clarity in test cases by asserting the presence of exception details where applicable.
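The structured result described above could be produced by a wrapper along these lines. The three field names and the "exception" status come from the commit message; the success value `"ok"`, the empty `data` on failure, and the decorator name are assumptions.

```python
import asyncio
import functools

def returns_structured_result(func):
    """Hypothetical decorator: normalize job outcomes into one dict shape."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            data = await func(*args, **kwargs)
        except Exception as exc:
            # "exception" status per the commit message; empty data is assumed.
            return {"status": "exception", "data": {}, "exception": exc}
        # "ok" is a placeholder; the real success value is not shown in the PR text.
        return {"status": "ok", "data": data, "exception": None}
    return wrapper

@returns_structured_result
async def map_variants(ctx, score_set_id):
    if score_set_id < 0:
        raise ValueError("unknown score set")
    return {"mapped": score_set_id}

good = asyncio.run(map_variants({}, 5))
bad = asyncio.run(map_variants({}, -1))
```

Callers and tests can then branch on `result["status"]` and inspect `result["exception"]` instead of catching errors raised by the job.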
Alters the `complete_job` method to remove default updates to the progress message. This allows the job to set its final progress message, which results in generally more useful messages than the generic options we have at our disposal in the complete job method.
…ncy compatibility
This update will support using job definitions directly in scripts.
…ission for score sets
- Integrated `send_slack_error` calls in multiple test cases across different modules to ensure error notifications are sent when exceptions occur.
- Updated tests for materialized views, published variants, ClinGen submissions, gnomAD linking, UniProt mappings, pipeline management, and variant processing to assert that Slack notifications are triggered on failures.
- Enhanced error handling in job management decorators to include Slack notifications for missing context and job failures.
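A test asserting that a notification fires on failure might be shaped like this sketch. It injects the notifier as a parameter and substitutes a `unittest.mock.Mock`, which is a simplification: the PR's tests work with `send_slack_error` itself, whose signature is not shown here, and `run_job_with_notification` is a hypothetical helper.

```python
import asyncio
from unittest import mock

async def run_job_with_notification(job, notify):
    """Run a job; on failure, notify and return a structured result."""
    try:
        return {"status": "ok", "data": await job(), "exception": None}
    except Exception as exc:
        notify(exc)  # in the PR this role is played by send_slack_error
        return {"status": "exception", "data": {}, "exception": exc}

async def failing_job():
    raise ValueError("mapping failed")

async def passing_job():
    return {"mapped": 1}

mocked_notifier = mock.Mock()
failed_outcome = asyncio.run(run_job_with_notification(failing_job, mocked_notifier))
calls_after_failure = mocked_notifier.call_count

passed_outcome = asyncio.run(run_job_with_notification(passing_job, mocked_notifier))
calls_after_success = mocked_notifier.call_count
```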
17b2975 to e85312a
…or missing Redis in PipelineManager
…ation and error handling
2e07984 to 85a4268
This PR introduces a robust, auditable, and maintainable background job system for MaveDB, supporting both standalone jobs and complex pipelines. It provides a strong foundation for future workflow automation, error recovery, and developer onboarding.
Features include:
1. Worker Job System Refactor & Enhancements
Refactored the monolithic worker job system into a modular architecture:
2. Job & Pipeline Management Infrastructure
Implemented JobManager and PipelineManager classes for robust job and pipeline lifecycle management: (05fc52b, ae18eeb, 3799d84, 1e447a7, 7b44346)
3. Decorator System for Jobs and Pipelines
Introduced decorators for job and pipeline management:
(c2100a2, 155e549, 4a4055d, 3c4e6b9, 010f15c)
Improved test mode support and simplified stacking/usage patterns.
4. Comprehensive Test Suite
Added and refactored unit and integration tests for all job modules, managers, and decorators.
(05fc52b, ae18eeb, a701d53, 806f8ed, 011522c, 010f15c, 8a22306, a716cc9, b0397b4, 8c5e225, 3c4e6b9, 4a4055d, 1fe076a, 1abe4c6)
Enhanced test coverage for error handling, state transitions, and job orchestration.
Introduced fixtures and utilities for easier test setup and mocking.
Categorized tests with markers for unit, integration, and network tests.
(16a5a50, f34939c)
5. Developer Documentation
Added detailed markdown documentation in the worker/jobs/ directory:
(1abe4c6)
6. Database & Model Changes
(1db6b68)
Alembic migration for pipeline and job tracking schema.
Updated models and enums to support new job/pipeline features.
7. Miscellaneous Improvements
Dependency updates (e.g., added asyncclick).
(a3f36d1)