Release v0.3.0 #7

Merged: frapercan merged 7 commits into main from develop on Mar 25, 2026

@frapercan (Owner)

Release v0.3.0 — Re-ranker, evaluation pipeline, annotate workflow, UI overhaul

Features

  • Re-ranker neural model: temporal holdout training pipeline with LightGBM, feature engineering (alignments, taxonomy), and scoring configs
  • CAFA evaluation pipeline: automated evaluation with multiple metrics (Fmax, Smin, AUPR)
  • Annotate workflow: end-to-end functional annotation from FASTA upload to GO term prediction
  • Scoring engine: scoring configs with configurable evidence weights
  • Infrastructure: DB connection pool, dead letter queue (DLQ), structured logging, health probes, stale job reaper
  • Full i18n: 5 locales (EN/ES/DE/PT/ZH) via next-intl
  • Frontend overhaul: scoring config UI, support page, evaluation views, human-readable labels
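For readers unfamiliar with the CAFA metrics named in the feature list, the following is a minimal sketch of how a protein-centric Fmax is conventionally computed: at each score threshold, precision is averaged over proteins with at least one prediction, recall is averaged over all benchmark proteins, and Fmax is the best harmonic mean across thresholds. This is not PROTEA's evaluation code; the function name `fmax`, the data layout, and the toy values are hypothetical, and GO-term propagation through the ontology is left out.

```python
from typing import Dict, Set


def fmax(preds: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Protein-centric Fmax over thresholds 1/steps, 2/steps, ..., 1.0.

    preds maps protein -> {GO term: score}; truth maps protein -> true GO terms
    (each truth set is assumed to be non-empty).
    """
    best = 0.0
    for i in range(1, steps + 1):
        t = i / steps
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truth.items():
            predicted = {term for term, score in preds.get(protein, {}).items()
                         if score >= t}
            tp = len(predicted & true_terms)
            if predicted:
                # Precision only counts proteins with a prediction at this threshold.
                precisions.append(tp / len(predicted))
            recall_sum += tp / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)  # recall averages over every benchmark protein
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data, purely illustrative.
toy_preds = {"P1": {"GO:1": 0.9, "GO:2": 0.4}, "P2": {"GO:3": 0.8}}
toy_truth = {"P1": {"GO:1"}, "P2": {"GO:3", "GO:4"}}
print(round(fmax(toy_preds, toy_truth), 3))
```

Smin and AUPR are computed over the same threshold sweep in CAFA-style evaluations, but they need information-content weights and a precision-recall curve respectively, so they are not shown here.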

Tests

  • Coverage expanded from 65% to 88% (283 → 831 tests)

Docs

  • ADRs, operational runbook, re-ranker design spec
  • Full Sphinx documentation update

CI

  • Bump GitHub Actions to v6 (checkout, setup-python)
  • Fix all ruff, flake8, and mypy lint errors

Commits since v0.2.0

  • 2033e0d feat(infra): connection pool, DLQ, structured logging, health probes, stale reaper
  • 7ee749f test: expand coverage from 65% to 88% (283 -> 831 tests)
  • 096823e docs: ADRs, operational runbook, and re-ranker design spec
  • 092f110 release: v0.3.0 — re-ranker, evaluation pipeline, annotate workflow, UI overhaul
  • 8b92868 fix(lint): resolve ruff errors
  • 53e62c9 fix(lint): resolve flake8 E501 and mypy type errors
  • c244d25 ci: bump actions to v6, drop FORCE_JAVASCRIPT_ACTIONS_TO_NODE24

… stale reaper

- DB connection pool (pool_size=20, max_overflow=40, pool_recycle=3600)
- Publisher: thread-local connection reuse, exponential backoff (5 attempts)
- Consumer: dead letter queue (protea.dlx -> protea.dead-letter) on all queues
- Consumer: OperationConsumer emit writes JobEvent to parent job
- Health endpoints: /health (liveness) and /health/ready (DB + RabbitMQ)
- Structured JSON logging with --log-format flag in worker
- StaleJobReaper: marks RUNNING jobs as FAILED after 1h timeout
- BaseWorker: adaptive backoff for RetryLaterError (capped at 600s)
- Cancel endpoint: cancels both QUEUED and RUNNING children
- Composite indexes on (annotation_set_id, accession) and (prediction_set_id, accession)
- Taxonomy DB warmup at worker startup for prediction queues
- Multi-stage Dockerfile with healthcheck
- docker-compose: all 11 services with memory limits
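The retry behaviour described in this list (exponential backoff with 5 attempts in the publisher, and a backoff capped at 600s in BaseWorker) could look roughly like the sketch below. The 1-second base delay, the doubling factor, and the names `backoff_delay` and `call_with_retries` are assumptions for illustration; only the attempt count and the cap come from the commit message.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 600.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_retries(fn: Callable[[], T],
                      attempts: int = 5,
                      base: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn` up to `attempts` times, sleeping with exponential backoff between tries."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            # A real publisher would likely catch only connection-level errors.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt, base))
    raise RuntimeError("unreachable")  # the loop above always returns or raises
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; the default is `time.sleep`.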
New test files:
- test_logging.py (15 tests): JSONFormatter, configure_logging
- test_evaluation.py (+35 tests): load_children_map, build_negative_keys, compute_evaluation_data
- test_run_cafa_evaluation.py (50 tests): full operation coverage
- test_load_goa_annotations.py (+45 tests): GAF parsing, store buffer, execute
- test_load_quickgo_annotations.py (+33 tests): TSV parsing, pagination, ECO mapping
- test_annotations_router.py (70 tests): all 23 endpoints
- test_embeddings_router.py (+44 tests): configs, predict, prediction sets, CAFA TSV
- test_proteins_router.py (17 tests): stats, list, detail, annotations
- test_admin_router.py (4 tests): reset-db
- test_scoring_router.py (+7 tests): scored TSV, metrics
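The `JSONFormatter` and `configure_logging` names covered by test_logging.py are not shown in this PR page. As a rough idea of what such a pair usually looks like with the standard `logging` module, here is a minimal sketch; the field names, the `log_format` parameter, and both bodies are assumptions, not PROTEA's code.

```python
import json
import logging


class JSONFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the formatted traceback when the record carries one.
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(log_format: str = "text", level: int = logging.INFO) -> None:
    """Install one stream handler on the root logger, JSON or plain text."""
    handler = logging.StreamHandler()
    if log_format == "json":
        handler.setFormatter(JSONFormatter())
    else:
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
    # force=True replaces any handlers configured earlier (Python 3.8+).
    logging.basicConfig(level=level, handlers=[handler], force=True)
```

A worker could map its `--log-format` flag onto the `log_format` argument, for example `configure_logging("json")`.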

Extended test files:
- test_queue.py (+20 tests): OperationConsumer on_message, QueueConsumer retry, DLQ
- test_base_worker.py (+24 tests): parent cancel, two-session, publish, reaper, warmup
- test_core.py (+11 tests): fetch_uniprot_metadata paths
- test_infrastructure.py (+7 tests): health endpoints, app factory, pool config
- test_insert_proteins.py (+22 tests): FASTA parsing, store_records, pagination
- test_load_ontology_snapshot.py (+17 tests): OBO parsing, relationships, backfill

Architecture Decision Records (6 ADRs):
- 001: KNN on CPU, not pgvector or GPU
- 002: Two-session worker pattern
- 003: QueueConsumer vs OperationConsumer
- 004: Dead letter queue and retry strategy
- 005: Thread-local RabbitMQ connections
- 006: Sequence deduplication by MD5
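ADR 006 names sequence deduplication by MD5. The ADR text itself is not part of this page, so the following is only a sketch of the general idea: hash each sequence and group accessions that share a digest, so each distinct sequence is processed once. The normalization step (strip and uppercase) and the function names are assumptions.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def sequence_md5(sequence: str) -> str:
    """MD5 hex digest of a protein sequence after simple normalization."""
    normalized = sequence.strip().upper()  # assumed normalization, not from the ADR
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def deduplicate(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group accessions by the MD5 of their sequence.

    records yields (accession, sequence) pairs; the result maps each digest
    to the accessions that share that sequence, in input order.
    """
    by_hash: Dict[str, List[str]] = {}
    for accession, sequence in records:
        by_hash.setdefault(sequence_md5(sequence), []).append(accession)
    return by_hash


groups = deduplicate([("A", "MKV"), ("B", "mkv\n"), ("C", "MKT")])
print(len(groups))  # 2 distinct sequences among 3 records
```

MD5 is used here as a content fingerprint rather than for security, which is the usual reading of this kind of ADR.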

Operational runbook covering: start/stop, health checks, scaling,
stuck jobs, batch failures, CUDA OOM, DLQ inspection, DB maintenance
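For the stuck-jobs case, the StaleJobReaper from this release marks RUNNING jobs as FAILED after a 1h timeout. A minimal in-memory sketch of that rule follows; the `Job` dataclass, the `reap_stale` function, and the choice to measure the timeout from `started_at` (rather than, say, a last-heartbeat timestamp) are assumptions, since the real reaper operates on database rows not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class Job:
    id: int
    status: str
    started_at: datetime


def reap_stale(jobs: Iterable[Job],
               now: datetime,
               timeout: timedelta = timedelta(hours=1)) -> List[int]:
    """Mark RUNNING jobs older than `timeout` as FAILED and return their ids."""
    reaped = []
    for job in jobs:
        if job.status == "RUNNING" and now - job.started_at > timeout:
            job.status = "FAILED"
            reaped.append(job.id)
    return reaped


now = datetime(2026, 3, 25, 12, 0, tzinfo=timezone.utc)
jobs = [
    Job(1, "RUNNING", now - timedelta(hours=2)),     # stale, will be reaped
    Job(2, "RUNNING", now - timedelta(minutes=10)),  # still within the timeout
    Job(3, "QUEUED", now - timedelta(hours=5)),      # not RUNNING, left alone
]
print(reap_stale(jobs, now))
```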

RERANKER.md: formal spec for temporal holdout re-ranker (cross-attention
architecture, LambdaRank loss, WebDataset pipeline, LightGBM baseline)
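RERANKER.md is not reproduced on this page, so as a reminder of what a temporal holdout means in general: annotations dated up to a cutoff form the training set, and annotations that first appear after the cutoff are held out for evaluation. The sketch below illustrates that split under those assumptions; the tuple layout and the `temporal_holdout` name are hypothetical and may differ from the spec.

```python
from datetime import date
from typing import Iterable, List, Tuple

# (accession, GO term, annotation date); layout assumed for illustration.
Annotation = Tuple[str, str, date]


def temporal_holdout(annotations: Iterable[Annotation],
                     cutoff: date) -> Tuple[List[Annotation], List[Annotation]]:
    """Split annotations into (train, holdout) around a cutoff date.

    Train: everything dated on or before the cutoff.
    Holdout: annotations dated after the cutoff whose (accession, term) pair
    was not already known at the cutoff.
    """
    items = list(annotations)
    train = [a for a in items if a[2] <= cutoff]
    known = {(acc, term) for acc, term, _ in train}
    holdout = [a for a in items if a[2] > cutoff and (a[0], a[1]) not in known]
    return train, holdout


data = [
    ("P1", "GO:1", date(2024, 1, 1)),
    ("P1", "GO:1", date(2025, 6, 1)),  # re-annotation of a known pair, not new
    ("P1", "GO:2", date(2025, 6, 1)),  # newly gained term, goes to holdout
    ("P2", "GO:3", date(2025, 7, 1)),
]
train, holdout = temporal_holdout(data, cutoff=date(2024, 12, 31))
print(len(train), len(holdout))
```

Splitting by time rather than at random avoids training on annotations that would not have existed when the prediction was made.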
…UI overhaul

Major features:
- Neural re-ranker: train_reranker operation, ReRankerModel ORM, reranker UI page
- Expanded CAFA evaluation pipeline with scoring router and detailed metrics
- Annotate router and showcase router for streamlined user workflows
- Floating jobs widget, breadcrumbs, context banner, tooltip components
- Frontend overhaul: redesigned pages, improved navigation, i18n updates
- Thesis PDF served from frontend

Infrastructure:
- 4 new Alembic migrations for re-ranker schema
- API deps module, extended scoring endpoints
- Experiment and evaluation helper scripts
- Updated documentation (results, evaluation architecture)
- Version bump to 0.3.0

Tests:
- New test suites: reranker, train_reranker, annotate router, showcase router, integration
- Expanded: predict_go_terms, compute_embeddings, scoring router, embeddings router

codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 60.28834% with 606 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.07%. Comparing base (2eeb474) to head (c244d25).
⚠️ Report is 9 commits behind head on main.

| File with missing lines | Patch % | Missing lines |
|---|---|---|
| protea/core/operations/train_reranker.py | 31.32% | 410 |
| protea/core/operations/run_cafa_evaluation.py | 20.35% | 90 |
| protea/core/operations/predict_go_terms.py | 41.66% | 42 |
| protea/api/routers/scoring.py | 79.12% | 38 |
| protea/api/routers/annotate.py | 92.92% | 7 |
| protea/core/reranker.py | 93.54% | 6 |
| protea/core/operations/compute_embeddings.py | 61.53% | 5 |
| protea/infrastructure/queue/consumer.py | 94.73% | 2 |
| protea/infrastructure/queue/publisher.py | 92.30% | 2 |
| protea/api/routers/embeddings.py | 92.85% | 1 |

... and 3 more
Additional details and impacted files
@@             Coverage Diff             @@
##             main       #7       +/-   ##
===========================================
+ Coverage   65.01%   82.07%   +17.06%     
===========================================
  Files          55       63        +8     
  Lines        4550     5959     +1409     
===========================================
+ Hits         2958     4891     +1933     
+ Misses       1592     1068      -524     
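The figures in the coverage diff are internally consistent, which can be checked with a few lines of arithmetic: hits plus misses equals lines, and coverage is hits divided by lines. The sketch below assumes the percentage is truncated (rounded down) to two decimals, because that is what reproduces the reported 82.07%; ordinary rounding of 4891/5959 would give 82.08%. The helper name `coverage_pct` is hypothetical.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage percentage truncated (not rounded) to two decimals."""
    return (hits * 10_000 // lines) / 100


base = {"hits": 2958, "misses": 1592}  # main
head = {"hits": 4891, "misses": 1068}  # PR #7

base_lines = base["hits"] + base["misses"]  # 4550
head_lines = head["hits"] + head["misses"]  # 5959

base_cov = coverage_pct(base["hits"], base_lines)  # 65.01
head_cov = coverage_pct(head["hits"], head_lines)  # 82.07

print(base_lines, head_lines, head_lines - base_lines)
print(base_cov, head_cov, round(head_cov - base_cov, 2))
```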

@frapercan merged commit cd433b8 into main on Mar 25, 2026
9 of 10 checks passed