Add foundational support for durable storage #135
Merged
krisztianfekete merged 5 commits into main on May 6, 2026
Pull request overview
Adds opt-in durable storage and an async run pipeline backed by Postgres, enabling persistent run history/results and a queue-driven worker while keeping the default in-memory workflow unchanged.
Changes:
- Introduces storage abstractions (models + repos) with memory and postgres backends, plus SQL migrations and an asyncpg pool.
- Adds /api/runs endpoints + in-process async worker to claim/execute queued runs and persist results.
- Updates CLI, docs, Makefile, Docker image, and Helm chart to support Postgres-backed deployments; adds comprehensive tests for the new behavior.
Reviewed changes
Copilot reviewed 43 out of 47 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| uv.lock | Adds postgres extra and locks asyncpg. |
| pyproject.toml | Declares postgres optional dependency + ensures migrations are packaged in wheels. |
| Dockerfile | Installs postgres extra in the container image. |
| Makefile | Adds local Postgres + migrate + pg-backed dev server targets. |
| README.md | Documents Helm usage for Postgres backend and /api/runs. |
| DEVELOPMENT.md | Documents local dev flow for Postgres backend. |
| charts/agentevals/values.yaml | Adds chart values for storage backend + Postgres configuration. |
| charts/agentevals/templates/service.yaml | Scopes Service selector labels to avoid matching bundled Postgres pods. |
| charts/agentevals/templates/deployment.yaml | Wires env vars for Postgres backend (DSN/urlFile/bundled) into the app deployment. |
| charts/agentevals/templates/_helpers.tpl | Adds helper templates for app selectors and bundled Postgres resources. |
| charts/agentevals/templates/postgresql.yaml | Adds an optional bundled Postgres Deployment/Service/PVC. |
| charts/agentevals/templates/postgresql-secret.yaml | Adds bundled Postgres password Secret. |
| src/agentevals/storage/repos/postgres.py | Implements asyncpg-backed Session/Run/Result repositories. |
| src/agentevals/storage/repos/memory.py | Implements in-memory Session/Run/Result repositories for OSS default/testing. |
| src/agentevals/storage/repos/__init__.py | Defines repository protocols and Repos bundle. |
| src/agentevals/storage/postgres/pool.py | Adds asyncpg pool factory with readiness retry. |
| src/agentevals/storage/postgres/migrator.py | Adds migration discovery + advisory-lock-protected migrator. |
| src/agentevals/storage/postgres/migrations/000001_init.up.sql | Adds baseline schema for sessions/runs/results and supporting tables/indexes. |
| src/agentevals/storage/postgres/migrations/000001_init.down.sql | Adds destructive rollback (drop schema). |
| src/agentevals/storage/postgres/__init__.py | Documents Postgres backend intent. |
| src/agentevals/storage/models.py | Adds Pydantic models for persisted Run/Result and trace targets + result_id hashing. |
| src/agentevals/storage/config.py | Adds env-driven storage settings and validation. |
| src/agentevals/storage/__init__.py | Adds build_repos() factory for selecting backend. |
| src/agentevals/runner.py | Extends RunResult with optional run_id for persistence linkage. |
| src/agentevals/run/worker.py | Adds async worker pool for claiming/running queued work with heartbeat/cancellation. |
| src/agentevals/run/sinks.py | Adds sink fanout (stdout/file/http webhook) with best-effort delivery. |
| src/agentevals/run/service.py | Adds RunService for idempotent submit/list/cancel and /api/evaluate persistence. |
| src/agentevals/run/result_builder.py | Adds pure helpers to build persisted Result rows + run summary. |
| src/agentevals/run/fetcher.py | Adds inline/http trace fetchers for worker execution. |
| src/agentevals/run/__init__.py | Documents run pipeline modules. |
| src/agentevals/cli.py | Adds agentevals migrate CLI group (up/down/version/force/create). |
| src/agentevals/api/runs_routes.py | Adds /api/runs router (submit/get/list/results/cancel). |
| src/agentevals/api/routes.py | Persists /api/evaluate runs/results when run_service is configured and returns runId. |
| src/agentevals/api/app.py | On startup: loads storage settings, runs migrations, builds repos, wires RunService, starts worker. |
| tests/storage/test_models.py | Unit tests for storage models and deterministic result_id hashing. |
| tests/storage/test_migrator.py | Tests migration discovery/schema substitution + optional live PG tests behind env var. |
| tests/storage/test_memory_repos.py | Contract tests for memory repo behavior (runs/results/sessions). |
| tests/storage/test_config.py | Tests env loading + validation for StorageSettings. |
| tests/storage/__init__.py | Adds storage test package. |
| tests/run/test_sinks.py | Tests stdout/file/webhook sink behavior and fanout failure isolation. |
| tests/run/test_service.py | Tests RunService submit/idempotency/conflicts + /api/evaluate persistence path. |
| tests/run/test_result_builder.py | Tests result projection, status mapping, summary building, and evaluator classification. |
| tests/run/test_fetcher.py | Tests fetcher dispatch and validation paths. |
| tests/run/__init__.py | Adds run test package. |
| tests/api/test_runs_routes.py | HTTP-level tests for /api/runs endpoints (503 when unconfigured + happy paths via stub service). |
| tests/api/test_evaluate_persistence.py | HTTP-level tests that /api/evaluate persists when run_service is injected. |
| tests/api/__init__.py | Adds API test package. |
This PR is opt-in (AGENTEVALS_STORAGE_BACKEND=postgres), so the existing in-memory developer experience is unchanged: agentevals run trace.json keeps working, the React UI behaves identically, and OTLP streaming is untouched.
There is no proper UI support for this at the moment. It's a preview feature; expect breaking changes to the APIs and schema.
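To make the opt-in behaviour concrete, here is a minimal sketch of env-driven backend selection that defaults to the in-memory backend. Only AGENTEVALS_STORAGE_BACKEND comes from this PR; the SketchSettings class, its fields, and the EXAMPLE_DATABASE_URL variable are hypothetical names used for illustration and are not the actual contents of src/agentevals/storage/config.py.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_BACKENDS = {"memory", "postgres"}


@dataclass(frozen=True)
class SketchSettings:
    """Hypothetical settings shape; the real StorageSettings may differ."""
    backend: str
    dsn: Optional[str] = None


def load_sketch_settings(env: Mapping[str, str]) -> SketchSettings:
    """Pick a backend from the environment, defaulting to in-memory."""
    backend = env.get("AGENTEVALS_STORAGE_BACKEND", "memory").strip().lower()
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown storage backend: {backend!r}")
    # EXAMPLE_DATABASE_URL is an assumed name, not a variable defined by the PR.
    dsn = env.get("EXAMPLE_DATABASE_URL")
    if backend == "postgres" and not dsn:
        raise ValueError("postgres backend selected but no DSN provided")
    # The memory backend ignores any DSN that happens to be set.
    return SketchSettings(backend=backend, dsn=dsn if backend == "postgres" else None)
```

With an empty environment the sketch returns the memory backend, which mirrors the zero-config flow described above; validation only fires once postgres is requested.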
Setup
Look for these log lines on startup:
The async run pipeline (POST /api/runs)
Submit a run, watch the worker pick it up, read the persisted results back:
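As a rough client-side illustration of that submit-then-poll loop, the sketch below uses only the Python standard library. The POST /api/runs path comes from this PR; the payload contents, the response shape, and the status names in TERMINAL are assumptions, so check them against the actual API before relying on them.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# Assumed terminal status names; the real API may use different values.
TERMINAL = {"succeeded", "failed", "cancelled"}


def build_submit_request(base_url: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /api/runs."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_json(req: urllib.request.Request) -> Dict[str, Any]:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_run(
    fetch_run: Callable[[], Dict[str, Any]],
    poll_seconds: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_run() until the run reports an (assumed) terminal status."""
    for _ in range(max_polls):
        run = fetch_run()
        if run.get("status") in TERMINAL:
            return run
        sleep(poll_seconds)
    raise TimeoutError("run did not reach a terminal status")
```

In use, fetch_run would wrap a GET against the run's URL via send_json; it is passed in as a callable here so the polling logic can be exercised without a running server.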
Idempotency, 409, and cancel
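One plausible reading of these semantics, sketched in memory: resubmitting with the same idempotency key and the same payload returns the existing run, the same key with a different payload is a conflict (which an HTTP layer could surface as 409), and cancel only affects runs that have not finished. The class and method names, the status values, and the payload-fingerprint approach are all assumptions for illustration, not the PR's RunService.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass
from typing import Any, Dict, Optional


class IdempotencyConflict(Exception):
    """Same key, different payload; an HTTP layer could map this to 409."""


@dataclass
class SketchRun:
    run_id: str
    status: str
    payload_hash: str


class InMemoryRunService:
    """Hypothetical stand-in used to illustrate the semantics only."""

    def __init__(self) -> None:
        self._by_key: Dict[str, SketchRun] = {}
        self._by_id: Dict[str, SketchRun] = {}

    @staticmethod
    def _fingerprint(payload: Dict[str, Any]) -> str:
        # Canonical JSON so key order does not change the hash.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def submit(self, payload: Dict[str, Any], idempotency_key: Optional[str] = None) -> SketchRun:
        digest = self._fingerprint(payload)
        if idempotency_key is not None and idempotency_key in self._by_key:
            existing = self._by_key[idempotency_key]
            if existing.payload_hash != digest:
                raise IdempotencyConflict(idempotency_key)
            return existing  # replay: no new run is created
        run = SketchRun(run_id=uuid.uuid4().hex, status="queued", payload_hash=digest)
        self._by_id[run.run_id] = run
        if idempotency_key is not None:
            self._by_key[idempotency_key] = run
        return run

    def cancel(self, run_id: str) -> SketchRun:
        run = self._by_id[run_id]
        if run.status in ("queued", "running"):
            run.status = "cancelled"
        return run  # finished runs are returned unchanged
```

Hashing a canonical form of the payload is one simple way to tell a harmless retry from a genuinely different request that reuses a key.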
Existing /api/evaluate flows persist when backend=postgres
UI uploads, multipart curl, SSE stream, and the JSON variant all now write a Run row plus Result rows. The response carries an extra runId field that wasn't there before. No UI changes required.
Each call yields a new run row with target.kind = "uploaded". That's the OSS user-facing benefit of this PR: persistent run history for any eval that flows through the existing endpoints.
Inspecting the data in Postgres
Live tail while exercising the worker:
Crash recovery
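The mechanism exercised by the steps that follow can be modelled as a lease on each claimed run: a worker that stops heartbeating lets its lease lapse, and the next claim picks the run up again and bumps the attempt counter. The sketch below is an in-memory model under stated assumptions: the 30-second lease is inferred from the "roughly 35 seconds" wait, and all class, field, and method names are hypothetical. It does not model Postgres row locking; in Postgres, a claim query would typically use SELECT ... FOR UPDATE SKIP LOCKED so that concurrent workers do not block on the same row.

```python
from dataclasses import dataclass
from typing import Dict, Optional

LEASE_SECONDS = 30.0  # assumed lease window, not taken from the PR


@dataclass
class QueuedRun:
    run_id: str
    status: str = "queued"
    attempt: int = 0
    lease_expires_at: float = 0.0
    worker_id: Optional[str] = None


class LeaseQueue:
    """In-memory model of claim, heartbeat, and re-claim after a lapsed lease."""

    def __init__(self) -> None:
        self.runs: Dict[str, QueuedRun] = {}

    def enqueue(self, run_id: str) -> None:
        self.runs[run_id] = QueuedRun(run_id)

    def claim(self, worker_id: str, now: float) -> Optional[QueuedRun]:
        for run in self.runs.values():
            lease_lapsed = run.status == "running" and run.lease_expires_at <= now
            if run.status == "queued" or lease_lapsed:
                run.status = "running"
                run.worker_id = worker_id
                run.attempt += 1
                run.lease_expires_at = now + LEASE_SECONDS
                return run
        return None

    def heartbeat(self, run_id: str, worker_id: str, now: float) -> bool:
        run = self.runs[run_id]
        if run.status != "running" or run.worker_id != worker_id:
            return False  # lease was lost to another worker
        run.lease_expires_at = now + LEASE_SECONDS
        return True

    def complete(self, run_id: str, worker_id: str) -> bool:
        run = self.runs[run_id]
        if run.status != "running" or run.worker_id != worker_id:
            return False
        run.status = "succeeded"
        return True
```

Passing now explicitly keeps the model deterministic: a claim at t=0 by one worker, no heartbeats, and a claim at t=35 by another worker leaves the run with attempt equal to 2.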
Submit a slow run using a bigger trace, then Ctrl+C the agentevals process. Wait roughly 35 seconds (one lease window plus slack), then restart with make dev-backend-pg. The previously claimed run is re-claimed by a new worker via the SKIP LOCKED predicate and completes; the run row's attempt counter reads 2.
Memory backend regression (zero-config flow unchanged)
Cleanup