refactor!: remove deprecated Eval API and related APIs #5290
leseb wants to merge 25 commits into llamastack:main
Conversation
Remove the Eval API and all connected APIs that were already marked as deprecated in the spec. This includes the datasets, datasetio, scoring, scoring_functions, and benchmarks APIs along with all their provider implementations, routing tables, routers, registry entries, distribution configs, and tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
✱ Stainless preview builds
This PR will update the SDKs listed below. Edit this comment to update it. It will appear in the SDK's changelogs.
✅ llama-stack-client-node studio · conflict
✅ llama-stack-client-go studio · conflict
✅ llama-stack-client-python studio · conflict
✅ llama-stack-client-openapi studio · code · diff
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
This pull request has merge conflicts that must be resolved before it can be merged. @leseb please rebase it. See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Merge upstream/main into remove-eval-api branch, resolving conflicts caused by the agents-to-responses API rename (PR llamastack#5195) interacting with the eval API removal. This includes fixing distribution templates that still referenced the old "agents" API name, removing the stale DatasetPurpose import from template.py, cleaning up the stainless config to remove references to deleted eval/scoring/dataset endpoints, removing the now-orphaned nvidia datasetio test, and regenerating all OpenAPI specs and distribution configs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Resolve merge conflicts from upstream/main. All modify/delete conflicts are resolved by keeping the deletions from this branch since the purpose of this branch is to remove the eval, scoring, datasetio, and benchmarks APIs. Content conflicts in datatypes.py files are resolved by excluding the eval-related classes that upstream added. Signed-off-by: Sébastien Han <seb@redhat.com>
Merge upstream/main into remove-eval-api branch. The Dell distribution was removed upstream, so the conflicting Dell files (config.yaml, dell.py, run-with-safety.yaml) are resolved by accepting the upstream deletion. Signed-off-by: Sebastien Han <seb@redhat.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Merge upstream/main into remove-eval-api branch. The only conflict was in ifeval_utils.py which was modified upstream but deleted in this branch as part of the eval API removal. Kept the deletion since this branch removes the eval, scoring, datasetio, and benchmarks APIs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
These ifeval utility files were reintroduced during the merge with upstream/main. They belong to the scoring provider which was removed as part of the eval API removal in this branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Merge upstream/main into remove-eval-api branch. All conflicts were modify/delete type where upstream modified files that this branch intentionally deleted as part of removing the Eval, DatasetIO, Scoring, and Benchmarks APIs. Resolved by keeping the deletions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Merge upstream/main into remove-eval-api branch, resolving conflicts in the deprecated OpenAPI spec file by keeping the branch version which excludes eval-related deprecated endpoints (scoring functions, datasets, benchmarks) consistent with the purpose of this branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
The pre-commit OpenAPI codegen hook updated the deprecated spec file to match the current state of the codebase after merging upstream/main. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Replace benchmark_id tests with vector_store_id and shield_id since benchmark_id was removed from RESOURCE_ID_FIELDS with the eval API. Remove braintrust lazy import test since the braintrust scoring provider was removed along with the eval API. Switch external provider CI test from lmeval (which depends on eval API) to the weather/kaze external provider which has no eval dependency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
The starter-gpu distribution was removed in PR llamastack#5279 but survived the merge into this branch. Remove the leftover directory so CI no longer attempts to build it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Take upstream's auto-discovery approach for router factories which dynamically discovers routers from the Api enum. Since this branch already removed eval-related APIs from the Api enum, the auto-discovery naturally excludes them without needing an explicit static list. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
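The auto-discovery approach described above can be sketched as follows. This is a minimal illustration with hypothetical names (`discover_routers`, the `<api>_router` naming convention), not the actual llama-stack implementation; the point is only that removing members from the `Api` enum removes them from discovery with no static list to maintain.

```python
# Hypothetical sketch of enum-driven router auto-discovery.
from enum import Enum


class Api(Enum):
    # Eval-related members (eval, scoring, datasetio, benchmarks) no longer
    # exist on the enum, so discovery never sees them.
    inference = "inference"
    safety = "safety"
    vector_io = "vector_io"


def inference_router():
    return "inference-router"


def safety_router():
    return "safety-router"


def discover_routers(namespace):
    """Collect '<api>_router' factories for every member of the Api enum."""
    routers = {}
    for api in Api:
        factory = namespace.get(f"{api.value}_router")
        if factory is not None:
            routers[api] = factory
    return routers


routers = discover_routers(globals())
print(sorted(api.value for api in routers))  # → ['inference', 'safety']
```

Because `vector_io` has no matching factory in this sketch, it is simply skipped, mirroring how the removed eval APIs drop out naturally.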
The lmeval external provider depends on the eval API which is being removed. The weather/kaze provider works but needs further validation. Disable the workflow until we find a suitable external provider for CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
The ClientVersionMiddleware rejects requests when the PyPI client version doesn't match the dev server version, causing false failures on every PR in the merge queue. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
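The kind of check involved can be sketched as below. This is a hedged illustration of why a released PyPI client fails against a dev server, not the actual `ClientVersionMiddleware` logic, whose comparison policy may differ:

```python
# Illustrative version-compatibility check: treat client and server as
# compatible when their major.minor components match. A PyPI client
# (e.g. 0.2.x) then fails against a dev server (e.g. 0.3.0.devNNN).
def versions_compatible(client_version: str, server_version: str) -> bool:
    def major_minor(version: str) -> tuple:
        parts = version.split(".")
        return (parts[0], parts[1] if len(parts) > 1 else "0")

    return major_minor(client_version) == major_minor(server_version)


print(versions_compatible("0.2.9", "0.2.14"))            # → True
print(versions_compatible("0.2.9", "0.3.0.dev20240101")) # → False
```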
… links (llamastack#5411)

## Summary

Comprehensive documentation fix addressing 12 issues found by cross-referencing docs against the current codebase. The docs had drifted significantly from the code, particularly around the agents-to-responses migration, provider registry changes, and API stability levels.

- **API overview**: add missing APIs (Responses, Shields, Tools, Connectors, File Processors); fix stability levels (Datasets/DatasetIO are v1beta, not v1alpha; Shields are stable, not deprecated)
- **Sidebar navigation**: remove Post Training category and sambanova-openai-compat (no longer in registry); add missing provider pages (responses, file_processors, inline_transformers, remote_oci, remote_llama-cpp-server, remote_passthrough, remote_openai)
- **Stale provider pages**: delete docs for removed providers (remote_llama-cpp, remote_llama-server, remote_sambanova-openai-compat, inline_rag-runtime)
- **Tools docs**: fix false claim that /v1/tools was removed; update RAG references from inline::rag-runtime to inline::file-search
- **Configuration docs**: fix CLI syntax (positional arg, not --config); update sample config from agents to responses
- **Kubernetes configs**: update from agents to responses; replace inline::rag-runtime with inline::file-search
- **Experimental APIs page**: rewrite to reflect actual v1alpha/v1beta APIs; remove stale 2024 roadmap
- **Broken links**: fix relative paths in concepts/index, CLI reference, file operations docs
- **CLI reference**: close unclosed code block
- **SDK reference**: add deprecation notes for Agents and PostTraining sections
- **Missing notebook**: remove link to non-existent Llama_Stack_Benchmark_Evals.ipynb

> **Note**: `docs/docs/api-overview.mdx` overlaps with llamastack#5410. If that PR merges first, a trivial conflict resolution will be needed on that file (both PRs make the same type of corrections).
## Test plan

- [ ] Run `npx docusaurus build` in docs/ to verify no broken links or build errors
- [ ] Verify sidebar navigation shows all current providers
- [ ] Verify stale provider pages are removed
- [ ] Spot-check internal links resolve correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# What does this PR do?

Addresses a DoS due to too many multipart parts.

Refs: [GHSA-qjxf-f2mg-c6mc](https://github.com/advisories/GHSA-qjxf-f2mg-c6mc)

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
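In spirit, this class of fix caps the number of multipart parts before doing any per-part work, so an attacker cannot force unbounded processing. A hypothetical sketch (not the actual patch, which lives in the multipart parsing dependency):

```python
# Illustrative mitigation for a "too many multipart parts" DoS:
# refuse oversized bodies up front instead of parsing every part.
MAX_PARTS = 1000


def split_multipart(body: bytes, boundary: bytes, max_parts: int = MAX_PARTS):
    """Split a multipart body on its boundary, rejecting oversized inputs."""
    delimiter = b"--" + boundary
    chunks = body.split(delimiter)
    # The first and last chunks are the preamble/epilogue, not real parts.
    if len(chunks) - 2 > max_parts:
        raise ValueError(f"too many multipart parts (> {max_parts})")
    return chunks[1:-1]


body = b"--B\r\nx\r\n--B\r\ny\r\n--B--"
print(len(split_multipart(body, b"B")))  # → 2
```

Real servers expose this as a parser limit (for example, maximum file/field counts on form parsing) rather than a hand-rolled split.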
…ack#5423)

## Summary

Major documentation cleanup and modernization across ~35 files, net -970 lines.

### New features

- **`[starter]` pip extra** in pyproject.toml enabling `uvx --from 'llama-stack[starter]' llama stack run starter`
- **InstallBlock component** on the landing page with Ollama/OpenAI tabs, syntax highlighting, and a copy button
- **Lucide-style sidebar icons** for provider sub-categories (Inference, Safety, Vector IO, etc.)

### Content updates

- Rewrite quickstart: `uvx` for experimenting, `uv init` for projects
- Rewrite detailed tutorial: all examples use the standard OpenAI SDK
- Update all `agents` references to `responses` across distro configs and provider tables
- Simplify concepts: architecture matches ARCHITECTURE.md; APIs page focuses on integration patterns
- Rewrite Responses vs Agents page: Responses is primary, Agents is legacy
- Clean up provider overview page
- Fix broken tables, dead links, Sphinx toctree remnants

### Removed

- iOS/Android SDK pages (untested, unmaintained)
- Post Training provider docs (API removed)
- On-Device Distributions sidebar section
- Duplicate content across quickstart/tutorial/starting-server pages

## Test plan

- [ ] `cd docs && npm install && npx docusaurus build` passes
- [ ] Light and dark modes work
- [ ] All sidebar links resolve
- [ ] InstallBlock copy button works
- [ ] API overview card links point to correct generated pages

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove eval/scoring categories from log.py
- Remove stale entries from routing_tables, providers, registry READMEs
- Fix open-benchmark docstring referencing removed APIs
- Remove evaluation.md and scoring.md docs (APIs no longer exist)
- Remove scoring sidebar entries
- Regenerate OpenAPI specs to clean up empty tags

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Summary
Remove the Eval API and all connected APIs that are already marked deprecated in the spec:
- /v1alpha/eval/ (benchmarks, jobs, evaluations)
- /v1beta/datasets/ (dataset CRUD)
- /v1alpha/scoring/ (scoring functions, scoring)

These were marked experimental/deprecated and are not part of the v1 API surface.
What changed
Migration
Users of the eval API should use external evaluation frameworks (eval-hub, RAGAS, DeepEval) directly. The datasetio provider for HuggingFace datasets can be accessed via the HuggingFace `datasets` library.

Test plan
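The HuggingFace migration noted above can be sketched with a small helper. Everything here is hypothetical: the `huggingface://datasets/...` URI scheme and the `to_load_dataset_args` helper are illustrative, not the removed provider's actual reference format.

```python
# Hypothetical helper mapping an old-style dataset URI to the arguments
# of datasets.load_dataset() from the HuggingFace `datasets` library.
from urllib.parse import urlparse, parse_qs


def to_load_dataset_args(uri: str) -> tuple:
    """Return (path, split) suitable for datasets.load_dataset(path, split=split)."""
    parsed = urlparse(uri)
    if parsed.scheme != "huggingface":
        raise ValueError(f"not a huggingface dataset URI: {uri}")
    # netloc is "datasets"; path is "/<org>/<name>"
    path = parsed.path.lstrip("/")
    split = parse_qs(parsed.query).get("split", ["train"])[0]
    return path, split


path, split = to_load_dataset_args("huggingface://datasets/tatsu-lab/alpaca?split=train")
print(path, split)  # → tatsu-lab/alpaca train

# Then, entirely outside Llama Stack:
#   from datasets import load_dataset
#   ds = load_dataset(path, split=split)
```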
🤖 Generated with Claude Code