Cross-tool efficiency: per-corpus reference artifact + docs by Mathews-Tom · Pull Request #367 · Mathews-Tom/archex

Mathews-Tom · 2026-06-23T01:26:00Z

Summary

Generates and checks in the per-corpus cross-tool efficiency reference artifact and documents the number. Builds on #366 (the tokens-at-fixed-recall metric).

What this adds

benchmarks/cross-tool-efficiency/cross-tool-comparison.json: the reference artifact, produced from a clean run of archex benchmark cross-tool over the benchmark task set. Metric values are not hand-edited.
docs/LOCAL_METRICS.md: a "Cross-tool efficiency (offline benchmark)" section citing the artifact and its per-corpus figures, stating the number is offline benchmark-only and never enters the in-process ledger or archex metrics summary.
A test asserting the checked-in artifact parses, grades the localization family as its own corpus, holds recall equal in every scored comparison, and renders.
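
The structural checks that test describes could look roughly like the sketch below. The real schema of cross-tool-comparison.json is not shown in this PR, so the artifact shape and every field name here (`corpora`, `comparisons`, `task`, `archex_recall`, `naive_recall`) are hypothetical, as is the helper `check_artifact`. The rendering check via format_cross_tool_comparison is left out because its signature is not given.

```python
import json

# HYPOTHETICAL artifact shape: all field names are assumptions for illustration,
# not the actual schema of cross-tool-comparison.json.
SAMPLE = json.loads("""
{
  "corpora": {
    "external-comprehension": {"comparisons": [
      {"task": "c1", "archex_recall": 1.0, "naive_recall": 1.0}
    ]},
    "external-localization": {"comparisons": [
      {"task": "l1", "archex_recall": 1.0, "naive_recall": 1.0}
    ]}
  }
}
""")


def check_artifact(artifact: dict) -> None:
    """Raise AssertionError if the (hypothetical) artifact breaks an invariant."""
    corpora = artifact["corpora"]

    # Localization must be graded as its own corpus, disjoint from comprehension.
    assert "external-localization" in corpora
    loc = {c["task"] for c in corpora["external-localization"]["comparisons"]}
    comp = {c["task"] for c in corpora["external-comprehension"]["comparisons"]}
    assert loc.isdisjoint(comp)

    # Every scored comparison must hold recall equal between the two paths.
    for body in corpora.values():
        for c in body["comparisons"]:
            assert c["archex_recall"] == c["naive_recall"]


check_artifact(SAMPLE)  # passes silently on the sample above
```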

Per-corpus reference values (tokens at 100% required-file recall)

Corpus	Naive model	Comparable tasks	archex tokens	Naive tokens	Reduction
self	full_file	16/24	9,484	4,416,681	99.8%
self	grep_window	16/24	9,484	2,626,845	99.6%
external-comprehension	full_file	16/19	22,681	783,725	97.1%
external-comprehension	grep_window	16/19	22,681	492,119	95.4%
external-localization	full_file	20/21	13,247	469,836	97.2%
external-localization	grep_window	20/21	13,247	408,410	96.8%
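
The Reduction column can be re-derived from the two token columns. The sketch below assumes the reduction is computed from the aggregate token totals as (1 - archex / naive), which is consistent with every row of the table. The helper name `reduction_pct` is illustrative and not part of archex.

```python
# Token counts copied from the table above:
# (corpus, naive model) -> (archex tokens, naive tokens, reported reduction %)
ROWS = {
    ("self", "full_file"): (9_484, 4_416_681, 99.8),
    ("self", "grep_window"): (9_484, 2_626_845, 99.6),
    ("external-comprehension", "full_file"): (22_681, 783_725, 97.1),
    ("external-comprehension", "grep_window"): (22_681, 492_119, 95.4),
    ("external-localization", "full_file"): (13_247, 469_836, 97.2),
    ("external-localization", "grep_window"): (13_247, 408_410, 96.8),
}


def reduction_pct(archex_tokens: int, naive_tokens: int) -> float:
    """Percent of naive tokens saved: (1 - archex / naive) * 100."""
    return (1 - archex_tokens / naive_tokens) * 100


for key, (archex, naive, reported) in ROWS.items():
    computed = round(reduction_pct(archex, naive), 1)
    # Each recomputed figure should match the table to one decimal place.
    assert computed == reported, (key, computed, reported)
```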

"Comparable" counts only tasks where both paths reach 100% required-file recall, so no figure compares unequal recall. Localization is graded as its own corpus, never merged with comprehension.

Artifact location

.archex/ is gitignored repo-wide (and by a user-global ignore), so the committed reference artifact lives under the tracked benchmarks/ tree rather than punching negation holes in the ignored workspace dir. The archex benchmark cross-tool default output stays .archex/cross-tool-efficiency for ephemeral ad-hoc runs.

Out of scope (unchanged)

No in-process metrics ledger, metrics reporter, retrieval ranking, or default change.

Stack

Stack-Id: cross-tool-efficiency-cfdfb5
Base: feat/cross-tool-token-model
Position: 2/2

feat/cross-tool-token-model -> Cross-tool token-efficiency: naive baseline + tokens-at-fixed-recall metric #366
feat/cross-tool-artifact-docs -> this PR

Depends on: #366

Validation

uv run ruff check / ruff format --check / uv run pyright on changed Python — pass
uv run pytest tests/benchmark/test_cross_tool.py tests/benchmark/test_reporter.py --no-cov — 80 passed
Doc figures verified byte-for-byte against the checked-in artifact aggregates

Generated by `archex benchmark cross-tool --tasks-dir benchmarks/tasks --output benchmarks/cross-tool-efficiency` over the benchmark task set. Aggregates tokens-at-100%-required-file-recall per corpus (self, external-comprehension, external-localization graded separately), archex_query vs naive full-file and grep-window reads, recall held equal. Tracked under benchmarks/ (the .archex/ default output stays ephemeral/ignored).
Stack-Id: cross-tool-efficiency-cfdfb5
Stack-Position: 2/2

Assert the reference artifact parses, grades the localization family as its own corpus (disjoint from comprehension), holds recall equal in every scored comparison, and renders via format_cross_tool_comparison.
Stack-Id: cross-tool-efficiency-cfdfb5
Stack-Position: 2/2

Add a cross-tool efficiency section to docs/LOCAL_METRICS.md citing the checked-in artifact and its per-corpus reduction figures, stating the number is offline benchmark-only and never enters the in-process metrics ledger or summary.
Stack-Id: cross-tool-efficiency-cfdfb5
Stack-Position: 2/2

The artifact shows every excluded (non-comparable) task is an archex recall miss, not a naive miss: the naive grep/read path reaches full recall on every comparable task. Rewrite the caveat to state this honestly and note the reduction is conditioned on archex fully localizing the task, so it never reads as 'archex always localizes cheaper'.
Stack-Id: cross-tool-efficiency-cfdfb5
Stack-Position: 2/2
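
To make that conditioning concrete, the Comparable column of the reference table gives the share of tasks each reduction figure actually covers. The sketch below only restates those counts. The name `coverage` is illustrative, and reading the excluded tasks as archex recall misses follows the commit message above rather than anything computed here.

```python
# (comparable tasks, total tasks) per corpus, copied from the Comparable column.
COMPARABLE = {
    "self": (16, 24),
    "external-comprehension": (16, 19),
    "external-localization": (20, 21),
}


def coverage(comparable: int, total: int) -> float:
    """Fraction of tasks that enter the tokens-at-fixed-recall comparison."""
    return comparable / total


for corpus, (comparable, total) in COMPARABLE.items():
    excluded = total - comparable  # tasks left out of the reduction figure
    print(f"{corpus}: {excluded} excluded, {coverage(comparable, total):.1%} comparable")
```

Under this reading, the reduction for the self corpus describes two thirds of its tasks, while the external-localization figure covers all but one.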

Mathews-Tom force-pushed the feat/cross-tool-artifact-docs branch from 7040125 to 5de743c June 23, 2026 01:46

Mathews-Tom added 4 commits June 23, 2026 07:27

Mathews-Tom force-pushed the feat/cross-tool-artifact-docs branch from 5de743c to f69874f June 23, 2026 01:57

Mathews-Tom changed the base branch from feat/cross-tool-token-model to main June 23, 2026 01:57

Mathews-Tom merged commit 85e86f0 into main Jun 23, 2026
6 checks passed

Mathews-Tom deleted the feat/cross-tool-artifact-docs branch June 23, 2026 02:02

Mathews-Tom mentioned this pull request Jun 23, 2026

chore(release): v0.15.0 #368

Merged
