feat(search): wave 4 R1+R2 — aggregated llmUsage + per-leg max_tokens (v0.11.0)#64

Merged
us merged 4 commits into main from feat/search-wave-4-r1-r2-r3 on May 30, 2026

@us us commented May 30, 2026

Summary

Wave 4 engine support for managed /v1/search dynamic pricing (consumed by crw-saas).

  • R1: /v1/search always returns an aggregated llmUsage object (provider, model, executedSummaries, answerExecuted, input/output tokens, cache counters). The object is present even when the search yields zero results or no LLM call runs; in that case provider and model fall back to the configured leg.
  • R2: per-leg max_tokens cap (SEARCH_LLM_MAX_TOKENS_PER_LEG = 1024).
  • Acceptance test: search_llm_usage_always_present_on_zero_results guards the R1 invariant (commit 285d113). It is regression-verified: the test fails if llm_attempted is removed. cargo test -p crw-server --test search_route → 10 pass; clippy clean.

Tagged v0.11.0. crw-saas managed pricing requires this version to be deployed to all engine pods before the SEARCH_DYNAMIC_PRICING flip.

us added 2 commits May 29, 2026 14:49
R1 — aggregate llmUsage across answer + summaries.
  - crw-core::types::LlmUsage: add executed_summaries (u32) and
    answer_executed (bool) counters. SaaS 5-branch fail-closed
    dispatch keys off these to disambiguate "no work" from
    "work but missing telemetry".
  - search_inner: build a single aggregated LlmUsage at every
    return path once LLM mode is attempted (R1 always-present
    invariant). Summary fan-out usages + answer usage merge via
    saturating_add; provider/model fall back to the leg's config
    when no per-call usage was reported.
  - attach_result_summaries returns Vec<Option<LlmUsage>> alongside
    the ok/failed count.
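The R1 merge described above can be sketched roughly as follows. This is a minimal illustration, not the actual search_inner code: the struct is a trimmed-down stand-in for crw-core's LlmUsage (cache counters omitted), the token field names and widths are assumptions, and aggregate_usage is a hypothetical helper. It also assumes each entry in the summaries slice corresponds to one executed summary, with None meaning the call ran but reported no usage.

```rust
/// Hypothetical, trimmed-down stand-in for crw-core's LlmUsage.
/// Cache counters are omitted; token field names are assumptions.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct LlmUsage {
    pub provider: String,
    pub model: String,
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub executed_summaries: u32,
    pub answer_executed: bool,
}

/// Merge per-summary usages and the optional answer usage into one record.
/// `leg_provider` / `leg_model` are the configured fallbacks used when no
/// call reported usage (for example on zero results).
pub fn aggregate_usage(
    summaries: &[Option<LlmUsage>],
    answer_executed: bool,
    answer: Option<&LlmUsage>,
    leg_provider: &str,
    leg_model: &str,
) -> LlmUsage {
    let mut total = LlmUsage {
        // Assumption: one slice entry per executed summary.
        executed_summaries: u32::try_from(summaries.len()).unwrap_or(u32::MAX),
        answer_executed,
        ..LlmUsage::default()
    };

    // Walk every reported usage: summaries first, then the answer leg.
    for usage in summaries.iter().flatten().chain(answer) {
        // saturating_add clamps at u32::MAX instead of wrapping or panicking.
        total.input_tokens = total.input_tokens.saturating_add(usage.input_tokens);
        total.output_tokens = total.output_tokens.saturating_add(usage.output_tokens);
        if total.provider.is_empty() {
            total.provider = usage.provider.clone();
            total.model = usage.model.clone();
        }
    }

    // No per-call usage was reported: fall back to the leg's config.
    if total.provider.is_empty() {
        total.provider = leg_provider.to_string();
        total.model = leg_model.to_string();
    }
    total
}
```

Calling aggregate_usage(&[], false, None, "cfg-provider", "cfg-model") yields zero counters with the configured provider and model, which is the shape the always-present invariant asks for.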

R2 — per-leg max_tokens cap.
  - SEARCH_LLM_MAX_TOKENS_PER_LEG = 1024 applied via
    cfg.clone().max_tokens.min(...). Mirror in
    crw-saas/src/lib/llm-pricing.ts::legCost so engine cost can
    never exceed the SaaS pre-reservation.
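A minimal sketch of the per-leg cap, assuming max_tokens is a plain u32 field. LlmConfig here is a hypothetical stand-in for the real leg config and capped_for_leg is an illustrative helper name; only the constant name and its value come from the commit message.

```rust
/// Cap from the commit message; the u32 type is an assumption.
pub const SEARCH_LLM_MAX_TOKENS_PER_LEG: u32 = 1024;

/// Hypothetical stand-in for the leg's LLM config.
#[derive(Debug, Clone, PartialEq)]
pub struct LlmConfig {
    pub model: String,
    pub max_tokens: u32,
}

/// Return a copy of `cfg` whose max_tokens never exceeds the per-leg cap.
/// Values already below the cap are left unchanged.
pub fn capped_for_leg(cfg: &LlmConfig) -> LlmConfig {
    let mut leg_cfg = cfg.clone();
    leg_cfg.max_tokens = leg_cfg.max_tokens.min(SEARCH_LLM_MAX_TOKENS_PER_LEG);
    leg_cfg
}
```

Because the cap is applied to a clone, the shared config stays untouched and each leg gets its own bounded copy. This bound is what allows a pre-reservation computed from the same constant to act as an upper limit on output tokens per leg.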

R3 — workspace version bump 0.10.0 -> 0.11.0 across all crates and
path-dep version pins. CARGO_PKG_VERSION flows into /health JSON
automatically; SaaS REQUIRED_ENGINE_VERSION should target "0.11.0".
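One way the crate version can reach a health payload is sketched below. This is only an illustration: the function names and the "status" field are assumptions, not the actual /health handler. option_env! is used instead of env! so the sketch also compiles outside Cargo, where CARGO_PKG_VERSION is not set.

```rust
/// Version string baked in at compile time. Under Cargo this is the crate's
/// package version; outside Cargo the sketch falls back to "unknown".
pub fn engine_version() -> &'static str {
    option_env!("CARGO_PKG_VERSION").unwrap_or("unknown")
}

/// Hypothetical health body; the "status" field is an assumption.
pub fn health_json() -> String {
    format!(
        "{{\"status\":\"ok\",\"version\":\"{}\"}}",
        engine_version()
    )
}
```

Since the value is resolved at compile time, bumping workspace.package.version and rebuilding is enough for the reported version to change.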

Tests: cargo test -p crw-server -p crw-core --lib (80 passed),
cargo test -p crw-extract --lib (113 passed). clippy clean.

Acceptance test guarding the Wave 4 R1 invariant: /v1/search returns a
populated llmUsage object (counters zero, provider/model from leg
config) even when SearXNG yields zero results and no LLM call runs.
Adds a sibling test app helper that wires the [extraction.llm] leg.
Copilot AI review requested due to automatic review settings May 30, 2026 11:57
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

us added 2 commits May 30, 2026 15:11
Cargo workspace.package.version was bumped to 0.11.0 (v0.11.0 tag) but
the python/npm/server version surfaces were left at 0.10.0, failing the
release preflight. Bump them all to match so the engine release can ship.

Resolved Cargo.toml conflicts: kept main's new crw-diff + crw-monitor
path-deps (Monitor feature) and bumped ALL internal path-dep versions to
0.11.0 to match the workspace bump. crw-core/types.rs/search.rs auto-merged.
cargo build (25 crates) OK; search_route 10 tests pass; preflight OK
(0.11.0, 10 published crates).
@us us merged commit 3570f82 into main May 30, 2026
2 checks passed