feat(search): wave 4 R1+R2: aggregated llmUsage + per-leg max_tokens (v0.11.0) #64
Merged
R1 — aggregate llmUsage across answer + summaries.
- crw-core::types::LlmUsage: add executed_summaries (u32) and
answer_executed (bool) counters. SaaS 5-branch fail-closed
dispatch keys off these to disambiguate "no work" from
"work but missing telemetry".
- search_inner: build a single aggregated LlmUsage at every
return path once LLM mode is attempted (R1 always-present
invariant). Summary fan-out usages + answer usage merge via
saturating_add; provider/model fall back to the leg's config
when no per-call usage was reported.
- attach_result_summaries returns Vec<Option<LlmUsage>> alongside
the ok/failed count.
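
The R1 merge described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `LlmUsage` struct here is a simplified, hypothetical stand-in for `crw-core::types::LlmUsage` (cache counters omitted, field types assumed), `aggregate_usage` is a made-up helper name, and it assumes each entry in the summaries slice corresponds to one executed summary call, with `None` meaning the call ran but reported no usage.

```rust
/// Hypothetical, simplified stand-in for crw-core's LlmUsage.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct LlmUsage {
    pub provider: String,
    pub model: String,
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub executed_summaries: u32,
    pub answer_executed: bool,
}

/// Merge per-summary usages and the optional answer usage into one object.
/// Assumption: one slice entry per executed summary call.
pub fn aggregate_usage(
    leg_provider: &str,
    leg_model: &str,
    summaries: &[Option<LlmUsage>],
    answer: Option<LlmUsage>,
    answer_executed: bool,
) -> LlmUsage {
    let mut total = LlmUsage {
        // Counters record work done, independent of whether telemetry exists.
        executed_summaries: summaries.len().min(u32::MAX as usize) as u32,
        answer_executed,
        ..LlmUsage::default()
    };

    // Walk every usage that was actually reported (summaries, then answer).
    for usage in summaries.iter().flatten().chain(answer.iter()) {
        // saturating_add clamps at u32::MAX instead of wrapping or panicking.
        total.input_tokens = total.input_tokens.saturating_add(usage.input_tokens);
        total.output_tokens = total.output_tokens.saturating_add(usage.output_tokens);
        if total.provider.is_empty() {
            total.provider = usage.provider.clone();
            total.model = usage.model.clone();
        }
    }

    // No per-call usage reported: fall back to the leg's configuration.
    if total.provider.is_empty() {
        total.provider = leg_provider.to_string();
        total.model = leg_model.to_string();
    }
    total
}
```

With this shape, a consumer can tell "no work" (both counters zero or false) apart from "work but missing telemetry" (counters set, token totals zero), which is the distinction the SaaS dispatch is described as relying on.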
R2 — per-leg max_tokens cap.
- SEARCH_LLM_MAX_TOKENS_PER_LEG = 1024 applied via
cfg.clone().max_tokens.min(...). Mirror in
crw-saas/src/lib/llm-pricing.ts::legCost so engine cost can
never exceed the SaaS pre-reservation.
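
A rough sketch of the R2 cap, for illustration only. The constant name and value come from this commit message; the `LlmConfig` struct, its `max_tokens: u32` field type, and the `capped_for_search` helper are assumptions made for the example and may not match the real engine types.

```rust
/// Per-leg ceiling named in this commit message.
pub const SEARCH_LLM_MAX_TOKENS_PER_LEG: u32 = 1024;

/// Hypothetical leg configuration; the real config type may differ
/// (for example, max_tokens could be optional).
#[derive(Debug, Clone, PartialEq)]
pub struct LlmConfig {
    pub model: String,
    pub max_tokens: u32,
}

/// Return a copy of the leg config whose max_tokens never exceeds the cap.
/// Values already below the cap are left unchanged.
pub fn capped_for_search(cfg: &LlmConfig) -> LlmConfig {
    let mut capped = cfg.clone();
    capped.max_tokens = capped.max_tokens.min(SEARCH_LLM_MAX_TOKENS_PER_LEG);
    capped
}
```

Cloning before clamping leaves the shared configuration untouched, so other callers of the same leg keep their original limit. The pricing side would need to apply the same ceiling for the pre-reservation bound to hold.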
R3 — workspace version bump 0.10.0 -> 0.11.0 across all crates and
path-dep version pins. CARGO_PKG_VERSION flows into /health JSON
automatically; SaaS REQUIRED_ENGINE_VERSION should target "0.11.0".
Tests: cargo test -p crw-server -p crw-core --lib (80 passed),
cargo test -p crw-extract --lib (113 passed). clippy clean.
Acceptance test guarding the Wave 4 R1 invariant: /v1/search returns a populated llmUsage object (counters zero, provider/model from leg config) even when SearXNG yields zero results and no LLM call runs. Adds a sibling test app helper that wires the [extraction.llm] leg.
Cargo workspace.package.version was bumped to 0.11.0 (v0.11.0 tag) but the python/npm/server version surfaces were left at 0.10.0, failing the release preflight. Bump them all to match so the engine release can ship.
Resolved Cargo.toml conflicts: kept main's new crw-diff + crw-monitor path-deps (Monitor feature) and bumped ALL internal path-dep versions to 0.11.0 to match the workspace bump. crw-core/types.rs/search.rs auto-merged. cargo build (25 crates) OK; search_route 10 tests pass; preflight OK (0.11.0, 10 published crates).
Summary
Wave 4 engine support for managed `/v1/search` dynamic pricing (consumed by crw-saas).
- `/v1/search` always returns an aggregated `llmUsage` object (provider, model, executedSummaries, answerExecuted, input/output tokens, cache counters). It is present even when there are zero results or no LLM call ran, with provider/model falling back to the configured leg.
- Per-leg `max_tokens` cap.
- `search_llm_usage_always_present_on_zero_results` guards the R1 invariant (commit `285d113`); regression-verified (fails if `llm_attempted` is removed).
- `cargo test -p crw-server --test search_route` → 10 pass; clippy clean.

Tagged v0.11.0. crw-saas managed pricing requires this deployed to all engine pods before its `SEARCH_DYNAMIC_PRICING` flip.