feat: Fuzzy/cross-field search + exact total hits #103
- stabilize nested sampling hash with fixed-width object index
- validate nested agg paths against declared nested schema tree
- share scoped path helpers between validation and aggregation execution
- guard nested fast-field object lookups to the current doc range
- remove nested finalize clone and add regression coverage
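One way to read "fixed-width object index" in the first item is that the per-object sampling hash should not depend on the platform width of `usize`. The sketch below illustrates that idea only; it is not the PR's code. The function names, the seed parameter, and the choice of FNV-1a are all assumptions made for the example.

```rust
// FNV-1a 64-bit constants (standard values).
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold `bytes` into `hash` using FNV-1a.
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical per-object sampling hash.
/// The object index is widened to u64 and encoded as 8 little-endian bytes,
/// so 32-bit and 64-bit builds feed identical bytes into the hash.
fn object_sample_hash(seed: u64, doc_id: u32, object_index: usize) -> u64 {
    let mut h = fnv1a(FNV_OFFSET, &seed.to_le_bytes());
    h = fnv1a(h, &doc_id.to_le_bytes());
    fnv1a(h, &(object_index as u64).to_le_bytes())
}

/// Hypothetical sampling decision: keep roughly `keep_per_thousand` / 1000 of objects.
fn keep_object(seed: u64, doc_id: u32, object_index: usize, keep_per_thousand: u64) -> bool {
    object_sample_hash(seed, doc_id, object_index) % 1000 < keep_per_thousand
}
```

Because the decision depends only on the seed, the document id, and the widened index, repeated runs over the same segment would select the same nested objects.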
- reject descendant terms fields inside nested scopes
- require nested-in-nested paths to be direct children of current scope
- add regression tests for both validation errors
- block nested keyword paths at root scope for terms/collapse
- block nested numeric paths at root scope for numeric aggs
- add regression tests for root terms, stats, and collapse
- resolve nested paths via schema tree instead of dot-splitting
- validate direct-child nested scope checks against schema structure
- add regression coverage for dotted nested leaf and object names
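The motivation for resolving against the schema tree can be shown with a small sketch. If a nested object declares a leaf whose own name contains a dot (for example `author.name`), splitting `comments.author.name` on `.` yields three segments and the lookup fails, while a walk over declared names finds the two real segments. The `Node` type and `resolve` function here are hypothetical and are not Searchlite's actual schema types.

```rust
use std::collections::BTreeMap;

/// Hypothetical schema node: a leaf field, or a nested object with named children.
#[derive(Debug)]
enum Node {
    Leaf,
    Nested(BTreeMap<String, Node>),
}

/// Resolve `path` by matching declared child names, which may themselves contain dots.
/// Returns the declared segments, or None if the path is not in the schema.
fn resolve<'a>(children: &'a BTreeMap<String, Node>, path: &str) -> Option<Vec<&'a str>> {
    for (name, node) in children {
        // Exact match: the whole remaining path is one declared name.
        if path == name.as_str() {
            return Some(vec![name.as_str()]);
        }
        // Otherwise the declared name must be followed by a '.' separator.
        let rest = match path.strip_prefix(name.as_str()).and_then(|r| r.strip_prefix('.')) {
            Some(rest) => rest,
            None => continue,
        };
        // Only nested objects can have descendants; try the next candidate on failure.
        if let Node::Nested(next) = node {
            if let Some(mut tail) = resolve(next, rest) {
                let mut segments = vec![name.as_str()];
                segments.append(&mut tail);
                return Some(segments);
            }
        }
    }
    None
}
```

With a schema where `comments` is nested and declares a leaf named `author.name`, `resolve(&root, "comments.author.name")` yields the segments `comments` and `author.name`, whereas `comments.author` is rejected because no such field is declared.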
Pull request overview
This PR hardens and completes issue #91 by extending Searchlite’s query and aggregation capabilities (multi_match fuzziness + cross_fields scoring, exact hit counting toggle, and nested aggregations) and adding broad test/bench/doc coverage to lock in the behavior across core, HTTP, CLI, and FFI.
Changes:
- Add `multi_match.fuzziness` (`AUTO` + explicit edits) and improve `cross_fields` scoring behavior.
- Add `track_total_hits` request flag to force exact total hit counting (disabling score-only fast paths).
- Implement nested aggregations with scoped path resolution/validation, nested fastfield access, plus extensive tests, benches, schema/docs updates, and feature-hardening tooling.
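As a rough illustration of how an `AUTO` fuzziness setting differs from an explicit edit count, the sketch below maps a term to a maximum edit distance. The PR overview does not state Searchlite's thresholds, so the length cut-offs (borrowed from the common Elasticsearch-style convention), the cap of 2 explicit edits, and the `Fuzziness`, `max_edits`, and `parse_fuzziness` names are all assumptions.

```rust
/// Hypothetical stand-in for a fuzziness setting: AUTO or an explicit edit count.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fuzziness {
    Auto,
    Edits(u8),
}

/// Maximum edit distance allowed for `term`.
/// Assumed AUTO rule: 0 edits up to 2 chars, 1 edit for 3..=5 chars, 2 edits beyond that.
fn max_edits(fuzziness: Fuzziness, term: &str) -> u8 {
    match fuzziness {
        Fuzziness::Edits(n) => n,
        Fuzziness::Auto => match term.chars().count() {
            0..=2 => 0,
            3..=5 => 1,
            _ => 2,
        },
    }
}

/// Parse a request value such as "AUTO" or "1"; anything else is rejected.
/// The upper bound of 2 explicit edits is an assumption for this sketch.
fn parse_fuzziness(raw: &str) -> Result<Fuzziness, String> {
    if raw.eq_ignore_ascii_case("auto") {
        return Ok(Fuzziness::Auto);
    }
    match raw.parse::<u8>() {
        Ok(n) if n <= 2 => Ok(Fuzziness::Edits(n)),
        _ => Err(format!("invalid fuzziness: {raw}")),
    }
}
```

Rejecting unknown values at parse time, rather than silently falling back to exact matching, matches the overview's mention of test coverage for invalid fuzziness requests.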
Reviewed changes
Copilot reviewed 38 out of 38 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| searchlite-http/src/lib.rs | Updates HTTP tests to include track_total_hits and adds coverage for nested aggs + invalid fuzziness requests. |
| searchlite-ffi/src/lib.rs | Plumbs track_total_hits through the FFI search request construction. |
| searchlite-core/tests/vector_search.rs | Updates vector search fixtures for the new track_total_hits request field. |
| searchlite-core/tests/sorting.rs | Updates sorting tests’ request literals to include track_total_hits. |
| searchlite-core/tests/smoke.rs | Adds exact-hit counting coverage for WAND (track_total_hits=true) and updates request literals. |
| searchlite-core/tests/regressions.rs | Updates regression test request literals to include track_total_hits. |
| searchlite-core/tests/query_ast.rs | Adds serde/AST coverage for multi_match.fuzziness parsing and validation behavior. |
| searchlite-core/tests/pruning.rs | Updates pruning tests’ request literals to include track_total_hits. |
| searchlite-core/tests/prefix_and_suggest.rs | Updates prefix/suggest tests’ request literals to include track_total_hits. |
| searchlite-core/tests/multi_field.rs | Adds cross_fields fuzziness + determinism tests; updates MultiMatch structs with fuzziness. |
| searchlite-core/tests/function_score.rs | Updates function-score tests’ request literals to include track_total_hits. |
| searchlite-core/tests/coverage.rs | Updates coverage tests’ request literals to include track_total_hits. |
| searchlite-core/tests/analyzers.rs | Updates analyzer tests’ request literals to include track_total_hits. |
| searchlite-core/tests/aggregations.rs | Adds extensive nested aggregation test coverage (counts, scoping rules, dotted names, validation). |
| searchlite-core/tests/aggregation_bounds.rs | Updates aggregation bounds tests’ request literals to include track_total_hits. |
| searchlite-core/src/util/path_scope.rs | Introduces helpers for resolving absolute vs scoped dotted paths used by nested aggs. |
| searchlite-core/src/util/mod.rs | Exposes the new path_scope utility module. |
| searchlite-core/src/query/planner.rs | Plumbs MultiMatchFuzziness, normalizes multi_match field specs, and dedupes field boosts. |
| searchlite-core/src/query/aggs/mod.rs | Adds nested aggregation execution, scoped collection, and per-object sampling support. |
| searchlite-core/src/index/segment.rs | Fixes nested indexing to skip null slots in nullable arrays and maintain correct parent/object indexing. |
| searchlite-core/src/index/fastfields.rs | Adds nested_str_values_at for efficient per-object nested string access; adds a regression test. |
| searchlite-core/src/api/types.rs | Adds MultiMatchFuzziness, track_total_hits, and NestedAggregation types and response variants. |
| searchlite-core/src/api/reader.rs | Implements exact-hit toggle behavior, cross_fields scoring support, and nested agg validation logic. |
| searchlite-core/benches/end_to_end.rs | Updates benchmarks for new request field and adjusts benchmark documents. |
| searchlite-core/benches/aggs.rs | Adds a nested-aggregation benchmark and updates requests for track_total_hits. |
| searchlite-core/Cargo.toml | Registers Criterion benches explicitly. |
| searchlite-cli/src/main.rs | Updates CLI request construction/tests to include track_total_hits. |
| search-request.schema.json | Documents track_total_hits and adds nested aggregation schema definition. |
| docs/feature-hardening/issue-91-fuzzy-cross-fields-track-total-hits/matrix.md | Adds a feature-hardening matrix documenting invariants, risks, and verification steps for issue #91. |
| README.md | Documents nested aggregations, adds a nested facet example, and notes operational costs. |
| .codex/skills/feature-hardening/scripts/update_feature_matrix.py | Adds automation to update feature-hardening matrix metadata from git diff. |
| .codex/skills/feature-hardening/scripts/run_feature_hardening.sh | Adds a scripted quality gate (fmt/build/test/clippy + optional bench) for feature hardening. |
| .codex/skills/feature-hardening/scripts/install_pre_push_hook.sh | Adds optional pre-push hook to run the hardening gate. |
| .codex/skills/feature-hardening/scripts/init_feature_hardening.py | Adds matrix initializer from a template. |
| .codex/skills/feature-hardening/references/matrix-template.md | Adds the feature-hardening matrix template. |
| .codex/skills/feature-hardening/references/ci-snippet.md | Provides a CI integration snippet for the hardening workflow. |
| .codex/skills/feature-hardening/agents/openai.yaml | Adds agent metadata for the feature-hardening workflow. |
| .codex/skills/feature-hardening/SKILL.md | Documents the feature-hardening process and expected artifacts. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 20488cf42b
- cache scoped nested term field resolution in TermsCollector hot path
- precompute cross-fields dedup list and avoid repeated cache-key/stats dedup work
- restore root agg nested-only rejection while allowing top-level dotted collisions
- add regression tests for dotted top-level agg-field collision cases
@codex review

Codex Review: Didn't find any major issues. Can't wait for the next one!
search_after (feat: mget & search_after #95)