Goal
Match ES behavior for typo-tolerant multi_match and cross-field relevance, and allow callers to request exact total hit counts.
Features
- MultiMatch fuzziness (AUTO, numeric edit distance) for string queries.
- Combined/cross-fields scoring akin to ES combined_fields (BM25F style).
- Exact total hits toggle (track_total_hits=true).
Implementation Plan
1) MultiMatch Fuzziness
- Extend QueryNode::MultiMatch to accept fuzziness (enum: Auto, Edits(u8)).
- Planner: when fuzziness is set, expand terms using bounded Levenshtein per field's analyzer tokens; cap expansions (max_expansions) and min_length similar to FuzzyOptions.
- Scoring: treat fuzzy expansions as additional terms with reduced boost (1 / (1+edit_distance)) or configurable.
- For single-term queries, allow Auto mapping: length 1–2 → 0, 3–5 → 1, 6+ → 2.
- Tests: typo cases (“pikchu” -> “pikachu”), ensure non-fuzzy default unaffected.
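The Auto mapping and the reduced boost above can be sketched as follows. This is a minimal illustration, not the crate's actual code: the `Fuzziness` enum follows the shape named in the plan, while `max_edits`, `expansion_boost`, and `levenshtein` are hypothetical helper names, and the distance function is a plain (unbounded) Levenshtein used only to check the typo case.

```rust
/// Assumed shape of the fuzziness setting on MultiMatch (from the plan).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Fuzziness {
    Auto,
    Edits(u8),
}

impl Fuzziness {
    /// Maximum edit distance allowed for `term` (hypothetical helper).
    /// Auto follows the plan: length 1-2 -> 0, 3-5 -> 1, 6+ -> 2.
    pub fn max_edits(self, term: &str) -> u8 {
        match self {
            Fuzziness::Edits(n) => n,
            Fuzziness::Auto => match term.chars().count() {
                0..=2 => 0,
                3..=5 => 1,
                _ => 2,
            },
        }
    }
}

/// Boost applied to an expansion at the given distance: 1 / (1 + d).
pub fn expansion_boost(edit_distance: u8) -> f32 {
    1.0 / (1.0 + f32::from(edit_distance))
}

/// Plain Levenshtein distance over chars, using two rolling rows.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut cur = vec![0usize; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        cur[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur[j + 1] = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
        }
        // After the swap, `prev` holds the row just computed.
        std::mem::swap(&mut prev, &mut cur);
    }
    prev[b.len()]
}
```

Under these assumptions, "pikchu" (6 characters) gets an Auto budget of 2 edits and sits one edit away from "pikachu", so the expansion would be accepted and scored with a boost of 0.5.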
2) Combined Fields (Cross-Fields)
- match_type: CrossFields already exists in MultiMatch; implement BM25F-like logic for it:
  - Normalize term freq across listed fields; compute combined score using average field length.
  - Planner builds a term group spanning fields; scorer uses aggregated tf/len.
- Default operator: AND vs OR controlled by operator; support minimum_should_match.
- Tests: query spans name/set fields; verify relevance parity with ES behavior.
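One way to read "aggregated tf/len" is to treat the listed fields as a single synthetic field: sum the weighted term frequencies and lengths, then apply the usual BM25 saturation once. The sketch below takes that reading. `FieldStats`, `combined_term_score`, and the per-field `weight` are hypothetical names, and the formula is a simplified BM25F-style variant rather than a confirmed match for what ES computes.

```rust
/// Per-field statistics for one term in one document (hypothetical type).
pub struct FieldStats {
    /// Field boost; 1.0 means unweighted.
    pub weight: f32,
    /// Occurrences of the term in this field of the document.
    pub term_freq: f32,
    /// Length of this field in the document, in tokens.
    pub field_len: f32,
    /// Average length of this field across the index.
    pub avg_field_len: f32,
}

/// Score one term across several fields as if they were one combined field.
/// `idf`, `k1`, and `b` are the usual BM25 parameters, supplied by the caller.
pub fn combined_term_score(fields: &[FieldStats], idf: f32, k1: f32, b: f32) -> f32 {
    // Aggregate weighted tf and lengths over the term group.
    let tf: f32 = fields.iter().map(|f| f.weight * f.term_freq).sum();
    let len: f32 = fields.iter().map(|f| f.weight * f.field_len).sum();
    let avg_len: f32 = fields.iter().map(|f| f.weight * f.avg_field_len).sum();
    if tf == 0.0 || avg_len == 0.0 {
        return 0.0;
    }
    // Length normalization against the combined average length.
    let norm = 1.0 - b + b * len / avg_len;
    idf * tf * (k1 + 1.0) / (tf + k1 * norm)
}
```

Because saturation is applied once to the summed frequency, a term that appears in both name and set scores higher than one appearing in name alone, but less than twice as high.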
3) Exact Total Hits
- SearchRequest gains track_total_hits: Option<bool> (default false to preserve speed).
- Reader:
  - If the flag is true, compute the exact count of docs matching the query (post-filter) without early termination; may reuse the aggregation pipeline to count or run a full collect with DocCollector.
  - If false, keep the existing estimate.
- Response: reuse total_hits_estimate; when exact, set the field and maybe add boolean total_hits_exact=true to signal accuracy (optional but recommended).
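The request/response contract above could look roughly like this. It is a sketch under assumptions: the structs are trimmed to the fields discussed here, `TotalHits` and `resolve_total_hits` are hypothetical names, and the exact count is passed in as a closure so the expensive path only runs when the caller opted in.

```rust
/// Trimmed-down request carrying only the new flag.
#[derive(Debug, Default)]
pub struct SearchRequest {
    /// None or Some(false) keeps the existing estimate.
    pub track_total_hits: Option<bool>,
}

/// Total-hit portion of the response (hypothetical grouping).
#[derive(Debug, PartialEq, Eq)]
pub struct TotalHits {
    pub total_hits_estimate: u64,
    /// True only when the count came from a full, non-terminated collect.
    pub total_hits_exact: bool,
}

/// Pick between the cheap estimate and an exact count.
/// `count_exact` stands in for the full collect and is called at most once.
pub fn resolve_total_hits(
    req: &SearchRequest,
    estimate: u64,
    count_exact: impl FnOnce() -> u64,
) -> TotalHits {
    if req.track_total_hits.unwrap_or(false) {
        TotalHits {
            total_hits_estimate: count_exact(),
            total_hits_exact: true,
        }
    } else {
        TotalHits {
            total_hits_estimate: estimate,
            total_hits_exact: false,
        }
    }
}
```

Reusing total_hits_estimate for both cases keeps existing clients working, while the boolean lets new clients tell an exact count from an estimate.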
Code Touchpoints
- searchlite-core/src/api/types.rs: new fields/enums; serde defaults.
- searchlite-core/src/query/planner.rs & scorer: implement cross-field tf aggregation and fuzzy expansions.
- searchlite-core/src/api/reader.rs: track_total_hits handling; ensure cursors unaffected.
- HTTP layer: accept new JSON fields; validation errors on invalid fuzziness string.
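Validation of the fuzziness string at the HTTP layer might be sketched as a `FromStr` implementation. The `InvalidFuzziness` error type is a hypothetical name, and the upper bound of 2 edits is an assumption borrowed from ES conventions; the plan itself only specifies Edits(u8).

```rust
use std::str::FromStr;

/// Assumed shape of the fuzziness setting (from the plan).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Fuzziness {
    Auto,
    Edits(u8),
}

/// Validation error carrying the rejected input (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidFuzziness(pub String);

impl FromStr for Fuzziness {
    type Err = InvalidFuzziness;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let trimmed = s.trim();
        if trimmed.eq_ignore_ascii_case("auto") {
            return Ok(Fuzziness::Auto);
        }
        match trimmed.parse::<u8>() {
            // Assumption: reject distances above 2, as ES does.
            Ok(n) if n <= 2 => Ok(Fuzziness::Edits(n)),
            _ => Err(InvalidFuzziness(s.to_string())),
        }
    }
}
```

The handler can then map `InvalidFuzziness` to a validation error response instead of letting a bad value reach the planner.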
Performance/Bounds
- Set per-request caps: at most 50 fuzzy expansions per original token; fail fast with a 400 if exceeded.
- track_total_hits may be expensive; add warning log when enabled without limit.
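The fail-fast cap can be sketched as a collector that stops as soon as one term too many shows up, rather than gathering everything and checking afterwards. `collect_expansions` and `TooManyExpansions` are hypothetical names; the input iterator stands in for whatever the planner yields as matching dictionary terms for one token.

```rust
/// Error returned when a token expands past the per-token cap.
#[derive(Debug, PartialEq, Eq)]
pub struct TooManyExpansions {
    pub limit: usize,
}

impl TooManyExpansions {
    /// Status the HTTP layer would return for this error, per the plan.
    pub fn http_status(&self) -> u16 {
        400
    }
}

/// Collect fuzzy expansions for one token, failing on the first term
/// beyond `limit` so no further candidates are pulled from the iterator.
pub fn collect_expansions<I>(matches: I, limit: usize) -> Result<Vec<String>, TooManyExpansions>
where
    I: IntoIterator<Item = String>,
{
    let mut out = Vec::new();
    for term in matches {
        if out.len() == limit {
            return Err(TooManyExpansions { limit });
        }
        out.push(term);
    }
    Ok(out)
}
```

With a limit of 50, a token that matches exactly 50 terms succeeds and one that matches 51 is rejected.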
Tests
- Unit: planner builds expected term groups; fuzzy expansion obeys limits.
- Integration: search request with track_total_hits=true returns exact count on small fixture.
- Regression: ensure profile/explain still work with new scoring paths.
Migration Notes
- Existing clients unaffected unless they opt-in to fuzziness/track_total_hits.
- Document default fuzzy behavior per Auto rules; provide examples mapping from ES queries used in managemco (multi_match with AUTO on single-term searches).