
Search produces false positives from substring matching and ignores sort/body context #253

@jesserobbins

Problem

While importing my old 1990s-era email archives, I discovered a number of issues with search. I have a pull request (#252) with what I think is a fix.

1. Substring matching produces false positives

From my testing, text search uses ILIKE '%term%', which matches anywhere within a word. Searching for "art" pulls back messages about "starting", "partners", "articles" — burying the actual matches. With short search terms and a large archive, this gets noisy fast.

Both the Parquet fast-search path (DuckDB regexp_matches) and the aggregate search conditions in buildAggregateSearchConditions seem to behave this way.
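To make the difference concrete, here is a minimal standalone Go sketch comparing substring matching with a word-boundary pattern. The helper names (substringMatch, wordBoundaryPattern, wordMatch) are made up for illustration and are not the project's code. Go's regexp package and DuckDB's regexp functions both use RE2 syntax, so a pattern of this shape should carry over, though I haven't verified it against every call site.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// substringMatch mimics ILIKE '%term%': a case-insensitive match anywhere in text.
// (Hypothetical helper for illustration only.)
func substringMatch(text, term string) bool {
	return strings.Contains(strings.ToLower(text), strings.ToLower(term))
}

// wordBoundaryPattern builds a case-insensitive whole-word pattern for term.
// regexp.QuoteMeta escapes any regex metacharacters in the user's input.
func wordBoundaryPattern(term string) string {
	return `(?i)\b` + regexp.QuoteMeta(term) + `\b`
}

// wordMatch reports whether term occurs in text as a whole word.
func wordMatch(text, term string) bool {
	return regexp.MustCompile(wordBoundaryPattern(term)).MatchString(text)
}

func main() {
	subjects := []string{
		"Art show tonight",
		"Starting the project",
		"New partners",
		"Two articles",
	}
	for _, s := range subjects {
		fmt.Printf("%-24q substring=%-5v word=%v\n",
			s, substringMatch(s, "art"), wordMatch(s, "art"))
	}
}
```

One caveat: \b is an ASCII word boundary, so terms that begin or end with punctuation (for example "c++") would need separate handling.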

2. Fast search misses body-only matches

I noticed that when a search term only appears in the message body (not the subject or sender), fast search doesn't find it. You have to manually switch to deep search mode. The snippet field exists in the Parquet data, but as far as I can tell it isn't being surfaced during fast search — so messages where the relevance is in the body text are effectively invisible in the default mode.
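As a rough sketch of what including the snippet could look like, the Go function below builds one OR-joined condition that applies the same pattern to every searched column. The function name and the column names (subject, from_email, snippet) are assumptions for illustration; the real schema and query builder may differ. It relies on DuckDB's regexp_matches(string, pattern, options) form, with 'i' for case-insensitive matching.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFastSearchCondition returns an OR-joined SQL condition that tests the
// same regex pattern against each column, plus the bound arguments.
// Column names are trusted identifiers supplied by the caller, never user input.
// (Hypothetical helper; not the project's actual query builder.)
func buildFastSearchCondition(columns []string, pattern string) (string, []any) {
	if len(columns) == 0 {
		return "", nil
	}
	parts := make([]string, 0, len(columns))
	args := make([]any, 0, len(columns))
	for _, col := range columns {
		parts = append(parts, fmt.Sprintf("regexp_matches(%s, ?, 'i')", col))
		args = append(args, pattern)
	}
	return "(" + strings.Join(parts, " OR ") + ")", args
}

func main() {
	// Assumed column names; adding "snippet" is what lets body-only matches surface.
	cols := []string{"subject", "from_email", "snippet"}
	cond, args := buildFastSearchCondition(cols, `\bart\b`)
	fmt.Println(cond)
	fmt.Println(len(args), "bound arguments")
}
```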

3. Search results ignore user sort preferences

From what I can see, Search(), SearchFast(), and SearchFastWithStats() don't accept a sorting parameter. Results come back in a fixed order regardless of what you've selected in the TUI (Name, Count, or Size). If you've explicitly chosen a sort field and then search, the sort just silently resets.
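One way to picture the missing piece is a small sorting value that callers pass into search and that the engine turns into an ORDER BY clause. Everything in this Go sketch is an assumption: the MessageSorting fields, the SortField constants, and the column names are invented for illustration and may not match the project's types.

```go
package main

import "fmt"

// SortField enumerates the sort choices offered in the TUI (assumed names).
type SortField int

const (
	SortByName SortField = iota
	SortByCount
	SortBySize
)

// MessageSorting is a hypothetical shape for the sort preference a caller
// would pass through to the search methods.
type MessageSorting struct {
	Field      SortField
	Descending bool
}

// orderByClause maps a sort preference onto a fixed set of column names,
// so no caller-supplied text ever reaches the SQL string.
func orderByClause(s MessageSorting) string {
	col := "name" // assumed column names
	switch s.Field {
	case SortByCount:
		col = "count"
	case SortBySize:
		col = "size"
	}
	dir := "ASC"
	if s.Descending {
		dir = "DESC"
	}
	return "ORDER BY " + col + " " + dir
}

func main() {
	fmt.Println(orderByClause(MessageSorting{Field: SortBySize, Descending: true}))
	fmt.Println(orderByClause(MessageSorting{}))
}
```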

Affected components

  • internal/query/duckdb.go: buildSearchConditions, buildAggregateSearchConditions, fast search paths
  • internal/query/sqlite.go: FTS5 query construction
  • internal/query/engine.go: Engine interface signatures
  • All callers: TUI (internal/tui/model.go), API (internal/api/handlers.go), MCP (internal/mcp/handlers.go), CLI (cmd/msgvault/cmd/search.go), remote engine

Expected behavior

  • Word-boundary matching: Searching "art" should match "art" and "Art" but not "starting" or "partners". Word-boundary regex (\b) for DuckDB and FTS5 prefix matching for SQLite could address this.
  • Snippet surfacing: Fast search should check snippet content so body-only matches show up without requiring deep search.
  • Sort propagation: Search results should respect the user's selected sort field. Propagating a MessageSorting parameter from callers through all search methods would fix this.
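
For the SQLite side of the word-boundary bullet, a minimal Go sketch of building an FTS5 prefix query might look like the following. The function name ftsPrefixQuery is hypothetical. It quotes each whitespace-separated term as an FTS5 string (doubling any embedded double quotes) and appends * for a token-prefix match; FTS5 treats whitespace-separated phrases as an implicit AND.

```go
package main

import (
	"fmt"
	"strings"
)

// ftsPrefixQuery turns free-text input into an FTS5 MATCH expression where
// each term is a quoted string followed by * (token-prefix match).
// (Hypothetical helper; not the project's actual FTS5 query construction.)
func ftsPrefixQuery(input string) string {
	fields := strings.Fields(input)
	parts := make([]string, 0, len(fields))
	for _, tok := range fields {
		// FTS5 strings escape an embedded double quote by doubling it.
		escaped := strings.ReplaceAll(tok, `"`, `""`)
		parts = append(parts, `"`+escaped+`"*`)
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(ftsPrefixQuery("art"))
	fmt.Println(ftsPrefixQuery("art show"))
}
```

Note that a prefix match on "art" would still return tokens such as "articles", since they begin with the term, but it would not return "starting" or "partners".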
