feat(SVA-28): in-memory FTS5 search for bear-search-notes (v3.0.0) #113

Closed
vasylenko wants to merge 7 commits into main from vasylenko/sva-28-in-memory-fts5-search

vasylenko (Owner) commented May 4, 2026

Summary

  • Replaces the LIKE-substring backend of bear-search-notes with an in-memory SQLite FTS5 index built on demand from Bear's read-only database, rebuilt when the source data drifts.
  • The term parameter now accepts FTS5 query syntax instead of literal substring matches; multi-word bareword input is OR-joined so BM25 ranks results by overlap density rather than implicit-AND filtering. Punctuation in agent queries (hyphens, colons, brackets) is stripped at the query layer via \w+\*? tokenization, so natural-language terms like "over-engineering coaching" no longer dead-end as rigid phrase matches.
  • Results are ranked by BM25 relevance instead of modification date and every result carries an 80-token snippet field with matched terms wrapped in [...].
  • No persistent derived state. Bear syncs notes across machines via iCloud, so a persisted index would diverge; instead the server lazily builds the index on first search per process (~70 ms / 229 notes empirically). Drift between the live DB and the cached index is detected via MAX(ZMODIFICATIONDATE) + COUNT(*) of active notes — MAX alone misses bulk imports of pre-dated notes whose stale timestamps don't move the maximum, and the combined check still fits one sub-millisecond SELECT.
  • Tokenizer is unicode61, remove_diacritics=2. Tag-join table names are resolved at runtime via Z_PRIMARYKEY because Bear's Core Data entity IDs (e.g. Z_5TAGS / Z_13TAGS) shift across schema migrations — hardcoded literals silently break on renumbered libraries; an integer guard provides defense in depth.
  • FTS5 syntax errors return a structured envelope with operator hints rather than raw SQLite messages. The previous LIKE-substring fallback is removed entirely (no compatibility shim) — this is the reason for the v3.0.0 major bump.
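The structured error envelope from the last bullet could look roughly like the sketch below. The field names (`ok`, `kind`, `hints`), the hint texts, and the helper `toSyntaxErrorEnvelope` are hypothetical, since the PR text does not show the real shape. Matching on `fts5: syntax error` in the message is likewise an assumption about how the parse failure is reported.

```typescript
// Hypothetical envelope shape; the real field names are not shown in this PR.
interface SearchSyntaxError {
  ok: false;
  kind: "fts5_syntax_error";
  message: string;
  hints: string[];
}

// Illustrative operator hints (assumed wording).
const OPERATOR_HINTS: string[] = [
  'Wrap an exact phrase in double quotes: "weekly review"',
  "Write boolean operators in uppercase: AND, OR, NOT",
  "Use a trailing * for prefix matching: plan*",
];

// Assumption: FTS5 parse failures carry "fts5: syntax error" in the message.
// Returns null for any other error so the caller can rethrow it unchanged.
function toSyntaxErrorEnvelope(err: unknown, term: string): SearchSyntaxError | null {
  const raw = err instanceof Error ? err.message : String(err);
  if (!raw.includes("fts5: syntax error")) return null;
  return {
    ok: false,
    kind: "fts5_syntax_error",
    message: `Could not parse search term: ${term}`,
    hints: OPERATOR_HINTS,
  };
}
```

Returning `null` for unrelated failures keeps genuine database errors from being disguised as user mistakes.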

Why

The case for the change comes from an A/B eval comparing the published v2.11.0 baseline against the local v3.0.0 build, with the Claude Agent SDK answering natural-language questions through bear-search-notes. Two independent runs of n=15 each (n=30 combined per prompt × provider cell) covered 4 prompt categories: verbatim-quote retrieval, specific-item discrimination, paraphrase resilience, and multi-note synthesis.

Headline numbers at n=30:

  • Content correctness: v3.0.0 118/120 (98.3%) vs v2.11.0 92/120 (76.7%).
  • v3.0.0 wins three of four prompts decisively (Fisher exact p<0.001 on verbatim and specific). Synthesis is a true tie at 28/30 each: when both providers can read enough notes, the retrieval-engine difference disappears.
  • Tool-call efficiency: v3.0.0 averages 5.5 MCP tool calls per task vs v2.11.0's 16.6 — ~3× fewer in aggregate (per-prompt: verbatim 4.9×, specific 7.5×, paraphrase 3.1×, synthesis 1.3×).
  • Cost per successful answer: $0.20 (v3.0.0) vs $0.21 (v2.11.0). Per-run cost is +22% for v3.0.0 due to snippet-rich responses, more than offset by the higher success rate.
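The cost and efficiency figures above can be cross-checked with a few lines of arithmetic. This is only a consistency check on the reported numbers, assuming cost per successful answer scales as per-run cost divided by success rate; the variable names are illustrative.

```typescript
// Figures copied from the headline numbers above.
const successRate = { v3: 118 / 120, v2: 92 / 120 };
const relativeRunCost = { v3: 1.22, v2: 1.0 }; // v3.0.0 costs +22% per run

// Assumed model: cost per successful answer = (cost per run) / (success rate).
const costPerSuccessRatio =
  relativeRunCost.v3 / successRate.v3 / (relativeRunCost.v2 / successRate.v2);

// Average MCP tool calls per task, v2.11.0 over v3.0.0.
const toolCallRatio = 16.6 / 5.5;

console.log(costPerSuccessRatio.toFixed(2)); // "0.95", in line with $0.20 vs $0.21
console.log(toolCallRatio.toFixed(1)); // "3.0", the ~3x aggregate reduction
```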

The 80-token snippet width was chosen after the prototype's 32-token default proved too narrow for usable context. The query-layer punctuation handling was driven by eval data showing a 73% zero-hit rate on hyphen/colon-containing queries when the prior phrase-quote branch turned natural language into rigid token-order phrase matches.

Breaking changes (v3.0.0)

  • bear-search-notes term parameter is now FTS5 query syntax, not literal substring.
  • Results ranked by BM25 relevance, not modification date.
  • New snippet field on every result with matched terms wrapped in [...].
  • FTS5 syntax errors return a structured envelope with operator hints instead of raw SQLite messages.
  • LIKE-substring fallback removed entirely; no shim.

vasylenko added 5 commits May 4, 2026 20:25
Bear stores note↔tag relations in Core Data join tables whose names
embed entity IDs (Z_5TAGS, Z_13TAGS, etc.) assigned at data-model
compile time. The IDs can shift across Bear schema migrations, and
hardcoded literals silently break on renumbered libraries.

This module resolves the IDs at runtime via Z_PRIMARYKEY (Core Data's
entity registry) and verifies the resulting table names exist via
PRAGMA table_info, surfacing missing relations as clear startup errors
instead of cryptic SQL failures at query time. A defense-in-depth
integer guard rejects malformed Z_ENT values before they reach SQL
identifier interpolation.

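A minimal sketch of the integer guard described above, assuming the Z_ENT value for the note entity has already been read from Z_PRIMARYKEY (the database read is omitted). The helper name `joinTableName` and the `JoinSuffix` type are hypothetical; the two suffixes follow the Z_<n>TAGS / Z_<n>PINNEDINTAGS pattern named in this PR.

```typescript
// Suffixes of the two join tables mentioned in this PR.
type JoinSuffix = "TAGS" | "PINNEDINTAGS";

// Builds a join-table name from a Z_ENT value. Table names cannot be bound as
// SQL parameters, so the value is checked to be a positive integer before it
// is interpolated into an identifier.
function joinTableName(zEnt: unknown, suffix: JoinSuffix): string {
  if (typeof zEnt !== "number" || !Number.isInteger(zEnt) || zEnt <= 0) {
    throw new Error(`Unexpected Z_ENT value: ${String(zEnt)}`);
  }
  return `Z_${zEnt}${suffix}`;
}

console.log(joinTableName(5, "TAGS")); // Z_5TAGS
console.log(joinTableName(13, "PINNEDINTAGS")); // Z_13PINNEDINTAGS
```

The guard is cheap insurance: even though Z_PRIMARYKEY is written by Core Data rather than by a user, rejecting anything that is not a positive integer means a corrupted row fails loudly instead of producing a malformed identifier.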
Replaces LIKE-substring matching with a SQLite FTS5 virtual table
(unicode61 tokenizer, remove_diacritics=2) built in :memory: from
Bear's read-only DB. BM25 ranks results by term-density rather than
modification date.

Drift detection compares the cached MAX(ZMODIFICATIONDATE) and
COUNT(*) of active notes against current values; a mismatch triggers
a full rebuild. Both aggregates fit one SELECT and run sub-millisecond.
The MAX+COUNT pair (rather than MAX alone) catches bulk imports of
pre-dated notes whose timestamps don't move the maximum.
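The MAX+COUNT comparison can be sketched as a pure function over two snapshots. The `DriftKey` type, the `hasDrifted` helper, and the sample timestamp are illustrative assumptions; in the real module both values would come from the single aggregate SELECT described above.

```typescript
// Snapshot of the two aggregates over active notes.
interface DriftKey {
  maxModified: number | null; // MAX(ZMODIFICATIONDATE); null for an empty library
  activeCount: number; // COUNT(*)
}

// A rebuild is needed when nothing is cached yet or either aggregate moved.
function hasDrifted(cached: DriftKey | null, current: DriftKey): boolean {
  if (cached === null) return true;
  return (
    cached.maxModified !== current.maxModified ||
    cached.activeCount !== current.activeCount
  );
}

// Bulk import of pre-dated notes: the newest timestamp is unchanged,
// so only the count reveals the change.
const beforeImport: DriftKey = { maxModified: 799_620_000, activeCount: 229 };
const afterImport: DriftKey = { maxModified: 799_620_000, activeCount: 240 };

console.log(hasDrifted(beforeImport, afterImport)); // true
console.log(hasDrifted(beforeImport, { ...beforeImport })); // false
```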

Architecture: no persistent state. Bear syncs across the user's
machines via iCloud; persistent derived state would diverge without
coordination. In-memory rebuild satisfies cross-Mac consistency by
construction at ~70 ms / 229 notes empirical cost.

Query handling (prepareFTS5Term): user-supplied terms with FTS5
syntax (quoted phrases, parentheses, uppercase boolean operators)
pass through verbatim. Multi-word natural-language input is
tokenized via \w+\*? (which already strips incidental punctuation
the agent adds: hyphens, colons, brackets) and OR-joined so BM25
ranks by overlap density rather than implicit-AND filtering out
notes missing any single token. FTS5 syntax errors surface as
structured tool-side errors with operator hints rather than raw
SQLite messages.
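The branching described in this paragraph can be sketched as follows. This is an illustration under stated assumptions, not the PR's `prepareFTS5Term`: the name `prepareTermSketch` and the exact syntax-detection regex are hypothetical, and the order of the checks is a guess.

```typescript
// Explicit FTS5 syntax: double quotes, parentheses, or an uppercase operator.
const FTS5_SYNTAX = /["()]|\b(?:AND|OR|NOT|NEAR)\b/;

function prepareTermSketch(term: string): string {
  const trimmed = term.trim();
  // User-supplied FTS5 syntax passes through verbatim.
  if (FTS5_SYNTAX.test(trimmed)) return trimmed;
  // A single bareword, optionally with a trailing *, also passes through.
  if (/^\w+\*?$/.test(trimmed)) return trimmed;
  // Otherwise tokenize and OR-join; punctuation between tokens is dropped.
  // Note: JavaScript's \w is ASCII-only, so this sketch does not cover
  // non-ASCII words.
  const tokens = trimmed.match(/\w+\*?/g) ?? [];
  return tokens.join(" OR ");
}

console.log(prepareTermSketch("over-engineering coaching"));
// over OR engineering OR coaching
console.log(prepareTermSketch('"weekly review" AND plan*'));
// "weekly review" AND plan*
```

Because the operator check is case-sensitive, a query such as `salt and pepper` is treated as three content tokens and OR-joined.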

Includes parallel note_tags side-table (rowid, tag, pinned_in_tag)
for tag-aware filtering kept separate from the FTS5 corpus, plus
unit and system test coverage for the build, drift, query, snippet,
and error paths.

Operations layer:
- searchNotes() in operations/notes.ts becomes a thin adapter that
  resolves user-facing date strings into Core Data timestamps and
  delegates to searchByQuery in the new FTS5 index module. The prior
  LIKE-substring path is removed cleanly with no fallback shim.
- tags.ts switches to the runtime schema discovery utility, removing
  the last hardcoded Z_5TAGS reference.
- bear-encoding.ts retains decodeTagName as the single Unicode-aware
  tag normalizer used on both the index-build and query sides; tag
  matching never silently diverges between paths.
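The reason tag normalization is done in JavaScript on both sides can be shown with a small comparison. `normalizeTagSketch` and `asciiLower` are hypothetical helpers written for this illustration; the real `decodeTagName` may do more than trim and lowercase.

```typescript
// Unicode-aware folding, as JS toLowerCase() provides.
function normalizeTagSketch(raw: string): string {
  return raw.trim().toLowerCase();
}

// ASCII-only folding, standing in for a lowercasing step that only
// handles A-Z.
function asciiLower(raw: string): string {
  return raw.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

const stored = "CAF\u00C9"; // "CAFÉ"
const query = "caf\u00E9"; // "café"

console.log(normalizeTagSketch(stored) === query); // true
console.log(asciiLower(stored) === query); // false: É is left uppercase
```

If one path folded only ASCII and the other folded Unicode, the stored tag and the query tag would fail to compare equal for any non-ASCII tag, which is the divergence the shared normalizer avoids.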

Tool layer:
- bear-search-notes description tuned for natural-language queries
  (multi-word phrases match best because BM25 ranks by relevance).
- Search results now render an inline snippet under the title with
  matched terms wrapped in [...]. For term searches the snippet is
  produced by FTS5 snippet() at width 80; for filter-only queries
  it is the body's leading prefix. Newlines are collapsed so each
  result occupies one rendered line per metadata field.

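The newline collapsing mentioned in the tool-layer notes could be done with a helper along these lines. `toSingleLine` is a hypothetical name, and the sample snippet text is invented for illustration.

```typescript
// Collapses every run of line breaks (plus surrounding whitespace) into one
// space so a multi-line snippet renders on a single line.
function toSingleLine(snippet: string): string {
  return snippet.replace(/\s*[\r\n]+\s*/g, " ").trim();
}

const rawSnippet = "[Coaching] notes\n\n- avoid over-[engineering]\n";
console.log(toSingleLine(rawSnippet));
// [Coaching] notes - avoid over-[engineering]
```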
Adds a head-to-head eval comparing the v2.11.0 baseline (last
pre-FTS5 release) against the local v3.0.0 build, run as the Claude
Agent SDK answering natural-language questions that route through
the bear-search-notes tool.

Four prompt categories at n=15 repeats per (prompt × provider):
  1. Verbatim quote from a known recent note
  2. Specific saved item discoverable among many of the same kind
  3. Find a lesson when query words differ from note words
     (paraphrase resilience, OR-rank vs implicit-AND)
  4. Synthesis across multiple related notes

Shared default-test.yaml emits MCP tool-call count and turn count
as informational namedScores, with no pass/fail gating on either —
different providers legitimately use different call counts for the
same task, and the gap is the signal we want surfaced in the
report rather than a binary cap.

Two design rules baked into the YAML footer:
  - Prompts must explicitly direct search ('Search my Bear notes…')
    so the SDK's memory-first orientation doesn't bypass MCP entirely.
  - Probe queries a real user would naturally ask. Synthetic
    capability tests (boolean operators, exact-phrase quoting,
    prefix wildcards) belong in unit tests, not paid LLM evals.

Architecture documentation:
- docs/dev/SPECIFICATION.md gains a Search section describing the
  in-memory FTS5 design: drift signal, build path, query handling,
  schema discovery, concurrency invariant, load-bearing FTS5
  availability assumption.
- docs/dev/BEAR_DATABASE_SCHEMA.md notes the runtime-discovered
  join tables and the ZSEARCHTEXT field used for OCR text.

Release notes:
- CHANGELOG entry for the FTS5 cutover documents the breaking
  contract change (term parameter is now FTS5 syntax, results
  ranked by BM25 not modification date), the new snippet field,
  the structured FTS5-syntax-error envelope, and the removal of
  the LIKE-substring fallback. Major-version bump (v3.0.0)
  because these are breaking changes per semver.

Tooling:
- Taskfile.yml gains an eval:setup task for installing released
  baselines into evals/released, plus eval:run wiring for the
  new fts5-promptfooconfig.

Version bump (2.12.0 → 3.0.0) across package.json, package-lock,
manifest.json, and src/config.ts. README and docs/user/NPM.md
tool-count refreshed.

Copilot AI review requested due to automatic review settings May 4, 2026 18:59

vercel Bot commented May 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| bear-notes-mcp | Ready | Preview, Comment | May 4, 2026 8:34pm |

Copilot AI (Contributor) left a comment

Pull request overview

This PR upgrades bear-search-notes from a SQL LIKE-based search to a lazily-built, in-memory SQLite FTS5 index, adding BM25 relevance ranking and per-result snippets, while also making tag-join handling resilient to Bear Core Data schema renumbering.

Changes:

  • Implement in-memory FTS5 indexing (fts-index) with drift detection, BM25 ranking, and snippet generation; bear-search-notes now interprets term as FTS5 syntax.
  • Add runtime Bear schema discovery (discoverBearSchema) to avoid hardcoded Z_<n>TAGS / Z_<n>PINNEDINTAGS table names.
  • Expand tests/docs/evals to cover the new behavior and bump version to 3.0.0.

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| tests/system/fts5-search.test.ts | System-level coverage for FTS5 behavior (ranking, snippets, OCR, tags, syntax errors). |
| Taskfile.yml | Adds eval:fts5 task and strengthens eval isolation via a fresh CLAUDE_CONFIG_DIR. |
| src/tools/note-tools.ts | Updates bear-search-notes tool description and renders per-result snippets in output. |
| src/operations/tags.ts | Switches tag queries to runtime-discovered join table/columns (no hardcoded entity IDs). |
| src/operations/notes.ts | Replaces SQL LIKE search with searchByQuery against the in-memory FTS index. |
| src/operations/bear-encoding.ts | Reframes decodeTagName as the normalization source of truth (docs updated). |
| src/infra/fts-index.ts | New in-memory FTS5 index builder + query engine with drift detection, snippets, and error remapping. |
| src/infra/fts-index.test.ts | Unit tests for indexing, drift detection, ranking, punctuation handling, tags, pinned, snippets, and errors. |
| src/infra/bear-schema.ts | New runtime schema discovery for Bear Core Data join tables via Z_PRIMARYKEY. |
| src/infra/bear-schema.test.ts | Tests schema discovery plus a CI guard that Node’s bundled SQLite supports required FTS5 features. |
| src/config.ts | Bumps APP_VERSION to 3.0.0. |
| README.md | Updates tool docs to describe relevance-ranked search + snippets + operator support. |
| package.json | Bumps package version to 3.0.0 and updates promptfoo version. |
| manifest.json | Updates manifest version and tool description for the new search behavior. |
| evals/shared/default-test.yaml | Adds shared promptfoo metric emitters for MCP toolCalls and numTurns. |
| evals/promptfooconfig.yaml | Improves provider isolation and reuses shared defaultTest assertions. |
| evals/fts5-promptfooconfig.yaml | Adds dedicated SVA-28 A/B eval config using the shared defaultTest metrics. |
| docs/user/NPM.md | Updates user-facing tool list to reflect new search capabilities. |
| docs/dev/SPECIFICATION.md | Documents the in-memory FTS5 architecture and its rationale. |
| docs/dev/BEAR_DATABASE_SCHEMA.md | Updates schema/query documentation to reflect the FTS5 index approach and runtime join discovery. |
| CHANGELOG.md | Documents the breaking search changes, new snippets, and new structured syntax errors. |

Comment thread src/infra/fts-index.ts
const params: (string | number)[] = [];

if (spec.tag) {
  const normalizedTag = spec.tag.trim().toLowerCase();
Comment thread src/infra/fts-index.ts
Comment on lines +309 to +317
if (spec.pinned === true) {
  clauses.push('n.rowid IN (SELECT rowid FROM note_tags WHERE tag = ? AND pinned_in_tag = 1)');
  params.push(normalizedTag);
} else {
  clauses.push(
    "n.rowid IN (SELECT rowid FROM note_tags WHERE tag = ? OR tag LIKE ? || '/%' ESCAPE '\\')"
  );
  params.push(normalizedTag, escapedTag);
}
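The excerpt does not show how `escapedTag` is computed. A plausible sketch, assuming it escapes the LIKE metacharacters with the backslash named in the `ESCAPE` clause (the JS literal `'\\'` reaches SQL as a single backslash), is shown below; `escapeLike` is a hypothetical helper name.

```typescript
// Escapes the LIKE wildcards % and _ and the escape character itself, so a
// tag is matched literally when used as a prefix in `tag LIKE ? || '/%'`.
function escapeLike(value: string): string {
  return value.replace(/[\\%_]/g, (c) => `\\${c}`);
}

console.log(escapeLike("work/q3_plan")); // work/q3\_plan
console.log(escapeLike("100%")); // 100\%
```

Without the escaping, a tag such as `q3_plan` would also match nested tags under `q3xplan`, because `_` matches any single character in LIKE.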
Comment thread src/infra/fts-index.ts
  return result;
});

return { notes, totalCount: countMatches(memDb, spec) };
Comment thread src/infra/fts-index.ts
Comment on lines +509 to +510
// Inline (rather than importing operations/bear-encoding.ts) to keep the infra
// layer free of operations-layer dependencies. The constant lives in config.
Comment on lines +4 to +11
* Decodes and normalizes Bear tag names. Single source of truth for tag
* normalization — `src/infra/fts-index.ts` (insertNoteTags) and the search
* query path (`buildFilterClauses`) both call this so the index side and
* the query side use identical Unicode-aware case folding. Doing this in
* JS rather than SQL is deliberate: SQLite's built-in `LOWER()` is ASCII-
* only, while JS `toLowerCase()` folds Unicode (e.g. `CAFÉ` → `café`),
* which is required for non-ASCII tag matching to work.
*
Comment thread docs/dev/SPECIFICATION.md
- Terms containing uppercase boolean operators (`AND` / `OR` / `NOT` / `NEAR`) pass through verbatim. FTS5 itself only recognizes these operators in uppercase, so the case-sensitive check matches FTS5's own semantics — lowercase variants are treated as content tokens.
- Single bareword input (with optional `*` suffix) passes through so FTS5's prefix rule does the right thing.
- Multi-word bare input is tokenized and OR-joined so BM25 ranks by term-overlap density. FTS5's bareword default is implicit-AND, which silently filters out notes missing any single token — including notes that paraphrase or use a different word for one of the user's referents. OR-rank with BM25 lets density-rich notes still surface, matching the user/agent expectation that ranked search returns relevance-ordered results rather than a strict filter.
- Everything else (brackets, hyphens, colons, other punctuation) is wrapped in a phrase quote so FTS5 treats it as a literal phrase rather than throwing a syntax error.
defaultTest: file://shared/default-test.yaml

# Promptfoo merges per-test assertions with the shared default-test, so
# every run is also gated on toolCalls <= 5 globally.
Comment thread src/tools/note-tools.ts
Comment on lines 309 to +319
  title: 'Find Bear Notes',
  description:
-   'Find notes in your Bear library by searching text content, filtering by tags, or date ranges. Always searches within attached images and PDF files via OCR. Returns a list with titles, tags, and IDs - use "Open Bear Note" to read full content.',
+   "Search your Bear notes for words or phrases. The search looks across note titles, body content, and OCR text in attached images and PDFs, returning matching notes ranked by relevance with a snippet of the matching context — so you can see what matched without opening the note. For best results, search with a phrase or several words from what you're looking for; a single word also works. Trashed and archived notes are not included.",
  inputSchema: {
    term: z
      .string()
      .trim()
      .optional()
-     .describe('Text to search for in note titles and content'),
+     .describe(
+       'A phrase or word to search for. Phrases or several words generally match best because results are ranked by relevance; a single word also works.'
+     ),
Comment on lines +149 to +154
afterAll(() => {
  for (const id of noteIds) {
    trashNote(id);
  }
  cleanupTestNotes(TEST_PREFIX);
}, 60_000);

vasylenko added 2 commits May 4, 2026 22:34
The eval harness remains at evals/fts5-promptfooconfig.yaml and is
runnable directly via npx promptfoo. The dedicated task was a
convenience that's no longer load-bearing now that the eval has
served its purpose for the v3.0.0 cutover.

Three changes that reduce maintenance surface without losing real
regression coverage:

- src/infra/bear-schema.test.ts: collapse the four 'throws on missing
  X' cases into a single it.each. Each row exercises a distinct branch
  in discoverBearSchema/verifyJoinExists; the parameterized form makes
  that pattern legible without four near-identical blocks.

- src/infra/fts-index.test.ts: drop two tautological tests. The
  'checkDrift returns true when no driftKey cached' test pins a
  one-line conditional whose removal would fail every downstream
  drift test in louder ways. The 'totalCount = 0 when no notes
  match' test pins an explicit early return that's similarly covered
  by other no-match assertions.

- tests/system/fts5-search.test.ts: reduce nine system tests to the
  two that uniquely earn their integration cost: real-Bear OCR
  (Bear's OCR engine is the only path that populates ZSEARCHTEXT for
  attachments) and the soft-error response path (which exercises the
  tool-layer envelope that unit tests don't reach). The other seven
  duplicated unit-test coverage at the cost of fixture setup that
  can't run in CI.

sonarqubecloud Bot commented May 4, 2026

Quality Gate failed

Failed conditions
3.8% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@vasylenko vasylenko closed this May 4, 2026
@vasylenko vasylenko deleted the vasylenko/sva-28-in-memory-fts5-search branch May 4, 2026 22:55