feat(SVA-28): in-memory FTS5 search for bear-search-notes (v3.0.0) by vasylenko · Pull Request #113 · vasylenko/bear-notes-mcp

vasylenko · 2026-05-04T18:59:51Z

Summary

Replaces the LIKE-substring backend of bear-search-notes with an in-memory SQLite FTS5 index built on demand from Bear's read-only database, rebuilt when the source data drifts.
The term parameter now accepts FTS5 query syntax instead of literal substring matches; multi-word bareword input is OR-joined so BM25 ranks results by overlap density rather than implicit-AND filtering. Punctuation in agent queries (hyphens, colons, brackets) is stripped at the query layer via \w+\*? tokenization, so natural-language terms like "over-engineering coaching" no longer dead-end as rigid phrase matches.
Results are ranked by BM25 relevance instead of modification date and every result carries an 80-token snippet field with matched terms wrapped in [...].
No persistent derived state. Bear syncs notes across machines via iCloud, so a persisted index would diverge; instead the server lazily builds the index on first search per process (~70 ms / 229 notes empirically). Drift between the live DB and the cached index is detected via MAX(ZMODIFICATIONDATE) + COUNT(*) of active notes — MAX alone misses bulk imports of pre-dated notes whose stale timestamps don't move the maximum, and the combined check still fits one sub-millisecond SELECT.
Tokenizer is unicode61, remove_diacritics=2. Tag-join table names are resolved at runtime via Z_PRIMARYKEY because Bear's Core Data entity IDs (e.g. Z_5TAGS / Z_13TAGS) shift across schema migrations — hardcoded literals silently break on renumbered libraries; an integer guard provides defense in depth.
FTS5 syntax errors return a structured envelope with operator hints rather than raw SQLite messages. The previous LIKE-substring fallback is removed entirely (no compatibility shim) — this is the reason for the v3.0.0 major bump.
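The MAX + COUNT drift signal described above can be illustrated with a small sketch. This is not the PR's code: the `Note` shape and the names `driftKey` and `hasDrifted` are hypothetical, and the real check runs as a single SELECT against Bear's database rather than over an in-memory array.

```typescript
// Hypothetical stand-in for an active row in Bear's note table.
interface Note {
  id: number;
  modificationDate: number; // Core Data timestamp (seconds)
}

// Hypothetical drift key holding the two aggregates the PR describes.
interface DriftKey {
  maxModified: number;
  count: number;
}

// Equivalent in spirit to SELECT MAX(ZMODIFICATIONDATE), COUNT(*) over active notes.
function driftKey(notes: Note[]): DriftKey {
  let maxModified = 0;
  for (const n of notes) maxModified = Math.max(maxModified, n.modificationDate);
  return { maxModified, count: notes.length };
}

function hasDrifted(cachedKey: DriftKey | null, currentKey: DriftKey): boolean {
  if (cachedKey === null) return true; // no index built yet
  return (
    cachedKey.maxModified !== currentKey.maxModified ||
    cachedKey.count !== currentKey.count
  );
}

const notesBefore: Note[] = [
  { id: 1, modificationDate: 1000 },
  { id: 2, modificationDate: 2000 },
];
const cachedKey = driftKey(notesBefore);

// Bulk import of a pre-dated note: MAX stays at 2000, only COUNT moves.
const notesAfter: Note[] = [...notesBefore, { id: 3, modificationDate: 500 }];
const currentKey = driftKey(notesAfter);

console.log(cachedKey.maxModified === currentKey.maxModified); // true
console.log(hasDrifted(cachedKey, currentKey)); // true
```

A MAX-only comparison would have reported no change in this scenario, which is the case the combined key is meant to catch.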

Why

An A/B eval compared the published v2.11.0 baseline against the local v3.0.0 build, with the Claude Agent SDK answering natural-language questions through bear-search-notes. It consisted of two independent runs of n=15 each (n=30 combined per prompt × provider cell) across 4 prompt categories: verbatim-quote retrieval, specific-item discrimination, paraphrase resilience, and multi-note synthesis.

Headline numbers at n=30:

Content correctness: v3.0.0 118/120 (98.3%) vs v2.11.0 92/120 (76.7%).
v3.0.0 wins three of four prompts decisively (Fisher exact p<0.001 on verbatim and specific). Synthesis is a true tie at 28/30 each: when both providers can read enough notes, the retrieval-engine difference disappears.
Tool-call efficiency: v3.0.0 averages 5.5 MCP tool calls per task vs v2.11.0's 16.6 — ~3× fewer in aggregate (per-prompt: verbatim 4.9×, specific 7.5×, paraphrase 3.1×, synthesis 1.3×).
Cost per successful answer: $0.20 (v3.0.0) vs $0.21 (v2.11.0). Per-run cost is +22% for v3.0.0 due to snippet-rich responses, more than offset by the higher success rate.
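The cost figures above are mutually consistent, which a quick calculation can confirm. The sketch below uses only the reported success rates and the +22% per-run delta; the baseline per-run cost is normalized to 1 as an assumption, because absolute per-run costs are not stated here.

```typescript
// Reported success rates (120 tasks per provider).
const successV3 = 118 / 120;
const successV2 = 92 / 120;

// Per-run cost, normalized so the v2.11.0 baseline is 1 (assumption: only the
// +22% delta is given in the PR, not the absolute per-run cost).
const perRunV2 = 1.0;
const perRunV3 = 1.22;

// Cost per successful answer = per-run cost / success rate.
const perSuccessV2 = perRunV2 / successV2;
const perSuccessV3 = perRunV3 / successV3;

const costRatio = perSuccessV3 / perSuccessV2;
console.log(costRatio.toFixed(3)); // 0.951
```

A ratio of roughly 0.95 lines up with the reported $0.20 vs $0.21 per successful answer (0.20 / 0.21 ≈ 0.95).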

The 80-token snippet width was chosen after the prototype's 32-token default proved too narrow for usable context. The query-layer punctuation handling was driven by eval data showing a 73% zero-hit rate on hyphen/colon-containing queries when the prior phrase-quote branch turned natural language into rigid token-order phrase matches.

Breaking changes (v3.0.0)

bear-search-notes term parameter is now FTS5 query syntax, not literal substring.
Results ranked by BM25 relevance, not modification date.
New snippet field on every result with matched terms wrapped in [...].
FTS5 syntax errors return a structured envelope with operator hints instead of raw SQLite messages.
LIKE-substring fallback removed entirely; no shim.

Bear stores note↔tag relations in Core Data join tables whose names embed entity IDs (Z_5TAGS, Z_13TAGS, etc.) assigned at data-model compile time. The IDs can shift across Bear schema migrations, and hardcoded literals silently break on renumbered libraries. This module resolves the IDs at runtime via Z_PRIMARYKEY (Core Data's entity registry) and verifies the resulting table names exist via PRAGMA table_info, surfacing missing relations as clear startup errors instead of cryptic SQL failures at query time. A defense-in-depth integer guard rejects malformed Z_ENT values before they reach SQL identifier interpolation.
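The integer guard mentioned above could look something like the following. This is a minimal sketch under stated assumptions: the helper name `tagJoinTableName` and its error text are hypothetical, and the `Z_<n>TAGS` pattern is taken from the examples in this PR (Z_5TAGS, Z_13TAGS).

```typescript
// Hypothetical helper: build a Core Data join-table name from an entity ID
// read from Z_PRIMARYKEY. Not the PR's actual implementation.
function tagJoinTableName(zEnt: unknown): string {
  // SQL identifiers cannot be bound as parameters, so the value has to be
  // validated before it is interpolated into a table name.
  if (typeof zEnt !== 'number' || !Number.isInteger(zEnt) || zEnt <= 0) {
    throw new Error(`Unexpected Z_ENT value: ${String(zEnt)}`);
  }
  return `Z_${zEnt}TAGS`;
}

console.log(tagJoinTableName(5)); // Z_5TAGS
console.log(tagJoinTableName(13)); // Z_13TAGS

try {
  tagJoinTableName('5; DROP TABLE x'); // malformed value never reaches SQL
} catch (e) {
  console.log((e as Error).message); // Unexpected Z_ENT value: 5; DROP TABLE x
}
```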

Replaces LIKE-substring matching with a SQLite FTS5 virtual table (unicode61 tokenizer, remove_diacritics=2) built in :memory: from Bear's read-only DB. BM25 ranks results by term-density rather than modification date.

Drift detection compares the cached MAX(ZMODIFICATIONDATE) and COUNT(*) of active notes against current values; a mismatch triggers a full rebuild. Both aggregates fit one SELECT and run sub-millisecond. The MAX+COUNT pair (rather than MAX alone) catches bulk imports of pre-dated notes whose timestamps don't move the maximum.

Architecture: no persistent state. Bear syncs across the user's machines via iCloud; persistent derived state would diverge without coordination. In-memory rebuild satisfies cross-Mac consistency by construction at ~70 ms / 229 notes empirical cost.

Query handling (prepareFTS5Term): user-supplied terms with FTS5 syntax (quoted phrases, parentheses, uppercase boolean operators) pass through verbatim. Multi-word natural-language input is tokenized via \w+\*?, which already strips incidental punctuation the agent adds (hyphens, colons, brackets), and is then OR-joined so BM25 ranks by overlap density rather than implicit-AND filtering out notes missing any single token.

FTS5 syntax errors surface as structured tool-side errors with operator hints rather than raw SQLite messages.

Includes a parallel note_tags side-table (rowid, tag, pinned_in_tag) for tag-aware filtering kept separate from the FTS5 corpus, plus unit and system test coverage for the build, drift, query, snippet, and error paths.
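The query-handling rules described above (pass-through for explicit FTS5 syntax, tokenize and OR-join otherwise) can be sketched roughly as follows. The function name echoes prepareFTS5Term, but the body is an assumption written for illustration, not the PR's implementation.

```typescript
// Illustrative sketch only; the detection heuristics here are assumptions.
function prepareFts5TermSketch(term: string): string {
  const trimmed = term.trim();

  // Explicit FTS5 syntax passes through verbatim: quoted phrases,
  // parentheses, or uppercase boolean operators.
  const hasFts5Syntax =
    /["()]/.test(trimmed) || /\b(?:AND|OR|NOT|NEAR)\b/.test(trimmed);
  if (hasFts5Syntax) return trimmed;

  // A single bareword (optionally with a trailing * for prefix search)
  // also passes through unchanged.
  if (/^\w+\*?$/.test(trimmed)) return trimmed;

  // Otherwise tokenize with \w+\*? (which drops hyphens, colons, brackets)
  // and OR-join so BM25 can rank by how many terms a note matches.
  const tokens = trimmed.match(/\w+\*?/g) ?? [];
  return tokens.join(' OR ');
}

console.log(prepareFts5TermSketch('over-engineering coaching'));
// over OR engineering OR coaching
console.log(prepareFts5TermSketch('"exact phrase"')); // "exact phrase"
console.log(prepareFts5TermSketch('meet*')); // meet*
```

One caveat of this sketch: in JavaScript, `\w` matches only ASCII letters, digits, and underscore, so non-ASCII words would need a Unicode-aware character class.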

Operations layer:
- searchNotes() in operations/notes.ts becomes a thin adapter that resolves user-facing date strings into Core Data timestamps and delegates to searchByQuery in the new FTS5 index module. The prior LIKE-substring path is removed cleanly with no fallback shim.
- tags.ts switches to the runtime schema discovery utility, removing the last hardcoded Z_5TAGS reference.
- bear-encoding.ts retains decodeTagName as the single Unicode-aware tag normalizer used on both the index-build and query sides; tag matching never silently diverges between paths.

Tool layer:
- bear-search-notes description tuned for natural-language queries (multi-word phrases match best because BM25 ranks by relevance).
- Search results now render an inline snippet under the title with matched terms wrapped in [...]. For term searches the snippet is produced by FTS5 snippet() at width 80; for filter-only queries it is the body's leading prefix. Newlines are collapsed so each result occupies one rendered line per metadata field.
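The snippet rendering described above (newline collapsing, plus a leading-prefix fallback for filter-only queries) might look roughly like this. The helper names and the 80-character fallback length are assumptions for illustration; the PR does not state how long the leading prefix is.

```typescript
// Hypothetical rendering helpers, not the PR's actual code.
function toSingleLine(text: string): string {
  // Collapse newlines and runs of whitespace into single spaces.
  return text.replace(/\s+/g, ' ').trim();
}

// Fallback for filter-only queries: the body's leading prefix on one line.
function fallbackSnippet(body: string, maxChars = 80): string {
  const flat = toSingleLine(body);
  return flat.length <= maxChars ? flat : flat.slice(0, maxChars) + '…';
}

console.log(toSingleLine('first line\n\nsecond [match] line'));
// first line second [match] line
```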

Adds a head-to-head eval comparing the v2.11.0 baseline (last pre-FTS5 release) against the local v3.0.0 build, run as the Claude Agent SDK answering natural-language questions that route through the bear-search-notes tool.

Four prompt categories at n=15 repeats per (prompt × provider):
1. Verbatim quote from a known recent note
2. Specific saved item discoverable among many of the same kind
3. Find a lesson when query words differ from note words (paraphrase resilience, OR-rank vs implicit-AND)
4. Synthesis across multiple related notes

Shared default-test.yaml emits MCP tool-call count and turn count as informational namedScores, with no pass/fail gating on either: different providers legitimately use different call counts for the same task, and the gap is the signal we want surfaced in the report rather than a binary cap.

Two design rules baked into the YAML footer:
- Prompts must explicitly direct search ('Search my Bear notes…') so the SDK's memory-first orientation doesn't bypass MCP entirely.
- Probe queries a real user would naturally ask. Synthetic capability tests (boolean operators, exact-phrase quoting, prefix wildcards) belong in unit tests, not paid LLM evals.

Architecture documentation:
- docs/dev/SPECIFICATION.md gains a Search section describing the in-memory FTS5 design: drift signal, build path, query handling, schema discovery, concurrency invariant, load-bearing FTS5 availability assumption.
- docs/dev/BEAR_DATABASE_SCHEMA.md notes the runtime-discovered join tables and the ZSEARCHTEXT field used for OCR text.

Release notes:
- CHANGELOG entry for the FTS5 cutover documents the breaking contract change (term parameter is now FTS5 syntax, results ranked by BM25 not modification date), the new snippet field, the structured FTS5-syntax-error envelope, and the removal of the LIKE-substring fallback. Major-version bump (v3.0.0) because these are breaking changes per semver.

Tooling:
- Taskfile.yml gains an eval:setup task for installing released baselines into evals/released, plus eval:run wiring for the new fts5-promptfooconfig.

Version bump (2.12.0 → 3.0.0) across package.json, package-lock, manifest.json, and src/config.ts. README and docs/user/NPM.md tool-count refreshed.

vercel · 2026-05-04T18:59:57Z

Project	Deployment	Actions	Updated (UTC)
bear-notes-mcp	Ready	Preview, Comment	May 4, 2026 8:34pm

Copilot

Pull request overview

This PR upgrades bear-search-notes from a SQL LIKE-based search to a lazily-built, in-memory SQLite FTS5 index, adding BM25 relevance ranking and per-result snippets, while also making tag-join handling resilient to Bear Core Data schema renumbering.

Changes:

Implement in-memory FTS5 indexing (fts-index) with drift detection, BM25 ranking, and snippet generation; bear-search-notes now interprets term as FTS5 syntax.
Add runtime Bear schema discovery (discoverBearSchema) to avoid hardcoded Z_<n>TAGS / Z_<n>PINNEDINTAGS table names.
Expand tests/docs/evals to cover the new behavior and bump version to 3.0.0.

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 9 comments.

File	Description
tests/system/fts5-search.test.ts	System-level coverage for FTS5 behavior (ranking, snippets, OCR, tags, syntax errors).
Taskfile.yml	Adds `eval:fts5` task and strengthens eval isolation via a fresh CLAUDE_CONFIG_DIR.
src/tools/note-tools.ts	Updates `bear-search-notes` tool description and renders per-result snippets in output.
src/operations/tags.ts	Switches tag queries to runtime-discovered join table/columns (no hardcoded entity IDs).
src/operations/notes.ts	Replaces SQL `LIKE` search with `searchByQuery` against the in-memory FTS index.
src/operations/bear-encoding.ts	Reframes `decodeTagName` as the normalization source of truth (docs updated).
src/infra/fts-index.ts	New in-memory FTS5 index builder + query engine with drift detection, snippets, and error remapping.
src/infra/fts-index.test.ts	Unit tests for indexing, drift detection, ranking, punctuation handling, tags, pinned, snippets, and errors.
src/infra/bear-schema.ts	New runtime schema discovery for Bear Core Data join tables via `Z_PRIMARYKEY`.
src/infra/bear-schema.test.ts	Tests schema discovery plus a CI guard that Node’s bundled SQLite supports required FTS5 features.
src/config.ts	Bumps APP_VERSION to 3.0.0.
README.md	Updates tool docs to describe relevance-ranked search + snippets + operator support.
package.json	Bumps package version to 3.0.0 and updates promptfoo version.
manifest.json	Updates manifest version and tool description for the new search behavior.
evals/shared/default-test.yaml	Adds shared promptfoo metric emitters for MCP toolCalls and numTurns.
evals/promptfooconfig.yaml	Improves provider isolation and reuses shared defaultTest assertions.
evals/fts5-promptfooconfig.yaml	Adds dedicated SVA-28 A/B eval config using the shared defaultTest metrics.
docs/user/NPM.md	Updates user-facing tool list to reflect new search capabilities.
docs/dev/SPECIFICATION.md	Documents the in-memory FTS5 architecture and its rationale.
docs/dev/BEAR_DATABASE_SCHEMA.md	Updates schema/query documentation to reflect the FTS5 index approach and runtime join discovery.
CHANGELOG.md	Documents the breaking search changes, new snippets, and new structured syntax errors.

+  const params: (string | number)[] = [];
+
+  if (spec.tag) {
+    const normalizedTag = spec.tag.trim().toLowerCase();


+    if (spec.pinned === true) {
+      clauses.push('n.rowid IN (SELECT rowid FROM note_tags WHERE tag = ? AND pinned_in_tag = 1)');
+      params.push(normalizedTag);
+    } else {
+      clauses.push(
+        "n.rowid IN (SELECT rowid FROM note_tags WHERE tag = ? OR tag LIKE ? || '/%' ESCAPE '\\')"
+      );
+      params.push(normalizedTag, escapedTag);
+    }
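The hunk above binds an `escapedTag` parameter whose definition is not shown in this excerpt. A minimal sketch of how such a value could be produced for use with `ESCAPE '\'` follows; the helper name `escapeLikePattern` is hypothetical, and the PR's actual escaping code may differ.

```typescript
// Hypothetical helper: neutralize LIKE metacharacters so a tag containing
// % or _ is matched literally when the query uses ESCAPE '\'.
function escapeLikePattern(value: string): string {
  // Prefix each backslash, percent sign, and underscore with a backslash
  // in a single pass.
  return value.replace(/[\\%_]/g, (ch) => '\\' + ch);
}

const normalizedTag = 'work_2024';
const escapedTag = escapeLikePattern(normalizedTag);
console.log(escapedTag); // work\_2024
```

Without this step, the `_` in a tag like `work_2024` would act as a single-character wildcard in the `tag LIKE ? || '/%'` subtag match.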


+    return result;
+  });
+
+  return { notes, totalCount: countMatches(memDb, spec) };


+// Inline (rather than importing operations/bear-encoding.ts) to keep the infra
+// layer free of operations-layer dependencies. The constant lives in config.


+ * Decodes and normalizes Bear tag names. Single source of truth for tag
+ * normalization — `src/infra/fts-index.ts` (insertNoteTags) and the search
+ * query path (`buildFilterClauses`) both call this so the index side and
+ * the query side use identical Unicode-aware case folding. Doing this in
+ * JS rather than SQL is deliberate: SQLite's built-in `LOWER()` is ASCII-
+ * only, while JS `toLowerCase()` folds Unicode (e.g. `CAFÉ` → `café`),
+ * which is required for non-ASCII tag matching to work.
+ *
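The case-folding difference that the doc comment above relies on can be demonstrated in isolation. In this sketch `asciiLower` is a hypothetical stand-in that emulates an ASCII-only lowercase, which is how SQLite's built-in `LOWER()` behaves without the ICU extension; it is not code from the PR.

```typescript
// Emulates ASCII-only lowercasing: only A-Z are folded.
function asciiLower(s: string): string {
  return s.replace(/[A-Z]/g, (ch) => ch.toLowerCase());
}

const rawTag = 'CAF\u00C9'; // "CAFÉ" with a precomposed É
const asciiFolded = asciiLower(rawTag);
const unicodeFolded = rawTag.toLowerCase();

console.log(asciiFolded === 'caf\u00C9'); // true: É is left uppercase
console.log(unicodeFolded === 'caf\u00E9'); // true: folded to "café"
console.log(asciiFolded === unicodeFolded); // false: the two sides would not match
```

If one side of the comparison were folded in SQL and the other in JS, a tag like this would fail to match, which is why a single JS normalizer on both paths matters.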


+- Terms containing uppercase boolean operators (`AND` / `OR` / `NOT` / `NEAR`) pass through verbatim. FTS5 itself only recognizes these operators in uppercase, so the case-sensitive check matches FTS5's own semantics — lowercase variants are treated as content tokens.
+- Single bareword input (with optional `*` suffix) passes through so FTS5's prefix rule does the right thing.
+- Multi-word bare input is tokenized and OR-joined so BM25 ranks by term-overlap density. FTS5's bareword default is implicit-AND, which silently filters out notes missing any single token — including notes that paraphrase or use a different word for one of the user's referents. OR-rank with BM25 lets density-rich notes still surface, matching the user/agent expectation that ranked search returns relevance-ordered results rather than a strict filter.
+- Everything else (brackets, hyphens, colons, other punctuation) is wrapped in a phrase quote so FTS5 treats it as a literal phrase rather than throwing a syntax error.


+defaultTest: file://shared/default-test.yaml
+
+# Promptfoo merges per-test assertions with the shared default-test, so
+# every run is also gated on toolCalls <= 5 globally.


      title: 'Find Bear Notes',
      description:
-        'Find notes in your Bear library by searching text content, filtering by tags, or date ranges. Always searches within attached images and PDF files via OCR. Returns a list with titles, tags, and IDs - use "Open Bear Note" to read full content.',
+        "Search your Bear notes for words or phrases. The search looks across note titles, body content, and OCR text in attached images and PDFs, returning matching notes ranked by relevance with a snippet of the matching context — so you can see what matched without opening the note. For best results, search with a phrase or several words from what you're looking for; a single word also works. Trashed and archived notes are not included.",
      inputSchema: {
        term: z
          .string()
          .trim()
          .optional()
-          .describe('Text to search for in note titles and content'),
+          .describe(
+            'A phrase or word to search for. Phrases or several words generally match best because results are ranked by relevance; a single word also works.'
+          ),


+afterAll(() => {
+  for (const id of noteIds) {
+    trashNote(id);
+  }
+  cleanupTestNotes(TEST_PREFIX);
+}, 60_000);


The eval harness remains at evals/fts5-promptfooconfig.yaml and is runnable directly via npx promptfoo. The dedicated task was a convenience that's no longer load-bearing now that the eval has served its purpose for the v3.0.0 cutover.

Three changes that reduce maintenance surface without losing real regression coverage:
- src/infra/bear-schema.test.ts: collapse the four 'throws on missing X' cases into a single it.each. Each row exercises a distinct branch in discoverBearSchema/verifyJoinExists; the parameterized form makes that pattern legible without four near-identical blocks.
- src/infra/fts-index.test.ts: drop two tautological tests. The 'checkDrift returns true when no driftKey cached' test pins a one-line conditional whose removal would fail every downstream drift test in louder ways. The 'totalCount = 0 when no notes match' test pins an explicit early return that's similarly covered by other no-match assertions.
- tests/system/fts5-search.test.ts: reduce nine system tests to the two that uniquely earn their integration cost: real-Bear OCR (Bear's OCR engine is the only path that populates ZSEARCHTEXT for attachments) and the soft-error response path (which exercises the tool-layer envelope that unit tests don't reach). The other seven duplicated unit-test coverage at the cost of fixture setup that can't run in CI.

sonarqubecloud · 2026-05-04T20:35:08Z

Quality Gate failed

Failed conditions
3.8% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

vasylenko added 5 commits May 4, 2026 20:25

Copilot AI review requested due to automatic review settings May 4, 2026 18:59

Copilot started reviewing on behalf of vasylenko May 4, 2026 19:00

Copilot AI reviewed May 4, 2026

vasylenko added 2 commits May 4, 2026 22:34

vercel Bot deployed to Preview May 4, 2026 20:34

vasylenko closed this May 4, 2026

vasylenko deleted the vasylenko/sva-28-in-memory-fts5-search branch May 4, 2026 22:55

vasylenko wants to merge 7 commits into main from vasylenko/sva-28-in-memory-fts5-search
