feat(SVA-28): in-memory FTS5 search for bear-search-notes (v3.0.0) #113
Closed
Bear stores note↔tag relations in Core Data join tables whose names embed entity IDs (Z_5TAGS, Z_13TAGS, etc.) assigned at data-model compile time. The IDs can shift across Bear schema migrations, and hardcoded literals silently break on renumbered libraries. This module resolves the IDs at runtime via Z_PRIMARYKEY (Core Data's entity registry) and verifies the resulting table names exist via PRAGMA table_info, surfacing missing relations as clear startup errors instead of cryptic SQL failures at query time. A defense-in-depth integer guard rejects malformed Z_ENT values before they reach SQL identifier interpolation.
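To make the integer guard concrete, here is a minimal sketch of building a join-table name from a discovered entity ID. The helper names `joinTableName` and `assertTableExists` are hypothetical and are not taken from the PR; the sketch assumes the caller has already read `Z_ENT` from `Z_PRIMARYKEY` and has run `PRAGMA table_info` on the resulting name.

```typescript
// Hypothetical helper: build a Core Data join-table name such as "Z_5TAGS"
// from an entity ID read out of Z_PRIMARYKEY.Z_ENT.
function joinTableName(zEnt: unknown, relation: string): string {
  // Integer guard: the ID is interpolated into an SQL identifier, which cannot
  // be bound as a parameter, so anything but a positive safe integer is rejected.
  if (typeof zEnt !== "number" || !Number.isSafeInteger(zEnt) || zEnt <= 0) {
    throw new Error(`Unexpected Z_ENT value: ${String(zEnt)}`);
  }
  // Assumption: relation names are uppercase ASCII, as in Z_5TAGS.
  if (!/^[A-Z]+$/.test(relation)) {
    throw new Error(`Unexpected relation name: ${relation}`);
  }
  return `Z_${zEnt}${relation}`;
}

// Hypothetical existence check over the rows returned by PRAGMA table_info(<name>).
// SQLite returns zero rows for a table that does not exist.
function assertTableExists(name: string, tableInfoRows: readonly unknown[]): void {
  if (tableInfoRows.length === 0) {
    throw new Error(`Bear schema: expected join table ${name} was not found`);
  }
}
```

Failing at startup with the table name in the message is what turns a renumbered library into a readable error rather than a "no such table" failure at query time.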
Replaces LIKE-substring matching with a SQLite FTS5 virtual table (unicode61 tokenizer, remove_diacritics=2) built in :memory: from Bear's read-only DB. BM25 ranks results by term-density rather than modification date.
Drift detection compares the cached MAX(ZMODIFICATIONDATE) and COUNT(*) of active notes against current values; a mismatch triggers a full rebuild. Both aggregates fit one SELECT and run sub-millisecond. The MAX+COUNT pair (rather than MAX alone) catches bulk imports of pre-dated notes whose timestamps don't move the maximum.
Architecture: no persistent state. Bear syncs across the user's machines via iCloud; persistent derived state would diverge without coordination. In-memory rebuild satisfies cross-Mac consistency by construction at ~70 ms / 229 notes empirical cost.
Query handling (prepareFTS5Term): user-supplied terms with FTS5 syntax (quoted phrases, parentheses, uppercase boolean operators) pass through verbatim. Multi-word natural-language input is tokenized via \w+\*?, which already strips incidental punctuation the agent adds (hyphens, colons, brackets), and OR-joined so BM25 ranks by overlap density rather than implicit-AND filtering out notes missing any single token. FTS5 syntax errors surface as structured tool-side errors with operator hints rather than raw SQLite messages.
Includes parallel note_tags side-table (rowid, tag, pinned_in_tag) for tag-aware filtering kept separate from the FTS5 corpus, plus unit and system test coverage for the build, drift, query, snippet, and error paths.
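The query-handling rules above can be approximated in a few lines. This is a sketch under stated assumptions, not the PR's `prepareFTS5Term`: the function name `prepareTermSketch` and the pass-through regex are illustrative, and JavaScript's `\w` without the `u` flag is ASCII-only, so a real implementation would need extra care with non-ASCII words.

```typescript
// Sketch only: approximates the described routing, not the project's actual code.
// Matches quotes, parentheses, or an uppercase boolean operator as a whole word.
const FTS5_SYNTAX = /["()]|\b(?:AND|OR|NOT|NEAR)\b/;

function prepareTermSketch(input: string): string {
  const term = input.trim();
  // Explicit FTS5 syntax passes through verbatim.
  if (FTS5_SYNTAX.test(term)) return term;
  // \w+\*? keeps word characters plus an optional trailing prefix star and
  // drops incidental punctuation such as hyphens, colons, and brackets.
  const tokens = term.match(/\w+\*?/g) ?? [];
  // OR-join so BM25 can rank by overlap instead of requiring every token.
  return tokens.join(" OR ");
}

console.log(prepareTermSketch("over-engineering coaching"));
// prints "over OR engineering OR coaching"
```

A single bareword such as `meet*` produces one token and comes back unchanged, so prefix queries still reach FTS5 as written.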
Operations layer:
- searchNotes() in operations/notes.ts becomes a thin adapter that resolves user-facing date strings into Core Data timestamps and delegates to searchByQuery in the new FTS5 index module. The prior LIKE-substring path is removed cleanly with no fallback shim.
- tags.ts switches to the runtime schema discovery utility, removing the last hardcoded Z_5TAGS reference.
- bear-encoding.ts retains decodeTagName as the single Unicode-aware tag normalizer used on both the index-build and query sides; tag matching never silently diverges between paths.
Tool layer:
- bear-search-notes description tuned for natural-language queries (multi-word phrases match best because BM25 ranks by relevance).
- Search results now render an inline snippet under the title with matched terms wrapped in [...]. For term searches the snippet is produced by FTS5 snippet() at width 80; for filter-only queries it is the body's leading prefix. Newlines are collapsed so each result occupies one rendered line per metadata field.
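For readers unfamiliar with the Core Data timestamps mentioned under the operations layer: Core Data stores dates as seconds since 2001-01-01T00:00:00Z rather than the Unix epoch. The conversion can be sketched as below; the function names are hypothetical, and parsing user-facing strings such as relative dates is assumed to have happened already.

```typescript
// Seconds between the Unix epoch (1970-01-01) and the Core Data reference
// date (2001-01-01T00:00:00Z).
const CORE_DATA_EPOCH_OFFSET_SECONDS = Date.UTC(2001, 0, 1) / 1000;

// Hypothetical helper: JS Date -> Core Data timestamp (seconds, may be fractional).
function toCoreDataTimestamp(date: Date): number {
  return date.getTime() / 1000 - CORE_DATA_EPOCH_OFFSET_SECONDS;
}

// Hypothetical inverse, handy when rendering modification dates in results.
function fromCoreDataTimestamp(seconds: number): Date {
  return new Date((seconds + CORE_DATA_EPOCH_OFFSET_SECONDS) * 1000);
}

console.log(toCoreDataTimestamp(new Date("2001-01-02T00:00:00Z"))); // prints 86400
```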
Adds a head-to-head eval comparing the v2.11.0 baseline (last
pre-FTS5 release) against the local v3.0.0 build, run as the Claude
Agent SDK answering natural-language questions that route through
the bear-search-notes tool.
Four prompt categories at n=15 repeats per (prompt × provider):
1. Verbatim quote from a known recent note
2. Specific saved item discoverable among many of the same kind
3. Find a lesson when query words differ from note words
(paraphrase resilience, OR-rank vs implicit-AND)
4. Synthesis across multiple related notes
Shared default-test.yaml emits MCP tool-call count and turn count
as informational namedScores, with no pass/fail gating on either —
different providers legitimately use different call counts for the
same task, and the gap is the signal we want surfaced in the
report rather than a binary cap.
Two design rules baked into the YAML footer:
- Prompts must explicitly direct search ('Search my Bear notes…')
so the SDK's memory-first orientation doesn't bypass MCP entirely.
- Probe queries a real user would naturally ask. Synthetic
capability tests (boolean operators, exact-phrase quoting,
prefix wildcards) belong in unit tests, not paid LLM evals.
Architecture documentation:
- docs/dev/SPECIFICATION.md gains a Search section describing the in-memory FTS5 design: drift signal, build path, query handling, schema discovery, concurrency invariant, load-bearing FTS5 availability assumption.
- docs/dev/BEAR_DATABASE_SCHEMA.md notes the runtime-discovered join tables and the ZSEARCHTEXT field used for OCR text.
Release notes:
- CHANGELOG entry for the FTS5 cutover documents the breaking contract change (term parameter is now FTS5 syntax, results ranked by BM25 not modification date), the new snippet field, the structured FTS5-syntax-error envelope, and the removal of the LIKE-substring fallback. Major-version bump (v3.0.0) because these are breaking changes per semver.
Tooling:
- Taskfile.yml gains an eval:setup task for installing released baselines into evals/released, plus eval:run wiring for the new fts5-promptfooconfig.
Version bump (2.12.0 → 3.0.0) across package.json, package-lock, manifest.json, and src/config.ts. README and docs/user/NPM.md tool-count refreshed.
Pull request overview
This PR upgrades bear-search-notes from a SQL LIKE-based search to a lazily-built, in-memory SQLite FTS5 index, adding BM25 relevance ranking and per-result snippets, while also making tag-join handling resilient to Bear Core Data schema renumbering.
Changes:
- Implement in-memory FTS5 indexing (`fts-index`) with drift detection, BM25 ranking, and snippet generation; `bear-search-notes` now interprets `term` as FTS5 syntax.
- Add runtime Bear schema discovery (`discoverBearSchema`) to avoid hardcoded `Z_<n>TAGS` / `Z_<n>PINNEDINTAGS` table names.
- Expand tests/docs/evals to cover the new behavior and bump version to `3.0.0`.
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tests/system/fts5-search.test.ts | System-level coverage for FTS5 behavior (ranking, snippets, OCR, tags, syntax errors). |
| Taskfile.yml | Adds eval:fts5 task and strengthens eval isolation via a fresh CLAUDE_CONFIG_DIR. |
| src/tools/note-tools.ts | Updates bear-search-notes tool description and renders per-result snippets in output. |
| src/operations/tags.ts | Switches tag queries to runtime-discovered join table/columns (no hardcoded entity IDs). |
| src/operations/notes.ts | Replaces SQL LIKE search with searchByQuery against the in-memory FTS index. |
| src/operations/bear-encoding.ts | Reframes decodeTagName as the normalization source of truth (docs updated). |
| src/infra/fts-index.ts | New in-memory FTS5 index builder + query engine with drift detection, snippets, and error remapping. |
| src/infra/fts-index.test.ts | Unit tests for indexing, drift detection, ranking, punctuation handling, tags, pinned, snippets, and errors. |
| src/infra/bear-schema.ts | New runtime schema discovery for Bear Core Data join tables via Z_PRIMARYKEY. |
| src/infra/bear-schema.test.ts | Tests schema discovery plus a CI guard that Node’s bundled SQLite supports required FTS5 features. |
| src/config.ts | Bumps APP_VERSION to 3.0.0. |
| README.md | Updates tool docs to describe relevance-ranked search + snippets + operator support. |
| package.json | Bumps package version to 3.0.0 and updates promptfoo version. |
| manifest.json | Updates manifest version and tool description for the new search behavior. |
| evals/shared/default-test.yaml | Adds shared promptfoo metric emitters for MCP toolCalls and numTurns. |
| evals/promptfooconfig.yaml | Improves provider isolation and reuses shared defaultTest assertions. |
| evals/fts5-promptfooconfig.yaml | Adds dedicated SVA-28 A/B eval config using the shared defaultTest metrics. |
| docs/user/NPM.md | Updates user-facing tool list to reflect new search capabilities. |
| docs/dev/SPECIFICATION.md | Documents the in-memory FTS5 architecture and its rationale. |
| docs/dev/BEAR_DATABASE_SCHEMA.md | Updates schema/query documentation to reflect the FTS5 index approach and runtime join discovery. |
| CHANGELOG.md | Documents the breaking search changes, new snippets, and new structured syntax errors. |
    const params: (string | number)[] = [];

    if (spec.tag) {
      const normalizedTag = spec.tag.trim().toLowerCase();
Comment on lines +309 to +317
    if (spec.pinned === true) {
      clauses.push('n.rowid IN (SELECT rowid FROM note_tags WHERE tag = ? AND pinned_in_tag = 1)');
      params.push(normalizedTag);
    } else {
      clauses.push(
        "n.rowid IN (SELECT rowid FROM note_tags WHERE tag = ? OR tag LIKE ? || '/%' ESCAPE '\\')"
      );
      params.push(normalizedTag, escapedTag);
    }
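The hunk above binds an `escapedTag` parameter whose construction is not shown. One plausible way to produce it, offered only as a sketch, is to escape the LIKE wildcards with the same backslash named in the `ESCAPE` clause. The helper names `escapeLikePattern` and `tagParams` are hypothetical.

```typescript
// Hypothetical helper: escape LIKE wildcards so a tag is matched literally.
// Pairs with ESCAPE '\' in the SQL (written '\\' inside a JS string literal).
function escapeLikePattern(value: string): string {
  // Prefix each backslash, percent sign, and underscore with a backslash.
  return value.replace(/[\\%_]/g, "\\$&");
}

// Builds the two bound parameters for:
//   tag = ? OR tag LIKE ? || '/%' ESCAPE '\'
// The first matches the tag exactly; the second matches its nested children.
function tagParams(normalizedTag: string): [string, string] {
  return [normalizedTag, escapeLikePattern(normalizedTag)];
}
```

Without the escaping step, a tag containing `_` or `%` would act as a wildcard and match unrelated nested tags.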
      return result;
    });

    return { notes, totalCount: countMatches(memDb, spec) };
Comment on lines +509 to +510
    // Inline (rather than importing operations/bear-encoding.ts) to keep the infra
    // layer free of operations-layer dependencies. The constant lives in config.
Comment on lines +4 to +11
     * Decodes and normalizes Bear tag names. Single source of truth for tag
     * normalization — `src/infra/fts-index.ts` (insertNoteTags) and the search
     * query path (`buildFilterClauses`) both call this so the index side and
     * the query side use identical Unicode-aware case folding. Doing this in
     * JS rather than SQL is deliberate: SQLite's built-in `LOWER()` is ASCII-
     * only, while JS `toLowerCase()` folds Unicode (e.g. `CAFÉ` → `café`),
     * which is required for non-ASCII tag matching to work.
     *
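The case-folding point in this doc comment is easy to demonstrate. The sketch below uses hypothetical names: `asciiLower` imitates SQLite's built-in `LOWER()` by folding only A-Z, and `normalizeTagSketch` stands in for whatever `decodeTagName` actually does, which may include decoding steps not shown here.

```typescript
// Imitates SQLite's built-in LOWER(), which only folds ASCII A-Z.
function asciiLower(s: string): string {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Hypothetical stand-in for the shared normalizer: trim plus Unicode-aware folding.
function normalizeTagSketch(raw: string): string {
  return raw.trim().toLowerCase();
}

console.log(asciiLower("CAFÉ"));         // prints "cafÉ"
console.log(normalizeTagSketch("CAFÉ")); // prints "café"
```

If the index side used one folding and the query side the other, a tag like `CAFÉ` would be stored under one key and looked up under a different one, which is the silent divergence the comment warns about.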
- Terms containing uppercase boolean operators (`AND` / `OR` / `NOT` / `NEAR`) pass through verbatim. FTS5 itself only recognizes these operators in uppercase, so the case-sensitive check matches FTS5's own semantics — lowercase variants are treated as content tokens.
- Single bareword input (with optional `*` suffix) passes through so FTS5's prefix rule does the right thing.
- Multi-word bare input is tokenized and OR-joined so BM25 ranks by term-overlap density. FTS5's bareword default is implicit-AND, which silently filters out notes missing any single token — including notes that paraphrase or use a different word for one of the user's referents. OR-rank with BM25 lets density-rich notes still surface, matching the user/agent expectation that ranked search returns relevance-ordered results rather than a strict filter.
- Everything else (brackets, hyphens, colons, other punctuation) is wrapped in a phrase quote so FTS5 treats it as a literal phrase rather than throwing a syntax error.
    defaultTest: file://shared/default-test.yaml

    # Promptfoo merges per-test assertions with the shared default-test, so
    # every run is also gated on toolCalls <= 5 globally.
Comment on lines 309 to +319
      title: 'Find Bear Notes',
      description:
    -   'Find notes in your Bear library by searching text content, filtering by tags, or date ranges. Always searches within attached images and PDF files via OCR. Returns a list with titles, tags, and IDs - use "Open Bear Note" to read full content.',
    +   "Search your Bear notes for words or phrases. The search looks across note titles, body content, and OCR text in attached images and PDFs, returning matching notes ranked by relevance with a snippet of the matching context — so you can see what matched without opening the note. For best results, search with a phrase or several words from what you're looking for; a single word also works. Trashed and archived notes are not included.",
      inputSchema: {
        term: z
          .string()
          .trim()
          .optional()
    -     .describe('Text to search for in note titles and content'),
    +     .describe(
    +       'A phrase or word to search for. Phrases or several words generally match best because results are ranked by relevance; a single word also works.'
    +     ),
Comment on lines +149 to +154
    afterAll(() => {
      for (const id of noteIds) {
        trashNote(id);
      }
      cleanupTestNotes(TEST_PREFIX);
    }, 60_000);
The eval harness remains at evals/fts5-promptfooconfig.yaml and is runnable directly via npx promptfoo. The dedicated task was a convenience that's no longer load-bearing now that the eval has served its purpose for the v3.0.0 cutover.
Three changes that reduce maintenance surface without losing real regression coverage:
- src/infra/bear-schema.test.ts: collapse the four 'throws on missing X' cases into a single it.each. Each row exercises a distinct branch in discoverBearSchema/verifyJoinExists; the parameterized form makes that pattern legible without four near-identical blocks.
- src/infra/fts-index.test.ts: drop two tautological tests. The 'checkDrift returns true when no driftKey cached' test pins a one-line conditional whose removal would fail every downstream drift test in louder ways. The 'totalCount = 0 when no notes match' test pins an explicit early return that's similarly covered by other no-match assertions.
- tests/system/fts5-search.test.ts: reduce nine system tests to the two that uniquely earn their integration cost: real-Bear OCR (Bear's OCR engine is the only path that populates ZSEARCHTEXT for attachments) and the soft-error response path (which exercises the tool-layer envelope that unit tests don't reach). The other seven duplicated unit-test coverage at the cost of fixture setup that can't run in CI.
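For context on the drift tests discussed above, the MAX+COUNT comparison can be sketched as a pure function. The `DriftKey` shape and the name `hasDrifted` are assumptions made for illustration; the PR's `checkDrift` may be structured differently, and reading both aggregates in a single SELECT is left to the caller.

```typescript
// Assumed shape of the cached drift signal.
interface DriftKey {
  maxModified: number | null; // MAX(ZMODIFICATIONDATE); null when there are no active notes
  count: number;              // COUNT(*) of active notes
}

// Returns true when the in-memory index must be rebuilt.
function hasDrifted(cached: DriftKey | undefined, current: DriftKey): boolean {
  // Nothing cached yet means the index has never been built.
  if (cached === undefined) return true;
  // Either aggregate changing is enough to trigger a full rebuild.
  return cached.maxModified !== current.maxModified || cached.count !== current.count;
}
```

The count comparison is what catches a bulk import of pre-dated notes: the maximum timestamp stays put, but the number of active notes changes.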
Summary
- Replaces the LIKE-substring backend of `bear-search-notes` with an in-memory SQLite FTS5 index built on demand from Bear's read-only database, rebuilt when the source data drifts.
- The `term` parameter now accepts FTS5 query syntax instead of literal substring matches; multi-word bareword input is OR-joined so BM25 ranks results by overlap density rather than implicit-AND filtering. Punctuation in agent queries (hyphens, colons, brackets) is stripped at the query layer via `\w+\*?` tokenization, so natural-language terms like "over-engineering coaching" no longer dead-end as rigid phrase matches.
- Results carry a `snippet` field with matched terms wrapped in `[...]`.
- Drift detection compares `MAX(ZMODIFICATIONDATE)` + `COUNT(*)` of active notes: `MAX` alone misses bulk imports of pre-dated notes whose stale timestamps don't move the maximum, and the combined check still fits one sub-millisecond `SELECT`.
- The tokenizer is `unicode61, remove_diacritics=2`. Tag-join table names are resolved at runtime via `Z_PRIMARYKEY` because Bear's Core Data entity IDs (e.g. `Z_5TAGS` / `Z_13TAGS`) shift across schema migrations; hardcoded literals silently break on renumbered libraries, and an integer guard provides defense in depth.
Why
A/B eval comparing the published v2.11.0 baseline against the local v3.0.0 build via the Claude Agent SDK answering natural-language questions through `bear-search-notes`. Two independent runs of n=15 each (n=30 combined per prompt × provider cell) across 4 prompt categories: verbatim-quote retrieval, specific-item discrimination, paraphrase resilience, and multi-note synthesis.
Headline numbers at n=30:
The 80-token snippet width was chosen after the prototype's 32-token default proved too narrow for usable context. The query-layer punctuation handling was driven by eval data showing a 73% zero-hit rate on hyphen/colon-containing queries when the prior phrase-quote branch turned natural language into rigid token-order phrase matches.
Breaking changes (v3.0.0)
- `bear-search-notes` `term` parameter is now FTS5 query syntax, not literal substring.
- `snippet` field on every result with matched terms wrapped in `[...]`.