perf: 37x faster Calendar, correct-plan Tags, no idle refetch #153

Merged: Arylmera merged 4 commits into develop from perf/roadmap-tiers on May 31, 2026

@Arylmera (Owner)

Why

Calendar day-click and the Tags tab were slow on a real-sized DB (245 MB, 147k message rows). Root cause was structural in the read path, not any one view.

What

Four tiers, each measured against the real DB with a perf_probe example (median of 5 warm runs):

| Endpoint | Before | After | Notes |
|---|---|---|---|
| /api/day (calendar day-click) | 477.7 ms | 13.0 ms | 37× (6-query layer) |
| /api/tags-summary | 73.2 ms | 38.6 ms | + now scales with tag count, not table size |
| /api/daily | 88.3 ms | 43.6 ms | |

  • T1 — connections: db::tune() applies WAL + synchronous=NORMAL + mmap/cache/temp_store on both the writer and read paths; open_ro() now hands back a pooled, tuned connection via a Deref guard so the day view stops opening six cold connections per click. 477→131 ms.
  • T2 — sargable dates: expression indexes on substr(timestamp,1,10). The day queries were full-scanning messages because substr() defeated the timestamp index; now they seek (SCAN → SEARCH … USING INDEX idx_messages_day). 131→14 ms.
  • T3 — planner stats: tag_aggregates drove its join from messages (full scan) for lack of statistics. PRAGMA optimize after scans + a one-time ANALYZE in init_db flip it to drive from the small session_tags table. No rollup tables needed.
  • T4 — idle refetch: scan_complete now broadcasts only when a scan ingested rows, so idle dashboards stop refetching the whole endpoint registry every 10s. Liveness rides the existing 15s keep-alive ping.
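
The "median of 5 warm runs" method behind the table above can be sketched roughly as follows. This is not the actual perf_probe code: the function names `median` and `median_warm` and the explicit `warmups` parameter are assumptions made for illustration, using only the standard library.

```rust
use std::time::{Duration, Instant};

/// Median of the samples; for an even count, the mean of the two middle values.
fn median(samples: &mut [Duration]) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    let mid = samples.len() / 2;
    if samples.len() % 2 == 1 {
        samples[mid]
    } else {
        (samples[mid - 1] + samples[mid]) / 2
    }
}

/// Hypothetical helper: run `work` untimed `warmups` times so caches are hot,
/// then time it `runs` times and return the median of those timings.
fn median_warm<F: FnMut()>(warmups: usize, runs: usize, mut work: F) -> Duration {
    for _ in 0..warmups {
        work();
    }
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .collect();
    median(&mut samples)
}
```

A call such as `median_warm(1, 5, || run_day_query())` (where `run_day_query` is a placeholder for the query under test) would give one number per endpoint. The median is less sensitive than the mean to a single slow outlier run.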

Verification

  • cargo test --workspace — all pass
  • cargo fmt --check clean, cargo clippy --all-targets --workspace -- -D warnings clean
  • Launched the Tauri build against the real DB: health OK, /api/day + /api/tags-summary return correct payloads, all responsive.

Notes

  • New indexes + WAL upgrade apply automatically on next init_db (launch) for existing DBs — no manual migration.
  • Follow-up opportunity (not in scope): first_prompts (windowed query over a day's sessions) is now the dominant remaining cost of /api/day end-to-end (~95 ms of ~108 ms); could be shaved if sub-50 ms end-to-end is wanted.

🤖 Generated with Claude Code

Arylmera and others added 4 commits May 31, 2026 10:40
Tier 1 of the performance roadmap. The analytics read path opened a
fresh, untuned SQLite connection per query — the day view opened six
per click on a default-journal-mode DB.

- db::tune() applies WAL, synchronous=NORMAL, mmap_size=256MB,
  cache_size=64MB, temp_store=MEMORY on both the writer open() and the
  read-path open_ro(). WAL stops reader/scanner contention; mmap+cache
  keep the hot DB warm across opens.
- open_ro() now hands back a pooled, tuned connection via a Deref guard
  (PooledConn) that returns itself to a process-wide pool on drop, so
  the PRAGMA setup and page cache stay warm. Callsites compile unchanged
  thanks to Deref coercion.

Measured on a real 245MB / 147k-row DB: /api/day 477.7 -> 130.7 ms
(3.6x), tags-summary 73 -> 45 ms, daily 88 -> 65 ms. Adds a perf_probe
example to keep each tier honest with before/after numbers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
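
The pooled Deref guard described in the Tier 1 commit could look something like the sketch below. It is a minimal illustration under stated assumptions, not the project's code: `Conn` is a stand-in for the real SQLite connection type, `POOL` and `OPENED` are hypothetical names, and the place where the real code would open the database and call db::tune() is only marked with a comment.

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Mutex;

/// Stand-in for a tuned read-only connection (assumption for illustration).
pub struct Conn {
    pub id: u32,
}

impl Conn {
    pub fn query(&self) -> u32 {
        self.id
    }
}

/// Process-wide pool of idle connections (hypothetical name).
static POOL: Mutex<Vec<Conn>> = Mutex::new(Vec::new());
/// Counts how many connections were actually opened, to make reuse visible.
static OPENED: AtomicU32 = AtomicU32::new(0);

/// Guard that lends out a connection and returns it to the pool on drop.
pub struct PooledConn {
    conn: Option<Conn>, // Option so Drop can move the connection out
}

impl Deref for PooledConn {
    type Target = Conn;
    fn deref(&self) -> &Conn {
        self.conn.as_ref().expect("connection is present until drop")
    }
}

impl Drop for PooledConn {
    fn drop(&mut self) {
        if let Some(conn) = self.conn.take() {
            POOL.lock().unwrap().push(conn);
        }
    }
}

/// Reuse an idle connection if one exists, otherwise "open" a new one.
pub fn open_ro() -> PooledConn {
    // The lock guard is a temporary, so the pool is unlocked before opening.
    let reused = POOL.lock().unwrap().pop();
    let conn = reused.unwrap_or_else(|| {
        let id = OPENED.fetch_add(1, Ordering::SeqCst) + 1;
        Conn { id } // real code would open the DB and apply the tuning PRAGMAs here
    });
    PooledConn { conn: Some(conn) }
}

/// Existing callsites that take `&Conn` keep compiling via Deref coercion.
pub fn run_query(conn: &Conn) -> u32 {
    conn.query()
}
```

Because `&PooledConn` coerces to `&Conn`, a call like `run_query(&open_ro())` needs no change at the callsite, which matches the "callsites compile unchanged" claim above.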
Tier 2 of the performance roadmap. Every day/* query filters
`WHERE substr(timestamp,1,10) = ?`, which made the plain timestamp
index unusable and full-scanned messages — six scans per calendar
day-click.

Add expression indexes idx_messages_day (substr(timestamp,1,10), type)
and idx_tools_day. SQLite uses an expression index when the WHERE
clause repeats the expression, so the existing SQL needs no rewrite.
Indexes ship in SCHEMA with IF NOT EXISTS, so init_db builds them once
on next launch for existing DBs.

day_by_model EXPLAIN: SCAN -> SEARCH ... USING INDEX idx_messages_day.
/api/day 130.7 -> 14.0 ms (34x vs baseline), under the 50ms target.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tier 3 of the performance roadmap. With no sqlite_stat1, the planner
drove tag_aggregates from messages (SCAN m over 147k rows) instead of
the small session_tags table. ANALYZE flips it to
SCAN st -> SEARCH m USING idx_messages_thread.

Rather than build precomputed rollup tables (write-path complexity +
drift risk), refresh stats in two cheap spots:
- scan_dir runs `PRAGMA analysis_limit=400; PRAGMA optimize` after any
  scan that added rows.
- init_db runs a one-time sampled ANALYZE when sqlite_stat1 is absent,
  so existing DBs get stats on next launch.

tags-summary 46 -> 39 ms here; much larger win for users with few
tagged sessions (no longer scans every message). All hot queries now
under the 50ms target, so no rollups needed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tier 4 of the performance roadmap. The background scan loop ticks every
10s and emitted scan_complete unconditionally, so every connected
frontend refetched its whole endpoint registry every 10s even when the
user generated no new transcripts.

Emit only when the scan ingested rows. Liveness is maintained by the
server's 15s keep-alive ping, the frontend watchdog is one-shot, and the
manual /api/scan route returns stats over HTTP, so suppressing idle
events is safe. The planned query cache and SSE debounce were dropped
as YAGNI: after T1-T3 every hot query is 13-40ms warm.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
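
The Tier 4 change amounts to gating the broadcast on the scan result. A minimal sketch, assuming a standard-library channel in place of the real event transport; `ScanStats`, `Event`, and `scan_tick` are hypothetical names, not the project's types:

```rust
use std::sync::mpsc::Sender;

/// Hypothetical scan result; the real stats struct likely carries more fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ScanStats {
    pub rows_ingested: usize,
}

/// Hypothetical event type sent to connected frontends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    ScanComplete(ScanStats),
}

/// One tick of the background loop: run the scan, notify only if it added rows.
/// The stats are returned either way so a manual scan route can still report them.
pub fn scan_tick<F>(scan: F, events: &Sender<Event>) -> ScanStats
where
    F: FnOnce() -> ScanStats,
{
    let stats = scan();
    if stats.rows_ingested > 0 {
        // A send error only means nobody is listening; ignore it in this sketch.
        let _ = events.send(Event::ScanComplete(stats));
    }
    stats
}
```

An idle tick produces no event, so listeners have nothing to refetch. Liveness has to come from a separate signal, which in this PR is the keep-alive ping.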
Arylmera merged commit 122e7fe into develop on May 31, 2026
7 checks passed
Arylmera deleted the perf/roadmap-tiers branch on May 31, 2026 09:52
Arylmera added a commit that referenced this pull request May 31, 2026
…bottleneck (#154)

The follow-up note in PR #153 suspected first_prompts dominated /api/day (~95ms
of ~108ms end-to-end). Measured directly: first_prompts is ~6ms for
~188 sessions and the endpoint is ~31ms server-side (raw HTTP). The
earlier ~108ms was PowerShell Invoke-RestMethod deserializing the 4.6KB
response into objects — a client-side artifact absent in the app, whose
JS frontend parses the payload natively in <1ms.

No query change warranted: rewriting the windowed query to GROUP BY
MIN(timestamp) would risk mishandling the identical-timestamp tie case
to save ~4ms on an endpoint already under the 50ms target. Add the first_prompts timing
+ EXPLAIN to perf_probe so the finding stays measurable.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>