[Epic] Evaluate adding a RAG layer on top of the knowledge graph (PoC-gated rollout)

# Why
- The knowledge-graph pipeline answers "what shape is this corpus" but not "given a query, what are the most relevant moments in it". Three concrete needs converge on the same missing capability — chunk-level retrieval:
  - Skill synthesis prompts are bloated with loosely-related context (#33).
  - The UI cannot answer natural-language queries about past sessions (#34).
  - Bookmark workflows (#23) want "show me similar moments".
- The full implementation across embedder, retriever, synthesis rewiring, and UI is non-trivial. A small PoC keeps the cost down and protects against committing to the wrong choices (model, store, distribution).

# Hypothesis to validate (PoC)
1. **Retrieval quality**: For a sample of ~100 real sessions, hybrid retrieval (BM25 + dense embedding) returns turns that a human judges *more* relevant to a Skill candidate than the current cluster-blob context.
2. **Footprint**: Index size at turn granularity, even unquantized, fits within a deployable budget for the static frontend (or, failing that, fits comfortably in `skill-server` memory).
3. **Latency**: Embedding generation runs in minutes, not hours, on a developer laptop. Query latency is sub-100ms for a few thousand chunks.
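
The footprint hypothesis can be sanity-checked with back-of-envelope arithmetic before any code runs. The sketch below is illustrative only: `indexSizeBytes` is a hypothetical helper, the 5,000-chunk count is taken from the "~5k chunks" estimate in the PoC scope, and 384 is the output dimension of both candidate models. It counts raw vectors only, so chunk text, ids, and any BM25 structures would come on top.

```typescript
// Rough size of a flat vector index: one value per dimension per chunk.
// Hypothetical helper; the inputs below are assumptions, not measurements.
function indexSizeBytes(chunks: number, dims: number, bytesPerValue = 4): number {
  return chunks * dims * bytesPerValue;
}

const assumedChunks = 5000; // "~5k chunks" from the PoC scope
const assumedDims = 384;    // embedding width of both candidate models

const float32Bytes = indexSizeBytes(assumedChunks, assumedDims);  // unquantized float32
const int8Bytes = indexSizeBytes(assumedChunks, assumedDims, 1);  // if quantized to int8 later

console.log(`float32: ${(float32Bytes / 1e6).toFixed(2)} MB`);
console.log(`int8:    ${(int8Bytes / 1e6).toFixed(2)} MB`);
```

Under these assumptions the unquantized vectors come to about 7.68 MB, which is the number to hold against whatever budget the static frontend can tolerate.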

# PoC scope (do this in a branch, in this issue)
- Pick ~100 sessions from a real `~/.claude/projects/` directory.
- Generate turn-level embeddings with **one** local model (`bge-small-en-v1.5` or `paraphrase-multilingual-MiniLM-L12-v2` via Transformers.js).
- Implement a flat-search retriever — no fancy index needed for ~5k chunks.
- For 5 Skill candidates from that corpus, build two contexts:
  - **A**: today's cluster-blob
  - **B**: top-k turns from hybrid retrieval (BM25 + dense)
- Eyeball B vs A for relevance. Optionally feed both into `claude -p` and compare resulting Skill markdown.
- Measure index size, retrieval latency p50/p95, embedding throughput.
- Output: short writeup at `docs/rag-poc.md` with answers + numbers + a recommendation.
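
A minimal sketch of what context **B** could look like in code, assuming chunks arrive already tokenized (by the tokenizer from #29-#31) and already embedded. Everything here is hypothetical: the `Chunk` shape, the function names, and in particular the fusion step, which uses reciprocal rank fusion (RRF) with the commonly used constant 60 as one simple way to combine the two rankings; the issue does not prescribe a fusion method. `k1 = 1.2` and `b = 0.75` are common BM25 defaults, not tuned values.

```typescript
// Hypothetical chunk shape: one conversation turn, pre-tokenized and pre-embedded.
type Chunk = { id: string; tokens: string[]; vector: Float32Array };

// Cosine similarity between two equal-length vectors.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

function unique(tokens: string[]): string[] {
  return tokens.filter((t, i) => tokens.indexOf(t) === i);
}

// Okapi BM25 score of every chunk for the query tokens.
function bm25Scores(chunks: Chunk[], query: string[], k1 = 1.2, b = 0.75): number[] {
  const n = chunks.length;
  const avgLen = chunks.reduce((sum, c) => sum + c.tokens.length, 0) / Math.max(n, 1);
  const df: Record<string, number> = Object.create(null); // document frequency per token
  for (const c of chunks) {
    for (const t of unique(c.tokens)) df[t] = (df[t] || 0) + 1;
  }
  return chunks.map((c) => {
    let score = 0;
    for (const q of unique(query)) {
      const tf = c.tokens.filter((t) => t === q).length;
      if (tf === 0) continue;
      const idf = Math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5));
      const lengthNorm = 1 - b + (b * c.tokens.length) / avgLen;
      score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }
    return score;
  });
}

// RRF contribution per chunk: 1 / (rrfK + rank); chunks with a non-positive score add nothing.
function rrfContributions(scores: number[], rrfK: number): number[] {
  const order = scores.map((s, i) => ({ s, i })).sort((x, y) => y.s - x.s);
  const out = scores.map(() => 0);
  order.forEach((entry, rank) => {
    if (entry.s > 0) out[entry.i] = 1 / (rrfK + rank + 1);
  });
  return out;
}

// Flat hybrid search: score every chunk both ways, fuse the rankings, return top-k ids.
function hybridSearch(
  chunks: Chunk[], queryTokens: string[], queryVector: Float32Array, k = 5, rrfK = 60,
): string[] {
  const sparse = rrfContributions(bm25Scores(chunks, queryTokens), rrfK);
  const dense = rrfContributions(chunks.map((c) => cosine(c.vector, queryVector)), rrfK);
  return chunks
    .map((c, i) => ({ id: c.id, score: sparse[i] + dense[i] }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.id);
}

// Toy corpus with made-up 2-d vectors, just to show the call shape.
const demoChunks: Chunk[] = [
  { id: "a", tokens: ["fix", "tokenizer", "bug"], vector: new Float32Array([1, 0]) },
  { id: "b", tokens: ["render", "graph", "view"], vector: new Float32Array([0, 1]) },
  { id: "c", tokens: ["tokenizer", "tests"], vector: new Float32Array([0.9, 0.1]) },
];
const top = hybridSearch(demoChunks, ["tokenizer", "bug"], new Float32Array([1, 0]), 2);
console.log(top); // "a" first (best on both signals), then "c"
```

A full scan like this is linear in the number of chunks, which is the point of the "no fancy index" choice: at a few thousand turns the whole loop is cheap enough that an approximate index would mostly add complexity. Real query vectors would come from the same embedding model used for the chunks.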

# Decision gate
- **Go**: unblock and prioritize #32, #33, #34. Promote any model / index choices learned in the PoC into those issues.
- **Hold**: capture what failed, then decide whether any surviving insight justifies a smaller follow-up.
- **Pivot**: if BM25-only retrieval with the fixed tokenizer (#29-#31) is already close enough, revisit whether dense embeddings are needed before paying for them.

# Implementation issues (gated on this PoC)
- #32 — Embedding pipeline + chunk-level vector index
- #33 — Skill synthesis prompt enrichment via hybrid retrieval
- #34 — UI semantic search + bookmark similarity surfacing

# Related issues
- #19 — Knowledge Graph rethink. RAG might absorb some of the questions originally aimed at the graph; this PoC informs that comparison.
- #20 — Skill evaluator. Used in #33 to quantify before/after synthesis quality.
- #29 / #30 / #31 — tokenizer + TF-IDF. Provide the **sparse** half of hybrid retrieval; remain valuable regardless of the PoC outcome.

# Risk to call out
- Over-investing in RAG could deprecate the current knowledge-graph view. The honest answer might be "search bar > graph" for most user jobs. Run #19 in parallel and let user-job evidence drive the call rather than aesthetic preference.
