Research: vector index for search performance at Context7-scale corpora #45

Investigate options for a real vector index (HNSW, IVF, or equivalent) over the `docs.embedding F32_BLOB` column. Today `db.SearchByEmbedding` does a linear scan with `vector_distance_cos`. That is fine at hundreds of vectors, but the scan dominates query latency at 100k+ vectors, which is well within the target scale.

**Parent:** #15

## Why now (ish)

Deadzone targets a Context7-scale corpus (~33k libs eventually, 2-3k near term). The math:

| Corpus | Vectors | Bytes scanned per query | Linear scan |
|---|---|---|---|
| 1 lib × 38 docs | 38 | ~58 KB | <1 ms |
| 50 libs × 100 docs | 5,000 | ~7.5 MB | ~10-20 ms |
| 500 libs × 100 docs | 50,000 | ~75 MB | ~100-200 ms |
| 3000 libs × 100 docs | 300,000 | ~450 MB | seconds |
| 33000 libs × 100 docs | 3.3M | ~5 GB | tens of seconds |

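(Bytes scanned counts raw embedding data only: 384-dim float32 = 1,536 bytes ≈ 1.5 KB per vector. Row metadata comes on top of that. The linear-scan column is a rough estimate, not a measurement.)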
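The "Bytes scanned" column can be reproduced with a few lines of Go. This is a minimal sketch that counts embedding bytes only; `scanBytes` and the constants are illustrative names, not existing Deadzone code. It uses decimal megabytes, so results land slightly above the table's rounded figures.

```go
package main

import "fmt"

const (
	dims          = 384 // embedding dimensionality assumed by the table
	bytesPerFloat = 4   // size of one float32 component
)

// scanBytes returns the raw embedding bytes a linear scan has to read
// for a corpus of n vectors. Row metadata is deliberately excluded.
func scanBytes(n int64) int64 {
	return n * dims * bytesPerFloat
}

func main() {
	for _, n := range []int64{38, 5_000, 50_000, 300_000, 3_300_000} {
		fmt.Printf("%9d vectors -> %.2f MB\n", n, float64(scanBytes(n))/1e6)
	}
}
```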

Even at the near-term 2-3k libs target, MCP query latency starts crossing the threshold where the LLM client perceives lag. At Context7-scale, linear scan is unviable. We need an actual ANN index before we get there.

`docs/research/tursogo-migration.md` already noted this:

> Vector indexes are NOT yet implemented in turso — the docs say "All similarity searches use a linear scan over the table." For deadzone's small corpus (a handful of repos worth of markdown) this is fine. It would be a problem at >100k snippets.

This issue is the "let's not be surprised when we get there" research.
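For intuition, here is an in-process model of what a linear scan does: score every row against the query, then keep the closest k. It is a sketch only. The real scan runs inside the database via `vector_distance_cos`, and `hit`, `cosineDistance` and `linearScan` are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// hit pairs a row position with its cosine distance to the query.
type hit struct {
	Row  int
	Dist float64
}

// cosineDistance returns 1 - cos(a, b). It assumes len(a) == len(b)
// and returns 1 when either vector has zero norm.
func cosineDistance(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 1
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// linearScan scores every row and returns the k closest ones.
// The scoring loop touches all rows, so cost grows with len(rows).
func linearScan(rows [][]float32, query []float32, k int) []hit {
	hits := make([]hit, 0, len(rows))
	for i, v := range rows {
		hits = append(hits, hit{Row: i, Dist: cosineDistance(v, query)})
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Dist < hits[j].Dist })
	if k < 0 {
		k = 0
	}
	if k > len(hits) {
		k = len(hits)
	}
	return hits[:k]
}

func main() {
	rows := [][]float32{{1, 0}, {0, 1}, {1, 1}}
	for _, h := range linearScan(rows, []float32{1, 0.1}, 2) {
		fmt.Printf("row %d dist %.3f\n", h.Row, h.Dist)
	}
}
```

An ANN index such as HNSW avoids the full loop by visiting only a small neighbourhood of candidates, trading some recall for latency.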

## Areas to investigate

### 1. Tursogo / Turso roadmap

- Is vector index support on the tursogo roadmap? Track upstream issues and discussions.
- ETA, if any. If they ship one in the next 6-12 months, the answer might be "wait".
- Quality of the index — IVF vs HNSW vs proprietary, recall/latency trade-offs.

### 2. Application-level index (in-memory HNSW)

- Load all vectors into a Go-native HNSW index at server startup.
- Libraries: [`hnsw`](https://github.com/nikolaydubina/go-hnsw), [`gomlx`](https://github.com/gomlx/gomlx)'s vector ops, [`weaviate/hnswlib-go`](https://github.com/weaviate/hnswlib-go).
- Pros: works today, no Turso dependency, fast.
- Cons: rebuild on startup (latency cost), memory cost (vectors live in RAM AND on disk), index drift between scrape and serve.

### 3. Sidecar vector DB

- Spin up qdrant / chroma / weaviate / milvus alongside Deadzone, write vectors there at scrape time, query from server.
- Pros: production-grade vector indices, well-supported.
- Cons: breaks the "single binary, no sidecar" promise we've been protecting. Was explicitly rejected for the embedder layer in #2; same logic applies here unless the wins are dramatic.

### 4. Switch storage entirely

- Replace tursogo with a vector-native DB that has indices today (qdrant, lancedb, milvus, ...).
- Pros: solves the problem permanently.
- Cons: throws away the architecture we just built around tursogo. Major rework.

### 5. Hybrid: tursogo for everything, in-memory HNSW only for vectors

- Keep tursogo for the docs table + metadata + management.
- Mirror the embedding column into an in-memory HNSW index alongside.
- The DB stays the source of truth; the index is rebuilt on startup or on demand.
- This is the "least disruption" path that gets us real ANN performance.

## Output

A research note in `docs/research/` (sibling of `tursogo-migration.md`) that:

1. Documents the latency curve we're heading toward
2. Picks one of the five options above with justification
3. Spikes the chosen approach (probably option 5) with a quick perf comparison against the current linear scan
4. Files concrete follow-up issues for the implementation

## When to act

- **Now**: this research issue. Cheap.
- **At ~50 libs / 5k vectors**: re-evaluate. Linear scan still OK but the trend is visible.
- **At ~500 libs / 50k vectors**: implementation must land. Linear scan is becoming a perceptible delay.
- **Before ~3000 libs**: implementation must be merged and load-tested.

## Acceptance criteria

- [ ] Research note in `docs/research/vector-index.md`
- [ ] Picked approach has a measured baseline vs the current linear scan (spike numbers in the note)
- [ ] Concrete follow-up issue filed for the chosen implementation
- [ ] If "wait for tursogo upstream" is the chosen path, includes a tracking note + check-in cadence

## Related

- **Parent:** #15
- **#16** (merged) — established the F32_BLOB + vector_distance_cos linear-scan baseline
- **`docs/research/tursogo-migration.md`** — flags the linear-scan limitation explicitly
- **#3** — hybrid retrieval (BM25 + vector). Becomes more relevant once the vector path has actual indices to combine with.
