Local-model support (Ollama / llama.cpp): swap out paid Claude + OpenAI dependencies

## Goal

Make Engram runnable end-to-end without paid API keys, by swapping the Anthropic and OpenAI dependencies for local-model alternatives (Ollama, llama.cpp, or any OpenAI-compatible local endpoint).

## Why

Right now the bot requires:
- **Anthropic** key — for enrichment (folder routing, title, summary, tags, related notes), vision OCR on photos, and the `/ask` RAG answer step.
- **OpenAI** key (optional but recommended) — for Whisper voice transcription, embeddings (used in semantic dedupe, `/relink`, and ranking inside `/search` / `/ask`).

This locks out the self-hosted / local-LLM audience and adds an ongoing per-message cost. Both are real adoption barriers, especially for r/LocalLLaMA and r/selfhosted users.

## Suggested order (cheapest wins first)

1. **Embeddings** — replace OpenAI embeddings with a local model (e.g. `nomic-embed-text` via Ollama, or `sentence-transformers/all-MiniLM-L6-v2`). Easiest swap because embeddings are unidirectional and deterministic, and the rest of the dedupe / search code already works on plain vectors. Touches `src/engram/embeddings.py` only.

2. **Voice transcription** — replace OpenAI Whisper API with local `whisper.cpp` or `faster-whisper`. Touches `src/engram/whisper.py`.

3. **Enrichment + /ask** — abstract the Claude calls behind a small `LLMClient` protocol with implementations for Anthropic, OpenAI-compatible (covers Ollama, llama.cpp server, vLLM, LM Studio, Together, Groq), and possibly raw llama.cpp. Vision OCR is the trickiest piece — multimodal local models (LLaVA, Llama-3.2-Vision) work but quality is uneven.

4. **Configuration** — add env vars like `LLM_PROVIDER`, `LLM_MODEL`, `LLM_BASE_URL`, and document recommended local-model defaults in the README.

## Notes

- Keep Claude/OpenAI as the default so existing users aren't broken.
- Don't add a new heavyweight dependency unless it's truly needed — `httpx` plus the existing `openai` SDK pointed at a custom `base_url` handles most local servers already.
- Tests should stub the LLM layer; the current test suite already mocks Anthropic / OpenAI calls, so the protocol abstraction should slot in cleanly.

Happy to review PRs. Start with step 1 (embeddings) if you want a small, scoped first contribution.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Local-model support (Ollama / llama.cpp): swap out paid Claude + OpenAI dependencies #1

Goal

Why

Suggested order (cheapest wins first)

Notes

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Local-model support (Ollama / llama.cpp): swap out paid Claude + OpenAI dependencies #1

Description

Goal

Why

Suggested order (cheapest wins first)

Notes

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions