Goal
Make Engram runnable end-to-end without paid API keys, by swapping the Anthropic and OpenAI dependencies for local-model alternatives (Ollama, llama.cpp, or any OpenAI-compatible local endpoint).
Why
Right now the bot requires:
- Anthropic key: used for enrichment (folder routing, title, summary, tags, related notes), vision OCR on photos, and the /ask RAG answer step.
- OpenAI key (optional but recommended): used for Whisper voice transcription and for embeddings, which drive semantic dedupe, /relink, and ranking inside /search and /ask.
This locks out the self-hosted / local-LLM audience and adds an ongoing per-message cost. Both are real adoption barriers, especially for r/LocalLLaMA and r/selfhosted users.
Suggested order (cheapest wins first)
1. Embeddings: replace OpenAI embeddings with a local model (e.g. nomic-embed-text via Ollama, or sentence-transformers/all-MiniLM-L6-v2). This is the easiest swap because an embedding call is a one-way, deterministic text-to-vector step with no prompt or response parsing, and the rest of the dedupe / search code already works on plain vectors. Touches src/engram/embeddings.py only.
2. Voice transcription: replace the OpenAI Whisper API with local whisper.cpp or faster-whisper. Touches src/engram/whisper.py.
3. Enrichment + /ask: abstract the Claude calls behind a small LLMClient protocol with implementations for Anthropic, OpenAI-compatible endpoints (covers Ollama, llama.cpp server, vLLM, LM Studio, Together, Groq), and possibly raw llama.cpp. Vision OCR is the trickiest piece: multimodal local models (LLaVA, Llama-3.2-Vision) work, but quality is uneven.
4. Configuration: add env vars like LLM_PROVIDER, LLM_MODEL, LLM_BASE_URL, and document recommended local-model defaults in the README.
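To make steps 3 and 4 a bit more concrete, here is a rough sketch of what the protocol and the env-var loading could look like. Only the names LLMClient, LLM_PROVIDER, LLM_MODEL, and LLM_BASE_URL come from this issue; the `complete` method, the `LLMSettings` dataclass, the `load_settings` helper, and the provider strings are assumptions for illustration, not existing Engram code.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Protocol


class LLMClient(Protocol):
    """Hypothetical interface; the method name and signature are assumptions."""

    def complete(self, system: str, prompt: str) -> str:
        """Return the model's text reply for one system + user prompt."""
        ...


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    model: str
    base_url: str | None


# Assumed provider identifiers. "anthropic" stays the default so that
# existing deployments keep working without any new configuration.
KNOWN_PROVIDERS = {"anthropic", "openai-compatible"}
DEFAULT_PROVIDER = "anthropic"


def load_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read LLM_PROVIDER / LLM_MODEL / LLM_BASE_URL from the environment."""
    provider = env.get("LLM_PROVIDER", DEFAULT_PROVIDER).strip().lower()
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown LLM_PROVIDER: {provider!r}")

    # An empty model string means "let the provider implementation pick".
    model = env.get("LLM_MODEL", "").strip()
    base_url = env.get("LLM_BASE_URL", "").strip() or None

    # A local or third-party endpoint cannot be guessed, so require it.
    if provider == "openai-compatible" and base_url is None:
        raise ValueError("LLM_BASE_URL is required for openai-compatible")

    return LLMSettings(provider=provider, model=model, base_url=base_url)
```

A factory could then dispatch on `settings.provider` and return whichever implementation satisfies `LLMClient`. For Ollama, the OpenAI-compatible base URL is typically `http://localhost:11434/v1`.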
Notes
- Keep Claude/OpenAI as the default so existing users aren't broken.
- Don't add a new heavyweight dependency unless it's truly needed: httpx plus the existing openai SDK pointed at a custom base_url already handles most local servers.
- Tests should stub the LLM layer; the current test suite already mocks Anthropic / OpenAI calls, so the protocol abstraction should slot in cleanly.
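As an illustration of the testing note, a protocol-based design would let tests pass in a hand-written fake instead of patching SDK calls. The sketch below is hypothetical: `FakeLLM`, `suggest_title`, and the `complete` signature are made up for the example and do not describe the current test suite.

```python
from __future__ import annotations

from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical interface; the method name and signature are assumptions."""

    def complete(self, system: str, prompt: str) -> str: ...


class FakeLLM:
    """Test double that returns a canned reply and records every call."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.calls: list[tuple[str, str]] = []

    def complete(self, system: str, prompt: str) -> str:
        self.calls.append((system, prompt))
        return self.reply


def suggest_title(llm: LLMClient, note_text: str) -> str:
    """Hypothetical enrichment helper: ask the model for a title, then tidy it."""
    raw = llm.complete(system="Return a short title for this note.", prompt=note_text)
    # Models often wrap titles in quotes or add trailing newlines.
    return raw.strip().strip('"')


def test_suggest_title_strips_quotes_and_whitespace() -> None:
    llm = FakeLLM(reply='  "Grocery list"\n')
    assert suggest_title(llm, "milk, eggs, bread") == "Grocery list"
    # The fake also lets the test check what was sent to the model.
    assert llm.calls == [("Return a short title for this note.", "milk, eggs, bread")]
```

Because `FakeLLM` only has to match the method shape, the same test works regardless of which real provider is configured.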
Happy to review PRs. Start with step 1 (embeddings) if you want a small, scoped first contribution.
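For anyone picking up step 1, here is a minimal sketch of calling an OpenAI-compatible embeddings endpoint with only the standard library. The request and response shapes follow the OpenAI embeddings API (`POST {base_url}/embeddings` with `model` and `input`, reply under `data[i].embedding`), which servers such as Ollama expose under `/v1`. The function names, the injectable `post` transport, and the lack of an auth header are assumptions for the example; none of this is the current content of src/engram/embeddings.py.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Callable, Sequence

# Transport signature: (url, JSON body as bytes) -> decoded JSON response.
PostJSON = Callable[[str, bytes], dict]


def http_post_json(url: str, body: bytes) -> dict:
    """Default transport; no Authorization header, which suits most local servers."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def embed_texts(
    texts: Sequence[str],
    *,
    base_url: str,
    model: str,
    post: PostJSON = http_post_json,
) -> list[list[float]]:
    """Return one embedding vector per input text, in input order."""
    url = base_url.rstrip("/") + "/embeddings"
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    payload = post(url, body)
    # Each row carries an "index"; sort on it rather than trusting row order.
    rows = sorted(payload["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]
```

A call might look like `embed_texts(["hello"], base_url="http://localhost:11434/v1", model="nomic-embed-text")`. Passing a fake `post` function keeps tests offline. Vectors from a different model generally have a different dimension and are not comparable with previously stored ones, so switching providers would likely require re-embedding existing notes.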