Local-model support (Ollama / llama.cpp): swap out paid Claude + OpenAI dependencies #1

@mishablank

Goal

Make Engram runnable end-to-end without paid API keys, by swapping the Anthropic and OpenAI dependencies for local-model alternatives (Ollama, llama.cpp, or any OpenAI-compatible local endpoint).

Why

Right now the bot requires:

  • Anthropic key: for enrichment (folder routing, title, summary, tags, related notes), vision OCR on photos, and the /ask RAG answer step.
  • OpenAI key (optional but recommended): for Whisper voice transcription and for embeddings (used in semantic dedupe, /relink, and ranking inside /search and /ask).

This locks out the self-hosted / local-LLM audience and adds an ongoing per-message cost. Both are real adoption barriers, especially for r/LocalLLaMA and r/selfhosted users.

Suggested order (cheapest wins first)

  1. Embeddings: replace OpenAI embeddings with a local model (e.g. nomic-embed-text via Ollama, or sentence-transformers/all-MiniLM-L6-v2). Easiest swap because an embedding call is a plain text-in, vector-out function that is deterministic for a given model, and the rest of the dedupe / search code already works on plain vectors. Touches src/engram/embeddings.py only. One caveat: vectors from different models are not comparable, so embeddings already stored would need to be regenerated after a switch.

  2. Voice transcription: replace the OpenAI Whisper API with local whisper.cpp or faster-whisper. Touches src/engram/whisper.py.

  3. Enrichment + /ask: abstract the Claude calls behind a small LLMClient protocol with implementations for Anthropic, OpenAI-compatible endpoints (covers Ollama, llama.cpp server, vLLM, LM Studio, Together, Groq), and possibly llama.cpp directly. Vision OCR is the trickiest piece: multimodal local models (LLaVA, Llama-3.2-Vision) work, but quality is uneven.

  4. Configuration: add env vars like LLM_PROVIDER, LLM_MODEL, LLM_BASE_URL, and document recommended local-model defaults in the README.
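
To make steps 3 and 4 a bit more concrete, here is a rough sketch of what the protocol and the env-var loading could look like. Everything beyond the three env var names is an assumption for illustration: the `complete` method and its signature, the `LLMConfig` dataclass, the `load_llm_config` helper, and the provider values `anthropic` / `openai_compatible` are all hypothetical and not taken from the current codebase.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional, Protocol


class LLMClient(Protocol):
    """Hypothetical interface; the method name and signature are assumptions."""

    def complete(self, system: str, prompt: str) -> str:
        """Return the model's text reply for a system + user prompt."""
        ...


@dataclass(frozen=True)
class LLMConfig:
    provider: str            # "anthropic" or "openai_compatible" (assumed values)
    model: Optional[str]     # None means "keep the provider's current default"
    base_url: Optional[str]  # only meaningful for OpenAI-compatible servers


# Assumed set of provider names; the real list is up to the implementation.
SUPPORTED_PROVIDERS = {"anthropic", "openai_compatible"}


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Read LLM_PROVIDER / LLM_MODEL / LLM_BASE_URL from a mapping.

    Falls back to "anthropic" when LLM_PROVIDER is unset, so existing
    setups keep working without any new configuration.
    """
    provider = env.get("LLM_PROVIDER", "anthropic").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    return LLMConfig(
        provider=provider,
        model=env.get("LLM_MODEL") or None,        # empty string counts as unset
        base_url=env.get("LLM_BASE_URL") or None,
    )
```

For a local Ollama server, something like `LLM_PROVIDER=openai_compatible` with `LLM_BASE_URL=http://localhost:11434/v1` would be the natural configuration, since Ollama exposes its OpenAI-compatible API under `/v1`.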

Notes

  • Keep Claude/OpenAI as the default so existing users aren't broken.
  • Don't add a new heavyweight dependency unless it's truly needed — httpx plus the existing openai SDK pointed at a custom base_url handles most local servers already.
  • Tests should stub the LLM layer; the current test suite already mocks Anthropic / OpenAI calls, so the protocol abstraction should slot in cleanly.
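
As an illustration of the last note, a stub for the protocol could be as small as the sketch below. The `LLMClient.complete` signature, the `StubLLMClient` class, and the `suggest_title` helper are hypothetical names made up for this example; the real enrichment code and its tests will look different.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


class LLMClient(Protocol):
    """Hypothetical interface, repeated here so the example is self-contained."""

    def complete(self, system: str, prompt: str) -> str:
        ...


@dataclass
class StubLLMClient:
    """Test double: returns a canned reply and records every call it receives."""

    reply: str
    calls: List[Tuple[str, str]] = field(default_factory=list)

    def complete(self, system: str, prompt: str) -> str:
        self.calls.append((system, prompt))
        return self.reply


def suggest_title(client: LLMClient, note_text: str) -> str:
    """Hypothetical enrichment helper standing in for the real enrichment code."""
    raw = client.complete("Return a short title for the note.", note_text)
    # Models often wrap titles in quotes or add trailing whitespace.
    return raw.strip().strip('"')


def test_suggest_title_strips_quotes_and_whitespace() -> None:
    stub = StubLLMClient(reply='  "Grocery list"\n')
    assert suggest_title(stub, "milk, eggs") == "Grocery list"
    assert stub.calls == [("Return a short title for the note.", "milk, eggs")]
```

Because the stub satisfies the protocol structurally, the code under test never needs to know whether it is talking to Anthropic, a local server, or a test double.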

Happy to review PRs. Start with step 1 (embeddings) if you want a small, scoped first contribution.
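
For anyone picking up step 1, here is a minimal sketch of a local embeddings call, assuming an Ollama server on its default port with nomic-embed-text pulled and reached through its OpenAI-compatible `/v1/embeddings` endpoint. The `embed_texts` function, its parameters, and the injectable `post` argument are hypothetical and not the current interface of src/engram/embeddings.py. It uses only the standard library to keep the example dependency-free; the existing openai SDK pointed at the same base URL should work equally well.

```python
import json
import urllib.request
from typing import Callable, List, Sequence

# (url, JSON payload) -> decoded JSON response; injectable so tests need no server.
PostJSON = Callable[[str, dict], dict]


def _post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def embed_texts(
    texts: Sequence[str],
    model: str = "nomic-embed-text",               # assumed local default
    base_url: str = "http://localhost:11434/v1",   # Ollama's OpenAI-compatible API
    post: PostJSON = _post_json,
) -> List[List[float]]:
    """Return one embedding vector per input text, in input order."""
    if not texts:
        return []
    body = post(
        f"{base_url.rstrip('/')}/embeddings",
        {"model": model, "input": list(texts)},
    )
    # The OpenAI-style response tags each vector with its input index.
    rows = sorted(body["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]
```

Passing a fake `post` callable in tests keeps the suite offline, in the same spirit as the existing mocks.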

Labels: enhancement, help wanted
