Implementation on local Valkey + Ollama up to an MCP-compatible server, with a test/benchmark-first workflow
Agrama is a local micro-stack: Valkey 8.1 for KV + graph storage, Faiss 1.11 for vector search, and Ollama running qwen3:1.7b for summarisation, all wrapped by a FastAPI MCP server and a lazygit-style TUI built with Textual. You get sub-millisecond direct look-ups, < 50 ms semantic search, and a one-liner (`docker compose up`) that works the same on macOS, Linux, and Windows 11 (now that Windows ships MCP support) (GitHub, The Verge).
| Layer | Version | Highlights |
|---|---|---|
| Valkey | 8.1.1 | SIMD `BITCOUNT` path, TLS crash fix, CVE-2025-21605 patch, binary-compatible with Redis 7.2 (GitHub) |
| Faiss | 1.11.0 | RaBitQ, cosine metric, `IndexIDMap` with CAGRA, ARM OpenBLAS 0.3.29, overflow fixes (GitHub) |
| Embeddings | jina-embeddings-v3 | 1024-d multilingual, 94-language MTEB leader (Hugging Face) |
| LLM | qwen3:1.7b on Ollama ≥ 0.6.6 | 40 k context, MoE, 5 GB Q4_K_M binary (Ollama) |
| MCP | spec v0.13 | Open protocol repo + reference servers (GitHub) |
| TUI | Textual 0.52 | Rich widgets plus `Tree` and `DirectoryTree` (Textual Documentation) |
```text
agrama/
├─ agrama/            # python package
│  ├─ api/            # FastAPI routers (AAP + MCP proxy)
│  ├─ db/             # Valkey client + graph helpers
│  ├─ vector/         # Faiss wrapper
│  ├─ proto/          # *.proto (compiled via buf)
│  ├─ summariser/     # Ollama client
│  └─ tests/
├─ tui/               # Textual app
├─ docker/            # compose, Dockerfiles, healthchecks
├─ bench/             # pytest-benchmark json & compare script
└─ k6/                # load tests
```
```protobuf
message Node {
  string uuid = 1;
  string type = 2;                            // Session | Task | CodeUnit …
  int64 created_at = 3;
  int64 updated_at = 4;
  bytes content = 5;                          // protobuf-packed struct or raw bytes
  repeated float embedding = 6 [packed=true];
}
```

- Serialization: `buf generate` → compact varint framing; ≈ 45 B/node for metadata.
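Protobuf's wire format length-prefixes each frame with a varint. A minimal sketch of that framing in pure Python — the `Node` payload itself would come from the `buf`-generated stubs, so raw bytes stand in here:

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a protobuf-style varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7

def frame(payload: bytes) -> bytes:
    """Length-prefix one serialized Node for storage or streaming."""
    return encode_varint(len(payload)) + payload

def unframe(buf: bytes) -> bytes:
    """Strip the length prefix and return the payload."""
    length, pos = decode_varint(buf)
    return buf[pos:pos + length]
```

For node metadata under 128 bytes the prefix costs a single byte, which is how the ≈ 45 B/node figure stays that small.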
| Purpose | Key pattern | Value | TTL |
|---|---|---|---|
| Node blob | `mem:{uuid}` | `<proto bytes>` | ∞ |
| Out edges | `mem:{src}:out:{etype}` | `[dst1 … dstN]` | ∞ |
| In edges | `mem:{dst}:in:{etype}` | `[src1 … srcN]` | ∞ |
| Temporal | `mem:{uuid}:ts:{unix_ms}` | `<proto bytes>` | 60 d |
- Small lists are listpack-encoded (the Valkey default), giving O(1) push/pop (GitHub).
- Graph traversal: key-pattern scans are avoided; because the edge type is encoded in the key, each adjacency list is a single O(1) key look-up (`LRANGE mem:{src}:out:{etype} 0 -1`).
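The property test later in this document assumes `make_node_key`/`parse_node_key` helpers; a minimal sketch consistent with the key patterns above (the helper names are the project's own, the bodies here are an illustration):

```python
NODE_PREFIX = "mem:"

def make_node_key(uuid: str) -> str:
    """Key for a node blob, per the mem:{uuid} pattern."""
    return f"{NODE_PREFIX}{uuid}"

def parse_node_key(key: str) -> str:
    """Inverse of make_node_key; rejects edge/temporal keys,
    which contain extra ':'-separated segments."""
    if not key.startswith(NODE_PREFIX) or ":" in key[len(NODE_PREFIX):]:
        raise ValueError(f"not a node key: {key}")
    return key[len(NODE_PREFIX):]

def make_edge_key(src: str, etype: str, direction: str = "out") -> str:
    """Adjacency-list key, per mem:{src}:out:{etype} / mem:{dst}:in:{etype}."""
    return f"{NODE_PREFIX}{src}:{direction}:{etype}"
```

Keeping construction and parsing in one module is what makes the Hypothesis round-trip test below meaningful.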
- Ingest

```python
vec_id = int(uuid.uuid4().int >> 64)     # truncate UUID to a 64-bit Faiss id
index.add_with_ids(np.array([emb], dtype='float32'),
                   np.array([vec_id], dtype='int64'))
r.set(f"vec:{vec_id}", node_uuid)        # map Faiss id back to the node UUID
```

- Search

```python
D, I = index.search(query_vec, k=10)     # D = distances, I = ids
uuids = r.mget([f"vec:{i}" for i in I[0]])
```

Faiss ID mapping plus the Valkey KV table protects against pointer invalidation when the index is rebuilt (GitHub).
| Method | Path | Payload / Params |
|---|---|---|
| PUT | `/nodes` | Node proto |
| GET | `/nodes/{uuid}` | — |
| GET | `/nodes/{uuid}/at/{ts}` | time-travel |
| PUT | `/edges` | `{src,dst,type,weight}` |
| GET | `/edges/{src}` | `type`, `dir` |
| POST | `/semantic_search` | `{embedding,k}` |
| GET | `/keyword_search` | `q`, `k`, `fields` |
| POST | `/summarise` | `{root_uuid}` |
The MCP endpoints `/v1/tools`, `/v1/resources`, and `/v1/prompts` simply translate to AAP calls and re-shape the responses into MCP schema JSON (GitHub).
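A sketch of that re-shaping step, under the assumption (not from the MCP spec text above) that an AAP node dict carries `uuid` and `type` fields and that an MCP resource entry wants `uri`, `name`, and `mimeType`:

```python
def aap_node_to_mcp_resource(node: dict) -> dict:
    """Re-shape one AAP node into an MCP-style resource entry.
    The agrama:// URI scheme and the 8-char name suffix are
    illustrative choices, not mandated by either protocol."""
    return {
        "uri": f"agrama://nodes/{node['uuid']}",
        "name": f"{node['type']}:{node['uuid'][:8]}",
        "mimeType": "application/octet-stream",
    }
```

Keeping the translation as a pure function makes the proxy routes trivial to unit-test without a running server.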
```yaml
services:
  valkey:
    image: valkey/valkey:8.1.1-alpine
    command: ["valkey-server", "--save", "", "--appendonly", "no"]
    healthcheck:
      test: ["CMD", "valkey-cli", "ping"]   # returns PONG
      interval: 10s
      retries: 3
    ports: ["6379:6379"]                    # expose for local debug
  faiss:                                    # stateless side-car
    image: ghcr.io/facebookresearch/faiss-cpu:1.11.0
    command: ["sleep", "infinity"]
  ollama:
    image: ollama/ollama:latest
    volumes: ["./models:/root/.ollama/models"]
    ports: ["11434:11434"]
  agramad:
    build: .
    depends_on:
      valkey:
        condition: service_healthy
    environment:
      - VALKEY_URL=redis://valkey:6379
      - FAISS_HOST=faiss
      - OLLAMA_URL=http://ollama:11434
    ports: ["8000:8000"]
```

The healthcheck keeps the API container in "starting" until Valkey answers PONG (GitHub).
```python
from hypothesis import given, strategies as st

@given(node_type=st.sampled_from(["Session", "Task", "CodeUnit"]))
def test_key_schema_roundtrip(node_type):
    uuid = new_uuid()
    key = make_node_key(uuid)
    assert parse_node_key(key) == uuid
```

```shell
pytest --benchmark-autosave
pytest-benchmark compare --sort=mean
```

CI fails if p99 `get_node` exceeds 1 ms or if the current mean regresses more than 10 % (pytest-benchmark).
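The 10 % regression gate can be sketched as a comparison of two pytest-benchmark JSON payloads. The field layout here (`benchmarks[].name`, `benchmarks[].stats.mean`) matches pytest-benchmark's saved format, but treat it as an assumption to verify against your installed version:

```python
def check_regressions(baseline: dict, current: dict,
                      threshold: float = 0.10) -> list[str]:
    """Return names of benchmarks whose mean regressed past threshold."""
    base_means = {b["name"]: b["stats"]["mean"] for b in baseline["benchmarks"]}
    failures = []
    for b in current["benchmarks"]:
        old = base_means.get(b["name"])
        if old is not None and b["stats"]["mean"] > old * (1 + threshold):
            failures.append(b["name"])
    return failures
```

A CI step can load the two autosaved JSON files, call this, and fail the job if the returned list is non-empty.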
`k6/script.js`:

```javascript
import http from 'k6/http';

export let options = { vus: 50, duration: '1m' };

export default function () {
  http.get('http://localhost:8000/nodes/seed-0001');
}
```

Run with `docker run --rm -i grafana/k6 run - < k6/script.js`, expecting < 1 % errors (k6.io).
- Endpoint: `POST /summarise {root_uuid}`
- Flow:
  1. Gather all `Interaction` nodes under `root_uuid` (DFS).
  2. Chunk to ≤ 8 k tokens; stream into Ollama `/api/chat` with the system prompt: "Summarise these interactions for future retrieval …"
  3. Receive markdown; store it as a new `Summary` node; link it with a `summarises` edge.
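Steps 1–2 above can be sketched as a DFS over an in-memory adjacency map plus a greedy chunker. The ≈ 4-characters-per-token estimate is a rough heuristic standing in for a real tokeniser:

```python
def gather_interactions(root: str, children: dict[str, list[str]],
                        node_type: dict[str, str]) -> list[str]:
    """DFS from root, collecting uuids of Interaction nodes."""
    stack, seen, out = [root], set(), []
    while stack:
        uuid = stack.pop()
        if uuid in seen:
            continue
        seen.add(uuid)
        if node_type.get(uuid) == "Interaction":
            out.append(uuid)
        stack.extend(children.get(uuid, []))
    return out

def chunk_texts(texts: list[str], max_tokens: int = 8000) -> list[str]:
    """Greedy chunking at ~4 chars/token (crude estimate)."""
    budget = max_tokens * 4
    chunks, cur = [], ""
    for t in texts:
        if cur and len(cur) + len(t) > budget:
            chunks.append(cur)
            cur = ""
        cur += t + "\n"
    chunks.append(cur)
    return chunks
```

Each chunk then becomes one Ollama request; the per-chunk summaries can be concatenated and summarised once more if they still exceed the context window.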
- Ollama call (Python):

```python
import httpx

resp = httpx.post(
    f"{OLLAMA}/api/generate",
    json={"model": "qwen3:1.7b", "prompt": prompt, "stream": False},
).json()
```

Example repos show similar one-shot summarisers with Qwen-2; adapt for Qwen3 (GitHub).
```text
┌──────── Graph Tree ────────┐ ┌─ Preview ────────┐
│ Session › Task › …         │ │ Markdown / code  │
│ (Keyboard: ↑ ↓ → ←)        │ └──────────────────┘
├──────── Neighbors ─────────┤
│ Edge list w/ weights       │
├──────── Command Palette ───┤
│ > _                        │
└────────────────────────────┘
```
- Tree uses the `Tree[str]`, `DirectoryTree`, or `JSONTree` widgets (Textual Documentation).
- Global keymap mirrors lazygit:

| Key | Action |
|---|---|
| `→`/`←` | Traverse edge |
| `s` | Semantic search |
| `/` | Keyword search |
| `t` | Time-travel input |
| `p` | Command palette |
| `q` | Quit |

Keybinding names align with lazygit's canonical list for muscle memory (GitHub).
```python
class AgramaTUI(App):
    CSS_PATH = "tui.tcss"
    BINDINGS = [
        ("s", "semantic_search", "Semantic Search"),
        ("t", "time_travel", "At Time"),
        ("q", "quit", "Quit"),
    ]

    def compose(self):
        self.tree = Tree("Memory")
        yield Horizontal(
            self.tree,
            Vertical(Neighbors(), Preview()),
        )
```

| Step | Tool | Cache Strategy |
|---|---|---|
| Lint + mypy | ruff, mypy | pre-commit |
| Unit tests | pytest | GitHub cache for `~/.cache/pip` |
| Component | `docker compose up --wait` | Layered build cache |
| Bench | pytest-benchmark autosave + compare | Artefact store |
| Image | Buildx multi-arch (linux/amd64, linux/arm64) | `cache-from` |
| Release | GitHub draft release + OCI push | Signed SBOM |
```shell
# bootstrap
make dev     # spin containers
make seed    # load 10k sample nodes
agrama tui   # open Textual UI

# hacking
make proto   # re-generate gRPC stubs
make test    # full test matrix
make bench   # micro-bench
make load    # k6 100 RPS

# MCP client demo
python -m agrama.demo.get_code_snippet "TypeScript debounce"
```

- Proto freeze: lock message IDs before week 2 to avoid breaking stored data.
- Implement Valkey pipelines: use `MULTI`/`EXEC` for batch edge inserts to hit the 10 k ops/s target.
- Wire search fallback: if `embedding` is missing, default to BM25 only.
- Write TUI CSS: grid layout + dark/light theme toggle (`action_toggle_dark`) (Textual Documentation).
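A sketch of the batched edge insert. `pipe` is anything exposing a redis-py-style `rpush`/`execute`, so a recording stub stands in here; against a live server you would pass `valkey_client.pipeline(transaction=True)` (assumption based on redis-py's API, which Valkey clients mirror):

```python
def batch_insert_edges(pipe, edges: list[tuple[str, str, str]]) -> None:
    """Queue both directions of each (src, dst, etype) edge on one
    pipeline, so the server applies them in a single MULTI/EXEC
    round trip instead of 2N individual commands."""
    for src, dst, etype in edges:
        pipe.rpush(f"mem:{src}:out:{etype}", dst)
        pipe.rpush(f"mem:{dst}:in:{etype}", src)
    pipe.execute()

class RecordingPipe:
    """Stand-in for a redis-py pipeline, for this example only."""
    def __init__(self):
        self.commands = []
    def rpush(self, key, value):
        self.commands.append(("RPUSH", key, value))
    def execute(self):
        return self.commands
```

Batching is what removes the per-command network round trip, which is the dominant cost at the 10 k ops/s target.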
- Valkey 8.1.1 notes (GitHub)
- Faiss 1.11 release (GitHub)
- Textual docs (GitHub)
- Tree widget guide (Textual Documentation)
- Lazygit keybindings (GitHub)
- Jina embeddings v3 card (Hugging Face)
- Ollama qwen3 model page (Ollama)
- MCP spec repo (GitHub)
- MCP servers repo (GitHub)
- Docker healthcheck recipe (GitHub)
- Pytest-benchmark compare docs (pytest-benchmark)
- k6 quickstart docs (k6.io)
- Faiss ID mapping example (GitHub)
- Ollama summariser examples (GitHub)