Implementation on local Valkey + Ollama up to an MCP-compatible server, with a test/benchmark-first workflow
Agrama is a local micro-stack: Valkey 8.1 for KV + graph storage, Faiss 1.11 for vector search, and Ollama running qwen3:1.7b for summarisation, all wrapped by a FastAPI MCP server and a lazygit-style TUI built with Textual. You get sub-millisecond direct look-ups, < 50 ms semantic search, and a one-liner (`docker compose up`) that works the same on macOS, Linux, and Windows 11 (now that Windows ships MCP support) (GitHub, The Verge).
| Layer | Version | Highlights |
|---|---|---|
| Valkey | 8.1.1 | SIMD `BITCOUNT` path, TLS crash fix, CVE-2025-21605 patch, binary-compatible with Redis 7.2 (GitHub) |
| Faiss | 1.11.0 | RaBitQ, cosine metric, `IndexIDMap` with CAGRA, ARM OpenBLAS 0.3.29, overflow fixes (GitHub) |
| Embeddings | jina-embeddings-v3 | 1024-d multilingual, 94-language MTEB leader (Hugging Face) |
| LLM | qwen3:1.7b on Ollama ≥ 0.6.6 | 40 k context, MoE, 5 GB Q4_K_M binary (Ollama) |
| MCP | spec v0.13 | Open protocol repo + reference servers (GitHub) |
| TUI | Textual 0.52 | Rich widgets plus `Tree` and `DirectoryTree` (Textual Documentation) |
```text
agrama/
├─ agrama/            # python package
│  ├─ api/            # FastAPI routers (AAP + MCP proxy)
│  ├─ db/             # Valkey client + graph helpers
│  ├─ vector/         # Faiss wrapper
│  ├─ proto/          # *.proto (compiled via buf)
│  ├─ summariser/     # Ollama client
│  └─ tests/
├─ tui/               # Textual app
├─ docker/            # compose, Dockerfiles, healthchecks
├─ bench/             # pytest-benchmark json & compare script
└─ k6/                # load tests
```
```protobuf
message Node {
  string uuid = 1;
  string type = 2;                            // Session | Task | CodeUnit …
  int64 created_at = 3;
  int64 updated_at = 4;
  bytes content = 5;                          // protobuf-packed struct or raw bytes
  repeated float embedding = 6 [packed=true];
}
```

- Serialization: `buf generate` → compact varint framing; ≈ 45 B/node for metadata.
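Protobuf's wire format length-prefixes each frame with a varint. A minimal sketch of that framing in pure Python — the `Node` payload itself would come from the `buf`-generated stubs, so raw bytes stand in here:

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a protobuf-style varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7

def frame(payload: bytes) -> bytes:
    """Length-prefix one serialized Node for storage or streaming."""
    return encode_varint(len(payload)) + payload

def unframe(buf: bytes) -> bytes:
    """Strip the length prefix and return the payload."""
    length, pos = decode_varint(buf)
    return buf[pos:pos + length]
```

For node metadata under 128 bytes the prefix costs a single byte, which is how the ≈ 45 B/node figure stays that small.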
| Purpose | Key pattern | Value | TTL |
|---|---|---|---|
| Node blob | `mem:{uuid}` | `<proto bytes>` | ∞ |
| Out edges | `mem:{src}:out:{etype}` | `[dst1 … dstN]` | ∞ |
| In edges | `mem:{dst}:in:{etype}` | `[src1 … srcN]` | ∞ |
| Temporal | `mem:{uuid}:ts:{unix_ms}` | `<proto bytes>` | 60 d |
- Small lists are listpack-encoded (the Valkey default), giving O(1) push/pop (GitHub).
- Graph traversal: key-pattern scans are avoided; because the edge type is encoded in the key, each adjacency list is a single O(1) key look-up (`LRANGE mem:{src}:out:{etype} 0 -1`).
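The property test later in this document assumes `make_node_key`/`parse_node_key` helpers; a minimal sketch consistent with the key patterns above (the helper names are the project's own, the bodies here are an illustration):

```python
NODE_PREFIX = "mem:"

def make_node_key(uuid: str) -> str:
    """Key for a node blob, per the mem:{uuid} pattern."""
    return f"{NODE_PREFIX}{uuid}"

def parse_node_key(key: str) -> str:
    """Inverse of make_node_key; rejects edge/temporal keys,
    which contain extra ':'-separated segments."""
    if not key.startswith(NODE_PREFIX) or ":" in key[len(NODE_PREFIX):]:
        raise ValueError(f"not a node key: {key}")
    return key[len(NODE_PREFIX):]

def make_edge_key(src: str, etype: str, direction: str = "out") -> str:
    """Adjacency-list key, per mem:{src}:out:{etype} / mem:{dst}:in:{etype}."""
    return f"{NODE_PREFIX}{src}:{direction}:{etype}"
```

Keeping construction and parsing in one module is what makes the Hypothesis round-trip test below meaningful.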
- Ingest

```python
vec_id = int(uuid.uuid4().int >> 64)     # truncate UUID to a 64-bit Faiss id
index.add_with_ids(np.array([emb], dtype='float32'),
                   np.array([vec_id], dtype='int64'))
r.set(f"vec:{vec_id}", node_uuid)        # map Faiss id back to the node UUID
```

- Search

```python
D, I = index.search(query_vec, k=10)     # D = distances, I = ids
uuids = r.mget([f"vec:{i}" for i in I[0]])
```

Faiss ID mapping plus the Valkey KV table protects against pointer invalidation when the index is rebuilt (GitHub).
| Method | Path | Payload / Params |
|---|---|---|
| PUT | `/nodes` | Node proto |
| GET | `/nodes/{uuid}` | — |
| GET | `/nodes/{uuid}/at/{ts}` | time-travel |
| PUT | `/edges` | `{src,dst,type,weight}` |
| GET | `/edges/{src}` | `type`, `dir` |
| POST | `/semantic_search` | `{embedding,k}` |
| GET | `/keyword_search` | `q`, `k`, `fields` |
| POST | `/summarise` | `{root_uuid}` |
The MCP endpoints `/v1/tools`, `/v1/resources`, and `/v1/prompts` simply translate to AAP calls and re-shape the responses into MCP schema JSON (GitHub).
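A sketch of that re-shaping step, under the assumption (not from the MCP spec text above) that an AAP node dict carries `uuid` and `type` fields and that an MCP resource entry wants `uri`, `name`, and `mimeType`:

```python
def aap_node_to_mcp_resource(node: dict) -> dict:
    """Re-shape one AAP node into an MCP-style resource entry.
    The agrama:// URI scheme and the 8-char name suffix are
    illustrative choices, not mandated by either protocol."""
    return {
        "uri": f"agrama://nodes/{node['uuid']}",
        "name": f"{node['type']}:{node['uuid'][:8]}",
        "mimeType": "application/octet-stream",
    }
```

Keeping the translation as a pure function makes the proxy routes trivial to unit-test without a running server.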
```yaml
services:
  valkey:
    image: valkey/valkey:8.1.1-alpine
    command: ["valkey-server", "--save", "", "--appendonly", "no"]
    healthcheck:
      test: ["CMD", "valkey-cli", "ping"]   # returns PONG
      interval: 10s
      retries: 3
    ports: ["6379:6379"]                    # expose for local debug
  faiss:                                    # stateless side-car
    image: ghcr.io/facebookresearch/faiss-cpu:1.11.0
    command: ["sleep", "infinity"]
  ollama:
    image: ollama/ollama:latest
    volumes: ["./models:/root/.ollama/models"]
    ports: ["11434:11434"]
  agramad:
    build: .
    depends_on:
      valkey:
        condition: service_healthy
    environment:
      - VALKEY_URL=redis://valkey:6379
      - FAISS_HOST=faiss
      - OLLAMA_URL=http://ollama:11434
    ports: ["8000:8000"]
```

The healthcheck keeps the API container in "starting" until Valkey answers PONG (GitHub).
```python
from hypothesis import given, strategies as st

@given(node_type=st.sampled_from(["Session", "Task", "CodeUnit"]))
def test_key_schema_roundtrip(node_type):
    uuid = new_uuid()
    key = make_node_key(uuid)
    assert parse_node_key(key) == uuid
```

```shell
pytest --benchmark-autosave
pytest-benchmark compare --sort=mean
```

CI fails if p99 `get_node` exceeds 1 ms or if the current mean regresses more than 10 % (pytest-benchmark).
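The 10 % regression gate can be sketched as a comparison of two pytest-benchmark JSON payloads. The field layout here (`benchmarks[].name`, `benchmarks[].stats.mean`) matches pytest-benchmark's saved format, but treat it as an assumption to verify against your installed version:

```python
def check_regressions(baseline: dict, current: dict,
                      threshold: float = 0.10) -> list[str]:
    """Return names of benchmarks whose mean regressed past threshold."""
    base_means = {b["name"]: b["stats"]["mean"] for b in baseline["benchmarks"]}
    failures = []
    for b in current["benchmarks"]:
        old = base_means.get(b["name"])
        if old is not None and b["stats"]["mean"] > old * (1 + threshold):
            failures.append(b["name"])
    return failures
```

A CI step can load the two autosaved JSON files, call this, and fail the job if the returned list is non-empty.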
`k6/script.js`:

```javascript
import http from 'k6/http';

export let options = { vus: 50, duration: '1m' };

export default function () {
  http.get('http://localhost:8000/nodes/seed-0001');
}
```

Run with `docker run --rm -i grafana/k6 run - < k6/script.js`, expecting < 1 % errors (k6.io).
- Endpoint: `POST /summarise {root_uuid}`
- Flow:
  1. Gather all `Interaction` nodes under `root_uuid` (DFS).
  2. Chunk to ≤ 8 k tokens; stream into Ollama `/api/chat` with the system prompt: "Summarise these interactions for future retrieval …"
  3. Receive markdown; store it as a new `Summary` node; link it with a `summarises` edge.
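Steps 1–2 above can be sketched as a DFS over an in-memory adjacency map plus a greedy chunker. The ≈ 4-characters-per-token estimate is a rough heuristic standing in for a real tokeniser:

```python
def gather_interactions(root: str, children: dict[str, list[str]],
                        node_type: dict[str, str]) -> list[str]:
    """DFS from root, collecting uuids of Interaction nodes."""
    stack, seen, out = [root], set(), []
    while stack:
        uuid = stack.pop()
        if uuid in seen:
            continue
        seen.add(uuid)
        if node_type.get(uuid) == "Interaction":
            out.append(uuid)
        stack.extend(children.get(uuid, []))
    return out

def chunk_texts(texts: list[str], max_tokens: int = 8000) -> list[str]:
    """Greedy chunking at ~4 chars/token (crude estimate)."""
    budget = max_tokens * 4
    chunks, cur = [], ""
    for t in texts:
        if cur and len(cur) + len(t) > budget:
            chunks.append(cur)
            cur = ""
        cur += t + "\n"
    chunks.append(cur)
    return chunks
```

Each chunk then becomes one Ollama request; the per-chunk summaries can be concatenated and summarised once more if they still exceed the context window.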
- Ollama call (Python):

```python
import httpx

resp = httpx.post(
    f"{OLLAMA}/api/generate",
    json={"model": "qwen3:1.7b", "prompt": prompt, "stream": False},
).json()
```

Example repos show similar one-shot summarisers with Qwen-2; adapt for Qwen3 (GitHub).
```text
┌──────── Graph Tree ────────┐ ┌─ Preview ────────┐
│ Session › Task › …         │ │ Markdown / code  │
│ (Keyboard: ↑ ↓ → ←)        │ └──────────────────┘
├──────── Neighbors ─────────┤
│ Edge list w/ weights       │
├──────── Command Palette ───┤
│ > _                        │
└────────────────────────────┘
```
- Tree uses the `Tree[str]`, `DirectoryTree`, or `JSONTree` widgets (Textual Documentation).
- Global keymap mirrors lazygit:

| Key | Action |
|---|---|
| `→`/`←` | Traverse edge |
| `s` | Semantic search |
| `/` | Keyword search |
| `t` | Time-travel input |
| `p` | Command palette |
| `q` | Quit |

Keybinding names align with lazygit's canonical list for muscle memory (GitHub).
```python
class AgramaTUI(App):
    CSS_PATH = "tui.tcss"
    BINDINGS = [
        ("s", "semantic_search", "Semantic Search"),
        ("t", "time_travel", "At Time"),
        ("q", "quit", "Quit"),
    ]

    def compose(self):
        self.tree = Tree("Memory")
        yield Horizontal(
            self.tree,
            Vertical(Neighbors(), Preview()),
        )
```

| Step | Tool | Cache Strategy |
|---|---|---|
| Lint + mypy | ruff, mypy | pre-commit |
| Unit tests | pytest | GitHub cache for `~/.cache/pip` |
| Component | `docker compose up --wait` | Layered build cache |
| Bench | pytest-benchmark autosave + compare | Artefact store |
| Image | Buildx multi-arch (linux/amd64, linux/arm64) | `cache-from` |
| Release | GitHub draft release + OCI push | Signed SBOM |
```shell
# bootstrap
make dev     # spin containers
make seed    # load 10k sample nodes
agrama tui   # open Textual UI

# hacking
make proto   # re-generate gRPC stubs
make test    # full test matrix
make bench   # micro-bench
make load    # k6 100 RPS

# MCP client demo
python -m agrama.demo.get_code_snippet "TypeScript debounce"
```

- Proto freeze: lock message IDs before week 2 to avoid breaking stored data.
- Implement Valkey pipelines: use `MULTI`/`EXEC` for batch edge inserts to hit the 10 k ops/s target.
- Wire search fallback: if `embedding` is missing, default to BM25 only.
- Write TUI CSS: grid layout + dark/light theme toggle (`action_toggle_dark`) (Textual Documentation).
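A sketch of the batched edge insert. `pipe` is anything exposing a redis-py-style `rpush`/`execute`, so a recording stub stands in here; against a live server you would pass `valkey_client.pipeline(transaction=True)` (assumption based on redis-py's API, which Valkey clients mirror):

```python
def batch_insert_edges(pipe, edges: list[tuple[str, str, str]]) -> None:
    """Queue both directions of each (src, dst, etype) edge on one
    pipeline, so the server applies them in a single MULTI/EXEC
    round trip instead of 2N individual commands."""
    for src, dst, etype in edges:
        pipe.rpush(f"mem:{src}:out:{etype}", dst)
        pipe.rpush(f"mem:{dst}:in:{etype}", src)
    pipe.execute()

class RecordingPipe:
    """Stand-in for a redis-py pipeline, for this example only."""
    def __init__(self):
        self.commands = []
    def rpush(self, key, value):
        self.commands.append(("RPUSH", key, value))
    def execute(self):
        return self.commands
```

Batching is what removes the per-command network round trip, which is the dominant cost at the 10 k ops/s target.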
- Valkey 8.1.1 notes (GitHub)
- Faiss 1.11 release (GitHub)
- Textual docs (GitHub)
- Tree widget guide (Textual Documentation)
- Lazygit keybindings (GitHub)
- Jina embeddings v3 card (Hugging Face)
- Ollama qwen3 model page (Ollama)
- MCP spec repo (GitHub)
- MCP servers repo (GitHub)
- Docker healthcheck recipe (GitHub)
- Pytest-benchmark compare docs (pytest-benchmark)
- k6 quickstart docs (k6.io)
- Faiss ID mapping example (GitHub)
- Ollama summariser examples (GitHub)