
Agrama V1 Developer Specification

Implementation from local Valkey + Ollama up to an MCP-compatible server, with a test- and benchmark-first workflow


0 · Quick Recap

Agrama is a local micro-stack: Valkey 8.1 for KV + graph storage, Faiss 1.11 for vector search, and Ollama running qwen3:1.7b for summarisation, all wrapped by a FastAPI MCP server and a lazygit-style TUI built with Textual. You get sub-millisecond direct look-ups, < 50 ms semantic search, and a one-liner, docker compose up, that works the same on macOS, Linux, and Windows 11 (now that Windows ships MCP support) (GitHub, The Verge).


1 · Stack Versions & Why They Matter

| Layer | Version | Highlights | Sources |
|---|---|---|---|
| Valkey | 8.1.1 | SIMD BITCOUNT path, TLS crash fix, CVE-2025-21605 patch, binary-compatible with Redis 7.2 | (GitHub) |
| Faiss | 1.11.0 | RaBitQ, cosine metric, IndexIDMap w/ CAGRA, ARM OpenBLAS 0.3.29, overflow fixes | (GitHub) |
| Embeddings | jina-embeddings-v3 | 1024-d multilingual, 94-language MTEB leader | (Hugging Face) |
| LLM | qwen3:1.7b on Ollama ≥ 0.6.6 | 40 k ctx, MoE, 5 GB Q4_K_M bin | (Ollama) |
| MCP | spec v0.13 | Open protocol repo + reference servers | (GitHub) |
| TUI | Textual 0.52 | Rich widgets + Tree & DirectoryTree | (GitHub, Textual Documentation) |

2 · Directory Layout

agrama/
├─ agrama/               # python package
│  ├─ api/               # FastAPI routers (AAP + MCP proxy)
│  ├─ db/                # Valkey client + graph helpers
│  ├─ vector/            # Faiss wrapper
│  ├─ proto/             # *.proto (compiled via buf)
│  ├─ summariser/        # Ollama client
│  └─ tests/
├─ tui/                  # Textual app
├─ docker/               # compose, Dockerfiles, healthchecks
├─ bench/                # pytest-benchmark json & compare script
└─ k6/                   # load tests

3 · Data Model & Key Schema (Valkey)

3.1 Node Encoding

message Node {
  string uuid = 1;
  string type = 2;          // Session | Task | CodeUnit …
  int64  created_at = 3;
  int64  updated_at = 4;
  bytes  content = 5;       // protobuf-packed struct or raw bytes
  repeated float embedding = 6 [packed=true];
}
  • Serialization: buf generate → compact varint framing; ≈ 45 B/node for metadata.
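The framing can be illustrated with a pure-Python varint encoder: a sketch of protobuf's wire format, not the real stubs (those come from buf generate), and the exact byte count depends on field contents — the sample uuid and timestamps below are made up.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_field(field_no: int, wire_type: int, payload: bytes) -> bytes:
    """Tag (field number << 3 | wire type) followed by the payload."""
    return encode_varint((field_no << 3) | wire_type) + payload

# Rough metadata size for one Node: uuid (string, field 1), type (string,
# field 2), created_at / updated_at (varints, fields 3-4).
uuid_bytes = b"5f2b1c3a-9d4e-4f6a-8b7c-0a1b2c3d4e5f"
msg = (
    encode_field(1, 2, encode_varint(len(uuid_bytes)) + uuid_bytes)
    + encode_field(2, 2, encode_varint(4) + b"Task")
    + encode_field(3, 0, encode_varint(1_700_000_000_000))
    + encode_field(4, 0, encode_varint(1_700_000_000_000))
)
print(len(msg))  # 58 — the 36-byte uuid string dominates; tags + varints are tiny
```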

3.2 Key Conventions

| Purpose | Pattern | Value | TTL |
|---|---|---|---|
| Node blob | mem:{uuid} | proto bytes | (none) |
| Out edges | mem:{src}:out:{etype} | [dst1 … dstN] | (none) |
| In edges | mem:{dst}:in:{etype} | [src1 … srcN] | (none) |
| Temporal | mem:{uuid}:ts:{unix_ms} | proto bytes | 60 d |
  • Lists use Valkey's default compact encoding for small lists (listpack, the successor to ziplist) → O(1) push/pop at either end (GitHub).
  • Graph traversal: pattern scans over mem:{src}:out:* are avoided — the edge type is encoded in the key, so each adjacency list is fetched with a single LRANGE on a known key, keeping look-ups O(1).
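The schema above can be captured in small key helpers. The function names match the round-trip test in 7.1; the flat mem:{uuid} form is this spec's convention, while the exact module layout is an assumption.

```python
def make_node_key(uuid: str) -> str:
    """Blob key holding a node's protobuf bytes."""
    return f"mem:{uuid}"

def parse_node_key(key: str) -> str:
    """Inverse of make_node_key: strip the 'mem:' prefix."""
    return key.split(":", 1)[1]

def make_edge_key(node: str, direction: str, etype: str) -> str:
    """List key for one (node, direction, edge-type) adjacency list."""
    assert direction in ("out", "in")
    return f"mem:{node}:{direction}:{etype}"

# An edge insert touches two lists, one per direction:
#   RPUSH mem:{src}:out:{etype} {dst}
#   RPUSH mem:{dst}:in:{etype} {src}
print(make_edge_key("a1", "out", "summarises"))  # mem:a1:out:summarises
```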

4 · Vector Search Path

  1. Ingest

    vec_id = int(uuid.uuid4().int >> 64)  # 64-bit
    index.add_with_ids(np.array([emb], dtype='float32'), np.array([vec_id]))
    r.set(f"vec:{vec_id}", node_uuid)
  2. Search

    D, I = index.search(np.array([query_vec], dtype='float32'), 10)  # distances, then ids
    uuids = r.mget([f"vec:{i}" for i in I[0]])

Faiss ID mapping plus the Valkey KV indirection protect against stale vector ids when the index is rebuilt (GitHub).
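The indirection can be exercised without a Faiss install. Below is a brute-force stand-in with the same add_with_ids / search contract (distances first, then ids, as Faiss returns them); TinyIndex and the dict standing in for Valkey are illustrative only.

```python
import numpy as np

class TinyIndex:
    """Faiss-free stand-in: brute-force cosine over normalised vectors."""
    def __init__(self, dim: int):
        self.vecs = np.empty((0, dim), dtype="float32")
        self.ids = np.empty((0,), dtype="int64")

    def add_with_ids(self, x: np.ndarray, ids: np.ndarray) -> None:
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        self.vecs = np.vstack([self.vecs, x])
        self.ids = np.concatenate([self.ids, ids])

    def search(self, q: np.ndarray, k: int):
        q = q / np.linalg.norm(q, axis=1, keepdims=True)
        sims = q @ self.vecs.T                        # cosine similarity
        order = np.argsort(-sims, axis=1)[:, :k]
        return np.take_along_axis(sims, order, axis=1), self.ids[order]

kv = {}                                               # stands in for Valkey
index = TinyIndex(dim=4)
for node_uuid, emb in [("n1", [1, 0, 0, 0]), ("n2", [0, 1, 0, 0])]:
    vec_id = abs(hash(node_uuid)) & (2**63 - 1)       # 64-bit id, as in the text
    index.add_with_ids(np.array([emb], dtype="float32"), np.array([vec_id]))
    kv[f"vec:{vec_id}"] = node_uuid

D, I = index.search(np.array([[0.9, 0.1, 0, 0]], dtype="float32"), k=1)
print([kv[f"vec:{i}"] for i in I[0]])                 # ['n1']
```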


5 · API Surface (FastAPI)

5.1 AAP Endpoints

| Method | Path | Payload / Params |
|---|---|---|
| PUT | /nodes | Node proto |
| GET | /nodes/{uuid} | |
| GET | /nodes/{uuid}/at/{ts} | time-travel |
| PUT | /edges | {src,dst,type,weight} |
| GET | /edges/{src} | type, dir |
| POST | /semantic_search | {embedding,k} |
| GET | /keyword_search | q, k, fields |
| POST | /summarise | {root_uuid} |
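The time-travel endpoint needs "latest snapshot at or before ts". A minimal sketch of that resolution step, assuming the unix_ms values stored under mem:{uuid}:ts:{unix_ms} have been collected into a sorted list first:

```python
import bisect
from typing import Optional

def snapshot_at(timestamps: list[int], ts: int) -> Optional[int]:
    """Latest stored unix-ms timestamp <= ts, or None if the node
    did not exist yet at that time."""
    i = bisect.bisect_right(timestamps, ts)
    return timestamps[i - 1] if i else None

history = [1_000, 2_000, 3_500]
print(snapshot_at(history, 2_999))  # 2000 — the snapshot in force at that time
print(snapshot_at(history, 500))    # None — before the node's first write
```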

5.2 MCP Proxy

  • /v1/tools, /v1/resources, and /v1/prompts simply translate to AAP calls and re-shape the responses into MCP's JSON schema (GitHub).
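The re-shaping is plain dict translation. A hypothetical example for one resource: the MCP field names follow the spec's resource shape (uri / name / mimeType), while the agrama:// URI scheme and the AAP record layout are assumptions for illustration.

```python
def node_to_mcp_resource(node: dict) -> dict:
    """Reshape an AAP node record into an MCP resource descriptor."""
    return {
        "uri": f"agrama://nodes/{node['uuid']}",
        "name": f"{node['type']}:{node['uuid'][:8]}",
        "mimeType": "application/x-protobuf",
    }

print(node_to_mcp_resource({"uuid": "a1b2c3d4e5", "type": "Task"}))
```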

6 · Docker Compose + Healthchecks

services:
  valkey:
    image: valkey/valkey:8.1.1-alpine
    command: ["valkey-server","--save","","--appendonly","no"]
    healthcheck:
      test: ["CMD", "valkey-cli","ping"]   # returns PONG
      interval: 10s
      retries: 3
    ports: ["6379:6379"]                   # expose for local debug
  faiss:                                   # stateless side-car
    image: ghcr.io/facebookresearch/faiss-cpu:1.11.0
    command: ["sleep","infinity"]
  ollama:
    image: ollama/ollama:latest
    volumes: ["./models:/root/.ollama/models"]
    ports: ["11434:11434"]
  agramad:
    build: .
    depends_on:
      valkey:
        condition: service_healthy
    environment:
      - VALKEY_URL=redis://valkey:6379
      - FAISS_HOST=faiss
      - OLLAMA_URL=http://ollama:11434
    ports: ["8000:8000"]

The Valkey healthcheck stays in “starting” until PING returns PONG; depends_on: service_healthy then holds the agramad container back until Valkey is ready (GitHub).


7 · Testing & Bench Harness

7.1 Unit + Property Tests

from hypothesis import given, strategies as st

@given(node_type=st.sampled_from(["Session", "Task", "CodeUnit"]))
def test_key_schema_roundtrip(node_type):
    # node_type is drawn for future typed-key variants; the round-trip
    # itself only exercises make_node_key / parse_node_key
    uuid = new_uuid()
    key = make_node_key(uuid)
    assert parse_node_key(key) == uuid

7.2 Performance Gate

pytest --benchmark-autosave
pytest-benchmark compare --sort=mean

CI fails if p99 get_node > 1 ms or if current mean regresses > 10 % (pytest-benchmark).
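The gate can be sketched over pytest-benchmark's autosaved JSON (benchmarks[].name and benchmarks[].stats, with times in seconds). Autosaved stats are aggregates, so "max" stands in for a true p99 here; the test name and file contents below are made up.

```python
P99_BUDGET_S = 0.001   # 1 ms tail-latency budget for get_node
MAX_REGRESSION = 0.10  # fail if mean grows more than 10 % vs baseline

def gate(current: dict, baseline: dict) -> list[str]:
    """Return a list of human-readable gate failures (empty = pass)."""
    failures = []
    base = {b["name"]: b["stats"] for b in baseline["benchmarks"]}
    for b in current["benchmarks"]:
        stats = b["stats"]
        if "get_node" in b["name"] and stats["max"] > P99_BUDGET_S:
            failures.append(f"{b['name']}: tail latency over 1 ms budget")
        old = base.get(b["name"])
        if old and stats["mean"] > old["mean"] * (1 + MAX_REGRESSION):
            failures.append(f"{b['name']}: mean regressed > 10 %")
    return failures

cur = {"benchmarks": [{"name": "test_get_node",
                       "stats": {"mean": 0.0006, "max": 0.0009}}]}
base = {"benchmarks": [{"name": "test_get_node",
                        "stats": {"mean": 0.0005, "max": 0.0009}}]}
print(gate(cur, base))  # mean 0.6 ms vs 0.5 ms baseline -> one regression failure
```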

7.3 Load Smoke (k6)

k6/script.js:

import http from 'k6/http';
export let options = { vus: 50, duration: '1m' };
export default function () {
  http.get('http://localhost:8000/nodes/seed-0001');
}

Run with docker run --rm -i grafana/k6 run - < k6/script.js expecting < 1 % errors (k6.io).


8 · Summarisation Worker

  • Endpoint: POST /summarise {root_uuid}

  • Flow:

    1. Gather all Interaction nodes under root_uuid (DFS).

    2. Chunk to ≤ 8 k tokens; stream into Ollama /api/chat with system prompt:

      “Summarise these interactions for future retrieval …”

    3. Receive markdown; store as a new SummaryNode; link with edge summarises.

  • Ollama call (Python):

import httpx

# one-shot variant via /api/generate; the streaming flow above uses /api/chat
resp = httpx.post(
    f"{OLLAMA}/api/generate",
    json={"model": "qwen3:1.7b", "prompt": prompt, "stream": False},
    timeout=120.0,
).json()

Example repos show similar one-shot summarisers built on Qwen-2; adapt the prompt and model tag for Qwen3 (GitHub, GitHub).
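Step 2's chunking can be sketched as a greedy packer. The chars/4 token estimate is a rough stand-in for the model's real tokenizer; the 8 k budget comes from the flow above.

```python
MAX_TOKENS = 8_000

def approx_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def chunk_interactions(texts: list[str]) -> list[list[str]]:
    """Pack interaction texts greedily into chunks of <= MAX_TOKENS."""
    chunks, current, used = [], [], 0
    for t in texts:
        cost = approx_tokens(t)
        if current and used + cost > MAX_TOKENS:
            chunks.append(current)
            current, used = [], 0
        current.append(t)
        used += cost
    if current:
        chunks.append(current)
    return chunks

docs = ["x" * 20_000, "y" * 20_000, "z" * 100]      # ~5k, ~5k, ~25 tokens
print([len(c) for c in chunk_interactions(docs)])   # [1, 2]
```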


9 · TUI Details (Textual)

9.1 Screen Layout

┌──────── Graph Tree ────────┐┌─ Preview ────────┐
│ Session › Task › …         ││ Markdown / code  │
│ (Keyboard: ↑ ↓ → ←)        │└──────────────────┘
├──────── Neighbors ─────────┤
│ Edge list w/ weights       │
├──────── Command Palette ───┤
│ > _                        │
└────────────────────────────┘
  • The tree pane uses the Tree[str] or DirectoryTree widgets; a JSON tree view can be built on top of Tree (Textual Documentation).

  • Global keymap mirrors lazygit:

    Key Action
    → Traverse edge
    s Semantic search
    / Keyword search
    t Time-travel input
    p Command palette
    q Quit

Keybinding names align with lazygit’s canonical list for muscle memory (GitHub).

9.2 Code Snippet

from textual.app import App
from textual.containers import Horizontal, Vertical
from textual.widgets import Tree

class AgramaTUI(App):
    CSS_PATH = "tui.tcss"
    BINDINGS = [
        ("s", "semantic_search", "Semantic Search"),
        ("t", "time_travel", "At Time"),
        ("q", "quit", "Quit"),
    ]

    def compose(self):
        self.tree = Tree("Memory")
        # Neighbors and Preview are the app-specific widgets from 9.1
        yield Horizontal(self.tree, Vertical(Neighbors(), Preview()))

10 · CI / CD Pipeline

| Step | Tool | Cache Strategy |
|---|---|---|
| Lint + mypy | ruff, mypy | pre-commit |
| Unit tests | pytest | GitHub cache for ~/.cache/pip |
| Component | docker compose up --wait | Layered build cache |
| Bench | pytest-benchmark json; pytest-benchmark compare | Artefact store |
| Image | Buildx multi-arch (linux/amd64, linux/arm64) | cache-from |
| Release | GitHub draft release + OCI push | Signed SBOM |

11 · Local Dev Cheatsheet

# bootstrap
make dev          # spin containers
make seed         # load 10k sample nodes
agrama tui        # open Textual UI
# hacking
make proto        # re-generate gRPC stubs
make test         # full test matrix
make bench        # micro-bench
make load         # k6 100 RPS
# MCP client demo
python -m agrama.demo.get_code_snippet "TypeScript debounce"

12 · Next Actions for Lead Dev

  1. Proto freeze: lock message IDs before week-2 to avoid breaking stored data.
  2. Implement Valkey pipelines: use MULTI/EXEC for batch edge inserts to hit 10 k ops/s target.
  3. Wire Search fallback: if embedding is missing, default to BM25 only.
  4. Write TUI CSS: grid layout + dark/light theme toggle (action_toggle_dark) (Textual Documentation).
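Item 2's batch inserts can be sketched without a live Valkey. With redis-py, each slice below would go through r.pipeline(transaction=True), i.e. one MULTI/EXEC round-trip; here the pipeline is faked as a command list so the batching logic stands alone (the batch size is an illustrative guess).

```python
BATCH = 512  # commands per MULTI/EXEC round-trip (tuning guess)

def edge_commands(edges: list[tuple[str, str, str]]) -> list[tuple[str, ...]]:
    """Expand (src, dst, etype) edges into the two RPUSHes each one needs."""
    cmds = []
    for src, dst, etype in edges:
        cmds.append(("RPUSH", f"mem:{src}:out:{etype}", dst))
        cmds.append(("RPUSH", f"mem:{dst}:in:{etype}", src))
    return cmds

def batches(cmds, size=BATCH):
    """Slice the command stream; each slice is one pipelined transaction."""
    for i in range(0, len(cmds), size):
        yield cmds[i:i + size]

cmds = edge_commands([("a", "b", "calls"), ("b", "c", "calls")])
print(len(cmds), len(list(batches(cmds, size=2))))  # 4 2
```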

Key References

- Valkey 8.1.1 release notes (GitHub)
- Faiss 1.11 release (GitHub)
- Textual docs (GitHub)
- Tree widget guide (Textual Documentation)
- Lazygit keybindings (GitHub)
- Jina embeddings v3 model card (Hugging Face)
- Ollama qwen3 model page (Ollama)
- MCP spec repo (GitHub)
- MCP servers repo (GitHub)
- Docker healthcheck recipe (GitHub)
- pytest-benchmark compare docs (pytest-benchmark)
- k6 quickstart docs (k6.io)
- Faiss ID mapping example (GitHub)
- Ollama summariser examples (GitHub)