nram for short
The continuity layer for everything you do with AI.
One open source server you run yourself: every tool you use, every machine you work from, and the continuity is yours.
Quick Start · Docs · nram.ai
Work in progress: under active development. Expect rough edges, and feedback is welcome.
Right now, you are the continuity layer between your AI tools. You copy context from Claude into ChatGPT, write handoff docs, re-explain the same decisions, and lose a little more each time you switch tools or machines.
nram is a continuity substrate: one self-hosted server that keeps what mattered across every tool, every conversation, and every machine, on infrastructure that belongs to you. Your agent already reads the PDF, watches the video, runs the test, scrapes the page. nram's job is to keep what mattered. Context, not storage.
It is a real server, not a library or a localhost shim: a single MIT-licensed binary with OAuth, passkeys, multi-tenancy, and MCP over HTTP, so your laptop, desktop, and phone all see the same brain. And it is more than a database with vector search: it pulls facts and entities out of your free-text notes, builds a knowledge graph of how they connect, and runs a background dreaming cycle that consolidates, dedups, and prunes while the server is idle, like a notebook that quietly reorganizes itself overnight.
nram is not another memory tool bolted onto one app. It is the layer underneath them, so a single server covers work that today is split across separate products:
| Job | What nram provides | Comparable tools |
|---|---|---|
| Conversational continuity | Memory that survives across sessions, tools, and vendors, reachable over MCP. | Claude Memory, ChatGPT Memory |
| Document-corpus recall | Semantic search, an entity-deduped knowledge graph, and consolidation over a stored corpus. A substrate, not a chat UI. | NotebookLM, AnythingLLM, Khoj |
| Procedural rules | A first-class verbatim tier for standing rules, conventions, and protocols an assistant loads at session start. Returned byte-for-byte, never embedded or paraphrased. | (no direct equivalent) |
| Persona / self-knowledge (about_me) | A reserved, fully-indexed tier for identity, preferences, and ongoing context that surfaces by association on every recall. | (no direct equivalent) |
| Agent memory | Persistent memory for coding, research, and custom agents, with consolidation and a knowledge graph on top. | Mem0, Letta, Zep, Graphiti |
- MCP is how Claude, ChatGPT, Cursor, or a custom agent connects. Streamable HTTP transport at /mcp, with OAuth discovery published at the well-known paths.
- REST API lets any code that can speak HTTP store and recall. See docs/api.md.
- Web Console is the dashboard for organizations, projects, providers, the knowledge graph, and the dreaming cycle.
Memory and recall. Hybrid retrieval fuses vector and lexical search (FTS5 on SQLite, tsvector on Postgres) with Reciprocal Rank Fusion, boosted by the knowledge graph and scored by six tunable terms (similarity, recency, importance, frequency, graph relevance, confidence). MMR reranking demotes near-duplicate results so recall stays diverse. Embedding runs off the write path, so stores stay fast. Vector search is backed by pgvector, a pure-Go HNSW index, or Qdrant.
Enrichment and the knowledge graph. Background workers extract facts, entities, and relationships from your free-text memories and build a multi-hop graph that connects them over time. An optional ingestion judge decides add / update / delete / none against near-duplicates before extraction runs. Query augmentation paraphrases each memory into short retrieval queries so recall matches the way people actually ask.
Dreaming. An offline nine-phase consolidation cycle: entity dedup, embedding and augmentation backfill, paraphrase dedup, transitive-relationship inference, contradiction detection, consolidation, pruning (with optional confidence decay), and weight recalculation. An LLM novelty audit demotes low-value syntheses. See docs/operations.md.
Tiers. Procedural memory is a verbatim per-user tier for rules and protocols, stored byte-for-byte and never embedded, enriched, or rewritten. The persona (about_me) and global tiers are reserved, auto-provisioned, and always join the recall aperture so relevant self-knowledge and world-knowledge surface alongside project results.
Access and multi-tenancy. Authentication via JWT, per-user API keys, WebAuthn passkeys, and per-organization OIDC SSO. Full OAuth 2.0 (Authorization Code + PKCE, dynamic client registration, resource indicators, discovery metadata). Five RBAC roles across REST and MCP. Organizations, hierarchical namespaces, and projects for isolation, plus share tokens for granting scoped external access without an account.
Operability. The Web Console, a React app, manages providers, settings, the graph, dreaming, the enrichment queue, and analytics. Run on SQLite (zero-config) or PostgreSQL, with SQLite-to-Postgres migration tooling. Provider-agnostic across OpenAI, Anthropic, Google Gemini, Ollama, OpenRouter, and any OpenAI-compatible endpoint, with per-call token accounting. Real-time updates over SSE, HMAC-signed webhooks, Prometheus metrics at /metrics, and JSON / NDJSON import/export.
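A webhook receiver should verify the HMAC before trusting a payload. This Go sketch assumes a hex-encoded HMAC-SHA256 over the raw request body with a shared secret; the exact algorithm, encoding, and header nram uses are not stated here, so confirm them in the docs before relying on it. The event name in the example is made up.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign computes a hex-encoded HMAC-SHA256 of body keyed with secret.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verifyWebhook recomputes the HMAC over the raw body and compares it to the
// received hex signature in constant time. The signing scheme is an assumption.
func verifyWebhook(secret, body []byte, gotHex string) bool {
	got, err := hex.DecodeString(gotHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("webhook-secret")
	body := []byte(`{"event":"memory.created"}`) // hypothetical payload
	sig := sign(secret, body)
	fmt.Println(verifyWebhook(secret, body, sig))
	fmt.Println(verifyWebhook(secret, []byte(`{"tampered":true}`), sig))
}
```

Verify against the raw bytes you received, not a re-serialized JSON value, since re-encoding can change whitespace or key order and break the signature.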
A full feature-by-feature reference lives across the docs.
nram needs an LLM provider to do anything beyond storing raw text. Without one, recall falls back to keyword-only matching and the knowledge graph stays empty. No error is raised; the system runs degraded. Configure providers in Step 5, and pick a long-context embedding model (see docs/models.md).
| Tool | Required | Check |
|---|---|---|
| Go | 1.26.1+ | go version |
| Node.js | 18+ | node --version |
| npm | any (build uses npm ci, not pnpm or yarn) | npm --version |
| Ollama | optional, for local LLMs | ollama --version (skip if using a hosted provider) |
```
git clone <repo-url> nram && cd nram
make build
```
The output is a single ./nram binary with the Web Console embedded.
```
./nram
```
Defaults: port 8674, SQLite (nram.db in the working directory). For Postgres, set DATABASE_URL and restart:
```
DATABASE_URL=postgres://user:pass@localhost:5432/nram ./nram
```
Navigate to http://localhost:8674, create the initial admin account, and save the API key shown on the completion screen. It is not shown again.
Open Settings → Providers and configure three slots: Embedding (semantic search), Fact Extraction, and Entity Extraction (the knowledge graph and dreaming). Any of OpenAI, Anthropic (chat slots only; it has no embeddings API), Google Gemini, Ollama, OpenRouter, or an OpenAI-compatible endpoint works. Changes hot-reload; no restart needed.
See docs/models.md for which model to put in each slot and how to size local models.
```
curl http://localhost:8674/v1/health
```
Each provider slot should report "status": "ok". If a slot is missing or unhealthy, fix it before storing memories; otherwise they will not be embedded or enriched, and you will need to run ./nram --reembed-all-memories afterward.
Local clients that can reach the server over your own network (Claude Code, Codex, Cursor, and other CLI or IDE tools) can use the plain HTTP URL directly, whether that's localhost or a LAN IP like 192.168.1.x. For Claude Code:
```
claude mcp add --transport http nram http://localhost:8674/mcp
```
OAuth discovery is published at /.well-known/oauth-authorization-server and /.well-known/oauth-protected-resource, so OAuth-capable clients negotiate a token automatically. For clients without OAuth, use the API key from Step 4 as a Bearer token.
Hosted web tools need a public HTTPS URL. ChatGPT, Claude on the web (claude.ai), and the Claude desktop and mobile apps reach your server from the vendor's cloud, not from your machine, so http://localhost will not work. They require a real, publicly resolvable hostname served over HTTPS with a valid (not self-signed) TLS certificate. nram serves plain HTTP and does not terminate TLS itself, so put it behind a reverse proxy that handles TLS (Caddy, nginx, Traefik) or expose it through a tunnel (Cloudflare Tunnel, ngrok, Tailscale Funnel), then point the connector at your public https://your-host/mcp URL.
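Caddy, nginx, or Traefik are the usual choices. Purely to illustrate what such a proxy does, here is a minimal Go TLS terminator built on the standard library's httputil.ReverseProxy. It assumes nram listens on 127.0.0.1:8674 and that cert.pem and key.pem (placeholder paths) hold a certificate issued by a public CA; binding port 443 typically needs elevated privileges.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// newProxy returns a handler that forwards every request to the nram server at target.
func newProxy(target string) (http.Handler, error) {
	u, err := url.Parse(target)
	if err != nil {
		return nil, err
	}
	p := httputil.NewSingleHostReverseProxy(u)
	// Flush each write immediately so streamed MCP responses are not buffered.
	p.FlushInterval = -1
	return p, nil
}

func main() {
	proxy, err := newProxy("http://127.0.0.1:8674")
	if err != nil {
		log.Fatal(err)
	}
	// cert.pem and key.pem are placeholders; hosted clients reject
	// self-signed certificates, so use one from a public CA.
	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", proxy))
}
```

A dedicated proxy such as Caddy adds automatic certificate issuance and renewal, which this sketch leaves to you.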
Hitting trouble? See Troubleshooting.
The deep reference is split out to keep this page approachable:
- docs/api.md: full REST API and MCP tool/resource reference, including update/supersede and move semantics.
- docs/models.md: recommended models per slot, VRAM sizing for local models, Ollama num_ctx and keep-alive tuning.
- docs/configuration.md: bootstrap vs runtime config, environment variables, databases (SQLite, Postgres, Qdrant), migrations, and operator flags.
- docs/operations.md: troubleshooting and the dreaming / backfill operations guide.
- docs/openapi.yaml: OpenAPI 3.1 specification, also served by the running server at GET /openapi.yaml. A conformance test keeps it in sync with the router.
```
make install-ui   # install UI dependencies
make dev          # React dev server with hot-reload on port 5173
make build        # build everything into ./nram
./nram --config config.yaml
```
Repository layout:
```
cmd/server/       Server entrypoint
internal/
  api/            HTTP handlers (REST + admin)
  auth/           OAuth 2.0, JWT, WebAuthn, RBAC
  config/         Bootstrap configuration loading
  dreaming/       Offline consolidation cycle (nine phases) with rollback and retention sweeps
  enrichment/     Background enrichment worker pool, ingestion decision, dedup, re-embed
  events/         Event bus, SSE, webhooks
  mcp/            MCP server and tool handlers
  migration/      Database migration runner
  model/          Data models
  provider/       LLM / embedding provider adapters with token-usage middleware
  server/         HTTP router setup
  service/        Business logic (recall, store, fusion, settings, lifecycle, export jobs)
  storage/        Database repositories (incl. HNSW, pgvector, Qdrant adapters)
  ui/             Embedded Web Console assets
migrations/       SQLite and PostgreSQL migration SQL
ui/               React Web Console source (TypeScript, Tailwind)
docs/             Reference docs and the OpenAPI spec
```
License: MIT
