nram-ai/nram

Neural Ram

nram for short

The continuity layer for everything you do with AI.
One open source server you run yourself, shared by every tool you use and every machine you work from. The continuity is yours.

License: MIT · Go 1.26+ · MCP over Streamable HTTP · SQLite or PostgreSQL

Quick Start · Docs · nram.ai

Work in progress: under active development. Expect rough edges, and feedback is welcome.

What is Neural Ram?

Right now, you are the continuity layer between your AI tools. You copy context from Claude into ChatGPT, write handoff docs, re-explain the same decisions, and lose a little more each time you switch tools or machines.

nram is a continuity substrate: one self-hosted server that keeps what mattered across every tool, every conversation, and every machine, on infrastructure that belongs to you. Your agent already reads the PDF, watches the video, runs the test, scrapes the page. nram's job is to keep what mattered. Context, not storage.

It is a real server, not a library or a localhost shim: a single MIT-licensed binary with OAuth, passkeys, multi-tenancy, and MCP over HTTP, so your laptop, desktop, and phone all see the same brain. And it is more than a database with vector search: it pulls facts and entities out of your free-text notes, builds a knowledge graph of how they connect, and runs a background dreaming cycle that consolidates, dedups, and prunes while the server is idle, like a notebook that quietly reorganizes itself overnight.

One substrate, many jobs

nram is not another memory tool bolted onto one app. It is the layer underneath them, so a single server covers work that today is split across separate products:

| Job | What nram provides | Comparable tools |
|---|---|---|
| Conversational continuity | Memory that survives across sessions, tools, and vendors, reachable over MCP. | Claude Memory, ChatGPT Memory |
| Document-corpus recall | Semantic search, an entity-deduped knowledge graph, and consolidation over a stored corpus. A substrate, not a chat UI. | NotebookLM, AnythingLLM, Khoj |
| Procedural rules | A first-class verbatim tier for standing rules, conventions, and protocols an assistant loads at session start. Returned byte-for-byte, never embedded or paraphrased. | (no direct equivalent) |
| Persona / self-knowledge (about_me) | A reserved, fully-indexed tier for identity, preferences, and ongoing context that surfaces by association on every recall. | (no direct equivalent) |
| Agent memory | Persistent memory for coding, research, and custom agents, with consolidation and a knowledge graph on top. | Mem0, Letta, Zep, Graphiti |

How clients connect

  • MCP is how Claude, ChatGPT, Cursor, or a custom agent connects. Streamable HTTP transport at /mcp, with OAuth discovery published at the well-known paths.
  • REST API lets any code that can speak HTTP store and recall. See docs/api.md.
  • Web Console is the dashboard for organizations, projects, providers, the knowledge graph, and the dreaming cycle.

Features

Memory and recall. Hybrid retrieval fuses vector and lexical search (FTS5 on SQLite, tsvector on Postgres) with Reciprocal Rank Fusion, boosted by the knowledge graph and scored by six tunable terms (similarity, recency, importance, frequency, graph relevance, confidence). MMR reranking demotes near-duplicate results so recall stays diverse. Embedding runs off the write path, so stores stay fast. Semantic search backs onto pgvector, a pure-Go HNSW index, or Qdrant.
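As an illustration of the fusion step described above, here is a minimal sketch of Reciprocal Rank Fusion in Go. It is not taken from nram's source: the function name `fuseRRF`, the constant `rrfK = 60` (the value commonly used in the literature), and the sample memory IDs are all assumptions made for the example, and the real scorer also applies the graph boost and the six tunable terms, which this sketch omits.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfK is the RRF damping constant. 60 is a common default in the
// literature; it is an assumption here, not a documented nram setting.
const rrfK = 60.0

// fuseRRF merges several ranked ID lists into one. Each list contributes
// 1/(rrfK + rank) to an ID's score, with rank starting at 1.
func fuseRRF(rankings ...[]string) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (rrfK + float64(i+1))
		}
	}

	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}

	// Highest fused score first; ties are broken by ID for a stable order.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	vector := []string{"m1", "m2", "m3"}  // hypothetical vector-search hits
	lexical := []string{"m3", "m1", "m4"} // hypothetical full-text hits
	fmt.Println(fuseRRF(vector, lexical)) // prints [m1 m3 m2 m4]
}
```

Because RRF uses only ranks, the vector and lexical scores never need to be normalized against each other, which is why it suits fusing two retrievers with different score scales.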

Enrichment and the knowledge graph. Background workers extract facts, entities, and relationships from your free-text memories and build a multi-hop graph that connects them over time. An optional ingestion judge decides add / update / delete / none against near-duplicates before extraction runs. Query augmentation paraphrases each memory into short retrieval queries so recall matches the way people actually ask.

Dreaming. An offline nine-phase consolidation cycle: entity dedup, embedding and augmentation backfill, paraphrase dedup, transitive-relationship inference, contradiction detection, consolidation, pruning (with optional confidence decay), and weight recalculation. An LLM novelty audit demotes low-value syntheses. See docs/operations.md.

Tiers. Procedural memory is a verbatim per-user tier for rules and protocols, stored byte-for-byte and never embedded, enriched, or rewritten. The persona (about_me) and global tiers are reserved, auto-provisioned, and always join the recall aperture so relevant self-knowledge and world-knowledge surface alongside project results.

Access and multi-tenancy. Authentication via JWT, per-user API keys, WebAuthn passkeys, and per-organization OIDC SSO. Full OAuth 2.0 (Authorization Code + PKCE, dynamic client registration, resource indicators, discovery metadata). Five RBAC roles across REST and MCP. Organizations, hierarchical namespaces, and projects for isolation, plus share tokens for granting scoped external access without an account.

Operability. The Web Console, a React app, manages providers, settings, the graph, dreaming, the enrichment queue, and analytics. Run on SQLite (zero-config) or PostgreSQL, with SQLite-to-Postgres migration tooling. Provider-agnostic across OpenAI, Anthropic, Google Gemini, Ollama, OpenRouter, and any OpenAI-compatible endpoint, with per-call token accounting. Real-time updates over SSE, HMAC-signed webhooks, Prometheus metrics at /metrics, and JSON / NDJSON import/export.
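A receiver for the HMAC-signed webhooks mentioned above would typically recompute the signature over the raw request body and compare it in constant time. The sketch below assumes HMAC-SHA256 with a hex-encoded signature; the actual hash, encoding, and header name are not stated on this page, so check docs/api.md before relying on any of them. The secret and payload are placeholders.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
// SHA-256 and hex encoding are assumptions for this sketch.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify reports whether signatureHex is a valid signature for body.
// hmac.Equal compares in constant time to avoid timing leaks.
func verify(secret, body []byte, signatureHex string) bool {
	got, err := hex.DecodeString(signatureHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")     // placeholder shared secret
	body := []byte(`{"event":"example"}`) // placeholder webhook payload

	sig := sign(secret, body)
	fmt.Println(verify(secret, body, sig))                           // true
	fmt.Println(verify(secret, []byte(`{"event":"tampered"}`), sig)) // false
}
```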

A full feature-by-feature reference lives across the docs.

Quick Start

nram needs an LLM provider to do anything beyond storing raw text. Without one, recall falls back to keyword-only matching and the knowledge graph stays empty. No error is raised; the system runs degraded. Configure providers in Step 5, and pick a long-context embedding model (see docs/models.md).

1. Install prerequisites

| Tool | Required | Check |
|---|---|---|
| Go | 1.26.1+ | go version |
| Node.js | 18+ | node --version |
| npm | any (build uses npm ci, not pnpm or yarn) | npm --version |
| Ollama | optional, for local LLMs | ollama --version (skip if using a hosted provider) |

2. Build

git clone <repo-url> nram && cd nram
make build

Output is a single ./nram binary with the Web Console embedded.

3. Run

./nram

Defaults: port 8674, SQLite (nram.db in the working directory). For Postgres, set DATABASE_URL and restart:

DATABASE_URL=postgres://user:pass@localhost:5432/nram ./nram

4. Open the setup wizard

Navigate to http://localhost:8674, create the initial admin account, and save the API key shown on the completion screen. It is not shown again.

5. Configure an LLM provider (required)

Open Settings → Providers and configure three slots: Embedding (semantic search), Fact Extraction, and Entity Extraction (the knowledge graph and dreaming). Any of OpenAI, Anthropic (chat slots only; it has no embeddings API), Google Gemini, Ollama, OpenRouter, or an OpenAI-compatible endpoint works. Changes hot-reload; no restart needed.

See docs/models.md for which model to put in each slot and how to size local models.

6. Verify

curl http://localhost:8674/v1/health

Each provider slot should report "status": "ok". If a slot is missing or unhealthy, fix it before storing memories. Otherwise they will not be embedded or enriched, and you will need to run ./nram --reembed-all-memories afterward.
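To script this check rather than read the JSON by eye, something like the following Go sketch could work. The exact shape of the /v1/health response is not documented on this page, so the helper makes no assumption about it: it walks whatever JSON it is given and reports every object whose "status" is not "ok". The function name `unhealthy` and the sample payload in `main` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// unhealthy walks arbitrary JSON and returns the path of every object
// whose "status" field is a string other than "ok".
func unhealthy(body []byte) ([]string, error) {
	var doc any
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}

	var bad []string
	var walk func(path string, v any)
	walk = func(path string, v any) {
		switch t := v.(type) {
		case map[string]any:
			if s, ok := t["status"].(string); ok && s != "ok" {
				bad = append(bad, path)
			}
			for key, child := range t {
				walk(path+"/"+key, child)
			}
		case []any:
			for i, child := range t {
				walk(fmt.Sprintf("%s/%d", path, i), child)
			}
		}
	}
	walk("", doc)

	sort.Strings(bad) // map iteration order is random, so sort for stable output
	return bad, nil
}

func main() {
	// Hypothetical payload; the real response may be shaped differently.
	sample := []byte(`{"providers":{"embedding":{"status":"ok"},"fact_extraction":{"status":"error"}}}`)

	bad, err := unhealthy(sample)
	if err != nil {
		fmt.Println("invalid JSON:", err)
		return
	}
	fmt.Println(bad) // prints [/providers/fact_extraction]
}
```

In real use the bytes would come from the body of a GET to http://localhost:8674/v1/health, and an empty result means every reported slot is healthy.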

7. Connect a client (MCP)

Local clients that can reach the server over your own network (Claude Code, Codex, Cursor, and other CLI or IDE tools) can use the plain HTTP URL directly, whether that's localhost or a LAN IP like 192.168.1.x. For Claude Code:

claude mcp add --transport http nram http://localhost:8674/mcp

OAuth discovery is published at /.well-known/oauth-authorization-server and /.well-known/oauth-protected-resource, so OAuth-capable clients negotiate a token automatically. For clients without OAuth, use the API key from Step 4 as a Bearer token.
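For a custom client without OAuth, attaching the API key looks roughly like the Go sketch below. It only builds the request and does not send it. The helper name `newMCPRequest` and the placeholder key are assumptions, and the JSON-RPC ping body is a stand-in: a real MCP session begins with an initialize exchange, which is omitted here. The Accept header lists both JSON and event-stream because the Streamable HTTP transport may answer with either.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newMCPRequest builds a POST to the server's /mcp endpoint with the
// API key attached as a Bearer token.
func newMCPRequest(baseURL, apiKey, body string) (*http.Request, error) {
	endpoint := strings.TrimRight(baseURL, "/") + "/mcp"
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "application/json, text/event-stream")
	return req, nil
}

func main() {
	// "YOUR_API_KEY" is a placeholder for the key saved in Step 4.
	req, err := newMCPRequest("http://localhost:8674", "YOUR_API_KEY",
		`{"jsonrpc":"2.0","id":1,"method":"ping"}`)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
}
```

Passing the result to `http.DefaultClient.Do(req)` would send it. Keep the key out of source control, for example by reading it from an environment variable.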

Hosted web tools need a public HTTPS URL. ChatGPT, Claude on the web (claude.ai), and the Claude desktop and mobile apps reach your server from the vendor's cloud, not from your machine, so http://localhost will not work. They require a real, publicly resolvable hostname served over HTTPS with a valid (not self-signed) TLS certificate. nram serves plain HTTP and does not terminate TLS itself, so put it behind a reverse proxy that handles TLS (Caddy, nginx, Traefik) or expose it through a tunnel (Cloudflare Tunnel, ngrok, Tailscale Funnel), then point the connector at your public https://your-host/mcp URL.

Hitting trouble? See the troubleshooting guide in docs/operations.md.

Reference

The deep reference is split out to keep this page approachable:

  • docs/api.md: full REST API and MCP tool/resource reference, including update/supersede and move semantics.
  • docs/models.md: recommended models per slot, VRAM sizing for local models, Ollama num_ctx and keep-alive tuning.
  • docs/configuration.md: bootstrap vs runtime config, environment variables, databases (SQLite, Postgres, Qdrant), migrations, and operator flags.
  • docs/operations.md: troubleshooting and the dreaming / backfill operations guide.
  • docs/openapi.yaml: OpenAPI 3.1 specification, also served by the running server at GET /openapi.yaml. A conformance test keeps it in sync with the router.

Development

make install-ui   # install UI dependencies
make dev          # React dev server with hot-reload on port 5173
make build        # build everything into ./nram
./nram --config config.yaml

Repository layout:

cmd/server/        Server entrypoint
internal/
  api/             HTTP handlers (REST + admin)
  auth/            OAuth 2.0, JWT, WebAuthn, RBAC
  config/          Bootstrap configuration loading
  dreaming/        Offline consolidation cycle (nine phases) with rollback and retention sweeps
  enrichment/      Background enrichment worker pool, ingestion decision, dedup, re-embed
  events/          Event bus, SSE, webhooks
  mcp/             MCP server and tool handlers
  migration/       Database migration runner
  model/           Data models
  provider/        LLM / embedding provider adapters with token-usage middleware
  server/          HTTP router setup
  service/         Business logic (recall, store, fusion, settings, lifecycle, export jobs)
  storage/         Database repositories (incl. HNSW, pgvector, Qdrant adapters)
  ui/              Embedded Web Console assets
migrations/        SQLite and PostgreSQL migration SQL
ui/                React Web Console source (TypeScript, Tailwind)
docs/              Reference docs and the OpenAPI spec

License

MIT

About

The continuity layer for everything you do with AI. A self-hosted, open source server any tool plugs into over MCP or REST: hybrid vector + lexical recall, an auto-built knowledge graph, sleep-style consolidation, procedural and persona tiers, OAuth 2.0, multi-tenant. SQLite or Postgres. MIT.
