AI architecture copilot for Mermaid, built to turn raw ideas into expert system diagrams.
Describe a system in plain English. Mermate compiles it into production-quality Mermaid diagrams — flowcharts, state machines, sequence diagrams, ER diagrams, and more — with optional AI enhancement powered by whatever local or remote LLM you connect.
| Dependency | Version |
|---|---|
| Node.js | >= 20 |
| npm | >= 9 |
| Python | >= 3.9 (for gpt-oss enhancer, optional) |
Mermate ships without an AI model. It is a diagram compilation engine with a copilot layer. You bring the model.
Tandem protocol (MERMATE ↔ Opseeq): shared run_id / X-Request-Id, stage traces, URL rules, gateway fallback, and idea-to-binary packaging are documented in docs/tandem-opseeq-protocol.md.
Typical local stack:
- Opseeq (optional): OpenAI-compatible gateway + management API on `http://127.0.0.1:9090`; set `OPSEEQ_URL` to that origin without `/v1` (see docs/tandem-opseeq-protocol.md)
- Ollama (optional): local inference on `http://127.0.0.1:11434`
- Mermate: idea → markdown → Mermaid → TLA+ → TypeScript runtime → optional Rust binary and macOS `.app`, on `http://127.0.0.1:3333`
- MCP: Python bridge under `mcp_service/` with the repo `.mcp.json` (virtualenv path may need adjusting after clone)
- Outputs: diagram and formal artifacts under `flows/`; run lineage and traces under `runs/` (including `*.trace.json`); successful Rust packaging can copy a `.app` to the Desktop with a generated landing page and `skill.json` for agent consumption
External OpenClaw desktop integrations can attach over MCP and HTTP using the /api routes listed below; they are not required for core Mermate operation.
Observed on March 26, 2026:
- Local Ollama `gpt-oss:20b`: working
- Local Ollama `nemotron-3-nano:4b`: working
- Local Ollama `kimi-k2.5:cloud`: listed by Ollama, but direct chat currently returns `unauthorized`
- Managed `inference.local` route: currently advertises `kimi-k2.5:cloud`, `nemotron-3-nano:4b`, and `gpt-oss:20b`
- Managed-route requests for `kimi-k2.5:cloud`: currently resolve back to `nemotron-3-nano:4b`
Important caveat:
- The managed route can resolve to a different actual model than the requested one, so any wrapper should trust the response `model` field over the request payload.
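That caveat can be sketched as a tiny helper for any wrapper you write around the managed route (plain dicts stand in for the request and response payloads; only the `model` field is taken from the docs above):

```python
def effective_model(request_payload: dict, response_payload: dict) -> str:
    """Prefer the model the route actually used over the one requested,
    since the managed route can silently resolve to a different model."""
    return response_payload.get("model") or request_payload.get("model", "unknown")
```

Log or display `effective_model(...)` rather than the requested model name so silent substitutions stay visible.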
# 1. Clone into your developer folder
git clone <your-fork-or-repo> ~/developer/mermaid
cd ~/developer/mermaid
# 2. Install dependencies
npm install
# 3. Create your local env
cp .env.example .env
# 4. Start the app
./mermaid.sh start

Then open http://localhost:3333.
That's it. The app runs completely without an AI model. You can paste Mermaid source directly and compile it to high-resolution PNG and SVG from day one.
On this machine, the first Mermate restart failed because the native DuckDB binding was missing under node_modules/duckdb/lib/binding/duckdb.node.
Rebuild it with:
cd /path/to/this/repo # repository root
npm rebuild duckdb

Rebuild again after any Node major upgrade so the native binding matches the active runtime.
The agent workflow, premium render path, and Max mode are configured from your local .env.
Start with:
cp .env.example .env

Recommended `.env` for the OpenAI API path (inference via Opseeq):
OPENAI_API_KEY=sk-proj-YOUR_OPENAI_PROJECT_KEY_HERE
MERMATE_AI_API_KEY=sk-proj-YOUR_OPENAI_PROJECT_KEY_HERE
OPSEEQ_URL=http://localhost:9090
# Optional: override inference base only (must include /v1)
# OPENAI_BASE_URL=http://localhost:9090/v1
DALLE_API_KEY=sk-proj-YOUR_OPENAI_PROJECT_KEY_HERE
MERMATE_IMAGE_MODEL=gpt-image-1
CLAUDE_API_KEY=sk-ant-YOUR_CLAUDE_KEY_HERE
MERMATE_ORCHESTRATOR_MODEL=gpt-5.4
MERMATE_WORKER_MODEL=gpt-5.2
MERMATE_FAST_STRUCTURED_MODEL=gpt-4.1-mini
MERMATE_AI_MODEL=gpt-5.2
MERMATE_AI_MAX_MODEL=gpt-5.4
MERMATE_AI_MAX_ENABLED=true

What these do:
- `OPENAI_API_KEY`: primary hosted-model key
- `MERMATE_AI_API_KEY`: backward-compatible alias used by the runtime provider layer
- `OPENAI_BASE_URL`: optional; OpenAI-compatible inference base (include `/v1`). If unset, the provider derives `…/v1` from `OPSEEQ_URL`
- `OPSEEQ_URL`: single service root for Opseeq (no `/v1`). Used by the Opseeq bridge for `/health`, `/api/...`, and forwarding stage events; inference appends `/v1` internally
- `DALLE_API_KEY` / `MERMATE_IMAGE_MODEL`: OpenAI Images API for packaged-app icon and hero assets (falls back to `OPENAI_API_KEY` if unset)
- `CLAUDE_API_KEY`: Anthropic API for primary TLA+ authoring and optional TypeScript review in formal stages
- `MERMATE_ORCHESTRATOR_MODEL`: strongest model used for final synthesis and merge
- `MERMATE_WORKER_MODEL`: primary branch reasoning model
- `MERMATE_FAST_STRUCTURED_MODEL`: fast structured / repair model
- `MERMATE_AI_MODEL`: backward-compatible worker alias
- `MERMATE_AI_MAX_MODEL`: stronger model used when Max mode is enabled
- `MERMATE_AI_MAX_ENABLED`: turns Max mode on in the runtime provider layer
Recommended starting setup:
- Set `OPSEEQ_URL` to the bare Opseeq origin; only set `OPENAI_BASE_URL` if you need a different inference base than `OPSEEQ_URL` + `/v1`
- Keep `gpt-5.2` as the default worker model for branch reasoning
- Use `gpt-5.4` as the orchestrator / Max model for final architect-grade renders
- Keep `gpt-4.1-mini` as the fast structured model for repairs, routing, and narration
- Leave local Ollama or the Python enhancer optional unless you specifically want a local-first workflow
Optional local providers:
MERMATE_OLLAMA_URL=http://localhost:11434
MERMATE_OLLAMA_MODEL=gpt-oss:20b
MERMAID_ENHANCER_URL=http://localhost:8100
MERMAID_ENHANCER_TIMEOUT=15000

Provider behavior in the app today:
- Copilot suggestions and text enhancement prefer local-first fallback: Ollama -> Python enhancer -> premium API
- Render preparation prefers the strongest available provider path, with premium API first
- Max mode uses `MERMATE_AI_MAX_MODEL` when `MERMATE_AI_MAX_ENABLED=true`
- When premium traffic targets a gateway (e.g. Opseeq) and the gateway errors, the provider may fall back to direct OpenAI; the render API can return `fallback_events` and the UI shows a short notice
- While a render run is active, premium requests send `X-Request-Id` set to the MERMATE `run_id` for log correlation
- If no AI provider is available, the app still works as a Mermaid compiler with local suggestion fallbacks
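The local-first fallback chain can be sketched as a generic helper. The provider callables and their names here are hypothetical; the real chain lives in `server/services/inference-provider.js`:

```python
def first_available(providers, action):
    """Try (name, callable) providers in order and return the first
    successful result, mirroring the chain described above
    (Ollama -> Python enhancer -> premium API)."""
    last_err = None
    for name, call in providers:
        try:
            return name, call(action)
        except Exception as err:  # provider offline or erroring
            last_err = err
    raise RuntimeError("no AI provider available") from last_err
```

Each action can supply its own ordering, which is how copilot suggestions can prefer local providers while render preparation prefers the premium path.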
- Paste Mermaid source → compile to PNG + SVG
- Auto-detection of diagram type (flowchart, sequence, state, ER, gantt, pie, mindmap, etc.)
- Axiomatic pre-compile validation
- Download both outputs as a ZIP
- Fullscreen canvas view with GPU-accelerated pan/zoom
- Diagram history with delete support
- `./mermaid.sh compile <file.mmd>` to compile any `.mmd` file from the command line
- `./mermaid.sh validate` to validate all archived diagrams against structural rules
Mermate supports three AI paths:
- Premium API provider configured from `.env` (`openai` recommended)
- Local Ollama provider for cheap local iteration
- Python enhancer bridge on `http://localhost:8100`
The app automatically uses the best available provider chain for the current action. If one provider is offline, Mermate falls through to the next available option.
If you want the simplest and highest-quality agent setup, use the premium API path:
OPENAI_API_KEY=sk-proj-YOUR_OPENAI_PROJECT_KEY_HERE
MERMATE_AI_API_KEY=sk-proj-YOUR_OPENAI_PROJECT_KEY_HERE
OPSEEQ_URL=http://localhost:9090
MERMATE_ORCHESTRATOR_MODEL=gpt-5.4
MERMATE_WORKER_MODEL=gpt-5.2
MERMATE_FAST_STRUCTURED_MODEL=gpt-4.1-mini
MERMATE_AI_MODEL=gpt-5.2
MERMATE_AI_MAX_MODEL=gpt-5.4
MERMATE_AI_MAX_ENABLED=true

This enables:
- `Enhance` for architecture text refinement
- stronger text-to-Mermaid conversion during render
- Max mode for final higher-quality architecture output
- the staged agent workflow that pauses on a preview render before the final Max pass
Clients can attach to Mermate through MCP tools and direct HTTP calls.
Representative HTTP surfaces:
- `GET /api/copilot/health`
- `GET /api/agents`
- `POST /api/render` (response may include `run_id`, `fallback_events`)
- `GET /api/mermate/trace/:run_id`: stage event timeline (local store; see docs/tandem-opseeq-protocol.md)
- `POST /api/mermate/stage`: ingest stage event (same store)
- `GET /api/render/tla/status`, `POST /api/render/tla`
- `GET /api/render/ts/status`, `POST /api/render/ts`
- Rust packaging (e.g. compile + `.app`): routes under `server/routes/rust.js` as mounted in `server/index.js`
- `POST /api/guide/evaluate`: Auto Guide evaluation (heuristic fallback when Opseeq is unhealthy)
- `GET /api/agent/modes`, `POST /api/agent/run`, `POST /api/agent/finalize`
That enables:
- idea → Mermaid → TLA+ → TypeScript → optional Rust binary and desktop bundle
- correlated traces for the same `run_id` across MERMATE and (when deployed) Opseeq
- agent modes and SSE workflows via `/api/agent/*`
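A minimal client sketch for the render surface, assuming a server running on port 3333. The request field name `source` is an assumption here; check `server/routes/render.js` for the real request schema. Only `run_id` and `fallback_events` are taken from the docs above:

```python
import json
import urllib.request

MERMATE_URL = "http://127.0.0.1:3333"  # default local server


def extract_run_info(response_body: dict) -> tuple:
    """Pull the correlation id and any gateway fallback events out of a
    /api/render response. Both fields are optional."""
    return response_body.get("run_id"), response_body.get("fallback_events", [])


def render(idea_text: str) -> dict:
    """POST an idea to /api/render and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{MERMATE_URL}/api/render",
        data=json.dumps({"source": idea_text}).encode(),  # field name assumed
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Keeping the returned `run_id` lets you fetch the matching stage timeline from `/api/mermate/trace/:run_id` afterwards.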
Use the repo’s .mcp.json and mcp_service/ for MCP-driven access; point MERMATE_URL at your running server.
This repo now also ships its own Python MCP bridge for OpenClaw and other MCP clients.
Files:
- `.mcp.json`
- `mcp_service/server.py`
- `mcp_service/client.py`
The project .mcp.json expects a repo-local virtualenv interpreter:
`.venv-mcp/bin/python` (under the repository root)
If you clone or move the repo, recreate that venv and point .mcp.json at your interpreter path.
Manual setup for a fresh checkout:
cd /path/to/this/repo # repository root
python3 -m venv .venv-mcp
./.venv-mcp/bin/pip install -r requirements.txt

Run it directly if you want to test the bridge outside the client:
cd /path/to/this/repo
./.venv-mcp/bin/python -m mcp_service

If you prefer using your active Python instead of the repo-local venv:
cd /path/to/this/repo
python3 -m pip install -r requirements.txt
python3 -m mcp_service

The repo-level .mcp.json points MCP clients at the repo-local venv interpreter with MERMATE_URL=http://127.0.0.1:3333.
Exposed MCP surfaces include:
- runtime status and stage discovery
- copilot suggest and enhance
- render, TLA+, and TypeScript stages
- agent preview and finalize SSE workflows
- diagram management
- project search, pipeline status, and scoreboard
- meta-cognition and agent registry endpoints
Any model server that accepts POST /mermaid/enhance works.
POST http://localhost:8100/mermaid/enhance
Content-Type: application/json
{
"stage": "text_to_md" | "md_to_mmd" | "validate_mmd" | "repair" |
"copilot_suggest" | "copilot_enhance",
"raw_source": "user input text",
"system_prompt": "injected axiom prompt from Mermate",
"temperature": 0.0
}
Response:
{
"enhanced_source": "...", // for diagram stages
"suggestion": "...", // for copilot_suggest
"confidence": "high", // for copilot_suggest
"transformation": "..."
}
Mermate sends a full system prompt with each call (built from archs/mermaid_axioms.md). Your model only needs to follow the system prompt and return valid JSON.
This is one approach. You are free to use any model that fits the endpoint contract above.
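A stub enhancer that satisfies the contract above can be written with the standard library alone. This is a toy sketch, not a real model server: the diagram stages echo their input and the copilot stage returns a canned suggestion, so the suggestion text and `transformation` values are placeholders:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_enhance(payload: dict) -> dict:
    """Toy implementation of the /mermaid/enhance contract.
    A real enhancer would call a model with payload["system_prompt"]."""
    stage = payload.get("stage", "")
    if stage == "copilot_suggest":
        return {
            "suggestion": "Add a failure path for the gateway.",  # canned
            "confidence": "low",
            "transformation": "stub",
        }
    # Diagram stages: echo the input back unchanged.
    return {
        "enhanced_source": payload.get("raw_source", ""),
        "transformation": f"echo:{stage}",
    }


class Enhancer(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/mermaid/enhance":
            self.send_error(404)
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        out = json.dumps(handle_enhance(json.loads(body or b"{}"))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)


# To serve: HTTPServer(("127.0.0.1", 8100), Enhancer).serve_forever()
```

Swapping the body of `handle_enhance` for a real model call is all that is needed to make Mermate's enhancer path light up.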
# If you are using Ollama
ollama list | grep gpt-oss
# If you are using a local server
ls ~/models/ | grep gpt-oss

If nothing shows up, continue to Step 2. If it's already there, jump to Step 4.
# Via Ollama (simplest path)
ollama pull gpt-oss:20b
# Or download GGUF weights manually and load with llama.cpp / LM Studio
# Model page: https://huggingface.co/gpt-oss-20b (placeholder — use your actual model source)

# With Ollama
OLLAMA_HOST=0.0.0.0:8100 ollama serve
# Or with llama-cpp-python
python3 -m llama_cpp.server --model ~/models/gpt-oss-20b.gguf --port 8100
# Or with LM Studio: start the server, set port to 8100, and add a proxy route
# that maps POST /mermaid/enhance to the completion endpoint.

By default Mermate looks for the enhancer at http://localhost:8100. If your server runs on a different host or port:
# Mermate reads this environment variable
MERMAID_ENHANCER_URL=http://localhost:11434 ./mermaid.sh start
# Or to auto-start the enhancer via mermaid.sh
MERMAID_ENHANCER_START_CMD="ollama serve" ./mermaid.sh start

Check that the enhancer responds:

curl http://localhost:8100/health
# Expected: 200 OK

When the enhancer is healthy, the app shows "Enhancer: healthy" on startup and the Enhance checkbox becomes active.
Kimi can fit in two ways, but the current local state matters:
- If Ollama cloud auth is configured, Mermate can target `kimi-k2.5:cloud` through the existing Ollama path by setting `MERMATE_OLLAMA_MODEL=kimi-k2.5:cloud`
- If you want to use Kimi through an OpenAI-compatible remote API, that needs a premium-provider base-url path that is not currently configurable in `server/services/inference-provider.js`
So on this machine right now, Kimi is discoverable but not yet usable through either the local Ollama path or the managed NemoClaw route.
Once the app is running, here are the starting prompts to try:
Simple architecture idea:
A user logs in via the browser, the API gateway validates the JWT,
then routes to the user service which reads from PostgreSQL.
On failure, return 401 to the browser.
Event-driven system:
Payment service emits OrderCreated event to Kafka.
Inventory service and notification service both consume it.
If inventory fails, route to dead letter queue.
State machine:
Pod lifecycle: Pending → ContainerCreating → Running.
On OOM kill → Failed. On graceful shutdown → Succeeded.
CI/CD pipeline:
Code push triggers build, then parallel unit tests and lint,
then integration tests, security scan, staging deploy,
manual approval gate, then canary production deploy at 5% → 25% → 100%.
Paste any of these into Simple Idea mode and press Render. Add Enhance for AI-assisted refinement.
The app now includes an agent workflow for iterative architecture refinement.
The frontend agent UI calls two SSE endpoints:
- `POST /api/agent/run`: ingest -> planning -> refinement -> preview render -> pause for notes
- `POST /api/agent/finalize`: optional note incorporation -> final Max render
There is also:
- `GET /api/agent/modes`: returns the available agent modes and labels

The modes are:
- `thinking`: build architecture from ideas, notes, or problem statements
- `code-review`: recover architecture from an existing codebase
- `optimize-mmd`: improve existing Mermaid or markdown without breaking intent
The route layer loads mode instructions from .cursor/assets:
- `.cursor/assets/THINKING-MODE.txt`
- `.cursor/assets/CODE-REVIEW-MODE.txt`
- `.cursor/assets/OPTIMIZE-MMD-MODE.txt`
Those mode files are injected into the system prompt used by server/routes/agent.js, which tells the model to:
- preserve what the user already specified
- produce improved architecture text, not Mermaid
- add structure, flows, boundaries, and failure handling
- pause after a preview render so the user can steer the final Max render
The project also includes Cursor-facing guidance for architecture work:
- `.cursor/agent-architect/SKILL.md`: the skill tree and operating philosophy for iterative architecture work
- `.cursor/agent-architect/OPERATING_PROCEDURE.md`: runtime guidance, provider order, render rhythm, and evaluation rules
- `.cursor/agents/openai.yaml`: the Cursor agent definition for the OpenAI-backed architecture agent
Together, these files act as the project's prompt and behavior layer for the architecture agent experience.
The .cursor/scripts/ directory contains Python modules for a local AI enhancer extension. These scripts are not meant to be run from this repo. They are reference implementations for an LLM extension you should host in a separate top-level directory (e.g. gpt_oss/extensions/mermaid_enhancer/ or your LLM framework's extension path). Copy or symlink them into your local AI project and run the enhancer service there. Mermate connects to it via MERMAID_ENHANCER_URL when the service is running.
Multiple routers are mounted under /api in server/index.js (render, agent, tla, ts, tsx, transcribe, search, openclaw, bundle, guide, artifacts, rust, trace, and others).
Handles the main app workflow:
- `GET /api/copilot/health`: provider availability and Max readiness
- `POST /api/analyze`: input profile analysis without rendering
- `POST /api/copilot/enhance`: copilot suggestion/enhancement proxy
- `POST /api/render`: full analysis -> transform -> compile -> archive pipeline (may return `run_id`, `fallback_events`, and emits stage events; see docs/tandem-opseeq-protocol.md)
- `GET /api/diagrams`: list saved diagram outputs
- `DELETE /api/diagrams/:name`: remove compiled artifacts and archived source
This is the core production path for the app. It decides whether to route through premium API, Ollama, the enhancer bridge, or non-AI compile paths.
- `POST /api/mermate/stage`, `GET /api/mermate/trace/:run_id`, `GET /api/mermate/trace-stats`: local stage trace ingest and readback
Handles the staged architecture-agent workflow:
- loads prompt skeletons from `.cursor/assets`
- analyzes the current draft with `input-analyzer`
- calls the inference provider for planning and refinement
- performs a preview render through `/api/render`
- pauses for user notes before triggering the final Max render
- reports agent stage events via the same trace mechanism as render/tla/ts/rust
This route makes Mermate more than a one-shot Mermaid compiler: it turns the app into a review-and-refine architecture copilot.
mermaid/
├── mermaid.sh # Start, compile, validate
├── docs/
│ ├── tandem-opseeq-protocol.md # MERMATE ↔ Opseeq tracing and packaging
│ └── specula-integration.md # Formal / Specula artifact layout
├── server/ # Express API (port 3333)
│ ├── routes/render.js # Analyze, enhance, render, list, and delete diagrams
│ ├── routes/agent.js # Agent planning, preview, and finalize flows
│ ├── routes/trace.js # Stage trace ingest/readback (/api/mermate/*)
│ ├── routes/rust.js # Rust compile, .app bundle, desktop deploy
│ ├── routes/guide.js # Auto Guide /api/guide/evaluate
│ └── services/
│ ├── mermaid-compiler.js # mmdc wrapper, high-res PNG/SVG
│ ├── mermaid-classifier.js # Diagram type detection
│ ├── input-detector.js # Content-state detection (text/md/mmd/hybrid)
│ ├── input-router.js # Pipeline routing
│ ├── diagram-selector.js # Axiom-based diagram type selection
│ ├── mermaid-validator.js # Pre-compile structural validation
│ ├── axiom-prompts.js # System prompts for each pipeline stage
│ ├── inference-provider.js # Premium API, Ollama, enhancer; X-Request-Id; fallback events
│ ├── opseeq-bridge.js # Opseeq health, inference proxy helpers, reportStage
│ ├── trace-store.js # In-memory + runs/*.trace.json stage store
│ ├── icon-generator.js # DALL·E / GPT Image icons + macOS bundle helpers
│ ├── landing-page-generator.js # Packaged-app dashboard + skill.json
│ └── gpt-enhancer-bridge.js # HTTP bridge to the enhancer service
├── public/ # Frontend (served statically)
│ ├── js/mermaid-gpt-copilot.js # Ghost-text copilot for Simple Idea mode
│ ├── js/mermaid-gpt-agent.js # Frontend agent orchestration and SSE handling
│ ├── js/mermate-autoguide.js # Auto Guide + /api/guide/evaluate polling
│ └── css/mermaid-gpt.css
├── .cursor/
│ ├── assets/ # Mode prompt skeletons used by /api/agent/*
│ ├── agents/openai.yaml # Cursor agent definition
│ └── agent-architect/ # Skill + operating procedure for architecture work
├── archs/ # Archived diagram sources (.mmd, .md)
│ └── flows/ # Compiled output from ./mermaid.sh compile
├── flows/ # Compiled output from the web app (served at /flows)
├── runs/ # Run JSON + *.trace.json lineage (served at /runs)
├── test/ # Node test suite (incl. test-e2e-tandem.js)
└── archs/mermaid_axioms.md # The intelligence model (read this)
The axioms that govern how Mermate thinks about diagrams live in archs/mermaid_axioms.md. This is the most important file to read if you want to:
- Fine-tune your own model against Mermate's prompts
- Extend the enhancer with custom stages
- Build your own `gpt-oss` extension for Mermate
The key design principle: Mermate ships the reasoning framework. You supply the model. The combination is what makes it powerful.
Mermate does not mandate a specific model. These are the questions worth considering:
Model size tradeoffs
- 7B–13B models: fast, local-friendly, good for `validate_mmd` and `copilot_suggest`
- 20B–34B models: better at `text_to_md` and `copilot_enhance` (more architectural reasoning)
- 70B+ models: best for complex architecture generation and AAD-style decomposition
Fine-tuning targets
The prompts in server/services/axiom-prompts.js are the system prompts Mermate injects. If you fine-tune a model on pairs of (axiom_prompt, mermaid_source), you get a model that follows the axiom framework natively without needing the full prompt injection.
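One way to build such a dataset is to serialize the pairs into chat-style JSONL, a common fine-tuning layout. The exact row schema depends on your training stack, so this shape is an assumption, not a requirement:

```python
import json


def to_jsonl_pairs(examples) -> str:
    """Serialize (axiom_prompt, mermaid_source) pairs into chat-style
    JSONL rows: the axiom prompt as the system turn, the Mermaid
    source as the target assistant turn."""
    lines = []
    for axiom_prompt, mermaid_source in examples:
        lines.append(json.dumps({
            "messages": [
                {"role": "system", "content": axiom_prompt},
                {"role": "assistant", "content": mermaid_source},
            ]
        }))
    return "\n".join(lines)
```

A model trained on enough such rows internalizes the axiom framework and no longer needs the full prompt injected at inference time.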
What to build in your gpt-oss extension
The enhancer endpoint receives a stage field. You can add your own stages — for example, a validate_architecture stage that checks if the described system is secure, or a suggest_diagram_type stage that proposes the best visualization for a given input. Mermate's router will call whatever stages you support.
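Custom stages can be sketched as a simple registry keyed on the incoming `stage` field. The stage names beyond the documented ones (such as `validate_architecture`) are your own extensions, and the error shape here is illustrative:

```python
def dispatch_stage(payload: dict, handlers: dict) -> dict:
    """Route an incoming /mermaid/enhance payload to a stage handler.
    Unknown stages return an error dict instead of raising, so the
    caller can fall back gracefully."""
    handler = handlers.get(payload.get("stage"))
    if handler is None:
        return {"error": f"unsupported stage: {payload.get('stage')}"}
    return handler(payload)
```

Registering a new handler in the dict is all it takes to advertise a new stage to Mermate's router.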
./mermaid.sh start # Start the web app
./mermaid.sh compile # Compile all .mmd files in archs/
./mermaid.sh compile <filename.mmd> # Compile one file
./mermaid.sh validate # Validate all .mmd files against axiom rules
./mermaid.sh test                     # Run the test suite

Environment variables:
PORT=3333 # App server port (default 3333)
OPENAI_API_KEY=<key> # Primary hosted-model key
MERMATE_AI_API_KEY=<key> # Backward-compatible alias
OPSEEQ_URL=http://localhost:9090 # Opseeq origin — no /v1 (see docs/tandem-opseeq-protocol.md)
OPENAI_BASE_URL=http://localhost:9090/v1 # Optional: inference base override (must include /v1)
DALLE_API_KEY=<key> # OpenAI Images for packaged-app assets (fallback: OPENAI_API_KEY)
MERMATE_IMAGE_MODEL=gpt-image-1 # Image model for icons/hero
CLAUDE_API_KEY=<key> # Anthropic: TLA+ authoring + optional TS review
MERMATE_ORCHESTRATOR_MODEL=gpt-5.4 # Strongest model for final synthesis
MERMATE_WORKER_MODEL=gpt-5.2 # Default worker model
MERMATE_FAST_STRUCTURED_MODEL=gpt-4.1-mini # Fast structured / repair model
MERMATE_AI_MODEL=gpt-5.2 # Backward-compatible worker alias
MERMATE_AI_MAX_MODEL=gpt-5.4 # Stronger model used by Max mode
MERMATE_AI_MAX_ENABLED=true # Enable Max mode
MERMATE_OLLAMA_URL=http://localhost:11434 # Optional Ollama base URL
MERMATE_OLLAMA_MODEL=gpt-oss:20b # Optional Ollama model
MERMAID_ENHANCER_URL=http://localhost:8100 # Enhancer service URL
MERMAID_ENHANCER_TIMEOUT=15000 # Enhancer request timeout in ms
MERMAID_ENHANCER_START_CMD="<command>" # Auto-start command for the enhancer

Mermate does not ship an AI model. The copilot and enhancement features are designed to work with a model you choose and run. The quality of the AI output depends entirely on your model. Mermate's job is to provide excellent system prompts, a structured reasoning pipeline, and a clean compilation layer. Your model's job is to follow the prompts.
If you run Mermate without any model connected, it functions as a standalone Mermaid compiler and is fully usable for direct diagram authoring.
