Offline-first, RAG-enforced, soul-governed personal AI assistant (no internet required!)
Version 1.4.0 (planning) · Baseline 1.3.0 (production) · Python 3.12 · LM Studio + ChromaDB + BM25 + LangGraph
CyClaw is a personal RAG (Retrieval-Augmented Generation) backend that:
- Answers questions exclusively from your local Markdown corpus — no internet by default
- Enforces every safety invariant via LangGraph topology — not prompts, not config flags, not discipline
- Maintains a persistent soul/personality layer (soul.md) with SHA-256 drift detection, atomic evolution writes, and user-gated modification
- Falls back to Grok (xAI) only with explicit user confirmation in hybrid mode, triple-gated at config, env, and per-query level
- Exposes both a FastAPI HTTP gateway and an MCP server for Claude Desktop / Copilot Studio integration
Zero telemetry. Binds to 127.0.0.1:8787 only. All embeddings run locally via sentence-transformers. No cloud dependency for offline operation.
| Version | Status | Key Changes |
|---|---|---|
| v1.2.0 | Superseded | 8 OWASP patterns, 90-day TTL, sanitizer baseline |
| v1.3.0 | Pre-Langgrinch | Rate limiting (60/min), 13 OWASP patterns, soul SHA-256 drift detection, atomic writes, TTL→365 days |
| v1.4.0 | Production (current) | Updated requirements.txt to patch vulns and modernize for Python 3.12 |
| v1.5.0 | Planning | Fix stemmer.py and the SQL-write placeholder code sections, other cleanups, test Dropbox corpus sync integration, BM25 SHA integrity detection |
User Query (HTTP POST /query or MCP tool call)
│
▼
┌─────────────────────────────────────────────────────┐
│ gate.py (FastAPI, 127.0.0.1:8787) │
│ • Rate limit (60 req/min per IP — RUNS FIRST) │
│ • Injection filter (sanitizer.py, config-driven) │
│ • Soul init (PersonalityManager closure) │
│ • Telemetry kill block (before any SDK import) │
└──────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ graph.py (LangGraph 7-node State Machine) │
│ │
│ [ENTRY] │
│ ↓ │
│ 1. retrieve (Chroma + BM25 + RRF fusion) │
│ ↓ │
│ 2. route_score (top_score >= 0.028 RRF?) │
│ ├─ YES ──→ 3. local_llm (LM Studio :1234) │
│ └─ NO ──→ 4. user_gate (needs_confirm=true) │
│ ├─ confirmed + hybrid ──→ │
│ │ 5. grok_fallback │
│ └─ declined / offline ──→ │
│ 6. offline_best_effort │
│ ↓ (all paths converge) │
│ 7. audit_logger (SHA-256 + PII redact → jsonl) │
│ ↓ │
│ [END] │
└─────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ HybridRetriever (retrieval/hybrid_search.py) │
│ • ChromaDB (semantic, all-MiniLM-L6-v2, 384d) │
│ • BM25Okapi (keyword, Porter stemming) │
│ • RRF fusion (k=60, equal 1.0/1.0 weighting) │
│ • Per-chunk provenance metadata in every result │
└─────────────────────────────────────────────────────┘
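The fusion step in the diagram can be sketched as follows. This is a minimal illustration of standard Reciprocal Rank Fusion with 1-based ranks, not the code in retrieval/hybrid_search.py; the function name `rrf_fuse` and the chunk ids are hypothetical, while `k=60`, the equal weights, and the `0.028` threshold come from the values quoted in this README.

```python
from collections import defaultdict

MIN_SCORE = 0.028  # route_score threshold quoted in this README


def rrf_fuse(semantic_ids, keyword_ids, k=60, w_semantic=1.0, w_keyword=1.0):
    """Fuse two ranked lists of chunk ids with Reciprocal Rank Fusion.

    Each list contributes weight / (k + rank) per chunk, with rank starting
    at 1 (an assumption; the project may index ranks differently).
    """
    scores = defaultdict(float)
    for weight, ranking in ((w_semantic, semantic_ids), (w_keyword, keyword_ids)):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical chunk ids: "a" is ranked first by both retrievers.
fused = rrf_fuse(["a", "b", "c"], ["a", "d", "b"])
top_id, top_score = fused[0]
passes_threshold = top_score >= MIN_SCORE
```

Under these assumptions a chunk ranked first by both retrievers scores 2/61 ≈ 0.0328 and clears the threshold, while a chunk ranked first by only one retriever scores 1/61 ≈ 0.0164 and does not. That is why the threshold sits on a very different scale from a cosine similarity.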
Five security invariants enforced by graph edges — not prompts:
| # | Invariant | Enforcement |
|---|---|---|
| 1 | RAG-First | retrieve is the unconditional graph entry point — no LLM call can precede it |
| 2 | Topology = Policy | Routing is graph edges, not LLM decisions or if/else code |
| 3 | Triple-Gated External | Grok requires: mode=hybrid AND grok.enabled=true AND user_confirmed_online=true — simultaneously |
| 4 | Audit Convergence | All 6 execution paths converge at audit_logger — no shortcut path exists |
| 5 | Soul Governance | Soul evolution requires explicit human reason string; no autonomous modification from any path |
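The routing and triple-gate rules in the table can be read as a small decision function. The sketch below is plain Python for illustration only: in CyClaw the decision lives in LangGraph edges, and the function name `next_node` and its parameter names are assumptions that mirror the settings named above. It also collapses the user_gate step into the `user_confirmed_online` flag.

```python
def next_node(top_score, *, mode, grok_enabled, user_confirmed_online, min_score=0.028):
    """Return the node a query would reach after route_score.

    Node names follow the diagram in this README; the signature is hypothetical.
    """
    if top_score >= min_score:
        # Corpus evidence is strong enough: answer locally.
        return "local_llm"
    # Below threshold: Grok is reachable only when all three gates are open.
    if mode == "hybrid" and grok_enabled and user_confirmed_online:
        return "grok_fallback"
    # Any closed gate falls back to the offline path.
    return "offline_best_effort"
```

Every return value is a node that leads to audit_logger in the diagram, so the audit convergence invariant holds regardless of which branch is taken.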
| Requirement | Version | Notes |
|---|---|---|
| Python | 3.12 | Primary supported runtime (3.11 also works) |
| LM Studio | Any | Must be running on localhost:1234 |
| GGUF model loaded in LM Studio | — | mistral-7b-instruct or qwen2.5-7b work well |
git clone https://github.com/CGFixIT/CyClaw
cd CyClaw
python3.12 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# 1) Install CPU-only torch first (pinned >=2.6.0 for CVE-2025-32434 safety)
pip install torch==2.6.0+cpu --index-url https://download.pytorch.org/whl/cpu
# 2) Install the rest, pinned to the verified transitive tree.
pip install -r requirements.txt -c constraints.txt
Upgrading from a pre-1.4.0 checkout? ChromaDB moved from 0.4.x to 1.5.x and the on-disk index format changed: delete index/ and rebuild with python -m retrieval.indexer.
Offline note: embeddings use all-MiniLM-L6-v2. Because cyclaw_telemetry_kill.env sets HF_HUB_OFFLINE=1, the model must be cached locally first. On a machine with network access, run the indexer once (it downloads and caches the model); afterwards it runs fully offline.
Key settings in config.yaml:
app:
  mode: "offline" # "offline" | "hybrid" (hybrid enables Grok fallback)
models:
  local_llm:
    base_url: "http://127.0.0.1:1234/v1" # LM Studio default
    model: "your-model-name-here" # must match LM Studio loaded model name exactly
    timeout_sec: 720 # long-context inference budget
    max_tokens: 5000
personality:
  enabled: true
  soul_path: "data/personality/soul.md" # your identity file, the source of truth
  interaction_ttl_days: 365 # audit window
retrieval:
  min_score: 0.028 # RRF fused-rank threshold (NOT cosine sim; different scale)
  top_k_semantic: 5
  top_k_keyword: 5
  rrf_k: 60
  max_context_tokens: 5000
CyClaw/
├── gate.py FastAPI gateway + soul endpoints
├── graph.py LangGraph 7-node state machine
├── mcp_hybrid_server.py MCP server (retrieval-only, no LLM)
├── metrics.py Audit JSONL analyzer
├── config.yaml Single source of truth for all config
├── requirements.txt Pinned Python deps
├── cyclaw_telemetry_kill.env Kill-switch for LangChain/Chroma/OTel telemetry
├── cyclaw_suggestions_fix.md Dev notes and open issues
├── .gitignore
├── old.md Archived prior README
├── llm/
│ └── client.py LocalLLMClient + GrokClient
├── retrieval/
│ ├── embeddings.py sentence-transformers wrapper
│ ├── hybrid_search.py ChromaDB + BM25 + RRF fusion
│ ├── indexer.py Corpus ingestion + index build
│ └── stemmer.py Porter stemmer (tech-vocabulary tuned)
├── schemas/
│ └── api.py Pydantic request/response models
├── utils/
│ ├── errors.py Typed RAGError hierarchy
│ ├── health.py Startup dependency health checks
│ ├── logger.py Audit JSONL + SHA-256 query hashing
│ ├── personality.py PersonalityManager (soul CRUD + governance)
│ └── sanitizer.py Prompt injection filter + PII redaction
├── static/
│ ├── extractor.html Browser-based simplified insight_extractor.py to generate .md corpus files
│ └── terminal.html Browser UI / Soul Console
├── data/
│ ├── corpus/ .md / .txt knowledge base (gitignored runtime content)
│ └── personality/
│ └── soul.md Identity source-of-truth
└── tests/
    ├── conftest.py
    ├── test_gate.py
    ├── test_graph.py
    ├── test_hybrid_search.py
    ├── test_sanitizer.py
    ├── test_personality.py
    ├── test_personality_changes.py
    ├── test_rate_limit.py
    ├── test_audit.py
    ├── test_stemmer.py
    ├── apipsTest.ps1 Windows PowerShell smoke test
    └── cmd2index.bat Windows index rebuild shortcut
CyClaw maintains a persistent identity through soul.md. Key properties:
- File-as-truth: data/personality/soul.md is always the canonical version
- Shadow SQLite DB: cyclaw_soul.db stores version history and interaction logs
- SHA-256 drift detection: on startup, the file hash is compared with the DB hash; a mismatch triggers a forensic log entry
- Atomic writes: backup → atomic disk write (tmpfile + os.replace) → DB version insert → in-memory update; the os.replace step is what prevents a crash from leaving a half-written soul.md
- Advisory injection scan on propose: POST /soul/propose runs an OWASP injection scan whose flags are advisory, surfaced for human review alongside the diff; propose never writes
- Enforced injection scan on apply: POST /soul/apply is human-gated (explicit reason string required) and re-runs the injection scan at the write boundary. A proposed soul containing injection patterns is rejected with 400 PROMPT_INJECTION_BLOCKED before any file/DB write, closing the soul-poisoning vector. The trusted restore path (restore_from_backup, re-applying a previously vetted .bak) bypasses the scan via scan=False
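The drift check and the crash-safe write described above can be sketched with the standard library alone. This is an illustrative sketch, not the code in utils/personality.py: the helper names `sha256_of`, `atomic_write`, and `has_drifted` are hypothetical, and the backup and SQLite version-insert steps are left out.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def has_drifted(path: Path, recorded_hash: str) -> bool:
    """True when the file on disk no longer matches the recorded hash."""
    return sha256_of(path) != recorded_hash


def atomic_write(path: Path, text: str) -> None:
    """Write text to a temp file in the same directory, then swap it in.

    os.replace is atomic when source and target are on the same filesystem,
    so readers see either the old file or the new one, never a partial write.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Creating the temp file next to the target, rather than in the system temp directory, keeps both paths on one filesystem, which is the condition under which the rename is atomic.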
| Layer | Mechanism |
|---|---|
| Network | Binds 127.0.0.1:8787 — no external exposure by design |
| Input | Config-driven injection filter (policy.prompt_filter, 31 patterns), 4000 char max |
| Rate limit | 60 req/min per IP — thread-safe in-memory sliding window (utils/ratelimit.py, lock-guarded) |
| Telemetry | Kill block runs before any SDK import in gate.py |
| Audit | All paths (HTTP and MCP) log SHA-256 query hash + PII-redacted metadata |
| Grok gating | Triple gate: mode=hybrid AND grok.enabled=true AND user_confirmed_online=true |
| Soul writes | Enforced injection scan at the write boundary (apply_evolution, → 400 PROMPT_INJECTION_BLOCKED) + human reason string + atomic (os.replace) crash-safe write |
| Corpus | Chunk sanitization at index time via sanitizer.py |
| Model Weights | Trusted/verified sources only. Safetensors strongly preferred. torch.load(..., weights_only=True) alone was insufficient on torch<2.6 (CVE-2025-32434). We pin torch==2.6.0+cpu and keep loading paths (embeddings.py) minimal + documented. |
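The rate-limit row describes a lock-guarded, in-memory sliding window keyed by client IP. A minimal sketch of that technique follows; it is not the contents of utils/ratelimit.py, and the class name `SlidingWindowLimiter`, its method `allow`, and the injectable `clock` parameter are assumptions made for illustration and testing.

```python
import threading
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_sec` seconds per client."""

    def __init__(self, limit=60, window_sec=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_sec = window_sec
        self._clock = clock
        self._hits = defaultdict(deque)  # client ip -> timestamps of recent requests
        self._lock = threading.Lock()

    def allow(self, client_ip: str) -> bool:
        now = self._clock()
        with self._lock:  # one lock guards all per-client deques
            hits = self._hits[client_ip]
            # Drop timestamps that have slid out of the window.
            while hits and now - hits[0] >= self.window_sec:
                hits.popleft()
            if len(hits) >= self.limit:
                return False
            hits.append(now)
            return True
```

A gateway would call `allow(ip)` before any other processing and return HTTP 429 when it yields `False`. Using a monotonic clock avoids miscounting when the system wall clock is adjusted.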
For Claude Desktop or other MCP-compatible clients:
{
  "mcpServers": {
    "cyclaw": {
      "command": "python",
      "args": ["/path/to/CyClaw/mcp_hybrid_server.py"]
    }
  }
}
The MCP server exposes a single hybrid_search tool. It has no sampling capability: sampling: null is set at the protocol level, making it architecturally impossible for this server to invoke an LLM.
What works in v1.3.0:
- RAG-first pipeline (ChromaDB + BM25 + RRF)
- FastAPI /query with LangGraph 7-node controller
- Local LLM via LM Studio
- Optional Grok fallback (triple-gated)
- MCP server (retrieval-only)
- Audit JSONL with SHA-256 hashing and PII redaction
- Soul persistence with drift detection and atomic writes
- Rate limiting (60/min per IP)
- Browser UI via static/terminal.html
v1.4.0 targets:
- Dropbox/cloud corpus sync
- plan_node for multi-step query decomposition
- BM25 index SHA-256 integrity check on load
- General-purpose agent (tool invocation from corpus context)
Not yet planned:
- Multi-user or network-exposed deployment
- Production security hardening (external pentest)
Built by Chris Grady · cgfixit.com/linkedin