Static analysis and policy framework for AI coding agents — 21-gate compliance engine, fabrication detector, and tamper-proof lock
This is a static-analysis / guardrail framework for AI coding agents, not a runtime library for production applications, and not a workflow engine that replaces Temporal, Inngest, or AWS Step Functions.
Approximately 18K lines of TypeScript organized as:
- A 21-gate compliance engine (deerflow/enforcement/) running at dev / CI time to keep AI coding agents honest.
- A fabrication detector with 8 detection branches.
- A tamper-proof lock using SHA-256 hashes.
- An agent guard and agent contract layer.
- A reference Fastify server (src/) and CLI (bin/deerflow.ts) showing how to wire the gates into an HTTP layer.
- A Hera subsystem (deerflow/hera/) for TF-IDF conversation memory and agent specialization.
- An 8-phase workflow pipeline enforcer (deerflow/workflow.ts).
- Not a durable workflow engine. deerflow/workflow.ts is a pipeline enforcer for a single-machine dev session with 8 phases (ANALYZE → PLAN → SCAFFOLD → IMPLEMENT → VALIDATE → TEST → SECURITY → QUALITY_GATE). The comment @see https://temporal.io/workflows in the file references the workflow concept; it is not a feature-parity claim. There is no database, no durable execution, no retry, and no distributed orchestration.
- Not a fabrication oracle. The fabrication detector is heuristic static analysis. It produces signals, not verdicts.
- Not a distributed memory layer. Hera runs in-process only. State is not persisted to a database and is not shared across processes or machines.
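
The phase ordering described above can be illustrated with a minimal in-memory sketch. This is not the API of deerflow/workflow.ts: the PhaseEnforcer class and its complete() method are hypothetical names, and only the eight phase names come from this README. State lives in a single field, which matches the "no durable execution" disclosure.

```typescript
// Hypothetical sketch: not the API of deerflow/workflow.ts.
const PHASES = [
  "ANALYZE", "PLAN", "SCAFFOLD", "IMPLEMENT",
  "VALIDATE", "TEST", "SECURITY", "QUALITY_GATE",
] as const;

type Phase = (typeof PHASES)[number];

class PhaseEnforcer {
  // Index of the next phase that is allowed to complete (in memory only).
  private nextIndex = 0;

  /** Marks `phase` complete, or throws if it is attempted out of order. */
  complete(phase: Phase): void {
    const expected = PHASES[this.nextIndex];
    if (expected === undefined) {
      throw new Error("Pipeline already finished");
    }
    if (phase !== expected) {
      throw new Error(`Expected phase ${expected}, got ${phase}`);
    }
    this.nextIndex += 1;
  }

  /** True once all eight phases have completed in order. */
  get finished(): boolean {
    return this.nextIndex === PHASES.length;
  }
}

const enforcer = new PhaseEnforcer();
enforcer.complete("ANALYZE");
enforcer.complete("PLAN");

let skipError = "";
try {
  enforcer.complete("IMPLEMENT"); // skips SCAFFOLD, so this throws
} catch (err) {
  skipError = (err as Error).message;
}
console.log(skipError); // Expected phase SCAFFOLD, got IMPLEMENT
```

Because nothing is written to disk, restarting the process resets the sketch to ANALYZE.
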
- Fabrication detector is heuristic. The 8 detection branches (INVENTED_API, FAKE_IMPORT, HALLUCINATED_TYPE, IMPOSSIBLE_RETURN, PHANTOM_METHOD, SPECULATIVE_CODE, COPY_PASTE_SMELL, UNVERIFIABLE_LOGIC) rely on pattern matching plus cross-referencing against package.json and the TypeScript type checker. They produce false positives (flagging legitimate code) and false negatives (missing subtle fabrication). Read the gate output as a signal, not as an absolute verdict.
- Hera is in-process. TF-IDF conversation memory and agent specialization are demo-grade. They do not persist to a database and are not distributed.
- No performance or load testing. 92 tests (unit + integration + e2e) pass at the time of this commit; tsc builds clean. The e2e tests call createApp() and hit /health, /ready, and /live with no external network calls. There is no benchmark, no load test, and no fuzz test for the fabrication detector.
- Not a replacement for human review or a full type checker. The gates complement, but do not replace, careful code review and a properly configured TypeScript compiler.
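
To make the "signal, not verdict" point concrete, here is a rough sketch of what a FAKE_IMPORT-style check could look like. It is not the project's detector: the findFakeImports function, the Finding shape, and the package name fastify-magic-auth are hypothetical, and the import regex is deliberately naive.

```typescript
// Hypothetical sketch of a FAKE_IMPORT-style check; not the project's detector.
interface Finding {
  branch: "FAKE_IMPORT";
  line: number;
  specifier: string;
}

// Matches `from 'x'` / `from "x"`; misses require() and dynamic import().
const IMPORT_RE = /\bfrom\s+['"]([^'"]+)['"]/;

/** Reduces a specifier to its package name ("@scope/pkg/sub" -> "@scope/pkg"). */
function packageName(specifier: string): string {
  const parts = specifier.split("/");
  return specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
}

/** Flags imports whose package is not in the declared dependency set. */
function findFakeImports(source: string, declared: Set<string>): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    const match = IMPORT_RE.exec(text);
    if (!match) return;
    const specifier = match[1];
    // Relative paths and node: built-ins are not looked up in package.json.
    if (specifier.startsWith(".") || specifier.startsWith("node:")) return;
    if (!declared.has(packageName(specifier))) {
      findings.push({ branch: "FAKE_IMPORT", line: i + 1, specifier });
    }
  });
  return findings;
}

// Stand-in for the dependency names read from package.json.
const declared = new Set(["fastify", "zod"]);
const sample = [
  "import Fastify from 'fastify';",
  "import { z } from 'zod';",
  "import { magic } from 'fastify-magic-auth';", // hypothetical, undeclared
  "import { helper } from './helper';",
  "import fs from 'fs';", // bare built-in: legitimate, but flagged
].join("\n");

const findings = findFakeImports(sample, declared);
console.log(findings.map((f) => `${f.line}:${f.specifier}`));
```

The sketch reports lines 3 and 5. Line 3 is a plausible true positive, while line 5 is a false positive because the bare fs built-in is not in package.json. A string-built dynamic import would be a false negative.
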
| Need | Use instead | Stars |
|---|---|---|
| Durable workflow orchestration | Temporal | ~13k★ |
| Durable workflow orchestration | Inngest | ~5k★ |
| Durable workflow orchestration | DBOS | ~1k★ |
| Durable workflow orchestration | Restate | ~7k★ |
| Long-running memory for agents | mem0 | ~25k★ |
| Long-running memory for agents | Letta (formerly MemGPT) | ~13k★ |
| Long-running memory for agents | Zep | ~3k★ |
Not provided by this framework: rate limiting, authentication and authorization, persistent storage for the tamper-proof lock and agent sessions, distributed coordination, and observability (OpenTelemetry). Each gate also needs a measured false-positive / false-negative evaluation on a real codebase before enabling --max-warnings=0.
- 21-Gate Compliance Engine — Multi-gate enforcement with a zero-mock-data policy.
- Fabrication Detector — 8 detection branches for invented APIs, fake imports, hallucinated types, impossible returns, phantom methods, speculative code, copy-paste smell, and unverifiable logic.
- Tamper-Proof Lock — SHA-256 hash verification of artifacts and contracts.
- Agent Guard — Real-time monitoring of agent behavior.
- Workflow Enforcer — 8-phase pipeline enforcer for single-machine dev sessions (not a durable workflow engine — see disclosure above).
- Skill Modules — Pluggable skills for code-review, security, test, UI, and search.
- Hera Subsystem — TF-IDF conversation memory and agent specialization (in-process, not distributed).
- CLI Tool — Command-line interface for quality checks and compliance.
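
As an illustration of the SHA-256 verification idea behind the tamper-proof lock, the following sketch hashes artifacts and reports any whose content no longer matches. The Lock shape and the createLock / verifyLock helpers are assumptions made for this example, not the project's lock format. Only createHash from Node's built-in crypto module is a real API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical lock shape: artifact name -> hex SHA-256 digest.
type Lock = Record<string, string>;

function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

/** Records one digest per artifact. */
function createLock(artifacts: Record<string, string>): Lock {
  const lock: Lock = {};
  for (const name of Object.keys(artifacts)) {
    lock[name] = sha256Hex(artifacts[name]);
  }
  return lock;
}

/** Returns the names of locked artifacts that are missing or whose hash changed. */
function verifyLock(lock: Lock, artifacts: Record<string, string>): string[] {
  return Object.keys(lock).filter((name) => {
    const content = artifacts[name];
    return content === undefined || sha256Hex(content) !== lock[name];
  });
}

const original = { "contract.json": '{"maxCriticalFindings":0}' };
const lock = createLock(original);
const tampered = { "contract.json": '{"maxCriticalFindings":5}' };

console.log(verifyLock(lock, original)); // no mismatches
console.log(verifyLock(lock, tampered)); // reports contract.json
```

A bare hash only detects change relative to the stored lock. Anyone who can rewrite the lock file can also rewrite the digests, so where the lock is stored matters.
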
| Category | Technology |
|---|---|
| Language | TypeScript 5 |
| Validation | Zod 3 |
| Testing | Vitest 2 |
| Linting | ESLint 9 |
| Formatting | Prettier 3 |
| Runtime | Node.js 20+ |
- Node.js 20+ and npm 10+
# Clone the repository
git clone https://github.com/ntd25022006q/deerflow.git
cd deerflow
# Install dependencies
npm install
# Run quality gate checks
npm run quality-gate

| Script | Description |
|---|---|
| npm run dev | Start development server with hot reload |
| npm run build | Compile TypeScript to dist/ |
| npm run start | Run compiled production server |
| npm run lint | Check code with ESLint |
| npm run type-check | TypeScript compilation check |
| npm run test | Run Vitest test suite |
| npm run test:coverage | Run tests with coverage |
| npm run format:check | Verify Prettier formatting |
| npm run quality-gate | Run full quality pipeline |
| npm run agent-guard | Run agent guard checks |
| npm run compliance | Run 21-gate compliance checks |

Every gate must pass or the agent is locked.
| Gate | Name | Enforces |
|---|---|---|
| 1 | ACTION_GUARD | Pre-action validation — blocks before damage |
| 2 | FILE_GUARD | Filesystem integrity — no deleted critical files |
| 3 | ANTI_PATTERN | Code quality — no any, mock data, secrets |
| 4 | DEPENDENCY_GUARD | Library conflicts and banned packages |
| 5 | SECURITY_DEEP | Deep security scan (OWASP, CWE, injection) |
| 6 | STRUCTURE_ANALYZE | Dead code, circular deps, nesting, coupling |
| 7 | TEST_QUALITY | Assertion density, error paths, no skips |
| 8 | BUILD_INTEGRITY | Build output completeness and size |
| 9 | SOURCE_CITATION | Evidence-based code (anti-fabrication policy) |
| 10 | ENFORCEMENT_REGISTRY | Rule registry validation with evidence |
| 11 | TYPE_CHECK | TypeScript compilation clean |
| 12 | LINT | ESLint zero errors and warnings |
| 13 | TEST | Test suite and coverage >= 80% |
| 14 | SECURITY | npm audit (moderate+ vulnerabilities) |
| 15 | AGENT_COORDINATOR | Session management and lock state |
| 16 | EVIDENCE_VALIDATOR | Active evidence enforcement (papers/repos/docs) |
| 17 | QUALITY_OVER_SPEED | Quality gates — no rushing, no shortcuts |
| 18 | TAMPER_PROOF_LOCK | Cryptographic lock verification (SHA-256) |
| 19 | FABRICATION_DETECTOR | Detect fabricated or invented code |
| 20 | AGENT_CONTRACT | Contract compliance verification |
| 21 | COMPLETION_GUARANTEE | Mandatory task completion |
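
The "every gate must pass or the agent is locked" rule can be sketched as a small runner. The Gate, GateResult, and ComplianceReport types and the runGates function are hypothetical, since the real engine's interfaces are not shown in this README. The sketch also assumes that all gates run even after a failure and that a crashing gate counts as a failure.

```typescript
// Hypothetical types; the real engine's interfaces are not shown in this README.
interface GateResult {
  passed: boolean;
  reason?: string;
}

interface Gate {
  id: number;
  name: string;
  run: () => GateResult;
}

interface ComplianceReport {
  locked: boolean;
  failures: { gate: string; reason: string }[];
}

/** Runs every gate and locks if any of them fails. */
function runGates(gates: Gate[]): ComplianceReport {
  const failures: ComplianceReport["failures"] = [];
  for (const gate of gates) {
    let result: GateResult;
    try {
      result = gate.run();
    } catch (err) {
      // A gate that crashes is treated as a failure rather than a pass.
      result = { passed: false, reason: `gate threw: ${String(err)}` };
    }
    if (!result.passed) {
      failures.push({ gate: gate.name, reason: result.reason ?? "no reason given" });
    }
  }
  return { locked: failures.length > 0, failures };
}

// Stub gates standing in for three rows of the table above.
const report = runGates([
  { id: 11, name: "TYPE_CHECK", run: () => ({ passed: true }) },
  { id: 12, name: "LINT", run: () => ({ passed: false, reason: "2 warnings" }) },
  { id: 19, name: "FABRICATION_DETECTOR", run: () => { throw new Error("scan crashed"); } },
]);

console.log(report.locked, report.failures.map((f) => f.gate));
```

Collecting every failure instead of stopping at the first one gives a complete report per run, at the cost of running gates whose result can no longer change the lock decision.
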
flowchart TB
subgraph CLI["CLI (bin/deerflow.ts)"]
A1[deerflow check]
A2[deerflow enforce]
A3[deerflow lock]
end
subgraph Server["Reference Server (src/)"]
B1[Fastify HTTP]
B2["/health /ready /live"]
end
subgraph Core["DeerFlow Engine (deerflow/)"]
C1[workflow.ts<br/>8-phase enforcer]
C2[agent-guard.ts<br/>behavior monitor]
end
subgraph Enforcement["21-Gate Compliance (deerflow/enforcement/)"]
D1[Fabrication Detector<br/>8 branches]
D2[Tamper-Proof Lock<br/>SHA-256]
D3[Compliance Engine<br/>orchestrator]
D4[Evidence Validator]
D5[Source Citation]
D6[Quality Over Speed]
D7[16 more gates...]
end
subgraph Hera["Hera Self-Evolution (deerflow/hera/)"]
E1[TF-IDF Index<br/>pure TypeScript]
E2[Conversation Memory]
E3[Agent Specialization]
E4[Evolution Engine]
E5[Adaptive Coordinator]
end
subgraph Skills["Skill Modules (deerflow/skills/)"]
F1[code-review]
F2[security]
F3[test]
F4[ui]
F5[search]
end
CLI --> Core
Server --> Core
Core --> Enforcement
Core --> Hera
Core --> Skills
Enforcement --> D3
D3 --> D1 & D2 & D4 & D5 & D6 & D7
import { FabricationDetector } from '@deerflow/enforcement/fabrication-detector';
const detector = new FabricationDetector({
  workingDir: process.cwd(),
  scanDirs: ['src'],
  minFabricationScore: 80,
  maxCriticalFindings: 0,
});
const result = detector.scan();
if (!result.passed) {
  console.error(result.lockReason);
  for (const finding of result.findings) {
    console.error(` [${finding.severity}] ${finding.file}:${finding.line} — ${finding.message}`);
  }
  process.exit(1);
}

import { TFIDFIndex } from '@deerflow/hera/tfidf-index';
const index = new TFIDFIndex({ dataDir: '.agent/hera' });
index.initialize();
// Add conversations to the corpus — IDF weights evolve
index.addDocument('User asked about authentication patterns in Fastify');
index.addDocument('Discussed JWT vs session cookies for SPA authentication');
// Embed a new query and compare against stored vectors
const query = index.embed('How should I handle auth in my API?');
const stored = index.embed('JWT vs session cookies for SPA authentication');
const similarity = index.cosineSimilarity(query.vector, stored.vector);
console.log(`Similarity: ${similarity.toFixed(3)}`); // → 0.42
index.save(); // Persist vocabulary for next session

# Run all gates, exit non-zero on any failure
npx deerflow enforce --strict
# Run only specific gates
npx deerflow enforce --only FABRICATION_DETECTOR,TAMPER_PROOF_LOCK
# Generate a tamper-proof lock file
npx deerflow lock --out .agent/lock.json

deerflow/
├── src/ # Reference Fastify server showing how to wire gates
│ ├── index.ts # Main entry — createApp()
│ ├── server.ts # HTTP server
│ ├── routes/ # /health, /ready, /live
│ └── services/ # Business logic
├── bin/ # CLI entry point
│ └── deerflow.ts
├── deerflow/ # The framework itself
│ ├── enforcement/ # 21-gate compliance engine (20 modules)
│ ├── hera/ # TF-IDF memory + agent specialization (6 modules)
│ ├── skills/ # Pluggable skills (5 modules)
│ ├── agent-guard.ts # Real-time agent behavior monitor
│ ├── workflow.ts # 8-phase pipeline enforcer
│ └── index.ts # Public API
├── eslint-plugins/ # Custom ESLint rules for fabrication detection
├── tests/
│ ├── unit/ # 85 unit tests
│ ├── integration/ # 3 integration tests
│ └── e2e/ # 4 e2e tests
├── docker/ # Dockerfile + docker-compose
└── templates/ # Project templates
# Run all tests
npm run test
# Run specific test types
npm run test:unit
npm run test:integration
npm run test:e2e
# Run with coverage
npm run test:coverage

92 tests pass: 85 unit (config, Hera TF-IDF math, index API), 3 integration, and 4 e2e (createApp() hitting /health, /ready, and /live with no external network calls).
The Hera test suite explicitly validates mathematical properties:
- TF-IDF vectors are deterministic (same input → identical weights to 10 decimal places)
- Cosine similarity is bounded in [0, 1] for non-negative vectors
- Rare terms receive higher IDF than common terms
- Agent specialization weights converge toward 1.0 with repeated success
- Softmax produces a valid probability distribution
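
Several of the properties listed above can be checked against a small self-contained sketch. It is not Hera's implementation. The README does not say which TF-IDF variant Hera uses, so the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 is an assumption, and the tokenize, idf, embed, cosineSimilarity, and softmax helpers are written for this example only.

```typescript
// Assumed formulas; the README does not state which TF-IDF variant Hera uses.
type Vector = Map<string, number>;

const tokenize = (text: string): string[] =>
  text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

/** Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1. */
function idf(term: string, corpus: string[][]): number {
  const df = corpus.filter((doc) => doc.includes(term)).length;
  return Math.log((1 + corpus.length) / (1 + df)) + 1;
}

/** Relative term frequency times IDF for each unique term in `text`. */
function embed(text: string, corpus: string[][]): Vector {
  const tokens = tokenize(text);
  const vector: Vector = new Map();
  for (const term of tokens) {
    vector.set(term, (vector.get(term) ?? 0) + 1 / tokens.length);
  }
  vector.forEach((tf, term) => vector.set(term, tf * idf(term, corpus)));
  return vector;
}

/** Cosine of the angle between two sparse vectors; 0 if either is empty. */
function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((w, term) => {
    dot += w * (b.get(term) ?? 0);
    normA += w * w;
  });
  b.forEach((w) => { normB += w * w; });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

/** Numerically stable softmax: subtracting the max leaves the result unchanged. */
function softmax(scores: number[]): number[] {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

const corpus = [
  "User asked about authentication patterns in Fastify",
  "Discussed JWT vs session cookies for SPA authentication",
].map(tokenize);

const query = embed("JWT authentication for SPA", corpus);
const stored = embed("Discussed JWT vs session cookies for SPA authentication", corpus);
const similarity = cosineSimilarity(query, stored);
// "jwt" occurs in one document, "authentication" in both.
const rareBeatsCommon = idf("jwt", corpus) > idf("authentication", corpus);
const probabilities = softmax([2, 1, 0.5]);
console.log({ similarity, rareBeatsCommon, probabilities });
```

All weights here are non-negative, so the dot product cannot be negative and the similarity stays within [0, 1]. Without stemming, a query containing only "auth" would share no term with "authentication" and score 0 in this sketch.
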
- TF-IDF index: O(n) embedding where n = unique terms in input. Vocabulary pruned at 50,000 terms by default. State persists to a vocabulary.json file (typically < 1 MB).
- Fabrication detector: O(files × patterns) per scan. The 12 hallucination patterns are compiled regexes; cross-reference against package.json is O(1) via a Set.
- Tamper-proof lock: Single SHA-256 hash per artifact. Negligible overhead.
- No external services required. The framework runs entirely in-process. No database, no message queue, no LLM API calls.
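
The O(files × patterns) claim can be made visible with a small sketch that compiles its regexes once and counts one pass per file and pattern pair. The two patterns below are invented for illustration, since the project's 12 patterns are not listed in this README, and the scan function and Hit shape are likewise hypothetical.

```typescript
// Hypothetical patterns; the project's 12 patterns are not listed in this README.
interface Pattern {
  branch: string;
  regex: RegExp;
}

// Compiled once at module load, then reused for every file.
const PATTERNS: Pattern[] = [
  { branch: "SPECULATIVE_CODE", regex: /\b(?:TODO|FIXME)\b.*\b(?:probably|should work|might)\b/i },
  { branch: "UNVERIFIABLE_LOGIC", regex: /\/\/\s*(?:trust me|magic number)/i },
];

interface Hit {
  file: string;
  line: number;
  branch: string;
}

/** Scans in-memory file contents; `passes` counts (file, pattern) pairs. */
function scan(files: Record<string, string>): { hits: Hit[]; passes: number } {
  const hits: Hit[] = [];
  let passes = 0;
  for (const file of Object.keys(files)) {
    const lines = files[file].split("\n");
    for (const pattern of PATTERNS) {
      passes += 1; // one pass of one pattern over one file
      lines.forEach((text, i) => {
        if (pattern.regex.test(text)) {
          hits.push({ file, line: i + 1, branch: pattern.branch });
        }
      });
    }
  }
  return { hits, passes };
}

const { hits, passes } = scan({
  "a.ts": "const x = 1;\n// TODO: this should work once the API exists",
  "b.ts": "const retries = 3; // magic number",
  "c.ts": "export const ok = true;",
});

console.log(passes); // 6 = 3 files × 2 patterns
console.log(hits);
```

The pass count grows with the number of files times the number of patterns. The cost of each pass still depends on file length and on how each regex backtracks, which this sketch does not measure.
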
MIT — Copyright (c) 2026 Nguyen Tien Dat. All rights reserved.