🧭 Agent Maturity Compass (AMC)

The credit score for AI agents.

🟢 Simple Version — What Is This?

What's an AI agent? Any AI tool that takes actions for you — writing emails, booking meetings, editing code, answering customers, handling money. If your AI does more than just chat, it's an agent.

AI agents can do amazing things. They write code, manage your calendar, send emails, handle customer support, make financial decisions. But here's the problem:

How do you know if an AI agent is actually safe and trustworthy?

Today, agents grade themselves. That's like letting students grade their own exams. AMC fixes this.

🎯 No Coding Required

Don't want to install anything? Try the Web Playground →

Answer 15 questions about your AI tool. Get an instant trust score. Share it with your team. Zero install, zero coding, works in your browser.

What AMC Does (in plain English)

Watches your AI agent work (without the agent knowing)
Tests it with real scenarios — including trick questions and edge cases
Scores it on a simple scale from L0 (dangerous) to L5 (trustworthy)
Proves the score with cryptographic evidence that can't be faked

The Problem AMC Solves

How Agents Are Evaluated Today	How AMC Evaluates
Agent says "I'm safe" → Score: 100 ✅	AMC secretly tests agent → Real score: 16 ❌
Self-reported documentation	Execution-verified evidence
Trust me, bro	Trust, but verify with math

That's an 84-point gap between what agents claim and what they actually do.

Quick Start (2 Minutes)

# Install (requires Node.js 20+)
npm i -g agent-maturity-compass

# Create a project and get your first score
mkdir my-agent && cd my-agent
amc init    # walks you through setup interactively

That's it. amc init asks you a few questions and sets up everything. It'll prompt for a passphrase to protect your evidence vault — just pick something memorable (or press Enter to auto-generate one).

What You Get

A trust score from L0 to L5 with a clear breakdown
A gap analysis showing exactly what to fix and how
Auto-generated guardrails that plug directly into your agent's config
Auto-remediation — generates config files, CI gates, governance docs
A badge for your README showing your agent's trust level
HTML reports for stakeholders (print to PDF)

amc quickscore --share       # Get your score + action plan
amc evidence collect         # Guided wizard to connect your agent
amc guide --go               # Auto-apply guardrails to your config
amc fix                      # Generate remediation files (guardrails, CI gate, governance)
amc report <id> --html r.html  # Styled report → print to PDF
amc badge                    # ![AMC L3](https://img.shields.io/badge/AMC-L3-blue)

📖 Full getting started guide

🔵 Technical Version — How It Works

Architecture

Agent (untrusted) → AMC Gateway (transparent MITM proxy) → Evidence Ledger (Ed25519 + Merkle tree)
                                                                    ↓
                                                      Scoring Engine (138 core questions, 5 dimensions)
                                                                    ↓
                                                       AMC Studio (dashboard + API + CLI)

The gateway sits between your agent and the LLM provider. It captures every API call and tool use with Ed25519 signatures. The agent doesn't know it's being watched.

Evidence Trust Tiers

Not all evidence is equal:

Tier	Weight	How It's Collected
`OBSERVED_HARDENED`	1.1×	AMC-controlled adversarial test scenarios
`OBSERVED`	1.0×	Directly captured via AMC gateway proxy
`ATTESTED`	0.8×	Cryptographic attestation via vault/notary
`SELF_REPORTED`	0.4×	Agent's own claims — capped, can't inflate score

5 Dimensions, 140 Core Questions

Dimension	Questions	What It Measures
Strategic Agent Ops	18	Mission clarity, scope adherence, decision traceability
Skills	38	Tool mastery, injection defense, DLP, least-privilege execution
Resilience	30	Graceful degradation, circuit breakers, monitor bypass resistance
Leadership & Autonomy	28	Structured logs, traces, cost tracking, SLO monitoring
Culture & Alignment	26	Test harnesses, benchmarks, feedback loops, over-compliance detection

Maturity Scale (L0 → L5)

Level	Name	What It Means
L0	Absent	No safety controls. No logging. No oversight.
L1	Initial	Some intent to be safe, but nothing operational yet.
L2	Developing	Partial structure. Works in happy path, breaks at edges.
L3	Defined	Repeatable. Measurable. Auditable. EU AI Act minimum.
L4	Managed	Proactive. Risk-calibrated. Cryptographic proof chains.
L5	Optimizing	Self-correcting. Continuously verified. Fully certified.

Works With Any Agent Framework

# Wrap any CLI agent
amc wrap claude -- claude "analyze this"
amc wrap gemini -- gemini chat
amc adapters run --adapter generic-cli -- python bot.py

# Import existing evaluations
amc score evidence-ingest --format openai-evals

14 built-in adapters: LangChain, LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, LlamaIndex, Semantic Kernel, Claude Code, Gemini, OpenClaw, OpenHands, Python AMC SDK, Generic CLI, OpenAI-Compatible API.

Zero code changes. One environment variable.

🏭 Domain Packs — Industry-Specific Assessment

40 packs across 7 industry domains with 592 domain-specific diagnostic questions. Each question references specific regulatory articles, not vague guidelines.

amc domain list                                                    # See all 7 domains + packs
amc domain assess --domain health --agent my-agent                 # Assess against a domain
amc domain gaps --domain wealth --agent my-agent                   # Find compliance gaps
amc domain report --domain environment --output reports/env.md     # Generate report

Sector	Packs	Questions	Key Regulations
🌿 Environment	6	81	EU Farm-to-Fork, REACH, IEC 61850, EU Drinking Water Directive
🏥 Health	9	140	HIPAA §164.312, FDA 21 CFR Part 11, EU MDR 2017/745, ICH E6(R3)
💰 Wealth	5	70	MiFID II, PSD2, EU DORA, MiCA, FATF R1/R10
🎓 Education	5	72	FERPA 20 U.S.C. §1232g, COPPA §312, IDEA, EU AI Act Annex III
🚇 Mobility	5	70	EU EPBD 2024, UNECE WP.29 R155 §7, ETSI EN 303 645, EU NIS2
💡 Technology	5	71	EU AI Act Art. 13, EU Data Act 2023, DSA Art. 34, TRIPS Agreement
🏛️ Governance	5	71	EU eIDAS 2.0, EU AI Act Art. 5(1)(a), UNCAC Art. 7/9, UNGPs

Every pack includes risk tier, EU AI Act classification, SDG alignment, certification path, and key risks.

📖 Domain Packs docs

🛡️ Agent Guide — Auto-Fix Your Agent

AMC doesn't just find problems — it fixes them.

# Guided setup
amc evidence collect          # Interactive wizard: connect your agent, import logs, or quickscore

# Auto-remediation
amc fix                       # Generate guardrails.yaml, AGENTS.md, CI gate workflow
amc fix --target-level L4     # Target a specific maturity level
amc fix --dry-run             # Preview what would be generated
amc guide --go                # Detect framework → apply guardrails to config

# Continuous improvement
amc guide --status            # One-line health check
amc guide --interactive       # Cherry-pick which gaps to fix
amc guide --watch             # Continuous monitoring + auto-update
amc guide --diff              # What improved since last run
amc guide --ci --target 3     # CI gate: fail build if below L3
amc guide --compliance EU_AI_ACT  # Map gaps to regulatory obligations

# Reports
amc report <id> --executive   # Board-friendly terminal summary
amc report <id> --html r.html # Styled HTML report (print to PDF)
amc quickscore --eu-ai-act    # EU AI Act risk classification

Auto-detects your framework from project files
15 config targets — AGENTS.md, CLAUDE.md, .cursorrules, .kiro/steering, .gemini/style.md, and more
Severity-tagged — 🔴 Critical, 🟡 High, 🔵 Medium
Idempotent — re-running updates only the guardrails section

📖 Agent Guide docs

🔴 Assurance Lab — Built-in Red Team

85 deterministic attack packs that test your agent's real behavior under pressure:

Category	What It Tests
Prompt Injection	System message tampering, role hijacking, jailbreaks
Exfiltration	Secret leakage, PII exposure, data boundary violations
Tool Misuse	Unauthorized tools, budget overruns, scope creep
Adversarial Robustness	TAP/PAIR, Crescendo, Skeleton Key, best-of-N attacks
Context Leakage	EchoLeak, cross-session data bleed, memory poisoning
Sycophancy	Does the agent agree with wrong statements to please you?
Self-Preservation	Does the agent resist shutdown or modification?
Sabotage	Does the agent subtly undermine goals when conflicted?
Over-Compliance	Does the agent exceed its instructions? (H-Neurons, arXiv:2512.01797)
MCP Security	MCP server poisoning, tool schema manipulation
Supply Chain	Dependency attacks, SBOM integrity, plugin security
Zombie Persistence	Agents that survive termination or persist unauthorized

amc assurance run --scope full --agent my-agent
amc assurance run --pack adversarial-robustness --agent my-agent
amc assurance run --verbose          # Full scenario detail with payloads
amc assurance run --format sarif     # SARIF 2.1.0 export for security tools
amc assurance certs list             # View certificates

📊 The Platform

Module	What It Does
AMC Score	140 diagnostic questions, 5 dimensions, L0–L5 maturity, evidence-weighted
AMC Shield	85 assurance packs: injection, exfiltration, adversarial attacks, and more
AMC Enforce	Governor engine with policy packs, approval workflows, scoped leases
AMC Vault	Ed25519 key vault, Merkle-tree evidence chains, HSM/TPM support
AMC Watch	Studio dashboard, gateway proxy, Prometheus metrics, cost tracking
AMC Fleet	Multi-agent trust composition, delegation graphs, contradiction detection
AMC Passport	Portable agent credential (.amcpass), verifiable offline, shareable
AMC Comply	EU AI Act, ISO 42001, NIST AI RMF, SOC 2, OWASP LLM Top 10 mapping

📐 75 Scoring Modules

Beyond the core diagnostic, AMC includes research-backed scoring modules:

See all modules

Calibration gap (confidence vs reality)
Evidence conflict detection
Evidence density mapping (blind spot detection)
Gaming resistance (adversarial score inflation)
Sleeper agent detection (context-dependent behavior)
Audit depth (black-box, white-box, outside-the-box)
Policy consistency (pass^k reliability)
Task horizon (METR-inspired)
Factuality (parametric, retrieval, grounded)
Autonomy duration with domain risk profiles
Pause quality (agent-initiated stops)
Memory integrity & poisoning resistance
Alignment index (safety × honesty × helpfulness)
Interpretability scoring
Output attestation (cryptographic signing)
Mutual verification (agent-to-agent trust)
Network transparency log (Merkle tree)
Over-compliance detection (H-Neurons, arXiv:2512.01797)
Agent Guide system (guardrails, agent instructions, CI gates)
EU AI Act compliance, OWASP LLM Top 10
Trust-authorization synchronization (arXiv:2512.06914)
Monitor bypass resistance (arXiv:2503.09950)
Memory security architecture (arXiv:2503.10632)
Agent protocol security (MCP/A2A hardening)
Vibe code audit (security review for AI-generated code)
MCP compliance scoring
Identity continuity tracking
Behavioral transparency index
And more...

📋 Compliance Mapping

Framework	Coverage
EU AI Act	12 article mappings, audit binder generation
ISO 42001	Clauses 4-10 mapped to AMC dimensions
NIST AI RMF	Risk management framework alignment
SOC 2	Trust service criteria mapping
OWASP LLM Top 10	Full coverage (10/10)

amc audit binder create --framework eu-ai-act    # Export compliance evidence
amc compliance report --framework iso-42001      # Generate compliance report

🚀 Install

npm (recommended)

npm i -g agent-maturity-compass

From source

git clone https://github.com/thewisecrab/AgentMaturityCompass.git
cd AgentMaturityCompass
npm ci
npm run build
npm link

Docker (zero setup — no Node required)

# Build the quickstart image
docker build -t amc-quickstart -f docker/Dockerfile.quickstart .

# Quick score (interactive — prompts for passphrase)
docker run -it --rm amc-quickstart amc init

# Non-interactive quick score
docker run -it --rm -e AMC_VAULT_PASSPHRASE=demo amc-quickstart amc quickscore

# Full studio with dashboard (uses production Dockerfile)
docker build -t amc-studio .
docker run -p 3212:3212 -p 3210:3210 -e AMC_VAULT_PASSPHRASE=your-passphrase amc-studio

Homebrew (macOS/Linux)

brew tap thewisecrab/amc
brew install agent-maturity-compass

Quick install script

curl -fsSL https://raw.githubusercontent.com/thewisecrab/AgentMaturityCompass/main/install.sh | bash

📚 Documentation

Doc	Description
Getting Started	Install → first score → L5
Executive Overview	For CTOs & compliance — no terminal required
Quickstart Guide	5-minute walkthrough
Example Project	Score a minimal agent in 5 minutes
Agent Guide System	Guardrails, auto-detect, CI gates
Domain Packs	40 industry-specific domain packs
CLI Reference	All 482+ commands
Architecture Map	System design
Assurance Lab	Attack packs & red teaming
EU AI Act Compliance	Regulatory mapping
Multi-Agent Trust	Fleet scoring & delegation
White Paper	The full research paper (v2.0)
Web Playground	Score your agent in the browser — no install

🚀 Deploy AMC API (One-Click)

Platform	Deploy
Vercel
Railway

Deploy the AMC REST API (POST /api/quickscore, GET /api/badge/:agentId) to the cloud in 60 seconds.

🤝 Contributing

AMC is MIT licensed and open source. Contributions welcome.

# Fork, clone, branch
git clone https://github.com/YOUR_USERNAME/AgentMaturityCompass.git
cd AgentMaturityCompass
npm ci
npm test          # 2,723 tests, all should pass
# Make your changes, then PR

📄 License

MIT — public infrastructure for the age of AI agents.

As autonomous agents become the primary interface between humans and technology, trust infrastructure must be open, verifiable, and accessible to everyone. AMC exists to make that real.

730 questions · 78 assurance packs · 40 domain packs · 14 adapters · 2,822 tests
Stop trusting. Start verifying.

Name		Name	Last commit message	Last commit date
Latest commit History 403 Commits
.amc		.amc
.changeset		.changeset
.github/workflows		.github/workflows
.serena		.serena
Formula		Formula
api		api
deploy		deploy
docker		docker
docs		docs
examples		examples
platform		platform
research		research
scripts		scripts
src		src
tests		tests
tmp		tmp
website		website
whitepaper		whitepaper
.dockerignore		.dockerignore
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
package-lock.json		package-lock.json
package.json		package.json
railway.json		railway.json
tsconfig.json		tsconfig.json
vercel.json		vercel.json
vitest.config.ts		vitest.config.ts

Folders and files

Latest commit

History

Repository files navigation

🧭 Agent Maturity Compass (AMC)

🟢 Simple Version — What Is This?

🎯 No Coding Required

What AMC Does (in plain English)

The Problem AMC Solves

Quick Start (2 Minutes)

What You Get

🔵 Technical Version — How It Works

Architecture

Evidence Trust Tiers

5 Dimensions, 140 Core Questions

Maturity Scale (L0 → L5)

Works With Any Agent Framework

🏭 Domain Packs — Industry-Specific Assessment

🛡️ Agent Guide — Auto-Fix Your Agent

🔴 Assurance Lab — Built-in Red Team

📊 The Platform

📐 75 Scoring Modules

📋 Compliance Mapping

🚀 Install

npm (recommended)

From source

Docker (zero setup — no Node required)

Homebrew (macOS/Linux)

Quick install script

📚 Documentation

🚀 Deploy AMC API (One-Click)

🤝 Contributing

📄 License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages