An @enchanter-ai product — algorithm-driven, agent-managed, self-learning.
Real-time change comprehension. Bayesian trust scoring. Information-gain review.
4 plugins. 6 algorithms. 4 agents. Every change accounted for.
Claude changed 12 files in 8 turns. I didn't read a single diff. Crow told me the auth migration was safe (trust: 0.82), the config change was not (trust: 0.31), and the test deletions were adversarial (trust: 0.18). I reviewed 2 files instead of 12.
In plain English: Claude edited fourteen files this session. You'll skim three. Crow ranks the fourteen so the one that breaks production isn't the one you skipped.
Technically: V2 Beta-Bernoulli posterior scoring updates a per-file trust value on every Write/Edit, seeded at Beta(2,2) and pushed by change type. V3 Information-Gain ordering IG = H(posterior) surfaces maximum-uncertainty files first so the two files worth reviewing float to the top. Every advisory carries (trust_score, change_type, N) — no advisory ships without a posterior sample count.
Crow takes its name from Alex's Mobs — a sharp-eyed corvid that perches over disturbances, inspects every object it finds, remembers faces, and sorts friend from threat. Every AI-assisted edit is a disturbance until its diff has been read; Crow reads it for you and scores trust before it reaches main.
The question this plugin answers: What just happened?
- Reviewers drowning in AI-generated diffs who want to review the 2 risky files, not all 12.
- Teams who've been burned by silent destructive edits mid-session and want a scored, auditable trail.
- Engineers who understand that trust is evidence, not vibes, and want the Bayesian posterior to say so.
Not for:
- Solo hack sessions where every edit is intentional and review friction is pure cost.
- Teams that want a blocking gate — Crow is advisory by design (see ../vis/packages/core/conduct/hooks.md § Injection over denial).
- The Problem
- How It Works
- What Makes Crow Different
- The Full Lifecycle
- Install
- Quickstart
- 4 Plugins, 4 Agents, 6 Algorithms
- What You Get Per Session
- Roadmap
- The Science Behind Crow
- Commands
- How Trust Scoring Works
- How Information-Gain Ordering Works
- vs Everything Else
- Agent Conduct (11 Modules)
- Architecture
- Acknowledgments
- Versioning & release cadence
- Contributing
- Citation
- License
The review-and-comprehension loop eats 40-60% of every Claude Code session:
- Developers rubber-stamp 93% of permission prompts (Anthropic data)
- Developers start second Claude instances to review the first (Issue #1144)
- The diff UI shows +7,490/-6,880 for an 11-line change (Issue #18541)
- No per-hunk accept/discard exists (Issue #31395)
- 10-20% of sessions are abandoned due to unexpected changes
Four plugins, one concern each, bound to specific hook points. decision-gate on PostToolUse orders pending reviews by information gain (H3) and red-teams low-trust changes (H5). change-tracker on PostToolUse classifies and clusters every diff (H1). trust-scorer on PostToolUse updates a Beta-Bernoulli posterior per file (H2). session-memory on PreCompact builds a continuity graph and persists cross-session learnings (H4, H6). The diagram below shows the bindings and state outputs.
Source: docs/assets/pipeline.mmd · Regeneration command in docs/assets/README.md.
Each plugin owns one concern. No overlap. No dependencies between plugins.
Every Write/Edit updates a Beta-Bernoulli posterior per file. Docs push the mean up, sensitive config pushes it down, reverts halve the likelihood. After 6 changes, a file's trust posterior has narrowed enough to say "review this one" or "this one's fine" — no more rubber-stamping 12 diffs at equal weight.
IG(X) = H(trust posterior). Changes at trust 0.5 get reviewed first (maximum uncertainty, maximum value). Changes at trust 0.1 or 0.9 drop to the bottom — the decision is already made. You review 2 files out of 12, and they're the right 2.
For any file under trust 0.4, the decision-gate agent generates specific adversarial questions tied to the diff content. "This changes the database query from parameterized to string interpolation — SQL injection risk." Not "consider security implications."
H6 Exponential Strategy Averaging (cross-session EMA) adapts priors per file type. After N sessions, Crow knows: config changes always get flagged by this developer, test changes are usually safe, schema changes require careful review. The classifier's defaults give way to what you actually do.
The tool executes, then PostToolUse runs decision-gate (H3 IG-ranking + H5 adversarial questions on fresh trust scores), change-tracker, and trust-scorer. When context fills, PreCompact triggers session-memory to write session-graph.json before the wipe. On resume, the restorer agent reads it back autonomously.
Source: docs/assets/lifecycle.mmd · Regeneration command in docs/assets/README.md.
Crow ships as 4 plugins that feed each other (change-tracker → trust-scorer → decision-gate → session-memory). One meta-plugin — full — lists all four as dependencies, so a single install pulls in the whole chain.
In Claude Code (recommended):
/plugin marketplace add enchanter-ai/crow
/plugin install full@crow
Claude Code resolves the dependency list and installs all 4 plugins. Verify with /plugin list.
Want to cherry-pick? Individual plugins are still installable by name — e.g. /plugin install crow-trust-scorer@crow if you only need scoring. The pipeline is designed to work end-to-end, though, so full@crow is the path we recommend.
Via shell (also installs shared/*.sh and shared/scripts/*.py locally so hooks work offline):
bash <(curl -s https://raw.githubusercontent.com/enchanter-ai/crow/main/install.sh)
git clone https://github.com/enchanter-ai/crow
cd crow
./scripts/bootstrap.sh # canonical first command; installs the vis sibling
Without ./scripts/bootstrap.sh, conduct imports will silently miss and Claude Code's @-loader will fail-soft. Always bootstrap first.
| Plugin | Hook | Command | What |
|---|---|---|---|
| change-tracker | PostToolUse | /crow:changes | Semantic diff compression + classification |
| trust-scorer | PostToolUse | /crow:trust | Bayesian trust scoring + alerts |
| decision-gate | PostToolUse | /crow:review | IG-ordered review + adversarial questions |
| session-memory | PreCompact | /crow:session | Continuity graph + Exponential Strategy Averaging |
| Agent | Model | Plugin | What |
|---|---|---|---|
| classifier | Haiku | change-tracker | Deep semantic change classification |
| auditor | Haiku | trust-scorer | Trust distribution analysis + risk report |
| adversary | Sonnet | decision-gate | Targeted adversarial review questions |
| restorer | Haiku | session-memory | Autonomous context restoration |
Three hook events fan out into four color-coded journals — one per sub-plugin — and converge on the enchanted-mcp bus and the /crow:* query surface. Color maps engines to journals: blue = change-tracker (V1 semantic-diff) · purple = trust-scorer (V2 Bayesian + V6 Exponential Strategy Averaging) · red = decision-gate (V3 info-gain) · yellow = session-memory (V4 continuity graph).
Source: docs/assets/state-flow.mmd · Regeneration command in docs/assets/README.md.
change-tracker/state/
├── changes.jsonl # Every file change with type, hash, cluster
└── metrics.jsonl # change_tracked events
trust-scorer/state/
├── trust.json # Per-file Beta parameters and trust scores
├── learnings.json # Cross-session Exponential Strategy Averaging data
└── metrics.jsonl # trust_scored events
decision-gate/state/
└── metrics.jsonl # review_advisory events
session-memory/state/
├── session-graph.json # Continuity graph (nodes, edges, trust overview)
├── session-summary.md # Human-readable session recap
└── metrics.jsonl # session_saved events
Tracked in docs/ROADMAP.md and the shared ecosystem map. For upcoming work specific to Crow, see issues tagged roadmap.
Six named algorithms power every decision:
Raw diffs are noise. Crow classifies each change by type and clusters related changes across files.
Change types: source_code, config_change, test_change, documentation, schema_change, dependency_change.
Impact radius: local (1 file), module (2-5 files), systemic (6+ files).
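A minimal sketch of the classification and impact-radius step, assuming path-based heuristics only. The suffix and directory rules in `classify_change` are hypothetical stand-ins (in Crow, deep classification is the Haiku `classifier` agent's job, and the real rules may read diff content); only the six type names and the 1 / 2-5 / 6+ radius thresholds come from the text above.

```python
from pathlib import PurePosixPath

# Hypothetical heuristics. Only the six type names come from Crow's docs.
DEPENDENCY_FILES = {"package.json", "package-lock.json", "requirements.txt",
                    "pyproject.toml", "go.mod", "Cargo.toml"}
DOC_SUFFIXES = {".md", ".rst", ".txt"}
CONFIG_SUFFIXES = {".yml", ".yaml", ".toml", ".ini", ".json"}
SCHEMA_HINTS = ("migrations", "schema")


def classify_change(path: str) -> str:
    """Map a changed file path to one of the six change types (path-only guess)."""
    p = PurePosixPath(path)
    parts = [part.lower() for part in p.parts]
    if p.name in DEPENDENCY_FILES:
        return "dependency_change"
    if p.suffix in DOC_SUFFIXES:
        return "documentation"
    if "tests" in parts or p.name.startswith("test_") or ".test." in p.name:
        return "test_change"
    if p.suffix == ".sql" or any(h in part for part in parts for h in SCHEMA_HINTS):
        return "schema_change"
    if p.suffix in CONFIG_SUFFIXES or p.name.startswith(".env"):
        return "config_change"
    return "source_code"


def impact_radius(files_in_cluster: int) -> str:
    """Bucket a cluster by size: local (1 file), module (2-5), systemic (6+)."""
    if files_in_cluster <= 1:
        return "local"
    if files_in_cluster <= 5:
        return "module"
    return "systemic"


print(classify_change("db/migrations/0001_init.sql"), impact_radius(3))
# schema_change module
```

Rule order matters in a sketch like this: dependency manifests are checked before the generic config suffixes so that package.json is not filed as config.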
Each file change gets a trust score using Beta-Bernoulli conjugate priors.
Prior: Beta(2, 2), mildly uncertain. Each change updates the posterior through its change-type likelihood ℓ (table below), and trust is reported as the posterior mean α / (α + β).
| Change Type | Likelihood ℓ |
|---|---|
| Documentation | 0.95 |
| Test changes | 0.85 |
| Source code (small) | 0.70 |
| Source code (large) | 0.50 |
| Schema changes | 0.55 |
| Dependencies | 0.50 |
| Config (sensitive) | 0.30 |
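A minimal sketch of one trust update, assuming each change counts as a single fractional Bernoulli observation: ℓ is added to α and 1 − ℓ to β. The README does not spell out how ℓ enters the posterior, so treat that rule, the `FileTrust` class, and the dictionary key names as hypothetical; the ℓ values are copied from the table.

```python
from dataclasses import dataclass

# Likelihoods from the table above; the key names are illustrative.
LIKELIHOOD = {
    "documentation": 0.95,
    "test_change": 0.85,
    "source_code_small": 0.70,
    "source_code_large": 0.50,
    "schema_change": 0.55,
    "dependency_change": 0.50,
    "config_sensitive": 0.30,
}


@dataclass
class FileTrust:
    alpha: float = 2.0  # Beta(2, 2) prior: mean 0.5, mildly uncertain
    beta: float = 2.0
    n: int = 0          # number of observed changes (the N in each advisory)

    def update(self, change_type: str) -> None:
        # Assumed soft-evidence update: one observation split by likelihood.
        ell = LIKELIHOOD[change_type]
        self.alpha += ell
        self.beta += 1.0 - ell
        self.n += 1

    @property
    def trust(self) -> float:
        # Posterior mean of Beta(alpha, beta).
        return self.alpha / (self.alpha + self.beta)


f = FileTrust()
for _ in range(3):
    f.update("config_sensitive")
print(round(f.trust, 3))
```

Under this rule every update pulls the mean toward ℓ (three sensitive-config edits drag a fresh file from 0.5 to about 0.41), and n counts observations, which is the sample count N an advisory reports.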
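Help the developer review efficiently by showing the most uncertain changes first. Information gain is the entropy of the trust posterior; evaluated at a file's trust p, it is the binary entropy in bits: IG(X) = H(p) = −p·log₂(p) − (1 − p)·log₂(1 − p).
H(p) peaks at p = 0.5, where trust is most uncertain, so changes at trust 0.5 get reviewed first. Changes at trust 0.1 or 0.9 are already decided and carry low review value.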
Before compaction, build a semantic graph:
- Nodes: files (with type, trust, change count), decisions (review advisories)
- Edges: cluster relationships, file-to-decision links
On resumption: "Last session: 15 changes, 2 low-trust files flagged, 3 advisories issued."
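A minimal sketch of the snapshot and the resume line, assuming session-graph.json is a plain object with `nodes` and `edges` lists. The field names, the cluster-chaining rule, and both helpers are hypothetical, and "low trust" is taken as below 0.4 to match the adversarial threshold. The sample trust values reuse the 0.82 / 0.31 / 0.18 figures from the introduction.

```python
import json


def build_session_graph(files: dict, advisories: list) -> dict:
    """Nodes for files and decisions; edges for clusters and file-to-decision links."""
    nodes, edges = [], []
    clusters: dict[str, list[str]] = {}
    for path, info in files.items():
        nodes.append({"id": path, "kind": "file", **info})
        clusters.setdefault(info["cluster"], []).append(path)
    for members in clusters.values():
        # Chain files that share a cluster id.
        for a, b in zip(members, members[1:]):
            edges.append({"from": a, "to": b, "rel": "cluster"})
    for adv in advisories:
        nodes.append({"id": adv["id"], "kind": "decision"})
        edges.append({"from": adv["path"], "to": adv["id"], "rel": "advisory"})
    return {"nodes": nodes, "edges": edges}


def resume_summary(graph: dict, low_trust: float = 0.4) -> str:
    """Render the one-line recap shown on resumption."""
    file_nodes = [n for n in graph["nodes"] if n["kind"] == "file"]
    changes = sum(n["changes"] for n in file_nodes)
    flagged = sum(1 for n in file_nodes if n["trust"] < low_trust)
    decisions = sum(1 for n in graph["nodes"] if n["kind"] == "decision")
    return (f"Last session: {changes} changes, {flagged} low-trust files flagged, "
            f"{decisions} advisories issued.")


files = {
    "src/auth.py": {"type": "source_code", "trust": 0.82, "changes": 4, "cluster": "auth"},
    "config/prod.yaml": {"type": "config_change", "trust": 0.31, "changes": 2, "cluster": "cfg"},
    "tests/test_auth.py": {"type": "test_change", "trust": 0.18, "changes": 1, "cluster": "auth"},
}
advisories = [{"id": "adv-1", "path": "config/prod.yaml"},
              {"id": "adv-2", "path": "tests/test_auth.py"}]

snapshot = json.dumps(build_session_graph(files, advisories))  # written before compaction
restored = json.loads(snapshot)                                # read back on resume
print(resume_summary(restored))
```

Serializing to plain JSON keeps the snapshot readable by jq, which matches the bash + jq dependency footprint listed in the comparison table.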
For low-trust changes (trust < 0.4), generate specific adversarial questions:
- "This changes the database query from parameterized to string interpolation. SQL injection risk."
- "This test now asserts `true === true`. The original checked actual business logic."
- "This deletes the rate limiter. Was rate limiting intentional?"
Not generic warnings. Specific to the diff content.
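In Crow the questions come from the Sonnet `adversary` agent. As a rough illustration of what "specific to the diff content" means, here is a sketch that gates on trust < 0.4 and matches three of the example situations with hand-written regexes. The regexes, the `ADVERSARIAL_RULES` table, and `questions_for` are hypothetical, and a pattern table is a much weaker stand-in for an LLM reviewer.

```python
import re

TRUST_THRESHOLD = 0.4  # from the README: only changes below this are red-teamed

# Hypothetical rules: (regex over one unified-diff line, question to ask).
ADVERSARIAL_RULES = [
    (re.compile(r"^\+.*\b(execute|query)\(\s*f[\"']"),
     "This query is built with string interpolation instead of parameters. SQL injection risk."),
    (re.compile(r"^\+.*\bassert\b.*\btrue\s*===\s*true\b"),
     "This test now asserts true === true. What business logic did the original check?"),
    (re.compile(r"^-.*rate_?limit", re.IGNORECASE),
     "This deletes rate-limiting code. Was rate limiting intentional?"),
]


def questions_for(trust: float, diff: str) -> list[str]:
    """Return diff-specific questions for a low-trust change, else nothing."""
    if trust >= TRUST_THRESHOLD:
        return []
    found: list[str] = []
    for line in diff.splitlines():
        if line.startswith(("+++", "---")):
            continue  # skip file headers, which are not content changes
        for pattern, question in ADVERSARIAL_RULES:
            if pattern.search(line) and question not in found:
                found.append(question)
    return found


diff = """--- a/app.js
+++ b/app.js
-app.use(rateLimit({ max: 100 }))
+assert(true === true);
"""
for q in questions_for(0.18, diff):
    print(q)
```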
Exponential moving average over per-type trust rates across sessions.
After N sessions, Crow knows: config changes always get flagged, test changes are usually safe, this developer always reviews schema changes carefully. Adapts priors accordingly.
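A minimal sketch of the cross-session step, assuming each session produces one observed trust rate per change type and that the smoothed rate seeds the next session's Beta prior at a fixed pseudo-count. The smoothing factor 0.3, the strength 4.0 (chosen so an unseen type falls back to Beta(2, 2)), and both function names are assumptions, not Crow's published parameters.

```python
def ema_update(previous: dict[str, float], observed: dict[str, float],
               smoothing: float = 0.3) -> dict[str, float]:
    """Blend this session's per-type rates into the running average."""
    merged = dict(previous)
    for change_type, rate in observed.items():
        if change_type in merged:
            merged[change_type] = smoothing * rate + (1 - smoothing) * merged[change_type]
        else:
            merged[change_type] = rate  # first sighting: start from the observation
    return merged


def prior_for(rates: dict[str, float], change_type: str,
              strength: float = 4.0) -> tuple[float, float]:
    """Turn a smoothed rate into Beta(alpha, beta) with a fixed pseudo-count."""
    rate = rates.get(change_type, 0.5)  # unseen type falls back to Beta(2, 2)
    return strength * rate, strength * (1 - rate)


rates: dict[str, float] = {}
for session in [{"config_change": 0.2, "test_change": 0.9},
                {"config_change": 0.3, "test_change": 0.95}]:
    rates = ema_update(rates, session)
print({k: round(v, 3) for k, v in rates.items()})
print(prior_for(rates, "schema_change"))
```

A smaller smoothing factor makes the learned priors steadier across sessions; a larger one lets a developer's recent habits override the classifier defaults faster.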
| Command | Plugin | What |
|---|---|---|
| /crow:changes | change-tracker | All changes grouped by type and file |
| /crow:trust | trust-scorer | Trust scores sorted riskiest-first |
| /crow:review | decision-gate | IG-ranked review queue with adversarial questions |
| /crow:session | session-memory | Full session dashboard |
- Every file starts at Beta(2, 2) — a mildly uncertain prior (mean = 0.5).
- Each Write/Edit updates the posterior: high-trust types (docs, tests) push the score up, risky types (config, schema) push it down.
- After multiple updates, the posterior narrows — confidence increases.
- Reverts are penalized: if a file returns to a previous hash, the likelihood is halved.
- Trust scores persist across the session via trust.json. Cross-session learning via learnings.json.
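A minimal sketch of the revert rule and of posterior narrowing, assuming a SHA-256 of the file content as the hash and a fractional update (α += ℓ, β += 1 − ℓ). The `TrustedFile` class is hypothetical; the Beta(2, 2) start and the halving come from the list above, and ℓ = 0.70 is the small-source-code likelihood from the table.

```python
import hashlib


class TrustedFile:
    def __init__(self) -> None:
        self.alpha, self.beta = 2.0, 2.0  # Beta(2, 2): mean 0.5
        self.seen_hashes: set[str] = set()

    def record(self, content: bytes, likelihood: float) -> None:
        digest = hashlib.sha256(content).hexdigest()
        if digest in self.seen_hashes:
            likelihood /= 2  # file returned to an earlier state: halve the likelihood
        self.seen_hashes.add(digest)
        # Assumed update: one fractional Bernoulli observation.
        self.alpha += likelihood
        self.beta += 1 - likelihood

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        # Variance of Beta(a, b); shrinks as a + b grows, i.e. the posterior narrows.
        a, b = self.alpha, self.beta
        return a * b / ((a + b) ** 2 * (a + b + 1))


f = TrustedFile()
for content in [b"v1", b"v2", b"v1"]:  # the third edit restores v1: a revert
    f.record(content, likelihood=0.70)
print(round(f.mean, 3), round(f.variance, 4))
```

The variance of a Beta distribution shrinks as α + β grows, which is the narrowing the list describes: here it falls from 0.05 at Beta(2, 2) to about 0.031 after the three edits.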
Not all files are equally worth reviewing. Crow ranks by uncertainty:
- Trust 0.5 → IG 1.0 (maximum uncertainty — you need to look at this)
- Trust 0.1 → IG 0.47 (clearly bad — you already know)
- Trust 0.9 → IG 0.47 (clearly good — don't waste time)
Review the uncertain files first. Skip the ones where trust is already decided.
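A minimal sketch of the ranking, taking IG as the binary entropy (in bits) of each file's trust score; that reading reproduces the 1.0 and 0.47 figures above, though whether Crow evaluates it at the posterior mean or over the full posterior is not stated here. The `review_queue` helper and the sample file names are illustrative.

```python
import math


def information_gain(trust: float) -> float:
    """Binary entropy H(p) in bits: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    if trust <= 0.0 or trust >= 1.0:
        return 0.0
    return -(trust * math.log2(trust) + (1 - trust) * math.log2(1 - trust))


def review_queue(trust_by_file: dict[str, float]) -> list[tuple[str, float]]:
    """Most uncertain files first."""
    scored = [(path, information_gain(t)) for path, t in trust_by_file.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


for path, ig in review_queue({"README.md": 0.9, "config/prod.yaml": 0.31,
                              "src/auth.py": 0.5}):
    print(f"{path}: IG {ig:.2f}")
# src/auth.py: IG 1.00
# config/prod.yaml: IG 0.89
# README.md: IG 0.47
```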
| | Crow | Gryph | Context Mode | ClaudeWatch | Anthropic Review |
|---|---|---|---|---|---|
| Real-time awareness | in-session | post-hoc | — | — | post-PR |
| Trust scoring | Bayesian | — | — | — | — |
| Per-change review | IG-ordered | — | — | — | — |
| Adversarial questions | specific | — | — | — | generic |
| Session continuity | graph + learnings | — | — | — | — |
| Cross-session learning | EMA | — | — | — | — |
| Dependencies | bash + jq | Node | Node + MCP | Python | API |
Every skill inherits a reusable behavioral contract from shared/ — loaded once into CLAUDE.md, applied across all plugins. This is how Claude acts inside Crow: deterministic, surgical, verifiable. Not a suggestion; a contract.
| Module | What it governs |
|---|---|
| discipline.md | Coding conduct: think-first, simplicity, surgical edits, goal-driven loops |
| context.md | Attention-budget hygiene, U-curve placement, checkpoint protocol |
| verification.md | Independent checks, baseline snapshots, dry-run for destructive ops |
| delegation.md | Subagent contracts, tool whitelisting, parallel vs. serial rules |
| failure-modes.md | 14-code taxonomy for accumulated-learning logs |
| tool-use.md | Tool-choice hygiene, error payload contract, parallel-dispatch rules |
| skill-authoring.md | SKILL.md frontmatter discipline, discovery test |
| hooks.md | Advisory-only hooks, injection over denial, fail-open |
| precedent.md | Log self-observed failures to state/precedent-log.md; consult before risky steps |
| tier-sizing.md | Prompt verbosity scales inversely with model tier; Haiku needs mechanical steps, Opus runs on intent |
| web-fetch.md | External URL handling: cache, dedup, budget; WebFetch is Haiku-tier-only |
Interactive architecture explorer with plugin diagrams, agent cards, and data flow:
docs/architecture/ — auto-generated from the codebase. Run python docs/architecture/generate.py to regenerate.
Crow builds on substrate laid by others:
- Claude Code (Anthropic) — the plugin surface this work extends.
- Keep a Changelog — CHANGELOG convention.
- Semantic Versioning — versioning contract.
- Contributor Covenant — Code of Conduct.
- repostatus.org — status badge.
- Citation File Format — citation metadata.
- Conventional Commits — commit convention.
Crow follows Semantic Versioning. Breaking changes land on major bumps only; the CHANGELOG flags them explicitly. Release cadence is opportunistic — tags land when accumulated fixes or features justify a cut, not on a fixed schedule. Migration notes between majors live in docs/upgrading.md.
See CONTRIBUTING.md
If you use this project in research or derivative work, please cite it:
@software{crow_2026,
title = {Crow},
author = {{Klaiderman}},
year = {2026},
url = {https://github.com/enchanter-ai/crow}
}
See CITATION.cff for additional formats (APA, MLA, EndNote).
MIT
Crow is the change-trust layer — it scores every Write/Edit the agent makes before the change influences a commit. Upstream, Wixie's prompts produce the changes Crow observes. Downstream, Sylph consumes Crow's trust signal in its W4 reviewer routing (blame × recency × CODEOWNERS × Crow availability), and Lich uses Crow's trust as a gating prior before spending sandbox time on deep review.
Crow does not engineer prompts (Wixie's lane), track tokens (Emu's lane), review code correctness (Lich's lane), orchestrate PR lifecycle (Sylph's lane), or scan security surfaces (Hydra's lane). It scores trust in what just happened.
See docs/ecosystem.md § Data Flow Between Plugins for the full map.
