"The test your tests have to pass."
Test quality governance for AI agent workflows: 5 commands, 5 agents, 12 violation types, and a 0-100 Test Quality Score.
AI agents write tests that pass but don't protect.
Your agent generates a test suite. Every test is green. Coverage is 95%. You ship with confidence. Then production breaks — and the test suite never flinched.
This is not a hypothetical. The research is clear:
- Only 29% of developers trust AI accuracy (Stack Overflow Developer Survey 2025)
- Best AI test generators achieve 71% mutation scores — meaning 29% of bugs slip through undetected (Diffblue 2025 Benchmarks)
- Researchers identified 13 new test smells specific to auto-generated tests that don't exist in human-written tests (Springer 2025)
- AI-generated code has 1.7x more issues than human-written code (CodeRabbit 2025)
Your test suite shows green. Your code ships bugs. The tests are theater.
Agent-Litmus doesn't run your tests. It tests your tests.
Agent writes tests -> Tests pass -> Green bar -> You trust it -> Ship it
Nobody checks if:
- Assertions are meaningful (or just `expect(result).toBeDefined()`)
- Edge cases are covered (or just happy paths)
- Tests would catch a regression (or just pass for any output)
- "100% tests passing" actually means "100% protection"

It doesn't.
Agent writes tests -> /litmus-scan catches 12 violation types
-> /litmus-edge maps every edge case
-> /litmus-strength asks "would this catch a real bug?"
-> /litmus-fix generates concrete improvements
-> /litmus-report gives project-wide Test Quality Score
Verdicts are honest: EFFECTIVE / WEAK / HOLLOW — not just "tests pass."
| Concern | What You Tell the Agent | What Validates It |
|---|---|---|
| Testing | "Please write tests" | Jest / Pytest |
| Linting | "Please format nicely" | ESLint / Prettier |
| Evidence | "Please cite sources" | Agent-Cite |
| Drift | "Please follow instructions" | Agent-Drift |
| Test Quality | "Are these tests real?" | Agent-Litmus |
Without Agent-Litmus, green means nothing. With it, green means protected.
Agent-Litmus detects 12 violation types across 4 categories:
| Type | Severity | What It Catches |
|---|---|---|
| `HOLLOW_ASSERTION` | error | `expect(result).toBeDefined()` — checks existence, not correctness. Passes for `{ error: true }` just as happily as `{ name: 'Alice' }`. |
| `WEAK_ASSERTION` | warning | `expect(result.length).toBeGreaterThan(0)` — confirms non-empty, but `["CORRUPTED"]` passes too. |
| `NO_ASSERTION` | critical | Test function calls code but never checks the result. Zero assertions. Smoke test at best. |
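A simplified sketch of how this classification might work (the pattern lists are illustrative only, not Agent-Litmus's actual rules):

```typescript
// Illustrative patterns only — the real scanner's rules are richer.
const HOLLOW_PATTERNS = [/\.toBeDefined\(\)/, /\.toBeTruthy\(\)/, /\.not\.toBeNull\(\)/];
const WEAK_PATTERNS = [/\.toBeGreaterThan\(0\)/];

function classifyAssertion(line: string): "HOLLOW" | "WEAK" | "STRONG" {
  if (HOLLOW_PATTERNS.some((p) => p.test(line))) return "HOLLOW";
  if (WEAK_PATTERNS.some((p) => p.test(line))) return "WEAK";
  return "STRONG"; // e.g. toEqual / toStrictEqual with a concrete expected value
}

console.log(classifyAssertion("expect(result).toBeDefined()"));             // "HOLLOW"
console.log(classifyAssertion("expect(result.length).toBeGreaterThan(0)")); // "WEAK"
console.log(classifyAssertion("expect(user).toEqual({ name: 'Alice' })"));  // "STRONG"
```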
| Type | Severity | What It Catches |
|---|---|---|
| `IMPLEMENTATION_COUPLING` | error | `jest.spyOn(service, '_validate')` — tests HOW the code works, not WHAT it produces. Breaks on refactor. |
| `OVER_MOCKING` | error | 8 mocks, 1 assertion. You're testing that JavaScript calls functions. Not that your logic works. |
| `BRITTLE_SELECTOR` | warning | `.css-1a2b3c`, `/html/body/div[3]/span` — breaks every build, trains devs to ignore failures. |
| Type | Severity | What It Catches |
|---|---|---|
| `MISSING_EDGE_CASE` | warning | Source checks `if (!name)` but no test ever passes `null`. The guard is untested. |
| `HAPPY_PATH_ONLY` | warning* | Every test uses valid input. No error paths, no boundaries, no nulls. (*Escalates to error if source has error handling.) |
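A minimal sketch of MISSING_EDGE_CASE: the source guards against a falsy name, but a happy-path-only suite never exercises that branch (function and values are hypothetical):

```typescript
// Source under test: has an explicit guard for missing input.
function greet(name: string | null | undefined): string {
  if (!name) return "Hello, guest"; // guarded branch — needs its own test
  return `Hello, ${name}`;
}

// Happy path only: the guard above is never executed.
const happy = greet("Alice");   // "Hello, Alice"

// Edge tests: actually exercise the guard.
const nullCase = greet(null);   // "Hello, guest"
const emptyCase = greet("");    // "Hello, guest"

console.log({ happy, nullCase, emptyCase });
```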
| Type | Severity | What It Catches |
|---|---|---|
| `DUPLICATE_TEST_LOGIC` | info | 3 tests: `add(1,1)`, `add(2,2)`, `add(3,3)`. Same code path, three times. Zero extra coverage. |
| `TEST_PRIVATE_METHOD` | warning | `service._validateEmail()` — testing internals that break on refactor. |
| `HARDCODED_DEPENDENCY` | warning | `/Users/dev/data.json`, `localhost:3000`, `new Date('2024-01-15')` — time bombs. |
| `FLAKY_INDICATOR` | warning | `setTimeout(2000)`, `Math.random()`, timing assertions — non-deterministic failures. |
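One common fix for HARDCODED_DEPENDENCY and FLAKY_INDICATOR is to inject the clock rather than hardcoding `new Date('2024-01-15')` or reading real time inside the code under test — a sketch (function and dates are hypothetical):

```typescript
// Deterministic time: the test supplies the clock, production uses the default.
function daysUntil(deadline: Date, now: () => Date = () => new Date()): number {
  const MS_PER_DAY = 86_400_000;
  return Math.ceil((deadline.getTime() - now().getTime()) / MS_PER_DAY);
}

// A fixed clock makes the test reproducible on any machine, on any day.
const fixedNow = () => new Date("2024-01-10T00:00:00Z");
const days = daysUntil(new Date("2024-01-15T00:00:00Z"), fixedNow);
console.log(days); // 5
```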
| Command | Purpose | Verdict |
|---|---|---|
| `/litmus-scan <test-file>` | Scan for 12 violation types, classify all assertions | EFFECTIVE / WEAK / HOLLOW |
| `/litmus-edge <source-file>` | Map all edge cases, check which are tested | COVERED / GAPS / EXPOSED |
| `/litmus-strength <test-file>` | Thought-experiment mutation testing | STRONG / MODERATE / THEATER |
| `/litmus-fix <test-file>` | Generate concrete improved test code | Before/After diffs |
| `/litmus-report [scope]` | Batch audit, project-wide Test Quality Score | PROTECTED / AT_RISK / EXPOSED |
```shell
# Scan a single test file
/litmus-scan src/utils/auth.test.ts

# Check edge case coverage for a source file
/litmus-edge src/utils/auth.ts

# Would these tests catch real bugs?
/litmus-strength src/utils/auth.test.ts

# Fix the violations
/litmus-fix src/utils/auth.test.ts --auto

# Project-wide assessment
/litmus-report --format summary

# Strict mode (warnings become errors)
/litmus-scan src/utils/auth.test.ts --strict

# Focus on assertion quality only
/litmus-scan src/utils/auth.test.ts --focus A
```

A single number from 0-100 that answers: "How protected is this code?"
TQS = assertion_strength * 0.40 + violation_penalty * 0.30 + edge_coverage * 0.30
| Component (Weight) | Calculation |
|---|---|
| Assertion Strength (40%) | 100 - (weak% x 1) - (hollow% x 2) |
| Violation Penalty (30%) | 100 - (critical x 15) - (error x 8) - (warning x 3) - (info x 1) |
| Edge Coverage (30%) | Tested edges / total edges x 100 |
All component scores are floored at 0 (no negative values). The final TQS is capped at 100.
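The formula and component table above can be sketched as a single function (weights and penalties are the defaults from the table; the zero-edge-case behavior is an assumption):

```typescript
interface TqsInput {
  weakPct: number;    // % of assertions classified WEAK
  hollowPct: number;  // % of assertions classified HOLLOW
  critical: number;   // violation counts by severity
  error: number;
  warning: number;
  info: number;
  testedEdges: number;
  totalEdges: number;
}

function testQualityScore(i: TqsInput): number {
  const floor = (x: number) => Math.max(0, x); // components floored at 0
  const assertionStrength = floor(100 - i.weakPct * 1 - i.hollowPct * 2);
  const violationPenalty = floor(
    100 - i.critical * 15 - i.error * 8 - i.warning * 3 - i.info * 1
  );
  // Assumption: a file with no detectable edges counts as fully covered.
  const edgeCoverage =
    i.totalEdges === 0 ? 100 : floor((i.testedEdges / i.totalEdges) * 100);
  const tqs = assertionStrength * 0.4 + violationPenalty * 0.3 + edgeCoverage * 0.3;
  return Math.min(100, Math.round(tqs)); // final score capped at 100
}

// Worked example: 20% weak, 10% hollow, 1 error + 2 warnings, 6 of 10 edges tested.
const score = testQualityScore({
  weakPct: 20, hollowPct: 10,
  critical: 0, error: 1, warning: 2, info: 0,
  testedEdges: 6, totalEdges: 10,
});
console.log(score); // 68
```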
| TQS | Verdict | Meaning |
|---|---|---|
| 80-100 | PROTECTED | Tests are doing their job. Ship with confidence. |
| 50-79 | AT_RISK | Tests exist but have significant blind spots. Bugs will get through. |
| 0-49 | EXPOSED | Green-bar theater. Tests pass but protect nothing. |
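Mapping a score to a verdict is then a threshold check (a sketch; defaults match the table above):

```typescript
type Verdict = "PROTECTED" | "AT_RISK" | "EXPOSED";

// Thresholds default to the published bands: 80+ and 50+.
function verdictFor(tqs: number, protectedAt = 80, atRiskAt = 50): Verdict {
  if (tqs >= protectedAt) return "PROTECTED";
  if (tqs >= atRiskAt) return "AT_RISK";
  return "EXPOSED";
}

console.log(verdictFor(92)); // "PROTECTED"
console.log(verdictFor(68)); // "AT_RISK"
console.log(verdictFor(35)); // "EXPOSED"
```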
Use Agent-Litmus when:
- An AI agent just wrote tests for your code
- You want to audit test quality before a release
- You're doing periodic test health checks
- You're reviewing a PR with new tests
- Your test suite is green but bugs keep shipping
Don't use Agent-Litmus for:
- Code without any tests (write tests first, then audit them)
- Test infrastructure setup (jest.config, conftest.py)
- Mocking library configuration
- E2E test orchestration (Playwright, Cypress config)
```shell
curl -fsSL https://raw.githubusercontent.com/saisumantatgit/Agent-Litmus/main/install.sh | bash
```

The installer auto-detects your CLI (Claude Code, Cursor, Codex, Aider) and installs the appropriate adapter.

```shell
git clone https://github.com/saisumantatgit/Agent-Litmus.git
cp -r Agent-Litmus/.claude/ .claude/
cp -r Agent-Litmus/.claude-plugin/ .claude-plugin/
cp -r Agent-Litmus/references/ references/
cp -r Agent-Litmus/templates/ templates/
```

| Platform | Adapter Location | Setup |
|---|---|---|
| Claude Code | `adapters/claude-code/` | Plugin + commands (native) |
| Cursor | `adapters/cursor/` | Rules file in `.cursor/rules/` |
| OpenAI Codex | `adapters/codex/` | `AGENTS.md` system prompt |
| Aider | `adapters/aider/` | `.aider.conf.yml` |
| Generic | `adapters/generic/` | Copy prompts from `prompts/` |
```shell
# 1. Install
curl -fsSL https://raw.githubusercontent.com/saisumantatgit/Agent-Litmus/main/install.sh | bash

# 2. Scan your weakest test file
/litmus-scan path/to/your.test.ts

# 3. See the violations and verdict (EFFECTIVE/WEAK/HOLLOW)

# 4. Fix the violations
/litmus-fix path/to/your.test.ts --auto

# 5. Get project-wide score
/litmus-report
```

Copy `templates/litmus-protocol.yaml` to your project root as `.litmus-protocol.yaml`:
```yaml
# Override violation severities
violations:
  HOLLOW_ASSERTION: error      # default: error
  WEAK_ASSERTION: warning      # default: warning
  NO_ASSERTION: critical       # default: critical
  DUPLICATE_TEST_LOGIC: off    # disable this check

# Test file discovery patterns
test_patterns:
  - "**/*.test.{ts,tsx,js,jsx}"
  - "**/test_*.py"

# TQS thresholds
tqs:
  protected: 80
  at_risk: 50

# Scoring weights (must sum to 1.0)
scoring:
  assertion_strength: 0.40
  violation_penalty: 0.30
  edge_coverage: 0.30

# Directories to ignore
ignore:
  - "node_modules/"
  - "vendor/"
  - ".git/"
```

See `templates/litmus-protocol.yaml` for all configuration options.
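Since the scoring weights must sum to 1.0, a config loader might validate them like this (a sketch; field names follow the YAML keys, the epsilon tolerance is an assumption):

```typescript
interface ScoringWeights {
  assertion_strength: number;
  violation_penalty: number;
  edge_coverage: number;
}

// Floating-point sums of 0.40 + 0.30 + 0.30 may not be exactly 1.0,
// so compare within a small tolerance instead of with ===.
function weightsAreValid(w: ScoringWeights, epsilon = 1e-9): boolean {
  const sum = w.assertion_strength + w.violation_penalty + w.edge_coverage;
  return Math.abs(sum - 1.0) < epsilon;
}

console.log(weightsAreValid({ assertion_strength: 0.4, violation_penalty: 0.3, edge_coverage: 0.3 })); // true
console.log(weightsAreValid({ assertion_strength: 0.5, violation_penalty: 0.3, edge_coverage: 0.3 })); // false
```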
Agent-Litmus works with any AI coding assistant that can read files and follow instructions.
| Platform | Support Level | Adapter |
|---|---|---|
| Claude Code | Native plugin | .claude-plugin/ + commands + skills + agents |
| Cursor | Rules integration | .cursor/rules/litmus.md |
| OpenAI Codex | Agent instructions | AGENTS.md |
| Aider | Config integration | .aider.conf.yml |
| Windsurf | Generic prompts | prompts/*.md |
| Cline | Generic prompts | prompts/*.md |
| Any LLM | Copy-paste prompts | prompts/*.md |
Agent-Litmus is one of six products in the Agent Suite for AI agent governance:
| Product | Tagline | Purpose |
|---|---|---|
| Agent-PROVE | "Prove it or it fails." | Thinking validation — structured reasoning frameworks |
| Agent-Trace | "See the ripple effect before it happens." | Blast radius mapping — impact analysis before changes |
| Agent-Drift | "Not on my watch." | Drift detection — catch when agents deviate from instructions |
| Agent-Litmus | "The test your tests have to pass." | Test quality governance — are tests protecting code? |
| Agent-Cite | "Cite it or it's opinion." | Evidence enforcement — require citations for claims |
| Agent-Scribe | "Nothing is lost." | Session governance — capture decisions and context |
Agent-Litmus was built from research across:
- Springer 2025 papers on auto-generated test quality and the 13 test smells unique to AI-generated tests
- Diffblue 2025 mutation testing benchmarks showing AI tests achieve only 71% mutation scores
- Stack Overflow Developer Survey 2025 reporting only 29% developer trust in AI accuracy
- CodeRabbit 2025 analysis showing 1.7x more issues in AI-generated code
The 12 violation types map directly to the documented failure modes of AI-generated tests. The assertion classification (STRONG/WEAK/HOLLOW) comes from mutation testing research: assertions that survive mutations are not protecting code.
See CONTRIBUTING.md for how to:
- Add new violation types
- Add assertion patterns for new frameworks
- Add CLI adapters
- Add edge case categories
MIT License. Copyright (c) 2026 Sai Sumanth Battepati.
See LICENSE for details.