Agent-Litmus

"The test your tests have to pass."


Test quality governance for AI agent workflows. 5 commands, 5 agents, 12 violation types, Test Quality Score 0-100.


The Problem

AI agents write tests that pass but don't protect.

Your agent generates a test suite. Every test is green. Coverage is 95%. You ship with confidence. Then production breaks — and the test suite never flinched.

This is not hypothetical. The research (summarized in the Origin section below) is clear:

Your test suite shows green. Your code ships bugs. The tests are theater.


The Solution

Agent-Litmus doesn't run your tests. It tests your tests.

Without Agent-Litmus

Agent writes tests -> Tests pass -> Green bar -> You trust it -> Ship it

Nobody checks if:

  • Assertions are meaningful (or just expect(result).toBeDefined())
  • Edge cases are covered (or just happy paths)
  • Tests would catch a regression (or just pass for any output)
  • "100% tests passing" actually means "100% protection"

It doesn't.

With Agent-Litmus

Agent writes tests -> /litmus-scan catches 12 violation types
                   -> /litmus-edge maps every edge case
                   -> /litmus-strength asks "would this catch a real bug?"
                   -> /litmus-fix generates concrete improvements
                   -> /litmus-report gives project-wide Test Quality Score

Verdicts are honest: EFFECTIVE / WEAK / HOLLOW — not just "tests pass."

The Analogy

| Concern | What You Tell the Agent | What Validates It |
|---|---|---|
| Testing | "Please write tests" | Jest / Pytest |
| Linting | "Please format nicely" | ESLint / Prettier |
| Evidence | "Please cite sources" | Agent-Cite |
| Drift | "Please follow instructions" | Agent-Drift |
| Test Quality | "Are these tests real?" | Agent-Litmus |

Without Agent-Litmus, green means nothing. With it, green means protected.


The 12 Violation Types

Agent-Litmus detects 12 violation types across 4 categories:

Category A — Assertion Weakness

| Type | Severity | What It Catches |
|---|---|---|
| HOLLOW_ASSERTION | error | `expect(result).toBeDefined()` — checks existence, not correctness. Passes for `{ error: true }` just as happily as `{ name: 'Alice' }`. |
| WEAK_ASSERTION | warning | `expect(result.length).toBeGreaterThan(0)` — confirms non-empty, but `["CORRUPTED"]` passes too. |
| NO_ASSERTION | critical | Test function calls code but never checks the result. Zero assertions. Smoke test at best. |
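The three Category A severities can all be seen on a single return value. A minimal sketch, assuming a hypothetical `load_users` that a regression has corrupted:

```python
def load_users():
    # Imagine a regression corrupted the payload.
    return ["CORRUPTED"]

result = load_users()

# NO_ASSERTION (critical): calling the code alone proves nothing.
load_users()

# HOLLOW_ASSERTION (error): checks existence, not correctness.
assert result is not None        # passes for ["CORRUPTED"]

# WEAK_ASSERTION (warning): confirms non-empty, content unchecked.
assert len(result) > 0           # still passes for ["CORRUPTED"]

# Effective assertion: pins the expected content; only this one notices.
effective_passes = (result == ["alice", "bob"])
print(effective_passes)          # False
```

Every weaker assertion stays green on corrupted data; only the effective one distinguishes right output from wrong.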

Category B — Test Design

| Type | Severity | What It Catches |
|---|---|---|
| IMPLEMENTATION_COUPLING | error | `jest.spyOn(service, '_validate')` — tests HOW the code works, not WHAT it produces. Breaks on refactor. |
| OVER_MOCKING | error | 8 mocks, 1 assertion. You're testing that JavaScript calls functions. Not that your logic works. |
| BRITTLE_SELECTOR | warning | `.css-1a2b3c`, `/html/body/div[3]/span` — breaks every build, trains devs to ignore failures. |
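A minimal Python sketch of IMPLEMENTATION_COUPLING, using a hypothetical `UserService` (not from this repo): the mocked test asserts how the code works and stays green even when invalid input slips through, while a behavioral test checks what it does.

```python
from unittest.mock import patch

class UserService:
    def _validate(self, email):              # private helper
        return "@" in email

    def register(self, email):
        if not self._validate(email):
            raise ValueError("invalid email")
        return {"email": email}

# Coupled test: asserts HOW register works (that it calls _validate).
with patch.object(UserService, "_validate", return_value=True) as spy:
    UserService().register("not-an-email")   # invalid input slips through
    spy.assert_called_once()                 # green anyway

# Behavioral test: asserts WHAT register does with invalid input.
try:
    UserService().register("not-an-email")
    rejected = False
except ValueError:
    rejected = True
print(rejected)                              # True
```

The coupled version also breaks the moment `_validate` is renamed or inlined, even though the observable behavior of `register` never changed.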

Category C — Coverage Gaps

| Type | Severity | What It Catches |
|---|---|---|
| MISSING_EDGE_CASE | warning | Source checks `if (!name)` but no test ever passes `null`. The guard is untested. |
| HAPPY_PATH_ONLY | warning* | Every test uses valid input. No error paths, no boundaries, no nulls. |

*Escalates to error if the source has error handling.
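A minimal sketch of MISSING_EDGE_CASE and HAPPY_PATH_ONLY together, assuming a hypothetical `greet` function: the source guards against a falsy name, but only the edge tests actually execute that branch.

```python
def greet(name):
    if not name:                 # guarded edge case in the source
        return "Hello, stranger"
    return f"Hello, {name}"

# HAPPY_PATH_ONLY: existing tests use only valid input...
assert greet("Alice") == "Hello, Alice"
assert greet("Bob") == "Hello, Bob"

# MISSING_EDGE_CASE: the `if not name` branch was never executed.
# Edge tests that close the gap:
assert greet(None) == "Hello, stranger"
assert greet("") == "Hello, stranger"
print("edge cases covered")
```

The happy-path tests can show 100% green while the guard branch has literally never run.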

Category D — Test Integrity

| Type | Severity | What It Catches |
|---|---|---|
| DUPLICATE_TEST_LOGIC | info | 3 tests: `add(1,1)`, `add(2,2)`, `add(3,3)`. Same code path, three times. Zero extra coverage. |
| TEST_PRIVATE_METHOD | warning | `service._validateEmail()` — testing internals that break on refactor. |
| HARDCODED_DEPENDENCY | warning | `/Users/dev/data.json`, `localhost:3000`, `new Date('2024-01-15')` — time bombs. |
| FLAKY_INDICATOR | warning | `setTimeout(2000)`, `Math.random()`, timing assertions — non-deterministic failures. |
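A minimal sketch of the HARDCODED_DEPENDENCY time bomb, with a hypothetical `days_until_renewal`: injecting the clock as a parameter keeps the test deterministic.

```python
from datetime import date

def days_until_renewal(renewal, today):
    # Accepting `today` as a parameter keeps tests deterministic.
    return (renewal - today).days

# Time bomb (flagged): a pinned date compared against the real clock
# passes today and silently goes stale:
#   assert days_until_renewal(date(2024, 1, 15), date.today()) == 14

# Deterministic version: the test controls both dates.
assert days_until_renewal(date(2024, 1, 15), date(2024, 1, 1)) == 14
print("deterministic")
```

The same injection pattern defuses most FLAKY_INDICATOR findings too: anything the test cannot control (clock, randomness, network) becomes a parameter or a fake.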

Commands

| Command | Purpose | Verdict |
|---|---|---|
| `/litmus-scan <test-file>` | Scan for 12 violation types, classify all assertions | EFFECTIVE / WEAK / HOLLOW |
| `/litmus-edge <source-file>` | Map all edge cases, check which are tested | COVERED / GAPS / EXPOSED |
| `/litmus-strength <test-file>` | Thought-experiment mutation testing | STRONG / MODERATE / THEATER |
| `/litmus-fix <test-file>` | Generate concrete improved test code | Before/After diffs |
| `/litmus-report [scope]` | Batch audit, project-wide Test Quality Score | PROTECTED / AT_RISK / EXPOSED |

Examples

# Scan a single test file
/litmus-scan src/utils/auth.test.ts

# Check edge case coverage for a source file
/litmus-edge src/utils/auth.ts

# Would these tests catch real bugs?
/litmus-strength src/utils/auth.test.ts

# Fix the violations
/litmus-fix src/utils/auth.test.ts --auto

# Project-wide assessment
/litmus-report --format summary

# Strict mode (warnings become errors)
/litmus-scan src/utils/auth.test.ts --strict

# Focus on assertion quality only
/litmus-scan src/utils/auth.test.ts --focus A

Test Quality Score (TQS)

A single number from 0-100 that answers: "How protected is this code?"

TQS = assertion_strength * 0.40 + violation_penalty * 0.30 + edge_coverage * 0.30

| Component (Weight) | Calculation |
|---|---|
| Assertion Strength (40%) | `100 - (weak% x 1) - (hollow% x 2)` |
| Violation Penalty (30%) | `100 - (critical x 15) - (error x 8) - (warning x 3) - (info x 1)` |
| Edge Coverage (30%) | `(tested edges / total edges) x 100` |

All component scores are floored at 0 (no negative values). The final TQS is capped at 100.

Verdicts

| TQS | Verdict | Meaning |
|---|---|---|
| 80-100 | PROTECTED | Tests are doing their job. Ship with confidence. |
| 50-79 | AT_RISK | Tests exist but have significant blind spots. Bugs will get through. |
| 0-49 | EXPOSED | Green-bar theater. Tests pass but protect nothing. |
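The formula and thresholds above can be sketched in a few lines of Python. This is a hedged reading of the published formula; the zero-total-edges behavior and integer rounding are assumptions, not specified by the docs.

```python
def tqs(weak_pct, hollow_pct, critical, error, warning, info,
        tested_edges, total_edges):
    # Each component is floored at 0; the final score is capped at 100.
    assertion_strength = max(0, 100 - weak_pct * 1 - hollow_pct * 2)
    violation_penalty = max(0, 100 - critical * 15 - error * 8
                            - warning * 3 - info * 1)
    # Assumption: no edges at all counts as 0% coverage.
    edge_coverage = (tested_edges / total_edges * 100) if total_edges else 0
    score = (assertion_strength * 0.40
             + violation_penalty * 0.30
             + edge_coverage * 0.30)
    return min(100, round(score))

def verdict(score):
    if score >= 80:
        return "PROTECTED"
    if score >= 50:
        return "AT_RISK"
    return "EXPOSED"

# Worked example: 20% weak and 10% hollow assertions, 1 error and
# 2 warnings, 6 of 10 edge cases tested.
score = tqs(20, 10, 0, 1, 2, 0, 6, 10)
print(score, verdict(score))   # 68 AT_RISK
```

Here assertion strength is 60, the violation penalty component is 86, and edge coverage is 60, giving 0.40*60 + 0.30*86 + 0.30*60 = 67.8, rounded to 68: AT_RISK.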

When to Use

Use Agent-Litmus when:

  • An AI agent just wrote tests for your code
  • You want to audit test quality before a release
  • You're doing periodic test health checks
  • You're reviewing a PR with new tests
  • Your test suite is green but bugs keep shipping

Don't use Agent-Litmus for:

  • Code without any tests (write tests first, then audit them)
  • Test infrastructure setup (jest.config, conftest.py)
  • Mocking library configuration
  • E2E test orchestration (Playwright, Cypress config)

Installation

Quick Install (Auto-Detect CLI)

curl -fsSL https://raw.githubusercontent.com/saisumantatgit/Agent-Litmus/main/install.sh | bash

The installer auto-detects your CLI (Claude Code, Cursor, Codex, Aider) and installs the appropriate adapter.

Manual Install (Claude Code)

git clone https://github.com/saisumantatgit/Agent-Litmus.git
cp -r Agent-Litmus/.claude/ .claude/
cp -r Agent-Litmus/.claude-plugin/ .claude-plugin/
cp -r Agent-Litmus/references/ references/
cp -r Agent-Litmus/templates/ templates/

Other Platforms

| Platform | Adapter Location | Setup |
|---|---|---|
| Claude Code | `adapters/claude-code/` | Plugin + commands (native) |
| Cursor | `adapters/cursor/` | Rules file in `.cursor/rules/` |
| OpenAI Codex | `adapters/codex/` | `AGENTS.md` system prompt |
| Aider | `adapters/aider/` | `.aider.conf.yml` |
| Generic | `adapters/generic/` | Copy prompts from `prompts/` |

Quick Start

# 1. Install
curl -fsSL https://raw.githubusercontent.com/saisumantatgit/Agent-Litmus/main/install.sh | bash

# 2. Scan your weakest test file
/litmus-scan path/to/your.test.ts

# 3. See the violations and verdict (EFFECTIVE/WEAK/HOLLOW)

# 4. Fix the violations
/litmus-fix path/to/your.test.ts --auto

# 5. Get project-wide score
/litmus-report

Configuration

Copy templates/litmus-protocol.yaml to your project root as .litmus-protocol.yaml:

# Override violation severities
violations:
  HOLLOW_ASSERTION: error        # default: error
  WEAK_ASSERTION: warning        # default: warning
  NO_ASSERTION: critical         # default: critical
  DUPLICATE_TEST_LOGIC: off      # disable this check

# Test file discovery patterns
test_patterns:
  - "**/*.test.{ts,tsx,js,jsx}"
  - "**/test_*.py"

# TQS thresholds
tqs:
  protected: 80
  at_risk: 50

# Scoring weights (must sum to 1.0)
scoring:
  assertion_strength: 0.40
  violation_penalty: 0.30
  edge_coverage: 0.30

# Directories to ignore
ignore:
  - "node_modules/"
  - "vendor/"
  - ".git/"

See templates/litmus-protocol.yaml for all configuration options.


Platform Support

Agent-Litmus works with any AI coding assistant that can read files and follow instructions.

| Platform | Support Level | Adapter |
|---|---|---|
| Claude Code | Native plugin | `.claude-plugin/` + commands + skills + agents |
| Cursor | Rules integration | `.cursor/rules/litmus.md` |
| OpenAI Codex | Agent instructions | `AGENTS.md` |
| Aider | Config integration | `.aider.conf.yml` |
| Windsurf | Generic prompts | `prompts/*.md` |
| Cline | Generic prompts | `prompts/*.md` |
| Any LLM | Copy-paste prompts | `prompts/*.md` |

Part of the Agent Suite

Agent-Litmus is one of six products in the Agent Suite for AI agent governance:

| Product | Tagline | Purpose |
|---|---|---|
| Agent-PROVE | "Prove it or it fails." | Thinking validation — structured reasoning frameworks |
| Agent-Trace | "See the ripple effect before it happens." | Blast radius mapping — impact analysis before changes |
| Agent-Drift | "Not on my watch." | Drift detection — catch when agents deviate from instructions |
| Agent-Litmus | "The test your tests have to pass." | Test quality governance — are tests protecting code? |
| Agent-Cite | "Cite it or it's opinion." | Evidence enforcement — require citations for claims |
| Agent-Scribe | "Nothing is lost." | Session governance — capture decisions and context |

Origin

Agent-Litmus was built from research across:

  • Springer 2025 papers on auto-generated test quality and the 13 test smells unique to AI-generated tests
  • Diffblue 2025 mutation testing benchmarks showing AI tests achieve only 71% mutation scores
  • Stack Overflow Developer Survey 2025 reporting only 29% developer trust in AI accuracy
  • CodeRabbit 2025 analysis showing 1.7x more issues in AI-generated code

The 12 violation types map directly to the documented failure modes of AI-generated tests. The assertion classification (STRONG/WEAK/HOLLOW) comes from mutation testing research: an assertion that still passes when the code under test is mutated is not protecting that code.


Contributing

See CONTRIBUTING.md for how to:

  • Add new violation types
  • Add assertion patterns for new frameworks
  • Add CLI adapters
  • Add edge case categories

License

MIT License. Copyright (c) 2026 Sai Sumanth Battepati.

See LICENSE for details.

