Principle: Find and fix defects at the earliest possible stage in the development lifecycle.
Cost Multiplier: Each stage a defect slips to the right multiplies the cost of fixing it by roughly 10x:
- Requirements → Design: 10x cost increase
- Design → Implementation: 10x cost increase
- Implementation → Testing: 10x cost increase
- Testing → Production: 10x cost increase
Compounded, a requirements defect that escapes all the way to production can cost on the order of 10,000x to fix.
Goal: Catch 80% of defects before code review, 95% before merge
Activity: Test Planning & Threat Modeling
Tools:
# Generate test plan from module specification
eos test plan generate \
--module oauth2-pkce-verifier \
--spec docs/PKCE_SPEC.md \
--output tests/test-plan-pkce.md
# Output includes:
# - Attack vectors to test
# - Security test cases (from OWASP ASVS)
# - Error scenarios
# - Edge cases
# - Estimated test count

Checklist:
- Security requirements defined (ASVS level identified)
- Attack vectors documented (STRIDE threat model)
- Test acceptance criteria written (BDD style)
- Test scaffolding generated
- Coverage target set (e.g., 85% for auth modules)
Human-Centric Practice:
- Collaborate with security expert to identify threats
- Use visual threat modeling (draw.io, Miro)
- Document "why" not just "what" to test
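One way to write the BDD-style acceptance criteria from the checklist above is to encode Given/When/Then directly in a test. A minimal sketch (the policy function and its name are assumptions, not the real module API):

```javascript
// Given: an authorization request using the "plain" code_challenge_method
// When: the PKCE policy evaluates it
// Then: it is rejected, because only S256 is acceptable under our policy
function challengeMethodAllowed(method) {
  return method === 'S256'; // assumed policy: "plain" is never acceptable
}

console.log(challengeMethodAllowed('plain')); // false
console.log(challengeMethodAllowed('S256'));  // true
```

Keeping the Given/When/Then as comments above each case preserves the "why", not just the "what".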
Activity: Test-Driven Development (TDD)
Red-Green-Refactor Cycle:
# 1. RED: Generate failing test from specification
eos test generate \
--module oauth2-pkce-verifier \
--function verifyPKCE \
--test-type security \
--template asvs-v4.0.3 \
--output tests/unit/oauth2-pkce-verifier.test.js
# Opens editor with failing test:
describe('PKCEValidator.verifyPKCE', () => {
it('should reject plain code challenge method per RFC 7636 §4.2', () => {
// TODO: Implement test
expect(true).toBe(false); // Fail first
});
});
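When you reach the GREEN step below, the minimum change to make this test pass could be as small as rejecting the plain method outright. A sketch (the real verifyPKCE signature is an assumption):

```javascript
// Minimum code to make the RED test pass: reject the "plain" method.
// Everything else is deferred to later red-green-refactor cycles.
function verifyPKCE({ codeChallengeMethod }) {
  if (codeChallengeMethod === 'plain') {
    return { valid: false, reason: 'plain_method_rejected' };
  }
  return { valid: true }; // placeholder until later cycles add real checks
}

console.log(verifyPKCE({ codeChallengeMethod: 'plain' }).valid); // false
```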
# 2. GREEN: Write minimum code to pass
# Developer implements verifyPKCE()
# 3. REFACTOR: Improve code while tests pass
eos test watch

Pre-Commit Hook (Automated):
#!/bin/bash
# .husky/pre-commit
echo "🔍 Running pre-commit validation..."
# 1. Run tests for changed files only
eos test run --changed --fast || {
echo "❌ Tests failed for changed files"
exit 1
}
# 2. Check coverage delta
eos test coverage --delta --threshold=+0% || {
echo "❌ Coverage decreased (not allowed)"
exit 1
}
# 3. Security linting
eos lint security --staged || {
echo "❌ Security issues found"
exit 1
}
# 4. Generate test if missing
eos test check-missing --auto-generate || {
echo "⚠️ Missing tests detected, generated scaffolds"
git add tests/
}
echo "✅ Pre-commit checks passed"

Benefits:
- Tests written before code (design-first)
- Impossible to forget tests
- Coverage guaranteed
Activity: Automated Test Quality Review
PR Checklist Bot:
# GitHub Action triggered on PR open
eos test review \
--pr ${{ github.event.pull_request.number }} \
--checks all
# Automated checks:
# ✅ Tests added for new functions
# ✅ Tests cover attack vectors (STRIDE analysis)
# ✅ Error handling tested (null, timeout, etc.)
# ✅ Edge cases covered
# ❌ Cryptographic failures not tested → COMMENT
# Posts comment on PR:

Sample PR Comment:
## 🤖 Test Quality Review
### ✅ Tests Added
- `oauth2-pkce-verifier.test.js` (12 tests, 89% coverage)
### ⚠️ Missing Tests
1. **Error Handling**: Cryptographic failures not tested
- `crypto.subtle.digest()` can throw
- Add test: `it('should handle crypto failure gracefully')`
2. **Edge Cases**: Large input not tested
- Code challenge can be up to 128 chars
- Add test: `it('should handle maximum length challenge')`
3. **Security Scenarios**: Replay attack not tested
- RFC 7636 requires one-time use
- Add test: `it('should detect code_challenge reuse')`
### 📊 Coverage Delta
- Overall: 45.2% → 47.8% (+2.6%) ✅
- oauth2-pkce-verifier.js: 0% → 89% ✅
### 🎯 Recommendation
**APPROVE** after adding 3 missing tests above.

Human Review Focus:
- Reviewer validates test scenarios, not just implementation
- Pair review for critical security modules
- Test readability and maintainability
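The "cryptographic failures not tested" finding in the sample comment above could be addressed with a test along these lines. A sketch; the injectable digest function is an assumption about the module's design:

```javascript
// Sketch: verifyPKCE should fail closed, not throw, when hashing fails.
// digestFn stands in for crypto.subtle.digest(), which can throw.
function verifyPKCE(verifier, challenge, digestFn) {
  try {
    return { valid: digestFn(verifier) === challenge };
  } catch (err) {
    // Fail closed: a crypto backend error must never validate a request
    return { valid: false, error: 'digest_failed' };
  }
}

const failingDigest = () => { throw new Error('crypto backend unavailable'); };
const result = verifyPKCE('verifier', 'challenge', failingDigest);
console.log(result.valid, result.error); // false digest_failed
```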
Activity: Comprehensive Automated Testing
Pipeline Stages:
# .github/workflows/test.yml (generated by eos)
stages:
- name: Fast Feedback (< 60s)
jobs:
- Lint (ESLint + Security plugins)
- Unit tests (changed files only)
- Type check
- name: Comprehensive Testing (< 5min)
jobs:
- All unit tests
- Integration tests
- Coverage validation (thresholds enforced)
- name: Security Gates (< 10min)
jobs:
- npm audit (blocking)
- SAST (CodeQL)
- Dependency scanning
- SBOM generation
- name: Quality Gates (< 5min)
jobs:
- Coverage >= 70% overall
- Coverage >= 85% for auth/ modules
- No flaky tests (3 consecutive runs)
- Performance regression check
# Any failure blocks merge

Branch Protection (Enforced):
eos branch-protection configure \
--branch main \
--require-reviews 1 \
--require-status-checks "Fast Feedback,Security Gates,Quality Gates" \
--require-up-to-date \
--enforce-admins
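The eos command above is presumably wrapping GitHub's branch protection REST API. Applied by hand, the equivalent request body for `PUT /repos/{owner}/{repo}/branches/main/protection` would look roughly like this sketch (check names taken from the command above):

```json
{
  "required_status_checks": {
    "strict": true,
    "contexts": ["Fast Feedback", "Security Gates", "Quality Gates"]
  },
  "enforce_admins": true,
  "required_pull_request_reviews": {
    "required_approving_review_count": 1
  },
  "restrictions": null
}
```

All four top-level keys are required by that endpoint, so `restrictions` must be sent explicitly (`null` means no push restrictions).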
# Result: Zero defects reach main branch

Activity: Runtime Testing & Observability
Synthetic Monitoring:
# Run real-world test scenarios against production
eos test synthetic \
--environment production \
--scenario oauth2-pkce-flow \
--frequency 15min \
--alert-on-failure
# Monitors:
# - End-to-end OAuth2 flows
# - PKCE validation accuracy
# - Performance (response time)
# - Error rates

Defect Feedback Loop:
# When production bug found:
eos test reproduce \
--bug HERA-1234 \
--create-test \
--add-to-suite regression
# Automatically:
# 1. Creates failing test from bug report
# 2. Adds to regression test suite
# 3. Prevents recurrence

Problem: Developers forget to write tests or don't know where to start
Solution: Auto-generate test files when new code created
# Triggered when new file created
# Git hook: post-checkout, post-merge
if [ -f "modules/auth/new-module.js" ]; then
eos test scaffold \
--file modules/auth/new-module.js \
--template security-module \
--output tests/unit/new-module.test.js
fi
# Generated test file includes:
# - Import statements
# - Describe blocks for each exported function
# - TODO test cases from JSDoc comments
# - Standard error handling tests
# - Edge case templates

Example Output (tests/unit/new-module.test.js):
import { describe, it, expect, beforeEach } from 'vitest';
import { NewModule } from '../../modules/auth/new-module.js';
describe('NewModule', () => {
let instance;
beforeEach(() => {
instance = new NewModule();
});
// Auto-generated from function signature
describe('validateInput()', () => {
it('should accept valid input', () => {
// TODO: Implement
expect(true).toBe(false);
});
it('should reject invalid input', () => {
// TODO: Implement
expect(true).toBe(false);
});
// Standard error tests
it('should handle null input', () => {
expect(() => instance.validateInput(null)).not.toThrow();
});
it('should handle undefined input', () => {
expect(() => instance.validateInput(undefined)).not.toThrow();
});
it('should handle empty string', () => {
const result = instance.validateInput('');
expect(result.valid).toBe(false);
});
});
});
// TODO: Run 'eos test complete' to fill in tests

Problem: Coverage silently decreases over time
Solution: Daily coverage analysis with alerts
# Cron job: Daily at 9am
eos test coverage-analysis \
--compare-to last-7-days \
--alert-on-decrease \
--notify slack://engineering-team
# Report:

Sample Report:
📊 Daily Coverage Report - 2024-11-07
🔴 Coverage Decreased:
- modules/auth/oauth2-analyzer.js: 78% → 72% (-6%)
Cause: 3 new functions added without tests
Assignee: @developer
Action: Add tests by EOD
🟢 Coverage Increased:
- modules/auth/token-redactor.js: 85% → 92% (+7%)
Good work: @another-developer
🎯 Overall: 45.2% (+0.3% from last week)
⚠️ Still 24.8% below target (70%)
📈 Trend: Positive (↑ 0.3%/week)
⏰ ETA to 70%: ~83 weeks at current rate (24.8% gap ÷ 0.3%/week)
💡 Recommendation: Dedicate 2 devs for 1 sprint to close gap
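ETAs like the one in the report above are easy to sanity-check: remaining gap divided by weekly gain. A sketch (illustrative numbers, not the project's):

```javascript
// Weeks until a coverage target is reached at a steady weekly gain.
function etaWeeks(currentPct, targetPct, weeklyGainPct) {
  return Math.ceil((targetPct - currentPct) / weeklyGainPct);
}

// e.g., 10 points of gap closing at 0.5 points/week:
console.log(etaWeeks(60, 70, 0.5)); // 20
```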
Problem: Tests exist but don't actually validate behavior
Solution: Automatically mutate code and verify tests catch mutations
# Run weekly (expensive operation)
eos test mutation \
--files "modules/auth/*.js" \
--threshold 80 \
--report html
# Stryker Mutator or similar:
# - Changes operators: === to !==
# - Removes conditionals: if (x) → if (true)
# - Changes constants: 128 → 127
# - Removes function calls
#
# Good tests should fail when the code is mutated

Example Findings:
🧬 Mutation Testing Report
✅ 45 mutants killed (90%)
❌ 5 mutants survived (10%)
Survived Mutants:
1. oauth2-pkce-verifier.js:45
- Mutant: Changed 128 to 127 (entropy threshold)
- Tests passed (should have failed!)
- Issue: No test validates exact threshold
- Fix: Add test: expect(entropy(127)).toBe(false)
2. csrf-verifier.js:89
- Mutant: Removed state replay check
- Tests passed (should have failed!)
- Issue: Replay attack not tested
- Fix: Add replay attack test
Action: Add 5 tests to kill surviving mutants
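The first surviving mutant above is a classic boundary gap: no test pins the exact 128 threshold, so changing it to 127 goes unnoticed. A killing test looks like this (a sketch; the function name and threshold semantics are assumptions):

```javascript
// Production logic under test (assumed): entropy must be >= 128 bits.
function meetsEntropyThreshold(bits) {
  return bits >= 128;
}

// Boundary tests that kill the 128 -> 127 mutant: if the constant moves
// by one in either direction, at least one of these checks fails.
console.log(meetsEntropyThreshold(128)); // true  (exactly at threshold)
console.log(meetsEntropyThreshold(127)); // false (just below)
```

Testing exactly at and just below the boundary is what makes the constant un-mutatable.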
Problem: Tests validate happy path, not attack vectors
Solution: Automatically generate attack scenario tests
# Using STRIDE threat model
eos test generate-attacks \
--module oauth2-pkce-verifier \
--threat-model STRIDE \
--output tests/security/pkce-attacks.test.js
# Generated attack tests:

Example Output:
describe('PKCEValidator - Attack Scenarios', () => {
// Spoofing
it('should reject authorization with spoofed code_challenge', () => {
// Attacker provides valid format but spoofed challenge
});
// Tampering
it('should detect code_challenge parameter modification', () => {
// Attacker intercepts and modifies challenge mid-flight
});
// Repudiation
it('should log PKCE validation attempts', () => {
// Verify evidence collection for audit trail
});
// Information Disclosure
it('should not leak code_verifier in error messages', () => {
// Error messages shouldn't expose secret verifier
});
// Denial of Service
it('should handle extremely long code_challenge', () => {
const longChallenge = 'A'.repeat(1000000);
expect(() => validate(longChallenge)).not.toThrow();
});
// Elevation of Privilege
it('should reject authorization code reuse', () => {
// Using same code twice should fail
});
});

Problem: Integration breaks when dependencies change
Solution: Consumer-driven contract tests
# Define expected behavior of Chrome APIs
eos test contract define \
--provider chrome.storage.local \
--consumer evidence-collector \
--output tests/contracts/chrome-storage.contract.js
# Contract enforced in tests:

Example Contract:
describe('Contract: chrome.storage.local', () => {
it('should respect quota limits', async () => {
// Contract: storage.local has a ~10MB total quota (QUOTA_BYTES);
// note QUOTA_BYTES_PER_ITEM applies to storage.sync, not local
const largeData = { data: 'x'.repeat(11 * 1024 * 1024) }; // 11MB
await expect(
chrome.storage.local.set(largeData)
).rejects.toThrow(/quota/i);
});
it('should maintain data consistency', async () => {
// Contract: set() followed by get() returns same data
const data = { key: 'value' };
await chrome.storage.local.set(data);
const retrieved = await chrome.storage.local.get('key');
expect(retrieved.key).toBe('value');
});
});
// If the Chrome API changes and breaks the contract, these tests fail immediately

Principle: No production code without a failing test first
Implementation:
# Git hook enforces test-first
# .husky/pre-push
# Check if new code added without tests
eos test enforce-test-first \
--commits-since origin/main \
--strict || {
echo "❌ New code without tests detected"
echo "Fix: Write tests first, then push"
exit 1
}

Team Agreement:
## Test-First Commit Pledge
I agree to:
1. Write failing test before implementing feature
2. Commit test file first, then implementation
3. Never push untested code to remote
4. Pair program when unsure how to test
Signed: [Developer Name]
Date: [Date]

Principle: Make testing progress visible to the entire team
Dashboard (web UI):
┌─────────────────────────────────────────────┐
│ Hera Test Dashboard - Live │
├─────────────────────────────────────────────┤
│ Coverage: ████████░░░░░░ 45.2% (Target: 70%)│
│ Tests: 284 passing ✅ | 0 failing │
│ Speed: 12.3s ⚡ (Target: <60s) │
│ Flaky: 2 🔥 (Investigate) │
├─────────────────────────────────────────────┤
│ Module Status: │
│ ✅ jwt-validator.js 95% ┃████████████│
│ ✅ oidc-validator.js 95% ┃████████████│
│ ❌ oauth2-pkce-verifier 0% ┃░░░░░░░░░░░░│
│ ❌ csrf-verifier 0% ┃░░░░░░░░░░░░│
├─────────────────────────────────────────────┤
│ Recent Activity: │
│ @dev1 added 12 tests to token-redactor ⬆️ │
│ @dev2 fixed flaky test in evidence-coll 🔧 │
│ CI passed for PR #45 ✅ │
└─────────────────────────────────────────────┘
Slack Integration:
Daily Standup Bot:
🌅 Good morning! Test status:
📊 Yesterday:
- 12 tests added
- Coverage: +2.3%
- 0 new flaky tests
🎯 Today's Focus:
- @dev1: Add PKCE tests (12 tests)
- @dev2: Fix flaky crypto test
- @dev3: Integration test for OAuth2 flow
🏆 Testing Champion: @dev1 (42 tests this week!)
Principle: Pair programming for critical security tests
Process:
1. Schedule 2-hour pairing session
2. Roles:
- Driver: Types the code
- Navigator: Reviews, suggests, researches
3. Rotation: Switch every 20 minutes
4. Output: High-quality tests with knowledge transfer
Pairing Matrix:
| Security Expert | Test Expert | Result |
|-----------------|-------------|---------------------------|
| @security-lead | @dev1 | PKCE validator tests |
| @security-lead | @dev2 | Session security tests |
| @test-lead | @dev3 | Error handling framework |
Principle: Learn from testing failures and successes
Monthly Retro Agenda:
1. Review Test Metrics (15 min)
- Coverage trend
- Flaky test count
- Test speed
- Bugs found by tests vs. production
2. Celebrate Wins (10 min)
- "Test of the month" award
- Coverage milestones reached
- Zero production bugs this month
3. Identify Challenges (15 min)
- What's hard to test?
- What tests are we avoiding?
- Where do bugs still escape?
4. Action Items (15 min)
- Improve test tooling
- Training needs
- Process tweaks
5. Test Kata (15 min)
- Live coding: solve test challenge together
- Learn new testing technique
Example Action Items:
From Nov 2024 Retro:
✅ ACTION: Create mutation testing suite
Owner: @test-lead
Due: 2024-11-30
✅ ACTION: Add crypto failure tests to all validators
Owner: @security-lead + @dev1
Due: 2024-11-15
✅ ACTION: Set up test dashboard (Grafana)
Owner: @devops-lead
Due: 2024-11-20
What to Measure:
{
"overall_coverage": {
"current": 45.2,
"target": 70,
"delta_week": +2.3,
"delta_month": +8.7,
"trend": "positive"
},
"module_coverage": {
"auth": 78.5, // High-risk modules
"security": 0, // CRITICAL GAP
"detection": 23.1,
"utils": 62.3
},
"eta_to_target": {
"weeks": 75,
"confidence": "low" // Current pace too slow
}
}

Visualization:
Coverage Trend (Last 90 Days)
70% ┃ ┌─ Target
┃ │
60% ┃ │
┃ ╱───│
50% ┃ ╱─── │
┃ ╱─── │
40% ┃ ╱─── │
┃ ╱─── │
30% ┃╱─── │
└─────────────────────────────
Aug Sep Oct Nov Dec Jan
Formula:
Quality Score = (
Coverage * 0.3 +
MutationScore * 0.3 +
(100 - FlakyRate) * 0.2 +
(100 - BugEscapeRate) * 0.2
)
Example:
Coverage: 45.2%
MutationScore: 85%
FlakyRate: 2%
BugEscapeRate: 10% (10% of bugs found in prod)
Quality = (45.2 * 0.3) + (85 * 0.3) + (98 * 0.2) + (90 * 0.2)
= 13.56 + 25.5 + 19.6 + 18
= 76.66%
Grade: C (70-80%)
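The worked example above can be reproduced with a small helper using the weights from the formula (a sketch):

```javascript
// Composite quality score with the weights defined in the formula above.
function qualityScore({ coverage, mutationScore, flakyRate, bugEscapeRate }) {
  return coverage * 0.3
       + mutationScore * 0.3
       + (100 - flakyRate) * 0.2
       + (100 - bugEscapeRate) * 0.2;
}

const score = qualityScore({
  coverage: 45.2, mutationScore: 85, flakyRate: 2, bugEscapeRate: 10,
});
console.log(score.toFixed(2)); // 76.66
```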
What to Measure:
Defect Detection Stage:
Before:
┌─────────────────────────────┐
│ Requirements │ 5% │
│ Implementation │ 10% │
│ Code Review │ 15% │
│ Testing (QA) │ 40% ← Most defects found late
│ Production │ 30% ← Expensive!
└─────────────────────────────┘
After Shift-Left:
┌─────────────────────────────┐
│ Requirements │ 20% ← Threat modeling
│ Implementation │ 45% ← TDD + pre-commit hooks
│ Code Review │ 25% ← Automated PR checks
│ Testing (QA) │ 8% ← Few escaped
│ Production │ 2% ← Rare
└─────────────────────────────┘
Success: 90% defects caught before QA
What to Measure:
{
"test_speed": {
"full_suite": "58s", // Target: <60s
"fast_feedback": "8s", // Target: <10s
"trend": "improving"
},
"flaky_tests": {
"count": 2, // Target: <1%
"percentage": 0.7,
"repeat_runners": 3 // Run 3x to catch flaky
},
"developer_satisfaction": {
"survey_score": 8.2, // 1-10 scale
"complaints_per_month": 3,
"test_tooling_rating": "Good"
}
}

Survey Questions (Monthly):
1. How confident are you that tests catch bugs? (1-10)
2. How often do tests block your work? (Never/Rarely/Sometimes/Often)
3. How easy is it to write new tests? (1-10)
4. What's the biggest testing pain point?
5. What testing tool/practice would help most?
Milestone 1:
- ✅ Coverage increased from 2.3% to 40%
- ✅ All Tier 1 security modules >= 80% coverage
- ✅ CI/CD blocks merges on security failures
- ✅ Pre-commit hooks running smoothly
- ✅ Zero production bugs from untested code
Milestone 2:
- ✅ Coverage >= 70% overall
- ✅ Mutation testing score >= 80%
- ✅ Flaky test rate < 1%
- ✅ Test suite runs in < 60 seconds
- ✅ Developers report high satisfaction with testing (>8/10)
Milestone 3:
- ✅ Coverage >= 85% overall
- ✅ Zero security vulnerabilities in production
- ✅ 95% of bugs caught before code review
- ✅ Testing culture embedded (TDD by default)
- ✅ Continuous improvement process established
1. Implement Pre-Commit Hooks (This week)
   - Install husky + lint-staged
   - Configure test-first enforcement
   - Document in CONTRIBUTING.md
2. Create Test Scaffolding (This week)
   - Build `eos test scaffold` command
   - Generate tests for existing untested modules
   - Train team on usage
3. Enable Branch Protection (This week)
   - Configure GitHub branch protection
   - Require status checks
   - Enforce code reviews
4. Start TDD Practice (Next sprint)
   - TDD workshop for team
   - Pair programming sessions
   - Celebrate early wins
5. Monitor and Adjust (Ongoing)
   - Weekly coverage review
   - Monthly retrospectives
   - Quarterly strategy adjustment
Remember: Shift-left is a journey, not a destination. Start small, measure progress, and continuously improve. 🚀