
SHIFT-LEFT TESTING STRATEGY

Preventing Quality Gaps Through Early Detection

Principle: Find and fix defects at the earliest possible stage in the development lifecycle.

Cost Multiplier: every stage a defect moves to the right multiplies its fix cost by roughly 10x

  • Requirements → Design: 10x cost increase
  • Design → Implementation: 10x cost increase
  • Implementation → Testing: 10x cost increase
  • Testing → Production: 10x cost increase

Goal: Catch 80% of defects before code review, 95% before merge
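The compounding rule above reduces to simple arithmetic: a defect that moves n stages to the right costs about 10^n times its requirements-stage fix cost. A sketch (the absolute numbers are illustrative; only the ratios matter):

```javascript
// Relative fix cost per stage under the 10x-per-stage rule above.
const stages = ['requirements', 'design', 'implementation', 'testing', 'production'];
const relativeCost = Object.fromEntries(
  stages.map((stage, i) => [stage, 10 ** i])
);
// relativeCost.production / relativeCost.requirements === 10000
```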


Part 1: Development Workflow Integration

Stage 1: Before Coding (Requirements Phase)

Activity: Test Planning & Threat Modeling

Tools:

# Generate test plan from module specification
eos test plan generate \
  --module oauth2-pkce-verifier \
  --spec docs/PKCE_SPEC.md \
  --output tests/test-plan-pkce.md

# Output includes:
# - Attack vectors to test
# - Security test cases (from OWASP ASVS)
# - Error scenarios
# - Edge cases
# - Estimated test count

Checklist:

  • Security requirements defined (ASVS level identified)
  • Attack vectors documented (STRIDE threat model)
  • Test acceptance criteria written (BDD style)
  • Test scaffolding generated
  • Coverage target set (e.g., 85% for auth modules)

Human-Centric Practice:

  • Collaborate with security expert to identify threats
  • Use visual threat modeling (draw.io, Miro)
  • Document "why" not just "what" to test

Stage 2: During Coding (Implementation Phase)

Activity: Test-Driven Development (TDD)

Red-Green-Refactor Cycle:

# 1. RED: Generate failing test from specification
eos test generate \
  --module oauth2-pkce-verifier \
  --function verifyPKCE \
  --test-type security \
  --template asvs-v4.0.3 \
  --output tests/unit/oauth2-pkce-verifier.test.js

# Opens editor with failing test:
describe('PKCEValidator.verifyPKCE', () => {
  it('should reject plain code challenge method per RFC 7636 §4.2', () => {
    // TODO: Implement test
    expect(true).toBe(false); // Fail first
  });
});

# 2. GREEN: Write minimum code to pass
# Developer implements verifyPKCE()

# 3. REFACTOR: Improve code while tests pass
eos test watch

Pre-Commit Hook (Automated):

#!/bin/bash
# .husky/pre-commit

echo "🔍 Running pre-commit validation..."

# 1. Run tests for changed files only
eos test run --changed --fast || {
  echo "❌ Tests failed for changed files"
  exit 1
}

# 2. Check coverage delta
eos test coverage --delta --threshold=+0% || {
  echo "❌ Coverage decreased (not allowed)"
  exit 1
}

# 3. Security linting
eos lint security --staged || {
  echo "❌ Security issues found"
  exit 1
}

# 4. Generate test if missing
eos test check-missing --auto-generate || {
  echo "⚠️  Missing tests detected, generated scaffolds"
  git add tests/
}

echo "✅ Pre-commit checks passed"
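The coverage ratchet in step 2 reduces to a one-line rule: the delta must be at least the threshold (here +0%, so coverage may never drop). A sketch, with the function name assumed:

```javascript
// Coverage ratchet: pass only if coverage stayed flat or rose.
// minDeltaPct mirrors the --threshold=+0% flag above.
function coverageGateOk(beforePct, afterPct, minDeltaPct = 0) {
  return (afterPct - beforePct) >= minDeltaPct;
}
```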

Benefits:

  • Tests written before code (design-first)
  • Impossible to forget tests
  • Coverage guaranteed

Stage 3: Code Review (Review Phase)

Activity: Automated Test Quality Review

PR Checklist Bot:

# GitHub Action triggered on PR open
eos test review \
  --pr ${{ github.event.pull_request.number }} \
  --checks all

# Automated checks:
# ✅ Tests added for new functions
# ✅ Tests cover attack vectors (STRIDE analysis)
# ✅ Error handling tested (null, timeout, etc.)
# ✅ Edge cases covered
# ❌ Cryptographic failures not tested → COMMENT

# Posts comment on PR:

Sample PR Comment:

## 🤖 Test Quality Review

### ✅ Tests Added
- `oauth2-pkce-verifier.test.js` (12 tests, 89% coverage)

### ⚠️ Missing Tests
1. **Error Handling**: Cryptographic failures not tested
   - `crypto.subtle.digest()` can throw
   - Add test: `it('should handle crypto failure gracefully')`

2. **Edge Cases**: Large input not tested
   - Code challenge can be up to 128 chars
   - Add test: `it('should handle maximum length challenge')`

3. **Security Scenarios**: Replay attack not tested
   - RFC 7636 requires one-time use
   - Add test: `it('should detect code_challenge reuse')`

### 📊 Coverage Delta
- Overall: 45.2% → 47.8% (+2.6%) ✅
- oauth2-pkce-verifier.js: 0% → 89% ✅

### 🎯 Recommendation
**APPROVE** after adding 3 missing tests above.

Human Review Focus:

  • Reviewer validates test scenarios, not just implementation
  • Pair review for critical security modules
  • Test readability and maintainability

Stage 4: CI/CD (Integration Phase)

Activity: Comprehensive Automated Testing

Pipeline Stages:

# .github/workflows/test.yml (generated by eos)

stages:
  - name: Fast Feedback (< 60s)
    jobs:
      - Lint (ESLint + Security plugins)
      - Unit tests (changed files only)
      - Type check

  - name: Comprehensive Testing (< 5min)
    jobs:
      - All unit tests
      - Integration tests
      - Coverage validation (thresholds enforced)

  - name: Security Gates (< 10min)
    jobs:
      - npm audit (blocking)
      - SAST (CodeQL)
      - Dependency scanning
      - SBOM generation

  - name: Quality Gates (< 5min)
    jobs:
      - Coverage >= 70% overall
      - Coverage >= 85% for auth/ modules
      - No flaky tests (3 consecutive runs)
      - Performance regression check

# Any failure blocks merge

Branch Protection (Enforced):

eos branch-protection configure \
  --branch main \
  --require-reviews 1 \
  --require-status-checks "Fast Feedback,Security Gates,Quality Gates" \
  --require-up-to-date \
  --enforce-admins

# Result: Zero defects reach main branch

Stage 5: Production (Monitoring Phase)

Activity: Runtime Testing & Observability

Synthetic Monitoring:

# Run real-world test scenarios against production
eos test synthetic \
  --environment production \
  --scenario oauth2-pkce-flow \
  --frequency 15min \
  --alert-on-failure

# Monitors:
# - End-to-end OAuth2 flows
# - PKCE validation accuracy
# - Performance (response time)
# - Error rates

Defect Feedback Loop:

# When production bug found:
eos test reproduce \
  --bug HERA-1234 \
  --create-test \
  --add-to-suite regression

# Automatically:
# 1. Creates failing test from bug report
# 2. Adds to regression test suite
# 3. Prevents recurrence

Part 2: Preventive Automation

Prevention 1: Test Scaffolding Generator

Problem: Developers forget to write tests or don't know where to start

Solution: Auto-generate test files when new code created

# Triggered when new file created
# Git hook: post-checkout, post-merge

if [ -f "modules/auth/new-module.js" ]; then
  eos test scaffold \
    --file modules/auth/new-module.js \
    --template security-module \
    --output tests/unit/new-module.test.js
fi

# Generated test file includes:
# - Import statements
# - Describe blocks for each exported function
# - TODO test cases from JSDoc comments
# - Standard error handling tests
# - Edge case templates

Example Output (tests/unit/new-module.test.js):

import { describe, it, expect, beforeEach } from 'vitest';
import { NewModule } from '../../modules/auth/new-module.js';

describe('NewModule', () => {
  let instance;

  beforeEach(() => {
    instance = new NewModule();
  });

  // Auto-generated from function signature
  describe('validateInput()', () => {
    it('should accept valid input', () => {
      // TODO: Implement
      expect(true).toBe(false);
    });

    it('should reject invalid input', () => {
      // TODO: Implement
      expect(true).toBe(false);
    });

    // Standard error tests
    it('should handle null input', () => {
      expect(() => instance.validateInput(null)).not.toThrow();
    });

    it('should handle undefined input', () => {
      expect(() => instance.validateInput(undefined)).not.toThrow();
    });

    it('should handle empty string', () => {
      const result = instance.validateInput('');
      expect(result.valid).toBe(false);
    });
  });
});

// TODO: Run 'eos test complete' to fill in tests

Prevention 2: Coverage Gap Detector

Problem: Coverage silently decreases over time

Solution: Daily coverage analysis with alerts

# Cron job: Daily at 9am
eos test coverage-analysis \
  --compare-to last-7-days \
  --alert-on-decrease \
  --notify slack://engineering-team

# Report:

Sample Report:

📊 Daily Coverage Report - 2024-11-07

🔴 Coverage Decreased:
- modules/auth/oauth2-analyzer.js: 78% → 72% (-6%)
  Cause: 3 new functions added without tests
  Assignee: @developer
  Action: Add tests by EOD

🟢 Coverage Increased:
- modules/auth/token-redactor.js: 85% → 92% (+7%)
  Good work: @another-developer

🎯 Overall: 45.2% (+0.3% from last week)
⚠️  Still 24.8% below target (70%)

📈 Trend: Positive (↑ 0.3%/week)
⏰ ETA to 70%: ~83 weeks at current rate
💡 Recommendation: Dedicate 2 devs for 1 sprint to close gap
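The ETA line in such a report is simple arithmetic: remaining gap divided by weekly trend. A sketch (function name assumed):

```javascript
// Weeks until the coverage target at the current weekly pace.
function etaWeeks(currentPct, targetPct, weeklyDeltaPct) {
  if (weeklyDeltaPct <= 0) return Infinity; // not converging
  return Math.ceil((targetPct - currentPct) / weeklyDeltaPct);
}
// etaWeeks(45.2, 70, 0.3) → 83 weeks at +0.3%/week
```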

Prevention 3: Mutation Testing

Problem: Tests exist but don't actually validate behavior

Solution: Automatically mutate code and verify tests catch mutations

# Run weekly (expensive operation)
eos test mutation \
  --files "modules/auth/*.js" \
  --threshold 80 \
  --report html

# Stryker Mutator or similar:
# - Changes operators: === to !==
# - Removes conditionals: if (x) → if (true)
# - Changes constants: 128 → 127
# - Removes function calls
#
# Good tests should fail when code mutated

Example Findings:

🧬 Mutation Testing Report

✅ 45 mutants killed (90%)
❌ 5 mutants survived (10%)

Survived Mutants:

1. oauth2-pkce-verifier.js:45
   - Mutant: Changed 128 to 127 (entropy threshold)
   - Tests passed (should have failed!)
   - Issue: No test validates exact threshold
   - Fix: Add test: expect(entropy(127)).toBe(false)

2. csrf-verifier.js:89
   - Mutant: Removed state replay check
   - Tests passed (should have failed!)
   - Issue: Replay attack not tested
   - Fix: Add replay attack test

Action: Add 5 tests to kill surviving mutants
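Surviving mutant #1 is killed by pinning both sides of the boundary. A sketch of the kind of check involved and the test pair that kills an off-by-one mutant (names and threshold semantics assumed for illustration):

```javascript
// Hypothetical threshold check that mutation testing targeted: a 128→127
// mutant survives unless some test pins the exact boundary.
function meetsEntropyThreshold(bits) {
  return bits >= 128;
}
// Boundary pair: 128 passes, 127 fails. If the constant mutates to 127,
// meetsEntropyThreshold(127) becomes true and the second test fails.
```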

Prevention 4: Attack Simulation

Problem: Tests validate happy path, not attack vectors

Solution: Automatically generate attack scenario tests

# Using STRIDE threat model
eos test generate-attacks \
  --module oauth2-pkce-verifier \
  --threat-model STRIDE \
  --output tests/security/pkce-attacks.test.js

# Generated attack tests:

Example Output:

describe('PKCEValidator - Attack Scenarios', () => {
  // Spoofing
  it('should reject authorization with spoofed code_challenge', () => {
    // Attacker provides valid format but spoofed challenge
  });

  // Tampering
  it('should detect code_challenge parameter modification', () => {
    // Attacker intercepts and modifies challenge mid-flight
  });

  // Repudiation
  it('should log PKCE validation attempts', () => {
    // Verify evidence collection for audit trail
  });

  // Information Disclosure
  it('should not leak code_verifier in error messages', () => {
    // Error messages shouldn't expose secret verifier
  });

  // Denial of Service
  it('should handle extremely long code_challenge', () => {
    const longChallenge = 'A'.repeat(1000000);
    expect(() => validate(longChallenge)).not.toThrow();
  });

  // Elevation of Privilege
  it('should reject authorization code reuse', () => {
    // Using same code twice should fail
  });
});
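The replay and code-reuse scenarios both reduce to a one-time-use rule. A minimal in-memory sketch (a real authorization server would persist this state; names are assumptions):

```javascript
// One-time-use authorization codes: a second redemption must fail.
const usedCodes = new Set();

function redeemAuthorizationCode(code) {
  if (usedCodes.has(code)) {
    return { ok: false, error: 'invalid_grant' }; // replay detected
  }
  usedCodes.add(code);
  return { ok: true };
}
```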

Prevention 5: Contract Testing

Problem: Integration breaks when dependencies change

Solution: Consumer-driven contract tests

# Define expected behavior of Chrome APIs
eos test contract define \
  --provider chrome.storage.local \
  --consumer evidence-collector \
  --output tests/contracts/chrome-storage.contract.js

# Contract enforced in tests:

Example Contract:

describe('Contract: chrome.storage.local', () => {
  it('should respect quota limits', async () => {
    // Contract: storage.local has 10MB quota
    const largeData = { data: 'x'.repeat(11 * 1024 * 1024) }; // 11MB

    await expect(
      chrome.storage.local.set(largeData)
    ).rejects.toThrow('QUOTA_BYTES'); // storage.local quota (QUOTA_BYTES_PER_ITEM applies to storage.sync)
  });

  it('should maintain data consistency', async () => {
    // Contract: set() followed by get() returns same data
    const data = { key: 'value' };
    await chrome.storage.local.set(data);
    const retrieved = await chrome.storage.local.get('key');
    expect(retrieved.key).toBe('value');
  });
});

// If Chrome API changes and breaks contract, tests fail immediately

Part 3: Cultural Practices

Practice 1: Test-First Mindset

Principle: No production code without a failing test first

Implementation:

# Git hook enforces test-first
# .husky/pre-push

# Check if new code added without tests
eos test enforce-test-first \
  --commits-since origin/main \
  --strict || {
  echo "❌ New code without tests detected"
  echo "Fix: Write tests first, then push"
  exit 1
}
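Under the hood, such an enforcement can be a path-convention check: every changed source file must ship with a matching test file in the same changeset. A sketch assuming this repo's modules/ and tests/ layout:

```javascript
// Return source files in the changeset that lack a matching *.test.js.
// The modules/ → tests/ path convention is an assumption for illustration.
function missingTests(changedFiles) {
  const sources = changedFiles.filter(
    f => f.startsWith('modules/') && f.endsWith('.js')
  );
  const testFiles = changedFiles.filter(f => f.startsWith('tests/'));
  return sources.filter(src => {
    const base = src.split('/').pop().replace(/\.js$/, '');
    return !testFiles.some(t => t.endsWith(`/${base}.test.js`));
  });
}
```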

Team Agreement:

## Test-First Commit Pledge

I agree to:
1. Write failing test before implementing feature
2. Commit test file first, then implementation
3. Never push untested code to remote
4. Pair program when unsure how to test

Signed: [Developer Name]
Date: [Date]

Practice 2: Test Visibility

Principle: Make testing progress visible to entire team

Dashboard (web UI):

┌─────────────────────────────────────────────┐
│  Hera Test Dashboard - Live                 │
├─────────────────────────────────────────────┤
│  Coverage: ████████░░░░░░ 45.2% (Target: 70%)│
│  Tests: 284 passing ✅ | 0 failing           │
│  Speed: 12.3s ⚡ (Target: <60s)              │
│  Flaky: 2 🔥 (Investigate)                  │
├─────────────────────────────────────────────┤
│  Module Status:                              │
│  ✅ jwt-validator.js      95% ┃████████████│
│  ✅ oidc-validator.js     95% ┃████████████│
│  ❌ oauth2-pkce-verifier  0%  ┃░░░░░░░░░░░░│
│  ❌ csrf-verifier         0%  ┃░░░░░░░░░░░░│
├─────────────────────────────────────────────┤
│  Recent Activity:                            │
│  @dev1 added 12 tests to token-redactor ⬆️   │
│  @dev2 fixed flaky test in evidence-coll 🔧 │
│  CI passed for PR #45 ✅                     │
└─────────────────────────────────────────────┘

Slack Integration:

Daily Standup Bot:
🌅 Good morning! Test status:

📊 Yesterday:
- 12 tests added
- Coverage: +2.3%
- 0 new flaky tests

🎯 Today's Focus:
- @dev1: Add PKCE tests (12 tests)
- @dev2: Fix flaky crypto test
- @dev3: Integration test for OAuth2 flow

🏆 Testing Champion: @dev1 (42 tests this week!)

Practice 3: Test Pairing

Principle: Pair programming for critical security tests

Process:

1. Schedule 2-hour pairing session
2. Roles:
   - Driver: Types the code
   - Navigator: Reviews, suggests, researches

3. Rotation: Switch every 20 minutes

4. Output: High-quality tests with knowledge transfer

Pairing Matrix:

| Security Expert | Test Expert | Result                    |
|-----------------|-------------|---------------------------|
| @security-lead  | @dev1       | PKCE validator tests      |
| @security-lead  | @dev2       | Session security tests    |
| @test-lead      | @dev3       | Error handling framework  |

Practice 4: Test Retrospectives

Principle: Learn from testing failures and successes

Monthly Retro Agenda:

1. Review Test Metrics (15 min)
   - Coverage trend
   - Flaky test count
   - Test speed
   - Bugs found by tests vs. production

2. Celebrate Wins (10 min)
   - "Test of the month" award
   - Coverage milestones reached
   - Zero production bugs this month

3. Identify Challenges (15 min)
   - What's hard to test?
   - What tests are we avoiding?
   - Where do bugs still escape?

4. Action Items (15 min)
   - Improve test tooling
   - Training needs
   - Process tweaks

5. Test Kata (15 min)
   - Live coding: solve test challenge together
   - Learn new testing technique

Example Action Items:

From Nov 2024 Retro:

✅ ACTION: Create mutation testing suite
   Owner: @test-lead
   Due: 2024-11-30

✅ ACTION: Add crypto failure tests to all validators
   Owner: @security-lead + @dev1
   Due: 2024-11-15

✅ ACTION: Set up test dashboard (Grafana)
   Owner: @devops-lead
   Due: 2024-11-20

Part 4: Metrics & Monitoring

Metric 1: Test Coverage Trends

What to Measure:

{
  "overall_coverage": {
    "current": 45.2,
    "target": 70,
    "delta_week": +2.3,
    "delta_month": +8.7,
    "trend": "positive"
  },
  "module_coverage": {
    "auth": 78.5,     // High-risk modules
    "security": 0,     // CRITICAL GAP
    "detection": 23.1,
    "utils": 62.3
  },
  "eta_to_target": {
    "weeks": 75,
    "confidence": "low"  // Current pace too slow
  }
}

Visualization:

Coverage Trend (Last 90 Days)
70% ┃                        ┌─ Target
    ┃                        │
60% ┃                        │
    ┃                    ╱───│
50% ┃                ╱───    │
    ┃            ╱───        │
40% ┃        ╱───            │
    ┃    ╱───                │
30% ┃╱───                    │
    └─────────────────────────────
    Aug  Sep  Oct  Nov  Dec  Jan

Metric 2: Test Quality Score

Formula:

Quality Score = (
  Coverage * 0.3 +
  MutationScore * 0.3 +
  (100 - FlakyRate) * 0.2 +
  (100 - BugEscapeRate) * 0.2
)

Example:
Coverage:       45.2%
MutationScore:  85%
FlakyRate:      2%
BugEscapeRate:  10% (10% of bugs found in prod)

Quality = (45.2 * 0.3) + (85 * 0.3) + (98 * 0.2) + (90 * 0.2)
        = 13.56 + 25.5 + 19.6 + 18
        = 76.66%

Grade: C (70-80%)
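The weighted score is straightforward to encode, which keeps the grading reproducible. A sketch using the weights from the formula above:

```javascript
// Test Quality Score per the formula above (all inputs are percentages).
function qualityScore({ coverage, mutationScore, flakyRate, bugEscapeRate }) {
  return coverage * 0.3
    + mutationScore * 0.3
    + (100 - flakyRate) * 0.2
    + (100 - bugEscapeRate) * 0.2;
}
// qualityScore({ coverage: 45.2, mutationScore: 85, flakyRate: 2, bugEscapeRate: 10 })
// ≈ 76.66 → grade C
```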

Metric 3: Shift-Left Progress

What to Measure:

Defect Detection Stage:

Before:
┌─────────────────────────────┐
│ Requirements    │ 5%        │
│ Implementation  │ 10%       │
│ Code Review     │ 15%       │
│ Testing (QA)    │ 40%  ← Most defects found late
│ Production      │ 30%  ← Expensive!
└─────────────────────────────┘

After Shift-Left:
┌─────────────────────────────┐
│ Requirements    │ 20%  ← Threat modeling
│ Implementation  │ 45%  ← TDD + pre-commit hooks
│ Code Review     │ 25%  ← Automated PR checks
│ Testing (QA)    │ 8%   ← Few escaped
│ Production      │ 2%   ← Rare
└─────────────────────────────┘

Success: 90% defects caught before QA

Metric 4: Developer Experience

What to Measure:

{
  "test_speed": {
    "full_suite": "58s",   // Target: <60s
    "fast_feedback": "8s",  // Target: <10s
    "trend": "improving"
  },
  "flaky_tests": {
    "count": 2,            // Target: <1%
    "percentage": 0.7,
    "repeat_runners": 3    // Run 3x to catch flaky
  },
  "developer_satisfaction": {
    "survey_score": 8.2,   // 1-10 scale
    "complaints_per_month": 3,
    "test_tooling_rating": "Good"
  }
}
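The repeat_runners setting implies a simple flakiness rule: a test is flaky when repeated runs of the same code disagree. A sketch (names assumed):

```javascript
// A test is flaky if identical repeated runs produce mixed outcomes.
function isFlaky(runResults) {
  return new Set(runResults).size > 1;
}

// Flaky rate over a suite: flaky tests / total tests, as a percentage.
function flakyRatePct(resultsByTest) {
  const tests = Object.values(resultsByTest);
  return (tests.filter(isFlaky).length / tests.length) * 100;
}
```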

Survey Questions (Monthly):

1. How confident are you that tests catch bugs? (1-10)
2. How often do tests block your work? (Never/Rarely/Sometimes/Often)
3. How easy is it to write new tests? (1-10)
4. What's the biggest testing pain point?
5. What testing tool/practice would help most?

Part 5: Success Criteria

Short-Term Success (Month 1)

  • ✅ Coverage increased from 2.3% to 40%
  • ✅ All Tier 1 security modules >= 80% coverage
  • ✅ CI/CD blocks merges on security failures
  • ✅ Pre-commit hooks running smoothly
  • ✅ Zero production bugs from untested code

Medium-Term Success (Months 2-3)

  • ✅ Coverage >= 70% overall
  • ✅ Mutation testing score >= 80%
  • ✅ Flaky test rate < 1%
  • ✅ Test suite runs in < 60 seconds
  • ✅ Developers report high satisfaction with testing (>8/10)

Long-Term Success (Months 4-6)

  • ✅ Coverage >= 85% overall
  • ✅ Zero security vulnerabilities in production
  • ✅ 95% of bugs caught before code review
  • ✅ Testing culture embedded (TDD by default)
  • ✅ Continuous improvement process established


Next Steps

  1. Implement Pre-Commit Hooks (This week)

    • Install husky + lint-staged
    • Configure test-first enforcement
    • Document in CONTRIBUTING.md
  2. Create Test Scaffolding (This week)

    • Build eos test scaffold command
    • Generate tests for existing untested modules
    • Train team on usage
  3. Enable Branch Protection (This week)

    • Configure GitHub branch protection
    • Require status checks
    • Enforce code reviews
  4. Start TDD Practice (Next sprint)

    • TDD workshop for team
    • Pair programming sessions
    • Celebrate early wins
  5. Monitor and Adjust (Ongoing)

    • Weekly coverage review
    • Monthly retrospectives
    • Quarterly strategy adjustment

Remember: Shift-left is a journey, not a destination. Start small, measure progress, and continuously improve. 🚀