Skip to content

Latest commit

 

History

History
248 lines (198 loc) · 8.44 KB

File metadata and controls

248 lines (198 loc) · 8.44 KB

SLO Agent Feature - PR Summary

Overview

This PR implements a complete SLO/Observability Agent for greybeard that analyzes code and recommends appropriate Service Level Objective (SLO) targets based on service type, architecture, and code patterns.

What's Included

1. Core Implementation: greybeard/agents/slo_agent.py (18 KB)

  • SLOAgent class: Main analysis engine
  • ServiceType enum: Classification (SaaS, batch, critical-infra, background-jobs)
  • SLOTarget dataclass: Individual target representation with rationale
  • SLORecommendation dataclass: Full recommendation with targets, signals, confidence

Key Methods:

  • analyze() - Main entry point for analysis
  • _detect_service_type() - Auto-detect from code patterns
  • _analyze_code() - Pattern detection (8 reliability indicators)
  • _analyze_repo() - Repository context (tests, Docker, K8s, monitoring)
  • _targets_*() - Generate SLO targets per service type
  • _generate_notes() - Contextual recommendations

2. CLI Integration: greybeard/cli_slo.py (5.2 KB)

  • slo-check command: Full CLI interface with options
  • Output formats: JSON, Markdown, Table (ASCII)
  • Context flags: Support for service-type, service-name, and custom metadata
  • Input sources: stdin, file, or repository

Usage Examples:

git diff main | greybeard slo-check
greybeard slo-check --context "service-type:saas"
greybeard slo-check --repo /path/to/api --output json
cat service.py | greybeard slo-check --context "service-type:batch"

3. CLI Module Update: greybeard/cli.py

  • Imported SLOAgent and slo_check command
  • Registered slo-check with CLI group via cli.add_command()

4. Agents Module Update: greybeard/agents/__init__.py

  • Exported SLOAgent, SLORecommendation, SLOTarget, ServiceType for public API

5. Content Packs: greybeard/packs/slo-patterns/ (4 YAML files)

Domain-specific guidance packs for each service type:

  • saas.yaml - User-facing SaaS services perspective
  • batch.yaml - Batch job and scheduler guidance
  • critical-infra.yaml - Platform/gateway/auth services
  • background-jobs.yaml - Async workers and task queues

Each pack includes:

  • Detailed perspective and tone
  • Focus areas for that service type
  • SLO-relevant heuristics (5-10 questions each)
  • Example questions to explore
  • Communication style guidelines

6. Comprehensive Test Suite: tests/test_slo_agent.py (37 tests)

Test Coverage by Category:

  • Basic Functionality (5 tests) - Initialization, analysis, serialization
  • Service Type Detection (5 tests) - SaaS, batch, critical-infra, background-jobs, explicit override
  • SLO Target Generation (4 tests) - Proper targets for each service type
  • Code Analysis (8 tests) - Database calls, HTTP, caching, retries, error handling, async, monitoring, timeouts
  • Repository Analysis (4 tests) - Tests, Docker, Kubernetes, file counting
  • Context Integration (3 tests) - Service name, explicit type, multiple flags
  • Recommendations (4 tests) - Notes for missing patterns, well-formed code
  • Confidence Scoring (2 tests) - Range validation, type-specific scoring
  • CLI Integration (1 test) - Command registration
  • Serialization (1 test) - JSON round-trip

Test Results: ✅ 37/37 passing (100%)
Coverage: 93.42% on agent code

7. Documentation: SLO_AGENT.md

Comprehensive guide including:

  • Feature overview and use cases
  • Service type explanations with typical targets
  • CLI usage examples for all formats
  • Code pattern detection reference table
  • SLO targets table for each service type
  • Output examples (JSON, markdown, table)
  • Recommendations and guidance notes
  • Integration with greybeard packs
  • Python API usage examples
  • Testing documentation
  • Architecture overview
  • Future enhancement ideas

Key Features

1. Code Pattern Detection

Analyzes code for 8 reliability-relevant patterns:

  • Database calls → Latency, availability concerns
  • HTTP requests → Timeouts, retries needed
  • Caching usage → Optimization level
  • Retry logic → Failure resilience
  • Error handling → Exception safety
  • Async/await → Concurrency model
  • Monitoring → Observability level
  • Timeouts → Failure isolation

2. Service Type Classification

  • SaaS: 99.9% availability, <200ms p99 latency, <0.1% error rate
  • Batch: 95% availability, <1 hour duration, <5% error rate
  • Critical Infrastructure: 99.95% availability, <50ms p99 latency, <0.01% error rate
  • Background Jobs: 98% availability, <5 min latency, <1% error rate

3. Contextual Recommendations

Generates specific notes for missing patterns:

  • ⚠️ Error handling detection
  • 📊 Monitoring/logging detection
  • ⏱️ Timeout coverage on external calls
  • 💾 Caching for database calls
  • 🔄 Retry logic for HTTP calls
  • 🔁 Dead letter queues for background jobs

4. Confidence Scoring

Confidence varies by service type (0.5-0.8) based on detection certainty

5. Multiple Output Formats

  • JSON: For programmatic integration
  • Markdown: For documentation/sharing
  • Table: Human-friendly ASCII table

Code Quality

Testing

  • ✅ 37 tests, 100% passing
  • ✅ 93.42% code coverage on agent
  • ✅ All edge cases covered

Linting

  • ✅ ruff clean (no violations)
  • ✅ Proper line lengths, imports, style
  • ✅ Full type hints

Documentation

  • ✅ Module docstrings
  • ✅ Class docstrings
  • ✅ Method docstrings
  • ✅ Comprehensive guide (SLO_AGENT.md)

Usage Examples

Simple Analysis

$ cat api.py | greybeard slo-check

With Service Type

$ greybeard slo-check --context "service-type:critical-infra"

With Repository Context

$ greybeard slo-check --repo /path/to/api --context "criticality:high"

JSON Output for Integration

$ git diff main | greybeard slo-check --output json > slos.json

Python API

from greybeard.agents import SLOAgent

agent = SLOAgent()
rec = agent.analyze(
    code_snippet="@app.get('/api/users') def get_users(): ...",
    service_type="saas"
)
print(f"Service type: {rec.service_type.value}")
for target in rec.targets:
    print(f"  {target.metric}: {target.target}")

Branch & Commits

  • Branch: feat/slo-agent
  • Created from: main
  • Commit: Single clean commit with all files

Files Modified/Created

Created:
  greybeard/agents/slo_agent.py                    (18 KB, 152 statements)
  greybeard/cli_slo.py                             (5.2 KB, 86 statements)
  greybeard/packs/slo-patterns/saas.yaml
  greybeard/packs/slo-patterns/batch.yaml
  greybeard/packs/slo-patterns/critical-infra.yaml
  greybeard/packs/slo-patterns/background-jobs.yaml
  tests/test_slo_agent.py                          (37 tests, 100% passing)
  SLO_AGENT.md                                     (comprehensive documentation)

Modified:
  greybeard/agents/__init__.py                     (exports SLOAgent classes)
  greybeard/cli.py                                 (integrated slo-check command)

Testing Instructions

cd /home/node/.openclaw/workspace/greybeard

# Activate venv
. .venv/bin/activate

# Run all SLO tests
pytest tests/test_slo_agent.py -v

# Run with coverage
pytest tests/test_slo_agent.py --cov=greybeard.agents.slo_agent

# Try CLI
echo "requests.get(url)" | python -m greybeard.cli slo-check --context "service-type:saas"

# JSON output
echo "@app.get('/api/users')\ndef get_users(): pass" | python -m greybeard.cli slo-check --output json

Design Decisions

  1. Separate CLI Module: cli_slo.py keeps SLO-specific CLI logic separate and maintainable
  2. Dataclasses: Used for immutable recommendation objects with .to_dict() for serialization
  3. Enum for ServiceType: Type-safe classification instead of strings
  4. Regex Patterns: Simple, effective pattern detection without AST parsing (good for diffs)
  5. Content Packs: Follow existing greybeard architecture for consistency
  6. Confidence Scoring: Provides transparency about recommendation certainty

Future Enhancement Opportunities

  • ML-based SLO prediction from historical data
  • Cost analysis for different SLO levels
  • Multi-service orchestration and dependency analysis
  • Error budget tracking and alerting
  • Integration with monitoring systems (Datadog, Prometheus)
  • Custom SLO patterns for specific industries

Status: Ready for review and merge ✅
Test Coverage: 93.42% on agent code
Linting: Passing (ruff)
Documentation: Complete (SLO_AGENT.md, docstrings, inline comments)