feat: rust-observability skill -- tracing, OpenTelemetry, Prometheus for Rust services #69

@AlexMikhalev

Description

Parent Epic: #62

What

New skill: rust-observability covering production monitoring and observability for Rust services.

Skill Content

  1. Structured Tracing -- tracing crate setup, span context, #[instrument] attribute, field conventions
  2. OpenTelemetry Integration -- tracing-opentelemetry subscriber, span export to Jaeger/OTLP, context propagation across HTTP boundaries
  3. Prometheus Metrics -- metrics crate with metrics-exporter-prometheus, histogram/counter/gauge patterns, endpoint setup
  4. Per-Request Context -- Request ID propagation, correlation IDs in traces and logs, HTTP header extraction
  5. Error Reporting -- Structured error spans, error rate metrics, alerting thresholds
  6. Log Filtering -- EnvFilter configuration, per-crate log levels, production vs development configs
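To make item 2's "context propagation across HTTP boundaries" concrete, here is a minimal std-only sketch of parsing an incoming trace header. It assumes the W3C Trace Context `traceparent` format (version `00`), which this issue does not name explicitly. In the actual skill an OpenTelemetry propagator would do this work, and `TraceParent` and `parse_traceparent` are hypothetical names used only for illustration.

```rust
/// Trace context carried by a W3C `traceparent` header (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: String,  // 32 lowercase hex characters
    pub parent_id: String, // 16 lowercase hex characters
    pub sampled: bool,     // lowest bit of the flags byte
}

/// True if `s` is exactly `len` lowercase hex characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parses a version-00 value: `00-<trace-id>-<parent-id>-<flags>`.
/// Returns `None` for anything malformed, so the caller can start a new trace.
pub fn parse_traceparent(value: &str) -> Option<TraceParent> {
    let mut parts = value.trim().split('-');
    let version = parts.next()?;
    let trace_id = parts.next()?;
    let parent_id = parts.next()?;
    let flags = parts.next()?;

    // Version 00 has exactly four fields.
    if parts.next().is_some() || version != "00" {
        return None;
    }
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero IDs are defined as invalid.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }

    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        sampled: (flags & 0x01) == 0x01,
    })
}
```

Rejecting malformed headers instead of repairing them keeps a bad upstream value from corrupting the trace; the service simply starts a fresh root span.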

Source Reference

Section 13 (Production Monitoring and Observability) from RUST_SYSTEM_PROGRAMMING_BEST_PRACTICES.md

Disciplined Engineering Alignment

| Phase | Skill | Observability Relevance |
|---|---|---|
| Research | disciplined-research | Identify observability gaps in current service; map existing log/metric/trace coverage; document SLI/SLO requirements |
| Design | disciplined-design | Specify tracing span hierarchy; define metric names/labels/buckets; design log level strategy per environment |
| Implementation | disciplined-implementation | Add tracing subscriber first, then instrument hot paths, then metrics, then OTel export -- each as separate step |
| Verification | disciplined-verification | Verify spans appear in collector; confirm metric cardinality within bounds; test log filtering at each level |
| Validation | disciplined-validation | Validate observability under production load; confirm alerts fire correctly; stakeholder sign-off on dashboard |
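The Design phase asks for metric names, labels, and buckets to be defined up front. As a rough illustration of what a bucket definition means on the wire, the following std-only sketch keeps a fixed-bucket histogram and renders it with Prometheus-style cumulative `le` buckets. The `Histogram` type and the metric name in the usage note are hypothetical; in the skill itself the `metrics` crate and `metrics-exporter-prometheus` would handle recording and export.

```rust
/// Minimal fixed-bucket histogram (hypothetical type, for illustration only).
pub struct Histogram {
    bounds: Vec<f64>, // sorted upper bounds, not including +Inf
    counts: Vec<u64>, // per-bucket counts; the last slot is the +Inf bucket
    sum: f64,         // sum of all observed values
}

impl Histogram {
    /// Creates a histogram from upper bounds; the bounds are sorted here.
    pub fn new(mut bounds: Vec<f64>) -> Self {
        bounds.sort_by(|a, b| a.partial_cmp(b).expect("bucket bounds must not be NaN"));
        let counts = vec![0; bounds.len() + 1];
        Self { bounds, counts, sum: 0.0 }
    }

    /// Records one observation in the first bucket whose bound is >= value.
    pub fn observe(&mut self, value: f64) {
        let idx = self
            .bounds
            .iter()
            .position(|b| value <= *b)
            .unwrap_or(self.bounds.len());
        self.counts[idx] += 1;
        self.sum += value;
    }

    /// Renders bucket, sum, and count lines with cumulative bucket counts.
    pub fn render(&self, name: &str) -> String {
        let mut out = String::new();
        let mut cumulative = 0u64;
        for (i, count) in self.counts.iter().enumerate() {
            cumulative += count;
            let le = match self.bounds.get(i) {
                Some(b) => b.to_string(),
                None => "+Inf".to_string(),
            };
            out.push_str(&format!("{name}_bucket{{le=\"{le}\"}} {cumulative}\n"));
        }
        out.push_str(&format!("{name}_sum {}\n", self.sum));
        out.push_str(&format!("{name}_count {cumulative}\n"));
        out
    }
}
```

For example, `Histogram::new(vec![0.1, 0.5, 1.0])` followed by observations of `0.25`, `0.5`, and `2.0` yields one time series per bucket plus `_sum` and `_count`. Each extra bucket or label value multiplies the series count, which is why the Verification phase checks cardinality.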

Key integration point: The devops skill should reference rust-observability for Rust service deployment monitoring requirements.

The SKILL.md should include:

  • A "Disciplined Observability Checklist" mapping each V-model phase to specific observability tasks
  • Integration examples for axum middleware (tracing + metrics)
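As a starting point for the middleware examples, the header-handling logic can be sketched without any framework. The snippet below is an assumption-laden illustration, not axum code: it takes headers as plain `(name, value)` pairs, assumes the header is called `x-request-id` (the issue does not fix a name), and falls back to a simple process-unique ID where a real service would more likely use a UUID. An axum middleware would call something like this and attach the result to the request span.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Header name assumed for this sketch; the issue does not specify one.
pub const REQUEST_ID_HEADER: &str = "x-request-id";

/// Process-wide sequence number used by the fallback generator.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Generates a process-unique fallback ID from a timestamp and a counter.
/// This is not a UUID and is only meant to keep the sketch dependency-free.
fn generate_request_id() -> String {
    let millis = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis())
        .unwrap_or(0);
    let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("{millis:x}-{seq:x}")
}

/// Returns the caller-supplied request ID if it looks sane, otherwise a new one.
pub fn request_id_from_headers(headers: &[(&str, &str)]) -> String {
    headers
        .iter()
        // HTTP header names are case-insensitive.
        .find(|(name, _)| name.eq_ignore_ascii_case(REQUEST_ID_HEADER))
        .map(|(_, value)| value.trim())
        // Reject empty, oversized, or non-printable values before logging them.
        .filter(|v| !v.is_empty() && v.len() <= 128 && v.bytes().all(|b| b.is_ascii_graphic()))
        .map(str::to_owned)
        .unwrap_or_else(generate_request_id)
}
```

Validating the incoming value matters because the ID ends up in logs and span fields; accepting arbitrary client input there invites log injection and unbounded field sizes.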

Why a New Skill (Not Extension)

Observability is a distinct cross-cutting concern for any production Rust service. It's not specific to development idioms or performance optimization.

Relevance to Terraphim

terraphim-server runs as a persistent service. Structured tracing and metrics are essential for production operation and debugging.

Acceptance Criteria

  • skills/rust-observability/SKILL.md created with full content
  • Includes axum middleware examples for tracing and metrics
  • Includes Cargo.toml dependency snippets
  • Covers both development (pretty logs) and production (JSON/OTLP) configurations
  • Disciplined observability checklist included
  • Cross-references devops skill
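For the development versus production criterion, it may help to show how a per-crate filter string is interpreted. The sketch below is a simplified, std-only model of `target=level` directives in the style of `EnvFilter`; it is not the `tracing-subscriber` implementation and omits span and field directives. `Level`, `Filter`, the `Error` fallback default, and the `terraphim_server` target name in the usage note are all assumptions for illustration.

```rust
/// Verbosity levels, ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

impl Level {
    /// Parses a level name case-insensitively.
    fn parse(s: &str) -> Option<Level> {
        match s.trim().to_ascii_lowercase().as_str() {
            "error" => Some(Level::Error),
            "warn" => Some(Level::Warn),
            "info" => Some(Level::Info),
            "debug" => Some(Level::Debug),
            "trace" => Some(Level::Trace),
            _ => None,
        }
    }
}

/// Simplified filter: one default level plus per-target overrides.
pub struct Filter {
    default: Level,
    targets: Vec<(String, Level)>,
}

impl Filter {
    /// Parses a comma-separated spec such as `warn,my_crate=debug`.
    /// Returns `None` if any level name is unknown.
    pub fn parse(spec: &str) -> Option<Filter> {
        // Assumed fallback when the spec has no bare level directive.
        let mut filter = Filter { default: Level::Error, targets: Vec::new() };
        for directive in spec.split(',').map(str::trim).filter(|d| !d.is_empty()) {
            match directive.split_once('=') {
                Some((target, level)) => {
                    filter.targets.push((target.trim().to_string(), Level::parse(level)?));
                }
                None => filter.default = Level::parse(directive)?,
            }
        }
        Some(filter)
    }

    /// True if an event at `level` from module path `target` passes the filter.
    /// The longest matching target prefix wins over the default.
    pub fn enabled(&self, target: &str, level: Level) -> bool {
        let max = self
            .targets
            .iter()
            .filter(|(prefix, _)| {
                target == prefix.as_str() || target.starts_with(&format!("{prefix}::"))
            })
            .max_by_key(|(prefix, _)| prefix.len())
            .map(|(_, lvl)| *lvl)
            .unwrap_or(self.default);
        level <= max
    }
}
```

Under these assumptions a development config might be `debug`, while production could use `warn,terraphim_server=info` to keep dependency noise down and still record the service's own request-level events.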

Metadata

Assignees: No one assigned

Labels: enhancement (New feature or request)
