[FEATURE] Runbook-aware reasoning — ingest runbooks and surface remediation steps

## Problem statement

The agent diagnoses root causes but never produces actionable remediation steps. The `remediation_steps` field exists throughout the stack — in state, report context, and the Slack/terminal report formatters — but every code path in the diagnosis node hardcodes it to an empty list. Users see a root cause but no "what to do next."

Beyond that, the investigation planner has no awareness of team-specific runbooks. If an org's runbook says "when latency spikes, check DB connection pool before checking deployments," the planner can't use that signal — it relies solely on alert keywords and source detection.

The `SREGuidanceTool` knowledge base covers only the generic pipeline patterns from the Google SRE book. It has no coverage of web-service, Kubernetes, or org-specific failure patterns.

## Proposed solution

Two-phase implementation:

**Phase 1 — Populate remediation steps (smaller, standalone):**
- Add an explicit instruction for the LLM to return ordered remediation steps during diagnosis.
- Replace the hardcoded empty `remediation_steps` list with the parsed LLM output.
- Update synthetic scoring rubrics to assert non-empty `remediation_steps`.
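
A minimal sketch of the Phase 1 parsing step, assuming the LLM is asked to emit a plain-text `REMEDIATION STEPS:` section. The section header, the `REMEDIATION_INSTRUCTION` constant, and `parse_remediation_steps` are hypothetical names for illustration; the real diagnosis node may use structured output instead.

```python
import re

# Hypothetical instruction appended to the diagnosis prompt (an assumption,
# not the project's actual prompt text).
REMEDIATION_INSTRUCTION = (
    "After the root cause, add a section titled 'REMEDIATION STEPS:' "
    "containing a numbered list of actions, most urgent first."
)

# Matches list items such as "1. foo", "2) foo", "- foo" or "* foo".
_STEP = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+(.*\S)\s*$")


def parse_remediation_steps(llm_text: str) -> list[str]:
    """Return the ordered steps found under the REMEDIATION STEPS header."""
    steps: list[str] = []
    in_section = False
    for line in llm_text.splitlines():
        if not in_section:
            # Skip everything until the section header is seen.
            in_section = line.strip().upper().startswith("REMEDIATION STEPS")
            continue
        match = _STEP.match(line)
        if match:
            steps.append(match.group(1))
        elif line.strip() and steps:
            # The first non-list, non-blank line after the list ends the section.
            break
    return steps
```

The returned list could then replace the hardcoded empty `remediation_steps` value, and a rubric could assert that it is non-empty. A response with no such section yields an empty list, which matches today's behavior.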

**Phase 2 — Runbook ingestion and retrieval at planning time:**
- Add a local runbook store mapping service tags and trigger keywords to runbook content.
- Add a CLI command to register a runbook from a file.
- At planning time, retrieve the top matching runbook excerpt for the current alert and inject it into the planning prompt.
- Use runbook context during diagnosis so remediation steps are runbook-backed, not generic.
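
The store and retrieval steps above could be sketched roughly as follows. This is an in-memory illustration only: `Runbook`, `RunbookStore`, `top_match`, and the scoring weights are all assumptions, and persistence and the CLI registration command are left out.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Runbook:
    name: str
    content: str
    service_tags: set = field(default_factory=set)
    trigger_keywords: set = field(default_factory=set)


class RunbookStore:
    """Hypothetical in-memory store keyed by service tags and trigger keywords."""

    def __init__(self) -> None:
        self._runbooks: list = []

    def register(self, runbook: Runbook) -> None:
        # Normalise to lowercase so matching is case-insensitive.
        runbook.service_tags = {t.lower() for t in runbook.service_tags}
        runbook.trigger_keywords = {k.lower() for k in runbook.trigger_keywords}
        self._runbooks.append(runbook)

    def top_match(
        self, alert_text: str, service: str, max_chars: int = 500
    ) -> Optional[str]:
        """Return an excerpt of the best-scoring runbook, or None if nothing matches."""
        words = set(re.findall(r"[a-z0-9]+", alert_text.lower()))
        best, best_score = None, 0
        for runbook in self._runbooks:
            # One point per trigger keyword in the alert; a service-tag hit
            # is weighted higher (the weight of 2 is an arbitrary choice).
            score = len(runbook.trigger_keywords & words)
            if service.lower() in runbook.service_tags:
                score += 2
            if score > best_score:
                best, best_score = runbook, score
        return best.content[:max_chars] if best else None
```

A usage example under the same assumptions:

```python
store = RunbookStore()
store.register(Runbook(
    name="latency",
    content="When latency spikes, check DB connection pool before checking deployments.",
    service_tags={"checkout"},
    trigger_keywords={"latency"},
))
excerpt = store.top_match("p99 latency spike on checkout", service="checkout")
```

The excerpt would then be injected into the planning prompt. Keyword overlap keeps the sketch dependency-free; embedding-based retrieval is an alternative the issue does not commit to.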

Phase 1 is a self-contained improvement that can ship independently of the runbook storage work.

[FEATURE] Runbook-aware reasoning — ingest runbooks and surface remediation steps #1073
