feat(runbooks): local runbook store with runbook-aware diagnosis grounding (#1073 phase 2a) by devankitjuneja · Pull Request #2029 · Tracer-Cloud/opensre

devankitjuneja · 2026-05-14T18:08:26Z

Relates to #1073

Describe the changes you have made in this PR -

Adds a local markdown runbook store and runbook-aware reasoning to the investigation pipeline (Phase 2a).

What's included:

- `app/runbooks/`: disk-backed store with YAML frontmatter parsing and deterministic top-1 retrieval (service match +2, keyword overlap +1 per trigger)
- `app/pipeline/pipeline.py`: `_retrieve_runbook()` runs between `extract_alert` and the ReAct loop and writes `matched_runbook` to state
- `app/agent/prompt.py`: matched runbook body is appended to `format_alert_context()` so the LLM grounds remediation steps in team playbooks
- `app/delivery/publish_findings/`: `runbook_provenance` on `ReportContext`; renders `Source: runbooks/<slug>.md` below Recommended Actions
- `opensre runbook add|list|remove` CLI + `/runbook` REPL parity
- 46 new tests; `docs/runbooks.mdx`

Runbook format:

---
service: payments-api
triggers:
  - oom
  - memory
---
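
As a rough illustration of how a file in this format could be split into frontmatter and body, here is a minimal sketch. The name `parse_runbook` and the hand-rolled parser are assumptions made for this example; the actual parsing lives in `app/runbooks/store.py` and is not shown in this thread. The sketch only understands flat `key: value` pairs and `- item` lists, not general YAML.

```python
from typing import Any


def parse_runbook(text: str) -> tuple[dict[str, Any], str]:
    """Split runbook text into (frontmatter, body).

    Hypothetical helper: supports only `key: value` and `- item` lines,
    which covers the example format but is not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("runbook must start with a '---' frontmatter fence")

    # Find the closing fence.
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        raise ValueError("frontmatter fence is never closed")

    meta: dict[str, Any] = {}
    current_key = None
    for raw in lines[1:end]:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- "):
            # A list item belongs to the most recent key with an empty value.
            target = meta.get(current_key) if current_key else None
            if not isinstance(target, list):
                raise ValueError(f"list item without a list key: {raw!r}")
            target.append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            meta[current_key] = value.strip() if value.strip() else []
        else:
            raise ValueError(f"unsupported frontmatter line: {raw!r}")

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

Everything after the closing fence is treated as the runbook body, which is the part that would later be appended to the alert context.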

Demo/Screenshot for feature changes and bug fixes -

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

Yes, I used AI assistance (continue below)

If you used AI assistance:

I have reviewed every single line of the AI-generated code
I can explain the purpose and logic of each function/component I added
I have tested edge cases and understand how the code handles them
I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

Retrieval is deterministic (no LLM, no vector DB) — score = service name match + keyword overlap against `triggers:` frontmatter. Top-1 result is written to state before the ReAct loop so `format_alert_context()` can append the runbook body to the user message. Injection is in the alert context (not system prompt) because `build_system_prompt` is stateless. Template fallback from Phase 1 stays active when no runbook matches.
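
The scoring described above can be sketched roughly as follows. The `Runbook` dataclass, `score_runbook`, and `retrieve_top1` are hypothetical names for illustration (the PR's real entry point is `retrieve_matching_runbook` in `app/runbooks/retrieval.py`, whose code is not reproduced here), and the slug-ordered tie-break is an assumption, since the thread does not say how ties are resolved.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Runbook:
    # Hypothetical shape; the real class in the PR may carry more fields.
    slug: str
    service: str
    triggers: list[str] = field(default_factory=list)
    body: str = ""


def score_runbook(runbook: Runbook, keywords: set[str], service: Optional[str]) -> int:
    """Service match +2, then +1 per trigger whose tokens all appear in keywords."""
    score = 0
    if service and runbook.service.lower() == service.lower():
        score += 2
    for trigger in runbook.triggers:
        tokens = trigger.lower().split()
        # A multi-word trigger such as "disk full" counts only when every
        # token is present; an empty trigger never matches.
        if tokens and all(tok in keywords for tok in tokens):
            score += 1
    return score


def retrieve_top1(
    runbooks: Iterable[Runbook], keywords: set[str], service: Optional[str]
) -> Optional[Runbook]:
    """Return the highest-scoring runbook, or None when nothing scores above 0."""
    best: Optional[Runbook] = None
    best_score = 0
    # Iterate in slug order so ties resolve identically on every run
    # (assumed tie-break, not confirmed by the PR).
    for rb in sorted(runbooks, key=lambda r: r.slug):
        s = score_runbook(rb, keywords, service)
        if s > best_score:
            best, best_score = rb, s
    return best
```

A `None` result corresponds to the case where no runbook matches and the Phase 1 template fallback stays active.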

Checklist before requesting a review

I have added proper PR title and linked to the issue
I have performed a self-review of my code
I can explain the purpose of every function, class, and logic block I added
I understand why my changes work and have tested them thoroughly
I have considered potential edge cases and how my code handles them
If it is a core feature, I have added thorough tests
My code follows the project's style guidelines and conventions

github-actions · 2026-05-14T18:08:37Z

Greptile code review

This repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md.

Run a review — add a PR comment with:

@greptile review

Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5.

Optional: automate with the greploop skill.

greptile-apps · 2026-05-14T18:12:54Z

Greptile Summary

This PR adds a local markdown runbook store and runbook-aware reasoning to the investigation pipeline (Phase 2a of #1073). All previously flagged issues from earlier review rounds have been addressed in this revision.

- `app/runbooks/`: New disk-backed store (`store.py`) with YAML frontmatter parsing, slug validation enforced symmetrically in both `save()` and `remove()`, and a deterministic top-1 retrieval engine (`retrieval.py`) that scores by service match (+2) and multi-word trigger overlap (+1 per trigger using `all()`-token matching).
- `app/pipeline/pipeline.py`: `_retrieve_runbook()` inserted between `extract_alert` and the ReAct loop, using `len(w) >= 3` (consistent with docs and tests), with `commonLabels: null` handled via `or {}`, wrapped in a broad `except` so a broken runbook store never blocks an investigation.
- Delivery + prompt layers: Matched runbook body injected into `format_alert_context()` for LLM grounding; `runbook_provenance` written to `ReportContext` and rendered as a `_Source: runbooks/<slug>.md_` line in Slack messages and a Block Kit context block.
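
To illustrate the "never blocks an investigation" behaviour mentioned for `_retrieve_runbook()`, here is a minimal sketch of a guarded call. The helper name `retrieve_runbook_safely` and the callable-based signature are assumptions for this example only; the real function reads from pipeline state and its body is not shown in this summary.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def retrieve_runbook_safely(
    retrieve: Callable[[], Optional[dict[str, Any]]],
) -> Optional[dict[str, Any]]:
    """Run a retrieval callable; return None instead of raising on any failure."""
    try:
        return retrieve()
    except Exception:
        # Broad on purpose: an unreadable or malformed runbook store should
        # degrade to "no runbook matched", not abort the investigation.
        logger.warning("runbook retrieval failed; continuing without a runbook", exc_info=True)
        return None
```

The trade-off is that store errors become visible only in logs, which seems acceptable here because the runbook is an optional grounding source.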

Confidence Score: 5/5

Safe to merge — all previously identified defects are resolved and the new runbook path is wrapped to never block an investigation.

Every previously flagged issue has been corrected: the len(w) >= 3 keyword filter is consistent across pipeline, test suite, and docs; multi-word triggers are matched with the walrus-operator all() approach; commonLabels: null is handled by or {}; save() now validates the slug before copying; and the CLI remove command converts both ValueError and RunbookValidationError to clean ClickExceptions. The new code is defensive, test coverage is thorough (46 new tests including a synthetic scenario), and no new defects were found.
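
The `len(w) >= 3` filter matters because short but meaningful tokens such as `oom` are exactly three characters long. A small sketch, assuming a hypothetical `extract_keywords` helper and a simple split on non-alphanumeric characters (the pipeline's actual tokenization is not shown in this thread):

```python
import re


def extract_keywords(*texts: str, min_len: int = 3) -> set[str]:
    """Lowercase each text, split on non-alphanumerics, keep tokens of min_len or more."""
    keywords: set[str] = set()
    for text in texts:
        for word in re.split(r"[^a-z0-9]+", text.lower()):
            if len(word) >= min_len:
                keywords.add(word)
    return keywords


kept = extract_keywords("PaymentsAPI OOM killed", "High memory on pod-7")
# "oom" survives with >= 3; a stricter "> 3" comparison would drop it
# and an "oom" trigger could never match.
```

With a `> 3` comparison in the tests and `>= 3` in production, a fixture built around three-letter triggers would pass or fail for the wrong reason, which is presumably why the mismatch was flagged.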

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| app/runbooks/store.py | New disk-backed runbook store with YAML frontmatter parsing, slug validation, save/remove/load_all APIs. All previously flagged issues addressed: save() now validates slug before copying, remove() catches invalid slugs in CLI layer. |
| app/runbooks/retrieval.py | Deterministic top-1 scoring engine. Multi-word trigger matching now uses walrus operator + all() to split trigger tokens and require all parts to appear in keyword_set; the previously flagged silent-zero bug is resolved. |
| app/pipeline/pipeline.py | _retrieve_runbook() inserted between extract_alert and ReAct loop. Uses len(w) >= 3 (matching docs/tests), handles commonLabels: null via `or {}`, wraps in broad except to never block investigation. |
| app/agent/prompt.py | Adds _build_runbook_section() that appends matched runbook body (truncated at 2000 chars at a newline boundary) to format_alert_context(). Slug validated to `[\w-]+` so no injection risk. |
| app/cli/commands/runbook.py | New CLI group with add/list/remove subcommands. remove() now catches both ValueError and RunbookValidationError and converts them to clean ClickException; the previously flagged traceback issue is resolved. |
| app/delivery/publish_findings/report_context.py | Adds runbook_provenance field to ReportContext TypedDict and build_report_context(). Only populated when matched_runbook is a dict with a non-empty slug. |
| app/delivery/publish_findings/formatters/report.py | Appends `Source: runbooks/<slug>.md` as both a text line in format_slack_message and a Block Kit context block in build_slack_blocks. Correctly guarded with None-check on runbook_provenance and slug. |
| tests/synthetic/runbooks/test_runbook_suite.py | Fixture-driven synthetic suite. Now uses len(w) >= 3 consistently with production; the previously flagged > 3 mismatch is fixed. |
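
The slug validation mentioned for `store.py` and `prompt.py` can be pictured with a short sketch. `validate_slug` and `runbook_path` are hypothetical names, and raising a plain `ValueError` is an assumption; the store's own exception types (such as `RunbookValidationError`) are not defined in this thread.

```python
import re
from pathlib import Path

# Word characters and hyphens only, so path separators and dots are rejected.
_SLUG_RE = re.compile(r"[\w-]+")


def validate_slug(slug: str) -> str:
    """Return the slug unchanged if the whole string matches, else raise ValueError."""
    if not _SLUG_RE.fullmatch(slug):
        raise ValueError(f"invalid runbook slug: {slug!r}")
    return slug


def runbook_path(runbook_dir: Path, slug: str) -> Path:
    """Build the on-disk path only after the slug has been validated."""
    return runbook_dir / f"{validate_slug(slug)}.md"
```

Using `fullmatch` rather than `match` is the important detail: `match` would accept `ok/../etc` because the prefix `ok` satisfies the pattern. Running the same check in both save and remove paths keeps a slug such as `../other` from escaping the runbook directory.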

Sequence Diagram

sequenceDiagram
    participant P as pipeline.py
    participant RS as runbooks/store.py
    participant RR as runbooks/retrieval.py
    participant PR as prompt.py
    participant RC as report_context.py
    participant RF as report.py (Slack)

    P->>RS: load_all() → list[Runbook]
    P->>RR: retrieve_matching_runbook(runbooks, keywords, service, pipeline_name)
    RR-->>P: "Runbook | None"
    P->>P: matched.to_dict() → dict
    Note over P: _merge(state, {matched_runbook: dict})

    P->>PR: format_alert_context(state)
    PR->>PR: _build_runbook_section(state[matched_runbook])
    PR-->>P: alert context + runbook block

    P->>RC: build_report_context(state)
    RC->>RC: extract runbook_provenance from matched_runbook
    RC-->>P: ReportContext with runbook_provenance

    P->>RF: format_slack_message(ctx) / build_slack_blocks(ctx)
    RF->>RF: "append _Source: runbooks/<slug>.md_ line/context block"
    RF-->>P: Slack message with runbook provenance
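
The last two hops of the diagram (provenance extraction, then rendering) might look roughly like this. The function names are hypothetical stand-ins for the logic in `report_context.py` and `report.py`; only the `_Source: runbooks/<slug>.md_` text and the guard conditions come from the review summary.

```python
from typing import Any, Optional


def build_runbook_provenance(matched_runbook: Any) -> Optional[dict[str, str]]:
    """Return {"slug": ...} only for a dict with a non-empty string slug."""
    if not isinstance(matched_runbook, dict):
        return None
    slug = matched_runbook.get("slug")
    if not isinstance(slug, str) or not slug:
        return None
    return {"slug": slug}


def render_source_line(provenance: Optional[dict[str, str]]) -> Optional[str]:
    """Render the Slack text line, or None when there is nothing to attribute."""
    if not provenance or not provenance.get("slug"):
        return None
    return f"_Source: runbooks/{provenance['slug']}.md_"


def render_source_block(provenance: Optional[dict[str, str]]) -> Optional[dict[str, Any]]:
    """Render the same line as a Slack Block Kit context block."""
    line = render_source_line(provenance)
    if line is None:
        return None
    return {"type": "context", "elements": [{"type": "mrkdwn", "text": line}]}
```

Returning `None` at each step lets callers skip the line entirely instead of emitting `Source: runbooks/.md` for a runbook without a slug.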

Reviews (9): Last reviewed commit: "fix(runbooks): handle null commonLabels ..."

devankitjuneja · 2026-05-14T18:28:19Z

@greptile review

devankitjuneja · 2026-05-14T18:38:10Z

@greptile review

devankitjuneja · 2026-05-15T08:56:35Z

@greptile review

devankitjuneja · 2026-05-15T09:47:05Z

@greptile review

VibhorGautam

this is a good shape overall - store, CLI, prompt section, and report provenance all tied together

two prompt-section nits:

_build_runbook_section cuts the body at 2000 chars, so it can split a sentence or code block. i'd look for the last newline before the cutoff and add a truncated marker so the agent doesn't treat a partial runbook as complete

_build_runbook_section falls back to unknown for slug, but report_context.py uses an empty string in provenance. small thing, but the report could look weird if a runbook is missing a slug

i couldn't tell from this diff where _retrieve_runbook decides which runbook matches. is that already in main, or coming in another PR? that's the part i'd want to sanity-check since the prompt asks the agent to prefer runbook actions

tmp_path + monkeypatch coverage looks right
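
The truncation suggestion in this comment could be sketched as below. The function name, the marker text, and the hard-cut fallback are assumptions for illustration; only the 2000-character limit and the "cut at the last newline and mark it" idea come from the thread.

```python
def truncate_at_newline(
    body: str, limit: int = 2000, marker: str = "\n[runbook truncated]"
) -> str:
    """Cut body at the last newline before limit and append a marker."""
    if len(body) <= limit:
        return body
    cut = body.rfind("\n", 0, limit)
    if cut <= 0:
        # No usable newline inside the window: fall back to a hard cut.
        cut = limit
    return body[:cut].rstrip() + marker
```

Cutting at a line boundary avoids leaving half a sentence or an unterminated code line in the prompt, and the marker tells the agent the runbook continues beyond what it can see.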

devankitjuneja · 2026-05-15T10:12:33Z

Hi @VibhorGautam
Thanks for pointing out these issues. This is a good review :)

Both points are valid and will be addressed in the next commit.
_retrieve_runbook is in app/pipeline/pipeline.py lines 197–223, already in this PR.

VibhorGautam · 2026-05-15T10:15:17Z

ah missed that, thanks for the pointer - looks like the matching logic is solid then

devankitjuneja · 2026-05-15T10:28:12Z

@greptile review

devankitjuneja · 2026-05-15T10:49:16Z

@greptile review

VibhorGautam · 2026-05-15T11:01:45Z

good fix on the ValueError catch, clean and minimal

greptile's right about the commonLabels null case too. .get("commonLabels", {}) only falls back when the key is missing, not when the alert source sends it as explicit null. or {} covers both paths

worth scanning the rest of _retrieve_runbook for similar .get(..., {}) usage on external json fields while you're in there, same footgun anywhere the upstream payload can send explicit nulls
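
The difference between the two forms is easy to show in isolation. The helper name `common_labels` is hypothetical; the behaviour of `dict.get` is standard Python.

```python
from typing import Any, Mapping


def common_labels(alert_json: Mapping[str, Any]) -> dict[str, Any]:
    """Return commonLabels as a dict, treating a missing key and explicit null alike."""
    # alert_json.get("commonLabels", {}) would return None for
    # {"commonLabels": null}: the default applies only when the key is absent.
    return dict(alert_json.get("commonLabels") or {})


payload_with_null = {"commonLabels": None}
unsafe = payload_with_null.get("commonLabels", {})  # None, not {}
safe = common_labels(payload_with_null)             # {}
```

Calling `.items()` or `.get()` on the `unsafe` value would raise `AttributeError`, which the broad `except` in the pipeline would swallow, silently losing the keyword matches.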

devankitjuneja · 2026-05-15T11:33:05Z

@greptile review

muddlebee · 2026-05-15T14:27:48Z

hey @devankitjuneja

thank you for the PR. we will need confirmation from @VaibhavUpreti and @davincios before going ahead with reviews, approval, and merge.

devankitjuneja · 2026-05-15T14:38:15Z

Sure :)

devankitjuneja · 2026-05-20T06:55:39Z

Hi @VaibhavUpreti
Need your input on this.

VaibhavUpreti · 2026-05-28T11:29:11Z

+@runbook.command("add")
+@click.argument("path", type=click.Path(exists=True, dir_okay=False, path_type=Path))
+def runbook_add(path: Path) -> None:
+    """Copy a markdown runbook into ~/.config/opensre/runbooks/."""


Let's update every runbook directory reference to .opensre/runbooks

VaibhavUpreti

@devankitjuneja great work so far. Could you please add a demo video of a Grafana alert investigation, run with opensre investigate -i <file_path>, before and after adding the runbook?

After loading integrations, the first step should be to load the runbook in the planning step.

devankitjuneja · 2026-05-28T11:38:01Z

Sure @VaibhavUpreti

github-advanced-security AI found potential problems May 14, 2026

Comment thread tests/runbooks/test_store.py Fixed

Comment thread tests/runbooks/test_store.py Fixed

greptile-apps Bot reviewed May 14, 2026

Comment thread app/pipeline/pipeline.py Outdated

Comment thread app/cli/commands/runbook.py

greptile-apps Bot reviewed May 14, 2026

Comment thread tests/synthetic/runbooks/test_runbook_suite.py Outdated

devankitjuneja mentioned this pull request May 14, 2026

[FEATURE] Runbook-aware reasoning — ingest runbooks and surface remediation steps #1073

Open

greptile-apps Bot reviewed May 14, 2026

Comment thread app/runbooks/retrieval.py Outdated

devankitjuneja marked this pull request as draft May 14, 2026 18:48

devankitjuneja marked this pull request as ready for review May 15, 2026 08:51

Ankit Juneja and others added 6 commits May 15, 2026 14:24

7163588 feat(runbooks): runbook-aware diagnosis grounding (Tracer-Cloud#1073 phase 2a)
6e0474a feat(runbooks): port phase 2a to ReAct agent architecture
14d9508 fix(runbooks): drop banned docstring ref, fix keyword filter, use public RUNBOOK_DIR, fix docs
e55f5e6 fix(tests): avoid assert side-effects flagged by CodeQL
21ed076 Update tests/synthetic/runbooks/test_runbook_suite.py (Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>)
6ab2a18 fix(runbooks): match multi-word triggers by checking all tokens present

devankitjuneja force-pushed the feature/1073-phase-2a-runbook-store branch from cf6595d to 6ab2a18 Compare May 15, 2026 08:54

c7f3d8e fix(runbooks): sanitize slug in remove(), fix frontmatter regex, avoid double parse in save()

greptile-apps Bot reviewed May 15, 2026

Comment thread app/runbooks/store.py

VibhorGautam reviewed May 15, 2026

66fdb02 fix(runbooks): validate slug in save(), clean truncation boundary, guard missing slug in prompt
e000f9f fix(runbooks): catch ValueError in remove command, show clean error instead of traceback

greptile-apps Bot reviewed May 15, 2026

Comment thread app/pipeline/pipeline.py Outdated

5505ea2 fix(runbooks): handle null commonLabels in alert_json to preserve keyword matches

muddlebee assigned davincios and VaibhavUpreti May 15, 2026

muddlebee added the pending triage label May 15, 2026

VaibhavUpreti reviewed May 28, 2026

VaibhavUpreti requested changes May 28, 2026

Ankit Juneja added 2 commits May 29, 2026 13:35

cfc1041 chore: merge upstream/main, resolve conflicts, update runbook dir to ~/.opensre
8202df4 fix(runbooks): add /runbook to slash catalog, fix description parity

6 participants
