feat(hermes): add controls/security RCA synthetic scenarios and e2e coverage #2520

Open
cerencamkiran wants to merge 1 commit into Tracer-Cloud:main from cerencamkiran:feat/hermes-controls-scenarios

cerencamkiran (Collaborator) commented May 25, 2026

Fixes #1512

Summary

Adds controls/security-focused Hermes synthetic RCA scenarios and corresponding e2e investigation coverage.

Added synthetic scenarios

  • 040-determinism-engine-missing
  • 041-approval-lock-missing-dangerous-command
  • 042-audit-trail-missing
  • 043-rbac-gateway-missing-multi-user
  • 044-credential-proxy-missing

Scenario coverage

Each scenario includes:

  • scenario.yml
  • alert.json
  • answer.yml
  • hermes_config.json
  • hermes_runtime_state.json
  • hermes_session_log.json
  • scenario-specific controls/security evidence fixtures
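
As a rough illustration of the per-scenario layout listed above, a completeness check over one scenario directory might look like the sketch below. The helper name `missing_scenario_files` and the constant `REQUIRED_FILES` are hypothetical and are not taken from the PR; the scenario-specific evidence fixtures differ per scenario, so they are left out.

```python
from pathlib import Path

# File names come from the list in the PR description. Scenario-specific
# evidence fixtures vary per scenario and are intentionally not checked here.
REQUIRED_FILES = (
    "scenario.yml",
    "alert.json",
    "answer.yml",
    "hermes_config.json",
    "hermes_runtime_state.json",
    "hermes_session_log.json",
)


def missing_scenario_files(scenario_dir: Path) -> list[str]:
    """Return the required file names that are absent from scenario_dir."""
    return [name for name in REQUIRED_FILES if not (scenario_dir / name).is_file()]
```

An empty return value would mean the common files are all present.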

Added evidence fixture coverage for

  • hermes_workflow_run
  • hermes_approval_events
  • hermes_audit_trail
  • hermes_rbac_state
  • hermes_credential_state

Schema / loader wiring

Updated tests/synthetic/hermes_rca/hermes_schemas.py

  • added Hermes controls/security RCA categories
  • added validators for controls-oriented evidence sources

Updated tests/synthetic/hermes_rca/scenario_loader.py

  • added loading support for new controls/security evidence fixtures

Root-cause taxonomy

Updated app/types/root_cause_categories.py with Hermes RCA categories for:

  • missing_determinism_control
  • missing_approval_gate
  • missing_audit_trail
  • missing_rbac
  • missing_credential_isolation
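
For readers unfamiliar with the taxonomy, the five categories can be pictured as a closed set of string identifiers. The sketch below is an assumption about shape only: the constant `HERMES_CONTROLS_CATEGORIES` and the helper `is_controls_category` are hypothetical names, and the real module may organize its categories differently.

```python
# Category strings are the ones listed in the PR description. The constant
# and helper names are hypothetical, used only to illustrate a closed set.
HERMES_CONTROLS_CATEGORIES = frozenset(
    {
        "missing_determinism_control",
        "missing_approval_gate",
        "missing_audit_trail",
        "missing_rbac",
        "missing_credential_isolation",
    }
)


def is_controls_category(category: str) -> bool:
    """Return True if the normalized category is one of the five new ones."""
    return category.strip().lower() in HERMES_CONTROLS_CATEGORIES
```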

Backend / tool exposure

Updated tests/synthetic/mock_hermes_backend/backend.py

  • added fixture backend getters for controls/security evidence

Updated app/tools/HermesSessionEvidenceTool/__init__.py

  • exposed new investigation tools:

    • get_hermes_workflow_run
    • get_hermes_approval_events
    • get_hermes_audit_trail
    • get_hermes_rbac_state
    • get_hermes_credential_state
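
The relationship between a fixture backend getter and an exposed tool could be sketched roughly as follows. This is a simplified stand-in, not the PR's code: the class `FixtureBackendSketch` is hypothetical, the real tools are registered through the repository's own tool decorator (omitted here), and the envelope keys are assumed from the sequence diagram in the automated review.

```python
from typing import Any


class FixtureBackendSketch:
    """Hypothetical stand-in for a fixture backend serving preloaded evidence."""

    def __init__(self, evidence: dict[str, Any]) -> None:
        self._evidence = evidence

    def get_approval_events(self, session_id: str) -> dict[str, Any]:
        events = self._evidence.get("hermes_approval_events")
        # Envelope keys (source, available, session_id, events) are assumed.
        return {
            "source": "hermes_approval_events",
            "available": events is not None,
            "session_id": session_id,  # echoes the caller-supplied id
            "events": events or [],
        }


def get_hermes_approval_events(
    backend: FixtureBackendSketch, session_id: str
) -> dict[str, Any]:
    """Thin tool wrapper: delegate to the backend and return its envelope."""
    return backend.get_approval_events(session_id)
```

Keeping the tool a thin delegate means the fixture data, not the tool, decides what evidence an investigation can see.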

Added e2e coverage

New controls-focused e2e investigation tests under:

  • tests/e2e/hermes/controls/

The assertions intentionally accept a small set of semantically equivalent RCA categories during live investigation runs.

The tests also validate scenario-specific reasoning/evidence signals instead of relying only on category matching.
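
The two-part assertion style described above might be sketched like this. The result keys `root_cause_category` and `root_cause`, the helper name `assert_rca_matches`, and the example category set are all assumptions for illustration; they are not the PR's actual fallback list or result schema.

```python
def assert_rca_matches(
    result: dict,
    accepted_categories: set[str],
    required_signals: list[str],
) -> None:
    """Check both the category and the reasoning signals of an RCA result."""
    category = result["root_cause_category"]  # hypothetical result key
    assert category in accepted_categories, (
        f"unexpected category {category!r}; accepted: {sorted(accepted_categories)}"
    )
    narrative = result["root_cause"].lower()  # hypothetical result key
    missing = [s for s in required_signals if s.lower() not in narrative]
    assert not missing, f"reasoning is missing expected signals: {missing}"


# Illustrative usage with made-up values.
assert_rca_matches(
    {
        "root_cause_category": "missing_approval_gate",
        "root_cause": "A dangerous command ran without an approval lock.",
    },
    accepted_categories={"missing_approval_gate", "missing_rbac"},
    required_signals=["approval", "dangerous command"],
)
```

Checking signals in addition to the category guards against a run that lands on an accepted label for the wrong reason.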

Validation performed

python -m pytest tests/synthetic/hermes_rca -q
# 10 passed

python -m pytest tests/e2e/hermes/controls -q
# 5 skipped (expected without OPENSRE_RUN_HERMES_E2E)

ruff check app/tools/HermesSessionEvidenceTool app/types/root_cause_categories.py tests/synthetic/hermes_rca tests/synthetic/mock_hermes_backend tests/e2e/hermes/controls
# All checks passed
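
The skip behavior noted above could come from an environment gate along these lines. This is a sketch under one assumption: any non-empty value of OPENSRE_RUN_HERMES_E2E enables the tests. The PR does not state which values are accepted, and the helper name `hermes_e2e_enabled` is hypothetical.

```python
import os
from collections.abc import Mapping
from typing import Optional


def hermes_e2e_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the e2e opt-in variable is set to a non-empty value.

    Assumption: any non-blank value enables the tests.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENSRE_RUN_HERMES_E2E", "").strip())
```

A test module could pass `not hermes_e2e_enabled()` as the condition of `pytest.mark.skipif`, which would yield the skipped result shown above when the variable is unset.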

I also attempted to validate the live e2e investigation flow with Anthropic enabled.

The scenarios successfully executed through the real investigation pipeline path, but full live e2e completion could not be verified because the Anthropic API quota limit was reached during runtime testing.

Screenshots: 2026-05-25 215116, 2026-05-27 193140

github-actions (Contributor)

Greptile code review

This repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md.

Run a review — add a PR comment with:

@greptile review

Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5.

Optional: automate with the greploop skill.

@cerencamkiran cerencamkiran marked this pull request as draft May 25, 2026 18:49

greptile-apps Bot (Contributor) commented May 25, 2026

Greptile Summary

This PR adds five controls/security-focused Hermes RCA synthetic scenarios (040–044) covering missing determinism control, approval gate, audit trail, RBAC, and credential isolation. It wires each scenario end-to-end with evidence fixtures, schema validators, a mock backend, new investigation tools, updated root-cause taxonomy, and gated e2e tests.

  • Five new @tool functions (get_hermes_audit_trail, get_hermes_approval_events, get_hermes_rbac_state, get_hermes_credential_state, get_hermes_workflow_run) follow the existing tool pattern exactly and are correctly registered in __all__ and the telemetry exemption list.
  • Schema validators in hermes_schemas.py are thorough; the diverging_steps element-type guard and the get_approval_events session_id consistency (both noted in prior review rounds) are correctly handled in this version.
  • The new HermesScenarioEvidence fields are picked up automatically by as_dict() via __dict__ iteration, so no manual update to that method was needed.
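
The point about `as_dict()` picking up new fields through `__dict__` iteration can be illustrated with a reduced dataclass. `EvidenceSketch` is a hypothetical stand-in with only two fields, and dropping `None` values is an assumption about the real method rather than something the review states.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EvidenceSketch:
    """Hypothetical, reduced stand-in for an evidence container."""

    hermes_session_log: Optional[list] = None
    hermes_audit_trail: Optional[dict] = None  # stands in for a newly added field

    def as_dict(self) -> dict[str, Any]:
        # Iterating __dict__ covers every instance field, so a new field
        # needs no change here. Skipping None values is an assumption.
        return {key: value for key, value in self.__dict__.items() if value is not None}
```

Because the method never names individual fields, adding a field to the dataclass is enough for it to appear in the output.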

Confidence Score: 5/5

Safe to merge; changes are additive test infrastructure and new investigation tool stubs that are fixture-backend-only and cannot activate in production.

All five new tools follow the established fixture-backend-only pattern and are correctly gated behind _fixture_backend_only. The schema validators are thorough and the previously flagged issues (diverging_steps element check, get_approval_events session_id consistency) are correctly addressed in this version.

tests/synthetic/hermes_rca/hermes_schemas.py — the validate_hermes_credential_state function validates the top-level mode strictly but leaves per-call credential_source unchecked.
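
One way the gap noted above could be closed is to check each per-call credential_source against the same allowed set as the top-level mode. The sketch below is not the repository's validator: the allowed values in `ALLOWED_CREDENTIAL_MODES` are placeholders, the function name is shortened to mark it as hypothetical, and the `outbound_calls` list-of-dicts shape is assumed from the review text.

```python
# Placeholder values; the real enum lives in hermes_schemas.py and is not
# shown in the PR conversation.
ALLOWED_CREDENTIAL_MODES = frozenset({"proxy", "direct"})


def validate_credential_state(payload: dict) -> dict:
    """Validate the top-level mode and every per-call credential_source."""
    mode = payload.get("mode")
    if mode not in ALLOWED_CREDENTIAL_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_CREDENTIAL_MODES)}, got {mode!r}")

    calls = payload.get("outbound_calls", [])
    if not isinstance(calls, list):
        raise ValueError("outbound_calls must be a list")

    for index, call in enumerate(calls):
        if not isinstance(call, dict):
            raise ValueError(f"outbound_calls[{index}] must be an object")
        source = call.get("credential_source")
        if source not in ALLOWED_CREDENTIAL_MODES:
            raise ValueError(
                f"outbound_calls[{index}].credential_source is invalid: {source!r}"
            )
    return payload
```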

Important Files Changed

| Filename | Overview |
| --- | --- |
| app/tools/HermesSessionEvidenceTool/__init__.py | Five new tool functions added following the exact same pattern as existing Hermes tools; correctly registered in __all__ and the telemetry exemption list. |
| app/types/root_cause_categories.py | Five new RCA categories appended under GROUP_CODE_AND_CONFIG and added to HERMES_ROOT_CAUSE_CATEGORIES; grouping choice was flagged and discussed in a prior review thread. |
| tests/synthetic/hermes_rca/hermes_schemas.py | New validators are solid overall; credential_source values in outbound_calls are not validated against the known enum, creating a minor inconsistency with the top-level mode field. |
| tests/synthetic/mock_hermes_backend/backend.py | Five new FixtureHermesBackend getters are consistent with existing patterns; session_id handling is uniform across all new getters. |
| tests/synthetic/hermes_rca/scenario_loader.py | Loading blocks for all five new evidence types follow the existing conditional pattern and pass all new evidence correctly to HermesScenarioEvidence. |
| tests/e2e/hermes/controls/test_approval_lock_missing.py | E2e test correctly skips without OPENSRE_RUN_HERMES_E2E and asserts both category and keyword signals; fallback category breadth was discussed in a prior review thread. |
| tests/tools/test_telemetry.py | All five new tool names are correctly registered in _TOOLS_WITHOUT_DELIBERATE_CATCH, matching the existing Hermes tool exemption pattern. |

Sequence Diagram

sequenceDiagram
    participant Test as e2e / synthetic test
    participant Loader as scenario_loader
    participant Schema as hermes_schemas (validators)
    participant Backend as FixtureHermesBackend
    participant Tool as HermesSessionEvidenceTool

    Test->>Loader: load_scenario("040-044")
    Loader->>Schema: validate_hermes_scenario_metadata()
    Loader->>Schema: validate_hermes_alert()
    Loader->>Schema: validate_hermes_answer_key()
    Loader->>Schema: "validate_hermes_*() per available_evidence"
    Schema-->>Loader: HermesScenarioEvidence (with new fields)
    Loader-->>Test: HermesScenarioFixture

    Test->>Backend: FixtureHermesBackend(fixture)
    Test->>Tool: get_hermes_workflow_run(session_id)
    Tool->>Backend: backend.get_workflow_run(session_id)
    Backend-->>Tool: "{source, available, session_id, workflow_id, runs, diverging_steps}"
    Tool-->>Test: evidence dict

    Test->>Tool: get_hermes_approval_events(session_id)
    Tool->>Backend: backend.get_approval_events(session_id)
    Backend-->>Tool: "{source, available, session_id, events}"

    Test->>Tool: get_hermes_audit_trail / get_hermes_rbac_state / get_hermes_credential_state
    Tool->>Backend: "backend.get_*()"
    Backend-->>Tool: envelope with caller-supplied session_id

Reviews (3): Last reviewed commit: "feat(hermes): add controls synthetic sce..."

Comment thread tests/synthetic/hermes_rca/hermes_schemas.py Outdated
Comment thread tests/synthetic/mock_hermes_backend/backend.py
Comment thread app/types/root_cause_categories.py
Comment thread tests/e2e/hermes/controls/test_approval_lock_missing.py
@cerencamkiran cerencamkiran force-pushed the feat/hermes-controls-scenarios branch from 490dab1 to d4b6794 Compare May 27, 2026 16:38

cerencamkiran (Collaborator, Author) commented:

@greptile review

@cerencamkiran cerencamkiran marked this pull request as ready for review May 27, 2026 16:45

Development

Successfully merging this pull request may close these issues.

Hermes incident-identification scenarios 4/5 — Determinism, approval locks, cryptographic audit, RBAC, credential proxy
