feat(hermes): add controls/security RCA synthetic scenarios and e2e coverage #2520
cerencamkiran wants to merge 1 commit into
Greptile Summary
This PR adds five controls/security-focused Hermes RCA synthetic scenarios (040–044) covering missing determinism control, approval gate, audit trail, RBAC, and credential isolation. It wires each scenario end-to-end with evidence fixtures, schema validators, a mock backend, new investigation tools, updated root-cause taxonomy, and gated e2e tests.
Confidence Score: 5/5
Safe to merge; changes are additive test infrastructure and new investigation tool stubs that are fixture-backend-only and cannot activate in production.
All five new tools follow the established fixture-backend-only pattern and are correctly gated behind _fixture_backend_only. The schema validators are thorough and the previously flagged issues (diverging_steps element check, get_approval_events session_id consistency) are correctly addressed in this version.
tests/synthetic/hermes_rca/hermes_schemas.py: the validate_hermes_credential_state function validates the top-level mode strictly but leaves per-call credential_source unchecked.
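The gap noted in the review, per-call credential_source being left unchecked, could be closed with a check along the following lines. This is a minimal sketch, not the code in hermes_schemas.py: the `calls` key, the values in `ALLOWED_CREDENTIAL_SOURCES`, and the helper name `validate_per_call_credential_source` are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical vocabulary; the real fixtures may use different values.
ALLOWED_CREDENTIAL_SOURCES = frozenset({"shared", "per_session", "per_tool"})


def validate_per_call_credential_source(state: dict[str, Any]) -> list[str]:
    """Return one error string per invalid per-call credential_source."""
    errors: list[str] = []
    calls = state.get("calls", [])  # "calls" is an assumed key name
    if not isinstance(calls, list):
        return ["calls must be a list"]
    for index, call in enumerate(calls):
        if not isinstance(call, dict):
            errors.append(f"calls[{index}] must be an object")
            continue
        source = call.get("credential_source")
        if source not in ALLOWED_CREDENTIAL_SOURCES:
            errors.append(
                f"calls[{index}].credential_source is invalid: {source!r}"
            )
    return errors
```

Returning a list of errors rather than raising on the first one keeps the sketch easy to fold into whatever reporting style the existing validators use.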
Sequence Diagram
sequenceDiagram
participant Test as e2e / synthetic test
participant Loader as scenario_loader
participant Schema as hermes_schemas (validators)
participant Backend as FixtureHermesBackend
participant Tool as HermesSessionEvidenceTool
Test->>Loader: load_scenario("040-044")
Loader->>Schema: validate_hermes_scenario_metadata()
Loader->>Schema: validate_hermes_alert()
Loader->>Schema: validate_hermes_answer_key()
Loader->>Schema: "validate_hermes_*() per available_evidence"
Schema-->>Loader: HermesScenarioEvidence (with new fields)
Loader-->>Test: HermesScenarioFixture
Test->>Backend: FixtureHermesBackend(fixture)
Test->>Tool: get_hermes_workflow_run(session_id)
Tool->>Backend: backend.get_workflow_run(session_id)
Backend-->>Tool: "{source, available, session_id, workflow_id, runs, diverging_steps}"
Tool-->>Test: evidence dict
Test->>Tool: get_hermes_approval_events(session_id)
Tool->>Backend: backend.get_approval_events(session_id)
Backend-->>Tool: "{source, available, session_id, events}"
Test->>Tool: get_hermes_audit_trail / get_hermes_rbac_state / get_hermes_credential_state
Tool->>Backend: "backend.get_*()"
Backend-->>Tool: envelope with caller-supplied session_id
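The envelope shape in the diagram, where the backend echoes the caller-supplied session_id, can be illustrated with a small stand-in. This is a sketch under assumptions: `SketchFixtureBackend` is a hypothetical class, not the real FixtureHermesBackend, and the `"fixture"` value for `source` is invented for the example. Only the envelope keys come from the diagram.

```python
from typing import Any, Optional


class SketchFixtureBackend:
    """Hypothetical stand-in that mimics the envelope shape in the diagram."""

    def __init__(self, approval_events: Optional[list] = None) -> None:
        # None models a scenario that ships no approval-events fixture.
        self._approval_events = approval_events

    def get_approval_events(self, session_id: str) -> dict[str, Any]:
        return {
            "source": "fixture",  # assumed label for fixture-backed data
            "available": self._approval_events is not None,
            # Echo the caller's session_id so every envelope stays
            # consistent with the request that produced it.
            "session_id": session_id,
            "events": list(self._approval_events or []),
        }
```

Echoing the requested session_id, instead of a value stored in the fixture, means the caller never receives an envelope labelled with a different session than the one it asked about.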
Reviews (3): Last reviewed commit: "feat(hermes): add controls synthetic sce..."
490dab1 to d4b6794
@greptile review
Fixes #1512
Summary
Adds controls/security-focused Hermes synthetic RCA scenarios and corresponding e2e investigation coverage.
Added synthetic scenarios
Scenario coverage
Each scenario includes:
- scenario.yml
- alert.json
- answer.yml
- hermes_config.json
- hermes_runtime_state.json
- hermes_session_log.json

Added evidence fixture coverage for
- hermes_workflow_run
- hermes_approval_events
- hermes_audit_trail
- hermes_rbac_state
- hermes_credential_state

Schema / loader wiring
Updated tests/synthetic/hermes_rca/hermes_schemas.py
Updated tests/synthetic/hermes_rca/scenario_loader.py

Root-cause taxonomy
Updated app/types/root_cause_categories.py with Hermes RCA categories for:
- missing_determinism_control
- missing_approval_gate
- missing_audit_trail
- missing_rbac
- missing_credential_isolation

Backend / tool exposure
Updated tests/synthetic/mock_hermes_backend/backend.py
Updated app/tools/HermesSessionEvidenceTool/__init__.py
Exposed new investigation tools:
- get_hermes_workflow_run
- get_hermes_approval_events
- get_hermes_audit_trail
- get_hermes_rbac_state
- get_hermes_credential_state

Added e2e coverage
New controls-focused e2e investigation tests under:
tests/e2e/hermes/controls/
The assertions intentionally allow a small set of semantically-equivalent RCA categories during live investigation execution.
The tests also validate scenario-specific reasoning/evidence signals instead of relying only on category matching.
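The two-part assertion strategy described above (a tolerant category match plus scenario-specific evidence signals) might look roughly like this. It is a sketch only: the helper name `rca_failures`, its parameters, and the idea of matching signals as case-insensitive substrings of the report are assumptions, not the actual test code under tests/e2e/hermes/controls/.

```python
from typing import Iterable


def rca_failures(
    category: str,
    report: str,
    accepted: Iterable[str],
    signals: Iterable[str],
) -> list[str]:
    """Collect reasons an investigation result fails the scenario's checks."""
    failures: list[str] = []
    accepted_set = set(accepted)
    # Tolerant category match: any semantically-equivalent category passes.
    if category not in accepted_set:
        failures.append(
            f"category {category!r} not in {sorted(accepted_set)}"
        )
    # Evidence signals: the report must mention each scenario-specific cue.
    lowered = report.lower()
    for signal in signals:
        if signal.lower() not in lowered:
            failures.append(f"missing evidence signal: {signal!r}")
    return failures
```

A test would then assert that the returned list is empty, so a failing run reports every unmet expectation at once instead of stopping at the first.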
Validation performed
I also attempted to validate the live e2e investigation flow with Anthropic enabled.
The scenarios successfully executed through the real investigation pipeline path, but full live e2e completion could not be verified because the Anthropic API quota limit was reached during runtime testing.