Skip to content

feat: GSAR Agent integration — Agent(gsar=GSARConfig(...))#19

Merged
fede-kamel merged 1 commit into
mainfrom
feat/gsar-agent-integration-github
Apr 30, 2026
Merged

feat: GSAR Agent integration — Agent(gsar=GSARConfig(...))#19
fede-kamel merged 1 commit into
mainfrom
feat/gsar-agent-integration-github

Conversation

@fede-kamel
Copy link
Copy Markdown
Contributor

Wires the typed-grounding layer onto the agent loop. After the agent completes, a configured GSAR judge scores the final answer + tool-execution history; verdict surfaces on AgentResult as gsar_judgment / gsar_score / gsar_decision.

Single-pass v1 — promised by gsar.py's docstring once the layer stabilised (proven 30/30 over 80 prior runs in #18). Full Algorithm-1 outer loop stays available via locus.reasoning.gsar_evaluator.

Usage

from locus.agent import Agent
from locus.agent.config import GSARConfig
from locus.reasoning.gsar_judge import StructuredOutputGSARJudge

agent = Agent(
    model='openai:gpt-4o-mini',
    tools=[lookup_metric],
    gsar=GSARConfig(
        judge=StructuredOutputGSARJudge(model=judge_model),
        contradiction_penalty=0.5,    # ρ
        tau_proceed=0.80,
        tau_regenerate=0.65,
    ),
)
result = agent.run_sync('What is the CPU on db-prod-1?')
print(result.gsar_decision)   # 'proceed' | 'regenerate' | 'replan' | 'abstain'
print(result.gsar_score)      # float in [0, 1]
print(result.gsar_judgment)   # full JudgeOutput with the 4-way partition

Surface

  • locus.agent.config.GSARConfig — judge, ρ, τ_proceed/τ_regenerate, weight_map, fail_on_low_score.
  • AgentConfig.gsar, Agent(gsar=...) kwarg.
  • AgentResult.gsar_judgment / gsar_score / gsar_decision.
  • Agent._run_gsar_judgment — assembles evidence from state.tool_executions, runs judge, recomputes S, returns δ.

Robustness

  • Judge exception → (None, None, None). Agent does not crash.
  • Judge unset → default StructuredOutputGSARJudge over the primary model with a doc note that this is rarely the right production choice.

Tests

  • tests/unit/test_agent_gsar.py (14 tests): config validation, fields stay None when unset, proceed/replan/abstain decisions surface, evidence corpus assembly, judge-exception fallback, ρ override, custom thresholds.
  • tests/integration/test_agent_gsar_live.py (2 tests, OPENAI_API_KEY gated): grounded answer with @tool → non-replan; ungrounded answer → non-proceed + ≥1 non-grounded claim.

Stability proven

30 iterations of the full GSAR live suite (8 evaluator + 2 new agent tests = 10 per run):

Metric Value
Runs 30 / 30 pass
Tests 297 passes, 3 honest skips, 0 failures across 300 attempts
Wall clock 2080s total (~70s/run)

Drive-by

test_gsar_recovery_then_proceed_live_cycle: when the live judge accepts a contradicted-claim report on the first iteration, the loop goes straight to proceed without recovery. Now skips with a clear message instead of failing on a vacuously-truthful premise.

Validation

  • 3193 unit tests pass (14 new), no regressions.
  • 30/30 GSAR live suite runs pass.
  • hatch run lint clean.

Test plan

  • CI runs tests/unit/test_agent_gsar.py cleanly.

feat: GSAR Agent integration — Agent(gsar=GSARConfig(...))

See merge request saas-observ-eng/locus!103
@oracle-contributor-agreement oracle-contributor-agreement Bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Apr 30, 2026
@fede-kamel fede-kamel merged commit 3fa6745 into main Apr 30, 2026
1 check passed
@fede-kamel fede-kamel deleted the feat/gsar-agent-integration-github branch May 13, 2026 04:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant