feat: GSAR Agent integration — Agent(gsar=GSARConfig(...)) by fede-kamel · Pull Request #19 · oracle-samples/locus

fede-kamel · 2026-04-30T20:56:04Z

Wires the typed-grounding layer onto the agent loop. After the agent completes, a configured GSAR judge scores the final answer + tool-execution history; verdict surfaces on AgentResult as gsar_judgment / gsar_score / gsar_decision.

Single-pass v1 — promised by gsar.py's docstring once the layer stabilised (proven 30/30 over 80 prior runs in #18). Full Algorithm-1 outer loop stays available via locus.reasoning.gsar_evaluator.

Usage

from locus.agent import Agent
from locus.agent.config import GSARConfig
from locus.reasoning.gsar_judge import StructuredOutputGSARJudge

agent = Agent(
    model='openai:gpt-4o-mini',
    tools=[lookup_metric],
    gsar=GSARConfig(
        judge=StructuredOutputGSARJudge(model=judge_model),
        contradiction_penalty=0.5,    # ρ
        tau_proceed=0.80,
        tau_regenerate=0.65,
    ),
)
result = agent.run_sync('What is the CPU on db-prod-1?')
print(result.gsar_decision)   # 'proceed' | 'regenerate' | 'replan' | 'abstain'
print(result.gsar_score)      # float in [0, 1]
print(result.gsar_judgment)   # full JudgeOutput with the 4-way partition

Surface

locus.agent.config.GSARConfig — judge, ρ, τ_proceed/τ_regenerate, weight_map, fail_on_low_score.
AgentConfig.gsar, Agent(gsar=...) kwarg.
AgentResult.gsar_judgment / gsar_score / gsar_decision.
Agent._run_gsar_judgment — assembles evidence from state.tool_executions, runs judge, recomputes S, returns δ.

Robustness

Judge exception → (None, None, None). Agent does not crash.
Judge unset → default StructuredOutputGSARJudge over the primary model with a doc note that this is rarely the right production choice.

Tests

tests/unit/test_agent_gsar.py (14 tests): config validation, fields stay None when unset, proceed/replan/abstain decisions surface, evidence corpus assembly, judge-exception fallback, ρ override, custom thresholds.
tests/integration/test_agent_gsar_live.py (2 tests, OPENAI_API_KEY gated): grounded answer with @tool → non-replan; ungrounded answer → non-proceed + ≥1 non-grounded claim.

Stability proven

30 iterations of the full GSAR live suite (8 evaluator + 2 new agent tests = 10 per run):

Metric	Value
Runs	30 / 30 pass
Tests	297 passes, 3 honest skips, 0 failures across 300 attempts
Wall clock	2080s total (~70s/run)

Drive-by

test_gsar_recovery_then_proceed_live_cycle: when the live judge accepts a contradicted-claim report on the first iteration, the loop goes straight to proceed without recovery. Now skips with a clear message instead of failing on a vacuously-truthful premise.

Validation

3193 unit tests pass (14 new), no regressions.
30/30 GSAR live suite runs pass.
hatch run lint clean.

Test plan

CI runs tests/unit/test_agent_gsar.py cleanly.

feat: GSAR Agent integration — Agent(gsar=GSARConfig(...)) See merge request saas-observ-eng/locus!103

Merge branch 'feat/gsar-agent-integration' into 'main'

38cd0da

feat: GSAR Agent integration — Agent(gsar=GSARConfig(...)) See merge request saas-observ-eng/locus!103

oracle-contributor-agreement Bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Apr 30, 2026

fede-kamel merged commit 3fa6745 into main Apr 30, 2026
1 check passed

fede-kamel mentioned this pull request May 1, 2026

docs: rewrite README — content-first, drop GIFs, link to website #23

Merged

2 tasks

fede-kamel deleted the feat/gsar-agent-integration-github branch May 13, 2026 04:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: GSAR Agent integration — Agent(gsar=GSARConfig(...))#19

feat: GSAR Agent integration — Agent(gsar=GSARConfig(...))#19
fede-kamel merged 1 commit into
mainfrom
feat/gsar-agent-integration-github

fede-kamel commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

fede-kamel commented Apr 30, 2026

Usage

Surface

Robustness

Tests

Stability proven

Drive-by

Validation

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant