Remove trace-aware LLM client helper

Cody McCodePants · Cody McCodePants · commit 84861d1bbf17 · 2026-06-13T17:41:32.000+08:00
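Custom agents that called `context.trace.llm_client(...)` can still record their model calls. The sketch below wraps any async chat call with the recorder methods that the removed `TraceAwareLLMClient.chat` used: `record_llm_request`, `record_llm_response` and `record_error`. It assumes those methods accept the keyword arguments shown in the removed code. `traced_chat`, `LLMTraceRecorder` and `ChatCall` are hypothetical names used only for illustration. They are not part of the NetOpsBench API.

```python
from typing import Any, Awaitable, Callable, Protocol


class LLMTraceRecorder(Protocol):
    """Hypothetical protocol: the recorder methods this sketch calls (assumed signatures)."""

    def record_llm_request(self, messages: list[dict[str, Any]], *, model: str, provider: str) -> Any: ...

    def record_llm_response(self, response: Any, *, run_id: Any, model: str, provider: str) -> Any: ...

    def record_error(self, *, stage: str, error: BaseException, run_id: Any) -> Any: ...


# Hypothetical alias for whatever async SDK call the agent already makes.
ChatCall = Callable[..., Awaitable[Any]]


async def traced_chat(
    recorder: LLMTraceRecorder,
    call_model: ChatCall,
    messages: list[dict[str, Any]],
    *,
    model: str,
    provider: str = "openai",
    **kwargs: Any,
) -> Any:
    """Record one private model call around an arbitrary async chat function."""
    # Log the outgoing messages first; the returned id pairs request and response.
    run_id = recorder.record_llm_request(messages, model=model, provider=provider)
    try:
        response = await call_model(model=model, messages=messages, **kwargs)
    except Exception as exc:
        # Keep failed calls visible in the trace, then let the agent handle them.
        recorder.record_error(stage="llm", error=exc, run_id=run_id)
        raise
    recorder.record_llm_response(response, run_id=run_id, model=model, provider=provider)
    return response
```

With the OpenAI SDK, one possible call is `await traced_chat(context.trace, client.chat.completions.create, messages, model="gpt-5.5", temperature=0)`, where `client` is an `AsyncOpenAI` instance. The agent now owns client construction, which the removed helper used to do internally.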
diff --git a/docs/content/docs/build-your-agent/custom-agents.mdx b/docs/content/docs/build-your-agent/custom-agents.mdx
@@ -68,9 +68,9 @@ The reference LLM agent builds a compact prompt from topology size, symptom keys
 
 Return structured fields rather than only natural language. Free-form explanations are useful for review, but scoring depends on verdict, fault type, and location fields.
 
-For reproducibility, NetOpsBench saves a per-case runtime trace beside the raw scenario result. Agents that need private LLM message traces should use `context.trace.llm_client(...)` or a NetOpsBench-provided framework callback such as `context.trace.langchain_callback()`. The harness writes ATIF v1.7 `trajectory.atif.json` artifacts for Harbor-style inspection while keeping ground truth out of the agent trajectory; scoring details are linked separately through `traces/results.jsonl`. Use `netopsbench trace view` to sync trace-enabled runs into the local Harbor viewer cache, or `netopsbench trace view <run_id>` to ensure a specific saved run is available in the viewer.
+For reproducibility, NetOpsBench saves a per-case runtime trace beside the raw scenario result. The bundled reference agent captures private LLM and tool events by attaching `context.trace.langchain_callback()` to its LangChain-compatible runtime. Custom non-LangChain agents can use advanced manual recorder methods such as `context.trace.record_llm_request(...)` and `context.trace.record_llm_response(...)` when they need private model calls in the trace. The harness writes ATIF v1.7 `trajectory.atif.json` artifacts for Harbor-style inspection while keeping ground truth out of the agent trajectory; scoring details are linked separately through `traces/results.jsonl`. Use `netopsbench trace view` to sync trace-enabled runs into the local Harbor viewer cache, or `netopsbench trace view <run_id>` to ensure a specific saved run is available in the viewer.
 
-Trace storage preserves visible agent-environment interactions with secret redaction and per-field size limits. NetOpsBench does not monkeypatch arbitrary LLM SDKs, so fully private model prompts and responses are captured only when the agent uses the trace-aware client or callback.
+Trace storage preserves visible agent-environment interactions with secret redaction and per-field size limits. NetOpsBench does not monkeypatch arbitrary LLM SDKs, so fully private model prompts and responses are captured only when the agent uses a supported framework callback or the manual recorder methods.
 
 ## Reference agent
 
diff --git a/docs/content/docs/build-your-agent/python-api-guide.mdx b/docs/content/docs/build-your-agent/python-api-guide.mdx
@@ -71,7 +71,7 @@ The `scenario_summaries[*].raw_result_path` fields point to raw JSON files for c
 
 Agent traces are saved by default and can be disabled for a run with `trace=False` or by setting `NETOPSBENCH_TRACE=0`. Disabling trace prevents private runtime trace collection and sidecar artifact creation. Ground truth and score details are written to `traces/results.jsonl`, not into the agent trajectory.
 
-NetOpsBench stores visible prompts, model messages, tool calls, and observations with secret redaction and per-field truncation. For private LLM message capture, custom agents should call models through `context.trace.llm_client(...)` or attach `context.trace.langchain_callback()` to LangChain-compatible runtimes. Set `NETOPSBENCH_TRACE_MAX_FIELD_CHARS` to tune truncation.
+NetOpsBench stores visible prompts, model messages, tool calls, and observations with secret redaction and per-field truncation. The bundled `MinimalDeepAgent` attaches `context.trace.langchain_callback()` to its LangChain-compatible runtime so private LLM messages and tool events flow into the same recorder. Non-LangChain agents can use the advanced manual recorder methods, such as `context.trace.record_llm_request(...)` and `context.trace.record_llm_response(...)`, when they need to capture private model calls. Set `NETOPSBENCH_TRACE_MAX_FIELD_CHARS` to tune truncation.
 
 Open a completed run directly in the Harbor viewer:
 
diff --git a/docs/content/docs/quickstart.mdx b/docs/content/docs/quickstart.mdx
@@ -69,7 +69,7 @@ Supported provider presets:
 
 | `--vendor` | Model | Environment variable |
 |---|---|---|
-| `openai` | gpt-5.4 | `OPENAI_API_KEY` |
+| `openai` | gpt-5.5 | `OPENAI_API_KEY` |
 | `minimax` | MiniMax-M3 | `MINIMAX_API_KEY` |
 | `deepseek` | deepseek-v4-pro | `DEEPSEEK_API_KEY` |
 | `zhipu` | glm-5.1 | `ZHIPU_API_KEY` |
diff --git a/examples/agents/README.md b/examples/agents/README.md
@@ -47,7 +47,7 @@ To switch provider and endpoint explicitly:
 ```python
 agent = MinimalDeepAgent(
     vendor="openai",
-    model="gpt-5.4",
+    model="gpt-5.5",
     base_url="https://api.openai.com/v1",
 )
 ```
diff --git a/examples/agents/minimal_deepagent/agent.py b/examples/agents/minimal_deepagent/agent.py
@@ -9,7 +9,7 @@
 - ``providers/minimax.py``   — MiniMax-M3
 - ``providers/glm.py``       — ZhipuAI GLM-5.1
 - ``providers/deepseek.py``  — DeepSeek deepseek-v4-pro (thinking mode disabled)
-- ``providers/openai.py``    — OpenAI-compatible endpoint (gpt-5.4)
+- ``providers/openai.py``    — OpenAI-compatible endpoint (gpt-5.5)
 
 The shared output schema (``DiagnosisOutput``) lives in ``schema.py``.
 Shared runtime and result helpers live in ``providers/runtime.py`` and
@@ -20,7 +20,7 @@
 - ``minimax``   — MiniMax-M3 (default)
 - ``zhipu``    — GLM-5.1 (ZhipuAI)
 - ``deepseek`` — deepseek-v4-pro (DeepSeek)
-- ``openai``   — gpt-5.4 via OpenAI-compatible endpoint
+- ``openai``   — gpt-5.5 via OpenAI-compatible endpoint
 
 Dependencies (install with ``pip install deepagents langchain-openai langchain-mcp-adapters``):
 - deepagents
@@ -69,7 +69,7 @@ class MinimalDeepAgent:
         agent = MinimalDeepAgent(vendor="minimax")  # MiniMax-M3 (default)
         agent = MinimalDeepAgent(vendor="zhipu")    # GLM-5.1
         agent = MinimalDeepAgent(vendor="deepseek") # deepseek-v4-pro
-        agent = MinimalDeepAgent(vendor="openai")   # gpt-5.4
+        agent = MinimalDeepAgent(vendor="openai")   # gpt-5.5
 
     Explicit ``model``, ``base_url``, or ``api_key`` kwargs override the
     vendor preset.
diff --git a/netopsbench/agents/tracing.py b/netopsbench/agents/tracing.py
@@ -6,7 +6,7 @@
 import threading
 import uuid
 from datetime import UTC, datetime
-from typing import Any, cast
+from typing import Any
 
 from netopsbench.agents._trace_utils import jsonable as _jsonable
 
@@ -34,26 +34,6 @@ def disabled(cls) -> AgentTraceRecorder:
 
         return cls(enabled=False)
 
-    def llm_client(
-        self,
-        provider: str = "openai",
-        *,
-        model: str,
-        api_key: str | None = None,
-        base_url: str | None = None,
-        **client_kwargs: Any,
-    ) -> TraceAwareLLMClient:
-        """Return an OpenAI-compatible chat client that records visible messages."""
-
-        return TraceAwareLLMClient(
-            recorder=self,
-            provider=provider,
-            model=model,
-            api_key=api_key,
-            base_url=base_url,
-            client_kwargs=client_kwargs,
-        )
-
     def langchain_callback(self) -> Any | None:
         """Return a LangChain callback handler that writes into this recorder."""
 
@@ -371,57 +351,6 @@ def _accumulate_usage(self, usage: dict[str, int]) -> None:
             self._metrics["total_tokens"] += usage["total_tokens"]
             self._metrics["llm_call_count"] += usage["has_usage"] or 1
 
-
-class TraceAwareLLMClient:
-    """Small OpenAI-compatible chat client wrapper that records requests and responses."""
-
-    def __init__(
-        self,
-        *,
-        recorder: AgentTraceRecorder,
-        provider: str,
-        model: str,
-        api_key: str | None,
-        base_url: str | None,
-        client_kwargs: dict[str, Any],
-    ):
-        self.recorder = recorder
-        self.provider = provider
-        self.model = model
-        self.api_key = api_key
-        self.base_url = base_url
-        self.client_kwargs = dict(client_kwargs)
-
-    async def chat(self, messages: list[dict[str, Any]], **kwargs: Any) -> Any:
-        try:
-            from openai import AsyncOpenAI
-        except Exception as exc:  # pragma: no cover - depends on optional agent deps
-            raise RuntimeError("openai is required for context.trace.llm_client().chat()") from exc
-
-        run_id = self.recorder.record_llm_request(
-            messages,
-            model=self.model,
-            provider=self.provider,
-        )
-        client = AsyncOpenAI(api_key=self.api_key, base_url=self.base_url, **self.client_kwargs)
-        try:
-            response = await client.chat.completions.create(
-                model=self.model,
-                messages=cast(Any, messages),
-                **kwargs,
-            )
-        except Exception as exc:
-            self.recorder.record_error(stage="llm", error=exc, run_id=run_id)
-            raise
-        self.recorder.record_llm_response(
-            response,
-            run_id=run_id,
-            model=self.model,
-            provider=self.provider,
-        )
-        return response
-
-
 def _message_payload(message: Any, *, index: int) -> dict[str, Any]:
     payload: dict[str, Any] = {
         "index": index,
diff --git a/tests/test_example_agents.py b/tests/test_example_agents.py
@@ -199,7 +199,7 @@ def test_minimal_deepagent_openai_defaults_and_openai_key(monkeypatch):
     agent = MinimalDeepAgent(vendor="openai")
 
     assert agent.api_key == "openai-shell-key"
-    assert agent.model == "gpt-5.4"
+    assert agent.model == "gpt-5.5"
     assert agent.base_url == "https://api.openai.com/v1"
 
 
diff --git a/tests/test_session_tracing.py b/tests/test_session_tracing.py
@@ -1,54 +1,16 @@
 from __future__ import annotations
 
 import json
-import sys
 from datetime import UTC, datetime
 from types import SimpleNamespace
 
-import pytest
 from harbor.viewer.scanner import JobScanner
 
 from netopsbench.agents.base import DiagnosticContext
 from netopsbench.agents.tracing import AgentTraceRecorder
 from netopsbench.platform.session.tracing import TraceWriter, export_traces, load_trace_index
 
 
-@pytest.mark.asyncio
-async def test_trace_aware_llm_client_records_private_messages(monkeypatch):
-    captured = {}
-
-    class FakeCompletions:
-        async def create(self, **kwargs):
-            captured.update(kwargs)
-            return SimpleNamespace(
-                choices=[SimpleNamespace(message=SimpleNamespace(type="ai", content="diagnosis draft"))],
-                usage=SimpleNamespace(prompt_tokens=7, completion_tokens=3, total_tokens=10),
-            )
-
-    class FakeAsyncOpenAI:
-        def __init__(self, **kwargs):
-            captured["client"] = kwargs
-            self.chat = SimpleNamespace(completions=FakeCompletions())
-
-    monkeypatch.setitem(sys.modules, "openai", SimpleNamespace(AsyncOpenAI=FakeAsyncOpenAI))
-    recorder = AgentTraceRecorder()
-
-    response = await recorder.llm_client("openai", model="gpt-test", api_key="test-key").chat(
-        [{"role": "user", "content": "diagnose"}],
-        temperature=0,
-    )
-
-    assert response.choices[0].message.content == "diagnosis draft"
-    assert captured["model"] == "gpt-test"
-    assert captured["messages"] == [{"role": "user", "content": "diagnose"}]
-    assert recorder.metrics()["input_tokens"] == 7
-    assert recorder.metrics()["output_tokens"] == 3
-    steps = recorder.to_steps()
-    assert [step["message"] for step in steps] == ["diagnosis draft"]
-    assert steps[0]["duration_seconds"] is not None
-    assert steps[0]["extra"]["llm_request"]["messages"][0]["content"] == "diagnose"
-
-
 def test_disabled_trace_recorder_preserves_api_without_collecting():
     recorder = AgentTraceRecorder.disabled()
     run_id = recorder.record_llm_request([{"role": "user", "content": "diagnose"}], model="gpt-test")
