AgentBoundary v0.1 conformance evaluation of LangSmith — pre-publication review

Hi @hinthornw and LangSmith team —

I'm Sunil Prakash from JamJet Labs. I've been authoring an open spec for AI-action receipts called **AgentBoundary** (`jamjet-labs/agentboundary`, v0.1 stable + v0.2-alpha draft). It defines a portable, tamper-evident JSON receipt format that a third party can verify without trusting the runtime.

I built a 40-scenario conformance suite and graded four prominent agent-governance products against it, including LangSmith. I'm opening this issue to give the team a 7-day right-to-respond window before the comparative report is published.

**Headline up front:** LangSmith is the most full-featured *observability platform* in the comparison. The Run object captures everything needed to debug a multi-step agent call — `inputs`, `outputs`, full trace tree, `feedback_stats`, token costs, eval-dataset linkage. What it does not have, and does not claim to have, is a *normative artifact format for portable verification*. Policy decisions, approver identity, and execution outcomes live in team-defined tag/feedback conventions. The comparison reveals that LangSmith's design choice (capture-everything-flexibly) and AgentBoundary's (schema-versioned-receipts) target different audiences — engineers debugging vs third-party verifiers.

**What I did:**

- Read the [Run data format docs](https://docs.langchain.com/langsmith/run-data-format)
- Read the [Gateway announcement](https://www.langchain.com/blog/introducing-llm-gateway)
- Reviewed Fleet (partial public docs)
- Built an adapter at [`adapters/langsmith-gateway/`](https://github.com/jamjet-labs/agentboundary/tree/main/adapters/langsmith-gateway) that translates a LangSmith Run (with optional tag/feedback conventions documented in mapping.md) into an AgentBoundary v0.2-alpha receipt
- Ran all 40 conformance scenarios against adapter-translated receipts
- Per-scenario verdicts in [`results.md`](https://github.com/jamjet-labs/agentboundary/blob/main/adapters/langsmith-gateway/results.md); field-by-field mapping in [`mapping.md`](https://github.com/jamjet-labs/agentboundary/blob/main/adapters/langsmith-gateway/mapping.md)

**Headline numbers:**

```
PASS         15
PARTIAL      14
DOCS-ONLY     1
NOT COVERED   8
N/A           2
──────────────
TOTAL        40
```

15 PASS is the second-highest count of any vendor (after Microsoft AGT), driven mostly by Level 3 hashing scenarios: `Run.inputs` stores raw JSON that the adapter can canonicalise and hash directly. The 14 PARTIAL verdicts are the defining pattern: the data is somewhere in the Run, but its schema location varies by team convention. With a strict team convention (e.g., `decision:allow`, `policy:foo`, `env:prod` tags), many PARTIAL rows upgrade to PASS; without conventions, they fall to NOT COVERED.
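To make the hashing point concrete, here is a minimal sketch of why raw JSON in `Run.inputs` is enough for those scenarios. It assumes sorted-key, compact JSON hashed with SHA-256 as a stand-in for canonicalisation; the spec's actual canonical form may differ, and `canonical_hash` and the sample payloads are hypothetical names, not part of the adapter.

```python
import hashlib
import json


def canonical_hash(inputs: dict) -> str:
    """Hash a JSON-serialisable dict so that key order does not matter."""
    # Assumption: sorted keys + compact separators + UTF-8 approximate a
    # canonical form. This is NOT a full RFC 8785 (JCS) implementation.
    canonical = json.dumps(
        inputs, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical Run.inputs payloads that differ only in key order.
a = {"tool": "send_email", "args": {"to": "ops@example.com", "dry_run": False}}
b = {"args": {"dry_run": False, "to": "ops@example.com"}, "tool": "send_email"}

# Both serialise to the same canonical bytes, so the digests match.
assert canonical_hash(a) == canonical_hash(b)
```

Because the digest depends only on the stored JSON, a verifier can recompute it from the Run without any team-specific convention, which is why these rows do not depend on tags.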

**Specific notes (design choices worth acknowledging, not bugs):**

- LangSmith doesn't normalize a `policy.decision` field; teams use `Run.tags` conventions. A cross-team auditor can't reliably parse those tags without knowing each team's convention.
- `parent_run_id` builds a tree per trace; it does NOT chain across traces, so the AgentBoundary v0.2-alpha L4 chain-integrity check doesn't apply at trace boundaries.
- The Gateway adds spend caps + PII redaction at the LLM-request layer; these are control-plane primitives, orthogonal to per-action receipt structure.
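The tag-convention point in the first note can be sketched as follows. This is an illustration only: it assumes a `key:value` tag convention like the `decision:allow` / `policy:foo` / `env:prod` example, and `REQUIRED_KEYS` and `check_convention` are hypothetical names, not the adapter's or the suite's actual logic.

```python
from typing import Iterable

# Hypothetical convention: the three keys used in the example tags.
REQUIRED_KEYS = ("decision", "policy", "env")


def check_convention(tags: Iterable[str]) -> tuple[dict[str, str], list[str]]:
    """Return (parsed key/value tags, required keys that are missing)."""
    parsed: dict[str, str] = {}
    for tag in tags:
        # Split on the first colon only; free-form tags have no separator.
        key, sep, value = tag.partition(":")
        if sep and key and value:
            parsed[key] = value
    missing = [k for k in REQUIRED_KEYS if k not in parsed]
    return parsed, missing


# A team that follows the convention: every required key is recoverable.
found, missing = check_convention(["decision:allow", "policy:foo", "env:prod", "nightly"])
assert missing == []

# A team with free-form tags: nothing maps, so an auditor has nothing to parse.
found, missing = check_convention(["nightly", "experiment-7"])
assert missing == ["decision", "policy", "env"]
```

The parser itself is trivial; the difficulty is that nothing in the Run schema tells a third party which keys to expect, so the same code gives different coverage per team.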

**The ask:** if any per-scenario mapping or factual claim is wrong, corrections welcome via this issue or via PR to `jamjet-labs/agentboundary` within 7 days. After that, the report publishes with the data as currently mapped.

Happy to share §7.3 of the report (the LangSmith section, ~600 words) in advance if anyone wants an early look.

Thanks for LangSmith — the Run data model is one of the most thoughtful I've seen in this space, and the observability + eval primitives raise the bar.

— Sunil Prakash, JamJet Labs (sunil@jamjet.dev)
