NVIDIA NeMo Relay

NVIDIA NeMo Relay helps you see and control what happens inside agent runs without rewriting the agent stack you already have. It gives coding agents, applications, framework integrations, middleware, and observability backends a shared runtime for scopes, policy, plugins, and lifecycle events.

Where To Start

| Goal | Start With... |
| --- | --- |
| Observe Codex, Claude Code, Cursor, or Hermes locally via CLI | Quick Start CLI |
| Instrument app-owned LLM or tool calls | Quick Start Application |
| Use LangChain, LangGraph, Deep Agents, or OpenClaw | Supported Integrations |
| Build a framework or provider integration | Integrate into Frameworks |
| Export ATOF, ATIF, OpenTelemetry, or OpenInference | Observability Plugin |
| Package reusable middleware or exporters | Build Plugins |
| Develop or test this repository from source | CONTRIBUTING.md |

Quick Start CLI

A good first step is to record a real agent run to disk. Once Relay is writing raw events and a trajectory file, there is something concrete to inspect, debug, and build from.

Local Agent Trajectory

This walkthrough is a short end-to-end setup: install nemo-relay-cli, turn on local exporters, run either Codex or Claude Code through Relay, and check that Relay wrote both raw events and normalized trajectories.

1. Install the CLI

cargo install nemo-relay-cli

If you use cargo-binstall, you can also install the CLI with:

cargo binstall nemo-relay-cli

2. Enable Local Observability Output

From the project directory you want to observe, open the project-scoped plugin editor:

nemo-relay plugins edit --project

The editor creates or updates the nearest project plugin file at .nemo-relay/plugins.toml. In the menu:

  1. Enable the Observability component.

  2. Open ATOF and toggle the section [on].

    Optionally set:

    • output_directory to .nemo-relay/atof
    • filename to events.jsonl
    • mode to overwrite
  3. Open ATIF and toggle the section [on].

    Optionally set:

    • output_directory to .nemo-relay/atif
    • filename_template to trajectory-{session_id}.json
  4. Press p to preview the generated TOML.

  5. Press s to save.

Note

Use nemo-relay plugins edit without --project only if you want these exporter settings in a user-level Relay config instead of a specific project.

3. Run Codex or Claude Code Through Relay

Use whichever host CLI is installed on your machine. For example:

nemo-relay codex -- exec "Summarize this repository."
nemo-relay claude -- "Summarize this repository."

Refer to the full Quick Start CLI docs for more options.

The transparent wrapper starts a local Relay gateway, injects host-specific hook and provider settings for that launched process, then shuts the gateway down when the agent exits.
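When the wrapper is launched from a script or test harness rather than an interactive shell, it helps to build the command as an argument list so the prompt is not re-split by a shell. The sketch below uses only the two invocations shown above; the function names relay_command and run_through_relay are hypothetical, and it makes no assumption about what exit code the wrapper returns.

```python
import shutil
import subprocess


def relay_command(host: str, *host_args: str) -> list[str]:
    """Build `nemo-relay <host> -- <host args...>` as an argument list."""
    return ["nemo-relay", host, "--", *host_args]


def run_through_relay(host: str, *host_args: str) -> int:
    """Run the host CLI through Relay and return the process exit code."""
    if shutil.which("nemo-relay") is None:
        raise FileNotFoundError("nemo-relay is not on PATH; install the CLI first")
    # check=False: report the exit code to the caller instead of raising.
    return subprocess.run(relay_command(host, *host_args), check=False).returncode


if __name__ == "__main__":
    # Only prints the command; call run_through_relay(...) to launch the agent.
    print(relay_command("codex", "exec", "Summarize this repository."))
```

Everything after the -- separator is passed as arguments for the host CLI, which is why the Codex example includes exec and the Claude Code example does not.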

Warning

Codex users may need to review and activate generated hooks before events appear. Using the Codex Desktop App also adds further complications. Refer to the Codex CLI guide for the current hook activation caveat and troubleshooting steps.

4. Verify the Run

After the run exits, check that raw events and trajectory files were written. If you used the optional output directories and file names from step 2:

test -s .nemo-relay/atof/events.jsonl
ls .nemo-relay/atif/*.json
for file in .nemo-relay/atif/*.json; do
  python3 -m json.tool "$file" >/dev/null
done

Then verify that at least one raw ATOF 0.1 event exists:

python3 - <<'PY'
from pathlib import Path
import json

events_path = Path(".nemo-relay/atof/events.jsonl")
events = [
    json.loads(line)
    for line in events_path.read_text().splitlines()
    if line.strip()
]

assert events, "no ATOF events were written"
assert any(event.get("atof_version") == "0.1" for event in events), "no ATOF 0.1 events found"
print(f"validated {len(events)} ATOF event(s)")
PY

A successful run creates two kinds of output to inspect:

  • .nemo-relay/atof/events.jsonl as the raw canonical event stream.
  • One or more .nemo-relay/atif/*.json trajectory files for analysis and evaluation workflows.
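For a quick look at both outputs, the checks above can be folded into one summary. This is a sketch under the same assumptions as the verification steps: the optional paths from step 2 were used, and each ATOF line is a JSON object that may carry an atof_version field. Nothing else about the event or trajectory schema is assumed, and the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path
import json


def atof_version_counts(events_path: Path) -> Counter:
    """Count events per `atof_version`; events lacking the field count under None."""
    counts: Counter = Counter()
    with events_path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                counts[json.loads(line).get("atof_version")] += 1
    return counts


def trajectory_files(atif_dir: Path) -> list[tuple[str, int]]:
    """Return (file name, size in bytes) for each JSON file that parses cleanly."""
    results = []
    for path in sorted(atif_dir.glob("*.json")):
        json.loads(path.read_text(encoding="utf-8"))  # raises on malformed JSON
        results.append((path.name, path.stat().st_size))
    return results


if __name__ == "__main__":
    events = Path(".nemo-relay/atof/events.jsonl")
    atif = Path(".nemo-relay/atif")
    if events.is_file():
        print(dict(atof_version_counts(events)))
    if atif.is_dir():
        for name, size in trajectory_files(atif):
            print(f"{name}: {size} bytes")
```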

Tip

If raw ATOF events exist but LLM spans are missing, provider traffic probably isn't flowing through the Relay gateway. If ATIF is missing, make sure the agent session or turn ended and the output directory is writable.

Next Steps

Go to the full NeMo Relay CLI docs for persistent host plugin installation, gateway configuration, exporter options, and agent-specific diagnostics.

Tip

Start by trusting the raw Agent Trajectory Observability Format (ATOF) JSONL. It shows the lifecycle events Relay actually captured before anything is translated into Agent Trajectory Interchange Format (ATIF), OpenTelemetry, or OpenInference output.
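One way to get oriented in the raw JSONL before reading the format documentation is to count which top-level keys actually appear. The sketch below assumes only that each non-empty line is a JSON value; it does not assume any ATOF field names, and key_frequencies is a hypothetical helper, not a Relay API.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path
import json


def key_frequencies(events_path: Path, limit: int | None = None) -> Counter:
    """Count how often each top-level key appears across the first `limit` events."""
    counts: Counter = Counter()
    seen = 0
    with events_path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines, as the verification script does
            event = json.loads(line)
            if isinstance(event, dict):
                counts.update(event.keys())
            seen += 1
            if limit is not None and seen >= limit:
                break
    return counts


if __name__ == "__main__":
    path = Path(".nemo-relay/atof/events.jsonl")
    if path.is_file():
        for key, count in key_frequencies(path).most_common():
            print(f"{key}: {count}")
```

Keys that appear on every event are likely envelope fields, while keys that appear on only some events point to event-specific payloads worth looking up in the format reference.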

Quick Start Application

If you write the code that calls the model or tool, install the binding for the appropriate language and route that boundary through Relay directly.

Application Trajectory

Install Relay for the application language:

# Python
uv add nemo-relay

# Node.js
# Requires Node.js 24 or newer.
npm install nemo-relay-node

# Rust
cargo add nemo-relay

Then run a minimal example workflow for that binding. Refer to the Quick Start Application guide for the per-language examples.

What Relay Adds

Relay is the liaison between agent systems. A production application may combine NeMo Agent Toolkit, LangChain, LangGraph, provider SDKs, custom harness code, NeMo Guardrails, tracing systems, and evaluation pipelines. Relay gives those pieces one runtime contract instead of asking every layer to invent its own wrappers and trace vocabulary.

Relay gives those systems:

  • Scopes so runs, turns, tools, LLM calls, and subagents have clear ownership, parent-child lineage, cleanup boundaries, and request isolation.
  • Managed LLM and tool calls so the same lifecycle and middleware rules apply around each callback.
  • Middleware for the places where Relay must block, sanitize, transform, route, retry, or replace execution.
  • Plugins so reusable observability, guardrail, adaptive, and exporter behavior can be turned on from configuration.
  • Events and subscribers so raw ATOF, normalized ATIF, OpenTelemetry, and OpenInference output all come from the same runtime stream.

Relay does not replace your frameworks, model providers, application logic, observability backend, or guardrail authoring system. It gives those systems a common boundary to meet.

flowchart LR
    App[Application, Framework, or CLI Harness]

    subgraph Runtime[NeMo Relay Runtime]
        direction TB
        Scopes[Scopes]
        Middleware[Middleware]
        Plugins[Plugins]
        Events[Lifecycle Events]
    end

    Output[Subscribers and Exporters]

    App --> Scopes
    App --> Middleware
    Plugins --> Middleware
    Scopes --> Events
    Middleware --> Events
    Events --> Output

Support Status

Note

The main supported paths today are Rust, Python, and Node.js. Go, WebAssembly, and raw C FFI are available for source-first users, but they are still experimental.

The following table shows which language bindings and CLI features are currently supported:

| Binding | Status | Notes |
| --- | --- | --- |
| Python | Fully supported | Documented with Quick Start and Guides. |
| Node.js | Fully supported | Documented with Quick Start and Guides. |
| Rust | Fully supported | Documented with Quick Start and Guides. |
| NeMo Relay CLI | Supported | Local observability and hook-backed security are supported; optimization is partial and host-dependent. |
| Go | Experimental | Source-first under go/nemo_relay. |
| WebAssembly | Experimental | Source-first under crates/wasm. |
| FFI | Experimental | Source-first under crates/ffi. |

Agent Harness Support

The CLI support matrix separates the supported CLI surface from host-specific coverage.

  • Observability works for the listed harnesses.
  • Security is supported when the host exposes blocking hooks.
  • Optimization remains partial and host-dependent.

| Agent | Observability | Security | Optimization | Notes |
| --- | --- | --- | --- | --- |
| Claude Code | Yes | Yes | Partial | Hook forwarding, pre-tool blocking, and gateway-routed LLM observability are supported. |
| Codex | Yes | Yes | Partial | Hook activation is required; missing session-end behavior limits trajectory finalization and full optimization coverage. |
| Hermes Agent | Yes | Yes | Partial | Hook forwarding, pre-tool blocking, and gateway-routed or hook-backed LLM observability are supported. |
| Cursor | Partial | Limited | No | Missing hooks under cursor-agent and manual gateway routing limit full feature coverage. |

Public API Integrations

Use these integrations when the framework exposes stable callbacks, middleware, or plugin hooks that preserve enough lifecycle fidelity.

| Agent / Library | Observability | Security | Optimization | Notes |
| --- | --- | --- | --- | --- |
| LangChain | Yes | Yes | Yes | Wrapped tool and LLM calling. |
| LangGraph | Yes | Yes | Yes | Wrapped tool and LLM calling. |
| Deep Agents | Yes | Yes | Yes | Wrapped tool and LLM calling. |
| OpenClaw | Yes | Partial | No | Hook-backed telemetry with pre-tool guardrails. Managed execution rewrites require the patch-based integration. |

The Python nemo-relay package ships extras for LangChain, LangGraph, and Deep Agents:

uv add "nemo-relay[langchain,langgraph,deepagents]"

Refer to Supported Integrations for setup guides and current caveats.

Patch-Based Integrations

Patch-based integrations are experimental samples maintained against pinned upstream checkouts. Use third_party/README.md for the clone, checkout, and patch-application workflow.

| Integration | Observability | Security | Optimization | Notes |
| --- | --- | --- | --- | --- |
| LangChain, LangGraph, LangChain NVIDIA | Yes | Yes | Yes | Directly patches behavior into code. |
| opencode | Yes | Yes | Yes | Directly patches behavior into code. |
| OpenClaw | Yes | Yes | Yes | Adds middleware support to OpenClaw and a built-in plugin. |
| Hermes Agent | Yes | Yes | Yes | Directly patches behavior into code. |

Documentation

End-user documentation lives at NVIDIA NeMo Relay documentation.

Important local entry points:

For source builds, tests, and contribution workflow, refer to CONTRIBUTING.md.

Roadmap

  • NemoClaw support and integration for managed tool and LLM execution flows.
  • Deeper NVIDIA NeMo ecosystem integration across agent, guardrail, evaluation, and observability workflows.
  • Expanded adaptive optimization capabilities for performance-aware scheduling, hints, and cache behavior.
  • First-party plugins and packages for common agent runtimes and frameworks where upstream extension points allow it.

License

NVIDIA NeMo Relay is licensed under the Apache License 2.0.
