
fix: repair Hermes gateway session fallback #189

Merged
rapids-bot[bot] merged 3 commits into NVIDIA:release/0.3 from bbednarski9:bbednarski/issue-176-gateway-session-fallback-clean on May 29, 2026

@bbednarski9 bbednarski9 commented May 29, 2026

Contributor

Overview

Fixes the Hermes gateway session fallback and tightens ATIF LLM dedupe so complementary hook/gateway spans are only collapsed when they represent the same physical request.

  • I confirm this contribution is my own work, or I have the right to submit it under this project's license.
  • I searched existing issues and open pull requests, and this does not duplicate existing work.

Details

  • Uses the OpenAI-compatible request body session_id as a gateway fallback when explicit session headers are absent.
  • Keeps the existing explicit Claude/Codex session fallbacks ahead of the OpenAI body fallback.
  • Requires complementary hook/gateway LLM spans to share a request signature or strong request correlation key before ATIF dedupes them.
  • Adds regression coverage for gateway fallback selection and concurrent overlapping LLM spans that should remain distinct.
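The precedence described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in crates/cli/src/alignment/mod.rs: the function name resolve_session_id, the header names, the Route::Other variant, and the idea of passing the body field as an already-extracted string are all assumptions made for the example. Only the two OpenAI route names and the trim-and-reject-empty behavior come from the PR text, and the Codex prompt-cache fallback is left out for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical route kinds; only the two OpenAI variant names appear in the PR.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Route {
    OpenAiChatCompletions,
    OpenAiResponses,
    Other,
}

/// Trim a candidate value and reject it when nothing is left.
fn non_empty(value: Option<&str>) -> Option<String> {
    let trimmed = value?.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Resolve a session id in precedence order (header names are placeholders).
fn resolve_session_id(
    route: Route,
    headers: &HashMap<String, String>,
    body_session_id: Option<&str>,
) -> Option<String> {
    // 1. Explicit session headers win over anything found in the body.
    for name in ["x-relay-session-id", "x-claude-session-id"] {
        if let Some(id) = non_empty(headers.get(name).map(String::as_str)) {
            return Some(id);
        }
    }
    // 2. Only OpenAI-compatible routes consult the body-level session_id.
    match route {
        Route::OpenAiChatCompletions | Route::OpenAiResponses => non_empty(body_session_id),
        Route::Other => None,
    }
}
```

With no session headers present, a body value of "  abc  " on an OpenAI route resolves to "abc", while the same value on any other route resolves to nothing.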

Where should the reviewer start?

Start with crates/cli/src/alignment/mod.rs for the gateway fallback behavior, then review crates/core/src/observability/atif.rs for the strengthened complementary hook/gateway dedupe guard. The focused regression tests are in crates/cli/tests/coverage/alignment_tests.rs, crates/cli/tests/coverage/gateway_tests.rs, and crates/core/tests/unit/atif_tests.rs.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • Closes #176

Summary by CodeRabbit

  • Bug Fixes
    • Session ID resolution enhanced to properly support OpenAI-compatible API request formats, including additional fallback to request body identifiers
    • LLM span correlation and deduplication logic improved with request-level identifier matching, enabling more accurate observability tracking and better event correlation for request tracing


  Add a gateway session-id fallback for OpenAI-compatible requests that
  carry a top-level session_id in the request body. Explicit relay headers,
  Claude headers, and Codex prompt-cache routing still take precedence.

  This lets CLI/plugin harnesses keep gateway-observed LLM calls aligned to
  the intended session even when no relay session header is available.

  Also adds regression coverage for:
  - OpenAI chat completions body

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
@bbednarski9 bbednarski9 requested a review from a team as a code owner May 29, 2026 21:18
@github-actions github-actions Bot added the size:M (PR is medium) and Bug (issue describes bug; PR fixes bug) labels May 29, 2026
@coderabbitai

coderabbitai Bot commented May 29, 2026


Walkthrough

The PR improves request identification across two layers: CLI routing now falls back to a generic OpenAI-compatible session_id field in request bodies, and the ATIF exporter now deduplicates overlapping hook and gateway LLM spans using request correlation keys extracted from event metadata. This addresses duplicate ATIF steps when both CLI-integrated agent hooks and gateway instrumentation observe the same request.

Changes

Request Correlation Enhancements

| Layer / File(s) | Summary |
| --- | --- |
| OpenAI body session_id fallback in CLI routing (crates/cli/src/alignment/mod.rs, crates/cli/tests/coverage/alignment_tests.rs, crates/cli/tests/coverage/gateway_tests.rs) | gateway_session_id now extracts and validates a trimmed session_id from the request body for OpenAI routes only (OpenAiChatCompletions and OpenAiResponses), falling back after explicit headers and provider-specific routes. New test gateway_session_id_accepts_openai_body_session_id_fallback validates trimming, empty/invalid rejection, and route-specific behavior. |
| ATIF request correlation keys for span deduplication (crates/core/src/observability/atif.rs, crates/core/tests/unit/atif_tests.rs) | LlmSpanCandidate gains a request_correlation_keys field populated from event metadata paths (api_call_id, llm_correlation_request_id, etc.). Complementary hook+gateway span matching now accepts spans as equivalent when request_signature differs but spans share at least one correlation key, eliminating duplicate ATIF steps for the same physical request. New test test_exporter_keeps_overlapping_non_exact_hook_and_gateway_spans_without_shared_request_key verifies behavior when spans lack shared keys. |
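
The matching rule summarized in the second row might be sketched as below. This is an illustrative guess at the shape of the guard, not the exporter's code: the struct is a stripped-down stand-in for LlmSpanCandidate, the field types, the helper candidate, the function name same_physical_request, and the "api_call_id:1" key format are all assumptions. Only the two field names and the "same signature or at least one shared key" rule come from the PR.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the exporter's span candidate; field types are assumed.
struct LlmSpanCandidate {
    request_signature: Option<String>,
    request_correlation_keys: HashSet<String>,
}

/// Hypothetical constructor used only to keep the example short.
fn candidate(signature: Option<&str>, keys: &[&str]) -> LlmSpanCandidate {
    LlmSpanCandidate {
        request_signature: signature.map(str::to_string),
        request_correlation_keys: keys.iter().map(|k| k.to_string()).collect(),
    }
}

/// Returns true when a hook span and a gateway span may be collapsed into one step.
fn same_physical_request(hook: &LlmSpanCandidate, gateway: &LlmSpanCandidate) -> bool {
    // Two present and equal signatures are sufficient on their own.
    let signatures_match = match (&hook.request_signature, &gateway.request_signature) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    };
    // Otherwise the spans must share at least one correlation key.
    signatures_match
        || !hook
            .request_correlation_keys
            .is_disjoint(&gateway.request_correlation_keys)
}
```

Under this sketch, two overlapping spans with different signatures and no key in common stay distinct, which is the case the new regression test is described as covering.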

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/NeMo-Relay#183: Both PRs modify ATIF span deduplication logic; PR #183 introduces/expands the LLM span dedupe pre-pass while this PR extends matching to use request correlation keys.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title follows Conventional Commits format with lowercase type 'fix', concise imperative summary, no scope, no breaking change indicator, under 72 characters, and no trailing period. |
| Description check | ✅ Passed | The description includes all required sections: Overview with confirmations, Details explaining changes, Where should the reviewer start with file guidance, and Related Issues with Closes keyword and issue number. |
| Linked Issues check | ✅ Passed | The PR directly addresses #176 by implementing gateway session fallback for OpenAI-compatible routes and tightening ATIF LLM deduplication to require shared request signatures or correlation keys for complementary hook/gateway spans. |
| Out of Scope Changes check | ✅ Passed | All code changes in alignment, ATIF deduplication, and test modules are scoped to fixing the gateway session fallback and preventing duplicate ATIF LLM steps, directly addressing the requirements in #176. |


@github-actions github-actions Bot added the lang:rust (PR changes/introduces Rust code) label May 29, 2026
@willkill07 willkill07 added this to the 0.3 milestone May 29, 2026
@github-actions

github-actions Bot commented May 29, 2026


@coderabbitai

coderabbitai Bot commented May 29, 2026


Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/core/tests/unit/atif_tests.rs`:
- Around line 2100-2113: The test currently asserts only that each function name
appears more than zero times; change the assertions to check exact counts (2
each) by reusing the existing closure function_name_count and asserting
function_name_count("openrouter") == 2 and
function_name_count("openai.chat_completions") == 2 so that with
trajectory.steps.len() == 4 you validate each function_name in
step.extra.ancestry appears exactly twice.
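
The exact-count check requested above might look like the following sketch. The Step struct, sample_steps, and count_function_name are simplified stand-ins invented for illustration. The real test reads function_name from step.extra.ancestry and uses the existing function_name_count closure, neither of which is reproduced here.

```rust
/// Simplified stand-in for an ATIF step; the real struct nests the name in extra.ancestry.
struct Step {
    function_name: String,
}

/// Hypothetical fixture: four steps, two per provider function name.
fn sample_steps() -> Vec<Step> {
    ["openrouter", "openai.chat_completions", "openrouter", "openai.chat_completions"]
        .iter()
        .map(|name| Step { function_name: name.to_string() })
        .collect()
}

/// Count how many steps carry the given function name.
fn count_function_name(steps: &[Step], name: &str) -> usize {
    steps.iter().filter(|s| s.function_name == name).count()
}

/// Exact counts catch a 3/1 split that a "more than zero" check would accept.
fn assert_exact_counts(steps: &[Step]) {
    assert_eq!(steps.len(), 4);
    assert_eq!(count_function_name(steps, "openrouter"), 2);
    assert_eq!(count_function_name(steps, "openai.chat_completions"), 2);
}
```

The design point is that with four steps in total, asserting only that each count is greater than zero would still pass if one span were wrongly deduplicated and another duplicated.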

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 1d2c80b3-1794-48b2-8abb-761cda1f4e38

📥 Commits

Reviewing files that changed from the base of the PR and between f1a69c2 and d1c6dc0.

📒 Files selected for processing (5)
  • crates/cli/src/alignment/mod.rs
  • crates/cli/tests/coverage/alignment_tests.rs
  • crates/cli/tests/coverage/gateway_tests.rs
  • crates/core/src/observability/atif.rs
  • crates/core/tests/unit/atif_tests.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Check / Run
  • GitHub Check: Preview docs
🧰 Additional context used
📓 Path-based instructions (11)
**/*.rs

📄 CodeRabbit inference engine (.agents/skills/add-binding-feature/SKILL.md)

Use snake_case naming convention for Rust identifiers (e.g., nemo_relay_tool_call)

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for all FFI work since it is Rust work
Run just test-rust to validate FFI changes
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting on FFI work

When Rust files changed as part of Go work, also run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all when Rust files are changed as part of Node work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files are changed as part of Node work
Run just test-rust when Rust files are changed as part of Node work

**/*.rs: Run cargo fmt --all to format all Rust code
Run cargo clippy --workspace --all-targets -- -D warnings to enforce all clippy lints as errors

**/*.rs: Run cargo fmt --all when Rust files changed as part of WebAssembly work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files changed as part of WebAssembly work

**/*.rs: If any Rust code changed, always run just test-rust
If any Rust code changed, also run cargo fmt --all
If any Rust code changed, also run cargo clippy --workspace --all-targets -- -D warnings
Run Rust formatting with cargo fmt --all
Run Rust linting with cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Keep SPDX headers on Rust source files. The project is Apache-2.0.
Use snake_case for Rust binding naming conventions.
Use Json = serde_json::Value in Rust-facing runtime APIs where the existing code expects JSON payloads.
Use Result<T> with FlowError in core runtime paths. Keep errors explicit and binding-appropriate at the wrapper layer.
Preserve async behavior on the existing tokio-based model i...

Files:

  • crates/cli/src/alignment/mod.rs
  • crates/cli/tests/coverage/alignment_tests.rs
  • crates/core/src/observability/atif.rs
  • crates/cli/tests/coverage/gateway_tests.rs
  • crates/core/tests/unit/atif_tests.rs
**/{Cargo.toml,**/*.rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Maintain consistency between Rust package names in Cargo.toml and their actual usage across the codebase

Files:

  • crates/cli/src/alignment/mod.rs
  • crates/cli/tests/coverage/alignment_tests.rs
  • crates/core/src/observability/atif.rs
  • crates/cli/tests/coverage/gateway_tests.rs
  • crates/core/tests/unit/atif_tests.rs
**/*.{h,hpp,c,cpp,rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Ensure FFI header and library naming follows consistent conventions across platform-specific builds

Files:

  • crates/cli/src/alignment/mod.rs
  • crates/cli/tests/coverage/alignment_tests.rs
  • crates/core/src/observability/atif.rs
  • crates/cli/tests/coverage/gateway_tests.rs
  • crates/core/tests/unit/atif_tests.rs
**/*.{rs,toml}

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update Rust crate names and module prefixes during coordinated rename operations

Files:

  • crates/cli/src/alignment/mod.rs
  • crates/cli/tests/coverage/alignment_tests.rs
  • crates/core/src/observability/atif.rs
  • crates/cli/tests/coverage/gateway_tests.rs
  • crates/core/tests/unit/atif_tests.rs
{crates/adaptive/**/*.rs,**/*test*.{rs,py,go,ts,js},**/*adaptive*test*.{rs,py,go,ts,js},docs/plugins/adaptive/**}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

Maintain documented and tested validation and report behavior for adaptive surfaces

Files:

  • crates/cli/tests/coverage/alignment_tests.rs
  • crates/cli/tests/coverage/gateway_tests.rs
  • crates/core/tests/unit/atif_tests.rs
{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}

⚙️ CodeRabbit configuration file

{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.

Files:

  • crates/cli/tests/coverage/alignment_tests.rs
  • crates/cli/tests/coverage/gateway_tests.rs
  • crates/core/tests/unit/atif_tests.rs
crates/core/src/observability/{atif,otel,openinference}.rs

📄 CodeRabbit inference engine (.agents/skills/maintain-observability/SKILL.md)

When changing event fields in ATIF, OpenTelemetry, or OpenInference observability surfaces, keep the core event model in crates/core/src/observability/atif.rs, crates/core/src/observability/otel.rs, and crates/core/src/observability/openinference.rs in sync

Files:

  • crates/core/src/observability/atif.rs
{crates/core,crates/adaptive}/**/*

📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)

Changes to crates/core or crates/adaptive must run the full language matrix

Files:

  • crates/core/src/observability/atif.rs
  • crates/core/tests/unit/atif_tests.rs
crates/core/**/*.rs

📄 CodeRabbit inference engine (.agents/skills/test-go-binding/SKILL.md)

If the change touched crates/core or shared runtime semantics, also use validate-change for broader validation

Files:

  • crates/core/src/observability/atif.rs
  • crates/core/tests/unit/atif_tests.rs
crates/{core,adaptive}/**

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

If crates/core or crates/adaptive changed, run the full matrix across Rust, Python, Go, Node.js, and WebAssembly

Files:

  • crates/core/src/observability/atif.rs
  • crates/core/tests/unit/atif_tests.rs
crates/{core,adaptive}/**/*.rs

⚙️ CodeRabbit configuration file

crates/{core,adaptive}/**/*.rs: Review the Rust runtime for async correctness, scope isolation, middleware ordering, and event lifecycle regressions.
Pay close attention to task-local/thread-local scope propagation, callback lifetimes, stream finalization, and root_uuid isolation.
Public API changes should preserve existing behavior unless tests and docs show the intended migration path.

Files:

  • crates/core/src/observability/atif.rs
  • crates/core/tests/unit/atif_tests.rs
🔇 Additional comments (11)
crates/cli/src/alignment/mod.rs (2)

247-261: LGTM!


263-275: LGTM!

crates/cli/tests/coverage/alignment_tests.rs (2)

93-121: LGTM!


123-160: LGTM!

crates/cli/tests/coverage/gateway_tests.rs (1)

264-341: LGTM!

crates/core/src/observability/atif.rs (4)

1197-1197: LGTM!


1252-1252: LGTM!


1279-1332: LGTM!


1347-1364: LGTM!

crates/core/tests/unit/atif_tests.rs (2)

1938-1938: LGTM!

Also applies to: 1952-1955, 1963-1966, 1976-1976


2016-2114: Fix toolchain for cargo fmt, then run full Rust validation for this crates/core test change

  • cargo fmt --all failed with error: no such command: 'fmt' → ensure the Rust toolchain includes rustfmt (e.g., rustup component add rustfmt) and rerun cargo fmt --all
  • Run cargo clippy --workspace --all-targets -- -D warnings
  • Run just test-rust
  • Since crates/core changed, also run the full language matrix (Rust/Python/Go/Node.js/WebAssembly) per the project guidelines

Comment thread crates/core/tests/unit/atif_tests.rs
@bbednarski9

Contributor Author

/ok to test efd8036

@willkill07

Member

/merge

@rapids-bot rapids-bot Bot merged commit 40bd7a0 into NVIDIA:release/0.3 May 29, 2026
70 checks passed
yczhang-nv pushed a commit to yczhang-nv/NeMo-Flow that referenced this pull request Jun 3, 2026
- Closes NVIDIA#176

Authors:
  - Bryan Bednarski (https://github.com/bbednarski9)

Approvers:
  - Will Killian (https://github.com/willkill07)

URL: NVIDIA#189
Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>

Labels

  • Bug (issue describes bug; PR fixes bug)
  • lang:rust (PR changes/introduces Rust code)
  • size:M (PR is medium)
