fix: Dedupe overlapping LLM spans in ATIF export by bbednarski9 · Pull Request #183 · NVIDIA/NeMo-Relay

bbednarski9 · 2026-05-29T17:16:15Z

Overview

Adds ATIF exporter de-duplication for overlapping LLM spans that represent the same physical provider request, such as a hook-observed span and a gateway-observed span.

I confirm this contribution is my own work, or I have the right to submit it under this project's license.
I searched existing issues and open pull requests, and this does not duplicate existing work.

Details

Some harnesses can emit multiple LLM spans for one underlying request. Without de-duplication, ATIF can contain repeated user/agent steps for a single model call.

This PR adds an exporter pre-pass that collects complete LLM start/end span candidates, detects overlapping duplicates under the same parent/model, suppresses the lower-fidelity span, and merges metrics from the suppressed span into the canonical step when needed.

It also adds tests for overlapping hook/gateway spans, preferring a higher-fidelity gateway span over a non-exact hook summary, and preserving sequential same-content LLM calls as separate steps.
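
As a rough illustration of the pre-pass described above, the following is a minimal sketch, not the code in `atif.rs`. The `SpanCandidate` struct, its fields, the numeric fidelity values, and the function names are all hypothetical; the real exporter also compares request/response signatures, which this sketch omits.

```rust
/// Hypothetical, simplified stand-in for a complete LLM start/end span.
#[derive(Debug, Clone)]
struct SpanCandidate {
    parent: &'static str,
    model: &'static str,
    start_ms: u64,
    end_ms: u64,
    /// Assumed scoring: a higher value means richer, more exact data.
    fidelity: u32,
}

/// Two closed time intervals overlap when each starts before the other ends.
fn overlaps(a: &SpanCandidate, b: &SpanCandidate) -> bool {
    a.start_ms <= b.end_ms && b.start_ms <= a.end_ms
}

/// Assumed duplicate test: same parent, same model, overlapping in time.
fn same_request(a: &SpanCandidate, b: &SpanCandidate) -> bool {
    a.parent == b.parent && a.model == b.model && overlaps(a, b)
}

/// Returns the indices of the spans that would be suppressed.
fn suppressed_indices(spans: &[SpanCandidate]) -> Vec<usize> {
    let mut suppressed = vec![false; spans.len()];
    for i in 0..spans.len() {
        for j in (i + 1)..spans.len() {
            if suppressed[i] || suppressed[j] || !same_request(&spans[i], &spans[j]) {
                continue;
            }
            // Suppress the lower-fidelity span; equal scores are left alone here.
            if spans[i].fidelity < spans[j].fidelity {
                suppressed[i] = true;
            } else if spans[j].fidelity < spans[i].fidelity {
                suppressed[j] = true;
            }
        }
    }
    (0..spans.len()).filter(|&i| suppressed[i]).collect()
}

/// Hook span, overlapping gateway span, and a later sequential call.
fn sample_spans() -> Vec<SpanCandidate> {
    vec![
        SpanCandidate { parent: "agent", model: "m", start_ms: 0, end_ms: 100, fidelity: 1 },
        SpanCandidate { parent: "agent", model: "m", start_ms: 10, end_ms: 90, fidelity: 2 },
        SpanCandidate { parent: "agent", model: "m", start_ms: 200, end_ms: 300, fidelity: 1 },
    ]
}

fn main() {
    println!("suppressed: {:?}", suppressed_indices(&sample_spans()));
}
```

In this sketch the first span (the lower-fidelity observer) is suppressed, while the third span survives because it does not overlap in time with the others, mirroring the "sequential same-content calls stay separate" case.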

Where should the reviewer start?

Start with crates/core/src/observability/atif.rs, specifically build_llm_dedupe, same_physical_llm_request, and llm_event_fidelity_score. The main tests are in crates/core/tests/unit/atif_tests.rs around test_exporter_dedupes_overlapping_hook_and_gateway_llm_spans.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Relates to [Bug]: duplicate LLM steps when CLI hook-forward and gateway instrumentation both observe the same request #176

Summary by CodeRabbit

New Features
- Enhanced ATIF exporter with improved LLM span deduplication and token-metric consolidation from multiple instrumentation sources.
Bug Fixes
- Fixed metric handling for overlapping LLM requests to prevent inaccurate data consolidation.
Tests
- Added comprehensive unit tests validating LLM span deduplication behavior across instrumentation scenarios.

copy-pr-bot · 2026-05-29T17:16:20Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-29T17:16:24Z

Walkthrough

This PR adds LLM span deduplication and metric consolidation to the ATIF exporter. It detects when hook and gateway instrumentation capture the same physical LLM request, suppresses lower-fidelity duplicates, merges their token metrics, and integrates these results into event-to-step conversion to eliminate duplicate agent steps.

Changes

LLM Span Deduplication

| Layer / File(s) | Summary |
|---|---|
| Metric merging utilities (`crates/core/src/observability/atif.rs`) | Adds private helper functions to combine a primary `AtifMetrics` with supplemental metrics by backfilling missing token/cost fields and merging `extra` JSON object keys. |
| Deduplication data structures and algorithm (`crates/core/src/observability/atif.rs`) | Extends `EventLookupMaps` with `suppressed_llm_events` and `supplemental_llm_metrics` fields. Implements deduplication algorithm that collects LLM span candidates from scope start/end events, builds request/response signatures, compares candidates by parent correlation, model compatibility, time overlap, and request/response match to detect identical physical requests including hook/gateway complementarity, assigns fidelity scores, suppresses lower-fidelity duplicates, and records canonical metrics. |
| Event conversion integration (`crates/core/src/observability/atif.rs`) | Integrates deduplication results: `StepConversionState::handle_event` returns early for suppressed LLM events; `handle_llm_end` merges extracted metrics with supplemental metrics from dedupe lookups and writes merged result into the agent step. |
| Deduplication behavior validation (`crates/core/tests/unit/atif_tests.rs`) | Three test cases: overlapping hook and gateway spans are correctly deduped with ancestry and metrics preserved; gateway spans are preferred over non-exact hook summaries; sequential spans with identical content are not incorrectly collapsed and gateway ancestry is retained. |
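
The metric backfill summarized above could look roughly like the sketch below. This is an assumption-laden illustration: `Metrics` is a hypothetical, reduced stand-in for `AtifMetrics` (whose real fields may differ), `extra` is modeled as a string map rather than a JSON object to keep the example dependency-free, and the "primary wins on key conflicts" rule is assumed rather than taken from the PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced stand-in for `AtifMetrics`.
#[derive(Debug, Clone, PartialEq, Default)]
struct Metrics {
    prompt_tokens: Option<u64>,
    completion_tokens: Option<u64>,
    cost_usd: Option<f64>,
    /// Stand-in for the `extra` JSON object.
    extra: BTreeMap<String, String>,
}

/// Keeps every value the primary already has and backfills only the gaps.
fn merge_metrics(primary: Metrics, supplemental: Metrics) -> Metrics {
    let mut extra = primary.extra;
    for (key, value) in supplemental.extra {
        // Assumed rule: the primary's value wins when both define a key.
        extra.entry(key).or_insert(value);
    }
    Metrics {
        prompt_tokens: primary.prompt_tokens.or(supplemental.prompt_tokens),
        completion_tokens: primary.completion_tokens.or(supplemental.completion_tokens),
        cost_usd: primary.cost_usd.or(supplemental.cost_usd),
        extra,
    }
}

/// Canonical span lacks completion tokens; the suppressed span supplies them.
fn sample_merge() -> Metrics {
    let mut primary = Metrics { prompt_tokens: Some(120), ..Metrics::default() };
    primary.extra.insert("source".to_string(), "gateway".to_string());

    let mut supplemental = Metrics {
        prompt_tokens: Some(999),
        completion_tokens: Some(40),
        ..Metrics::default()
    };
    supplemental.extra.insert("source".to_string(), "hook".to_string());
    supplemental.extra.insert("hook_id".to_string(), "h1".to_string());

    merge_metrics(primary, supplemental)
}

fn main() {
    println!("{:?}", sample_merge());
}
```

Using `Option::or` keeps the canonical span authoritative: a supplemental value is consulted only when the primary field is `None`.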

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

[Bug]: duplicate LLM steps when CLI hook-forward and gateway instrumentation both observe the same request #176: Implements LLM-span deduplication and hook/gateway metric merging to suppress duplicate ATIF steps from overlapping instrumentation.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.30% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title follows Conventional Commits format with type 'fix' and concise imperative summary describing the main change (LLM span deduplication in ATIF export). |
| Description check | ✅ Passed | The description is well-structured, covering overview, details, reviewer entry points, and related issue. All critical sections from the template are completed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Add an ATIF exporter pre-pass that identifies duplicate LLM spans emitted for the same physical request, such as overlapping hook-observed and gateway-observed spans. Prefer the higher-fidelity span as the canonical step, suppress the duplicate step, and merge token metrics from the suppressed span when the canonical span is missing them.

This prevents duplicate user/agent steps in ATIF when multiple instrumentation paths observe the same LLM call, while preserving sequential same-content calls as distinct events.

Also adds regression coverage for:
- overlapping hook and gateway LLM spans
- preferring gateway spans over non-exact hook summaries
- preserving sequential repeated LLM calls

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 · 2026-05-29T19:19:57Z

/ok to test 426270b

github-actions · 2026-05-29T19:23:52Z

Fern docs preview: https://nvidia-preview-pull-request-183.docs.buildwithfern.com/nemo/relay

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/core/src/observability/atif.rs`:
- Around line 1277-1295: The complementary dedupe path
(complementary_hook_and_gateway_spans) is too permissive and needs an extra
guard to avoid collapsing concurrent distinct requests; update
complementary_hook_and_gateway_spans to only return true when the hook/gateway
polarity condition holds AND either the request_signature matches
(left.request_signature == right.request_signature) or a shared request
correlation key is equal (e.g., compare a correlation field on LlmSpanCandidate
such as request_correlation/request_id if present). Then ensure
same_physical_llm_request still uses that strengthened
complementary_hook_and_gateway_spans check so overlapping concurrent calls under
the same parent/model are not incorrectly deduplicated.
- Around line 1317-1321: When fidelity_score ties (Ordering::Equal) the code
currently does nothing; instead choose a deterministic tie-break and call
suppress_llm_span for the loser. Update the match's Equal arm to compare a
stable secondary key (e.g., span boundaries or another Ord field on the span
such as (left.span.start, left.span.end) or left.source_id/text) and then call
suppress_llm_span(left, right, lookups) or suppress_llm_span(right, left,
lookups) based on that comparison so equal-fidelity duplicates are consistently
deduped; use the existing symbols fidelity_score, left, right, and
suppress_llm_span.
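
The two suggestions above can be sketched as follows. This is a hedged illustration only: `Candidate`, `Source`, the optional `request_id` field, and both function names are hypothetical, and "polarity" is assumed here to mean one hook span paired with one gateway span. The actual `LlmSpanCandidate` in `atif.rs` may carry different fields.

```rust
use std::cmp::Ordering;

/// Hypothetical instrumentation source of a span.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    Hook,
    Gateway,
}

/// Hypothetical, simplified stand-in for `LlmSpanCandidate`.
#[derive(Debug, Clone)]
struct Candidate {
    source: Source,
    request_signature: String,
    /// Assumed optional correlation key; may not exist in the real struct.
    request_id: Option<String>,
    start_ms: u64,
    end_ms: u64,
    fidelity: u32,
}

/// Stricter complementary check: opposite sources AND a matching
/// request signature or a shared correlation key.
fn complementary(left: &Candidate, right: &Candidate) -> bool {
    let opposite_sources = left.source != right.source;
    let same_signature = left.request_signature == right.request_signature;
    let same_request_id = match (&left.request_id, &right.request_id) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    };
    opposite_sources && (same_signature || same_request_id)
}

/// Returns true when `left` should be suppressed in favor of `right`.
/// On a fidelity tie, the span with the later (start, end) pair loses;
/// if those are equal too, `left` is kept.
fn left_loses(left: &Candidate, right: &Candidate) -> bool {
    match left.fidelity.cmp(&right.fidelity) {
        Ordering::Less => true,
        Ordering::Greater => false,
        Ordering::Equal => (left.start_ms, left.end_ms) > (right.start_ms, right.end_ms),
    }
}

/// Small constructor to keep the examples short.
fn cand(source: Source, sig: &str, id: Option<&str>, start_ms: u64, end_ms: u64) -> Candidate {
    Candidate {
        source,
        request_signature: sig.to_string(),
        request_id: id.map(str::to_string),
        start_ms,
        end_ms,
        fidelity: 1,
    }
}

fn main() {
    let hook = cand(Source::Hook, "sig-a", None, 5, 50);
    let gateway = cand(Source::Gateway, "sig-a", None, 0, 60);
    println!("complementary: {}", complementary(&hook, &gateway));
    println!("hook suppressed on tie: {}", left_loses(&hook, &gateway));
}
```

The choice of `(start_ms, end_ms)` as the secondary key is arbitrary; any stable, totally ordered field would give the same determinism.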


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: e6b68f5c-7b73-4238-8a62-80c57abf4951

📥 Commits

Reviewing files that changed from the base of the PR and between e0a09ca and 426270b.

📒 Files selected for processing (2)

crates/core/src/observability/atif.rs
crates/core/tests/unit/atif_tests.rs

📜 Review details

🧰 Additional context used

📓 Path-based instructions (11)

**/*.rs

📄 CodeRabbit inference engine (.agents/skills/add-binding-feature/SKILL.md)

Use snake_case naming convention for Rust identifiers (e.g., nemo_relay_tool_call)

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for all FFI work since it is Rust work
Run just test-rust to validate FFI changes
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting on FFI work

When Rust files changed as part of Go work, also run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all when Rust files are changed as part of Node work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files are changed as part of Node work
Run just test-rust when Rust files are changed as part of Node work

**/*.rs: Run cargo fmt --all to format all Rust code
Run cargo clippy --workspace --all-targets -- -D warnings to enforce all clippy lints as errors

**/*.rs: Run cargo fmt --all when Rust files changed as part of WebAssembly work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files changed as part of WebAssembly work

**/*.rs: If any Rust code changed, always run just test-rust
If any Rust code changed, also run cargo fmt --all
If any Rust code changed, also run cargo clippy --workspace --all-targets -- -D warnings
Run Rust formatting with cargo fmt --all
Run Rust linting with cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Keep SPDX headers on Rust source files. The project is Apache-2.0.
Use snake_case for Rust binding naming conventions.
Use Json = serde_json::Value in Rust-facing runtime APIs where the existing code expects JSON payloads.
Use Result<T> with FlowError in core runtime paths. Keep errors explicit and binding-appropriate at the wrapper layer.
Preserve async behavior on the existing tokio-based model i...

Files:

crates/core/tests/unit/atif_tests.rs
crates/core/src/observability/atif.rs

{crates/adaptive/**/*.rs,**/*test*.{rs,py,go,ts,js},**/*adaptive*test*.{rs,py,go,ts,js},docs/plugins/adaptive/**}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

Maintain documented and tested validation and report behavior for adaptive surfaces

Files:

crates/core/tests/unit/atif_tests.rs

**/{Cargo.toml,**/*.rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Maintain consistency between Rust package names in Cargo.toml and their actual usage across the codebase

Files:

crates/core/tests/unit/atif_tests.rs
crates/core/src/observability/atif.rs

**/*.{h,hpp,c,cpp,rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Ensure FFI header and library naming follows consistent conventions across platform-specific builds

Files:

crates/core/tests/unit/atif_tests.rs
crates/core/src/observability/atif.rs

{crates/core,crates/adaptive}/**/*

📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)

Changes to crates/core or crates/adaptive must run the full language matrix

Files:

crates/core/tests/unit/atif_tests.rs
crates/core/src/observability/atif.rs

**/*.{rs,toml}

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update Rust crate names and module prefixes during coordinated rename operations

Files:

crates/core/tests/unit/atif_tests.rs
crates/core/src/observability/atif.rs

crates/core/**/*.rs

📄 CodeRabbit inference engine (.agents/skills/test-go-binding/SKILL.md)

If the change touched crates/core or shared runtime semantics, also use validate-change for broader validation

Files:

crates/core/tests/unit/atif_tests.rs
crates/core/src/observability/atif.rs

crates/{core,adaptive}/**

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

If crates/core or crates/adaptive changed, run the full matrix across Rust, Python, Go, Node.js, and WebAssembly

Files:

crates/core/tests/unit/atif_tests.rs
crates/core/src/observability/atif.rs

crates/{core,adaptive}/**/*.rs

⚙️ CodeRabbit configuration file

crates/{core,adaptive}/**/*.rs: Review the Rust runtime for async correctness, scope isolation, middleware ordering, and event lifecycle regressions.
Pay close attention to task-local/thread-local scope propagation, callback lifetimes, stream finalization, and root_uuid isolation.
Public API changes should preserve existing behavior unless tests and docs show the intended migration path.

Files:

crates/core/tests/unit/atif_tests.rs
crates/core/src/observability/atif.rs

{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}

⚙️ CodeRabbit configuration file

{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.

Files:

crates/core/tests/unit/atif_tests.rs

crates/core/src/observability/{atif,otel,openinference}.rs

📄 CodeRabbit inference engine (.agents/skills/maintain-observability/SKILL.md)

When changing event fields in ATIF, OpenTelemetry, or OpenInference observability surfaces, keep the core event model in crates/core/src/observability/atif.rs, crates/core/src/observability/otel.rs, and crates/core/src/observability/openinference.rs in sync

Files:

crates/core/src/observability/atif.rs

🔇 Additional comments (2)

crates/core/tests/unit/atif_tests.rs (1)

1812-2076: LGTM!

crates/core/src/observability/atif.rs (1)

1149-1170: Run required validation checks for crates/core change (crates/core/src/observability/atif.rs, from_events_with_correlation_events)

Run:

cargo fmt --all --check

cargo clippy --workspace --all-targets -- -D warnings

just test-rust

validate-change (crates/core validation)

uv run pre-commit run --all-files (cross-language/tooling hygiene + SPDX/header checks)

willkill07 · 2026-05-29T20:35:04Z

/merge

bbednarski9 · 2026-05-29T20:36:48Z

Thanks will

Authors:
- Bryan Bednarski (https://github.com/bbednarski9)

Approvers:
- Will Killian (https://github.com/willkill07)

URL: NVIDIA#183

Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>

github-actions Bot added the size:L (PR is large) and lang:rust (PR changes/introduces Rust code) labels May 29, 2026

bbednarski9 force-pushed the bbednarski/issue-176-atif-llm-dedupe branch from f8658b7 to b0fb15c May 29, 2026 17:22

bbednarski9 changed the base branch from main to release/0.3 May 29, 2026 17:22

willkill07 changed the title ~~Dedupe overlapping LLM spans in ATIF export~~ fix: Dedupe overlapping LLM spans in ATIF export May 29, 2026

willkill07 assigned bbednarski9 May 29, 2026

willkill07 added this to the 0.3 milestone May 29, 2026

github-actions Bot added the Bug (issue describes bug; PR fixes bug) label May 29, 2026

bbednarski9 added 2 commits May 29, 2026 12:12

test: unwrap ATIF exporter results in dedupe tests

2fe2adf

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

Merge branch 'release/0.3' into bbednarski/issue-176-atif-llm-dedupe

426270b

bbednarski9 marked this pull request as ready for review May 29, 2026 19:19

bbednarski9 requested a review from a team as a code owner May 29, 2026 19:19

copy-pr-bot Bot temporarily deployed to fern May 29, 2026 19:20 Inactive

willkill07 approved these changes May 29, 2026


coderabbitai Bot reviewed May 29, 2026


rapids-bot Bot merged commit f1a69c2 into NVIDIA:release/0.3 May 29, 2026
122 of 125 checks passed

rapids-bot Bot temporarily deployed to fern May 29, 2026 20:35 Inactive

coderabbitai Bot mentioned this pull request May 29, 2026

fix: repair Hermes gateway session fallback #189

Merged

2 tasks


rapids-bot[bot] merged 3 commits into NVIDIA:release/0.3 from bbednarski9:bbednarski/issue-176-atif-llm-dedupe
