fix: Dedupe overlapping LLM spans in ATIF export #183

Merged
rapids-bot[bot] merged 3 commits into NVIDIA:release/0.3 from bbednarski9:bbednarski/issue-176-atif-llm-dedupe on May 29, 2026

Conversation

@bbednarski9

@bbednarski9 bbednarski9 commented May 29, 2026

Contributor

Overview

Adds ATIF exporter de-duplication for overlapping LLM spans that represent the same physical provider request, such as a hook-observed span and a gateway-observed span.

  • I confirm this contribution is my own work, or I have the right to submit it under this project's license.
  • I searched existing issues and open pull requests, and this does not duplicate existing work.

Details

Some harnesses can emit multiple LLM spans for one underlying request. Without de-duplication, ATIF can contain repeated user/agent steps for a single model call.

This PR adds an exporter pre-pass that collects complete LLM start/end span candidates, detects overlapping duplicates under the same parent/model, suppresses the lower-fidelity span, and merges metrics from the suppressed span into the canonical step when needed.

It also adds tests for overlapping hook/gateway spans, preferring a higher-fidelity gateway span over a non-exact hook summary, and preserving sequential same-content LLM calls as separate steps.
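
The pre-pass described above can be sketched roughly as follows. This is a minimal illustration, not the code in `atif.rs`: the `SpanCandidate` struct, its fields, and the numeric `fidelity` value are hypothetical stand-ins, and the real exporter also compares request/response signatures. Equal-fidelity pairs are deliberately left untouched here.

```rust
use std::cmp::Ordering;
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a complete LLM start/end span.
#[derive(Debug, Clone)]
struct SpanCandidate {
    id: u32,
    parent: u32,
    model: String,
    start_ms: u64,
    end_ms: u64,
    /// Assumed scoring: a higher value means a more complete payload.
    fidelity: u8,
}

/// Two spans overlap when each one starts before the other ends.
/// Back-to-back (sequential) spans do not overlap.
fn overlaps(a: &SpanCandidate, b: &SpanCandidate) -> bool {
    a.start_ms < b.end_ms && b.start_ms < a.end_ms
}

/// Simplified duplicate test: same parent, same model, overlapping in time.
fn same_request(a: &SpanCandidate, b: &SpanCandidate) -> bool {
    a.parent == b.parent && a.model == b.model && overlaps(a, b)
}

/// Returns the ids of the lower-fidelity spans that should be suppressed.
fn suppressed_ids(spans: &[SpanCandidate]) -> HashSet<u32> {
    let mut suppressed = HashSet::new();
    for (i, left) in spans.iter().enumerate() {
        for right in &spans[i + 1..] {
            // Skip pairs where one side has already been suppressed.
            if suppressed.contains(&left.id) || suppressed.contains(&right.id) {
                continue;
            }
            if !same_request(left, right) {
                continue;
            }
            match left.fidelity.cmp(&right.fidelity) {
                Ordering::Less => {
                    suppressed.insert(left.id);
                }
                Ordering::Greater => {
                    suppressed.insert(right.id);
                }
                // Ties are left alone in this sketch.
                Ordering::Equal => {}
            }
        }
    }
    suppressed
}
```

Given a hook span and a higher-fidelity gateway span that overlap under one parent, plus a later sequential span with the same model, only the hook span's id ends up in the returned set.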

Where should the reviewer start?

Start with `crates/core/src/observability/atif.rs`, specifically `build_llm_dedupe`, `same_physical_llm_request`, and `llm_event_fidelity_score`. The main tests are in `crates/core/tests/unit/atif_tests.rs` around `test_exporter_dedupes_overlapping_hook_and_gateway_llm_spans`.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • Relates to NVIDIA#176

Summary by CodeRabbit

  • New Features

    • Enhanced ATIF exporter with improved LLM span deduplication and token-metric consolidation from multiple instrumentation sources.
  • Bug Fixes

    • Fixed metric handling for overlapping LLM requests to prevent inaccurate data consolidation.
  • Tests

    • Added comprehensive unit tests validating LLM span deduplication behavior across instrumentation scenarios.

@copy-pr-bot

copy-pr-bot Bot commented May 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 29, 2026

Walkthrough

This PR adds LLM span deduplication and metric consolidation to the ATIF exporter. It detects when hook and gateway instrumentation capture the same physical LLM request, suppresses lower-fidelity duplicates, merges their token metrics, and integrates these results into event-to-step conversion to eliminate duplicate agent steps.

Changes

LLM Span Deduplication

| Layer / File(s) | Summary |
| --- | --- |
| Metric merging utilities<br>`crates/core/src/observability/atif.rs` | Adds private helper functions to combine a primary AtifMetrics with supplemental metrics by backfilling missing token/cost fields and merging extra JSON object keys. |
| Deduplication data structures and algorithm<br>`crates/core/src/observability/atif.rs` | Extends EventLookupMaps with suppressed_llm_events and supplemental_llm_metrics fields. Implements deduplication algorithm that collects LLM span candidates from scope start/end events, builds request/response signatures, compares candidates by parent correlation, model compatibility, time overlap, and request/response match to detect identical physical requests including hook/gateway complementarity, assigns fidelity scores, suppresses lower-fidelity duplicates, and records canonical metrics. |
| Event conversion integration<br>`crates/core/src/observability/atif.rs` | Integrates deduplication results: StepConversionState::handle_event returns early for suppressed LLM events; handle_llm_end merges extracted metrics with supplemental metrics from dedupe lookups and writes merged result into the agent step. |
| Deduplication behavior validation<br>`crates/core/tests/unit/atif_tests.rs` | Three test cases: overlapping hook and gateway spans are correctly deduped with ancestry and metrics preserved; gateway spans are preferred over non-exact hook summaries; sequential spans with identical content are not incorrectly collapsed and gateway ancestry is retained. |
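
The metric-merging row above (backfill missing token/cost fields, merge extra keys) might look something like the sketch below. The `Metrics` struct is a hypothetical, reduced stand-in for `AtifMetrics`, the `extra` map uses plain strings instead of JSON values to stay dependency-free, and the rule that the primary value wins on a key conflict is an assumption that the walkthrough does not state.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced stand-in for the exporter's metrics type.
#[derive(Debug, Clone, Default, PartialEq)]
struct Metrics {
    prompt_tokens: Option<u64>,
    completion_tokens: Option<u64>,
    cost_usd: Option<f64>,
    /// Stand-in for the extra JSON object; values are plain strings here.
    extra: BTreeMap<String, String>,
}

/// Backfills fields missing from `primary` with values from `supplemental`.
/// Fields already present on `primary` are never overwritten (assumed policy).
fn merge_metrics(primary: Metrics, supplemental: &Metrics) -> Metrics {
    let mut merged = primary;
    merged.prompt_tokens = merged.prompt_tokens.or(supplemental.prompt_tokens);
    merged.completion_tokens = merged.completion_tokens.or(supplemental.completion_tokens);
    merged.cost_usd = merged.cost_usd.or(supplemental.cost_usd);
    for (key, value) in &supplemental.extra {
        // Only add keys that the primary metrics do not already carry.
        merged
            .extra
            .entry(key.clone())
            .or_insert_with(|| value.clone());
    }
    merged
}
```

With this policy, a canonical gateway span that already reports prompt tokens keeps its own count, while a completion-token count seen only by the suppressed hook span is carried over.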

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.30% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title follows Conventional Commits format with type 'fix' and concise imperative summary describing the main change (LLM span deduplication in ATIF export). |
| Description check | ✅ Passed | The description is well-structured, covering overview, details, reviewer entry points, and related issue. All critical sections from the template are completed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@github-actions github-actions Bot added the size:L (PR is large) and lang:rust (PR changes/introduces Rust code) labels May 29, 2026
  Add an ATIF exporter pre-pass that identifies duplicate LLM spans emitted
  for the same physical request, such as overlapping hook-observed and
  gateway-observed spans. Prefer the higher-fidelity span as the canonical
  step, suppress the duplicate step, and merge token metrics from the
  suppressed span when the canonical span is missing them.

  This prevents duplicate user/agent steps in ATIF when multiple
  instrumentation paths observe the same LLM call, while preserving
  sequential same-content calls as distinct events.

  Also adds regression coverage for:
  - overlapping hook and gateway LLM spans
  - preferring gateway spans over non-exact hook summaries
  - preserving sequential repeated LLM calls

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
@bbednarski9 bbednarski9 force-pushed the bbednarski/issue-176-atif-llm-dedupe branch from f8658b7 to b0fb15c May 29, 2026 17:22
@bbednarski9 bbednarski9 changed the base branch from main to release/0.3 May 29, 2026 17:22
@willkill07 willkill07 changed the title from "Dedupe overlapping LLM spans in ATIF export" to "fix: Dedupe overlapping LLM spans in ATIF export" May 29, 2026
@willkill07 willkill07 added this to the 0.3 milestone May 29, 2026
@github-actions github-actions Bot added the Bug (issue describes bug; PR fixes bug) label May 29, 2026
@bbednarski9 bbednarski9 marked this pull request as ready for review May 29, 2026 19:19
@bbednarski9 bbednarski9 requested a review from a team as a code owner May 29, 2026 19:19
@bbednarski9

Contributor Author

/ok to test 426270b

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/core/src/observability/atif.rs`:
- Around line 1277-1295: The complementary dedupe path
(complementary_hook_and_gateway_spans) is too permissive and needs an extra
guard to avoid collapsing concurrent distinct requests; update
complementary_hook_and_gateway_spans to only return true when the hook/gateway
polarity condition holds AND either the request_signature matches
(left.request_signature == right.request_signature) or a shared request
correlation key is equal (e.g., compare a correlation field on LlmSpanCandidate
such as request_correlation/request_id if present). Then ensure
same_physical_llm_request still uses that strengthened
complementary_hook_and_gateway_spans check so overlapping concurrent calls under
the same parent/model are not incorrectly deduplicated.
- Around line 1317-1321: When fidelity_score ties (Ordering::Equal) the code
currently does nothing; instead choose a deterministic tie-break and call
suppress_llm_span for the loser. Update the match's Equal arm to compare a
stable secondary key (e.g., span boundaries or another Ord field on the span
such as (left.span.start, left.span.end) or left.source_id/text) and then call
suppress_llm_span(left, right, lookups) or suppress_llm_span(right, left,
lookups) based on that comparison so equal-fidelity duplicates are consistently
deduped; use the existing symbols fidelity_score, left, right, and
suppress_llm_span.
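
The two findings above could be addressed along the lines of the sketch below. It is an illustration only: the `Candidate` struct, the `Source` enum, and the optional `request_id` correlation key are hypothetical, and the tie-break order (higher fidelity, then earlier start, then earlier end) is one possible deterministic choice rather than the one the project adopted.

```rust
use std::cmp::Ordering;

/// Hypothetical marker for which instrumentation path observed the span.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Source {
    Hook,
    Gateway,
}

/// Hypothetical, reduced stand-in for an LLM span candidate.
#[derive(Debug, Clone)]
struct Candidate {
    source: Source,
    request_signature: String,
    /// Assumed optional correlation key shared by both observers.
    request_id: Option<String>,
    fidelity_score: u32,
    start_ms: u64,
    end_ms: u64,
}

/// Hook/gateway polarity alone is not enough: also require a matching
/// request signature or an equal correlation key.
fn complementary(left: &Candidate, right: &Candidate) -> bool {
    let opposite_sources = left.source != right.source;
    let same_signature = left.request_signature == right.request_signature;
    let same_request_id = match (&left.request_id, &right.request_id) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    };
    opposite_sources && (same_signature || same_request_id)
}

/// Picks the span to keep. Higher fidelity wins; on a tie the earlier start
/// wins, then the earlier end. If every key is equal, `left` is kept.
fn pick_canonical<'a>(left: &'a Candidate, right: &'a Candidate) -> &'a Candidate {
    let order = left
        .fidelity_score
        .cmp(&right.fidelity_score)
        .then_with(|| right.start_ms.cmp(&left.start_ms))
        .then_with(|| right.end_ms.cmp(&left.end_ms));
    match order {
        Ordering::Less => right,
        Ordering::Equal | Ordering::Greater => left,
    }
}
```

Because the comparison chain is symmetric, swapping the arguments selects the same span whenever any key differs, which is what makes equal-fidelity dedupe repeatable.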
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: e6b68f5c-7b73-4238-8a62-80c57abf4951

📥 Commits

Reviewing files that changed from the base of the PR and between e0a09ca and 426270b.

📒 Files selected for processing (2)
  • crates/core/src/observability/atif.rs
  • crates/core/tests/unit/atif_tests.rs
📜 Review details
🧰 Additional context used
📓 Path-based instructions (11)
**/*.rs

📄 CodeRabbit inference engine (.agents/skills/add-binding-feature/SKILL.md)

Use snake_case naming convention for Rust identifiers (e.g., nemo_relay_tool_call)

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for all FFI work since it is Rust work
Run just test-rust to validate FFI changes
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting on FFI work

When Rust files changed as part of Go work, also run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all when Rust files are changed as part of Node work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files are changed as part of Node work
Run just test-rust when Rust files are changed as part of Node work

**/*.rs: Run cargo fmt --all to format all Rust code
Run cargo clippy --workspace --all-targets -- -D warnings to enforce all clippy lints as errors

**/*.rs: Run cargo fmt --all when Rust files changed as part of WebAssembly work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files changed as part of WebAssembly work

**/*.rs: If any Rust code changed, always run just test-rust
If any Rust code changed, also run cargo fmt --all
If any Rust code changed, also run cargo clippy --workspace --all-targets -- -D warnings
Run Rust formatting with cargo fmt --all
Run Rust linting with cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Keep SPDX headers on Rust source files. The project is Apache-2.0.
Use snake_case for Rust binding naming conventions.
Use Json = serde_json::Value in Rust-facing runtime APIs where the existing code expects JSON payloads.
Use Result<T> with FlowError in core runtime paths. Keep errors explicit and binding-appropriate at the wrapper layer.
Preserve async behavior on the existing tokio-based model i...

Files:

  • crates/core/tests/unit/atif_tests.rs
  • crates/core/src/observability/atif.rs
{crates/adaptive/**/*.rs,**/*test*.{rs,py,go,ts,js},**/*adaptive*test*.{rs,py,go,ts,js},docs/plugins/adaptive/**}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

Maintain documented and tested validation and report behavior for adaptive surfaces

Files:

  • crates/core/tests/unit/atif_tests.rs
**/{Cargo.toml,**/*.rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Maintain consistency between Rust package names in Cargo.toml and their actual usage across the codebase

Files:

  • crates/core/tests/unit/atif_tests.rs
  • crates/core/src/observability/atif.rs
**/*.{h,hpp,c,cpp,rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Ensure FFI header and library naming follows consistent conventions across platform-specific builds

Files:

  • crates/core/tests/unit/atif_tests.rs
  • crates/core/src/observability/atif.rs
{crates/core,crates/adaptive}/**/*

📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)

Changes to crates/core or crates/adaptive must run the full language matrix

Files:

  • crates/core/tests/unit/atif_tests.rs
  • crates/core/src/observability/atif.rs
**/*.{rs,toml}

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update Rust crate names and module prefixes during coordinated rename operations

Files:

  • crates/core/tests/unit/atif_tests.rs
  • crates/core/src/observability/atif.rs
crates/core/**/*.rs

📄 CodeRabbit inference engine (.agents/skills/test-go-binding/SKILL.md)

If the change touched crates/core or shared runtime semantics, also use validate-change for broader validation

Files:

  • crates/core/tests/unit/atif_tests.rs
  • crates/core/src/observability/atif.rs
crates/{core,adaptive}/**

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

If crates/core or crates/adaptive changed, run the full matrix across Rust, Python, Go, Node.js, and WebAssembly

Files:

  • crates/core/tests/unit/atif_tests.rs
  • crates/core/src/observability/atif.rs
crates/{core,adaptive}/**/*.rs

⚙️ CodeRabbit configuration file

crates/{core,adaptive}/**/*.rs: Review the Rust runtime for async correctness, scope isolation, middleware ordering, and event lifecycle regressions.
Pay close attention to task-local/thread-local scope propagation, callback lifetimes, stream finalization, and root_uuid isolation.
Public API changes should preserve existing behavior unless tests and docs show the intended migration path.

Files:

  • crates/core/tests/unit/atif_tests.rs
  • crates/core/src/observability/atif.rs
{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}

⚙️ CodeRabbit configuration file

{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.

Files:

  • crates/core/tests/unit/atif_tests.rs
crates/core/src/observability/{atif,otel,openinference}.rs

📄 CodeRabbit inference engine (.agents/skills/maintain-observability/SKILL.md)

When changing event fields in ATIF, OpenTelemetry, or OpenInference observability surfaces, keep the core event model in crates/core/src/observability/atif.rs, crates/core/src/observability/otel.rs, and crates/core/src/observability/openinference.rs in sync

Files:

  • crates/core/src/observability/atif.rs
🔇 Additional comments (2)
crates/core/tests/unit/atif_tests.rs (1)

1812-2076: LGTM!

crates/core/src/observability/atif.rs (1)

1149-1170: Run required validation checks for crates/core change (crates/core/src/observability/atif.rs, from_events_with_correlation_events)

Run:

  • cargo fmt --all --check
  • cargo clippy --workspace --all-targets -- -D warnings
  • just test-rust
  • validate-change (crates/core validation)
  • uv run pre-commit run --all-files (cross-language/tooling hygiene + SPDX/header checks)

Comment thread crates/core/src/observability/atif.rs
Comment thread crates/core/src/observability/atif.rs
@willkill07

Member

/merge

@rapids-bot rapids-bot Bot merged commit f1a69c2 into NVIDIA:release/0.3 May 29, 2026
122 of 125 checks passed

Contributor Author

Thanks will

yczhang-nv pushed a commit to yczhang-nv/NeMo-Flow that referenced this pull request Jun 3, 2026
#### Overview

Adds ATIF exporter de-duplication for overlapping LLM spans that represent the same physical provider request, such as a hook-observed span and a gateway-observed span.

- [x] I confirm this contribution is my own work, or I have the right to submit it under this project's license.
- [x] I searched existing issues and open pull requests, and this does not duplicate existing work.

#### Details

Some harnesses can emit multiple LLM spans for one underlying request. Without de-duplication, ATIF can contain repeated user/agent steps for a single model call.

This PR adds an exporter pre-pass that collects complete LLM start/end span candidates, detects overlapping duplicates under the same parent/model, suppresses the lower-fidelity span, and merges metrics from the suppressed span into the canonical step when needed.

It also adds tests for overlapping hook/gateway spans, preferring a higher-fidelity gateway span over a non-exact hook summary, and preserving sequential same-content LLM calls as separate steps.

#### Where should the reviewer start?

Start with `crates/core/src/observability/atif.rs`, specifically `build_llm_dedupe`, `same_physical_llm_request`, and `llm_event_fidelity_score`. The main tests are in `crates/core/tests/unit/atif_tests.rs` around `test_exporter_dedupes_overlapping_hook_and_gateway_llm_spans`.

#### Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

- Relates to NVIDIA#176

Authors:
  - Bryan Bednarski (https://github.com/bbednarski9)

Approvers:
  - Will Killian (https://github.com/willkill07)

URL: NVIDIA#183
Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>