fix: Dedupe overlapping LLM spans in ATIF export#183
Conversation
WalkthroughThis PR adds LLM span deduplication and metric consolidation to the ATIF exporter. It detects when hook and gateway instrumentation capture the same physical LLM request, suppresses lower-fidelity duplicates, merges their token metrics, and integrates these results into event-to-step conversion to eliminate duplicate agent steps. ChangesLLM Span Deduplication
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Add an ATIF exporter pre-pass that identifies duplicate LLM spans emitted for the same physical request, such as overlapping hook-observed and gateway-observed spans. Prefer the higher-fidelity span as the canonical step, suppress the duplicate step, and merge token metrics from the suppressed span when the canonical span is missing them. This prevents duplicate user/agent steps in ATIF when multiple instrumentation paths observe the same LLM call, while preserving sequential same-content calls as distinct events. Also adds regression coverage for: - overlapping hook and gateway LLM spans - preferring gateway spans over non-exact hook summaries - preserving sequential repeated LLM calls Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
f8658b7 to
b0fb15c
Compare
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
|
/ok to test 426270b |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/core/src/observability/atif.rs`:
- Around line 1277-1295: The complementary dedupe path
(complementary_hook_and_gateway_spans) is too permissive and needs an extra
guard to avoid collapsing concurrent distinct requests; update
complementary_hook_and_gateway_spans to only return true when the hook/gateway
polarity condition holds AND either the request_signature matches
(left.request_signature == right.request_signature) or a shared request
correlation key is equal (e.g., compare a correlation field on LlmSpanCandidate
such as request_correlation/request_id if present). Then ensure
same_physical_llm_request still uses that strengthened
complementary_hook_and_gateway_spans check so overlapping concurrent calls under
the same parent/model are not incorrectly deduplicated.
- Around line 1317-1321: When fidelity_score ties (Ordering::Equal) the code
currently does nothing; instead choose a deterministic tie-break and call
suppress_llm_span for the loser. Update the match's Equal arm to compare a
stable secondary key (e.g., span boundaries or another Ord field on the span
such as (left.span.start, left.span.end) or left.source_id/text) and then call
suppress_llm_span(left, right, lookups) or suppress_llm_span(right, left,
lookups) based on that comparison so equal-fidelity duplicates are consistently
deduped; use the existing symbols fidelity_score, left, right, and
suppress_llm_span.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: e6b68f5c-7b73-4238-8a62-80c57abf4951
📒 Files selected for processing (2)
crates/core/src/observability/atif.rscrates/core/tests/unit/atif_tests.rs
📜 Review details
🧰 Additional context used
📓 Path-based instructions (11)
**/*.rs
📄 CodeRabbit inference engine (.agents/skills/add-binding-feature/SKILL.md)
Use
snake_casenaming convention for Rust identifiers (e.g.,nemo_relay_tool_call)
**/*.rs: Any Rust change must runjust test-rust
Any Rust change must runcargo fmt --all
Any Rust change must runcargo clippy --workspace --all-targets -- -D warnings
**/*.rs: Runcargo fmt --allfor all FFI work since it is Rust work
Runjust test-rustto validate FFI changes
Runcargo clippy --workspace --all-targets -- -D warningsto enforce strict linting on FFI workWhen Rust files changed as part of Go work, also run
cargo fmt --all,just test-rust, andcargo clippy --workspace --all-targets -- -D warnings
**/*.rs: Runcargo fmt --allwhen Rust files are changed as part of Node work
Runcargo clippy --workspace --all-targets -- -D warningswhen Rust files are changed as part of Node work
Runjust test-rustwhen Rust files are changed as part of Node work
**/*.rs: Runcargo fmt --allto format all Rust code
Runcargo clippy --workspace --all-targets -- -D warningsto enforce all clippy lints as errors
**/*.rs: Runcargo fmt --allwhen Rust files changed as part of WebAssembly work
Runcargo clippy --workspace --all-targets -- -D warningswhen Rust files changed as part of WebAssembly work
**/*.rs: If any Rust code changed, always runjust test-rust
If any Rust code changed, also runcargo fmt --all
If any Rust code changed, also runcargo clippy --workspace --all-targets -- -D warnings
Run Rust formatting withcargo fmt --all
Run Rust linting withcargo clippy --workspace --all-targets -- -D warnings
**/*.rs: Keep SPDX headers on Rust source files. The project is Apache-2.0.
Usesnake_casefor Rust binding naming conventions.
UseJson = serde_json::Valuein Rust-facing runtime APIs where the existing code expects JSON payloads.
UseResult<T>withFlowErrorin core runtime paths. Keep errors explicit and binding-appropriate at the wrapper layer.
Preserve async behavior on the existing tokio-based model i...
Files:
crates/core/tests/unit/atif_tests.rscrates/core/src/observability/atif.rs
{crates/adaptive/**/*.rs,**/*test*.{rs,py,go,ts,js},**/*adaptive*test*.{rs,py,go,ts,js},docs/plugins/adaptive/**}
📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)
Maintain documented and tested validation and report behavior for adaptive surfaces
Files:
crates/core/tests/unit/atif_tests.rs
**/{Cargo.toml,**/*.rs}
📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)
Maintain consistency between Rust package names in
Cargo.tomland their actual usage across the codebase
Files:
crates/core/tests/unit/atif_tests.rscrates/core/src/observability/atif.rs
**/*.{h,hpp,c,cpp,rs}
📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)
Ensure FFI header and library naming follows consistent conventions across platform-specific builds
Files:
crates/core/tests/unit/atif_tests.rscrates/core/src/observability/atif.rs
{crates/core,crates/adaptive}/**/*
📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)
Changes to
crates/coreorcrates/adaptivemust run the full language matrix
Files:
crates/core/tests/unit/atif_tests.rscrates/core/src/observability/atif.rs
**/*.{rs,toml}
📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)
Update Rust crate names and module prefixes during coordinated rename operations
Files:
crates/core/tests/unit/atif_tests.rscrates/core/src/observability/atif.rs
crates/core/**/*.rs
📄 CodeRabbit inference engine (.agents/skills/test-go-binding/SKILL.md)
If the change touched
crates/coreor shared runtime semantics, also usevalidate-changefor broader validation
Files:
crates/core/tests/unit/atif_tests.rscrates/core/src/observability/atif.rs
crates/{core,adaptive}/**
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
If
crates/coreorcrates/adaptivechanged, run the full matrix across Rust, Python, Go, Node.js, and WebAssembly
Files:
crates/core/tests/unit/atif_tests.rscrates/core/src/observability/atif.rs
crates/{core,adaptive}/**/*.rs
⚙️ CodeRabbit configuration file
crates/{core,adaptive}/**/*.rs: Review the Rust runtime for async correctness, scope isolation, middleware ordering, and event lifecycle regressions.
Pay close attention to task-local/thread-local scope propagation, callback lifetimes, stream finalization, and root_uuid isolation.
Public API changes should preserve existing behavior unless tests and docs show the intended migration path.
Files:
crates/core/tests/unit/atif_tests.rscrates/core/src/observability/atif.rs
{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}
⚙️ CodeRabbit configuration file
{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.
Files:
crates/core/tests/unit/atif_tests.rs
crates/core/src/observability/{atif,otel,openinference}.rs
📄 CodeRabbit inference engine (.agents/skills/maintain-observability/SKILL.md)
When changing event fields in ATIF, OpenTelemetry, or OpenInference observability surfaces, keep the core event model in
crates/core/src/observability/atif.rs,crates/core/src/observability/otel.rs, andcrates/core/src/observability/openinference.rsin sync
Files:
crates/core/src/observability/atif.rs
🔇 Additional comments (2)
crates/core/tests/unit/atif_tests.rs (1)
1812-2076: LGTM!crates/core/src/observability/atif.rs (1)
1149-1170: Run required validation checks forcrates/corechange (crates/core/src/observability/atif.rs,from_events_with_correlation_events)Run:
cargo fmt --all --checkcargo clippy --workspace --all-targets -- -D warningsjust test-rustvalidate-change(crates/core validation)uv run pre-commit run --all-files(cross-language/tooling hygiene + SPDX/header checks)
|
/merge |
|
Thanks will |
#### Overview Adds ATIF exporter de-duplication for overlapping LLM spans that represent the same physical provider request, such as a hook-observed span and a gateway-observed span. - [x] I confirm this contribution is my own work, or I have the right to submit it under this project's license. - [x] I searched existing issues and open pull requests, and this does not duplicate existing work. #### Details Some harnesses can emit multiple LLM spans for one underlying request. Without de-duplication, ATIF can contain repeated user/agent steps for a single model call. This PR adds an exporter pre-pass that collects complete LLM start/end span candidates, detects overlapping duplicates under the same parent/model, suppresses the lower-fidelity span, and merges metrics from the suppressed span into the canonical step when needed. It also adds tests for overlapping hook/gateway spans, preferring a higher-fidelity gateway span over a non-exact hook summary, and preserving sequential same-content LLM calls as separate steps. #### Where should the reviewer start? Start with `crates/core/src/observability/atif.rs`, specifically `build_llm_dedupe`, `same_physical_llm_request`, and `llm_event_fidelity_score`. The main tests are in `crates/core/tests/unit/atif_tests.rs` around `test_exporter_dedupes_overlapping_hook_and_gateway_llm_spans`. #### Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to) - Relates to NVIDIA#176 ## Summary by CodeRabbit * **New Features** * Enhanced ATIF exporter with improved LLM span deduplication and token-metric consolidation from multiple instrumentation sources. * **Bug Fixes** * Fixed metric handling for overlapping LLM requests to prevent inaccurate data consolidation. * **Tests** * Added comprehensive unit tests validating LLM span deduplication behavior across instrumentation scenarios. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Relay/pull/183?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - Bryan Bednarski (https://github.com/bbednarski9) Approvers: - Will Killian (https://github.com/willkill07) URL: NVIDIA#183 Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>
Overview
Adds ATIF exporter de-duplication for overlapping LLM spans that represent the same physical provider request, such as a hook-observed span and a gateway-observed span.
Details
Some harnesses can emit multiple LLM spans for one underlying request. Without de-duplication, ATIF can contain repeated user/agent steps for a single model call.
This PR adds an exporter pre-pass that collects complete LLM start/end span candidates, detects overlapping duplicates under the same parent/model, suppresses the lower-fidelity span, and merges metrics from the suppressed span into the canonical step when needed.
It also adds tests for overlapping hook/gateway spans, preferring a higher-fidelity gateway span over a non-exact hook summary, and preserving sequential same-content LLM calls as separate steps.
Where should the reviewer start?
Start with
crates/core/src/observability/atif.rs, specificallybuild_llm_dedupe,same_physical_llm_request, andllm_event_fidelity_score. The main tests are incrates/core/tests/unit/atif_tests.rsaroundtest_exporter_dedupes_overlapping_hook_and_gateway_llm_spans.Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Bug Fixes
Tests