
Add ResourceStatsDetail gRPC API#3435

Open
jumpei527 wants to merge 24 commits into main from
feature/gateway/add-resource-stats-detail

Conversation

@jumpei527
Contributor

@jumpei527 jumpei527 commented Dec 23, 2025

Description

#3243 implemented the ResourceStats API for future resource control. This PR adds ResourceStatsDetail API to easily aggregate ResourceStats data from all agents.

What Changed:

  1. Proto updates
  • Added rpc.v1.StatsDetail service to apis/proto/v1/rpc/stats/stats.proto.
  • Added ResourceStatsDetail(payload.v1.Empty) returns (payload.v1.Info.Stats.ResourceStatsDetail).
  • Updated payload types under Info.Stats for detail response (ResourceStatsDetail).
  2. Service registration and implementation
  • Implemented ResourceStatsDetail on gateway LB handler to aggregate ResourceStats from agents.
  • Implemented ResourceStatsDetail on NGT/Faiss agent handlers to return self-only detail.
  • Registered StatsDetail service in gateway and agent gRPC registration paths.
  3. E2E updates
  • Added resource_stats_detail operation support in E2E v2 config/binding/strategy routing.
  • Added dedicated stats execution path (stats_test.go).
  • Added ResourceStatsDetail scenarios to unary_crud.yaml, stream_crud.yaml, and multi_crud.yaml.
  • Verified this change with E2E v2 scenarios on a k3d-deployed Vald cluster.

Related Issue

#3274

Versions

  • Vald Version: v1.7.17
  • Go Version: v1.25.5
  • Rust Version: v1.92.0
  • Docker Version: v29.1.3
  • Kubernetes Version: v1.34.3
  • Helm Version: v4.0.4
  • NGT Version: v2.7.1
  • Faiss Version: v1.13.1

Checklist

Special notes for your reviewer

Summary by CodeRabbit

  • New Features

    • Added /resource/stats/detail API and StatsDetail gRPC service to fetch per-agent resource statistics (CPU/memory) and aggregated details.
  • Tests

    • Added end-to-end tests covering the new stats detail operations.
  • Chores

    • Bumped dependencies: tokio, anyhow, wincode.

@jumpei527 jumpei527 self-assigned this Dec 23, 2025
@coderabbitai
Contributor

coderabbitai bot commented Dec 23, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

This PR restructures the resource statistics API by introducing a hierarchical Info.Stats container with nested ResourceStats, ResourceStatsDetail, and CgroupStats types. It adds a new StatsDetail RPC service, implements resource stats detail handlers across agent cores (FAISS/NGT) and the gateway load balancer, wires client support, updates documentation and Swagger, adds e2e tests, and refreshes Rust dependencies and generated bindings.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Proto & API docs | apis/proto/v1/payload/payload.proto, apis/proto/v1/rpc/stats/stats.proto, apis/docs/v1/payload.md.tmpl, apis/docs/v1/docs.md, apis/swagger/v1/rpc/stats/stats.swagger.json | Introduce Info.Stats with nested ResourceStats, ResourceStatsDetail, and CgroupStats; move previous ResourceStats into Info.Stats; add StatsDetail service and ResourceStatsDetail RPC and update docs/Swagger accordingly. |
| Vald client plumbing | apis/grpc/v1/vald/vald.go, internal/client/v1/client/vald/vald.go | Embed stats.StatsClient into Vald client interface/struct; instantiate stats client; add client ResourceStats method (trace + RoundRobin). |
| Agent handlers - FAISS | pkg/agent/core/faiss/handler/grpc/handler.go, pkg/agent/core/faiss/handler/grpc/stats.go, pkg/agent/core/faiss/usecase/agentd.go | Embed stats.StatsDetailServer, register service, implement ResourceStatsDetail handler that calls GetResourceStats and returns a single-entry details map keyed by agent name. |
| Agent handlers - NGT | pkg/agent/core/ngt/handler/grpc/handler.go, pkg/agent/core/ngt/handler/grpc/stats.go, pkg/agent/core/ngt/usecase/agentd.go | Same as FAISS: embed StatsDetailServer, register service, implement ResourceStatsDetail handler returning agent-specific stats. |
| Gateway LB | pkg/gateway/lb/handler/grpc/handler.go, pkg/gateway/lb/handler/grpc/stats.go, pkg/gateway/lb/usecase/vald.go | Extend server interface to include StatsDetailServer, change constructor return type, register service, implement ResourceStatsDetail that concurrently broadcasts ResourceStats to agents, aggregates per-agent details, records per-call spans/statuses, and surfaces errors with resource metadata. |
| Internal stats util & tests | internal/net/grpc/stats/stats.go, internal/net/grpc/stats/stats_test.go | Change return types from Info_ResourceStats → Info_Stats_ResourceStats; add GetResourceStats(ctx) helper building hostname/IP and optional cgroup metrics; update tests to new types. |
| E2E tests & runner | tests/v2/e2e/config/enums.go, tests/v2/e2e/config/config.go, tests/v2/e2e/crud/strategy_test.go, tests/v2/e2e/crud/stats_test.go | Add OpResourceStatsDetail operation, wire string aliases, noop bind handling, add stats_test.go with resourceStatsDetail helper and processStats dispatcher, integrate operation into strategy execution flow. |
| Rust protobuf codegen | rust/libs/proto/src/payload/v1/payload.v1.rs | Regenerate Rust bindings to reflect nested Info::Stats types (ResourceStats, ResourceStatsDetail, CgroupStats) and update full_name/type_url entries. |
| Rust deps | rust/libs/kvs/Cargo.toml, rust/libs/observability/Cargo.toml, rust/libs/vqueue/Cargo.toml | Minor dependency bumps (tokio 1.49→1.50, wincode 0.4.4→0.4.5, anyhow 1.0.101→1.0.102). |
| Gateway index handler | pkg/gateway/lb/handler/grpc/index.go | Fix: separate RPC error variable (callErr) from context/local err, ensuring correct error classification, status parsing, and span recording. |
| Misc | dockers/agent/core/agent/Dockerfile | Run cargo clean before building Rust release binary. |

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant GatewayLB as "Gateway LB"
    participant Agent1 as "Agent 1"
    participant Agent2 as "Agent 2"
    participant AgentN as "Agent N"
    participant StatsUtil as "Stats Util"

    Client->>GatewayLB: ResourceStatsDetail()
    GatewayLB->>GatewayLB: start trace span\ninit Details map

    par broadcast
        GatewayLB->>Agent1: ResourceStats() [sub-span]
        Agent1->>StatsUtil: GetResourceStats(ctx)
        StatsUtil-->>Agent1: stats (hostname, ip, cgroup)
        Agent1-->>GatewayLB: ResourceStats response -> Details[agent1]

        GatewayLB->>Agent2: ResourceStats() [sub-span]
        Agent2->>StatsUtil: GetResourceStats(ctx)
        StatsUtil-->>Agent2: stats
        Agent2-->>GatewayLB: Details[agent2]

        GatewayLB->>AgentN: ResourceStats() [sub-span]
        AgentN->>StatsUtil: GetResourceStats(ctx)
        StatsUtil-->>AgentN: stats
        AgentN-->>GatewayLB: Details[agentN]
    end

    GatewayLB->>GatewayLB: aggregate Details\nrecord per-call statuses
    GatewayLB-->>Client: ResourceStatsDetail{ details: map[...] }
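
The broadcast-and-aggregate flow in the diagram can be sketched in Go roughly as follows. This is a minimal, self-contained illustration and not the gateway's actual code: `ResourceStats`, `statsFetcher`, and `aggregateDetails` are hypothetical stand-ins for the generated payload type, the per-agent gRPC call, and the gateway's broadcast helper. Each worker keeps its error local, and only the shared map and error slice are guarded by a mutex.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// ResourceStats is a hypothetical stand-in for the generated payload type.
type ResourceStats struct {
	Name string
	IP   string
}

// statsFetcher is a hypothetical per-agent call; the real gateway would use a gRPC client.
type statsFetcher func(ctx context.Context) (*ResourceStats, error)

// aggregateDetails calls every agent concurrently and collects successful
// responses into a map keyed by agent address. Per-agent errors stay local to
// each goroutine and are joined only after all workers have finished.
func aggregateDetails(ctx context.Context, agents map[string]statsFetcher) (map[string]*ResourceStats, error) {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		details = make(map[string]*ResourceStats, len(agents))
		errs    []error
	)
	for addr, fetch := range agents {
		wg.Add(1)
		go func(addr string, fetch statsFetcher) {
			defer wg.Done()
			stats, err := fetch(ctx) // err is local to this worker
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs = append(errs, fmt.Errorf("agent %s: %w", addr, err))
				return
			}
			details[addr] = stats
		}(addr, fetch)
	}
	wg.Wait()
	return details, errors.Join(errs...)
}

// runExample aggregates two healthy agents and one failing agent.
func runExample() (int, bool) {
	ok := func(name string) statsFetcher {
		return func(context.Context) (*ResourceStats, error) {
			return &ResourceStats{Name: name, IP: "10.0.0.1"}, nil
		}
	}
	agents := map[string]statsFetcher{
		"agent-0": ok("agent-0"),
		"agent-1": ok("agent-1"),
		"agent-2": func(context.Context) (*ResourceStats, error) {
			return nil, errors.New("unavailable")
		},
	}
	details, err := aggregateDetails(context.Background(), agents)
	return len(details), err != nil
}

func main() {
	n, failed := runExample()
	fmt.Println(n, failed)
}
```

Whether a partial result should be returned together with an error, as this sketch does, is a design choice the PR may handle differently.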

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Possibly related PRs

Suggested reviewers

  • Matts966
  • datelier
  • kpango

Poem

📊 Agents whisper metrics near and far,
Gateways gather echoes like a chart;
Details map each host, each little star,
Traces tie the pieces — whole, not part. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 5.88% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add ResourceStatsDetail gRPC API' accurately reflects the primary objective of this changeset, which introduces a new gRPC service and proto definitions for ResourceStatsDetail across agents and gateway. |


@vdaas-ci
Collaborator

[CHATOPS:HELP] ChatOps commands.

  • 🙆‍♀️ /approve - approve
  • 🍱 /format - format codes and add licenses
  • /gen-test - generate test codes
  • 🏷️ /label - add labels
  • 🔚 2️⃣ 🔚 /label actions/e2e-deploy - run E2E deploy & integration test

@codecov

codecov bot commented Dec 23, 2025

Codecov Report

❌ Patch coverage is 2.80899% with 173 lines in your changes missing coverage. Please review.
✅ Project coverage is 25.97%. Comparing base (70de684) to head (63e5986).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/gateway/lb/handler/grpc/stats.go | 0.00% | 100 Missing ⚠️ |
| pkg/gateway/lb/handler/grpc/index.go | 0.00% | 22 Missing ⚠️ |
| internal/client/v1/client/vald/vald.go | 0.00% | 20 Missing ⚠️ |
| pkg/agent/core/faiss/handler/grpc/stats.go | 0.00% | 13 Missing ⚠️ |
| pkg/agent/core/ngt/handler/grpc/stats.go | 0.00% | 13 Missing ⚠️ |
| apis/grpc/v1/vald/vald.go | 0.00% | 1 Missing ⚠️ |
| pkg/agent/core/faiss/usecase/agentd.go | 0.00% | 1 Missing ⚠️ |
| pkg/agent/core/ngt/usecase/agentd.go | 0.00% | 1 Missing ⚠️ |
| pkg/gateway/lb/handler/grpc/handler.go | 0.00% | 1 Missing ⚠️ |
| pkg/gateway/lb/usecase/vald.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3435      +/-   ##
==========================================
+ Coverage   25.02%   25.97%   +0.94%     
==========================================
  Files         544      572      +28     
  Lines       43800    50475    +6675     
==========================================
+ Hits        10963    13109    +2146     
- Misses      32108    36395    +4287     
- Partials      729      971     +242     


@jumpei527 jumpei527 force-pushed the feature/gateway/add-resource-stats-detail branch from c3df358 to 2df183b Compare January 16, 2026 14:19
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Jan 16, 2026

Deploying vald with Cloudflare Pages

Latest commit: 63e5986
Status: ✅  Deploy successful!
Preview URL: https://1d6d77f5.vald.pages.dev
Branch Preview URL: https://feature-gateway-add-resource.vald.pages.dev


@github-actions github-actions bot added size/XL and removed size/XL labels Jan 16, 2026
@jumpei527 jumpei527 added the actions/build/arm64 Build ARM64 Docker Images label Jan 22, 2026
@jumpei527 jumpei527 requested a review from datelier March 9, 2026 00:35
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
apis/grpc/v1/vald/vald.go (1)

57-65: ⚠️ Potential issue | 🟠 Major

Client and server interfaces in the aggregate Vald helper are asymmetric.

Client now includes stats.StatsClient and is initialized properly in NewValdClient, but Server lacks stats.StatsServer, and RegisterValdServer does not register the stats service. This creates a runtime contract violation: clients advertise stats RPCs while servers registered via RegisterValdServer return UNIMPLEMENTED. The issue propagates downstream to types that embed vald.Client, including internal/client/v1/client/agent/core/client.go.

All four callers of RegisterValdServer (gateways and agents) would need to manually register stats if they implement it—a burden that the aggregate helper should eliminate. Add stats.StatsServer and stats.UnimplementedStatsServer to the interfaces, embed stats.UnimplementedStatsServer in UnimplementedValdServer, and call stats.RegisterStatsServer in RegisterValdServer.

Proposed change
 type Server interface {
+	stats.StatsServer
 	FlushServer
 	IndexServer
 	InsertServer
 	ObjectServer
 	RemoveServer
 	SearchServer
 	UpdateServer
 	UpsertServer
 }

 type UnimplementedValdServer struct {
+	stats.UnimplementedStatsServer
 	UnimplementedFlushServer
 	UnimplementedIndexServer
 	UnimplementedInsertServer
 	UnimplementedObjectServer
 	UnimplementedRemoveServer
 	UnimplementedSearchServer
 	UnimplementedUpdateServer
 	UnimplementedUpsertServer
 }

 func RegisterValdServer(s *grpc.Server, srv Server) {
+	stats.RegisterStatsServer(s, srv)
 	RegisterFlushServer(s, srv)
 	RegisterIndexServer(s, srv)
 	RegisterInsertServer(s, srv)
 	RegisterObjectServer(s, srv)
 	RegisterRemoveServer(s, srv)
 	RegisterSearchServer(s, srv)
 	RegisterUpdateServer(s, srv)
 	RegisterUpsertServer(s, srv)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apis/grpc/v1/vald/vald.go` around lines 57 - 65, The aggregate Vald helper is
asymmetric: Client includes stats.StatsClient while Server and
RegisterValdServer do not expose or register the stats service, causing
UNIMPLEMENTED at runtime; to fix, add stats.StatsServer to the Server interface,
include stats.UnimplementedStatsServer in the UnimplementedValdServer struct,
and update RegisterValdServer to call stats.RegisterStatsServer alongside
existing registrations so servers created via NewValdClient/NewValdServer
advertise and serve the same RPCs; ensure symbols referenced are
stats.StatsServer, stats.UnimplementedStatsServer, UnimplementedValdServer,
RegisterValdServer, and NewValdClient.
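
The symmetry the comment asks for can be illustrated with a small, self-contained Go sketch. Everything here is a hypothetical miniature: `StatsServer`, `IndexServer`, the `Unimplemented*` structs, and the `registry` map merely imitate the shape of generated gRPC code and `*grpc.Server`. The point is that once the aggregate `Server` interface embeds the stats interface and the aggregate unimplemented struct embeds the stats default, a single registration helper can cover every service.

```go
package main

import (
	"errors"
	"fmt"
)

var errUnimplemented = errors.New("unimplemented")

// StatsServer and IndexServer are hypothetical miniatures of generated service interfaces.
type StatsServer interface {
	ResourceStats() (string, error)
}

type IndexServer interface {
	IndexInfo() (string, error)
}

// The Unimplemented* types provide default methods that report "unimplemented".
type UnimplementedStatsServer struct{}

func (UnimplementedStatsServer) ResourceStats() (string, error) { return "", errUnimplemented }

type UnimplementedIndexServer struct{}

func (UnimplementedIndexServer) IndexInfo() (string, error) { return "", errUnimplemented }

// Server aggregates every service, so the compiler rejects any server value
// that lacks the stats methods.
type Server interface {
	StatsServer
	IndexServer
}

// UnimplementedValdServer supplies defaults for every aggregated service.
type UnimplementedValdServer struct {
	UnimplementedStatsServer
	UnimplementedIndexServer
}

// registry is a stand-in for *grpc.Server: it only records registered services.
type registry map[string]any

// RegisterValdServer registers all aggregated services in one call.
func RegisterValdServer(r registry, srv Server) {
	r["Stats"] = srv
	r["Index"] = srv
}

// statsOnlyServer overrides ResourceStats and inherits the remaining defaults.
type statsOnlyServer struct {
	UnimplementedValdServer
}

func (statsOnlyServer) ResourceStats() (string, error) { return "cpu=0.5", nil }

func runExample() (registered int, stats string, indexUnimplemented bool) {
	r := registry{}
	srv := statsOnlyServer{}
	RegisterValdServer(r, srv)
	stats, _ = srv.ResourceStats()
	_, err := srv.IndexInfo()
	return len(r), stats, errors.Is(err, errUnimplemented)
}

func main() {
	fmt.Println(runExample())
}
```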
internal/net/grpc/stats/stats_test.go (1)

130-145: ⚠️ Potential issue | 🟠 Major

Make this success case independent of host cgroup support.

GetResourceStats deliberately returns a non-nil stats payload even when cgroup sampling fails, leaving CgroupStats nil. Requiring CgroupStats here makes the test depend on the runner's cgroup mounts and can fail even when the production path is behaving correctly. Assert Name and Ip unconditionally, and only validate cgroup fields when they are present or behind a stubbed collector.

💡 Suggested fix
-					if stats.CgroupStats == nil {
-						return errors.New("cgroup stats should not be nil")
-					}
+					if stats.CgroupStats != nil && stats.CgroupStats.MemoryUsageBytes == 0 {
+						return errors.New("memory usage should be greater than 0 when cgroup stats are available")
+					}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/net/grpc/stats/stats_test.go` around lines 130 - 145, The test's
success path currently requires stats.CgroupStats to be non-nil which makes it
flaky on hosts without cgroup support; update the checkFunc used in the test for
GetResourceStats to unconditionally assert stats != nil and that stats.Name and
stats.Ip are non-empty, but remove the unconditional failure when
stats.CgroupStats is nil — instead only validate cgroup fields (e.g. contents of
payload.Info_Stats_ResourceStats.CgroupStats) when stats.CgroupStats != nil or
when you have explicitly stubbed a cgroup collector for the test.
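
A check function along the lines suggested above might look like the following sketch. The `ResourceStats` and `CgroupStats` structs are simplified, hypothetical stand-ins for the generated payload types, and `checkStats` is an illustrative name rather than the test's real `checkFunc`. Name and IP are asserted unconditionally, while cgroup fields are validated only when they are present.

```go
package main

import (
	"errors"
	"fmt"
)

// CgroupStats and ResourceStats are simplified stand-ins for the generated types.
type CgroupStats struct {
	CpuLimitCores    float64
	CpuUsageCores    float64
	MemoryLimitBytes uint64
	MemoryUsageBytes uint64
}

type ResourceStats struct {
	Name        string
	Ip          string
	CgroupStats *CgroupStats
}

// checkStats validates the fields that must always be set and inspects cgroup
// values only when the host was able to provide them.
func checkStats(stats *ResourceStats) error {
	if stats == nil {
		return errors.New("stats should not be nil")
	}
	if stats.Name == "" {
		return errors.New("name should not be empty")
	}
	if stats.Ip == "" {
		return errors.New("ip should not be empty")
	}
	if cg := stats.CgroupStats; cg != nil && cg.MemoryUsageBytes == 0 {
		return errors.New("memory usage should be greater than 0 when cgroup stats are available")
	}
	return nil
}

func main() {
	// A host without cgroup support still passes.
	fmt.Println(checkStats(&ResourceStats{Name: "agent-0", Ip: "10.0.0.1"}))
	// Cgroup stats that are present but empty are rejected.
	fmt.Println(checkStats(&ResourceStats{Name: "agent-0", Ip: "10.0.0.1", CgroupStats: &CgroupStats{}}))
}
```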
internal/net/grpc/stats/stats.go (1)

84-111: ⚠️ Potential issue | 🟠 Major

Propagate canceled or expired contexts instead of silently degrading to partial stats.

measureCgroupStats returns ctx.Err() when the caller cancels during the 100ms sampling window. Swallowing that here makes GetResourceStats look successful to direct callers and causes the gRPC handler to lose the correct cancellation/deadline status.

Suggested fix
  cgroupStats, err := measureCgroupStats(ctx)
  if err != nil {
+   if ctxErr := ctx.Err(); ctxErr != nil {
+     return nil, ctxErr
+   }
    log.Warn("failed to measure cgroup stats", err)
  }

As per coding guidelines, "ensure context is passed through all calls (no missing context.Context propagation) and that proper gRPC error codes are returned when applicable."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/net/grpc/stats/stats.go` around lines 84 - 111, GetResourceStats
currently swallows errors from measureCgroupStats causing callers to think the
call succeeded; modify the error handling after calling measureCgroupStats so
that if err is context.Canceled or context.DeadlineExceeded (use errors.Is(err,
context.Canceled) / errors.Is(err, context.DeadlineExceeded) or compare to
ctx.Err()) you propagate that error immediately (return nil, err) instead of
logging and continuing; for non-context errors keep the existing log.Warn path
and continue to populate stats. This change touches GetResourceStats and its
call to measureCgroupStats.
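
The suggested error handling can be tried out in isolation with a sketch like the one below. It assumes a simplified `measureCgroupStats` that just waits for a sampling window or for context cancellation; the types, the `window` parameter, and the lowercase `getResourceStats` are hypothetical and only mirror the shape of the real helper. Context errors are propagated, and any other measurement failure degrades to stats without cgroup data.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// CgroupStats and ResourceStats are simplified stand-ins for the generated types.
type CgroupStats struct{ MemoryUsageBytes uint64 }

type ResourceStats struct {
	Name        string
	CgroupStats *CgroupStats
}

// measureCgroupStats is a hypothetical sampler: it waits for the sampling
// window and returns ctx.Err() if the caller gives up first.
func measureCgroupStats(ctx context.Context, window time.Duration) (*CgroupStats, error) {
	select {
	case <-ctx.Done():
		return nil, ctx.Err()
	case <-time.After(window):
		return &CgroupStats{MemoryUsageBytes: 1 << 20}, nil
	}
}

// getResourceStats propagates cancellation and deadline errors, but degrades
// to partial stats for any other measurement failure.
func getResourceStats(ctx context.Context, window time.Duration) (*ResourceStats, error) {
	stats := &ResourceStats{Name: "agent-0"}
	cg, err := measureCgroupStats(ctx, window)
	if err != nil {
		if ctxErr := ctx.Err(); ctxErr != nil {
			return nil, ctxErr
		}
		fmt.Println("warn: failed to measure cgroup stats:", err)
		return stats, nil
	}
	stats.CgroupStats = cg
	return stats, nil
}

// runExample exercises a canceled call and a normal call.
func runExample() (canceled bool, ok bool) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // the caller gives up before sampling completes
	_, err := getResourceStats(ctx, 10*time.Millisecond)
	canceled = errors.Is(err, context.Canceled)

	stats, err := getResourceStats(context.Background(), 10*time.Millisecond)
	ok = err == nil && stats.CgroupStats != nil
	return canceled, ok
}

func main() {
	fmt.Println(runExample())
}
```

A gRPC handler calling such a helper could then map the returned context error to the corresponding Canceled or DeadlineExceeded status instead of reporting success.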
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apis/proto/v1/payload/payload.proto`:
- Around line 652-677: Restore the original top-level message Info.ResourceStats
(with fields name, ip, cgroup_stats) so the generated type
payload.v1.Info.ResourceStats remains unchanged; keep the new grouped shape for
stats under Info.Stats but rename the nested message (currently
Info.Stats.ResourceStats) to a different identifier (e.g., ResourceStatsItem or
ResourceStatsDetailItem) and/or make the map in Info.Stats.ResourceStatsDetail
use the top-level Info.ResourceStats as its value type (map<string,
payload.v1.Info.ResourceStats>) and keep CgroupStats as-is, ensuring no existing
symbol payload.v1.Info.ResourceStats was removed or renamed.

In `@apis/proto/v1/rpc/stats/stats.proto`:
- Line 33: The change makes rpc ResourceStats return
payload.v1.Info.Stats.ResourceStats which is source-breaking; revert
ResourceStats to its original response type (the previous generated type) and
keep the new nested payload.v1.Info.Stats.ResourceStats only for the
new/additive RPC (e.g., ResourceStatsDetail) or introduce a versioned RPC name;
update the proto so ResourceStats signature is unchanged and the nested-type
change is exposed only via the new ResourceStatsDetail (or a v2 RPC) to preserve
client source-compatibility.

In `@apis/swagger/v1/rpc/stats/stats.swagger.json`:
- Around line 39-58: The REST layer is missing the ResourceStatsDetail endpoint:
add ResourceStatsDetail(ctx http.Context or context.Context, w
http.ResponseWriter, r *http.Request) to the Handler interface in
pkg/gateway/lb/handler/rest/handler.go and implement it in the existing handler
type (mirror the behavior/response shape of other handlers like GetObject), then
register the route GET "/resource/stats/detail" in
pkg/gateway/lb/router/router.go (add a route entry similar to GetObject
registration) so the documented swagger path is served; note the gRPC service is
already registered via stats.RegisterStatsDetailServer(srv, v) in
pkg/gateway/lb/usecase/vald.go, so ensure the REST handler delegates or
translates requests consistently with that implementation.

In `@pkg/gateway/lb/handler/grpc/stats.go`:
- Around line 60-118: The callback passed to s.gateway.BroadCast is mutating the
method-level named return err (via vc.ResourceStats and status.ParseError),
causing races across broadcast workers; make error state local to the closure
(e.g., use a local variable like localErr or ierr) and use that for all error
handling, tracing (sspan.RecordError/SetStatus) and returns from the closure,
avoiding writes to the outer err inside the BroadCast worker; if you must
surface a worker error to the caller, collect it from the BroadCast/errgroup
result (s.eg) or send it through the ech channel and assign to the method-level
err only after the broadcast goroutines have completed.

In `@tests/v2/e2e/crud/stats_test.go`:
- Around line 50-52: The test currently calls single(..., r.resourceStatsDetail,
printCallback[*payload.Info_Stats_ResourceStats_ResourceStatsDetail](passThrough))
but only checks for no error; update the callback used with single (or replace
passThrough) to assert the response's Details are populated by calling
res.GetDetails() and asserting len(res.GetDetails()) > 0, and when the scenario
provides the expected agent count also assert len(res.GetDetails()) ==
expectedAgentCount; locate the call site around resourceStatsDetail and
payload.Info_Stats_ResourceStatsDetail and add the small assertion callback to
fail the test when Details is empty or count mismatches.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c784a6d0-d4ec-4491-92d1-79c1f226c6b6

📥 Commits

Reviewing files that changed from the base of the PR and between 9e5e903 and e41aea5.

⛔ Files ignored due to path filters (10)
  • apis/grpc/v1/payload/payload.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • apis/grpc/v1/payload/payload.pb.json.go is excluded by !**/*.pb.json.go
  • apis/grpc/v1/payload/payload_vtproto.pb.go is excluded by !**/*.pb.go, !**/*.pb.go, !**/*_vtproto.pb.go
  • apis/grpc/v1/rpc/stats/stats.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • apis/grpc/v1/rpc/stats/stats_vtproto.pb.go is excluded by !**/*.pb.go, !**/*.pb.go, !**/*_vtproto.pb.go
  • rust/libs/proto/src/payload/v1/payload.v1.serde.rs is excluded by !**/*.serde.rs
  • rust/libs/proto/src/rpc/v1/rpc.v1.tonic.rs is excluded by !**/*.tonic.rs
  • tests/v2/e2e/assets/multi_crud.yaml is excluded by !**/assets/**
  • tests/v2/e2e/assets/stream_crud.yaml is excluded by !**/assets/**
  • tests/v2/e2e/assets/unary_crud.yaml is excluded by !**/assets/**
📒 Files selected for processing (24)
  • .gitfiles
  • apis/docs/v1/docs.md
  • apis/docs/v1/payload.md.tmpl
  • apis/grpc/v1/vald/vald.go
  • apis/proto/v1/payload/payload.proto
  • apis/proto/v1/rpc/stats/stats.proto
  • apis/swagger/v1/rpc/stats/stats.swagger.json
  • internal/client/v1/client/vald/vald.go
  • internal/net/grpc/stats/stats.go
  • internal/net/grpc/stats/stats_test.go
  • pkg/agent/core/faiss/handler/grpc/handler.go
  • pkg/agent/core/faiss/handler/grpc/stats.go
  • pkg/agent/core/faiss/usecase/agentd.go
  • pkg/agent/core/ngt/handler/grpc/handler.go
  • pkg/agent/core/ngt/handler/grpc/stats.go
  • pkg/agent/core/ngt/usecase/agentd.go
  • pkg/gateway/lb/handler/grpc/handler.go
  • pkg/gateway/lb/handler/grpc/stats.go
  • pkg/gateway/lb/usecase/vald.go
  • rust/libs/proto/src/payload/v1/payload.v1.rs
  • tests/v2/e2e/config/config.go
  • tests/v2/e2e/config/enums.go
  • tests/v2/e2e/crud/stats_test.go
  • tests/v2/e2e/crud/strategy_test.go

Comment on lines +652 to 677
// Represents the stats info messages.
message Stats {
  // Represent the resource stats
  message ResourceStats {
    string name = 1;
    string ip = 2 [(buf.validate.field).string.ipv4 = true];
    // Container resource usage statistics
    CgroupStats cgroup_stats = 3;
  }

  // Represents the resource stats for each agent
  message ResourceStatsDetail {
    map<string, ResourceStats> details = 1;
  }

  message CgroupStats {
    // CPU cores available
    double cpu_limit_cores = 1;
    // CPU usage in cores (not percentage)
    double cpu_usage_cores = 2;
    // Memory limit in bytes
    uint64 memory_limit_bytes = 3;
    // Memory usage in bytes
    uint64 memory_usage_bytes = 4;
  }
}

⚠️ Potential issue | 🟠 Major

Preserve the existing Info.ResourceStats type on the v1 surface.

Rehoming ResourceStats under Info.Stats changes the generated type used by the existing ResourceStats RPC (payload.v1.Info.ResourceStats → payload.v1.Info.Stats.ResourceStats). Existing code built against the current v1 SDKs will stop compiling even though the wire fields are unchanged. Please add the new detail shape without renaming the existing ResourceStats message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apis/proto/v1/payload/payload.proto` around lines 652 - 677, Restore the
original top-level message Info.ResourceStats (with fields name, ip,
cgroup_stats) so the generated type payload.v1.Info.ResourceStats remains
unchanged; keep the new grouped shape for stats under Info.Stats but rename the
nested message (currently Info.Stats.ResourceStats) to a different identifier
(e.g., ResourceStatsItem or ResourceStatsDetailItem) and/or make the map in
Info.Stats.ResourceStatsDetail use the top-level Info.ResourceStats as its value
type (map<string, payload.v1.Info.ResourceStats>) and keep CgroupStats as-is,
ensuring no existing symbol payload.v1.Info.ResourceStats was removed or
renamed.

 // Overview
 // Represent the RPC to get the resource stats.
-rpc ResourceStats(payload.v1.Empty) returns (payload.v1.Info.ResourceStats) {
+rpc ResourceStats(payload.v1.Empty) returns (payload.v1.Info.Stats.ResourceStats) {

⚠️ Potential issue | 🟠 Major

Keep ResourceStats source-compatible in rpc.v1.

Line 33 changes the existing ResourceStats RPC to return payload.v1.Info.Stats.ResourceStats, which renames the generated response type for every client. The new ResourceStatsDetail RPC is additive, but this line turns the existing v1 method into a source-breaking change. Preserve the original return type on ResourceStats, or move the nested-type change behind a new/versioned RPC.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apis/proto/v1/rpc/stats/stats.proto` at line 33, The change makes rpc
ResourceStats return payload.v1.Info.Stats.ResourceStats which is
source-breaking; revert ResourceStats to its original response type (the
previous generated type) and keep the new nested
payload.v1.Info.Stats.ResourceStats only for the new/additive RPC (e.g.,
ResourceStatsDetail) or introduce a versioned RPC name; update the proto so
ResourceStats signature is unchanged and the nested-type change is exposed only
via the new ResourceStatsDetail (or a v2 RPC) to preserve client
source-compatibility.
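
For Go callers specifically, the kind of source break described here can be illustrated, and in principle softened, with a type alias. This is a Go-only sketch that the review does not propose and that does not replace keeping the proto surface stable. The struct below is a hypothetical stand-in for the generated message, and the alias would have to live in a hand-written file next to the generated code.

```go
package main

import "fmt"

// Info_Stats_ResourceStats mimics the generated Go name after the message
// moves under Info.Stats (hypothetical, simplified fields).
type Info_Stats_ResourceStats struct {
	Name string
	Ip   string
}

// Info_ResourceStats is a hand-written alias that keeps the old generated
// name compiling for existing Go callers.
type Info_ResourceStats = Info_Stats_ResourceStats

// legacyCaller represents code written against the old type name.
func legacyCaller(s *Info_ResourceStats) string {
	return s.Name + "@" + s.Ip
}

func runExample() string {
	fresh := &Info_Stats_ResourceStats{Name: "agent-0", Ip: "10.0.0.1"}
	// Because the alias denotes the identical type, no conversion is needed.
	return legacyCaller(fresh)
}

func main() {
	fmt.Println(runExample())
}
```

Without the alias line, `legacyCaller` would fail to compile, which is the breakage the comment describes. Clients in other languages, such as the regenerated Rust bindings, would not benefit from this approach.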

Comment on lines +39 to +58
"/resource/stats/detail": {
  "get": {
    "summary": "Represent the RPC to get the resource stats for each agent.",
    "operationId": "StatsDetail_ResourceStatsDetail",
    "responses": {
      "200": {
        "description": "A successful response.",
        "schema": {
          "$ref": "#/definitions/InfoStatsResourceStatsDetail"
        }
      },
      "default": {
        "description": "An unexpected error response.",
        "schema": {
          "$ref": "#/definitions/rpcStatus"
        }
      }
    },
    "tags": ["StatsDetail"]
  }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '== Proto/Swagger declarations =='
rg -n -C2 'ResourceStatsDetail|/resource/stats/detail' \
  apis/proto/v1/rpc/stats/stats.proto \
  apis/swagger/v1/rpc/stats/stats.swagger.json

echo
echo '== Gateway REST wiring =='
rg -n -C3 'WithVald|ResourceStats|ResourceStatsDetail|StatsDetail|/resource/stats' \
  pkg/gateway/lb/usecase/vald.go \
  pkg/gateway/lb/handler/rest \
  pkg/gateway/lb/router

echo
echo 'Expected result: ResourceStatsDetail should appear in the gateway REST wiring if the documented HTTP endpoint is actually served.'

Repository: vdaas/vald

Length of output: 5019


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '== REST handler implementation =='
find pkg/gateway/lb/handler/rest -name '*.go' -type f | head -20

echo
echo '== Check for stats-related routes in REST handler =='
rg -n 'resource/stats|StatsDetail|ResourceStats|stats\.' \
  pkg/gateway/lb/handler/rest/ --type go

echo
echo '== Main REST handler file =='
if [ -f pkg/gateway/lb/handler/rest/handler.go ]; then
  wc -l pkg/gateway/lb/handler/rest/handler.go
  head -100 pkg/gateway/lb/handler/rest/handler.go
fi

echo
echo '== Check for route registration patterns =='
rg -n 'router\.|Handle|Route' \
  pkg/gateway/lb/handler/rest/ --type go -A2 | head -50

Repository: vdaas/vald

Length of output: 340


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '== REST handler.go content =='
cat -n pkg/gateway/lb/handler/rest/handler.go

echo
echo '== Check handler struct definition =='
rg -n 'type handler struct' -A10 pkg/gateway/lb/handler/rest/handler.go

echo
echo '== Check if handler implements ServeHTTP or similar =='
rg -n 'func.*handler.*ServeHTTP\|func.*handler.*Handle' -A5 pkg/gateway/lb/handler/rest/handler.go

Repository: vdaas/vald

Length of output: 8870


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '== Check router implementation =='
cat -n pkg/gateway/lb/router/router.go | head -150

echo
echo '== Check if router explicitly registers Handler methods =='
rg -n 'Index|Search|Insert|Handle' pkg/gateway/lb/router/router.go -B2 -A2 | head -80

echo
echo '== Vald server interface - check if it has ResourceStatsDetail =='
rg -n 'interface.*Server\|ResourceStatsDetail' \
  apis/grpc/v1/vald/vald.pb.go -A5 | head -50

Repository: vdaas/vald

Length of output: 6137


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '== Full router.go to check for stats routes =='
wc -l pkg/gateway/lb/router/router.go
cat -n pkg/gateway/lb/router/router.go | tail -100

echo
echo '== Search for ResourceStatsDetail or stats in entire REST handler directory =='
rg -r 'ResourceStatsDetail' pkg/gateway/lb/handler/rest/

echo
echo '== Search for stats routes in router files =='
find pkg/gateway/lb/router -name '*.go' -type f -exec grep -l 'stats\|Stats\|ResourceStats' {} \;

echo
echo '== Check if there is a separate stats handler =='
find pkg/gateway/lb/handler -type d | grep -i stats

echo
echo '== Search for ResourceStatsDetail across entire gateway package =='
rg -n 'ResourceStatsDetail' pkg/gateway/lb/

Repository: vdaas/vald

Length of output: 2778


Implement ResourceStatsDetail handler and router registration for REST endpoint.

The proto declares rpc ResourceStatsDetail with option (google.api.http).get = "/resource/stats/detail", and Swagger documents the endpoint, but the REST layer is incomplete:

  • pkg/gateway/lb/handler/rest/handler.go: Handler interface omits ResourceStatsDetail method; implementation stops at GetObject
  • pkg/gateway/lb/router/router.go: Route registration ends at GetObject (line 173); no route for /resource/stats/detail

The gRPC service is correctly registered via stats.RegisterStatsDetailServer(srv, v) at line 129 in pkg/gateway/lb/usecase/vald.go, so clients using gRPC will work. However, HTTP clients will receive 404 for a documented endpoint. Add the missing ResourceStatsDetail method to the Handler interface and register the route in the router to serve this endpoint over REST consistently with the proto contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apis/swagger/v1/rpc/stats/stats.swagger.json` around lines 39 - 58, The REST
layer is missing the ResourceStatsDetail endpoint: add a ResourceStatsDetail
method to the Handler interface in pkg/gateway/lb/handler/rest/handler.go, using
the same signature as the existing Handler methods such as GetObject, and
implement it in the existing handler type (mirror the behavior/response shape of
other handlers like GetObject), then register the route GET
"/resource/stats/detail" in pkg/gateway/lb/router/router.go (add a route entry
similar to GetObject registration) so the documented swagger path is served;
note the gRPC service is already registered via
stats.RegisterStatsDetailServer(srv, v) in pkg/gateway/lb/usecase/vald.go, so
ensure the REST handler delegates or translates requests consistently with that
implementation.

Comment on lines +60 to +118
	s.eg.Go(safety.RecoverFunc(func() error {
		defer close(ech)
		ech <- s.gateway.BroadCast(ctx, service.READ, func(ctx context.Context, target string, vc vald.Client, copts ...grpc.CallOption) error {
			sctx, sspan := trace.StartSpan(grpc.WrapGRPCMethod(ctx, "BroadCast/"+target), apiName+"/"+resourceStatsDetailRPCName+"/"+target)
			defer func() {
				if sspan != nil {
					sspan.End()
				}
			}()
			var stats *payload.Info_Stats_ResourceStats
			stats, err = vc.ResourceStats(sctx, new(payload.Empty), copts...)
			if err != nil {
				var (
					attrs trace.Attributes
					st    *status.Status
					msg   string
					code  codes.Code
				)
				switch {
				case errors.Is(err, context.Canceled),
					errors.Is(err, errors.ErrRPCCallFailed(target, context.Canceled)):
					attrs = trace.StatusCodeCancelled(
						errdetails.ValdGRPCResourceTypePrefix +
							"/" + statsPackageName + "." + resourceStatsDetailRPCName + ".BroadCast/" +
							target + " canceled: " + err.Error())
					code = codes.Canceled
				case errors.Is(err, context.DeadlineExceeded),
					errors.Is(err, errors.ErrRPCCallFailed(target, context.DeadlineExceeded)):
					attrs = trace.StatusCodeDeadlineExceeded(
						errdetails.ValdGRPCResourceTypePrefix +
							"/" + statsPackageName + "." + resourceStatsDetailRPCName + ".BroadCast/" +
							target + " deadline_exceeded: " + err.Error())
					code = codes.DeadlineExceeded
				default:
					st, msg, err = status.ParseError(err, codes.NotFound, "error "+resourceStatsDetailRPCName+" API",
						&errdetails.ResourceInfo{
							ResourceType: errdetails.ValdGRPCResourceTypePrefix + "/" + statsPackageName + "." + resourceStatsDetailRPCName + ".BroadCast/" + target,
							ResourceName: fmt.Sprintf("%s: %s(%s) to %s", apiName, s.name, s.ip, target),
						})
					if st != nil {
						code = st.Code()
					} else {
						code = codes.NotFound
					}
					attrs = trace.FromGRPCStatus(code, msg)
				}
				if sspan != nil {
					sspan.RecordError(err)
					sspan.SetAttributes(attrs...)
					sspan.SetStatus(trace.StatusError, err.Error())
				}
				if err != nil && st != nil &&
					code != codes.Canceled &&
					code != codes.DeadlineExceeded &&
					code != codes.InvalidArgument &&
					code != codes.NotFound &&
					code != codes.OK &&
					code != codes.Unimplemented {
					return err
Contributor

@coderabbitai coderabbitai bot Mar 9, 2026

⚠️ Potential issue | 🔴 Critical

Stop sharing the named err across broadcast workers.

The callback writes to the method-level named return err (vc.ResourceStats(...) and status.ParseError(...)). BroadCast fans out per target, so those writes race with each other and with the outer select, which can return the wrong error or mask one entirely. Keep per-target state in locals and only assign to the method result after the broadcast completes.

💡 Suggested fix
-			var stats *payload.Info_Stats_ResourceStats
-			stats, err = vc.ResourceStats(sctx, new(payload.Empty), copts...)
-			if err != nil {
+			res, callErr := vc.ResourceStats(sctx, new(payload.Empty), copts...)
+			if callErr != nil {
 				var (
 					attrs trace.Attributes
 					st    *status.Status
 					msg   string
 					code  codes.Code
 				)
 				switch {
-				case errors.Is(err, context.Canceled),
-					errors.Is(err, errors.ErrRPCCallFailed(target, context.Canceled)):
+				case errors.Is(callErr, context.Canceled),
+					errors.Is(callErr, errors.ErrRPCCallFailed(target, context.Canceled)):
 					attrs = trace.StatusCodeCancelled(
 						errdetails.ValdGRPCResourceTypePrefix +
 							"/" + statsPackageName + "." + resourceStatsDetailRPCName + ".BroadCast/" +
-							target + " canceled: " + err.Error())
+							target + " canceled: " + callErr.Error())
 					code = codes.Canceled
-				case errors.Is(err, context.DeadlineExceeded),
-					errors.Is(err, errors.ErrRPCCallFailed(target, context.DeadlineExceeded)):
+				case errors.Is(callErr, context.DeadlineExceeded),
+					errors.Is(callErr, errors.ErrRPCCallFailed(target, context.DeadlineExceeded)):
 					attrs = trace.StatusCodeDeadlineExceeded(
 						errdetails.ValdGRPCResourceTypePrefix +
 							"/" + statsPackageName + "." + resourceStatsDetailRPCName + ".BroadCast/" +
-							target + " deadline_exceeded: " + err.Error())
+							target + " deadline_exceeded: " + callErr.Error())
 					code = codes.DeadlineExceeded
 				default:
-					st, msg, err = status.ParseError(err, codes.NotFound, "error "+resourceStatsDetailRPCName+" API",
+					st, msg, callErr = status.ParseError(callErr, codes.NotFound, "error "+resourceStatsDetailRPCName+" API",
 						&errdetails.ResourceInfo{
 							ResourceType: errdetails.ValdGRPCResourceTypePrefix + "/" + statsPackageName + "." + resourceStatsDetailRPCName + ".BroadCast/" + target,
 							ResourceName: fmt.Sprintf("%s: %s(%s) to %s", apiName, s.name, s.ip, target),
 						})
 					if st != nil {
 						code = st.Code()
 					} else {
 						code = codes.NotFound
 					}
 					attrs = trace.FromGRPCStatus(code, msg)
 				}
 				if sspan != nil {
-					sspan.RecordError(err)
+					sspan.RecordError(callErr)
 					sspan.SetAttributes(attrs...)
-					sspan.SetStatus(trace.StatusError, err.Error())
+					sspan.SetStatus(trace.StatusError, callErr.Error())
 				}
-				if err != nil && st != nil &&
+				if callErr != nil && st != nil &&
 					code != codes.Canceled &&
 					code != codes.DeadlineExceeded &&
 					code != codes.InvalidArgument &&
 					code != codes.NotFound &&
 					code != codes.OK &&
 					code != codes.Unimplemented {
-					return err
+					return callErr
 				}
 				return nil
 			}
-			if stats != nil {
+			if res != nil {
 				mu.Lock()
-				detail.Details[target] = stats
+				detail.Details[target] = res
 				mu.Unlock()
 			}
As per coding guidelines, `**/*.go` changes must be checked for “Concurrency safety: check for race conditions, proper channel usage, goroutine leaks, and use of Vald’s `internal/sync/errgroup` for multi-goroutine error handling.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gateway/lb/handler/grpc/stats.go` around lines 60 - 118, The callback
passed to s.gateway.BroadCast is mutating the method-level named return err (via
vc.ResourceStats and status.ParseError), causing races across broadcast workers;
make error state local to the closure (e.g., use a local variable like localErr
or ierr) and use that for all error handling, tracing
(sspan.RecordError/SetStatus) and returns from the closure, avoiding writes to
the outer err inside the BroadCast worker; if you must surface a worker error to
the caller, collect it from the BroadCast/errgroup result (s.eg) or send it
through the ech channel and assign to the method-level err only after the
broadcast goroutines have completed.

✅ Addressed in commit 5f93be8
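
For reference, the pattern the review asks for can be sketched in isolation. This is not the code from commit 5f93be8: `fetch` and `collect` are hypothetical names, and the sketch uses the standard `sync` package and `errors.Join` instead of Vald's `internal/sync/errgroup` and `BroadCast`, purely to keep it self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fetch is a hypothetical per-target call standing in for vc.ResourceStats.
func fetch(target string) (int, error) {
	if target == "agent-bad" {
		return 0, errors.New("unavailable: " + target)
	}
	return len(target), nil
}

// collect fans out over targets. Each goroutine keeps its error in a local
// variable (callErr) and reports it through a mutex-guarded slice, so no
// worker ever writes to the named return err.
func collect(targets []string) (details map[string]int, err error) {
	details = make(map[string]int, len(targets))
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		errs []error
	)
	for _, target := range targets {
		wg.Add(1)
		go func(target string) {
			defer wg.Done()
			res, callErr := fetch(target) // local to this worker
			mu.Lock()
			defer mu.Unlock()
			if callErr != nil {
				errs = append(errs, callErr)
				return
			}
			details[target] = res
		}(target)
	}
	wg.Wait()
	// Assign to the named return only after every worker has finished.
	err = errors.Join(errs...)
	return details, err
}

func main() {
	details, err := collect([]string{"agent-0", "agent-1", "agent-bad"})
	fmt.Println(len(details), err)
}
```

Running a variant that writes to `err` inside the goroutines under `go test -race` (or `go run -race`) should flag the shared write, which is one way to confirm the original finding.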

Member

@jumpei527
This comment seems correct. Let's avoid sharing err.

Member

IndexStatisticsDetail has a similar problem too.

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


Contributor Author

@Matts966
To prevent the race condition on err, I updated the BroadCast callback to use a local callErr variable. In addition to ResourceStatsDetail, I fixed the same issue in IndexStatisticsDetail and IndexProperty.

Comment on lines +50 to +52
	switch plan.Type {
	case config.OpResourceStatsDetail:
		return single(t, ctx, 0, plan, new(payload.Empty), r.resourceStatsDetail, printCallback[*payload.Info_Stats_ResourceStatsDetail](passThrough))
Contributor

🧹 Nitpick | 🔵 Trivial

Assert that Details is populated.

This path currently only verifies that the RPC returns without error. An implementation that always returns an empty Details map would still pass, which misses the main contract of ResourceStatsDetail. Add a callback assertion for len(res.GetDetails()) > 0 and, if the scenario knows the expected agent count, validate that too.

Based on learnings: "GRPC handler packages in pkg components require minimal testing except for bugfixes; detailed business logic tests belong in E2E tests."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/v2/e2e/crud/stats_test.go` around lines 50 - 52, The test currently
calls single(..., r.resourceStatsDetail,
printCallback[*payload.Info_Stats_ResourceStatsDetail](passThrough))
but only checks for no error; update the callback used with single (or replace
passThrough) to assert the response's Details are populated by calling
res.GetDetails() and asserting len(res.GetDetails()) > 0, and when the scenario
provides the expected agent count also assert len(res.GetDetails()) ==
expectedAgentCount; locate the call site around resourceStatsDetail and
payload.Info_Stats_ResourceStatsDetail and add the small assertion callback to
fail the test when Details is empty or count mismatches.
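
A rough sketch of the assertion described above, kept independent of the E2E v2 helpers (`single`, `printCallback`, `passThrough`), whose exact callback signatures are not shown in this thread. `statsDetail` and `checkDetails` are hypothetical; only the `GetDetails()` accessor name is taken from the review text.

```go
package main

import (
	"errors"
	"fmt"
)

// statsDetail is a hypothetical stand-in for
// payload.Info_Stats_ResourceStatsDetail, reduced to the Details map.
type statsDetail struct {
	Details map[string]float64 // agent name -> simplified stats value
}

// GetDetails mirrors the nil-safe getter style of generated protobuf code.
func (d *statsDetail) GetDetails() map[string]float64 {
	if d == nil {
		return nil
	}
	return d.Details
}

// checkDetails fails when Details is empty and, if expected > 0, when the
// number of reported agents differs from the expected agent count.
func checkDetails(res *statsDetail, expected int) error {
	n := len(res.GetDetails())
	if n == 0 {
		return errors.New("ResourceStatsDetail returned no per-agent details")
	}
	if expected > 0 && n != expected {
		return fmt.Errorf("expected details for %d agents, got %d", expected, n)
	}
	return nil
}

func main() {
	res := &statsDetail{Details: map[string]float64{"agent-0": 0.4, "agent-1": 0.7}}
	fmt.Println(checkDetails(res, 2))
	fmt.Println(checkDetails(&statsDetail{}, 0))
}
```

In the actual test this check would be wrapped in whatever callback type `single` accepts, with the failure reported through `t`.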

@kpango
Collaborator

kpango commented Mar 9, 2026

/format

@vdaas-ci
Collaborator

vdaas-ci commented Mar 9, 2026

[FORMAT] Updating license headers and formatting go codes triggered by kpango.

Signed-off-by: Vdaas CI <vald@vdaas.org>
@github-actions github-actions bot added size/XXL and removed size/XXL labels Mar 9, 2026
kmrmt previously approved these changes Mar 9, 2026
Signed-off-by: Jumpei Tanikawa <jumpei5278@gmail.com>
Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
apis/proto/v1/payload/payload.proto (1)

654-678: ⚠️ Potential issue | 🟠 Major

Still unresolved: preserve Info.ResourceStats on the v1 surface.

Rehoming this message under Info.Stats renames the generated SDK type for the existing stats RPC (Info_ResourceStats → Info_Stats_ResourceStats in Go), which is a source-breaking change for current v1 clients. Keep the legacy top-level message and hang the new detail shape off it instead.

You can verify the generated surface change with:

#!/bin/bash
set -euo pipefail

rg -n -C2 'type Info_ResourceStats|type Info_Stats_ResourceStats' apis/grpc/v1/payload/payload.pb.go
rg -n -C2 '\bResourceStats\b' apis/proto/v1/rpc/stats/stats.proto
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apis/proto/v1/payload/payload.proto` around lines 654 - 678, The change moved
the ResourceStats protobuf message under Info.Stats which renames the generated
Go type (Info_ResourceStats → Info_Stats_ResourceStats) and breaks v1 clients;
restore the original top-level message name and keep the new detailed shape as a
nested type. Specifically, reintroduce a top-level message ResourceStats with
the same fields (cpu_limit_cores, cpu_usage_cores, memory_limit_bytes,
memory_usage_bytes and the name/ip fields) so the generated type
Info_ResourceStats remains, and modify Stats (or Info.Stats) to reference that
top-level ResourceStats for existing usages while adding a new
ResourceStatsDetail (or similar nested message) that can contain the map<string,
ResourceStats> for the detailed shape.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 61a63792-cabc-46bf-a663-a296fdcccb75

📥 Commits

Reviewing files that changed from the base of the PR and between 5f93be8 and b8e4ac2.

⛔ Files ignored due to path filters (3)
  • apis/grpc/v1/payload/payload.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • apis/grpc/v1/payload/payload_vtproto.pb.go is excluded by !**/*.pb.go, !**/*.pb.go, !**/*_vtproto.pb.go
  • rust/libs/proto/src/payload/v1/payload.v1.serde.rs is excluded by !**/*.serde.rs
📒 Files selected for processing (5)
  • apis/docs/v1/docs.md
  • apis/docs/v1/payload.md.tmpl
  • apis/proto/v1/payload/payload.proto
  • pkg/gateway/lb/handler/grpc/index.go
  • rust/libs/proto/src/payload/v1/payload.v1.rs

@jumpei527 jumpei527 requested a review from Matts966 March 17, 2026 13:12
This reverts commit 628704b.
Member

IndexStatistics can panic when all agent responses have Valid=false. mergeInfoIndexStatistics skips invalid entries but still indexes into empty slices.
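
One possible shape for the guard, as a sketch under assumptions: `indexStats` is a hypothetical, trimmed-down stand-in for the real statistics payload, and `mergeStats` is not the actual mergeInfoIndexStatistics. The point is only that the merge returns early when no valid entry remains and never indexes into a slice it has not sized.

```go
package main

import "fmt"

// indexStats is a hypothetical, reduced stand-in for the index statistics
// payload; the field names here are assumptions for illustration.
type indexStats struct {
	Valid           bool
	NumberOfObjects uint64
	IndegreeCount   []int64
}

// mergeStats sums the valid entries. When every entry is nil or has
// Valid=false it returns an invalid, empty result instead of touching
// any slice element.
func mergeStats(in []*indexStats) *indexStats {
	valid := make([]*indexStats, 0, len(in))
	for _, s := range in {
		if s != nil && s.Valid {
			valid = append(valid, s)
		}
	}
	if len(valid) == 0 {
		return &indexStats{Valid: false}
	}
	merged := &indexStats{Valid: true}
	for _, s := range valid {
		merged.NumberOfObjects += s.NumberOfObjects
		// Grow the destination on demand so agents may report
		// histograms of different lengths.
		for i, c := range s.IndegreeCount {
			if i >= len(merged.IndegreeCount) {
				merged.IndegreeCount = append(merged.IndegreeCount, 0)
			}
			merged.IndegreeCount[i] += c
		}
	}
	return merged
}

func main() {
	allInvalid := []*indexStats{{Valid: false}, {Valid: false}}
	fmt.Println(mergeStats(allInvalid).Valid)

	mixed := []*indexStats{
		{Valid: true, NumberOfObjects: 3, IndegreeCount: []int64{1, 2}},
		{Valid: false},
		{Valid: true, NumberOfObjects: 4, IndegreeCount: []int64{5}},
	}
	fmt.Println(mergeStats(mixed).NumberOfObjects, mergeStats(mixed).IndegreeCount)
}
```

Whether an all-invalid broadcast should yield an invalid result or an error is a design decision for the handler; the sketch only shows the early return.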
