perf(node): pool aggregation prep vectors to cut GC pressure (#309) #310

Merged
mananuf merged 1 commit into perf/instrument-aggregation-phases from perf/pool-aggregation-prep-vectors
May 25, 2026

@mananuf mananuf commented May 25, 2026

Summary

Adds sync.Pool reuse and capacity hints for the four prep slices used per data root in aggregateFromSnapshot (childProofs, rawPubkeys, rawSigs, rawIDs). This reduces GC pressure from short-lived allocations during the long-running FFI window, the suspected dominant source of gean's p99 tail above 1 s. Closes #309.

Why

Pre-PR: each iteration declared four nil slices, grew them via append doubling, then discarded them. At 5-20 data roots per pass, that's 20-80 short-lived slices per pass (and more backing-array allocations once each regrowth is counted), on every interval-2 tick. The 400 ms FFI window is a generous opening for the runtime to schedule a 10-30 ms STW GC pause inside.
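To make the append-doubling cost concrete, here is a minimal sketch that counts how often append has to move a slice to a new backing array, with and without a capacity hint. The helper name growths and the element count of 32 are illustrative assumptions, not code from this PR.

```go
package main

import "fmt"

// growths reports how many times append moved the slice to a new backing
// array while appending n elements, starting from the given capacity.
// A startCap of 0 models the old `var s []uint64` declaration.
func growths(n, startCap int) int {
	var s []uint64
	if startCap > 0 {
		s = make([]uint64, 0, startCap) // capacity hint: one up-front allocation
	}
	count := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, uint64(i))
		if cap(s) != before {
			count++ // append had to allocate a larger backing array
		}
	}
	return count
}

func main() {
	fmt.Println("regrowths from nil slice:   ", growths(32, 0))
	fmt.Println("regrowths with cap hint 32: ", growths(32, 32))
}
```

With a hint at least as large as the working set, the second call reports zero regrowths. The nil-slice count depends on the runtime's growth policy, so it is left unstated here.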

ethlambda pays nothing equivalent: under Rust ownership its buffers are pre-sized with Vec::with_capacity (ethlambda/crates/blockchain/src/aggregation.rs:238) and freed deterministically at scope exit, with no garbage collector involved.

What changed

New: node/aggregation_pool.go (+35 LOC)

Four sync.Pools, one per prep slice type, with first-fit capacity hints (8 children, 32 raw sigs). Typed get*Buf/put*Buf wrappers are identical in shape to the existing xmss/proof_pool.go pattern: each pool stores *[]T, and Put resets length to 0 while preserving capacity.
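The pool shape described above could look roughly like the following sketch for one of the four slice types. The names rawIDsPool, getRawIDsBuf and putRawIDsBuf are hypothetical stand-ins; the actual identifiers in node/aggregation_pool.go are not shown in this PR description.

```go
package main

import (
	"fmt"
	"sync"
)

// rawIDsPool is a hypothetical pool of *[]uint64 scratch buffers.
// The initial capacity of 32 mirrors the "32 raw sigs" hint from the PR text.
var rawIDsPool = sync.Pool{
	New: func() any {
		buf := make([]uint64, 0, 32)
		return &buf
	},
}

// getRawIDsBuf returns an empty buffer, reusing a pooled one when available.
func getRawIDsBuf() *[]uint64 {
	return rawIDsPool.Get().(*[]uint64)
}

// putRawIDsBuf truncates the buffer to length 0 (capacity is kept) and
// hands it back to the pool. The caller must not use buf afterwards.
func putRawIDsBuf(buf *[]uint64) {
	*buf = (*buf)[:0]
	rawIDsPool.Put(buf)
}

func main() {
	buf := getRawIDsBuf()
	*buf = append(*buf, 1, 2, 3)
	fmt.Println(len(*buf), cap(*buf) >= 32) // 3 true
	putRawIDsBuf(buf)
}
```

Storing a pointer to the slice rather than the slice itself means Put does not have to allocate a fresh slice header when converting to an interface value, which is why the pattern pools *[]T.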

Modified: node/store_aggregate.go (+116 / -109 LOC, net +7)

Refactor of aggregateFromSnapshot's per-data-root loop:

  • Wraps loop body in func(){}() so defer pool-Put fires per iteration (not per function-return — the latter would defeat reuse).
  • Replaces continue with return inside the anonymous func.
  • Slice access via *bufPtr; appends via *bufPtr = append(*bufPtr, x).
  • Capacity hint added to allIDs (was 0, now rawIDs + covered size).
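The loop refactor listed above can be sketched as follows. This is an illustration under assumptions: processRoots, idPool and the empty-input skip condition are hypothetical and only stand in for the real per-data-root loop in aggregateFromSnapshot.

```go
package main

import (
	"fmt"
	"sync"
)

// idPool is a hypothetical pool of *[]uint64 scratch buffers.
var idPool = sync.Pool{
	New: func() any {
		b := make([]uint64, 0, 32)
		return &b
	},
}

// processRoots stands in for the per-data-root loop. Each iteration runs in
// its own anonymous function so the deferred Put fires when that iteration
// ends, making the buffer available to the next iteration.
func processRoots(roots [][]uint64) (processed int) {
	for _, ids := range roots {
		func() {
			bufPtr := idPool.Get().(*[]uint64)
			defer func() {
				*bufPtr = (*bufPtr)[:0] // reset length, keep capacity
				idPool.Put(bufPtr)
			}()

			if len(ids) == 0 {
				return // was `continue` before the body became a closure
			}
			for _, id := range ids {
				*bufPtr = append(*bufPtr, id) // append through the pointer
			}
			processed += len(*bufPtr)
		}()
	}
	return processed
}

func main() {
	fmt.Println(processRoots([][]uint64{{1, 2}, {}, {3}})) // 3
}
```

Without the closure, the deferred Put calls would only run when the enclosing function returns, so every iteration would draw a new buffer from the pool and nothing would be reused within a pass.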

Secondary win

defer xmss.FreeSignature(parsed) inside the inner sig loop now fires per data root rather than accumulating across the whole pass to function return. This tightens C-side handle lifetime: previously all parsed sig handles from all data roots in a pass stayed allocated until aggregateFromSnapshot returned.
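The effect on handle lifetime can be illustrated with a small counter that tracks the peak number of live handles. The tracker type and both functions are hypothetical; they model xmss.FreeSignature being deferred inside a loop, not the real FFI calls.

```go
package main

import "fmt"

// tracker counts currently live handles and remembers the peak.
type tracker struct{ live, peak int }

func (t *tracker) open() {
	t.live++
	if t.live > t.peak {
		t.peak = t.live
	}
}

func (t *tracker) free() { t.live-- }

// peakFunctionScoped models the old shape: defers registered inside the
// loops only run when the enclosing function returns.
func peakFunctionScoped(roots, sigsPerRoot int) int {
	t := &tracker{}
	func() { // stands in for the whole aggregation function
		for r := 0; r < roots; r++ {
			for s := 0; s < sigsPerRoot; s++ {
				t.open()
				defer t.free()
			}
		}
	}()
	return t.peak
}

// peakPerRoot models the new shape: each data root runs in its own
// closure, so its deferred frees run before the next root starts.
func peakPerRoot(roots, sigsPerRoot int) int {
	t := &tracker{}
	for r := 0; r < roots; r++ {
		func() {
			for s := 0; s < sigsPerRoot; s++ {
				t.open()
				defer t.free()
			}
		}()
	}
	return t.peak
}

func main() {
	fmt.Println(peakFunctionScoped(5, 4)) // 20
	fmt.Println(peakPerRoot(5, 4))        // 4
}
```

In this model the peak drops from roots × sigsPerRoot to sigsPerRoot, which is the lifetime tightening the paragraph above describes.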

Behavior change

None. Same FFI calls in the same order, same store mutations (via *AggregationMutations from #307), same broadcast set. Only the lifetime of internal scratch slices and one C-handle defer batch changes — both shorter, both correct.

Expected impact

Per the gean-vs-ethlambda analysis: 20-40 ms p50 recovery, more on p99 (GC pauses are tail-dominant). Real numbers come from #305's phase-attribution histograms post-merge — if prep isn't where the gap lives, this PR's measured impact will be small and we'd refocus on post or commit.

Spec compliance

Zero impact. No spec-mapped function or container modified.

Test plan

Lean-review

  • dead-code: N/A
  • premature-abstraction: 4 pools, 8 helpers — sized to current need; no speculative pool surface
  • defensive-bloat: N/A
  • duplicates-existing-util: ✅ followed existing xmss/proof_pool.go pattern; no parallel pool helper invented
  • comment-bloat: header on aggregation_pool.go is 8 lines explaining the why + the existing-pattern cross-reference; earns its keep
  • over-validated-boundary: N/A

Stack

Originally stacked on PR #308 (snapshot refactor), which has now merged into perf/instrument-aggregation-phases. Rebased trivially; this PR now diffs cleanly against perf/instrument-aggregation-phases head with just the one Phase C commit.

…sure (#309)

Previously, aggregateFromSnapshot allocated four nil-initialised slices
per data root and discarded them at iteration end:

  var childProofs []xmss.ChildProof  // grew via append doubling
  var rawPubkeys  []xmss.CPubKey
  var rawSigs     []xmss.CSig
  var rawIDs      []uint64

At typical 5-20 data roots per pass × 4 slices, that's 20-80 short-lived
slices per pass for the GC to clean up. A 400 ms FFI call is a generous
window for the runtime to schedule a 10-30 ms STW pause inside, the
dominant suspected source of gean's p99 tail above 1 s.

ethlambda pays nothing equivalent (Rust ownership, pre-sized buffers
freed at scope exit per crates/blockchain/src/aggregation.rs:238).

This PR adds sync.Pool reuse for all four slice types, plus capacity
hints in the pool New functions sized to typical working set (8 children,
32 raw sigs). Pattern mirrors the existing xmss/proof_pool.go — pool
stores *[]T, typed get/put wrappers, put resets length to 0 preserving
capacity.

Refactor of aggregateFromSnapshot:

  - Wraps the per-data-root loop body in func(){}() so defer pool-Put
    fires per iteration rather than accumulating to function return
  - Replaces continue with return inside the anonymous func
  - Slice access via *bufPtr; append targets *bufPtr = append(*bufPtr, x)

Secondary win: defer xmss.FreeSignature(parsed) inside the sig loop also
now fires per iteration — previously those C-side handles accumulated
across ALL data roots in a pass, only freeing at function return.
Tighter memory lifetime.

Also adds a capacity hint on allIDs (was 0; now rawIDs + covered count).

go build ./... + go test ./node/... green.

Closes #309.
@mananuf mananuf merged commit 22cea31 into perf/instrument-aggregation-phases May 25, 2026
@mananuf mananuf deleted the perf/pool-aggregation-prep-vectors branch May 25, 2026 12:13