perf(node): pool aggregation prep vectors to cut GC pressure (#309)#310
Merged
mananuf merged 1 commit intoMay 25, 2026
Merged
Conversation
…sure (#309) Previously, aggregateFromSnapshot allocated four nil-initialised slices per data root and discarded them at iteration end: var childProofs []xmss.ChildProof // grew via append doubling var rawPubkeys []xmss.CPubKey var rawSigs []xmss.CSig var rawIDs []uint64 At typical 5-20 data roots per pass × 4 slices, that's 20-80 allocations per pass that the young-gen GC has to clean up. A 400 ms FFI call is a generous window for the runtime to schedule a 10-30 ms STW pause inside — the dominant suspected source of gean's p99 tail above 1 s. ethlambda pays nothing equivalent (Rust ownership, stack-allocated buffers dropped at scope exit per crates/blockchain/src/aggregation.rs:238). This PR adds sync.Pool reuse for all four slice types, plus capacity hints in the pool New functions sized to typical working set (8 children, 32 raw sigs). Pattern mirrors the existing xmss/proof_pool.go — pool stores *[]T, typed get/put wrappers, put resets length to 0 preserving capacity. Refactor of aggregateFromSnapshot: - Wraps the per-data-root loop body in func(){}() so defer pool-Put fires per iteration rather than accumulating to function return - Replaces continue with return inside the anonymous func - Slice access via *bufPtr; append targets *bufPtr = append(*bufPtr, x) Secondary win: defer xmss.FreeSignature(parsed) inside the sig loop also now fires per iteration — previously those C-side handles accumulated across ALL data roots in a pass, only freeing at function return. Tighter memory lifetime. Also adds a capacity hint on allIDs (was 0; now rawIDs + covered count). go build ./... + go test ./node/... green. Closes #309.
5 tasks
Merged
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds
sync.Poolreuse + capacity hints for the four prep slices used per data root inaggregateFromSnapshot(childProofs,rawPubkeys,rawSigs,rawIDs). Reduces young-gen GC pressure during the long-running FFI window — the suspected dominant source of gean's p99 tail above 1 s. Closes #309.Why
Pre-PR: each iteration allocated 4 slices nil-initialised, grew them via
appenddoubling, then discarded. At 5-20 data roots per pass, that's 20-80 allocations per pass × every interval-2 tick. The 400 ms FFI window is a generous opening for the runtime to schedule a 10-30 ms STW GC pause inside.ethlambda pays nothing equivalent — Rust ownership, stack-allocated buffers dropped at scope exit (
ethlambda/crates/blockchain/src/aggregation.rs:238usesVec::with_capacity).What changed
New:
node/aggregation_pool.go(+35 LOC)Four
sync.Pools, one per prep slice type, with first-fit capacity hints (8 children, 32 raw sigs). Typedget*Buf/put*Bufwrappers identical in shape to the existingxmss/proof_pool.gopattern. Put resets length to 0 preserving capacity.Modified:
node/store_aggregate.go(+116 / -109 LOC, net +7)Refactor of
aggregateFromSnapshot's per-data-root loop:func(){}()sodeferpool-Put fires per iteration (not per function-return — the latter would defeat reuse).continuewithreturninside the anonymous func.*bufPtr; appends via*bufPtr = append(*bufPtr, x).allIDs(was 0, nowrawIDs + coveredsize).Secondary win
defer xmss.FreeSignature(parsed)inside the inner sig loop now fires per data root rather than accumulating across the whole pass to function return. Tighter C-side handle lifetime; previously all parsed sig handles from all data roots in a pass leaked untilaggregateFromSnapshotreturned.Behavior change
None. Same FFI calls in the same order, same store mutations (via
*AggregationMutationsfrom #307), same broadcast set. Only the lifetime of internal scratch slices and one C-handle defer batch changes — both shorter, both correct.Expected impact
Per the gean-vs-ethlambda analysis: 20-40 ms p50 recovery, more on p99 (GC pauses are tail-dominant). Real numbers come from #305's phase-attribution histograms post-merge — if
prepisn't where the gap lives, this PR's measured impact will be small and we'd refocus onpostorcommit.Spec compliance
Zero impact. No spec-mapped function or container modified.
Test plan
go build ./...cleango test -count=1 ./node/...greenlean_aggregation_prep_time_secondsp99 should drop measurably vs the post-perf(node): instrument interval-2 aggregation phases for attribution (#305) #306 baseline; rate of Go GC pauses (via Go runtime metrics) should declineLean-review
dead-code: N/Apremature-abstraction: 4 pools, 8 helpers — sized to current need; no speculative pool surfacedefensive-bloat: N/Aduplicates-existing-util: ✅ followed existingxmss/proof_pool.gopattern; no parallel pool helper inventedcomment-bloat: header onaggregation_pool.gois 8 lines explaining the why + the existing-pattern cross-reference; earns its keepover-validated-boundary: N/AStack
Originally stacked on PR #308 (snapshot refactor), which has now merged into
perf/instrument-aggregation-phases. Rebased trivially; this PR now diffs cleanly againstperf/instrument-aggregation-phaseshead with just the one Phase C commit.