perf(node): pool aggregation prep vectors to cut GC pressure (#309) by mananuf · Pull Request #310 · geanlabs/gean

mananuf · 2026-05-25T12:11:37Z

Summary

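Adds sync.Pool reuse and capacity hints for the four prep slices used per data root in aggregateFromSnapshot (childProofs, rawPubkeys, rawSigs, rawIDs). This reduces GC pressure from short-lived allocations during the long-running FFI window, the suspected dominant source of gean's p99 tail above 1 s. (Go's collector is not generational, so there is no separate young generation; the cost is simply more garbage per cycle.) Closes #309.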

Why

Pre-PR: each iteration allocated four nil-initialised slices, grew them via append doubling, then discarded them. At 5-20 data roots per pass, that's 20-80 slices per pass, each of which may reallocate several times as it grows, on every interval-2 tick. The 400 ms FFI window gives the runtime ample room to schedule a 10-30 ms STW GC pause inside it.
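To make the append-doubling cost concrete, here is a minimal sketch that counts how often a slice's backing array is reallocated while it is filled. `countGrowths` is a hypothetical helper written for this illustration, not code from the PR, and the exact growth sequence is a Go runtime implementation detail, so the sketch only compares a nil slice against a pre-sized one.

```go
package main

import "fmt"

// countGrowths appends n elements to buf and counts how many times the
// backing array was reallocated, observed as a change in capacity.
func countGrowths(buf []uint64, n int) int {
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(buf)
		buf = append(buf, uint64(i))
		if cap(buf) != before {
			growths++
		}
	}
	return growths
}

func main() {
	// A nil slice has to reallocate repeatedly on its way to 32 elements.
	fmt.Println("nil slice growths:", countGrowths(nil, 32))
	// A slice created with enough capacity never reallocates.
	fmt.Println("pre-sized growths:", countGrowths(make([]uint64, 0, 32), 32))
}
```

Each growth is a fresh heap allocation plus a copy, and the old backing array becomes garbage, which is the churn the capacity hints are meant to avoid.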

ethlambda pays nothing equivalent: Rust has no garbage collector, and its buffers are pre-sized and freed deterministically at scope exit (ethlambda/crates/blockchain/src/aggregation.rs:238 uses Vec::with_capacity).

What changed

New: `node/aggregation_pool.go` (+35 LOC)

Four sync.Pools, one per prep slice type, with initial capacity hints sized to the typical working set (8 children, 32 raw sigs). Each pool stores *[]T behind typed get*Buf/put*Buf wrappers, identical in shape to the existing xmss/proof_pool.go pattern. Put resets length to 0 while preserving capacity.
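A minimal sketch of the pool shape described above, for one slice type. The names `idBufPool`, `getIDBuf`, and `putIDBuf` are assumptions made for illustration; the actual helpers in node/aggregation_pool.go may be named and sized differently.

```go
package main

import (
	"fmt"
	"sync"
)

// idBufPool is a hypothetical pool of []uint64 scratch slices. It stores
// *[]uint64 rather than []uint64 so that Put does not allocate a new
// interface value for the slice header on every call.
var idBufPool = sync.Pool{
	New: func() any {
		buf := make([]uint64, 0, 32) // capacity hint, assumed working-set size
		return &buf
	},
}

// getIDBuf returns an empty scratch slice, either reused or freshly made.
func getIDBuf() *[]uint64 { return idBufPool.Get().(*[]uint64) }

// putIDBuf resets length to 0 (keeping capacity) and returns the slice
// to the pool. The caller must not use buf afterwards.
func putIDBuf(buf *[]uint64) {
	*buf = (*buf)[:0]
	idBufPool.Put(buf)
}

func main() {
	buf := getIDBuf()
	*buf = append(*buf, 1, 2, 3)
	fmt.Println(len(*buf), cap(*buf) >= 32)
	putIDBuf(buf)
}
```

sync.Pool gives no guarantee that a later Get returns the same buffer, since the runtime may drop pooled items during GC, so callers should treat reuse as an optimisation rather than a contract.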

Modified: `node/store_aggregate.go` (+116 / -109 LOC, net +7)

Refactor of aggregateFromSnapshot's per-data-root loop:

- Wraps the loop body in func(){}() so the deferred pool Put fires per iteration (not at function return, which would defeat reuse).
- Replaces continue with return inside the anonymous func.
- Accesses slices via *bufPtr; appends via *bufPtr = append(*bufPtr, x).
- Adds a capacity hint to allIDs (was 0, now rawIDs + covered size).
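The loop-body refactor can be sketched as follows. `processRoots` and its event log are hypothetical stand-ins for the real per-data-root work; the point is only the control flow: the deferred release runs at the end of each iteration, and an early `return` inside the closure plays the role of the old `continue`.

```go
package main

import "fmt"

// processRoots is a hypothetical stand-in for the per-data-root loop.
// It records the order of events so the defer timing is visible.
func processRoots(roots []int) []string {
	var events []string
	for _, root := range roots {
		func() {
			// Runs when this closure returns, i.e. once per iteration.
			// In the PR this is where the pooled buffers are put back.
			defer func() {
				events = append(events, fmt.Sprintf("release %d", root))
			}()

			if root%2 == 0 {
				events = append(events, fmt.Sprintf("skip %d", root))
				return // was `continue` before the body became a closure
			}
			events = append(events, fmt.Sprintf("process %d", root))
		}()
	}
	return events
}

func main() {
	for _, e := range processRoots([]int{1, 2, 3}) {
		fmt.Println(e)
	}
}
```

Because the closure is invoked immediately, the release for one root always happens before the next root starts, which is what lets the next iteration pick the same buffers back up from the pool.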

Secondary win

defer xmss.FreeSignature(parsed) inside the inner sig loop now fires per data root rather than accumulating across the whole pass until function return. This tightens C-side handle lifetime: previously every parsed sig handle from every data root in a pass stayed alive until aggregateFromSnapshot returned.
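The effect on handle lifetime can be illustrated with a small counter. The `tracker` type below is a hypothetical stand-in for C-side signature handles (its `open` and `close` methods take the place of parsing and xmss.FreeSignature); it records the peak number of handles alive at once under each defer placement.

```go
package main

import "fmt"

// tracker is a hypothetical stand-in for C-side handles: it counts how
// many are currently live and the highest count ever reached.
type tracker struct{ live, peak int }

func (tr *tracker) open() {
	tr.live++
	if tr.live > tr.peak {
		tr.peak = tr.live
	}
}

func (tr *tracker) close() { tr.live-- }

// deferAtFunctionScope mirrors the old shape: every defer queues up until
// the whole function returns.
func deferAtFunctionScope(tr *tracker, roots, sigsPerRoot int) {
	for r := 0; r < roots; r++ {
		for s := 0; s < sigsPerRoot; s++ {
			tr.open()
			defer tr.close()
		}
	}
}

// deferPerRoot mirrors the new shape: the closure returns after each data
// root, so that root's handles are released before the next one starts.
func deferPerRoot(tr *tracker, roots, sigsPerRoot int) {
	for r := 0; r < roots; r++ {
		func() {
			for s := 0; s < sigsPerRoot; s++ {
				tr.open()
				defer tr.close()
			}
		}()
	}
}

func main() {
	a, b := &tracker{}, &tracker{}
	deferAtFunctionScope(a, 5, 4)
	deferPerRoot(b, 5, 4)
	fmt.Println("peak live, function-scope defer:", a.peak) // 20
	fmt.Println("peak live, per-root defer:", b.peak)       // 4
}
```

With 5 roots of 4 signatures each, the function-scope version holds all 20 handles at once, while the per-root version never holds more than 4.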

Behavior change

None. Same FFI calls in the same order, same store mutations (via *AggregationMutations from #307), same broadcast set. Only the lifetime of internal scratch slices and one C-handle defer batch changes — both shorter, both correct.

Expected impact

Per the gean-vs-ethlambda analysis: 20-40 ms p50 recovery, more on p99 (GC pauses are tail-dominant). Real numbers come from #305's phase-attribution histograms post-merge — if prep isn't where the gap lives, this PR's measured impact will be small and we'd refocus on post or commit.

Spec compliance

Zero impact. No spec-mapped function or container modified.

Test plan

go build ./... clean
go test -count=1 ./node/... green
Post-merge: lean_aggregation_prep_time_seconds p99 should drop measurably vs the post-#306 baseline (perf(node): instrument interval-2 aggregation phases for attribution (#305)); the rate of Go GC pauses (via Go runtime metrics) should also decline.

Lean-review

dead-code: N/A
premature-abstraction: 4 pools, 8 helpers — sized to current need; no speculative pool surface
defensive-bloat: N/A
duplicates-existing-util: ✅ followed existing xmss/proof_pool.go pattern; no parallel pool helper invented
comment-bloat: header on aggregation_pool.go is 8 lines explaining the why + the existing-pattern cross-reference; earns its keep
over-validated-boundary: N/A

Stack

Originally stacked on PR #308 (snapshot refactor), which has now merged into perf/instrument-aggregation-phases. Rebased trivially; this PR now diffs cleanly against perf/instrument-aggregation-phases head with just the one Phase C commit.

…sure (#309)

Previously, aggregateFromSnapshot allocated four nil-initialised slices per data root and discarded them at iteration end:

    var childProofs []xmss.ChildProof // grew via append doubling
    var rawPubkeys []xmss.CPubKey
    var rawSigs []xmss.CSig
    var rawIDs []uint64

At typical 5-20 data roots per pass × 4 slices, that's 20-80 allocations per pass that the GC has to clean up. A 400 ms FFI call is a generous window for the runtime to schedule a 10-30 ms STW pause inside, which is the dominant suspected source of gean's p99 tail above 1 s. ethlambda pays nothing equivalent (Rust ownership, buffers dropped at scope exit per crates/blockchain/src/aggregation.rs:238).

This PR adds sync.Pool reuse for all four slice types, plus capacity hints in the pool New functions sized to the typical working set (8 children, 32 raw sigs). The pattern mirrors the existing xmss/proof_pool.go: the pool stores *[]T, with typed get/put wrappers, and put resets length to 0 while preserving capacity.

Refactor of aggregateFromSnapshot:

- Wraps the per-data-root loop body in func(){}() so defer pool-Put fires per iteration rather than accumulating to function return
- Replaces continue with return inside the anonymous func
- Slice access via *bufPtr; append targets *bufPtr = append(*bufPtr, x)

Secondary win: defer xmss.FreeSignature(parsed) inside the sig loop also now fires per iteration. Previously those C-side handles accumulated across ALL data roots in a pass, only freeing at function return. Tighter memory lifetime.

Also adds a capacity hint on allIDs (was 0; now rawIDs + covered count).

go build ./... + go test ./node/... green.

Closes #309.

mananuf mentioned this pull request May 25, 2026

feat(node): move interval-2 aggregation off the tick loop into a worker goroutine #311

Closed

5 tasks

mananuf merged commit 22cea31 into perf/instrument-aggregation-phases May 25, 2026

mananuf deleted the perf/pool-aggregation-prep-vectors branch May 25, 2026 12:13

mananuf mentioned this pull request May 25, 2026

feat(node): move interval-2 aggregation off the tick loop into a worker goroutine (#311) #312

Merged

5 tasks
