[DESIGN]: Rebalancing policy and orchestration (when/which partitions move, hot-slot trigger, node drain and decommission) #148

@ELares

Description

Filed from the IronCache pre-implementation coverage audit (2026-06-13): no existing issue adequately owned this.

Why this is needed

#75 builds the atomic migration mechanism but explicitly scopes out 'rebalancing policy (which slots to move, when)'. #80 chooses the placement hash but not the controller that invokes migration. Nothing owns the orchestration layer, which covers four things: the trigger to rebalance on node add/remove; hot-partition detection feeding a migration decision (the hot-shard work in #32 isolates a hot partition per-core but never connects it to cross-node migration); concurrency and throttling of simultaneous migrations against the rebalance-time budget; and node drain/decommission sequencing. This is the named, deliberately deferred gap between the mechanism (#75) and the placement math (#80), and it is what makes scale-out usable.
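To make the scope concrete, here is a minimal sketch of what one planning pass of such an orchestration layer might look like. It is not a proposed design and not IronCache code: `Move`, `plan_moves`, and every parameter name are hypothetical, and the greedy policy (drain moves first, then hot slots, each sent to the least-loaded eligible node, capped by a concurrency limit) is an assumption chosen only for illustration. The actual migration would be carried out by the mechanism from #75.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """One planned slot migration (hypothetical type, for illustration only)."""
    slot: int
    src: str
    dst: str
    reason: str  # "drain" or "hot"


def plan_moves(owner, slot_load, nodes, draining, hot_threshold, max_concurrent):
    """Pick at most `max_concurrent` migrations for one planning pass.

    owner:      dict slot -> node currently owning it
    slot_load:  dict slot -> observed load (e.g. ops/sec); missing means 0
    nodes:      list of all known nodes, including newly added empty ones
    draining:   set of nodes being drained or decommissioned
    """
    # Projected per-node load, updated as moves are planned.
    projected = {n: 0.0 for n in nodes}
    for slot, node in owner.items():
        projected[node] = projected.get(node, 0.0) + slot_load.get(slot, 0.0)

    # Drain candidates come first: every slot on a draining node must leave.
    drain = [(s, "drain") for s in sorted(owner) if owner[s] in draining]
    # Hot candidates: slots above the threshold, hottest first.
    hot = [
        (s, "hot")
        for s in sorted(owner, key=lambda s: -slot_load.get(s, 0.0))
        if owner[s] not in draining and slot_load.get(s, 0.0) > hot_threshold
    ]

    moves = []
    for slot, reason in drain + hot:
        if len(moves) >= max_concurrent:
            break  # throttle: respect the concurrent-migration cap
        src = owner[slot]
        targets = [n for n in nodes if n not in draining and n != src]
        if not targets:
            continue
        # Least-loaded eligible node; the name breaks ties deterministically.
        dst = min(targets, key=lambda n: (projected[n], n))
        weight = slot_load.get(slot, 0.0)
        # Skip a hot move that would only shift the hotspot to another node.
        if reason == "hot" and projected[dst] + weight >= projected[src]:
            continue
        moves.append(Move(slot, src, dst, reason))
        projected[src] -= weight
        projected[dst] += weight
    return moves
```

A real controller would also need to decide when to run such a pass (node add/remove events, hot-slot signals) and how the concurrency cap relates to the rebalance-time budget. Those are exactly the open questions this issue is meant to own.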

Context

Relates to / partially overlaps #75. Part of the vision EPIC #1.

Metadata

Assignees

No one assigned

Labels

area:observability, area:replication, design (design specification / decision record to be vetted), wave:3 (readiness wave 3: clustering, AI advisor, tiering, advanced)
