From 62f50288875473be7b0d3f38fe5afe58877d46e5 Mon Sep 17 00:00:00 2001 From: Alexander Polcyn Date: Mon, 26 Jan 2026 23:11:09 +0000 Subject: [PATCH 1/5] A113: pick_first: Weighted Random Shuffling --- A113-pick-first-weighted-shuffling.md | 80 +++++++++++++++++++++++++++ 1 file changed, 80 insertions(+) create mode 100644 A113-pick-first-weighted-shuffling.md diff --git a/A113-pick-first-weighted-shuffling.md b/A113-pick-first-weighted-shuffling.md new file mode 100644 index 000000000..1aeb6049d --- /dev/null +++ b/A113-pick-first-weighted-shuffling.md @@ -0,0 +1,80 @@ +A113: pick_first: Weighted Random Shuffling +---- +* Author(s): Alex Polcyn (@apolcyn) +* Approver: Mark Roth (@markdroth), Eric Anderson (@ejona86), Doug Fawley (@dfawley), Easwar Swaminathan (@easwars) +* Status: Draft +* Implemented in: +* Last updated: Jan 26, 2026 +* Discussion at: (filled after thread exists) + +## Abstract + +Support weighted random shuffling in the pick first LB policy. + +## Background + +The pick first LB policy currently supports random shuffling. A primary intention of the feature +is for load balancing, however it does not take (possibly present) locality or endpoint weights +into account. Naturally this can lead to skewed load distribution and hotspots, when the load +balancing control plane delivers varied weights and expects them to be followed. + + +### Related Proposals: +* [A62](https://github.com/grpc/proposal/blob/master/A62-pick-first.md): pick_first: sticky TRANSIENT_FAILURE and address order randomization +* [A42](https://github.com/grpc/proposal/blob/master/A42-xds-ring-hash-lb-policy.md) xDS Ring Hash LB Policy + +## Proposal + +### Changes within Pick First + +Modify behavior of pick_first when the `shuffle_address_list` option is set, and +perform a weighted random sort *based on per-endpoint weights*: +* Use the [Weighted Random Sampling](https://utopia.duth.gr/~pefraimi/research/data/2007EncOfAlg.pdf) algorithm +proposed by Efraimidis, Spirakis. 
+* Set the weight of each endpoint to `u ^ (1 / weight)`, where `u` is a uniform random number in `(0, 1)` and weight +is the weight of the endpoint (as present in a weight attribute). Default to 1 if no weight attribute is present. + +### CDS LB Policy changes: Computing Endpoint Weights + +In XDS, we have a notion of both locality and endpoint weights. The expectation of the load balancing +control plane is to *first* pick locality and *second* pick endpoint. The total probability distribution +reflected by per-endpoint weights must reflect this. As such, we need to normalize locality weights within +each priority and endpoint weights within locality; the final weight provided to `pick_first` should be a +product of the two normalized weights (i.e. a logical AND of the two selection events). + +The CDS LB policy currently calculates per-endpoint weight attributes. It will continue to do so however +we need to fix the mechanics: an endpoint's final weight should be a product of its *normalized* locality +weight and *normalized* endpoint weight, rather than their product outright. Note: as a side effect this +will fix per-endpoint weights in Ring Hash LB, which +[currently](https://github.com/grpc/proposal/blob/master/A42-xds-ring-hash-lb-policy.md) multiply +*raw* locality and endpoint weights. + +We can continue to represent weights as integers if we represent their normalized values in +fixed point Q31 format. Math as follows (citation due for @ejona): + +``` +// To normalize: +uint32_t ONE = 1 << 31; +uint32_t weight = (uint64_t) weight * ONE / weight_sum; + +// To multiply the weights for an endpoint: +weight = ((uint64_t) locality_weight * weight) >> 31; +if (weight == 0) weight = 1; +``` + +### Temporary environment variable protection + +CDS LB policy and Pick First LB policy behavior changes will be guarded by `GRPC_EXPERIMENTAL_PF_WEIGHTED_SHUFFLING`. 
+ +## Rationale + +* CDS LB policy changes are needed to generate correct weight distributions, not only for Pick First but + also for Ring Hash +* Using fixed point Q31 format has predictable bounds on precision, and allows us to continue representing + weights as integers. Note our math assumes the sum of weights within a grouping does not exceed max uint32, + which is mandated in the XDS protocol. + +## Implementation + +TBD + From 88e4950f6fa0c7516867d92dbc6391ad94cc18a7 Mon Sep 17 00:00:00 2001 From: Alex Polcyn Date: Tue, 27 Jan 2026 07:09:45 +0000 Subject: [PATCH 2/5] update status --- A113-pick-first-weighted-shuffling.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/A113-pick-first-weighted-shuffling.md b/A113-pick-first-weighted-shuffling.md index 1aeb6049d..7ff68849f 100644 --- a/A113-pick-first-weighted-shuffling.md +++ b/A113-pick-first-weighted-shuffling.md @@ -2,10 +2,10 @@ A113: pick_first: Weighted Random Shuffling ---- * Author(s): Alex Polcyn (@apolcyn) * Approver: Mark Roth (@markdroth), Eric Anderson (@ejona86), Doug Fawley (@dfawley), Easwar Swaminathan (@easwars) -* Status: Draft +* Status: In Review * Implemented in: * Last updated: Jan 26, 2026 -* Discussion at: (filled after thread exists) +* Discussion at: https://groups.google.com/g/grpc-io/c/iCsweGDmUU4 ## Abstract From ac00a0a1e11f5a43d73fa19dce7405f8014af9a0 Mon Sep 17 00:00:00 2001 From: Alex Polcyn Date: Tue, 27 Jan 2026 19:58:22 +0000 Subject: [PATCH 3/5] respond comments --- A113-pick-first-weighted-shuffling.md | 41 ++++++++++++++++++++------- 1 file changed, 31 insertions(+), 10 deletions(-) diff --git a/A113-pick-first-weighted-shuffling.md b/A113-pick-first-weighted-shuffling.md index 7ff68849f..50a182b15 100644 --- a/A113-pick-first-weighted-shuffling.md +++ b/A113-pick-first-weighted-shuffling.md @@ -28,11 +28,22 @@ balancing control plane delivers varied weights and expects them to be followed. 
### Changes within Pick First Modify behavior of pick_first when the `shuffle_address_list` option is set, and -perform a weighted random sort *based on per-endpoint weights*: -* Use the [Weighted Random Sampling](https://utopia.duth.gr/~pefraimi/research/data/2007EncOfAlg.pdf) algorithm -proposed by Efraimidis, Spirakis. -* Set the weight of each endpoint to `u ^ (1 / weight)`, where `u` is a uniform random number in `(0, 1)` and weight -is the weight of the endpoint (as present in a weight attribute). Default to 1 if no weight attribute is present. +perform a weighted random sort *based on per-endpoint weights*. To do this, we will +use the [Weighted Random Sampling](https://utopia.duth.gr/~pefraimi/research/data/2007EncOfAlg.pdf) algorithm +proposed by Efraimidis, Spirakis: + +1) Assign a key to each endpoint, `u ^ (1 / weight)`, where `u` is a uniform random number in `(0, 1)` and weight +is the weight of the endpoint (as present in a weight attribute). Default `weight` to 1 if no weight attribute is +present. + +2) Sort endpoints by key in *descending* order. + +Note: the paper suggests `u` be in `(0, 1)` *exclusive*. Random numbers *on* zero or one effectively +drop their weight. Zero will technically not transform to the exponential distribution that we are trying +to create. However, load balancing skew introduced by such edge cases is unlikely to be noticeable, and so +implementations are free to include these bounds so long as it does not cause other problems +(e.g. crashes). + ### CDS LB Policy changes: Computing Endpoint Weights @@ -44,13 +55,14 @@ product of the two normalized weights (i.e. a logical AND of the two selection e The CDS LB policy currently calculates per-endpoint weight attributes. It will continue to do so however we need to fix the mechanics: an endpoint's final weight should be a product of its *normalized* locality -weight and *normalized* endpoint weight, rather than their product outright. 
Note: as a side effect this -will fix per-endpoint weights in Ring Hash LB, which -[currently](https://github.com/grpc/proposal/blob/master/A42-xds-ring-hash-lb-policy.md) multiply -*raw* locality and endpoint weights. +weight and *normalized* endpoint weight, rather than their product outright. + +Note: as a side effect this will fix per-endpoint weights in Ring Hash LB, which +[currently](https://github.com/grpc/proposal/blob/master/A42-xds-ring-hash-lb-policy.md#change-child-policy-config-generation-in-xds_cluster_resolver-policy) are a product of the initial *raw* locality and endpoint weights. +This "fix" will not require any changes within Ring Hash LB itself. We can continue to represent weights as integers if we represent their normalized values in -fixed point Q31 format. Math as follows (citation due for @ejona): +fixed point Q1.31 format. Math as follows (citation due for @ejona): ``` // To normalize: @@ -62,10 +74,19 @@ weight = ((uint64_t) locality_weight * weight) >> 31; if (weight == 0) weight = 1; ``` +Note: currently we round down to zero (and then up if we hit zero). +We *could* use more accurate rounding schemes. However, rounding down +is simple and should provide enough precision for load balancing +purposes. For example, we only round down to zero if the product of +two normalized weight probabilities is less than `2 ^ -31`, this kind +of error is unlikely to cause noticeable skew in load balancing. + ### Temporary environment variable protection CDS LB policy and Pick First LB policy behavior changes will be guarded by `GRPC_EXPERIMENTAL_PF_WEIGHTED_SHUFFLING`. +Barring unexpected issues, this should be enabled by default. 
+ ## Rationale * CDS LB policy changes are needed to generate correct weight distributions, not only for Pick First but From b3ae299217be3bdf26b5b61dcf2b54fe08fd4640 Mon Sep 17 00:00:00 2001 From: Alex Polcyn Date: Wed, 28 Jan 2026 06:37:43 +0000 Subject: [PATCH 4/5] comments --- A113-pick-first-weighted-shuffling.md | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/A113-pick-first-weighted-shuffling.md b/A113-pick-first-weighted-shuffling.md index 50a182b15..455ba55d9 100644 --- a/A113-pick-first-weighted-shuffling.md +++ b/A113-pick-first-weighted-shuffling.md @@ -39,7 +39,7 @@ present. 2) Sort endpoints by key in *descending* order. Note: the paper suggests `u` be in `(0, 1)` *exclusive*. Random numbers *on* zero or one effectively -drop their weight. Zero will technically not transform to the exponential distribution that we are trying +drop their weight. Also, technically zero will not transform to the exponential distribution that we are trying to create. However, load balancing skew introduced by such edge cases is unlikely to be noticeable, and so implementations are free to include these bounds so long as it does not cause other problems (e.g. crashes). @@ -62,7 +62,7 @@ Note: as a side effect this will fix per-endpoint weights in Ring Hash LB, which This "fix" will not require any changes within Ring Hash LB itself. We can continue to represent weights as integers if we represent their normalized values in -fixed point Q1.31 format. Math as follows (citation due for @ejona): +fixed point UQ1.31 format. Math as follows (citation due for @ejona): ``` // To normalize: @@ -85,15 +85,21 @@ of error is unlikely to cause noticeable skew in load balancing. CDS LB policy and Pick First LB policy behavior changes will be guarded by `GRPC_EXPERIMENTAL_PF_WEIGHTED_SHUFFLING`. -Barring unexpected issues, this should be enabled by default. +This should be enabled by default, after testing. 
## Rationale -* CDS LB policy changes are needed to generate correct weight distributions, not only for Pick First but - also for Ring Hash -* Using fixed point Q31 format has predictable bounds on precision, and allows us to continue representing - weights as integers. Note our math assumes the sum of weights within a grouping does not exceed max uint32, - which is mandated in the XDS protocol. +CDS LB policy changes are needed to generate correct weight distributions, not only for Pick First but +also for Ring Hash. + +Reasons for UQ1.31? + +- Predictable and acceptable bounds on precision. +- Allows us to continue representing weights as integers internally. +- Avoids risk of overflow bugs by preserving the (XDS) property that the sum of all weights within + a "grouping" does not exceed max uint32. For example note how if we used UQ32, *after* + normalization and multiplication a subsequent summation of endpoint weights in a locality may + result in uint32 overflow due to contributions of rounding errors. ## Implementation From d7e4a893c00de4366daa2584f3ee6ac2f75c6621 Mon Sep 17 00:00:00 2001 From: Alex Polcyn Date: Wed, 28 Jan 2026 06:39:03 +0000 Subject: [PATCH 5/5] correction --- A113-pick-first-weighted-shuffling.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/A113-pick-first-weighted-shuffling.md b/A113-pick-first-weighted-shuffling.md index 455ba55d9..f820e919e 100644 --- a/A113-pick-first-weighted-shuffling.md +++ b/A113-pick-first-weighted-shuffling.md @@ -92,7 +92,7 @@ This should be enabled by default, after testing. CDS LB policy changes are needed to generate correct weight distributions, not only for Pick First but also for Ring Hash. -Reasons for UQ1.31? +Reasons for UQ1.31 fixed point integers: - Predictable and acceptable bounds on precision. - Allows us to continue representing weights as integers internally.
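
Review note on the UQ1.31 math from the "CDS LB Policy changes" section: the snippet in the proposal reuses `weight` on both sides of its own declaration, so it reads as pseudocode. A minimal compilable sketch of the same two steps might look like the following. The names `UQ1_31_ONE`, `normalize_weight`, and `combine_weights` are hypothetical and do not come from any gRPC implementation; the sketch assumes `weight_sum` is non-zero and that each weight is at most its group's sum, so every normalized value is at most 1.0.

```c
#include <assert.h>
#include <stdint.h>

/* 1.0 in UQ1.31 fixed point (hypothetical name, not from the proposal). */
#define UQ1_31_ONE ((uint32_t)1 << 31)

/* Normalize a raw weight against the sum of weights in its grouping.
 * Assumes weight <= weight_sum, so the result is at most UQ1_31_ONE.
 * The 64-bit intermediate cannot overflow: (2^32 - 1) * 2^31 < 2^64. */
static uint32_t normalize_weight(uint32_t weight, uint32_t weight_sum) {
  assert(weight_sum != 0);
  return (uint32_t)(((uint64_t)weight * UQ1_31_ONE) / weight_sum);
}

/* Multiply a normalized locality weight by a normalized endpoint weight.
 * Both inputs are UQ1.31, so the 64-bit product carries 62 fractional bits;
 * shifting right by 31 returns to UQ1.31, rounding down. A result of zero
 * is bumped up to 1, as in the proposal, so no endpoint loses all weight. */
static uint32_t combine_weights(uint32_t locality_weight,
                                uint32_t endpoint_weight) {
  uint32_t product =
      (uint32_t)(((uint64_t)locality_weight * endpoint_weight) >> 31);
  return product == 0 ? 1 : product;
}
```

As a worked case, a locality holding weight 1 out of a priority total of 4 normalizes to `2^29`, and an endpoint holding weight 1 out of a locality total of 2 normalizes to `2^30`. Their combined weight is `(2^29 * 2^30) >> 31 = 2^28`, which is 0.125 in UQ1.31, matching the expected probability of 1/4 * 1/2.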