feat(spanner): add DCP error penalty #14614
Code Review
This pull request implements an error penalty mechanism for the DynamicChannelPool to temporarily steer traffic away from channels experiencing Unavailable or ResourceExhausted errors. It introduces new configuration parameters (DCPErrorPenaltyLoad, DCPErrorPenaltyDuration), tracks penalties via metrics, and updates the load balancing logic to use a weightedLoad that includes these penalties. Review feedback highlighted a potential integer overflow risk when calculating the weighted load and suggested stricter validation for the new configuration parameters to prevent negative values.
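To make the mechanism in this summary concrete, here is a minimal sketch of a time-bounded penalty feeding a least-loaded pick. It is not the PR's code: the `entry` type, `penalize`, `weightedLoad`, and `pickLeastLoaded` are hypothetical names, and only `penaltyUntil`, `rpcLoad`, and the idea of a weighted load come from the review. The sketch deliberately omits any overflow guard and assumes small penalty values.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// entry is a hypothetical stand-in for one pooled channel.
type entry struct {
	name         string
	rpcLoad      atomic.Int32 // in-flight RPCs on this channel
	penaltyUntil atomic.Int64 // UnixNano deadline; 0 means no penalty recorded
}

// penalize records that the channel should be penalized until nowNanos+d.
func (e *entry) penalize(nowNanos int64, d time.Duration) {
	e.penaltyUntil.Store(nowNanos + int64(d))
}

// weightedLoad returns the in-flight load, plus penaltyLoad while the
// penalty window is still open. No overflow guard: small values assumed.
func (e *entry) weightedLoad(nowNanos int64, penaltyLoad int32) int32 {
	l := e.rpcLoad.Load()
	if exp := e.penaltyUntil.Load(); exp > 0 && nowNanos < exp {
		l += penaltyLoad
	}
	return l
}

// pickLeastLoaded returns the entry with the smallest weighted load.
// It assumes entries is non-empty.
func pickLeastLoaded(entries []*entry, nowNanos int64, penaltyLoad int32) *entry {
	best := entries[0]
	for _, e := range entries[1:] {
		if e.weightedLoad(nowNanos, penaltyLoad) < best.weightedLoad(nowNanos, penaltyLoad) {
			best = e
		}
	}
	return best
}

func main() {
	a, b := &entry{name: "a"}, &entry{name: "b"}
	a.rpcLoad.Store(1)
	b.rpcLoad.Store(5)

	now := time.Now().UnixNano()
	a.penalize(now, 10*time.Second) // e.g. after an Unavailable error on a

	pool := []*entry{a, b}
	fmt.Println(pickLeastLoaded(pool, now, 10).name)                        // b
	fmt.Println(pickLeastLoaded(pool, now+int64(11*time.Second), 10).name) // a
}
```

While the penalty is active, channel `a` reports a load of 11 and loses to `b`; once the window passes, its raw load of 1 wins again without any explicit reset.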
exp := e.penaltyUntil.Load()
if exp > 0 {
	if time.Now().UnixNano() < exp {
		l += e.parent.cfg.DCPErrorPenaltyLoad
If DCPErrorPenaltyLoad is configured with a very large value, adding it to the current rpcLoad could overflow int32 and wrap to a negative value. Under a 'least busy' selection strategy, a negative load would make a failing channel appear to be the least loaded, directing a thundering herd of requests at a potentially unhealthy channel. It is safer to compute the sum in 64 bits and cap the result at the maximum 32-bit integer value.
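The wraparound described in this comment can be reproduced in isolation. The sketch below assumes both the load and the penalty are `int32` (the actual type of the config field is not shown in the review), and `addWrapping` and `addSaturating` are hypothetical helper names. The lower clamp is an addition for symmetry and goes beyond what the review asks for.

```go
package main

import (
	"fmt"
	"math"
)

// addWrapping mirrors a plain `l += penalty` on int32.
// Go does not panic on signed overflow; the result silently wraps.
func addWrapping(load, penalty int32) int32 {
	return load + penalty
}

// addSaturating widens both operands to int64, adds them, and clamps
// the result to the int32 range before narrowing again.
func addSaturating(load, penalty int32) int32 {
	sum := int64(load) + int64(penalty)
	if sum > math.MaxInt32 {
		return math.MaxInt32
	}
	if sum < math.MinInt32 {
		return math.MinInt32
	}
	return int32(sum)
}

func main() {
	load, penalty := int32(100), int32(math.MaxInt32)
	fmt.Println(addWrapping(load, penalty))   // -2147483549
	fmt.Println(addSaturating(load, penalty)) // 2147483647
}
```

With the wrapping version, a penalized channel carrying 100 in-flight RPCs would sort ahead of an idle one; the saturating version keeps it at the back of the ordering.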
if sum := int64(l) + int64(e.parent.cfg.DCPErrorPenaltyLoad); sum <= 2147483647 {
	l = int32(sum)
} else {
	l = 2147483647
}

if cfg.DCPErrorPenaltyLoad == 0 {
	cfg.DCPErrorPenaltyLoad = def.DCPErrorPenaltyLoad
}
if cfg.DCPErrorPenaltyDuration == 0 {
	cfg.DCPErrorPenaltyDuration = def.DCPErrorPenaltyDuration
}
The current normalization logic only checks for zero values. If a user provides a negative value for DCPErrorPenaltyLoad or DCPErrorPenaltyDuration, it could lead to unexpected behavior (e.g., a negative load making a failing channel appear more attractive). Consider validating that these values are positive, or at least non-negative, to ensure the penalty mechanism works as intended.
- if cfg.DCPErrorPenaltyLoad == 0 {
- 	cfg.DCPErrorPenaltyLoad = def.DCPErrorPenaltyLoad
- }
- if cfg.DCPErrorPenaltyDuration == 0 {
- 	cfg.DCPErrorPenaltyDuration = def.DCPErrorPenaltyDuration
- }
+ if cfg.DCPErrorPenaltyLoad <= 0 {
+ 	cfg.DCPErrorPenaltyLoad = def.DCPErrorPenaltyLoad
+ }
+ if cfg.DCPErrorPenaltyDuration <= 0 {
+ 	cfg.DCPErrorPenaltyDuration = def.DCPErrorPenaltyDuration
+ }
Split of #14604
Internal reference: go/go-dcp-design