Skip to content

feat(spanner): add DCP error penalty#14614

Open
rahul2393 wants to merge 1 commit into
dcp-split/3-observabilityfrom
dcp-split/4-error-penalty
Open

feat(spanner): add DCP error penalty#14614
rahul2393 wants to merge 1 commit into
dcp-split/3-observabilityfrom
dcp-split/4-error-penalty

Conversation

@rahul2393
Copy link
Copy Markdown
Contributor

Split of #14604

Internal reference: go/go-dcp-design

@rahul2393 rahul2393 requested review from a team as code owners May 19, 2026 12:45
@product-auto-label product-auto-label Bot added the api: spanner Issues related to the Spanner API. label May 19, 2026
@rahul2393 rahul2393 requested a review from olavloite May 19, 2026 12:46
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements an error penalty mechanism for the DynamicChannelPool to temporarily steer traffic away from channels experiencing Unavailable or ResourceExhausted errors. It introduces new configuration parameters (DCPErrorPenaltyLoad, DCPErrorPenaltyDuration), tracks penalties via metrics, and updates the load balancing logic to use a weightedLoad that includes these penalties. Review feedback highlighted a potential integer overflow risk when calculating the weighted load and suggested stricter validation for the new configuration parameters to prevent negative values.

exp := e.penaltyUntil.Load()
if exp > 0 {
if time.Now().UnixNano() < exp {
l += e.parent.cfg.DCPErrorPenaltyLoad
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If DCPErrorPenaltyLoad is configured with a very large value, adding it to the current rpcLoad could overflow the int32 result. In a 'least busy' selection strategy, an overflow resulting in a negative value would make a failing channel appear to have the lowest load, causing a thundering herd effect to a potentially unhealthy channel. It is safer to use a 64-bit sum and cap the result at the maximum 32-bit integer value.

			if sum := int64(l) + int64(e.parent.cfg.DCPErrorPenaltyLoad); sum <= 2147483647 {
				l = int32(sum)
			} else {
				l = 2147483647
			}

Comment on lines +152 to +157
if cfg.DCPErrorPenaltyLoad == 0 {
cfg.DCPErrorPenaltyLoad = def.DCPErrorPenaltyLoad
}
if cfg.DCPErrorPenaltyDuration == 0 {
cfg.DCPErrorPenaltyDuration = def.DCPErrorPenaltyDuration
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current normalization logic only checks for zero values. If a user provides a negative value for DCPErrorPenaltyLoad or DCPErrorPenaltyDuration, it could lead to unexpected behavior (e.g., a negative load making a failing channel appear more attractive). Consider validating that these values are positive, or at least non-negative, to ensure the penalty mechanism works as intended.

Suggested change
if cfg.DCPErrorPenaltyLoad == 0 {
cfg.DCPErrorPenaltyLoad = def.DCPErrorPenaltyLoad
}
if cfg.DCPErrorPenaltyDuration == 0 {
cfg.DCPErrorPenaltyDuration = def.DCPErrorPenaltyDuration
}
if cfg.DCPErrorPenaltyLoad <= 0 {
cfg.DCPErrorPenaltyLoad = def.DCPErrorPenaltyLoad
}
if cfg.DCPErrorPenaltyDuration <= 0 {
cfg.DCPErrorPenaltyDuration = def.DCPErrorPenaltyDuration
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: spanner Issues related to the Spanner API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant