Skip to content

feat: Retry alignment#343

Merged
sichanyoo merged 12 commits into
decaffrom
feat/retry-alignment
Jun 12, 2026
Merged

feat: Retry alignment#343
sichanyoo merged 12 commits into
decaffrom
feat/retry-alignment

Conversation

@sichanyoo

Copy link
Copy Markdown
Contributor

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

* `standard` - A standardized set of retry rules across the AWS SDKs.
This includes support for retry quotas, which limit the number of
unsuccessful retries a client can make. This is the default
value if no retry mode is provided.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spacing

@sichanyoo sichanyoo Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed spacing

DOCS

# @api private undocumented
option(:retry_strategy)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we're using this instead of retry_mode?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It felt good to separate out retry_mode string config from the resolved strategy object. Storing resolved strategy in retry_mode felt a little awkward. But I'm open to discussion if you feel strongly / there's some rubyist convention to not do what I did.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I like the idea of using both terminology since it is hard to different what makes these two verbiage different (to me at least! Coming from a non-SRA/smithy perspective). But I understand the motivation. Looking at Smithy/AWS SDK retries documentation seems to point me to similar verbiage so I think this is the most reasonable one:

That being said... I'm unsure whether we want to have this as a private config for customers. I'm thinking of the following flow:

  • customer passes no default for retry_mode, we create Retry::Standard instance using default values and set it as a value for retry_strategy
  • customer passes in :standard but sets the max attempts to 10, we create Retry::Standard instance using the passed-in values and set it as a value for retry_strategy
  • customer passes in their OWN custom retry instance to :retry_strategy and we will respect this customization

Overall, if a retry_strategy has an instance, we have no need to check for other retry options.

We can have this discussion offline for better alignment.

@sichanyoo sichanyoo Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think taking custom retry instance via :retry_strategy and adding custom retry instance resolution to after_initialize makes sense. I can assert that the passed type has all the methods needed by retry flow before assigning it.

# @api private
class RetryErrors < Plugin
option(
:retry_strategy,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Juli might have more context, but should the class option be removed? I think it allowed customers to implement their own custom retry strategies, but not sure if we still want to support that or not.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SEP doesn't require custom retry strategy configuration & afaik specific configurable options exposed in V3's retry plugin also only apply to legacy. I definitely might be missing something but it seems V3 doesn't support fully customizable strategy either.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In V3, a customer could pass their own custom Proc to influence how backoffs work but that is under the :legacy umbrella. We could have a way to accommodate this legacy support via migration gem.

While SEP does not mention custom retries, I did see some references in Smithy docs: https://smithy.io/2.0/guides/client-guidance/retries.html

For customers who wants to NOT use migration gem BUT want custom retries to be a thing. They could opt to implement their own via duck typing our retries interface (aka below). Other ways are replacing this plugin with their own. Or even utilizing custom welds.

Note: I added more context in a separate comment below on what we need to do to resolve this.

handlers.add(StubHandler, step: :send)
end

def after_initialize(client)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this removed? In V3 retries are disabled when stub_responses is true since customers need to mock the response behavior.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having this logic be here, I moved it off to add_handlers in the retry error plugin. It's sort of annoying because we use this same plugin in AWS side also, and this doesn't remove AWS-specific retry plugin. Actually, alternative & better way might be having a AWS-specific stub_responses.rb plugin that replaces this, so we can still have all the plugins to remove in one place. Lmk what you think.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think keeping all the stub_responses-related logic contained here makes the overall sense to find all its configuration but I'm open to ways. At the end, We should probably lean one way however.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with keeping all stub_responses related logic in a same place, I think what I'll do is create a AWS specific plugin and replace generic one with AWS specific one via default weld.

)
else
config.retry_strategy
raise ArgumentError, 'Must provide either standard` or `adaptive` for retry_mode'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a backtick at the beginning: standard`

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed typo

@jterapin jterapin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, we probably should some have broader convos about the retry_stategy and custom impl.

DOCS

# @api private undocumented
option(:retry_strategy)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I like the idea of using both terminology since it is hard to different what makes these two verbiage different (to me at least! Coming from a non-SRA/smithy perspective). But I understand the motivation. Looking at Smithy/AWS SDK retries documentation seems to point me to similar verbiage so I think this is the most reasonable one:

That being said... I'm unsure whether we want to have this as a private config for customers. I'm thinking of the following flow:

  • customer passes no default for retry_mode, we create Retry::Standard instance using default values and set it as a value for retry_strategy
  • customer passes in :standard but sets the max attempts to 10, we create Retry::Standard instance using the passed-in values and set it as a value for retry_strategy
  • customer passes in their OWN custom retry instance to :retry_strategy and we will respect this customization

Overall, if a retry_strategy has an instance, we have no need to check for other retry options.

We can have this discussion offline for better alignment.

# @api private
class RetryErrors < Plugin
option(
:retry_strategy,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In V3, a customer could pass their own custom Proc to influence how backoffs work but that is under the :legacy umbrella. We could have a way to accommodate this legacy support via migration gem.

While SEP does not mention custom retries, I did see some references in Smithy docs: https://smithy.io/2.0/guides/client-guidance/retries.html

For customers who wants to NOT use migration gem BUT want custom retries to be a thing. They could opt to implement their own via duck typing our retries interface (aka below). Other ways are replacing this plugin with their own. Or even utilizing custom welds.

Note: I added more context in a separate comment below on what we need to do to resolve this.

Comment thread gems/smithy-client/lib/smithy-client/plugins/retry_errors.rb
- `record_success(retry_token)`
Specifies which retry algorithm to use. Values are:

* `standard` - A standardized set of retry rules across the AWS SDKs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bit: Maybe reword as "Smithy-based" SDKs? 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, changed as suggested.

default: 'standard',
doc_default: "'standard'",
doc_type: 'String, Class',
doc_type: String,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of string, what about Ruby symbols? :standard, :adaptive for examples.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's cleaner for sure. Changed as suggested.

handlers.add(StubHandler, step: :send)
end

def after_initialize(client)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think keeping all the stub_responses-related logic contained here makes the overall sense to find all its configuration but I'm open to ways. At the end, We should probably lean one way however.

Comment thread gems/smithy-client/lib/smithy-client/retry/exponential_backoff.rb
Comment thread gems/smithy-client/lib/smithy-client/retry/token.rb
# @return [Boolean]
attr_reader :wait_to_fill

def request_bookkeeping(error_info)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method feels misleading since it is only handled at an error case. We can either rename this or consider handling the success case as well. I would look at this in the perspective if a custom implementor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Refactored so that the function gets called on both success and error paths.

@@ -5,23 +5,33 @@ module Client
module Retry
# Default exponential backoff retry strategy for retrying requests.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we intend to use the wording, retry_strategy as a holder for the Retry instance, I think we should update this documentation. Reads as if it's an instance intended for retry_strategy

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Removed the strategy from the doc string.

@richardwang1124 richardwang1124 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall. I'm okay with changing the option to a symbol, just wasn't sure if the extra conversion is necessary. Most options are also strings, I think only a few things like logging uses a symbol.

doc_type: 'String, Class',
:retry_mode,
default: :standard,
doc_type: Symbol,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reasoning for turning this into a symbol? This config options for this when read from the environment variables or shared config will be strings and will need to be converted into symbols. Is this extra conversion necessary?

@jterapin jterapin Jun 12, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was associated with my feedback. I personally prefer symbols when it comes to configurations options that are pre-determined and unique. Due to the following:

  • Idiomatic Ruby, i would say by most ruby standards
  • Symbols are immutable and cheaper to compare (identity vs character-by-character)
  • Prevents subtle bugs like "standard " != "standard" (trailing whitespace in env vars)

I think conversation if needed, will happen once at client init. We can technically accept both gracefully which we have done this in V3 (for other configs? i think?)

But this is really a style choice so I don't have strong feeling on this so I'm good with whatever we decide on.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in person, I'll revert this option back to String to conform to preexisting pattern, but create a followup investigation task for looking into config rework which will include moving to using symbols instead of string literals where possible.


# TODO: Revisit after trait is finalized.
def long_polling_operation?(context)
context.operation.traits.key?('smithy.api#longPoll')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how this is gonna look, but at least according to the SEP this trait is aws.api#longPoll.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I should add a revisit task for this and ClockSkew on our board if there's isn't one already.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardwang1124 At least in Smithy website, it's under smithy.api. I guess we'll see where it gets namespaced to when it's finalized: https://smithy.io/2.0/trait-index.html.

@jterapin jterapin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

The retry strategy used by the client. If not provided, a default strategy is built
based on `:retry_mode` — either `Standard` or `Adaptive`.

A custom strategy must respond to:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

doc_type: 'String, Class',
:retry_mode,
default: :standard,
doc_type: Symbol,

@jterapin jterapin Jun 12, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was associated with my feedback. I personally prefer symbols when it comes to configurations options that are pre-determined and unique. Due to the following:

  • Idiomatic Ruby, i would say by most ruby standards
  • Symbols are immutable and cheaper to compare (identity vs character-by-character)
  • Prevents subtle bugs like "standard " != "standard" (trailing whitespace in env vars)

I think conversation if needed, will happen once at client init. We can technically accept both gracefully which we have done this in V3 (for other configs? i think?)

But this is really a style choice so I don't have strong feeling on this so I'm good with whatever we decide on.


# TODO: Revisit after trait is finalized.
def long_polling_operation?(context)
context.operation.traits.key?('smithy.api#longPoll')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I should add a revisit task for this and ClockSkew on our board if there's isn't one already.

@sichanyoo sichanyoo requested a review from jterapin June 12, 2026 22:00
@sichanyoo sichanyoo merged commit 98be23c into decaf Jun 12, 2026
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants