Add Label-Based Group Replica Response Strategy#279

Open
yuchen-db wants to merge 3 commits into db_main from yuchen-db/new-group-replica

Conversation


@yuchen-db yuchen-db commented Jan 22, 2026

Add Label-Based Group Replica Response Strategy

Summary

This PR enhances the GROUP_REPLICA partial response strategy to support label-based group and quorum identification, enabling more flexible failure tolerance for replicated data setups like aligned_ketama hashring.

New Flags

  • --query.group-replica.group-label: External label name identifying the group (stores with same value hold replicated data)
  • --query.group-replica.quorum-label: External label name whose value specifies minimum healthy stores required per group
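
The PR text says the two flags must be set together. A minimal sketch of that validation might look like the following; the function name `validateGroupReplicaFlags` is hypothetical and not part of the PR's actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// validateGroupReplicaFlags enforces the documented constraint that
// group-label and quorum-label must be set together (or not at all).
// Illustrative only; not the PR's real validation code.
func validateGroupReplicaFlags(groupLabel, quorumLabel string) error {
	if (groupLabel == "") != (quorumLabel == "") {
		return errors.New("--query.group-replica.group-label and --query.group-replica.quorum-label must be set together")
	}
	return nil
}

func main() {
	fmt.Println(validateGroupReplicaFlags("receive_group", "quorum")) // both set: valid
	fmt.Println(validateGroupReplicaFlags("receive_group", ""))       // only one set: error
}
```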

How It Works

┌─────────────────────────────────────────────────────────────────────────┐
│                           Thanos Query                                   │
│                                                                          │
│  Flags:                                                                  │
│    --query.group-replica.group-label=receive_group                       │
│    --query.group-replica.quorum-label=quorum                             │
│    --query.partial-response.strategy=GROUP_REPLICA                       │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
            ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
            │  Receive-0   │ │  Receive-1   │ │  Receive-2   │
            │              │ │              │ │              │
            │ Labels:      │ │ Labels:      │ │ Labels:      │
            │ receive_group│ │ receive_group│ │ receive_group│
            │   ="group-A" │ │   ="group-A" │ │   ="group-A" │
            │ quorum="2"   │ │ quorum="2"   │ │ quorum="2"   │
            └──────────────┘ └──────────────┘ └──────────────┘
                    │               │               │
                    └───────────────┴───────────────┘
                                    │
                         Group "group-A" (quorum=2)
                         Needs 2 of 3 stores healthy

Behavior

| Scenario | Result |
|---|---|
| Group has >= quorum healthy stores | Query succeeds for that group |
| Group has < quorum healthy stores | Query aborts with error |
| Store missing labels or invalid quorum | Treated as "must-success" (any failure aborts) |
| Flags not configured | Falls back to legacy DNS-based strategy |
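
The rules above can be sketched as a small grouping-and-tally pass. This is a simplified model under assumed types (`store`, `groupsSatisfied` are illustrative names, not the PR's implementation):

```go
package main

import (
	"fmt"
	"strconv"
)

// store is a minimal stand-in for a store endpoint's external labels and
// health; the real code operates on Thanos store metadata.
type store struct {
	labels  map[string]string
	healthy bool
}

// groupsSatisfied applies the documented rules: stores sharing a group-label
// value form a group whose required quorum comes from the quorum-label.
// A store with missing labels or an invalid quorum (<1) is "must-success",
// so its failure aborts the query.
func groupsSatisfied(stores []store, groupLabel, quorumLabel string) bool {
	type tally struct{ healthy, quorum int }
	groups := map[string]*tally{}
	for _, s := range stores {
		g, okG := s.labels[groupLabel]
		q, err := strconv.Atoi(s.labels[quorumLabel])
		if !okG || err != nil || q < 1 {
			if !s.healthy {
				return false // must-success store failed
			}
			continue
		}
		t := groups[g]
		if t == nil {
			t = &tally{quorum: q}
			groups[g] = t
		}
		if s.healthy {
			t.healthy++
		}
	}
	for _, t := range groups {
		if t.healthy < t.quorum {
			return false // group below quorum
		}
	}
	return true
}

func main() {
	mk := func(h bool) store {
		return store{labels: map[string]string{"receive_group": "ordinal-0", "quorum": "2"}, healthy: h}
	}
	// 2 of 3 healthy, quorum=2: succeeds. 1 of 3 healthy: aborts.
	fmt.Println(groupsSatisfied([]store{mk(true), mk(false), mk(true)}, "receive_group", "quorum"))
	fmt.Println(groupsSatisfied([]store{mk(false), mk(false), mk(true)}, "receive_group", "quorum"))
}
```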

Example Configuration

Receive pods with external labels:

# receive-0 in AZ-1
- --label=receive_group="ordinal-0"
- --label=quorum="2"

# receive-1 in AZ-2 (replica of receive-0)
- --label=receive_group="ordinal-0"
- --label=quorum="2"

# receive-2 in AZ-3 (replica of receive-0)
- --label=receive_group="ordinal-0"
- --label=quorum="2"

Query configuration:

- --query.partial-response.strategy=GROUP_REPLICA
- --query.group-replica.group-label=receive_group
- --query.group-replica.quorum-label=quorum

Failure scenarios:

Group "ordinal-0" has 3 stores, quorum=2:

  ✓ receive-0 (AZ-1) - healthy
  ✗ receive-1 (AZ-2) - failed
  ✓ receive-2 (AZ-3) - healthy

  Result: 2 >= 2 (quorum met) → Query succeeds
Group "ordinal-0" has 3 stores, quorum=2:

  ✗ receive-0 (AZ-1) - failed
  ✗ receive-1 (AZ-2) - failed
  ✓ receive-2 (AZ-3) - healthy

  Result: 1 < 2 (quorum not met) → Query aborts

Label Stripping

Both group-label and quorum-label are automatically stripped from query results (similar to replica labels with deduplication).
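
A sketch of that stripping step, assuming a plain label map for illustration (the real implementation works on Thanos' label types, and `stripGroupLabels` is a hypothetical name):

```go
package main

import "fmt"

// stripGroupLabels removes the group and quorum labels from a result's
// label set, analogous to how replica labels are dropped during
// deduplication. Illustrative sketch only.
func stripGroupLabels(labels map[string]string, groupLabel, quorumLabel string) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		if k == groupLabel || k == quorumLabel {
			continue // drop the strategy's bookkeeping labels
		}
		out[k] = v
	}
	return out
}

func main() {
	in := map[string]string{"receive_group": "ordinal-0", "quorum": "2", "job": "receive"}
	fmt.Println(stripGroupLabels(in, "receive_group", "quorum"))
}
```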

Backward Compatibility

  • When flags are not set, the existing DNS-based GROUP_REPLICA behavior is preserved
  • No changes required for existing deployments
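
The fallback decision reduces to a presence check on the two flags; a sketch, with the function name and return strings being illustrative rather than the PR's actual code:

```go
package main

import "fmt"

// strategyFor models the documented fallback: label-based grouping only
// activates when both flags are set; otherwise the legacy DNS-based
// GROUP_REPLICA behavior applies.
func strategyFor(groupLabel, quorumLabel string) string {
	if groupLabel != "" && quorumLabel != "" {
		return "label-based group replica"
	}
	return "legacy DNS-based group replica"
}

func main() {
	fmt.Println(strategyFor("receive_group", "quorum"))
	fmt.Println(strategyFor("", ""))
}
```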

@yuchen-db yuchen-db requested review from jnyi and willh-db January 22, 2026 21:05
@yuchen-db yuchen-db changed the title add new group replica response strategy Add Label-Based Group Replica Response Strategy Jan 23, 2026
@yuchen-db yuchen-db force-pushed the yuchen-db/new-group-replica branch 3 times, most recently from 7e7553b to fa140fa Compare January 23, 2026 04:13
@yuchen-db yuchen-db force-pushed the yuchen-db/new-group-replica branch from fa140fa to 020cb44 Compare January 23, 2026 07:20
groupReplicaGroupLabel := cmd.Flag("query.group-replica.group-label", "External label name that identifies the group for group-replica partial response strategy. Stores with the same group label value hold replicated data. Must be set together with --query.group-replica.quorum-label.").
Default("").String()

groupReplicaQuorumLabel := cmd.Flag("query.group-replica.quorum-label", "External label name whose value specifies the minimum number of healthy stores required per group. Must be set together with --query.group-replica.group-label. Stores without these labels or with invalid quorum values (<1) are treated as must-success stores.").
	Default("").String()
@jnyi jnyi Jan 23, 2026


I didn't understand from the PR description what this quorum-label is for. If a querier connects to multiple store groups like below, which quorum number should we use? A single value for the entire querier doesn't seem to fit semantically:

  • pantheon-db: quorum == 2
  • pantheon-db-dp: quorum == 2
  • pantheon-store: quorum == 1
  • pantheon-long-range-store: quorum == 1

Comment on lines +26 to +39
```
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  Receive-0   │ │  Receive-1   │ │  Receive-2   │
│              │ │              │ │              │
│ Labels:      │ │ Labels:      │ │ Labels:      │
│ receive_group│ │ receive_group│ │ receive_group│
│   ="group-A" │ │   ="group-A" │ │   ="group-A" │
│ quorum="2"   │ │ quorum="2"   │ │ quorum="2"   │
└──────────────┘ └──────────────┘ └──────────────┘
        │               │               │
        └───────────────┴───────────────┘
                        │
             Group "group-A" (quorum=2)
             Needs 2 of 3 stores healthy
```

Could you explain a bit: does that mean those receivers need to attach external labels to every time series, all the time? That might implicitly mean a constant tax of network IO overhead as well as CPU overhead in db pods.


@jnyi jnyi left a comment


Not super sure this approach is optimal. I understand that with aligned_ketama we have consistent shards, but the overhead might be significant (each returned series needs to carry the external labels). Is it possible to use Store.Info to exchange this additional info instead?

@jnyi jnyi requested review from a team, abhijith-db and kusumdb January 23, 2026 21:26
@yuchen-db yuchen-db force-pushed the yuchen-db/new-group-replica branch from b6658c0 to a98483e Compare January 29, 2026 01:23
@yuchen-db yuchen-db force-pushed the yuchen-db/new-group-replica branch from a98483e to a8f58fc Compare January 29, 2026 02:20