ELares · Jun 14, 2026
@@ -0,0 +1,92 @@
+# ADR-0026: Default replication and consistency model (async primary/replica plus WAIT)
+
+Status: Accepted
+Issue: #76
+
+## Context
+
+IronCache must ship a single default replication and consistency model before
+the opt-in tiers (#77, #78, #12) can be specified against a fixed baseline. The
+default has to be correct under the failure modes operators actually hit and
+stay drop-in compatible with how Redis clients already reason about durability.
+The top two tenets, Compatible then Efficient, both bear on the choice: clients
+should port over unchanged, and the hot write path should pay no quorum tax.
+Scope here is the default model only; this ADR does not specify the streaming
+protocol, replica handoff mechanics, or the read contract (#147 owns the
+replica-read contract, #149 owns node lifecycle).
+
+Redis Cluster replicates asynchronously between a primary and its replicas by
+default, and exposes WAIT for callers that want bounded synchronous
+acknowledgement [redis-cluster-async-replication]. WAIT confirms in-memory
+replica receipt, not disk persistence, and the Redis docs are explicit that it
+does not make the store strongly consistent: a write synchronously replicated
+to several replicas can still be lost [redis-wait-not-cp]. The Jepsen analysis
+reaches the same conclusion from the failure side, that default async
+replication can drop acknowledged writes on failover [redis-wait-not-strongly-consistent].
+The two alternatives considered are per-shard Raft or quorum writes, which
+remove single-node-failover write loss at the cost of write latency and
+operational weight on every write, and Dynamo-style leaderless quorums, which
+stay available through a sloppy quorum over the first N healthy nodes plus
+hinted handoff and app-level conflict resolution [dynamo-quorum-sloppy-hinted].
+
+## Decision
+
+- **Default to asynchronous primary/replica replication, with WAIT exposed for
+  bounded synchronous acks.** This is the Compatible and Efficient choice:
+  clients that already speak Redis replication semantics and tooling port over
+  unchanged, and the steady-state write path pays no quorum round-trip
+  [redis-cluster-async-replication]. WAIT N timeout is offered as a per-command
+  durability floor, not a consistency mode.
+- **Document the default as best-effort, not CP, and name the loss window.**
+  There is a write-loss window: a write acknowledged to the client but not yet
+  replicated can be lost on primary failover or on the minority side of a
+  partition. WAIT bounds this window but does not eliminate it, because it
+  confirms in-memory receipt only and is not strong consistency
+  [redis-wait-not-cp] [redis-wait-not-strongly-consistent]. Documenting this
+  window is a shipping requirement, surfaced to clients per #147.
+- **Ship three guardrail defaults.** `replica-read-only` is on, so replicas
+  reject writes and cannot silently diverge [redis-replica-read-only-default].
+  `min-replicas-to-write` is wired so a primary can stop accepting writes when
+  too few replicas are in sync [redis-min-replicas-to-write-default].
+  `min-replicas-max-lag` bounds how stale an in-sync replica may be before it
+  stops counting toward that floor [redis-min-replicas-max-lag-default]. The
+  shipped numeric defaults track the pinned upstream values in claims.yaml.
+- **Strong consistency is opt-in, never a tax on every write.** Losing no
+  acknowledged write on single-node failover is real value, but it is
+  delivered through an opt-in quorum/Raft tier (#78, #12) layered on this async
+  baseline, not by changing the default. Whether that becomes a headline
+  differentiator is deferred to those issues; this ADR commits only to the
+  async default.
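A minimal sketch of how a caller might read the WAIT reply as a durability floor and not as a consistency guarantee. It assumes, as in Redis, that WAIT replies with the number of replicas that acknowledged before the timeout; `WriteOutcome` and `classify` are hypothetical names for illustration, not IronCache API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteOutcome:
    acked_replicas: int     # integer reply from WAIT
    required_replicas: int  # the N the caller passed to WAIT

    @property
    def meets_durability_floor(self) -> bool:
        # True means N replicas held the write in memory when WAIT returned.
        # It does not mean the write is on disk or survives every failover.
        return self.acked_replicas >= self.required_replicas


def classify(outcome: WriteOutcome) -> str:
    """Map a WAIT reply to the situation the caller is in."""
    if outcome.meets_durability_floor:
        return "floor-met"
    # The timeout fired first: the write is applied on the primary but
    # under-replicated, so it still sits inside the loss window.
    return "under-replicated"
```

Even the "floor-met" case only narrows the loss window named above; it does not close it.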
+
+## Rejected Alternatives
+
+- **Per-shard Raft or quorum writes by default.** Removes acknowledged-write
+  loss on single-node failover and gives a clean CP story. Rejected as the
+  default: it adds write latency and operational weight to every write and
+  diverges from Redis defaults, breaking Compatible, which ranks above the
+  consistency gain. It survives as the opt-in tier in #78 and #12, layered on
+  this baseline rather than replacing it.
+- **Dynamo-style sloppy quorum with hinted handoff.** Stays writable during
+  partitions via a sloppy quorum over the first N healthy nodes, hinted handoff,
+  and vector-clock conflict resolution [dynamo-quorum-sloppy-hinted]. Rejected:
+  its conflict, read-repair, and merge model is foreign to the Redis data model,
+  surprising for compatibility-focused users, and complex, so it violates
+  Compatible and Simple for a marginal availability gain. This is the rejection
+  this ADR exists to freeze so it is not relitigated.
+
+## Consequences
+
+- Unmodified Redis clients and replication tooling work against IronCache with
+  no protocol change, and the steady-state write path carries no quorum tax,
+  satisfying Compatible then Efficient [redis-cluster-async-replication].
+- The default is explicitly best-effort, not CP. The acknowledged-but-
+  unreplicated write-loss window on failover and partition is documented and
+  surfaced to clients (#147), and WAIT is positioned as a durability floor that
+  bounds but does not close it [redis-wait-not-cp] [redis-wait-not-strongly-consistent].
+- Three guardrails ship on by intent: read-only replicas, a min-in-sync-
+  replicas write floor, and a max-replica-lag bound, so a misconfigured or
+  lagging fleet fails toward refusing writes rather than silently diverging
+  [redis-replica-read-only-default] [redis-min-replicas-to-write-default]
+  [redis-min-replicas-max-lag-default].
+- The opt-in strong-consistency tier (#78, #12) is unblocked to build on a
+  fixed async baseline, and the replica-read contract (#147) and node lifecycle
+  (#149) are specified against this decision rather than against an open one.
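The write-floor guardrails can be read as one predicate on the primary. The sketch below is an illustration under stated assumptions: `ReplicaAck`, `in_sync_replicas`, and `primary_accepts_writes` are hypothetical names, lag is modelled as seconds since the last replica ack, and a floor of 0 is assumed to disable the check as it does in Redis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaAck:
    name: str
    seconds_since_ack: float  # time since the primary last heard from it


def in_sync_replicas(replicas, max_lag_seconds):
    """Replicas whose last ack falls within the lag bound."""
    return [r for r in replicas if r.seconds_since_ack <= max_lag_seconds]


def primary_accepts_writes(replicas, min_replicas_to_write, max_lag_seconds):
    # Assumption mirroring Redis: a floor of 0 disables the guardrail.
    if min_replicas_to_write <= 0:
        return True
    in_sync = in_sync_replicas(replicas, max_lag_seconds)
    # Fail toward refusing writes when too few replicas are in sync.
    return len(in_sync) >= min_replicas_to_write
```

With one fresh and one lagging replica, a floor of 1 keeps the primary writable while a floor of 2 makes it refuse writes.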
@@ -31,6 +31,7 @@ in [OPEN.md](OPEN.md); research questions in [QUESTIONS.md](QUESTIONS.md).
 | [0023](0023-cold-tier-engine.md) | Cold-tier engine (reject RocksDB/LSM, adopt hybrid log) | Accepted | #65 |
 | [0024](0024-geo-command-scope.md) | Geo command family scope (non-goal for v1) | Accepted | #133 |
 | [0025](0025-cluster-partition-count.md) | Cluster keyspace partition count (16384 dual-purpose unit) | Accepted | #72 |
+| [0026](0026-replication-consistency-model.md) | Default replication and consistency model (async primary/replica plus WAIT) | Accepted | #76 |
 
 As `[DECISION]` issues close, each adds its row here and its `NNNN-*.md` record.
 The numbering is monotonic and never reused, even after supersession.
@@ -0,0 +1,93 @@
+# Design: Cluster bootstrap and node lifecycle (seed/MEET join, learner-to-voter-to-slot-owner)
+
+Issue: #149. Decisions: ADR-0026 (async primary/replica default, replica
+guardrails). Related: #73 (CONTROL_PLANE: Raft slot map and config epoch), #74
+(MEMBERSHIP: SWIM plus Lifeguard), #69 (single-node-first staged path), #75
+(migration), #1 (vision).
+
+## Goal and scope
+
+The roster owns steady-state membership (#74 SWIM), the authoritative map (#73
+Raft), and migration (#75), but nothing owns how a node enters or leaves the
+cluster. This spec owns the lifecycle: cold-start seed discovery and a CLUSTER
+MEET-equivalent handshake, the staged promotion of a joining node from
+SWIM-discovered to Raft learner to voter to slot owner, the operator/CLI
+add-node and remove-node surface, and the single-node-to-first-replica
+bootstrap that #69's staged path assumes. Scope is the transitions between
+states; the SWIM signal itself (#74), the Raft commit semantics (#73), and the
+slot-migration mechanics (#75) are owned elsewhere and only invoked here.
+
+## Design
+
+### Seed and MEET join
+
+- A new node boots with a seed list (operator-supplied or CLI add-node). It
+  contacts a seed, which performs a MEET-equivalent handshake: the seed
+  introduces the joiner into the SWIM membership view (#74) so the rest of the
+  ring learns of it through normal gossip, with no full-mesh fan-out.
+- SWIM membership is a hint, not authority: a node SWIM has discovered is not
+  yet part of the cluster's committed state. Per #74's contract, SWIM proposes
+  and Raft commits, so MEET only makes a node a candidate for promotion.
+
+### Learner to voter to slot-owner promotion
+
+- A SWIM-discovered node is first admitted to the Raft control plane (#73) as a
+  non-voting learner: it receives committed slot-map deltas and config-epoch
+  updates but does not vote, so it cannot affect commit latency or quorum while
+  it catches up [raft-overview].
+- A learner is promoted to voter only by an explicit committed control-plane
+  decision (#73), keeping the voter set small (the #73 3-to-5 voter group) and
+  the slot map linearizable. Voter promotion is a control-plane role change, not
+  a data assignment.
+- Becoming a slot owner is the last and separate step: the control plane assigns
+  slots and, for a replica being promoted toward ownership, applies a
+  replication-lag gate before the node is eligible, since replication is async
+  (ADR-0026). Replica handoff reuses PSYNC2-style secondary-replid resync so
+  promotion does not force a full resync [redis-psync2-secondary-replid]. Slot
+  movement itself runs through migration (#75) under MOVED/ASK.
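The staged promotion above can be sketched as a small state machine. This is an illustration, not the control-plane implementation: `Stage` and `promote` are hypothetical names, the `committed` flag stands in for a Raft-committed decision (#73), and requiring it on every transition is an assumption drawn from "SWIM proposes and Raft commits".

```python
from enum import IntEnum


class Stage(IntEnum):
    SWIM_HINT = 0
    RAFT_LEARNER = 1
    RAFT_VOTER = 2
    SLOT_OWNER = 3


def promote(current: Stage, target: Stage, *, committed: bool,
            lag_ok: bool = True) -> Stage:
    """Advance exactly one stage, or raise if a gate is not satisfied."""
    if int(target) != int(current) + 1:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    if not committed:
        # SWIM only proposes; nothing changes without a committed decision.
        raise RuntimeError("promotion requires a committed control-plane decision")
    if target is Stage.SLOT_OWNER and not lag_ok:
        # Replication is async, so ownership is gated on replication lag.
        raise RuntimeError("replication lag gate not satisfied")
    return target
```

Rejecting any target other than the next stage is what makes "never skipped" checkable in a test.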
+
+### Add/remove-node operator surface
+
+- add-node (CLI/operator) supplies a seed and triggers the MEET handshake, then
+  the staged learner-to-voter-to-owner path above; the operator observes each
+  stage via CLUSTER SHARDS health/role (#74) rather than poking internal state.
+- remove-node is the reverse and drains first: slots are migrated off (#75),
+  the node is demoted from voter to learner to leave the quorum cleanly, then
+  removed from the committed membership (#73) and finally from the SWIM view
+  (#74). A node is never removed from the map while it still owns a slot.
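A minimal sketch of the remove-node ordering, assuming an in-memory `Node` record and a caller-supplied `migrate_slot` callback that stands in for the #75 migration path. All names and step labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    slots: set = field(default_factory=set)
    role: str = "voter"            # "voter" or "learner"
    in_committed_map: bool = True  # Raft-committed membership (#73)
    in_swim_view: bool = True      # SWIM membership view (#74)


def remove_node(node: Node, migrate_slot) -> list:
    """Drain, demote, then drop membership; returns the steps taken."""
    steps = []
    for slot in sorted(node.slots):
        migrate_slot(slot)         # hand the slot to another owner
        node.slots.discard(slot)
    steps.append("drained")
    if node.role == "voter":
        node.role = "learner"      # leave the quorum cleanly
        steps.append("demoted-to-learner")
    if node.slots:
        # Invariant from the spec: never leave the map while owning a slot.
        raise RuntimeError("node still owns slots")
    node.in_committed_map = False
    steps.append("left-committed-map")
    node.in_swim_view = False
    steps.append("left-swim-view")
    return steps
```

The returned step list gives a test something concrete to assert the drain-first ordering against.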
+
+### Single-node to first-replica bootstrap
+
+- A single node boots as a degenerate one-voter control plane owning all 16384
+  slots, consistent with #69's single-node-first staged layout. The first
+  replica joins by the same seed/MEET path, enters as a learner, and attaches as
+  an async replica of the primary under ADR-0026 (replica-read-only on,
+  min-replicas guardrails inactive at one replica). This is the transition #69
+  assumes but does not specify: the inter-stage step from standalone to a
+  primary-with-replica pair.
+
+## Open questions
+
+- The replication-lag threshold that gates a replica from learner to
+  slot-owner-eligible (ties to ADR-0026 min-replicas-max-lag and #73's
+  promotion-policy open decision).
+- Whether learner admission to Raft is automatic on SWIM discovery or requires
+  an explicit operator add-node (#73 lists data-nodes-as-learners as open).
+- Seed-list bootstrapping when all seeds are down, and how MEET interacts with a
+  partitioned SWIM view.
+
+## Acceptance and test hooks
+
+- A node added by seed/MEET appears as a SWIM hint, then a Raft learner, then a
+  voter, then a slot owner, with each stage visible in CLUSTER SHARDS and never
+  skipped.
+- remove-node drains all slots (#75) and demotes through learner before the
+  node leaves committed membership; no slot is orphaned.
+- A standalone node accepts a first replica via MEET and reaches a
+  primary-with-async-replica pair per #69 and ADR-0026.
+
+## References
+
+- ADR-0026; issues #149, #73, #74, #69, #75, #1; specs CONTROL_PLANE (#73),
+  MEMBERSHIP (#74).
+- Claims: [raft-overview], [redis-psync2-secondary-replid].
@@ -96,3 +96,7 @@ Specs added as the M1 milestone progresses.
   authoritative slot map, config epoch, membership, and replica promotion (#73).
 - [MEMBERSHIP.md](MEMBERSHIP.md): SWIM + non-optional Lifeguard data-plane
   membership and failure detection, joined with the Raft-committed map (#74).
+- [REPLICA_READ.md](REPLICA_READ.md): the replica-read contract (READONLY/
+  READWRITE, replica routing, bounded staleness surfaced to clients) (#147).
+- [NODE_LIFECYCLE.md](NODE_LIFECYCLE.md): cluster bootstrap and node lifecycle
+  (seed/MEET join, learner to voter to slot-owner promotion, add/remove-node) (#149).
@@ -0,0 +1,86 @@
+# Design: Replica-read contract (READONLY/READWRITE, replica routing, bounded staleness)
+
+Issue: #147. Decisions: ADR-0026 (async primary/replica default, best-effort not
+CP, replica-read-only on). Related: #70 (CLUSTER_CONTRACT: 16384 slots, CRC16,
+MOVED/ASK), #76 (replication default), #1 (vision).
+
+## Goal and scope
+
+Redis Cluster clients scale reads by sending READONLY on a connection and then
+routing reads to replicas; this is part of the wire contract IronCache promises
+to keep. ADR-0026 fixes replica-read-only on, so replicas reject writes, but it
+does not decide whether clients may READ from replicas, how the
+READONLY/READWRITE connection-state pair behaves, how a replica answers for the
+slots it serves, or how async-replication staleness is bounded and surfaced.
+This spec owns that command-pair and consistency contract. Scope: the
+per-connection READONLY/READWRITE state machine, replica read routing under the
+#70 slot view, and the bounded-staleness signal surfaced to clients. Out of
+scope: the slot map authority (#73), migration redirection mechanics (#70), and
+the write path (#76).
+
+## Design
+
+### READONLY/READWRITE connection state
+
+- A connection carries one bit: read-write (default) or read-only. READONLY
+  sets the bit; READWRITE clears it. The bit is per-connection, not global, and
+  is unaffected by the node role. This mirrors the Redis Cluster READONLY/
+  READWRITE pair that lets a replica serve reads for slots it replicates
+  [redis-cluster-readonly-replica].
+- On a replica, a read for a replicated slot succeeds only when the read-only
+  bit is set; otherwise the replica returns MOVED to the primary, so a default
+  (read-write) connection keeps the read-from-primary behavior that unmodified
+  clients expect [redis-cluster-readonly-replica]. Writes on a replica are
+  always rejected per ADR-0026's replica-read-only posture, independent of the
+  bit.
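The connection bit and the replica's decision can be sketched as below. This is an illustration under stated assumptions: `Connection` and `replica_reply` are hypothetical names, and the strings "LOCAL", "MOVED", and "REJECTED" are symbolic outcomes, not wire replies.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    read_only: bool = False  # default state is read-write

    def readonly(self) -> None:
        self.read_only = True

    def readwrite(self) -> None:
        self.read_only = False


def replica_reply(conn: Connection, slot: int, replicated_slots: set,
                  *, is_write: bool) -> str:
    """Decide how a replica answers a command for the given slot."""
    if is_write:
        return "REJECTED"  # replicas never take writes, whatever the bit says
    if slot not in replicated_slots:
        return "MOVED"     # not served here; client refreshes its map
    if not conn.read_only:
        return "MOVED"     # default connections are sent to the primary
    return "LOCAL"         # opted-in read, answered by the replica
```

The write check comes first so that the bit can never turn a replica into a write target.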
+
+### Replica read routing
+
+- Slot ownership and the CLUSTER SLOTS/SHARDS projection come from
+  CLUSTER_CONTRACT (#70); this spec only adds the replica leg. A read-only
+  connection whose key hashes (CRC16 mod 16384, #70) to a slot this replica
+  replicates is answered locally; a key for a slot this node neither owns nor
+  replicates returns MOVED, driving the client's normal map refresh.
+- Because replication is asynchronous (ADR-0026), a replica read may observe a
+  value older than the primary. This is the Envoy ReadPolicy model: non-primary
+  read targets may return stale data due to async replication
+  [envoy-redis-readpolicy] [redis-cluster-async-replication]. IronCache does
+  not silently proxy reads to the primary to hide this; the client chose the
+  replica by setting READONLY and is told the staleness bound.
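For reference, the key-to-slot mapping that routing relies on can be sketched as follows, assuming the CRC16 variant Redis Cluster uses (XMODEM, polynomial 0x1021, initial value 0). Hash-tag handling is left out, and `crc16_xmodem` and `key_slot` are illustrative names.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Map a key to one of the 16384 slots (hash tags not handled)."""
    return crc16_xmodem(key) % 16384
```

The standard check input `b"123456789"` yields CRC 0x31C3, which maps to slot 12739.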
+
+### Bounded staleness surfaced to clients
+
+- Each replica tracks its replication lag against the primary using the same
+  in-sync signal ADR-0026 bounds with min-replicas-max-lag. A replica whose lag
+  exceeds the configured staleness bound stops serving read-only reads for its
+  slots and returns MOVED, so a client never silently reads beyond the bound.
+- The bound is observable, not just enforced: it is exposed through INFO
+  replication fields and the CLUSTER SHARDS health/role projection (#70), so a
+  client or operator can reason about the worst-case staleness of any replica
+  read. This makes the best-effort-not-CP property of ADR-0026 legible at the
+  read path rather than hidden.
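A minimal sketch of the staleness gate, assuming lag is tracked in seconds and that a replica exactly at the bound still serves. `ReplicaLag` and the field names returned by `observable_fields` are hypothetical placeholders, not existing INFO fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaLag:
    lag_seconds: float
    staleness_bound_seconds: float

    @property
    def within_bound(self) -> bool:
        return self.lag_seconds <= self.staleness_bound_seconds

    def read_only_reply(self) -> str:
        # Past the bound the replica stops serving and redirects instead,
        # so a client never silently reads beyond the bound.
        return "LOCAL" if self.within_bound else "MOVED"

    def observable_fields(self) -> dict:
        # Hypothetical field names: the bound is surfaced, not just enforced.
        return {
            "replica_lag_seconds": self.lag_seconds,
            "replica_staleness_bound_seconds": self.staleness_bound_seconds,
            "replica_serving_reads": self.within_bound,
        }
```

Exposing the lag and the bound side by side lets an operator see a replica approach the threshold before it starts redirecting.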
+
+## Open questions
+
+- Whether to expose an Envoy-style per-request ReadPolicy hint
+  (PREFER_REPLICA/PREFER_MASTER) beyond the binary READONLY/READWRITE bit
+  [envoy-redis-readpolicy], or keep the Redis-native pair only for v1.
+- The exact default staleness bound, and whether it is derived from
+  min-replicas-max-lag (ADR-0026) or set independently per keyspace.
+- How a replica that crosses the staleness bound interacts with the #70 ASK
+  path during an in-flight slot migration.
+
+## Acceptance and test hooks
+
+- A READONLY connection reads from a replica for a replicated slot; the same
+  connection after READWRITE gets MOVED to the primary for that slot.
+- A replica past its staleness bound returns MOVED for read-only reads and its
+  lag is visible in INFO/CLUSTER SHARDS before and after crossing the bound.
+- Unmodified redis-cli, go-redis, lettuce, and ioredis route replica reads via
+  READONLY without errors, matching the #70 contract.
+
+## References
+
+- ADR-0026; issues #147, #76, #70, #1; specs CLUSTER_CONTRACT (#70).
+- Claims: [redis-cluster-readonly-replica], [envoy-redis-readpolicy],
+  [redis-cluster-async-replication].
@@ -6982,3 +6982,34 @@ claims:
     note: rustsec.org states the database is maintained by the Rust Secure Code Working Group, covers
       crates published via crates.io, uses RUSTSEC-YYYY-NNNN IDs (example RUSTSEC-2022-0051), and exports
       to OSV in real time for the GitHub Advisory Database/Dependabot. Cross-checked against github.com/rustsec/advisory-db.
+- id: redis-cluster-readonly-replica
+  dimension: resp-protocol-compat
+  system: Redis Cluster (READONLY / READWRITE connection commands)
+  version: 'Redis 3.0.0+ (current docs: redis.io latest)'
+  claim: Redis Cluster surfaces possibly stale replica reads as a per-connection opt-in via
+    the READONLY and READWRITE commands. READONLY tells a Redis Cluster replica node that the client is
+    willing to read possibly stale data and is not interested in running write queries; with the connection
+    in readonly mode the cluster only sends a redirection to the client when the operation involves keys
+    not served by the replica's master (e.g. slots never owned by that master, or after a resharding).
+    READWRITE resets the readonly flag back to the default, where read queries against a replica are redirected
+    to the authoritative master. Both commands have been available since Redis 3.0.0 and belong to the
+    `cluster` command group.
+  value: READONLY = opt-in per-connection 'willing to read possibly stale data' from replica (redirect
+    only for keys not served by replica's master); READWRITE resets to default; since Redis 3.0.0; group=cluster
+  source_url: https://redis.io/docs/latest/commands/readonly/
+  accessed_date: '2026-06-13'
+  confidence: high
+  confidence_reason: Read directly from the official redis.io command reference for READONLY; the 'willing
+    to read possibly stale data' wording, redirection conditions, 'since 3.0.0' version, and 'cluster'
+    command group are quoted from the upstream page, with READWRITE semantics corroborated by the companion
+    redis.io page.
+  load_bearing: false
+  verification:
+    verdict: confirmed
+    best_source_url: https://redis.io/docs/latest/commands/readonly/
+    note: 'WebFetch of redis.io/docs/latest/commands/readonly confirmed the verbatim description (''willing
+      to read possibly stale data and is not interested in running write queries''), the redirection-only-for-unserved-keys
+      behavior, ''Available since: 3.0.0'', and ''Group: cluster''. The complementary READWRITE command
+      (redis.io/docs/latest/commands/readwrite) was surfaced in the same search and resets the readonly
+      flag; redis-py-cluster docs independently note replicas may not return latest data due to asynchronous
+      replication, consistent with the possibly-stale-reads framing.'