ELares · Jun 14, 2026
@@ -0,0 +1,104 @@
+# Design: List large representation (quicklist-equivalent chunked deque)
+
+Issue: #135. Decisions: ADR-0018 (encoding thresholds), ADR-0005 (per-shard
+unsynchronized map), ADR-0009 (behavioral equivalence). Related: #113 (small
+listpack list chunk), #35 (index), #40 (OBJECT ENCODING name and ql_* fields),
+#136 (large-collection-bakeoff), #128 (list command semantics), #52
+(value compression), #8 (harness).
+
+## Goal and scope
+
+A list that outgrows a single small listpack chunk (ADR-0018) needs a structure
+with O(1) head and tail operations and bounded per-node memory, the quicklist
+contract. This spec fixes the chunked-deque shape, the node-size policy, the
+traversal model for the interior commands, and how chunks split and merge, plus
+the ql_nodes/ql_avg_node fields #40 must synthesize. Scope is the representation
+above the listpack threshold; the small chunk is #113, the threshold is
+ADR-0018/#37, and the flat-deque-versus-indexed-chunk choice is the #136
+bake-off. This spec sets the provisional flat baseline and the contract.
+
+## Design
+
+### Chunked deque of listpack nodes
+
+- The large list is a deque of compact listpack chunks, the quicklist shape
+  Redis uses [redis-list-max-listpack-size-neg2]. Each chunk is one contiguous
+  listpack with the ~6-byte header (total bytes plus element count) and a 1-byte
+  terminator [redis-listpack-header-6-bytes], holding a run of elements in order;
+  the chunks are linked head-to-tail. The whole list is one value on one core
+  (ADR-0005), so no chunk link is synchronized.
+- The provisional structure is a flat doubly linked deque of chunks (a plain
+  prev/next chain). It is provisional because #136 evaluates an indexed chunk
+  structure (a small B-tree or rope of chunks) for faster positional access; this
+  spec commits to the chunk-deque trait, not to flat-versus-indexed.
+
+### Node sizing (~8 KB)
+
+- A chunk's byte budget maps to list-max-listpack-size -2, the Redis default,
+  which caps each node's listpack at 8 KB of bytes rather than at an element
+  count [redis-list-max-listpack-size-neg2]. A push that would exceed the budget
+  starts a new chunk; the cap keeps each node cache-resident and bounds the cost
+  of an interior memmove within one chunk. IronCache stores only the listpack
+  bytes per chunk, in contrast to Redis's 32-byte quicklistNode struct
+  (prev/next, listpack ptr, sz, count, and bitfields)
+  [redis-quicklist-node-32-bytes]; interior-node LZF compression
+  [redis-quicklist-node-32-bytes] is a design choice deferred to COMPRESSION.md
+  (#52), not adopted here.
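The budget rule above can be sketched as follows. This is a minimal illustration, not IronCache's code: `Chunk`, `CHUNK_BUDGET`, `push_back`, and the 4-byte length-prefixed entry layout are hypothetical, a `VecDeque` stands in for the prev/next chunk chain, and letting an empty chunk accept one oversized entry is an assumption the spec does not state.

```rust
use std::collections::VecDeque;

/// Hypothetical per-chunk byte budget standing in for the 8 KB cap.
const CHUNK_BUDGET: usize = 8 * 1024;

/// Illustrative chunk: a flat byte buffer plus an element count.
/// A real listpack has its own header and entry encoding; here each entry is
/// modeled as a 4-byte little-endian length followed by the payload.
#[derive(Default)]
struct Chunk {
    bytes: Vec<u8>,
    count: usize,
}

impl Chunk {
    fn entry_size(payload: &[u8]) -> usize {
        4 + payload.len()
    }

    /// True if appending `payload` keeps the chunk within the budget.
    /// Assumption: an empty chunk always accepts one entry, even an oversized one.
    fn fits(&self, payload: &[u8]) -> bool {
        self.count == 0 || self.bytes.len() + Self::entry_size(payload) <= CHUNK_BUDGET
    }

    fn append(&mut self, payload: &[u8]) {
        self.bytes.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        self.bytes.extend_from_slice(payload);
        self.count += 1;
    }
}

/// RPUSH-style append: touch only the tail chunk and open a new chunk
/// only when the push would exceed the byte budget.
fn push_back(chunks: &mut VecDeque<Chunk>, payload: &[u8]) {
    let needs_new = match chunks.back() {
        Some(tail) => !tail.fits(payload),
        None => true,
    };
    if needs_new {
        chunks.push_back(Chunk::default());
    }
    chunks.back_mut().expect("tail chunk exists").append(payload);
}
```

Under these assumptions, pushing a run of 1000-byte values fills each chunk with eight entries before a new chunk is opened.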
+
+### Head/tail O(1) and interior traversal
+
+- LPUSH/RPUSH/LPOP/RPOP touch only the head or tail chunk: an append or pop
+  inside that chunk's listpack, allocating or freeing a chunk only at the budget
+  boundary, so end operations are O(1) amortized.
+- LINDEX/LRANGE/LSET/LINSERT walk the chunk chain accumulating element counts to
+  locate the target chunk, then scan within it. Each chunk carries its element
+  count in the listpack header [redis-listpack-header-6-bytes], so locating a
+  chunk by index is a walk over chunk counts, not over every element; the flat
+  baseline makes this O(number of chunks), which is the cost #136 weighs against
+  an indexed variant. LSET rewrites one entry in place; LINSERT inserts into the
+  target chunk's listpack with at most one tail memmove within that chunk.
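The positional walk described above can be sketched as a scan over per-chunk element counts. The function name `locate` and the `counts` slice (a stand-in for the counts read from each chunk's header, head to tail) are hypothetical.

```rust
/// Map a zero-based list index to (chunk position, offset within that chunk)
/// by walking per-chunk element counts only, never the elements themselves.
/// Cost is O(number of chunks), matching the flat baseline.
fn locate(counts: &[usize], index: usize) -> Option<(usize, usize)> {
    let mut remaining = index;
    for (pos, &count) in counts.iter().enumerate() {
        if remaining < count {
            return Some((pos, remaining));
        }
        remaining -= count;
    }
    None // index is past the end of the list
}
```

An indexed variant (the alternative weighed in #136) would replace this linear walk with a descent through a structure that stores cumulative counts.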
+
+### Chunk split and merge
+
+- An insert that pushes a chunk past the ~8 KB budget
+  [redis-list-max-listpack-size-neg2] splits it into two chunks at an element
+  boundary near the midpoint. When deletions leave two adjacent chunks jointly
+  under the budget, the chunks are merged, which bounds the chunk count and keeps
+  ql_avg_node meaningful. The merge low-watermark (how empty chunks must be
+  before merging) is harness-tuned (#8); it is a churn-versus-resident-bytes
+  trade-off and is not fixed here.
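A minimal sketch of the split and merge rules, under simplifying assumptions: a chunk is modeled as a `Vec` of element payloads instead of a packed listpack, its size is the sum of payload lengths, and the split uses the strict byte midpoint (the split policy is still an open question). `BUDGET`, `split_chunk`, `merge_adjacent`, and the `watermark` parameter are hypothetical names.

```rust
/// Hypothetical byte budget standing in for the ~8 KB cap.
const BUDGET: usize = 8 * 1024;

/// Illustrative chunk: a run of element payloads in list order.
type Chunk = Vec<Vec<u8>>;

fn chunk_bytes(chunk: &Chunk) -> usize {
    chunk.iter().map(|e| e.len()).sum()
}

/// Split a chunk at the element boundary nearest its byte midpoint.
/// Both halves are non-empty whenever the chunk holds at least two elements.
fn split_chunk(mut chunk: Chunk) -> (Chunk, Chunk) {
    if chunk.len() < 2 {
        return (chunk, Vec::new());
    }
    let half = chunk_bytes(&chunk) / 2;
    let mut acc = 0;
    let mut cut = 0;
    while cut < chunk.len() - 1 && acc < half {
        acc += chunk[cut].len();
        cut += 1;
    }
    let cut = cut.max(1);
    let right = chunk.split_off(cut);
    (chunk, right)
}

/// Merge neighbors whose combined size is at or under `watermark` bytes.
/// The watermark is a tunable parameter here, not a value fixed by the spec.
fn merge_adjacent(chunks: &mut Vec<Chunk>, watermark: usize) {
    let mut i = 0;
    while i + 1 < chunks.len() {
        if chunk_bytes(&chunks[i]) + chunk_bytes(&chunks[i + 1]) <= watermark {
            let next = chunks.remove(i + 1);
            chunks[i].extend(next);
            // Stay at i: the merged chunk may also absorb its new neighbor.
        } else {
            i += 1;
        }
    }
}
```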
+
+### ql_nodes and ql_avg_node derivation
+
+- ql_nodes is the live chunk count; ql_avg_node is total element count divided by
+  ql_nodes. Both are computed from the deque IronCache actually holds and
+  surfaced through DEBUG OBJECT for `quicklist` keys [redis-quicklist-node-32-bytes],
+  the synthesis #40 wires in. They reflect IronCache chunking, not a Redis node
+  layout, and are a pure function of the current representation (#40).
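The derivation above is small enough to sketch directly. `ql_stats` is a hypothetical helper that takes the per-chunk element counts; returning 0.0 for an empty deque is an assumption made here to avoid dividing by zero.

```rust
/// Derive (ql_nodes, ql_avg_node) from the live chunks' element counts.
/// ql_nodes is the chunk count; ql_avg_node is total elements / ql_nodes.
fn ql_stats(counts: &[usize]) -> (usize, f64) {
    let nodes = counts.len();
    let total: usize = counts.iter().sum();
    // Assumption: report 0.0 for an empty deque rather than NaN.
    let avg = if nodes == 0 { 0.0 } else { total as f64 / nodes as f64 };
    (nodes, avg)
}
```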
+
+## Open questions
+
+- Flat doubly linked chunk chain vs an indexed chunk structure (small B-tree or
+  rope) for positional access, decided by #136 on throughput-per-core and
+  bytes-per-element.
+- The chunk split point (strict midpoint vs fill-the-tail) and the merge
+  low-watermark, tuned on the harness (#8).
+- Whether a chunk reuses the #113 `pack` exactly or a length-only variant sized
+  to the ~8 KB cap [redis-list-max-listpack-size-neg2].
+
+## Acceptance and test hooks
+
+- LPUSH/RPUSH/LPOP/RPOP touch only the end chunk and allocate or free a chunk
+  only at the byte budget (O(1) amortized, structural test).
+- An interior LINSERT performs at most one tail memmove within the target chunk
+  and never rewrites another chunk; no chunk exceeds the ~8 KB budget after split
+  [redis-list-max-listpack-size-neg2] (property test).
+- DEBUG OBJECT reports ql_nodes equal to the live chunk count and a consistent
+  ql_avg_node [redis-quicklist-node-32-bytes]; OBJECT ENCODING reports
+  `quicklist` [valkey-assert-encoding-vocab] (ADR-0009, name map #40).
+- LINDEX/LRANGE/LSET match the oracle across chunk boundaries (#97/#98, #128).
+
+## References
+
+- ADR-0005, ADR-0009, ADR-0018; issues #113, #35, #40, #136, #128, #52, #37,
+  #8, #97, #98.
+- Claims: [redis-list-max-listpack-size-neg2], [redis-listpack-header-6-bytes],
+  [redis-quicklist-node-32-bytes], [valkey-assert-encoding-vocab].
@@ -0,0 +1,100 @@
+# Design: OBJECT ENCODING / DEBUG OBJECT compatibility mapping
+
+Issue: #40. Decisions: ADR-0009 (behavioral equivalence via OBJECT ENCODING),
+ADR-0018 (encoding thresholds). Related: #35 (index, parent), #111 (object
+layout), #112 (scalar encodings), #113 (collection container), #134 (large
+zset), #135 (large list), #95 (conformance), #150 (DEBUG OBJECT command).
+
+## Goal and scope
+
+Clients and conformance suites introspect storage through OBJECT ENCODING and
+DEBUG OBJECT and branch on the exact synthetic name returned, so IronCache must
+report Redis-vocabulary names even though its internal representations are chosen
+for a Rust runtime, not Redis's C internals (ADR-0009). This spec fixes the total
+function from every internal representation to one reported name, the DEBUG
+OBJECT field synthesis, and the assert_encoding wiring. Out of scope are the
+structures themselves (#35, #112, #113, #134, #135) and the thresholds at which
+they convert (ADR-0018/#37); this spec reports the active representation's name,
+it does not decide the representation.
+
+## Design
+
+### The representation-to-name table (total function)
+
+- The reported vocabulary is the eight Redis synthetic names the conformance
+  suite asserts on [valkey-assert-encoding-vocab]: embstr, int, raw, listpack,
+  intset, hashtable, skiplist, quicklist. The mapping is a total function: each
+  internal representation maps to exactly one name, never two. Issue #40's
+  acceptance table collapsed embstr/raw into a single bullet and left the
+  embstr-vs-raw split as an open decision; this spec keeps both names, matching
+  ENCODINGS.md, which reports out-of-line strings as the `raw` class.
+- String types: a pointer-tagged inline integer (#112) reports `int`; an inline
+  short string (SSO, the embstr-class up to the inline threshold
+  [redis-embstr-threshold-44]) reports `embstr`; an out-of-line string with a
+  variable-width header [redis-sds-header-variants] reports `raw`. The embstr/raw
+  boundary is the inline-value threshold (#111), reported off the current
+  representation, not recomputed from config.
+- Collection types: the small universal `pack` container (#113) reports
+  `listpack` for hash, list, set, and zset alike; the all-integer sorted-array
+  analog [redis-intset-layout] reports `intset`. The large hash and set report
+  `hashtable`, the large sorted set (#134) reports `skiplist`, and the chunked
+  list deque (#135) reports `quicklist`. The borrowed name `quicklist` describes
+  the chunked shape, not the 32-byte Redis node layout [redis-quicklist-node-32-bytes].
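One way to make the mapping total by construction is an exhaustive `match` with no wildcard arm, so adding a representation without naming it fails to compile. The `Repr` enum and its variant names below are hypothetical placeholders for the internal representations, not IronCache's actual types.

```rust
/// Hypothetical internal representations; variant names are illustrative.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Repr {
    InlineInt,    // pointer-tagged inline integer
    InlineStr,    // inline short string (SSO)
    OutOfLineStr, // heap string with a variable-width header
    Pack,         // small universal container
    IntArray,     // all-integer sorted array
    LargeHash,
    LargeSet,
    LargeZset,
    ChunkedList,
}

/// Every representation, for totality tests.
const ALL_REPRS: [Repr; 9] = [
    Repr::InlineInt,
    Repr::InlineStr,
    Repr::OutOfLineStr,
    Repr::Pack,
    Repr::IntArray,
    Repr::LargeHash,
    Repr::LargeSet,
    Repr::LargeZset,
    Repr::ChunkedList,
];

/// Map a representation to exactly one reported name.
/// No wildcard arm: a new variant must be given a name to compile.
fn encoding_name(repr: Repr) -> &'static str {
    match repr {
        Repr::InlineInt => "int",
        Repr::InlineStr => "embstr",
        Repr::OutOfLineStr => "raw",
        Repr::Pack => "listpack",
        Repr::IntArray => "intset",
        Repr::LargeHash | Repr::LargeSet => "hashtable",
        Repr::LargeZset => "skiplist",
        Repr::ChunkedList => "quicklist",
    }
}
```

Because the function takes only the representation, no threshold value can influence the reported name.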
+
+### Name derives from representation, not from thresholds
+
+- The reported name is a pure function of the active internal representation, so
+  reconfiguring an ADR-0018 threshold (which changes WHEN a value converts) never
+  changes the name reported for a value that has not converted. Two keys of the
+  same logical type report different names exactly when their representations
+  differ (for example a 50-member zset listpack vs a 5000-member zset skiplist),
+  matching the oracle (ADR-0009).
+
+### DEBUG OBJECT field synthesis
+
+- DEBUG OBJECT emits a line with `encoding:<name>` from the same function above,
+  so OBJECT ENCODING and DEBUG OBJECT always agree on the name. Fields IronCache
+  can compute honestly are synthesized: `serializedlength` from the value's
+  encoded byte size, and for `quicklist` keys `ql_nodes` (the live chunk count)
+  and `ql_avg_node` (elements per chunk), both derived from IronCache's chunking
+  (#135) rather than a Redis node count [redis-quicklist-node-32-bytes]. Fields
+  that name a Redis internal that IronCache does not have are omitted rather
+  than emitted as fabricated zeros, so no test asserts on an invented internal.
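A sketch of the omit-rather-than-fabricate rule: optional fields are modeled as `Option` and written only when present. The helper `debug_object_line`, the field order, and the two-decimal formatting of `ql_avg_node` are assumptions for illustration; the real line carries whatever fields #95 finds load-bearing.

```rust
/// Build an illustrative DEBUG OBJECT line. `ql` carries (ql_nodes, ql_avg_node)
/// and is `Some` only for keys whose reported encoding is `quicklist`.
fn debug_object_line(encoding: &str, serialized_len: usize, ql: Option<(usize, f64)>) -> String {
    let mut line = format!("encoding:{encoding} serializedlength:{serialized_len}");
    // Fields without a real backing value are left out, never zero-filled.
    if let Some((nodes, avg)) = ql {
        line.push_str(&format!(" ql_nodes:{nodes} ql_avg_node:{avg:.2}"));
    }
    line
}
```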
+
+### assert_encoding wiring and rejected alternatives
+
+- The conformance suite adopts Valkey's assert_encoding helper, which runs OBJECT
+  ENCODING and matches the expected name from the same vocabulary
+  [valkey-assert-encoding-vocab], treating a mismatch as a correctness failure
+  (#95). Reporting native names (`btree-zset`, `radix-hash`) even behind a flag
+  is rejected: it would fork the test corpus and defeat compatibility. A separate
+  read-only native-introspection verb for IronCache's own debugging is left open
+  and would never be OBJECT ENCODING.
+
+## Open questions
+
+- The exact embstr-vs-raw byte boundary (the inline-value threshold shared with
+  #111), and whether any string ever reports `raw` below it.
+- Which DEBUG OBJECT fields beyond serializedlength/ql_nodes/ql_avg_node are
+  load-bearing for the target suites, surfaced as #95 enumerates them.
+- Whether a native-name introspection command is worth adding for debugging
+  (separate verb, never OBJECT ENCODING).
+
+## Acceptance and test hooks
+
+- Every internal representation maps to exactly one name from {embstr, int, raw,
+  listpack, intset, hashtable, skiplist, quicklist} (a documented total-function
+  table, unit-tested for totality).
+- OBJECT ENCODING and DEBUG OBJECT agree on the name for the same key, and the
+  name does not change when only thresholds are reconfigured (property test).
+- assert_encoding passes against IronCache across the size ladder and at every
+  conversion boundary [valkey-assert-encoding-vocab] (#95/#97/#98).
+- A `quicklist` key reports ql_nodes equal to the live chunk count, derived from
+  IronCache chunking [redis-quicklist-node-32-bytes] (#135).
+
+## References
+
+- ADR-0009, ADR-0018; issues #35, #37, #111, #112, #113, #134, #135, #95, #97,
+  #98, #150.
+- Claims: [valkey-assert-encoding-vocab], [redis-quicklist-node-32-bytes],
+  [redis-embstr-threshold-44], [redis-sds-header-variants], [redis-intset-layout].
@@ -119,3 +119,9 @@ Specs added as the M1 milestone progresses.
   monoio/glommio/tokio swappable (#27).
 - [IOURING_DATAPATH.md](IOURING_DATAPATH.md): the Linux io_uring net fast path
   (per-shard ring, registered fixed buffers, multishot + one-shot fallback) (#28).
+- [ZSET_LARGE.md](ZSET_LARGE.md): the large sorted-set representation (ordered
+  index plus parallel member->score map; final structure deferred to #136) (#134).
+- [LIST_LARGE.md](LIST_LARGE.md): the large list (quicklist-equivalent chunked
+  listpack deque, O(1) head/tail, ~8KB node sizing) (#135).
+- [OBJECT_ENCODING_MAPPING.md](OBJECT_ENCODING_MAPPING.md): the internal-repr to
+  OBJECT ENCODING name map and DEBUG OBJECT field synthesis (#40).
@@ -0,0 +1,98 @@
+# Design: Sorted-set large representation (ordered index plus member map)
+
+Issue: #134. Decisions: ADR-0018 (encoding thresholds), ADR-0005 (per-shard
+unsynchronized map), ADR-0009 (behavioral equivalence). Related: #113 (small
+listpack zset), #35 (index), #40 (OBJECT ENCODING name), #136
+(large-collection-bakeoff), #128 (zset command semantics), #8 (harness).
+
+## Goal and scope
+
+A sorted set that outgrows the small listpack container (ADR-0018) needs a
+structure that serves both an ordered range/rank query and an O(1) member point
+lookup, the two access patterns the zset command set demands. This spec fixes the
+two-structure shape, the sync invariant that keeps them consistent on one core,
+the ordering contracts behind ZRANGEBYSCORE and ZRANGEBYLEX, and which knobs are
+harness parameters rather than fixed numbers. Scope is the representation above
+the listpack threshold only. The promotion thresholds are ADR-0018/#37, the small
+container is #113, and the final choice of ordered-index structure is the #136
+bake-off; this spec sets the provisional baseline and the contract every
+candidate must satisfy, not the winner.
+
+## Design
+
+### Two structures, one value
+
+- The large zset is a dual structure mirroring Redis: an ordered index keyed by
+  (score, member) for range and rank, plus a parallel hashmap from member to
+  score for O(1) ZSCORE and ZADD score-update [redis-zset-skiplist-plus-ht]. The
+  member bytes are stored once and shared between both views, so a member is not
+  duplicated per structure [redis-zset-skiplist-plus-ht]. The whole value lives
+  in one kvobj on one core (ADR-0005), so neither structure takes a lock.
+- The provisional ordered index is a skiplist [redis-zset-skiplist-plus-ht]. It
+  is provisional because the #136 bake-off evaluates a cache-conscious B-tree and
+  an ART against it on throughput-per-core and bytes-per-element; a B-tree packs
+  many keys per cache line versus the skiplist's one element per tower node
+  [skiplist-vs-btree-cache], and ART keeps keys ordered at a low per-key byte
+  cost [art-adaptive-radix-tree-icde13]. This spec commits to the trait the index
+  sits behind, not the structure that wins.
+
+### The sync invariant
+
+- Every member appears in exactly one of two states: present in BOTH the ordered
+  index and the member map with the same score, or present in NEITHER. There is
+  no transient single-structure state observable to a command, because all
+  mutation runs inline on the owning core (ADR-0005) with no yield point inside a
+  zset write. A ZADD that updates a score is a remove-then-reinsert in the
+  ordered index plus an in-place score rewrite in the map; ZREM deletes from
+  both. A property test asserts the two views agree on membership and score
+  after every operation (the sync invariant).
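The invariant can be sketched with standard-library stand-ins: a `BTreeSet` in place of the provisional skiplist and a `HashMap` as the member map. `LargeZset`, `Score`, and `in_sync` are hypothetical names, and this sketch copies the member bytes into both structures for brevity, whereas the design stores them once and shares them.

```rust
use std::cmp::Ordering;
use std::collections::{BTreeSet, HashMap};

/// Total-order wrapper so f64 scores can key an ordered set.
#[derive(Clone, Copy, Debug)]
struct Score(f64);

impl PartialEq for Score {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Score {}
impl PartialOrd for Score {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Score {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}

#[derive(Default)]
struct LargeZset {
    ordered: BTreeSet<(Score, Vec<u8>)>, // stand-in for the ordered index
    scores: HashMap<Vec<u8>, Score>,     // member -> score map
}

impl LargeZset {
    /// Insert or update: rewrite the map entry, then remove-and-reinsert
    /// in the ordered index if the member already existed.
    fn zadd(&mut self, member: &[u8], score: f64) {
        let new = Score(score);
        if let Some(old) = self.scores.insert(member.to_vec(), new) {
            self.ordered.remove(&(old, member.to_vec()));
        }
        self.ordered.insert((new, member.to_vec()));
    }

    /// Delete from both structures; returns whether the member existed.
    fn zrem(&mut self, member: &[u8]) -> bool {
        match self.scores.remove(member) {
            Some(old) => self.ordered.remove(&(old, member.to_vec())),
            None => false,
        }
    }

    /// Point lookup through the map only, with no ordered-index walk.
    fn zscore(&self, member: &[u8]) -> Option<f64> {
        self.scores.get(member).map(|s| s.0)
    }

    /// The sync invariant: both views agree on membership and score.
    fn in_sync(&self) -> bool {
        self.ordered.len() == self.scores.len()
            && self.ordered.iter().all(|(s, m)| self.scores.get(m) == Some(s))
    }
}
```

A property test in this style would call `in_sync` after every randomly generated operation.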
+
+### Ordering: ZRANGEBYSCORE vs ZRANGEBYLEX
+
+- The ordered index is sorted by (score, member): primarily ascending score,
+  ties broken by member byte order, the ordering Redis defines for a skiplist
+  zset [redis-zset-skiplist-plus-ht]. ZRANGEBYSCORE, ZRANK, and ZRANGE by index
+  walk this order directly, forward or reversed.
+- ZRANGEBYLEX assumes all members share one score and returns a purely
+  lexicographic member range. Because the index already breaks score ties by
+  member bytes, the equal-score run is contiguous and already in member order, so
+  ZRANGEBYLEX is a sub-scan of that run with no second index. Its result is
+  defined only when scores are equal, matching the oracle (ADR-0009, #128).
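The ordering and the lexicographic sub-scan can be sketched over a sorted slice, which stands in for the ordered index. `zset_order` and `range_by_lex` are hypothetical names, and the range here is inclusive at both ends for simplicity; the command's own bound syntax is out of scope for this sketch.

```rust
use std::cmp::Ordering;

/// (score, member) order: ascending score, ties broken by member bytes.
fn zset_order(a: &(f64, Vec<u8>), b: &(f64, Vec<u8>)) -> Ordering {
    a.0.total_cmp(&b.0).then_with(|| a.1.cmp(&b.1))
}

/// Inclusive lexicographic range over an index already sorted by `zset_order`.
/// Assumes every member shares one score, so member order is the index order.
fn range_by_lex<'a>(index: &'a [(f64, Vec<u8>)], min: &[u8], max: &[u8]) -> Vec<&'a [u8]> {
    // Binary-search both ends of the contiguous run; no second index needed.
    let start = index.partition_point(|(_, m)| m.as_slice() < min);
    let end = index.partition_point(|(_, m)| m.as_slice() <= max);
    if start >= end {
        return Vec::new();
    }
    index[start..end].iter().map(|(_, m)| m.as_slice()).collect()
}
```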
+
+### Level and fanout as harness parameters
+
+- For the skiplist baseline the max level and the level-promotion probability are
+  harness parameters (#8), not fixed here; for a B-tree or ART candidate the
+  analogous knob is node fanout. They are swept in the #136 bake-off because the
+  right value depends on IronCache's value layout and the thread-per-core engine,
+  so numbers reported in other papers do not transfer directly.
+
+## Open questions
+
+- The final ordered-index structure (skiplist vs cache-conscious B-tree vs ART),
+  decided by #136 on throughput-per-core and bytes-per-element.
+- Whether the member map is a distinct per-zset hashbrown table or folds into the
+  ordered index nodes once the structure is chosen (#136), and the score-update
+  path's exact cost under each.
+- Whether a maintained rank/size annotation is worth its bytes for O(log n) ZRANK
+  versus a counted walk, tuned on the harness (#8).
+
+## Acceptance and test hooks
+
+- After any ZADD/ZREM/ZINCRBY the ordered index and the member map agree on
+  membership and score for every member (the sync invariant, property test).
+- ZSCORE is a single member-map lookup with no ordered-index walk; ZRANGEBYSCORE
+  and ZRANK walk the (score, member) order and match the oracle (#97/#98).
+- ZRANGEBYLEX over an equal-score set returns the lexicographic member range and
+  matches the oracle; mixed scores follow the oracle's defined behavior
+  (#97/#98, #128).
+- OBJECT ENCODING reports `skiplist` for the large zset regardless of the chosen
+  internal structure [valkey-assert-encoding-vocab] (ADR-0009, name map in #40).
+
+## References
+
+- ADR-0005, ADR-0009, ADR-0018; issues #113, #35, #40, #136, #128, #37, #8,
+  #97, #98.
+- Claims: [redis-zset-skiplist-plus-ht], [redis-zset-max-listpack-entries-128],
+  [skiplist-vs-btree-cache], [art-adaptive-radix-tree-icde13],
+  [valkey-assert-encoding-vocab].