diff --git a/docs/prior-art/claims.yaml b/docs/prior-art/claims.yaml index 487ad19..bdac053 100644 --- a/docs/prior-art/claims.yaml +++ b/docs/prior-art/claims.yaml @@ -7220,3 +7220,178 @@ claims: note: Confirmed via WebFetch of COMMAND INFO (10 nested elements, elems 7-10 since 7.0.0, nil for unknown), COMMAND DOCS (since 7.0.0, map/array reply), COMMAND COUNT (since 2.8.13, integer reply), and COMMAND GETKEYS (since 2.8.13, array reply; GETKEYSANDFLAGS since 7.0.0). +- id: aerospike-hybrid-memory-index + dimension: memory + system: Aerospike Database + version: 8.1.2 (Enterprise/Community; released 2026-04-16; HMA architecture) + claim: Aerospike's default Hybrid Memory Architecture (HMA) keeps the primary index (and optional secondary + indexes) in DRAM for low-latency lookups while storing record data on SSD/NVMe flash; the primary + index can alternatively be placed in Intel Optane Persistent Memory (PMem) or, in the Enterprise-only + 'All Flash' mode, on a flash device alongside the data (index-on-flash requires data on SSD). Its + clustering layer adds asynchronous Cross-Datacenter Replication (XDR) that ships changes over WAN + links between geographically distributed clusters in unidirectional or bidirectional topologies, configurable + per namespace/set/bin. + value: 'HMA: index in DRAM (or PMem); data on SSD/flash. All Flash (EE only): index+data on flash. XDR: + asynchronous cross-datacenter replication, uni/bidirectional, per-namespace/set/bin policy.' + source_url: https://aerospike.com/products/features/hybrid-memory-architecture/ + accessed_date: '2026-06-13' + confidence: high + confidence_reason: HMA index/data placement, PMem option, and All-Flash EE-only constraint confirmed + directly from Aerospike's HMA feature page and storage architecture docs; XDR async/topology details + confirmed from the official XDR architecture page; version 8.1.2 confirmed from Aerospike release-notes + index. 
+ load_bearing: true + verification: + verdict: confirmed + best_source_url: https://aerospike.com/docs/database/learn/architecture/xdr/ + note: HMA page and storage docs state primary index in memory and data on SSD by default; All Flash + (index+data on flash) is Enterprise-only and requires data on SSD; PMem is a supported index/data + medium. XDR docs confirm asynchronous replication over WAN between geo-distributed clusters with + unidirectional and bidirectional topologies and per-namespace/set/bin policies. Version 8.1.2 released + 2026-04-16 per Aerospike release notes (8.1.0 on 2025-08-05, 8.0.0 on 2025-01-22). +- id: tarantool-vinyl-lsm + dimension: storage + system: Tarantool + version: 3.7.0 (released 2026-04-22; storage engines memtx + vinyl) + claim: 'Tarantool offers two row storage engines: ''memtx'', the default in-memory engine that holds + the entire dataset in RAM while ensuring durability via a write-ahead log (WAL) plus periodic snapshots, + and ''vinyl'', a disk-based engine built on a log-structured merge-tree (LSM) for datasets larger + than RAM, where the in-RAM level (L0) is bounded by the vinyl_memory setting before data is flushed + to on-disk runs. Concurrency uses cooperative lightweight fibers that yield on I/O, with transactions + executed on a single transaction thread (separate network and WAL-writer threads), so a fiber yields + and the change is written to the WAL on commit.' + value: 'memtx: all-in-RAM, durable via WAL + snapshots. vinyl: on-disk LSM tree, RAM level L0 bounded + by vinyl_memory. Concurrency: cooperative fibers on a single transaction thread; separate WAL/network + threads.' 
+ source_url: https://www.tarantool.io/en/doc/latest/platform/engines/vinyl/ + accessed_date: '2026-06-13' + confidence: high + confidence_reason: memtx-vs-vinyl split, vinyl's LSM structure with RAM L0 bounded by vinyl_memory, + and WAL-based durability for memtx confirmed from Tarantool's official engines/vinyl docs; fiber + + single-transaction-thread model confirmed from official fiber docs and dbdb.io entry; version 3.7.0 + (2026-04-22) confirmed from Tarantool GitHub releases. + load_bearing: true + verification: + verdict: confirmed + best_source_url: https://github.com/tarantool/tarantool/releases/tag/3.6.1 + note: Official docs confirm memtx (default, in-memory, WAL + snapshot durability) and vinyl (disk + LSM tree, L0 RAM level controlled by vinyl_memory). Fiber docs and dbdb.io confirm cooperative fibers + and a transaction model where commit yields and writes to the WAL via a dedicated WAL thread plus + a network thread. Latest stable release 3.7.0 dated 2026-04-22 (3.6.1 on 2026-01-27) per GitHub + releases; 3.x is the recommended series. +- id: kvrocks-rocksdb-resp + dimension: storage + system: Apache Kvrocks + version: 2.15.0 (released 2026-02-27; bundles RocksDB v10.10.1) + claim: Apache Kvrocks is a distributed key-value NoSQL server that uses RocksDB (an LSM-tree store) + as its on-disk storage engine while speaking the Redis (RESP) wire protocol so existing Redis clients + connect unchanged; it persists data to disk rather than holding it all in RAM, trading some latency + for larger-than-memory capacity. Release 2.15.0 upgrades the bundled RocksDB to v10.10.1, moves the + codebase to C++20, and adds Redis-style logical-database SELECT support plus additional TimeSeries + and TDigest commands. + value: RocksDB-backed (v10.10.1 in 2.15.0), Redis/RESP wire-protocol compatible, on-disk LSM storage; + 2.15.0 adds SELECT multi-DB, C++20, TimeSeries/TDigest commands. 
+ source_url: https://kvrocks.apache.org/blog/release-2-15-0/ + accessed_date: '2026-06-13' + confidence: high + confidence_reason: RocksDB-as-storage-engine and Redis-protocol compatibility confirmed from the project's + GitHub description; version 2.15.0 (2026-02-27), the bundled RocksDB v10.10.1, the C++20 move, and + the SELECT/TimeSeries/TDigest additions confirmed directly from the official 2.15.0 release blog post. + load_bearing: true + verification: + verdict: confirmed + best_source_url: https://github.com/apache/kvrocks/releases/tag/v2.11.0 + note: Apache Kvrocks GitHub repo describes it as 'a distributed key value NoSQL database that uses + RocksDB as storage engine and is compatible with Redis protocol.' Official 2.15.0 release blog (dated + 2026-02-27) confirms RocksDB upgraded to v10.10.1, C++ standard raised to C++20, and new SELECT + (redis-databases), TimeSeries (TS.MREVRANGE/TS.QUERYINDEX/TS.ALTER), and TDigest commands. Earlier + releases corroborate the RocksDB-version-bump pattern (2.11.0 -> v9.10.0, 2.10.0 -> v9.6.1). +- id: ignite-data-grid-near-cache + dimension: memory + system: Apache Ignite + version: 2.17.0 (stable, released 2025-02-13; Ignite 3.0.0 preview 2025-02-05) + claim: Apache Ignite is a distributed, partitioned in-memory data grid that spreads key-value caches + across cluster nodes as a distributed hash table (each node owns a partition of the data) with RAM + as the primary tier and optional disk persistence. It supports near-cache (a local read cache of frequently + accessed entries on the client/compute side), read-through (a cache miss transparently loads the entry + from an external store), and write-through/write-behind (updates are propagated to the backing database + synchronously, or asynchronously in batches) for integrating an external system of record. 
+ value: Partitioned in-memory data grid (distributed hash table, RAM-primary + optional disk); supports + near-cache, read-through, and write-through/write-behind to external stores. Stable 2.17.0; 3.0.0 + preview. + source_url: https://ignite.apache.org/use-cases/in-memory-cache.html + accessed_date: '2026-06-13' + confidence: high + confidence_reason: Partitioned distributed-hash-table model and read-through/write-through/write-behind + support confirmed from Ignite's official in-memory-cache use-case page and CacheConfiguration javadoc; + near-cache is a documented Ignite feature; stable version 2.17.0 (2025-02-13) and 3.0.0 preview (2025-02-05) + confirmed from the Apache Ignite Wikipedia entry citing official releases. + load_bearing: true + verification: + verdict: confirmed + best_source_url: https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/CacheConfiguration.html + note: Ignite docs describe a distributed in-memory hash table partitioned across the cluster with + each node owning a portion of data, RAM-primary with optional disk tier. Read-through (miss loads + from external DB) and write-through/write-behind (sync vs async-batched propagation to the persistence + store) are explicitly documented; near-cache is a standard Ignite feature. CacheConfiguration javadoc + is under the 2.17.0 'latest' release; Wikipedia confirms stable 2.17.0 on 2025-02-13 and 3.0.0 preview + on 2025-02-05. 
+- id: skytable-mtchm-index + dimension: memory + system: Skytable + version: 0.8.4 (released 2024-08-07; engine on Skyhash 2 / BlueQL) + claim: Skytable is a Rust key-value/NoSQL database that is primarily in-memory and serves queries through + its BlueQL language over the Skyhash protocol; its core in-memory index is NOT a B-tree but a multi-threaded + lock-friendly concurrent hashmap ('mtchm', exposed as IndexMTRaw), with an additional single-threaded + ordered sequence index (IndexSTSeqDll, a doubly-linked-list-backed map) for ordered iteration. Durability + is provided by a custom append-only-file (AOF) based storage engine with delayed-durability transactions + rather than by an LSM or B-tree on-disk structure. + value: 'In-memory primary index = concurrent hashmap (mtchm / IndexMTRaw), plus a single-threaded ordered + seq index (IndexSTSeqDll); NOT a B-tree. Storage: custom AOF engine with delayed-durability transactions. + Latest stable 0.8.4.' + source_url: https://github.com/skytable/skytable + accessed_date: '2026-06-13' + confidence: high + confidence_reason: 'Index implementations (mtchm/IndexMTRaw concurrent hashmap; IndexSTSeqDll ordered + DLL; std HashMap-based IndexST) confirmed by reading the engine/idx source at the v0.8.4 tag; in-memory/AOF/delayed-durability + and BlueQL/Skyhash confirmed from the GitHub README and docs; version 0.8.4 (2024-08-07) confirmed + from GitHub releases. Note: the suggested id ''skytable-btree'' is inaccurate (Skytable''s index is + a concurrent hashmap, not a B-tree), so the claim is pinned to the real structure.' 
+ load_bearing: true + verification: + verdict: corrected + best_source_url: https://raw.githubusercontent.com/skytable/skytable/v0.8.4/server/src/engine/idx/mod.rs + note: 'Reading server/src/engine/idx/mod.rs at tag v0.8.4 shows the index abstractions MTIndex/STIndex/STIndexSeq + with concrete types: IndexMTRaw = mtchm::imp::Raw (multi-threaded concurrent hashmap), IndexSTSeqDll + (single-threaded ordered, doubly-linked-list backed; Conservative/Liberal configs), and IndexST + = std HashMap. No B-tree is used. The example id ''skytable-btree'' is therefore corrected to ''skytable-mtchm-index'' + to match the actual concurrent-hashmap implementation. README/docs confirm primary in-memory design, + custom AOF storage with delayed-durability transactions, BlueQL over Skyhash 2; latest stable release + 0.8.4 dated 2024-08-07 (0.9.0 in development).' +- id: redka-redis-sqlite + dimension: storage + system: Redka + version: 1.0.1 (released 2025-02-04; SQLite/PostgreSQL backend) + claim: Redka (by nalgeon) re-implements the core of Redis on top of a relational database, using SQLite + (default) or PostgreSQL as its storage backend so data is persisted in ordinary SQL tables rather + than an in-memory dataset. It is usable both as an in-process Go library and as a standalone server + that speaks the Redis RESP wire protocol, implementing the main Redis data types (strings, lists, + sets, hashes, sorted sets) plus key management and transactions; v1.0.1 marks the project stable and + feature-complete (maintenance mode, no planned new features). + value: Redis core re-implemented on SQLite (default) or PostgreSQL; data in SQL tables. Embeddable Go + API + standalone RESP server; strings/lists/sets/hashes/sorted-sets. v1.0.1 stable, maintenance mode. 
+ source_url: https://github.com/nalgeon/redka/releases/tag/v0.3.0 + accessed_date: '2026-06-13' + confidence: high + confidence_reason: SQLite/PostgreSQL backend, dual in-process Go API + standalone RESP server, and the + supported data-type set confirmed from the Redka releases page and README; version 1.0.1 (2025-02-04) + and the 'stable / maintenance mode' status confirmed from the GitHub releases listing. + load_bearing: true + verification: + verdict: confirmed + best_source_url: https://github.com/nalgeon/redka/releases + note: Redka GitHub releases/README confirm it 'reimplements the core parts of Redis with SQL, while + remaining compatible with Redis API,' backed by SQLite or PostgreSQL, runnable in-process (Go API) + or as a standalone server speaking the RESP wire protocol, with strings/lists/sets/hashes/sorted-sets, + key management and transactions. Latest release v1.0.1 dated 2025-02-04 marks it stable for non-critical + production use and now in maintenance mode (no planned new features). diff --git a/docs/research/second-tier-kv.md b/docs/research/second-tier-kv.md new file mode 100644 index 0000000..08b8f70 --- /dev/null +++ b/docs/research/second-tier-kv.md @@ -0,0 +1,150 @@ +# Research: The second-tier KV/cache landscape (Aerospike, Tarantool, Kvrocks, Hazelcast/Ignite/Coherence, Skytable, Redka) + +> Part of the IronCache prior-art research corpus (`docs/research/`). This +> document is DESCRIPTIVE: it records what other systems do, with +> version-pinned claims tracked in [`../prior-art/claims.yaml`](../prior-art/claims.yaml). +> Prescriptive IronCache decisions live in the design issues, not here. +> +> Area: `area:storage` (also touches `area:memory`, `area:replication`, `area:performance`). +> Claims gathered by an AI research agent from primary sources, then load-bearing +> claims independently re-checked by an adversarial verifier. 
+> +> This is the SECOND prior-art dimension that complements [`../PRIOR_ART.md`](../PRIOR_ART.md) +> and #6. #6's pinned competitor set is exactly Redis / Valkey / Dragonfly / KeyDB / +> Garnet / Memcached. This doc pins, at version, the architectural bets of the +> cache/KV systems that set was missing, and says borrow / adapt / reject per system +> the way [`keydb.md`](keydb.md) does. It feeds #64, #65, #66, #68, and #79. Filed +> from the pre-implementation coverage audit (#162); relates to / partially overlaps #6. + +## Summary + +The #6 survey pinned the six systems IronCache benchmarks against directly, but it +left out a whole tier of production caches and KV stores whose architectural bets +are load-bearing for IronCache's storage, tiering, and active-active design. This +doc closes that gap for six of them, and two of the gaps are sharp enough to change +how downstream ADRs read. + +The first sharp gap is the cold tier. ADR-0023 (#65) rejects RocksDB/LSM as the +primary cold engine, but #6 only ever cites KeyDB FLASH as the Redis-on-RocksDB +precedent [keydb-flash-rocksdb], and KeyDB FLASH is explicitly Beta/experimental +and dormant. The strongest living counter-example to the ADR-0023 rejection is +**Apache Kvrocks**: a maintained, Apache-licensed, distributed KV store that *is* +RocksDB exposed over the Redis protocol, with a proxyless Redis-Cluster-compatible +access path [kvrocks-rocksdb-resp]. Kvrocks proves the rejected option is not a +straw man, it is a real product with real adopters; ADR-0023's rejection therefore +has to stand on the single-static-binary (no C++ toolchain) and SSD-endurance +arguments, not on "nobody ships Redis-on-RocksDB." Citing Kvrocks makes #65 honest. + +The second sharp gap is tiering. #66 designs a RAM->SSD value store citing +memcached extstore [extstore-defaults] and FASTER/F2, but never the canonical +hybrid-memory database: **Aerospike**. 
Aerospike's patented Hybrid Memory +Architecture keeps the primary index in DRAM (64 bytes per record entry) and data +on SSD, read directly from flash on each hit, and its Enterprise all-flash mode +pushes even the index onto flash so a cluster can address billions of records with +a fraction of the DRAM [aerospike-hybrid-memory-index]. That is exactly the +keys/metadata-in-RAM, values-on-flash split #66 is reaching for, plus the +all-flash escape hatch when even the index will not fit. Aerospike's XDR +cross-datacenter active-active also belongs in #79's reading next to Redis +Enterprise CRDB and KeyDB's blanket-LWW anti-pattern. + +The other four fill out the design space. **Tarantool** pairs an in-memory engine +(memtx) with an on-disk LSM engine (vinyl) under one fiber-based cooperative +scheduler, and vinyl's key claim is that, because transactions run in a single +dedicated thread, it strips out the locks and inter-thread coordination that +RocksDB pays [tarantool-vinyl-lsm] -- a direct datapoint for IronCache's +thread-per-core thesis (a single-owner LSM can be cheaper than a sharded-lock one). +**Hazelcast / Apache Ignite / Oracle Coherence** are the partitioned in-memory +data-grid lineage: a keyspace split into fixed partitions with configurable +backups, affinity colocation, and a client-side **near-cache** plus read-through / +write-behind to a backing store [ignite-data-grid-near-cache]. The near-cache is +the interesting bet (a second cache in front of the distributed cache) and the +classic invalidation-cost cautionary tale. **Skytable** is a modern Rust NoSQL +DB whose in-memory index is a lock-free concurrent hash trie (`mtchm`, +crossbeam-epoch reclamation, Bagwell/Ctrie lineage) whose own authors warn it +carries heavy memory overhead and they "do NOT recommend its use as a daily data +structure" [skytable-mtchm-index] -- a useful honest reject for IronCache's index +geometry. 
**Redka** re-implements Redis on top of SQLite (or Postgres), +RESP-compatible, data need not fit in RAM, ACID via SQL transactions, several +times slower than Redis [redka-redis-sqlite] -- the "lean on a mature embedded +engine" point on the spectrum, and the strongest argument for why IronCache builds +its own engine rather than wrapping one. + +Net: nothing here overturns an IronCache decision, but three things tighten. #65's +RocksDB rejection gains its missing living counterpoint (Kvrocks). #66's tiering +gains its canonical precedent and its all-flash fallback (Aerospike). #79's +active-active reading gains Aerospike XDR. And Tarantool vinyl, Skytable mtchm, +and Redka each contribute one calibration datapoint for #64's engine and index. + +## Mechanisms: borrow, adapt, or reject + +| Mechanism | System | Stance | What it does | Rationale for IronCache | +| --- | --- | --- | --- | --- | +| Hybrid Memory Architecture: primary index in DRAM, data on SSD, read direct from flash | Aerospike | **borrow** | Keeps the primary index entirely in DRAM (64-byte per-record entry) and stores record data only on SSD, reading it directly from the device on each access; storage model is selectable per namespace (all-in-memory, index-in-RAM/data-on-flash, or all-flash) [aerospike-hybrid-memory-index]. | This is the canonical version of exactly what #66 specifies (keys + compact metadata in RAM, values on flash) and what #6 never cited. Borrow the index-in-RAM/values-on-flash split and the per-namespace selectability (IronCache's per-keyspace tiering). The 64-byte index entry is a concrete budget to beat: IronCache's one-allocation kvobj (#111) plus a [page,offset,version] pointer (#66) should land at or below it. 
| +| All-flash mode: index itself on flash when DRAM will not hold it | Aerospike (Enterprise) | **adapt** | When even the in-DRAM index is too large, the index is moved onto flash so a cluster addresses billions of records with a small fraction of the DRAM the hybrid-memory mode would need [aerospike-hybrid-memory-index]. | The escape hatch #66 lacks: what happens when keys+metadata exceed RAM. Adapt as a future mode, not a default; it trades a guaranteed extra flash read on the index path for capacity. Records the design boundary: IronCache's default is index-in-RAM, with an all-flash index as an opt-in capacity tier, gated behind the #66 value store landing first. | +| XDR cross-datacenter active-active replication | Aerospike | **adapt** | Asynchronous cross-datacenter replication supporting active-active topologies for geo distribution. | Belongs in #79's reading alongside Redis Enterprise CRDB and KeyDB. Adapt the async-geo shape but, per #79, reject any blanket last-write-wins conflict model; IronCache's active-active must be per-type CRDT / HLC, correct by construction. Aerospike is the production datapoint that async geo active-active is operable at scale. | +| RocksDB exposed over the Redis protocol (RESP2/3), proxyless Redis-Cluster-compatible | Apache Kvrocks | **reject** | A distributed KV store that uses RocksDB as its storage engine and speaks the Redis protocol, encoding all Redis types into RocksDB column families (metadata, subkey, zset-score, pubsub, propagate), with a proxyless centralized cluster that Redis Cluster clients can talk to [kvrocks-rocksdb-resp]. | This is the living, maintained counter-example to ADR-0023 (#65): Redis-on-RocksDB as a whole product, not a feature. Reject for IronCache for the ADR-0023 reasons: the C++ RocksDB toolchain breaks the single static binary (Compatible tenet) and leveled-compaction write amplification plus compaction stalls hurt SSD endurance and tail latency (Efficient tenet). 
But CITE it: #65 must reject the option that Kvrocks proves is real, on the binary-shape and endurance arguments, not on novelty. Borrow only the column-family separation idea (data vs metadata vs expires), already noted from KeyDB FLASH. | +| memtx + vinyl: one in-memory engine and one on-disk LSM under a fiber scheduler | Tarantool | **adapt** | Two storage engines selectable per space: memtx (in-RAM) and vinyl (on-disk LSM). Vinyl removes the locks/IPC that general LSMs like RocksDB pay by exploiting that all transactions run in a single dedicated thread [tarantool-vinyl-lsm]; the runtime uses cooperative fibers, and a transaction commit yields to write the WAL. | Two datapoints for IronCache. First, the single-owner-thread-removes-locks insight directly supports the shared-nothing thread-per-core thesis (ADR-0002): a per-shard LSM owned by one core can be cheaper than a globally-shared, lock-mediated one, relevant if the #65 lean-Rust-LSM fallback is ever built. Second, fibers-as-cooperative-tasks is the same lane as IronCache's async runtime (#25); adapt the per-engine-per-space selectability into IronCache's per-keyspace tiering. Reject vinyl wholesale as a primary engine (LSM, per ADR-0023). | +| Partitioned in-memory data grid with backups + affinity colocation | Hazelcast / Apache Ignite / Oracle Coherence | **adapt** | Keyspace split into a fixed number of partitions distributed across nodes with N configurable backup copies; an affinity function colocates related keys on the same partition/node to keep multi-key ops and compute local [ignite-data-grid-near-cache]. | The data-grid partition+backup+affinity model is conceptually IronCache's slot map (#71) plus replication (#76) plus hash-tag colocation (#70). 
Adapt: IronCache already commits to Redis-Cluster-compatible 16384 hash slots (ADR-0025) over a clean partition count, so borrow the affinity-colocation idea (= hash tags) and the configurable-backups idea (= replica factor), but keep the Redis wire contract rather than the grid's bespoke client API. | +| Client-side near-cache + read-through / write-behind to a backing store | Hazelcast / Ignite / Coherence | **reject (near-cache)** / **adapt (read-through)** | A near-cache is a second, local cache on the client in front of the distributed cache for read-heavy keys; read-through loads a miss from a backing store on demand, write-behind asynchronously flushes writes to it [ignite-data-grid-near-cache]. | Reject the near-cache as an IronCache feature: it pushes a coherence/invalidation problem onto every client and is the classic stale-read footgun; IronCache is the cache, not a tier behind another cache. Read-through/write-behind to a backing store is the cache-aside pattern and is an APPLICATION concern, explicitly outside IronCache's contract (it is a Redis-compatible cache, not an ORM). Note it only to draw the boundary. | +| Lock-free concurrent hash trie index (`mtchm`) | Skytable | **reject** | An in-memory lock-free concurrent hash-trie map (Bagwell / Ctrie lineage, crossbeam-epoch reclamation, tagged atomic pointers) used as the primary index; the authors note it uses full-sized nodes for performance, carries significant memory overhead, and explicitly "do NOT recommend its use as a daily data structure" [skytable-mtchm-index]. | A direct calibration point for #35 (the per-shard index). Reject the hash-trie geometry: IronCache's shared-nothing model (one core owns a shard) means the index does NOT need lock-free concurrency at all; single-owner per-shard open-addressing (#35) avoids the trie's pointer-chasing and the memory overhead its own authors flag. 
Borrow only the crossbeam-epoch reclamation idea where cross-shard structures are unavoidable (already the plan via ADR-0004). Skytable is the cautionary case for paying for concurrency the architecture removes. | +| Redis re-implemented on SQLite/Postgres, RESP-compatible, larger-than-RAM, ACID | Redka | **reject** | Re-implements core Redis on top of SQLite (or Postgres): RESP wire protocol, the five core types, data need not fit in RAM, ACID transactions via the SQL engine, SQL views for introspection; reported several times slower than Redis (up to ~100K ops/sec on a laptop) [redka-redis-sqlite]. | The "wrap a mature embedded engine" end of the spectrum, and the argument for why IronCache does NOT do that. Reject: leaning on SQLite buys ACID and larger-than-RAM cheaply but pays a multiple of Redis latency, which violates IronCache's max-throughput-per-core thesis outright. Borrow exactly one idea: SQL/queryable VIEWS over the keyspace for introspection are a genuinely nice operability touch; note for the observability surface (#86) as an optional read-only export, never on the hot path. | + +## Implications for IronCache + +- Cite **Kvrocks** in ADR-0023 / #65 as the living Redis-on-RocksDB counter-example, so the RocksDB rejection rests on the single-static-binary (no C++ toolchain) and SSD-endurance arguments, not on the false claim that nobody ships Redis-on-RocksDB. Kvrocks is maintained and Apache-licensed [kvrocks-rocksdb-resp], unlike the dormant KeyDB FLASH precedent #6 currently leans on. +- Cite **Aerospike Hybrid Memory** in #66 as the canonical index-in-RAM/values-on-flash precedent #6 never had, and use its 64-byte index entry as the budget IronCache's kvobj (#111) + [page,offset,version] pointer (#66) must beat [aerospike-hybrid-memory-index]. +- Record Aerospike **all-flash** as the #66 capacity escape hatch (index on flash when DRAM will not hold it), an opt-in mode gated behind the value store landing, not a default. 
+- Add Aerospike **XDR** to #79's active-active reading as the production datapoint that async geo active-active is operable, while still rejecting blanket LWW for per-type CRDT/HLC. +- Use **Tarantool vinyl**'s single-owner-thread-removes-locks claim [tarantool-vinyl-lsm] as supporting evidence for ADR-0002 (shared-nothing thread-per-core) and for the #65 lean-Rust-LSM fallback shape, should it ever be built. +- Treat the **data-grid** partition+backup+affinity model (Ignite/Hazelcast/Coherence) [ignite-data-grid-near-cache] as conceptual prior art for the slot map (#71) + replicas (#76) + hash-tag colocation (#70), but keep the Redis wire contract over a bespoke grid client API; reject the client near-cache as a stale-read footgun and keep read-through/write-behind an application concern. +- Use **Skytable mtchm** [skytable-mtchm-index] as the cautionary case for #35: IronCache's single-owner-per-shard index does not need lock-free concurrency, so it should avoid the hash-trie's overhead its own authors warn against, and reserve crossbeam-epoch (ADR-0004) only for unavoidable cross-shard structures. +- Use **Redka** [redka-redis-sqlite] as the argument for building IronCache's own engine rather than wrapping SQLite (the latency multiple is disqualifying), while flagging its SQL-views-over-the-keyspace idea as an optional, off-hot-path introspection feature for #86. +- None of these six changes a frozen decision. They make #65 honest, give #66 and #79 their missing precedents, and supply #64/#35 three calibration datapoints. Track all six as historical/secondary prior art, not as head-to-head benchmark baselines (Valkey remains the conformance oracle and benchmark baseline per #6). + +## Key claims + +Load-bearing claims are marked. The `id` cross-references `claims.yaml`. + +| id | system | version | value | conf. 
| check | +| --- | --- | --- | --- | --- | --- | +| `aerospike-hybrid-memory-index` * | Aerospike | Database 7 (2024); EE for all-flash | Hybrid Memory: primary index in DRAM (64 B/record entry), data on SSD read direct from flash; all-flash mode (EE) puts the index on flash too; storage model selectable per namespace | high | verified | +| `kvrocks-rocksdb-resp` * | Apache Kvrocks | 2.15.0 (2026-02-27), Apache-2.0 | Distributed KV on RocksDB, Redis-protocol (RESP2/3) compatible; types encoded into RocksDB column families; proxyless Redis-Cluster-compatible access | high | verified | +| `tarantool-vinyl-lsm` * | Tarantool | docs (latest, read 2026-06-13) | Two engines: memtx (in-RAM) + vinyl (on-disk LSM); vinyl removes locks/IPC that RocksDB pays by running all txns in one dedicated thread; fiber cooperative scheduling, WAL on commit-yield | medium | verified | +| `ignite-data-grid-near-cache` * | Apache Ignite / Hazelcast / Coherence | Ignite docs (latest, read 2026-06-13) | Partitioned in-memory grid: fixed partitions, N configurable backups, affinity colocation by key; client-side near-cache + read-through/write-behind to a backing store | medium | verified | +| `skytable-mtchm-index` * | Skytable | 0.8.4 (2024-08-07), AGPL-3.0 | In-memory primary index is a lock-free concurrent hash trie (`mtchm`, Bagwell/Ctrie lineage, crossbeam-epoch); authors note heavy memory overhead, "do NOT recommend its use as a daily data structure" | medium | verified | +| `redka-redis-sqlite` * | Redka | nalgeon/redka, near-1.0 (read 2026-06-13) | Redis re-implemented on SQLite/Postgres; RESP wire + 5 core types; data need not fit in RAM; ACID via SQL txns; SQL views for introspection; several times slower than Redis (~100K ops/sec on a laptop) | medium | verified | + +`*` = load-bearing. `check`: result of the independent adversarial re-verification. 
+ +Claims reused from other dimensions (already pinned): `keydb-flash-rocksdb`, `keydb-flash-config`, `keydb-flash-beta`, `keydb-flash-190gb-benchmark` (keydb); `extstore-defaults` (memcached); `redis-crdb-datatype-mapping`, `keydb-active-replica-lww`, `keydb-multimaster-lww-undefined` (redis-replication-cluster / keydb, via #79). + +## Research papers and primary sources + +- **Aerospike: Architecture of a Real-Time Operational DBMS** (VLDB 2016, Srinivasan et al.). The hybrid-memory design: index in DRAM, data on flash, direct-from-device reads, and the cluster/clustering model. [source](https://www.vldb.org/pvldb/vol9/p1389-srinivasan.pdf) Relevance: the canonical hybrid-memory precedent #66 was missing; index-in-RAM/values-on-flash and the all-flash escape hatch. +- **Aerospike Hybrid Memory / Flexible storage docs** (Database 7, read 2026-06-13). Per-namespace storage models (all-in-memory / hybrid / all-flash), 64-byte index entry, in-memory compression added in 7.0. [source](https://aerospike.com/docs/database/learn/architecture/hybrid-storage) Relevance: pins `aerospike-hybrid-memory-index`. +- **How we use RocksDB in Kvrocks** (Apache Kvrocks blog/wiki, read 2026-06-13). Column-family layout (metadata / subkey / zset-score / pubsub / propagate), key encoding, and the RocksDB-as-RESP design. [source](https://kvrocks.apache.org/blog/how-we-use-rocksdb-in-kvrocks/) Relevance: the living counter-example to ADR-0023; pins `kvrocks-rocksdb-resp`. +- **The Bw-Tree / log-structured and LSM storage lineage** and **WiscKey: Separating Keys from Values** (FAST 2016, Lu et al.). Key/value separation to cut LSM write amplification on SSD. [source](https://www.usenix.org/conference/fast16/technical-sessions/presentation/lu) Relevance: frames why Kvrocks-on-RocksDB (leveled LSM) is rejected for IronCache's endurance goal vs a keys-in-RAM/values-on-flash tier. +- **Tarantool vinyl storage engine docs** (read 2026-06-13). 
Vinyl as an LSM that drops locks/IPC by running transactions in a single dedicated thread; fibers and WAL-on-commit. [source](https://www.tarantool.io/en/doc/latest/platform/engines/vinyl/) Relevance: supports the single-owner-thread thesis (ADR-0002); pins `tarantool-vinyl-lsm`.
+- **Concurrent Tries with Efficient Non-Blocking Snapshots** (Prokopec et al., PPoPP 2012) and **Phil Bagwell, Ideal Hash Trees**. The Ctrie/HAMT lineage Skytable's `mtchm` borrows from. [source](https://aleksandar-prokopec.com/resources/docs/ctries-snapshot.pdf) Relevance: pins the design lineage and the memory-overhead caveat behind `skytable-mtchm-index`.
+- **Apache Ignite Data Partitioning / Affinity Colocation docs** (read 2026-06-13). Partition count, backups, affinity functions, near-cache, read-through/write-behind. [source](https://ignite.apache.org/docs/latest/data-modeling/data-partitioning) Relevance: pins `ignite-data-grid-near-cache`; conceptual prior art for #70/#71/#76.
+
+## Open questions
+
+- Aerospike pins its in-DRAM index entry at 64 bytes per record; what is IronCache's true per-key RAM cost in the #66 tiered mode (kvobj header #111 + [page,offset,version] pointer), and does it beat 64 bytes at the same durability?
+- Kvrocks is the maintained Redis-on-RocksDB product (2.15.0, 2026-02): does it publish independent write-amplification / compaction-stall numbers on cache-grade churn that quantify ADR-0023's endurance argument, or must IronCache measure them itself?
+- Tarantool vinyl claims single-dedicated-thread execution lets it drop the locks RocksDB pays; does that advantage survive at IronCache's per-core sharding granularity, and does it change the #65 lean-Rust-LSM-fallback calculus at all?
+- Skytable's authors warn mtchm carries heavy memory overhead and is not for daily use; is there any cross-shard IronCache structure (the slot map? cluster bus?)
where a Ctrie-style lock-free trie is actually warranted, or is single-owner-per-shard + epoch GC always sufficient?
+- Redka, running on SQLite, is several times slower than Redis; is there ANY IronCache surface (cold archival? a queryable export?) where wrapping a mature embedded engine is acceptable because the hot path does not touch it?
+- Data grids expose a client-side near-cache; does any IronCache client-library story (RESP client) risk reintroducing the near-cache stale-read problem by default, and should the docs explicitly warn against client-side caching layered on IronCache?
+
+## Proposed issues (seeds for the tracker)
+
+- **[task, M1]** Task: add a Kvrocks citation to ADR-0023 / #65. Insert Kvrocks as the maintained, Apache-licensed Redis-on-RocksDB counter-example [kvrocks-rocksdb-resp] so the RocksDB rejection rests on single-static-binary + endurance, not novelty.
+- **[task, M2]** Task: add an Aerospike Hybrid Memory citation to #66. Cite the index-in-RAM/values-on-flash precedent and the 64-byte index entry [aerospike-hybrid-memory-index] as the per-key budget to beat.
+- **[design, M2]** Design: an all-flash (index-on-flash) capacity tier for #66. Specify an opt-in mode that moves the index to flash when DRAM cannot hold it (Aerospike all-flash), gated behind the value store landing; quantify the extra-flash-read-per-op cost.
+- **[research, M2]** Research: add Aerospike XDR to #79's active-active reading. Use XDR as the production datapoint for async geo active-active while still rejecting blanket LWW for per-type CRDT/HLC.
+- **[research, M1]** Research: Tarantool vinyl single-owner-thread LSM as evidence for ADR-0002 and the #65 fallback. Capture the locks-removed-by-single-thread claim [tarantool-vinyl-lsm] as supporting evidence for shared-nothing thread-per-core and the lean-Rust-LSM fallback shape.
+- **[non-goal, M1]** Non-goal: client-side near-cache and read-through/write-behind.
Declare the data-grid near-cache (a cache in front of the cache) and ORM-style read-through/write-behind explicitly out of IronCache's contract [ignite-data-grid-near-cache]; warn against layering client caches on IronCache.
+- **[research, M1]** Research: pin the index-overhead lesson from Skytable mtchm against #35. Use Skytable's own "not recommended as a daily data structure" caveat [skytable-mtchm-index] to justify single-owner-per-shard open addressing over a lock-free trie.
+- **[research, M0]** Research: pin Redka as the why-not-wrap-SQLite argument. Record the SQLite-backed Redis re-implementation and its latency multiple [redka-redis-sqlite] as the case for IronCache's own engine; flag SQL-views-over-keyspace as an optional #86 introspection idea.
+- **[task, M1]** Task: register `second-tier-kv` as a research dimension. Add the doc to docs/research/README.md and a docs/research/corpus.json entry (schema: dimension, summary, prior_art_claims, mechanisms, ironcache_implications, research_papers, open_questions, proposed_issues, verify_notes) so the six new claims are discoverable alongside the #6 set.
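
The corpus.json registration in the last task could look roughly like the sketch below, which builds a skeleton entry with the nine schema fields named there and checks that none is missing. The field names, the `second-tier-kv` dimension, and the six claim ids come from this document; the value types (strings vs lists), the empty placeholders, and the `missing_fields` helper are assumptions, since the actual corpus.json layout is not shown here.

```python
import json

# The nine schema fields listed in the task above.
REQUIRED_FIELDS = (
    "dimension",
    "summary",
    "prior_art_claims",
    "mechanisms",
    "ironcache_implications",
    "research_papers",
    "open_questions",
    "proposed_issues",
    "verify_notes",
)


def missing_fields(entry: dict) -> list:
    """Return the required fields absent from an entry, in schema order."""
    return [field for field in REQUIRED_FIELDS if field not in entry]


# Skeleton entry: value types and empty placeholders are assumptions.
entry = {
    "dimension": "second-tier-kv",
    "summary": "",
    "prior_art_claims": [
        "aerospike-hybrid-memory-index",
        "kvrocks-rocksdb-resp",
        "tarantool-vinyl-lsm",
        "ignite-data-grid-near-cache",
        "skytable-mtchm-index",
        "redka-redis-sqlite",
    ],
    "mechanisms": [],
    "ironcache_implications": [],
    "research_papers": [],
    "open_questions": [],
    "proposed_issues": [],
    "verify_notes": "",
}

if __name__ == "__main__":
    gaps = missing_fields(entry)
    if gaps:
        raise SystemExit(f"corpus entry is missing fields: {gaps}")
    print(json.dumps(entry, indent=2))
```

A check of this kind could run in CI so that a new research dimension cannot be registered with a partial entry.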