Conversation
940dca6 to
2f271e9
Compare
ac8e049 to
08b24cd
Compare
a337c19 to
48d5246
Compare
53f222c to
00b1503
Compare
73f40d3 to
6abd616
Compare
a7d31f5 to
d74756d
Compare
`wheel` is a `Sentinel[NW][NQ]` (NW = number of wheels, NQ = queues per wheel). Per the algorithm and the inline comment, queueIndex must be reduced modulo NQ, but add() reduced modulo `wheel.length` (== NW). With the default config (NW=2, NQ=256), inserts collapsed onto the first NW queues of the chosen wheel, leaving the remaining 254 queues unreachable by add() and breaking the cost-stamp ordering of the cost wheel for any workload with varying miss penalties. Use `wheel[wheelIndex].length` to match the algorithm and the two peer call sites (`findNextQueue` and `migrate`) that already index by NQ.
Three remaining gaps versus Algorithm 2 of the GD-Wheel paper: 1. evict() never assigned the next-non-empty index back to clockHand[0] (== C[1]). The level-1 hand never advanced and the "C[1] wrapped a whole round → migration(2)" trigger fired only when wheel[0] became completely empty. Net effect: high-cost items in wheel[1+] rarely migrated down, and evictions were biased toward whatever queue index happened to scan first. 2. migrate() drained at most NQ entries per call because its inner loop was bounded by wheel[level].length rather than running to queue-empty. Concentrated cost classes left entries stranded in the upper wheel until the next full rotation. 3. migrate()'s destination queue index used the source level's hand instead of clockHand[level-1] (the destination wheel's hand), and reduced modulo the source wheel's length instead of the destination wheel's. Observationally equivalent today (all wheels share NQ) but wrong per the paper. With all four corrections (this commit + 7ce1c9b), GDWheel becomes competitive again on Systor-style workloads with varying miss penalties — within ~5pp of Gdsf at 1M cache, beating Camp at every size on the LUNs I checked. Uniform-penalty traces are unchanged. Bug 1 was identified by an internal audit; bugs 2 and 3 were identified by walking the paper's Algorithm 2 against the current code after the audit's add() fix exposed the regression.
LocalAsyncLoadingCache.tryComputeRefresh's discard branch unconditionally fired a REPLACED notification when a concurrent write invalidated the refresh. If the refresh's loaded value equaled the racing put's value, this produced an extra notification that the sync explicit-refresh path (LocalLoadingCache.refresh) and the implicit refreshIfNeeded path (BoundedLocalCache.refreshIfNeeded) both correctly suppressed. Hoist the value-identity check ahead of the success/discard branching so it covers all paths uniformly.
Differential audit comparing matched code paths that should behave identically. Spawns one auditor per sibling pair (sync vs async, storage variants, view consistency, bulk vs single, equivalent-by- construction, internal paths) and requires a concrete witness for each divergence finding. Caught the explicit refresh REPLACED divergence fixed in the prior commit.
- activate() demotion victim: pick from the LRU end of the active list
(headActive.prev), matching the Javadoc ("oldest active entry") and
the kernel's shrink_active_list scan order. Also adopt the kernel's
PG_referenced second-chance scan: walk from the LRU end, rotating
referenced entries back to MRU (clearing the bit) and demoting the
first unreferenced one. This recovers most of the hit rate that a
pure LRU demotion would otherwise lose on stable hot sets like DS1.
- onActiveHit() drops the moveToHead reorder in favor of just marking
the entry referenced, matching the kernel's mark_page_accessed which
never reorders the active list on hits.
- evict() refault snapshot: read currentNonResidentAge() before the
evictions++ increment so the stored shadow matches the kernel's
pre-increment snapshot in workingset_eviction.
- refaultDistance(): drop Math.abs; both counters are monotonic so the
subtraction is always non-negative.
- Constructor: also require maxActive > 0 (zero would corrupt the
active list on the first activation by demoting the sentinel).
The canonical Song Jiang reference (lirs.c) and Chen Zhong's replace_lirs_base.cc both unconditionally call addLruListHead after the miss-class dispatch, which places every newly-referenced page on stack S — including HIR warmup misses. The simulator's onHirWarmupMiss only put the node on Q, delaying LIR promotion by one access for entries that enter during the HIR warmup phase. With this fix and non-resident-multiplier set to match the C's MAX_S_LEN, the simulator matches both C references bit-for-bit on loop, multi1/2/3, 2_pools, cpp, cs, scan, DS1, and S3.
Match Song Jiang's reference C code which treats rapid re-accesses to
the same block as a single logical event ("correlated references" in
2Q paper terms). Author-intended, per private email, though not in the
published papers. ClockProSimple omits the change as its coarser state
machine regresses slightly on the canonical LIRS traces.
evict() unconditionally called recordEviction() before checking whether the resident set was over capacity, so every warmup miss and every miss on a never-full cache logged a phantom eviction. Gate the call on the same condition as the loop so eviction counts align with ClockProPolicy and LirsPolicy (one record per evict() call that actually removes an entry).
Adjust() was comparing node.next.expireTime — the second-oldest entry's expiration — instead of the LRU's own. The MQ paper (Zhou & Philbin, USENIX '01, Figure 5) specifies c = head of Q[k]; if (c.expireTime < currentTime) then demote, where "head" is the LRU end. For single-element queues the buggy check inspected the sentinel's Long.MAX_VALUE and never fired, so promoted blocks could be retained indefinitely.
Sample.RESERVOIR's fill-phase guard was sample.size() <= sampleSize, so it stopped after adding sampleSize+1 elements. The replacement loop then drew an index in [0, count) and only wrote when index < sampleSize, so the slot at index sampleSize was filled once and survived every later round. The returned list had sampleSize+1 entries with one biased slot. Empirically the bug matched, and sometimes outperformed, guess/shuffle because it sampled one extra entry and kept a stable reference element. With the fix, RESERVOIR's results land in line with guess and shuffle on the canonical traces, as expected for standard reservoir sampling.
CART (Bansal & Modha, FAST '04) Figure 3 line 30 promotes a marked short-term page at the head of T1 when |T1| >= min(p+1, |B1|). The Java guard used Math.max instead, so promotion fired only when |T1| was larger than both p+1 and |B1|. The inline pseudocode comment quotes the paper's min, confirming the intent. Net effect on the canonical traces is slightly more hits overall (LIRS and S3 improve; DS1 has mixed results), reflecting the algorithm's actual published behavior.
CAMP (Ghandeharizadeh et al. 2014/2024 §2) specifies LRU tie-breaking among equal-priority lists and refreshing a list's priority to L + c_i whenever an item is added to or accessed in that list. Sentinel.compareTo had the lastRequest operands reversed, so the more recently touched sentinel sorted before the staler one, picking the MRU bucket among ties instead of the LRU bucket. onMiss only refreshed the sentinel priority on the first-time-creation path (inside the computeIfAbsent lambda); for subsequent misses to an existing bucket, the just-computed L + c_i was discarded and the sentinel score lagged the global L. LIRS canonical traces (uniform weight, single bucket) are unaffected. Multi-bucket workloads see small differences; both fixes match the algorithm as published. Closes audit findings iss_9db75116 and iss_8fd4f605.
SegmentedLruPolicy: evict() picked headProtected.next when maxProtected==0, but the protected list is always empty in that config — the sentinel itself was being mutated and spliced into the probation list. Per Karedla et al. 1994, eviction is always from the LRU end of probation; the protected segment size is chosen so that probation cannot be empty when the cache is full. Drop the special case and gate onHit's promote/demote so a maxProtected==0 config degrades to pure LRU on probation. SievePolicy: onMiss's eviction guard used `>=` instead of `>`, so a first access whose weight equals maximumSize spun forever — pre-check passes (strict `>`), eviction guard fires, but evict() on an empty list is a no-op. The sibling onHit and S3Fifo/Camp policies all use strict `>`. Closes audit findings iss_a7afd6ce and iss_4a94848d.
CACHEUS (FAST '21, Rodriguez et al.) is a regret-minimisation cache that runs two experts in parallel — SR-LRU for scan resistance and CR-LFU for churn resistance — and learns a weighting between them online. A stochastic-gradient hill climber over the learning rate adapts to workload changes without manual tuning. Verified bit-for-bit against the reference at sylab/cacheus with the same seed (24/24 hit-count matches across the bundled LIRS traces at sizes 500/1000/2000), using java.util.Random on both sides via the java-random PyPI package.
The four gradient optimizers (Stochastic SGD, Adam, AmsGrad, Nadam) were
each computing the "gradient" as the change in miss rate between samples,
without dividing by the change in window size. That gave a signal with
the right magnitude but missing direction in w-space: the sign depended
only on whether the miss rate rose or fell, never on whether the last
window move caused that change. As a result, momentum-based variants
could only oscillate, not actually pursue a gradient.
Track the realised window size between sample resets in AbstractClimber
and expose a finite-difference `missRateGradient(hitRate) = ΔL/Δw`
helper. Use it from all four optimizers, flip the sign on the returned
step so we descend (not ascend) the loss, and bootstrap with one
positive probe so a Δw exists before the first gradient call.
Also adds a new `CorrelationClimber` (HillClimberType.CORRELATION,
default-enabled alongside `simple` and `indicator`) inspired by Cacheus'
learning-rate hill climber: uses `sign(Δw · ΔHR)` as a direction signal
and `scale · |Δw|` as step magnitude, so it self-tunes — sustained
correlation accelerates it, |Δw| shrinks as it converges, and a 5-window
degradation counter resets to the initial probe to escape local pits.
Behaviour shift on the bundled LIRS traces: clear wins on scan-heavy
workloads (scan, zigzag, sprite where the broken sign was pointing the
optimizer the wrong way) and on the cold-cache loop@500 case. Small
regressions on mid-trace sizes are expected — `percent-pivot = 0.005`
in reference.conf was implicitly tuned for the old `ΔL`-magnitude
signal, and per-step displacement is now smaller. A retune of that
hyperparameter is a separate follow-up — a sweep over {0.005, 0.05, 0.5,
1.0} showed 0.005 remains the safest default (larger values cliff
catastrophically on some traces, e.g., loop@1000 amsgrad collapses from
94.31% to 19.62% at pivot=0.05).
Small caches (≤512 entries) had three convergence issues: the climber's initial shrink was a no-op when the window was already tiny, locking it into a direction that never flipped; the step decayed too fast for HR shifts to trigger restarts on workload transitions; and very small caches had initial steps too small to move the integer window. - Start by growing the window at small caches (positive initial step) - Slow the step decay rate from 0.98 to 0.995 per sample - Floor the initial step at 2 entries - Floor the frequency sketch size
- TinyCache: remove dead-code modulus and try/catch debugging scaffold from chain-walk loops in TinyCache, TinyCacheSketch, and TinyCacheWithGhostCache (indices are always in-bounds by construction) - TinyCacheAdapter: floor nrSets to 1 to prevent zero-length arrays and ArithmeticException for maximumSize < 7 - SimpleClimber/SimulatedAnnealingClimber: drop vestigial 100x multipliers on tolerance thresholds and SA acceptance exponent to match the [0,1] hit-rate scale from AbstractClimber - FeedbackTinyLfuPolicy: only consume the per-period adaptation slot when gain actually changes, so boundary no-ops don't block valid adjustments - FeedbackWindowTinyLfuPolicy: remove redundant admitter.record() in onProtectedHit (was double-counting frequency for protected hits); same adjusted-slot fix as FeedbackTinyLfu; guard the increase loop against removing the probation sentinel when the list has only one node - SystorTraceReader: divide byte offset by BLOCK_SIZE to produce block-level keys consistent with the block-count sequence
The Simulate Java entry point already declares @option(names = "--outputDir"), but the Gradle wrapper task hardcoded reportDir to build/reports/simulate and gave callers no way to override it from the CLI. Add a Property<String> outputDir with @option(option = "outputDir"), defaulted via convention to the existing reportDir path so default behavior is preserved. The argument provider now reads from outputDir, letting callers point Simulate at any directory: ./gradlew simulator:simulate --outputDir /tmp/reports --maximumSize 1024
When a user-supplied Expiry or Weigher throws during the refresh completion handler's compute call, the exception was silently swallowed because CompletableFuture.whenComplete captures it into a discarded future. This caused loadFailureCount to be permanently under-counted and no warning to be logged. Both sibling implementations (BoundedLocalCache.refreshIfNeeded and LocalAsyncLoadingCache .tryComputeRefresh) already had the try/catch; this aligns LocalLoadingCache.refresh with them.
Found by a full-repository code review. Simulator: - K5cloudTraceReader: round the byte length up to whole blocks (IntMath.divide, RoundingMode.UP) like the sibling block readers, so sub-512-byte reads are no longer dropped. - MiniSimClimber: floor the sampling rate at 1 for maximum sizes below 100 (previously a divide-by-zero at construction). - Guard degenerate sizes that crashed or returned impossible hit rates: ClockProSimple (size > 1); Frd, HillClimberFrd, IndicatorFrd (non-empty main region); ClockPro, ClockProPlus (resident-cold floor >= 1). - ExpiringMapPolicy: disable ExpiringMap's default 60s wall-clock TTL so the policy is size-bounded and independent of run duration. - HillClimberFrdPolicy: drop the dead 'tolerance' field (constant 0). - Rewrite SimulatedAnnealingClimber as a faithful Metropolis search: a Gaussian proposal scaled by temperature, accept-worse with probability exp(delta/T) via a uniform draw, geometric cooling, and a random restart that re-anneals from the current position. The old code tested acceptance against a Gaussian draw and never proposed neighbors, so it froze at the starting window. Retune its defaults and remove the now-unused cool-down-tolerance/restart-tolerance. Adapters: - CaffeinatedGuavaLoadingCache.getAll: copy the keys once so a single-pass Iterable is not iterated twice. - jcache EntryProcessorEntry.toString: read the value field instead of the stateful getValue(), which had side effects (read tracking, loader trigger).
The async synchronous view flags query-style no-ops — putIfAbsent on a present key, a non-matching conditional remove/replace, and a same-instance compute — with RemapHints.preserveRefresh so they leave an in-flight refresh intact (they aren't real mutations). BoundedLocalCache.remap honored the hint, but UnboundedLocalCache.compute dropped it and its remap always discarded, so those no-ops cancelled a racing refresh(key) on unbounded async caches while the bounded sibling preserved it. Thread the hint through UnboundedLocalCache.compute -> remap and skip the discard on a same-instance return when preserveRefresh is set, mirroring BoundedLocalCache.remap. A real mutation still discards the refresh. Found by /audit-sibling-divergence.
…().size() The Fray tests asserted `estimatedSize() == asMap().size()` as their consistency oracle, but both sides read the same backing ConcurrentHashMap (`data.mappingCount()` vs `data.size()`), so the assertion is a tautology — it cannot detect a node stranded or duplicated in an eviction deque, which is exactly the class of bug these interleaving tests exist to catch. The frayTest source set also has no CacheValidationInterceptor, so LocalCacheSubject.isValid() (the deep deque<->data / telescoping-weight / node-lifecycle oracle) never ran there at all. Replace every tautological assertion with `assertThat(cache).isValid()` (the frayTest suite already depends on testFixtures; AsyncCacheSubject for async caches), keeping the specific count/weight/value assertions alongside. Update the documented Fray template in testing.md so the weak pattern isn't reintroduced. Found by an audit of the test suite's own assertion quality.
CacheTest.getIfPresent_present and its AsyncCacheTest twin asserted only isNotNull() on the returned value, so getIfPresent returning a wrong-but-non-null value (e.g. a bad afterRead update or a wrong-node bin traversal) would pass. The sibling tests don't backstop it: get_present routes through computeIfAbsent and getIfPresentQuietly_present exercises a different method, and a read doesn't trip the per-test isValid() harness. Pin the expected value (succeedsWith for the async future), keeping the three explicit calls so hits(3) still holds on SINGLETON. Found by an audit of the test suite's own assertion quality.
Conditional remove(k, v) and replace(k, old, new) match the expected value by Objects.equals for strong values but by identity (==) for weak/soft. The *_byIdentity tests cover the weak/soft side, but no test covered the strong side with a distinct-but-equal instance: Int interning makes the populated values identity-shared, so every existing conditional test passes the same instance and the == fast path short-circuits before equals. A regression consolidating the matcher to == would break the ConcurrentMap contract for strong values yet pass the whole suite. Add a strong/strong test that probes with new Int(value) (distinct but equal), asserting the match succeeds — exercising the equals path the interned values hid. Found by an audit of the test suite's own coverage.
JCache: two write paths ignored a zero creation expiry. Per ExpiryPolicy.getExpiryForCreation (JSR-107 1.1.1), a Duration.ZERO entry 'will not be added to the Cache', but the EntryProcessor create/load path (CacheProxy.postProcess) recorded a put and published a CREATED event, and putIfAbsent (putIfAbsentNoAwait set its absent flag before the zero-expiry guard) recorded a put and returned true -- both for an entry that is never added, unlike the guarded put()/getAndPut(). Mirror the put guard in both: skip the put stat and CREATED event, publish EXPIRED, store nothing; putIfAbsent now returns false. A new write-path parity test (JCacheCreationExpiryTest.writeOp_absent_zeroCreationExpiry over put/putAll/putIfAbsent/getAndPut/invoke) asserts every path agrees. Verified against the RI, Ehcache 2/3, Infinispan, Hazelcast, Coherence, and cache2k: the entry is added in none; no put is counted in 6/7 (incl. the RI); putIfAbsent returns false in the RI. Guava: keep the facade a drop-in for Guava. Match its asMap null-query tolerance: get/containsKey/containsValue/remove/remove(k,v)/replace(k,null,v) and keySet()/values().contains/containsAll now tolerate null (return null/false) instead of throwing, and getAllPresent drops a null element. The testlib suite is raised to ALLOWS_ANY_NULL_QUERIES and is now also run against real Guava (GuavaMapTests) to guard against divergence; both pass. A method-by-method cross-check against Guava 33.6.0's LocalCache source also fixed two behavioral divergences the testlib does not assert: invalidateAll(Iterable) threw NPE on a null element after a partial removal (now filters nulls like getAllPresent), and entrySet().addAll(emptyCollection) threw UnsupportedOperationException where Guava returns false. Simulator: GLCacheTraceReader read the little-endian libCacheSim oracleGeneral record as big-endian (mirror the OracleGeneralTraceReader sibling); Adam/Nadam advanced the bias-correction timestep on the no-op first probe (count moment updates instead); Synthetic.repeating produced items+1 distinct keys. Audit: extend /audit-sibling-divergence to the user-facing adapters (Group G -- jcache write-path family, jcache-vs-RI/ecosystem oracle, guava facade-vs-Caffeine and facade-vs-real-Guava), since the skill was silently core-scoped while these bugs lived in the adapters. The new parity test is the executable form of that group and is what caught the putIfAbsent divergence. Also tighten the auditor's escalation contract -- don't escalate single-writer-under-lock paths as races (e.g. FrequencySketch is single-writer under evictionLock, so a "sketch resize under contention" escalation was a false positive), and treat a skeleton as a TODO, not a resolution (only a shipped fix or a runnable test closes it). Run the cache fuzzer (CacheFuzzer) against the deep isValid() oracle instead of a shallow hand-rolled structural check, mirroring the same upgrade already made to the Fray suite.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.