Skip to content

Updates#1

Open
autonull wants to merge 1291 commits into
deepstupid:multilevel_researchfrom
ben-manes:master
Open

Updates#1
autonull wants to merge 1291 commits into
deepstupid:multilevel_researchfrom
ben-manes:master

Conversation

@autonull
Copy link
Copy Markdown

No description provided.

@ben-manes ben-manes force-pushed the master branch 2 times, most recently from 940dca6 to 2f271e9 Compare July 23, 2022 22:15
@ben-manes ben-manes force-pushed the master branch 2 times, most recently from ac8e049 to 08b24cd Compare August 27, 2022 03:02
@ben-manes ben-manes force-pushed the master branch 17 times, most recently from a337c19 to 48d5246 Compare September 10, 2022 01:17
@ben-manes ben-manes force-pushed the master branch 2 times, most recently from 53f222c to 00b1503 Compare September 26, 2022 02:50
@ben-manes ben-manes force-pushed the master branch 2 times, most recently from 73f40d3 to 6abd616 Compare October 2, 2022 09:40
@ben-manes ben-manes force-pushed the master branch 2 times, most recently from a7d31f5 to d74756d Compare October 14, 2022 06:02
ben-manes and others added 30 commits May 14, 2026 18:03
`wheel` is a `Sentinel[NW][NQ]` (NW = number of wheels, NQ = queues per
wheel). Per the algorithm and the inline comment, queueIndex must be
reduced modulo NQ, but add() reduced modulo `wheel.length` (== NW).
With the default config (NW=2, NQ=256), inserts collapsed onto the
first NW queues of the chosen wheel, leaving the remaining 254 queues
unreachable by add() and breaking the cost-stamp ordering of the
cost wheel for any workload with varying miss penalties.

Use `wheel[wheelIndex].length` to match the algorithm and the two peer
call sites (`findNextQueue` and `migrate`) that already index by NQ.
Three remaining gaps versus Algorithm 2 of the GD-Wheel paper:

1. evict() never assigned the next-non-empty index back to clockHand[0]
   (== C[1]). The level-1 hand never advanced and the "C[1] wrapped a
   whole round → migration(2)" trigger fired only when wheel[0] became
   completely empty. Net effect: high-cost items in wheel[1+] rarely
   migrated down, and evictions were biased toward whatever queue
   index happened to scan first.

2. migrate() drained at most NQ entries per call because its inner
   loop was bounded by wheel[level].length rather than running to
   queue-empty. Concentrated cost classes left entries stranded in the
   upper wheel until the next full rotation.

3. migrate()'s destination queue index used the source level's hand
   instead of clockHand[level-1] (the destination wheel's hand), and
   reduced modulo the source wheel's length instead of the destination
   wheel's. Observationally equivalent today (all wheels share NQ) but
   wrong per the paper.

With all four corrections (this commit + 7ce1c9b), GDWheel becomes
competitive again on Systor-style workloads with varying miss
penalties — within ~5pp of Gdsf at 1M cache, beating Camp at every
size on the LUNs I checked. Uniform-penalty traces are unchanged.

Bug 1 was identified by an internal audit; bugs 2 and 3 were
identified by walking the paper's Algorithm 2 against the current
code after the audit's add() fix exposed the regression.
LocalAsyncLoadingCache.tryComputeRefresh's discard branch unconditionally
fired a REPLACED notification when a concurrent write invalidated the
refresh. If the refresh's loaded value equaled the racing put's value,
this produced an extra notification that the sync explicit-refresh path
(LocalLoadingCache.refresh) and the implicit refreshIfNeeded path
(BoundedLocalCache.refreshIfNeeded) both correctly suppressed.

Hoist the value-identity check ahead of the success/discard branching so
it covers all paths uniformly.
Differential audit comparing matched code paths that should behave
identically. Spawns one auditor per sibling pair (sync vs async,
storage variants, view consistency, bulk vs single, equivalent-by-
construction, internal paths) and requires a concrete witness for each
divergence finding. Caught the explicit refresh REPLACED divergence
fixed in the prior commit.
- activate() demotion victim: pick from the LRU end of the active list
  (headActive.prev), matching the Javadoc ("oldest active entry") and
  the kernel's shrink_active_list scan order. Also adopt the kernel's
  PG_referenced second-chance scan: walk from the LRU end, rotating
  referenced entries back to MRU (clearing the bit) and demoting the
  first unreferenced one. This recovers most of the hit rate that a
  pure LRU demotion would otherwise lose on stable hot sets like DS1.
- onActiveHit() drops the moveToHead reorder in favor of just marking
  the entry referenced, matching the kernel's mark_page_accessed which
  never reorders the active list on hits.
- evict() refault snapshot: read currentNonResidentAge() before the
  evictions++ increment so the stored shadow matches the kernel's
  pre-increment snapshot in workingset_eviction.
- refaultDistance(): drop Math.abs; both counters are monotonic so the
  subtraction is always non-negative.
- Constructor: also require maxActive > 0 (zero would corrupt the
  active list on the first activation by demoting the sentinel).
The canonical Song Jiang reference (lirs.c) and Chen Zhong's
replace_lirs_base.cc both unconditionally call addLruListHead after
the miss-class dispatch, which places every newly-referenced page on
stack S — including HIR warmup misses. The simulator's
onHirWarmupMiss only put the node on Q, delaying LIR promotion by
one access for entries that enter during the HIR warmup phase.

With this fix and non-resident-multiplier set to match the C's
MAX_S_LEN, the simulator matches both C references bit-for-bit on
loop, multi1/2/3, 2_pools, cpp, cs, scan, DS1, and S3.
Match Song Jiang's reference C code which treats rapid re-accesses to
the same block as a single logical event ("correlated references" in
2Q paper terms). Author-intended, per private email, though not in the
published papers. ClockProSimple omits the change as its coarser state
machine regresses slightly on the canonical LIRS traces.
evict() unconditionally called recordEviction() before checking whether
the resident set was over capacity, so every warmup miss and every miss
on a never-full cache logged a phantom eviction. Gate the call on the
same condition as the loop so eviction counts align with ClockProPolicy
and LirsPolicy (one record per evict() call that actually removes an
entry).
Adjust() was comparing node.next.expireTime — the second-oldest entry's
expiration — instead of the LRU's own. The MQ paper (Zhou & Philbin,
USENIX '01, Figure 5) specifies c = head of Q[k]; if (c.expireTime <
currentTime) then demote, where "head" is the LRU end. For
single-element queues the buggy check inspected the sentinel's
Long.MAX_VALUE and never fired, so promoted blocks could be retained
indefinitely.
Sample.RESERVOIR's fill-phase guard was sample.size() <= sampleSize, so
it stopped after adding sampleSize+1 elements. The replacement loop then
drew an index in [0, count) and only wrote when index < sampleSize, so
the slot at index sampleSize was filled once and survived every later
round. The returned list had sampleSize+1 entries with one biased slot.

Empirically the bug matched, and sometimes outperformed, guess/shuffle
because it sampled one extra entry and kept a stable reference element.
With the fix, RESERVOIR's results land in line with guess and shuffle on
the canonical traces, as expected for standard reservoir sampling.
CART (Bansal & Modha, FAST '04) Figure 3 line 30 promotes a marked
short-term page at the head of T1 when |T1| >= min(p+1, |B1|). The Java
guard used Math.max instead, so promotion fired only when |T1| was
larger than both p+1 and |B1|. The inline pseudocode comment quotes the
paper's min, confirming the intent. Net effect on the canonical traces
is slightly more hits overall (LIRS and S3 improve; DS1 has mixed
results), reflecting the algorithm's actual published behavior.
CAMP (Ghandeharizadeh et al. 2014/2024 §2) specifies LRU tie-breaking
among equal-priority lists and refreshing a list's priority to L + c_i
whenever an item is added to or accessed in that list.

Sentinel.compareTo had the lastRequest operands reversed, so the more
recently touched sentinel sorted before the staler one, picking the MRU
bucket among ties instead of the LRU bucket. onMiss only refreshed the
sentinel priority on the first-time-creation path (inside the
computeIfAbsent lambda); for subsequent misses to an existing bucket,
the just-computed L + c_i was discarded and the sentinel score lagged
the global L.

LIRS canonical traces (uniform weight, single bucket) are unaffected.
Multi-bucket workloads see small differences; both fixes match the
algorithm as published. Closes audit findings iss_9db75116 and
iss_8fd4f605.
SegmentedLruPolicy: evict() picked headProtected.next when
maxProtected==0, but the protected list is always empty in that
config — the sentinel itself was being mutated and spliced into the
probation list. Per Karedla et al. 1994, eviction is always from the
LRU end of probation; the protected segment size is chosen so that
probation cannot be empty when the cache is full. Drop the special
case and gate onHit's promote/demote so a maxProtected==0 config
degrades to pure LRU on probation.

SievePolicy: onMiss's eviction guard used `>=` instead of `>`, so a
first access whose weight equals maximumSize spun forever — pre-check
passes (strict `>`), eviction guard fires, but evict() on an empty
list is a no-op. The sibling onHit and S3Fifo/Camp policies all use
strict `>`. Closes audit findings iss_a7afd6ce and iss_4a94848d.
CACHEUS (FAST '21, Rodriguez et al.) is a regret-minimisation cache that
runs two experts in parallel — SR-LRU for scan resistance and CR-LFU for
churn resistance — and learns a weighting between them online. A
stochastic-gradient hill climber over the learning rate adapts to
workload changes without manual tuning.

Verified bit-for-bit against the reference at sylab/cacheus with the
same seed (24/24 hit-count matches across the bundled LIRS traces at
sizes 500/1000/2000), using java.util.Random on both sides via the
java-random PyPI package.
The four gradient optimizers (Stochastic SGD, Adam, AmsGrad, Nadam) were
each computing the "gradient" as the change in miss rate between samples,
without dividing by the change in window size. That gave a signal with
the right magnitude but missing direction in w-space: the sign depended
only on whether the miss rate rose or fell, never on whether the last
window move caused that change. As a result, momentum-based variants
could only oscillate, not actually pursue a gradient.

Track the realised window size between sample resets in AbstractClimber
and expose a finite-difference `missRateGradient(hitRate) = ΔL/Δw`
helper. Use it from all four optimizers, flip the sign on the returned
step so we descend (not ascend) the loss, and bootstrap with one
positive probe so a Δw exists before the first gradient call.

Also adds a new `CorrelationClimber` (HillClimberType.CORRELATION,
default-enabled alongside `simple` and `indicator`) inspired by Cacheus'
learning-rate hill climber: uses `sign(Δw · ΔHR)` as a direction signal
and `scale · |Δw|` as step magnitude, so it self-tunes — sustained
correlation accelerates it, |Δw| shrinks as it converges, and a 5-window
degradation counter resets to the initial probe to escape local pits.

Behaviour shift on the bundled LIRS traces: clear wins on scan-heavy
workloads (scan, zigzag, sprite where the broken sign was pointing the
optimizer the wrong way) and on the cold-cache loop@500 case. Small
regressions on mid-trace sizes are expected — `percent-pivot = 0.005`
in reference.conf was implicitly tuned for the old `ΔL`-magnitude
signal, and per-step displacement is now smaller. A retune of that
hyperparameter is a separate follow-up — a sweep over {0.005, 0.05, 0.5,
1.0} showed 0.005 remains the safest default (larger values cliff
catastrophically on some traces, e.g., loop@1000 amsgrad collapses from
94.31% to 19.62% at pivot=0.05).
Small caches (≤512 entries) had three convergence issues: the climber's
initial shrink was a no-op when the window was already tiny, locking it
into a direction that never flipped; the step decayed too fast for HR
shifts to trigger restarts on workload transitions; and very small caches
had initial steps too small to move the integer window.

- Start by growing the window at small caches (positive initial step)
- Slow the step decay rate from 0.98 to 0.995 per sample
- Floor the initial step at 2 entries
- Floor the frequency sketch size
- TinyCache: remove dead-code modulus and try/catch debugging scaffold
  from chain-walk loops in TinyCache, TinyCacheSketch, and
  TinyCacheWithGhostCache (indices are always in-bounds by construction)
- TinyCacheAdapter: floor nrSets to 1 to prevent zero-length arrays
  and ArithmeticException for maximumSize < 7
- SimpleClimber/SimulatedAnnealingClimber: drop vestigial 100x
  multipliers on tolerance thresholds and SA acceptance exponent to
  match the [0,1] hit-rate scale from AbstractClimber
- FeedbackTinyLfuPolicy: only consume the per-period adaptation slot
  when gain actually changes, so boundary no-ops don't block valid
  adjustments
- FeedbackWindowTinyLfuPolicy: remove redundant admitter.record() in
  onProtectedHit (was double-counting frequency for protected hits);
  same adjusted-slot fix as FeedbackTinyLfu; guard the increase loop
  against removing the probation sentinel when the list has only one
  node
- SystorTraceReader: divide byte offset by BLOCK_SIZE to produce
  block-level keys consistent with the block-count sequence
The Simulate Java entry point already declares @option(names = "--outputDir"),
but the Gradle wrapper task hardcoded reportDir to build/reports/simulate and
gave callers no way to override it from the CLI.

Add a Property<String> outputDir with @option(option = "outputDir"), defaulted
via convention to the existing reportDir path so default behavior is preserved.
The argument provider now reads from outputDir, letting callers point Simulate
at any directory:

  ./gradlew simulator:simulate --outputDir /tmp/reports --maximumSize 1024
When a user-supplied Expiry or Weigher throws during the refresh
completion handler's compute call, the exception was silently swallowed
because CompletableFuture.whenComplete captures it into a discarded
future. This caused loadFailureCount to be permanently under-counted
and no warning to be logged. Both sibling implementations
(BoundedLocalCache.refreshIfNeeded and LocalAsyncLoadingCache
.tryComputeRefresh) already had the try/catch; this aligns
LocalLoadingCache.refresh with them.
Found by a full-repository code review.

Simulator:
- K5cloudTraceReader: round the byte length up to whole blocks (IntMath.divide,
  RoundingMode.UP) like the sibling block readers, so sub-512-byte reads are no
  longer dropped.
- MiniSimClimber: floor the sampling rate at 1 for maximum sizes below 100
  (previously a divide-by-zero at construction).
- Guard degenerate sizes that crashed or returned impossible hit rates:
  ClockProSimple (size > 1); Frd, HillClimberFrd, IndicatorFrd (non-empty main
  region); ClockPro, ClockProPlus (resident-cold floor >= 1).
- ExpiringMapPolicy: disable ExpiringMap's default 60s wall-clock TTL so the
  policy is size-bounded and independent of run duration.
- HillClimberFrdPolicy: drop the dead 'tolerance' field (constant 0).
- Rewrite SimulatedAnnealingClimber as a faithful Metropolis search: a Gaussian
  proposal scaled by temperature, accept-worse with probability exp(delta/T)
  via a uniform draw, geometric cooling, and a random restart that re-anneals
  from the current position. The old code tested acceptance against a Gaussian
  draw and never proposed neighbors, so it froze at the starting window. Retune
  its defaults and remove the now-unused cool-down-tolerance/restart-tolerance.

Adapters:
- CaffeinatedGuavaLoadingCache.getAll: copy the keys once so a single-pass
  Iterable is not iterated twice.
- jcache EntryProcessorEntry.toString: read the value field instead of the
  stateful getValue(), which had side effects (read tracking, loader trigger).
The async synchronous view flags query-style no-ops — putIfAbsent on a present
key, a non-matching conditional remove/replace, and a same-instance compute —
with RemapHints.preserveRefresh so they leave an in-flight refresh intact (they
aren't real mutations). BoundedLocalCache.remap honored the hint, but
UnboundedLocalCache.compute dropped it and its remap always discarded, so those
no-ops cancelled a racing refresh(key) on unbounded async caches while the
bounded sibling preserved it.

Thread the hint through UnboundedLocalCache.compute -> remap and skip the
discard on a same-instance return when preserveRefresh is set, mirroring
BoundedLocalCache.remap. A real mutation still discards the refresh.

Found by /audit-sibling-divergence.
…().size()

The Fray tests asserted `estimatedSize() == asMap().size()` as their consistency
oracle, but both sides read the same backing ConcurrentHashMap
(`data.mappingCount()` vs `data.size()`), so the assertion is a tautology — it
cannot detect a node stranded or duplicated in an eviction deque, which is exactly
the class of bug these interleaving tests exist to catch. The frayTest source set
also has no CacheValidationInterceptor, so LocalCacheSubject.isValid() (the deep
deque<->data / telescoping-weight / node-lifecycle oracle) never ran there at all.

Replace every tautological assertion with `assertThat(cache).isValid()` (the
frayTest suite already depends on testFixtures; AsyncCacheSubject for async caches),
keeping the specific count/weight/value assertions alongside. Update the documented
Fray template in testing.md so the weak pattern isn't reintroduced.

Found by an audit of the test suite's own assertion quality.
CacheTest.getIfPresent_present and its AsyncCacheTest twin asserted only
isNotNull() on the returned value, so getIfPresent returning a wrong-but-non-null
value (e.g. a bad afterRead update or a wrong-node bin traversal) would pass. The
sibling tests don't backstop it: get_present routes through computeIfAbsent and
getIfPresentQuietly_present exercises a different method, and a read doesn't trip
the per-test isValid() harness. Pin the expected value (succeedsWith for the async
future), keeping the three explicit calls so hits(3) still holds on SINGLETON.

Found by an audit of the test suite's own assertion quality.
Conditional remove(k, v) and replace(k, old, new) match the expected value by
Objects.equals for strong values but by identity (==) for weak/soft. The
*_byIdentity tests cover the weak/soft side, but no test covered the strong side
with a distinct-but-equal instance: Int interning makes the populated values
identity-shared, so every existing conditional test passes the same instance and
the == fast path short-circuits before equals. A regression consolidating the
matcher to == would break the ConcurrentMap contract for strong values yet pass
the whole suite.

Add a strong/strong test that probes with new Int(value) (distinct but equal),
asserting the match succeeds — exercising the equals path the interned values hid.

Found by an audit of the test suite's own coverage.
JCache: two write paths ignored a zero creation expiry. Per
ExpiryPolicy.getExpiryForCreation (JSR-107 1.1.1), a Duration.ZERO entry 'will
not be added to the Cache', but the EntryProcessor create/load path
(CacheProxy.postProcess) recorded a put and published a CREATED event, and
putIfAbsent (putIfAbsentNoAwait set its absent flag before the zero-expiry
guard) recorded a put and returned true -- both for an entry that is never
added, unlike the guarded put()/getAndPut(). Mirror the put guard in both: skip
the put stat and CREATED event, publish EXPIRED, store nothing; putIfAbsent now
returns false. A new write-path parity test
(JCacheCreationExpiryTest.writeOp_absent_zeroCreationExpiry over
put/putAll/putIfAbsent/getAndPut/invoke) asserts every path agrees. Verified
against the RI, Ehcache 2/3, Infinispan, Hazelcast, Coherence, and cache2k: the
entry is added in none; no put is counted in 6/7 (incl. the RI); putIfAbsent
returns false in the RI.

Guava: keep the facade a drop-in for Guava. Match its asMap null-query
tolerance: get/containsKey/containsValue/remove/remove(k,v)/replace(k,null,v)
and keySet()/values().contains/containsAll now tolerate null (return
null/false) instead of throwing, and getAllPresent drops a null element. The
testlib suite is raised to ALLOWS_ANY_NULL_QUERIES and is now also run against
real Guava (GuavaMapTests) to guard against divergence; both pass. A
method-by-method cross-check against Guava 33.6.0's LocalCache source also
fixed two behavioral divergences the testlib does not assert:
invalidateAll(Iterable) threw NPE on a null element after a partial removal (now
filters nulls like getAllPresent), and entrySet().addAll(emptyCollection) threw
UnsupportedOperationException where Guava returns false.

Simulator: GLCacheTraceReader read the little-endian libCacheSim oracleGeneral
record as big-endian (mirror the OracleGeneralTraceReader sibling); Adam/Nadam
advanced the bias-correction timestep on the no-op first probe (count moment
updates instead); Synthetic.repeating produced items+1 distinct keys.

Audit: extend /audit-sibling-divergence to the user-facing adapters (Group G --
jcache write-path family, jcache-vs-RI/ecosystem oracle, guava facade-vs-Caffeine
and facade-vs-real-Guava), since the skill was silently core-scoped while these
bugs lived in the adapters. The new parity test is the executable form of that
group and is what caught the putIfAbsent divergence. Also tighten the auditor's
escalation contract -- don't escalate single-writer-under-lock paths as races
(e.g. FrequencySketch is single-writer under evictionLock, so a "sketch resize
under contention" escalation was a false positive), and treat a skeleton as a
TODO, not a resolution (only a shipped fix or a runnable test closes it). Run the
cache fuzzer (CacheFuzzer) against the deep isValid() oracle instead of a shallow
hand-rolled structural check, mirroring the same upgrade already made to the Fray
suite.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants