gocached: do per-shard usage stats and incremental cleanup by bradfitz · Pull Request #36 · bradfitz/go-tool-cache

bradfitz · 2026-06-02T03:42:52Z

This fixes performance problems we hit in with gocached's stats + LRU cleanup queries.

Three problems previously:

Startup blocked on usageStats + cleanOldObjects for over 10 min before the
server began accepting requests.
usageStats pinned a reader snapshot ~60% of the time, so SQLite's PASSIVE
autocheckpoint could never advance. The WAL grew to 52 GB. walFindFrame
pegged CPU and go-cacher clients stalled.
cleanCandidates GROUP BY scaled with TOTAL rows; size-pressure cleanup
was effectively unbounded work, and the same snapshot-pinning issue.

Instead, share the keyspace (by default: into 256 shards, by two hex digit prefix) and compute usage stats by shard. And store those stats (+ computation time) in the DB itself, so the server starts up quickly after a restart, without blocking.

Second: change the blob cleanup query to use an index, and switch from LRU by blobs to LRU by actions, which can delete blobs when the refcount drops to zero.

Then add more metrics.

No schema version bump; just one new table that's created if it doesn't already exist.

Updates tailscale/corp#42670

tomhjp

I have to admit I got a bit fatigued by the time I got to the tests, so I haven't done much review there. Overall though I like the shape of the changes to stats tracking + eviction

tomhjp · 2026-06-02T09:50:42Z

+// runShardStatsLoop keeps usage stats fresh by rescanning the
+// oldest-scanned shard whenever it crosses shardStalenessTarget. On a quiet
+// cache the goroutine sleeps for long stretches instead of spinning every
+// few seconds; on a busy cache the loop is effectively rate-limited by
+// shardStatsMinInterval since each scan resets the oldest age to zero.


Is this comment accurate? I don't see any reason why the cache being busy would make us scan more.

yeah, not accurate. rewording.

tomhjp · 2026-06-02T11:57:47Z

+
+	// shardStatsMu guards shardStats and serializes recomputeAggregateLocked.
+	shardStatsMu sync.Mutex
+	shardStats   map[string]*shardSnapshot // keyed by hex prefix; e.g. "00".."ff"


nit: It would be nice to have a tiny type for shard prefix

tomhjp · 2026-06-02T11:59:43Z

+type shardDelta struct {
+	count atomic.Int64
+	bytes atomic.Int64
+}


It seems like an easy win for consistency to make the fields plain ints and then atomically load a whole shardDelta value. Otherwise the values can drift arbitrarily far apart from each other over time.

I just put a little mutex around these and made them plain ints

tomhjp · 2026-06-02T12:03:36Z

+	if preCount != 0 {
+		srv.shardDeltas[idx].count.Add(-preCount)
+	}
+	if preBytes != 0 {
+		srv.shardDeltas[idx].bytes.Add(-preBytes)
+	}


Wouldn't it be more correct to decrement by the value in stats? This seems approximately correct but could drift over time.

done. changed to decrement.

tomhjp · 2026-06-02T12:12:34Z

+			// move on. No delta, no Blob row, no file.
+			continue
+		}
+		// Each Action contributes (1, storedSize) to the Blobs⋈Actions


Oh is the symbol for inner join, or left join? Let's be more explicit to prevent confusion

The usageStats() LEFT JOIN every 5 minutes didn't scale. At 272M Actions it took ~7 minutes per call, pinning a reader snapshot ~60% of the time. The WAL grew to 52 GB because SQLite's PASSIVE autocheckpoint could never advance past the snapshot; walFindFrame pegged CPU and go-cacher clients stalled. Startup also blocked on usageStats + cleanOldObjects for 10+ minutes. cleanCandidates was the same shape: GROUP BY scaling with total rows. This redoes the usage stats + LRU cleanup pipeline. Shard the histogram, persist it, refresh in the background: * New BlobShardStats(Prefix, ScannedAt, StatsJSON) table. * 256 shards keyed on a SHA256 hex prefix (--shard-prefix-len=1..4). * runShardStatsLoop picks the oldest shard, sleeps until it crosses shardStalenessTarget (10m), then rescans. * Per-shard scan is one SUM(CASE WHEN…) query scoped by SHA256 >= ? AND SHA256 < ?. No row streaming. * On restart, loadShardStats seeds lastUsage from the persisted rows. Server starts up to usable right away without a blocking step. Action-LRU cleanup driven by idx_actions_access: * evictOldestActions walks the access-time index for the N oldest stale Actions, deletes them, orphan-deletes Blobs. * INDEXED BY locks the plan; TestEvictionQueryPlan{,_atScale} asserts via EXPLAIN QUERY PLAN. * Action-LRU instead of Blob-LRU: a stale Action is evicted even if its Blob has fresher siblings. Equivalent for the common 1:1 case; stricter LRU on shared content. * One short Tx per 200-row batch on a 1s tick. No multi-minute reader snapshots. Dead-reckon PUTs and evictions between scans: * Per-shard mutex-guarded counters track (count, bytes) added or removed since the shard was last persisted. * cleanupTick includes the delta in the size pressure check so a burst of PUTs between scans triggers cleanup immediately. Observability: /usage shows cohort table, shard freshness histogram, per-shard scan duration p25/p50/p90; new gauges gocached_shard_stats_{unscanned,oldest_age_seconds}, pending blob count/bytes; new histogram gocached_shard_scan_duration_seconds replayed from persisted data at startup; new counter gocached_evicted_actions. Schema migration is additive (schemaVersion stays 4). Perf test (TestPerfQueries) always runs but defaults to 100 rows so normal `go test` is sub-second. Operators set --performance-test-rows larger for real scale; seed reused across runs via --performance-test-dir. Sample, cumulative across rows: rows DB scanShard evict legacy GROUP BY ratio usageStats 1K 396K 188µs 5.2ms 648µs 0.1x 321ns 10K 3.4M 248µs 5.5ms 6.4ms 1.2x 227ns 100K 35M 1.8ms 6.1ms 73ms 12x 328ns 1M 349M 22ms 7.5ms 763ms 102x 231ns 2M 698M 51ms 8.2ms 1.6s 196x 261ns 5M 1.8G 162ms 8.7ms 4.1s 475x 224ns 10M 3.5G 349ms 9.0ms 8.3s 922x 249ns usageStats and evictOldestActions are flat in N. scanShard scales with per-shard rows. The "legacy GROUP BY" column is the cleanCandidates query gocached used before this branch (since deleted); it scales linearly with total rows, the behavior that was wedging prod at 250M. Updates tailscale/corp#42670 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

bradfitz changed the title ~~gocached: always-run perf test sized by -performance-test-rows~~ gocached: do per-shard usage stats and incremental cleanup Jun 2, 2026

bradfitz requested a review from tomhjp June 2, 2026 03:47

tomhjp approved these changes Jun 2, 2026

View reviewed changes

bradfitz force-pushed the bradfitz/wal2 branch from dfc2995 to ecb0074 Compare June 2, 2026 14:30

bradfitz force-pushed the bradfitz/wal2 branch from ecb0074 to bd597bf Compare June 2, 2026 14:36

bradfitz merged commit d48e363 into main Jun 2, 2026
3 checks passed

Conversation

bradfitz commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tomhjp left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bradfitz commented Jun 2, 2026 •

edited

Loading