gocached: do per-shard usage stats and incremental cleanup #36

Merged
bradfitz merged 1 commit into main from bradfitz/wal2 on Jun 2, 2026
@bradfitz bradfitz commented Jun 2, 2026

Owner

This fixes performance problems we hit with gocached's stats + LRU cleanup queries.

Three problems previously:

  1. Startup blocked on usageStats + cleanOldObjects for over 10 min before the
    server began accepting requests.
  2. usageStats pinned a reader snapshot ~60% of the time, so SQLite's PASSIVE
    autocheckpoint could never advance. The WAL grew to 52 GB. walFindFrame
    pegged CPU and go-cacher clients stalled.
  3. cleanCandidates GROUP BY scaled with TOTAL rows; size-pressure cleanup
    was effectively unbounded work, and it had the same snapshot-pinning issue.

Instead, shard the keyspace (by default into 256 shards, by two-hex-digit prefix) and compute usage stats per shard. Store those stats (plus computation time) in the DB itself, so the server starts up quickly after a restart, without blocking.
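
To make the sharding concrete, here is a minimal sketch of how a hex prefix can partition the keyspace and map to a contiguous key range. The names `shardPrefix`, `prefixOf`, `numShards`, and `bounds` are hypothetical and not taken from the PR; the sketch also assumes keys are compared as lowercase hex strings, which may not match how gocached stores them.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// shardPrefix is a hypothetical small type for a shard's hex prefix,
// e.g. "00".."ff" for a two-digit prefix.
type shardPrefix string

// prefixOf returns the shard for a lowercase hex SHA-256 string, using
// its first n hex digits. It assumes len(shaHex) >= n.
func prefixOf(shaHex string, n int) shardPrefix {
	return shardPrefix(shaHex[:n])
}

// numShards reports how many shards an n-digit hex prefix yields (16^n).
func numShards(n int) int {
	return 1 << (4 * n)
}

// bounds returns the half-open key range [lo, hi) covered by the shard.
// The last shard (all 'f') has no upper bound: bounded is false.
func (p shardPrefix) bounds() (lo, hi string, bounded bool) {
	lo = string(p)
	if lo == strings.Repeat("f", len(lo)) {
		return lo, "", false
	}
	v, err := strconv.ParseUint(lo, 16, 64)
	if err != nil {
		panic(err) // not a hex prefix
	}
	// Increment the prefix, keeping its width with zero padding.
	hi = fmt.Sprintf("%0*x", len(lo), v+1)
	return lo, hi, true
}

func main() {
	p := prefixOf("ab12cd34", 2)
	lo, hi, bounded := p.bounds()
	fmt.Println(p, numShards(2), lo, hi, bounded)
}
```

With a two-digit prefix this gives 256 shards, and shard "ab" covers keys from "ab" up to but excluding "ac", so a per-shard scan only has to touch one slice of the key order.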

Second: change the blob cleanup query to use an index, and switch from LRU by blobs to LRU by actions, which can delete blobs when the refcount drops to zero.

Then add more metrics.

No schema version bump; just one new table that's created if it doesn't already exist.

Updates tailscale/corp#42670

@bradfitz bradfitz changed the title from "gocached: always-run perf test sized by -performance-test-rows" to "gocached: do per-shard usage stats and incremental cleanup" Jun 2, 2026
@bradfitz bradfitz requested a review from tomhjp June 2, 2026 03:47

@tomhjp tomhjp left a comment

Collaborator

I have to admit I got a bit fatigued by the time I got to the tests, so I haven't done much review there. Overall though I like the shape of the changes to stats tracking + eviction

Comment thread cmd/gocached/gocached.go Outdated
Comment thread gocached/gocached.go Outdated
Comment thread gocached/gocached.go Outdated
Comment thread gocached/gocached.go Outdated
Comment on lines +1935 to +1939
// runShardStatsLoop keeps usage stats fresh by rescanning the
// oldest-scanned shard whenever it crosses shardStalenessTarget. On a quiet
// cache the goroutine sleeps for long stretches instead of spinning every
// few seconds; on a busy cache the loop is effectively rate-limited by
// shardStatsMinInterval since each scan resets the oldest age to zero.

Collaborator

Is this comment accurate? I don't see any reason why the cache being busy would make us scan more.

Owner Author

yeah, not accurate. rewording.

Comment thread gocached/gocached.go Outdated
Comment thread gocached/gocached.go Outdated
Comment thread gocached/gocached.go Outdated

// shardStatsMu guards shardStats and serializes recomputeAggregateLocked.
shardStatsMu sync.Mutex
shardStats map[string]*shardSnapshot // keyed by hex prefix; e.g. "00".."ff"

Collaborator

nit: It would be nice to have a tiny type for shard prefix

Owner Author

done

Comment thread gocached/gocached.go
Comment on lines +1628 to +1631
type shardDelta struct {
	count atomic.Int64
	bytes atomic.Int64
}

Collaborator

It seems like an easy win for consistency to make the fields plain ints and then atomically load a whole shardDelta value. Otherwise the values can drift arbitrarily far apart from each other over time.

Owner Author

I just put a little mutex around these and made them plain ints
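
For readers following the thread, a minimal sketch of what a mutex-guarded delta with plain ints might look like. This is not the code that landed; the method names `add` and `load` are hypothetical. The point is that both fields change and are read under one lock, so a reader never sees a count from one update paired with bytes from another.

```go
package main

import (
	"fmt"
	"sync"
)

// shardDelta tracks changes to one shard since its last scan.
// Both fields are guarded by mu so readers see a consistent pair.
type shardDelta struct {
	mu    sync.Mutex
	count int64
	bytes int64
}

// add applies a change; pass negative values for an eviction.
func (d *shardDelta) add(count, bytes int64) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.count += count
	d.bytes += bytes
}

// load returns both fields as of a single instant.
func (d *shardDelta) load() (count, bytes int64) {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.count, d.bytes
}

func main() {
	var d shardDelta
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			d.add(1, 4096) // one PUT of a 4 KiB blob
		}()
	}
	wg.Wait()
	d.add(-1, -4096) // one eviction
	fmt.Println(d.load())
}
```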

Comment thread gocached/gocached.go Outdated
Comment on lines +1926 to +1931
if preCount != 0 {
	srv.shardDeltas[idx].count.Add(-preCount)
}
if preBytes != 0 {
	srv.shardDeltas[idx].bytes.Add(-preBytes)
}

Collaborator

Wouldn't it be more correct to decrement by the value in stats? This seems approximately correct but could drift over time.

Owner Author

done. changed to decrement.

Comment thread gocached/gocached.go Outdated
// move on. No delta, no Blob row, no file.
continue
}
// Each Action contributes (1, storedSize) to the Blobs⋈Actions

Collaborator

Oh, is the ⋈ symbol for inner join or left join? Let's be more explicit to prevent confusion.

Owner Author

deleted

The usageStats() LEFT JOIN every 5 minutes didn't scale. At 272M
Actions it took ~7 minutes per call, pinning a reader snapshot ~60% of
the time. The WAL grew to 52 GB because SQLite's PASSIVE
autocheckpoint could never advance past the snapshot; walFindFrame
pegged CPU and go-cacher clients stalled. Startup also blocked on
usageStats + cleanOldObjects for 10+ minutes. cleanCandidates was the
same shape: GROUP BY scaling with total rows.

This redoes the usage stats + LRU cleanup pipeline.

Shard the histogram, persist it, refresh in the background:

  * New BlobShardStats(Prefix, ScannedAt, StatsJSON) table.
  * Shards keyed on a SHA256 hex prefix (--shard-prefix-len=1..4;
    256 shards at the default two-digit prefix).
  * runShardStatsLoop picks the oldest shard, sleeps until it crosses
    shardStalenessTarget (10m), then rescans.
  * Per-shard scan is one SUM(CASE WHEN…) query scoped by
    SHA256 >= ? AND SHA256 < ?. No row streaming.
  * On restart, loadShardStats seeds lastUsage from the persisted rows.
    The server is usable right away, without a blocking startup step.
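
The scheduling step in the list above (pick the oldest shard, sleep until it goes stale) can be sketched roughly as follows. This is an illustration under assumptions, not the loop from this commit: `nextScan` and `example` are hypothetical names, and the in-memory map stands in for the persisted ScannedAt values.

```go
package main

import (
	"fmt"
	"time"
)

// shardStalenessTarget mirrors the 10m target named in the commit message.
const shardStalenessTarget = 10 * time.Minute

// nextScan returns the shard with the oldest scan time and how long to
// wait before it crosses the staleness target. A zero time means the
// shard was never scanned, so it is due immediately.
func nextScan(scannedAt map[string]time.Time, now time.Time) (prefix string, wait time.Duration) {
	found := false
	var oldest time.Time
	for p, t := range scannedAt {
		// Ties break on prefix so the choice is deterministic.
		if !found || t.Before(oldest) || (t.Equal(oldest) && p < prefix) {
			prefix, oldest, found = p, t, true
		}
	}
	if !found {
		return "", shardStalenessTarget // nothing to scan yet
	}
	wait = shardStalenessTarget - now.Sub(oldest)
	if wait < 0 {
		wait = 0
	}
	return prefix, wait
}

// example runs nextScan on three made-up shards.
func example() (string, time.Duration) {
	now := time.Date(2026, 6, 2, 12, 0, 0, 0, time.UTC)
	scannedAt := map[string]time.Time{
		"00": now.Add(-3 * time.Minute),
		"01": now.Add(-8 * time.Minute),
		"02": now.Add(-1 * time.Minute),
	}
	return nextScan(scannedAt, now)
}

func main() {
	prefix, wait := example()
	fmt.Println(prefix, wait)
}
```

In this sketch the wait depends only on the age of the oldest shard, so shard "01" (scanned 8 minutes ago) is picked with a 2 minute wait regardless of how much traffic the cache is serving.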

Action-LRU cleanup driven by idx_actions_access:

  * evictOldestActions walks the access-time index for the N oldest
    stale Actions, deletes them, orphan-deletes Blobs.
  * INDEXED BY locks the plan; TestEvictionQueryPlan{,_atScale} asserts
    via EXPLAIN QUERY PLAN.
  * Action-LRU instead of Blob-LRU: a stale Action is evicted even if
    its Blob has fresher siblings. Equivalent for the common 1:1 case;
    stricter LRU on shared content.
  * One short Tx per 200-row batch on a 1s tick. No multi-minute reader
    snapshots.
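
The Action-LRU behavior described above can be modeled in memory. The sketch below is not gocached's storage layer: the `cache` type and its methods are hypothetical, and it sorts a slice where the real code walks idx_actions_access. It only illustrates that a stale Action is evicted even when its Blob has a fresher sibling, and that a Blob is deleted once its refcount reaches zero.

```go
package main

import (
	"fmt"
	"sort"
)

// action is a simplified stand-in for an Actions row.
type action struct {
	id         string
	blob       string // key of the Blob this Action references
	lastAccess int64  // Unix seconds
}

// cache is an in-memory model with a per-blob reference count.
type cache struct {
	actions  map[string]action
	blobRefs map[string]int
}

func newCache() *cache {
	return &cache{actions: map[string]action{}, blobRefs: map[string]int{}}
}

// put stores an Action and takes a reference on its blob.
func (c *cache) put(a action) {
	if old, ok := c.actions[a.id]; ok {
		c.unref(old.blob)
	}
	c.actions[a.id] = a
	c.blobRefs[a.blob]++
}

// unref drops one reference and reports whether the blob became an orphan.
func (c *cache) unref(blob string) bool {
	c.blobRefs[blob]--
	if c.blobRefs[blob] <= 0 {
		delete(c.blobRefs, blob)
		return true
	}
	return false
}

// evictOldest deletes up to n Actions last accessed before cutoff, oldest
// first, and returns the blobs orphaned as a result.
func (c *cache) evictOldest(n int, cutoff int64) (orphans []string) {
	var stale []action
	for _, a := range c.actions {
		if a.lastAccess < cutoff {
			stale = append(stale, a)
		}
	}
	sort.Slice(stale, func(i, j int) bool {
		if stale[i].lastAccess != stale[j].lastAccess {
			return stale[i].lastAccess < stale[j].lastAccess
		}
		return stale[i].id < stale[j].id
	})
	if len(stale) > n {
		stale = stale[:n]
	}
	for _, a := range stale {
		delete(c.actions, a.id)
		if c.unref(a.blob) {
			orphans = append(orphans, a.blob)
		}
	}
	return orphans
}

func main() {
	c := newCache()
	c.put(action{id: "a1", blob: "X", lastAccess: 100}) // stale, shares X
	c.put(action{id: "a2", blob: "X", lastAccess: 900}) // fresh sibling
	c.put(action{id: "a3", blob: "Y", lastAccess: 200}) // stale, sole owner of Y
	fmt.Println(c.evictOldest(200, 500), len(c.actions))
}
```

Here a1 and a3 are evicted; blob X survives because a2 still references it, while blob Y is returned as an orphan.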

Dead-reckon PUTs and evictions between scans:

  * Per-shard mutex-guarded counters track (count, bytes) added or
    removed since the shard was last persisted.
  * cleanupTick includes the delta in the size pressure check so a
    burst of PUTs between scans triggers cleanup immediately.
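
As a rough illustration of the dead-reckoning check, the estimate is the last scanned total plus whatever has changed since. The types and names below (`shardUsage`, `estimatedBytes`, `overBudget`) are hypothetical, and the byte limit is a made-up number.

```go
package main

import "fmt"

// shardUsage pairs a shard's last scanned byte total with the bytes added
// (positive) or removed (negative) since that scan.
type shardUsage struct {
	scannedBytes int64
	deltaBytes   int64
}

// estimatedBytes dead-reckons current usage: scanned totals plus deltas.
func estimatedBytes(shards []shardUsage) int64 {
	var total int64
	for _, s := range shards {
		total += s.scannedBytes + s.deltaBytes
	}
	return total
}

// overBudget reports whether the estimate exceeds the size limit.
func overBudget(shards []shardUsage, maxBytes int64) bool {
	return estimatedBytes(shards) > maxBytes
}

func main() {
	shards := []shardUsage{
		{scannedBytes: 600, deltaBytes: 0},
		{scannedBytes: 300, deltaBytes: 250}, // burst of PUTs since the last scan
	}
	fmt.Println(estimatedBytes(shards), overBudget(shards, 1000))
}
```

Without the delta term the scanned totals sum to 900 and stay under the 1000-byte limit; with it the estimate is 1150, so cleanup would trigger before the next scan.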

Observability: /usage shows cohort table, shard freshness histogram,
per-shard scan duration p25/p50/p90; new gauges
gocached_shard_stats_{unscanned,oldest_age_seconds}, pending blob
count/bytes; new histogram gocached_shard_scan_duration_seconds
replayed from persisted data at startup; new counter
gocached_evicted_actions.

Schema migration is additive (schemaVersion stays 4).

Perf test (TestPerfQueries) always runs but defaults to 100 rows so
normal `go test` is sub-second. Operators set --performance-test-rows
larger for real scale; seed reused across runs via
--performance-test-dir. Sample, cumulative across rows:

    rows     DB    scanShard    evict   legacy GROUP BY   ratio  usageStats
      1K   396K        188µs    5.2ms             648µs    0.1x       321ns
     10K   3.4M        248µs    5.5ms             6.4ms    1.2x       227ns
    100K    35M        1.8ms    6.1ms              73ms     12x       328ns
      1M   349M         22ms    7.5ms             763ms    102x       231ns
      2M   698M         51ms    8.2ms             1.6s     196x       261ns
      5M   1.8G        162ms    8.7ms             4.1s     475x       224ns
     10M   3.5G        349ms    9.0ms             8.3s     922x       249ns

usageStats and evictOldestActions are flat in N. scanShard scales with
per-shard rows. The "legacy GROUP BY" column is the cleanCandidates
query gocached used before this branch (since deleted); it scales
linearly with total rows, the behavior that was wedging prod at
250M.

Updates tailscale/corp#42670

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
@bradfitz bradfitz merged commit d48e363 into main Jun 2, 2026
3 checks passed