feat: opt-in ETag read cache for github storage #6

Open
openrijal wants to merge 3 commits into awecode:main from openrijal:feat/github-etag-cache

@openrijal

Stacked on #3 — please merge #3 first. Until #3 lands, the diff here shows both #3's commits and the new cache commit (`6c3103f`). After #3 merges, only the cache commit will remain in the diff.

Summary

Adds an in-process LRU read cache (capacity 64 entries) keyed by `owner/repo@ref:path`, storing the parsed JSON alongside the `ETag` from the GitHub response. When enabled, subsequent reads of a cached resource send `If-None-Match` and short-circuit on 304, reusing the cached JSON. GitHub does not count a 304 response to an authenticated conditional request against the primary rate limit, so this saves both latency and quota on repeated reads of the same resource, such as admin list re-renders or navigation between detail and list views.
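The cache shape described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `EtagLruCache`, `cacheKey`, `readWithCache`, and the `Fetcher` type are hypothetical names, and the HTTP call is injected so the example stays self-contained. It relies on `Map` preserving insertion order to get LRU behaviour.

```ts
// Hypothetical sketch; names and shapes are assumptions, not the PR's implementation.
type CacheEntry = { etag: string; data: unknown }

// Key format taken from the PR description: owner/repo@ref:path
function cacheKey(owner: string, repo: string, ref: string, path: string): string {
  return `${owner}/${repo}@${ref}:${path}`
}

class EtagLruCache {
  private entries = new Map<string, CacheEntry>()
  private capacity: number

  constructor(capacity = 64) {
    this.capacity = capacity
  }

  get(key: string): CacheEntry | undefined {
    const entry = this.entries.get(key)
    if (entry) {
      // Re-insert so this key becomes the most recently used.
      this.entries.delete(key)
      this.entries.set(key, entry)
    }
    return entry
  }

  set(key: string, entry: CacheEntry): void {
    this.entries.delete(key)
    this.entries.set(key, entry)
    if (this.entries.size > this.capacity) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined) this.entries.delete(oldest)
    }
  }

  delete(key: string): void {
    this.entries.delete(key)
  }

  get size(): number {
    return this.entries.size
  }
}

// Minimal response shape so the sketch does not depend on a real HTTP client.
type FetchResult = { status: number; etag?: string; json?: unknown }
type Fetcher = (headers: Record<string, string>) => Promise<FetchResult>

async function readWithCache(cache: EtagLruCache, key: string, fetcher: Fetcher): Promise<unknown> {
  const cached = cache.get(key)
  const headers: Record<string, string> = {}
  if (cached) headers['If-None-Match'] = cached.etag

  const res = await fetcher(headers)
  if (res.status === 304 && cached) return cached.data // unchanged: reuse parsed JSON
  if (res.status !== 200) throw new Error(`Unexpected status ${res.status} for ${key}`)
  // Only successful responses with an ETag are stored; errors and 404s never are.
  if (res.etag) cache.set(key, { etag: res.etag, data: res.json })
  return res.json
}
```

Storing the parsed JSON rather than the raw body means a 304 skips both the download and the parse.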

Disabled by default at every layer.

Enabling

Globally (recommended for trusted deployments where you own all writes):

```ts
// nuxt.config.ts
export default defineNuxtConfig({
  runtimeConfig: {
    autoadmin: {
      github: { cacheReads: true },
    },
  },
})
```

Per-resource (overrides the global default):

```ts
register({
  kind: 'array',
  key: 'blogs',
  storage: {
    kind: 'github',
    owner: 'me',
    repo: 'cms',
    path: 'data/blogs.json',
    cacheReads: true,
  },
  // ...
})
```
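One plausible way to express the precedence between the two settings is shown below. `resolveCacheReads` is a hypothetical helper, and the assumption that an explicit per-resource `false` also overrides a global `true` is mine, inferred from "overrides the global default".

```ts
// Hypothetical helper; the PR may resolve the flag differently.
function resolveCacheReads(resourceFlag?: boolean, globalFlag?: boolean): boolean {
  // The per-resource value wins when set, then the global value, then off.
  return resourceFlag ?? globalFlag ?? false
}
```

With this shape, a resource that sets `cacheReads: false` stays uncached even when the global flag is on.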

Why opt-in, not always-on

  • Module-scoped state is undesirable in multi-tenant shared isolates.
  • A stale cache could hide a manual repo edit (e.g. someone edits the JSON in GitHub's web UI between admin reads).
  • Some deployments deliberately want every read to hit GitHub for audit reasons.

Writes invalidate the cached entry whether they succeed or fail with a 409 conflict, and they do so regardless of the flag, so the cache stays consistent in either mode. The flag only controls whether reads populate and consult the cache.
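The split between the gated read path and the ungated write path might look like the sketch below. `readCache`, `populateOnRead`, and `writeAndInvalidate` are hypothetical names, and the write itself is passed in as a callback so that no real GitHub call is assumed.

```ts
// Hypothetical sketch of the gate and the invalidation rule; not the PR's code.
type Entry = { etag: string; data: unknown }
type WriteResult = { status: number }

const readCache = new Map<string, Entry>()

// Reads only populate the cache when the flag is on.
function populateOnRead(key: string, entry: Entry, cacheReads: boolean): void {
  if (cacheReads) readCache.set(key, entry)
}

// Writes drop the entry on success or on a 409 conflict, whatever the flag says.
async function writeAndInvalidate(
  key: string,
  put: () => Promise<WriteResult>,
): Promise<WriteResult> {
  const res = await put()
  const succeeded = res.status >= 200 && res.status < 300
  if (succeeded || res.status === 409) readCache.delete(key)
  return res
}
```

Invalidating on 409 matters because a conflict means the file changed upstream, so the cached copy is known to be stale.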

What it does NOT do

  • No TTL. Entries live until they are evicted by LRU or invalidated by a write. Every cached read is still revalidated with a conditional request, and a 304 costs no rate-limit budget, so a TTL would only force re-downloads of unchanged content. If you want time-based eviction, call `clearGithubReadCache()` from your own scheduler.
  • No cross-isolate sharing. Each worker/isolate has its own cache. For Cloudflare Pages this means warm isolates benefit, cold starts don't.
  • Does not cache writes, error responses, or 404s.
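For deployments that do want time-based eviction, one option is a lazy check that runs on each request instead of a timer, which also works where long-lived timers are unavailable. `createPeriodicClearer` is a hypothetical helper; only `clearGithubReadCache()` comes from this PR, and it is passed in as a callback here to keep the sketch self-contained.

```ts
// Hypothetical helper: clears at most once per interval, checked lazily on each call.
function createPeriodicClearer(
  clear: () => void,
  intervalMs: number,
  now: () => number = Date.now, // injectable clock, mainly for tests
): () => boolean {
  let lastCleared = now()
  return function maybeClear(): boolean {
    if (now() - lastCleared < intervalMs) return false
    clear()
    lastCleared = now()
    return true
  }
}
```

In an app this could be wired up as `const maybeClear = createPeriodicClearer(clearGithubReadCache, 5 * 60 * 1000)` and called at the start of each request handler.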

API additions (all optional)

  • `GithubReadOptions.cacheReads?: boolean`
  • `GithubJsonRepositoryOptions.cacheReads?: boolean`
  • `JsonStorageConfig` (github variant): `cacheReads?: boolean`
  • `runtimeConfig.autoadmin.github.cacheReads?: boolean`
  • `clearGithubReadCache(): void` exported from `server/utils/githubContents.ts` for tests

Test plan

  • Standalone typecheck clean (only Nitro auto-imports flagged).
  • Manual: enable, hit the same endpoint twice, confirm second response is 304 from GitHub.
  • Manual: write a row, then read — confirm cache is bypassed and new content is returned.
  • CI / lint on this repo.

openrijal and others added 3 commits May 20, 2026 09:37
The GitHub Contents API only returns the `content` field for files
of 1 MB or smaller. For files between 1 and 100 MB it responds with
`type: 'file'` and `encoding: 'none'` but an empty `content`, causing
the existing guard to throw `GitHub response is not a single file with content.`

When `content` is missing, fetch the blob by sha via the Git Blobs API
(`GET /repos/{owner}/{repo}/git/blobs/{sha}`), which returns base64
content for blobs up to 100 MB, then decode as before.
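
A rough sketch of the fallback described above, assuming Node's `Buffer` is available. `needsBlobFallback`, `blobUrl`, `loadBase64Content`, and `decodeBase64Json` are hypothetical names, and the HTTP call is injected as `getJson` so that no particular client is implied.

```ts
// Hypothetical sketch of the Blobs API fallback; not the code in this commit.
type ContentsBody = { type: string; encoding?: string; content?: string; sha: string; size: number }
type BlobBody = { content: string; encoding: string }

// True when the Contents API returned a file entry without inline content.
function needsBlobFallback(body: ContentsBody): boolean {
  return body.type === 'file' && !body.content
}

function blobUrl(owner: string, repo: string, sha: string): string {
  return `https://api.github.com/repos/${owner}/${repo}/git/blobs/${sha}`
}

async function loadBase64Content(
  owner: string,
  repo: string,
  body: ContentsBody,
  getJson: (url: string) => Promise<BlobBody>,
): Promise<string> {
  if (body.type !== 'file') throw new Error('GitHub response is not a single file.')
  if (body.content) return body.content
  // Larger file: fetch the same blob by sha instead.
  const blob = await getJson(blobUrl(owner, repo, body.sha))
  if (blob.encoding !== 'base64') throw new Error(`Unexpected blob encoding: ${blob.encoding}`)
  return blob.content
}

// GitHub wraps base64 content with newlines; Buffer ignores that whitespace.
function decodeBase64Json(base64: string): unknown {
  return JSON.parse(Buffer.from(base64, 'base64').toString('utf8'))
}
```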

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds on the Blobs API fallback with three operational improvements:

1. `maxBytes` / `warnAtBytes` size guardrail. New per-resource
   `storage.maxBytes` throws a 413 with the actual byte count when a
   read or write exceeds it; `warnAtBytes` logs `console.warn` once per
   path. Enforced against the Contents API's `body.size` on reads and
   `Buffer.byteLength(payload.content, 'base64')` on writes.

2. Locator-prefixed error messages. Every `createError` now embeds
   `owner/repo:path[@ref]` and, where relevant, the file size or short
   blob sha. This matters in serverless logs where the original
   request context is otherwise lost.

3. Explicit narrowing for `base64Content`. Replaces the post-fallback
   non-null assertion with a typed check, and surfaces empty-file
   decode as a clear 422 instead of the previous misleading
   "not valid JSON".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
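
The size guardrail from item 1 of the commit message above could be sketched as follows. This is an assumption-laden illustration: `enforceSize` and `base64ByteLength` are hypothetical names, and a plain `Error` carrying a `statusCode` stands in for the `createError` call the commit mentions.

```ts
// Hypothetical sketch of the maxBytes / warnAtBytes guardrail; not the commit's code.
type SizeLimits = { maxBytes?: number; warnAtBytes?: number }

const warnedLocators = new Set<string>()

function enforceSize(
  locator: string, // e.g. owner/repo:path[@ref], as in the error messages
  bytes: number,
  limits: SizeLimits,
  warn: (message: string) => void = console.warn,
): void {
  if (limits.maxBytes !== undefined && bytes > limits.maxBytes) {
    // Stand-in for a 413 error that reports the actual byte count.
    throw Object.assign(
      new Error(`${locator}: ${bytes} bytes exceeds maxBytes (${limits.maxBytes})`),
      { statusCode: 413 },
    )
  }
  if (limits.warnAtBytes !== undefined && bytes > limits.warnAtBytes && !warnedLocators.has(locator)) {
    warnedLocators.add(locator) // warn only once per path
    warn(`${locator}: ${bytes} bytes exceeds warnAtBytes (${limits.warnAtBytes})`)
  }
}

// On writes, the size is the decoded length of the base64 payload.
function base64ByteLength(base64Content: string): number {
  return Buffer.byteLength(base64Content, 'base64')
}
```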
Adds an in-process LRU read cache (cap 64) keyed by
`owner/repo@ref:path`, storing parsed JSON alongside the response
ETag. When enabled, reads send `If-None-Match` on subsequent requests
and short-circuit on 304 — GitHub returns 304 without spending
rate-limit budget, so this is a meaningful latency + quota win on
repeated reads of the same resource (e.g. admin list re-renders).

**Disabled by default at every layer.** Enable globally via
`runtimeConfig.autoadmin.github.cacheReads = true` or per-resource via
`storage.cacheReads = true`. Per-resource takes precedence.

Opt-in rather than always-on because module-scoped state is undesirable
in multi-tenant shared isolates and a stale read could hide a manual
repo edit. Both successful and conflicting writes invalidate the
cached entry, so the cache code is safe to leave in even when
`cacheReads` is `false` (the gate ensures the cache is never
populated in that case).

Also exports `clearGithubReadCache()` for tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>