Commit 6d7512b

docs: bare-repo invariant in deploy, runbook, local setup
- docs/operations/deploy.md — boot sequence describes git clone --bare, reconcile state machine now references update-ref / merge-tree / commit-tree plumbing. Env table note for CFP_DATA_REPO_PATH calls out emptyDir + re-clone-on-boot.
- docs/operations/runbook.md — recovery for "API won't boot" drops the delete-PVC step (no PVC for data). The "Drop into the pod" snippet inspects the bare repo via git --git-dir / `git show HEAD:.gitsheets`.
- .claude/CLAUDE.md — Local setup section: contributors clone --bare; optional second clone for a working-tree browser; Pod boot bullet reflects bare clone + emptyDir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d808c2a commit 6d7512b

3 files changed

Lines changed: 59 additions & 46 deletions

File tree

.claude/CLAUDE.md

Lines changed: 6 additions & 5 deletions
@@ -100,10 +100,11 @@ Two background concerns keep that store in sync with the world:
100100
## Local setup
101101

102102
1. `asdf install` — picks up Node from `.tool-versions`
103-
2. Clone the data repo as a sibling: `git clone git@github.com:CodeForPhilly/codeforphilly-data.git ../codeforphilly-data` (checkout `fixture` for a small seed, or `published` for the full laddr import)
104-
3. `cp .env.example .env` and edit — point `CFP_DATA_REPO_PATH` at your sibling clone (absolute path recommended; relative paths resolve from `apps/api/`, not repo root)
105-
4. `npm install`
106-
5. `npm run dev` — api + web concurrently
103+
2. **Bare-clone** the data repo as a sibling: `git clone --bare git@github.com:CodeForPhilly/codeforphilly-data.git ../codeforphilly-data` (use `--branch fixture` for a small seed, or `--branch published` for the full laddr import). The app reads via gitsheets' tree-object interface and *requires* bare — see [specs/behaviors/storage.md](../specs/behaviors/storage.md) → "The data clone is bare."
104+
3. *(optional)* If you want a working tree to browse/edit records by hand, clone *from* your bare into a second directory: `git clone ../codeforphilly-data ../codeforphilly-data-wt`. Push/pull between them; the app doesn't care.
105+
4. `cp .env.example .env` and edit — point `CFP_DATA_REPO_PATH` at your bare sibling clone (absolute path recommended; relative paths resolve from `apps/api/`, not repo root)
106+
5. `npm install`
107+
6. `npm run dev` — api + web concurrently
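
The note in step 4 about relative paths is easy to trip over, so here is a minimal sketch of the resolution rule as the step describes it. `resolve_data_path` is a hypothetical helper written for illustration only; it is not part of the repo, and it assumes the API simply joins a relative value onto `apps/api/`.

```bash
# Hypothetical helper (not in the repo): shows where a CFP_DATA_REPO_PATH
# value ends up, assuming relative values are joined onto <repo-root>/apps/api/.
resolve_data_path() {
  local repo_root=$1 value=$2
  case "$value" in
    /*) printf '%s\n' "$value" ;;                     # absolute: used as-is
    *)  printf '%s\n' "$repo_root/apps/api/$value" ;; # relative: from apps/api/
  esac
}
```

Under that assumption, a sibling clone created with `../codeforphilly-data` from the repo root would need `../../../codeforphilly-data` as a relative value, which is why an absolute path is the safer choice.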
107108

108109
```bash
109110
npm install # install all workspaces
@@ -124,7 +125,7 @@ Typical change flow:
124125
1. **Merge to `main`** — CI builds + tests; nothing deploys yet.
125126
2. **Publish image** (currently manual) — `docker build --platform=linux/amd64 -t ghcr.io/codeforphilly/codeforphilly-ng:sandbox . && docker push …`. Apple-silicon dev machines must set the platform flag — cluster nodes are amd64.
126127
3. **GitOps pickup**: `cfp-sandbox-cluster` projects from our `deploy/kustomize/`; on its own merge, applies via `kubectl apply -k`.
127-
4. **Pod boot** — single replica, `Recreate` strategy. Container entrypoint clones the data repo on first boot (PVC persists across pods). Node boots: env → store load → **reconcile** (ff/rebase/escape-hatch against `origin/<CFP_DATA_BRANCH>`) → **push daemon** → routes → SPA. `/api/health/ready` returns 200 once stores are loaded *and* reconciled.
128+
4. **Pod boot** — single replica, `Recreate` strategy. Container entrypoint bare-clones the data repo on every pod start (the data volume is `emptyDir`, so first boot = every fresh pod). Node boots: env → store load → **reconcile** (ff/replay/escape-hatch against `origin/<CFP_DATA_BRANCH>`) → **push daemon** → routes → SPA. `/api/health/ready` returns 200 once stores are loaded *and* reconciled.
128129
5. **Live data updates** — independent of app deploy. Pushes to `published` trigger the [hot-reload webhook](../docs/operations/runbook.md#hot-reload-webhook); the pod rebuilds in-memory state in place, no restart.
129130

130131
Constraints worth knowing before touching anything deploy-shaped:

docs/operations/deploy.md

Lines changed: 44 additions & 36 deletions
@@ -104,43 +104,49 @@ curl http://localhost:3001/ # SPA index.html
104104
The container entrypoint (`deploy/docker/entrypoint.sh`) only handles the
105105
bits that *must* run before the Node process exists:
106106

107-
- Trusts the PVC mount via `git config --global safe.directory`.
107+
- Trusts the data path via `git config --global safe.directory`.
108108
- Sets a pseudonymous git identity (`CodeForPhilly API
109-
<api@users.noreply.codeforphilly.org>`) for any committer line a future
110-
rebase might write.
111-
- On first pod boot — and only then — does a full-history `git clone` of
112-
`CFP_DATA_REMOTE` into `CFP_DATA_REPO_PATH` when no `.git` directory
113-
exists. On subsequent boots the PVC already holds a clone; no clone is
114-
performed.
115-
- Refreshes `origin`'s URL to whatever `CFP_DATA_REMOTE` is set to (lets
116-
operators rotate the remote without re-cloning the PVC).
117-
- `exec`s the API. That's all — about a dozen lines of shell now.
109+
<api@users.noreply.codeforphilly.org>`) for any committer line the
110+
reconcile's commit-replay path might write.
111+
- On first pod boot — `data` is an `emptyDir`, so this is every fresh
112+
pod — does a `git clone --bare --branch $CFP_DATA_BRANCH` of
113+
`CFP_DATA_REMOTE` into `CFP_DATA_REPO_PATH`. Bare means no working
114+
tree; gitsheets operates on the git object DB directly. See
115+
[`specs/behaviors/storage.md`](../../specs/behaviors/storage.md)
116+
"The data clone is bare."
117+
- Refreshes `origin`'s URL to whatever `CFP_DATA_REMOTE` is set to
118+
(operators can rotate the remote with a pod restart; the new
119+
`emptyDir` re-clones from the new URL).
120+
- `exec`s the API. That's all.
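
The entrypoint steps listed above could look roughly like the sketch below. This is an illustration under stated assumptions, not the contents of `deploy/docker/entrypoint.sh`: the function names `has_bare_repo` and `prepare_data_repo` are hypothetical, and the check for an existing clone is a guess at one reasonable way to make the clone step idempotent.

```bash
# Hypothetical sketch of the boot steps; NOT the real deploy/docker/entrypoint.sh.

# has_bare_repo DIR: succeeds when DIR looks like a bare gitdir
# (HEAD and objects/ sit at the top level; there is no .git subdirectory).
has_bare_repo() {
  [ -f "$1/HEAD" ] && [ -d "$1/objects" ]
}

# prepare_data_repo PATH REMOTE BRANCH: trust the path, set the identity,
# bare-clone when nothing is there yet, then refresh origin's URL.
prepare_data_repo() {
  local path=$1 remote=$2 branch=$3
  git config --global --add safe.directory "$path"
  git config --global user.name "CodeForPhilly API"
  git config --global user.email "api@users.noreply.codeforphilly.org"
  if ! has_bare_repo "$path"; then
    git clone --bare --branch "$branch" "$remote" "$path"
  fi
  git --git-dir="$path" remote set-url origin "$remote"
}
```

A caller would run something like `prepare_data_repo "$CFP_DATA_REPO_PATH" "$CFP_DATA_REMOTE" "$CFP_DATA_BRANCH"` before handing off to the API process. With an `emptyDir` volume the clone branch is taken on every fresh pod.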
118121

119122
Then `exec node apps/api/dist/index.js`. Inside node, `buildApp()` registers
120123
plugins ([apps/api/src/app.ts](../../apps/api/src/app.ts)) in order: env →
121-
CORS → cookies → trace IDs → error mapper → **store** (loads public +
122-
private into memory) → **reconcile** (fetch + ff/rebase/escape-hatch against
124+
CORS → cookies → trace IDs → error mapper → **store** (opens the bare
125+
public clone via `openRepo({ gitDir })`, loads public + private into
126+
memory) → **reconcile** (fetch + ff/rebase-replay/escape-hatch against
123127
origin — see below) → **push daemon** (starts pushing transact'd commits to
124128
`CFP_DATA_REMOTE`) → services (FTS) → rate limit → idempotency → session
125129
middleware → swagger → routes → static SPA. Fastify's `listen()` doesn't
126130
fire until all of those resolve, so once `/api/health/ready` returns 200
127-
both stores have loaded **and** the working tree has been reconciled with
128-
origin.
131+
both stores have loaded **and** local refs have been reconciled with origin.
129132

130133
### Reconciliation state machine
131134

132135
Lives in [`apps/api/src/store/reconcile.ts`](../../apps/api/src/store/reconcile.ts)
133-
and is invoked at boot by the reconcile plugin. Same state machine the
134-
shell used to run, just structured Node so exit codes propagate naturally
135-
and the same code is reusable from the future hot-reload webhook (#65):
136+
and is invoked at boot by the reconcile plugin. Operates entirely on the
137+
object DB via plumbing (`update-ref`, `merge-tree --write-tree`,
138+
`commit-tree`) so it works against the bare clone with no working tree:
136139

137140
- in sync → no-op (`'in-sync'`)
138-
- behind → fast-forward (`'fast-forwarded'`)
141+
- behind → fast-forward via `git update-ref refs/heads/<branch>` (CAS
142+
against old commit) (`'fast-forwarded'`)
139143
- ahead → push (`'pushed-ahead'`; push daemon retries on push failure)
140-
- diverged + clean rebase → rebase + push (`'rebased'`)
141-
- diverged + conflicts → abort rebase, create + push a
142-
`conflicts/<UTC-timestamp>` branch from the pre-rebase HEAD, hard-reset
143-
local to origin (`'conflict-escaped'`; logged at ERROR level so operators
144+
- diverged + clean replay → `merge-tree --write-tree` + `commit-tree`
145+
per local commit on top of remote tip, then `update-ref` + push
146+
(`'rebased'`)
147+
- diverged + replay conflict → preserve pre-replay HEAD on
148+
`conflicts/<UTC-timestamp>`, push it, reset local refs to
149+
remote tip (`'conflict-escaped'`; logged at ERROR level so operators
144150
see it in production logs)
145151
- fetch itself fails (network blip) → log warn, continue with local state
146152
(`'fetch-failed'`)
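
As a rough illustration of the first branches of the list above, the sketch below classifies the local and remote tips and performs a compare-and-swap fast-forward using only plumbing, so it works against a bare gitdir. It is not the code in `apps/api/src/store/reconcile.ts`: the function names are hypothetical, the caller must supply the ref that holds the fetched remote tip (where the app keeps it is not stated here), and the replay and conflict paths are left out.

```bash
# Hypothetical sketch; not the implementation in apps/api/src/store/reconcile.ts.

# classify_state LOCAL REMOTE BASE: prints in-sync | behind | ahead | diverged.
# BASE is the merge base of LOCAL and REMOTE (empty when there is none).
classify_state() {
  local local_sha=$1 remote_sha=$2 base_sha=$3
  if [ "$local_sha" = "$remote_sha" ]; then
    echo in-sync
  elif [ "$base_sha" = "$local_sha" ]; then
    echo behind      # remote has commits we lack: fast-forward
  elif [ "$base_sha" = "$remote_sha" ]; then
    echo ahead       # we have commits the remote lacks: push
  else
    echo diverged    # both sides moved: replay or escape hatch
  fi
}

# reconcile_state GIT_DIR LOCAL_REF REMOTE_REF: resolves both refs in a bare
# clone and classifies them. REMOTE_REF is whatever ref holds the fetched tip.
reconcile_state() {
  local git_dir=$1 local_sha remote_sha base_sha
  local_sha=$(git --git-dir="$git_dir" rev-parse --verify --quiet "$2^{commit}") || return 1
  remote_sha=$(git --git-dir="$git_dir" rev-parse --verify --quiet "$3^{commit}") || return 1
  base_sha=$(git --git-dir="$git_dir" merge-base "$local_sha" "$remote_sha") || base_sha=""
  classify_state "$local_sha" "$remote_sha" "$base_sha"
}

# fast_forward GIT_DIR BRANCH NEW OLD: moves the branch to NEW only if it
# still points at OLD (update-ref's compare-and-swap form).
fast_forward() {
  git --git-dir="$1" update-ref "refs/heads/$2" "$3" "$4"
}
```

Passing the old value to `git update-ref` makes the update fail instead of silently overwriting the ref if something else moved it in the meantime. The replay path relies on `git merge-tree --write-tree`, which requires Git 2.38 or newer.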
@@ -158,24 +164,26 @@ skips reconciliation entirely.
158164

159165
## Data repo on disk
160166

161-
The API operates on a working tree at `/app/data` backed by a PVC. The
162-
entrypoint ensures the working tree exists (cloning on first boot); the
163-
API-side reconcile plugin then synchronizes that tree with `CFP_DATA_REMOTE`
164-
on every boot, and the push daemon pushes commits made during the pod's
165-
lifetime back to the remote.
167+
The API operates on a **bare** clone at `/app/data` backed by an
168+
`emptyDir` volume. The entrypoint clones (`git clone --bare`) on every
169+
pod start since `emptyDir` doesn't survive restarts. Within a pod's
170+
lifetime, the API-side reconcile plugin synchronizes local refs with
171+
`CFP_DATA_REMOTE` (boot reconcile + hot-reload webhook), and the push
172+
daemon pushes commits made during the pod's lifetime back to the
173+
remote.
166174

167175
Implications:
168176

169-
- **PVC contents are durable enough to outlive a single pod**, which lets the
170-
push daemon finish pushing any commits made just before pod terminate.
171-
But the source of truth is the git remote, not the PVC — wiping the PVC
172-
is safe (the next boot re-clones).
177+
- **No PVC for data.** The git remote is the source of truth; the
178+
pod's bare clone is recoverable from there. Pod restart is the
179+
recovery primitive — there's nothing to delete first, and no
180+
Multi-Attach errors during node failover.
173181
- **The deploy key matters.** When `CFP_DATA_REMOTE` is SSH (the
174182
default), the entrypoint relies on `GIT_SSH_COMMAND` (set in the
175-
ConfigMap) pointing at the mounted private key. Rotation: replace the
176-
SealedSecret, restart the pod. See
177-
[secrets.md](secrets.md#data-repo-deploy-key) and the rotation procedure
178-
in [sandbox-deploy.md](sandbox-deploy.md#rotating-the-deploy-key).
183+
ConfigMap) pointing at the mounted private key. Rotation: replace
184+
the SealedSecret, restart the pod. See
185+
[secrets.md](secrets.md#data-repo-deploy-key) and the rotation
186+
procedure in [sandbox-deploy.md](sandbox-deploy.md#rotating-the-deploy-key).
179187

180188
## Bucket provisioning (production)
181189

@@ -212,7 +220,7 @@ comments. Production pod gets these mounted:
212220
| `NODE_ENV` | ConfigMap | `production` |
213221
| `PORT` | ConfigMap | `3001` |
214222
| `HOST` | ConfigMap | `0.0.0.0` |
215-
| `CFP_DATA_REPO_PATH` | ConfigMap | `/app/data` (PVC mount) |
223+
| `CFP_DATA_REPO_PATH` | ConfigMap | `/app/data` — bare gitdir, backed by an `emptyDir`; re-cloned on every pod boot |
216224
| `CFP_DATA_REMOTE` | Secret | git URL (ssh in prod) |
217225
| `CFP_DATA_BRANCH` | ConfigMap | e.g. `fixture` / `main` |
218226
| `CFP_DATA_RELOAD_SECRET` | **Secret** | Shared bearer-token for the hot-reload webhook; when unset the `/api/_internal/reload-data` endpoint returns 503. See [runbook.md](runbook.md#hot-reload-webhook). |

docs/operations/runbook.md

Lines changed: 9 additions & 5 deletions
@@ -23,7 +23,7 @@ Look for one of the four common boot failures:
2323
|------------------|-------|-----|
2424
| `[entrypoint] ERROR: CFP_DATA_REMOTE is unset` | The Secret containing `CFP_DATA_REMOTE` isn't reaching the pod. | Check `kubectl get secret codeforphilly-secrets -o yaml`; verify the SealedSecret in the GitOps repo decrypted successfully (look at the sealed-secrets controller logs). |
2525
| `fatal: could not read Username for 'https://...'` or `Permission denied (publickey)` | Bad/missing data-repo credentials. | Verify the `codeforphilly-data-deploy-key` Secret holds a valid `id_ed25519` whose public key has push access to the data repo. See [secrets.md](secrets.md#data-repo-deploy-key). |
26-
| `Failed to open public gitsheets store` | Working tree corrupt or missing `.gitsheets/` configs. | Exec into the pod, inspect `/app/data/.gitsheets/`. Recovery: `kubectl delete pvc codeforphilly-data -n <ns>`, then trigger a rollout; the entrypoint re-clones from `CFP_DATA_REMOTE`. |
26+
| `Failed to open public gitsheets store` | Bare clone corrupt or missing `.gitsheets/` configs. | Exec into the pod, inspect `/app/data/refs/`, `/app/data/objects/`, and verify `.gitsheets/` exists in HEAD via `git --git-dir=/app/data show HEAD:.gitsheets`. Recovery: restart the pod; `data` is an `emptyDir`, so a fresh pod re-clones from `CFP_DATA_REMOTE` automatically. |
2727
| `Failed to load private store (s3)` | Bucket creds wrong, bucket gone, or network ACL blocks egress. | Confirm `S3_*` env in the ConfigMap + Secret. From the pod, `curl $S3_ENDPOINT` to confirm reachability. |
2828
| `environment variable ... is required` | A required env (`CFP_DATA_REPO_PATH`, `STORAGE_BACKEND`, `CFP_JWT_SIGNING_KEY`) is missing. | Manifest regression. Compare against `deploy/kustomize/base/configmap.yaml` + the GitOps repo's SealedSecret. |
2929

@@ -37,8 +37,10 @@ kubectl -n codeforphilly debug -it deploy/codeforphilly \
3737
From inside:
3838

3939
```bash
40-
# Is the data repo really there?
41-
ls -la /app/data /app/data/.gitsheets
40+
# Is the bare data repo really there? Bare gitdir lives at the path root —
41+
# no .git subdir; expect HEAD, config, objects/, refs/ at the top.
42+
ls -la /app/data
43+
git --git-dir=/app/data show HEAD:.gitsheets
4244

4345
# Are env vars present?
4446
env | grep -E '^(CFP_|S3_|STORAGE_|GITHUB_)' | sort
@@ -71,8 +73,10 @@ kubectl -n codeforphilly-rewrite-sandbox set image \
7173
deploy/codeforphilly codeforphilly=ghcr.io/codeforphilly/codeforphilly-ng:<known-good-tag>
7274
```
7375

74-
Data is **not** in the PVC long-term; it's in the git remote. Deleting the
75-
PVC and letting the entrypoint re-clone is safe.
76+
The bare data clone lives in an `emptyDir` — re-cloned from the git remote on
77+
every pod boot. Pod restart is the recovery primitive; there's no PVC to
78+
delete. (A `codeforphilly-private` PVC still exists for the S3-fallback
79+
private store; only `codeforphilly-data` was retired.)
7680

7781
## "Readiness flapping / 503 spikes"
7882
