Observation
gbrain import against ~2000 staged markdown files runs at ~2.1 seconds per file. After 35 minutes of wall-clock time it had embedded 1000 of 1989 files:
~/.gbrain/import-checkpoint.json:
{
"totalFiles": 1989,
"processedIndex": 1000,
"completedFiles": 1000,
"timestamp": "2026-05-19T19:30:05.008Z" ← 35 min after start
}
2100s ÷ 1000 files = 2.1 s/file.
That number points strongly at one cause: a single OpenAI text-embedding-3-large (or similar) API round-trip per file. CPU work on the file itself is negligible by comparison, so throughput looks bottlenecked almost entirely on serial calls to OpenAI's /v1/embeddings endpoint.
What gstack expected
gstack's bin/gstack-gbrain-sync.ts:1063 cites an estimate:
"Per ED2: ~25-35 min for ~11.7K transcripts = ~150ms/page synchronous"
150ms/page would be plausible if embedding calls were batched (per-batch round-trip cost amortized over many files). 2100ms/page is what you see when each file is a single API call.
The fix
OpenAI's /v1/embeddings accepts up to 2048 inputs per request. Per the API reference, each input is capped at 8192 tokens and the whole request at 300,000 tokens summed across inputs (not ~8M, as I first assumed). Batching the embedding step at the chunk level (or even at the file level, for short files) yields the kind of speedup that lets a one-time gbrain import of a personal archive finish in minutes instead of hours:
| Batch size | Embedding calls for 1989 files | Time @ ~2s/call |
|---|---|---|
| 1 (current) | 1989 | ~66 min |
| 32 | 63 | ~2 min |
| 256 | 8 | ~16 s |
(Rough; assumes one chunk per file for the table — adjust for multi-chunk files.)
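For anyone who wants to redo the arithmetic with their own numbers, here is a minimal sketch. `estimateImport` is a hypothetical helper (not part of gbrain), and it assumes every call takes the same ~2 s regardless of batch size, which is the same simplification the table makes.

```typescript
// Estimate embedding calls and wall-clock time for a given batch size.
// Assumption: each API call costs a flat `secondsPerCall`, independent of
// how many inputs it carries. Real latency will grow somewhat with batch size.
function estimateImport(
  chunkCount: number,
  batchSize: number,
  secondsPerCall = 2,
): { calls: number; seconds: number } {
  const calls = Math.ceil(chunkCount / batchSize);
  return { calls, seconds: calls * secondsPerCall };
}

// Print one row per batch size for the 1989-file run described above.
for (const batchSize of [1, 32, 256]) {
  const { calls, seconds } = estimateImport(1989, batchSize);
  console.log(`batch=${batchSize} calls=${calls} ~${(seconds / 60).toFixed(1)} min`);
}
```

Multi-chunk files only change `chunkCount`; the rest of the estimate stays the same.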
Why this matters now
The personal-brain use case has gotten an order of magnitude bigger in the last month. For example, one user (me) just landed ~166k Gmail markdown files in ~/brain/email/ after wiring the iCloud + Gmail LaunchAgents. At 2.1 s/file, that's a ~97-hour initial embed at the sequential rate, i.e. roughly four days. With batching at size 256, the same archive needs about 649 calls, or ~22 minutes at ~2 s per call.
The 35-minute SIGTERM wall in gstack (filed separately at garrytan/gstack#1611) is partially a downstream symptom of this — if gbrain batched, the wall would stop biting big-brain users.
Where to look
I haven't poked at gbrain's embedder pipeline directly, but a quick grep for the OpenAI client call site in src/embedding/ or wherever the per-chunk loop lives would be the place. Two paths:
- Tactical: wrap the per-call site in a batcher that collects N items then flushes (back-pressure-safe via a bounded queue).
- Structural: rewrite the import stream to fan inputs into a batched embedder with a configurable batch size (env: GBRAIN_EMBEDDING_BATCH_SIZE, default 256).
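Either way, the same code path needs to respect OpenAI's token limits: 8192 tokens per input and, per the API reference, 300,000 tokens summed across all inputs in one request. A default per-request budget somewhat below 300K is the safe choice.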
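To make the tactical path concrete, here is a rough sketch of a batcher that respects both an input-count cap and a token budget. Everything in it is an assumption for illustration: I haven't read gbrain's embedder, so `EmbedBatch`, `BatchLimits`, `planBatches`, and `embedAll` are hypothetical names, and the 4-characters-per-token estimate is a crude stand-in for a real tokenizer.

```typescript
// Hypothetical embedder signature: takes N texts, returns N vectors in order.
// gbrain's real call site may look different.
type EmbedBatch = (inputs: string[]) => Promise<number[][]>;

interface BatchLimits {
  maxInputs: number; // inputs per request (the API rejects arrays above 2048)
  maxTokens: number; // estimated token budget per request
}

// Crude estimate (~4 characters per token). This is an assumption;
// a real implementation should count tokens with the model's tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Greedily pack texts into batches without exceeding either limit.
// A single text larger than maxTokens still gets a batch of its own;
// splitting or rejecting it is left to the caller.
function planBatches(texts: string[], limits: BatchLimits): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let tokens = 0;
  for (const text of texts) {
    const cost = estimateTokens(text);
    const full =
      current.length >= limits.maxInputs || tokens + cost > limits.maxTokens;
    if (current.length > 0 && full) {
      batches.push(current);
      current = [];
      tokens = 0;
    }
    current.push(text);
    tokens += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Embed all texts with one call per batch, preserving input order.
async function embedAll(
  texts: string[],
  embedBatch: EmbedBatch,
  limits: BatchLimits,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of planBatches(texts, limits)) {
    const result = await embedBatch(batch);
    if (result.length !== batch.length) {
      throw new Error(`expected ${batch.length} vectors, got ${result.length}`);
    }
    vectors.push(...result);
  }
  return vectors;
}
```

In this sketch `maxInputs` would come from the proposed GBRAIN_EMBEDDING_BATCH_SIZE setting. Batches are still sent one after another, so this only removes the per-file round-trip; concurrency and rate-limit handling would be a separate change.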
Environment
- macOS 15.x, Apple Silicon
- gbrain v0.33.1.0
- gstack v1.40.0.0 (invoking gbrain import via gstack-memory-ingest --bulk)
- Engine: Supabase Session Pooler (postgres)
- OPENAI_API_KEY present in env; OpenAI API responding normally
Verifying the diagnosis
If anyone wants to confirm this is single-call (not some other pathology), run with OPENAI_LOG=debug or wrap the embedder call in a logger that prints per-batch input count. A "1 input per call" log == this issue.
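The logger route could look something like the sketch below. It assumes the embedder can be expressed as a function from an array of texts to an array of vectors; `EmbedFn` and `withCallLogging` are hypothetical names, not gbrain APIs.

```typescript
// Hypothetical embedder signature; gbrain's actual function may differ.
type EmbedFn = (inputs: string[]) => Promise<number[][]>;

// Wrap an embedder so every call logs how many inputs it carried
// and how long the round-trip took.
function withCallLogging(
  embed: EmbedFn,
  log: (line: string) => void = console.log,
): EmbedFn {
  let callNumber = 0;
  return async (inputs) => {
    callNumber += 1;
    const n = callNumber;
    const started = Date.now();
    const vectors = await embed(inputs);
    log(`embed call #${n}: ${inputs.length} input(s), ${Date.now() - started} ms`);
    return vectors;
  };
}
```

If every logged line reports `1 input(s)` with a latency near 2 s, that matches the diagnosis above.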
Happy to PR the batcher if it's a welcome change.