gbrain import makes per-file OpenAI embedding calls (~2s/file); batching would give 10-50× speedup #1207

@pennyultima

Observation

gbrain import against ~2000 staged markdown files runs at ~2.1 seconds per file. After 35 minutes of wall-clock time it had embedded 1000 of 1989 files:

~/.gbrain/import-checkpoint.json:
{
  "totalFiles": 1989,
  "processedIndex": 1000,
  "completedFiles": 1000,
  "timestamp": "2026-05-19T19:30:05.008Z"   ← 35 min after start
}

2100s ÷ 1000 files = 2.1 s/file.

That number has the shape of one OpenAI embedding API round-trip (text-embedding-3-large or similar) per file. Local CPU work on each file is negligible by comparison, so throughput appears to be bottlenecked almost entirely on serial calls to OpenAI's /v1/embeddings endpoint.

What gstack expected

gstack's bin/gstack-gbrain-sync.ts:1063 cites an estimate:

"Per ED2: ~25-35 min for ~11.7K transcripts = ~150ms/page synchronous"

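150ms/page would be plausible if embedding calls were batched (per-batch round-trip cost amortized over many files). 2100ms/page is what you see when each file is a single API call. At that rate the same ~11.7K transcripts would take roughly 6.8 hours (11,700 × 2.1 s ≈ 24,570 s) instead of 25-35 minutes.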

The fix

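OpenAI's /v1/embeddings accepts an array of up to 2048 inputs per request. Each input is capped at the model's per-input limit (8192 tokens for the text-embedding-3 models), and the API reference also documents a cap of 300,000 tokens summed across all inputs in a single request. Batching the embedding step at the chunk level (or even at the file level, for short files) yields the kind of speedup that lets a one-time gbrain import of a personal archive finish in minutes instead of hours: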

| Batch size | Embedding calls for 1989 files | Time @ ~2s/call |
|---|---|---|
| 1 (current) | 1989 | ~66 min (35 min observed for the first 1000 files) |
| 32 | 63 | ~2 min |
| 256 | 8 | ~16 s |

(Rough; assumes one chunk per file for the table — adjust for multi-chunk files.)
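
The arithmetic behind the table can be reproduced with a few lines of TypeScript. This is only a back-of-envelope sketch: `estimateImport` is a made-up helper name, and it assumes one chunk per file, strictly serial requests, and a fixed ~2 s round-trip regardless of batch size (larger batches will likely take somewhat longer per call in practice).

```typescript
// Back-of-envelope estimate: one embedding request per batch, issued serially.
// Assumes one chunk per file and a constant round-trip time per request.
function estimateImport(totalFiles: number, batchSize: number, secondsPerCall = 2) {
  const calls = Math.ceil(totalFiles / batchSize);
  return { calls, seconds: calls * secondsPerCall };
}

for (const batchSize of [1, 32, 256]) {
  const { calls, seconds } = estimateImport(1989, batchSize);
  console.log(`batch=${batchSize} calls=${calls} time=${(seconds / 60).toFixed(1)} min`);
}
```

Swapping in a different file count or measured round-trip time gives the corresponding estimate for other archives.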

Why this matters now

The personal-brain use case has gotten an order of magnitude bigger in the last month — e.g. one user (me) just landed ~166k Gmail markdown files in ~/brain/email/ after wiring the iCloud + Gmail LaunchAgents. At 2.1s/file, that's a 97-hour initial embed at sequential rate. Multi-day. With batching at size 256, the same archive embeds in ~22 minutes.

The 35-minute SIGTERM wall in gstack (filed separately at garrytan/gstack#1611) is partially a downstream symptom of this — if gbrain batched, the wall would stop biting big-brain users.

Where to look

I haven't poked at gbrain's embedder pipeline directly, but a quick grep for the OpenAI client call site in src/embedding/ (or wherever the per-chunk loop lives) would be the place to start. Two possible paths:

  1. Tactical: wrap the per-call site in a batcher that collects N items then flushes (back-pressure-safe via a bounded queue).
  2. Structural: rewrite the import stream to fan inputs into a batched embedder with a configurable batch size (env: GBRAIN_EMBEDDING_BATCH_SIZE, default 256).

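Either way, the same code path needs to respect OpenAI's token budget per request (the API reference documents 300,000 tokens summed across all inputs; a conservative default of around 100K leaves headroom for token-estimate error).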
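
A minimal sketch of what a batcher bounded by both item count and token budget could look like. Everything named here is hypothetical: `Chunk`, `EmbedBatchFn`, `planBatches`, and `embedAll` are illustrative and not gbrain's actual types or functions, `embed` stands in for whatever wraps the OpenAI client call, and the ~4-characters-per-token estimate is a rough assumption rather than a real tokenizer. The default token budget is likewise an assumption, chosen to stay well below the per-request cap.

```typescript
// Hypothetical shapes; gbrain's real chunk and embedder types are not known here.
interface Chunk { id: string; text: string }
type EmbedBatchFn = (inputs: string[]) => Promise<number[][]>;

// Crude token estimate (~4 characters per token). An assumption, not a tokenizer.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Group chunks so each batch stays within an item cap and a token budget.
// A single chunk larger than maxTokens still gets a batch of its own.
function planBatches(chunks: Chunk[], maxItems = 256, maxTokens = 100000): Chunk[][] {
  const batches: Chunk[][] = [];
  let current: Chunk[] = [];
  let tokens = 0;
  for (const chunk of chunks) {
    const t = approxTokens(chunk.text);
    const full = current.length >= maxItems || tokens + t > maxTokens;
    if (current.length > 0 && full) {
      batches.push(current);
      current = [];
      tokens = 0;
    }
    current.push(chunk);
    tokens += t;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// One embed call per batch; vectors are matched back to chunks by position.
async function embedAll(
  chunks: Chunk[],
  embed: EmbedBatchFn,
  maxItems = 256,
  maxTokens = 100000,
): Promise<Map<string, number[]>> {
  const vectors = new Map<string, number[]>();
  for (const batch of planBatches(chunks, maxItems, maxTokens)) {
    const result = await embed(batch.map((c) => c.text));
    if (result.length !== batch.length) {
      throw new Error(`expected ${batch.length} vectors, got ${result.length}`);
    }
    batch.forEach((c, i) => vectors.set(c.id, result[i]));
  }
  return vectors;
}
```

The proposed GBRAIN_EMBEDDING_BATCH_SIZE setting would map onto `maxItems`. This sketch plans batches over an in-memory array for simplicity; a streaming import would need the bounded-queue variant described in the tactical option.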

Environment

  • macOS 15.x, Apple Silicon
  • gbrain v0.33.1.0
  • gstack v1.40.0.0 (invoking gbrain import via gstack-memory-ingest --bulk)
  • Engine: Supabase Session Pooler (postgres)
  • OPENAI_API_KEY present in env; OpenAI API responding normally

Verifying the diagnosis

If anyone wants to confirm this is single-call (not some other pathology), run with OPENAI_LOG=debug or wrap the embedder call in a logger that prints per-batch input count. A "1 input per call" log == this issue.
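
For the logger route, a wrapper along these lines would do. It is a sketch under assumptions: `EmbedFn` and `withInputCountLog` are hypothetical names, and it presumes the embedder can be reached as a function that takes an array of input strings and returns one vector per input.

```typescript
// Hypothetical embedder signature: an array of inputs in, one vector per input out.
type EmbedFn = (inputs: string[]) => Promise<number[][]>;

// Wraps any embed function and reports how many inputs each call carries.
function withInputCountLog(
  embed: EmbedFn,
  log: (msg: string) => void = console.error,
): EmbedFn {
  let calls = 0;
  return (inputs) => {
    calls += 1;
    log(`embed call #${calls}: ${inputs.length} input(s)`);
    return embed(inputs);
  };
}
```

If every logged line reports a single input while the call counter tracks the number of files processed, the per-file diagnosis holds.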

Happy to PR the batcher if it's a welcome change.
