`gbrain import` makes per-file OpenAI embedding calls (~2s/file); batching would give 10-50× speedup

## Observation

`gbrain import` against ~2000 staged markdown files runs at ~2.1 seconds per file. After 35 minutes of wall-clock time, the checkpoint showed 1000 of 1989 files embedded:

```
~/.gbrain/import-checkpoint.json:
{
  "totalFiles": 1989,
  "processedIndex": 1000,
  "completedFiles": 1000,
  "timestamp": "2026-05-19T19:30:05.008Z"   ← 35 min after start
}
```

2100s ÷ 1000 files = **2.1 s/file**.

That number looks like one OpenAI `text-embedding-3-large` (or similar) API round-trip per file, made serially. CPU work on the file itself is negligible by comparison, so throughput appears to be bottlenecked almost entirely on serial calls to OpenAI's `/v1/embeddings` endpoint.

## What gstack expected

gstack's `bin/gstack-gbrain-sync.ts:1063` cites an estimate:

> "Per ED2: ~25-35 min for ~11.7K transcripts = ~150ms/page synchronous"

150ms/page would be plausible if embedding calls were batched (per-batch round-trip cost amortized over many files). 2100ms/page is what you see when each file is a single API call.

## The fix

OpenAI's `/v1/embeddings` accepts an array of **up to 2048 inputs per request**, with a documented cap of 300,000 tokens summed across all inputs in a request (and 8192 tokens per individual input). Batching the embedding step at the chunk level (or even at the file level, for short files) yields the kind of speedup that lets a one-time `gbrain import` of a personal archive finish in minutes instead of hours:

| Batch size | Embedding calls for 1989 files | Time @ ~2s/call |
|---|---:|---:|
| 1 (current) | 1989 | ~66 min |
| 32 | 63 | ~2 min |
| 256 | 8 | ~16 s |

(Rough: the table assumes one chunk per file and that a batched call still takes ~2s. Adjust for multi-chunk files and for larger requests taking somewhat longer per call.)
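
For anyone who wants to redo the table with their own numbers, the arithmetic is sketched below. `estimateImport` is a hypothetical helper written for this issue, not gbrain code, and it assumes one chunk per file and a flat per-call latency, which is optimistic for large batches.

```typescript
// Back-of-envelope estimate of embedding calls and wall-clock time.
// Assumptions: one chunk per file, flat latency per call regardless of batch size.
function estimateImport(
  files: number,
  batchSize: number,
  secondsPerCall: number,
): { calls: number; seconds: number } {
  // The last batch may be partially filled, hence the ceiling.
  const calls = Math.ceil(files / batchSize);
  return { calls, seconds: calls * secondsPerCall };
}

for (const batchSize of [1, 32, 256]) {
  const { calls, seconds } = estimateImport(1989, batchSize, 2);
  console.log(`batch=${batchSize}: ${calls} calls, ~${seconds} s`);
}
// batch=1: 1989 calls, ~3978 s
// batch=32: 63 calls, ~126 s
// batch=256: 8 calls, ~16 s
```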

## Why this matters now

The personal-brain use case has gotten an order of magnitude bigger in the last month. For example, one user (me) just landed ~166k Gmail markdown files in `~/brain/email/` after wiring the iCloud + Gmail LaunchAgents. At 2.1s/file, that's a **97-hour** initial embed at the sequential rate (166,000 × 2.1 s ≈ 348,600 s). Multi-day. With batching at size 256, the same archive needs ~649 calls, or ~22 minutes at ~2s/call.

The 35-minute SIGTERM wall in gstack (filed separately at garrytan/gstack#1611) is partially a downstream symptom of this — if gbrain batched, the wall would stop biting big-brain users.

## Where to look

I haven't poked at gbrain's embedder pipeline directly, but a quick grep for the OpenAI client call site in `src/embedding/`, or wherever the per-chunk loop lives, should find the place. Two paths:

1. **Tactical:** wrap the per-call site in a batcher that collects N items then flushes (back-pressure-safe via a bounded queue).
2. **Structural:** rewrite the import stream to fan inputs into a batched embedder with a configurable batch size (env: `GBRAIN_EMBEDDING_BATCH_SIZE`, default 256).

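Either way, the same code path needs to respect OpenAI's per-request limits: at most 2048 inputs and 300,000 tokens summed across all inputs. The batcher should flush on whichever limit it would hit first, with a conservative default token budget well under the cap (e.g. 100K).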
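
A minimal sketch of the packing step either path would need, assuming the embedder can be treated as a function from an array of strings to an array of vectors. Every name here (`packBatches`, `embedAll`, `estimateTokens`, `BatchLimits`, `EmbedFn`) is hypothetical and not taken from gbrain. The 4-characters-per-token estimate is a crude stand-in for a real tokenizer, and the limit values are illustrative and should be checked against OpenAI's current documentation.

```typescript
interface BatchLimits {
  maxInputs: number; // max inputs per request
  maxTokens: number; // token budget per request, summed over inputs
}

// Hypothetical embedder signature: one vector per input, in input order.
type EmbedFn = (inputs: string[]) => Promise<number[][]>;

// Crude estimate (~4 characters per token); a real batcher would use the
// model's tokenizer instead. This is an assumption for illustration only.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily pack chunks into batches, flushing before either limit is exceeded.
// A single chunk larger than maxTokens still gets a batch of its own.
function packBatches(chunks: string[], limits: BatchLimits): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let currentTokens = 0;
  for (const chunk of chunks) {
    const tokens = estimateTokens(chunk);
    const wouldOverflow =
      current.length >= limits.maxInputs ||
      currentTokens + tokens > limits.maxTokens;
    if (current.length > 0 && wouldOverflow) {
      batches.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(chunk);
    currentTokens += tokens;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Embed every chunk with one call per batch, preserving chunk order.
async function embedAll(
  chunks: string[],
  embed: EmbedFn,
  limits: BatchLimits,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of packBatches(chunks, limits)) {
    vectors.push(...(await embed(batch)));
  }
  return vectors;
}
```

Batches are sent sequentially here to keep the sketch simple. Bounded concurrency across batches would be a further step, subject to rate limits.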

## Environment

- macOS 15.x, Apple Silicon
- gbrain v0.33.1.0
- gstack v1.40.0.0 (invoking `gbrain import` via `gstack-memory-ingest --bulk`)
- Engine: Supabase Session Pooler (postgres)
- OPENAI_API_KEY present in env; OpenAI API responding normally

## Verifying the diagnosis

If anyone wants to confirm this is single-call (not some other pathology), run with `OPENAI_LOG=debug` or wrap the embedder call in a logger that prints per-batch input count. A "1 input per call" log == this issue.
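
The logger-wrapper option could look roughly like this. It is a sketch under the same assumption that the embedder is callable as a function over an array of inputs. `withInputCountLog` and `EmbedFn` are hypothetical names, not gbrain or OpenAI SDK APIs.

```typescript
// Hypothetical embedder signature: one vector per input, in input order.
type EmbedFn = (inputs: string[]) => Promise<number[][]>;

// Wrap an embedder so that every call reports how many inputs it carried.
// The log line is emitted before the underlying call is made.
function withInputCountLog(
  embed: EmbedFn,
  log: (line: string) => void = console.log,
): EmbedFn {
  let callNo = 0;
  return async (inputs) => {
    callNo += 1;
    log(`embed call #${callNo}: ${inputs.length} input(s)`);
    return embed(inputs);
  };
}
```

If an import produces a long run of `1 input(s)` lines, the call site is unbatched.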

Happy to PR the batcher if it's a welcome change.
