Token-efficient artifact updates and streaming for LLMs — 90-99% output token reduction per edit.
Crates.io · Report Bug · Specification
Warning: This project is v0. The protocol, schemas, and APIs are subject to breaking changes without notice until a formal release.
GAP (Generative Artifact Protocol) is an open standard protocol that lets LLMs declare, diff, and reprovision text artifacts with minimal token expenditure. The repository includes a Rust reference implementation of the apply engine plus a Python evaluation framework for measuring token efficiency against real LLM runs.
- Envelope system: three operation types (`synthesize`, `edit`, `handle`) for full generation, targeted updates, and lightweight references
- Stateless apply engine: pure function, no I/O, ~2μs per edit; portable to browsers (WASM), IDEs, CLIs, or service backends
- ID-based targeting: `<gap:target id="ID">` markers and JSON Pointer paths eliminate hallucinated search strings
- Format-agnostic: works with HTML, Python, JavaScript, JSON, YAML, Rust, Go, SVG, and more
- 90-99% output token reduction per edit, translating to 43-86% total cost savings (cost model)
- SSE transport binding: wire format for streaming with reconnection support (GAP-SSE)
- Evaluation framework: 89 experiment datasets measuring token efficiency and reliability against real LLM runs
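The JSON Pointer paths mentioned above follow RFC 6901: a pointer is a sequence of `/`-prefixed reference tokens in which `~1` stands for `/` and `~0` stands for `~`. As a rough illustration of the decoding step only, here is a minimal sketch. The helper name `pointer_tokens` is hypothetical and not part of the GAP crate, and rejection of malformed escapes such as `~2` is left out.

```rust
/// Split an RFC 6901 JSON Pointer into decoded reference tokens.
/// `pointer_tokens` is a hypothetical helper, not an API of the GAP crate.
fn pointer_tokens(pointer: &str) -> Result<Vec<String>, String> {
    // The empty pointer refers to the whole document.
    if pointer.is_empty() {
        return Ok(Vec::new());
    }
    // Every non-empty pointer must start with '/'.
    let rest = pointer
        .strip_prefix('/')
        .ok_or_else(|| format!("pointer must start with '/': {pointer}"))?;
    Ok(rest
        .split('/')
        // Decode "~1" before "~0" so that "~01" becomes "~1", not "/".
        .map(|token| token.replace("~1", "/").replace("~0", "~"))
        .collect())
}
```

With this sketch, `pointer_tokens("/sections/0/title")` yields the tokens `sections`, `0`, and `title`, which a resolver would then walk through the JSON document one level at a time.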
Rust crate:

```sh
cargo add generative-artifact-protocol
```

From source (full workspace):

```sh
git clone https://github.com/urmzd/generative-artifact-protocol
cd generative-artifact-protocol
```

Requires Rust (stable), uv (for evals), and optionally just (for recipes).
```sh
# Build the Rust library
just build
# Run tests
just test
# Run criterion benchmarks (apply engine speed)
just bench
# Sync workspace: build FFI via maturin + Python packages
just bind
# Run LLM evaluations
just run count=5 model="gemini-2.0-flash" provider="google"
# Generate report from experiment metrics
just report
```

```text
LLM ──produces──▶ envelope ──apply──▶ (artifact, handle)
                               ▲
                     gap (stateless, ~2μs)
```
- An LLM produces an artifact envelope (JSON): either a `synthesize` envelope (full content with target markers) or an `edit` envelope (targeted changes by ID or JSON Pointer).
- The apply engine resolves the envelope against the current artifact state to produce the updated artifact and a lightweight handle.
- The orchestrator holds handles; the resolved artifact is stored and consumed by downstream tools such as browsers and IDEs.
The core of the library is a single stateless function:

```rust
pub fn apply(artifact: Option<&Artifact>, envelope: &Envelope) -> Result<(Artifact, Envelope)>
```

| Envelope | Direction | Description |
|---|---|---|
| `synthesize` | input | Complete artifact content (baseline or reset) with `<gap:target>` markers |
| `edit` | input | Targeted changes via ID (`<gap:target>` markers) or JSON Pointer |
| `handle` | output | Lightweight reference returned after every `synthesize` or `edit` |
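To make the flow above concrete, here is a minimal sketch of a stateless apply step over toy types. It is not the crate's implementation. The struct and enum fields, the `</gap:target>` closing marker, the contents of the handle, and the `String` error type are all assumptions made for illustration, and nested targets are not handled.

```rust
/// Toy stand-in for the crate's artifact type; the fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Artifact {
    version: u32,
    content: String,
}

/// Toy stand-in for the three envelope kinds; the fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum Envelope {
    /// Full content, including target markers.
    Synthesize { content: String },
    /// Replace the body of one marked target.
    Edit { target_id: String, content: String },
    /// Lightweight reference to the resulting artifact (output only).
    Handle { version: u32, bytes: usize },
}

/// Pure function: no I/O and no hidden state. All state arrives as arguments.
fn apply(artifact: Option<&Artifact>, envelope: &Envelope) -> Result<(Artifact, Envelope), String> {
    let next = match (artifact, envelope) {
        // A synthesize envelope sets or resets the whole artifact.
        (prev, Envelope::Synthesize { content }) => Artifact {
            version: prev.map_or(1, |a| a.version + 1),
            content: content.clone(),
        },
        // An edit envelope replaces only the text inside the named target.
        (Some(prev), Envelope::Edit { target_id, content }) => {
            let open = format!("<gap:target id=\"{target_id}\">");
            let close = "</gap:target>"; // assumed closing-marker syntax
            let start = prev.content.find(&open).ok_or("unknown target id")? + open.len();
            let end = start + prev.content[start..].find(close).ok_or("unclosed target")?;
            let mut updated = prev.content.clone();
            updated.replace_range(start..end, content);
            Artifact { version: prev.version + 1, content: updated }
        }
        (None, Envelope::Edit { .. }) => return Err("edit requires an existing artifact".into()),
        (_, Envelope::Handle { .. }) => return Err("handle is output-only".into()),
    };
    // The caller keeps the small handle; the full artifact is stored elsewhere.
    let handle = Envelope::Handle { version: next.version, bytes: next.content.len() };
    Ok((next, handle))
}
```

Because the previous artifact is only borrowed and a new one is returned, the caller decides where state lives, which is what lets the same function run in a browser, a CLI, or a service backend.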
| Recipe | Description |
|---|---|
| `just build` | Compile the Rust library |
| `just test` | Run Rust unit tests |
| `just bench` | Criterion micro-benchmarks (apply engine speed) |
| `just bind` | Sync workspace: build FFI via maturin + Python packages |
| `just run [count] [model] [id] [provider]` | Run conversation benchmark experiments (base vs GAP flows) |
| `just report` | Generate markdown report from experiment metrics |
GAP saves tokens by replacing full artifact regeneration with small diff envelopes. The savings vary with the model's tokenizer, output/input price ratio, and whether a cheaper model handles diffs. See the full derivation in the spec.
The maintain context reads the full artifact ($S$ tokens) and writes only the edit ($d$ tokens):

- Output token reduction: $d$ instead of $S$ per edit (95–99% fewer output tokens)
- Context flattening: each edit reads only the current artifact ($S$), not all prior versions ($k \cdot S$ at edit $k$)
- Model asymmetry: the maintain context can use a cheaper model, multiplying savings further

Example (2,000-token artifact, 30-token edit, …):

| After $k$ edits | Naive conversation | GAP | Total savings |
|---|---|---|---|
| 1 | $0.071 | $0.039 | 45% |
| 5 | $0.304 | $0.070 | 77% |
| 10 | $0.763 | $0.107 | 86% |
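The token side of this comparison can be sketched in a few lines. The sketch below counts tokens only and does not reproduce the dollar figures in the table, which depend on prices and on the full derivation in the spec. It assumes the simplified model from the bullets: a naive conversation rereads $k \cdot S$ tokens and regenerates $S$ tokens at edit $k$, while GAP reads $S$ and emits $d$. Prompt and system text are ignored, and all names are hypothetical.

```rust
/// Token totals after `n` edits under the simplified model described above.
#[derive(Debug, PartialEq)]
struct Tokens {
    input: u64,
    output: u64,
}

/// Naive conversation: edit k rereads k prior versions (k * S) and regenerates S.
/// Summing k * S for k = 1..=n gives S * n * (n + 1) / 2 input tokens.
fn naive_tokens(s: u64, n: u64) -> Tokens {
    Tokens { input: s * n * (n + 1) / 2, output: s * n }
}

/// GAP: each edit reads only the current artifact (S) and emits a diff (d).
fn gap_tokens(s: u64, d: u64, n: u64) -> Tokens {
    Tokens { input: s * n, output: d * n }
}

/// Fraction of output tokens avoided per edit: 1 - d / S.
fn output_reduction(s: u64, d: u64) -> f64 {
    1.0 - d as f64 / s as f64
}
```

For the 2,000-token artifact and 30-token edit used in the example, `output_reduction(2000, 30)` is 0.985, and after 10 edits the naive flow has read 110,000 input tokens against 20,000 for GAP under these assumptions.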
Payload size and apply time for each envelope type, measured against an 8 KB HTML dashboard fixture.
Note: "Payload savings" measures byte reduction — a proxy for output token reduction but not identical (tokenizers vary). See cost model for the full derivation.
| Envelope | Scenario | Payload | % of Full | Payload savings | Apply Time |
|---|---|---|---|---|---|
| synthesize | Full generation (baseline) | 8,164 B | 100.0% | — | 1 ns |
| edit | 1 value replace (ID targeting) | 12 B | 0.1% | 99.9% | 1.5 µs |
| edit | 4 value replaces (ID targeting) | 50 B | 0.6% | 99.4% | 3.5 µs |
| edit | 1 section replace (ID targeting) | 441 B | 5.4% | 94.6% | 1.4 µs |
| edit | 2 section replaces (ID targeting) | 516 B | 6.3% | 93.7% | 3.8 µs |
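The "Payload savings" column follows directly from the byte counts. As a quick check, assuming savings are computed as one minus the edit payload's share of the full 8,164-byte payload and rounded to one decimal place (the function name is hypothetical):

```rust
/// Payload savings in percent, rounded to one decimal place.
/// Assumes savings = 100 - (edit_bytes / full_bytes * 100).
fn payload_savings_pct(edit_bytes: u64, full_bytes: u64) -> f64 {
    let share = edit_bytes as f64 / full_bytes as f64 * 100.0;
    ((100.0 - share) * 10.0).round() / 10.0
}
```

For instance, `payload_savings_pct(441, 8164)` evaluates to 94.6, matching the single-section replace row.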
This project is dual-licensed:
- Code (`src/`, `evals/`, `benches/`, build files): Apache License 2.0
- Specification & docs (`spec/`, `assets/`, documentation): CC-BY 4.0
See NOTICE for details. Attribution is required under both licenses.