Skip to content

Latest commit

 

History

History
119 lines (84 loc) · 4.86 KB

File metadata and controls

119 lines (84 loc) · 4.86 KB

LAP Benchmark Results

v0.3 — 162 specs across 5 formats — 10.3x overall compression (4.37M → 423K tokens)


Compression Ratios — All 30 OpenAPI Specs

Compression ratios for all 30 OpenAPI specs

API Raw Tokens Lean Tokens Compression
notion 68,587 1,733 39.6x
snyk 201,205 5,193 38.7x
zoom 848,983 22,474 37.8x
digitalocean 345,401 12,896 26.8x
plaid 304,530 18,930 16.1x
box 232,848 17,559 13.3x
asana 97,427 8,257 11.8x
hetzner 167,308 14,242 11.7x
vercel 136,159 13,472 10.1x
slack2 126,517 13,316 9.5x
linode 203,653 21,795 9.3x
stripe-full 1,034,829 118,758 8.7x
launchdarkly 31,522 4,731 6.7x
twitter 61,043 9,474 6.4x
netlify 20,142 3,127 6.4x
gitlab 88,242 15,092 5.8x
resend 21,890 3,776 5.8x
vonage 1,889 412 4.6x
github-core 2,190 531 4.1x
circleci 5,725 1,464 3.9x
stripe-charges 1,892 490 3.9x
petstore 4,656 1,217 3.8x
twilio-core 2,465 688 3.6x
openai-core 1,730 524 3.3x
google-maps 941 316 3.0x
discord 909 308 3.0x
cloudflare 763 265 2.9x
spotify 826 331 2.5x
sendgrid 518 209 2.5x
slack 762 316 2.4x

Median: 5.2x — Large, verbose specs see the biggest gains (up to 39.6x). Small specs with minimal redundancy still achieve 2–3x.


Token Savings — Top 10 Largest Specs

Token savings for top 10 specs


Format Comparison

Median compression by format

Format Specs Median Compression
OpenAPI 30 5.2x
Postman 36 4.1x
Protobuf 35 1.5x
AsyncAPI 31 1.4x
GraphQL 30 1.3x

LAP delivers the highest compression on verbose formats (OpenAPI, Postman) where JSON/YAML schemas carry significant structural redundancy. Compact formats like Protobuf and GraphQL are already information-dense, so compression is modest — but still meaningful at scale.


Agent Implementation Test

Implementation comparison

Real-world validation: five scenarios where an LLM agent generates working API integration code. The agent produces identical output regardless of whether it receives raw OpenAPI or LAP — it just costs fewer tokens.

Scenario OpenAPI Tokens LAP Tokens Savings
Stripe 3,166 1,736 45%
GitHub 3,384 1,743 49%
Twilio 3,623 1,857 49%
Slack 2,163 1,639 24%
Hetzner 168,388 15,045 91%

Hetzner's 91% reduction is notable — the raw spec barely fits in most context windows, while the LAP version fits comfortably.


Cost Projections

Cost projection table

With 10.3x average compression, a 100K-token API spec drops to ~10K tokens. At current model pricing, that translates to ~90% cost reduction per API call that includes spec context.

For teams making hundreds of agent calls per day against large API specs, this compounds to significant savings — potentially thousands of dollars per month.


Key Findings

  1. Verbose specs benefit most. Notion (39.6x), Snyk (38.7x), and Zoom (37.8x) have massive schemas with repetitive patterns that LAP compresses aggressively.

  2. Small specs still compress. Even minimal specs like Slack (762 tokens) achieve 2.4x — the floor is meaningful, not zero.

  3. Format matters. OpenAPI and Postman carry structural overhead that compresses well. GraphQL and Protobuf are already concise — LAP still helps, but expect 1.3–1.5x rather than 5–10x.

  4. Agent output is preserved. Implementation tests confirm that agents produce identical working code with LAP input — the compression is lossless for practical purposes.

  5. v0.3 trades ~4% size for reliability. Inline schemas and completeness signals add a small number of tokens but eliminate common agent failure modes (broken $ref resolution, missing required fields).


Methodology

  • Token counting: cl100k_base tokenizer (GPT-4/Claude compatible)
  • Specs: 162 real-world API specifications sourced from public repositories and official documentation
  • Formats: OpenAPI 3.x, Postman Collections, Protocol Buffers, AsyncAPI, GraphQL SDL
  • Implementation test: Claude Sonnet generates integration code from both raw and LAP specs; output correctness verified manually
  • Compression ratio: Raw tokens ÷ Lean tokens
  • All results reproducible from the benchmark suite in this repository