size escrow off real prompt + output cap with headroom + per-job ceiling #16

Open
ffaerber wants to merge 2 commits into main from claude/token-limits-proxy-gateway-Mr36B

@ffaerber
Collaborator

The previous maxPayment formula padded by a fixed 1M tokens on each side,
computed off max_tokens (default 1024). That under-budgets long-context
requests (Gemini 2M, GPT-5 long inputs): the provider's honest claimJob
reverts with PaymentTooHigh, the gateway times out, and the provider gets
slashed for a prompt the client sized honestly.

New compute_max_payment sizes the escrow off the estimated prompt length
(chars/4 fallback) and the requested or default output cap, each padded
by T4T_ESCROW_HEADROOM_RATIO. Optional T4T_MAX_ESCROW_PER_JOB rejects
oversized requests with HTTP 413 instead of locking that much xBZZ.
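
A minimal sketch of the sizing rule described above, not the code in this PR. Only the name compute_max_payment, the chars/4 fallback, the 1024 default and the HTTP 413 behaviour come from the description. Everything else is an assumption made for illustration: the function signature, the `Prices` type, the `EscrowTooLarge` exception, reading T4T_ESCROW_HEADROOM_RATIO as a fraction added on top of each side (0.25 means +25%), and expressing prices per million tokens in the smallest payment unit.

```python
import math
from dataclasses import dataclass
from typing import Optional

DEFAULT_MAX_TOKENS = 1024  # default output cap mentioned in the description


class EscrowTooLarge(Exception):
    """Hypothetical error that a gateway could map to HTTP 413."""
    status_code = 413


@dataclass(frozen=True)
class Prices:
    # Assumed shape: price per one million tokens, in the smallest payment unit.
    input_per_million: int
    output_per_million: int


def estimate_prompt_tokens(prompt: str) -> int:
    # chars/4 fallback, rounded up so a short prompt never estimates to zero.
    return math.ceil(len(prompt) / 4)


def compute_max_payment(
    prompt: str,
    max_tokens: Optional[int],
    prices: Prices,
    headroom_ratio: float = 0.25,              # stands in for T4T_ESCROW_HEADROOM_RATIO
    max_escrow_per_job: Optional[int] = None,  # stands in for T4T_MAX_ESCROW_PER_JOB
) -> int:
    output_cap = max_tokens if max_tokens is not None else DEFAULT_MAX_TOKENS
    # Pad the prompt estimate and the output cap independently.
    padded_in = math.ceil(estimate_prompt_tokens(prompt) * (1 + headroom_ratio))
    padded_out = math.ceil(output_cap * (1 + headroom_ratio))
    cost = padded_in * prices.input_per_million + padded_out * prices.output_per_million
    payment = -(-cost // 1_000_000)  # integer ceiling division
    if max_escrow_per_job is not None and payment > max_escrow_per_job:
        # Reject the request instead of locking an oversized escrow.
        raise EscrowTooLarge(f"escrow {payment} exceeds per-job ceiling {max_escrow_per_job}")
    return payment
```

Under these assumptions, a long prompt grows the escrow in proportion to its length instead of relying on a fixed pad, and the optional ceiling turns an oversized request into an early rejection.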

claude added 2 commits May 23, 2026 09:03
…ment

Two defensive changes so an honest provider can't get slashed for an
honest workload that overshoots the escrow:

1. Before calling chatCompletion, worker derives the maximum completion
   tokens the on-chain maxPayment can pay for (given the provider's
   declared per-million prices and a conservative chars/4 prompt
   estimate), and lowers req.max_tokens if it's higher (or absent).
   Any backend that honors max_tokens then physically cannot produce a
   response whose actualPayment would exceed maxPayment.

2. In the claim path, re-read the on-chain Job to get the authoritative
   maxPayment (defense against a gateway that tampers with notify.body),
   then clip actualPayment = min(actual, maxPayment). A backend that
   ignores max_tokens still claims what it can instead of reverting with
   PaymentTooHigh and falling through to timeoutJob's 3x slash.
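
The two changes above can be sketched as follows. This is an illustration under stated assumptions, not the worker code from the commit: all function names and signatures are hypothetical, prices are assumed to be per million tokens in the same integer unit as maxPayment, and the on-chain Job read is represented by a plain integer argument.

```python
import math
from typing import Optional


def max_affordable_completion_tokens(
    max_payment: int,
    prompt_chars: int,
    input_per_million: int,
    output_per_million: int,
) -> int:
    """Largest completion size that the escrow can still pay for."""
    if output_per_million <= 0:
        raise ValueError("output price must be positive for a cap to exist")
    prompt_tokens = math.ceil(prompt_chars / 4)  # chars/4 prompt estimate
    # Budget left for output after paying for the estimated prompt,
    # kept in (payment unit x 1e6) so everything stays in integers.
    budget = max_payment * 1_000_000 - prompt_tokens * input_per_million
    if budget <= 0:
        return 0
    return budget // output_per_million  # round down: never exceed the budget


def cap_max_tokens(requested: Optional[int], affordable: int) -> int:
    # Lower max_tokens if it is higher than affordable, or set it if absent.
    return affordable if requested is None else min(requested, affordable)


def clip_actual_payment(actual: int, on_chain_max_payment: int) -> int:
    # on_chain_max_payment is assumed to come from re-reading the Job on chain,
    # not from the notify body supplied by the gateway.
    return min(actual, on_chain_max_payment)
```

With an escrow of 7620 units, a 4000-character prompt and prices of 2,000,000 (input) and 4,000,000 (output) per million tokens, the affordable completion is 1405 tokens, so a request for 4096 tokens would be lowered to 1405 while a request for 512 would pass through unchanged.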