rehydrate: cache pre-tokenized conversation history #514

Description

@leseb

What

Add cached token fields to ResponseRecord and ConversationRecord so the rehydrate filter can skip re-tokenizing unchanged conversation history on each turn.

Context

When rehydrating multi-turn conversations, the full history must be in token form for the model. Without caching, every request re-tokenizes the entire conversation, including the parts unchanged since the last turn. Storing the pre-tokenized representation avoids this redundant work, whose per-request cost grows linearly with conversation length.
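To make the cost argument concrete, here is a rough sketch of a simplified cost model. It assumes tokenization work is proportional to the number of tokens processed; the function names are hypothetical and not part of the codebase.

```rust
/// Total tokens pushed through the tokenizer over a whole conversation
/// when every request re-tokenizes the full history.
/// `turn_lengths[i]` is the token count of turn `i` (illustrative input).
fn tokens_processed_without_cache(turn_lengths: &[usize]) -> usize {
    let mut history = 0;
    let mut total = 0;
    for &len in turn_lengths {
        history += len; // the history now includes the new turn
        total += history; // the whole history is tokenized again
    }
    total
}

/// With a token cache, only the new turn is tokenized on each request.
fn tokens_processed_with_cache(turn_lengths: &[usize]) -> usize {
    turn_lengths.iter().sum()
}
```

Under this model, ten turns of 100 tokens each cost 5,500 tokens of tokenizer work without a cache and 1,000 with one. The per-request cost is linear in history length, so the cumulative cost over a conversation is quadratic in the number of turns.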

Scope

  • Add a token cache field to ResponseRecord (e.g., cached_tokens: Option<Vec<u8>> or similar)
  • Add a token cache field to ConversationRecord
  • Populate during response persistence (tokens are already available from the inference response)
  • Load during rehydration, pass directly to inference instead of re-tokenizing
  • Schema migration: ALTER TABLE ADD COLUMN for the new fields
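
The scope above could look roughly like the following sketch. Only the `cached_tokens` field name comes from this issue. The other fields, the stand-in `tokenize` function, and the little-endian `u32` byte encoding are assumptions for illustration, since the real token format is not yet decided.

```rust
/// Hypothetical shape of a stored response. Only `cached_tokens` is
/// suggested by the issue; the other fields are placeholders.
#[derive(Debug, Clone)]
struct ResponseRecord {
    id: String,
    text: String,
    /// Serialized tokens; `None` for rows written before the migration.
    cached_tokens: Option<Vec<u8>>,
}

/// Stand-in tokenizer (assumption): one "token" per whitespace-separated
/// word, using the word length as its ID. A real tokenizer replaces this.
fn tokenize(text: &str) -> Vec<u32> {
    text.split_whitespace().map(|w| w.len() as u32).collect()
}

/// Assumed encoding: each token ID as 4 little-endian bytes.
fn encode_tokens(tokens: &[u32]) -> Vec<u8> {
    tokens.iter().flat_map(|t| t.to_le_bytes()).collect()
}

/// Returns `None` if the byte length is not a multiple of 4.
fn decode_tokens(bytes: &[u8]) -> Option<Vec<u32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Populate the cache at persistence time from tokens the inference
/// response already provided.
fn persist(id: &str, text: &str, tokens: &[u32]) -> ResponseRecord {
    ResponseRecord {
        id: id.to_string(),
        text: text.to_string(),
        cached_tokens: Some(encode_tokens(tokens)),
    }
}

/// Build the token history, using cached tokens where present and
/// falling back to re-tokenizing records that have no usable cache.
fn rehydrate_tokens(history: &[ResponseRecord]) -> Vec<u32> {
    let mut out = Vec::new();
    for record in history {
        let cached = record.cached_tokens.as_deref().and_then(decode_tokens);
        match cached {
            Some(tokens) => out.extend(tokens),
            None => out.extend(tokenize(&record.text)),
        }
    }
    out
}
```

Keeping the field an `Option` means rows written before the `ALTER TABLE ADD COLUMN` migration stay valid and simply take the re-tokenize path. A `ConversationRecord` cache could presumably follow the same pattern.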

Deferred until rehydrate is built

The exact representation depends on the tokenization format (tiktoken encoding, provider-specific token IDs, etc.), which isn't known until the rehydrate filter and inference proxy are implemented. Adding the field prematurely risks choosing the wrong format.
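
One possible way to reduce the risk of committing to the wrong format is to store a format tag next to the cached bytes and ignore the cache on a mismatch. The sketch below is purely illustrative. `TokenFormat`, `TokenCache`, and `usable_bytes` are hypothetical names, and nothing here is a decided design.

```rust
/// Hypothetical tag naming how the cached bytes were produced.
#[derive(Debug, Clone, PartialEq, Eq)]
enum TokenFormat {
    /// A tiktoken encoding, identified by name.
    Tiktoken { encoding: String },
    /// Provider-specific token IDs.
    ProviderIds { provider: String },
}

/// Cached tokens together with the format they were written in.
#[derive(Debug, Clone)]
struct TokenCache {
    format: TokenFormat,
    bytes: Vec<u8>,
}

/// Returns the cached bytes only when they were produced in the format
/// the current inference path expects; otherwise the caller re-tokenizes.
fn usable_bytes<'a>(
    cache: Option<&'a TokenCache>,
    expected: &TokenFormat,
) -> Option<&'a [u8]> {
    cache
        .filter(|c| &c.format == expected)
        .map(|c| c.bytes.as_slice())
}
```

With a tag like this, a later change of tokenizer would turn stale caches into cache misses instead of feeding wrong token IDs to the model.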

Origin

Suggested in PR #491 review by @franciscojavierarceo.

Parent

Part of #429 (rehydrate filter) and #443 (response store).

Status
Next

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions