What
Add cached token fields to ResponseRecord and ConversationRecord so the rehydrate filter can skip re-tokenizing unchanged conversation history on each turn.
Context
When rehydrating multi-turn conversations, the full history must be in token form for the model. Without caching, every request re-tokenizes the entire conversation, including the parts that have not changed since the last turn. Storing the pre-tokenized representation avoids this redundant work, whose cost grows linearly with conversation length.
Scope
- Add a token cache field to ResponseRecord (e.g., cached_tokens: Option<Vec<u8>> or similar)
- Add a token cache field to ConversationRecord
- Populate during response persistence (tokens are already available from the inference response)
- Load during rehydration, pass directly to inference instead of re-tokenizing
- Schema migration: ALTER TABLE ADD COLUMN for the new fields
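
The populate and load steps above could look roughly like the sketch below. It is illustrative only: the trimmed-down ResponseRecord, the helper names (encode_tokens, decode_tokens, tokens_for), and the byte layout (token IDs as little-endian u32, four bytes each) are all assumptions made for the example, since the actual token format is not yet decided.

```rust
/// Hypothetical, trimmed-down record. The real ResponseRecord has other
/// fields; only the ones needed for this sketch are shown.
#[derive(Debug, Clone)]
pub struct ResponseRecord {
    pub output_text: String,
    /// Assumed encoding: token IDs as little-endian u32, 4 bytes each.
    /// None means no cache was stored (e.g., rows written before the migration).
    pub cached_tokens: Option<Vec<u8>>,
}

/// Serialize token IDs into the assumed byte layout (used at persistence time).
pub fn encode_tokens(tokens: &[u32]) -> Vec<u8> {
    tokens.iter().flat_map(|t| t.to_le_bytes()).collect()
}

/// Deserialize cached bytes; returns None if the length is not a multiple of 4.
pub fn decode_tokens(bytes: &[u8]) -> Option<Vec<u32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Rehydration path: use the cached tokens when present and well-formed,
/// otherwise fall back to re-tokenizing the stored text.
pub fn tokens_for<F>(record: &ResponseRecord, tokenize: F) -> Vec<u32>
where
    F: Fn(&str) -> Vec<u32>,
{
    record
        .cached_tokens
        .as_deref()
        .and_then(decode_tokens)
        .unwrap_or_else(|| tokenize(&record.output_text))
}
```

Keeping the field optional means existing rows stay valid after the migration, and a missing or malformed cache only costs one re-tokenization rather than a failed request.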
Deferred until rehydrate is built
The exact representation depends on the tokenization format (tiktoken encoding, provider-specific token IDs, etc.), which isn't known until the rehydrate filter and inference proxy are implemented. Adding the field prematurely risks choosing the wrong format.
Origin
Suggested in PR #491 review by @franciscojavierarceo.
Parent
Part of #429 (rehydrate filter), #443 (response store).