
Conversation

@mschristensen (Contributor)

Description

AIT DOCS INTEGRATION BRANCH
Not (yet) intended to merge but opening to create review apps

Checklist


coderabbitai bot commented Dec 17, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@mschristensen added the review-app (Create a Heroku review app) label on Dec 17, 2025
meta_description: "Stream individual tokens from AI models into a single message over Ably."
---

Token streaming with message-per-response is a pattern where every token generated by your model is appended to a single Ably message. Each complete AI response then appears as one message in the channel history while delivering live tokens in realtime. This uses [Ably Pub/Sub](/docs/basics) for realtime communication between agents and clients.

Suggested change:
- Token streaming with message-per-response is a pattern where every token generated by your model is appended to a single Ably message. Each complete AI response then appears as one message in the channel history while delivering live tokens in realtime. This uses [Ably Pub/Sub](/docs/basics) for realtime communication between agents and clients.
+ Token streaming with message-per-response is a pattern where every token generated by your model for a given response is appended to a single Ably message. Each complete AI response then appears as one message in the channel history while delivering live tokens in realtime. This uses [Ably Pub/Sub](/docs/basics) for realtime communication between agents and clients.
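
For orientation, a minimal sketch of the publish side of this pattern, assuming the ably-js realtime client. `appendMessage()` is the API these docs describe, but its exact signature is assumed here (the page's Publishing section describes how appends relate to a message `serial`); the channel name and token source are illustrative.

```typescript
import * as Ably from 'ably';

const realtime = new Ably.Realtime({ key: process.env.ABLY_API_KEY! });
const channel = realtime.channels.get('ai:chat');

async function streamResponse(modelTokens: AsyncIterable<string>) {
  // Create the single message that the whole response will live in.
  await channel.publish({ name: 'response', data: '' });

  for await (const token of modelTokens) {
    // Appends are deliberately not awaited: Ably rolls up acks, and the
    // call order fixes the delivery order (see the publishing note below).
    channel
      .appendMessage({ data: token }) // assumed shape; the real API may target the message by serial
      .catch((err: Ably.ErrorInfo) => console.error('append failed', err));
  }
}
```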


## Enable appends <a id="enable"/>

Message append functionality requires the "Message annotations, updates, and deletes" [channel rule](/docs/channels#rules) enabled for your channel or [namespace](/docs/channels#namespaces).

I don't think the use of the terms "rule" and "namespace" is correct. It uses "rule" to refer to a single configurable attribute of a namespace, whereas I think the "rule" is the namespace definition (comprising the settings for all of the configurable attributes). I think a more appropriate statement here would be:

> Message append functionality requires "Message annotations, updates, and deletes" to be enabled in a channel rule associated with the channel.

<Aside data-type="important">
When the "Message updates and deletes" channel rule is enabled, messages are persisted regardless of whether or not persistence is enabled, in order to support the feature. This may increase your usage since [we charge for persisting messages](https://faqs.ably.com/how-does-ably-count-messages).

Suggested change:
- When the "Message updates and deletes" channel rule is enabled, messages are persisted regardless of whether or not persistence is enabled, in order to support the feature. This may increase your usage since [we charge for persisting messages](https://faqs.ably.com/how-does-ably-count-messages).
+ When the "Message updates and deletes" channel rule is enabled, messages are persisted irrespective of whether or not persistence has also been explicitly enabled. This will be reflected in increased usage since [we charge for persisting messages](https://faqs.ably.com/how-does-ably-count-messages).

2. Navigate to the "Configuration" > "Rules" section from the left-hand navigation bar.
3. Choose "Add new rule".
4. Enter a channel name or namespace pattern (e.g. `ai:*` for all channels starting with `ai:`).
5. Select the "Message annotations, updates, and deletes" rule from the list.

Suggested change:
- 5. Select the "Message annotations, updates, and deletes" rule from the list.
+ 5. Select the "Message annotations, updates, and deletes" option from the list.


When publishing tokens, don't await the `channel.appendMessage()` call. Ably rolls up acknowledgments and debounces them for efficiency, which means awaiting each append would unnecessarily slow down your token stream. Messages are still published in the order that `appendMessage()` is called, so delivery order is not affected.

How do we suggest that clients check for the success or failure of the publish?
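
One option, sketched under the same assumption that `appendMessage()` returns a promise: attach a rejection handler to each call so failures are observed without slowing the token loop, then check the collected results once the stream ends.

```typescript
// Hypothetical failure tracking around fire-and-forget appends.
const appends: Promise<boolean>[] = [];

for await (const token of modelTokens) {
  appends.push(
    channel.appendMessage({ data: token }).then(
      () => true,
      (err) => {
        // Each rejection is handled here, so none becomes an unhandled rejection.
        console.error('append failed', err);
        return false;
      }
    )
  );
}

// Before marking the response complete, confirm every append succeeded.
const results = await Promise.all(appends);
if (results.some((ok) => !ok)) {
  // e.g. republish the full response, or surface an error to the client
}
```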


<Aside data-type="note">
When appending tokens, include the `extras` with all headers to preserve them on the message. If you omit `extras` from an append operation, any existing headers will be removed. If you include `extras`, the headers completely replace any previous headers. This is the same [mixin behavior](/docs/messages/updates-deletes) used for message updates and deletes.

Suggested change:
- When appending tokens, include the `extras` with all headers to preserve them on the message. If you omit `extras` from an append operation, any existing headers will be removed. If you include `extras`, the headers completely replace any previous headers. This is the same [mixin behavior](/docs/messages/updates-deletes) used for message updates and deletes.
+ When appending tokens, include the `extras` with all headers to preserve them on the message. If you omit `extras` from an append operation, any existing headers will be removed. If you include `extras`, the headers completely supersede any previous headers. This is the same [mixin behavior](/docs/messages/updates-deletes) used for message updates and deletes.
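
Illustrated as a short sketch; the header values and the `appendMessage()` shape are assumptions:

```typescript
// Headers originally set when the response message was first published.
const headers = { responseId: 'resp_123', role: 'assistant' }; // illustrative values

// Repeat the full headers object on every append. Per the mixin behaviour
// above, omitting extras strips existing headers, and any extras supplied
// supersede the previous ones wholesale.
channel
  .appendMessage({ data: token, extras: { headers } }) // assumed shape
  .catch((err) => console.error('append failed', err));
```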


<Aside data-type="note">
Live messages may arrive via the subscription while you are still processing historical messages. Your application should handle this by queueing live messages and processing them only after all historical messages have been processed.

Suggested change:
- Live messages may arrive via the subscription while you are still processing historical messages. Your application should handle this by queueing live messages and processing them only after all historical messages have been processed.
+ Live messages can arrive via the subscription while you are still processing historical messages. Your application should handle this by queueing live messages and processing them only after all historical messages have been processed.
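
A sketch of this queue-then-drain approach using standard Pub/Sub APIs (`subscribe`, then `history` with `untilAttach`); the `processMessage()` helper is a placeholder for your own handling:

```typescript
const liveQueue: Ably.Message[] = [];
let hydrated = false;

// Subscribe first (this attaches the channel) and buffer anything that
// arrives while historical messages are still being processed.
await channel.subscribe((message) => {
  if (!hydrated) {
    liveQueue.push(message);
  } else {
    processMessage(message);
  }
});

// untilAttach returns only messages from before the attach point, so no
// message is seen both here and via the subscription. Pages arrive
// newest-first by default, so collect them and reverse for replay.
let page = await channel.history({ untilAttach: true });
const historical: Ably.Message[] = [...page.items];
while (page.hasNext()) {
  page = (await page.next())!;
  historical.push(...page.items);
}
historical.reverse();

for (const message of historical) processMessage(message);
hydrated = true; // from here on, live messages are processed directly
for (const message of liveQueue) processMessage(message);
liveQueue.length = 0;
```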

meta_description: "Stream individual tokens from AI models as separate messages over Ably."
---

Token streaming with message-per-token is a pattern where every token generated by your model is published as its own Ably message. Each token then appears as one message in the channel history. This uses [Ably Pub/Sub](/docs/basics) for realtime communication between agents and clients.

Suggested change:
- Token streaming with message-per-token is a pattern where every token generated by your model is published as its own Ably message. Each token then appears as one message in the channel history. This uses [Ably Pub/Sub](/docs/basics) for realtime communication between agents and clients.
+ Token streaming with message-per-token is a pattern where every token generated by your model is published as an independent Ably message. Each token then appears as one message in the channel history. This uses [Ably Pub/Sub](/docs/basics) for realtime communication between agents and clients.
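
A minimal sketch of the publish side of this pattern with the standard ably-js client; the channel name, header name, and token source are illustrative:

```typescript
import * as Ably from 'ably';

const realtime = new Ably.Realtime({ key: process.env.ABLY_API_KEY! });
const channel = realtime.channels.get('ai:captions');

async function streamTokens(modelTokens: AsyncIterable<string>, responseId: string) {
  for await (const token of modelTokens) {
    // Each token is its own message; the responseId header lets
    // subscribers correlate tokens that belong to the same response.
    channel
      .publish({ name: 'token', data: token, extras: { headers: { responseId } } })
      .catch((err) => console.error('publish failed', err));
  }
}
```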

This pattern is useful when clients only care about the most recent part of a response and you are happy to treat the channel history as a short sliding window rather than a full conversation log. For example:

- **Backend-stored responses**: The backend writes complete responses to a database and clients load those full responses from there, while Ably is used only to deliver live tokens for the current in-progress response.
- **Live transcription, captioning, or translation**: A viewer who joins a live stream only needs the last few tokens for the current "frame" of subtitles, not the entire transcript so far.

Suggested change:
- - **Live transcription, captioning, or translation**: A viewer who joins a live stream only needs the last few tokens for the current "frame" of subtitles, not the entire transcript so far.
+ - **Live transcription, captioning, or translation**: A viewer who joins a live stream only needs sufficient tokens for the current "frame" of subtitles, not the entire transcript so far.


#### Subscribe to tokens

Use the `responseId` header in message extras to correlate tokens. The `responseId` allows you to group tokens belonging to the same response and correctly handle token delivery for multiple responses, even when delivered concurrently.

Suggested change:
- Use the `responseId` header in message extras to correlate tokens. The `responseId` allows you to group tokens belonging to the same response and correctly handle token delivery for multiple responses, even when delivered concurrently.
+ Use the `responseId` header in message extras to correlate tokens. The `responseId` allows you to group tokens belonging to the same response and correctly handle token delivery for distinct responses, even when delivered concurrently.
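
A sketch of the subscribe side of this correlation; the in-memory `Map` and the `render()` callback are illustrative:

```typescript
// Accumulate token text per response so that concurrently delivered
// responses on the same channel do not interleave into one string.
const responses = new Map<string, string>();

await channel.subscribe('token', (message) => {
  const responseId = message.extras?.headers?.responseId as string | undefined;
  if (!responseId) return; // ignore messages without correlation info

  const text = (responses.get(responseId) ?? '') + message.data;
  responses.set(responseId, text);
  render(responseId, text); // placeholder for your UI update
});
```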

GregHolmes and others added 21 commits on December 23, 2025:

Link to the pending `/ai-transport` overview page.

Add intro describing the pattern, its properties, and use cases. Includes continuous token streams, correlating tokens for distinct responses, and explicit start/end events.

Splits each token streaming approach into distinct patterns and shows both the publish and subscribe side behaviour alongside one another.

Includes hydration with rewind and hydration with persisted history + untilAttach. Describes the pattern for handling in-progress live responses with complete responses loaded from the database.

Add doc explaining streaming tokens with appendMessage and update compaction allowing message-per-response history.

Unifies the token streaming nav for token streaming after rebase.

Refines the intro copy in message-per-response to have structural similarity with the message-per-token page.

Refine the Publishing section of the message-per-response docs.

- Include anchor tags on title
- Describe the `serial` identifier
- Align with stream pattern used in message-per-token docs
- Remove duplicate example

Refine the Subscribing section of the message-per-response docs.

- Add anchor tag to heading
- Describes each action upfront
- Uses RANDOM_CHANNEL_NAME

Refine the rewind section of the message-per-response docs.

- Include description of allowed rewind parameters
- Tweak copy

Refines the history section for the message-per-response docs.

- Adds anchor to heading
- Uses RANDOM_CHANNEL_NAME
- Use message serial in code snippet instead of ID
- Tweaks copy

Fix the hydration of in-progress responses via rewind by using the responseId in the extras to correlate messages with completed responses loaded from the database.

Fix the hydration of in-progress responses using history by obtaining the timestamp of the last completed response loaded from the database and paginating history forwards from that point.

Removes the headers/metadata section, as this covers the specific semantics of extras.headers handling with appends, which is better addressed by the (upcoming) message append pub/sub docs. Instead, a callout is used to describe header mixin semantics in the appropriate place insofar as it relates to the discussion at hand.

Update the token streaming with message-per-token docs to include a callout describing resume behaviour in case of transient disconnection.

Fix the message-per-token docs headers to include anchors and align with naming in the message-per-response page.
@matt423 force-pushed the AIT-129-AIT-Docs-release-branch branch from 400eb09 to f8056cb on December 23, 2025 10:41
@matt423 added and removed the review-app (Create a Heroku review app) label on Dec 23, 2025