Ait 129 ait docs release branch #3040
base: main
Conversation
meta_description: "Stream individual tokens from AI models into a single message over Ably."
---
Token streaming with message-per-response is a pattern where every token generated by your model is appended to a single Ably message. Each complete AI response then appears as one message in the channel history while delivering live tokens in realtime. This uses [Ably Pub/Sub](/docs/basics) for realtime communication between agents and clients.
Suggested change:
- Token streaming with message-per-response is a pattern where every token generated by your model is appended to a single Ably message. Each complete AI response then appears as one message in the channel history while delivering live tokens in realtime. This uses [Ably Pub/Sub](/docs/basics) for realtime communication between agents and clients.
+ Token streaming with message-per-response is a pattern where every token generated by your model for a given response is appended to a single Ably message. Each complete AI response then appears as one message in the channel history while delivering live tokens in realtime. This uses [Ably Pub/Sub](/docs/basics) for realtime communication between agents and clients.
## Enable appends <a id="enable"/>
Message append functionality requires the "Message annotations, updates, and deletes" [channel rule](/docs/channels#rules) enabled for your channel or [namespace](/docs/channels#namespaces).
I don't think the use of the terms "rule" and "namespace" is correct here. It's using "rule" to refer to a single configurable attribute of a namespace, whereas I think a "rule" is the namespace definition (comprising the settings for all of the configurable attributes).
I think a more appropriate statement here would be:
Message append functionality requires "Message annotations, updates, and deletes" to be enabled in a channel rule associated with the channel.
<Aside data-type="important">
When the "Message updates and deletes" channel rule is enabled, messages are persisted regardless of whether or not persistence is enabled, in order to support the feature. This may increase your usage since [we charge for persisting messages](https://faqs.ably.com/how-does-ably-count-messages).
Suggested change:
- When the "Message updates and deletes" channel rule is enabled, messages are persisted regardless of whether or not persistence is enabled, in order to support the feature. This may increase your usage since [we charge for persisting messages](https://faqs.ably.com/how-does-ably-count-messages).
+ When the "Message updates and deletes" channel rule is enabled, messages are persisted irrespective of whether or not persistence has also been explicitly enabled. This will be reflected in increased usage since [we charge for persisting messages](https://faqs.ably.com/how-does-ably-count-messages).
2. Navigate to the "Configuration" > "Rules" section from the left-hand navigation bar.
3. Choose "Add new rule".
4. Enter a channel name or namespace pattern (e.g. `ai:*` for all channels starting with `ai:`).
5. Select the "Message annotations, updates, and deletes" rule from the list.
Suggested change:
- 5. Select the "Message annotations, updates, and deletes" rule from the list.
+ 5. Select the "Message annotations, updates, and deletes" option from the list.
```
</Code>
When publishing tokens, don't await the `channel.appendMessage()` call. Ably rolls up acknowledgments and debounces them for efficiency, which means awaiting each append would unnecessarily slow down your token stream. Messages are still published in the order that `appendMessage()` is called, so delivery order is not affected.
How do we suggest that clients check for the success or failure of the publish?
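One possible answer, as a hedged sketch: fire each append without awaiting, but attach a `.catch()` so rejected appends are still surfaced. The `makeTokenPublisher` helper and the stub channel below are illustrative only, not part of the Ably SDK; the real `appendMessage()` signature may differ.

```javascript
// Sketch: fire-and-forget appends that still surface failures.
// `channel` stands in for an Ably channel whose appendMessage()
// returns a promise resolving on acknowledgment.
function makeTokenPublisher(channel, onError) {
  return function publishToken(serial, token) {
    // Do not await: Ably rolls up and debounces acknowledgments, so
    // awaiting each append would throttle the token stream.
    channel
      .appendMessage(serial, { data: token })
      .catch((err) => onError(err, token)); // failures reported out-of-band
  };
}
```

With this shape, appends are still issued in call order, while any rejection reaches the `onError` callback asynchronously, where the application can log, retry, or abort the response.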
</Code>
<Aside data-type="note">
When appending tokens, include the `extras` with all headers to preserve them on the message. If you omit `extras` from an append operation, any existing headers will be removed. If you include `extras`, the headers completely replace any previous headers. This is the same [mixin behavior](/docs/messages/updates-deletes) used for message updates and deletes.
Suggested change:
- When appending tokens, include the `extras` with all headers to preserve them on the message. If you omit `extras` from an append operation, any existing headers will be removed. If you include `extras`, the headers completely replace any previous headers. This is the same [mixin behavior](/docs/messages/updates-deletes) used for message updates and deletes.
+ When appending tokens, include the `extras` with all headers to preserve them on the message. If you omit `extras` from an append operation, any existing headers will be removed. If you include `extras`, the headers completely supersede any previous headers. This is the same [mixin behavior](/docs/messages/updates-deletes) used for message updates and deletes.
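The mixin behaviour described in the callout can be modelled as a small pure function. This is a sketch of the documented semantics only, not SDK code; `applyAppendExtras` is a hypothetical helper name.

```javascript
// Sketch of the documented extras mixin semantics for appends:
// - omit extras entirely -> the message's existing headers are removed
// - supply extras        -> its headers replace the previous ones wholesale
function applyAppendExtras(currentHeaders, appendExtras) {
  if (appendExtras === undefined) {
    return {}; // no extras on the append: previous headers are dropped
  }
  // Headers are replaced as a whole, not merged key-by-key.
  return { ...(appendExtras.headers || {}) };
}
```

The practical consequence: to keep a header like `responseId` attached to the message, pass `extras: { headers: { responseId, ...otherHeaders } }` on every `appendMessage()` call.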
</Code>
<Aside data-type="note">
Live messages may arrive via the subscription while you are still processing historical messages. Your application should handle this by queueing live messages and processing them only after all historical messages have been processed.
Suggested change:
- Live messages may arrive via the subscription while you are still processing historical messages. Your application should handle this by queueing live messages and processing them only after all historical messages have been processed.
+ Live messages can arrive via the subscription while you are still processing historical messages. Your application should handle this by queueing live messages and processing them only after all historical messages have been processed.
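The queue-then-drain behaviour described here can be sketched as a small buffer. This is an illustrative shape under stated assumptions (the `HydrationQueue` class is hypothetical, and deduplication of messages seen both in history and live is left out):

```javascript
// Sketch: buffer live messages that arrive while historical messages are
// still being processed, then drain the buffer once hydration completes.
class HydrationQueue {
  constructor(process) {
    this.process = process; // callback applied to each message, in order
    this.hydrating = true;  // true until history is fully processed
    this.pending = [];      // live messages held back during hydration
  }
  onHistoryMessage(msg) {
    this.process(msg); // historical messages are processed immediately
  }
  onLiveMessage(msg) {
    if (this.hydrating) {
      this.pending.push(msg); // hold live messages until history is done
    } else {
      this.process(msg);
    }
  }
  onHistoryComplete() {
    this.hydrating = false;
    for (const msg of this.pending) this.process(msg); // drain in order
    this.pending = [];
  }
}
```

A real implementation would also need to discard any buffered live message that duplicates one already seen in history, for example by comparing message serials.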
meta_description: "Stream individual tokens from AI models as separate messages over Ably."
---
Token streaming with message-per-token is a pattern where every token generated by your model is published as its own Ably message. Each token then appears as one message in the channel history. This uses [Ably Pub/Sub](/docs/basics) for realtime communication between agents and clients.
Suggested change:
- Token streaming with message-per-token is a pattern where every token generated by your model is published as its own Ably message. Each token then appears as one message in the channel history. This uses [Ably Pub/Sub](/docs/basics) for realtime communication between agents and clients.
+ Token streaming with message-per-token is a pattern where every token generated by your model is published as an independent Ably message. Each token then appears as one message in the channel history. This uses [Ably Pub/Sub](/docs/basics) for realtime communication between agents and clients.
This pattern is useful when clients only care about the most recent part of a response and you are happy to treat the channel history as a short sliding window rather than a full conversation log. For example:

- **Backend-stored responses**: The backend writes complete responses to a database and clients load those full responses from there, while Ably is used only to deliver live tokens for the current in-progress response.
- **Live transcription, captioning, or translation**: A viewer who joins a live stream only needs the last few tokens for the current "frame" of subtitles, not the entire transcript so far.
Suggested change:
- **Live transcription, captioning, or translation**: A viewer who joins a live stream only needs the last few tokens for the current "frame" of subtitles, not the entire transcript so far.
+ **Live transcription, captioning, or translation**: A viewer who joins a live stream only needs sufficient tokens for the current "frame" of subtitles, not the entire transcript so far.
#### Subscribe to tokens
Use the `responseId` header in message extras to correlate tokens. The `responseId` allows you to group tokens belonging to the same response and correctly handle token delivery for multiple responses, even when delivered concurrently.
Suggested change:
- Use the `responseId` header in message extras to correlate tokens. The `responseId` allows you to group tokens belonging to the same response and correctly handle token delivery for multiple responses, even when delivered concurrently.
+ Use the `responseId` header in message extras to correlate tokens. The `responseId` allows you to group tokens belonging to the same response and correctly handle token delivery for distinct responses, even when delivered concurrently.
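The correlation step can be sketched as a pure function that folds per-token messages into per-response text. This is a hedged illustration: `groupTokens` is a hypothetical helper, and it assumes each message carries the token in `data` and the `responseId` under `extras.headers`, as this guide describes.

```javascript
// Sketch: accumulate message-per-token deliveries into per-response text,
// keyed on the responseId header, so interleaved responses stay separate.
function groupTokens(messages) {
  const responses = new Map();
  for (const msg of messages) {
    const id = msg.extras?.headers?.responseId;
    if (id === undefined) continue; // ignore messages without a responseId
    responses.set(id, (responses.get(id) || '') + msg.data);
  }
  return responses;
}
```

In a live subscription the same fold runs incrementally, one message at a time, but the grouping logic is identical.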
Link to the pending `/ai-transport` overview page.
Add intro describing the pattern, its properties, and use cases.
Includes continuous token streams, correlating tokens for distinct responses, and explicit start/end events.
Splits each token streaming approach into distinct patterns and shows both the publish and subscribe side behaviour alongside one another.
Includes hydration with rewind and hydration with persisted history + untilAttach. Describes the pattern for handling in-progress live responses with complete responses loaded from the database.
Add doc explaining streaming tokens with appendMessage and update compaction allowing message-per-response history.
Unifies the token streaming nav for token streaming after rebase.
Refines the intro copy in message-per-response to have structural similarity with the message-per-token page.
Refine the Publishing section of the message-per-response docs. - Include anchor tags on title - Describe the `serial` identifier - Align with stream pattern used in message-per-token docs - Remove duplicate example
Refine the Subscribing section of the message-per-response docs. - Add anchor tag to heading - Describes each action upfront - Uses RANDOM_CHANNEL_NAME
Refine the rewind section of the message-per-response docs. - Include description of allowed rewind parameters - Tweak copy
Refines the history section for the message-per-response docs. - Adds anchor to heading - Uses RANDOM_CHANNEL_NAME - Use message serial in code snippet instead of ID - Tweaks copy
Fix the hydration of in progress responses via rewind by using the responseId in the extras to correlate messages with completed responses loaded from the database.
Fix the hydration of in progress responses using history by obtaining the timestamp of the last completed response loaded from the database and paginating history forwards from that point.
Removes the headers/metadata section, as this covers the specific semantics of extras.headers handling with appends, which is better addressed by the (upcoming) message append pub/sub docs. Instead, a callout is used to describe header mixin semantics in the appropriate place insofar as it relates to the discussion at hand.
Update the token streaming with message per token docs to include a callout describing resume behaviour in case of transient disconnection.
Fix the message per token docs headers to include anchors and align with naming in the message per response page.
Description
AIT DOCS INTEGRATION BRANCH
Not (yet) intended to merge but opening to create review apps
Checklist