✨(back) limit output token per message by maxenceh · Pull Request #458 · suitenumerique/conversations

maxenceh · 2026-05-07T15:45:00Z

Purpose

Adds a configurable output token limit per AI message to control LLM costs on very long responses. When the limit is reached, the backend stops generation and the frontend displays an error message to the user.

Proposal

Backend

- Adds a new setting to configure the token limit
- Enforces the output token limit during AI response generation

Frontend

- Adds `TruncatedResponseMessage`, a new component that renders the "response was truncated" message shown to the user
- Integrates `TruncatedResponseMessage` into the message rendering

Demo

token_limit.mov

Summary by CodeRabbit

Release Notes

New Features
- Implemented output token limit per message; responses exceeding the limit are automatically truncated with user notification
- Added a new UI notification displayed when a response has been truncated due to reaching the maximum token limit
Documentation
- Added environment variable documentation for configuring the maximum output token limit

coderabbitai · 2026-05-07T15:45:50Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1d7f7329-d065-48d3-99a5-b5e92a254e4b

📥 Commits

Reviewing files that changed from the base of the PR and between fbabc9b and b16bf5c.

📒 Files selected for processing (9)

CHANGELOG.md
docs/env.md
src/backend/chat/clients/pydantic_ai.py
src/backend/chat/tests/clients/pydantic_ai/test_output_token_limit.py
src/backend/chat/tests/test_ai_agent_service_co2.py
src/backend/conversations/settings.py
src/frontend/apps/conversations/src/features/chat/components/MessageItem.tsx
src/frontend/apps/conversations/src/features/chat/components/TruncatedResponseMessage.tsx
src/frontend/apps/conversations/src/features/chat/components/__tests__/MessageItem.test.tsx

Walkthrough

This PR implements output token limits for LLM-generated chat messages. The backend now enforces a configurable maximum tokens per response via Pydantic-AI ModelSettings, tracks truncation via finish reason hooks, and propagates truncation state through message annotations. The frontend detects these annotations and renders a localized UI notice when responses are truncated. Changes include configuration, backend implementation with comprehensive tests, frontend UI component with tests, and documentation updates.
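The truncation flow described in the walkthrough can be sketched roughly as follows. This is a library-free illustration, not the code in `pydantic_ai.py`: the `TruncationTracker` class, its method names, and the `is_truncated` helper are hypothetical. Only the `finish_reason="length"` condition, the `truncated: true` annotation, and the state reset come from the PR summary.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TruncationTracker:
    """Hypothetical stand-in for the hook state described in the walkthrough."""

    max_tokens: int
    last_finish_reason: Optional[str] = None

    def reset(self) -> None:
        # Clear state before each new generation so an earlier
        # truncation does not leak into the next message.
        self.last_finish_reason = None

    def after_model_request(self, finish_reason: Optional[str]) -> None:
        # Record the finish reason reported by the model for this request.
        self.last_finish_reason = finish_reason

    def annotations(self) -> list[dict[str, Any]]:
        # Emit a truncation annotation only when the model stopped
        # because it reached the output token limit.
        if self.last_finish_reason == "length":
            return [{"truncated": True}]
        return []


def is_truncated(annotations: list[Any]) -> bool:
    # Mirrors the kind of check a client could run on message annotations.
    return any(isinstance(a, dict) and a.get("truncated") is True for a in annotations)


tracker = TruncationTracker(max_tokens=8192)
tracker.after_model_request("length")
truncated_annotations = tracker.annotations()

tracker.reset()
tracker.after_model_request("stop")
normal_annotations = tracker.annotations()

print(is_truncated(truncated_annotations), is_truncated(normal_annotations))
```

Keeping the finish reason in resettable per-request state is one way to avoid a stale truncation flag carrying over between messages.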

Changes

Output Token Limit Feature

| Layer / File(s) | Summary |
| --- | --- |
| Configuration and Documentation: `src/backend/conversations/settings.py`, `docs/env.md` | Introduces `LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE` setting (default 8192) wired to environment variable and documents the setting's purpose and default value. |
| Backend Truncation Tracking and Token Limit Enforcement: `src/backend/chat/clients/pydantic_ai.py` | Adds Pydantic-AI `Hooks` and `ModelSettings` imports, registers `after_model_request` hook to track finish reasons, passes `max_tokens` setting to agent iterations, and conditionally emits/appends truncation annotations when model returns `finish_reason="length"`. |
| Backend Truncation Tests: `src/backend/chat/tests/clients/pydantic_ai/test_output_token_limit.py`, `src/backend/chat/tests/test_ai_agent_service_co2.py` | Validates setting existence and defaults, hook initialization and state reset, finish reason tracking for different scenarios, and streaming annotation emission on truncation. |
| Frontend Truncation UI Component and Integration: `src/frontend/apps/conversations/src/features/chat/components/TruncatedResponseMessage.tsx`, `src/frontend/apps/conversations/src/features/chat/components/MessageItem.tsx` | Adds new `TruncatedResponseMessage` component with localized feedback, integrates detection and rendering of truncated messages in `MessageItem`, and updates memoization logic to re-render on annotation changes. |
| Frontend Truncation UI Tests: `src/frontend/apps/conversations/src/features/chat/components/__tests__/MessageItem.test.tsx` | Tests conditional rendering of truncation UI based on presence of `truncated: true` in message annotations. |
| Changelog Entry: `CHANGELOG.md` | Documents new output token limit feature in unreleased additions. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

frontend, documentation

Suggested reviewers

qbey

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 79.17% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: implementing an output token limit per message, which is the core objective of this pull request affecting both backend (settings, AI agent service) and frontend (truncation UI component). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/backend/chat/tests/clients/pydantic_ai/test_output_token_limit.py`:
- Around line 20-29: The fixture base_settings is forcing
LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE to 8192 for every test, making the "default is
8192" assertions tautological; remove or stop setting
settings.LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE in the base_settings fixture and
instead set that value only in tests that need it (or create a separate fixture
for tests that require a non-default), and update the tests that assert the
default (the "default is 8192" test and the other assertions flagged) to rely on
the real unset default value so they can detect regressions; locate references
to settings.LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE and the pytest fixture
base_settings to implement this change.

In `@src/backend/conversations/settings.py`:
- Around line 681-685: ConversationsSettings defines
LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE but doesn't validate it; add a fail-fast check
in the class's post_setup method that reads
self.LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE and raises a ValueError (or logger +
sys.exit) if the value is <= 0, with a clear message identifying
LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE so startup fails early and prevents invalid
token limits from propagating into generation settings.
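The first inline comment above (the tautological default assertion) can be illustrated with a small, framework-free sketch. Everything here is hypothetical: `make_settings_class`, the two fixture-like helpers, and the simulated regression to 4096 are stand-ins for the real Django settings and pytest fixture, which are not shown in this PR page.

```python
from dataclasses import dataclass

EXPECTED_DEFAULT = 8192  # default value stated in the PR summary


def make_settings_class(default: int):
    """Build a stand-in settings class with the given default (hypothetical)."""

    @dataclass
    class Settings:
        LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE: int = default

    return Settings


def fixture_forcing_value(settings_cls):
    # Like the flagged fixture: overwrites the setting for every test.
    settings = settings_cls()
    settings.LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE = EXPECTED_DEFAULT
    return settings


def fixture_untouched(settings_cls):
    # Leaves the setting alone so the real default is observed.
    return settings_cls()


def default_is_8192(settings) -> bool:
    return settings.LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE == EXPECTED_DEFAULT


# Simulate a regression where the default was accidentally changed.
regressed = make_settings_class(default=4096)

hidden = default_is_8192(fixture_forcing_value(regressed))
caught = default_is_8192(fixture_untouched(regressed))
print(hidden, caught)
```

With the forcing helper the check still passes after the regression, while the untouched variant fails, which is the behaviour the review comment asks the tests to have.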

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a3b26a1d-96d2-470e-9953-ddf14d477e3f

📥 Commits

Reviewing files that changed from the base of the PR and between 5ca4ef9 and fbabc9b.

coderabbitai · 2026-05-13T12:30:35Z

+    LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE = values.IntegerValue(
+        8192,
+        environ_name="LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE",
+        environ_prefix=None,
+    )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate token limit as strictly positive at startup.

A non-positive value here can flow into generation settings and break response generation at runtime. Add a fail-fast validation in post_setup.

Suggested fix

 class Base(BraveSettings, Configuration):
@@
     @classmethod
     def post_setup(cls):
@@
+        if cls.LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE <= 0:
+            raise ValueError(
+                "LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE must be > 0, "
+                f"got {cls.LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE}."
+            )
+
         # Document context budget ratio must be a fraction (0 disables full inlining,
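Outside the Django settings class, the same fail-fast idea can be sketched with the standard library alone. This is an assumption-laden illustration: `load_max_output_tokens` and `DEFAULT_MAX_OUTPUT_TOKENS` are hypothetical names, and the real project reads the value through `values.IntegerValue` rather than `os.environ` directly.

```python
import os
from typing import Mapping, Optional

DEFAULT_MAX_OUTPUT_TOKENS = 8192  # default stated in the PR
ENV_NAME = "LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE"


def load_max_output_tokens(environ: Optional[Mapping[str, str]] = None) -> int:
    """Read the limit from the environment and reject non-positive values."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_NAME)
    value = DEFAULT_MAX_OUTPUT_TOKENS if raw is None else int(raw)
    if value <= 0:
        # Fail at startup instead of passing an invalid limit to generation.
        raise ValueError(f"{ENV_NAME} must be > 0, got {value}.")
    return value


print(load_max_output_tokens({}))
print(load_max_output_tokens({ENV_NAME: "4096"}))
```

Passing the environment as an argument keeps the check easy to exercise in tests without mutating `os.environ`.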

sonarqubecloud · 2026-05-13T13:26:34Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

maxenceh force-pushed the maxenceh/limit-output-tokens branch 2 times, most recently from d8924a2 to fbabc9b · May 13, 2026 12:22

maxenceh marked this pull request as ready for review May 13, 2026 12:25

coderabbitai Bot reviewed May 13, 2026

✨(back) add output token limit per message

b16bf5c

If the limit is reached, a message is displayed to the user. The limit is meant to cap the cost of very long messages, so it should be set high.

maxenceh force-pushed the maxenceh/limit-output-tokens branch from fbabc9b to b16bf5c · May 13, 2026 13:25

maxenceh mentioned this pull request May 18, 2026

Limit total token output #464

Open

maxenceh wants to merge 1 commit into main from maxenceh/limit-output-tokens
