✨(back) limit output token per message #458
Walkthrough
This PR implements output token limits for LLM-generated chat messages. The backend now enforces a configurable maximum number of tokens per response via Pydantic-AI
Changes
Output Token Limit Feature
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
d8924a2 to fbabc9b
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/backend/chat/tests/clients/pydantic_ai/test_output_token_limit.py`:
- Around line 20-29: The fixture base_settings is forcing
LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE to 8192 for every test, making the "default is
8192" assertions tautological; remove or stop setting
settings.LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE in the base_settings fixture and
instead set that value only in tests that need it (or create a separate fixture
for tests that require a non-default), and update the tests that assert the
default (the "default is 8192" test and the other assertions flagged) to rely on
the real unset default value so they can detect regressions; locate references
to settings.LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE and the pytest fixture
base_settings to implement this change.
In `@src/backend/conversations/settings.py`:
- Around line 681-685: ConversationsSettings defines
LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE but doesn't validate it; add a fail-fast check
in the class's post_setup method that reads
self.LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE and raises a ValueError (or logger +
sys.exit) if the value is <= 0, with a clear message identifying
LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE so startup fails early and prevents invalid
token limits from propagating into generation settings.
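The first inline comment above (a fixture that pins the value makes "default is 8192" assertions tautological) can be illustrated with a minimal, framework-free sketch. The helper `load_max_output_tokens` is a hypothetical stand-in for the project's settings lookup, not the actual Django configuration code; only the setting name and the 8192 default come from this PR.

```python
import os

# Default quoted in the review; everything else here is illustrative.
DEFAULT_MAX_OUTPUT_TOKENS = 8192


def load_max_output_tokens(environ=None):
    """Hypothetical stand-in for the settings lookup: env var, else default."""
    environ = os.environ if environ is None else environ
    raw = environ.get("LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE")
    return DEFAULT_MAX_OUTPUT_TOKENS if raw is None else int(raw)


def test_default_is_8192():
    # No override is supplied, so this exercises the real default and
    # would fail if DEFAULT_MAX_OUTPUT_TOKENS were changed by mistake.
    assert load_max_output_tokens({}) == 8192


def test_override_is_respected():
    # Only the test that needs a non-default value sets one.
    env = {"LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE": "1024"}
    assert load_max_output_tokens(env) == 1024
```

The same split applies with pytest fixtures: keep the shared fixture free of the setting and introduce the override in the individual test or in a dedicated fixture.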
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a3b26a1d-96d2-470e-9953-ddf14d477e3f
📒 Files selected for processing (9)
CHANGELOG.md
docs/env.md
src/backend/chat/clients/pydantic_ai.py
src/backend/chat/tests/clients/pydantic_ai/test_output_token_limit.py
src/backend/chat/tests/test_ai_agent_service_co2.py
src/backend/conversations/settings.py
src/frontend/apps/conversations/src/features/chat/components/MessageItem.tsx
src/frontend/apps/conversations/src/features/chat/components/TruncatedResponseMessage.tsx
src/frontend/apps/conversations/src/features/chat/components/__tests__/MessageItem.test.tsx
    LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE = values.IntegerValue(
        8192,
        environ_name="LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE",
        environ_prefix=None,
    )
Validate token limit as strictly positive at startup.
A non-positive value here can flow into generation settings and break response generation at runtime. Add a fail-fast validation in post_setup.
Suggested fix
 class Base(BraveSettings, Configuration):
@@
     @classmethod
     def post_setup(cls):
@@
+        if cls.LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE <= 0:
+            raise ValueError(
+                "LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE must be > 0, "
+                f"got {cls.LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE}."
+            )
+
         # Document context budget ratio must be a fraction (0 disables full inlining,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/backend/conversations/settings.py` around lines 681 - 685,
ConversationsSettings defines LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE but doesn't
validate it; add a fail-fast check in the class's post_setup method that reads
self.LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE and raises a ValueError (or logger +
sys.exit) if the value is <= 0, with a clear message identifying
LLM_MAX_OUTPUT_TOKENS_PER_MESSAGE so startup fails early and prevents invalid
token limits from propagating into generation settings.
If the limit is reached, display a message to the user. It's meant to limit the cost of very long messages, so the limit must be high.
fbabc9b to b16bf5c



Purpose
Adds a configurable output token limit per AI message to control LLM costs on very long responses. When the limit is reached, the backend stops generation and the frontend displays an error message to the user.
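As a rough sketch of the flow described above, the backend can pass the cap as a `max_tokens` generation setting and flag the response as truncated when the model reports that it stopped because of length. The names `GenerationResult`, `build_model_settings`, and `to_client_payload`, the `"length"` finish reason, and the `truncated` field are assumptions made for illustration; they are not taken from this PR's code.

```python
from dataclasses import dataclass


@dataclass
class GenerationResult:
    """Hypothetical container for one model response."""

    text: str
    # Assumed values: "stop" (finished normally) or "length" (hit the cap).
    finish_reason: str


def build_model_settings(max_output_tokens: int) -> dict:
    """Build the generation settings that carry the output cap."""
    if max_output_tokens <= 0:
        raise ValueError(
            f"max_output_tokens must be > 0, got {max_output_tokens}."
        )
    return {"max_tokens": max_output_tokens}


def to_client_payload(result: GenerationResult) -> dict:
    """Tell the frontend whether the answer was cut off by the limit."""
    return {
        "text": result.text,
        "truncated": result.finish_reason == "length",
    }


settings = build_model_settings(8192)
payload = to_client_payload(GenerationResult("partial answer", "length"))
```

With a payload like this, the frontend only needs to check the `truncated` flag to decide whether to render the truncation message under the partial answer.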
Proposal
Backend
Frontend
Demo
token_limit.mov
Summary by CodeRabbit
Release Notes
New Features
Documentation