✨(tools) add basic document translate tool#302
Conversation
WalkthroughAdds a translation feature: a TranslationAgent, a document_translate tool, pydantic AI client wiring, tests and fixtures, a TRANSLATION_MAX_CHARS setting, utility to read documents, and frontend UI/i18n updates for translation progress. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Frontend/User
participant Service as AIAgentService
participant Tool as document_translate
participant Agent as TranslationAgent
participant Storage as Document Storage
Client->>Service: Request translation (via tool invocation)
Service->>Tool: invoke translate(target_language, instructions)
Tool->>Storage: Get latest text attachment metadata
Storage-->>Tool: Attachment metadata
Tool->>Storage: Read document content
Storage-->>Tool: Document text
Tool->>Agent: Execute translation(prompt with content)
Agent-->>Tool: Translated text
Tool->>Service: Return ToolReturn(translated_text, metadata)
Service-->>Client: Display translation result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This pull request adds a dedicated document translation tool to fix issue #301, which previously caused translation requests to be handled by the summary tool, resulting in translated summaries rather than full translations.
Changes:
- Implements a new
translatetool that translates the last uploaded text document into a specified target language, with a configurable size limit (100k characters by default) - Adds frontend support for displaying translation progress with appropriate loading messages
- Extends test coverage with comprehensive test cases for the new translation functionality
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/backend/chat/tools/document_translate.py | New tool implementation that translates the last uploaded text document with size validation |
| src/backend/chat/agents/translate.py | New TranslationAgent class following the BaseAgent pattern used by SummarizationAgent |
| src/backend/conversations/settings.py | Adds TRANSLATION_MAX_CHARS configuration setting (default 100k characters) |
| src/backend/chat/clients/pydantic_ai.py | Registers the translate tool and adds system prompt for translation behavior |
| src/backend/chat/tests/tools/test_document_translate.py | Comprehensive test suite covering success cases, error handling, and edge cases |
| src/backend/chat/tests/conftest.py | Adds mock_translation_agent fixture following the pattern of mock_summarization_agent |
| src/frontend/apps/conversations/src/i18n/translations.json | Adds translation strings for "Extracting document" (singular) and "Translation in progress" in all supported languages |
| src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx | Updates to show singular/plural document extraction messages |
| src/frontend/apps/conversations/src/features/chat/components/MessageItem.tsx | Adds display logic for translation progress indicator |
| CHANGELOG.md | Documents the new feature in the Unreleased section |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (5)
src/backend/chat/agents/translate.py (2)
19-20:TranslationAgentsilently reusesLLM_SUMMARIZATION_MODEL_HRID.The translation agent borrows the summarization model HRID with no comment explaining the intentional coupling. This prevents independent per-role model configuration and makes the setting name misleading when operators try to tune translation separately.
Consider introducing a dedicated
LLM_TRANSLATION_MODEL_HRIDsetting (falling back to the summarization HRID as a default), or at least add an inline comment documenting the deliberate reuse.♻️ Suggested approach (new setting in settings.py + usage here)
In
src/backend/conversations/settings.py:LLM_TRANSLATION_MODEL_HRID = values.Value( None, # falls back to LLM_SUMMARIZATION_MODEL_HRID when unset environ_name="LLM_TRANSLATION_MODEL_HRID", environ_prefix=None, )In
translate.py:- model_hrid=settings.LLM_SUMMARIZATION_MODEL_HRID, + model_hrid=settings.LLM_TRANSLATION_MODEL_HRID or settings.LLM_SUMMARIZATION_MODEL_HRID,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/agents/translate.py` around lines 19 - 20, TranslationAgent is currently hard-coded to reuse settings.LLM_SUMMARIZATION_MODEL_HRID which prevents separate configuration; update TranslationAgent to read a dedicated setting (settings.LLM_TRANSLATION_MODEL_HRID) falling back to settings.LLM_SUMMARIZATION_MODEL_HRID when unset, or at minimum add an inline comment above the super().__init__ call explaining the intentional reuse; reference the TranslationAgent class and the model_hrid parameter in the super().__init__ call to implement the fallback or add the explanatory comment.
4-10: Unusedloggerinstance.
loggingis imported and aloggerinstance is created at line 10, but no log statements exist anywhere in the file. Remove both or add at minimum a debug log in__init__.🧹 Proposed fix
-import logging - from django.conf import settings from .base import BaseAgent - -logger = logging.getLogger(__name__)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/agents/translate.py` around lines 4 - 10, The module defines an unused logger (import logging and logger = logging.getLogger(__name__)) in translate.py; either remove the logging import and the module-level logger variable, or keep them and add a minimal log statement (e.g., logger.debug) inside the agent initializer (the Translate agent class that subclasses BaseAgent) to justify the logger's presence; update imports and the class __init__ accordingly so there are no unused symbols.src/backend/chat/clients/pydantic_ai.py (1)
658-679:translation_system_promptandsummarization_system_promptshare near-identical wording.Both instruct the model to return results verbatim without modification. The duplication is harmless but could be consolidated into a single reusable instruction if more tool-specific verbatim prompts are added in the future.
♻️ Optional consolidation
- `@self.conversation_agent.instructions` - def summarization_system_prompt() -> str: - return ( - "When you receive a result from the summarization tool, you MUST return it " - "directly to the user without any modification, paraphrasing, or additional " - "summarization." - "The tool already produces optimized summaries that should be presented " - "verbatim." - "You may translate the summary if required, but you MUST preserve all the " - "information from the original summary." - "You may add a follow-up question after the summary if needed." - ) - - `@self.conversation_agent.instructions` - def translation_system_prompt() -> str: - return ( - "When you receive a result from the translation tool, you MUST return it " - "directly to the user without any modification, paraphrasing, or additional " - "summarization. " - "The tool already produces complete translations that should be presented " - "verbatim." - ) + `@self.conversation_agent.instructions` + def verbatim_tool_results_prompt() -> str: + return ( + "When you receive a result from the summarization or translation tool, " + "you MUST return it directly to the user without any modification, " + "paraphrasing, or additional summarization. " + "The tool already produces optimized output that should be presented verbatim. " + "For summaries, you may translate if required but MUST preserve all information. " + "You may add a follow-up question after the result if needed." + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/clients/pydantic_ai.py` around lines 658 - 679, Both summarization_system_prompt and translation_system_prompt register near-identical instructions via self.conversation_agent.instructions; consolidate into a single reusable function (e.g., verbatim_tool_prompt or create_tool_prompt) that returns the shared verbatim-return policy and call that from both translation_system_prompt and summarization_system_prompt (or replace both with direct references to the unified instruction) so future tool-specific tweaks are easier to manage while preserving current behavior.src/backend/chat/tests/tools/test_document_translate.py (1)
51-59: Consider addingcontent_typefilter verification to the mock queryset.
_mock_attachments_querysetstubs.filter().order_by().afirst()but doesn't verify that.filter()was called withcontent_type__startswith="text/". Adding an assertion (likemock_conversation.attachments.filter.assert_called_once_with(content_type__startswith="text/")) in at least one test would guard against filter regressions.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tests/tools/test_document_translate.py` around lines 51 - 59, Update the tests that use _mock_attachments_queryset to assert the .filter call includes the content_type prefix: when you create the mock via _mock_attachments_queryset(attachment), ensure the test asserts mock_conversation.attachments.filter.assert_called_once_with(content_type__startswith="text/") (or equivalent) after the code under test runs; this will validate that the mocked .filter was invoked with content_type__startswith="text/" and prevent regressions in how attachments are filtered (reference _mock_attachments_queryset, mock_attachments.filter, and mock_conversation.attachments.filter in the assertion).src/backend/chat/tools/document_translate.py (1)
104-111: Unnecessaryf-string prefixes on lines without interpolation.Lines 105, 107, and 110 use
f"..."but contain no{…}placeholders. This is a minor readability nit — plain strings suffice.Proposed fix
translate_prompt = ( - f"You are an agent specializing in text translation. " + "You are an agent specializing in text translation. " f"Translate the following document to {target_language}. " - f"Preserve all markdown formatting exactly as-is. " + "Preserve all markdown formatting exactly as-is. " f"{instructions_hint}\n\n" f"'''\n{content}\n'''\n\n" - f"Respond directly with the translated text only, no commentary." + "Respond directly with the translated text only, no commentary." )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/document_translate.py` around lines 104 - 111, The translate_prompt construction uses unnecessary f-strings on literal lines; update the translate_prompt assignment in document_translate.py (the translate_prompt variable creation inside the document_translate function) by removing the f-prefix from string fragments that have no interpolation (the lines that currently read f"Translate the following document to {target_language}." may keep the f when needed, but the literal fragments like f"Preserve all markdown formatting exactly as-is. ", f"{instructions_hint}\n\n" if instructions_hint is already interpolated keep its f, and f"'''\n{content}\n'''\n\n" should remain f only where {content} is used) — in short, change any f"..." fragments with no {…} placeholders to plain "..." to improve readability while keeping f-strings where placeholders like {target_language}, {instructions_hint}, or {content} are used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/chat/tests/tools/test_document_translate.py`:
- Around line 67-96: The test fails because document_translate awaits the
synchronous helper read_document_content; update document_translate to call
read_document_content without await (or alternatively make read_document_content
async and update its callers), specifically fix the await usage around
read_document_content in the document_translate flow so the call is synchronous
(remove the await and handle its return directly) or convert
read_document_content to an async def and await it consistently across
document_translate and any callers.
In `@src/backend/chat/tools/document_translate.py`:
- Line 9: Update the ToolReturn import to use the top-level package: replace the
import statement that references ToolReturn from pydantic_ai.messages with an
import from pydantic_ai (i.e., change "from pydantic_ai.messages import
ToolReturn" to "from pydantic_ai import ToolReturn"). Apply the same change in
the other affected modules—document_search_rag.py, document_summarize.py,
document_generic_search_rag.py, and web_search_brave.py—so any references to
ToolReturn come from pydantic_ai instead of pydantic_ai.messages.
In
`@src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx`:
- Around line 51-57: The current render in ToolInvocationItem.tsx uses
documentIdentifiers.length === 1 to pick singular vs plural but doesn't guard
length === 0, producing "Extracting documents: ..." for an empty array; update
the JSX to first check if documentIdentifiers.length === 0 and render a clear
fallback (e.g., t('No documents to extract') or another appropriate translation
key), otherwise keep the existing singular (documentIdentifiers[0]) and plural
(documentIdentifiers.join(', ')) branches; ensure you reference the
documentIdentifiers variable and update the localized strings accordingly.
---
Nitpick comments:
In `@src/backend/chat/agents/translate.py`:
- Around line 19-20: TranslationAgent is currently hard-coded to reuse
settings.LLM_SUMMARIZATION_MODEL_HRID which prevents separate configuration;
update TranslationAgent to read a dedicated setting
(settings.LLM_TRANSLATION_MODEL_HRID) falling back to
settings.LLM_SUMMARIZATION_MODEL_HRID when unset, or at minimum add an inline
comment above the super().__init__ call explaining the intentional reuse;
reference the TranslationAgent class and the model_hrid parameter in the
super().__init__ call to implement the fallback or add the explanatory comment.
- Around line 4-10: The module defines an unused logger (import logging and
logger = logging.getLogger(__name__)) in translate.py; either remove the logging
import and the module-level logger variable, or keep them and add a minimal log
statement (e.g., logger.debug) inside the agent initializer (the Translate agent
class that subclasses BaseAgent) to justify the logger's presence; update
imports and the class __init__ accordingly so there are no unused symbols.
In `@src/backend/chat/clients/pydantic_ai.py`:
- Around line 658-679: Both summarization_system_prompt and
translation_system_prompt register near-identical instructions via
self.conversation_agent.instructions; consolidate into a single reusable
function (e.g., verbatim_tool_prompt or create_tool_prompt) that returns the
shared verbatim-return policy and call that from both translation_system_prompt
and summarization_system_prompt (or replace both with direct references to the
unified instruction) so future tool-specific tweaks are easier to manage while
preserving current behavior.
In `@src/backend/chat/tests/tools/test_document_translate.py`:
- Around line 51-59: Update the tests that use _mock_attachments_queryset to
assert the .filter call includes the content_type prefix: when you create the
mock via _mock_attachments_queryset(attachment), ensure the test asserts
mock_conversation.attachments.filter.assert_called_once_with(content_type__startswith="text/")
(or equivalent) after the code under test runs; this will validate that the
mocked .filter was invoked with content_type__startswith="text/" and prevent
regressions in how attachments are filtered (reference
_mock_attachments_queryset, mock_attachments.filter, and
mock_conversation.attachments.filter in the assertion).
In `@src/backend/chat/tools/document_translate.py`:
- Around line 104-111: The translate_prompt construction uses unnecessary
f-strings on literal lines; update the translate_prompt assignment in
document_translate.py (the translate_prompt variable creation inside the
document_translate function) by removing the f-prefix from string fragments that
have no interpolation (the lines that currently read f"Translate the following
document to {target_language}." may keep the f when needed, but the literal
fragments like f"Preserve all markdown formatting exactly as-is. ",
f"{instructions_hint}\n\n" if instructions_hint is already interpolated keep its
f, and f"'''\n{content}\n'''\n\n" should remain f only where {content} is used)
— in short, change any f"..." fragments with no {…} placeholders to plain "..."
to improve readability while keeping f-strings where placeholders like
{target_language}, {instructions_hint}, or {content} are used.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/backend/chat/tools/document_translate.py (1)
9-9:⚠️ Potential issue | 🔴 Critical
ToolReturnmust be imported frompydantic_ai, notpydantic_ai.messages.
from pydantic_ai import ToolReturnis the correct import path per the pydantic-ai documentation.🐛 Proposed fix
-from pydantic_ai.messages import ToolReturn +from pydantic_ai import ToolReturn🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/document_translate.py` at line 9, The file imports ToolReturn from the wrong module; change the import to use the top-level package by replacing "from pydantic_ai.messages import ToolReturn" with "from pydantic_ai import ToolReturn" so the symbol ToolReturn is resolved correctly (look for the import line near the top of src/backend/chat/tools/document_translate.py and update the import statement that currently references pydantic_ai.messages).
🧹 Nitpick comments (1)
src/backend/chat/tools/document_translate.py (1)
12-12:read_document_contentimported from a peer tool module — move it to a shared utility.
read_document_contentis a generic document-reading helper, not summarization logic. Importing it fromchat.tools.document_summarizecreates a coupling between two independent tool modules. Moving it tochat.tools.utils(or a dedicatedchat.tools.document_utils) would give both tools a clean, stable import.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/document_translate.py` at line 12, The import of read_document_content from chat.tools.document_summarize creates an unnecessary coupling; extract the read_document_content function into a shared utility module (e.g., chat.tools.utils or chat.tools.document_utils), update the implementation there, then change imports in chat.tools.document_translate (and keep chat.tools.document_summarize) to import read_document_content from the new module; ensure any internal references to read_document_content are updated and run tests/linters to verify no remaining imports from chat.tools.document_summarize.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/chat/tests/tools/test_document_translate.py`:
- Around line 18-38: The fixture fixture_translation_agent_config sets
settings.LLM_CONFIGURATIONS keyed to settings.LLM_DEFAULT_MODEL_HRID but
TranslationAgent.__init__ passes
model_hrid=settings.LLM_SUMMARIZATION_MODEL_HRID and BaseAgent.__init__
immediately looks up settings.LLM_CONFIGURATIONS[model_hrid], causing
ImproperlyConfigured; to fix, change the fixture to register the LLModel under
settings.LLM_SUMMARIZATION_MODEL_HRID (or add an entry for that HRID) so that
settings.LLM_CONFIGURATIONS contains a mapping for the key TranslationAgent
expects before instantiation.
In `@src/backend/chat/tools/document_translate.py`:
- Around line 79-91: The early byte-size check using last_attachment.size
against settings.TRANSLATION_MAX_CHARS is producing false positives for
multi-byte UTF-8 content; update the logic in document_translate.py by either
removing this early check entirely (rely on the existing content-length check
later) or change the threshold to a conservative bytes threshold of
settings.TRANSLATION_MAX_CHARS * 4 before raising ModelCannotRetry so you only
reject files that cannot possibly be within the char limit; ensure the
ModelCannotRetry message remains unchanged.
In `@src/backend/conversations/configuration/llm/default.json`:
- Line 40: The change of default provider "kind" from "openai" to "mistral" in
src/backend/conversations/configuration/llm/default.json breaks deployments;
either revert "kind" back to "openai" so default-provider, default-model and
default-summarization-model continue to work with OpenAI-compatible backends, or
if the switch to "mistral" is intentional update the schema default (schema
Literal default for kind), add a CHANGELOG entry and migration guidance
explaining required env changes (e.g., setting AI_BASE_URL to a Mistral
endpoint) and update any docs referencing
default-provider/default-model/default-summarization-model so users are aware
and deployments won’t fail at runtime.
---
Duplicate comments:
In `@src/backend/chat/tools/document_translate.py`:
- Line 9: The file imports ToolReturn from the wrong module; change the import
to use the top-level package by replacing "from pydantic_ai.messages import
ToolReturn" with "from pydantic_ai import ToolReturn" so the symbol ToolReturn
is resolved correctly (look for the import line near the top of
src/backend/chat/tools/document_translate.py and update the import statement
that currently references pydantic_ai.messages).
---
Nitpick comments:
In `@src/backend/chat/tools/document_translate.py`:
- Line 12: The import of read_document_content from
chat.tools.document_summarize creates an unnecessary coupling; extract the
read_document_content function into a shared utility module (e.g.,
chat.tools.utils or chat.tools.document_utils), update the implementation there,
then change imports in chat.tools.document_translate (and keep
chat.tools.document_summarize) to import read_document_content from the new
module; ensure any internal references to read_document_content are updated and
run tests/linters to verify no remaining imports from
chat.tools.document_summarize.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/backend/chat/tests/tools/test_document_translate.pysrc/backend/chat/tools/document_translate.pysrc/backend/conversations/configuration/llm/default.jsonsrc/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
- src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/backend/chat/tools/document_translate.py (1)
9-9:⚠️ Potential issue | 🔴 Critical
ToolReturnmust be imported frompydantic_ai, notpydantic_ai.messages.
ToolReturnis a top-level export — the pydantic-ai docs consistently showfrom pydantic_ai import ToolReturn.pydantic_ai.messagesdoes not exportToolReturn; this will raise anImportErrorat startup.🐛 Proposed fix
-from pydantic_ai.messages import ToolReturn +from pydantic_ai import ToolReturn🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/document_translate.py` at line 9, The import for ToolReturn in document_translate.py is wrong; replace the line importing ToolReturn from pydantic_ai.messages with an import from the top-level pydantic_ai package (i.e., import ToolReturn from pydantic_ai) so that the symbol ToolReturn is resolved at startup; update the import statement that references ToolReturn to use pydantic_ai instead of pydantic_ai.messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/chat/tools/document_translate.py`:
- Around line 59-61: The code builds instructions_hint using
instructions.strip() but tests truthiness on the original instructions, which
makes whitespace-only strings produce a dangling "Follow these instructions: "
hint; change the guard to use a stripped value (e.g., stripped =
instructions.strip() and then instructions_hint = f"Follow these instructions:
{stripped}" if stripped else "") so only non-empty, non-whitespace instructions
populate instructions_hint; update uses of instructions_hint accordingly.
- Around line 102-109: The prompt in document_translate.py uses a fixed
triple-quote delimiter (translate_prompt) which is vulnerable if the uploaded
document contains the same sequence; replace the fixed delimiter with a strong,
unpredictable boundary (e.g., generate a UUID or random token in the translation
function), insert that token into the prompt template and wrap the content with
the same start/end boundary strings so the LLM can never confuse document
content with instructions; update references to translate_prompt to include the
generated boundary variable and ensure you validate/escape nothing else (or
assert the token does not appear in content), and include the boundary value in
the prompt metadata (e.g., "BOUNDARY: <token>") so both sides use the same
unique delimiter.
- Line 79: The call site is awaiting read_document_content but that function is
synchronous (read_document_content), causing a TypeError; fix by making
read_document_content an async function or by removing await and wrapping the
synchronous call with sync_to_async at the call sites (document_translate.py and
document_summarize.py). Specifically, either convert the definition of
read_document_content to async def (and use an async file reader like aiofiles
or run the blocking read in an executor) so await read_document_content(...) is
valid, or keep it synchronous and replace await read_document_content(...) with
await sync_to_async(read_document_content)(...) or simply call
read_document_content(...) without await and adjust callers accordingly (update
both the call in document_translate.py and the other call in
document_summarize.py).
---
Duplicate comments:
In `@src/backend/chat/tools/document_translate.py`:
- Line 9: The import for ToolReturn in document_translate.py is wrong; replace
the line importing ToolReturn from pydantic_ai.messages with an import from the
top-level pydantic_ai package (i.e., import ToolReturn from pydantic_ai) so that
the symbol ToolReturn is resolved at startup; update the import statement that
references ToolReturn to use pydantic_ai instead of pydantic_ai.messages.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/backend/chat/agents/translate.pysrc/backend/chat/tools/document_translate.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/backend/chat/agents/translate.py
| "You must explain this to the user and ask them to provide documents." | ||
| ) | ||
|
|
||
| doc_name, content = await read_document_content(last_attachment) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all definitions of read_document_content
echo "=== Definitions of read_document_content ==="
rg -n "def read_document_content|async def read_document_content" --type=py -A 3
echo -e "\n=== All usages with await ==="
rg -n "await read_document_content" --type=py
echo -e "\n=== File location check ==="
fd read_document_contentRepository: suitenumerique/conversations
Length of output: 799
await-ing a synchronous function will raise TypeError at runtime.
The function read_document_content (line 23 in src/backend/chat/tools/document_summarize.py) is defined as a plain def, not async def, yet its docstring claims it reads content "asynchronously." The function returns a tuple directly: (doc.file_name, f.read().decode("utf-8")).
Awaiting a non-coroutine raises TypeError: object tuple can't be used in 'await' expression. This occurs at line 79 here and also at line 94 in document_summarize.py.
Convert read_document_content to async def, or if it must remain synchronous, wrap the calls with sync_to_async() and remove the await keyword.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/chat/tools/document_translate.py` at line 79, The call site is
awaiting read_document_content but that function is synchronous
(read_document_content), causing a TypeError; fix by making
read_document_content an async function or by removing await and wrapping the
synchronous call with sync_to_async at the call sites (document_translate.py and
document_summarize.py). Specifically, either convert the definition of
read_document_content to async def (and use an async file reader like aiofiles
or run the blocking read in an executor) so await read_document_content(...) is
valid, or keep it synchronous and replace await read_document_content(...) with
await sync_to_async(read_document_content)(...) or simply call
read_document_content(...) without await and adjust callers accordingly (update
both the call in document_translate.py and the other call in
document_summarize.py).
|
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (4)
src/backend/chat/tools/document_translate.py (4)
9-9:ToolReturnis imported from the wrong module.
from pydantic_ai.messages import ToolReturnshould befrom pydantic_ai import ToolReturn.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/document_translate.py` at line 9, The import for ToolReturn is from the wrong module; replace the current import statement "from pydantic_ai.messages import ToolReturn" with "from pydantic_ai import ToolReturn" so the ToolReturn symbol is imported from the top-level pydantic_ai package rather than pydantic_ai.messages; update the import in document_translate.py where ToolReturn is referenced.
58-60: Whitespace-onlyinstructionsinjects a dangling hint into the prompt.
if instructionsis truthy for" ", butinstructions.strip()is"", producing"Follow these instructions: "with no trailing content.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/document_translate.py` around lines 58 - 60, The prompt builds instructions_hint even when instructions contains only whitespace; update the logic around the instructions and instructions_hint variables so you check the stripped value instead of the raw string (e.g., compute stripped_instructions = instructions.strip() and only set instructions_hint when stripped_instructions is non-empty), and use stripped_instructions in the formatted hint to avoid producing a dangling "Follow these instructions: " fragment.
101-108: Triple-quote delimiter is breakable by document content (prompt injection).If the uploaded document contains the literal sequence
\n'''\n, the LLM cannot reliably distinguish content from post-document instructions.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/document_translate.py` around lines 101 - 108, The current translate_prompt in document_translate.py uses the fixed triple-quote delimiter (''' ) which can be injected if the uploaded content contains that sequence; change the boundary strategy so the LLM can never confuse content with prompt: generate a unique, hard-to-guess delimiter (e.g., include a UUID or random nonce) and use that delimiter both around the {content} and in the surrounding instructions, or alternatively prefix the document with its exact byte/character length and instruct the model to read exactly that many characters; update the translate_prompt construction (referencing translate_prompt, content, and instructions_hint) to use the chosen safe delimiter or length-prefix method consistently.
82-90:⚠️ Potential issue | 🟡 MinorContradictory: error message passes exact counts to the LLM then tells it not to share them.
Lines 83-84 embed
{len(content):,}and{max_chars:,}in theModelCannotRetrymessage, immediately followed by"without providing numerical details". This is self-contradictory; the LLM may still echo the numbers to the user, undermining the intent.If the goal is to hide implementation details, simply omit the numbers from the message:
🛡️ Proposed fix
- raise ModelCannotRetry( - f"The document is too large to translate ({len(content):,} characters, " - f"limit is {max_chars:,}). " - "You must explain this to the user, without providing numerical details. " - "Suggest them to reduce the document size by summarizing it or " - "by splitting it into smaller parts. " - "Also offer them to summarize the document in the target language instead, " - "which can be a good alternative to translation for large documents." - ) + raise ModelCannotRetry( + "The document is too large to translate. " + "You must explain this to the user. " + "Suggest them to reduce the document size by summarizing it or " + "by splitting it into smaller parts. " + "Also offer them to summarize the document in the target language instead, " + "which can be a good alternative to translation for large documents." + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/document_translate.py` around lines 82 - 90, The ModelCannotRetry exception message currently embeds exact character counts using {len(content):,} and {max_chars:,} but then instructs the LLM not to share numerical details; remove the interpolated numbers from the raised message in document_translate.py (the raise of ModelCannotRetry referencing content and max_chars) and replace the text with a version that omits numeric values and simply tells the user the document is too large, advises summarizing or splitting it, and offers summarizing in the target language as an alternative.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/chat/tools/utils.py`:
- Around line 56-60: The read_document_content function currently decodes bytes
with decode("utf-8") which will raise UnicodeDecodeError for non-UTF-8 text
files; update read_document_content to catch UnicodeDecodeError from
default_storage.open(...).read(), and instead of letting it bubble up, raise
ModelCannotRetry with a clear user-facing message like "Document encoding is not
supported — please re-save as UTF-8"; also change the `@sync_to_async` decorator
on read_document_content to `@sync_to_async`(thread_sensitive=False) since file IO
here does not require thread-sensitive execution to improve throughput.
---
Duplicate comments:
In `@src/backend/chat/tools/document_translate.py`:
- Line 9: The import for ToolReturn is from the wrong module; replace the
current import statement "from pydantic_ai.messages import ToolReturn" with
"from pydantic_ai import ToolReturn" so the ToolReturn symbol is imported from
the top-level pydantic_ai package rather than pydantic_ai.messages; update the
import in document_translate.py where ToolReturn is referenced.
- Around line 58-60: The prompt builds instructions_hint even when instructions
contains only whitespace; update the logic around the instructions and
instructions_hint variables so you check the stripped value instead of the raw
string (e.g., compute stripped_instructions = instructions.strip() and only set
instructions_hint when stripped_instructions is non-empty), and use
stripped_instructions in the formatted hint to avoid producing a dangling
"Follow these instructions: " fragment.
- Around line 101-108: The current translate_prompt in document_translate.py
uses the fixed triple-quote delimiter (''' ) which can be injected if the
uploaded content contains that sequence; change the boundary strategy so the LLM
can never confuse content with prompt: generate a unique, hard-to-guess
delimiter (e.g., include a UUID or random nonce) and use that delimiter both
around the {content} and in the surrounding instructions, or alternatively
prefix the document with its exact byte/character length and instruct the model
to read exactly that many characters; update the translate_prompt construction
(referencing translate_prompt, content, and instructions_hint) to use the chosen
safe delimiter or length-prefix method consistently.
- Around line 82-90: The ModelCannotRetry exception message currently embeds
exact character counts using {len(content):,} and {max_chars:,} but then
instructs the LLM not to share numerical details; remove the interpolated
numbers from the raised message in document_translate.py (the raise of
ModelCannotRetry referencing content and max_chars) and replace the text with a
version that omits numeric values and simply tells the user the document is too
large, advises summarizing or splitting it, and offers summarizing in the target
language as an alternative.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/backend/chat/agents/translate.pysrc/backend/chat/tools/document_summarize.pysrc/backend/chat/tools/document_translate.pysrc/backend/chat/tools/utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/backend/chat/agents/translate.py
| @sync_to_async | ||
| def read_document_content(doc): | ||
| """Read document content asynchronously.""" | ||
| with default_storage.open(doc.key) as f: | ||
| return doc.file_name, f.read().decode("utf-8") |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
asgiref sync_to_async thread_sensitive parameter default documentation
💡 Result:
asgiref.sync.sync_to_async() has the signature sync_to_async(func, thread_sensitive=True), so the default for thread_sensitive is True. Django’s async docs also note that asgiref 3.3.0 changed the default to True (it used to be False). [1]
Primary references:
- Django async docs (“
sync_to_async(sync_function, thread_sensitive=True)”, and the 3.3.0 default-change warning) [1] asgirefsource showingthread_sensitive: bool = TrueinSyncToAsync.__init__[2]asgirefREADME noting the default behavior (“By default,sync_to_asyncwill run all synchronous code in the program in the same thread…”) [3]
Sources: [1] (docs.djangoproject.com) [2] (docs.djangoproject.com) [3] (github.com)
Citations:
- 1: https://docs.djangoproject.com/en/4.2/topics/async/?utm_source=openai
- 2: https://docs.djangoproject.com/en/5.0/_modules/asgiref/sync/?utm_source=openai
- 3: https://github.com/django/asgiref?utm_source=openai
🏁 Script executed:
# Search for content_type filter and how read_document_content is used
rg "content_type.*startswith.*text" --type py -B 2 -A 2Repository: suitenumerique/conversations
Length of output: 1317
🏁 Script executed:
# Find where read_document_content is called and how errors are handled
rg "read_document_content" --type py -B 2 -A 2Repository: suitenumerique/conversations
Length of output: 2013
🏁 Script executed:
# Check the full context of the utils.py file to understand error handling
cat -n src/backend/chat/tools/utils.py | head -80Repository: suitenumerique/conversations
Length of output: 2540
🏁 Script executed:
# Check error handling in document_translate.py around read_document_content call
cat -n src/backend/chat/tools/document_translate.py | head -150Repository: suitenumerique/conversations
Length of output: 6615
🏁 Script executed:
# Check error handling in document_summarize.py around read_document_content call
cat -n src/backend/chat/tools/document_summarize.py | head -150Repository: suitenumerique/conversations
Length of output: 7002
🏁 Script executed:
# Search for exception handlers that might catch these errors
rg "except.*Exception" src/backend/chat/tools/document_translate.py src/backend/chat/tools/document_summarize.py -B 3 -A 3Repository: suitenumerique/conversations
Length of output: 3853
decode("utf-8") will raise UnicodeDecodeError for non-UTF-8 text attachments.
Since the tool only filters by content_type__startswith="text/", a user could upload a text/plain file encoded in Windows-1252 or Latin-1. The error propagates through the caller's outer exception handler and surfaces as "An unexpected error occurred during document translation: UnicodeDecodeError", which is opaque to the user.
🛡️ Proposed fix
- return doc.file_name, f.read().decode("utf-8")
+ return doc.file_name, f.read().decode("utf-8", errors="replace")Alternatively, catch UnicodeDecodeError explicitly and raise a ModelCannotRetry with a clear message (e.g., "Document encoding is not supported — please re-save as UTF-8").
Also, @sync_to_async defaults to thread_sensitive=True, which routes all invocations through the main async thread. Since default_storage.open / file reads have no thread-local state requirements, thread_sensitive=False is better for throughput.
⚡ Optional optimisation
-@sync_to_async
+@sync_to_async(thread_sensitive=False)
def read_document_content(doc):📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @sync_to_async | |
| def read_document_content(doc): | |
| """Read document content asynchronously.""" | |
| with default_storage.open(doc.key) as f: | |
| return doc.file_name, f.read().decode("utf-8") | |
| `@sync_to_async`(thread_sensitive=False) | |
| def read_document_content(doc): | |
| """Read document content asynchronously.""" | |
| with default_storage.open(doc.key) as f: | |
| return doc.file_name, f.read().decode("utf-8", errors="replace") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/chat/tools/utils.py` around lines 56 - 60, The
read_document_content function currently decodes bytes with decode("utf-8")
which will raise UnicodeDecodeError for non-UTF-8 text files; update
read_document_content to catch UnicodeDecodeError from
default_storage.open(...).read(), and instead of letting it bubble up, raise
ModelCannotRetry with a clear user-facing message like "Document encoding is not
supported — please re-save as UTF-8"; also change the `@sync_to_async` decorator
on read_document_content to `@sync_to_async`(thread_sensitive=False) since file IO
here does not require thread-sensitive execution to improve throughput.



Purpose
First fix to #301 (for short documents at least)
This tool adds a simple document translate tool, because right now document translation is done through summary.
Proposal
Summary by CodeRabbit
New Features
Tests