feat: Allow large file uploads (up to 100 MB) for code interpreter agents with LLM prompt truncation#1

Open
Copilot wants to merge 3 commits into release/v3.0 from copilot/update-file-upload-limits

Copilot AI commented Mar 10, 2026

Previously, all uploaded files were subject to a token limit (~100k tokens) and rejected if exceeded. This adds a two-tier file handling strategy based on whether the target agent has code interpreter access.

Behavior changes

Agents with code interpreter (PythonTool):

  • Accept files up to 100 MB (configurable via CODE_INTERPRETER_MAX_FILE_SIZE_BYTES)
  • If a file exceeds 10,000 tokens (configurable via CODE_INTERPRETER_FILE_TOKEN_THRESHOLD), only the first and last 1,000 tokens (configurable via CODE_INTERPRETER_FILE_TOKEN_CONTEXT_SIZE) are injected into the LLM prompt, with a note that the full file is available to the code interpreter
  • Files are still fully stored and accessible to the code interpreter

Agents without code interpreter: unchanged — token limit rejection still applies.
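
As a rough illustration of the truncation rule described above, the sketch below keeps the first and last N tokens and replaces the middle with a notice. It is not the PR's `_truncate_file_text_for_code_interpreter()`; the function name `truncate_middle`, the `encode`/`decode` callables, and the wording of the notice are all assumptions made for the example, and a whitespace split stands in for the real tokenizer.

```python
from typing import Callable, List


def truncate_middle(
    text: str,
    encode: Callable[[str], List[str]],
    decode: Callable[[List[str]], str],
    token_threshold: int = 10_000,  # mirrors CODE_INTERPRETER_FILE_TOKEN_THRESHOLD
    context_size: int = 1_000,      # mirrors CODE_INTERPRETER_FILE_TOKEN_CONTEXT_SIZE
) -> str:
    """Hypothetical helper: keep the head and tail of an oversized file."""
    if context_size <= 0:
        raise ValueError("context_size must be positive")

    tokens = encode(text)
    # Files at or under the threshold are injected unchanged.
    if len(tokens) <= token_threshold:
        return text
    # Nothing to omit if head and tail would already cover the whole file.
    if 2 * context_size >= len(tokens):
        return text

    head = decode(tokens[:context_size])
    tail = decode(tokens[-context_size:])
    omitted = len(tokens) - 2 * context_size
    notice = (
        f"[... {omitted} tokens omitted; "
        "the full file is available to the code interpreter ...]"
    )
    return f"{head}\n{notice}\n{tail}"


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(50))
    # Whitespace "tokenizer" used purely for demonstration.
    print(truncate_middle(sample, str.split, " ".join, token_threshold=10, context_size=3))
```

Passing the tokenizer in as a pair of callables keeps the sketch independent of any particular tokenizer library; the real implementation receives a tokenizer object from `convert_chat_history()`.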

Implementation

  • Upload time (projects_file_utils.py): categorize_uploaded_files() receives a has_code_interpreter flag; when it is set, the function bypasses token-count rejection and enforces the 100 MB size cap instead
  • Chat time (chat_utils.py): convert_chat_history() receives has_code_interpreter + tokenizer; calls _truncate_file_text_for_code_interpreter() for oversized files, replacing the middle with an omission notice
  • Persona detection (db/persona.py): persona_has_code_interpreter_tool() checks if a persona has PythonTool attached; called at upload time using persona_id passed from the frontend
  • Chat processing (process_message.py): detects PythonTool in the constructed tool list, passes has_code_interpreter=True and a tokenizer to convert_chat_history()
  • New config vars in app_configs.py: CODE_INTERPRETER_FILE_TOKEN_THRESHOLD, CODE_INTERPRETER_FILE_TOKEN_CONTEXT_SIZE, CODE_INTERPRETER_MAX_FILE_SIZE_BYTES
  • Frontend: uploadFiles() / beginUpload() now accept and forward personaId so the backend can gate the size limit per agent
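
The upload-time gate described in this list can be sketched roughly as follows. This is not the code in `categorize_uploaded_files()`; the function name `check_upload`, its signature, the rejection messages, and the interpretation of 100 MB as 100 * 1024 * 1024 bytes are assumptions for illustration. Only the environment variable name comes from the PR.

```python
import os
from typing import Optional

# Assumed default: 100 MB read as 100 * 1024 * 1024 bytes.
DEFAULT_MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024

MAX_FILE_SIZE_BYTES = int(
    os.environ.get(
        "CODE_INTERPRETER_MAX_FILE_SIZE_BYTES", str(DEFAULT_MAX_FILE_SIZE_BYTES)
    )
)


def check_upload(
    size_bytes: int,
    token_count: int,
    has_code_interpreter: bool,
    max_size_bytes: int = MAX_FILE_SIZE_BYTES,
    token_limit: int = 100_000,  # approximate pre-existing limit from the description
) -> Optional[str]:
    """Hypothetical gate: return None to accept, or a rejection reason."""
    if has_code_interpreter:
        # Code interpreter agents skip the token check; only size is capped.
        if size_bytes > max_size_bytes:
            return f"file exceeds size cap of {max_size_bytes} bytes"
        return None

    # Agents without a code interpreter keep the token-limit rejection.
    if token_count > token_limit:
        return f"file exceeds token limit of {token_limit} tokens"
    return None


if __name__ == "__main__":
    # A 5 MB file with a very large token count, uploaded to each agent type.
    print(check_upload(5_000_000, 2_000_000, has_code_interpreter=True))
    print(check_upload(5_000_000, 2_000_000, has_code_interpreter=False))
```

The same file can therefore be accepted for one agent and rejected for another, which is why the frontend now forwards personaId with each upload.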

How Has This Been Tested?

  • Unit tests for _truncate_file_text_for_code_interpreter() (short/exact/large files, first+last token coverage)
  • Unit tests for categorize_uploaded_files() covering: size cap rejection for CI agents, token limit bypass for CI agents, token limit enforcement for non-CI agents, skip-threshold global override

Additional Options

  • [Optional] Please cherry-pick this PR to the latest release version.
  • [Optional] Override Linear Check

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • disposable.github.io
    • Triggering command: /home/REDACTED/work/onyx/onyx/.venv/bin/pytest pytest tests/unit/ --ignore=tests/unit/model_server -x (dns block)
  • huggingface.co
    • Triggering command: /home/REDACTED/work/onyx/onyx/.venv/bin/pytest pytest tests/unit/onyx/server/features/test_projects_file_utils.py -xvs (dns block)


Co-authored-by: kleincode <10796927+kleincode@users.noreply.github.com>
Co-authored-by: kleincode <10796927+kleincode@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update file upload limits for code interpreter access" to "feat: Allow large file uploads (up to 100 MB) for code interpreter agents with LLM prompt truncation" on Mar 10, 2026
@kleincode kleincode marked this pull request as ready for review March 10, 2026 16:52