
fix: enforce response size limit in ReadWebpageTool to prevent resource exhaustion#2410

Open
Koushik-Salammagari wants to merge 1 commit into arc53:main from Koushik-Salammagari:fix/read-webpage-response-size-limit

Conversation

@Koushik-Salammagari

Vulnerability Summary

CWE-400: Uncontrolled Resource Consumption in application/agents/tools/read_webpage.py

Problem

ReadWebpageTool.execute_action fetched the full response body unconditionally via response.text before converting it with markdownify:

response = requests.get(url, timeout=10, ...)
html_content = response.text          # loads entire body into memory
markdown_content = markdownify(html_content, ...)

An attacker can prompt the agent to call read_webpage with a URL that serves a multi-GB response (e.g. a large file host or a server under attacker control). The server allocates memory proportional to the response size, with no upper bound, which can cause OOM crashes or severe performance degradation.

Data flow

User chat prompt (attacker-influenced)
  → LLM function-calling output
    → ToolActionParser.parse_args → call_args
      → tool_executor.execute → parameters dict
        → ReadWebpageTool.execute_action(url=<LLM-controlled value>)
          → requests.get(url).text   ← unbounded memory sink

SSRF is already blocked by validate_url(). This is a separate, complementary defence against resource exhaustion via legitimate-but-oversized responses.
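For context, a guard like validate_url() typically rejects non-HTTP schemes and hosts that resolve to private or loopback addresses. The sketch below is an illustrative assumption about what such a check does; the function name is_url_allowed and its exact logic are not the repo's actual implementation:

```python
# Hypothetical sketch of a validate_url()-style SSRF guard.
# The real implementation in the repo may differ.
import ipaddress
import socket
from urllib.parse import urlparse

def is_url_allowed(url: str) -> bool:
    """Return False for non-HTTP schemes or hosts resolving to private/loopback IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        resolved = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable host: fail closed
    for _family, _type, _proto, _canon, sockaddr in resolved:
        ip = ipaddress.ip_address(sockaddr[0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False
    return True
```

Note that an SSRF check like this says nothing about response size, which is why the streaming limit below is a separate, complementary control.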

Fix

  • Switch to stream=True so the connection is opened without reading the body
  • Reject immediately if Content-Length header exceeds MAX_CONTENT_BYTES (10 MB)
  • Abort mid-stream if accumulated body bytes exceed MAX_CONTENT_BYTES
  • Call response.close() on early exit to release the TCP connection
+MAX_CONTENT_BYTES = 10 * 1024 * 1024  # 10 MB
+max_mb = MAX_CONTENT_BYTES // (1024 * 1024)  # used in the error messages

-response = requests.get(url, timeout=10, headers={...})
-response.raise_for_status()
-html_content = response.text
+response = requests.get(url, timeout=10, headers={...}, stream=True)
+response.raise_for_status()
+
+content_length = response.headers.get("Content-Length")
+if content_length and int(content_length) > MAX_CONTENT_BYTES:
+    response.close()
+    return f"Error: Response too large. Maximum allowed size is {max_mb} MB."
+
+chunks = []
+total_bytes = 0
+for chunk in response.iter_content(chunk_size=8192):
+    total_bytes += len(chunk)
+    if total_bytes > MAX_CONTENT_BYTES:
+        response.close()
+        return f"Error: Response too large. Maximum allowed size is {max_mb} MB."
+    chunks.append(chunk)
+
+html_content = b"".join(chunks).decode(response.encoding or "utf-8", errors="replace")
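The streaming guard in the diff can be exercised without network access. This is a self-contained sketch, not the PR's code verbatim: read_limited stands in for the fetch step of execute_action, and FakeResponse is a stub that mimics only the attributes the guard touches (headers, iter_content, encoding, close):

```python
# Self-contained sketch of the streaming size guard, run against a stub
# response so it executes offline. Real code receives a requests.Response.
MAX_CONTENT_BYTES = 10 * 1024 * 1024  # 10 MB
MAX_MB = MAX_CONTENT_BYTES // (1024 * 1024)

def read_limited(response) -> str:
    """Accumulate the body in 8 KiB chunks, aborting past MAX_CONTENT_BYTES."""
    content_length = response.headers.get("Content-Length")
    if content_length and int(content_length) > MAX_CONTENT_BYTES:
        response.close()  # release the connection before the body is read
        return f"Error: Response too large. Maximum allowed size is {MAX_MB} MB."
    chunks, total_bytes = [], 0
    for chunk in response.iter_content(chunk_size=8192):
        total_bytes += len(chunk)
        if total_bytes > MAX_CONTENT_BYTES:
            response.close()  # abort mid-stream once the running total exceeds the cap
            return f"Error: Response too large. Maximum allowed size is {MAX_MB} MB."
        chunks.append(chunk)
    return b"".join(chunks).decode(response.encoding or "utf-8", errors="replace")

class FakeResponse:
    """Minimal stand-in for requests.Response for offline testing."""
    def __init__(self, body: bytes, headers=None, encoding="utf-8"):
        self.body, self.headers, self.encoding = body, headers or {}, encoding
        self.closed = False
    def iter_content(self, chunk_size=8192):
        for i in range(0, len(self.body), chunk_size):
            yield self.body[i:i + chunk_size]
    def close(self):
        self.closed = True

small = FakeResponse(b"<html>ok</html>")
print(read_limited(small))  # decoded body
huge = FakeResponse(b"", headers={"Content-Length": str(MAX_CONTENT_BYTES + 1)})
print(read_limited(huge))   # size error, body never read
```

The Content-Length check is only an optimization: a malicious server can omit or lie about the header, so the mid-stream accumulator is the actual enforcement point.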

Tests added (5 new cases)

  • test_content_length_too_large_rejected: rejects before reading the body when Content-Length exceeds the limit; iter_content is never called
  • test_streaming_body_too_large_rejected: aborts mid-stream when accumulated body bytes exceed the limit
  • test_content_at_limit_allowed: accepts a response exactly at the limit (boundary check)
  • test_uses_stream_true: confirms stream=True is passed to requests.get
  • test_user_agent_header_sent: confirms the User-Agent header is still sent on the streaming request

Existing tests updated to mock iter_content instead of .text to match the streaming code path.

All 13 tests pass (python -m pytest tests/agents/test_read_webpage_tool.py -v).
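To illustrate the mocking approach, here is a sketch in the spirit of test_streaming_body_too_large_rejected. The real tests live in tests/agents/test_read_webpage_tool.py and exercise ReadWebpageTool directly; fetch_page here is a simplified stand-in, and the patch target is an assumption about how the tests stub the network:

```python
# Illustrative test sketch: patch requests.get and feed chunks through
# a mocked iter_content, matching the streaming code path described above.
from unittest.mock import MagicMock, patch

LIMIT = 10 * 1024 * 1024  # 10 MB

def fetch_page(url: str) -> str:
    """Simplified stand-in for the tool's streaming fetch step."""
    import requests
    response = requests.get(url, timeout=10, stream=True)
    chunks, total = [], 0
    for chunk in response.iter_content(chunk_size=8192):
        total += len(chunk)
        if total > LIMIT:
            response.close()
            return "Error: Response too large."
        chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")

def test_streaming_body_too_large_rejected():
    mock_response = MagicMock()
    mock_response.headers = {}
    # Two chunks whose running total crosses the 10 MB limit mid-stream.
    mock_response.iter_content.return_value = iter([b"x" * LIMIT, b"x"])
    with patch("requests.get", return_value=mock_response):
        result = fetch_page("http://example.com/huge")
    assert "too large" in result
    mock_response.close.assert_called_once()

test_streaming_body_too_large_rejected()
print("ok")
```

Mocking iter_content rather than .text is exactly the update the PR describes for the pre-existing tests.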


Screenshots: N/A — backend-only change with no UI impact.

…ce exhaustion

An LLM agent could be instructed to fetch an arbitrarily large URL
(e.g. a multi-GB file), causing the server to load the entire response
into memory via response.text before passing it to markdownify.

Fix:
- Use stream=True so the connection is established before reading
- Reject responses whose Content-Length header exceeds MAX_CONTENT_BYTES (10 MB)
- Abort mid-stream if accumulated body bytes exceed MAX_CONTENT_BYTES
- Call response.close() on early exit to release the connection

Tests: add 5 new cases covering Content-Length rejection, streaming
body rejection, boundary-exact acceptance, stream=True assertion, and
User-Agent header verification. Update existing test_successful_fetch
to use iter_content mock consistent with streaming path.

vercel Bot commented Apr 18, 2026

@Koushik-Salammagari is attempting to deploy a commit to the Arc53 Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions Bot added the application and tests labels Apr 18, 2026