
fix: enforce response size limit in ReadWebpageTool to prevent resource exhaustion#2410

Open
Koushik-Salammagari wants to merge 1 commit into arc53:main from Koushik-Salammagari:fix/read-webpage-response-size-limit

Conversation

@Koushik-Salammagari

Vulnerability Summary

CWE-400: Uncontrolled Resource Consumption in application/agents/tools/read_webpage.py

Problem

ReadWebpageTool.execute_action fetched the full response body unconditionally via response.text before converting it with markdownify:

response = requests.get(url, timeout=10, ...)
html_content = response.text          # loads entire body into memory
markdown_content = markdownify(html_content, ...)

An attacker can prompt the agent to call read_webpage with a URL that serves a multi-GB response (e.g. a large file host or a server under attacker control). The server allocates memory proportional to the response size, with no upper bound, which can cause OOM crashes or severe performance degradation.

Data flow

User chat prompt (attacker-influenced)
  → LLM function-calling output
    → ToolActionParser.parse_args → call_args
      → tool_executor.execute → parameters dict
        → ReadWebpageTool.execute_action(url=<LLM-controlled value>)
          → requests.get(url).text   ← unbounded memory sink

SSRF is already blocked by validate_url(). This is a separate, complementary defence against resource exhaustion via legitimate-but-oversized responses.
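For context, a guard like validate_url() typically rejects non-HTTP schemes and hosts that resolve to private or loopback addresses. The sketch below is an illustrative assumption about what such a check does; the function name is_url_allowed and its exact logic are not the repo's actual implementation:

```python
# Hypothetical sketch of a validate_url()-style SSRF guard.
# The real implementation in the repo may differ.
import ipaddress
import socket
from urllib.parse import urlparse

def is_url_allowed(url: str) -> bool:
    """Return False for non-HTTP schemes or hosts resolving to private/loopback IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        resolved = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable host: fail closed
    for _family, _type, _proto, _canon, sockaddr in resolved:
        ip = ipaddress.ip_address(sockaddr[0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False
    return True
```

Note that an SSRF check like this says nothing about response size, which is why the streaming limit below is a separate, complementary control.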

Fix

  • Switch to stream=True so the connection is opened without reading the body
  • Reject immediately if Content-Length header exceeds MAX_CONTENT_BYTES (10 MB)
  • Abort mid-stream if accumulated body bytes exceed MAX_CONTENT_BYTES
  • Call response.close() on early exit to release the TCP connection
+MAX_CONTENT_BYTES = 10 * 1024 * 1024  # 10 MB
+max_mb = MAX_CONTENT_BYTES // (1024 * 1024)  # used in the error messages

-response = requests.get(url, timeout=10, headers={...})
-response.raise_for_status()
-html_content = response.text
+response = requests.get(url, timeout=10, headers={...}, stream=True)
+response.raise_for_status()
+
+content_length = response.headers.get("Content-Length")
+if content_length and int(content_length) > MAX_CONTENT_BYTES:
+    response.close()
+    return f"Error: Response too large. Maximum allowed size is {max_mb} MB."
+
+chunks = []
+total_bytes = 0
+for chunk in response.iter_content(chunk_size=8192):
+    total_bytes += len(chunk)
+    if total_bytes > MAX_CONTENT_BYTES:
+        response.close()
+        return f"Error: Response too large. Maximum allowed size is {max_mb} MB."
+    chunks.append(chunk)
+
+html_content = b"".join(chunks).decode(response.encoding or "utf-8", errors="replace")
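The streaming guard in the diff can be exercised without network access. This is a self-contained sketch, not the PR's code verbatim: read_limited stands in for the fetch step of execute_action, and FakeResponse is a stub that mimics only the attributes the guard touches (headers, iter_content, encoding, close):

```python
# Self-contained sketch of the streaming size guard, run against a stub
# response so it executes offline. Real code receives a requests.Response.
MAX_CONTENT_BYTES = 10 * 1024 * 1024  # 10 MB
MAX_MB = MAX_CONTENT_BYTES // (1024 * 1024)

def read_limited(response) -> str:
    """Accumulate the body in 8 KiB chunks, aborting past MAX_CONTENT_BYTES."""
    content_length = response.headers.get("Content-Length")
    if content_length and int(content_length) > MAX_CONTENT_BYTES:
        response.close()  # release the connection before the body is read
        return f"Error: Response too large. Maximum allowed size is {MAX_MB} MB."
    chunks, total_bytes = [], 0
    for chunk in response.iter_content(chunk_size=8192):
        total_bytes += len(chunk)
        if total_bytes > MAX_CONTENT_BYTES:
            response.close()  # abort mid-stream once the running total exceeds the cap
            return f"Error: Response too large. Maximum allowed size is {MAX_MB} MB."
        chunks.append(chunk)
    return b"".join(chunks).decode(response.encoding or "utf-8", errors="replace")

class FakeResponse:
    """Minimal stand-in for requests.Response for offline testing."""
    def __init__(self, body: bytes, headers=None, encoding="utf-8"):
        self.body, self.headers, self.encoding = body, headers or {}, encoding
        self.closed = False
    def iter_content(self, chunk_size=8192):
        for i in range(0, len(self.body), chunk_size):
            yield self.body[i:i + chunk_size]
    def close(self):
        self.closed = True

small = FakeResponse(b"<html>ok</html>")
print(read_limited(small))  # decoded body
huge = FakeResponse(b"", headers={"Content-Length": str(MAX_CONTENT_BYTES + 1)})
print(read_limited(huge))   # size error, body never read
```

The Content-Length check is only an optimization: a malicious server can omit or lie about the header, so the mid-stream accumulator is the actual enforcement point.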

Tests added (5 new cases)

  • test_content_length_too_large_rejected: rejects before reading the body when Content-Length exceeds the limit; iter_content is never called
  • test_streaming_body_too_large_rejected: aborts mid-stream when accumulated body bytes exceed the limit
  • test_content_at_limit_allowed: accepts a response exactly at the limit (boundary check)
  • test_uses_stream_true: confirms stream=True is passed to requests.get
  • test_user_agent_header_sent: confirms the User-Agent header is still sent on the streaming request

Existing tests updated to mock iter_content instead of .text to match the streaming code path.

All 13 tests pass (python -m pytest tests/agents/test_read_webpage_tool.py -v).
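To illustrate the mocking approach, here is a sketch in the spirit of test_streaming_body_too_large_rejected. The real tests live in tests/agents/test_read_webpage_tool.py and exercise ReadWebpageTool directly; fetch_page here is a simplified stand-in, and the patch target is an assumption about how the tests stub the network:

```python
# Illustrative test sketch: patch requests.get and feed chunks through
# a mocked iter_content, matching the streaming code path described above.
from unittest.mock import MagicMock, patch

LIMIT = 10 * 1024 * 1024  # 10 MB

def fetch_page(url: str) -> str:
    """Simplified stand-in for the tool's streaming fetch step."""
    import requests
    response = requests.get(url, timeout=10, stream=True)
    chunks, total = [], 0
    for chunk in response.iter_content(chunk_size=8192):
        total += len(chunk)
        if total > LIMIT:
            response.close()
            return "Error: Response too large."
        chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")

def test_streaming_body_too_large_rejected():
    mock_response = MagicMock()
    mock_response.headers = {}
    # Two chunks whose running total crosses the 10 MB limit mid-stream.
    mock_response.iter_content.return_value = iter([b"x" * LIMIT, b"x"])
    with patch("requests.get", return_value=mock_response):
        result = fetch_page("http://example.com/huge")
    assert "too large" in result
    mock_response.close.assert_called_once()

test_streaming_body_too_large_rejected()
print("ok")
```

Mocking iter_content rather than .text is exactly the update the PR describes for the pre-existing tests.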


Screenshots: N/A — backend-only change with no UI impact.

…ce exhaustion

An LLM agent could be instructed to fetch an arbitrarily large URL
(e.g. a multi-GB file), causing the server to load the entire response
into memory via response.text before passing it to markdownify.

Fix:
- Use stream=True so the connection is established before reading
- Reject responses whose Content-Length header exceeds MAX_CONTENT_BYTES (10 MB)
- Abort mid-stream if accumulated body bytes exceed MAX_CONTENT_BYTES
- Call response.close() on early exit to release the connection

Tests: add 5 new cases covering Content-Length rejection, streaming
body rejection, boundary-exact acceptance, stream=True assertion, and
User-Agent header verification. Update existing test_successful_fetch
to use iter_content mock consistent with streaming path.

vercel Bot commented Apr 18, 2026

@Koushik-Salammagari is attempting to deploy a commit to the Arc53 Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions Bot added the application and tests labels Apr 18, 2026