feat: add --max-context-tokens for OOM prevention by eloe · Pull Request #19 · eloe/mlx-vlm

eloe · 2026-04-06T16:16:33Z

Summary\nReject requests exceeding context window with 400 instead of OOM crash. --max-context-tokens flag. 5 tests.

Track prompt token count and reject requests exceeding the configured context window before they reach the GPU, preventing OOM crashes. New flag: --max-context-tokens N (default: 0 = unlimited) Env: MAX_CONTEXT_TOKENS 5 tests: within-limit, exceeds-limit, zero-unlimited, default, env var. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds an optional prompt context-length guard to the FastAPI server to proactively reject overly-long requests (HTTP 400) instead of risking an OOM crash. It introduces a --max-context-tokens CLI flag (backed by MAX_CONTEXT_TOKENS) and basic unit tests for the new helpers.

Changes:

Add get_max_context_tokens() and check_context_length() utilities to compute tokenized prompt length and reject requests beyond a configured limit.
Wire the context-length check into both /responses and /chat/completions request flows after chat-template formatting.
Add tests covering default/env configuration and helper behavior for within/over-limit cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`mlx_vlm/server.py`	Adds max-context configuration + prompt token counting guard; integrates guard into request handlers; adds CLI flag/env wiring.
`mlx_vlm/tests/test_server.py`	Adds unit tests for max-context env parsing and context-length validation helper.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-06T16:32:08Z

+    parser.add_argument(
+        "--max-context-tokens",
+        type=int,


The new --max-context-tokens flag accepts negative values; those end up setting MAX_CONTEXT_TOKENS to a negative number, and check_context_length then treats it as “no limit” (max_context <= 0). Consider validating/clamping this argument to be non-negative (e.g., argparse custom type that raises on negatives, or enforce >= 0 after parsing) so misconfiguration doesn’t silently disable the protection.

Suggested change

parser.add_argument(

"--max-context-tokens",

type=int,

def non_negative_int(value):

try:

parsed_value = int(value)

except ValueError as exc:

raise argparse.ArgumentTypeError(

f"invalid int value: {value!r}"

) from exc

if parsed_value < 0:

raise argparse.ArgumentTypeError(

"--max-context-tokens must be greater than or equal to 0"

)

return parsed_value

parser.add_argument(

"--max-context-tokens",

type=non_negative_int,

Copilot · 2026-04-06T16:32:08Z

+def test_get_max_context_tokens_default():
+    import os
+    os.environ.pop("MAX_CONTEXT_TOKENS", None)
+    assert server.get_max_context_tokens() == 0
+
+
+def test_get_max_context_tokens_from_env():
+    import os
+    os.environ["MAX_CONTEXT_TOKENS"] = "16384"
+    assert server.get_max_context_tokens() == 16384
+    os.environ.pop("MAX_CONTEXT_TOKENS")


These tests mutate os.environ (pop/set MAX_CONTEXT_TOKENS) without restoring any pre-existing value. This can make the suite order-/environment-dependent when MAX_CONTEXT_TOKENS is set in the runner. Use pytest’s monkeypatch fixture (or save/restore the prior value) so environment changes are isolated to the test.

Suggested change

def test_get_max_context_tokens_default():

import os

os.environ.pop("MAX_CONTEXT_TOKENS", None)

assert server.get_max_context_tokens() == 0

def test_get_max_context_tokens_from_env():

import os

os.environ["MAX_CONTEXT_TOKENS"] = "16384"

assert server.get_max_context_tokens() == 16384

os.environ.pop("MAX_CONTEXT_TOKENS")

def test_get_max_context_tokens_default(monkeypatch):

monkeypatch.delenv("MAX_CONTEXT_TOKENS", raising=False)

assert server.get_max_context_tokens() == 0

def test_get_max_context_tokens_from_env(monkeypatch):

monkeypatch.setenv("MAX_CONTEXT_TOKENS", "16384")

assert server.get_max_context_tokens() == 16384

- Reject negative --max-context-tokens (validate >= 0) - Use pytest monkeypatch for env var tests instead of os.environ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

eloe requested a review from Copilot April 6, 2026 16:27

Copilot started reviewing on behalf of eloe April 6, 2026 16:28 View session

Copilot AI reviewed Apr 6, 2026

View reviewed changes

fix: address Copilot review feedback for PR #19

67d34da

- Reject negative --max-context-tokens (validate >= 0) - Use pytest monkeypatch for env var tests instead of os.environ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

eloe mentioned this pull request Apr 8, 2026

Combined server enhancements: OpenAI API compliance, prompt caching, concurrency #21

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add --max-context-tokens for OOM prevention#19

feat: add --max-context-tokens for OOM prevention#19
eloe wants to merge 2 commits into
mainfrom
feature/context-tracking

eloe commented Apr 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-    parser.add_argument(
-        "--max-context-tokens",
-        type=int,
+    def non_negative_int(value):
+        try:
+            parsed_value = int(value)
+        except ValueError as exc:
+            raise argparse.ArgumentTypeError(
+                f"invalid int value: {value!r}"
+            ) from exc
+        if parsed_value < 0:
+            raise argparse.ArgumentTypeError(
+                "--max-context-tokens must be greater than or equal to 0"
+            )
+        return parsed_value
+    parser.add_argument(
+        "--max-context-tokens",
+        type=non_negative_int,

Conversation

eloe commented Apr 6, 2026

Summary\nReject requests exceeding context window with 400 instead of OOM crash. --max-context-tokens flag. 5 tests.

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants