fix(pymllm): reduce scheduler CPU busy-loop from 100% to ~2% during decode #655
FarmersWrap wants to merge 3 commits into UbiquitousLearning:main
Conversation
…ecode

The scheduler event loop used `poll(timeout=0)` (non-blocking spin) between decode batches, burning 100% of a CPU core while waiting for new requests. Track decode state and use a 1 ms poll timeout during active decode to yield the CPU core to the OS scheduler, dropping usage from 100% to ~2%.

- Add `_DECODE_POLL_TIMEOUT_MS` constant (1 ms) for a configurable poll timeout
- Track `_in_decode` state in `event_loop` based on `ForwardMode`
- Forward the `brief_poll` parameter through `recv_requests` → `_recv_from_zmq`
- Apply the same optimization to the `_recv_from_shared_queue` path
- Add unit tests, a benchmark, and an integration test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 Walkthrough

Introduces an environment-configurable decode poll timeout (`MLLM_DECODE_POLL_TIMEOUT_MS`, default 1 ms) so the scheduler yields the CPU between decode batches instead of spinning.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🧹 Nitpick comments (4)
pymllm/tests/test_scheduler_cpu_busy_loop.py (2)

167-176: Mock overrides IntEnum method incorrectly.

The test creates a MagicMock for `batch.forward_mode` and then sets `batch.forward_mode.is_extend = lambda: True/False`. However, `batch.forward_mode` is assigned `ForwardMode.EXTEND` or `ForwardMode.DECODE` directly, which overwrites the MagicMock. This works coincidentally because the lambda is then attached to the actual enum value, but it's fragile.

A cleaner approach would be to use the actual `ForwardMode` enum directly, since its `is_extend()` method already returns the correct value.

♻️ Proposed fix
```diff
 def fake_get_next_batch():
     i = iteration[0]
     iteration[0] += 1
     if i == 0:
         # First iteration: return an extend (prefill) batch
-        batch = MagicMock()
-        batch.forward_mode = ForwardMode.EXTEND
-        batch.forward_mode.is_extend = lambda: True
-        return batch
+        return MagicMock(forward_mode=ForwardMode.EXTEND)
     elif i == 1:
         # Second iteration: return a decode batch
-        batch = MagicMock()
-        batch.forward_mode = ForwardMode.DECODE
-        batch.forward_mode.is_extend = lambda: False
-        return batch
+        return MagicMock(forward_mode=ForwardMode.DECODE)
     elif i == 2:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pymllm/tests/test_scheduler_cpu_busy_loop.py` around lines 167 - 176, The test incorrectly assigns a MagicMock to batch.forward_mode and then overwrites it with ForwardMode.EXTEND/DECODE while patching is_extend on the enum value; instead set batch.forward_mode to the actual ForwardMode enum (e.g., ForwardMode.EXTEND or ForwardMode.DECODE) so its built-in is_extend() is used, or keep batch.forward_mode as a MagicMock but configure its is_extend() return_value explicitly (e.g., batch.forward_mode = ForwardMode.EXTEND or batch.forward_mode = MagicMock(is_extend=lambda: True)) to avoid mutating enum members referenced in the test code.
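The enum-based approach can be sketched with a hypothetical stand-in for the project's `ForwardMode` (the real enum lives in pymllm and may differ in members and method names):

```python
from enum import IntEnum, auto
from unittest.mock import MagicMock

class ForwardMode(IntEnum):
    # Hypothetical stand-in for the project's ForwardMode enum
    EXTEND = auto()
    DECODE = auto()

    def is_extend(self) -> bool:
        return self is ForwardMode.EXTEND

    def is_decode(self) -> bool:
        return self is ForwardMode.DECODE

# Assigning a real enum member means is_extend()/is_decode() are already
# correct -- no lambda patching that mutates shared enum members.
batch = MagicMock()
batch.forward_mode = ForwardMode.DECODE
```

Because enum members are shared singletons, attaching a lambda to one in a test leaks into every other reference to that member, which is why using the real methods is safer.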
66-99: Consider extracting a scheduler fixture.

The `_make_scheduler` method duplicates a lot of manual attribute setup. This pattern is repeated in the integration test. Consider extracting it into a shared fixture or factory function in a conftest.py file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pymllm/tests/test_scheduler_cpu_busy_loop.py` around lines 66 - 99, The test contains a long duplicated manual setup in _make_scheduler creating a SchedulerProcess instance and setting many internal attributes; extract this into a reusable fixture or factory function (e.g., scheduler_factory or scheduler_fixture) placed in conftest.py and have tests/importing modules use it instead of calling _make_scheduler directly; ensure the factory constructs a SchedulerProcess (using SchedulerProcess.__new__ if needed) and initializes the same fields (_recv_from_tokenizer_addr, _send_to_detokenizer_addr, _server_config, _model_config, _gpu_id, _shared_queue, _enable_shared_queue, _tensor_transport_mode, _zmq_ctx, _recv_from_tokenizer, _send_to_detokenizer, _model_runner, _waiting_queue, _pending_queue, _running_batch, _finished, _max_running_requests, _max_prefill_tokens, _max_total_tokens, _used_tokens, _eos_token_ids, _default_max_new_tokens, _next_req_pool_idx, _decode_log_interval, _num_prefill_tokens, _num_prefill_cache_tokens, _num_decode_tokens, _num_prefill_reqs, _last_prefill_stats_tic, _last_decode_stats_tic, _forward_ct_decode) and update tests to use the fixture to remove duplication.

pymllm/tests/integration_cpu_busy_loop.py (1)
264-270: Consider adjusting the pass threshold or adding context.

The thresholds (30% for PASS, 10% for moderate) seem reasonable, but on some systems with different scheduling characteristics, results may vary. The "INCONCLUSIVE" result might benefit from suggesting possible causes or next steps.
💡 Suggested improvement
```diff
 if reduction > 30:
     print("  PASS: Significant CPU savings — the fix works!")
 elif reduction > 10:
     print("  PASS: Moderate CPU savings.")
 else:
-    print("  INCONCLUSIVE: Minimal difference.")
+    print("  INCONCLUSIVE: Minimal difference (system scheduling may vary).")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pymllm/tests/integration_cpu_busy_loop.py` around lines 264 - 270, Update the pass/fail logic around the reduction variable (the if/elif/else block that prints PASS/INCONCLUSIVE) to either increase flexibility or provide guidance: either make the thresholds configurable (e.g., expose the 30 and 10 values as constants or CLI args) or expand the "INCONCLUSIVE" branch to print suggested causes and next steps (examples: run multiple trials, check system load/scheduler, adjust warmup/measurement windows) so test output gives actionable context when reduction is small; modify the prints that reference reduction to include the actual reduction percentage for clarity.

pymllm/orchestrator/scheduler_process.py (1)
584-590: Consider aligning timeout behavior between the ZMQ and shared-queue paths.

The shared-queue path uses `_DECODE_POLL_TIMEOUT_MS / 1000.0` for the first get (correct conversion to seconds) but falls back to `0.002` (2 ms) for draining, while the ZMQ path uses `0` (non-blocking) for draining.

This asymmetry is likely intentional, since `Queue.get(timeout=0)` may behave differently than `zmq.poll(timeout=0)`, but consider adding a brief comment explaining why the drain timeout differs.

💡 Suggested clarification
```diff
 get_timeout = _DECODE_POLL_TIMEOUT_MS / 1000.0 if brief_poll else 0.002
 while True:
     try:
         rid, shm_name, mm_inputs = self._shared_queue.get(timeout=get_timeout)
-        get_timeout = 0.002  # drain remaining without extra delay
+        # Use small timeout to drain remaining (Queue.get blocks on timeout=0)
+        get_timeout = 0.002
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pymllm/orchestrator/scheduler_process.py` around lines 584 - 590, The shared-queue loop sets get_timeout = _DECODE_POLL_TIMEOUT_MS / 1000.0 for the first get and then assigns 0.002 to drain remaining items, which differs from the ZMQ path that uses a zero (non-blocking) timeout; add a concise comment next to the get_timeout assignments (and/or above the while loop) explaining why we use 2ms for Queue.get draining (e.g., Python Queue.get with timeout=0/0.0 behaves differently than zmq.poll(timeout=0) and can be less CPU-friendly or raise Empty immediately), referencing _DECODE_POLL_TIMEOUT_MS, get_timeout, self._shared_queue.get and the ZMQ poll call so future readers understand the intentional asymmetry.
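The asymmetry in question can be illustrated with a standalone sketch; `drain_shared_queue` and `_SHARED_QUEUE_DRAIN_TIMEOUT_SEC` are hypothetical names, and the real method unpacks `(rid, shm_name, mm_inputs)` tuples rather than opaque items:

```python
import queue

_DECODE_POLL_TIMEOUT_MS = 1              # ms, matches the ZMQ path
_SHARED_QUEUE_DRAIN_TIMEOUT_SEC = 0.002  # small drain timeout for Queue.get

def drain_shared_queue(q, brief_poll):
    """Drain all pending items from a Queue.

    Queue.get(timeout=0) behaves differently from zmq.poll(timeout=0),
    hence a small 2 ms timeout is used for draining (the asymmetry the
    review comment asks to document).
    """
    get_timeout = (
        _DECODE_POLL_TIMEOUT_MS / 1000.0
        if brief_poll
        else _SHARED_QUEUE_DRAIN_TIMEOUT_SEC
    )
    items = []
    while True:
        try:
            items.append(q.get(timeout=get_timeout))
        except queue.Empty:
            break
        get_timeout = _SHARED_QUEUE_DRAIN_TIMEOUT_SEC  # drain remaining
    return items
```

When items are already queued, each `get` returns immediately regardless of the timeout, so the small drain timeout only costs time on the final (empty) check.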
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pymllm/orchestrator/scheduler_process.py`:
- Around line 530-538: The docstring around the brief_poll logic contains EN
DASH characters (U+2013) that should be replaced with ASCII hyphen-minus
(U+002D); inspect the docstring mentioning brief_poll, _DECODE_POLL_TIMEOUT_MS,
_waiting_queue and TokenizedGenerateReqInput and replace all EN DASHes with
standard hyphens so only printable ASCII hyphens remain.
In `@pymllm/tests/bench_cpu_busy_loop.py`:
- Line 68: Remove the unnecessary f-string prefix from the print call in
bench_cpu_busy_loop.py: locate the print statement that currently reads
print(f"(poll for requests → loop back, no simulated GPU work)") and change it
to a plain string literal print("(poll for requests → loop back, no simulated
GPU work)") so the f-prefix is not used when there are no interpolation
placeholders.
In `@pymllm/tests/test_scheduler_cpu_busy_loop.py`:
- Around line 216-219: The assertions in test_scheduler_cpu_busy_loop.py compare
against constant strings but use f-string syntax unnecessarily; update the four
asserts that reference call_log (the lines asserting call_log[0]..call_log[3])
to use normal string literals (remove the leading f from each f"...") so they
are plain strings while keeping the existing message interpolation via
comma-separated f-strings only if needed.
---
Nitpick comments:
In `@pymllm/orchestrator/scheduler_process.py`:
- Around line 584-590: The shared-queue loop sets get_timeout =
_DECODE_POLL_TIMEOUT_MS / 1000.0 for the first get and then assigns 0.002 to
drain remaining items, which differs from the ZMQ path that uses a zero
(non-blocking) timeout; add a concise comment next to the get_timeout
assignments (and/or above the while loop) explaining why we use 2ms for
Queue.get draining (e.g., Python Queue.get with timeout=0/0.0 behaves
differently than zmq.poll(timeout=0) and can be less CPU-friendly or raise Empty
immediately), referencing _DECODE_POLL_TIMEOUT_MS, get_timeout,
self._shared_queue.get and the ZMQ poll call so future readers understand the
intentional asymmetry.
In `@pymllm/tests/integration_cpu_busy_loop.py`:
- Around line 264-270: Update the pass/fail logic around the reduction variable
(the if/elif/else block that prints PASS/INCONCLUSIVE) to either increase
flexibility or provide guidance: either make the thresholds configurable (e.g.,
expose the 30 and 10 values as constants or CLI args) or expand the
"INCONCLUSIVE" branch to print suggested causes and next steps (examples: run
multiple trials, check system load/scheduler, adjust warmup/measurement windows)
so test output gives actionable context when reduction is small; modify the
prints that reference reduction to include the actual reduction percentage for
clarity.
In `@pymllm/tests/test_scheduler_cpu_busy_loop.py`:
- Around line 167-176: The test incorrectly assigns a MagicMock to
batch.forward_mode and then overwrites it with ForwardMode.EXTEND/DECODE while
patching is_extend on the enum value; instead set batch.forward_mode to the
actual ForwardMode enum (e.g., ForwardMode.EXTEND or ForwardMode.DECODE) so its
built-in is_extend() is used, or keep batch.forward_mode as a MagicMock but
configure its is_extend() return_value explicitly (e.g., batch.forward_mode =
ForwardMode.EXTEND or batch.forward_mode = MagicMock(is_extend=lambda: True)) to
avoid mutating enum members referenced in the test code.
- Around line 66-99: The test contains a long duplicated manual setup in
_make_scheduler creating a SchedulerProcess instance and setting many internal
attributes; extract this into a reusable fixture or factory function (e.g.,
scheduler_factory or scheduler_fixture) placed in conftest.py and have
tests/importing modules use it instead of calling _make_scheduler directly;
ensure the factory constructs a SchedulerProcess (using SchedulerProcess.__new__
if needed) and initializes the same fields (_recv_from_tokenizer_addr,
_send_to_detokenizer_addr, _server_config, _model_config, _gpu_id,
_shared_queue, _enable_shared_queue, _tensor_transport_mode, _zmq_ctx,
_recv_from_tokenizer, _send_to_detokenizer, _model_runner, _waiting_queue,
_pending_queue, _running_batch, _finished, _max_running_requests,
_max_prefill_tokens, _max_total_tokens, _used_tokens, _eos_token_ids,
_default_max_new_tokens, _next_req_pool_idx, _decode_log_interval,
_num_prefill_tokens, _num_prefill_cache_tokens, _num_decode_tokens,
_num_prefill_reqs, _last_prefill_stats_tic, _last_decode_stats_tic,
_forward_ct_decode) and update tests to use the fixture to remove duplication.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8794f227-a176-4180-8916-5b1a09d5e804
📒 Files selected for processing (4)
- pymllm/orchestrator/scheduler_process.py
- pymllm/tests/bench_cpu_busy_loop.py
- pymllm/tests/integration_cpu_busy_loop.py
- pymllm/tests/test_scheduler_cpu_busy_loop.py
Replace U+2013 EN DASH characters with standard ASCII hyphens in scheduler docstrings, and remove test/benchmark files that trigger heavy CICC compilation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
pymllm/orchestrator/scheduler_process.py (1)
511-511: Prefer `is_decode()` for `_in_decode`.

Line 511 currently means "any non-extend mode," while the surrounding docstring and timeout behavior are specifically about decode batches. Using `is_decode()` makes the state match the intent and avoids treating a future `IDLE` batch like decode.

🔧 Proposed fix
```diff
-_in_decode = not batch.forward_mode.is_extend()
+_in_decode = batch.forward_mode.is_decode()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pymllm/orchestrator/scheduler_process.py` at line 511, The variable _in_decode currently computes non-extend mode using "not batch.forward_mode.is_extend()", which incorrectly treats future IDLE batches as decode; change the assignment to use "batch.forward_mode.is_decode()" so _in_decode truly reflects decode-mode batches (update the line that sets _in_decode and any related comments/docstring to match the new intent).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pymllm/orchestrator/scheduler_process.py`:
- Around line 59-63: The environment override MLLM_DECODE_POLL_TIMEOUT_MS is
used directly in the int() call assigned to _DECODE_POLL_TIMEOUT_MS without
validation; parse the env var with a try/except around int(...) (strip
whitespace first), and if parsing fails raise a clear ValueError describing the
bad value; after parsing ensure the integer is >= 0 and raise ValueError if it's
negative (0 is allowed for non-blocking behavior); update the assignment of
_DECODE_POLL_TIMEOUT_MS accordingly and mention in the error message the
environment variable name and that zmq.Poller.poll() expects non-negative
timeouts.
---
Nitpick comments:
In `@pymllm/orchestrator/scheduler_process.py`:
- Line 511: The variable _in_decode currently computes non-extend mode using
"not batch.forward_mode.is_extend()", which incorrectly treats future IDLE
batches as decode; change the assignment to use "batch.forward_mode.is_decode()"
so _in_decode truly reflects decode-mode batches (update the line that sets
_in_decode and any related comments/docstring to match the new intent).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 583f25d5-79d5-424a-a1fa-d4a300b0c799
📒 Files selected for processing (1)
pymllm/orchestrator/scheduler_process.py
Reject non-integer and negative values at import time with a clear error message. Negative values would cause zmq.Poller.poll() to block indefinitely instead of yielding briefly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
pymllm/orchestrator/scheduler_process.py (1)
603-607: Extract the shared-queue drain timeout magic number into a named constant.

`0.002` is duplicated on Line 603 and Line 607; pulling it into a module constant improves readability and keeps timeout tuning centralized.

As per coding guidelines, "Use named constants instead of magic numbers."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pymllm/orchestrator/scheduler_process.py` around lines 603 - 607, The two occurrences of the literal 0.002 used to set get_timeout when draining the shared queue should be replaced with a named module constant (e.g. _SHARED_QUEUE_DRAIN_TIMEOUT_SEC) to centralize tuning and improve readability; add the constant near other module-level constants (alongside _DECODE_POLL_TIMEOUT_MS), then replace both uses in scheduler_process.py (the get_timeout initialization and the assignment after successful get from self._shared_queue.get) to reference _SHARED_QUEUE_DRAIN_TIMEOUT_SEC instead of the magic number.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pymllm/orchestrator/scheduler_process.py`:
- Around line 67-70: Change the bare exception re-raise to preserve causality:
update the except clause from "except ValueError:" to "except ValueError as
exc:" and re-raise the new ValueError with "from exc" so the original parse
error is chained (keep the existing message that uses raw and the
MLLM_DECODE_POLL_TIMEOUT_MS context).
---
Nitpick comments:
In `@pymllm/orchestrator/scheduler_process.py`:
- Around line 603-607: The two occurrences of the literal 0.002 used to set
get_timeout when draining the shared queue should be replaced with a named
module constant (e.g. _SHARED_QUEUE_DRAIN_TIMEOUT_SEC) to centralize tuning and
improve readability; add the constant near other module-level constants
(alongside _DECODE_POLL_TIMEOUT_MS), then replace both uses in
scheduler_process.py (the get_timeout initialization and the assignment after
successful get from self._shared_queue.get) to reference
_SHARED_QUEUE_DRAIN_TIMEOUT_SEC instead of the magic number.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 56cad3e8-d56c-498e-a5e4-dbb51b2d857a
📒 Files selected for processing (1)
pymllm/orchestrator/scheduler_process.py
```python
except ValueError:
    raise ValueError(
        f"MLLM_DECODE_POLL_TIMEOUT_MS must be a non-negative integer, got {raw!r}"
    )
```
🧩 Analysis chain
🏁 Scripts executed (Repository: UbiquitousLearning/mllm):

```shell
# First, verify the file exists and read the relevant section
wc -l pymllm/orchestrator/scheduler_process.py

# Read lines 60-80 to see the context around lines 67-70
sed -n '60,80p' pymllm/orchestrator/scheduler_process.py

# Re-check the same section with line numbers
sed -n '60,80p' pymllm/orchestrator/scheduler_process.py | cat -n

# Verify the exact line numbers by counting
head -70 pymllm/orchestrator/scheduler_process.py | tail -15 | cat -n
```
Preserve exception causality by chaining the original parse error.
The ValueError caught at line 68 should be chained when re-raised to preserve the original exception context for debugging. This aligns with Python best practices (PEP 3134) and helps maintain diagnostic information when the environment variable fails to parse.
🔧 Suggested fix

```diff
 except ValueError:
     raise ValueError(
         f"MLLM_DECODE_POLL_TIMEOUT_MS must be a non-negative integer, got {raw!r}"
-    )
+    ) from exc
```

(Also add `as exc` to the except clause: `except ValueError as exc:`)
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 68-70: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pymllm/orchestrator/scheduler_process.py` around lines 67 - 70, Change the
bare exception re-raise to preserve causality: update the except clause from
"except ValueError:" to "except ValueError as exc:" and re-raise the new
ValueError with "from exc" so the original parse error is chained (keep the
existing message that uses raw and the MLLM_DECODE_POLL_TIMEOUT_MS context).
Summary

- The scheduler event loop used `poll(timeout=0)` (non-blocking spin) between decode batches, burning 100% of a CPU core while waiting for new requests
- Track decode state in `event_loop` and use a 1 ms poll timeout during active decode to yield the CPU, dropping usage to ~2%

Details
During continuous decode (token generation), the scheduler's `recv_requests()` path called `zmq.Poller.poll(timeout=0)` in a tight loop. Since decode batches are always available, the idle sleeper never kicks in, and the CPU spins at 100%.

The fix tracks whether the previous iteration ran a decode batch (`_in_decode` flag). When true, the first poll uses `_DECODE_POLL_TIMEOUT_MS` (1 ms), which lets the OS schedule other work on the core. Subsequent polls in the same `recv_requests` call drain any queued messages non-blocking, so no latency is added for burst arrivals.

Why 1 ms? Decode forward passes on GPU typically take >1 ms even for small models, so a 1 ms yield adds no measurable generation latency. Prefill (extend) batches remain fully non-blocking since they are latency-sensitive.
Benchmark results
Integration test using the real `SchedulerProcess` with ZMQ sockets and a mock model runner:

Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
Performance
Documentation