Skip to content

[rollout, vllm] fix: avoid SIGSEGV on ROCm TP=1 by conditionally omitting distributed_executor_backend#6459

Open
HeShiLie wants to merge 2 commits into
verl-project:mainfrom
HeShiLie:fix/rocm-asyncllm-sigsegv
Open

[rollout, vllm] fix: avoid SIGSEGV on ROCm TP=1 by conditionally omitting distributed_executor_backend#6459
HeShiLie wants to merge 2 commits into
verl-project:mainfrom
HeShiLie:fix/rocm-asyncllm-sigsegv

Conversation

@HeShiLie
Copy link
Copy Markdown

Summary

On ROCm (AMD GPU) with tensor_parallel_size=1, verl's hardcoded distributed_executor_backend="mp" forces vLLM to use MultiprocExecutor, creating a two-level multiprocessing spawn chain inside a Ray actor:

Ray actor (CUDA/HIP already initialized)
  └→ AsyncLLM
       └→ EngineCore_DP0 (1st spawn)
            └→ MultiprocExecutor
                 └→ WorkerProc (2nd spawn) → SIGSEGV (exitcode=-11)

The inner WorkerProc consistently dies with SIGSEGV before entering worker_main. This appears to be a ROCm/HIP runtime issue triggered by nested multiprocessing spawn after HIP initialization.

Fix: Conditionally omit distributed_executor_backend on ROCm + TP=1, letting vLLM auto-select UniProcExecutor (which keeps the worker in-process, no nested spawn). For TP>1, "mp" is still set as MultiprocExecutor is required.

  • CUDA/NVIDIA behavior: unchanged (condition is current_platform.is_rocm())
  • TP>1 behavior: unchanged (condition requires tensor_model_parallel_size == 1)

Investigation

Systematically eliminated all other hypotheses (import order, Ray worker pool, vLLM extensions, ZMQ IPC, etc.) across 10+ test scripts and 50+ runs, then isolated the crash to AsyncLLM + backend="mp" specifically:

Variant distributed_executor_backend Result
Bare LLM() N/A PASS
AsyncLLM + "mp" forced "mp" CRASH (SIGSEGV)
AsyncLLM + default not set (→ UniProcExecutor) PASS

Verified that UniProcExecutor supports the full verl worker extension path (vLLMColocateWorkerExtension, collective_rpc, reset_mm_cache).

Environment

Component Version
GPU AMD MI308X
ROCm 7.0.2
PyTorch 2.8.0+rocm7.0.2
vLLM 0.11.0rc2 (V1 engine)
verl main branch
Ray 2.44.1

Scope and Limitations

Test plan

  • Verified fix with AsyncLLM + UniProcExecutor in Ray actor (4 test variants, all PASS)
  • Verified vLLMColocateWorkerExtension injection works with UniProcExecutor
  • Verified collective_rpc calls (reset_mm_cache, monkey_patch_model) succeed
  • Full TAGPO training script reaches wandb init successfully (no more SIGSEGV)
  • CUDA/NVIDIA regression test (change is gated by is_rocm(), no behavioral change expected)

AI assistance was used (Claude) for drafting this PR description. All code changes were written, reviewed, and tested by a human.

🤖 Generated with Claude Code

…ributed_executor_backend

On ROCm with TP=1, the hardcoded distributed_executor_backend="mp" forces
vLLM to use MultiprocExecutor, creating a two-level multiprocessing spawn
chain (EngineCore -> MultiprocExecutor -> WorkerProc) inside a Ray actor
where CUDA/HIP is already initialized. This causes the inner WorkerProc
to SIGSEGV (exitcode=-11) before entering worker_main.

By omitting distributed_executor_backend when on ROCm with TP=1, vLLM
auto-selects UniProcExecutor which keeps the worker in-process, avoiding
the nested spawn. For TP>1, "mp" is still set as MultiprocExecutor is
required for inter-process communication.

CUDA/NVIDIA and TP>1 behavior are unchanged.

Tested on: ROCm 7.0.2 + PyTorch 2.8.0 + vLLM 0.11.0rc2 + verl main

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: HeShiLie <gaozhe.gao@alibaba-inc.com>
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 25, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates vllm_async_server.py to handle a specific SIGSEGV issue on ROCm platforms with a tensor parallel size of 1 by conditionally omitting the distributed_executor_backend setting. Feedback highlights a regression where the new logic overwrites user-specified backend configurations in engine_kwargs. A code suggestion was provided to ensure that the default 'mp' backend is only applied if not already defined by the user.

Comment on lines +280 to +286
if not self._should_omit_distributed_executor_backend():
args["distributed_executor_backend"] = "mp"
else:
logger.info(
"ROCm + TP=1 detected: omitting distributed_executor_backend "
"so vLLM uses UniProcExecutor and avoids nested WorkerProc spawn."
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation of the conditional logic for distributed_executor_backend causes a regression in precedence. Previously, users could override the default "mp" backend by providing distributed_executor_backend in engine_kwargs (via **engine_kwargs at line 273). Now, the logic at lines 280-281 will overwrite any value provided in engine_kwargs with "mp" for all cases except ROCm with TP=1.

Additionally, the log message at line 283 might be misleading if the user explicitly provided a backend in engine_kwargs on a ROCm+TP=1 system, as the code wouldn't actually be "omitting" it (it would just be respecting the user's choice).

To fix this, we should only set the default "mp" backend if it hasn't already been specified in args (which now includes engine_kwargs).

Suggested change
if not self._should_omit_distributed_executor_backend():
args["distributed_executor_backend"] = "mp"
else:
logger.info(
"ROCm + TP=1 detected: omitting distributed_executor_backend "
"so vLLM uses UniProcExecutor and avoids nested WorkerProc spawn."
)
if "distributed_executor_backend" not in args:
if not self._should_omit_distributed_executor_backend():
args["distributed_executor_backend"] = "mp"
else:
logger.info(
"ROCm + TP=1 detected: omitting distributed_executor_backend "
"so vLLM uses UniProcExecutor and avoids nested WorkerProc spawn."
)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thank you Mr. gemini! Fixed in 42f2bff — now the logic checks "distributed_executor_backend" not in args first, so user-specified values from engine_kwargs are preserved.

If the user already provides distributed_executor_backend via engine_kwargs,
do not overwrite it with the default "mp". This preserves the existing
precedence where engine_kwargs can override built-in defaults.

Addresses review feedback from gemini-code-assist.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: HeShiLie <gaozhe.gao@alibaba-inc.com>
@HeShiLie HeShiLie changed the title [ROCm] Fix AsyncLLM WorkerProc SIGSEGV when distributed_executor_backend="mp" is forced with TP=1 [rollout, vllm] fix: avoid SIGSEGV on ROCm TP=1 by conditionally omitting distributed_executor_backend May 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants