Description
It is not clear which scheduler strategies are available in AReaL.
The default value used in examples (None) leads to runtime errors, which makes the system confusing and fragile for first-time users.
Issue Type
Expected Behavior
The system should either:
- Work with a safe default scheduler, or
- Clearly document valid scheduler options and enforce them in config validation.
Current Behavior
Using the default configuration results in a crash:
Traceback (most recent call last):
  File "/content/AReaL/examples/vlm/geometry3k_grpo.py", line 75, in <module>
    main(sys.argv[1:])
  File "/content/AReaL/examples/vlm/geometry3k_grpo.py", line 61, in main
    with PPOTrainer(
         ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/areal/trainer/rl_trainer.py", line 123, in __init__
    self.scheduler = self._init_scheduler()
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/areal/trainer/rl_trainer.py", line 873, in _init_scheduler
    raise NotImplementedError(f"Unknown scheduler type: {cfg.type}")
NotImplementedError: Unknown scheduler type:
Additional Context
After inspecting the source code, only the following scheduler strategies are actually defined:
class SchedulingStrategyType(str, Enum):
    separation = "separation"
    colocation = "colocation"
And the configuration dataclass:
@dataclass
class SchedulingStrategy:
    type: str = field(
        default="separation",
        metadata={"choices": ["separation", "colocation"]},
    )
    target: str | None = field(
        default=None,
        metadata={"help": "The target role to be colocated with"},
    )
    fork: bool = field(
        default=True,
        metadata={
            "help": (
                "When True with colocation, the target worker spawns a new "
                "process on the same node/GPUs instead of sharing its process. "
                "Provides process isolation while sharing GPU resources."
            )
        },
    )
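Since the valid values already live in the field's `choices` metadata, they could be surfaced to users programmatically. The following is only a sketch: it uses a trimmed copy of the dataclass quoted above, and `allowed_choices` is a hypothetical helper, not an existing AReaL function.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SchedulingStrategy:
    # Trimmed copy of the dataclass quoted above; only `type` is kept.
    type: str = field(
        default="separation",
        metadata={"choices": ["separation", "colocation"]},
    )


def allowed_choices(cls, field_name: str) -> list:
    """Return the `choices` metadata of one dataclass field (hypothetical helper)."""
    for f in fields(cls):
        if f.name == field_name:
            # Fields without a `choices` entry yield an empty list.
            return list(f.metadata.get("choices", []))
    raise KeyError(field_name)


print(allowed_choices(SchedulingStrategy, "type"))
```

A helper like this could feed both the documentation and any error message, so the list of valid values would be maintained in one place.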
Suggested Fix / Clarification
- Documentation should explicitly state valid values:
scheduler:
  type: separation  # default
or:
scheduler:
  type: colocation
- Consider rejecting None earlier with a clear validation error message like:
Scheduler type must be one of: separation, colocation
instead of:
Unknown scheduler type: None
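One possible shape for the early validation, as a minimal sketch rather than a patch against the actual AReaL code: the classes below are simplified stand-ins for the ones quoted under Additional Context, and the `__post_init__` check is an assumption about where validation could live.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class SchedulingStrategyType(str, Enum):
    separation = "separation"
    colocation = "colocation"


@dataclass
class SchedulingStrategy:
    # Simplified stand-in: `target` and `fork` are omitted for brevity.
    type: str | None = "separation"

    def __post_init__(self):
        # Fail at config construction time instead of deep inside the trainer.
        valid = [member.value for member in SchedulingStrategyType]
        if self.type not in valid:
            raise ValueError(
                f"Scheduler type must be one of: {', '.join(valid)} "
                f"(got {self.type!r})"
            )
```

With this check, `SchedulingStrategy(type=None)` would raise a `ValueError` naming the valid options as soon as the config object is built, rather than a `NotImplementedError` from `_init_scheduler`.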