[Roadmap] Relax 2026 Q2 #14

@Yangruipis

Agentic

  • Agentic Rollout — training real agent apps with strict app–RL-framework separation and lightweight API integration
  • Recipes
    • DeepEyes v2 — multi-modal visual reasoning agent
    • R2E-Gym (DeepSWE) — repo-level software engineering agent training

High Performance

  • Compact Deployment — ref/actor/actor_fwd colocated with async & sync support
  • Megatron Dynamic CP — dynamic context parallelism for variable-length sequences
  • Low-precision training (FP8 / NVFP4 / INT4) — quantized training for memory and throughput gains
  • Dynamo as rollout model router (with @NVIDIA)

Model

  • Megatron-Bridge upgrade
  • New model support
    • GLM-5, GLM-5.1
    • Qwen3.5-Omni
    • Qwen3.6 dense, Qwen3.6-MoE (30B and larger)

Algorithms

  • OPD (On-Policy Distillation) exploration — move beyond baseline validation to further variants: multi-teacher, asynchronous teacher serving, and weighted distillation objectives

Ecosystem

Misc

...
