This issue tracks the remaining upstream work in waybarrios/vllm-mlx and the order to close it out cleanly from our fork.

Current Upstream PRs (updated 2026-03-26)
Ready to merge (approved / strong +1)
- tokenizer: return successful mlx-lm load result -- APPROVED
- engine: keep SimpleEngine serialized across cancellation -- strong +1, production-validated
- fix: bump mlx-lm minimum to 0.31.0 for hybrid model batching -- NEW, fixes "[Bug] Engine loop error: ArraysCache.__init__() missing 1 required positional argument: 'size' when enabling --continuous-batching or running bench" (#11)

Clean, awaiting review
- cli: expose harmony and gpt-oss tool parsers
- chat: forward chat_template_kwargs on simple-engine paths
- simple-engine: keep tool chat on the streaming execution path
- test: make Python 3.13 async suite pass and cover it in CI

Previously draft, now ready for review
- scheduler: preserve prompt checkpoints in chunked prefill resume path -- checkpoint callback now wired
- prefix_cache: preserve hybrid recurrent state across blocks -- deduplication safety test added, duplicate tokenizer hunk removed

Largest scope
- server: add OpenAI-compatible /v1/responses endpoint

Fork main state
origin/main is 21 commits ahead of upstream/main, containing cherry-picks of all 10 PRs above. All 129 relevant tests pass. All PR branches are 0 behind upstream/main and mergeable.

Recommended merge order