Hi, may I ask a few questions about plans on supporting GLM-4.7 models?
I used the DFLASH training code in SpecForge to train a draft model for GLM-4.7-FP8, and the training accuracy is higher than 0.9. However, when running inference on sglang (using the PR sgl-project/sglang#16818), the accept rate is only 1.x. Are there any other implementation changes needed for sglang inference with models other than Qwen? Also, may I ask about your official plans for supporting GLM-4.7-FP8?
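For context, this is roughly the kind of launch command I mean. The model paths are placeholders, and whether DFLASH is accepted as a `--speculative-algorithm` value (and which of the EAGLE-style speculative flags apply to it) is an assumption on my part:

```shell
# Sketch only: paths are placeholders, and the DFLASH algorithm value
# plus the applicable speculative flags are assumptions, not confirmed.
python -m sglang.launch_server \
  --model-path /path/to/GLM-4.7-FP8 \
  --speculative-algorithm DFLASH \
  --speculative-draft-model-path /path/to/my-dflash-draft \
  --speculative-num-steps 3 \
  --speculative-num-draft-tokens 4
```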
Thanks a lot!