[Bugfix] Register GGUF (ggml) ops so `--quantization gguf` works by linkeLi0421 · Pull Request #263 · MetaX-MACA/vLLM-metax

linkeLi0421 · 2026-05-30T15:08:13Z

Fixes #262.

Problem

--quantization gguf fails on vllm-metax:

AttributeError: '_OpNamespace' '_C' object has no attribute 'ggml_dequantize'.
Did you mean: 'awq_dequantize'?

csrc/quantization/gguf/gguf_kernel.cu is compiled into _C (it's in CMakeLists.txt) and csrc/ops.h declares all six ggml_* functions — but csrc/torch_bindings.cpp never registered them. AWQ/GPTQ register both symbol and schema; ggml only had the symbol, so torch.ops._C.ggml_dequantize doesn't exist.

Fix

Add the ggml_* ops.def/ops.impl block (matching upstream vLLM) after the GPTQ registrations. No kernel changes — the kernels were already built.

Validation

On MetaX C500 (MACA 3.5.3.20, torch 2.8), releases/v0.17.0, from-source build (USE_PRECOMPILED_KERNEL=0):

| | before | after |
| --- | --- | --- |
| `torch.ops._C.ggml_dequantize` | missing | registered |
| `--quantization gguf` load | crashes | works |

Qwen2.5-0.5B-Instruct-GGUF (q4_k_m) loads and generates coherent output (EN + ZH) after the fix.
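The registration check in the table above can be scripted. The following is a minimal sketch, not part of the PR: `missing_ops` and `GGML_OPS` are hypothetical names introduced for illustration, and the six op names are taken from the review summary of this change. It relies only on the fact that looking up an unregistered op raises `AttributeError`, as in the traceback quoted under Problem. The demo uses a stand-in namespace so it runs without torch.

```python
from types import SimpleNamespace

# The six ggml ops named in this PR's review summary.
GGML_OPS = (
    "ggml_dequantize",
    "ggml_mul_mat_vec_a8",
    "ggml_mul_mat_a8",
    "ggml_moe_a8",
    "ggml_moe_a8_vec",
    "ggml_moe_get_block_size",
)


def missing_ops(namespace, names=GGML_OPS):
    """Return the names that cannot be resolved as attributes of `namespace`.

    hasattr() swallows the AttributeError raised for an unknown attribute,
    so this never raises for a missing op.
    """
    return [name for name in names if not hasattr(namespace, name)]


if __name__ == "__main__":
    # Stand-in for an op namespace that only exposes an AWQ op,
    # mimicking the state before the fix.
    stub = SimpleNamespace(awq_dequantize=object())
    print(missing_ops(stub))
```

On a real build, the same helper could be called as `missing_ops(torch.ops._C)`, assuming the extension that populates `torch.ops._C` has already been imported. An empty list would then correspond to the "registered" column.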

Note for maintainers

The default runtime (USE_PRECOMPILED_KERNEL=1) loads the precompiled mcoplib package, which has the same missing registration. This source fix only reaches the USE_PRECOMPILED_KERNEL=0 build — mcoplib needs the same ggml_* registration restored for the default path. (The kernels are already in mcoplib; only the TORCH_LIBRARY registration is missing.)

The GGUF kernels in csrc/quantization/gguf/gguf_kernel.cu are compiled into the _C extension (CMakeLists.txt lists gguf_kernel.cu), but their TORCH_LIBRARY registrations were missing from csrc/torch_bindings.cpp. As a result torch.ops._C.ggml_dequantize (and the other ggml_* ops) are never registered with the dispatcher, so loading any GGUF model fails:

AttributeError: '_OpNamespace' '_C' object has no attribute 'ggml_dequantize'. Did you mean: 'awq_dequantize'?

awq/gptq kernels register both the symbol and the schema; ggml only had the symbol. This adds the ggml ops.def/ops.impl block (matching upstream vLLM) right after the GPTQ registrations, binding the schemas to the already-built kernels. No kernel code changes.

Verified on a MetaX C500 (MACA 3.5.3): with the ggml ops registered, Qwen2.5-0.5B-Instruct-GGUF (q4_k_m) loads and generates correctly via --quantization gguf.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds Torch library bindings for GGML quantization kernels (dequantization, matrix-vector/matrix multiplication, and MoE variants) to the CUDA extension.

Changes:

- Register ggml_dequantize, ggml_mul_mat_vec_a8, and ggml_mul_mat_a8 CUDA ops.
- Register ggml_moe_a8 and ggml_moe_a8_vec MoE CUDA ops.
- Register ggml_moe_get_block_size op (no backend dispatch key).

gemini-code-assist

Code Review

This pull request registers several GGML-related operations to the Torch library bindings, including dequantization, matrix-vector multiplication, matrix-matrix multiplication, and Mixture of Experts (MoE) kernels. No review comments were provided, so there is no feedback to address.

Copilot AI review requested due to automatic review settings May 30, 2026 15:08

Copilot AI reviewed May 30, 2026

gemini-code-assist Bot reviewed May 30, 2026

linkeLi0421 wants to merge 1 commit into MetaX-MACA:master from linkeLi0421:fix-gguf-ggml-op-registration
