[Bug] --quantization gguf fails: ggml ops compiled but not registered (torch.ops._C.ggml_dequantize missing) #262

@linkeLi0421

Description

Summary

Loading any GGUF model (--quantization gguf) fails because the ggml_* custom ops are never registered with the PyTorch dispatcher:

AttributeError: '_OpNamespace' '_C' object has no attribute 'ggml_dequantize'.
Did you mean: 'awq_dequantize'?
...
RuntimeError: Engine core initialization failed.

Root cause

The GGUF kernels are compiled but not registered:

  • csrc/quantization/gguf/gguf_kernel.cu defines ggml_dequantize, ggml_mul_mat_a8, ggml_mul_mat_vec_a8, ggml_moe_a8, ggml_moe_a8_vec, ggml_moe_get_block_size.
  • CMakeLists.txt compiles gguf_kernel.cu into the _C extension, so the symbols are present in the .so.
  • But csrc/torch_bindings.cpp has no ggml_* ops.def/ops.impl lines. The AWQ/GPTQ ops are both compiled and registered (schema via ops.def, kernel via ops.impl); the ggml ops are only compiled.
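To make the compiled-versus-registered distinction concrete, here is a toy model in plain Python. `ToyOpNamespace` and its methods are hypothetical stand-ins, not PyTorch's implementation. The assumption it encodes is that an op becomes visible in an op namespace only once its schema has been defined (ops.def), while a compiled symbol on its own changes nothing on the Python side.

```python
class ToyOpNamespace:
    """Toy stand-in for an op namespace like torch.ops._C (NOT PyTorch's code).

    Assumption modelled here: compiling a kernel only puts a symbol in the .so;
    attribute lookup succeeds only for ops whose schema was defined (ops.def).
    """

    def __init__(self, name):
        self._name = name
        self.compiled_symbols = set()  # roughly what `nm -D` would list
        self._schemas = {}             # op name -> schema string (ops.def)
        self._impls = {}               # op name -> kernel callable (ops.impl)

    def compile_kernel(self, op, fn):
        # Linking the kernel into the library registers nothing by itself.
        self.compiled_symbols.add(op)
        return fn

    def define(self, schema):
        # The op name is the part of the schema before the argument list.
        self._schemas[schema.split("(", 1)[0]] = schema

    def impl(self, op, fn):
        self._impls[op] = fn

    def __getattr__(self, op):
        # Only reached for names not found as ordinary attributes.
        state = self.__dict__
        if op in state.get("_schemas", {}):
            kernel = state["_impls"].get(op)
            if kernel is None:
                def no_kernel(*args, **kwargs):
                    raise NotImplementedError(f"{op} has a schema but no kernel")
                return no_kernel
            return kernel
        raise AttributeError(
            f"'_OpNamespace' '{state.get('_name')}' object has no attribute '{op}'"
        )


ns = ToyOpNamespace("_C")

# AWQ path: compiled, defined and implemented.
awq_kernel = ns.compile_kernel("awq_dequantize", lambda w: w)
ns.define("awq_dequantize(Tensor w) -> Tensor")
ns.impl("awq_dequantize", awq_kernel)

# GGUF path as described in this issue: compiled only.
ns.compile_kernel("ggml_dequantize", lambda w: w)

print(hasattr(ns, "awq_dequantize"))             # True
print(hasattr(ns, "ggml_dequantize"))            # False
print("ggml_dequantize" in ns.compiled_symbols)  # True
```

The toy schemas are simplified placeholders; the real ones in csrc/torch_bindings.cpp carry the full argument lists.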

Result: torch.ops._C.ggml_dequantize does not exist, and vLLM's GGUF weight loader (vllm/model_executor/layers/quantization/gguf.py) crashes.

Reproduction (model-free, ~5s) on MetaX C500 / MACA 3.5.3.20

import torch, mcoplib._C
print(hasattr(torch.ops._C, "awq_dequantize"))   # True
print(hasattr(torch.ops._C, "ggml_dequantize"))  # False  <-- bug
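The check above only probes ggml_dequantize. A small extension of it reports every missing ggml op at once. This is a sketch that assumes the six ops listed under Root cause are the ones to check; `missing_ops` and `GGML_OPS` are hypothetical helper names.

```python
from types import SimpleNamespace

# The six ops defined in csrc/quantization/gguf/gguf_kernel.cu, per the Root cause list.
GGML_OPS = (
    "ggml_dequantize",
    "ggml_mul_mat_a8",
    "ggml_mul_mat_vec_a8",
    "ggml_moe_a8",
    "ggml_moe_a8_vec",
    "ggml_moe_get_block_size",
)


def missing_ops(namespace, names=GGML_OPS):
    """Return the names that are not reachable as attributes of `namespace`.

    hasattr() is enough because looking up an unregistered op raises
    AttributeError (see the traceback in the Summary).
    """
    return [name for name in names if not hasattr(namespace, name)]


# Against the real extension (needs torch and mcoplib installed):
#   import torch, mcoplib._C
#   print(missing_ops(torch.ops._C))   # expected to be [] once all six ops are registered
#
# Offline stand-in with only two of the six ops present:
fake_c = SimpleNamespace(ggml_dequantize=object(), ggml_mul_mat_a8=object())
print(missing_ops(fake_c))
```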

Binary evidence:

nm -D mcoplib/_C.abi3.so | grep ggml_dequantize      # PRESENT  (kernel compiled in)
strings mcoplib/_C.abi3.so | grep 'awq_dequantize('  # PRESENT  (registered)
strings mcoplib/_C.abi3.so | grep 'ggml_dequantize(' # ABSENT   (NOT registered)
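The binary checks above can be scripted. This is a minimal sketch, assuming GNU binutils (`nm`, `strings`) is installed and that a registered op leaves its schema literal (for example `awq_dequantize(...`) in the string table, which is what the grep patterns above rely on. `extension_path`, `classify_ops` and `scan_shared_object` are hypothetical helper names. `extension_path` resolves where the extension is actually loaded from instead of assuming the relative `mcoplib/_C.abi3.so` path.

```python
import importlib.util
import subprocess


def extension_path(module_name):
    """Locate the file a module would be loaded from, e.g. "mcoplib._C" -> .../_C.abi3.so.

    For a dotted name this imports the parent package first.
    """
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


def classify_ops(nm_text, strings_text, ops):
    """Mirror the grep checks: symbol present (compiled) vs. schema string present (registered).

    Plain substring tests, like grep, are a heuristic: "ggml_moe_a8" also matches
    ggml_moe_a8_vec symbols in the nm output.
    """
    return {
        op: {
            "compiled": op in nm_text,               # kernel linked into the .so
            "registered": f"{op}(" in strings_text,  # ops.def schema literal embedded
        }
        for op in ops
    }


def scan_shared_object(path, ops):
    """Run `nm -D` and `strings` (binutils, must be on PATH) on `path` and classify `ops`."""
    def run(*cmd):
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    return classify_ops(run("nm", "-D", path), run("strings", path), ops)


# Example (requires the installed package and binutils):
#   so = extension_path("mcoplib._C")
#   print(scan_shared_object(so, ["awq_dequantize", "ggml_dequantize"]))
```

Based on the evidence above, an affected build should report ggml_dequantize as compiled but not registered, and awq_dequantize as both.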

Affected versions

Present in releases/v0.17.0, releases/v0.18.0, releases/v0.19.0, and master (each has zero ggml_* registrations in csrc/torch_bindings.cpp while still compiling gguf_kernel.cu).
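The version claim can be re-checked without building anything by reading csrc/torch_bindings.cpp straight out of git. This is a rough sketch, assuming the registrations follow the `ops.def("name(...")` / `ops.impl("name", ...)` layout of the existing AWQ/GPTQ entries, with the schema literal possibly on the line after `ops.def(`. `ggml_registrations` and `registrations_at` are hypothetical names.

```python
import re
import subprocess

# Assumed layout: ops.def(<whitespace>"ggml_xxx(..." and ops.impl(<whitespace>"ggml_xxx", ...
_DEF_RE = re.compile(r'ops\.def\(\s*"(ggml_\w+)\(')
_IMPL_RE = re.compile(r'ops\.impl\(\s*"(ggml_\w+)"')


def ggml_registrations(source):
    """Return (defined, implemented) sets of ggml_* op names found in torch_bindings.cpp text."""
    return set(_DEF_RE.findall(source)), set(_IMPL_RE.findall(source))


def registrations_at(ref, path="csrc/torch_bindings.cpp"):
    """Read `path` as it exists at git `ref` (e.g. a release branch) and collect ggml registrations."""
    text = subprocess.run(
        ["git", "show", f"{ref}:{path}"], capture_output=True, text=True, check=True
    ).stdout
    return ggml_registrations(text)


# Usage from a checkout of this repository (prefix refs with origin/ if they only exist remotely):
#   for ref in ("releases/v0.17.0", "releases/v0.18.0", "releases/v0.19.0", "master"):
#       defs, impls = registrations_at(ref)
#       print(ref, len(defs), len(impls))
```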

Fix

Add the ggml_* ops.def/ops.impl block to csrc/torch_bindings.cpp (matching upstream vLLM). PR incoming.

Verified on MetaX C500 (MACA 3.5.3, torch 2.8): with the registrations added, a from-source build (USE_PRECOMPILED_KERNEL=0) exposes torch.ops._C.ggml_dequantize, and Qwen2.5-0.5B-Instruct-GGUF (q4_k_m) loads and generates correctly.

Note: precompiled mcoplib

The default runtime (USE_PRECOMPILED_KERNEL=1) loads the precompiled mcoplib package, which has the same missing registration. The source fix only covers the USE_PRECOMPILED_KERNEL=0 build; mcoplib needs a matching rebuild for the default path.
