[Bug] `--quantization gguf` fails: ggml ops compiled but not registered (torch.ops._C.ggml_dequantize missing) #262

## Summary
Loading any GGUF model (`--quantization gguf`) fails because the `ggml_*` custom ops are never registered with the PyTorch dispatcher:

```
AttributeError: '_OpNamespace' '_C' object has no attribute 'ggml_dequantize'.
Did you mean: 'awq_dequantize'?
...
RuntimeError: Engine core initialization failed.
```

## Root cause
The GGUF kernels are **compiled** but **not registered**:

- `csrc/quantization/gguf/gguf_kernel.cu` defines `ggml_dequantize`, `ggml_mul_mat_a8`, `ggml_mul_mat_vec_a8`, `ggml_moe_a8`, `ggml_moe_a8_vec`, `ggml_moe_get_block_size`.
- `CMakeLists.txt` compiles `gguf_kernel.cu` into the `_C` extension, so the symbols are present in the `.so`.
- But `csrc/torch_bindings.cpp` has **no `ggml_*` `ops.def`/`ops.impl`** lines. AWQ and GPTQ have both the compiled symbol **and** a registered schema; the ggml kernels have only the symbol.

Result: `torch.ops._C.ggml_dequantize` does not exist, so vLLM's GGUF weight loader (`vllm/model_executor/layers/quantization/gguf.py`) fails with the `AttributeError` shown in the Summary.

## Reproduction (model-free, ~5s) on MetaX C500 / MACA 3.5.3.20
```python
import torch
import mcoplib._C  # importing the extension runs its op registrations

print(hasattr(torch.ops._C, "awq_dequantize"))   # True
print(hasattr(torch.ops._C, "ggml_dequantize"))  # False  <-- bug
```
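The check above probes a single op. As a minimal sketch, the same `hasattr` probe can be run over all six kernels listed under Root cause. `GGML_OPS` and `missing_ops` are hypothetical names introduced here for illustration, and the helper accepts any namespace object so it can be tried without the extension installed.

```python
# Op names as listed for csrc/quantization/gguf/gguf_kernel.cu in this issue.
GGML_OPS = (
    "ggml_dequantize",
    "ggml_mul_mat_a8",
    "ggml_mul_mat_vec_a8",
    "ggml_moe_a8",
    "ggml_moe_a8_vec",
    "ggml_moe_get_block_size",
)


def missing_ops(namespace, names=GGML_OPS):
    """Return the names in `names` that `namespace` does not expose."""
    return [name for name in names if not hasattr(namespace, name)]
```

After `import torch` and `import mcoplib._C`, calling `missing_ops(torch.ops._C)` on an affected install would be expected to return all six names, and an empty list once the registrations are in place.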
Binary evidence:
```bash
nm -D  mcoplib/_C.abi3.so | grep ggml_dequantize     # PRESENT  (kernel compiled in)
strings mcoplib/_C.abi3.so | grep 'awq_dequantize('  # PRESENT  (registered)
strings mcoplib/_C.abi3.so | grep 'ggml_dequantize(' # ABSENT   (NOT registered)
```
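Where `strings` is not available, the schema check can be approximated in Python. This is a rough sketch that assumes, as the `strings` output above suggests, that a registered op's schema is embedded in the binary as plain ASCII text beginning with `op_name(`. `has_schema_string` is a hypothetical helper name, and it reads the whole file into memory.

```python
from pathlib import Path


def has_schema_string(binary_path, op_name):
    """Return True if the bytes `op_name(` occur anywhere in the file.

    Mirrors `strings <file> | grep 'op_name('`: it looks for the op name
    immediately followed by an opening parenthesis.
    """
    needle = (op_name + "(").encode("ascii")
    return needle in Path(binary_path).read_bytes()
```

For example, `has_schema_string("mcoplib/_C.abi3.so", "ggml_dequantize")` corresponds to the third command above. It does not replace the `nm -D` symbol check, which looks at the dynamic symbol table rather than string literals.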

## Affected versions
Present in `releases/v0.17.0`, `releases/v0.18.0`, `releases/v0.19.0`, and `master`: each branch still compiles `gguf_kernel.cu` but contains zero `ggml_*` registrations.
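
The per-branch count can be reproduced with a small text scan. The sketch below assumes registrations are written as `ops.def("ggml_...")` and `ops.impl("ggml_...")` calls, as described under Root cause. `count_ggml_registrations` is a hypothetical helper, and it does not distinguish commented-out lines from live code.

```python
import re

# Matches ops.def("ggml_... or ops.impl("ggml_... at the start of the op name.
_REGISTRATION = re.compile(r'\bops\.(def|impl)\(\s*"(ggml_\w*)')


def count_ggml_registrations(source_text):
    """Count `ops.def`/`ops.impl` calls whose op name starts with `ggml_`."""
    counts = {"def": 0, "impl": 0}
    for match in _REGISTRATION.finditer(source_text):
        counts[match.group(1)] += 1
    return counts
```

Run from the repository root on a given branch, passing the text of `csrc/torch_bindings.cpp` should give zero for both counts on the affected versions, according to the description above.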

## Fix
Add the `ggml_*` `ops.def`/`ops.impl` block to `csrc/torch_bindings.cpp` (matching upstream vLLM). PR incoming.

Verified on MetaX C500 (MACA 3.5.3, torch 2.8): with the registrations added, a from-source build (`USE_PRECOMPILED_KERNEL=0`) exposes `torch.ops._C.ggml_dequantize`, and `Qwen2.5-0.5B-Instruct-GGUF` (q4_k_m) loads and generates correctly.

## Note: precompiled `mcoplib`
The default runtime (`USE_PRECOMPILED_KERNEL=1`) loads the precompiled `mcoplib` package, which has the **same** missing registration. The source fix only covers the `USE_PRECOMPILED_KERNEL=0` build; **`mcoplib` needs a matching rebuild** for the default path.

