
Align GPTQ-Pro kernel with Ampere GPUs (auto-selection + native SASS)#7

Merged
groxaxo merged 3 commits into main from copilot/check-bottlenecks-gptq-inferencing on Apr 29, 2026

Copilot AI commented Apr 28, 2026

  • Raise SUPPORTS_FORMATS priority 0 → 95 so GPTQ-Pro enters auto-selection on Ampere
  • Add -gencode flags for sm_80/sm_86/sm_87 (native Ampere SASS)
  • Add -gencode arch=compute_87,code=compute_87 PTX fallback for Ada/Hopper forward compatibility
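The flag set listed above can be sketched as a small helper. This is an illustration only: `build_gencode_flags`, `AMPERE_SASS_ARCHS`, and `PTX_FALLBACK_ARCH` are hypothetical names, not taken from the repository, and the actual build script may assemble its flags differently.

```python
# Hypothetical helper; only the flag strings themselves come from the PR description.
AMPERE_SASS_ARCHS = ("80", "86", "87")  # native SASS targets named in the PR
PTX_FALLBACK_ARCH = "87"                # PTX embedded for newer architectures


def build_gencode_flags(sass_archs=AMPERE_SASS_ARCHS, ptx_arch=PTX_FALLBACK_ARCH):
    """Return nvcc arguments: one SASS target per arch plus an optional PTX target."""
    flags = []
    for arch in sass_archs:
        # code=sm_XX embeds a precompiled cubin (SASS) for that architecture.
        flags += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    if ptx_arch is not None:
        # code=compute_XX embeds PTX, which the driver can JIT at load time.
        flags += ["-gencode", f"arch=compute_{ptx_arch},code=compute_{ptx_arch}"]
    return flags


if __name__ == "__main__":
    print(" ".join(build_gencode_flags()))
```

In a PyTorch extension build, a list like this would typically be appended to the nvcc entry of `extra_compile_args`; whether this repository does so is an assumption.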


groxaxo commented Apr 29, 2026

Owner

@copilot review the changes please.

Copilot AI commented Apr 29, 2026

Author

> @copilot review the changes please.

Here's the review of all changes in this PR:

Commit 1 — QUANTIZE_BLACK_LIST (a81fc20)

  • Correct. QUANTIZE_BLACK_LIST was an empty dict {}, so no method was ever blocked. The change correctly lists non-GPTQ methods (GGUF, FP8, BitsAndBytes, QQQ, AWQ, EXL3, PARO) and improves the error message to tell the user what method to use instead.

Commit 2 — Ampere alignment (39e5aae)

  • SUPPORTS_FORMATS priority 0 → 95 — Correct. Priority 0 permanently excluded GPTQ-Pro from BACKEND.AUTO selection (the selector skips anything with priority <= 0). At 95 it ranks above Marlin (90) and ExllamaV2 (80). The existing validate_device() guard (major >= 8) ensures transparent fallthrough to Marlin on pre-Ampere hardware.
  • -gencode flags — Correct, with one issue that I found and fixed: _validate_gptq_pro_device_support() checks major >= 8, which also passes on Ada Lovelace (sm_89 / RTX 4090) and Hopper (sm_90 / H100). With only SASS cubins for sm_80/86/87 and no PTX fallback, loading the kernel on those devices would fail with "no kernel image is available for execution on the device". Fixed in 0457356 by adding -gencode arch=compute_87,code=compute_87, which embeds compute_87 PTX that the CUDA driver can JIT-compile for any device with compute capability 8.7 or higher.
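The selection behaviour described for Commit 2 can be sketched as follows. This is a simplified model under stated assumptions, not the project's selector: `KernelCandidate` and `auto_select` are hypothetical names, the priorities 95/90/80 are the ones quoted above, and device requirements for kernels other than GPTQ-Pro are deliberately not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KernelCandidate:
    name: str
    priority: int
    min_major: int = 0  # placeholder; only GPTQ-Pro's major >= 8 guard is modeled

    def validate_device(self, major: int) -> bool:
        return major >= self.min_major


def auto_select(candidates: List[KernelCandidate], major: int) -> Optional[str]:
    """Pick the highest-priority kernel that is enabled and passes its device guard."""
    eligible = [
        k for k in candidates
        if k.priority > 0 and k.validate_device(major)  # priority <= 0 is skipped
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda k: k.priority).name


def candidates_with(gptq_pro_priority: int) -> List[KernelCandidate]:
    return [
        KernelCandidate("GPTQ-Pro", gptq_pro_priority, min_major=8),
        KernelCandidate("Marlin", 90),
        KernelCandidate("ExllamaV2", 80),
    ]
```

Under these assumptions, `auto_select(candidates_with(0), major=8)` never returns GPTQ-Pro, while `auto_select(candidates_with(95), major=8)` does, and a device with `major=7` falls through to the next candidate.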

Copilot AI requested a review from groxaxo April 29, 2026 04:23
groxaxo merged commit 299ecd8 into main Apr 29, 2026
groxaxo deleted the copilot/check-bottlenecks-gptq-inferencing branch April 30, 2026 01:41