
[TASK] Add INT8 Quantization Support in the Frontend #605

@zhanghb97

Description

Deliverables

  • A pull request (PR) enabling end-to-end DeepSeek inference with INT8 quantization

Task Description

  • Run the DeepSeek R1 inference pipeline to become familiar with the workflow.
  • Apply PyTorch INT8 quantization to DeepSeek R1 and evaluate inference performance on an AVX512-VNNI machine.
  • Extend the buddy-mlir frontend to support PyTorch INT8-quantized models.
  • After generating MLIR for the full model, build a complete end-to-end inference example modeled on the existing DeepSeek R1 example.
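Before wiring quantization into the frontend, it helps to be clear about the arithmetic involved. The sketch below shows symmetric per-tensor INT8 quantization, the scheme commonly used for weights in PyTorch's INT8 backends (including the VNNI-accelerated paths): floats are mapped to integer codes in [-127, 127] via a single scale factor. This is an illustrative minimal implementation, not the buddy-mlir or PyTorch code; the helper names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the INT8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [qi * scale for qi in q]

weights = [0.5, -1.2, 0.03, 2.4, -0.8]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
# Per-element quantization error is bounded by half a step (scale / 2).
```

In a real pipeline the scale (and, for asymmetric schemes, a zero point) becomes part of the lowered MLIR, and the INT8 matrix multiplies are what AVX512-VNNI accelerates.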

Timeline

Phase               Time
------------------  ---------------------------
Coding Phase        Oct 30, 2025 – Nov 6, 2025
Code Review Begins  Nov 7, 2025

If finished ahead of schedule, the review process may begin earlier.
