Deliverables
- A pull request (PR) enabling end-to-end DeepSeek inference with INT8 quantization
Task Description
- Run the DeepSeek R1 inference pipeline to become familiar with the workflow.
- Apply PyTorch INT8 quantization to DeepSeek R1 and evaluate inference performance on an AVX512-VNNI machine.
- Extend the buddy-mlir frontend to support PyTorch INT8-quantized models.
- After generating MLIR for the full model, build a complete end-to-end inference example based on the existing example.
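Before applying PyTorch's quantization utilities, it can help to understand the arithmetic INT8 quantization performs. The sketch below is illustrative only — a minimal, self-contained demonstration of symmetric per-tensor INT8 quantize/dequantize, which is the core idea behind the weight quantization this task applies (PyTorch's own APIs, e.g. `torch.ao.quantization.quantize_dynamic`, handle this internally with additional per-channel and zero-point options):

```python
# Minimal sketch of symmetric per-tensor INT8 quantization arithmetic.
# Illustrative only; not the actual PyTorch implementation.

def quantize_int8(values):
    """Map floats to int8 codes using a single symmetric scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from int8 codes."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.03, 1.27]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
# Round-trip error is bounded by half the quantization step.
assert all(abs(a - b) <= scale / 2 + 1e-9
           for a, b in zip(weights, recovered))
```

Dynamic INT8 quantization as used here quantizes weights ahead of time and activations on the fly, which is why AVX512-VNNI (with its fused INT8 multiply-accumulate instructions) is the target machine for the performance evaluation.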
Timeline
| Phase | Time |
| --- | --- |
| Coding Phase | Oct 30, 2025 – Nov 6, 2025 |
| Code Review | Begins Nov 7, 2025 |
If finished ahead of schedule, the review process may begin earlier.