Deliverables
- A pull request (PR) enabling the common quantisation patterns described below.
- A pull request adding quantisation examples.
- A pull request deploying these patterns to the Gemmini backend to test accuracy, with a preliminary static quantisation script for different models.
Task Description
- Extend Quantisation Patterns
- Extend the quantisation framework under the buddy-mlir frontend to support more common quantisation schemes (per-channel/per-tensor/per-block, and weight & activation). There is a weight_only_channel_wise pattern for your reference. Only static, post-training INT8 quantisation is required.
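As a rough illustration of the difference between these granularities, the following is a minimal NumPy sketch of symmetric, static INT8 quantisation at per-tensor and per-channel granularity. The function names and the symmetric scheme are assumptions for illustration; they are not the buddy-mlir API.

```python
import numpy as np

def quantise_per_tensor(x):
    # Symmetric INT8: one scale shared by the whole tensor.
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def quantise_per_channel(w, axis=0):
    # Symmetric INT8: one scale per channel along `axis`
    # (reduce over all the other axes).
    reduce_axes = tuple(i for i in range(w.ndim) if i != axis)
    scale = np.max(np.abs(w), axis=reduce_axes, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale
```

Per-block quantisation follows the same idea, with one scale per fixed-size block of the tensor; finer granularity trades extra scale storage for lower quantisation error.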
- Add Relevant Examples
- Add relevant examples for quantisation under examples as BuddyQuant. Feeding in basic operators such as MatmulOp should generate MLIR with quantisation for the corresponding data formats.
- Deploy to Gemmini
- Run the Gemmini E2E deployment on FPGA to become familiar with the workflow.
- Owing to the hardware characteristics of Gemmini, we need to apply INT8 quantisation to both the weight and activation operands of the matrix multiplication, whilst employing dequantisation for the remaining parts.
- This step requires separate quantisation tailored to the characteristics of different models.
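The numerics of that weight-and-activation scheme can be sketched as follows: quantise both matmul operands to INT8, accumulate in INT32 (as systolic-array accelerators like Gemmini do), then dequantise the result for the surrounding floating-point parts of the model. The function name and the per-tensor/per-channel scale choices are illustrative assumptions, not the actual deployment script.

```python
import numpy as np

def int8_matmul_dequant(a, w):
    # Quantise activations per-tensor and weights per-(output-)channel
    # to symmetric INT8 (illustrative choice of granularity).
    a_scale = np.max(np.abs(a)) / 127.0
    w_scale = np.max(np.abs(w), axis=0, keepdims=True) / 127.0
    qa = np.clip(np.round(a / a_scale), -128, 127).astype(np.int8)
    qw = np.clip(np.round(w / w_scale), -128, 127).astype(np.int8)
    # Integer matmul with an INT32 accumulator.
    acc = qa.astype(np.int32) @ qw.astype(np.int32)
    # Dequantise back to float for the non-quantised parts of the model.
    return acc.astype(np.float32) * a_scale * w_scale
```

Checking the output of this sketch against a plain float matmul is a simple way to sanity-check accuracy before running the full Gemmini E2E flow.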
Timeline
| Phase | Time |
| --- | --- |
| Coding Phase | Feb 28, 2026 – March 31, 2026 |
| Code Review | Begins March 31, 2026 |
If finished ahead of schedule, the review process may begin earlier.
Notes
The PR deployed to Gemmini should be submitted to the buddy-examples repository.