
[TASK] Add Quantisation Examples and Deploy to Gemmini #705

@shirohasuki

Description


Deliverables

  • A pull request (PR) enabling the common quantisation patterns described below.
  • A pull request adding quantisation examples.
  • A pull request deploying these to the Gemmini backend to test accuracy, with a preliminary static quantisation script for different models.

Task Description

  1. Extend Quantisation Patterns
  • Extend the quantisation framework under the buddy-mlir frontend to support more common quantisation schemes (per-channel, per-tensor, and per-block, covering both weights and activations). There is an existing weight_only_channel_wise pattern for reference. Only static, post-training INT8 quantisation is required.
  2. Add Relevant Examples
  • Add relevant quantisation examples under examples as BuddyQuant. Given basic operators such as MatmulOp as input, the examples should generate MLIR with quantisation for the corresponding data formats.
  3. Deploy to Gemmini
  • Run the Gemmini E2E deployment on FPGA to become familiar with the workflow.
  • Owing to the hardware characteristics of Gemmini, we need to apply INT8 quantisation to both the weight and activation operands of the matrix multiplication, whilst employing dequantisation for the remaining parts.
  • This step requires separate quantisation tailored to the characteristics of different models.
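To make the schemes above concrete, here is a minimal NumPy sketch (not the buddy-mlir API; all function names are hypothetical) of symmetric static INT8 quantisation, per-tensor and per-channel, applied to a matmul whose result is then dequantised, mirroring the Gemmini weight-and-activation scheme described in step 3:

```python
import numpy as np

def quantize_per_tensor(x: np.ndarray):
    """Symmetric per-tensor INT8: a single scale for the whole tensor."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def quantize_per_channel(w: np.ndarray, axis: int):
    """Symmetric per-channel INT8: one scale per slice along `axis`."""
    reduce_axes = tuple(i for i in range(w.ndim) if i != axis)
    scale = np.abs(w).max(axis=reduce_axes, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def int8_matmul_dequant(a: np.ndarray, w: np.ndarray):
    """Quantise both operands, accumulate in int32, then dequantise."""
    qa, sa = quantize_per_tensor(a)                # activation: per-tensor
    qw, sw = quantize_per_channel(w, axis=1)       # weight: per output column
    acc = qa.astype(np.int32) @ qw.astype(np.int32)  # int32 accumulation
    return acc.astype(np.float32) * sa * sw          # dequantise to float

np.random.seed(0)
a = np.random.randn(4, 8).astype(np.float32)
w = np.random.randn(8, 16).astype(np.float32)
approx = int8_matmul_dequant(a, w)
print(np.max(np.abs(approx - a @ w)))  # small quantisation error
```

A per-block scheme would follow the same pattern, computing one scale per fixed-size block of the tensor rather than per channel. In the real task the scales come from calibration data (static PTQ), not from the tensor being quantised at run time.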

Timeline

  Coding Phase: Feb 28, 2026 – March 31, 2026
  Code Review Begins: March 31, 2026

If finished ahead of schedule, the review process may begin earlier.

Notes

The PR for the Gemmini deployment should be submitted to the buddy-examples repository.
