Optimize SmoothQuant alpha per-topology for the best accuracy on Llama2 family models #6

@vvchernov

Description

  • Use different datasets for calibration (dummy, Pile, gsm8k, triviaqa, and so on)
  • Use llama2-7b with different int8 quantization types
  • Use alpha values in the range (0, 1)
  • Use lm-evaluation-harness to benchmark accuracy on tasks such as gsm8k, triviaqa, and so on
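
For reference, a minimal sketch of what an alpha sweep could look like on a single linear layer. It uses the standard SmoothQuant per-channel scale, s_j = max|X_j|^alpha / max|W_j|^(1 - alpha). Everything else is an assumption for illustration only: the data is a random toy matrix with two artificial outlier channels instead of llama2-7b activations, quantization is simulated per-tensor symmetric int8, and the relative output error stands in for the real accuracy metric from lm-evaluation-harness. The helper names (`smoothing_scales`, `fake_quant_int8`, `sweep_alpha`) are hypothetical and not taken from any library.

```python
import numpy as np

def smoothing_scales(act_absmax, weight_absmax, alpha, eps=1e-5):
    """Per-input-channel scales: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)."""
    act = np.maximum(act_absmax, eps)   # avoid zero scales on dead channels
    w = np.maximum(weight_absmax, eps)
    return act ** alpha / w ** (1.0 - alpha)

def fake_quant_int8(x):
    """Simulated per-tensor symmetric int8: quantize, then dequantize."""
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

def sweep_alpha(X, W, alphas):
    """Return {alpha: relative output error} for one layer Y = X @ W."""
    ref = X @ W
    act_absmax = np.abs(X).max(axis=0)  # per input channel, shape (C_in,)
    w_absmax = np.abs(W).max(axis=1)    # per input channel (rows of W)
    errors = {}
    for alpha in alphas:
        s = smoothing_scales(act_absmax, w_absmax, alpha)
        Xs = X / s              # smoothed activations
        Ws = W * s[:, None]     # scales folded into the weights
        out = fake_quant_int8(Xs) @ fake_quant_int8(Ws)
        errors[float(alpha)] = float(
            np.linalg.norm(out - ref) / np.linalg.norm(ref)
        )
    return errors

# Toy data (assumption): random activations with two outlier channels.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 32))
X[:, :2] *= 50.0
W = rng.normal(size=(32, 16)) * 0.1

# Open interval (0, 1), sampled at 0.1 .. 0.9.
alphas = np.round(np.linspace(0.1, 0.9, 9), 2)
errors = sweep_alpha(X, W, alphas)
best_alpha = min(errors, key=errors.get)
print(f"best alpha on toy layer: {best_alpha}")
```

For the actual task, the per-layer error would be replaced by task accuracy on gsm8k, triviaqa, and so on, and the activation statistics would come from the chosen calibration dataset, so the best alpha found on this toy layer says nothing about Llama2.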
