
Multiple ways to change gemmPrecision #91

@jerinphilip


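`--gemm-precision` and the `--int*` flags are two ways to do the same thing: each `--int*` flag corresponds to exactly one `--gemm-precision` value. The following flags could therefore be removed without losing any functionality, since every mode would still be accessible through `--gemm-precision`: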

```cpp
cli.add<bool>("--int8",
    "Optimize speed even more aggressively sacrificing memory or precision by using 8bit integer GEMM with intgemm instead of floats. Only available on CPU. Corresponds to --gemm-precision int8");
cli.add<bool>("--int8Alpha",
    "Use a precomputed quantisation multipliers for the activations. Requires a special model. Corresponds to --gemm-precision int8Alpha");
cli.add<bool>("--int8shift",
    "Use a faster, shifted integer 8bit GEMM implementation. Corresponds to --gemm-precision int8shift");
cli.add<bool>("--int8shiftAlpha",
    "Use a faster, shifted integer 8bit GEMM implementation, with precomputed alphas. Corresponds to --gemm-precision int8shiftAlpha");
cli.add<bool>("--int8shiftAll",
    "Use a faster, shifted integer 8bit GEMM implementation even for matrices that don't have a bias. Beneficial on VNNI. Corresponds to --gemm-precision int8shiftAll");
cli.add<bool>("--int8shiftAlphaAll",
    "Use a faster, shifted integer 8bit GEMM implementation even for matrices that don't have a bias, with precomputed alphas. Should be the fastest option. Corresponds to --gemm-precision int8shiftAlphaAll");
cli.add<std::string>("--gemm-precision",
    "Use lower precision for the GEMM operations only. Supported values: float32, int16, int8, int8Alpha, int8shift, int8shiftAlpha, int8shiftAll, int8shiftAlphaAll", "float32");
cli.add<bool>("--dump-quantmult",
```
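Because every `--int*` switch only aliases a `--gemm-precision` value, one way to drop the switches without breaking existing command lines would be to translate them in a single place. Below is a minimal, hypothetical C++ sketch of such a translation shim. `LegacyFlag`, `kLegacyFlags`, `isSupportedPrecision` and `resolveGemmPrecision` are illustrative names, not part of Marian's API, and the flag-to-value table assumes the correspondence stated in the help strings of the snippet. Rather than guessing a precedence, the sketch rejects conflicting combinations.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: a legacy boolean option and the --gemm-precision value
// its help text says it corresponds to.
struct LegacyFlag {
  const char* name;       // e.g. "--int8"
  const char* precision;  // e.g. "int8"
};

const std::array<LegacyFlag, 6> kLegacyFlags = {{
    {"--int8", "int8"},
    {"--int8Alpha", "int8Alpha"},
    {"--int8shift", "int8shift"},
    {"--int8shiftAlpha", "int8shiftAlpha"},
    {"--int8shiftAll", "int8shiftAll"},
    {"--int8shiftAlphaAll", "int8shiftAlphaAll"},
}};

// Values listed as supported in the --gemm-precision help text.
const std::array<const char*, 8> kSupportedPrecisions = {
    "float32",   "int16",          "int8",         "int8Alpha",
    "int8shift", "int8shiftAlpha", "int8shiftAll", "int8shiftAlphaAll"};

bool isSupportedPrecision(const std::string& value) {
  return std::find(kSupportedPrecisions.begin(), kSupportedPrecisions.end(),
                   value) != kSupportedPrecisions.end();
}

// Collapse the legacy flags and --gemm-precision into one effective value.
// `enabledFlags` lists the legacy options that were passed; `gemmPrecision`
// is the --gemm-precision value ("float32" when it was not given).
std::string resolveGemmPrecision(const std::vector<std::string>& enabledFlags,
                                 const std::string& gemmPrecision) {
  if (!isSupportedPrecision(gemmPrecision))
    throw std::invalid_argument("unsupported --gemm-precision: " + gemmPrecision);

  std::string fromLegacy;  // value implied by the legacy flags, if any
  for (const std::string& flag : enabledFlags) {
    auto it = std::find_if(kLegacyFlags.begin(), kLegacyFlags.end(),
                           [&](const LegacyFlag& f) { return flag == f.name; });
    if (it == kLegacyFlags.end())
      throw std::invalid_argument("unknown legacy flag: " + flag);
    // Two different legacy flags would select two different modes.
    if (!fromLegacy.empty() && fromLegacy != it->precision)
      throw std::invalid_argument("conflicting legacy flags");
    fromLegacy = it->precision;
  }

  if (fromLegacy.empty())
    return gemmPrecision;  // only --gemm-precision was used

  // Both mechanisms were used: accept only if they agree, or if
  // --gemm-precision was left at its default.
  if (gemmPrecision != "float32" && gemmPrecision != fromLegacy)
    throw std::invalid_argument("--gemm-precision conflicts with a legacy flag");
  return fromLegacy;
}
```

Treating `float32` as "not given" is a simplification: this sketch cannot distinguish an explicit `--gemm-precision float32` from the default, so `--gemm-precision float32 --int8` resolves to `int8`.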
