forked from marian-nmt/marian-dev
Open
Description
--gemm-precision and --int* are two ways to do the same thing. The functionality would still work and remain accessible without the --int* flags defined in the following lines.
marian-dev/src/common/config_parser.cpp
Lines 933 to 947 in 844800e
    cli.add<bool>("--int8",
        "Optimize speed even more aggressively sacrificing memory or precision by using 8bit integer GEMM with intgemm instead of floats. Only available on CPU. Corresponds to --gemm-precision int8");
    cli.add<bool>("--int8Alpha",
        "Use a precomputed quantisation multipliers for the activations. Requires a special model. Corresponds to --gemm-precision int8Alpha");
    cli.add<bool>("--int8shift",
        "Use a faster, shifted integer 8bit GEMM implementation. Corresponds to --gemm-precision int8shift");
    cli.add<bool>("--int8shiftAlpha",
        "Use a faster, shifted integer 8bit GEMM implementation, with precomputed alphas. Corresponds to --gemm-precision int8shiftAlpha");
    cli.add<bool>("--int8shiftAll",
        "Use a faster, shifted integer 8bit GEMM implementation even for matrices that don't have a bias. Beneficial on VNNI. Corresponds to --gemm-precision int8shiftAll");
    cli.add<bool>("--int8shiftAlphaAll",
        "Use a faster, shifted integer 8bit GEMM implementation even for matrices that don't have a bias, with precomputed alphas. Should be the fastest option. Corresponds to --gemm-precision int8shiftAlphaAll");
    cli.add<std::string>("--gemm-precision",
        "Use lower precision for the GEMM operations only. Supported values: float32, int16, int8, int8Alpha, int8shift, int8shiftAlpha, int8shiftAll, int8shiftAlphaAll", "float32");
    cli.add<bool>("--dump-quantmult",
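To illustrate why the boolean flags are redundant, here is a minimal sketch of how each one could be folded into a single --gemm-precision value. This is not Marian's actual code: `resolveGemmPrecision`, `kInt8Aliases`, and the conflict handling are hypothetical names and choices made for illustration only. The flag names and the "float32" default are taken from the snippet above.

```cpp
#include <array>
#include <stdexcept>
#include <string>
#include <unordered_set>

// Hypothetical list of the boolean aliases quoted above, without the leading "--".
// Each name is also a valid value of --gemm-precision.
static const std::array<const char*, 6> kInt8Aliases = {
    "int8",           "int8Alpha",    "int8shift",
    "int8shiftAlpha", "int8shiftAll", "int8shiftAlphaAll"};

// Hypothetical helper: derive the effective GEMM precision from the
// --gemm-precision string and the set of boolean alias flags that were passed.
// Simplification: "float32" is treated as "not explicitly set", so an explicit
// --gemm-precision float32 cannot be told apart from the default here.
std::string resolveGemmPrecision(const std::string& gemmPrecision,
                                 const std::unordered_set<std::string>& setFlags) {
  std::string alias;  // empty means no alias flag was seen
  for (const char* name : kInt8Aliases) {
    if (setFlags.count(name) == 0)
      continue;
    if (!alias.empty())
      throw std::invalid_argument("conflicting flags: --" + alias + " and --" + name);
    alias = name;
  }
  if (alias.empty())
    return gemmPrecision;  // no alias: the string option decides
  if (gemmPrecision != "float32" && gemmPrecision != alias)
    throw std::invalid_argument("--" + alias + " conflicts with --gemm-precision " +
                                gemmPrecision);
  return alias;  // the alias is just another spelling of the same value
}
```

Under these assumptions, passing only --int8shift resolves to the same value as --gemm-precision int8shift, which is the overlap this issue points out. The sketch also shows the extra conflict checks that only exist because two spellings are accepted.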