
[quantization] Process past_key_values#573

Closed
stamalakhov wants to merge 1 commit into Samsung:main from stamalakhov:quant_cache_model


@stamalakhov
Contributor

This PR makes `QuantLlamaModel` process `past_key_values` when `use_cache` is set.

Draft: #570
TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>
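For context, here is a minimal sketch of what processing `past_key_values` typically involves in a Llama-style decoder stack. It assumes the legacy Hugging Face layout of one `(key, value)` pair per layer, and it uses plain Python lists in place of tensors. The names `layer_kv` and `model_forward` are hypothetical and do not reflect the actual `QuantLlamaModel` code in this PR.

```python
from typing import List, Optional, Tuple

# Toy "tensor": a list of per-token vectors, so concatenating along the
# sequence axis is plain list concatenation.
Seq = List[List[float]]
KV = Tuple[Seq, Seq]  # (keys, values) for one layer


def layer_kv(layer_idx: int, tokens: Seq) -> KV:
    """Hypothetical K/V projection: scale each hidden vector by a per-layer factor."""
    keys = [[x * (layer_idx + 1) for x in t] for t in tokens]
    values = [[x * (layer_idx + 2) for x in t] for t in tokens]
    return keys, values


def model_forward(
    hidden: Seq,
    num_layers: int,
    past_key_values: Optional[List[KV]] = None,
    use_cache: bool = False,
) -> Optional[List[KV]]:
    """Thread a per-layer K/V cache through the decoder stack.

    The attention math and the layer-to-layer hidden-state update are omitted;
    only the cache bookkeeping is shown.
    """
    present: List[KV] = []
    for i in range(num_layers):
        k_new, v_new = layer_kv(i, hidden)
        if past_key_values is not None:
            k_past, v_past = past_key_values[i]
            # New tokens attend over cached (past) plus current keys/values.
            k_all, v_all = k_past + k_new, v_past + v_new
        else:
            k_all, v_all = k_new, v_new
        if use_cache:
            present.append((k_all, v_all))
    # Return the updated cache only when the caller asked for it.
    return present if use_cache else None
```

In a decode step, the cache returned by one call would be passed back as `past_key_values` on the next call, together with only the newest token's hidden state.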

@stamalakhov stamalakhov self-assigned this Mar 23, 2026
@stamalakhov stamalakhov requested review from a team and mhs4670go March 23, 2026 14:40
@stamalakhov
Contributor Author

@mhs4670go
Should tests for decode mode be provided?

@stamalakhov stamalakhov removed the request for review from a team March 23, 2026 14:47
@stamalakhov
Contributor Author

Let's first test decode mode with the example script `tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py`.
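One common shape for a decode-mode test is to check that feeding tokens one at a time through a K/V cache gives the same outputs as a single causal pass over the whole sequence. Below is a minimal sketch of that check. It uses a toy single-head attention with identity projections (q = k = v = x) in pure Python. All function names are hypothetical and are not part of TICO.

```python
import math
from typing import List

Vec = List[float]


def attend(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for j, x in enumerate(v):
            out[j] += (w / total) * x
    return out


def full_causal(xs: List[Vec]) -> List[Vec]:
    """Prefill-style pass without a cache: token t attends to tokens 0..t."""
    return [attend(xs[t], xs[: t + 1], xs[: t + 1]) for t in range(len(xs))]


def decode_with_cache(xs: List[Vec]) -> List[Vec]:
    """Decode-style pass: one token per step, appending its K/V to a cache."""
    k_cache: List[Vec] = []
    v_cache: List[Vec] = []
    outs: List[Vec] = []
    for x in xs:
        k_cache.append(x)
        v_cache.append(x)
        outs.append(attend(x, k_cache, v_cache))
    return outs


def check_decode_matches_prefill(xs: List[Vec], tol: float = 1e-9) -> bool:
    """True if cached decoding reproduces the uncached causal outputs."""
    a, b = full_causal(xs), decode_with_cache(xs)
    return all(abs(p - q) < tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))
```

For a quantized model, the same comparison would likely need a looser, model-specific tolerance rather than near-exact equality.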


Labels

None yet

Projects

None yet


1 participant