I was running speedup.sh with a Llama model but got the traceback below. The error comes from Consistency_LLM/cllm/cllm_llama_modeling.py (line 154 at commit b2a7283):
```python
if self.model._use_flash_attention_2:
```
The code needs to be updated to:

```python
if self.model.config._attn_implementation == 'flash_attention_2':
```
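For reference, a small compatibility shim can handle both transformers versions: newer releases record the attention backend on `model.config._attn_implementation`, while older ones exposed the private `_use_flash_attention_2` flag on the model itself. This is a hedged sketch (the helper name `uses_flash_attention_2` is my own, not from the repo):

```python
def uses_flash_attention_2(model) -> bool:
    """Return True if the model is configured to use FlashAttention-2.

    Checks the newer config attribute first, then falls back to the
    legacy private flag on the model object.
    """
    # Newer transformers versions store the backend name on the config
    attn = getattr(getattr(model, "config", None), "_attn_implementation", None)
    if attn is not None:
        return attn == "flash_attention_2"
    # Older versions set a private boolean on the model itself
    return getattr(model, "_use_flash_attention_2", False)
```

Replacing the bare attribute access in cllm_llama_modeling.py with a check like this would avoid breaking when the transformers version changes.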
Do I need to change the model config to check the speed of the base model with Jacobi iteration?

Base model: `meta-llama/Meta-Llama-3-8B-Instruct`