I was running speedup.sh with a Llama model but got the traceback below. The error comes from Consistency_LLM/cllm/cllm_llama_modeling.py (line 154 at commit b2a7283):
```python
if self.model._use_flash_attention_2:
```
The code needs to be updated to:

```python
if self.model.config._attn_implementation == 'flash_attention_2':
```
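For reference, a small compatibility shim can handle both transformers versions: newer releases record the attention backend on `model.config._attn_implementation`, while older ones exposed the private `_use_flash_attention_2` flag on the model itself. This is a hedged sketch (the helper name `uses_flash_attention_2` is my own, not from the repo):

```python
def uses_flash_attention_2(model) -> bool:
    """Return True if the model is configured to use FlashAttention-2.

    Checks the newer config attribute first, then falls back to the
    legacy private flag on the model object.
    """
    # Newer transformers versions store the backend name on the config
    attn = getattr(getattr(model, "config", None), "_attn_implementation", None)
    if attn is not None:
        return attn == "flash_attention_2"
    # Older versions set a private boolean on the model itself
    return getattr(model, "_use_flash_attention_2", False)
```

Replacing the bare attribute access in cllm_llama_modeling.py with a check like this would avoid breaking when the transformers version changes.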
Do I need to change the model config to check the speed of the base model with Jacobi iteration?

Base model: `meta-llama/Meta-Llama-3-8B-Instruct`