It seems that AdaShield-A doesn't update defense prompts. #4

Hello, I just ran train_our_qr.sh and got some CSV files. The CSV files record the initial and final scores of the queries for each scenario. I noticed that when an initial score is 10, the final score never becomes 1 or 5, which suggests AdaShield-A never turned an ineffective defense prompt into a successful one. Is this normal? I used LLaVA as the target model and Vicuna as the defense model.
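
For anyone who wants to reproduce the check, here is a minimal sketch of how the score transitions could be counted. The column names `init_score` and `final_score` and the sample rows are assumptions for illustration only; the actual CSV headers written by train_our_qr.sh may differ and would need to be substituted.

```python
import csv
import io
from collections import Counter


def score_transitions(csv_text, init_col="init_score", final_col="final_score"):
    """Count (initial, final) score pairs in a CSV.

    The column names are hypothetical placeholders, not the repo's real headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        (int(float(row[init_col])), int(float(row[final_col]))) for row in reader
    )


# Made-up sample rows that only mimic the pattern described above.
sample = "query,init_score,final_score\nq1,10,10\nq2,1,1\nq3,10,10\n"

transitions = score_transitions(sample)

# Number of queries that started at 10 and ended with a lower score.
improved = sum(
    count for (init, final), count in transitions.items()
    if init == 10 and final < 10
)
print(transitions)
print("improved from 10:", improved)
```

If `improved` stays at 0 across all scenario files, that matches what I am seeing.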
