inputs = processor(text=[query], images=[reference_img], return_tensors="pt").to("cuda", torch.float16)

# Length of the tokenized prompt; used later to locate the response tokens.
index_input_ids = inputs["input_ids"].shape[1]

generate_ids = model.generate(**inputs, do_sample=True, max_length=512, temperature=0.2, top_p=0.9)

# Decode only the newly generated tokens (everything after the prompt).
response = processor.decode(generate_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=False)

# Re-run the full prompt + response through the model to collect hidden states.
inputs = processor(text=[query + response], images=[reference_img], return_tensors="pt").to("cuda", torch.float16)

output = model(**inputs, output_hidden_states=True)

img_activations = {}

for layer in layers:
    hidden_states = output.hidden_states[layer].detach().cpu()
    # Mean-pool the response-token activations; the 24*24 = 576 offset is meant
    # to skip the image patch embeddings inserted for the <image> placeholder.
    img_activations[layer] = torch.mean(hidden_states[0, index_input_ids + 24*24:], dim=0)
The length of the image embedding should already be included in `index_input_ids`, so there is no need to add 24 * 24 when indexing into the activations.
ASTRA/extract_ref/extracting_activations_llava_ref.py, lines 43 to 55 at commit 54df103
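Whether the extra 24*24 offset is needed depends on whether the processor has already expanded the `<image>` placeholder inside `input_ids` (newer transformers versions do this at tokenization time; older ones keep a single placeholder token and expand it only inside the model). A toy calculation with made-up token counts (`N_TEXT` and `N_RESP` are hypothetical, not taken from the script) sketches the two conventions:

```python
N_PATCHES = 24 * 24   # patch embeddings per image for LLaVA-1.5 (336 px / 14 px patches = 24 per side)
N_TEXT = 40           # hypothetical number of text tokens in the prompt
N_RESP = 20           # hypothetical number of response tokens

# Hidden-state sequence length is the same in both cases: the image is always
# expanded to N_PATCHES positions by the time hidden states are produced.
seq_len = N_TEXT + N_PATCHES + N_RESP

# Case A: the processor expands the <image> token in input_ids.
# index_input_ids already counts the patch positions, so the response
# starts right at index_input_ids and no extra offset is needed.
index_input_ids_expanded = N_TEXT + N_PATCHES
resp_start_a = index_input_ids_expanded
assert seq_len - resp_start_a == N_RESP

# Case B: input_ids holds a single <image> placeholder token.
# The placeholder is replaced by N_PATCHES embeddings inside the model,
# so the response starts N_PATCHES - 1 positions past index_input_ids.
index_input_ids_single = N_TEXT + 1
resp_start_b = index_input_ids_single + N_PATCHES - 1
assert seq_len - resp_start_b == N_RESP
```

In both cases the response tokens occupy the same absolute positions in the hidden states; what changes is whether `index_input_ids` already accounts for the 576 patch positions, so the offset should be applied in only one of the two conventions, never both.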