Using `gemv` for `batched_vec`

`batched_vec` is currently implemented on top of `batched_mul` (which calls batched gemm), with just some extra reshapes around it.
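For readers less familiar with that code path, here is a minimal sketch of what the reshape approach amounts to. It assumes the package in question is NNlib.jl and uses small, arbitrary array sizes picked only for illustration. It is not a copy of the actual implementation, just the same idea written out by hand:

```julia
using NNlib

A = rand(Float32, 4, 3, 5)   # batch of 5 matrices, each 4×3
b = rand(Float32, 3, 5)      # batch of 5 vectors, each of length 3

# gemm route: view each vector as a 3×1 matrix, run the batched
# matrix-matrix product, then drop the singleton dimension again
B3 = reshape(b, size(b, 1), 1, size(b, 2))
y_gemm = reshape(batched_mul(A, B3), size(A, 1), size(A, 3))

# reference: one plain matrix-vector product per batch element
y_ref = hcat((A[:, :, k] * b[:, k] for k in axes(A, 3))...)

@assert y_gemm ≈ y_ref
@assert batched_vec(A, b) ≈ y_ref
```

A gemv-based path would skip the two reshapes and hand each (matrix, vector) pair to a matrix-vector kernel directly, which is the variant benchmarked below.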

Some basic benchmarks (on an RTX PRO 6000 Blackwell) suggest batched gemv is sometimes 1-2% faster:

<img width="1053" height="409" alt="Image" src="https://github.com/user-attachments/assets/d8f05b02-02a0-4b31-a641-76c50aad4482" />

<img width="1053" height="408" alt="Image" src="https://github.com/user-attachments/assets/4446930a-816d-4c05-9026-9494bcb883b7" />

and the two converge to similar timings in the limit:

<img width="1051" height="411" alt="Image" src="https://github.com/user-attachments/assets/c4901fd7-2b21-41c0-b30c-f69d2fccc445" />

but in some cases gemv is consistently slightly slower:

<img width="1054" height="409" alt="Image" src="https://github.com/user-attachments/assets/ca796d39-f2a7-426f-b667-a9317769e1ee" />

<img width="1048" height="410" alt="Image" src="https://github.com/user-attachments/assets/edad3d0c-112f-4092-bca1-516158dc57ec" />

Not sure if this has been considered already. The difference isn't huge, but in the cases where there actually is justification to specifically use gemv, it could be nice to have the option.

Using `gemv` for `batched_vec` #662