repositioning fixes, tensor shape mismatch #91
Signed-off-by: Nathan Ordonez <nathanaxcan@gmail.com>
Accuracy numbers were wrong on our evals, now they're good again.
The issue was in `_perform_repositioning` in `gpu_model_runner.py` in the v1 API. When there are not too many repositioning requests, the vectors from different layers of the model are concatenated along the batch dimension before RoPE is applied twice to them.
The problem: I forgot that the "positions" tensor, which tells the RoPE function what positional encodings to apply to the vectors, needs to be repeated because its batch size does not match that of the concatenated tensor. Normally PyTorch would broadcast that tensor, but the RoPE function calls `.view` on the key vectors, so relying on broadcasting is wrong here.
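To make the failure mode concrete, here is a minimal sketch in plain PyTorch. `toy_rope` is a hypothetical stand-in written for this illustration, not vLLM's rotary embedding code; it only assumes, as described above, that the function takes the token count from `positions` and then calls `.view` on the keys. The tensor sizes and variable names are likewise made up.

```python
import torch


def toy_rope(positions: torch.Tensor, keys: torch.Tensor, head_size: int) -> torch.Tensor:
    """Toy rotate-half RoPE (hypothetical, NOT vLLM's implementation)."""
    num_tokens = positions.shape[0]
    half = head_size // 2
    inv_freq = 1.0 / (10000.0 ** (torch.arange(half, dtype=torch.float32) / half))
    angles = positions.to(torch.float32)[:, None] * inv_freq[None, :]  # [num_tokens, half]
    cos = angles.cos()[:, None, :]  # [num_tokens, 1, half], broadcast over heads
    sin = angles.sin()[:, None, :]
    # Assumption: the token count comes from `positions`. If `keys` has more
    # rows than that, .view() silently folds the extra rows into the head axis.
    k = keys.view(num_tokens, -1, head_size)
    k1, k2 = k[..., :half], k[..., half:]
    rotated = torch.cat((k1 * cos - k2 * sin, k2 * cos + k1 * sin), dim=-1)
    return rotated.reshape(keys.shape)


torch.manual_seed(0)
num_layers, num_tokens, num_heads, head_size = 3, 4, 2, 8
positions = torch.arange(num_tokens)
per_layer_keys = [
    torch.randn(num_tokens, num_heads * head_size) for _ in range(num_layers)
]

# Reference result: apply RoPE to each layer's keys separately.
reference = torch.cat(
    [toy_rope(positions, k, head_size) for k in per_layer_keys], dim=0
)

# Batched path: keys from all layers concatenated along the batch dimension.
stacked = torch.cat(per_layer_keys, dim=0)  # [num_layers * num_tokens, hidden]

# Mismatched shapes: no error is raised, but rows get the wrong positions.
buggy = toy_rope(positions, stacked, head_size)

# Repeating `positions` once per layer restores the row-to-position mapping.
fixed = toy_rope(positions.repeat(num_layers), stacked, head_size)

print("fixed matches reference:", torch.allclose(fixed, reference))
print("buggy matches reference:", torch.allclose(buggy, reference))
```

In this sketch the mismatched call does not raise; it returns a tensor of the right shape with the wrong rotations, which would be consistent with the symptom showing up as degraded eval accuracy rather than a crash.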