Maf 18836 gpt mistal pp rope #41
Merged
What does this PR do?
Adds the moreh pipeline and rope cache features.
The code required for both is added in this PR.
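Previously
MistralRotaryEmbedding computed the cos and sin tensors on the target hardware (GPU, NPU) on every forward call. Computing these values only requires the dtype and shape of hidden, so they can be precomputed ahead of time as an optimization.
(The real motivation is that MAF's Pipeline does not run unless the cached version is used.)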
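To illustrate the idea, here is a minimal, framework-free sketch of a cached rotary table. It is not the code in this PR: the class name `CachedRotaryTable`, its parameters, and the default `base=10000.0` are assumptions for illustration, and plain Python lists stand in for the tensors that the real module would build with the hidden dtype. The point is only that the cos/sin values are computed once at construction time, and forward returns slices of the cache.

```python
import math


class CachedRotaryTable:
    """Hypothetical helper: precomputes RoPE cos/sin tables once, on the host."""

    def __init__(self, dim, max_positions, base=10000.0):
        if dim % 2 != 0:
            raise ValueError("dim must be even")
        self.dim = dim
        self.max_positions = max_positions
        # Inverse frequencies: base^(-2i/dim) for i in [0, dim/2).
        inv_freq = [base ** (-(2 * i) / dim) for i in range(dim // 2)]
        self.cos_cached = []
        self.sin_cached = []
        for pos in range(max_positions):
            angles = [pos * f for f in inv_freq]
            # Duplicate the half-dimension angles so each row has `dim` entries,
            # matching the concatenated layout used by rotate-half style RoPE.
            angles = angles + angles
            self.cos_cached.append([math.cos(a) for a in angles])
            self.sin_cached.append([math.sin(a) for a in angles])

    def forward(self, seq_len):
        """Return cached rows for positions [0, seq_len); no trig is recomputed."""
        if seq_len > self.max_positions:
            raise ValueError("seq_len exceeds the precomputed cache")
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]
```

In a PyTorch module the same tables would typically be stored as buffers and cast to the hidden dtype, so that forward only indexes into them instead of launching the cos/sin computation on the device each time.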
Reference code.
The llama code from an older version of HuggingFace.
https://github.com/huggingface/transformers/blob/9c804f7ec42c94289ce52eaa84eed32f770311d7/src/transformers/models/deprecated/open_llama/modeling_open_llama.py#L109
MAF's rope code (the values are precomputed on the CPU)
https://github.com/moreh-dev/framework/blob/df54f28ce96ff43dce4c0b40a0aeb7bff7fd6b0c/IR/driver/pytorch/torch/moreh_ops/rotary_embedding.py#L35
Below is an example of calling this from MAF.
https://github.com/moreh-dev/framework/pull/8819
The sh file has not been uploaded, but it is not much different; just run it with the model name set to gpt2-small-moreh.
json example