[speculator training] Update benchmark_speculator_logical.py to support gpt_bigcode/granite #62
Conversation

Thanks @sahilsuneja1 for putting this together! I don't think we need the caller script - it's ultimately just a simple python call with a bunch of arguments, right? I'd just add a comment with a sample call or two (Llama 7b / granite 20b?) to the top of the script itself:
```python
# This example script measures the logical speedup of running a speculator atop a base model. Run as:
# export CUDA_VISIBLE_DEVICES=1
# e.g., #1: torchrun --nproc_per_node=1 benchmark_speculator_logical.py --architecture=paged_llama --variant=7b --model_path=~/models/7B-F --tokenizer=~/models/tokenizer.model --model_source=hf --speculator_path=~/models/speculator_7B_F.pth --compile
# e.g., #2: torchrun --nproc_per_node=1 benchmark_speculator_logical.py --architecture=paged_gpt_bigcode --variant=ibm.20b --model_path=~/models/granite-20b-instruct --tokenizer=~/models/granite-20b-instruct --model_source=hf --speculator_path=~/models/speculator_granite20B.pth --n_predict=4 --threshes=[6,4,3,3]
```
Needs a --data_path and --subdata!
Thanks, fixed with fake values
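
For concreteness, the amended sample comment presumably ends up looking something like the following; the data arguments below are placeholder values for illustration, not necessarily the fake values actually committed.

```python
# e.g., #1 (with placeholder data args): torchrun --nproc_per_node=1 benchmark_speculator_logical.py --architecture=paged_llama --variant=7b --model_path=~/models/7B-F --tokenizer=~/models/tokenizer.model --model_source=hf --speculator_path=~/models/speculator_7B_F.pth --data_path=/fake/data/path --subdata=fake_subdata --compile
```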

Updated formatting

Further CI complaints. I'll try and figure out how to get them running automatically for you, rather than having to wait for my explicit go-ahead.

Fixed isort issues

Dunno how to fix the mypy errors

There are some suggestions here on how to fix some of these, possibly via a fully qualified import? @afrittoli may also have suggestions. Pasting the mypy errors here for easier reference:
Looks like a bunch of that is because this is relying on the paged attention branch, which hasn't fully landed yet.
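
To illustrate the fully-qualified-import idea floated above, here is a minimal sketch; the module path is an assumption about the fms-extras layout, and it won't help if mypy can't find the package at all.

```python
# Sketch only: import via the fully qualified module path so mypy can resolve
# the name, rather than relying on a re-export from a package __init__.
# (fms_extras.models.paged_llama is an assumed module path, for illustration.)
import fms_extras.models.paged_llama as paged_llama
```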
We can either point directly at the GitHub URL within requirements.txt (see some examples here: https://stackoverflow.com/questions/16584552/how-to-state-in-requirements-txt-a-direct-github-source) or we could publish fms-extras to PyPI; probably the first option is simpler/easier.
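
For reference, a direct GitHub source in requirements.txt takes the standard PEP 508 form below; the repo URL and ref shown are placeholders rather than the actual pin.

```
# Install fms-extras straight from a GitHub branch/ref (URL and ref are illustrative)
fms-extras @ git+https://github.com/foundation-model-stack/fms-extras.git@main
```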
@daviswer: should we also add the caller script?