[Note: edited for clarification]
Dear authors,
I was trying to run the ITAS algorithm on the GSM8K benchmark to get a task-specific ARCHON architecture. Unfortunately, I'm stuck on an unsupported-benchmark issue.
I can see that the scripts provided under benchmarks/ and benchmarks/gsm8k can generate and evaluate answers.
However, the itas_algorithm script in the currently released version seems to support only "mt_bench" and "arena_hard_auto" (Archon/src/archon/itas_algorithms/itas_algorithm.py, line 150 in d45892c):
    if self.search_config["benchmark"] in ["mt_bench", "arena_hard_auto"]:
Please let me know if I'm wrong, and what steps are necessary to get a task-specific ARCHON architecture.
My guess is that I need to add a GSM8K entry to the question map used in power_ranker (Archon/src/archon/itas_algorithms/power_ranker.py, lines 24 to 27 in d45892c):
    QUESTION_MAP = {
        "arena_hard_auto": "archon/benchmarks/arena_hard_auto/arena_questions.jsonl",
        "mt_bench": "archon/benchmarksmt_bench/FastChat/fastchat/llm_judge/data/mt_bench/question.jsonl",
    }
I would also need to add some logic to compare each generated answer against the correct one. Is my intuition correct? Do you plan to update the code with this logic?
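To make the question concrete, here is a rough sketch of what I have in mind. It is only my guess, not code from the repo: the "gsm8k" key and its path are hypothetical placeholders, the helper names `last_number` and `is_correct` are my own, and I am assuming the reference answers follow the usual GSM8K convention of ending with `#### <number>`.

```python
import re

# Existing entries copied from power_ranker.py; the "gsm8k" entry is a
# hypothetical placeholder (I did not find such a file in the repo).
QUESTION_MAP = {
    "arena_hard_auto": "archon/benchmarks/arena_hard_auto/arena_questions.jsonl",
    "mt_bench": "archon/benchmarksmt_bench/FastChat/fastchat/llm_judge/data/mt_bench/question.jsonl",
    "gsm8k": "path/to/gsm8k/questions.jsonl",  # assumption, adjust as needed
}

# Matches integers and decimals, optionally negative, with thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text):
    """Return the last number in the text as a float, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def is_correct(generated, reference, tol=1e-6):
    """Compare the last number in a generated answer with a GSM8K-style reference.

    Assumes the reference solution ends with '#### <number>'.
    """
    _, marker, tail = reference.rpartition("####")
    expected = last_number(tail) if marker else None
    predicted = last_number(generated)
    if expected is None or predicted is None:
        return False
    return abs(predicted - expected) <= tol
```

The idea would be to use something like `is_correct(candidate, reference)` as the scoring signal for GSM8K instead of the pairwise judging used for "mt_bench" and "arena_hard_auto", but I don't know whether this matches how ITAS is meant to score candidates.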
Thanks in advance!