[NAACL'25 Findings] Self-Training Large Language Models for Tool-Use Without Demonstrations
Dependency details are listed in environment.yml.
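For example, the environment can be created with conda:
conda env create -f environment.yml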
Download the dataset from HuggingFace and sample subsets for the experiments:
mkdir -p data
python toolusellm/generate_dataset.py
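As a rough illustration of what this step does (the dataset config, subset size, output path, and fields below are assumptions, not the script's actual arguments):

# Illustrative sketch of downloading TriviaQA and sampling a subset;
# the actual logic lives in toolusellm/generate_dataset.py.
import json
import random

from datasets import load_dataset

random.seed(0)
train = load_dataset("trivia_qa", "rc.nocontext", split="train")
subset = train.select(random.sample(range(len(train)), k=1000))  # hypothetical subset size

with open("data/triviaqa-subset-train.jsonl", "w") as f:  # hypothetical path
    for example in subset:
        f.write(json.dumps({"question": example["question"],
                            "answer": example["answer"]["value"]}) + "\n")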
To generate training data for supervised fine-tuning or preference fine-tuning, follow the TriviaQA example below:
- Data generation: specify the dataset name, model name, and subset in prompt.sh, run model inference on the TriviaQA training set, and save the output JSONL file to results/triviaqa-subset-train.jsonl:
sh prompt.sh
- Data filtering: keep the "correct" tool-use traces according to the specified metric (see the sketch after the command):
python toolusellm/prepare_data.py \
--input_json results/triviaqa-subset-train.jsonl \
--output_json training_data/sft.triviaqa.train.acc.jsonl \
--data_type sft \
--metric acc \
--dataset triviaqa
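As a sketch of the filtering idea (the JSONL field names and the accuracy check here are assumptions, not the repo's implementation):

# Sketch of the "keep correct traces" filtering step; prepare_data.py is the
# authoritative implementation, and the field names below are assumed.
import json

def is_correct(prediction: str, answers: list[str]) -> bool:
    # Crude accuracy check: normalized prediction matches any gold answer.
    norm = prediction.strip().lower()
    return any(norm == a.strip().lower() for a in answers)

with open("results/triviaqa-subset-train.jsonl") as fin, \
     open("training_data/sft.triviaqa.train.acc.jsonl", "w") as fout:
    for line in fin:
        record = json.loads(line)
        if is_correct(record["prediction"], record["answers"]):
            fout.write(json.dumps({"text": record["trace"]}) + "\n")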
Supervised fine-tuning experiments:
sh sft.sh
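The training code is built on HuggingFace trl, so a minimal sketch of what an SFT run might look like follows; the model name, paths, and hyperparameters are placeholders, not the values set in sft.sh:

# Minimal SFT sketch with trl; placeholders only.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset(
    "json",
    data_files="training_data/sft.triviaqa.train.acc.jsonl",
    split="train",
)
config = SFTConfig(
    output_dir="checkpoints/sft-triviaqa",  # hypothetical output dir
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=2e-5,
)
trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    args=config,
    train_dataset=dataset,  # SFTTrainer reads the "text" field by default
)
trainer.train()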
Preference fine-tuning experiments:
sh pft.sh
Note: before running the shell scripts, set the variables (e.g., model name and dataset name) in the scripts accordingly.
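For preference fine-tuning, one hedged sketch uses trl's DPOTrainer; whether pft.sh uses DPO specifically, along with the data path and fields below, is an assumption:

# Hedged preference fine-tuning sketch; the repo's actual objective and
# hyperparameters are set in pft.sh and may differ.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model name
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Assumes pairwise preference data with "prompt", "chosen", and "rejected" fields.
dataset = load_dataset(
    "json",
    data_files="training_data/pft.triviaqa.train.acc.jsonl",  # hypothetical path
    split="train",
)
config = DPOConfig(output_dir="checkpoints/pft-triviaqa", beta=0.1)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()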
Compute Exact Match and Accuracy:
python evaluation/compute_score.py \
--json ${result_jsonl} \
--dataset ${dataset}
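For reference, exact match is typically computed with SQuAD/TriviaQA-style answer normalization; a sketch (evaluation/compute_score.py is the authoritative implementation):

import re
import string

def normalize(text: str) -> str:
    # Lowercase, drop punctuation and articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    return normalize(prediction) in {normalize(a) for a in gold_answers}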
Compute Invoke Rate, Pass Rate and Answerable Rate:
python evaluation/compute_rate.py \
--json ${result_jsonl}
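A rough sketch of how such rates can be derived from the result JSONL; the trace fields and the exact metric definitions here are assumptions, not those of compute_rate.py:

# Rough sketch only: assumes each record notes whether a tool was invoked,
# whether the call executed successfully, and whether an answer was produced.
import json

def compute_rates(path: str) -> dict:
    with open(path) as f:
        records = [json.loads(line) for line in f]
    n = len(records)
    invoked = [r for r in records if r.get("tool_invoked")]        # assumed field
    passed = [r for r in invoked if r.get("tool_call_succeeded")]  # assumed field
    answered = [r for r in records if r.get("answer")]             # assumed field
    return {
        "invoke_rate": len(invoked) / n,
        "pass_rate": len(passed) / max(len(invoked), 1),
        "answerable_rate": len(answered) / n,
    }

print(compute_rates("results/triviaqa-subset-train.jsonl"))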
Note: set ${result_jsonl} and ${dataset} as needed.
The model inference and training code in this repo is built on HuggingFace trl, transformers, and peft.
The evaluation implementation incorporates code from mandarjoshi90/triviaqa, nelson-liu/lost-in-the-middle, and EleutherAI/lm-evaluation-harness.
The tool implementations adapt code from ernie-research/Tool-Augmented-Reward-Model and lucidrains/toolformer-pytorch.
A heartfelt thank you to the authors and contributors of these projects for their invaluable work and open-source contributions!