ambroiseodt/itl

🛠️ Provable Benefits of In-Tool Learning (ITL)

Sam Houliston*, Ambroise Odonnat*, Charles Arnal*, Vivien Cabannes*. *Equal contribution.

Our codebase provides utilities to train and study large language models from a memory and generalization perspective. It relies mainly on PyTorch primitives rather than on high-level LLM libraries, which makes it easy for researchers and practitioners to prototype and modify. It can be used to reproduce the experiments of the paper Provable Benefits of In-Tool Learning for Large Language Models, in which we show that tool-augmented workflows are not only practical but also provably more scalable.

  • ⚙️ In-tool learning: learning to use a tool (e.g., a calculator or a request to a database) to solve a problem,
  • 🏋🏽 In-weight learning: memorizing the solution to a problem within the model's weights.
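The distinction between the two regimes can be illustrated with a toy sketch. It is not taken from this repository or from the paper: the class names, the dictionary standing in for a database, and the `stored_facts` counter are all hypothetical, and plain lookups replace any actual language model. It only shows that an in-weight learner must hold every fact itself, while an in-tool learner holds a query rule and leaves the facts in the external tool.

```python
# Toy illustration only: every name here is hypothetical, not code from this repository.


class InWeightModel:
    """Memorizes each answer, so its stored state grows with the number of facts."""

    def __init__(self, facts):
        self.memory = dict(facts)  # private copy, frozen at "training" time

    def answer(self, key):
        return self.memory.get(key)

    def stored_facts(self):
        return len(self.memory)


class InToolModel:
    """Knows only how to query the tool; the facts stay in the external database."""

    def __init__(self, database):
        self.database = database  # shared reference to the external tool

    def answer(self, key):
        return self.database.get(key)

    def stored_facts(self):
        return 0  # nothing is memorized in the model itself


database = {"Paris": "France", "Tokyo": "Japan"}
in_weight = InWeightModel(database)
in_tool = InToolModel(database)

# A new fact is added to the database after both models were built.
database["Lima"] = "Peru"

print(in_weight.answer("Lima"))  # None: the fact was never memorized
print(in_tool.answer("Lima"))    # Peru: the tool is queried at answer time
```

In this sketch the in-weight model would need to be rebuilt to learn the new fact, whereas the in-tool model picks it up without any change to its own state.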

News

Mar 03, 2026 🥳 Our paper has been accepted at the ICLR 2026 Workshop on MemAgents!

Overview

Our codebase is structured as follows:

🛠️ itl
┣ 📂src # Core library NanoLlama
┃ ┣ 📂nanollama
┃   ┣ 📂agent
┃   ┣ 📂data
┃   ┣ 📂inference
┃   ┣ 📂model
┃   ┣ 📂monitor
┃   ┣ 📂visualization
┃   ┣ 📄__init__.py
┃   ┣ 📄distributed.py
┃   ┣ 📄launcher.py
┃   ┣ 📄optim.py
┃   ┣ 📄tokenizer.py
┃   ┗ 📄utils.py
┣ 📂test # Unit tests
┗ 📂apps # In-tool learning with Nanollama
  ┣ 📂memory # Controlled study of memory load
  ┃ ┣ 📂compressibility
  ┃ ┣ 📂configs
  ┃ ┣ 📂datasets
  ┃ ┣ 📂generalization
  ┃ ┣ 📂plots
  ┃ ┣ 📂scripts
  ┃ ┣ 📄__init__.py
  ┃ ┣ 📄README.md
  ┃ ┣ 📄args.py
  ┃ ┣ 📄eval.py
  ┃ ┣ 📄local_grid.py
  ┃ ┣ 📄prompt_loader.py
  ┃ ┗ 📄train.py
  ┣ 📂large_scale # Large-scale experiments
  ┃ ┣ 📂data
  ┃ ┣ 📂training
  ┃ ┣ 📂evaluation
  ┃ ┣ 📂extension
  ┃ ┣ 📂plots
  ┃ ┣ 📄__init__.py
  ┗ ┗ 📄README.md

The folder src/nanollama contains the most reusable components, which can be put together in the apps folder for various applications. The code in apps/memory can be used to study the memory load of in-tool learning in a controlled setting and the code in apps/large_scale can be used to study in-tool learning at large scale.

Getting started

We provide below the instructions to install the library and start launching experiments.

Note

LLM libraries such as datasets, transformers, trl, or lm-evaluation-harness are subject to frequent changes which might impact the behavior of the codebase. If you encounter any issues, don't hesitate to reach out.

Installation

The code runs on Python 3.10+. Installation instructions follow:

  • Install miniconda. Follow the instructions online; most likely you will execute the following commands (the installer is downloaded to the current directory).
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
  • Install Python in a new conda environment (be mindful to install a Python version compatible with PyTorch):
conda create -n llm python==3.12
conda activate llm
  • Install this repo (be mindful to install a PyTorch version compatible with your CUDA driver; run nvidia-smi to check your CUDA driver):
git clone <repo url>
cd <repo path>
pip install -e .

Optional dependencies, e.g. the LLM ones, can be added by swapping the previous command for the following one:

pip install -e ".[llm]"

More details are given in the README files of the apps folders.
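After installing, a quick sanity check of the environment can save time. The sketch below is not part of the repository: the helper names `python_ok` and `torch_report` are hypothetical. It checks the Python 3.10+ requirement stated above and, if PyTorch is importable, reports the CUDA version it was built against and whether a GPU is visible.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this README


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def torch_report():
    """Describe the installed PyTorch build, or report that it is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print("Python OK:", python_ok())
print("PyTorch:", torch_report())
```

If `cuda_available` is False on a machine with a GPU, comparing `cuda_build` with the driver version shown by nvidia-smi is a reasonable first step.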

Using HuggingFace pretrained models

Some models, e.g., the Llama ones, are gated: users must request access and log in to the Hugging Face Hub before using them in the scripts. See https://huggingface.co/docs/hub/en/models-gated for more information.
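One possible way to log in from Python is sketched below. It assumes the optional `huggingface_hub` package is installed and that an access token is stored in the `HF_TOKEN` environment variable; the helper names `resolve_hf_token` and `login_if_possible` are hypothetical and do not come from this repository.

```python
import os


def resolve_hf_token(env=None):
    """Return the token found in HF_TOKEN, or None if it is unset or blank."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def login_if_possible(token):
    """Log in to the Hugging Face Hub; return False if that cannot be attempted."""
    if token is None:
        return False
    try:
        from huggingface_hub import login  # optional dependency
    except ImportError:
        return False
    login(token=token)
    return True
```

Calling `login_if_possible(resolve_hf_token())` before launching a script that loads a gated model would then authenticate the session. Access to the model itself still has to be requested on its Hub page.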

Development

To verify your installation, run the unit tests with the following command at the root of this repository:

python -m unittest

It should return OK.
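For readers adding their own tests, a minimal sketch of a `unittest` test case is shown below. It is purely illustrative: `count_words` and `TestCountWords` are hypothetical and do not exist in the repository's test folder. The `run_tests` helper runs the case programmatically; under `python -m unittest`, test classes like this one are collected automatically by test discovery.

```python
import unittest


def count_words(text):
    """Hypothetical function under test: count whitespace-separated words."""
    return len(text.split())


class TestCountWords(unittest.TestCase):
    def test_simple_sentence(self):
        self.assertEqual(count_words("in tool learning"), 3)

    def test_empty_string(self):
        self.assertEqual(count_words(""), 0)


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCountWords)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

`run_tests().wasSuccessful()` returns True when every test passes, which corresponds to the OK status printed by the command-line runner.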

Launching jobs

Our codebase supports launching jobs with and without Slurm. See apps/memory/README.md for more details.

Reproducing our experiments

Instructions to reproduce the experiments in our paper can be found in apps/memory/README.md and apps/large_scale/README.md.

Acknowledgments

This repository builds heavily on lingua and pal, which provide easy-to-use code to train and play with LLMs.

  • Vivien Cabannes implemented the controlled-experiment components, including the small-Llama3 model implementation and the pretraining pipeline.
  • Sam Houliston implemented the large-scale experiment components, including the dataset creation, fine-tuning, and evaluation pipeline.
  • Ambroise Odonnat worked on the controlled experiment on compressibility, implemented the TriviaQA extension, and maintains the code.
  • Charles Arnal implemented the TriviaQA extension.

License

The codebase is licensed under the CC BY-NC 4.0 License.

Citation

If you find this repository useful, please consider giving it a star ⭐ and citing us as:

@inproceedings{houliston2026itl,
title={Tool use is provably more scalable than in-weight memory for Large Language Models},
author={Sam Houliston and Ambroise Odonnat and Charles Arnal and Vivien Cabannes},
booktitle={ICLR 2026 Workshop on Memory for LLM-Based Agentic Systems},
year={2026},
url={https://openreview.net/forum?id=s7IRNX6FUs}
}
