Some user requests require experimenting with multiple algorithms/recsys libraries (e.g. lenskit and recbole).
In the current agent loop, the model returns a single Python source string, which is written to a single .py file and executed.
This strongly discourages modularity (separate runner scripts) and makes larger experiments painful to maintain or iterate on.
This is especially relevant because LensKit and RecBole are typically used through our wrapper package "omnirec".
Current behavior (repo-specific)
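Execution pipeline today:
- The model returns a single code string (see PlanAndCode in https://github.com/ISG-Siegen/AutoRecLab/blob/develop/treesearch/function_specs.py).
- The type checker writes it to runfile.py in the workspace and runs "uv run ty check" on that single file (see https://github.com/ISG-Siegen/AutoRecLab/blob/develop/treesearch/type_checker.py).
- The interpreter writes it to runfile.py in the workspace and runs "uv run runfile.py" (see https://github.com/ISG-Siegen/AutoRecLab/blob/develop/treesearch/interpreter.py).
- The checkpoint stores the code as code.py and moves generated artifacts from out/workspace/ and out/workspace/working/* into checkpoint/<node-id>/generated/ (see https://github.com/ISG-Siegen/AutoRecLab/blob/develop/treesearch/search.py).
Important nuance: the checkpoint stores code.py, but the interpreter actually executes runfile.py.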
Why this is limiting
- Large monolithic scripts are harder to debug and review.
- No way to keep “experiment A” and “experiment B” as separate modules.
Goal
Enable the agent to produce and execute a multi-file experiment bundle:
- Multiple .py files created per node/run
- A clear entrypoint file to execute
- Type checking across the bundle (at least the entrypoint + its local imports)
- All files captured in the checkpoint for reproducibility
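One way the bundle described in the list above could be represented is sketched here. This is an illustration under assumptions, not a proposal for the final API: ExperimentBundle, materialize, and snapshot_to_checkpoint are hypothetical names that do not exist in the repo, and copying the bundle into a checkpoint directory with its layout preserved is a guess at how "all files captured" might look.

```python
from __future__ import annotations

import shutil
from dataclasses import dataclass
from pathlib import Path, PurePosixPath


@dataclass(frozen=True)
class ExperimentBundle:
    """Hypothetical container for a multi-file experiment (not an existing repo type)."""

    files: dict[str, str]  # relative POSIX path -> Python source
    entrypoint: str        # key in `files` that should be executed

    def validate(self) -> None:
        if self.entrypoint not in self.files:
            raise ValueError(f"entrypoint {self.entrypoint!r} is not part of the bundle")
        for rel in self.files:
            path = PurePosixPath(rel)
            # Reject absolute paths, parent traversal, and non-.py files
            # so that everything stays inside the bundle root.
            if path.is_absolute() or ".." in path.parts or path.suffix != ".py":
                raise ValueError(f"invalid bundle path: {rel!r}")

    def materialize(self, root: Path) -> Path:
        """Write every file under `root` and return the entrypoint path."""
        self.validate()
        for rel, source in self.files.items():
            target = root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(source, encoding="utf-8")
        return root / self.entrypoint


def snapshot_to_checkpoint(bundle_root: Path, checkpoint_dir: Path) -> list[Path]:
    """Copy all .py files of the bundle into the checkpoint, preserving the layout."""
    copied: list[Path] = []
    for src in sorted(bundle_root.rglob("*.py")):
        dst = checkpoint_dir / src.relative_to(bundle_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

With this shape, the file that gets executed and the files stored in the checkpoint come from the same directory tree, which would also avoid a code.py versus runfile.py style naming mismatch.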
Type checking across multiple files
ty check can be invoked on:
- a directory, or
- multiple file paths
For bundles, we likely want to run ty check on the bundle root directory (workspace subdir) or the entrypoint plus all .py files in the bundle.
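The two invocation styles mentioned above could be built roughly as follows. This is a sketch only: build_ty_command and type_check_bundle are hypothetical helpers, and the only CLI behavior relied on is what this issue already states (the checker runs "uv run ty check", and ty check accepts a directory or multiple file paths). Running with the bundle root as the working directory is an assumption about how local imports would best resolve and has not been verified against ty's module resolution.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_ty_command(bundle_root: Path, per_file: bool = False) -> list[str]:
    """Return the argv for type checking a bundle (hypothetical helper).

    per_file=False: check the bundle root directory as a whole.
    per_file=True:  pass every .py file in the bundle explicitly.
    Paths are relative to bundle_root, which is used as the working directory.
    """
    cmd = ["uv", "run", "ty", "check"]
    if not per_file:
        return cmd + ["."]
    files = sorted(p.relative_to(bundle_root).as_posix() for p in bundle_root.rglob("*.py"))
    if not files:
        raise ValueError(f"no .py files found under {bundle_root}")
    return cmd + files


def type_check_bundle(bundle_root: Path, per_file: bool = False) -> subprocess.CompletedProcess[str]:
    """Run the type check from inside the bundle root and capture its output."""
    return subprocess.run(
        build_ty_command(bundle_root, per_file),
        cwd=bundle_root,
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is a result to report, not an exception
    )
```

Checking the directory keeps the command short and picks up new files automatically, while the explicit file list makes it visible in logs exactly which modules were checked.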