Support multi-file experiment bundles #33

Some user requests require experimenting with multiple algorithms or recsys libraries (e.g. LensKit and RecBole).
In the current agent loop, the model returns a single Python source string, which is written to a single `.py` file and executed.
This strongly discourages modularity (e.g. separate runner scripts) and makes larger experiments painful to maintain or iterate on.

This is especially relevant because LensKit and RecBole are typically used through our wrapper package "omnirec".

## Current behavior (repo-specific)
Execution pipeline today:
- The LLM returns a single `code` string (see `PlanAndCode` in https://github.com/ISG-Siegen/AutoRecLab/blob/develop/treesearch/function_specs.py).
- Type checking writes that string to `runfile.py` in the workspace and runs `uv run ty check` on that single file (see https://github.com/ISG-Siegen/AutoRecLab/blob/develop/treesearch/type_checker.py).
- Execution writes that string to `runfile.py` in the workspace and runs `uv run runfile.py` (see https://github.com/ISG-Siegen/AutoRecLab/blob/develop/treesearch/interpreter.py).
- Checkpointing stores a copy as `code.py` and moves generated artifacts from `out/workspace/` and `out/workspace/working/*` into `checkpoint/<node-id>/generated/` (see https://github.com/ISG-Siegen/AutoRecLab/blob/develop/treesearch/search.py).

Important nuance: the checkpoint stores `code.py`, but the interpreter actually executes `runfile.py`.
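
With several files per node, the naming mismatch above would become harder to reason about. A minimal sketch of one way to avoid it: copy every `.py` file into the checkpoint under the same relative path it was executed under. The function name `checkpoint_bundle` and the `code/` subdirectory are hypothetical and not part of the current `search.py`; the sketch also assumes the checkpoint directory lives outside the bundle root.

```python
import shutil
from pathlib import Path


def checkpoint_bundle(bundle_root: Path, checkpoint_dir: Path) -> list[Path]:
    """Copy all .py files under `bundle_root` into `checkpoint_dir/code/`.

    Relative paths are preserved, so the checkpoint mirrors what was executed.
    `code/` is a hypothetical layout, not the repo's current convention.
    """
    copied: list[Path] = []
    # sorted() materializes the listing before any file is copied.
    for source in sorted(bundle_root.rglob("*.py")):
        relative = source.relative_to(bundle_root)
        target = checkpoint_dir / "code" / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        copied.append(relative)
    return copied
```

The returned relative paths could be stored alongside the node so that a later run knows which file was the entrypoint.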

## Why this is limiting
- Large monolithic scripts are harder to debug and review.
- No way to keep “experiment A” and “experiment B” as separate modules.

## Goal
Enable the agent to produce and execute a *multi-file experiment bundle*:
- Multiple `.py` files created per node/run
- A clear entrypoint file to execute
- Type checking across the bundle (at least the entrypoint + its local imports)
- All files captured in the checkpoint for reproducibility
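
As a starting point for discussion, the bundle described above could be modelled as a mapping from relative paths to source strings plus an entrypoint name. This is only a sketch: `ExperimentBundle`, its fields, and its methods are hypothetical and do not exist in AutoRecLab today; the current `PlanAndCode` spec only carries a single `code` string.

```python
from dataclasses import dataclass
from pathlib import Path, PurePosixPath


@dataclass(frozen=True)
class ExperimentBundle:
    """Hypothetical multi-file result returned by the LLM for one node."""

    files: dict[str, str]  # relative POSIX path -> Python source
    entrypoint: str        # key in `files` that should be executed

    def validate(self) -> None:
        """Reject bundles with a missing entrypoint or unsafe file names."""
        if self.entrypoint not in self.files:
            raise ValueError(f"entrypoint {self.entrypoint!r} is not in the bundle")
        for name in self.files:
            path = PurePosixPath(name)
            if path.is_absolute() or ".." in path.parts:
                raise ValueError(f"unsafe path in bundle: {name!r}")
            if path.suffix != ".py":
                raise ValueError(f"only .py files are expected: {name!r}")

    def write_to(self, root: Path) -> Path:
        """Write all files below `root` and return the entrypoint path."""
        self.validate()
        for name, source in self.files.items():
            target = root.joinpath(*PurePosixPath(name).parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(source, encoding="utf-8")
        return root.joinpath(*PurePosixPath(self.entrypoint).parts)
```

For example, a bundle could contain `main.py` as the entrypoint plus `runners/run_lenskit.py` and `runners/run_recbole.py` (illustrative file names). The interpreter would then execute the path returned by `write_to` instead of a fixed `runfile.py`.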


## Type checking across multiple files
`ty check` can be invoked on:
- a directory, or
- multiple file paths

For bundles, we likely want to run `ty check` either on the bundle root directory (a subdirectory of the workspace) or on the entrypoint plus every other `.py` file in the bundle.
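
The two invocation styles could be built roughly as follows. This is a sketch, not the current `type_checker.py`: the helper name `build_ty_check_command` and its parameters are hypothetical, and it only constructs the argument list around the existing `uv run ty check` call without running it.

```python
from pathlib import Path
from typing import Optional, Sequence


def build_ty_check_command(
    bundle_root: Path, files: Optional[Sequence[str]] = None
) -> list[str]:
    """Return the argv for type checking a bundle.

    With no `files`, the whole bundle directory is checked. Otherwise only the
    listed paths (relative to `bundle_root`) are passed to `ty check`.
    """
    command = ["uv", "run", "ty", "check"]
    if files:
        command.extend(str(bundle_root / name) for name in files)
    else:
        command.append(str(bundle_root))
    return command
```

Checking the directory is the simpler option and picks up every file in the bundle; passing explicit paths keeps the check limited to the files the model actually produced if the workspace also holds other Python files.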

