| 1 | +# Rename task_data to environment_data in Tests and Examples |
| 2 | + |
| 3 | +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. |
| 4 | + |
| 5 | +**Goal:** Eliminate confusing `task_data` naming in tests and examples where the value is actually `environment_data` |
| 6 | + |
| 7 | +**Architecture:** Pure rename refactor. No behavioral changes. The production code (base class + all 6 environments + all benchmark instantiations) already uses `environment_data` consistently. This plan fixes tests, fixtures, and examples that still use the old `task_data` naming. |
| 8 | + |
| 9 | +**Tech Stack:** Python, pytest |
| 10 | + |
| 11 | +**Note:** The MACS real_data tests have a **bug** where `{"environment_data": task.environment_data}` wraps environment_data in an extra dict, causing `setup_state` to silently get `tools=[]`. This is fixed as part of the rename. |
| 12 | + |
| 13 | +--- |
| 14 | + |
| 15 | +### Task 1: Rename MACS test fixture and local variables |
| 16 | + |
| 17 | +**Files:** |
| 18 | +- Modify: `tests/test_benchmarks/test_macs/conftest.py:428-432` |
| 19 | +- Modify: `tests/test_benchmarks/test_macs/test_macs_environment.py` (all `sample_task_data` and `task_data` references) |
| 20 | +- Modify: `tests/test_benchmarks/test_macs/test_macs_integration.py:136-178` (local `task_data` variables) |
| 21 | + |
| 22 | +- [ ] **Step 1: Rename fixture in conftest.py** |
| 23 | + |
| 24 | +In `tests/test_benchmarks/test_macs/conftest.py`, rename the fixture from `sample_task_data` to `sample_environment_data`: |
| 25 | + |
| 26 | +```python |
| 27 | +@pytest.fixture |
| 28 | +def sample_environment_data(sample_tool_specs): |
| 29 | + """Sample environment data dict for MACSEnvironment creation.""" |
| 30 | + return { |
| 31 | + "tools": sample_tool_specs, |
| 32 | + } |
| 33 | +``` |
| 34 | + |
| 35 | +- [ ] **Step 2: Rename all references in test_macs_environment.py** |
| 36 | + |
| 37 | +In `tests/test_benchmarks/test_macs/test_macs_environment.py`, replace all occurrences of `sample_task_data` with `sample_environment_data`, and all local variables named `task_data` with `environment_data`. Examples: |
| 38 | + |
| 39 | +```python |
| 40 | +# Before |
| 41 | +def test_init_extracts_tool_specs(self, macs_model_factory, sample_task_data): |
| 42 | + env = MACSEnvironment(sample_task_data, macs_model_factory) |
| 43 | + |
| 44 | +# After |
| 45 | +def test_init_extracts_tool_specs(self, macs_model_factory, sample_environment_data): |
| 46 | + env = MACSEnvironment(sample_environment_data, macs_model_factory) |
| 47 | +``` |
| 48 | + |
| 49 | +```python |
| 50 | +# Before (local variable) |
| 51 | +task_data = {"tools": [...]} |
| 52 | +env = MACSEnvironment(task_data, macs_model_factory) |
| 53 | + |
| 54 | +# After |
| 55 | +environment_data = {"tools": [...]} |
| 56 | +env = MACSEnvironment(environment_data, macs_model_factory) |
| 57 | +``` |
| 58 | + |
| 59 | +- [ ] **Step 3: Rename in test_macs_integration.py** |
| 60 | + |
| 61 | +```python |
| 62 | +# Before |
| 63 | +task_data = {"tools": [...]} |
| 64 | +env = MACSEnvironment(task_data, macs_model_factory) |
| 65 | + |
| 66 | +# After |
| 67 | +environment_data = {"tools": [...]} |
| 68 | +env = MACSEnvironment(environment_data, macs_model_factory) |
| 69 | +``` |
| 70 | + |
| 71 | +- [ ] **Step 4: Run MACS tests to verify** |
| 72 | + |
| 73 | +Run: `uv run pytest tests/test_benchmarks/test_macs/test_macs_environment.py tests/test_benchmarks/test_macs/test_macs_integration.py -v` |
| 74 | +Expected: All tests PASS (no behavioral change, only renames) |
| 75 | + |
| 76 | +- [ ] **Step 5: Commit** |
| 77 | + |
| 78 | +```bash |
| 79 | +git add tests/test_benchmarks/test_macs/conftest.py tests/test_benchmarks/test_macs/test_macs_environment.py tests/test_benchmarks/test_macs/test_macs_integration.py |
| 80 | +git commit -m "test(macs): rename task_data to environment_data in test fixtures and variables" |
| 81 | +``` |
| 82 | + |
| 83 | +--- |
| 84 | + |
| 85 | +### Task 2: Fix MACS real_data test bug and rename |
| 86 | + |
| 87 | +**Files:** |
| 88 | +- Modify: `tests/test_benchmarks/test_macs/test_macs_integration_real_data.py:64,86` |
| 89 | + |
| 90 | +- [ ] **Step 1: Fix the wrapping bug and rename** |
| 91 | + |
| 92 | +Lines 64 and 86 currently pass `{"environment_data": task.environment_data}`, which wraps `environment_data` in an extra dict. `MACSEnvironment.setup_state` calls `environment_data.get("tools", [])` on that wrapper, finds no `"tools"` key, and silently produces an empty tools list. Fix by passing `task.environment_data` directly: |
| 93 | + |
| 94 | +```python |
| 95 | +# Before (line 64) |
| 96 | +env = MACSEnvironment({"environment_data": task.environment_data}, macs_model_factory) |
| 97 | + |
| 98 | +# After |
| 99 | +env = MACSEnvironment(task.environment_data, macs_model_factory) |
| 100 | +``` |
| 101 | + |
| 102 | +```python |
| 103 | +# Before (line 86) |
| 104 | +env = MACSEnvironment({"environment_data": task.environment_data}, macs_model_factory) |
| 105 | + |
| 106 | +# After |
| 107 | +env = MACSEnvironment(task.environment_data, macs_model_factory) |
| 108 | +``` |
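The failure mode described in this step can be reproduced with plain dicts. The sketch below is illustrative only: `extract_tools` is a hypothetical stand-in for the `.get("tools", [])` lookup attributed to `setup_state` above, and the tool spec is made-up sample data, not taken from the MACS dataset.

```python
from typing import Any, Dict, List


def extract_tools(environment_data: Dict[str, Any]) -> List[Any]:
    """Hypothetical stand-in for the lookup described above."""
    # A missing "tools" key falls back to [] without raising.
    return environment_data.get("tools", [])


# Made-up sample payload; real tool specs will look different.
environment_data = {"tools": [{"name": "search_flights"}]}

# Old call site: the payload is nested one level too deep.
wrapped = {"environment_data": environment_data}
tools_from_wrapped = extract_tools(wrapped)

# Fixed call site: the payload is passed as-is.
tools_from_direct = extract_tools(environment_data)

print(tools_from_wrapped)
print(tools_from_direct)
```

Because the default swallows the missing key, nothing fails at construction time, which is probably why the wrapped form went unnoticed in the tests.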
| 109 | + |
| 110 | +- [ ] **Step 2: Commit** |
| 111 | + |
| 112 | +```bash |
| 113 | +git add tests/test_benchmarks/test_macs/test_macs_integration_real_data.py |
| 114 | +git commit -m "fix(macs): pass environment_data directly instead of wrapping in extra dict |
| 115 | + |
| 116 | +The old code wrapped task.environment_data in {\"environment_data\": ...}, |
| 117 | +causing setup_state to silently get tools=[] via .get(\"tools\", [])." |
| 118 | +``` |
| 119 | + |
| 120 | +--- |
| 121 | + |
| 122 | +### Task 3: Rename task_data in TAU2 test |
| 123 | + |
| 124 | +**Files:** |
| 125 | +- Modify: `tests/test_benchmarks/test_tau2/test_environment.py:1142-1143` |
| 126 | + |
| 127 | +- [ ] **Step 1: Rename local variable** |
| 128 | + |
| 129 | +```python |
| 130 | +# Before |
| 131 | +task_data = {"domain": "retail"} |
| 132 | +constructor = get_environment_constructor(task_data) |
| 133 | + |
| 134 | +# After |
| 135 | +environment_data = {"domain": "retail"} |
| 136 | +constructor = get_environment_constructor(environment_data) |
| 137 | +``` |
| 138 | + |
| 139 | +- [ ] **Step 2: Run TAU2 tests to verify** |
| 140 | + |
| 141 | +Run: `uv run pytest tests/test_benchmarks/test_tau2/test_environment.py -v -k "test_replay"` |
| 142 | +Expected: PASS |
| 143 | + |
| 144 | +- [ ] **Step 3: Commit** |
| 145 | + |
| 146 | +```bash |
| 147 | +git add tests/test_benchmarks/test_tau2/test_environment.py |
| 148 | +git commit -m "test(tau2): rename task_data to environment_data for clarity" |
| 149 | +``` |
| 150 | + |
| 151 | +--- |
| 152 | + |
| 153 | +### Task 4: Update examples |
| 154 | + |
| 155 | +**Files:** |
| 156 | +- Modify: `examples/five_a_day_benchmark/five_a_day_benchmark.py:135-153` |
| 157 | +- Modify: `examples/five_a_day_benchmark/five_a_day_benchmark.ipynb` (corresponding cells) |
| 158 | +- Modify: `examples/introduction/tutorial.ipynb` (cells using `task_data`) |
| 159 | +- Modify: `docs/guides/usage-tracking.md:325-326` |
| 160 | + |
| 161 | +- [ ] **Step 1: Update five_a_day_benchmark.py** |
| 162 | + |
| 163 | +The `FiveADayEnvironment` constructor and `setup_state` both take a parameter named `task_data`, and `setup_state` then reads `task_data["environment_data"]` from it. This is a user-facing example that should model correct naming. |
| 164 | + |
| 165 | +```python |
| 166 | +# Before (line 135) |
| 167 | +def __init__(self, task_data: Dict[str, Any], framework: str, callbacks: Optional[List] = None): |
| 168 | + ... |
| 169 | + super().__init__(task_data, callbacks) |
| 170 | + |
| 171 | +def setup_state(self, task_data: Dict[str, Any]) -> Dict[str, Any]: |
| 172 | + ... |
| 173 | + env_data = task_data["environment_data"].copy() |
| 174 | + |
| 175 | +# After |
| 176 | +def __init__(self, environment_data: Dict[str, Any], framework: str, callbacks: Optional[List] = None): |
| 177 | + ... |
| 178 | + super().__init__(environment_data, callbacks) |
| 179 | + |
| 180 | +def setup_state(self, environment_data: Dict[str, Any]) -> Dict[str, Any]: |
| 181 | + ... |
| 182 | + env_data = environment_data.copy() |
| 183 | +``` |
| 184 | + |
| 185 | +Also update the instantiation site (around line 743) where `task_data` dict is constructed. Note: this example builds a custom dict with `{"environment_data": {...}}` and passes the whole thing — it needs to pass just the inner environment_data dict directly, matching how the base class now works. |
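A minimal sketch of the intended call-site shape follows. The `Environment` class here is a hypothetical stand-in, written on the assumption that the real base class forwards its first constructor argument to `setup_state`; the `"tools"` key and the `framework` value are placeholders, not the example's actual data.

```python
from typing import Any, Dict, List, Optional


class Environment:
    """Hypothetical stand-in for the base class (assumed behavior)."""

    def __init__(self, environment_data: Dict[str, Any], callbacks: Optional[List] = None):
        self.callbacks = callbacks or []
        # Assumption: the base class hands its argument to setup_state.
        self.state = self.setup_state(environment_data)

    def setup_state(self, environment_data: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class FiveADayEnvironment(Environment):
    def __init__(self, environment_data: Dict[str, Any], framework: str, callbacks: Optional[List] = None):
        self.framework = framework
        super().__init__(environment_data, callbacks)

    def setup_state(self, environment_data: Dict[str, Any]) -> Dict[str, Any]:
        # Copy so later mutation does not leak back into the caller's dict.
        env_data = environment_data.copy()
        return env_data


# Before: task_data = {"environment_data": {...}} was passed whole.
# After: build and pass only the inner dict.
environment_data = {"tools": ["email", "calendar"]}  # placeholder keys
env = FiveADayEnvironment(environment_data, framework="example_framework")
print(env.state)
```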
| 186 | + |
| 187 | +- [ ] **Step 2: Update five_a_day_benchmark.ipynb** |
| 188 | + |
| 189 | +Mirror the same changes from step 1 in the notebook version. |
| 190 | + |
| 191 | +- [ ] **Step 3: Update tutorial.ipynb** |
| 192 | + |
| 193 | +The tutorial defines a custom Environment subclass with `setup_state(self, task_data)`; rename that parameter to `environment_data`. The separate `task_data` exploration variable (which indexes into the tasks list) holds a Task dict from the data loader, not environment data, so rename it to `task` or `task_dict` to keep the two concepts distinct. |
| 194 | + |
| 195 | +- [ ] **Step 4: Update docs/guides/usage-tracking.md** |
| 196 | + |
| 197 | +```python |
| 198 | +# Before |
| 199 | +def __init__(self, task_data): |
| 200 | + super().__init__(task_data) |
| 201 | + |
| 202 | +# After |
| 203 | +def __init__(self, environment_data): |
| 204 | + super().__init__(environment_data) |
| 205 | +``` |
| 206 | + |
| 207 | +- [ ] **Step 5: Commit** |
| 208 | + |
| 209 | +```bash |
| 210 | +git add examples/ docs/guides/usage-tracking.md |
| 211 | +git commit -m "docs: rename task_data to environment_data in examples and guides" |
| 212 | +``` |
| 213 | + |
| 214 | +--- |
| 215 | + |
| 216 | +### Task 5: Verify all tests pass |
| 217 | + |
| 218 | +- [ ] **Step 1: Run full test suite** |
| 219 | + |
| 220 | +Run: `uv run pytest -v` |
| 221 | +Expected: All tests PASS |
| 222 | + |
| 223 | +- [ ] **Step 2: Run linting** |
| 224 | + |
| 225 | +Run: `uv run ruff check . && uv run ruff format --check .` |
| 226 | +Expected: No issues |
| 227 | + |
| 228 | +- [ ] **Step 3: Search for remaining task_data references** |
| 229 | + |
| 230 | +Run: `grep -rn "task_data" maseval/ tests/ examples/ docs/ --include="*.py" --include="*.md" --include="*.ipynb"` |
| 231 | + |
| 232 | +Verify that remaining `task_data` references are either: |
| 233 | +- `self._task_data` in multiagentbench.py (different concept — evaluation task data, not environment constructor param) |
| 234 | +- `task_data` in data loader test files (local variable for raw JSON task data before it becomes a Task object) |
| 235 | +- Notebook cell outputs (not editable source) |
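Where notebook outputs add noise to the search, a small script can restrict the scan to editable source. The sketch below is one possible approach, not part of the plan's tooling: `find_references` is a hypothetical helper that reads `.py` and `.md` files line by line and, for `.ipynb` files, inspects only each cell's `source` field, so cell outputs are never searched.

```python
import json
from pathlib import Path
from typing import Iterator, List, Tuple


def find_references(root: Path, needle: str = "task_data") -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, stripped line) for each line containing needle."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        lines: List[str] = []
        if path.suffix in {".py", ".md"}:
            lines = path.read_text(encoding="utf-8").splitlines()
        elif path.suffix == ".ipynb":
            notebook = json.loads(path.read_text(encoding="utf-8"))
            for cell in notebook.get("cells", []):
                # "source" may be a single string or a list of strings.
                source = cell.get("source", [])
                text = source if isinstance(source, str) else "".join(source)
                lines.extend(text.splitlines())
        # For notebooks, numbering counts lines across concatenated cell sources.
        for number, line in enumerate(lines, start=1):
            if needle in line:
                yield str(path), number, line.strip()


if __name__ == "__main__":
    for directory in ("maseval", "tests", "examples", "docs"):
        if Path(directory).is_dir():
            for hit in find_references(Path(directory)):
                print(*hit, sep=":")
```

Any hits that remain can then be checked against the allowed cases listed above.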