Commit 153fee8

Rename task_data to environment_data in tests, examples, and docs (#58)
fix: rename task_data to environment_data for consistency and fix MACS test bug (#58) Rename task_data parameter to environment_data across test fixtures, examples, and docs to match the base class API. Fix MACS real-data tests that wrapped task.environment_data in an extra dict, causing setup_state to silently get an empty tools list.
1 parent e1159de commit 153fee8

34 files changed

Lines changed: 894 additions & 537 deletions

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -11,8 +11,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Changed
 
+- Renamed `task_data` parameter to `environment_data` across all environment constructors, test fixtures, and examples for consistency with the base class API. (PR: #58)
+
 ### Fixed
 
+- Fixed MACS real-data tests passing `{"environment_data": task.environment_data}` instead of `task.environment_data` directly, which caused `setup_state` to silently receive an empty tools list. (PR: #58)
+
 ### Removed
 
 ## [0.4.0] - 2026-03-28

docs/guides/usage-tracking.md

Lines changed: 2 additions & 2 deletions
@@ -322,8 +322,8 @@ Tools, environments, and other components can track arbitrary usage by inheritin
 from maseval import Usage, UsageTrackableMixin
 
 class BloombergEnvironment(Environment, UsageTrackableMixin):
-    def __init__(self, task_data):
-        super().__init__(task_data)
+    def __init__(self, environment_data):
+        super().__init__(environment_data)
         self._usage_records = []
 
     def _call_bloomberg(self, query):
Lines changed: 235 additions & 0 deletions
@@ -0,0 +1,235 @@

# Rename task_data to environment_data in Tests and Examples

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Eliminate confusing `task_data` naming in tests and examples where the value is actually `environment_data`

**Architecture:** Pure rename refactor. No behavioral changes. The production code (base class + all 6 environments + all benchmark instantiations) already uses `environment_data` consistently. This plan fixes tests, fixtures, and examples that still use the old `task_data` naming.

**Tech Stack:** Python, pytest

**Note:** The MACS real_data tests have a **bug** where `{"environment_data": task.environment_data}` wraps environment_data in an extra dict, causing `setup_state` to silently get `tools=[]`. This is fixed as part of the rename.

---

### Task 1: Rename MACS test fixture and local variables

**Files:**
- Modify: `tests/test_benchmarks/test_macs/conftest.py:428-432`
- Modify: `tests/test_benchmarks/test_macs/test_macs_environment.py` (all `sample_task_data` and `task_data` references)
- Modify: `tests/test_benchmarks/test_macs/test_macs_integration.py:136-178` (local `task_data` variables)

- [ ] **Step 1: Rename fixture in conftest.py**

In `tests/test_benchmarks/test_macs/conftest.py`, rename the fixture from `sample_task_data` to `sample_environment_data`:

```python
@pytest.fixture
def sample_environment_data(sample_tool_specs):
    """Sample environment data dict for MACSEnvironment creation."""
    return {
        "tools": sample_tool_specs,
    }
```

- [ ] **Step 2: Rename all references in test_macs_environment.py**

In `tests/test_benchmarks/test_macs/test_macs_environment.py`, replace all occurrences of `sample_task_data` with `sample_environment_data`, and all local variables named `task_data` with `environment_data`. Examples:

```python
# Before
def test_init_extracts_tool_specs(self, macs_model_factory, sample_task_data):
    env = MACSEnvironment(sample_task_data, macs_model_factory)

# After
def test_init_extracts_tool_specs(self, macs_model_factory, sample_environment_data):
    env = MACSEnvironment(sample_environment_data, macs_model_factory)
```

```python
# Before (local variable)
task_data = {"tools": [...]}
env = MACSEnvironment(task_data, macs_model_factory)

# After
environment_data = {"tools": [...]}
env = MACSEnvironment(environment_data, macs_model_factory)
```

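The two renames in Step 2 could be scripted rather than done by hand. The following is a minimal sketch, assuming a plain regex substitution is acceptable for these test files; `rename_identifiers` is a hypothetical helper and not part of the repository. Word boundaries keep the `task_data` pattern from matching inside `sample_task_data` or `self._task_data`.

```python
import re

# Hypothetical helper for illustration; not part of maseval.
# Each pattern matches a whole identifier only, thanks to \b on both sides.
RENAMES = [
    (re.compile(r"\bsample_task_data\b"), "sample_environment_data"),
    (re.compile(r"\btask_data\b"), "environment_data"),
]


def rename_identifiers(source: str) -> str:
    """Apply whole-identifier renames to a block of Python source text."""
    for pattern, replacement in RENAMES:
        source = pattern.sub(replacement, source)
    return source


before = (
    "def test_init(self, macs_model_factory, sample_task_data):\n"
    "    env = MACSEnvironment(sample_task_data, macs_model_factory)\n"
    "    task_data = {'tools': []}\n"
    "    keep = self._task_data\n"
)
after = rename_identifiers(before)
print(after)
```

A text-level substitution like this also rewrites matches inside strings and comments, so the resulting diff still needs a manual review.
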
- [ ] **Step 3: Rename in test_macs_integration.py**

```python
# Before
task_data = {"tools": [...]}
env = MACSEnvironment(task_data, macs_model_factory)

# After
environment_data = {"tools": [...]}
env = MACSEnvironment(environment_data, macs_model_factory)
```

- [ ] **Step 4: Run MACS tests to verify**

Run: `uv run pytest tests/test_benchmarks/test_macs/test_macs_environment.py tests/test_benchmarks/test_macs/test_macs_integration.py -v`
Expected: All tests PASS (no behavioral change, only renames)

- [ ] **Step 5: Commit**

```bash
git add tests/test_benchmarks/test_macs/conftest.py tests/test_benchmarks/test_macs/test_macs_environment.py tests/test_benchmarks/test_macs/test_macs_integration.py
git commit -m "test(macs): rename task_data to environment_data in test fixtures and variables"
```

---

### Task 2: Fix MACS real_data test bug and rename

**Files:**
- Modify: `tests/test_benchmarks/test_macs/test_macs_integration_real_data.py:64,86`

- [ ] **Step 1: Fix the wrapping bug and rename**

Lines 64 and 86 currently pass `{"environment_data": task.environment_data}` which wraps environment_data in an extra dict. `MACSEnvironment.setup_state` does `environment_data.get("tools", [])` on this, finding no `"tools"` key, silently producing an empty tools list. Fix by passing `task.environment_data` directly:

```python
# Before (line 64)
env = MACSEnvironment({"environment_data": task.environment_data}, macs_model_factory)

# After
env = MACSEnvironment(task.environment_data, macs_model_factory)
```

```python
# Before (line 86)
env = MACSEnvironment({"environment_data": task.environment_data}, macs_model_factory)

# After
env = MACSEnvironment(task.environment_data, macs_model_factory)
```

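To see why the wrapped dict fails silently, here is a minimal sketch. `extract_tools` is a hypothetical stand-in that only mirrors the `environment_data.get("tools", [])` lookup described in Step 1; it is not the real `MACSEnvironment.setup_state`, and the tool spec is made up for illustration.

```python
from typing import Any, Dict, List


def extract_tools(environment_data: Dict[str, Any]) -> List[Any]:
    """Hypothetical stand-in for the tools lookup done in setup_state."""
    return environment_data.get("tools", [])


# Illustrative data; real tasks carry full tool specs.
task_environment_data = {"tools": [{"name": "search_flights"}]}

# Buggy call shape: "tools" sits one level too deep, so .get() returns its default.
wrapped = extract_tools({"environment_data": task_environment_data})

# Fixed call shape: the dict is passed directly and "tools" is found.
direct = extract_tools(task_environment_data)

print(wrapped)
print(direct)
```

Because `.get` falls back to its default instead of raising, the tests could keep passing with no tools loaded. A stricter lookup such as `environment_data["tools"]` would raise `KeyError` on the wrapped dict, though whether that suits the production code is a separate design decision.
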
- [ ] **Step 2: Commit**

```bash
git add tests/test_benchmarks/test_macs/test_macs_integration_real_data.py
git commit -m "fix(macs): pass environment_data directly instead of wrapping in extra dict

The old code wrapped task.environment_data in {\"environment_data\": ...},
causing setup_state to silently get tools=[] via .get(\"tools\", [])."
```

---

### Task 3: Rename task_data in TAU2 test

**Files:**
- Modify: `tests/test_benchmarks/test_tau2/test_environment.py:1142-1143`

- [ ] **Step 1: Rename local variable**

```python
# Before
task_data = {"domain": "retail"}
constructor = get_environment_constructor(task_data)

# After
environment_data = {"domain": "retail"}
constructor = get_environment_constructor(environment_data)
```

- [ ] **Step 2: Run TAU2 tests to verify**

Run: `uv run pytest tests/test_benchmarks/test_tau2/test_environment.py -v -k "test_replay"`
Expected: PASS

- [ ] **Step 3: Commit**

```bash
git add tests/test_benchmarks/test_tau2/test_environment.py
git commit -m "test(tau2): rename task_data to environment_data for clarity"
```

---

### Task 4: Update examples

**Files:**
- Modify: `examples/five_a_day_benchmark/five_a_day_benchmark.py:135-153`
- Modify: `examples/five_a_day_benchmark/five_a_day_benchmark.ipynb` (corresponding cells)
- Modify: `examples/introduction/tutorial.ipynb` (cells using `task_data`)
- Modify: `docs/guides/usage-tracking.md:325-326`

- [ ] **Step 1: Update five_a_day_benchmark.py**

The `FiveADayEnvironment` constructor and `setup_state` take a parameter named `task_data` and then index into it with the `"environment_data"` key. This is a user-facing example, so it should model the correct naming.

```python
# Before (line 135)
def __init__(self, task_data: Dict[str, Any], framework: str, callbacks: Optional[List] = None):
    ...
    super().__init__(task_data, callbacks)

def setup_state(self, task_data: Dict[str, Any]) -> Dict[str, Any]:
    ...
    env_data = task_data["environment_data"].copy()

# After
def __init__(self, environment_data: Dict[str, Any], framework: str, callbacks: Optional[List] = None):
    ...
    super().__init__(environment_data, callbacks)

def setup_state(self, environment_data: Dict[str, Any]) -> Dict[str, Any]:
    ...
    env_data = environment_data.copy()
```

Also update the instantiation site (around line 743) where the `task_data` dict is constructed. Note: this example builds a custom dict of the form `{"environment_data": {...}}` and passes the whole thing. It needs to pass just the inner `environment_data` dict directly, matching how the base class now works.

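The call-site change might look like the sketch below. The real instantiation code is not shown in this plan, so `BaseEnvironment`, `FiveADayLikeEnvironment`, the dict contents, and the `framework` value are hypothetical stand-ins that only imitate the signatures from Step 1. That the base constructor calls `setup_state` is also an assumption.

```python
from typing import Any, Dict, List, Optional


class BaseEnvironment:
    """Hypothetical stand-in for the base class (assumed to call setup_state)."""

    def __init__(self, environment_data: Dict[str, Any], callbacks: Optional[List] = None):
        self.callbacks = callbacks or []
        self.state = self.setup_state(environment_data)

    def setup_state(self, environment_data: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class FiveADayLikeEnvironment(BaseEnvironment):
    """Hypothetical subclass imitating the renamed example signatures."""

    def __init__(self, environment_data: Dict[str, Any], framework: str, callbacks: Optional[List] = None):
        self.framework = framework
        super().__init__(environment_data, callbacks)

    def setup_state(self, environment_data: Dict[str, Any]) -> Dict[str, Any]:
        # Shallow copy so changes to the state's top-level keys do not leak into the caller's dict.
        return environment_data.copy()


inner = {"tools": ["calculator"]}  # illustrative contents

# Before the rename, the call site built {"environment_data": inner} and passed
# the whole wrapper. After the rename, it passes the inner dict itself:
env = FiveADayLikeEnvironment(inner, framework="example_framework")
print(env.state)
```
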
- [ ] **Step 2: Update five_a_day_benchmark.ipynb**

Mirror the same changes from step 1 in the notebook version.

- [ ] **Step 3: Update tutorial.ipynb**

The tutorial defines a custom Environment subclass with `setup_state(self, task_data)`. Update the parameter name to `environment_data`. Also rename the `task_data` exploration variable (where it indexes into the tasks list). That one is actually a Task dict from the data loader, so name it `task` or `task_dict` to distinguish it from `environment_data`.

- [ ] **Step 4: Update docs/guides/usage-tracking.md**

```python
# Before
def __init__(self, task_data):
    super().__init__(task_data)

# After
def __init__(self, environment_data):
    super().__init__(environment_data)
```

- [ ] **Step 5: Commit**

```bash
git add examples/ docs/guides/usage-tracking.md
git commit -m "docs: rename task_data to environment_data in examples and guides"
```

---

### Task 5: Verify all tests pass

- [ ] **Step 1: Run full test suite**

Run: `uv run pytest -v`
Expected: All tests PASS

- [ ] **Step 2: Run linting**

Run: `uv run ruff check . && uv run ruff format --check .`
Expected: No issues

- [ ] **Step 3: Search for remaining task_data references**

Run: `grep -rn "task_data" maseval/ tests/ examples/ docs/ --include="*.py" --include="*.md" --include="*.ipynb"`

Verify that every remaining `task_data` reference falls into one of these categories:
- `self._task_data` in multiagentbench.py (a different concept: evaluation task data, not the environment constructor param)
- `task_data` in data loader test files (local variable for raw JSON task data before it becomes a Task object)
- Notebook cell outputs (not editable source)
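
Since notebook outputs are not editable source, a plain text search over `.ipynb` files can report hits that need no action. Below is a minimal sketch that checks only cell sources, assuming the standard nbformat 4 JSON layout (a `cells` list where each cell has a `source` that is a string or a list of strings). `find_in_notebook_sources` is a hypothetical helper, and the notebook content is invented for the example.

```python
import json
from typing import Any, Dict, List, Tuple


def find_in_notebook_sources(notebook: Dict[str, Any], needle: str) -> List[Tuple[int, str]]:
    """Return (cell_index, line) pairs where needle occurs in a cell's source.

    Cell outputs are ignored on purpose.
    """
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # nbformat allows source to be a single string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            if needle in line:
                hits.append((index, line.strip()))
    return hits


# Tiny in-memory notebook; a real run would json.load() an .ipynb file instead.
example = json.loads("""
{
  "cells": [
    {"cell_type": "code",
     "source": ["environment_data = task.environment_data\\n"],
     "outputs": [{"output_type": "stream", "name": "stdout",
                  "text": ["task_data keys: tools\\n"]}]},
    {"cell_type": "code",
     "source": "def setup_state(self, task_data):\\n    return task_data\\n",
     "outputs": []}
  ]
}
""")

hits = find_in_notebook_sources(example, "task_data")
print(hits)
```

In this example the first cell is skipped even though its output mentions `task_data`, while both source lines of the second cell are reported.
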
