Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 50 additions & 0 deletions docs/plans/2026-03-12-home-effort-regression-design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
# Home And Effort Regression Design

## Context

Two regressions were introduced by the recent workdir and settings panel changes:

1. `/home` still passes the raw `settings.workdir` string into `update_workdir()`. When the configured value is relative, `update_workdir()` resolves it against the current chat workdir instead of the configured default directory.
2. The effort settings panel renders every model-declared reasoning level, but `update_reasoning_effort()` still rejects anything outside a hard-coded `high` / `xhigh` allowlist.

Both problems are consistency bugs. The user-visible entry points already imply a single source of truth, but the code currently applies different rules at render time and at submit time.

## Options

### Option 1: Localized Fixes

- Patch `telegram.handle_home()` to normalize `settings.workdir` before calling `update_workdir()`.
- Extend the hard-coded effort allowlist to include `medium`.

This is small, but it keeps the code fragile. Any future caller can repeat the same `/home` bug, and the effort logic would still require code changes whenever Codex adds a new reasoning level.

### Option 2: Centralize Both Behaviors

- Expose a public `configured_workdir()` helper from `CodexBridgeService` and route all "go home" flows through it.
- Remove the static effort allowlist and validate submitted effort values against the currently selected model's declared capabilities.

This keeps the behavior aligned with existing service semantics and future-proofs the effort panel for new model metadata.

## Decision

Use Option 2.

## Design

### Configured Workdir

- Promote the existing private normalization logic into a public service method.
- Keep the implementation in `CodexBridgeService`, because the configured workdir semantics belong to service state rather than Telegram transport code.
- Update both the directory browser "home" path and Telegram `/home` command to use the same public method.

### Effort Validation

- Treat model metadata as the source of truth for allowed effort values.
- `render_setting_panel("effort")` already reads model capabilities; `update_reasoning_effort()` should validate against the same model metadata instead of a static constant.
- Preserve existing error behavior when the current model is missing or when the submitted effort is unsupported for that model.

### Testing

- Add a Telegram handler regression test where `codex_workdir` is relative and the chat has moved to another directory; `/home` must resolve back to the configured directory.
- Add a service test showing that the effort panel can successfully apply `medium` when the selected model declares it.
- Keep the tests narrow so they prove the regression before the implementation change.
133 changes: 133 additions & 0 deletions docs/plans/2026-03-12-home-effort-regression.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
# Home And Effort Regression Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Fix `/home` to return to the normalized configured workdir and make effort panel submission accept any reasoning level supported by the selected model.

**Architecture:** Keep both behaviors owned by `CodexBridgeService`. Telegram should call a public service helper for the configured workdir, and effort validation should derive allowed values from model metadata so rendering and submission share one source of truth.

**Tech Stack:** Python, NoneBot plugin service layer, pytest, PDM

---

### Task 1: Lock In The `/home` Regression

**Files:**
- Modify: `tests/test_telegram_handlers.py`
- Modify: `src/nonebot_plugin_codex/telegram.py`
- Modify: `src/nonebot_plugin_codex/service.py`

**Step 1: Write the failing test**

Add a test that uses a real `CodexBridgeService` with:
- `settings.workdir="workspace/default"`
- current chat workdir changed to another absolute path first
- `/home` invoked through `TelegramHandlers`

Assert the resulting message points to `tmp_path / "workspace" / "default"` instead of resolving relative to the current chat workdir.

**Step 2: Run test to verify it fails**

Run: `pdm run pytest tests/test_telegram_handlers.py::test_handle_home_resolves_relative_configured_workdir -q`

Expected: FAIL because `/home` still passes the raw relative config value.

**Step 3: Write minimal implementation**

- Add a public `configured_workdir()` helper to `CodexBridgeService`.
- Update `/home` in `TelegramHandlers` to use the normalized configured workdir.
- Reuse the same helper anywhere service-owned "home" navigation already exists.

**Step 4: Run test to verify it passes**

Run: `pdm run pytest tests/test_telegram_handlers.py::test_handle_home_resolves_relative_configured_workdir -q`

Expected: PASS

**Step 5: Commit**

```bash
git add docs/plans/2026-03-12-home-effort-regression-design.md \
docs/plans/2026-03-12-home-effort-regression.md \
tests/test_telegram_handlers.py \
src/nonebot_plugin_codex/telegram.py \
src/nonebot_plugin_codex/service.py
git commit -m "fix: normalize configured home workdir"
```

### Task 2: Lock In Dynamic Effort Validation

**Files:**
- Modify: `tests/test_service.py`
- Modify: `src/nonebot_plugin_codex/service.py`

**Step 1: Write the failing test**

Add a service test that creates model metadata with a model supporting `medium`, opens the effort panel, applies `medium`, and asserts the preference is updated.

**Step 2: Run test to verify it fails**

Run: `pdm run pytest tests/test_service.py::test_apply_effort_setting_panel_accepts_model_supported_medium -q`

Expected: FAIL because `update_reasoning_effort()` still rejects non-`high`/`xhigh` values.

**Step 3: Write minimal implementation**

- Remove the static effort allowlist dependency from `update_reasoning_effort()`.
- Validate the submitted effort against the selected model's `supported_reasoning_levels`.
- Keep the existing missing-model and unsupported-effort error paths.

**Step 4: Run test to verify it passes**

Run: `pdm run pytest tests/test_service.py::test_apply_effort_setting_panel_accepts_model_supported_medium -q`

Expected: PASS

**Step 5: Commit**

```bash
git add tests/test_service.py src/nonebot_plugin_codex/service.py
git commit -m "fix: align effort validation with model metadata"
```

### Task 3: Run Focused And Full Verification

**Files:**
- Modify: `tests/test_service.py`
- Modify: `tests/test_telegram_handlers.py`
- Modify: `src/nonebot_plugin_codex/service.py`
- Modify: `src/nonebot_plugin_codex/telegram.py`

**Step 1: Run focused regressions**

Run: `pdm run pytest tests/test_service.py tests/test_telegram_handlers.py -q`

Expected: PASS

**Step 2: Run full test suite**

Run: `pdm run pytest -q`

Expected: PASS

**Step 3: Run lint**

Run: `pdm run ruff check .`

Expected: PASS

**Step 4: Review diff for scope**

Run: `git diff -- src/nonebot_plugin_codex/service.py src/nonebot_plugin_codex/telegram.py tests/test_service.py tests/test_telegram_handlers.py`

Expected: Only `/home` normalization, dynamic effort validation, and regression tests are included.

**Step 5: Commit**

```bash
git add src/nonebot_plugin_codex/service.py \
src/nonebot_plugin_codex/telegram.py \
tests/test_service.py \
tests/test_telegram_handlers.py
git commit -m "test: cover home and effort regressions"
```
14 changes: 6 additions & 8 deletions src/nonebot_plugin_codex/service.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,6 @@
WhichResolver = Callable[[str], str | None]

VISIBLE_MODEL = "list"
SUPPORTED_EFFORT_COMMANDS = {"high", "xhigh"}
SUPPORTED_PERMISSION_MODES = {"safe", "danger"}
FALLBACK_MODEL = "gpt-5"
FALLBACK_REASONING_EFFORT = "high"
Expand Down Expand Up @@ -488,7 +487,7 @@ def __init__(
self._native_history_entries: list[HistoricalSessionSummary] = []
self._native_history_loaded = False

def _configured_workdir(self) -> str:
def configured_workdir(self) -> str:
configured = Path(self.settings.workdir).expanduser()
try:
return str(configured.resolve())
Expand Down Expand Up @@ -1666,7 +1665,7 @@ def _load_preferences(self) -> dict[str, ChatPreferences]:
workdir=(
workdir
if isinstance(workdir, str) and workdir
else self._configured_workdir()
else self.configured_workdir()
),
default_mode=(
default_mode
Expand Down Expand Up @@ -1743,7 +1742,7 @@ def _default_preferences(self) -> ChatPreferences:
model=model_slug,
reasoning_effort=effort,
permission_mode="safe",
workdir=self._configured_workdir(),
workdir=self.configured_workdir(),
default_mode="resume",
)

Expand Down Expand Up @@ -1970,7 +1969,7 @@ def navigate_directory_browser(
if action == "home":
return self._replace_browser_state(
chat_key,
self._configured_workdir(),
self.configured_workdir(),
page=0,
previous=browser,
)
Expand Down Expand Up @@ -2216,14 +2215,13 @@ async def update_model(self, chat_key: str, slug: str) -> str:

async def update_reasoning_effort(self, chat_key: str, effort: str) -> str:
self._ensure_not_running(chat_key)
if effort not in SUPPORTED_EFFORT_COMMANDS:
raise ValueError("仅支持 high 或 xhigh。")

current = self.get_preferences(chat_key)
model = self.load_models().get(current.model)
if model is None:
raise ValueError("当前模型不在本地缓存中。")
if effort not in model.supported_reasoning_levels:
if not model.supported_reasoning_levels:
raise ValueError("当前模型未声明可用推理强度。")
supported = ", ".join(model.supported_reasoning_levels)
raise ValueError(f"当前模型仅支持:{supported}")

Expand Down
2 changes: 1 addition & 1 deletion src/nonebot_plugin_codex/telegram.py
Original file line number Diff line number Diff line change
Expand Up @@ -551,7 +551,7 @@ async def handle_home(self, bot: Bot, event: MessageEvent) -> None:
try:
notice = await self.service.update_workdir(
self.chat_key(event),
self.service.settings.workdir,
self.service.configured_workdir(),
)
await self.send_event_message(bot, event, notice)
except (ValueError, RuntimeError) as exc:
Expand Down
28 changes: 28 additions & 0 deletions tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,3 +39,31 @@ def model_cache_file(tmp_path: Path) -> Path:
encoding="utf-8",
)
return path


@pytest.fixture
def model_cache_with_medium_file(tmp_path: Path) -> Path:
path = tmp_path / "models_cache_with_medium.json"
path.write_text(
json.dumps(
{
"models": [
{
"slug": "gpt-5",
"display_name": "GPT-5",
"visibility": "list",
"priority": 1,
"default_reasoning_level": "medium",
"supported_reasoning_levels": [
{"effort": "medium"},
{"effort": "high"},
{"effort": "xhigh"},
],
}
]
},
ensure_ascii=False,
),
encoding="utf-8",
)
return path
22 changes: 22 additions & 0 deletions tests/test_service.py
Original file line number Diff line number Diff line change
Expand Up @@ -267,3 +267,25 @@ async def test_apply_permission_setting_panel_updates_preference(

assert "danger" in notice
assert service.get_preferences("private_1").permission_mode == "danger"


@pytest.mark.asyncio
async def test_apply_effort_setting_panel_accepts_model_supported_medium(
tmp_path: Path,
model_cache_with_medium_file: Path,
) -> None:
service = make_service(tmp_path, model_cache_with_medium_file)
panel = service.open_setting_panel("private_1", "effort")
text, _ = service.render_setting_panel("private_1")

assert "medium" in text

notice = await service.apply_setting_panel_selection(
"private_1",
panel.token,
panel.version,
"medium",
)

assert "medium" in notice
assert service.get_preferences("private_1").reasoning_effort == "medium"
Loading