diff --git a/docs/plans/2026-03-12-home-effort-regression-design.md b/docs/plans/2026-03-12-home-effort-regression-design.md new file mode 100644 index 0000000..37b6910 --- /dev/null +++ b/docs/plans/2026-03-12-home-effort-regression-design.md @@ -0,0 +1,50 @@ +# Home And Effort Regression Design + +## Context + +Two regressions were introduced by the recent workdir and settings panel changes: + +1. `/home` still passes the raw `settings.workdir` string into `update_workdir()`. When the configured value is relative, `update_workdir()` resolves it against the current chat workdir instead of the configured default directory. +2. The effort settings panel renders every model-declared reasoning level, but `update_reasoning_effort()` still rejects anything outside a hard-coded `high` / `xhigh` allowlist. + +Both problems are consistency bugs. The user-visible entry points already imply a single source of truth, but the code currently applies different rules at render time and at submit time. + +## Options + +### Option 1: Localized Fixes + +- Patch `telegram.handle_home()` to normalize `settings.workdir` before calling `update_workdir()`. +- Extend the hard-coded effort allowlist to include `medium`. + +This is small, but it keeps the code fragile. Any future caller can repeat the same `/home` bug, and the effort logic would still require code changes whenever Codex adds a new reasoning level. + +### Option 2: Centralize Both Behaviors + +- Expose a public `configured_workdir()` helper from `CodexBridgeService` and route all "go home" flows through it. +- Remove the static effort allowlist and validate submitted effort values against the currently selected model's declared capabilities. + +This keeps the behavior aligned with existing service semantics and future-proofs the effort panel for new model metadata. + +## Decision + +Use Option 2. + +## Design + +### Configured Workdir + +- Promote the existing private normalization logic into a public service method. +- Keep the implementation in `CodexBridgeService`, because the configured workdir semantics belong to service state rather than Telegram transport code. +- Update both the directory browser "home" path and Telegram `/home` command to use the same public method. + +### Effort Validation + +- Treat model metadata as the source of truth for allowed effort values. +- `render_setting_panel("effort")` already reads model capabilities; `update_reasoning_effort()` should validate against the same model metadata instead of a static constant. +- Preserve existing error behavior when the current model is missing or when the submitted effort is unsupported for that model. + +### Testing + +- Add a Telegram handler regression test where `codex_workdir` is relative and the chat has moved to another directory; `/home` must resolve back to the configured directory. +- Add a service test showing that the effort panel can successfully apply `medium` when the selected model declares it. +- Keep the tests narrow so they prove the regression before the implementation change. diff --git a/docs/plans/2026-03-12-home-effort-regression.md b/docs/plans/2026-03-12-home-effort-regression.md new file mode 100644 index 0000000..53ad039 --- /dev/null +++ b/docs/plans/2026-03-12-home-effort-regression.md @@ -0,0 +1,133 @@ +# Home And Effort Regression Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. + +**Goal:** Fix `/home` to return to the normalized configured workdir and make effort panel submission accept any reasoning level supported by the selected model. + +**Architecture:** Keep both behaviors owned by `CodexBridgeService`. Telegram should call a public service helper for the configured workdir, and effort validation should derive allowed values from model metadata so rendering and submission share one source of truth. + +**Tech Stack:** Python, NoneBot plugin service layer, pytest, PDM + +--- + +### Task 1: Lock In The `/home` Regression + +**Files:** +- Modify: `tests/test_telegram_handlers.py` +- Modify: `src/nonebot_plugin_codex/telegram.py` +- Modify: `src/nonebot_plugin_codex/service.py` + +**Step 1: Write the failing test** + +Add a test that uses a real `CodexBridgeService` with: +- `settings.workdir="workspace/default"` +- current chat workdir changed to another absolute path first +- `/home` invoked through `TelegramHandlers` + +Assert the resulting message points to `tmp_path / "workspace" / "default"` instead of resolving relative to the current chat workdir. + +**Step 2: Run test to verify it fails** + +Run: `pdm run pytest tests/test_telegram_handlers.py::test_handle_home_resolves_relative_configured_workdir -q` + +Expected: FAIL because `/home` still passes the raw relative config value. + +**Step 3: Write minimal implementation** + +- Add a public `configured_workdir()` helper to `CodexBridgeService`. +- Update `/home` in `TelegramHandlers` to use the normalized configured workdir. +- Reuse the same helper anywhere service-owned "home" navigation already exists. + +**Step 4: Run test to verify it passes** + +Run: `pdm run pytest tests/test_telegram_handlers.py::test_handle_home_resolves_relative_configured_workdir -q` + +Expected: PASS + +**Step 5: Commit** + +```bash +git add docs/plans/2026-03-12-home-effort-regression-design.md \ + docs/plans/2026-03-12-home-effort-regression.md \ + tests/test_telegram_handlers.py \ + src/nonebot_plugin_codex/telegram.py \ + src/nonebot_plugin_codex/service.py +git commit -m "fix: normalize configured home workdir" +``` + +### Task 2: Lock In Dynamic Effort Validation + +**Files:** +- Modify: `tests/test_service.py` +- Modify: `src/nonebot_plugin_codex/service.py` + +**Step 1: Write the failing test** + +Add a service test that creates model metadata with a model supporting `medium`, opens the effort panel, applies `medium`, and asserts the preference is updated. + +**Step 2: Run test to verify it fails** + +Run: `pdm run pytest tests/test_service.py::test_apply_effort_setting_panel_accepts_model_supported_medium -q` + +Expected: FAIL because `update_reasoning_effort()` still rejects non-`high`/`xhigh` values. + +**Step 3: Write minimal implementation** + +- Remove the static effort allowlist dependency from `update_reasoning_effort()`. +- Validate the submitted effort against the selected model's `supported_reasoning_levels`. +- Keep the existing missing-model and unsupported-effort error paths. + +**Step 4: Run test to verify it passes** + +Run: `pdm run pytest tests/test_service.py::test_apply_effort_setting_panel_accepts_model_supported_medium -q` + +Expected: PASS + +**Step 5: Commit** + +```bash +git add tests/test_service.py src/nonebot_plugin_codex/service.py +git commit -m "fix: align effort validation with model metadata" +``` + +### Task 3: Run Focused And Full Verification + +**Files:** +- Modify: `tests/test_service.py` +- Modify: `tests/test_telegram_handlers.py` +- Modify: `src/nonebot_plugin_codex/service.py` +- Modify: `src/nonebot_plugin_codex/telegram.py` + +**Step 1: Run focused regressions** + +Run: `pdm run pytest tests/test_service.py tests/test_telegram_handlers.py -q` + +Expected: PASS + +**Step 2: Run full test suite** + +Run: `pdm run pytest -q` + +Expected: PASS + +**Step 3: Run lint** + +Run: `pdm run ruff check .` + +Expected: PASS + +**Step 4: Review diff for scope** + +Run: `git diff -- src/nonebot_plugin_codex/service.py src/nonebot_plugin_codex/telegram.py tests/test_service.py tests/test_telegram_handlers.py` + +Expected: Only `/home` normalization, dynamic effort validation, and regression tests are included. + +**Step 5: Commit** + +```bash +git add src/nonebot_plugin_codex/service.py \ + src/nonebot_plugin_codex/telegram.py \ + tests/test_service.py \ + tests/test_telegram_handlers.py +git commit -m "test: cover home and effort regressions" +``` diff --git a/src/nonebot_plugin_codex/service.py b/src/nonebot_plugin_codex/service.py index cfc6bcf..bf0db79 100644 --- a/src/nonebot_plugin_codex/service.py +++ b/src/nonebot_plugin_codex/service.py @@ -25,7 +25,6 @@ WhichResolver = Callable[[str], str | None] VISIBLE_MODEL = "list" -SUPPORTED_EFFORT_COMMANDS = {"high", "xhigh"} SUPPORTED_PERMISSION_MODES = {"safe", "danger"} FALLBACK_MODEL = "gpt-5" FALLBACK_REASONING_EFFORT = "high" @@ -488,7 +487,7 @@ def __init__( self._native_history_entries: list[HistoricalSessionSummary] = [] self._native_history_loaded = False - def _configured_workdir(self) -> str: + def configured_workdir(self) -> str: configured = Path(self.settings.workdir).expanduser() try: return str(configured.resolve()) @@ -1666,7 +1665,7 @@ def _load_preferences(self) -> dict[str, ChatPreferences]: workdir=( workdir if isinstance(workdir, str) and workdir - else self._configured_workdir() + else self.configured_workdir() ), default_mode=( default_mode @@ -1743,7 +1742,7 @@ def _default_preferences(self) -> ChatPreferences: model=model_slug, reasoning_effort=effort, permission_mode="safe", - workdir=self._configured_workdir(), + workdir=self.configured_workdir(), default_mode="resume", ) @@ -1970,7 +1969,7 @@ def navigate_directory_browser( if action == "home": return self._replace_browser_state( chat_key, - self._configured_workdir(), + self.configured_workdir(), page=0, previous=browser, ) @@ -2216,14 +2215,13 @@ async def update_model(self, chat_key: str, slug: str) -> str: async def update_reasoning_effort(self, chat_key: str, effort: str) -> str: self._ensure_not_running(chat_key) - if effort not in SUPPORTED_EFFORT_COMMANDS: - raise ValueError("仅支持 high 或 xhigh。") - current = self.get_preferences(chat_key) model = self.load_models().get(current.model) if model is None: raise ValueError("当前模型不在本地缓存中。") if effort not in model.supported_reasoning_levels: + if not model.supported_reasoning_levels: + raise ValueError("当前模型未声明可用推理强度。") supported = ", ".join(model.supported_reasoning_levels) raise ValueError(f"当前模型仅支持:{supported}") diff --git a/src/nonebot_plugin_codex/telegram.py b/src/nonebot_plugin_codex/telegram.py index d66e401..3d63774 100644 --- a/src/nonebot_plugin_codex/telegram.py +++ b/src/nonebot_plugin_codex/telegram.py @@ -551,7 +551,7 @@ async def handle_home(self, bot: Bot, event: MessageEvent) -> None: try: notice = await self.service.update_workdir( self.chat_key(event), - self.service.settings.workdir, + self.service.configured_workdir(), ) await self.send_event_message(bot, event, notice) except (ValueError, RuntimeError) as exc: diff --git a/tests/conftest.py b/tests/conftest.py index f93f44f..26419b2 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -39,3 +39,31 @@ def model_cache_file(tmp_path: Path) -> Path: encoding="utf-8", ) return path + + +@pytest.fixture +def model_cache_with_medium_file(tmp_path: Path) -> Path: + path = tmp_path / "models_cache_with_medium.json" + path.write_text( + json.dumps( + { + "models": [ + { + "slug": "gpt-5", + "display_name": "GPT-5", + "visibility": "list", + "priority": 1, + "default_reasoning_level": "medium", + "supported_reasoning_levels": [ + {"effort": "medium"}, + {"effort": "high"}, + {"effort": "xhigh"}, + ], + } + ] + }, + ensure_ascii=False, + ), + encoding="utf-8", + ) + return path diff --git a/tests/test_service.py b/tests/test_service.py index a4c906b..18338d5 100644 --- a/tests/test_service.py +++ b/tests/test_service.py @@ -267,3 +267,25 @@ async def test_apply_permission_setting_panel_updates_preference( assert "danger" in notice assert service.get_preferences("private_1").permission_mode == "danger" + + +@pytest.mark.asyncio +async def test_apply_effort_setting_panel_accepts_model_supported_medium( + tmp_path: Path, + model_cache_with_medium_file: Path, +) -> None: + service = make_service(tmp_path, model_cache_with_medium_file) + panel = service.open_setting_panel("private_1", "effort") + text, _ = service.render_setting_panel("private_1") + + assert "medium" in text + + notice = await service.apply_setting_panel_selection( + "private_1", + panel.token, + panel.version, + "medium", + ) + + assert "medium" in notice + assert service.get_preferences("private_1").reasoning_effort == "medium" diff --git a/tests/test_telegram_handlers.py b/tests/test_telegram_handlers.py index 163c647..63e9593 100644 --- a/tests/test_telegram_handlers.py +++ b/tests/test_telegram_handlers.py @@ -13,6 +13,7 @@ CodexBridgeService, CodexBridgeSettings, encode_browser_callback, + encode_setting_callback, ) @@ -112,6 +113,9 @@ def get_preferences(self, chat_key: str) -> SimpleNamespace: def describe_preferences(self, chat_key: str) -> str: return "模型: gpt-5 | 推理: xhigh | 权限: safe" + def configured_workdir(self) -> str: + return self.settings.workdir + async def run_prompt( self, chat_key: str, @@ -240,6 +244,29 @@ async def update_default_mode(self, chat_key: str, mode: str) -> str: return f"当前默认模式:{mode}" +def make_real_service( + tmp_path: Path, + model_cache_file: Path, + *, + workdir: str | None = None, +) -> CodexBridgeService: + codex_config = tmp_path / "config.toml" + codex_config.write_text('model = "gpt-5"\nmodel_reasoning_effort = "xhigh"\n') + return CodexBridgeService( + CodexBridgeSettings( + binary="codex", + workdir=workdir or str(tmp_path), + models_cache_path=model_cache_file, + codex_config_path=codex_config, + preferences_path=tmp_path / "data" / "codex_bridge" / "preferences.json", + session_index_path=tmp_path / ".codex" / "session_index.jsonl", + sessions_dir=tmp_path / ".codex" / "sessions", + archived_sessions_dir=tmp_path / ".codex" / "archived_sessions", + ), + which_resolver=lambda _: "/usr/bin/codex", + ) + + def make_real_service_without_model_cache(tmp_path: Path) -> CodexBridgeService: codex_config = tmp_path / "config.toml" codex_config.write_text('model = "gpt-5"\nmodel_reasoning_effort = "xhigh"\n') @@ -340,6 +367,34 @@ async def test_handle_home_uses_configured_workdir() -> None: assert service.updated_workdirs == [service.settings.workdir] +@pytest.mark.asyncio +async def test_handle_home_resolves_relative_configured_workdir( + tmp_path: Path, + model_cache_file: Path, + monkeypatch: pytest.MonkeyPatch, +) -> None: + monkeypatch.chdir(tmp_path) + configured_home = tmp_path / "workspace" / "default" + configured_home.mkdir(parents=True) + moved_dir = tmp_path / "workspace" / "other" + moved_dir.mkdir(parents=True) + service = make_real_service( + tmp_path, + model_cache_file, + workdir="workspace/default", + ) + handlers = TelegramHandlers(service) + bot = FakeBot() + await service.update_workdir("private_1", str(moved_dir)) + + await handlers.handle_home(bot, FakeEvent("")) + + assert bot.sent[0]["text"].startswith(f"当前工作目录:{configured_home.resolve()}") + assert service.get_preferences("private_1").workdir == str( + configured_home.resolve() + ) + + @pytest.mark.asyncio @pytest.mark.parametrize( ("handler_name", "kind", "expected_text"), @@ -389,6 +444,26 @@ async def test_handle_setting_callback_updates_setting() -> None: assert bot.answered[0]["text"] == "已更新。" +@pytest.mark.asyncio +async def test_handle_setting_callback_updates_effort_when_model_supports_medium( + tmp_path: Path, + model_cache_with_medium_file: Path, +) -> None: + service = make_real_service(tmp_path, model_cache_with_medium_file) + handlers = TelegramHandlers(service) + bot = FakeBot() + panel = service.open_setting_panel("private_1", "effort") + event = FakeCallbackEvent( + encode_setting_callback(panel.token, panel.version, "set", "medium") + ) + + await handlers.handle_setting_callback(bot, event) + + assert bot.answered[0]["text"] == "已更新。" + assert service.get_preferences("private_1").reasoning_effort == "medium" + assert "当前推理强度:medium" in bot.edited[0]["text"] + + @pytest.mark.asyncio async def test_handle_pwd_works_when_model_cache_is_missing(tmp_path: Path) -> None: service = make_real_service_without_model_cache(tmp_path)