Improve Telegram control-plane operator gates #11
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5c3cf7e4f8
    command = [
        str(sync_tool),
        str(MCP_REPO / "bin/sync-agent-docs"),
Handle missing sync-agent-docs without aborting
When the MCP checkout does not include bin/sync-agent-docs (the managed-systems policy does not require that marker, and the previous code skipped this layout), this path is passed directly to run_json(), whose subprocess.run() call raises FileNotFoundError. In that environment the maintenance doctor crashes instead of returning a structured docs-sync finding or skip result, so ./bin/telegram-maintenance-doctor cannot complete its other gates.
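One possible shape for the fix, as a minimal sketch only: the real signature and return shape of run_json() are not shown in this diff, so `run_json_safe` and every result key below are hypothetical. The idea is to turn a missing executable into a structured skip result rather than letting FileNotFoundError propagate out of the doctor.

```python
import json
import subprocess
from typing import Any


def run_json_safe(command: list[str], timeout: float = 20) -> dict[str, Any]:
    """Hypothetical wrapper: run a command, parse JSON stdout, never raise on a missing binary."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError:
        # The executable path does not exist: report a skip, not a crash.
        return {"status": "skipped", "reason": "executable_not_found", "command": command}
    except subprocess.TimeoutExpired:
        return {"status": "error", "reason": "timeout", "command": command}

    try:
        payload = json.loads(proc.stdout) if proc.stdout.strip() else None
    except json.JSONDecodeError:
        payload = None  # Non-JSON output is surfaced as a missing payload.
    return {"status": "ran", "returncode": proc.returncode, "payload": payload}
```

With a wrapper like this, the docs-sync gate could record a "skipped" finding and let the remaining gates run.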
def live_runtime_compat_probe() -> dict[str, Any]:
    doctor = run_json([str(TG_CLI), "doctor", "--json"], timeout=20)
Fall back when the live tg CLI is absent
If TELEGRAM_TG_CLI is missing in a fresh or partial MCP checkout, this run_json() call raises FileNotFoundError before live_runtime_compat_probe() can return doctor_unavailable and before audit_runtime_compat() can use the subprocess import fallback. That makes telegram-runtime-compat and the new operator status gate abort instead of reporting an unavailable live doctor.
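A sketch of the fallback, under stated assumptions: `cli_available` and `live_probe_or_unavailable` are hypothetical helpers, the `run` parameter stands in for run_json(), and the result keys other than the doctor_unavailable status are invented for illustration. It checks for the CLI before invoking it and also treats an OSError raised at launch as an unavailable doctor.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Any, Callable


def cli_available(cli: Path | str) -> bool:
    """True if `cli` resolves to an executable, either as a path or via PATH lookup."""
    return shutil.which(str(cli)) is not None


def live_probe_or_unavailable(
    cli: Path | str,
    run: Callable[[list[str]], dict[str, Any]],
) -> dict[str, Any]:
    """Hypothetical probe: report doctor_unavailable when the CLI cannot be run."""
    if not cli_available(cli):
        return {"status": "doctor_unavailable", "reason": f"{cli} not found"}
    try:
        doctor = run([str(cli), "doctor", "--json"])
    except OSError as exc:
        # Covers the race where the file disappears between the check and the call.
        return {"status": "doctor_unavailable", "reason": str(exc)}
    return {"status": "ok", "doctor": doctor}
```

Returning the unavailable status here would let audit_runtime_compat() reach its subprocess import fallback.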
    *,
    ledger_status: str | None = None,
) -> Classification:
    if ledger_status in {"done", "quarantine", "processing"}:
Retry stale processing ledger entries
Treating processing as a terminal ledger state means any interruption after record_status(..., status="processing") but before the final done/quarantine update permanently hides that source message from both dry-run and apply passes. In that crash/SIGKILL scenario the dirty post is never retried or surfaced, so processing needs a stale-timeout/recovery path rather than being grouped with done and quarantine.
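A minimal sketch of a stale-timeout check, assuming the ledger stores a last-update timestamp for each entry (the diff does not show whether it does). `should_skip`, `updated_at`, and the 15-minute threshold are all hypothetical.

```python
from __future__ import annotations

import time

TERMINAL_STATUSES = {"done", "quarantine"}
PROCESSING_STALE_AFTER_S = 15 * 60  # Assumed threshold; tune to the longest real run.


def should_skip(
    ledger_status: str | None,
    updated_at: float | None,
    now: float | None = None,
) -> bool:
    """Return True if the source message should be hidden from this pass."""
    if ledger_status in TERMINAL_STATUSES:
        return True
    if ledger_status == "processing":
        if updated_at is None:
            # No timestamp recorded: treat the entry as stale so it is retried.
            return False
        current = time.time() if now is None else now
        # Fresh entries are still owned by another run; stale ones are retried.
        return (current - updated_at) < PROCESSING_STALE_AFTER_S
    return False
```

An entry left in processing by a crash or SIGKILL becomes eligible again once the threshold passes, while an in-flight entry stays protected from a concurrent pass.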
Summary
Verification
Regression loop result: control-plane 204 passed, runtime MCP 337 tests OK, MCP daemons restarted, golden live smoke 5/5, maintenance doctor ok, feature-status changed_count=0.