[codex] Discourage flow overuse on small tasks by justintime109 · Pull Request #14 · Thulr/pi-flows

justintime109 · 2026-06-13T18:51:58Z

Summary

Tightens the model-facing flow tool guidance so parent models reserve pi-flow for substantial delegated work instead of using it as the default path for tiny tasks.

Adds a dedicated npm run eval:select harness that loads the extension into headless pi and scores whether the parent model actually calls flow. The new cases cover two no-flow small-task controls and one explicit-flow positive control.
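As a rough illustration of what "scores whether the parent model actually calls flow" could mean, here is a minimal sketch. It is not the harness from this PR: the `SelectionCase` and `RunObservation` shapes, the `expectFlow` field, and the tool name `"flow"` are all assumptions made for the example. Only `answerPattern` is a name that appears in the review discussion, and its exact type there is not shown.

```typescript
// Hypothetical case definition; field names are illustrative assumptions.
interface SelectionCase {
  name: string;
  expectFlow: boolean;     // true for the explicit-flow positive control
  answerPattern?: RegExp;  // optional check on the final answer
}

// Hypothetical summary of one headless run.
interface RunObservation {
  toolCalls: string[];     // names of tools the parent model invoked
  finalAnswer: string;
}

// A case passes only if flow usage matches the expectation and,
// when a pattern is given, the final answer matches it.
function scoreSelection(c: SelectionCase, obs: RunObservation): boolean {
  const calledFlow = obs.toolCalls.includes("flow"); // tool name assumed
  if (calledFlow !== c.expectFlow) return false;
  return c.answerPattern ? c.answerPattern.test(obs.finalAnswer) : true;
}
```

Under these assumptions, a small-task control would set `expectFlow: false`, so any run that calls the flow tool fails regardless of how good the final answer is.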

Why

The existing evals prove flow behavior once invoked, but they do not test invocation discipline. Small tasks can be cheaper and clearer in the parent context, so overuse needs its own regression signal.

Validation

npm run eval:select -- --dry-run — 3/3 passed
npm run eval:select — 3/3 passed live
npm run eval -- --dry-run — 7/7 behavior checks passed, 2 hard cases score-tracked, 3 canaries
npm run eval:compare -- --dry-run — passed
npm test — 87/87 passed
npm ci — passed; npm reported one high-severity dependency advisory
npm run check — passed

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5a10a9353

chatgpt-codex-connector · 2026-06-13T19:10:17Z

+			if (timer) clearTimeout(timer);
+			if (buffer.trim()) processLine(buffer);
+			if (stderr.trim()) state.stderr = stderr.trim();
+			resolveExit(code ?? 0);


Treat killed selection evals as failures

When a case times out and this timer kills pi with SIGTERM/SIGKILL, Node reports close with code === null and the signal separately, so code ?? 0 turns the timeout into a successful exit. If the model already emitted a matching final answer before hanging, or if a future selection case omits answerPattern, the harness can mark a timed-out run as passing instead of surfacing the infrastructure failure.
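One way to address this, sketched under assumptions: map the `(code, signal)` pair that Node's `close` event provides to an exit status that is never zero when the child was killed or the timer fired. The helper name `normalizeExit`, the `timedOut` flag, and the `CloseResult` shape are hypothetical and do not come from the PR.

```typescript
// Hypothetical result shape for the harness; not taken from the PR.
interface CloseResult {
  exitCode: number;
  signal: string | null;
  timedOut: boolean;
}

// Node's 'close' event passes code === null when the child was
// terminated by a signal, so null must not be coalesced to 0.
function normalizeExit(
  code: number | null,
  signal: string | null,
  timedOut: boolean,
): CloseResult {
  if (code === null) {
    // Killed by a signal (for example SIGTERM from the timeout timer).
    return { exitCode: 1, signal, timedOut };
  }
  // If the timer fired, report failure even if the child exited with 0.
  return { exitCode: timedOut && code === 0 ? 1 : code, signal, timedOut };
}
```

In the handler from the diff, this could replace `resolveExit(code ?? 0)` with something like `resolveExit(normalizeExit(code, signal, timedOut).exitCode)`, assuming the handler also receives `signal` and that the timer callback records `timedOut` before killing the process.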


fix(pi-flows): discourage flow overuse on small tasks

f5a10a9

justintime109 marked this pull request as ready for review June 13, 2026 19:05

justintime109 merged commit 10890b4 into main Jun 13, 2026
3 checks passed

chatgpt-codex-connector Bot reviewed Jun 13, 2026

justintime109 merged 1 commit into main from codex/flow-selection-eval
