Skip to content

fix: reject status-masking test pipelines#1208

Open
Q00 wants to merge 4 commits into
mainfrom
fix/command-proof-pipeline-status
Open

fix: reject status-masking test pipelines#1208
Q00 wants to merge 4 commits into
mainfrom
fix/command-proof-pipeline-status

Conversation

@Q00
Copy link
Copy Markdown
Owner

@Q00 Q00 commented May 24, 2026

Summary

  • keep presentation plumbing aliases for non-test commands and pipefail-protected test commands
  • reject clean pytest/test proof when a trailing output pipe can mask the upstream command status
  • cover the status-masking pytest ... | cat regression

Original PR blocker coverage

Addresses the anti-fabrication command proof blocker reported on #1186.

Verification

  • uv run pytest tests/unit/orchestrator/test_parallel_executor.py -q
  • uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py

Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1208 |
| HEAD checked | b3034d7952c221a2292ae095738515cf7b513297 |
| Request ID | req_retry_exhausted_new_timeout_1779693693_1208 |
| Review record | 3b8bed75-0e1f-4a7f-8d3c-d4e1bd27d244 |

What Improved

  • Tightens test-command evidence so pager-style output pipelines such as pytest ... | cat only prove a clean test claim when pipefail is detected.
  • Adds regression coverage for rejecting a status-masking output pipe.

Issue Requirements

Requirement Status
No linked issue requirement captured N/A

Prior Findings Status

No prior human or inline review concerns were present in the provided comment artifacts.

Blockers

# File:Line Severity Finding
1 src/ouroboros/orchestrator/parallel_executor.py:1175 BLOCKING _uses_pipefail() treats any occurrence of the text pipefail as proof that the shell preserved upstream pipeline status. This is the new trust gate used before accepting stripped test pipelines at lines 923-929 and 963-969, but it also returns true for commands like `pytest tests/test_pipefail.py \

Follow-up Findings

# File:Line Priority Confidence Suggestion
1 src/ouroboros/orchestrator/parallel_executor.py:1024 Medium Medium Consider accepting common valid pipefail preambles such as set -e -o pipefail or set -euo pipefail; the current safe-preamble check only recognizes exactly set -o pipefail, so some legitimate wrapped test commands with pipefail may still be rejected.

Non-blocking Suggestions

| 1 | tests/unit/orchestrator/test_parallel_executor.py:1188 | Test coverage | Add a regression test where pipefail appears in a test path, redirect target, or shell comment without enabling shell pipefail, because that is the bypass introduced by the current substring detector. |

Test Coverage Notes

  • Reviewed tests/unit/orchestrator/test_parallel_executor.py coverage around command-claim aliases and output-plumbing pipelines.
  • Could not run the tests in this container: python is absent, python3 is too old for this project’s Python 3.12+ syntax, and pytest is not installed.

Design Notes

The design direction is sound: pipe status preservation belongs at the evidence-normalization boundary before stripping output-only pipes. The implementation needs a stricter detector for the shell option that actually provides that guarantee.

Design / Roadmap Gate

Affected boundary: runtime Bash command evidence for commands_run / tests_passed. The PR changes when a recorded pipeline can prove a clean test invocation. Because the pipefail check is substring-based, commands that do not preserve upstream pipeline failure can cross that boundary and be treated as reliable test evidence.

Directional Notes

Maintainer memory pushed the review focus toward anti-fabrication integrity and thin, reliable evidence boundaries. The blocker above is based on current changed-source behavior, not memory alone.

Test Coverage

  • Reviewed tests/unit/orchestrator/test_parallel_executor.py coverage around command-claim aliases and output-plumbing pipelines.
  • Could not run the tests in this container: python is absent, python3 is too old for this project’s Python 3.12+ syntax, and pytest is not installed.

Merge Recommendation

Do not merge until _uses_pipefail() proves an actual enabled pipefail setting rather than matching arbitrary command text. The current implementation leaves a direct status-masking bypass in the evidence verifier.

Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: b3034d7
request_id: req_retry_exhausted_new_timeout_1779693693_1208
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Tighten test-command pipeline evidence so the pipefail trust gate recognizes only an actual shell preamble instead of arbitrary command text. This closes the status-masking bypass where a filename or comment containing pipefail let output-filter pipelines prove clean test claims.\n\nServices: shared\nAffected files:\n- src/ouroboros/orchestrator/parallel_executor.py\n- tests/unit/orchestrator/test_parallel_executor.py
Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1208 |
| HEAD checked | a0d35f3b8b5142a67c7d1f66c49fced8df862f01 |
| Request ID | req_1779704190_12 |
| Review record | 14b8efa4-411a-46c4-8853-295011943159 |

What Improved

  • Tightens test-command evidence so output-filter pipelines only support clean test claims when pipefail is detected.
  • Adds regression coverage for status-masking pipes and arbitrary pipefail text.

Issue Requirements

Requirement Status
No linked issue requirement captured N/A

Prior Findings Status

Prior bot concern about arbitrary substring matching for pipefail is withdrawn: current HEAD uses shlex.split() and exact ["set", "-o", "pipefail"], with a regression test for comment/path text. A related blocker remains because the exact pipefail segment can still appear after the masked pipeline and be treated as protecting it.

Blockers

# File:Line Severity Finding
1 src/ouroboros/orchestrator/parallel_executor.py:965 BLOCKING The pipefail check is not ordered relative to the pipeline it is protecting. _test_invocation_from_shell_body() calls _uses_pipefail(body) for the whole shell body, and _uses_pipefail() accepts any later set -o pipefail segment. A transcript command like `/bin/bash -lc 'cd /workspace && pytest tests/test_foo.py \

Follow-up Findings

# File:Line Priority Confidence Suggestion
1 src/ouroboros/orchestrator/parallel_executor.py:1024 Medium Medium Consider accepting common valid pipefail preambles such as set -e -o pipefail or set -euo pipefail; the current safe-preamble check only recognizes exactly set -o pipefail.

Non-blocking Suggestions

| 1 | tests/unit/orchestrator/test_parallel_executor.py:1188 | Test coverage | Add a regression test where set -o pipefail appears after pytest ... \| cat, proving the verifier rejects pipefail settings that occur too late to preserve that pipeline’s status. |

Test Coverage Notes

  • Reviewed tests/unit/orchestrator/test_parallel_executor.py coverage around pipefail, status-masking output pipes, and arbitrary pipefail text.
  • Could not run pytest: /usr/bin/python3: No module named pytest. python3 -m py_compile passed for the changed source and test file.

Design Notes

The evidence-normalization boundary is the right place for this check, but the pipefail proof must be tied to shell execution order for the specific pipeline being stripped.

Design / Roadmap Gate

Affected boundary: runtime Bash command evidence for commands_run / tests_passed. The PR changes when a recorded pipeline can prove a clean test invocation. Because pipefail is accepted from anywhere in the shell body, stale or late shell state can still let status-masking output filters cross the evidence boundary.

Directional Notes

Maintainer memory shaped the focus on anti-fabrication integrity and reliable runtime evidence boundaries. The blocker above is based on current HEAD source behavior, not memory alone.

Test Coverage

  • Reviewed tests/unit/orchestrator/test_parallel_executor.py coverage around pipefail, status-masking output pipes, and arbitrary pipefail text.
  • Could not run pytest: /usr/bin/python3: No module named pytest. python3 -m py_compile passed for the changed source and test file.

Merge Recommendation

Do not merge until pipefail detection is ordered and scoped to the specific stripped pipeline. The current implementation still accepts a clean test claim for a pipeline whose failing upstream command can be masked before pipefail is enabled.

Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: a0d35f3
request_id: req_1779704190_12
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: APPROVE

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1208 |
| HEAD checked | 2288efa130c32b25105ebec680e707e8f7f4058c |
| Request ID | req_1779710548_26 |
| Review record | 56037304-fa2b-4f1e-a7b7-e8b1587a8c39 |

What Improved

  • Tightens runtime evidence matching so pager-style output pipelines for test commands only prove clean commands_run / tests_passed claims when pipefail is enabled before the protected pipeline.
  • Adds regression coverage for unprotected status-masking pipes, arbitrary pipefail text, and late set -o pipefail.

Issue Requirements

Requirement Status
No linked issue requirement captured N/A

Prior Findings Status

Prior bot blockers are withdrawn. The original arbitrary substring concern is addressed by exact shlex.split() based set -o pipefail recognition. The later ordering concern is addressed by _segments_after_safe_shell_preamble_with_pipefail() tracking pipefail state before each shell segment and by the regression at tests/unit/orchestrator/test_parallel_executor.py:1225.

Blockers

No in-scope blocking findings remained after policy filtering.

Follow-up Findings

# File:Line Priority Confidence Suggestion
None.

Non-blocking Suggestions

| 1 | src/ouroboros/orchestrator/parallel_executor.py:1198 | Cleanup | _uses_pipefail() is now unused after the ordered pipefail checks were introduced. Removing it would reduce duplicate pipefail helper surface. |

Test Coverage Notes

  • Reviewed changed tests in tests/unit/orchestrator/test_parallel_executor.py, including the new regressions for status-masking output pipes, arbitrary pipefail text, and set -o pipefail appearing after the pipeline.
  • Ran python3 -m py_compile src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py: passed.
  • Could not run pytest because this container lacks pytest (/usr/bin/python3: No module named pytest).

Design Notes

The change stays at the runtime evidence normalization boundary, which is the right layer for deciding whether a stripped output pipe still preserves command status.

Design / Roadmap Gate

Affected boundary: Bash runtime evidence for commands_run and tests_passed. Current HEAD ties clean test-command aliases to pipefail being enabled before the specific stripped pipeline, rejects stale or late pipefail state, and preserves compatibility for non-test command output plumbing.

Directional Notes

Maintainer memory shaped the focus on anti-fabrication integrity and reliable runtime evidence boundaries. No blocker is based on memory alone.

Test Coverage

  • Reviewed changed tests in tests/unit/orchestrator/test_parallel_executor.py, including the new regressions for status-masking output pipes, arbitrary pipefail text, and set -o pipefail appearing after the pipeline.
  • Ran python3 -m py_compile src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py: passed.
  • Could not run pytest because this container lacks pytest (/usr/bin/python3: No module named pytest).

Merge Recommendation

Approve. I found no current blocking evidence-boundary regression in the changed source; test execution remains limited only by missing local pytest.

Review-Metadata:
verdict: APPROVE
head_sha: 2288efa
request_id: req_1779710548_26
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant