[T614] Require grounded path existence evidence #281

Merged
ai21z merged 1 commit into v0.9.0-beta-dev from T614 on May 31, 2026

Conversation

ai21z (Owner) commented on May 31, 2026

Summary

  • Add a dedicated path-existence evidence obligation for read-only filename existence/status prompts
  • Require either parent directory listing evidence or direct target read attempts before accepting existence answers
  • Route path-existence partial-evidence recovery through deterministic read handoff
  • Add containment so fabricated existence answers are replaced when path evidence is missing
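The evidence rule and the containment step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class `PathExistenceEvidenceSketch`, the `ToolCall` record, and both method names are hypothetical, and only the tool names `talos.list_dir` and `talos.read_file` come from the review discussion.

```java
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch; class, record, and method names are illustrative only. */
final class PathExistenceEvidenceSketch {

    /** One recorded tool call: the tool name and the path it was aimed at. */
    record ToolCall(String tool, Path path) {}

    /**
     * Returns true when the calls include either a listing of the target's
     * parent directory or a direct read attempt on the target itself.
     */
    static boolean hasPathExistenceEvidence(Path target, List<ToolCall> calls) {
        Path normalized = target.normalize();
        // A bare filename has no parent; treat the empty (working-directory) path as its parent.
        Path parent = normalized.getParent() == null ? Path.of("") : normalized.getParent();
        for (ToolCall call : calls) {
            Path seen = call.path().normalize();
            if (call.tool().equals("talos.list_dir") && seen.equals(parent)) {
                return true;
            }
            if (call.tool().equals("talos.read_file") && seen.equals(normalized)) {
                return true;
            }
        }
        return false;
    }

    /** Containment: keep the draft answer only when grounding evidence was recorded. */
    static String contain(String draftAnswer, boolean evidencePresent) {
        return evidencePresent
                ? draftAnswer
                : "Unable to confirm whether the path exists: no directory listing or read attempt was recorded.";
    }
}
```

In this sketch a read attempt counts as evidence whether or not it succeeded, on the assumption that a failed read is itself informative about existence.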

Verification

  • RED: path-existence tests failed before implementation because the obligation did not exist
  • ./gradlew.bat test --tests "dev.talos.runtime.policy.EvidenceObligationPolicyTest" --tests "dev.talos.runtime.policy.EvidenceObligationVerifierTest" --no-daemon
  • ./gradlew.bat test --tests "dev.talos.runtime.outcome.EvidenceContainmentAnswerGuardTest" --no-daemon
  • ./gradlew.bat test --tests "dev.talos.cli.modes.ReadEvidenceHandoffTest" --no-daemon
  • ./gradlew.bat test --tests "dev.talos.runtime.policy.*" --tests "dev.talos.runtime.outcome.EvidenceContainmentAnswerGuardTest" --tests "dev.talos.cli.modes.ReadEvidenceHandoffTest" --no-daemon
  • git diff --check
  • ./gradlew.bat validateArchitectureBoundaries --no-daemon
  • ./gradlew.bat check --no-daemon

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d133f5804e

Comment on lines +144 to +145
|| lower.contains("is there")
|| lower.contains("are there");

P1: Narrow existence detection to explicit path questions

With prompts such as "Is there a bug in script.js?", TaskContractResolver still extracts script.js, but this new substring match classifies the turn as PATH_EXISTENCE_EVIDENCE_REQUIRED before the later static-web/read-target obligations run. Because that obligation can be satisfied by listing the parent directory alone, the evidence gate no longer requires reading script.js before a bug or diagnosis answer is accepted. Restrict this branch to explicit file/path existence wording rather than the generic phrases "is there" and "are there".
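One way to narrow the match, sketched under stated assumptions: the class `ExistencePromptSketch`, its method name, and the specific patterns are hypothetical and are not taken from the PR. The idea is to accept only wording that names a file, folder, directory, or path as the thing whose existence is in question.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical sketch of a narrower existence-question detector. */
final class ExistencePromptSketch {

    // Illustrative patterns only; a real detector would likely need more phrasings.
    private static final Pattern EXPLICIT_EXISTENCE = Pattern.compile(
            "\\bdoes\\s+\\S+\\s+exist\\b"                                   // "does large.log exist"
            + "|\\bis there (a|an|any) (file|folder|directory|path)\\b"      // "is there a file named ..."
            + "|\\bare there any (files|folders|directories)\\b");          // "are there any files in ..."

    /** Returns true only for prompts that explicitly ask whether a path exists. */
    static boolean isExplicitPathExistenceQuestion(String prompt) {
        String lower = prompt.toLowerCase(Locale.ROOT);
        return EXPLICIT_EXISTENCE.matcher(lower).find();
    }
}
```

With these patterns, "Is there a bug in script.js?" falls through to the later obligations, while "Does large.log exist?" and "Is there a file named config.yml?" are still classified as existence questions.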

Comment on lines +74 to 76
} else if ((obligation == EvidenceObligation.READ_TARGET_REQUIRED
|| obligation == EvidenceObligation.PATH_EXISTENCE_EVIDENCE_REQUIRED) && !protectedTarget) {
targets.add(target);

P2: Use directory evidence for existence handoff

For existence-only prompts where the model initially gathers no evidence, this branch now sends the target through the read-evidence handoff path, and ReadEvidenceHandoff maps each returned target to talos.read_file. As a result, a question like "Does large.log exist?" can pull the whole file content into the turn just to prove existence, even though the new verifier accepts parent talos.list_dir evidence. For PATH_EXISTENCE_EVIDENCE_REQUIRED, return parent-directory list-dir handoff calls instead, to avoid unnecessary content exposure and large reads.
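The suggested handoff could look roughly like the sketch below. It is an assumption-laden illustration: `ExistenceHandoffSketch`, `HandoffCall`, and `listDirHandoff` are hypothetical names, and the actual shape of ReadEvidenceHandoff is not shown in this thread. Each target is mapped to its parent directory, duplicates are collapsed, and one `talos.list_dir` call is emitted per distinct parent.

```java
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a directory-based handoff for existence-only prompts. */
final class ExistenceHandoffSketch {

    /** A planned tool call: the tool name and the path argument. */
    record HandoffCall(String tool, String path) {}

    /** Maps each target to a listing of its parent directory, without reading file content. */
    static List<HandoffCall> listDirHandoff(List<String> targets) {
        // LinkedHashSet keeps first-seen order and drops duplicate parents.
        Set<String> parents = new LinkedHashSet<>();
        for (String target : targets) {
            Path parent = Path.of(target).normalize().getParent();
            // A bare filename has no parent; list the working directory instead.
            parents.add(parent == null ? "." : parent.toString().replace('\\', '/'));
        }
        return parents.stream()
                .map(p -> new HandoffCall("talos.list_dir", p))
                .toList();
    }
}
```

Two targets in the same directory produce a single listing call, so the number of handoff calls grows with the number of directories rather than the number of files, and no file content enters the turn.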

ai21z merged commit adad179 into v0.9.0-beta-dev on May 31, 2026
1 check passed
ai21z deleted the T614 branch on May 31, 2026 at 13:14