[T614] Require grounded path existence evidence #281
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d133f5804e
    || lower.contains("is there")
    || lower.contains("are there");
Narrow existence detection to explicit path questions
With prompts such as "Is there a bug in script.js?", TaskContractResolver still extracts script.js, but this new substring match classifies the turn as PATH_EXISTENCE_EVIDENCE_REQUIRED before the later static-web/read-target obligations run. Because that obligation can be satisfied by listing only the parent directory, the evidence gate no longer requires reading script.js before a bug or diagnosis answer. Restrict this branch to explicit file/path existence wording rather than generic "is there"/"are there" phrasing.
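A minimal sketch of the narrower check, assuming the classifier sees the raw prompt text: a turn counts as an existence question only when it uses an explicit existence verb ("does X exist") or when "is there"/"are there" is followed by a path noun such as file or directory. The class name, method name, and regex patterns below are hypothetical and illustrative, not the project's actual detector.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical sketch: treat a prompt as a path-existence question only on explicit wording. */
final class ExistenceQuestionDetector {

    // Explicit existence phrasing: "does X exist", "do the files exist", "... exists?"
    private static final Pattern EXIST_VERB =
            Pattern.compile("\\b(does|do|did)\\b.*\\bexists?\\b|\\bexists\\?");

    // "is there" / "are there" only counts when followed by a path-like noun.
    private static final Pattern THERE_IS_PATH = Pattern.compile(
            "\\b(is|are) there\\s+(an?\\s+|any\\s+)?"
                    + "(file|files|folder|folders|directory|directories|path|paths)\\b");

    private ExistenceQuestionDetector() {}

    static boolean isExplicitPathExistenceQuestion(String prompt) {
        String lower = prompt.toLowerCase(Locale.ROOT);
        return EXIST_VERB.matcher(lower).find() || THERE_IS_PATH.matcher(lower).find();
    }

    public static void main(String[] args) {
        List<String> prompts = List.of(
                "Does large.log exist?",
                "Is there a file named config.yaml in src?",
                "Is there a bug in script.js?",
                "Are there any issues with index.html?");
        for (String p : prompts) {
            System.out.println(p + " -> " + isExplicitPathExistenceQuestion(p));
        }
    }
}
```

With these patterns, "Does large.log exist?" and "Is there a file named config.yaml in src?" are classified as existence questions, while "Is there a bug in script.js?" and "Are there any issues with index.html?" fall through to the later read-target obligations. Keyword patterns still admit false positives (for example "why does this exist?"), so a real fix may also want to require an extracted path target.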
} else if ((obligation == EvidenceObligation.READ_TARGET_REQUIRED
        || obligation == EvidenceObligation.PATH_EXISTENCE_EVIDENCE_REQUIRED) && !protectedTarget) {
    targets.add(target);
Use directory evidence for existence handoff
For existence-only prompts where the model initially gathers no evidence, this branch now sends the target through the read-evidence handoff path, and ReadEvidenceHandoff maps each returned target to talos.read_file. As a result, a question like "Does large.log exist?" can pull the entire file content into the turn just to prove the file exists, even though the new verifier accepts talos.list_dir evidence from the parent directory. Return parent-directory talos.list_dir handoff calls for PATH_EXISTENCE_EVIDENCE_REQUIRED to avoid unnecessary content exposure and large reads.
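One way to apply this, sketched under assumptions: the enum reuses the two obligation names from the diff, while HandoffCall and handoffCalls are hypothetical stand-ins for however ReadEvidenceHandoff actually builds its calls. Existence obligations map each target to a deduplicated talos.list_dir call on its parent directory (falling back to "." for bare filenames); other obligations keep the talos.read_file mapping.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: pick handoff tool calls per evidence obligation. */
final class ExistenceHandoffSketch {

    // Reuses two obligation names from the PR; the real enum likely has more values.
    enum EvidenceObligation { READ_TARGET_REQUIRED, PATH_EXISTENCE_EVIDENCE_REQUIRED }

    // Hypothetical handoff call: a tool name plus its path argument.
    record HandoffCall(String tool, String path) {}

    static List<HandoffCall> handoffCalls(EvidenceObligation obligation, List<String> targets) {
        List<HandoffCall> calls = new ArrayList<>();
        if (obligation == EvidenceObligation.PATH_EXISTENCE_EVIDENCE_REQUIRED) {
            // Existence only needs a listing of the parent directory, never the file content.
            // LinkedHashSet keeps first-seen order while dropping duplicate parents.
            Set<String> parents = new LinkedHashSet<>();
            for (String target : targets) {
                Path parent = Path.of(target).getParent();
                parents.add(parent == null ? "." : parent.toString());
            }
            for (String dir : parents) {
                calls.add(new HandoffCall("talos.list_dir", dir));
            }
        } else {
            // Read obligations still need the target's content.
            for (String target : targets) {
                calls.add(new HandoffCall("talos.read_file", target));
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(handoffCalls(EvidenceObligation.PATH_EXISTENCE_EVIDENCE_REQUIRED,
                List.of("large.log", "logs/app.log", "logs/error.log")));
        System.out.println(handoffCalls(EvidenceObligation.READ_TARGET_REQUIRED,
                List.of("script.js")));
    }
}
```

Deduplicating parents means several targets in the same directory cost a single listing, and no file content enters the turn for an existence-only question.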
Summary
Verification
- ./gradlew.bat test --tests "dev.talos.runtime.policy.EvidenceObligationPolicyTest" --tests "dev.talos.runtime.policy.EvidenceObligationVerifierTest" --no-daemon
- ./gradlew.bat test --tests "dev.talos.runtime.outcome.EvidenceContainmentAnswerGuardTest" --no-daemon
- ./gradlew.bat test --tests "dev.talos.cli.modes.ReadEvidenceHandoffTest" --no-daemon
- ./gradlew.bat test --tests "dev.talos.runtime.policy.*" --tests "dev.talos.runtime.outcome.EvidenceContainmentAnswerGuardTest" --tests "dev.talos.cli.modes.ReadEvidenceHandoffTest" --no-daemon
- git diff --check
- ./gradlew.bat validateArchitectureBoundaries --no-daemon
- ./gradlew.bat check --no-daemon