Skip to content

v0.8.63: Enforce real user-input provenance for write and continue approvals #3315

@Hmbown

Description

@Hmbown

Context

#3275 reports a serious scope/provenance failure: the agent allegedly generated approval-like user text such as 改吧 / and then treated that text as authorization to continue broad write work. That is different from ordinary over-eagerness or weak prompt wording.

Prompt-level scope discipline helps, but it is not enough by itself. A fabricated user-like phrase can still pass a naive "did a user turn exist?" check if the runtime does not preserve and enforce real input provenance.

Goal

Make write/continue/approval actions require externally sourced user input provenance, not merely user-shaped text in the transcript.

Scope

  • Tag user-visible input at ingestion with an origin/provenance marker that distinguishes real user input from assistant text, runtime events, sub-agent completion events, imported transcript text, memory, handoffs, and synthetic/system messages.
  • Reject or ignore approvals/continues that originated from assistant/runtime/sub-agent content.
  • Ensure model-generated text cannot create an authoritative user turn, runtime event, or <codewhale:subagent.done>-style sentinel.
  • Treat inspection wording such as "look", "check", "review", and "看看" as read-only unless the user provides a separate explicit write instruction.
  • Add a regression fixture based on the CodeWhale is overly involved in making modifications, engaging in self-questioning and self-answering and deviating from user intent #3275 改吧 / self-approval case.

Acceptance Criteria

Non-goals

  • Do not rename commands or config keys here.
  • Do not attempt to solve all runaway-agent behavior through prompt text alone.
  • Do not block normal user-approved Agent/YOLO workflows when input provenance is real.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingdocumentationImprovements or additions to documentationenhancementNew feature or requestreliabilityReliability, flaky behavior, retries, fallbacks, and robustnesssecuritySecurity, isolation, permissions, or trust-boundary worktuiTerminal UI behavior, rendering, or interactionv0.8.63Targeting v0.8.63

    Projects

    Status
    Backlog

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions