[Workspace]: Learn host-repo behavior from confirmed evidence, not hard-coded assumptions #1211

@rickardvh

Problem / intent

Agentic Workspace (AW) still has hard-coded rules and generic heuristics that can assume too much about a host repository. The pytest proof-selection failure in rickardvh/command-generation is one concrete symptom, not the whole problem.

The broader issue: AW should not assume how a host repo builds, tests, validates, documents, releases, routes proof, or structures ownership. Agents should learn how the repo works during setup and normal work, then preserve confirmed lessons through existing AW mechanisms such as Memory, promotion, canonical docs, config, tests/checks, or Planning where each lesson actually belongs.

This should not default to adding a new first-class field or hard-coded affordance table for every repo behavior. That would be a shortcut to correct behavior in one repo while making the product heavier and more assumption-shaped over time.

Core invariant

Host-repo behavior must be learned from confirmed repo evidence, recorded with provenance, and routed to the right AW owner. Hard-coded heuristics may suggest discovery questions or candidate routes, but must not become authoritative proof, workflow, ownership, or validation decisions.

AW must not treat host-repo facts as true merely because of filenames, keywords, language markers, or AW-source-repo path rules.

A learned repo fact needs:

  • scope: what repo, path, tool, workflow, or surface it applies to;
  • state: candidate, confirmed, stale, negative, or superseded;
  • provenance: how it was learned, by whom or what, and when;
  • owner: Memory, config, canonical docs, tests/checks, Planning, issue follow-up, or local-only scratch;
  • consumer contract: which AW behavior may rely on it.

The important distinction is not "never use heuristics." It is: heuristics can propose discovery, but confirmed evidence must own authority.
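As a rough illustration of the five parts above, a learned fact could be shaped like the record below. This is only a sketch: the names `RepoFact`, `Provenance`, `State`, `Owner`, and `may_decide` are hypothetical, and it is not a proposal for a new first-class field. Where such a record lives should still follow owner routing, for example as a Memory note.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class State(Enum):
    CANDIDATE = "candidate"
    CONFIRMED = "confirmed"
    STALE = "stale"
    NEGATIVE = "negative"
    SUPERSEDED = "superseded"


class Owner(Enum):
    MEMORY = "memory"
    CONFIG = "config"
    CANONICAL_DOCS = "canonical_docs"
    TESTS_CHECKS = "tests_checks"
    PLANNING = "planning"
    ISSUE_FOLLOW_UP = "issue_follow_up"
    LOCAL_SCRATCH = "local_scratch"


@dataclass(frozen=True)
class Provenance:
    method: str           # how it was learned (hypothetical values: "probe", "declared")
    source: str           # by whom or what
    learned_at: datetime  # when


@dataclass(frozen=True)
class RepoFact:
    claim: str                  # e.g. a command or a boundary rule
    scope: str                  # repo, path, tool, workflow, or surface
    state: State
    provenance: Provenance
    owner: Owner
    consumers: tuple[str, ...]  # consumer contract: which AW behaviors may rely on it

    def may_decide(self, consumer: str) -> bool:
        """Only a confirmed fact may drive a decision, and only for a listed consumer."""
        return self.state is State.CONFIRMED and consumer in self.consumers


# A filename heuristic yields a candidate, which proposes discovery but decides nothing.
guess = RepoFact(
    claim="uv run pytest",
    scope="rickardvh/command-generation",
    state=State.CANDIDATE,
    provenance=Provenance("filename-heuristic", "aw", datetime.now(timezone.utc)),
    owner=Owner.LOCAL_SCRATCH,
    consumers=("proof_selection",),
)
print(guess.may_decide("proof_selection"))  # False
```

The point of `may_decide` is that authority comes from state plus consumer contract, never from the mere existence of a record.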

Existing owner surfaces

Memory already exists to preserve durable repo knowledge that is expensive to reconstruct: invariants, authority boundaries, recurring traps, operator runbooks, routing hints, and verified failure lessons. It is route-indexed and can promote symptomatic notes into better docs, tests, validation, config, or planning instead of treating memory volume as inherently good.

For host-repo learning, AW should first ask:

  • Is this a durable repo fact that belongs in Memory?
  • Is this an active work item that belongs in Planning?
  • Is this a stable command or policy that belongs in repo docs or config?
  • Is this a symptom that should become a test, check, or canonical documentation fix?
  • Is this merely a failed guess that should be recorded as negative evidence long enough to avoid repeating it?

Only if those homes are insufficient should AW introduce a new durable surface.
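The questions above can be read as a routing check. The sketch below is one possible shape, with hypothetical names (`Lesson`, `existing_homes`, `may_consider_new_surface`). It returns every matching home rather than picking one, because this issue does not define precedence between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    summary: str
    durable_repo_fact: bool = False
    active_work_item: bool = False
    stable_command_or_policy: bool = False
    symptom: bool = False
    failed_guess: bool = False


def existing_homes(lesson: Lesson) -> list[str]:
    """Return the existing owner surfaces that could hold this lesson."""
    checks = [
        (lesson.durable_repo_fact, "Memory"),
        (lesson.active_work_item, "Planning"),
        (lesson.stable_command_or_policy, "repo docs or config"),
        (lesson.symptom, "test, check, or canonical docs fix"),
        (lesson.failed_guess, "negative evidence"),
    ]
    return [home for applies, home in checks if applies]


def may_consider_new_surface(lesson: Lesson) -> bool:
    """A new durable surface is only on the table when no existing home fits."""
    return not existing_homes(lesson)


absent = Lesson("pytest is not installed in this repo", failed_guess=True)
print(existing_homes(absent))            # ['negative evidence']
print(may_consider_new_surface(absent))  # False
```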

Current assumption pattern to remove

One current proof-selection pattern treats Python-looking files as enough to select pytest:

  • pyproject.toml, pytest.ini, or tests/ can imply Python test capability;
  • that can become uv run pytest;
  • AW-source-repo proof lanes such as workspace_cli can still shape host-repo proof selection.

That is the wrong trust boundary. These signals may justify discovery, but not an authoritative required command unless pytest is declared, configured, previously confirmed, or successfully probed in the target environment.

This pattern is an example of the broader issue: generic repo heuristics and AW-source-repo semantics should not silently become host-repo policy.

Example symptom

During setup of rickardvh/command-generation, AW selected uv run pytest as required proof for a dependency-source-only change in pyproject.toml and uv.lock.

That repo was a Python project, but it had no pytest installed, no tests, and no repo-specific proof-route hint. The command failed with:

error: Failed to spawn: `pytest`
  Caused by: program not found

The proof explanation also classified the change under a generic workspace_cli lane with reasons such as root workspace CLI changes, even though the target repo was not the AW source repo and the change only switched the AW dev dependency from a local checkout to a Git source.

Why this matters

AW is meant to be a repo-native continuity and execution layer, not a bundle of assumptions about how all repositories behave. A fresh host repo should start with conservative discovery and promotion paths, not with AW-source-repo or Python-project heuristics.

The agent should be able to learn and preserve facts such as:

  • build and test commands that actually exist;
  • package managers and lockfile semantics;
  • dependency verification routes;
  • CLI smoke checks;
  • ownership or path boundaries;
  • generated-file and source-of-truth rules;
  • docs and release expectations;
  • proof commands that have been confirmed live;
  • commands that are known absent and should not be suggested again as confirmed proof.

The important design constraint is that these lessons should land in the right long-term home, not in a new product-level shortcut just because one repo needs it.

Desired behavior

For a fresh or unfamiliar repo, AW should:

  • inspect confirmed repo capabilities before selecting executable proof or workflow routes;
  • distinguish generic language/project hints from confirmed repo-local knowledge;
  • expose guessed routes as candidates, not facts;
  • avoid assuming pytest, npm, cargo, make, CI, or AW source-repo semantics merely from filenames;
  • ask, inspect, probe safely, or emit discovery tasks when repo affordances are unknown;
  • use Memory and promotion paths for durable repo-specific knowledge;
  • promote stable lessons into canonical docs, config, tests, checks, or Planning when those are better homes than Memory;
  • keep generic heuristics as fallback hints, not authoritative routes;
  • make unknown or unconfirmed assumptions visible instead of silently treating them as facts.
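To make the candidate-versus-fact split concrete, here is a hedged sketch of proof selection that consults confirmed repo knowledge first and only surfaces heuristic guesses as labeled candidates. `ProofPlan` and `select_proof` are hypothetical names, not existing AW APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ProofPlan:
    required: list[str] = field(default_factory=list)    # backed by confirmed evidence
    candidates: list[str] = field(default_factory=list)  # guesses, visibly unconfirmed
    discovery: list[str] = field(default_factory=list)   # follow-up tasks or questions


def select_proof(confirmed: set[str], heuristic_guesses: list[str]) -> ProofPlan:
    """Confirmed commands are required; guesses become candidates plus discovery tasks."""
    plan = ProofPlan(required=sorted(confirmed))
    for command in heuristic_guesses:
        if command in confirmed:
            continue  # already authoritative, nothing to discover
        plan.candidates.append(command)
        plan.discovery.append(f"confirm whether `{command}` exists in this repo")
    if not plan.required:
        plan.discovery.append("manual verification: no confirmed proof route yet")
    return plan


# A fresh repo with nothing confirmed: the pytest guess is shown, not required.
plan = select_proof(confirmed=set(), heuristic_guesses=["uv run pytest"])
print(plan.required)    # []
print(plan.candidates)  # ['uv run pytest']
```

The design choice here is that an empty `required` list is an acceptable outcome for an unfamiliar repo, paired with explicit discovery work, rather than something to fill with a guess.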

Full intent satisfaction

This issue is complete only when AW has a general host-repo learning posture that uses existing Memory/promotion/canonical-owner mechanisms before adding new first-class state, and proof selection is one consumer of that posture.

Required final state:

  • setup or first-use flows can surface unknown repo behavior without turning setup into a large questionnaire;
  • confirmed repo lessons have clear owner routing: Memory, canonical docs, config, tests/checks, Planning, issue follow-up, or local-only scratch;
  • learned routes include provenance: how they were confirmed, when, and what they apply to;
  • unconfirmed heuristics are visibly labeled as candidates;
  • authoritative decisions cite confirmed repo evidence or host config;
  • proof selection consults confirmed repo knowledge before generic language heuristics;
  • generic heuristics remain fallback hints, not authoritative routes;
  • failed or absent commands can be recorded or routed so AW avoids repeating them as confirmed proof;
  • the command-generation pytest case receives dependency/runtime proof plus manual verification rather than a failing pytest requirement;
  • the mechanism is generic enough to apply beyond proof selection;
  • no new first-class product surface is added unless existing Memory/promotion/config/doc/check homes are shown insufficient.
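For the failed-or-absent-command requirement, one possible shape is a small negative-evidence log with an expiry, so a failed guess is remembered long enough to avoid repeating it without becoming permanent policy. The names `NegativeEvidence` and `NegativeEvidenceLog` and the 30-day window are assumptions for illustration. In practice the entries would live in whichever existing owner surface fits, not necessarily in a new store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class NegativeEvidence:
    command: str
    reason: str
    observed_at: datetime


class NegativeEvidenceLog:
    def __init__(self, ttl: timedelta) -> None:
        self._ttl = ttl
        self._entries: dict[str, NegativeEvidence] = {}

    def record(self, command: str, reason: str, observed_at: datetime) -> None:
        """Remember that a command failed or was absent, with when and why."""
        self._entries[command] = NegativeEvidence(command, reason, observed_at)

    def blocks(self, command: str, now: datetime) -> bool:
        """True while the failure is recent enough that the command must not be re-suggested as confirmed."""
        entry = self._entries.get(command)
        return entry is not None and now - entry.observed_at < self._ttl


log = NegativeEvidenceLog(ttl=timedelta(days=30))  # window length is an arbitrary example
now = datetime.now(timezone.utc)
log.record("uv run pytest", "Failed to spawn: `pytest` (program not found)", now)
print(log.blocks("uv run pytest", now))  # True
```

Expiry matters because the repo can change: once pytest is added, a stale negative entry should not keep suppressing a route that new evidence could confirm.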

Bounded slice success

Useful partial slices may land when they reduce generic-host assumptions, for example:

  • proof selection stops requiring pytest unless pytest is confirmed;
  • setup emits an affordance-discovery or learning prompt that routes findings into Memory/promotion homes;
  • Memory guidance clarifies where proof routes, failed guesses, and operator commands belong;
  • a first consumer, such as proof selection, switches from assumptions to confirmed repo knowledge;
  • failed proof commands are captured as negative evidence or routed to the right owner;
  • docs clarify how agents should learn and promote host-repo behavior without hard-coding it.

A partial slice may merge, but it should not close this issue unless the general host-repo learning posture exists or a follow-up owner explicitly carries the remaining intent.

Closure limits

partial_pr_may_close: no

Closure blockers:

  • the fix only special-cases pytest;
  • the fix only adds more hard-coded heuristics;
  • AW still assumes build/test/proof routes without confirmation;
  • learned repo facts bypass Memory/promotion and become product-level hard-coded shortcuts without justification;
  • the mechanism cannot apply outside proof selection;
  • there is no way to distinguish confirmed repo knowledge from guessed fallback;
  • an issue-closing PR proves command success but not the broader host-repo learning invariant.

Reproduction for the motivating symptom

In rickardvh/command-generation, after initializing a fresh uv Python project and AW:

  1. Change [tool.uv.sources].agentic-workspace from a local path to git = "https://github.com/rickardvh/agentic-workspace.git", branch = "master".
  2. Run:
     uv run agentic-workspace implement --changed pyproject.toml --changed uv.lock --task "Switch Agentic Workspace dev dependency from local checkout to git source" --format json
  3. AW selects:
     uv run pytest
  4. Running it fails because pytest is not installed in that repo.

Proof required

Closing evidence should include:

  • tests showing proof selection does not require pytest unless repo evidence confirms it;
  • tests or fixtures for an unfamiliar host repo where AW emits discovery/manual verification rather than assumed routes;
  • a Memory/promotion/canonical-owner path for confirmed repo-specific proof rules and negative evidence;
  • at least one negative-evidence path showing a failed/absent command is not suggested again as confirmed proof;
  • documentation describing how agents should learn, update, promote, and rely on host-repo affordances;
  • explicit reasoning if any new first-class surface is introduced despite existing Memory/promotion homes.

Required residual intent

If a PR only fixes the pytest symptom, the remaining intent stays here: remove hard-coded host-repo assumptions by making AW learn and promote how each repo actually works through the right existing owner surfaces.

Metadata

    Labels

    bug (Something isn't working), dogfooding, status/deferred (Deferred; do not select for near-term implementation), workflow (Agent workflow friction, confusing instructions, or handoff failures)
