Goal
Add a hard privacy policy blocker:
NEVER EVER EVER EVER EVER allow private repositories to be surfaced if github-stars is running in a public repository.
This is a non-negotiable app invariant. If the output surface is public, private repo data must not appear in generated files, workflow summaries, logs, artifacts, issues, PR comments, Pages output, classification batches, provenance records, or diagnostic reports.
Parent: #69
Related: #42, #54, #71, #73
Why this exists
github-stars is intended to publish or expose a star catalog from a repo that may itself be public. The app may eventually support authenticated/private star fetch paths, broad GitHub App permissions, setup diagnostics, artifact provenance, and agent routing.
That creates a specific leak hazard:
authenticated/private input source
-> public repository run/output context
-> generated catalog/log/artifact/issue leak
This must hard-fail, not merely warn.
Policy Invariant
If output_repository.visibility == public:
private_repository_records MUST NOT be surfaced.
Where surfaced means written, printed, summarized, classified, routed, cached, uploaded, published, or committed anywhere visible from the public repo context.
Required Behavior
1. Public repo mode must quarantine private inputs
When github-stars runs in a public repository:
private repo slug -> do not print
private repo metadata -> do not print
private repo count -> allowed only as aggregate count if no identifiers leak
private repo classification -> disabled
private repo generated output -> prohibited
private repo artifact upload -> prohibited
private repo issue/PR/comment output -> prohibited
private repo Pages output -> prohibited
Allowed public-safe aggregate example:
private_repos_omitted: 12
Forbidden examples:
private_repos_omitted:
- owner/private-repo
Skipped private repo owner/private-repo
Classified private repo owner/private-repo as security/dev-tools
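The quarantine rule above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `StarRecord` and `quarantine_for_public` are hypothetical names, and the sketch assumes each fetched record already carries a reliable `private` flag.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class StarRecord:
    """Hypothetical record shape: a repo slug plus its private flag."""
    slug: str
    private: bool


def quarantine_for_public(
    records: Iterable[StarRecord],
) -> Tuple[List[StarRecord], Dict[str, int]]:
    """Drop private records and return only an aggregate omission count."""
    records = list(records)
    safe = [r for r in records if not r.private]
    # Only the count leaves this function; no slug or metadata of a
    # private record is copied into the summary.
    summary = {"private_repos_omitted": len(records) - len(safe)}
    return safe, summary
```

Because the summary is built from a count alone, a caller cannot accidentally print a private slug while explaining what was skipped.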
2. Auth resolver must enforce visibility boundary
The auth resolver from #69 must distinguish:
star_fetch_auth = public | pat | github_app_user | github_token | disabled
output_visibility = public | private | internal/unknown
private_repo_surface_policy = block | allow_private_context_only
Public output context requires:
private_repo_surface_policy=block
If authenticated/private data is fetched while output visibility is public, the run must either:
filter private repos before any surface
or fail closed before writing/logging/uploading/publishing anything private-derived
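One possible shape for the visibility boundary, as a sketch under stated assumptions: `resolve_surface_policy`, `enforce_boundary`, and `FailClosed` are hypothetical names, and the sketch assumes that any visibility value other than `public` or `private` (including `internal` and `unknown`) fails closed.

```python
class FailClosed(RuntimeError):
    """Raised to stop the run before anything private-derived is surfaced."""


def resolve_surface_policy(output_visibility: str) -> str:
    """Map output visibility to a private_repo_surface_policy value."""
    if output_visibility == "public":
        return "block"
    if output_visibility == "private":
        return "allow_private_context_only"
    # Assumption: anything not positively known fails closed.
    raise FailClosed("output visibility could not be determined")


def enforce_boundary(output_visibility: str, private_record_count: int) -> None:
    """Fail closed if private records remain after filtering in block mode."""
    policy = resolve_surface_policy(output_visibility)
    if policy == "block" and private_record_count > 0:
        # The message carries a count only, never an identifier.
        raise FailClosed(
            f"{private_record_count} private record(s) reached a public surface boundary"
        )
```

Calling `enforce_boundary` after filtering and before the first write, log, or upload gives the run a single choke point at which it either proceeds with zero private records or stops.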
3. Setup doctor must report policy state safely
primeinc-stars-yoshi-doctor must report:
output_repository_visibility=public|private|unknown
private_repo_surface_policy=block|allow_private_context_only
private_repo_identifiers_printed=false
private_repo_count=<aggregate only, optional>
No private identifiers may appear in setup doctor output when output repo is public.
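A sketch of how the doctor output might be assembled so that identifiers cannot reach it by construction. `doctor_report` is a hypothetical helper, not the real `primeinc-stars-yoshi-doctor` interface, and reporting `block` for unknown visibility is an assumption consistent with failing closed.

```python
from typing import List, Optional


def doctor_report(
    output_visibility: str, private_repo_count: Optional[int] = None
) -> List[str]:
    """Build the policy-state lines; accepts a count, never identifiers."""
    visibility = (
        output_visibility if output_visibility in ("public", "private") else "unknown"
    )
    # Assumption: unknown visibility is reported under the blocking policy.
    policy = "allow_private_context_only" if visibility == "private" else "block"
    lines = [
        f"output_repository_visibility={visibility}",
        f"private_repo_surface_policy={policy}",
        "private_repo_identifiers_printed=false",
    ]
    if private_repo_count is not None:
        lines.append(f"private_repo_count={int(private_repo_count)}")
    return lines
```

The function signature has no parameter through which a slug could arrive, which is what keeps `private_repo_identifiers_printed=false` honest in this sketch.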
4. Generated artifact registry must mark public/private safety
Any generated artifact registry from #69 must include a public-safety classification:
artifact
visibility_surface
may_contain_private_identifiers
private_safe_for_public_repo
producer
validation_gate
Public repo runs may only publish artifacts where:
private_safe_for_public_repo=true
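The registry fields listed above might map onto a record type along these lines. This is an illustrative sketch: `ArtifactEntry` and `publishable_in_public_repo` are hypothetical names, the artifact file names in the usage are made up, and treating "may contain private identifiers" as disqualifying for public safety is a deliberately conservative assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ArtifactEntry:
    artifact: str
    visibility_surface: str
    may_contain_private_identifiers: bool
    private_safe_for_public_repo: bool
    producer: str
    validation_gate: str

    def __post_init__(self) -> None:
        # An entry cannot claim public safety while admitting it may
        # hold private identifiers.
        if self.may_contain_private_identifiers and self.private_safe_for_public_repo:
            raise ValueError(
                f"{self.artifact}: cannot be public-safe and may-contain-private"
            )


def publishable_in_public_repo(entries: Iterable[ArtifactEntry]) -> List[ArtifactEntry]:
    """Return only entries explicitly marked safe for a public repo."""
    return [e for e in entries if e.private_safe_for_public_repo]
```

For example, an entry such as `ArtifactEntry("catalog.json", "pages", False, True, "generator", "sentinel-scan")` would pass the gate, while a raw fetch dump marked `may_contain_private_identifiers=True` would be held back.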
5. Classifier must refuse private repo candidates in public mode
The #71 classifier path must not receive private repo identifiers or metadata when output context is public.
Hard rule:
private repo records cannot enter model prompt/input if output repo is public
This avoids both output leaks and model-prompt leakage. Letting an LLM see private repo names and then asking it politely not to leak them is not a policy. It is a haunted pinky promise.
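As a sketch of the hard rule, a guard at the classifier boundary could refuse the whole batch rather than trust upstream filtering. `build_classifier_batch` is a hypothetical name, the dict-based record shape is an assumption, and a record with no `private` flag is treated as private so that missing data fails closed.

```python
from typing import Dict, Iterable, List


def build_classifier_batch(
    records: Iterable[Dict[str, object]], output_visibility: str
) -> List[Dict[str, object]]:
    """Build model input; refuse if a private record is present outside a private context."""
    records = list(records)
    if output_visibility != "private":
        # A missing flag counts as private (fail closed).
        if any(r.get("private", True) for r in records):
            # The error names no repository.
            raise PermissionError(
                "private record reached classifier input outside a private context"
            )
    return [{"slug": r["slug"]} for r in records]
```

This guard is meant as a second line of defense behind filtering: if it ever fires, the filter upstream has a bug, and the run stops before any prompt is assembled.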
Required Tests
Add tests for:
public output repo + private input repo -> private identifier omitted
public output repo + private input repo -> artifact excludes private slug
public output repo + private input repo -> workflow summary excludes private slug
public output repo + private input repo -> classifier batch excludes private slug
public output repo + private input repo -> logs do not contain private slug
public output repo + private input repo -> issue/PR/router payload excludes private slug
public output repo + aggregate count -> allowed
private output repo + private input repo -> allowed only if policy explicitly permits
unknown output visibility -> fail closed
Add at least one forbidden sentinel fixture:
owner/private-sentinel-repo-do-not-leak
The test must fail if that string appears in any public-mode output, summary, artifact, classifier input, or router payload.
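A sentinel check could be as small as the helper below. It is a sketch only: `find_sentinel_leaks` and the surface names are hypothetical, and the case-insensitive match is an assumption made so that a re-cased slug still counts as a leak. A plain substring scan will not catch encoded or split forms of the slug, so it complements filtering rather than replacing it.

```python
from typing import Dict, List

SENTINEL = "owner/private-sentinel-repo-do-not-leak"


def find_sentinel_leaks(surfaces: Dict[str, str]) -> List[str]:
    """Return the names of surfaces whose text contains the sentinel slug."""
    needle = SENTINEL.lower()
    return sorted(name for name, text in surfaces.items() if needle in text.lower())
```

A test would collect every public-mode surface (output files, workflow summary, artifact contents, classifier input, router payload) into the `surfaces` mapping after a run that included the sentinel as a private input, then assert that the returned list is empty.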
Acceptance Criteria
- A formal policy exists in code or config for private repo surfacing.
- Public output repo mode defaults to hard block.
- Unknown output repo visibility fails closed.
- Private repo identifiers are filtered/quarantined before prompt/model/classifier input.
- Workflow summaries may show aggregate private omission counts but never private repo identifiers.
- Generated artifacts are marked public-safe before upload/commit/publish.
- Issue/PR/agent routing payloads are sanitized in public mode.
- Tests include sentinel private repo leak checks.
- AGENTS.md or control-plane docs state this as a hard blocker.
- Completion proof includes a test showing the sentinel private repo name does not surface.
Proof Required
Completion comment must include:
- PR URL or commit SHA.
- Test output for private leak sentinel fixture.
- Workflow/setup-doctor summary excerpt showing public mode and private_repo_surface_policy=block.
- Example sanitized aggregate output.
- Confirmation that classifier/model input excludes private repo identifiers in public mode.
- Confirmation that generated artifacts cannot be marked public-safe if they contain private identifiers.
Evidence Labels for Implementer
Use these labels in the completion report:
- Direct evidence: source diff, test fixture, test output, workflow summary, artifact validation output.
- Weak inference: a field may contain private-derived information but no identifiers; justify with sanitizer evidence.
- Unsupported: claiming private data is protected without a sentinel leak test.
- Contradicted: any private repo slug appears in public-mode logs, summaries, artifacts, prompts, generated files, issues, PR comments, or Pages output.
- Blocked: output repo visibility cannot be determined and fail-closed behavior triggers.
Non-Goals
- Do not remove support for private/authenticated star fetching in private output contexts.
- Do not expose private repo names just to explain that they were skipped.
- Do not rely on LLM instructions to suppress private identifiers after they enter prompts.
- Do not allow a warning-only mode for public output contexts.
Definition of Done
When github-stars runs in a public repository, private repositories are never surfaced. The pipeline either filters private records before any public surface or fails closed before output, and the sentinel leak test proves it.