Harden public reporting and leak gates #11
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dfbb7ae52b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
for term in DENIED_PATH_TERMS:
    if term in normalized:
        return f"denied path term {term}"
Narrow denylist term matching to path components
The new path gate uses substring checks (if term in normalized) for generic terms like key and token, so benign filenames such as src/monkeypatch.py or docs/tokenization.md will be flagged as leaks even when file contents are safe. Because make leak-check runs this script over tracked files, this can block normal repo changes with false positives. Match on path components/word boundaries (or stricter patterns) instead of raw substring inclusion.
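A minimal sketch of component-level matching follows. It assumes a hypothetical `denied_path_reason` helper and a placeholder `DENIED_PATH_TERMS` tuple, because the PR's real term list and function name are not shown in the review. Each path component is split into words on non-alphanumeric separators, and a term matches only when it equals a whole word:

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical denylist; the real DENIED_PATH_TERMS contents are not shown in the review.
DENIED_PATH_TERMS = ("key", "token", "secret")


def denied_path_reason(path: str) -> Optional[str]:
    """Return a reason if any path component contains a denied term as a whole word."""
    normalized = path.replace("\\", "/").lower()
    for part in PurePosixPath(normalized).parts:
        # Split a component like "api_key.txt" into {"api", "key", "txt"}.
        words = set(re.split(r"[^a-z0-9]+", part))
        for term in DENIED_PATH_TERMS:
            if term in words:
                return f"denied path term {term}"
    return None
```

With this rule, `src/monkeypatch.py` and `docs/tokenization.md` pass, while `config/api_key.txt` is still flagged. The trade-off is that whole-word matching can under-match (for example `apiKey.txt` lowercases to the single word `apikey`), so the term list may need explicit variants for such names.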
safe_key = str(key) if is_public_safe_text(str(key)) else REDACTED_KEY
redacted[safe_key] = redact_obj(value)
Preserve all mapping entries when redacting unsafe keys
In redact_obj, every unsafe mapping key is replaced with the same literal ([REDACTED_KEY]), so multiple unsafe keys in one object collide and later values overwrite earlier ones. This silently drops data (e.g., api_key and token both collapse into a single entry), which can make redacted diagnostics incomplete or misleading. The redaction should give each redacted key a distinct name so that no entry is lost to a key collision.
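One way to avoid the collision, sketched under assumptions: `is_public_safe_text` below is a hypothetical substring-based stand-in for the PR's real check, and `REDACTED_VALUE` and `_unique_key` are illustrative names not taken from the PR. Each key, safe or redacted, is passed through a helper that appends `_2`, `_3`, and so on until it no longer clashes with an entry already written:

```python
from typing import Any, Dict

REDACTED_KEY = "[REDACTED_KEY]"
REDACTED_VALUE = "[REDACTED]"  # hypothetical placeholder for unsafe string values


def is_public_safe_text(text: str) -> bool:
    # Hypothetical stand-in for the PR's real check: reject text mentioning sensitive terms.
    lowered = text.lower()
    return not any(term in lowered for term in ("key", "token", "secret"))


def _unique_key(base: str, taken: Dict[str, Any]) -> str:
    # Append _2, _3, ... until the key no longer collides with an existing entry.
    candidate, n = base, 1
    while candidate in taken:
        n += 1
        candidate = f"{base}_{n}"
    return candidate


def redact_obj(obj: Any) -> Any:
    if isinstance(obj, dict):
        redacted: Dict[str, Any] = {}
        for key, value in obj.items():
            text = str(key)
            base = text if is_public_safe_text(text) else REDACTED_KEY
            # Every original entry gets its own slot, so no value is overwritten.
            redacted[_unique_key(base, redacted)] = redact_obj(value)
        return redacted
    if isinstance(obj, (list, tuple)):
        return [redact_obj(item) for item in obj]
    if isinstance(obj, str) and not is_public_safe_text(obj):
        return REDACTED_VALUE
    return obj
```

Because the same helper handles safe keys, it also covers the less obvious case where two distinct keys stringify identically (for example `1` and `"1"`).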
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return [item.decode("utf-8") for item in result.stdout.split(b"\0") if item]
Handle non-UTF8 tracked paths when reading git output
git ls-files -z emits raw path bytes, but this code decodes each entry as strict UTF-8 and does not catch UnicodeDecodeError. In a repository that tracks any filename that is not valid UTF-8 (which Git allows), leak-check crashes before scanning a single file, so the public gate becomes unavailable. Decode with errors="surrogateescape" (or similar) to keep scanning robust.
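A minimal sketch of the suggested fix, assuming hypothetical helper names (`decode_nul_separated`, `list_tracked_files`) since the PR's function name is not visible in the excerpt. The `surrogateescape` error handler maps each undecodable byte to a lone surrogate, so the name survives decoding and can be encoded back to its exact original bytes:

```python
import subprocess
from typing import List, Optional


def decode_nul_separated(raw: bytes) -> List[str]:
    # surrogateescape turns each invalid byte into a lone surrogate (U+DC80..U+DCFF),
    # so non-UTF-8 names decode without raising and round-trip losslessly.
    return [item.decode("utf-8", errors="surrogateescape") for item in raw.split(b"\0") if item]


def list_tracked_files(repo: str = ".") -> Optional[List[str]]:
    try:
        result = subprocess.run(
            ["git", "ls-files", "-z"],
            cwd=repo,
            capture_output=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return decode_nul_separated(result.stdout)
```

One follow-on to keep in mind: a string containing lone surrogates raises UnicodeEncodeError if later encoded as strict UTF-8, so reporting code that writes these names out may want `errors="backslashreplace"` or `repr()`.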
Summary:
Issue:
Test plan: