Conversation
Converted file retrieval methods in PhabricatorPatch and related interfaces to async, replacing synchronous HTTP calls with httpx.AsyncClient.
Trusted users are now: - Mozilla Corporation employees, OR - Users with editbugs privilege who have been active within last year This prevents dormant accounts with editbugs from being considered trusted.
Refactor metadata filtering into a SanitizedBug class that inherits from Bug and overrides properties to return sanitized values when no trusted user has commented. When there's no trusted comment, the following are redacted: - Bug title/summary - Reporter name and email - Assignee name and email The trust check uses cached_property to avoid redundant API calls. Bug.get() now returns SanitizedBug by default for security.
Refactor metadata filtering into a SanitizedPhabricatorPatch class that inherits from PhabricatorPatch and overrides properties to return sanitized values when no trusted user has commented. When there's no trusted comment, the following are redacted: - Revision title - Author name and email - Summary and test plan - Diff author (if different from revision author) - Stack graph titles The trust check uses cached_property to avoid redundant API calls. PhabricatorReviewData.get_patch_by_id() now returns SanitizedPhabricatorPatch.
This is now the full list from BMO
Automatically trust comments created before 2022 as prompt injection was not a concern at that time. This applies to both comment content validation and metadata redaction policies.
Ensure collapsed/admin tagged comments are excluded from all trust validation logic, not just the timeline output: - Skip when finding last trusted comment - Skip when building trusted user cache - Skip when checking for any trusted comment
Add comprehensive test coverage for: - Admin-tagged comments being completely disregarded - Pre-2022 comments being automatically trusted - All collapsed tags (spam, abuse, nsfw, etc.) filtering These tests validate the security policy changes implemented in previous commits.
suhaibmujahid
left a comment
There was a problem hiding this comment.
Thank you! It looks good to me, with a few comments.
| return PhabricatorPatch(patch_id) | ||
| return SanitizedPhabricatorPatch(patch_id) |
There was a problem hiding this comment.
This is not used by the MCP, so we do not need to change it.
We should instead change it in the mcp server:
bugbug/mcp/src/bugbug_mcp/server.py
Lines 123 to 126 in 23ae982
| MOZILLA_CORP_GROUP_ID = 42 | ||
| EDITBUGS_GROUP_ID = 9 | ||
| EDITBUGS_CUTOFF_DAYS = 365 | ||
| TRUST_BEFORE_DATE = datetime(2022, 1, 1, tzinfo=timezone.utc) |
There was a problem hiding this comment.
I recall that we agreed not to use the activity data since it is not useful.
| group_ids = {g.get("id") for g in groups} | ||
|
|
||
| # Check if user is in mozilla-corporation group | ||
| is_moco = MOZILLA_CORP_GROUP_ID in group_ids |
There was a problem hiding this comment.
IIRC, all ppl in MOZILLA_CORP_GROUP are inherently in EDITBUGS_GROUP, so we do not need to check both; we only need EDITBUGS_GROUP. Especially since we will not use the activity as a condition for ppl in EDITBUGS_GROUP.
| assert bug_data["id"] == bug_id | ||
|
|
||
| return Bug(bug_data) | ||
| return SanitizedBug(bug_data) |
There was a problem hiding this comment.
In this way, only SanitizedBug will be used, even in places where we do not want to perform sanitization.
To target the mcp, we should modify it's method to use SanitizedBug instead of. Bug, here:
bugbug/mcp/src/bugbug_mcp/server.py
Lines 107 to 110 in 23ae982
With that, this method should become a class method (i.e., @classmethod) instead of a static method (i.e., @staticmethod). Then we use cls:
| return SanitizedBug(bug_data) | |
| return cls(bug_data) |
| return self._revision_metadata["fields"].get("summary") or None | ||
|
|
||
| @property | ||
| def test_plan_display(self) -> str | None: |
There was a problem hiding this comment.
Why are we using the _display postfix for some properties?
| if not user_agent: | ||
| user_agent = get_user_agent() | ||
|
|
||
| # This is awkward since PhabricatorAPI does not accept user agent directly | ||
| set_default_value("User-Agent", "name", user_agent) | ||
|
|
There was a problem hiding this comment.
It seems a rebase error
| if not user_agent: | |
| user_agent = get_user_agent() | |
| # This is awkward since PhabricatorAPI does not accept user agent directly | |
| set_default_value("User-Agent", "name", user_agent) |
| current_phid = revision["phid"] | ||
| patch_map = { | ||
| phid: ( | ||
| self | ||
| if phid == current_phid | ||
| else PhabricatorPatch(revision_phid=phid) | ||
| ) | ||
| for phid in stack_graph.keys() | ||
| } | ||
| current_phid = self._revision_metadata["phid"] | ||
| revision_ids = {} | ||
| for phid in self.stack_graph.keys(): | ||
| if phid == current_phid: | ||
| revision_ids[phid] = self.revision_id | ||
| else: | ||
| revision_ids[phid] = PhabricatorPatch( | ||
| revision_phid=phid | ||
| ).revision_id | ||
|
|
||
| for phid, dependencies in self.stack_graph.items(): | ||
| from_id = f"D{revision_ids[phid]}" | ||
| stack_title = self.get_stack_patch_title(phid) | ||
|
|
||
| for phid, dependencies in stack_graph.items(): | ||
| from_patch = patch_map[phid] | ||
| from_id = f"D{from_patch.revision_id}" | ||
| if phid == current_phid: | ||
| md_lines.append( | ||
| f" {from_id}[{from_patch.patch_title} - CURRENT]" | ||
| ) | ||
| md_lines.append(f" {from_id}[{stack_title} - CURRENT]") | ||
| md_lines.append(f" style {from_id} fill:#105823") | ||
| else: | ||
| md_lines.append(f" {from_id}[{from_patch.patch_title}]") | ||
| md_lines.append(f" {from_id}[{stack_title}]") | ||
|
|
||
| for dep_phid in dependencies: | ||
| dep_id = f"D{patch_map[dep_phid].revision_id}" | ||
| dep_id = f"D{revision_ids[dep_phid]}" | ||
| md_lines.append(f" {from_id} --> {dep_id}") |
There was a problem hiding this comment.
I do not see the goal of the change here. The title should be sanitized automatically by using the patch_title property without any changes to this part.
|
|
||
| stack_graph = revision["fields"].get("stackGraph") | ||
| if len(stack_graph) > 1: | ||
| if len(self.stack_graph) > 1: |
There was a problem hiding this comment.
It would be great to minimize the changes where possible:
| if len(self.stack_graph) > 1: | |
| stack_graph = revision["fields"].get("stackGraph") | |
| if len(stack_graph) > 1: |
This will also minimize changes in later lines.
| def __getattr__(self, name: str): | ||
| if name.startswith("_"): | ||
| raise AttributeError( | ||
| f"'{type(self).__name__}' object has no attribute '{name}'" | ||
| ) | ||
| value = self._metadata.get(name) | ||
| if value is None and name in ( | ||
| "keywords", | ||
| "cc", | ||
| "blocks", | ||
| "depends_on", | ||
| "regressed_by", | ||
| "duplicates", | ||
| "see_also", | ||
| ): | ||
| return [] | ||
| return value |
There was a problem hiding this comment.
I'm not fond of Python magic, but it's ok for now. We can look for a better method later.
| ) | ||
|
|
||
| def user_handler(user, data): | ||
| email = user.get("name", "").lower() |
There was a problem hiding this comment.
We might trust all the @mozilla.com emails:
if email.endswith("@mozilla.com"):
data[email] = True
This implements the policy as discussed on the 2026-02-05.