Skip to content

MCP server filtering improvements#5654

Open
padenot wants to merge 9 commits intomozilla:masterfrom
padenot:filtering-refactored
Open

MCP server filtering improvements#5654
padenot wants to merge 9 commits intomozilla:masterfrom
padenot:filtering-refactored

Conversation

@padenot
Copy link
Contributor

@padenot padenot commented Feb 6, 2026

This implements the policy as discussed on the 2026-02-05.

suhaibmujahid and others added 9 commits February 6, 2026 17:32
Converted file retrieval methods in PhabricatorPatch and related interfaces to async, replacing synchronous HTTP calls with httpx.AsyncClient.
Trusted users are now:
- Mozilla Corporation employees, OR
- Users with editbugs privilege who have been active within last year

This prevents dormant accounts with editbugs from being considered trusted.
Refactor metadata filtering into a SanitizedBug class that inherits from
Bug and overrides properties to return sanitized values when no trusted
user has commented.

When there's no trusted comment, the following are redacted:
- Bug title/summary
- Reporter name and email
- Assignee name and email

The trust check uses cached_property to avoid redundant API calls.
Bug.get() now returns SanitizedBug by default for security.
Refactor metadata filtering into a SanitizedPhabricatorPatch class that
inherits from PhabricatorPatch and overrides properties to return
sanitized values when no trusted user has commented.

When there's no trusted comment, the following are redacted:
- Revision title
- Author name and email
- Summary and test plan
- Diff author (if different from revision author)
- Stack graph titles

The trust check uses cached_property to avoid redundant API calls.
PhabricatorReviewData.get_patch_by_id() now returns SanitizedPhabricatorPatch.
This is now the full list from BMO
Automatically trust comments created before 2022 as prompt injection
was not a concern at that time. This applies to both comment content
validation and metadata redaction policies.
Ensure collapsed/admin tagged comments are excluded from all trust
validation logic, not just the timeline output:
- Skip when finding last trusted comment
- Skip when building trusted user cache
- Skip when checking for any trusted comment
Add comprehensive test coverage for:
- Admin-tagged comments being completely disregarded
- Pre-2022 comments being automatically trusted
- All collapsed tags (spam, abuse, nsfw, etc.) filtering

These tests validate the security policy changes implemented in
previous commits.
Copy link
Member

@suhaibmujahid suhaibmujahid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! It looks good to me, with a few comments.

Comment on lines -668 to +775
return PhabricatorPatch(patch_id)
return SanitizedPhabricatorPatch(patch_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used by the MCP, so we do not need to change it.

We should instead change it in the mcp server:

@mcp.tool()
def get_phabricator_revision(revision_id: int) -> str:
"""Retrieve a revision from Phabricator alongside its comments."""
return PhabricatorPatch(revision_id=revision_id).to_md()

MOZILLA_CORP_GROUP_ID = 42
EDITBUGS_GROUP_ID = 9
EDITBUGS_CUTOFF_DAYS = 365
TRUST_BEFORE_DATE = datetime(2022, 1, 1, tzinfo=timezone.utc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall that we agreed not to use the activity data since it is not useful.

group_ids = {g.get("id") for g in groups}

# Check if user is in mozilla-corporation group
is_moco = MOZILLA_CORP_GROUP_ID in group_ids
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, all ppl in MOZILLA_CORP_GROUP are inherently in EDITBUGS_GROUP, so we do not need to check both; we only need EDITBUGS_GROUP. Especially since we will not use the activity as a condition for ppl in EDITBUGS_GROUP.

assert bug_data["id"] == bug_id

return Bug(bug_data)
return SanitizedBug(bug_data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this way, only SanitizedBug will be used, even in places where we do not want to perform sanitization.

To target the mcp, we should modify it's method to use SanitizedBug instead of. Bug, here:

@mcp.tool()
def get_bugzilla_bug(bug_id: int) -> str:
"""Retrieve a bug from Bugzilla alongside its change history and comments."""
return Bug.get(bug_id).to_md()

With that, this method should become a class method (i.e., @classmethod) instead of a static method (i.e., @staticmethod). Then we use cls:

Suggested change
return SanitizedBug(bug_data)
return cls(bug_data)

return self._revision_metadata["fields"].get("summary") or None

@property
def test_plan_display(self) -> str | None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using the _display postfix for some properties?

Comment on lines +60 to +65
if not user_agent:
user_agent = get_user_agent()

# This is awkward since PhabricatorAPI does not accept user agent directly
set_default_value("User-Agent", "name", user_agent)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a rebase error

Suggested change
if not user_agent:
user_agent = get_user_agent()
# This is awkward since PhabricatorAPI does not accept user agent directly
set_default_value("User-Agent", "name", user_agent)

Comment on lines -537 to 560
current_phid = revision["phid"]
patch_map = {
phid: (
self
if phid == current_phid
else PhabricatorPatch(revision_phid=phid)
)
for phid in stack_graph.keys()
}
current_phid = self._revision_metadata["phid"]
revision_ids = {}
for phid in self.stack_graph.keys():
if phid == current_phid:
revision_ids[phid] = self.revision_id
else:
revision_ids[phid] = PhabricatorPatch(
revision_phid=phid
).revision_id

for phid, dependencies in self.stack_graph.items():
from_id = f"D{revision_ids[phid]}"
stack_title = self.get_stack_patch_title(phid)

for phid, dependencies in stack_graph.items():
from_patch = patch_map[phid]
from_id = f"D{from_patch.revision_id}"
if phid == current_phid:
md_lines.append(
f" {from_id}[{from_patch.patch_title} - CURRENT]"
)
md_lines.append(f" {from_id}[{stack_title} - CURRENT]")
md_lines.append(f" style {from_id} fill:#105823")
else:
md_lines.append(f" {from_id}[{from_patch.patch_title}]")
md_lines.append(f" {from_id}[{stack_title}]")

for dep_phid in dependencies:
dep_id = f"D{patch_map[dep_phid].revision_id}"
dep_id = f"D{revision_ids[dep_phid]}"
md_lines.append(f" {from_id} --> {dep_id}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see the goal of the change here. The title should be sanitized automatically by using the patch_title property without any changes to this part.


stack_graph = revision["fields"].get("stackGraph")
if len(stack_graph) > 1:
if len(self.stack_graph) > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to minimize the changes where possible:

Suggested change
if len(self.stack_graph) > 1:
stack_graph = revision["fields"].get("stackGraph")
if len(stack_graph) > 1:

This will also minimize changes in later lines.

Comment on lines +417 to +433
def __getattr__(self, name: str):
if name.startswith("_"):
raise AttributeError(
f"'{type(self).__name__}' object has no attribute '{name}'"
)
value = self._metadata.get(name)
if value is None and name in (
"keywords",
"cc",
"blocks",
"depends_on",
"regressed_by",
"duplicates",
"see_also",
):
return []
return value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fond of Python magic, but it's ok for now. We can look for a better method later.

)

def user_handler(user, data):
email = user.get("name", "").lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might trust all the @mozilla.com emails:

if email.endswith("@mozilla.com"):
    data[email] = True

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants