[Multi-Agent Privacy] Detection tools implementation #335
Open
JCHAVEROT wants to merge 21 commits into
(cherry picked from commit 794e23e)
(cherry picked from commit d8503b1)
(cherry picked from commit 11205df)
(cherry picked from commit ec9126f)
(cherry picked from commit 9797115)
(cherry picked from commit 8be4dc8)
(cherry picked from commit 6a91ebb)
(cherry picked from commit 3eae01f)
(cherry picked from commit df24d81)
(cherry picked from commit 825ff24)
(cherry picked from commit 53e384b)
(cherry picked from commit ec8001e)
…lways be ignored (cherry picked from commit 46bc5ab)
(cherry picked from commit fd49131)
(cherry picked from commit 4c34eab)
(cherry picked from commit 1d9c58e)
(cherry picked from commit 9e0b74d)
(cherry picked from commit e5723a5)
(cherry picked from commit 52f4e8c)
(cherry picked from commit 8f16202)
Summary
Related issue: #292
Depending on: #285
Review completed in a fork: link
Target branch: swiss-ai:mmore/v2
This PR adds a new Personally Identifiable Information (PII) detection toolkit, to be used later as tools by the agentic privacy system.
What this adds
- `mmore.privacy.detection` with four interchangeable engines:
- … `DetectionConfig`, and registers itself in a tool registry so agents can call it
- `MMORE_PRIVACY_MODEL_BUDGET_MB`
- `MMORE_PRIVACY_MODEL_CACHE=0`
Dependencies / CI
- `privacy` now has new dependencies (gliner, presidio, spacy, dspy, and psutil for memory measurements)
- `privacy-openai-filter` (`transformers>=5`, `peft`), as there is currently a conflict with `marker-pdf` from the `process` extra (will be solved once #191 is closed)
Tests
Disclaimer: the large line-diff numbers mostly come from the new dependencies, hence the changes in the `uv.lock` file.
Demo
Input note (AI generated)
GLiNER (`nvidia/gliner-PII`)
15 spans at confidence_threshold = 0.4
Bobby, 3/2, Linda, 617-555-0148, Austin, St. Mary's, 4/23/65, 04/23/1955, 12345678, BCXY 99-88-77, jsmith@hosp-derm.org, xxx-xx-4321, Linda, Maria Garcia, 4/1
`openai/privacy-filter`
78 spans at confidence_threshold = 0.4
BobbyDr.GarciaLinda617-555-0148LinwoodAveDr.R.Lee4/23/6504/23/195512345678BCXY99-88-77Jan5558675309AB1234567jsmith@hosp-derm.orgxxx-xx-4321LindaDr.MariaGarciapager12345
Presidio + custom clinical recognizers
26 spans at confidence_threshold = 0.4
Bobby, Tower 3, Garcia, Linda, 617-555-0148, Austin, wks ago, R. Lee's, St. Mary's, 4/23/65, 04/23/1955, 04/23/1955, 12345678, 1245-6788, Jan., 555 867 5309, AB1234567, AB1234567, jsmith@hosp-derm.org, hosp-derm.org, last week, VA, Linda, Maria Garcia, 12345, c. Tentative d/
LLM `Qwen/Qwen2.5-7B-Instruct` via DSPy
21 spans at confidence_threshold = 0.4
Bobby, 3/2, Dr. Garcia, Linda, 617-555-0148, metoprolol, 2 wks, Dr. R. Lee, St. Mary's, 123 Main, 12345678, 1245-6788, BCXY 99-88-77, 555 867 5309, jsmith@hosp-derm.org, VA, xxx-xx-4321, Dr. Maria Garcia, pager 12345, 4/1, work
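The span counts above all depend on the `confidence_threshold = 0.4` cutoff, and the engines clearly disagree on what counts as PII. The sketch below shows one way such a cutoff and a rough cross-engine agreement measure could be computed. It rests on assumptions: `Span`, `filter_spans`, and `text_overlap` are hypothetical helpers, the inclusive `>=` comparison is a guess, and the scores and labels in the example are invented for illustration (only the span texts are borrowed from the demo).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Span:
    """Hypothetical span record: surface text, entity label, confidence."""
    text: str
    label: str
    score: float


def filter_spans(spans: Iterable[Span],
                 confidence_threshold: float = 0.4) -> List[Span]:
    """Keep spans whose score reaches the threshold (inclusive by assumption)."""
    return [s for s in spans if s.score >= confidence_threshold]


def text_overlap(a: Iterable[Span], b: Iterable[Span]) -> float:
    """Jaccard similarity of the distinct span texts found by two engines."""
    texts_a = {s.text for s in a}
    texts_b = {s.text for s in b}
    if not texts_a and not texts_b:
        return 1.0  # two empty outputs agree trivially
    return len(texts_a & texts_b) / len(texts_a | texts_b)


# Invented scores and labels, for illustration only.
engine_a = [
    Span("Bobby", "NAME", 0.91),
    Span("617-555-0148", "PHONE", 0.88),
    Span("metoprolol", "OTHER", 0.21),  # below the cutoff, gets dropped
]
engine_b = [
    Span("Bobby", "NAME", 0.75),
    Span("VA", "ORG", 0.45),
]

kept_a = filter_spans(engine_a)
kept_b = filter_spans(engine_b)
print([s.text for s in kept_a])        # ['Bobby', '617-555-0148']
print(text_overlap(kept_a, kept_b))    # one shared text out of three distinct
```

Comparing on surface text rather than character offsets is a deliberate simplification: it ignores repeated mentions (such as the two "Linda" spans) and partial matches such as "Garcia" versus "Dr. Garcia", which an offset-based comparison would handle better.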