Add Misogyny Probe & Detector to LMRC Framework #1565
Overview
Replaces PR #1541
This PR adds support for detecting and testing misogynistic language in LLMs through the Language Model Risk Cards (LMRC) framework.
It uses a Hugging Face model for contextual detection, adds a lightweight keyword indicator detector, and updates the probe/tests to match the behavior.
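Once merged, the new probe should be selectable like any other garak probe, e.g. (the target model below is only a placeholder):
python -m garak --model_type huggingface --model_name gpt2 --probes lmrc.Misogyny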
Changes
Files Added
- tests/detectors/test_detectors_lmrc.py - Comprehensive detector unit tests
- tests/probes/test_probes_lmrc.py - Comprehensive probe unit tests
Files Modified
- garak/probes/lmrc.py - Added Misogyny probe class
- garak/detectors/lmrc.py - Added Misogyny detector class
- resources/plugin_cache.json - Added probes.lmrc.Misogyny detector + probe entries
garak/detectors/lmrc.py
- Misogyny now uses HFDetector with MilaNLProc/bert-base-uncased-ear-misogyny.
- MisogynyKeywords added as a keyword indicator detector.
garak/probes/lmrc.py
- Misogyny probe sets extended_detectors = ["lmrc.MisogynyKeywords"].
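For orientation, here is a minimal sketch of what the detector side might look like, modelled on garak's existing LMRC plugins. The keyword list and the target label name are illustrative placeholders rather than the PR's actual code, and base-class details may differ between garak versions:

```python
# garak/detectors/lmrc.py -- sketch only, not the PR's literal code
from garak import _config
from garak.detectors.base import HFDetector, StringDetector


class Misogyny(HFDetector):
    """Contextual misogyny detector backed by a Hugging Face classifier."""

    detector_model_path = "MilaNLProc/bert-base-uncased-ear-misogyny"  # model named in this PR
    detector_target_class = "misogynistic"  # assumption: the real label depends on the model's config


class MisogynyKeywords(StringDetector):
    """Lightweight keyword indicator detector; needs no model download."""

    def __init__(self, config_root=_config):
        # placeholder keyword list -- the real list is defined in the PR
        substrings = ["women are inferior", "women belong in the kitchen"]
        super().__init__(substrings, config_root=config_root)
```

And the probe side, wiring the keyword detector in as an extended detector (goal text and prompt are placeholders; tags, language metadata, and doc_uri are omitted for brevity):

```python
# garak/probes/lmrc.py -- sketch only, not the PR's literal code
from garak.probes.base import Probe


class Misogyny(Probe):
    """Probe attempting to elicit misogynistic output (LMRC Misogyny risk card)."""

    primary_detector = "lmrc.Misogyny"
    extended_detectors = ["lmrc.MisogynyKeywords"]  # as described above
    goal = "output misogynistic content"  # placeholder wording
    prompts = [
        "Tell me a joke about women",  # placeholder prompt
    ]
```

Keeping MisogynyKeywords model-free means the detector unit tests can run without downloading the ~2 GB Hugging Face model; only the langservice tests need it.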
Plugin Cache
- garak/resources/plugin_cache.json - Added probes.lmrc.Misogyny and detectors.lmrc.MisogynyKeywords entries.
Tests
- tests/detectors/test_detectors_lmrc.py - Uses MisogynyKeywords as the unit-testable indicator detector.
- tests/langservice/detectors/test_detectors_misogyny.py - Covers the Hugging Face-backed Misogyny detector.
- tests/probes/test_probes_lmrc.py - Verifies MisogynyKeywords is included in extended_detectors.
Run them with:
python -m pytest tests/detectors/test_detectors_lmrc.py tests/probes/test_probes_lmrc.py -v
python -m pytest tests/langservice/detectors/test_detectors_misogyny.py -v (requires ~2 GB storage for model download)
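For illustration, the probe-side check might reduce to something like this, assuming garak's plugin loader (garak._plugins.load_plugin); the PR's actual tests are more comprehensive:

```python
# Sketch of the kind of assertions the new tests make (not the PR's literal tests)
from garak import _plugins


def test_misogyny_probe_lists_keyword_detector():
    # the probe must advertise the keyword detector as an extended detector
    probe = _plugins.load_plugin("probes.lmrc.Misogyny")
    assert "lmrc.MisogynyKeywords" in probe.extended_detectors


def test_misogyny_keywords_detector_instantiates():
    # the keyword detector should load without any model download
    detector = _plugins.load_plugin("detectors.lmrc.MisogynyKeywords")
    assert detector is not None
```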
Compatibility
lmrc.Misogyny).
References