Regex patterns catch known attack strings but miss semantic equivalents:
- "Please disregard your prior directives" (same intent, no pattern match)
- Paraphrased jailbreaks
- Novel attack vectors
Consider an optional lightweight embedding similarity check using a small local model (e.g. sentence-transformers). Should remain zero-dep by default — embedding mode as optional extra.
pip install ai-injection-guard[semantic]