The Problem
SUNGLASSES currently scans inputs — what goes INTO an agent. But agents can also leak sensitive data in their outputs: API keys in code snippets, PII in generated emails, internal URLs in customer-facing responses, credentials from memory/context bleeding into replies.
Proposed Solution: Output Scanner
A new scanning mode that checks what the agent says back before it reaches the user or downstream system:
from sunglasses.output import OutputScanner

output_scanner = OutputScanner()

def respond(agent, user_query):
    # Scan the agent response before delivering it
    agent_response = agent.run(user_query)
    result = output_scanner.scan(agent_response)
    if result.has_leaks:
        # Redact sensitive data before delivery
        return output_scanner.redact(agent_response, result.findings)
    return agent_response
What it should detect:
- API keys, tokens, passwords in generated code or text
- PII (emails, phone numbers, SSNs, addresses) in responses
- Internal URLs, file paths, or system information leaking
- Database connection strings or credentials
- Context bleed — information from one user's session appearing in another's response
- Prompt/system instructions being echoed back
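As a starting point for discussion, here is a minimal sketch of pattern-based detection for a few of the categories above. The names `PATTERNS`, `Finding`, and `find_leaks` are hypothetical and not part of SUNGLASSES. The regexes are rough approximations of commonly documented formats (AWS access key IDs, classic GitHub personal access tokens, emails, US SSNs) and would need tuning against real samples. Context bleed and echoed system prompts cannot be caught this way and would need a different approach.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only (assumptions): real secret formats vary and change over time.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass(frozen=True)
class Finding:
    """Hypothetical record of one detected span in the agent output."""
    kind: str
    start: int
    end: int


def find_leaks(text: str) -> list[Finding]:
    """Return every pattern match in `text`, ordered by position."""
    findings = [
        Finding(kind, match.start(), match.end())
        for kind, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)
```

Recording character offsets rather than the matched strings keeps the secret itself out of the findings, which matters if findings are logged.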
Why This Matters
Input scanning stops attacks coming IN. Output scanning stops secrets going OUT. Together, they cover both directions of the agent boundary. Data protection laws such as GDPR and CCPA are concerned with the disclosure of personal data, so what leaves the system matters at least as much as what enters it.
How to Contribute
- Define the OutputScanner API
- Build regex patterns for common secret formats (AWS keys, GitHub tokens, Stripe keys, etc.)
- Build PII detection patterns (can leverage existing libraries as optional deps)
- Write redaction logic that replaces detected secrets with [REDACTED]
- Add tests with realistic agent output samples
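To make the first and fourth items concrete, one possible shape for the interface is sketched below. It mirrors the `scan`, `redact`, `has_leaks`, and `findings` names from the usage example in this issue. Everything else (the `Finding` and `ScanResult` classes, the constructor arguments, the two default patterns) is an assumption for illustration, not a settled design.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Finding:
    kind: str
    start: int
    end: int


@dataclass
class ScanResult:
    findings: list = field(default_factory=list)

    @property
    def has_leaks(self) -> bool:
        return bool(self.findings)


class OutputScanner:
    """Hypothetical sketch; not the actual SUNGLASSES API."""

    def __init__(self, patterns=None, placeholder="[REDACTED]"):
        # Default patterns are rough approximations, used only for this sketch.
        self.patterns = patterns or {
            "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
            "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        }
        self.placeholder = placeholder

    def scan(self, text: str) -> ScanResult:
        findings = [
            Finding(kind, match.start(), match.end())
            for kind, pattern in self.patterns.items()
            for match in pattern.finditer(text)
        ]
        return ScanResult(sorted(findings, key=lambda f: f.start))

    def redact(self, text: str, findings) -> str:
        # Replace from the end of the string so earlier offsets stay valid.
        # Assumes findings do not overlap; overlapping spans would need merging first.
        for f in sorted(findings, key=lambda f: f.start, reverse=True):
            text = text[:f.start] + self.placeholder + text[f.end:]
        return text
```

A test with a synthetic agent response might look like this:

```python
scanner = OutputScanner()
response = "SSN 123-45-6789 and token ghp_" + "a" * 36
result = scanner.scan(response)
assert result.has_leaks
assert scanner.redact(response, result.findings) == "SSN [REDACTED] and token [REDACTED]"
```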
This is on the v0.3 roadmap. Contributions welcome! 🕶️