
Add output scanner for detecting leaked secrets in agent responses #5

@azrollin

Description


The Problem

SUNGLASSES currently scans inputs — what goes INTO an agent. But agents can also leak sensitive data in their outputs: API keys in code snippets, PII in generated emails, internal URLs in customer-facing responses, credentials from memory/context bleeding into replies.

Proposed Solution: Output Scanner

A new scanning mode that checks what the agent says back before it reaches the user or a downstream system:

from sunglasses.output import OutputScanner  # proposed module, not yet implemented

output_scanner = OutputScanner()

def deliver(agent, user_query):
    """Hypothetical wrapper: run the agent, then scan its response before delivering it."""
    agent_response = agent.run(user_query)
    result = output_scanner.scan(agent_response)

    if result.has_leaks:
        # Redact sensitive data before delivery
        return output_scanner.redact(agent_response, result.findings)
    return agent_response
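The snippet above assumes that `OutputScanner`, `scan()`, `has_leaks`, `findings` and `redact()` exist. As one possible starting point for step 1 under "How to Contribute", here is a minimal, self-contained sketch of that interface. `Finding`, `ScanResult`, the `patterns` constructor argument and the two default patterns are hypothetical illustrations, not existing SUNGLASSES code.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """One detected leak: its category and its character span in the text."""

    kind: str
    start: int
    end: int
    text: str


@dataclass
class ScanResult:
    findings: list[Finding] = field(default_factory=list)

    @property
    def has_leaks(self) -> bool:
        return bool(self.findings)


class OutputScanner:
    """Hypothetical stand-in for the proposed API, not SUNGLASSES code."""

    # Placeholder patterns only; a real scanner would need a much larger catalog.
    DEFAULT_PATTERNS = {
        "aws_access_key_id": r"\bAKIA[0-9A-Z]{16}\b",
        "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    }

    def __init__(self, patterns: Optional[dict[str, str]] = None) -> None:
        source = self.DEFAULT_PATTERNS if patterns is None else patterns
        self._patterns = {kind: re.compile(rx) for kind, rx in source.items()}

    def scan(self, text: str) -> ScanResult:
        # Run every pattern over the text and collect matches in positional order.
        findings = [
            Finding(kind, m.start(), m.end(), m.group())
            for kind, rx in self._patterns.items()
            for m in rx.finditer(text)
        ]
        findings.sort(key=lambda f: f.start)
        return ScanResult(findings)

    def redact(
        self, text: str, findings: list[Finding], placeholder: str = "[REDACTED]"
    ) -> str:
        # Merge overlapping spans so no fragment of a secret survives redaction.
        spans: list[list[int]] = []
        for f in sorted(findings, key=lambda f: f.start):
            if spans and f.start <= spans[-1][1]:
                spans[-1][1] = max(spans[-1][1], f.end)
            else:
                spans.append([f.start, f.end])
        # Rebuild the text, keeping everything between the merged spans.
        parts, pos = [], 0
        for start, end in spans:
            parts.append(text[pos:start])
            parts.append(placeholder)
            pos = end
        parts.append(text[pos:])
        return "".join(parts)
```

Merging overlapping spans before replacing them is a deliberate choice: redacting each finding independently could leave part of a secret visible when two patterns match overlapping text.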

What it should detect:

  • API keys, tokens, passwords in generated code or text
  • PII (emails, phone numbers, SSNs, addresses) in responses
  • Internal URLs, file paths, or system information leaking
  • Database connection strings or credentials
  • Context bleed — information from one user's session appearing in another's response
  • Prompt/system instructions being echoed back
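Most items in this list have recognizable formats, but the last one, prompt/system instructions being echoed back, does not. One possible heuristic, sketched here as an assumption rather than a settled design, is to flag a response that shares a run of consecutive words with the system prompt. The function name `echoes_prompt` and the default window of 8 words are illustrative choices.

```python
import re


def _words(text: str) -> list[str]:
    # Lowercase word tokens, so punctuation and case changes don't hide an echo.
    return re.findall(r"\w+", text.lower())


def echoes_prompt(response: str, system_prompt: str, n: int = 8) -> bool:
    """Return True if any run of n consecutive words from system_prompt appears in response."""
    prompt_words = _words(system_prompt)
    response_words = _words(response)
    if len(prompt_words) < n or len(response_words) < n:
        return False
    # Every n-word window of the prompt, stored for O(1) lookup.
    prompt_ngrams = {
        tuple(prompt_words[i : i + n]) for i in range(len(prompt_words) - n + 1)
    }
    return any(
        tuple(response_words[i : i + n]) in prompt_ngrams
        for i in range(len(response_words) - n + 1)
    )
```

A smaller `n` catches paraphrased leaks sooner but produces more false positives on common phrases. The same overlap check could, in principle, be run against text from other users' sessions to approximate the context-bleed item, though that requires the scanner to have access to those sessions.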

Why This Matters

Input scanning stops attacks coming IN. Output scanning stops secrets going OUT. Together, they cover both directions of the agent's perimeter. Data protection regulations such as GDPR and CCPA are largely concerned with personal data being disclosed, i.e. with what leaves the system, not only with what enters it.

How to Contribute

  1. Define the OutputScanner API interface
  2. Build regex patterns for common secret formats (AWS keys, GitHub tokens, Stripe keys, etc.)
  3. Build PII detection patterns (can leverage existing libraries as optional deps)
  4. Write redaction logic that replaces detected secrets with [REDACTED]
  5. Add tests with realistic agent output samples
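For steps 2, 3 and 5, a pattern catalog could start from well-known token prefixes and be exercised against a realistic agent reply. The sketch below approximates publicly described formats (AWS access key IDs begin with `AKIA` or `ASIA`, classic GitHub tokens with `ghp_`/`gho_`/`ghu_`/`ghs_`/`ghr_`, Stripe secret and restricted keys with `sk_`/`rk_` plus `live_`/`test_`); these should be checked against each provider's documentation. The Stripe minimum length, the SSN, connection-string and internal-URL patterns, the `.internal`/`.corp`/`.local` suffixes, and the names `SECRET_PATTERNS` and `find_leaks` are all hypothetical choices.

```python
import re

SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 uppercase alphanumerics.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # GitHub classic tokens: ghp_/gho_/ghu_/ghs_/ghr_ followed by 36 alphanumerics.
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),
    # Stripe secret/restricted keys; the 16-character minimum is an assumption.
    "stripe_key": re.compile(r"\b[sr]k_(?:live|test)_[A-Za-z0-9]{16,}\b"),
    # URIs with inline credentials, e.g. postgres://user:pass@host:port
    "connection_string": re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@/]+@[^\s/]+"),
    # US Social Security numbers in the common NNN-NN-NNNN layout.
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Hypothetical internal hostname suffixes; adjust to your own network.
    "internal_url": re.compile(r"\bhttps?://[\w.-]+\.(?:internal|corp|local)\b\S*"),
}


def find_leaks(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs, ordered by position in the text."""
    hits = [
        (m.start(), kind, m.group())
        for kind, rx in SECRET_PATTERNS.items()
        for m in rx.finditer(text)
    ]
    return [(kind, value) for _, kind, value in sorted(hits)]


# A realistic agent reply that leaks three different things.
sample = (
    "Here's the config you asked for:\n"
    "DATABASE_URL=postgres://admin:s3cret@db.prod.example.com:5432/orders\n"
    "STRIPE_KEY=sk_live_" + "a" * 24 + "\n"
    "Dashboards: http://grafana.internal/d/abc"
)
```

Hand-written regexes like `us_ssn` only cover rigid layouts; for names, addresses and other free-form PII, the optional-library route mentioned in step 3 is likely to be more robust.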

This is on the v0.3 roadmap. Contributions welcome! 🕶️

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
