
DevontiaW/anancy


Anancy


The clever way to mask data.

Privacy layer for AI agent interactions. Mask sensitive data before sending it to Claude, ChatGPT, or any other AI service, then unmask the responses locally. Your real data never leaves your machine.

Named after Anancy (Anansi), the West African/Caribbean trickster spider who uses transformation and cleverness to protect what matters.

Alpha Software Disclaimer

Anancy is in active development. Do not use for:

  • HIPAA-regulated healthcare data
  • PCI-DSS payment card data
  • Legal discovery or litigation holds
  • Any data with regulatory compliance requirements

Always verify masked output before sending to AI services. This tool reduces risk but does not eliminate it. Use at your own risk.


How It Works

flowchart LR
    subgraph LOCAL["Your Machine"]
        A[("Sensitive\nData")] --> B["MASK"]
        B --> C[("Safe\nData")]
        F["UNMASK"] --> G[("Final\nResults")]
    end

    subgraph CLOUD["AI Service"]
        D["Claude /\nChatGPT"]
    end

    C --> D
    D --> E[("AI\nResponse")]
    E --> F

    style A fill:#ff6b6b,color:#fff
    style C fill:#4ecdc4,color:#fff
    style G fill:#45b7d1,color:#fff
    style D fill:#96ceb4,color:#fff

Your sensitive data stays local. AI only sees masked versions.


The Problem

AI agents like Claude can now access your files directly. That's powerful, but your files contain:

| Risk | Example |
|---|---|
| PII Exposure | SSNs, emails, phone numbers |
| Financial Leakage | Salaries, revenue, pricing |
| Competitive Risk | Client names, deal values |
| Compliance Violations | GDPR, CCPA, internal policy |

Sending this to the cloud creates risk.


The Solution

Anancy masks sensitive data before it leaves your machine:

| Original | Masked |
|---|---|
| John Smith | [ANANCY_PERSON_1] |
| 123-45-6789 | [ANANCY_SSN_1] |
| john@company.com | [ANANCY_EMAIL_1] |
| $147,000 | $164,640 (scaled, trends preserved) |

AI analyzes the masked version. You unmask the response locally.
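To make the mask/unmask round trip concrete, here is a minimal sketch of placeholder masking in plain Python. It assumes two simple regexes (SSN and email) and the `[ANANCY_TYPE_N]` token format from the table; `mask` and `unmask` here are hypothetical stand-ins, not Anancy's implementation.

```python
import re

# Hypothetical detection patterns for illustration; Anancy's real regexes may differ.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Swap each detected value for a numbered placeholder.

    Returns the masked text and a placeholder -> original mapping that stays local.
    """
    mapping: dict[str, str] = {}
    seen: dict[str, str] = {}  # original value -> placeholder, so repeats reuse one token
    counters: dict[str, int] = {}
    for kind, pattern in PATTERNS.items():

        def replace(match: re.Match) -> str:
            value = match.group(0)
            if value not in seen:
                counters[kind] = counters.get(kind, 0) + 1
                token = f"[ANANCY_{kind}_{counters[kind]}]"
                seen[value] = token
                mapping[token] = value
            return seen[value]

        text = pattern.sub(replace, text)
    return text, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back in place of their placeholders, locally."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

Because each token is wrapped in brackets, `[ANANCY_SSN_1]` can never match inside `[ANANCY_SSN_10]`, so plain string replacement is safe when unmasking.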


Quick Start

from anancy import Anancy

spider = Anancy("my_project")

# Mask sensitive text
original = "John Smith (SSN: 123-45-6789) earned $147,000"
safe_text, mappings = spider.mask(original)
# Result: "[ANANCY_PERSON_1] (SSN: [ANANCY_SSN_1]) earned $164,640"

# Send safe_text to AI, get response back...

# Unmask AI response
ai_response = "Recommend promoting [ANANCY_PERSON_1] based on performance"
real_response = spider.unmask(ai_response)
# Result: "Recommend promoting John Smith based on performance"
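The `$147,000` to `$164,640` result above is consistent with multiplying every amount by 1.12. A minimal sketch of that kind of semantic scaling, assuming one fixed secret factor per project (the factor and both function names are hypothetical; Anancy's real scaling method is not documented here):

```python
import re
from decimal import ROUND_HALF_UP, Decimal

# Assumption: one secret factor per project, stored locally alongside the mapping.
# 1.12 reproduces the $147,000 -> $164,640 example; Anancy's actual factor may differ.
FACTOR = Decimal("1.12")
CURRENCY = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d{2})?)")


def _rescale(text: str, op) -> str:
    def replace(match: re.Match) -> str:
        raw = match.group(1)
        value = op(Decimal(raw.replace(",", "")))
        step = Decimal("0.01") if "." in raw else Decimal("1")  # keep the original precision
        return f"${value.quantize(step, rounding=ROUND_HALF_UP):,}"

    return CURRENCY.sub(replace, text)


def scale_amounts(text: str, factor: Decimal = FACTOR) -> str:
    """Multiply every dollar amount by the same factor, so ratios and trends survive."""
    return _rescale(text, lambda v: v * factor)


def unscale_amounts(text: str, factor: Decimal = FACTOR) -> str:
    """Approximate inverse: rounding can leave it a cent off, so exact values belong in the mapping."""
    return _rescale(text, lambda v: v / factor)
```

Scaling every figure by the same factor keeps comparisons meaningful: if one salary is double another before masking, it is still double after.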

File-Based Workflow

# Mask a file
spider.mask_file("sensitive/employees.txt", "safe/employees.txt")

# Point AI at the safe/ folder
# Get AI output...

# Unmask the results
spider.unmask_file("ai_output.txt", "final_report.txt")
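"Point AI at the safe/ folder" implies masking a whole directory, not just one file. A sketch of that batch step, assuming a masking function that maps text to text (with Anancy that could be `lambda t: spider.mask(t)[0]`, based on the Quick Start's return value); `mask_folder` itself is hypothetical:

```python
import tempfile
from pathlib import Path
from typing import Callable


def mask_folder(src: Path, dst: Path, mask_text: Callable[[str], str], pattern: str = "*.txt") -> list[Path]:
    """Write a masked copy of every file under src matching pattern into dst, keeping the layout."""
    written: list[Path] = []
    for path in sorted(src.rglob(pattern)):
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(mask_text(path.read_text(encoding="utf-8")), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Demo with a toy masker standing in for a real one.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sensitive").mkdir()
        (root / "sensitive" / "employees.txt").write_text("John Smith", encoding="utf-8")
        toy = lambda t: t.replace("John Smith", "[ANANCY_PERSON_1]")
        for out in mask_folder(root / "sensitive", root / "safe", toy):
            print(out.name, out.read_text(encoding="utf-8"))
```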

CLI Usage

# Mask a file
python cli.py mask sensitive_data.txt safe_data.txt

# Unmask AI response
python cli.py unmask ai_response.txt final_output.txt

# View vault stats
python cli.py stats
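For readers curious how a `mask` / `unmask` / `stats` command layout maps onto Python's standard `argparse`, here is a hypothetical parser sketch. It is not the repository's `cli.py`; the argument names `input` and `output` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the three CLI commands shown above."""
    parser = argparse.ArgumentParser(prog="cli.py", description="Mask and unmask files locally.")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, help_text in (
        ("mask", "mask INPUT and write the safe copy to OUTPUT"),
        ("unmask", "restore placeholders in INPUT and write OUTPUT"),
    ):
        cmd = sub.add_parser(name, help=help_text)
        cmd.add_argument("input")
        cmd.add_argument("output")
    sub.add_parser("stats", help="show vault statistics")
    return parser
```

Each subcommand would then dispatch to the corresponding call, such as `spider.mask_file(args.input, args.output)`.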

Custom Vocabularies

The killer feature: instead of obvious [ANANCY_PERSON_1] placeholders, use custom vocabularies that blend in.

Preset Vocabularies

# Nature mode - uses natural words
spider = Anancy("project", vocabulary="nature")
# "John Smith" → "Maple"
# "123-45-6789" → "Alpha"

# Healthcare mode - uses domain prefixes
spider = Anancy("project", vocabulary="healthcare")
# "John Smith" → "Patient-1"
# "123-45-6789" → "MRN-1"

# Military mode - uses coded patterns
spider = Anancy("project", vocabulary="military")
# "John Smith" → "TGTP-X9F2"
# "123-45-6789" → "TGTS-M7K1"

# Financial mode
spider = Anancy("project", vocabulary="financial")
# "John Smith" → "Account-1"

# Legal mode
spider = Anancy("project", vocabulary="legal")
# "John Smith" → "Party-1"
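Military-style codes such as `TGTP-X9F2` could be derived deterministically from the original value. The sketch below assumes a keyed hash (HMAC-SHA256) with a per-project secret and a `TGT` + type-initial prefix inferred from the examples above; it is a hypothetical scheme, not necessarily how Anancy generates these codes.

```python
import hashlib
import hmac
import string

ALPHABET = string.ascii_uppercase + string.digits  # 36 symbols


def coded_placeholder(value: str, entity_type: str, secret: bytes, length: int = 4) -> str:
    """Derive a stable code like 'TGTP-X9F2' from a value (hypothetical scheme)."""
    # Keyed hash: the same value always yields the same code, but without the
    # secret an observer cannot recompute codes for guessed values.
    digest = hmac.new(secret, f"{entity_type}:{value}".encode(), hashlib.sha256).digest()
    n = int.from_bytes(digest, "big")
    chars = []
    for _ in range(length):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return f"TGT{entity_type[0].upper()}-{''.join(chars)}"
```

With only four characters there are 36^4 (about 1.7 million) possible codes, so a real generator should still check the mapping for collisions before assigning a code.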

Why Custom Vocabularies?

| Standard Placeholders | Custom Vocabulary |
|---|---|
| [ANANCY_PERSON_1] | Maple or Patient-1 |
| Obviously masked | Looks natural or domain-appropriate |
| Easy to grep/extract | No obvious pattern |
| Reveals data types | Types hidden in your codebook |

Custom Configuration

from anancy import Anancy, VocabularyConfig

# Create your own vocabulary
custom_vocab = VocabularyConfig(
    mode="word_list",
    words={
        "person": ["Apollo", "Zeus", "Athena", "Hermes"],
        "ssn": ["Red", "Blue", "Green", "Yellow"],
        "email": ["Alpha", "Beta", "Gamma", "Delta"],
    }
)

spider = Anancy("project", vocabulary=custom_vocab)
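A `word_list` vocabulary has to decide which word each new value gets and what happens when the list runs out. A minimal sketch, assuming words are handed out in order per type, repeated values reuse their word, and exhausted lists fall back to numbered suffixes (`WordListVocabulary` is a hypothetical helper, not Anancy's `VocabularyConfig`):

```python
from dataclasses import dataclass, field


@dataclass
class WordListVocabulary:
    """Hypothetical stand-in for a word_list vocabulary; not Anancy's VocabularyConfig."""

    words: dict[str, list[str]]
    _assigned: dict[tuple[str, str], str] = field(default_factory=dict)
    _counts: dict[str, int] = field(default_factory=dict)

    def placeholder(self, entity_type: str, value: str) -> str:
        key = (entity_type, value)
        if key in self._assigned:
            return self._assigned[key]  # same value -> same word, so the AI can track it
        i = self._counts.get(entity_type, 0)
        self._counts[entity_type] = i + 1
        pool = self.words[entity_type]
        word = pool[i % len(pool)]
        # Assumption: once the list is exhausted, add a numeric suffix to keep words unique.
        if i >= len(pool):
            word = f"{word}-{i // len(pool) + 1}"
        self._assigned[key] = word
        return word
```

One trade-off to keep in mind: a word like `Red` may also appear in the real text, so unmasking by simple replacement could rewrite genuine occurrences. Distinctive words reduce that risk.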

Installation

# Clone the repo
git clone https://github.com/DevontiaW/anancy.git
cd anancy

# No external dependencies required!
# Run the demo (the package lives in src/, so put it on the import path)
PYTHONPATH=src python -m anancy.core

What Gets Detected

mindmap
  root((Anancy))
    PII
      SSN
      Email
      Phone
      Names
    Financial
      Currency
      Percentages
      Account Numbers
    Location
      Addresses
      ZIP Codes
    Temporal
      Dates
      Timestamps
| Pattern | Example | Method |
|---|---|---|
| SSN | 123-45-6789 | Regex |
| Email | user@domain.com | Regex |
| Phone | (555) 123-4567 | Regex |
| Currency | $1,234.56 | Regex + Semantic Scaling |
| Dates | January 15, 2025 | Regex |
| Addresses | 123 Main Street | Regex |
| Names | John Smith | Name Dictionary |
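The detection table can be approximated with a handful of regular expressions plus a name dictionary. The patterns below are illustrative guesses that cover only the example formats shown (US-style dates, `(555) 123-4567` phones, and so on), and `FIRST_NAMES` is a three-name stand-in for the ~200-name dictionary:

```python
import re

FIRST_NAMES = ("John", "Jane", "Maria")  # tiny stand-in for the real name dictionary
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"

# Illustrative patterns only; Anancy's real regexes may differ.
DETECTORS = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "email": r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b",
    "phone": r"\(\d{3}\)\s?\d{3}-\d{4}",
    "currency": r"\$\d+(?:,\d{3})*(?:\.\d{2})?",
    "date": rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b",
    "address": r"\b\d+ [A-Z][a-z]+ (?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd)\b",
    # Dictionary lookup: a known first name followed by a capitalized surname.
    "name": r"\b(?:" + "|".join(FIRST_NAMES) + r") [A-Z][a-z]+\b",
}


def detect(text: str) -> list[tuple[int, str, str]]:
    """Return (offset, type, matched text) for every hit, in document order."""
    findings = []
    for kind, pattern in DETECTORS.items():
        for m in re.finditer(pattern, text):
            findings.append((m.start(), kind, m.group(0)))
    return sorted(findings)
```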

Architecture

flowchart TB
    subgraph INPUT["Input"]
        I1["Text / File"]
    end

    subgraph ANANCY["Anancy Engine"]
        V1["Pattern\nScanner"] --> V2["Type\nClassifier"]
        V2 --> V3["Placeholder\nGenerator"]
        V3 --> V4["Mapping\nStorage"]
    end

    subgraph OUTPUT["Output"]
        O1["Masked\nContent"]
        O2["Local\nMapping Key"]
    end

    I1 --> V1
    V3 --> O1
    V4 --> O2

    style V1 fill:#667eea,color:#fff
    style V2 fill:#667eea,color:#fff
    style V3 fill:#667eea,color:#fff
    style V4 fill:#667eea,color:#fff
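The four engine stages can be sketched as small functions. One detail the diagram leaves implicit is overlap: a phone number may also match a shorter numeric pattern. This sketch assumes the earliest match wins and, at the same start, the longer span is kept; `Finding`, `scan`, `classify`, and `generate` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    start: int
    end: int
    kind: str


def scan(text: str, detectors: dict[str, str]) -> list[Finding]:
    """Pattern Scanner: collect every regex hit, overlaps included."""
    return [Finding(m.start(), m.end(), kind)
            for kind, pattern in detectors.items()
            for m in re.finditer(pattern, text)]


def classify(findings: list[Finding]) -> list[Finding]:
    """Type Classifier: drop overlaps; earliest start wins, ties go to the longer span."""
    kept: list[Finding] = []
    for f in sorted(findings, key=lambda f: (f.start, f.start - f.end)):
        if kept and f.start < kept[-1].end:
            continue  # overlaps a span already kept
        kept.append(f)
    return kept


def generate(text: str, findings: list[Finding], mapping: dict[str, str]) -> str:
    """Placeholder Generator + Mapping Storage: rewrite text, record token -> original."""
    reverse = {value: token for token, value in mapping.items()}
    parts, pos = [], 0
    for f in findings:
        value = text[f.start:f.end]
        token = reverse.get(value)
        if token is None:
            prefix = f"[ANANCY_{f.kind.upper()}_"
            n = sum(1 for t in mapping if t.startswith(prefix)) + 1
            token = f"{prefix}{n}]"
            mapping[token] = value
            reverse[value] = token
        parts.append(text[pos:f.start])
        parts.append(token)
        pos = f.end
    parts.append(text[pos:])
    return "".join(parts)
```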

Storage Structure

~/.anancy/
├── {project_name}/
│   ├── mapping.json    # Your mapping key, stored in plaintext (NEVER leaves machine)
│   └── audit.jsonl     # Activity log
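A sketch of how the per-project folder could be written, assuming `mapping.json` holds a placeholder-to-original dictionary and `audit.jsonl` holds one JSON object per event (the record fields and helper names are hypothetical). Since the mapping is plaintext, the sketch writes it atomically and restricts it to owner read/write.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def project_dir(project: str, root: Path = Path.home() / ".anancy") -> Path:
    """Return (and create) ~/.anancy/{project_name}/."""
    path = root / project
    path.mkdir(parents=True, exist_ok=True)
    return path


def save_mapping(project_path: Path, mapping: dict[str, str]) -> Path:
    """Write mapping.json via a temp file + rename, so a crash never leaves it half-written."""
    target = project_path / "mapping.json"
    fd, tmp = tempfile.mkstemp(dir=project_path, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(mapping, fh, indent=2)
    os.chmod(tmp, 0o600)  # owner read/write only (only meaningful on POSIX)
    os.replace(tmp, target)
    return target


def log_event(project_path: Path, action: str, **details) -> None:
    """Append one JSON object per line to audit.jsonl."""
    record = {"ts": time.time(), "action": action, **details}
    with open(project_path / "audit.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```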

Demo

python demo_workspace/run_demo.py

Limitations

An honest account of what this tool can and can't do.

| Catches Well | May Need Manual Handling |
|---|---|
| Standard PII (SSN, email, phone) | Uncommon names |
| Common Western names (~200 in dictionary) | Non-English names |
| US date/currency formats | International formats |
| Street addresses | Domain-specific identifiers |

Known Gaps

  1. Context Leakage - Masking "John Smith" doesn't hide context like "The CEO of Acme Corp, [ANANCY_PERSON_1]..."
  2. Structured Data Headers - CSV column headers (ssn,salary,name) reveal what masked data means
  3. Name Detection is Basic - Only ~200 common Western first names. Non-Western names often missed.
  4. Mapping File is Plaintext - Anyone with access to ~/.anancy/ can read your mappings
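For Known Gap 2, one possible mitigation (not a documented Anancy feature) is to swap CSV headers for neutral names before sharing and keep the rename map locally. A minimal sketch using the standard `csv` module; `neutralize_headers` is hypothetical:

```python
import csv
import io


def neutralize_headers(csv_text: str) -> tuple[str, dict[str, str]]:
    """Rename CSV columns to col_1, col_2, ... and return the rename map (keep it local)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text, {}
    header, body = rows[0], rows[1:]
    renames = {f"col_{i}": name for i, name in enumerate(header, start=1)}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(renames.keys())
    writer.writerows(body)
    return out.getvalue(), renames
```

This hides column meaning only; the values themselves can still hint at their type (a column of `$` amounts is obviously money).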

When NOT to Use

| Use Case | Why Not | What To Use Instead |
|---|---|---|
| HIPAA healthcare data | Regulatory requirements | Solutions from vendors that sign a BAA (Business Associate Agreement) |
| PCI payment data | Compliance standards | PCI-compliant tools |
| Legal discovery | Chain of custody | eDiscovery platforms |
| Production systems | Alpha software | Enterprise PII tools |

How Anancy Compares

| Tool | Strengths | Best For |
|---|---|---|
| Microsoft Presidio | Enterprise-grade, NER-based, Azure integration | Production systems, large scale |
| AWS Comprehend | Cloud-native, managed service | Teams already in the AWS ecosystem |
| Google DLP | Extensive detectors, cloud API | GCP users, API-based workflows |
| Anancy | Zero dependencies, local-first, instant setup, custom vocabularies | Quick tasks, learning, prototyping |

Anancy's niche: When you need something working in 30 seconds without cloud accounts, API keys, or pip install hell. Graduate to enterprise tools when you need scale.


Roadmap

Current (MVP)

  • Regex-based pattern detection
  • Semantic amount scaling
  • File-based workflow
  • Local mapping storage
  • Audit logging
  • CLI interface
  • Custom vocabulary system

Planned

  • spaCy NER integration for better name detection
  • Encrypted mapping files
  • VS Code extension
  • Claude native integration

Future

  • Advanced semantic analysis
  • Enhanced encryption options
  • Enterprise features

Project Structure

anancy/
├── src/anancy/
│   ├── __init__.py           # Package exports
│   ├── core.py               # Main Anancy class
│   └── vocabulary.py         # Custom vocabulary system
├── cli.py                    # Command-line interface
├── tests/
│   └── test_anancy.py        # Test suite
├── demo_workspace/
│   ├── sensitive_data/       # Sample sensitive files
│   ├── cowork_safe/          # Masked versions (AI-safe)
│   └── run_demo.py           # Full workflow demo
└── guides/
    ├── STUDENT_GUIDE.md      # Teaching curriculum
    └── BUSINESS_OWNER_QUICKSTART.md

Contributing

This is an early-stage project. Contributions welcome!

  1. Fork the repo
  2. Create a feature branch
  3. Make your changes
  4. Submit a PR

Areas where help is needed:

  • Additional pattern detection
  • Non-English name support
  • Testing across file types
  • Documentation improvements

The Name

Anancy (also spelled Anansi) is the trickster spider from West African and Caribbean folklore. He's known for:

  • Transformation - changing form to achieve goals
  • Protection through cleverness - outsmarting larger threats
  • Preserving what matters - in folklore, he protects stories and wisdom

This maps to what we do: transform your data to protect it, use clever masking to outsmart exposure risks, and preserve the real values locally where they belong.

"Anansi does not reveal his true form until he's ready."


License

MIT License - see LICENSE


Credits

Created by Devon Williams at Textstone Labs


Questions?
