A Python CLI tool for analyzing email data in mbox format.
- 📧 Process mbox format email archives
- 🔧 Unix-style pipeline architecture for flexible processing
- 📊 Extendable framework for building analysis pipelines
- Coming soon: More analysis processors...
Install with pip:
pip install swecc-email-scraper
For development, clone the repository and install it in editable mode:
git clone https://github.com/swecc-uw/swecc-email-scraper.git
cd swecc-email-scraper
pip install -e ".[dev]" # Install with development dependencies
# Run tests
pytest
The tool uses Unix pipes to compose commands. Each command does one thing and can be combined with others:
- Basic usage: get email stats with the example processor:
swecc-email-scraper read mailbox.mbox \
| swecc-email-scraper stats \
| swecc-email-scraper format -f json > results.json
- List available processors:
swecc-email-scraper list-processors
- List available output formats:
swecc-email-scraper list-formats
The read command reads an mbox file and outputs email data as JSON:
swecc-email-scraper read input.mbox > emails.json
The stats command processes email data from stdin and outputs statistics:
cat emails.json | swecc-email-scraper stats > stats.json
The format command formats JSON data using the specified formatter:
cat stats.json \
| swecc-email-scraper format -f json \
> formatted.json
- Basic email statistics to terminal:
swecc-email-scraper read inbox.mbox \
| swecc-email-scraper stats \
| swecc-email-scraper format
- Save analysis to a file:
swecc-email-scraper read inbox.mbox \
| swecc-email-scraper stats \
> analysis.json
- Process with custom formatting:
swecc-email-scraper read inbox.mbox \
| swecc-email-scraper stats \
| swecc-email-scraper format -f json \
> analysis.json
- Use with Unix tools:
# Filter emails before analysis
swecc-email-scraper read inbox.mbox \
| jq 'map(select(.sender | contains("important")))' \
| swecc-email-scraper stats
The tool is designed to be easily extensible. See CONTRIBUTING.md for detailed information on:
- Creating custom processors
- Adding new output formats
- Contributing to the project
- Development setup and guidelines
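Because every stage reads and writes JSON, a small standalone script can also take a place in the pipeline, much as jq does in the Unix tools example above. The sketch below is a Python counterpart to that jq filter. It is not the project's processor API (CONTRIBUTING.md covers that). It assumes, as the jq example implies, that the read output is a JSON array of objects with a sender field. The names filter_by_sender and run are hypothetical.

```python
import json
from typing import Any, TextIO


def filter_by_sender(emails: list[dict[str, Any]], needle: str) -> list[dict[str, Any]]:
    """Keep only emails whose `sender` field contains `needle`.

    Emails with a missing or null sender are dropped rather than raising.
    """
    return [e for e in emails if needle in (e.get("sender") or "")]


def run(stdin: TextIO, stdout: TextIO, needle: str = "important") -> None:
    """Read a JSON array of emails from `stdin`, write the filtered array to `stdout`."""
    emails = json.load(stdin)  # assumed shape: a JSON array of email objects
    json.dump(filter_by_sender(emails, needle), stdout, indent=2)
    stdout.write("\n")
```

Saved as a script that ends by calling run(sys.stdin, sys.stdout), this could sit between the read and stats commands in place of the jq step.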
The tool uses a Unix pipeline architecture where:
- The read command converts mbox files to JSON email data
- Processor commands (like stats) transform or analyze the data
- The format command handles output formatting
- Standard Unix pipes (|) connect the components
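The stages listed above can be sketched in plain Python to show how data moves between them. This is an illustration under stated assumptions, not the tool's implementation. It uses the standard library mailbox module for reading, and the sender field comes from the jq example. The other field names, the statistics chosen, and the function names read_mbox, compute_stats, format_json, and pipeline are hypothetical.

```python
import json
import mailbox
from collections import Counter
from typing import Any


def read_mbox(path: str) -> list[dict[str, Any]]:
    """Stage 1 (the role of `read`): turn an mbox file into JSON-serializable dicts."""
    # create=False makes a missing file an error instead of creating an empty mailbox.
    box = mailbox.mbox(path, create=False)
    try:
        return [
            {
                "sender": str(msg.get("From", "")),
                "subject": str(msg.get("Subject", "")),  # assumed field name
                "date": str(msg.get("Date", "")),        # assumed field name
            }
            for msg in box
        ]
    finally:
        box.close()


def compute_stats(emails: list[dict[str, Any]]) -> dict[str, Any]:
    """Stage 2 (the role of a processor such as `stats`): summarize the email list."""
    senders = Counter(e["sender"] for e in emails)
    return {
        "total_emails": len(emails),
        "unique_senders": len(senders),
        "top_senders": senders.most_common(3),  # list of (sender, count) pairs
    }


def format_json(data: Any) -> str:
    """Stage 3 (the role of `format -f json`): render the result as indented JSON."""
    return json.dumps(data, indent=2)


def pipeline(path: str) -> str:
    """Compose the stages in-process; the CLI connects them with pipes instead."""
    return format_json(compute_stats(read_mbox(path)))
```

Each function consumes only the previous stage's output, which is what lets the real commands be rearranged or swapped for other tools on the command line.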
MIT License - See LICENSE file for details.
Developed as part of SWECC Labs at the University of Washington.