Skip to content

Latest commit

 

History

History
191 lines (148 loc) · 7.52 KB

File metadata and controls

191 lines (148 loc) · 7.52 KB

Philosophy

Principiis obsta -- resist the beginnings.

Infrastructure drift is not a detection problem. It is a structural problem. By the time a missing bucket breaks a deployment, the damage is done. S3Spectre is designed to surface these conditions early -- in CI, in code review, in scheduled audits -- so they can be addressed before they matter.

The tool presents evidence and lets humans decide. It does not auto-remediate, does not guess intent, and does not assign confidence scores where deterministic checks suffice.

Installation

# Homebrew
brew install ppiankov/tap/s3spectre

# Docker
docker pull ghcr.io/ppiankov/s3spectre:latest

# From source
git clone https://github.com/ppiankov/s3spectre.git
cd s3spectre && make build

Usage

Scan mode

Cross-references S3 references in code with live AWS state.

# Basic scan
s3spectre scan --repo ./my-repo

# Specific AWS profile and regions
s3spectre scan --repo . --aws-profile production --regions us-east-1,eu-west-1

# JSON output for CI/CD
s3spectre scan --repo . --format json --output report.json

# Fail the pipeline on drift
s3spectre scan --repo . --fail-on-missing --fail-on-stale --stale-days 60

# Enable unused bucket detection
s3spectre scan --repo . --check-unused --fail-on-unused

# Include file-level reference details
s3spectre scan --repo . --include-references --format json

Scan flags:

Flag Default Description
--repo, -r . Repository path to scan
--aws-profile AWS profile
--aws-region Single region mode
--all-regions true Scan all enabled regions
--regions Specific regions (comma-separated)
--stale-days 90 Stale prefix threshold
--check-unused false Enable unused bucket scoring
--unused-threshold-days 180 Unused bucket threshold
--concurrency 10 Max concurrent S3 API calls
--format, -f text Output format: text or json
--output, -o stdout Output file
--fail-on-missing false Exit non-zero on missing buckets
--fail-on-stale false Exit non-zero on stale prefixes
--fail-on-version-sprawl false Exit non-zero on version sprawl
--fail-on-unused false Exit non-zero on unused buckets
--include-references false Include reference details in output
--no-progress false Disable TTY progress indicators

Discover mode

Audits all S3 buckets in an AWS account without requiring code references.

# Discover all buckets across all regions
s3spectre discover

# Security surface checks
s3spectre discover --check-encryption --check-public

# Custom staleness thresholds
s3spectre discover --age-threshold-days 730 --inactive-days 365

# CI/CD gating
s3spectre discover --fail-on-unused --fail-on-risky --format json

Discover flags:

Flag Default Description
--aws-profile AWS profile
--all-regions true Scan all enabled regions
--regions Specific regions (comma-separated)
--age-threshold-days 365 Flag buckets older than N days
--inactive-days 180 Flag buckets inactive for N days
--check-encryption false Flag missing encryption
--check-public false Flag public access
--concurrency 10 Max concurrent S3 API calls
--format, -f text Output format: text or json
--output, -o stdout Output file
--fail-on-unused false Exit non-zero on unused buckets
--fail-on-risky false Exit non-zero on risky configs
--no-progress false Disable TTY progress indicators

Drift classifications

Scan mode classifies each bucket and prefix into one of:

Status Meaning
MISSING_BUCKET Referenced in code, does not exist in AWS
UNUSED_BUCKET Exists in AWS, not referenced in code
MISSING_PREFIX Code references a prefix with no objects
STALE_PREFIX Prefix exists but unmodified for N days
VERSION_SPRAWL Versioning enabled, no lifecycle rules
LIFECYCLE_MISCONFIG Many objects, no lifecycle rules
OK Bucket and prefix match expected usage

Architecture

s3spectre/
├── cmd/s3spectre/main.go       # Entry point, delegates to commands
├── internal/
│   ├── commands/               # Cobra CLI: scan, discover, version
│   │   ├── root.go
│   │   ├── scan.go
│   │   ├── discover.go
│   │   ├── helpers.go          # Shared: error enhancement, status output
│   │   └── version.go
│   ├── scanner/                # Repository scanning (regex, YAML, Terraform, JSON, .env)
│   │   ├── scanner.go          # Orchestrator: walks files, dispatches to parsers
│   │   ├── regex.go            # S3 URL and bucket name pattern matching
│   │   ├── yaml.go
│   │   ├── terraform.go
│   │   ├── json.go
│   │   ├── env.go
│   │   └── types.go
│   ├── s3/                     # AWS S3 integration
│   │   ├── client.go           # S3 client wrapper with retry and backoff
│   │   ├── inspector.go        # Concurrent bucket and prefix inspection
│   │   └── types.go
│   ├── analyzer/               # Drift analysis and scoring
│   │   ├── analyzer.go         # Scan mode: code-vs-AWS correlation
│   │   ├── discovery.go        # Discover mode: account-wide heuristics
│   │   └── types.go
│   └── report/                 # Output generation
│       ├── text.go
│       ├── json.go
│       ├── discovery.go
│       └── types.go
├── Makefile
├── go.mod
└── go.sum

Key design decisions:

  • cmd/s3spectre/main.go is minimal -- a single Execute() call.
  • All logic lives in internal/ to prevent external import.
  • S3 API calls use a bounded worker pool (--concurrency) with exponential backoff.
  • Scanner dispatches files to format-specific parsers based on extension.
  • Analysis is deterministic: same inputs always produce the same classifications.

Known limitations

  • No object-level scanning. S3Spectre inspects bucket and prefix metadata. It does not list or read individual objects beyond what is needed for prefix existence and staleness checks.
  • Regex-based code scanning. The scanner uses pattern matching, not AST parsing. It will miss dynamically constructed bucket names and may produce false positives on commented-out code.
  • No cost estimation. The tool identifies unused resources but does not calculate storage costs.
  • IAM permissions required. Needs s3:ListBucket, s3:ListAllMyBuckets, s3:GetBucketLocation, s3:GetBucketVersioning, s3:GetLifecycleConfiguration, and s3:GetBucketTagging. Missing permissions produce access-denied errors, not silent failures.
  • No real-time monitoring. S3Spectre is a point-in-time scanner, not a daemon. Run it in CI or on a schedule.
  • Single AWS account. Cross-account scanning is not supported.
  • Progress line artifacts. The TTY progress indicator uses carriage return without clearing the full line, so shorter bucket names leave trailing characters from the previous name. Cosmetic only.

Roadmap

  • Cost estimation for unused and stale resources
  • Deep prefix scanning with pagination
  • Replication rule validation
  • IAM access path analysis
  • Naming convention enforcement
  • Historical trend tracking
  • SpectreHub integration for cross-tool correlation