applesauce777/DeepHashTelescope

Deep Hash Telescope

Advanced DHT Network Observatory v4

TL;DR

  • Passive DHT observer (no swarm participation)
  • Detects behavioral patterns (crawlers, monitoring, coordination)
  • CLI-based analysis (report.py)
  • Designed for research and network measurement

A passive BitTorrent DHT observatory that detects and classifies large-scale behavioral patterns in distributed hash table traffic.

Deep Hash Telescope is a passive measurement and analysis platform for the BitTorrent Distributed Hash Table. It analyzes large-scale DHT traffic patterns to surface unusual behavioral signals and coordinated campaigns.

Research focus: Identify and classify DHT traffic patterns without making claims about malicious intent. The system surfaces unusual network behaviors for further analysis.

🀝 Support

If you find Deep Hash Telescope valuable for your research or security work, please consider supporting the project:

[β˜• Support on Ko-fi](https://ko-fi.com/applesauce777) - Help maintain and develop this open-source intelligence platform

Opsec

This tool is strictly passive. It never opens TCP connections to peers, never queries torrent indexes, and never probes specific infohashes.

What it sends over UDP:

  • Bootstrap pings to public DHT nodes (identical to any BitTorrent client)
  • Pong responses to incoming pings (to stay in routing tables)
  • Find_node queries for our own node IDs (to maintain routing table health)

What it never sends:

  • No get_peers queries for external infohashes
  • No announce_peer messages (we never join swarms)
  • No TCP connections to any peers
  • No HTTP requests to torrent indexes or trackers

The system does not participate in torrent swarms and never retrieves peer lists. Your IP appears in DHT routing tables as a normal passive node with no association to specific content.
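
The bootstrap ping listed above is an ordinary KRPC ping query as defined in BEP 5. As a minimal sketch of what that packet looks like on the wire, the helpers below bencode one by hand. The names bencode and build_ping are illustrative and are not functions from this repository; the node ID in the usage line is the placeholder value from the BEP 5 examples.

```python
def bencode(value) -> bytes:
    """Minimal bencoder covering the types used in KRPC messages."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("ascii")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Bencoded dictionaries must list their keys in sorted order.
        items = sorted(
            ((k.encode("ascii") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_ping(node_id: bytes, transaction_id: bytes = b"aa") -> bytes:
    """Build a KRPC ping query carrying only our own 20-byte node ID."""
    if len(node_id) != 20:
        raise ValueError("node_id must be 20 bytes")
    return bencode({"t": transaction_id, "y": "q", "q": "ping",
                    "a": {"id": node_id}})


packet = build_ping(b"abcdefghij0123456789")
```

The query carries a node ID and a transaction ID and nothing else, which is why it reveals no interest in any infohash.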

Quick Start

# Install dependencies
pip install geoip2 psutil

# Run antenna (Ctrl-C to stop)
python antenna_clean.py

# Compute behavioral profiles
python behavior_profiles.py compute

# View intelligence summary
python report.py

# View behavioral anomaly analysis
python report.py --behavior

# View systematic crawler detection
python report.py --prefix-walkers

# Direct prefix walking analysis
python prefix_walkers.py --hours 24

# View system metrics
python system_metrics.py

# Focused analysis
python report.py --operators      # Fleet behavior
python report.py --sweeps          # Keyspace sweeps
python report.py --rendezvous      # Timing patterns
python report.py --behavior       # Behavioral anomalies
python report.py --hash <hash>     # Specific hash analysis

🎛️ Intelligence Reporting CLI

The report.py command-line interface provides comprehensive analysis capabilities for DHT network intelligence. It's the primary tool for extracting insights from collected data.

📊 Core Reports

Full Intelligence Summary:

python report.py
# Complete overview with all key metrics and anomalies

Operator Analysis:

python report.py --operators
# Fleet behavior classifications and activity patterns

python report.py --operator <ASN>
# Deep-dive on specific autonomous system (e.g., 15830 or AS15830)

Behavioral Analysis:

python report.py --behavior
# Signal stacking analysis - identifies hashes with overlapping anomalies
# Shows top anomalous hashes by signal score (higher = more anomalous)
# Examples: burst_activity + observer_target + dead_swarm

🔍 Advanced Detection

Systematic Pattern Detection:

python report.py --prefix-walkers
# Detects systematic crawler patterns in keyspace exploration
# Shows sequential hash scanning behavior

python report.py --prefix-entropy
# Analyzes hash prefix entropy for systematic patterns
# Low entropy = organized scanning (crawler-like)
# High entropy = normal random distribution
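
As a rough illustration of the low-versus-high entropy distinction above, the sketch below computes a normalized Shannon entropy over the hex prefixes of the hashes one operator queried. The function name, the 2-character prefix length, and the normalization are assumptions for illustration; prefix_entropy.py may compute its score differently.

```python
import math
from collections import Counter


def prefix_entropy(infohashes, prefix_len=2):
    """Normalized Shannon entropy (0-1) of the hex prefixes in infohashes."""
    counts = Counter(h[:prefix_len].lower() for h in infohashes)
    total = sum(counts.values())
    if total == 0 or len(counts) == 1:
        return 0.0  # nothing observed, or every hash shares one prefix
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # The most entropy we could see is limited by both the sample size
    # and the number of distinct prefixes that exist at this length.
    max_entropy = math.log2(min(total, 16 ** prefix_len))
    return entropy / max_entropy


focused = prefix_entropy(["96a1", "96b2", "96c3"])          # one prefix: "96"
spread = prefix_entropy(["00aa", "40bb", "80cc", "c0dd"])   # four prefixes
```

A source that stays inside one prefix region scores near 0, while lookups spread evenly across the keyspace score near 1.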

Network Coordination:

python report.py --coordination
# Multi-ASN coordination analysis
# Identifies cross-autonomous system collaboration patterns

Timing Pattern Analysis:

python report.py --sweeps
# Real-time keyspace sweep detection
# Shows systematic scanning across hash ranges

python report.py --rendezvous
# Numbers station and timing pattern analysis
# Detects scheduled communication patterns

🤖 Node Behavior Analysis

Operator Profiling:

python report.py --operator-profiles
# Classifies DHT nodes by behavior:
#   - crawler: Systematic keyspace scanning
#   - normal_client: Regular BitTorrent clients
#   - tracker_node: High announce volume, seeding
#   - measurement_node: Research/academic patterns
#   - indexer: Content cataloging systems

🎯 Targeted Analysis

Specific Hash Investigation:

python report.py --hash <infohash>
# Complete behavioral profile for specific hash
# Includes timing gaps, operator analysis, and classification

Prefix Family Analysis:

python report.py --prefixes
# Clustering analysis for hash prefix families
# Identifies related hash groups and patterns

📈 Analysis Workflow

Daily Intelligence Routine:

# 1. Full overview
python report.py

# 2. Focus on high-value anomalies
python report.py --behavior

# 3. Identify systematic scanners
python report.py --operator-profiles
python report.py --prefix-walkers

# 4. Investigate specific targets
python report.py --hash <suspicious_hash>

Research Investigation:

# 1. Understand network composition
python report.py --operators

# 2. Identify measurement activity
python report.py --operator-profiles

# 3. Analyze systematic patterns
python report.py --prefix-entropy
python report.py --coordination

# 4. Deep-dive on specific actors
python report.py --operator <research_AS>

🔧 CLI Reference

Command Structure:

python report.py [OPTION]

Available Options:

  • (no args) - Full intelligence summary
  • --operators - Fleet behavior analysis
  • --operator <ASN> - Deep-dive on specific ASN
  • --behavior - Signal stacking and behavioral anomalies
  • --operator-profiles - Node behavior classification
  • --prefix-walkers - Systematic crawler detection
  • --prefix-entropy - Hash prefix entropy analysis
  • --sweeps - Keyspace sweep detection
  • --rendezvous - Timing pattern analysis
  • --coordination - Multi-ASN coordination
  • --prefixes - Prefix family clustering
  • --clustering - Operator clustering and distributed crawler detection
  • --heatmap - Keyspace heatmap visualization and scanning patterns
  • --campaigns - Long-term campaign tracking and persistent behavior analysis
  • --hash <hash> - Specific hash investigation

Output Features:

  • Signal stacking scores - Prioritizes multi-signal anomalies
  • Lifespan analysis - Distinguishes dead swarms vs active torrents
  • Operator classifications - Identifies crawlers, clients, and measurement nodes
  • Performance metrics - Query rates, burstiness, entropy scores
  • Timing analysis - Activity patterns and coordination detection

Integration with Data Processing:

# Before running reports, ensure profiles are computed:
python behavior_profiles.py compute
python operator_profiles.py compute

# Reports automatically use latest computed profiles

📋 Sample Output

Behavioral Anomaly Analysis

$ python report.py --behavior

🎯 SIGNAL STACKING ANALYSIS:
Top hashes by overlapping signals (higher score = more anomalous):

    1. score:3  e13b0a00d8601785...
       signals: burst_activity, observer_target, dead_swarm
       events:   49  lookups:   49  asns:  1  burst: 1.00

    2. score:3  c276203789fa9c0c...
       signals: burst_activity, observer_target, dead_swarm
       events:   42  lookups:   42  asns:  2  burst: 1.00

(Note: Values shown are from a real sample dataset and will vary by deployment)

Keyspace Heatmap Analysis

$ python report.py --heatmap

📊 ANALYSIS OVERVIEW:
   Total Hashes: 83,651
   Unique Prefixes: 17,761
   Hot Zones: 4,156
   Systematic Scanners: 0

🔥 KEYSPACE HEATMAP (4-digit hex prefixes):
   Intensity: █ Very High  ▓ High  ▒ Moderate  ░ Low  · Minimal

██████████
      ·  █
 ▒   ·
 ·
▒▒██    ░
  ░
▓ ▒·█  ·
  █·
▓   ·  ·██
 ·

(Note: ASCII visualization represents real keyspace activity patterns)

Campaign Tracking Analysis

$ python report.py --campaigns

📊 ANALYSIS OVERVIEW:
   Period: 0.3 days analyzed
   Total Observations: 83,651
   Unique Operators: 9,254

🎯 CAMPAIGN SUMMARY:
   Persistent Observers: 0
   Scheduled Monitoring: 103
   Distributed Campaigns: 100
   Total Campaigns: 203

🚀 Advanced Intelligence Features

Operator Clustering:

python report.py --clustering
# Detects distributed crawler networks and coordinated operator clusters
# Groups IPs by ASN, behavioral similarity, hash overlap, and timing correlation
# Identifies multi-operator campaigns and infrastructure sharing

Keyspace Heatmaps:

python report.py --heatmap
# Visualizes DHT keyspace activity patterns with ASCII heatmaps
# Shows systematic scanning vs normal random distribution
# Identifies hot zones, cold zones, and monitoring targets
# Detects directional sweep patterns and scanning strategies

Campaign Tracking:

python report.py --campaigns
# Tracks long-term behavior patterns across days and weeks
# Detects persistent observers and scheduled automated monitoring
# Identifies coordinated multi-day campaigns and infrastructure persistence
# Analyzes temporal patterns and automation scheduling

Architecture

dht_protocol.py      ← Multi-port DHT listening (13 ports × 16 node IDs = 208 total nodes)
      ↓
antenna_clean.py    ← Event processor & orchestrator (with bounded memory & connection pooling)
      ↓
┌────────────────────────────────────────────────────────────────────┐
│  Behavioral Detection Pipelines                                    │
│  operators.py           ← Fleet behavior classification            │
│  sweeps.py              ← Real-time keyspace sweep detection       │
│  rendezvous.py          ← Scheduled timing patterns ("rendezvous") │
│  behavior_profiles.py   ← Behavioral fingerprinting                │
│  prefix_walkers.py      ← Systematic crawler detection             │
│  prefix_entropy.py      ← Hash prefix entropy analysis             │
│  operator_profiles.py   ← Node behavior classification             │
│  operator_clustering.py ← Distributed crawler detection            │
│  keyspace_heatmaps.py   ← Keyspace visualization                   │
│  campaign_tracker.py    ← Long-term campaign analysis              │
└────────────────────────────────────────────────────────────────────┘
      ↓
bounded_collections.py ← Memory management with LRU eviction & TTL
db_pool.py          ← Database connection pooling & retry logic
system_metrics.py   ← System monitoring & performance metrics
async_db.py         ← Async database operations
      ↓
retention.py        ← Data management & rolling logs
      ↓
report.py           ← Simple CLI intelligence queries

🚀 Visibility Optimizations

Deterministic Keyspace Coverage:

  • 208 node IDs across 13 ports with deterministic prefix distribution
  • Each port covers exclusive keyspace slices for maximum routing inclusion
  • Eliminates clustering gaps found in random ID systems
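
One way to read "deterministic prefix distribution" is to cut the 160-bit keyspace into 208 equal slices and hand each port 16 of them. The sketch below does exactly that. The slicing scheme, the SHA-1 derived offset, the seed, and the function name are all assumptions made for illustration and are not taken from dht_protocol.py; only the port list and the 13 × 16 layout come from this README.

```python
import hashlib

# Ports as listed under Strategic Port Selection.
PORTS = [6881, 6882, 6883, 6969, 6889, 51413, 51414,
         16881, 26881, 49160, 49161, 45682, 7881]
IDS_PER_PORT = 16
KEYSPACE = 2 ** 160


def node_ids_for_port(port_index: int, seed: bytes = b"demo") -> list[bytes]:
    """Return 16 node IDs, each inside its own exclusive keyspace slice."""
    total = len(PORTS) * IDS_PER_PORT          # 208 slices overall
    width = KEYSPACE // total                  # size of one slice
    ids = []
    for slot in range(IDS_PER_PORT):
        global_slot = port_index * IDS_PER_PORT + slot
        start = global_slot * KEYSPACE // total
        # Deterministic (hypothetical) offset so the same seed always
        # yields the same ID inside the slice.
        digest = hashlib.sha1(seed + global_slot.to_bytes(2, "big")).digest()
        offset = int.from_bytes(digest, "big") % width
        ids.append((start + offset).to_bytes(20, "big"))
    return ids


all_ids = [nid for p in range(len(PORTS)) for nid in node_ids_for_port(p)]
```

Because every ID is confined to its own slice, no two IDs can cluster in the same region the way independently random IDs sometimes do.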

Strategic Port Selection:

  • High-trust BitTorrent ports: 6881, 6882, 6883 (adjacent to primary)
  • Client defaults: 6969, 6889, 51413, 51414, 16881, 26881
  • Common ephemerals: 49160, 49161 (uTorrent/qBittorrent)
  • Fallback: 45682, 7881 (alternative client ports)

Node ID Rotation:

  • 12-hour rotation cycles with ±30 minute jitter
  • IDs shift within their prefix regions for fresh routing entries
  • Avoids long-term bucket eviction while maintaining prefix authority
  • Research-backed optimization for sustained visibility
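
The rotation rule above can be sketched in a few lines. This is a minimal illustration, assuming uniform jitter and a 2-byte prefix that is kept fixed while the remaining bytes are re-randomized; the constant and function names are hypothetical and the real rotation logic may differ.

```python
import os
import random

ROTATION_INTERVAL_S = 12 * 3600   # 12-hour cycle
ROTATION_JITTER_S = 30 * 60       # plus or minus 30 minutes


def next_rotation_delay(rng=random) -> float:
    """Seconds until the next rotation: 12 hours with uniform jitter."""
    return ROTATION_INTERVAL_S + rng.uniform(-ROTATION_JITTER_S, ROTATION_JITTER_S)


def rotate_node_id(node_id: bytes, keep_prefix_bytes: int = 2) -> bytes:
    """Keep the leading prefix bytes and randomize the rest of the ID."""
    return node_id[:keep_prefix_bytes] + os.urandom(20 - keep_prefix_bytes)


delay = next_rotation_delay(random.Random(1))
old_id = bytes(range(20))
new_id = rotate_node_id(old_id)
```

Keeping the prefix means the rotated ID still falls in the same region of the keyspace, so it remains close to the same set of infohashes.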

All data stored in dht_antenna.db (SQLite).

Signal Stacking

Each infohash may exhibit multiple behavioral signals simultaneously. The system assigns a signal score based on overlapping classifications.

Example:

score:3
burst_activity + observer_target + dead_swarm

Higher scores indicate increasingly unusual behavioral patterns.
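
Read literally, the score in the example is the number of anomaly signals attached to a hash. A minimal sketch under that assumption follows. Treating swarm_like as a label that adds nothing to the score is also an assumption, and signal_score is an illustrative name rather than a function from this codebase.

```python
# Labels from the Behavioral Classifications section that describe anomalies.
ANOMALY_SIGNALS = {
    "observer_target",
    "high_asn_interest",
    "burst_activity",
    "periodic_lookup_target",
    "dead_swarm",
}


def signal_score(classifications):
    """Count how many distinct anomaly signals a hash carries."""
    return len(ANOMALY_SIGNALS.intersection(classifications))


example = signal_score(["burst_activity", "observer_target", "dead_swarm"])
```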

Behavioral Classification System

🎯 Behavioral Fingerprinting

The system computes observable behavioral metrics for each infohash:

Core Metrics:

  • total_events - Total DHT events observed
  • lookup_count - get_peers requests (searches)
  • announce_count - announce_peer events (content sharing)
  • announce_ratio - announce_count / total_events
  • unique_ips - Number of distinct peer IPs
  • unique_asns - Number of distinct autonomous systems
  • duration_minutes - Time span of observations
  • burst_score - Event concentration in short windows (0-1)
  • periodicity_score - Regular timing patterns (0-1)
  • observer_score - Lookup-dominated behavior ratio (0-1)
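
Two of these metrics are easy to illustrate. announce_ratio follows the formula given in the list. burst_score is only described as event concentration in short windows, so the sketch below approximates it as the largest share of events that falls inside any 5-minute window. That definition, the window length, and the function names are assumptions, not the formulas used by behavior_profiles.py.

```python
from bisect import bisect_right


def announce_ratio(announce_count, total_events):
    """announce_count / total_events, or 0.0 when nothing was observed."""
    return announce_count / total_events if total_events else 0.0


def burst_score(timestamps, window_s=300):
    """Largest fraction of events inside any window of window_s seconds."""
    ts = sorted(timestamps)
    if not ts:
        return 0.0
    best = 0
    for i, start in enumerate(ts):
        # Index one past the last event at or before start + window_s.
        end = bisect_right(ts, start + window_s)
        best = max(best, end - i)
    return best / len(ts)


clustered = burst_score([0, 10, 20, 30])          # all within 5 minutes
scattered = burst_score([0, 1000, 2000, 3000])    # one event per window
```

Under this definition a hash with a single event would also score 1.0, which is one reason a minimum event count such as BEHAVIORAL_MIN_EVENTS matters before any score is interpreted.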

🏷️ Behavioral Classifications

observer_target - Lookup-dominated surveillance

  • High lookup count, very low announce ratio (<0.05)
  • Typical: 1000+ lookups, <5% announces
  • Suggests: Hash monitoring, content availability checking

high_asn_interest - Multi-ASN coordination

  • Observed by 5+ distinct autonomous systems
  • Typical: 5+ ASNs, cross-infrastructure interest
  • Suggests: Coordinated monitoring, high-value targets

burst_activity - Concentrated event patterns

  • High burst score (>0.6), events clustered in short windows
  • Typical: Many events in <5 minute windows
  • Suggests: Coordinated probing, automated checking

periodic_lookup_target - Regular timing patterns

  • High periodicity score (>0.7), consistent lookup intervals
  • Typical: Regular checks every N hours/minutes
  • Suggests: Automated monitoring, scheduled checking

dead_swarm - Search without sharing

  • Many lookups but almost no announces (<2% ratio)
  • Typical: 50+ lookups, 0-1 announces
  • Suggests: Content searching, availability monitoring

swarm_like - Normal content distribution

  • Balanced announce/lookup ratio, many unique peers
  • Typical: 10%+ announce ratio, 20+ unique IPs
  • Suggests: Legitimate torrent activity

🏷️ Multi-Label Classification

Enhanced Analysis: Each infohash can now exhibit multiple behaviors simultaneously:

["observer_target", "high_asn_interest", "periodic_lookup_target"]

Benefits:

  • Richer patterns - Captures complex behavioral combinations
  • No information loss - Multiple classifications per hash
  • Enhanced detection - Overlapping behavioral signals
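
The thresholds listed for each classification can be combined into a small multi-label classifier. The sketch below is an illustration only: the classify function, the dictionary layout, and the use of a 10-event minimum (borrowed from BEHAVIORAL_MIN_EVENTS) are assumptions, and the "Typical" figures are treated as guidance rather than hard cut-offs. The unique_ips value in the usage example is hypothetical because the sample report does not show it.

```python
def classify(profile: dict, min_events: int = 10) -> list[str]:
    """Return every behavioral label whose documented threshold is met."""
    labels = []
    if profile["total_events"] < min_events:
        return labels  # too little data to classify
    ratio = profile["announce_count"] / profile["total_events"]
    has_lookups = profile["lookup_count"] > 0

    if has_lookups and ratio < 0.05:
        labels.append("observer_target")
    if profile["unique_asns"] >= 5:
        labels.append("high_asn_interest")
    if profile["burst_score"] > 0.6:
        labels.append("burst_activity")
    if profile["periodicity_score"] > 0.7:
        labels.append("periodic_lookup_target")
    if has_lookups and ratio < 0.02:
        labels.append("dead_swarm")
    if ratio >= 0.10 and profile["unique_ips"] >= 20:
        labels.append("swarm_like")
    return labels


# First row of the sample --behavior output (unique_ips is made up).
sample = {"total_events": 49, "lookup_count": 49, "announce_count": 0,
          "unique_ips": 3, "unique_asns": 1,
          "burst_score": 1.0, "periodicity_score": 0.0}
sample_labels = classify(sample)
```

With these thresholds any hash labeled dead_swarm is also labeled observer_target, since a ratio below 2% is necessarily below 5%.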

🚢 Systematic Crawler Detection

Prefix Walking Analysis identifies systematic keyspace traversal:

Detection Patterns:

  • Sequential walking (96aβ†’96bβ†’96cβ†’96d)
  • Research crawlers (academic measurement nodes)
  • Index builders (search engine indexing)
  • Network mapping (systematic enumeration)

Features:

  • Hex prefix sequence analysis
  • 2-hour time window correlation
  • Confidence scoring based on sequentiality
  • Persistent pattern storage and tracking
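
A simple way to score sequentiality is to look at how many consecutive prefix changes step to the adjacent hex value. The sketch below does this for 3-character prefixes seen from one source inside a time window. The function name and the scoring rule are assumptions; prefix_walkers.py may weight its confidence differently.

```python
def walk_confidence(prefixes):
    """Return (confidence, direction) for a time-ordered list of hex prefixes.

    Confidence is the share of prefix changes that move to the directly
    adjacent prefix in the dominant direction.
    """
    values = [int(p, 16) for p in prefixes]
    # Ignore repeats: several hashes under one prefix are not a step.
    steps = [b - a for a, b in zip(values, values[1:]) if b != a]
    if not steps:
        return 0.0, "none"
    up = sum(1 for s in steps if s == 1)
    down = sum(1 for s in steps if s == -1)
    if up == 0 and down == 0:
        return 0.0, "none"
    direction = "ascending" if up >= down else "descending"
    return max(up, down) / len(steps), direction


clean_walk = walk_confidence(["96a", "96b", "96c", "96d"])
noisy_walk = walk_confidence(["96a", "96b", "1f0", "96c", "96d"])
```

A cut-off such as 0.7 (the value of SWEEP_SEQUENTIAL_RATIO in config.py) could then separate walkers from random lookups, though whether the prefix walker reuses that setting is not stated here.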

Usage:

# View prefix walking patterns
python report.py --prefix-walkers

# Direct analysis
python prefix_walkers.py --hours 24
[SWEEP] KEYSPACE WALK  65.38.165.59  AS29863
         prefixes 96a→96f  23 hashes  1080s  0.02/s  ascending  conf=0.85

⏰ Timing Pattern Detection

RETURN_VISIT - Same IP returns to same hash after >1h gap

  • Catches: Dead drop checking, periodic monitoring

SCHEDULED - 3+ visits with consistent intervals (Β±30% variance)

  • Catches: Regular automated checking
  • Example: 5 visits, every 24.0h Β±0.5h, 0.9 confidence

CROSS_HASH - Multiple hashes checked sequentially

  • Catches: Systematic monitoring across targets
  • Example: 5 hashes checked in 2-hour window
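
The SCHEDULED rule can be sketched as a check that every gap between visits stays within 30% of the mean gap. The two constants mirror RENDEZVOUS_MIN_SCHEDULED_VISITS and RENDEZVOUS_SCHEDULED_MAX_VARIANCE from config.py; the function name and the exact interpretation of "variance" as deviation from the mean interval are assumptions.

```python
from statistics import mean

MIN_SCHEDULED_VISITS = 3   # mirrors RENDEZVOUS_MIN_SCHEDULED_VISITS
MAX_VARIANCE = 0.3         # mirrors RENDEZVOUS_SCHEDULED_MAX_VARIANCE


def is_scheduled(visit_times_s):
    """Return (scheduled, mean_interval_hours) for one IP and one hash."""
    visits = sorted(visit_times_s)
    if len(visits) < MIN_SCHEDULED_VISITS:
        return False, None
    intervals = [b - a for a, b in zip(visits, visits[1:])]
    avg = mean(intervals)
    if avg <= 0:
        return False, None
    # Every interval must stay within +/-30% of the mean interval.
    consistent = all(abs(i - avg) <= MAX_VARIANCE * avg for i in intervals)
    return consistent, avg / 3600


daily = is_scheduled([0, 86400, 172800, 259200, 345600])   # 5 visits, 24h apart
irregular = is_scheduled([0, 3600, 90000, 93600])
```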

Intelligence Reports

Full Summary

python report.py

Shows:

  • Today's activity metrics
  • Behavioral anomaly breakdown
  • Operator classifications
  • Sweep detection statistics
  • Rendezvous patterns
  • Top operators by activity

Behavioral Analysis

python report.py --behavior

Shows:

  • Classification breakdown with statistics
  • High lookup hashes (surveillance candidates)
  • Multi-ASN hashes (coordination patterns)
  • Burst activity hashes (concentrated events)
  • Periodic lookup hashes (regular timing)

Focused Analysis

python report.py --operators      # Fleet behavior breakdown
python report.py --sweeps          # Recent sweep patterns
python report.py --rendezvous      # Numbers station scheduling
python report.py --coordination   # Multi-ASN co-appearance
python report.py --hash <hash>     # Specific hash analysis

System Metrics

python system_metrics.py                    # Current metrics summary

Shows:

  • CPU, memory, disk usage
  • Network I/O and open files
  • Events per second rate
  • Database connection pool status
  • Historical trends and averages

Database Schema

-- Raw DHT events (rolling 30 days)
dht_events (infohash, peer_ip, peer_port, timestamp, event_type, 
            source_node, asn, country)

-- Behavioral profiles (permanent)
hash_behavior_profiles (infohash, total_events, lookup_count, announce_count,
                        announce_ratio, unique_ips, unique_asns, duration_minutes,
                        lifespan_seconds, burst_score, periodicity_score, observer_score,
                        classifications, signal_score, computed_at)
                        -- classifications stored as JSON array

-- Prefix walking patterns (permanent)
prefix_walk_events (id, source_ip, asn, prefix_sequence, start_time,
                   end_time, total_hashes, duration_seconds, hashes_per_second,
                   confidence, detected_at)

-- Operator profiles (permanent)
operators (ip, asn, country, actor_label, classification, 
           first_seen, last_seen, unique_hashes, total_events, notes)

-- Operator behavior profiles (permanent)
operator_profiles (ip, asn, total_queries, unique_hashes, query_rate_per_hour,
                  burst_score, entropy_score, classification, first_seen, last_seen, computed_at)

-- Sweep events (permanent)
sweep_events (id, operator_ip, asn, detected_at, prefix_start, prefix_end,
              hashes_observed, duration_seconds, hashes_per_sec)

-- Rendezvous events (permanent)
rendezvous_events (id, infohash, operator_ip, asn, visit_count, 
                first_visit, last_visit, mean_interval_h, classification)

-- Daily summaries (permanent, survive raw event pruning)
daily_summaries (date_utc, total_events, unique_hashes, observer_targets,
                  burst_activity, multi_asn_hashes, top_country, computed_at)

-- System metrics (rolling 7 days)
system_metrics (timestamp, cpu_percent, memory_percent, memory_mb,
                disk_usage_percent, disk_free_gb, network_bytes_sent,
                network_bytes_recv, open_files, threads, 
                active_connections, queue_size, db_connections, 
                events_per_second)
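
Because everything lives in SQLite, the tables can be queried directly with Python's sqlite3 module. The sketch below builds an in-memory table with a subset of the hash_behavior_profiles columns and lists the highest-scoring hashes. Column types, the helper names, and the two demo rows (with obviously fake infohashes) are assumptions for illustration.

```python
import json
import sqlite3


def make_demo_db():
    """In-memory stand-in for dht_antenna.db with a reduced column set."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE hash_behavior_profiles (
               infohash TEXT PRIMARY KEY,
               total_events INTEGER,
               lookup_count INTEGER,
               classifications TEXT,   -- JSON array of labels
               signal_score INTEGER)"""
    )
    rows = [
        ("aa" * 20, 49, 49,
         json.dumps(["burst_activity", "observer_target", "dead_swarm"]), 3),
        ("bb" * 20, 120, 80, json.dumps(["swarm_like"]), 0),
    ]
    conn.executemany(
        "INSERT INTO hash_behavior_profiles VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def top_anomalies(conn, limit=10):
    """Return (infohash, score, labels) ordered by descending signal score."""
    rows = conn.execute(
        "SELECT infohash, signal_score, classifications "
        "FROM hash_behavior_profiles "
        "ORDER BY signal_score DESC, total_events DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [(h, score, json.loads(labels)) for h, score, labels in rows]


result = top_anomalies(make_demo_db())
```

Against a real deployment the same query would run on a connection opened with sqlite3.connect("dht_antenna.db").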

Configuration

All detection thresholds and system parameters in config.py:

Detection Thresholds

# Behavioral analysis
BEHAVIORAL_MIN_EVENTS = 10
BEHAVIORAL_REASSESS_INTERVAL = 300

# Operator classification
OPERATOR_BROAD_SWEEP_MIN_HASHES = 500
OPERATOR_TARGETED_MIN_DEPTH = 10.0

# Sweep detection  
SWEEP_MIN_HASHES = 10
SWEEP_SEQUENTIAL_RATIO = 0.7

# Rendezvous detection
RENDEZVOUS_MIN_SCHEDULED_VISITS = 3
RENDEZVOUS_SCHEDULED_MAX_VARIANCE = 0.3

System Configuration

# Database
DB_PATH = "dht_antenna.db"

# Performance & Memory Management
MAX_OBSERVATIONS = 10000          # Maximum hash observations in memory
MAX_OPERATOR_PROFILES = 5000       # Maximum operator profiles in memory
OBSERVATION_TTL = 3600           # 1 hour for hash observations
OPERATOR_TTL = 86400             # 24 hours for inactive operators

# Database connection pool
DB_POOL_MIN_CONNECTIONS = 2
DB_POOL_MAX_CONNECTIONS = 10
DB_POOL_MAX_IDLE_TIME = 300       # 5 minutes
DB_CONNECTION_TIMEOUT = 30

# Metrics collection
METRICS_ENABLED = True
METRICS_COLLECTION_INTERVAL = 60  # seconds
METRICS_RETENTION_HOURS = 168      # 1 week

Data Retention

| Table | Retention | Purpose |
|---|---|---|
| dht_events | 30 days rolling | Raw noise, pruned automatically |
| system_metrics | 7 days rolling | System performance data |
| hash_behavior_profiles | Permanent | Behavioral intelligence |
| operators | Permanent | Operator intelligence |
| sweep_events | Permanent | Sweep pattern history |
| rendezvous_events | Permanent | Timing pattern evidence |
| daily_summaries | Permanent | Aggregate statistics |

Pruning runs every 6 hours, summarizing expiring data before deletion.
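
The summarize-then-delete step can be sketched as follows. This is an illustration under stated assumptions: timestamps are Unix epoch seconds, only two summary columns are filled, the cut-off is rounded down to a UTC day boundary so each day is summarized exactly once, and the names prune and make_demo_db are hypothetical. retention.py may work differently.

```python
import sqlite3
import time

RETENTION_DAYS = 30
DAY_S = 86400


def make_demo_db():
    """In-memory tables with only the columns this sketch needs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dht_events (infohash TEXT, timestamp INTEGER)")
    conn.execute(
        "CREATE TABLE daily_summaries ("
        "date_utc TEXT PRIMARY KEY, total_events INTEGER, unique_hashes INTEGER)")
    return conn


def prune(conn, now=None):
    """Summarize expired whole days, delete their raw events, return rows deleted."""
    now = time.time() if now is None else now
    # Round down to midnight UTC so a day is never summarized in two parts.
    cutoff = int((now - RETENTION_DAYS * DAY_S) // DAY_S) * DAY_S
    with conn:  # one transaction: summary and delete succeed or fail together
        conn.execute(
            "INSERT OR REPLACE INTO daily_summaries "
            "SELECT date(timestamp, 'unixepoch'), COUNT(*), COUNT(DISTINCT infohash) "
            "FROM dht_events WHERE timestamp < ? "
            "GROUP BY date(timestamp, 'unixepoch')",
            (cutoff,),
        )
        deleted = conn.execute(
            "DELETE FROM dht_events WHERE timestamp < ?", (cutoff,)).rowcount
    return deleted
```

Running both statements in one transaction means a crash between them cannot leave events deleted without a summary.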

Dependencies

Required

pip install geoip2 psutil

Optional (for enhanced features)

pip install requests  # For MaxMind database updates

Database Files

  • GeoLite2-ASN.mmdb - MaxMind ASN database (download from MaxMind)
  • GeoLite2-City.mmdb - MaxMind City database (download from MaxMind)

Place these in the same directory as antenna_clean.py.

Performance Expectations

On a typical VPS (2-4 cores, 4-8GB RAM):

Resource Usage

  • CPU: 5-15% during normal operation
  • Memory: 500-1000MB (bounded by configuration)
  • Disk: ~10MB/day for raw events, ~1MB/day for intelligence data
  • Network: 1-5MB/hour outbound (bootstrap pings, pong responses, and find_node queries only)

Scaling Characteristics

  • Multi-port architecture: 13 ports × 16 node IDs = 208 total DHT presences
  • Connection pooling: 2-10 concurrent database connections
  • Bounded memory: Automatic LRU eviction prevents unbounded growth
  • Async operations: Thread pool for non-blocking database writes
  • SQL optimization: Aggregated queries with composite indexing for 1M+ row scalability
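
The bounded-memory behavior described above (LRU eviction plus a TTL) can be sketched with an OrderedDict. The class below is a minimal illustration, not the code in bounded_collections.py; the class name, method names, and the injectable clock are assumptions.

```python
import time
from collections import OrderedDict


class BoundedTTLCache:
    """Mapping with a size cap (LRU eviction) and per-entry expiry."""

    def __init__(self, max_size, ttl_s, clock=time.monotonic):
        self._data = OrderedDict()   # key -> (expires_at, value)
        self._max_size = max_size
        self._ttl_s = ttl_s
        self._clock = clock

    def put(self, key, value):
        self._data[key] = (self._clock() + self._ttl_s, value)
        self._data.move_to_end(key)              # mark as most recently used
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)       # evict least recently used

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]                  # expired: drop lazily on access
            return default
        self._data.move_to_end(key)
        return value

    def __len__(self):
        return len(self._data)
```

With the defaults from config.py this would correspond to something like BoundedTTLCache(max_size=10000, ttl_s=3600) for hash observations.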

Throughput Estimates

  • Events processed: 1,000-10,000/hour depending on DHT traffic
  • Hashes tracked: Up to 10,000 concurrent observations
  • Operator profiles: Up to 5,000 concurrent profiles (MAX_OPERATOR_PROFILES default)
  • Detection latency: <5 seconds for sweep patterns, <1 minute for rendezvous
  • Profile computation: 50k+ hash profiles in <30 seconds, 8k+ operator profiles in ~3 minutes

🎯 Use Cases

Academic Network Measurement

  • DHT traffic measurement and network topology analysis
  • Research-grade behavioral classification and pattern detection
  • Publication-quality data for network science papers

Detection of Crawler Infrastructure

  • Large-scale distributed crawler network identification
  • Systematic scanning pattern detection and attribution
  • Coordinated campaign analysis and infrastructure mapping

DHT Behavioral Research

  • Swarm discovery and lifecycle analysis
  • Operator behavior classification and profiling
  • Signal stacking analysis for anomaly detection
  • Long-term campaign tracking and persistence analysis

🖼️ Screenshots & Visualizations

Keyspace Heatmap Visualization

🔥 KEYSPACE HEATMAP (4-digit hex prefixes):
   Intensity: █ Very High  ▓ High  ▒ Moderate  ░ Low  · Minimal

██████████
      ·  █
 ▒   ·
 ·
▒▒██    ░
  ░
▓ ▒·█  ·
  █·
▓   ·  ·██
 ·

Campaign Analysis Dashboard

🎯 CAMPAIGN SUMMARY:
   Persistent Observers: 0
   Scheduled Monitoring: 103
   Distributed Campaigns: 100
   Total Campaigns: 203

πŸ•·οΈ DISTRIBUTED CAMPAIGNS:
   β€’ distributed_8401a7d4_9ops
     Operators: 9 nodes (65.38.165.61, 65.38.165.54, ...)
     ASNs: 15830, 29863
     Coordination: 0.67
     Duration: 0.3 days

(Note: Campaign data represents actual detected coordination patterns)

Signal Stacking Analysis

🎯 SIGNAL STACKING ANALYSIS:
Top hashes by overlapping signals (higher score = more anomalous):

    1. score:3  e13b0a00d8601785...
       signals: burst_activity, observer_target, dead_swarm
       events:   49  lookups:   49  asns:  1  burst: 1.00

    2. score:3  c276203789fa9c0c...
       signals: burst_activity, observer_target, dead_swarm
       events:   42  lookups:   42  asns:  2  burst: 1.00

What to Look For

Highest Priority Findings

  1. BROAD_SWEEP + High ASN Count - Large-scale coordinated activity

    • Systematic keyspace traversal across multiple ASNs
    • High confidence (>0.8) with consistent methodology
  2. PERIODIC_LOOKUP_TARGET - Regular automated monitoring

    • 3+ visits to same hash with consistent intervals
    • High periodicity score (>0.7) with low variance
  3. HIGH_ASN_INTEREST - Multi-ASN coordination

    • Same hash observed by 5+ distinct autonomous systems
    • Cross-infrastructure monitoring patterns
  4. BURST_ACTIVITY + Observer Pattern - Coordinated probing

    • High burst score with lookup-dominated behavior
    • Suggests automated checking systems

Analysis Patterns

Fleet Coordination:

  • Multiple operators from same ASN targeting related hash prefixes
  • Synchronized timing across different IP ranges
  • Consistent methodology (sweep patterns, probe depths)

Automated Monitoring:

  • Regular periodic visits to specific hashes
  • High observer scores with low announce ratios
  • Cross-hash systematic checking

Infrastructure Patterns:

  • ASN diversity for monitoring coverage
  • Time-based operational patterns
  • Geographic clustering of activities

Troubleshooting

Common Issues

High Memory Usage:

  • Reduce MAX_OBSERVATIONS and MAX_OPERATOR_PROFILES in config.py
  • Monitor with python system_metrics.py

Database Lock Errors:

  • Increase DB_POOL_MAX_CONNECTIONS for higher concurrency
  • Check disk space and I/O performance

Missing GeoIP Data:

  • Download latest MaxMind databases:
    python mmupdate.py

Low Detection Rates:

  • Verify DHT ports are open (check firewall)
  • Confirm public IP accessibility
  • Adjust detection thresholds in config.py

Debug Mode

# Enable verbose logging
python antenna_clean.py --debug

# Test individual components
python test_detectors.py -v
python test_integration_reports.py

Security Considerations

Operational Security

  • Passive-only: No TCP connections to peers
  • Plausible deniability: Appears as normal DHT node
  • No content association: Never queries specific hashes
  • Minimal network footprint: UDP only (bootstrap pings, pong responses, find_node queries for its own node IDs)

Data Protection

  • Local SQLite: No external data transmission
  • Configurable retention: Automatic cleanup of raw data
  • Bounded memory: Prevents data accumulation leaks
  • Encrypted connections: TLS for MaxMind updates (if used)

🚫 Limitations

No Content Visibility

  • Passive-only operation: Cannot see actual torrent content or file names
  • Hash-only analysis: Limited to infohash patterns without payload access
  • Routing table perspective: Views DHT through routing node participation

No Attribution

  • Observable patterns only: Classifies behavior, not intent or identity
  • Network-layer focus: ASN attribution without user-level identification
  • Behavioral inference: No access to actual motivations or purposes

Sampling Bias

  • DHT visibility ≠ full network: Only sees traffic reaching our nodes
  • Geographic bias: Limited to regions where our nodes are visible
  • Temporal bias: Analysis limited to operational time periods
  • Port-specific coverage: 208 node IDs may not represent entire keyspace

Technical Constraints

  • UDP-only protocol: Cannot observe TCP-based BitTorrent traffic
  • Routing table dependence: Effectiveness limited by DHT routing health
  • Memory constraints: Bounded collections may miss very long-term patterns
  • Database retention: Raw events pruned after 30 days

License

MIT License - see LICENSE file for details.

Changelog

v4.0 (Deep Hash Telescope)

  • ✅ Replaced darkness scoring with behavioral fingerprinting
  • ✅ Added 6 observable behavioral classifications
  • ✅ Implemented burst and periodicity analysis
  • ✅ Removed state actor watchlist (observable patterns only)
  • ✅ Updated reporting for behavioral anomalies
  • ✅ Enhanced hash analysis with behavioral profiles
  • ✅ Cleaned up deprecated modules and configurations
  • ✅ Signal Stacking Analysis - Multi-signal anomaly detection with scoring
  • ✅ Hash Lifespan Analysis - Distinguishes dead swarms vs active torrents
  • ✅ Operator Behavior Profiling - Node classification (crawler/client/measurement)
  • ✅ Prefix Entropy Detection - Systematic crawler pattern analysis
  • ✅ SQL Performance Optimization - Aggregated queries with proper indexing
  • ✅ Complete ASN Resolution - Network attribution across all detection modules
  • ✅ Enhanced CLI Workflow - Comprehensive analysis pipeline with one-liner support
  • ✅ Research-Grade Documentation - Professional README with academic positioning
  • ✅ Operator Clustering - Distributed crawler network detection and coordination analysis
  • ✅ Keyspace Heatmaps - ASCII visualization of DHT scanning patterns and hot zones
  • ✅ Campaign Tracking - Long-term persistent behavior and scheduled monitoring detection
  • ✅ Advanced Intelligence - Multi-operator campaign analysis and infrastructure mapping

v3.1 (Enhanced)

  • ✅ Added bounded memory collections with LRU eviction
  • ✅ Implemented database connection pooling with retry logic
  • ✅ Added comprehensive system metrics collection
  • ✅ Added async database operations framework
  • ✅ Fixed multi-port architecture documentation
  • ✅ Updated configuration with performance parameters
  • ✅ Enhanced error handling and logging

v3.0 (Original)

  • Multi-port DHT listening architecture
  • Real-time operator behavior classification
  • Sweep and rendezvous detection
  • Darkness scoring with 19 signals
  • State actor watchlist integration
  • SQLite database with retention policies
