Deep Hash Telescope

Advanced DHT Network Observatory v4

TL;DR

Passive DHT observer (no swarm participation)
Detects behavioral patterns (crawlers, monitoring, coordination)
CLI-based analysis (report.py)
Designed for research and network measurement

Deep Hash Telescope is a passive measurement and analysis platform for the BitTorrent Distributed Hash Table (DHT). It observes large-scale DHT traffic, then detects and classifies behavioral patterns to surface unusual signals and coordinated campaigns.

Research focus: Identify and classify DHT traffic patterns without making claims about malicious intent. The system surfaces unusual network behaviors for further analysis.

🤝 Support

If you find Deep Hash Telescope valuable for your research or security work, please consider supporting the project:

[☕ Support on Ko-fi](https://ko-fi.com/applesauce777) - Help maintain and develop this open-source intelligence platform

Opsec

This tool is strictly passive. It never opens TCP connections to peers, never queries torrent indexes, and never probes specific infohashes.

What it sends over UDP:

Bootstrap pings to public DHT nodes (identical to any BitTorrent client)
Pong responses to incoming pings (to stay in routing tables)
Find_node queries for our own node IDs (to maintain routing table health)

What it never sends:

No get_peers queries for external infohashes
No announce_peer messages (we never join swarms)
No TCP connections to any peers
No HTTP requests to torrent indexes or trackers

The system does not participate in torrent swarms and never retrieves peer lists. Your IP appears in DHT routing tables as a normal passive node with no association to specific content.
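
The send/never-send rules above can be expressed as an outbound allowlist. The following is a minimal sketch, not the project's code: `ALLOWED_QUERIES` and `is_allowed_outbound` are hypothetical names, and the message layout assumes a KRPC dictionary that has already been bdecoded (keys `y`, `q`, `a` as in BEP 5). Treating every response as permitted is also an assumption.

```python
# Hypothetical outbound policy check; names are illustrative only.
ALLOWED_QUERIES = {b"ping", b"find_node"}


def is_allowed_outbound(msg: dict, own_ids: set) -> bool:
    """Return True if a decoded KRPC message may be sent under the passive policy."""
    kind = msg.get(b"y")
    if kind == b"r":
        # Responses (for example pongs) keep the node in routing tables.
        return True
    if kind != b"q":
        return False
    query = msg.get(b"q")
    if query not in ALLOWED_QUERIES:
        # get_peers and announce_peer are never sent.
        return False
    if query == b"find_node":
        # Only look up our own node IDs, never external targets.
        return msg.get(b"a", {}).get(b"target") in own_ids
    return True
```

A check like this would sit directly in front of the UDP send call, so a bug elsewhere in the code cannot emit a get_peers or announce_peer query.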

Quick Start

# Install dependencies
pip install geoip2 psutil

# Run antenna (Ctrl-C to stop)
python antenna_clean.py

# Compute behavioral profiles
python behavior_profiles.py compute

# View intelligence summary
python report.py

# View behavioral anomaly analysis
python report.py --behavior

# View systematic crawler detection
python report.py --prefix-walkers

# Direct prefix walking analysis
python prefix_walkers.py --hours 24

# View system metrics
python system_metrics.py

# Focused analysis
python report.py --operators       # Fleet behavior
python report.py --sweeps          # Keyspace sweeps
python report.py --rendezvous      # Timing patterns
python report.py --behavior        # Behavioral anomalies
python report.py --hash <hash>     # Specific hash analysis

🎛️ Intelligence Reporting CLI

report.py is the command-line interface for querying the collected data and the primary way to extract findings from it. Each option below selects one report.

📊 Core Reports

Full Intelligence Summary:

python report.py
# Complete overview with all key metrics and anomalies

Operator Analysis:

python report.py --operators
# Fleet behavior classifications and activity patterns

python report.py --operator <ASN>
# Deep-dive on specific autonomous system (e.g., 15830 or AS15830)

Behavioral Analysis:

python report.py --behavior
# Signal stacking analysis - identifies hashes with overlapping anomalies
# Shows top anomalous hashes by signal score (higher = more anomalous)
# Examples: burst_activity + observer_target + dead_swarm

🔍 Advanced Detection

Systematic Pattern Detection:

python report.py --prefix-walkers
# Detects systematic crawler patterns in keyspace exploration
# Shows sequential hash scanning behavior

python report.py --prefix-entropy
# Analyzes hash prefix entropy for systematic patterns
# Low entropy = organized scanning (crawler-like)
# High entropy = normal random distribution
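
One way to read the entropy rule above is as a normalized Shannon entropy over the hex prefixes a single operator queries. This is a sketch under assumptions: the function name, the 2-character prefix length, and the normalization are illustrative and not taken from prefix_entropy.py.

```python
import math
from collections import Counter


def prefix_entropy(infohashes, prefix_len=2):
    """Normalized Shannon entropy (0-1) of hex prefixes across queried infohashes."""
    prefixes = [h[:prefix_len].lower() for h in infohashes]
    total = len(prefixes)
    if total < 2:
        return 0.0
    counts = Counter(prefixes)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Largest entropy reachable with this sample size and prefix alphabet.
    max_entropy = math.log2(min(total, 16 ** prefix_len))
    return entropy / max_entropy
```

A crawler that stays inside one region of the keyspace repeats the same few prefixes and scores near 0, while lookups for unrelated torrents spread across prefixes and score near 1.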

Network Coordination:

python report.py --coordination
# Multi-ASN coordination analysis
# Identifies cross-autonomous system collaboration patterns

Timing Pattern Analysis:

python report.py --sweeps
# Real-time keyspace sweep detection
# Shows systematic scanning across hash ranges

python report.py --rendezvous
# Numbers station and timing pattern analysis
# Detects scheduled communication patterns

🤖 Node Behavior Analysis

Operator Profiling:

python report.py --operator-profiles
# Classifies DHT nodes by behavior:
#   - crawler: Systematic keyspace scanning
#   - normal_client: Regular BitTorrent clients
#   - tracker_node: High announce volume, seeding
#   - measurement_node: Research/academic patterns
#   - indexer: Content cataloging systems

🎯 Targeted Analysis

Specific Hash Investigation:

python report.py --hash <infohash>
# Complete behavioral profile for specific hash
# Includes timing gaps, operator analysis, and classification

Prefix Family Analysis:

python report.py --prefixes
# Clustering analysis for hash prefix families
# Identifies related hash groups and patterns

📈 Analysis Workflow

Daily Intelligence Routine:

# 1. Full overview
python report.py

# 2. Focus on high-value anomalies
python report.py --behavior

# 3. Identify systematic scanners
python report.py --operator-profiles
python report.py --prefix-walkers

# 4. Investigate specific targets
python report.py --hash <suspicious_hash>

Research Investigation:

# 1. Understand network composition
python report.py --operators

# 2. Identify measurement activity
python report.py --operator-profiles

# 3. Analyze systematic patterns
python report.py --prefix-entropy
python report.py --coordination

# 4. Deep-dive on specific actors
python report.py --operator <research_AS>

🔧 CLI Reference

Command Structure:

python report.py [OPTION]

Available Options:

(no args) - Full intelligence summary
--operators - Fleet behavior analysis
--operator <ASN> - Deep-dive on specific ASN
--behavior - Signal stacking and behavioral anomalies
--operator-profiles - Node behavior classification
--prefix-walkers - Systematic crawler detection
--prefix-entropy - Hash prefix entropy analysis
--sweeps - Keyspace sweep detection
--rendezvous - Timing pattern analysis
--coordination - Multi-ASN coordination
--prefixes - Prefix family clustering
--clustering - Operator clustering and distributed crawler detection
--heatmap - Keyspace heatmap visualization and scanning patterns
--campaigns - Long-term campaign tracking and persistent behavior analysis
--hash <hash> - Specific hash investigation

Output Features:

Signal stacking scores - Prioritizes multi-signal anomalies
Lifespan analysis - Distinguishes dead swarms vs active torrents
Operator classifications - Identifies crawlers, clients, and measurement nodes
Performance metrics - Query rates, burstiness, entropy scores
Timing analysis - Activity patterns and coordination detection

Integration with Data Processing:

# Before running reports, ensure profiles are computed:
python behavior_profiles.py compute
python operator_profiles.py compute

# Reports automatically use latest computed profiles

📋 Sample Output

Behavioral Anomaly Analysis

$ python report.py --behavior

🎯 SIGNAL STACKING ANALYSIS:
Top hashes by overlapping signals (higher score = more anomalous):

    1. score:3  e13b0a00d8601785...
       signals: burst_activity, observer_target, dead_swarm
       events:   49  lookups:   49  asns:  1  burst: 1.00

    2. score:3  c276203789fa9c0c...
       signals: burst_activity, observer_target, dead_swarm
       events:   42  lookups:   42  asns:  2  burst: 1.00

(Note: Values shown are from a real sample dataset and will vary by deployment)

Keyspace Heatmap Analysis

$ python report.py --heatmap

📊 ANALYSIS OVERVIEW:
   Total Hashes: 83,651
   Unique Prefixes: 17,761
   Hot Zones: 4,156
   Systematic Scanners: 0

🔥 KEYSPACE HEATMAP (4-digit hex prefixes):
   Intensity: █ Very High  ▓ High  ▒ Moderate  ░ Low  · Minimal

██████████      
      ·  █      
 ▒   ·          
 ·              
▒▒██    ░       
  ░             
▓ ▒·█  ·        
  █·            
▓   ·  ·██      
 ·

(Note: ASCII visualization represents real keyspace activity patterns)

Campaign Tracking Analysis

$ python report.py --campaigns

📊 ANALYSIS OVERVIEW:
   Period: 0.3 days analyzed
   Total Observations: 83,651
   Unique Operators: 9,254

🎯 CAMPAIGN SUMMARY:
   Persistent Observers: 0
   Scheduled Monitoring: 103
   Distributed Campaigns: 100
   Total Campaigns: 203

🚀 Advanced Intelligence Features

Operator Clustering:

python report.py --clustering
# Detects distributed crawler networks and coordinated operator clusters
# Groups IPs by ASN, behavioral similarity, hash overlap, and timing correlation
# Identifies multi-operator campaigns and infrastructure sharing

Keyspace Heatmaps:

python report.py --heatmap
# Visualizes DHT keyspace activity patterns with ASCII heatmaps
# Shows systematic scanning vs normal random distribution
# Identifies hot zones, cold zones, and monitoring targets
# Detects directional sweep patterns and scanning strategies

Campaign Tracking:

python report.py --campaigns
# Tracks long-term behavior patterns across days and weeks
# Detects persistent observers and scheduled automated monitoring
# Identifies coordinated multi-day campaigns and infrastructure persistence
# Analyzes temporal patterns and automation scheduling

Architecture

dht_protocol.py      ← Multi-port DHT listening (13 ports × 16 node IDs = 208 total nodes)
      ↓
antenna_clean.py    ← Event processor & orchestrator (with bounded memory & connection pooling)
      ↓
┌─────────────────────────────────────────────────┐
│  Behavioral Detection Pipelines                           │
│  operators.py     ← Fleet behavior classification       │
│  sweeps.py        ← Real-time keyspace sweep detection  │
│  rendezvous.py    ← Scheduled timing patterns ("rendezvous")        │
│  behavior_profiles.py ← Behavioral fingerprinting       │
│  prefix_walkers.py ← Systematic crawler detection      │
│  prefix_entropy.py ← Hash prefix entropy analysis       │
│  operator_profiles.py ← Node behavior classification    │
│  operator_clustering.py ← Distributed crawler detection │
│  keyspace_heatmaps.py ← Keyspace visualization          │
│  campaign_tracker.py ← Long-term campaign analysis      │
└─────────────────────────────────────────────────┘
      ↓
bounded_collections.py ← Memory management with LRU eviction & TTL
db_pool.py          ← Database connection pooling & retry logic
system_metrics.py   ← System monitoring & performance metrics
async_db.py         ← Async database operations
      ↓
retention.py        ← Data management & rolling logs
      ↓
report.py           ← Simple CLI intelligence queries

🚀 Visibility Optimizations

Deterministic Keyspace Coverage:

208 node IDs across 13 ports with deterministic prefix distribution
Each port covers exclusive keyspace slices for maximum routing inclusion
Eliminates clustering gaps found in random ID systems
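
A deterministic prefix distribution could look like the sketch below. It assumes evenly spaced 16-bit prefixes and a seed-derived suffix; the seed, the function name, and the choice of SHA-1 for the suffix are illustrative and not taken from dht_protocol.py.

```python
import hashlib

PORTS = 13          # from the README
IDS_PER_PORT = 16   # from the README


def deterministic_node_ids(seed: bytes = b"example-seed"):
    """Yield (port_index, node_id) pairs with evenly spaced 16-bit prefixes."""
    total = PORTS * IDS_PER_PORT  # 208
    for i in range(total):
        # Spread prefixes evenly over the 16-bit range 0..65535.
        prefix = (i * 0x10000) // total
        # Derive the remaining 18 bytes from the seed so runs are reproducible.
        suffix = hashlib.sha1(seed + i.to_bytes(2, "big")).digest()[:18]
        # Consecutive IDs share a port, so each port owns a contiguous slice.
        yield i // IDS_PER_PORT, prefix.to_bytes(2, "big") + suffix
```

Because consecutive indices map to the same port, each port ends up with 16 adjacent slices that no other port covers.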

Strategic Port Selection:

High-trust BitTorrent ports: 6881, 6882, 6883 (adjacent to primary)
Client defaults: 6969, 6889, 51413, 51414, 16881, 26881
Common ephemerals: 49160, 49161 (uTorrent/qBittorrent)
Fallback: 45682, 7881 (alternative client ports)

Node ID Rotation:

12-hour rotation cycles with ±30 minute jitter
IDs shift within their prefix regions for fresh routing entries
Avoids long-term bucket eviction while maintaining prefix authority
Research-backed optimization for sustained visibility
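
The rotation schedule can be sketched as follows. The constants come from the list above; the function names and the 2-byte prefix length are assumptions made for illustration.

```python
import os
import random

ROTATION_SECONDS = 12 * 3600   # 12-hour cycle (from the README)
JITTER_SECONDS = 30 * 60       # plus or minus 30 minutes (from the README)


def next_rotation_delay(rng=random):
    """Seconds to wait before the next rotation, with uniform jitter."""
    return ROTATION_SECONDS + rng.uniform(-JITTER_SECONDS, JITTER_SECONDS)


def rotate_node_id(node_id: bytes, prefix_bytes: int = 2) -> bytes:
    """Keep the prefix region, replace the remainder with fresh random bytes."""
    return node_id[:prefix_bytes] + os.urandom(len(node_id) - prefix_bytes)
```

Keeping the prefix fixed means the rotated ID lands in the same keyspace slice, which is what "shift within their prefix regions" describes.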

All data is stored in dht_antenna.db (SQLite).

Signal Stacking

Each infohash may exhibit multiple behavioral signals simultaneously. The system assigns a signal score based on overlapping classifications.

Example:

score:3
burst_activity + observer_target + dead_swarm

Higher scores indicate increasingly unusual behavioral patterns.
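
A minimal sketch of the scoring, assuming the score is simply the number of anomaly-type classifications on a hash and that swarm_like does not count toward it. The README does not state either point, so treat both as assumptions; `ANOMALY_SIGNALS` and `signal_score` are illustrative names.

```python
# Classifications assumed to count as anomaly signals (swarm_like excluded).
ANOMALY_SIGNALS = {
    "observer_target",
    "high_asn_interest",
    "burst_activity",
    "periodic_lookup_target",
    "dead_swarm",
}


def signal_score(classifications):
    """Count the distinct anomaly signals present on one infohash."""
    return len(set(classifications) & ANOMALY_SIGNALS)
```

With this definition, the example above (burst_activity + observer_target + dead_swarm) scores 3.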

Behavioral Classification System

🎯 Behavioral Fingerprinting

The system computes observable behavioral metrics for each infohash:

Core Metrics:

total_events - Total DHT events observed
lookup_count - get_peers requests (searches)
announce_count - announce_peer events (content sharing)
announce_ratio - announce_count / total_events
unique_ips - Number of distinct peer IPs
unique_asns - Number of distinct autonomous systems
duration_minutes - Time span of observations
burst_score - Event concentration in short windows (0-1)
periodicity_score - Regular timing patterns (0-1)
observer_score - Lookup-dominated behavior ratio (0-1)
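
Several of these metrics follow directly from the raw events. The sketch below is illustrative: the `Event` type, the event_type strings "get_peers" and "announce_peer", and the definition of burst_score as the largest share of events inside any 5-minute window are assumptions, not the formulas used in behavior_profiles.py.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float   # Unix seconds
    event_type: str    # assumed values: "get_peers" or "announce_peer"
    peer_ip: str
    asn: int


def burst_score(timestamps, window=300.0):
    """Largest fraction of events that falls inside any single window."""
    ts = sorted(timestamps)
    if not ts:
        return 0.0
    best, start = 0, 0
    for end in range(len(ts)):
        # Shrink the window from the left until it spans at most `window` seconds.
        while ts[end] - ts[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best / len(ts)


def core_metrics(events):
    """Compute a subset of the per-hash metrics from a list of Event objects."""
    total = len(events)
    announces = sum(e.event_type == "announce_peer" for e in events)
    ts = [e.timestamp for e in events]
    return {
        "total_events": total,
        "lookup_count": sum(e.event_type == "get_peers" for e in events),
        "announce_count": announces,
        "announce_ratio": announces / total if total else 0.0,
        "unique_ips": len({e.peer_ip for e in events}),
        "unique_asns": len({e.asn for e in events}),
        "duration_minutes": (max(ts) - min(ts)) / 60 if ts else 0.0,
        "burst_score": burst_score(ts),
    }
```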

🏷️ Behavioral Classifications

observer_target - Lookup-dominated surveillance

High lookup count, very low announce ratio (<0.05)
Typical: 1000+ lookups, <5% announces
Suggests: Hash monitoring, content availability checking

high_asn_interest - Multi-ASN coordination

Observed by 5+ distinct autonomous systems
Typical: 5+ ASNs, cross-infrastructure interest
Suggests: Coordinated monitoring, high-value targets

burst_activity - Concentrated event patterns

High burst score (>0.6), events clustered in short windows
Typical: Many events in <5 minute windows
Suggests: Coordinated probing, automated checking

periodic_lookup_target - Regular timing patterns

High periodicity score (>0.7), consistent lookup intervals
Typical: Regular checks every N hours/minutes
Suggests: Automated monitoring, scheduled checking

dead_swarm - Search without sharing

Many lookups but almost no announces (<2% ratio)
Typical: 50+ lookups, 0-1 announces
Suggests: Content searching, availability monitoring

swarm_like - Normal content distribution

Balanced announce/lookup ratio, many unique peers
Typical: 10%+ announce ratio, 20+ unique IPs
Suggests: Legitimate torrent activity
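
The thresholds listed above translate into a multi-label rule set. This is a sketch: the numeric cutoffs are the ones stated in this section, but `min_lookups` is a hypothetical stand-in for the unspecified "high lookup count" and "many lookups" conditions, and the real classifier may combine the metrics differently.

```python
def classify(metrics, min_lookups=10):
    """Return every behavioral label whose threshold the metrics satisfy."""
    labels = []
    ratio = metrics["announce_ratio"]
    lookups = metrics["lookup_count"]
    if lookups >= min_lookups and ratio < 0.05:
        labels.append("observer_target")
    if metrics["unique_asns"] >= 5:
        labels.append("high_asn_interest")
    if metrics["burst_score"] > 0.6:
        labels.append("burst_activity")
    if metrics.get("periodicity_score", 0.0) > 0.7:
        labels.append("periodic_lookup_target")
    if lookups >= min_lookups and ratio < 0.02:
        labels.append("dead_swarm")
    if ratio >= 0.10 and metrics["unique_ips"] >= 20:
        labels.append("swarm_like")
    return labels
```

Applied to the first hash in the sample output (49 lookups, no announces, 1 ASN, burst 1.00), these rules yield the same three signals shown there.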

🏷️ Multi-Label Classification

Enhanced Analysis: Each infohash can now exhibit multiple behaviors simultaneously:

["observer_target", "high_asn_interest", "periodic_lookup_target"]

Benefits:

Richer patterns - Captures complex behavioral combinations
No information loss - Multiple classifications per hash
Enhanced detection - Overlapping behavioral signals

🚶 Systematic Crawler Detection

Prefix Walking Analysis identifies systematic keyspace traversal:

Detection Patterns:

Sequential walking (96a→96b→96c→96d)
Research crawlers (academic measurement nodes)
Index builders (search engine indexing)
Network mapping (systematic enumeration)

Features:

Hex prefix sequence analysis
2-hour time window correlation
Confidence scoring based on sequentiality
Persistent pattern storage and tracking

Usage:

# View prefix walking patterns
python report.py --prefix-walkers

# Direct analysis
python prefix_walkers.py --hours 24

[SWEEP] KEYSPACE WALK  65.38.165.59  AS29863
         prefixes 96a→96f  23 hashes  1080s  0.02/s  ascending  conf=0.85
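
Confidence scoring based on sequentiality could be approximated as the share of prefix steps that move by exactly one. The function below is a hypothetical sketch, not the scoring used by prefix_walkers.py, and it ignores the 2-hour time window for brevity.

```python
def walk_confidence(prefixes):
    """Return (confidence, direction) for an ordered list of hex prefixes."""
    values = [int(p, 16) for p in prefixes]
    # Ignore repeated prefixes: several hashes may share one prefix.
    steps = [b - a for a, b in zip(values, values[1:]) if b != a]
    if not steps:
        return 0.0, "none"
    up = sum(s == 1 for s in steps)
    down = sum(s == -1 for s in steps)
    direction = "ascending" if up >= down else "descending"
    return max(up, down) / len(steps), direction
```

For the walk 96a→96b→96c→96d every step is +1, so the sketch returns a confidence of 1.0 with direction "ascending".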

⏰ Timing Pattern Detection

RETURN_VISIT - Same IP returns to same hash after >1h gap

Catches: Dead drop checking, periodic monitoring

SCHEDULED - 3+ visits with consistent intervals (±30% variance)

Catches: Regular automated checking
Example: 5 visits, every 24.0h ±0.5h, 0.9 confidence

CROSS_HASH - Multiple hashes checked sequentially

Catches: Systematic monitoring across targets
Example: 5 hashes checked in 2-hour window
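
The RETURN_VISIT and SCHEDULED rules can be sketched for one (IP, hash) pair as below. Reading "±30% variance" as "every interval lies within 30% of the mean interval" is an assumption, as are the function and parameter names; the defaults mirror the thresholds stated above.

```python
from statistics import mean


def classify_visits(timestamps, min_visits=3, max_variance=0.3, return_gap=3600.0):
    """Return (label, mean_interval_hours) for the visits of one IP to one hash."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(ts) >= min_visits:
        avg = mean(gaps)
        # SCHEDULED: every gap stays within max_variance of the mean gap.
        if avg > 0 and all(abs(g - avg) / avg <= max_variance for g in gaps):
            return "SCHEDULED", avg / 3600.0
    # RETURN_VISIT: at least one gap longer than one hour.
    if any(g > return_gap for g in gaps):
        return "RETURN_VISIT", None
    return None, None
```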

Intelligence Reports

Full Summary

python report.py

Shows:

Today's activity metrics
Behavioral anomaly breakdown
Operator classifications
Sweep detection statistics
Rendezvous patterns
Top operators by activity

Behavioral Analysis

python report.py --behavior

Shows:

Classification breakdown with statistics
High lookup hashes (surveillance candidates)
Multi-ASN hashes (coordination patterns)
Burst activity hashes (concentrated events)
Periodic lookup hashes (regular timing)

Focused Analysis

python report.py --operators       # Fleet behavior breakdown
python report.py --sweeps          # Recent sweep patterns
python report.py --rendezvous      # Numbers station scheduling
python report.py --coordination    # Multi-ASN co-appearance
python report.py --hash <hash>     # Specific hash analysis

System Metrics

python system_metrics.py                    # Current metrics summary

Shows:

CPU, memory, disk usage
Network I/O and open files
Events per second rate
Database connection pool status
Historical trends and averages

Database Schema

-- Raw DHT events (rolling 30 days)
dht_events (infohash, peer_ip, peer_port, timestamp, event_type, 
            source_node, asn, country)

-- Behavioral profiles (permanent)
hash_behavior_profiles (infohash, total_events, lookup_count, announce_count,
                        announce_ratio, unique_ips, unique_asns, duration_minutes,
                        lifespan_seconds, burst_score, periodicity_score, observer_score,
                        classifications, signal_score, computed_at)
                        -- classifications stored as JSON array

-- Prefix walking patterns (permanent)
prefix_walk_events (id, source_ip, asn, prefix_sequence, start_time,
                   end_time, total_hashes, duration_seconds, hashes_per_second,
                   confidence, detected_at)

-- Operator profiles (permanent)
operators (ip, asn, country, actor_label, classification, 
           first_seen, last_seen, unique_hashes, total_events, notes)

-- Operator behavior profiles (permanent)
operator_profiles (ip, asn, total_queries, unique_hashes, query_rate_per_hour,
                  burst_score, entropy_score, classification, first_seen, last_seen, computed_at)

-- Sweep events (permanent)
sweep_events (id, operator_ip, asn, detected_at, prefix_start, prefix_end,
              hashes_observed, duration_seconds, hashes_per_sec)

-- Rendezvous events (permanent)
rendezvous_events (id, infohash, operator_ip, asn, visit_count, 
                first_visit, last_visit, mean_interval_h, classification)

-- Daily summaries (permanent, survive raw event pruning)
daily_summaries (date_utc, total_events, unique_hashes, observer_targets,
                  burst_activity, multi_asn_hashes, top_country, computed_at)

-- System metrics (rolling 7 days)
system_metrics (timestamp, cpu_percent, memory_percent, memory_mb,
                disk_usage_percent, disk_free_gb, network_bytes_sent,
                network_bytes_recv, open_files, threads, 
                active_connections, queue_size, db_connections, 
                events_per_second)
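
To show how the raw event table supports per-hash aggregation, here is a self-contained sketch against an in-memory SQLite database. The column names come from the schema above; the column types, the index, the event_type strings, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dht_events (
        infohash TEXT, peer_ip TEXT, peer_port INTEGER, timestamp REAL,
        event_type TEXT, source_node TEXT, asn INTEGER, country TEXT
    )
""")
# Composite index so per-hash, time-ordered queries avoid full table scans.
conn.execute("CREATE INDEX idx_events_hash_time ON dht_events (infohash, timestamp)")

rows = [
    ("aa" * 20, "192.0.2.1", 6881, 1000.0, "get_peers", "node-0", 64500, "US"),
    ("aa" * 20, "192.0.2.2", 6881, 1010.0, "get_peers", "node-0", 64501, "DE"),
    ("aa" * 20, "192.0.2.2", 6881, 1020.0, "announce_peer", "node-1", 64501, "DE"),
    ("bb" * 20, "198.51.100.7", 51413, 1030.0, "get_peers", "node-2", 64500, "US"),
]
conn.executemany("INSERT INTO dht_events VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)


def per_hash_summary(connection):
    """Aggregate raw events into per-hash counts in a single query."""
    return connection.execute("""
        SELECT infohash,
               COUNT(*) AS total_events,
               SUM(event_type = 'get_peers') AS lookup_count,
               COUNT(DISTINCT peer_ip) AS unique_ips,
               COUNT(DISTINCT asn) AS unique_asns
        FROM dht_events
        GROUP BY infohash
        ORDER BY total_events DESC
    """).fetchall()
```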

Configuration

All detection thresholds and system parameters are defined in config.py:

Detection Thresholds

# Behavioral analysis
BEHAVIORAL_MIN_EVENTS = 10
BEHAVIORAL_REASSESS_INTERVAL = 300

# Operator classification
OPERATOR_BROAD_SWEEP_MIN_HASHES = 500
OPERATOR_TARGETED_MIN_DEPTH = 10.0

# Sweep detection  
SWEEP_MIN_HASHES = 10
SWEEP_SEQUENTIAL_RATIO = 0.7

# Rendezvous detection
RENDEZVOUS_MIN_SCHEDULED_VISITS = 3
RENDEZVOUS_SCHEDULED_MAX_VARIANCE = 0.3

System Configuration

# Database
DB_PATH = "dht_antenna.db"

# Performance & Memory Management
MAX_OBSERVATIONS = 10000          # Maximum hash observations in memory
MAX_OPERATOR_PROFILES = 5000       # Maximum operator profiles in memory
OBSERVATION_TTL = 3600           # 1 hour for hash observations
OPERATOR_TTL = 86400             # 24 hours for inactive operators

# Database connection pool
DB_POOL_MIN_CONNECTIONS = 2
DB_POOL_MAX_CONNECTIONS = 10
DB_POOL_MAX_IDLE_TIME = 300       # 5 minutes
DB_CONNECTION_TIMEOUT = 30

# Metrics collection
METRICS_ENABLED = True
METRICS_COLLECTION_INTERVAL = 60  # seconds
METRICS_RETENTION_HOURS = 168      # 1 week
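
The size and TTL settings above imply a container that evicts by both age and recency. The class below is a minimal sketch of that idea built on OrderedDict; it is not the implementation in bounded_collections.py, and `BoundedTTLDict` is a hypothetical name.

```python
import time
from collections import OrderedDict


class BoundedTTLDict:
    """Dictionary with a maximum size (LRU eviction) and per-entry expiry."""

    def __init__(self, max_size, ttl, clock=time.monotonic):
        self._max_size = max_size
        self._ttl = ttl
        self._clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value)

    def put(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        # Evict least recently used entries once the bound is exceeded.
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries are dropped lazily on access
            return default
        self._data.move_to_end(key)  # mark as recently used
        return value

    def __len__(self):
        return len(self._data)
```

With the values above, hash observations would correspond to `BoundedTTLDict(MAX_OBSERVATIONS, OBSERVATION_TTL)`.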

Data Retention

| Table | Retention | Purpose |
|---|---|---|
| `dht_events` | 30 days rolling | Raw noise, pruned automatically |
| `system_metrics` | 7 days rolling | System performance data |
| `hash_behavior_profiles` | Permanent | Behavioral intelligence |
| `operators` | Permanent | Operator intelligence |
| `sweep_events` | Permanent | Sweep pattern history |
| `rendezvous_events` | Permanent | Timing pattern evidence |
| `daily_summaries` | Permanent | Aggregate statistics |

Pruning runs every 6 hours, summarizing expiring data before deletion.
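
Summarize-then-delete can be done in a single transaction so that a failure never drops raw events without their summary. This sketch is an assumption about how such a step might look, not the code in retention.py: it fills only three of the daily_summaries columns and aligns the cutoff to a UTC day boundary so each day is summarized exactly once.

```python
import sqlite3
import time

RAW_RETENTION_DAYS = 30  # from the retention table


def prune_raw_events(conn, now=None):
    """Summarize expired raw events per UTC day, then delete them."""
    now = time.time() if now is None else now
    cutoff = now - RAW_RETENTION_DAYS * 86400
    cutoff -= cutoff % 86400  # align to midnight UTC so only whole days expire
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            INSERT INTO daily_summaries (date_utc, total_events, unique_hashes)
            SELECT date(timestamp, 'unixepoch'), COUNT(*), COUNT(DISTINCT infohash)
            FROM dht_events
            WHERE timestamp < ?
            GROUP BY date(timestamp, 'unixepoch')
            """,
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM dht_events WHERE timestamp < ?", (cutoff,))
    return cur.rowcount
```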

Dependencies

Required

pip install geoip2 psutil

Optional (for enhanced features)

pip install requests  # For MaxMind database updates

Database Files

GeoLite2-ASN.mmdb - MaxMind ASN database (download from MaxMind)
GeoLite2-City.mmdb - MaxMind City database (download from MaxMind)

Place these in the same directory as antenna_clean.py.

Performance Expectations

On a typical VPS (2-4 cores, 4-8GB RAM):

Resource Usage

CPU: 5-15% during normal operation
Memory: 500-1000MB (bounded by configuration)
Disk: ~10MB/day for raw events, ~1MB/day for intelligence data
Network: 1-5MB/hour outbound (bootstrap pings, pong responses, and find_node queries only)

Scaling Characteristics

Multi-port architecture: 13 ports × 16 node IDs = 208 total DHT presences
Connection pooling: 2-10 concurrent database connections
Bounded memory: Automatic LRU eviction prevents unbounded growth
Async operations: Thread pool for non-blocking database writes
SQL optimization: Aggregated queries with composite indexing for 1M+ row scalability

Throughput Estimates

Events processed: 1,000-10,000/hour depending on DHT traffic
Hashes tracked: Up to 10,000 concurrent observations (MAX_OBSERVATIONS)
Operator profiles: Up to 5,000 concurrent profiles (MAX_OPERATOR_PROFILES)
Detection latency: <5 seconds for sweep patterns, <1 minute for rendezvous
Profile computation: 50k+ hash profiles in <30 seconds, 8k+ operator profiles in ~3 minutes

🎯 Use Cases

Academic Network Measurement

DHT traffic measurement and network topology analysis
Research-grade behavioral classification and pattern detection
Publication-quality data for network science papers

Detection of Crawler Infrastructure

Large-scale distributed crawler network identification
Systematic scanning pattern detection and attribution
Coordinated campaign analysis and infrastructure mapping

DHT Behavioral Research

Swarm discovery and lifecycle analysis
Operator behavior classification and profiling
Signal stacking analysis for anomaly detection
Long-term campaign tracking and persistence analysis

🖼️ Screenshots & Visualizations

Campaign Analysis Dashboard

🎯 CAMPAIGN SUMMARY:
   Persistent Observers: 0
   Scheduled Monitoring: 103
   Distributed Campaigns: 100
   Total Campaigns: 203

🕷️ DISTRIBUTED CAMPAIGNS:
   • distributed_8401a7d4_9ops
     Operators: 9 nodes (65.38.165.61, 65.38.165.54, ...)
     ASNs: 15830, 29863
     Coordination: 0.67
     Duration: 0.3 days

(Note: Campaign data represents actual detected coordination patterns)

What to Look For

Highest Priority Findings

BROAD_SWEEP + High ASN Count - Large-scale coordinated activity
- Systematic keyspace traversal across multiple ASNs
- High confidence (>0.8) with consistent methodology

PERIODIC_LOOKUP_TARGET - Regular automated monitoring
- 3+ visits to same hash with consistent intervals
- High periodicity score (>0.7) with low variance

HIGH_ASN_INTEREST - Multi-ASN coordination
- Same hash observed by 5+ distinct autonomous systems
- Cross-infrastructure monitoring patterns

BURST_ACTIVITY + Observer Pattern - Coordinated probing
- High burst score with lookup-dominated behavior
- Suggests automated checking systems

Analysis Patterns

Fleet Coordination:

Multiple operators from same ASN targeting related hash prefixes
Synchronized timing across different IP ranges
Consistent methodology (sweep patterns, probe depths)

Automated Monitoring:

Regular periodic visits to specific hashes
High observer scores with low announce ratios
Cross-hash systematic checking

Infrastructure Patterns:

ASN diversity for monitoring coverage
Time-based operational patterns
Geographic clustering of activities

Troubleshooting

Common Issues

High Memory Usage:

Reduce MAX_OBSERVATIONS and MAX_OPERATOR_PROFILES in config.py
Monitor with python system_metrics.py

Database Lock Errors:

Increase DB_POOL_MAX_CONNECTIONS for higher concurrency
Check disk space and I/O performance

Missing GeoIP Data:

Download latest MaxMind databases:
```
python mmupdate.py
```

Low Detection Rates:

Verify DHT ports are open (check firewall)
Confirm public IP accessibility
Adjust detection thresholds in config.py

Debug Mode

# Enable verbose logging
python antenna_clean.py --debug

# Test individual components
python test_detectors.py -v
python test_integration_reports.py

Security Considerations

Operational Security

Passive-only: No TCP connections to peers and no swarm participation
Plausible deniability: Appears as normal DHT node
No content association: Never queries specific hashes
Minimal network footprint: UDP only (bootstrap pings, pong responses, find_node queries for our own node IDs)

Data Protection

Local SQLite: No external data transmission
Configurable retention: Automatic cleanup of raw data
Bounded memory: Prevents data accumulation leaks
Encrypted connections: TLS for MaxMind updates (if used)

🚫 Limitations

No Content Visibility

Passive-only operation: Cannot see actual torrent content or file names
Hash-only analysis: Limited to infohash patterns without payload access
Routing table perspective: Views DHT through routing node participation

No Attribution

Observable patterns only: Classifies behavior, not intent or identity
Network-layer focus: ASN attribution without user-level identification
Behavioral inference: No access to actual motivations or purposes

Sampling Bias

DHT visibility ≠ full network: Only sees traffic reaching our nodes
Geographic bias: Limited to regions where our nodes are visible
Temporal bias: Analysis limited to operational time periods
Port-specific coverage: 208 node IDs may not represent entire keyspace

Technical Constraints

UDP-only protocol: Cannot observe TCP-based BitTorrent traffic
Routing table dependence: Effectiveness limited by DHT routing health
Memory constraints: Bounded collections may miss very long-term patterns
Database retention: Raw events pruned after 30 days

License

MIT License - see LICENSE file for details.

Changelog

v4.0 (Deep Hash Telescope)

✅ Replaced darkness scoring with behavioral fingerprinting
✅ Added 6 observable behavioral classifications
✅ Implemented burst and periodicity analysis
✅ Removed state actor watchlist (observable patterns only)
✅ Updated reporting for behavioral anomalies
✅ Enhanced hash analysis with behavioral profiles
✅ Cleaned up deprecated modules and configurations
✅ Signal Stacking Analysis - Multi-signal anomaly detection with scoring
✅ Hash Lifespan Analysis - Distinguishes dead swarms vs active torrents
✅ Operator Behavior Profiling - Node classification (crawler/client/measurement)
✅ Prefix Entropy Detection - Systematic crawler pattern analysis
✅ SQL Performance Optimization - Aggregated queries with proper indexing
✅ Complete ASN Resolution - Network attribution across all detection modules
✅ Enhanced CLI Workflow - Comprehensive analysis pipeline with one-liner support
✅ Research-Grade Documentation - Professional README with academic positioning
✅ Operator Clustering - Distributed crawler network detection and coordination analysis
✅ Keyspace Heatmaps - ASCII visualization of DHT scanning patterns and hot zones
✅ Campaign Tracking - Long-term persistent behavior and scheduled monitoring detection
✅ Advanced Intelligence - Multi-operator campaign analysis and infrastructure mapping

v3.1 (Enhanced)

✅ Added bounded memory collections with LRU eviction
✅ Implemented database connection pooling with retry logic
✅ Added comprehensive system metrics collection
✅ Added async database operations framework
✅ Fixed multi-port architecture documentation
✅ Updated configuration with performance parameters
✅ Enhanced error handling and logging

v3.0 (Original)

Multi-port DHT listening architecture
Real-time operator behavior classification
Sweep and rendezvous detection
Darkness scoring with 19 signals
State actor watchlist integration
SQLite database with retention policies
