Comprehensive Exomoon Search - Usage Guide

Quick Start

# Basic usage - process all planets
python3 comprehensive_exomoon_search.py

# Process with more workers (faster)
python3 comprehensive_exomoon_search.py --workers 8

# Process a subset (e.g., first 100 planets)
python3 comprehensive_exomoon_search.py --start 0 --end 100

# Resume from where you left off (automatically skips processed planets)
python3 comprehensive_exomoon_search.py

Command Line Options

--input FILE          Input CSV file (default: ranked_transiting_planets.csv)
--output FILE         Output CSV file (default: comprehensive_exomoon_results.csv)
--workers N           Number of parallel workers (default: 4)
--start N             Start index for processing (default: 0)
--end N               End index (default: None = all)
--shard-id N          Shard ID for parallel execution
--num-shards N        Total number of shards
--cadence POLICY      Cadence policy (short, long, short_then_any, any)
--cache-dir DIR       Cache directory for preprocessed light curves
--cache               Enable persistent light-curve caching (disabled by default)

Parallel Execution Example

To process in 8 parallel shards:

# Terminal 1
python3 comprehensive_exomoon_search.py --shard-id 0 --num-shards 8 &

# Terminal 2
python3 comprehensive_exomoon_search.py --shard-id 1 --num-shards 8 &

# ... etc for shards 2-7

# Or use the provided script:
bash run_all_shards.sh  # Modify for 8 shards if needed

What the Script Does

The script implements five complementary detection methods over shared transit features:

  1. Skew Detection - Analyzes asymmetry between ingress/egress areas
  2. TTV Detection - Detects Transit Timing Variations
  3. Shoulder Detection - Looks for anomalies in transit shoulders
  4. Variability Detection - Compares variability in transit regions vs baseline
  5. Duration Periodicity - Tracks periodic transit-duration changes
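As a rough illustration of the first method, ingress/egress asymmetry can be measured on a phase-folded transit. This is a hypothetical sketch (the function name and implementation details are not taken from the script itself):

```python
import numpy as np

def transit_skew(phase, flux, dur_phase):
    """Asymmetry between the ingress and egress halves of a folded transit.

    Illustrative sketch only; the script's actual skew statistic may differ.
    `phase` is centered on mid-transit (0 = mid-transit) and `dur_phase` is
    the total transit duration in phase units.
    """
    half = dur_phase / 2.0
    in_ingress = (phase >= -half) & (phase < 0)
    in_egress = (phase > 0) & (phase <= half)
    # Integrated transit depth ("area") on each side of mid-transit
    ingress_area = np.sum(1.0 - flux[in_ingress])
    egress_area = np.sum(1.0 - flux[in_egress])
    total = ingress_area + egress_area
    if total <= 0:
        return 0.0
    # Positive skew: more flux absorbed before mid-transit than after
    return (ingress_area - egress_area) / total
```

A perfectly symmetric transit yields a skew near zero; extra absorption on the ingress side (e.g. from a leading moon) pushes the statistic positive.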

Improvements Over Original Pipeline

✅ Fixed bugs - Proper duration validation, error handling
✅ Dual hypothesis scoring - single-moon and multi-moon interpretations
✅ Statistical significance - FAP (False Alarm Probability) for all detections
✅ Data quality metrics - SNR, completeness, number of transits
✅ Combined scoring - Two weighted scores, plus legacy alias
✅ Robust error handling - Continues processing even if one method fails

Output Format

The output CSV contains:

Basic Information

  • planet_name - Planet identifier
  • tic_id - TIC ID for light curve access
  • P_expected - Expected orbital period
  • Status - OK or Error
  • Reason - Error message if failed

Data Quality

  • num_transits_observed - Number of transits in light curve
  • data_quality_score - Overall data quality (0-1)

Method-Specific Results

Core methods include:

  • {method}_P_refined - Refined orbital period
  • {method}_T0 - Transit epoch
  • {method}_dur - Transit duration
  • {method}_power - Detection signal strength
  • {method}_fap - False Alarm Probability
  • {method}_amplitude - Signal amplitude

Additional dual-model outputs include:

  • single_* fields and combined_single_moon_score
  • multi_* fields and combined_multi_moon_score
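Because every method's columns share a `{method}_` prefix, one method's full result table can be pulled out with a small helper. This is an illustrative snippet (the helper name is not part of the script):

```python
import pandas as pd

def method_columns(df, method):
    """All output columns for one method, via the shared {method}_ prefix."""
    return [c for c in df.columns if c.startswith(f"{method}_")]

# Toy frame using the column naming described above
toy = pd.DataFrame(columns=["planet_name", "ttv_power", "ttv_fap", "skew_power"])
print(method_columns(toy, "ttv"))  # ['ttv_power', 'ttv_fap']
```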

Combined Scores (Dual Hypotheses)

  • combined_single_moon_score - Single-moon weighted score (0-1)
  • combined_multi_moon_score - Multi-moon weighted score (0-1)
  • combined_exomoon_score - Backward-compatible alias to combined_single_moon_score
  • single_skew_dist_score / skew_dist_score - consistency with the single-moon skew occupancy hypothesis (high = more consistent)
  • single_skew_dist_inconsistency_score / skew_dist_inconsistency_score - mismatch strength for that same hypothesis (high = less consistent)

Single-moon score components (scan):

  • Skew periodicity: 0.22
  • Skew distribution consistency: 0.06 (single-moon GOF consistency)
  • TTV periodicity: 0.38
  • Shoulder: 0.12
  • Variability: 0.12
  • Duration periodicity: 0.11
  • Data quality bonus: 0.05

Multi-moon score components (scan):

  • Multi-peak skew structure: 0.26
  • Multi-frequency TTV structure: 0.28
  • Multi-skew distribution shape: 0.10
  • Shoulder: 0.10
  • Variability: 0.08
  • Duration structure: 0.13
  • Data quality bonus: 0.05
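The combination can be sketched as a plain weighted sum of the component scores, clipped to [0, 1]. This is an illustrative sketch, not the script's exact formula (the real implementation may normalize or modulate terms differently); the single-moon weights above are used here, and the multi-moon table works the same way:

```python
# Illustrative sketch only; the script's exact clipping/normalization may differ.
SINGLE_MOON_WEIGHTS = {
    "skew_periodicity": 0.22,
    "skew_dist_consistency": 0.06,
    "ttv_periodicity": 0.38,
    "shoulder": 0.12,
    "variability": 0.12,
    "duration_periodicity": 0.11,
    "data_quality_bonus": 0.05,
}

def combined_score(components, weights):
    """Weighted sum of per-method scores (each in [0, 1]), clipped to [0, 1]."""
    total = sum(w * components.get(name, 0.0) for name, w in weights.items())
    return min(max(total, 0.0), 1.0)
```

Missing components contribute zero, so a planet with only a strong TTV signal scores at most that term's weight.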

Transit-count reliability modulation:

  • Both models apply a reliability factor to timing/periodicity-heavy terms:
    • transit_reliability = clip(log10(num_transits_observed + 1) / 2, 0, 1)
  • This prevents low-transit outliers from saturating skew/TTV scores.
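The reliability factor can be reproduced directly from the formula above:

```python
import numpy as np

def transit_reliability(num_transits_observed):
    """clip(log10(n + 1) / 2, 0, 1): ~0.15 for 1 transit, 0.5 for 9, 1.0 for >= 99."""
    return float(np.clip(np.log10(num_transits_observed + 1) / 2.0, 0.0, 1.0))
```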

Interpreting Results

High Confidence Candidates

Look for planets with:

  • combined_single_moon_score > 0.5 or combined_multi_moon_score > 0.5
  • ttv_fap < 0.01 (1% false alarm probability)
  • skew_fap < 0.01
  • num_transits_observed >= 10 (more transits = better)
  • data_quality_score > 0.5

Example Query

import pandas as pd

df = pd.read_csv('comprehensive_exomoon_results.csv')

# Top candidates per model
top_single = df.nlargest(20, 'combined_single_moon_score')
top_multi = df.nlargest(20, 'combined_multi_moon_score')

# High confidence (low FAP)
high_conf = df[(df['ttv_fap'] < 0.01) | (df['skew_fap'] < 0.01)]
high_conf = high_conf.nlargest(20, 'combined_single_moon_score')

# Good data quality
good_data = df[df['data_quality_score'] > 0.5]
good_data = good_data.nlargest(20, 'combined_multi_moon_score')

# Follow-up-oriented ranking with transit floor
followup = df[df['num_transits_observed'] >= 40].copy()
followup['single_followup_score'] = (
    followup['combined_single_moon_score'] *
    (followup['num_transits_observed'].clip(upper=100) / 100.0)
)
followup['multi_followup_score'] = (
    followup['combined_multi_moon_score'] *
    (followup['num_transits_observed'].clip(upper=100) / 100.0)
)
top_single_followup = followup.nlargest(20, 'single_followup_score')
top_multi_followup = followup.nlargest(20, 'multi_followup_score')

Performance Tips

  1. Use more workers if you have CPU cores available

    python3 comprehensive_exomoon_search.py --workers 8
  2. Process in chunks to monitor progress

    python3 comprehensive_exomoon_search.py --start 0 --end 100
    python3 comprehensive_exomoon_search.py --start 100 --end 200
  3. Use sharding for very large datasets

    # Process 8 shards in parallel
    for i in {0..7}; do
        python3 comprehensive_exomoon_search.py --shard-id $i --num-shards 8 &
    done
    wait

Troubleshooting

"No TESS lightcurve found"

  • Some planets may not have TESS data
  • Check if TIC ID is correct
  • Try a different mission (Kepler/K2) if data is available

Memory Issues

  • Reduce number of workers
  • Process in smaller chunks (--start/--end)

Slow Processing

  • Increase workers (if CPU allows)
  • Some planets take longer (more transits = more processing)

Segmentation fault after "Processing complete"

  • Rarely, native-library teardown may emit a segfault after all rows are already written.
  • If the script printed completion and row counts look correct, outputs are usually usable.
  • Sanity check with:
    • Status counts
    • expected number of rows
    • quick parse of top candidates

Resume Processing

The script automatically resumes - it skips planets already in the output file. To restart completely, delete or rename the output file.
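The skip logic can be sketched like this (a hypothetical helper, assuming both files share the `planet_name` column shown in the output format above):

```python
import os
import pandas as pd

def remaining_planets(input_csv, output_csv):
    """Planets from the input that are not yet present in the output.

    Illustrative sketch of the resume behavior; the script's internal
    implementation may differ.
    """
    targets = pd.read_csv(input_csv)
    if not os.path.exists(output_csv):
        return targets  # nothing processed yet: run everything
    done = set(pd.read_csv(output_csv)["planet_name"])
    return targets[~targets["planet_name"].isin(done)]
```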

Next Steps After Processing

  1. Review top candidates - Check highest scoring planets
  2. Visual inspection - Plot light curves for promising candidates
  3. Follow-up analysis - Use photodynamical modeling for best candidates
  4. Publication - Report detections with proper statistical significance

Files Created

  • comprehensive_exomoon_results.csv - Main results file
  • comprehensive_exomoon_results_shard{N}.csv - Shard-specific results (if using sharding)

Combining Shard Results

If you used sharding, combine results:

import pandas as pd
import glob

shard_files = sorted(glob.glob("comprehensive_exomoon_results_shard*.csv"))
dfs = [pd.read_csv(f) for f in shard_files]
combined = pd.concat(dfs, ignore_index=True)
combined = combined.drop_duplicates(subset=['planet_name'])
combined.to_csv("comprehensive_exomoon_results_combined.csv", index=False)

Or use the existing merge_shards.py script (modify for your file pattern).