Comprehensive Exomoon Search - Usage Guide

Quick Start

# Basic usage - process all planets
python3 comprehensive_exomoon_search.py

# Process with more workers (faster)
python3 comprehensive_exomoon_search.py --workers 8

# Process a subset (e.g., first 100 planets)
python3 comprehensive_exomoon_search.py --start 0 --end 100

# Resume from where you left off (automatically skips processed planets)
python3 comprehensive_exomoon_search.py

Command Line Options

--input FILE          Input CSV file (default: ranked_transiting_planets.csv)
--output FILE         Output CSV file (default: comprehensive_exomoon_results.csv)
--workers N           Number of parallel workers (default: 4)
--start N             Start index for processing (default: 0)
--end N               End index (default: None = all)
--shard-id N          Shard ID for parallel execution
--num-shards N        Total number of shards
--cadence POLICY      Cadence policy (short, long, short_then_any, any)
--cache-dir DIR       Cache directory for preprocessed light curves
--cache               Enable persistent light-curve caching (disabled by default)

Parallel Execution Example

To process in 8 parallel shards:

# Terminal 1
python3 comprehensive_exomoon_search.py --shard-id 0 --num-shards 8 &

# Terminal 2
python3 comprehensive_exomoon_search.py --shard-id 1 --num-shards 8 &

# ... etc for shards 2-7

# Or use the provided script:
bash run_all_shards.sh  # Modify for 8 shards if needed

What the Script Does

The script implements five complementary detection methods over shared transit features:

  1. Skew Detection - Analyzes asymmetry between ingress/egress areas
  2. TTV Detection - Detects Transit Timing Variations
  3. Shoulder Detection - Looks for anomalies in transit shoulders
  4. Variability Detection - Compares variability in transit regions vs baseline
  5. Duration Periodicity - Tracks periodic transit-duration changes
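As a rough illustration of the first method, ingress/egress asymmetry can be measured on a phase-folded transit. This is a hypothetical sketch (the function name and implementation details are not taken from the script itself):

```python
import numpy as np

def transit_skew(phase, flux, dur_phase):
    """Asymmetry between the ingress and egress halves of a folded transit.

    Illustrative sketch only; the script's actual skew statistic may differ.
    `phase` is centered on mid-transit (0 = mid-transit) and `dur_phase` is
    the total transit duration in phase units.
    """
    half = dur_phase / 2.0
    in_ingress = (phase >= -half) & (phase < 0)
    in_egress = (phase > 0) & (phase <= half)
    # Integrated transit depth ("area") on each side of mid-transit
    ingress_area = np.sum(1.0 - flux[in_ingress])
    egress_area = np.sum(1.0 - flux[in_egress])
    total = ingress_area + egress_area
    if total <= 0:
        return 0.0
    # Positive skew: more flux absorbed before mid-transit than after
    return (ingress_area - egress_area) / total
```

A perfectly symmetric transit yields a skew near zero; extra absorption on the ingress side (e.g. from a leading moon) pushes the statistic positive.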

Improvements Over Original Pipeline

✅ Fixed bugs - Proper duration validation, error handling
✅ Dual hypothesis scoring - single-moon and multi-moon interpretations
✅ Statistical significance - FAP (False Alarm Probability) for all detections
✅ Data quality metrics - SNR, completeness, number of transits
✅ Combined scoring - Two weighted scores, plus legacy alias
✅ Robust error handling - Continues processing even if one method fails

Output Format

The output CSV contains:

Basic Information

  • planet_name - Planet identifier
  • tic_id - TIC ID for light curve access
  • P_expected - Expected orbital period
  • Status - OK or Error
  • Reason - Error message if failed

Data Quality

  • num_transits_observed - Number of transits in light curve
  • data_quality_score - Overall data quality (0-1)

Method-Specific Results

Core methods include:

  • {method}_P_refined - Refined orbital period
  • {method}_T0 - Transit epoch
  • {method}_dur - Transit duration
  • {method}_power - Detection signal strength
  • {method}_fap - False Alarm Probability
  • {method}_amplitude - Signal amplitude

Additional dual-model outputs include:

  • single_* fields and combined_single_moon_score
  • multi_* fields and combined_multi_moon_score
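Because every method's columns share a `{method}_` prefix, one method's full result table can be pulled out with a small helper. This is an illustrative snippet (the helper name is not part of the script):

```python
import pandas as pd

def method_columns(df, method):
    """All output columns for one method, via the shared {method}_ prefix."""
    return [c for c in df.columns if c.startswith(f"{method}_")]

# Toy frame using the column naming described above
toy = pd.DataFrame(columns=["planet_name", "ttv_power", "ttv_fap", "skew_power"])
print(method_columns(toy, "ttv"))  # ['ttv_power', 'ttv_fap']
```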

Combined Scores (Dual Hypotheses)

  • combined_single_moon_score - Single-moon weighted score (0-1)
  • combined_multi_moon_score - Multi-moon weighted score (0-1)
  • combined_exomoon_score - Backward-compatible alias to combined_single_moon_score
  • single_skew_dist_score / skew_dist_score - consistency with the single-moon skew occupancy hypothesis (high = more consistent)
  • single_skew_dist_inconsistency_score / skew_dist_inconsistency_score - mismatch strength for that same hypothesis (high = less consistent)

Single-moon score components (scan):

  • Skew periodicity: 0.22
  • Skew distribution consistency: 0.06 (single-moon GOF consistency)
  • TTV periodicity: 0.38
  • Shoulder: 0.12
  • Variability: 0.12
  • Duration periodicity: 0.11
  • Data quality bonus: 0.05

Multi-moon score components (scan):

  • Multi-peak skew structure: 0.26
  • Multi-frequency TTV structure: 0.28
  • Multi-skew distribution shape: 0.10
  • Shoulder: 0.10
  • Variability: 0.08
  • Duration structure: 0.13
  • Data quality bonus: 0.05
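The combination can be sketched as a plain weighted sum of the component scores, clipped to [0, 1]. This is an illustrative sketch, not the script's exact formula (the real implementation may normalize or modulate terms differently); the single-moon weights above are used here, and the multi-moon table works the same way:

```python
# Illustrative sketch only; the script's exact clipping/normalization may differ.
SINGLE_MOON_WEIGHTS = {
    "skew_periodicity": 0.22,
    "skew_dist_consistency": 0.06,
    "ttv_periodicity": 0.38,
    "shoulder": 0.12,
    "variability": 0.12,
    "duration_periodicity": 0.11,
    "data_quality_bonus": 0.05,
}

def combined_score(components, weights):
    """Weighted sum of per-method scores (each in [0, 1]), clipped to [0, 1]."""
    total = sum(w * components.get(name, 0.0) for name, w in weights.items())
    return min(max(total, 0.0), 1.0)
```

Missing components contribute zero, so a planet with only a strong TTV signal scores at most that term's weight.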

Transit-count reliability modulation:

  • Both models apply a reliability factor to timing/periodicity-heavy terms:
    • transit_reliability = clip(log10(num_transits_observed + 1) / 2, 0, 1)
  • This prevents low-transit outliers from saturating skew/TTV scores.
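The reliability factor can be reproduced directly from the formula above:

```python
import numpy as np

def transit_reliability(num_transits_observed):
    """clip(log10(n + 1) / 2, 0, 1): ~0.15 for 1 transit, 0.5 for 9, 1.0 for >= 99."""
    return float(np.clip(np.log10(num_transits_observed + 1) / 2.0, 0.0, 1.0))
```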

Interpreting Results

High Confidence Candidates

Look for planets with:

  • combined_single_moon_score > 0.5 or combined_multi_moon_score > 0.5
  • ttv_fap < 0.01 (1% false alarm probability)
  • skew_fap < 0.01
  • num_transits_observed >= 10 (more transits = better)
  • data_quality_score > 0.5

Example Query

import pandas as pd

df = pd.read_csv('comprehensive_exomoon_results.csv')

# Top candidates per model
top_single = df.nlargest(20, 'combined_single_moon_score')
top_multi = df.nlargest(20, 'combined_multi_moon_score')

# High confidence (low FAP)
high_conf = df[(df['ttv_fap'] < 0.01) | (df['skew_fap'] < 0.01)]
high_conf = high_conf.nlargest(20, 'combined_single_moon_score')

# Good data quality
good_data = df[df['data_quality_score'] > 0.5]
good_data = good_data.nlargest(20, 'combined_multi_moon_score')

# Follow-up-oriented ranking with transit floor
followup = df[df['num_transits_observed'] >= 40].copy()
followup['single_followup_score'] = (
    followup['combined_single_moon_score'] *
    (followup['num_transits_observed'].clip(upper=100) / 100.0)
)
followup['multi_followup_score'] = (
    followup['combined_multi_moon_score'] *
    (followup['num_transits_observed'].clip(upper=100) / 100.0)
)
top_single_followup = followup.nlargest(20, 'single_followup_score')
top_multi_followup = followup.nlargest(20, 'multi_followup_score')

Performance Tips

  1. Use more workers if you have CPU cores available

    python3 comprehensive_exomoon_search.py --workers 8
  2. Process in chunks to monitor progress

    python3 comprehensive_exomoon_search.py --start 0 --end 100
    python3 comprehensive_exomoon_search.py --start 100 --end 200
  3. Use sharding for very large datasets

    # Process 8 shards in parallel
    for i in {0..7}; do
        python3 comprehensive_exomoon_search.py --shard-id $i --num-shards 8 &
    done
    wait

Troubleshooting

"No TESS lightcurve found"

  • Some planets may not have TESS data
  • Check if TIC ID is correct
  • Try a different mission (Kepler/K2) if data is available

Memory Issues

  • Reduce number of workers
  • Process in smaller chunks (--start/--end)

Slow Processing

  • Increase workers (if CPU allows)
  • Some planets take longer (more transits = more processing)

Segmentation fault after "Processing complete"

  • Rarely, native-library teardown may emit a segfault after all rows are already written.
  • If the script printed completion and row counts look correct, outputs are usually usable.
  • Sanity check with:
    • Status counts
    • expected number of rows
    • quick parse of top candidates

Resume Processing

The script automatically resumes - it skips planets already in the output file. To restart completely, delete or rename the output file.
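The skip logic can be sketched like this (a hypothetical helper, assuming both files share the `planet_name` column shown in the output format above):

```python
import os
import pandas as pd

def remaining_planets(input_csv, output_csv):
    """Planets from the input that are not yet present in the output.

    Illustrative sketch of the resume behavior; the script's internal
    implementation may differ.
    """
    targets = pd.read_csv(input_csv)
    if not os.path.exists(output_csv):
        return targets  # nothing processed yet: run everything
    done = set(pd.read_csv(output_csv)["planet_name"])
    return targets[~targets["planet_name"].isin(done)]
```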

Next Steps After Processing

  1. Review top candidates - Check highest scoring planets
  2. Visual inspection - Plot light curves for promising candidates
  3. Follow-up analysis - Use photodynamical modeling for best candidates
  4. Publication - Report detections with proper statistical significance

Files Created

  • comprehensive_exomoon_results.csv - Main results file
  • comprehensive_exomoon_results_shard{N}.csv - Shard-specific results (if using sharding)

Combining Shard Results

If you used sharding, combine results:

import pandas as pd
import glob

shard_files = sorted(glob.glob("comprehensive_exomoon_results_shard*.csv"))
dfs = [pd.read_csv(f) for f in shard_files]
combined = pd.concat(dfs, ignore_index=True)
combined = combined.drop_duplicates(subset=['planet_name'])
combined.to_csv("comprehensive_exomoon_results_combined.csv", index=False)

Or use the existing merge_shards.py script (modify for your file pattern).