Hunting Hidden Galaxy Collisions with AI

Status: Candidate-generation workflow
Scope: SDSS image preprocessing, embedding-based anomaly ranking, and raw JPG multi-component scanning

See the Notebook Demo (demo.ipynb) for an overview of the pipeline.

Architecture Diagram

graph TD
    A[config.yaml] --> B(Download SDSS Data)
    B --> C(Preprocess Images)
    C --> D(ResNet50 Embeddings)
    D --> E(Isolation Forest Anomaly Ranking)
    E --> F(Image Scanner & Color Filter)
    F --> G(Candidate Stats & Figures)
    
    style A fill:#f9f,stroke:#333
    style F fill:#bfb,stroke:#333
    style G fill:#bfb,stroke:#333

Quick Start

The entire pipeline is driven by a single config.yaml in the repository root.

# 1. Install dependencies
pip install -r requirements.txt

# 2. Configure paths and parameters (Optional)
# Edit config.yaml in the root directory

# 3. Run the entire pipeline end-to-end
python run_pipeline.py

# Alternatively, run via Makefile
make all
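
The contents of run_pipeline.py are not reproduced in this README. As a rough sketch only, assuming it does little more than run the stage scripts in the order of the architecture diagram and stop at the first failure, an equivalent driver could look like this. The `STAGES` list and the `build_commands` and `run_all` functions are illustrative names, not the repository's API.

```python
import subprocess
import sys

# Stage scripts in the order shown in the architecture diagram.
# The script paths come from this README; the STAGES name is hypothetical.
STAGES = [
    "scripts/download_data.py",
    "scripts/preprocess_images.py",
    "scripts/generate_embeddings.py",
    "scripts/detect_anomalies.py",
    "scripts/scan_raw_secondary_sources.py",
    "scripts/compute_scan_stats.py",
    "scripts/make_paper_figures.py",
]


def build_commands(stages=STAGES, python=sys.executable):
    """Return one argv list per stage, using the given interpreter."""
    return [[python, script] for script in stages]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; check=True aborts on the first failure."""
    for argv in commands:
        print("running:", " ".join(argv))
        runner(argv, check=True)
```

Calling `run_all(build_commands())` from the repository root would then execute every stage in sequence. Passing the runner as a parameter keeps the ordering logic testable without launching any subprocess.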

Running Individual Stages

If you want to run specific stages, you can use the Makefile commands or their direct Python equivalents:

Download Data:

make data
# Or: python scripts/download_data.py --n 5000

Preprocess Images:

make preprocess
# Or: python scripts/preprocess_images.py

Generate ResNet50 Embeddings:

make embed
# Or: python scripts/generate_embeddings.py

This stage writes results/intermediate/embeddings/galaxy_embeddings.npy, which Stage 4 (Isolation Forest anomaly ranking) requires. In a fresh clone, expect to regenerate this file by running the embedding stage first.
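
Because the anomaly stage cannot start without this file, a fresh clone fails there unless the embedding stage has run first. A minimal pre-flight check, assuming only the path given above, might look like the following. `require_embeddings` is a hypothetical helper, not something the repository is known to provide.

```python
from pathlib import Path

# Path taken from this README; adjust it if config.yaml points elsewhere.
EMBEDDINGS_PATH = Path("results/intermediate/embeddings/galaxy_embeddings.npy")


def require_embeddings(path=EMBEDDINGS_PATH):
    """Return the path if the embeddings file exists, else raise with a hint."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; run `make embed` "
            "(or python scripts/generate_embeddings.py) first"
        )
    return path
```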

Isolation Forest Anomaly Ranking:

make anomaly
# Or: python scripts/detect_anomalies.py
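
As an illustration of what this stage computes, here is a minimal sketch that ranks embedding vectors with scikit-learn's IsolationForest. It is not the contents of detect_anomalies.py: the synthetic data, the `rank_anomalies` name, and the hyperparameters are assumptions. In the real pipeline the input would be the array saved by the embedding stage.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def rank_anomalies(embeddings, n_estimators=200, seed=0):
    """Return (order, scores); order lists row indices, most anomalous first.

    score_samples gives lower values to more anomalous rows, so an
    ascending argsort puts the strongest outliers at the front.
    """
    forest = IsolationForest(n_estimators=n_estimators, random_state=seed)
    forest.fit(embeddings)
    scores = forest.score_samples(embeddings)
    return np.argsort(scores), scores


# Synthetic stand-in for galaxy_embeddings.npy (assumption: a 2-D float array,
# one row per galaxy). Row 200 is placed far from the rest on purpose.
rng = np.random.default_rng(0)
embeddings = np.vstack([rng.normal(size=(200, 8)), np.full((1, 8), 10.0)])

order, scores = rank_anomalies(embeddings)
print("most anomalous rows:", order[:5])
```

With the real file, `np.load("results/intermediate/embeddings/galaxy_embeddings.npy")` would replace the synthetic array. Fixing the seed keeps the ranking reproducible between runs.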

Image-Plane Scanner & Photometric Filter:

make scan
# Or: python scripts/scan_raw_secondary_sources.py

Compute Pipeline Statistics:

make stats
# Or: python scripts/compute_scan_stats.py

Generate Paper Figures:

make figures
# Or: python scripts/make_paper_figures.py

This script builds the manuscript grid from the highest-ranked available overlays in raw_object_scan.csv and writes a candidate_grid_manifest.csv alongside the figure.
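
The selection logic can be sketched as follows. This is an assumption-laden illustration rather than the code in make_paper_figures.py: the column names `objid`, `score`, and `overlay_path`, the descending sort, the default panel count, and the `build_manifest` helper are all hypothetical, since this README does not document the schema of raw_object_scan.csv.

```python
import csv
from pathlib import Path


def build_manifest(scan_csv, manifest_csv, n_panels=9):
    """Pick the top-ranked rows whose overlay image exists and log them.

    Assumed (hypothetical) columns: objid, score, overlay_path.
    A higher score is assumed to mean a higher rank.
    """
    with open(scan_csv, newline="") as fh:
        rows = list(csv.DictReader(fh))

    rows.sort(key=lambda r: float(r["score"]), reverse=True)

    # Skip candidates whose overlay was never rendered, then cap the grid size.
    chosen = [r for r in rows if Path(r["overlay_path"]).is_file()][:n_panels]

    fields = ["panel", "objid", "score", "overlay_path"]
    with open(manifest_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        for panel, row in enumerate(chosen, start=1):
            writer.writerow({
                "panel": panel,
                "objid": row["objid"],
                "score": row["score"],
                "overlay_path": row["overlay_path"],
            })
    return chosen
```

Writing the manifest next to the figure records which candidate sits in which panel, so the grid can be traced back to catalog rows later.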

Project Structure

Directory	Purpose
`data/raw/`	Downloaded SDSS FITS/images
`data/processed/`	Resized, normalized images
`data/metadata/`	SDSS catalogs and derived metadata
`scripts/`	Pipeline Python stages
`results/final/`	Final outputs (Filtered catalogs, statistics, and overlay figures)
`results/intermediate/`	Intermediate artifacts (Embeddings, raw anomaly scores)
`results/experimental/`	Scratch space for experimental approaches
`memory/`	Internal pipeline JSON state registries

Configuration

Pipeline paths and tunable thresholds are configured through config.yaml. To change sigma_threshold, max_color_diff, or the dataset size limits, edit this file before running run_pipeline.py.
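
How these thresholds are applied is not spelled out here. One plausible reading, offered as an assumption only, is that sigma_threshold sets how far above the background a secondary peak must rise, and max_color_diff limits how much the secondary source's color may differ from the primary galaxy's. The sketch below encodes that reading. The function names, the default values, and the use of a g-r color are illustrative, not taken from the repository.

```python
import statistics


def is_significant(peak, background_pixels, sigma_threshold=3.0):
    """True if peak exceeds the background mean by sigma_threshold std devs."""
    mean = statistics.fmean(background_pixels)
    std = statistics.pstdev(background_pixels)
    return peak > mean + sigma_threshold * std


def passes_color_filter(primary_color, secondary_color, max_color_diff=0.3):
    """True if the two colors (e.g. g-r) differ by at most max_color_diff."""
    return abs(primary_color - secondary_color) <= max_color_diff


def keep_candidate(peak, background_pixels, primary_color, secondary_color,
                   sigma_threshold=3.0, max_color_diff=0.3):
    """Combine both cuts: a bright secondary peak with a similar color."""
    return (is_significant(peak, background_pixels, sigma_threshold)
            and passes_color_filter(primary_color, secondary_color, max_color_diff))


# Hypothetical pixel values and colors, chosen only to exercise both cuts.
background = [10.0, 11.0, 9.0, 10.5, 9.5]
print(keep_candidate(20.0, background, 0.80, 0.70))  # bright, similar color
print(keep_candidate(20.0, background, 0.80, 1.60))  # bright, color mismatch
```

Under this reading, raising sigma_threshold or lowering max_color_diff shrinks the candidate list, while the opposite changes admit more candidates for follow-up.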

Current Constraints

The repo currently supports candidate generation, not discovery confirmation.
External catalog cross-match and literature review are not automated end-to-end here.
SDSS objid entries are already survey-detected objects; absence from a subset of catalogs is not proof of novelty.
Use outputs as ranked follow-up candidates unless independent verification is added.

Output Explanation

At the end of a successful run, check results/final/:

raw_object_scan/raw_object_scan.csv: The official prioritized list of merging candidates.
raw_object_scan/stats.txt: Summary of reduction counts (e.g. 605 candidates -> 190 high-confidence via color filter).
figures/: Annotated overlays plotting the precise location of the secondary collision components.
