amrit110/whalu


Marine bioacoustics detection pipeline. Slides a 5-second window over continuous hydrophone recordings, runs the Google Perch multispecies whale model, and stores detections as Parquet files for analysis.
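The windowing step can be pictured with a short Python sketch. The hop size is not stated above, so `hop_s` is a parameter that defaults to non-overlapping windows (an assumption), and a trailing partial window is dropped rather than padded. `sliding_windows` is a hypothetical helper, not part of whalu's API.

```python
from typing import Iterator, Optional, Sequence, Tuple


def sliding_windows(
    samples: Sequence[float],
    sample_rate: int,
    window_s: float = 5.0,
    hop_s: Optional[float] = None,
) -> Iterator[Tuple[float, Sequence[float]]]:
    """Yield (start_time_s, window) pairs over a mono signal.

    Hypothetical helper: hop_s defaults to window_s (no overlap), which is an
    assumption. A trailing partial window is dropped, not padded. Works with
    lists or NumPy arrays, since it only uses len() and slicing.
    """
    win = int(round(window_s * sample_rate))
    hop = win if hop_s is None else int(round(hop_s * sample_rate))
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must each cover at least one sample")
    for start in range(0, len(samples) - win + 1, hop):
        yield start / sample_rate, samples[start:start + win]


# Toy example: 27 samples at 1 Hz give five full 5-s windows; the last 2 samples are dropped.
starts = [t for t, _ in sliding_windows(list(range(27)), sample_rate=1)]
print(starts)  # [0.0, 5.0, 10.0, 15.0, 20.0]
```

For real audio, `sample_rate` would be the source's native rate (for example 16 kHz for MBARI, where one 5-second window is 80,000 samples before any resampling the model may require).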

Live demo: a real MBARI hydrophone recording from Monterey Canyon with blue whale, fin whale, and humpback whale detections.

Install

pip install whalu

Or with uv:

uv add whalu

CLI

whalu [-v] <command> ...
Command                 Description
whalu scan mbari        Run detection over MBARI Pacific Sound (S3, no auth required)
whalu scan orcasound    Run detection over Orcasound labeled samples (S3, no auth required)
whalu scan noaa         Run detection over NOAA NRS or SanctSound data (GCS, no auth required)
whalu analyze           Summarize and visualize stored detections
whalu info [source]     Show sensor/dataset metadata for a source

whalu scan mbari

Flag                Default                 Description
--start YYYY-MM     required                First year-month to process
--end YYYY-MM       same as --start         Last year-month (inclusive)
--max-files N       all                     Stop after N files
--limit-hours N     full file               Only process the first N hours of each file
--output-dir PATH   data/detections/mbari   Where to write Parquet files
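The documented `--start`/`--end` semantics (an inclusive range, with `--end` defaulting to `--start`) amount to a month expansion like the one below. `month_range` is a hypothetical illustration of that behavior, not whalu's own code.

```python
from typing import List, Optional, Tuple


def _parse_year_month(value: str) -> Tuple[int, int]:
    """Parse 'YYYY-MM' into (year, month), rejecting malformed input."""
    year_str, month_str = value.split("-")
    year, month = int(year_str), int(month_str)
    if len(year_str) != 4 or not 1 <= month <= 12:
        raise ValueError(f"expected YYYY-MM, got {value!r}")
    return year, month


def month_range(start: str, end: Optional[str] = None) -> List[str]:
    """Expand an inclusive YYYY-MM range; end defaults to start."""
    year, month = _parse_year_month(start)
    end_year, end_month = _parse_year_month(end or start)
    if (end_year, end_month) < (year, month):
        raise ValueError("--end must not be earlier than --start")
    months = []
    while (year, month) <= (end_year, end_month):
        months.append(f"{year:04d}-{month:02d}")
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


print(month_range("2023-07", "2023-10"))  # ['2023-07', '2023-08', '2023-09', '2023-10']
print(month_range("2026-03"))             # ['2026-03']
```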

whalu scan orcasound

Flag                Default                       Description
--key S3_KEY        labeled killer whale sample   Specific S3 key to process
--output-dir PATH   data/detections/orcasound     Where to write Parquet files

whalu scan noaa

Flag                        Default                 Description
--program nrs|sanctsound    required                Dataset: nrs (12 stations, 5 kHz) or sanctsound (30 sites, 48 kHz)
--site SITE                 required                Station/site ID, e.g. 01 (NRS) or mb01 (SanctSound)
--deployment NAME           all deployments         Specific deployment to process
--max-files N               all                     Stop after N files per deployment
--limit-s N                 full file               Only process the first N seconds of each file
--output-dir PATH           data/detections/noaa    Where to write Parquet files
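Because `--limit-s` caps each file by duration, the amount of audio it implies depends on the program's sample rate (5 kHz for NRS, 48 kHz for SanctSound, per the `--program` row). A back-of-the-envelope sketch, assuming non-overlapping 5-second windows counted at the native rate; `limit_s_budget` is a hypothetical helper.

```python
from typing import Tuple

SAMPLE_RATE_HZ = {"nrs": 5_000, "sanctsound": 48_000}  # from the --program description
WINDOW_S = 5  # model window length in seconds


def limit_s_budget(program: str, limit_s: int) -> Tuple[int, int]:
    """Return (samples read per file, full 5-s windows scored per file).

    Hypothetical helper; assumes non-overlapping windows at the native rate.
    """
    rate = SAMPLE_RATE_HZ[program]
    return limit_s * rate, limit_s // WINDOW_S


for program in ("nrs", "sanctsound"):
    samples, windows = limit_s_budget(program, 600)
    print(program, samples, windows)
# nrs 3000000 120
# sanctsound 28800000 120
```

So the quick-test setting `--limit-s 600` scores the same number of windows on both programs, while a SanctSound file supplies 9.6 times as many samples to decode.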

whalu analyze

Flag                Default     Description
--input-dir PATH    required    Directory of detection Parquet files
--top-n N           5           Number of top species shown in the heatmap
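A minimal sketch of the kind of ranking `--top-n` controls: count detections per label and keep the N most frequent. The row keys `label` and `confidence` are hypothetical, since the Parquet schema is not documented here; in practice the rows could be loaded with something like `pandas.read_parquet`. The default `min_confidence` of 0.5 only mirrors the emission threshold.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def top_n_labels(
    rows: Iterable[Mapping[str, object]],
    n: int = 5,
    min_confidence: float = 0.5,
) -> List[Tuple[str, int]]:
    """Return the n most frequent labels among rows at or above min_confidence.

    Hypothetical helper; 'label' and 'confidence' are assumed column names.
    """
    counts = Counter(
        str(row["label"]) for row in rows if float(row["confidence"]) >= min_confidence
    )
    return counts.most_common(n)


rows = [
    {"label": "Bm", "confidence": 0.91},
    {"label": "Bm", "confidence": 0.78},
    {"label": "Mn", "confidence": 0.66},
    {"label": "Bp", "confidence": 0.41},  # below threshold, ignored
    {"label": "Mn", "confidence": 0.58},
    {"label": "Bm", "confidence": 0.55},
]
print(top_n_labels(rows, n=5))  # [('Bm', 3), ('Mn', 2)]
```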

whalu info

Argument   Default       Description
source     all sources   Source ID to inspect: mbari, orcasound, noaa-nrs, or noaa-sanctsound

Examples

# Single file, first hour only (quick test)
whalu scan mbari --start 2026-03 --max-files 1 --limit-hours 1

# Full month
whalu scan mbari --start 2026-03 --output-dir data/detections/mbari

# Multi-month date range (blue whale season)
whalu scan mbari --start 2023-07 --end 2023-10

# Orcasound validation sample
whalu scan orcasound

# NOAA NRS quick test (site 01, first file, first 10 minutes)
whalu scan noaa --program nrs --site 01 --max-files 1 --limit-s 600

# NOAA SanctSound Monterey Bay, all deployments
whalu scan noaa --program sanctsound --site mb01

# Analyze stored detections
whalu analyze --input-dir data/detections/mbari

# Show source metadata
whalu info mbari
whalu info

Supported data sources

Source                Location                  Coverage       Format
MBARI Pacific Sound   Monterey Canyon, CA       2015-present   16 kHz 24-bit WAV, 1 file/day (4.1 GB)
Orcasound             Puget Sound, WA           2017-present   20 kHz 16-bit WAV
NOAA NRS              12 US ocean sites         2014-present   5 kHz 16-bit FLAC, ~4 h per file (GCS)
NOAA SanctSound       8 US marine sanctuaries   2018-2021      48 kHz 24-bit FLAC, 15-30 min per file (GCS)
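The MBARI file size in the Format column follows from uncompressed PCM arithmetic (sample rate × bytes per sample × seconds), which also gives the size of the 1-hour chunks used when streaming MBARI files. The sketch below assumes mono audio and a canonical 44-byte WAV header (real files may carry a larger header); `hour_byte_ranges` is a hypothetical helper that only builds HTTP `Range` header values and does not fetch anything.

```python
from typing import List


def pcm_bytes(sample_rate_hz: int, bit_depth: int, seconds: int, channels: int = 1) -> int:
    """Size of uncompressed PCM audio in bytes."""
    return sample_rate_hz * (bit_depth // 8) * channels * seconds


def hour_byte_ranges(
    total_bytes: int,
    sample_rate_hz: int,
    bit_depth: int,
    header_bytes: int = 44,  # assumption: canonical PCM WAV header
    channels: int = 1,       # assumption: single hydrophone channel
) -> List[str]:
    """Split the data section of a WAV file into 1-hour HTTP Range values."""
    chunk = pcm_bytes(sample_rate_hz, bit_depth, 3600, channels)
    ranges = []
    start = header_bytes
    while start < total_bytes:
        end = min(start + chunk, total_bytes) - 1  # Range end offsets are inclusive
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


hour = pcm_bytes(16_000, 24, 3600)
day = pcm_bytes(16_000, 24, 86_400)
print(hour, day)  # 172800000 4147200000
ranges = hour_byte_ranges(44 + day, 16_000, 24)
print(len(ranges), ranges[0])  # 24 bytes=44-172800043
```

172,800,000 bytes is about 173 MB per hour, and 24 hours gives about 4.1 GB, matching the MBARI row. Because each chunk is a whole number of 3-byte samples, every range after the header starts on a sample boundary.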

How it works

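  • MBARI: streams each ~4.1 GB daily WAV file via S3 range requests in 1-hour chunks (~173 MB each), so RAM use stays bounded
  • NOAA: downloads each FLAC file from GCS in full and splits it in memory (FLAC is compressed, so fixed byte ranges do not correspond to fixed stretches of audio the way they do in uncompressed WAV)
  • Applies a sigmoid to each class logit rather than a softmax, because the whale model is multi-label: several species can be present in the same window
  • Emits a detection only where a class's confidence is >= 0.5
  • Stores one Parquet file per audio source; runs are resumable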
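The sigmoid and threshold steps above can be sketched as follows. It assumes the model returns one raw logit per class for each 5-second window; `detections_from_logits` and its row keys are hypothetical, not the perch-hoplite or whalu API.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def detections_from_logits(
    logits: Sequence[float],
    labels: Sequence[str],
    window_start_s: float,
    threshold: float = 0.5,
) -> List[Dict[str, object]]:
    """Score each class independently and keep those at or above threshold.

    Unlike softmax, sigmoid scores do not compete, so several classes can
    pass the threshold in the same window.
    """
    if len(logits) != len(labels):
        raise ValueError("expected one logit per class label")
    rows = []
    for label, logit in zip(labels, logits):
        confidence = sigmoid(logit)
        if confidence >= threshold:
            rows.append({"start_s": window_start_s, "label": label, "confidence": confidence})
    return rows


# Hypothetical logits for one window: blue and fin whale both pass, humpback does not.
rows = detections_from_logits([2.0, 1.5, -3.0], ["Bm", "Bp", "Mn"], window_start_s=35.0)
print([(r["label"], round(r["confidence"], 3)) for r in rows])  # [('Bm', 0.881), ('Bp', 0.818)]
```

A threshold of 0.5 on the sigmoid output is equivalent to keeping classes with a non-negative logit. With softmax the scores would sum to 1 across classes, so at most one class could exceed 0.5, which is why softmax would suppress simultaneous species.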

Detection model

Google multispecies_whale via perch-hoplite. 12 classes. Species: blue whale (Bm), fin whale (Bp), humpback (Mn), minke (Ba), Bryde's (Be), right whale (Eg), and orca (Oo). Call types: Upcall, Gunshot, Call, Echolocation, and Whistle.
