[FEATURE] Add NOAA passive acoustic data source

## Problem Statement

whalu currently supports MBARI Pacific Sound (California) and Orcasound (Puget Sound). Adding NOAA's passive acoustic archive would expand coverage to 12 US ocean regions, including the Atlantic, Gulf of Mexico, Alaska, Hawaii, and National Marine Sanctuaries — unlocking multi-year, multi-site whale detection at national scale.

## Proposed Solution

Implement `whalu/data/noaa.py` as a new data source module, analogous to `mbari.py`, backed by NOAA's public GCS bucket.

**Bucket:** `gs://noaa-passive-bioacoustic` (public, no auth required)

**Two highest-priority sub-datasets:**

### 1. NRS (Ocean Noise Reference Station Network)
12 fixed moorings, 2014-present, continuous long-term monitoring.

| Field | Value |
|-------|-------|
| Path | `nrs/audio/{station_id}/{deployment}/audio/` |
| File format | FLAC, ~4h recordings |
| Sample rate | 5 kHz (optimised for 20 Hz-2 kHz low-frequency whales) |
| Naming | `NRS01_20141014_234015.flac` |
| Stations | NRS01 (Bering Sea), NRS02 (Gulf of Alaska), NRS03 (Olympic Coast), NRS04 (Hawaii), NRS05 (Channel Islands), NRS06 (Gulf of Mexico), NRS07-08 (Atlantic), NRS09 (Stellwagen Bank, right whales), NRS10 (American Samoa), NRS11 (Cordell Bank), NRS12 (US Virgin Islands) |

### 2. SanctSound (National Marine Sanctuaries)
30 sites across 8 sanctuaries, 2018-2021, higher sample rates.

| Field | Value |
|-------|-------|
| Path | `sanctsound/audio/{site}/{deployment}/audio/` |
| File format | FLAC, 15-30 min recordings |
| Sample rate | 48-96 kHz (SoundTrap instruments) |
| Naming | `SanctSound_MB01_01_671399971_20181115T000002Z.flac` |
| Sites | mb=Monterey Bay, hi=Hawaiian Islands, sb=Stellwagen Bank, ci=Channel Islands, fk=Florida Keys, oc=Olympic Coast, gr=Gray's Reef, pm=Papahanaumokuakea |

## Implementation Ideas

### Data access (GCS, not AWS S3)
```python
# google-cloud-storage with anonymous credentials
from google.cloud import storage
client = storage.Client.create_anonymous_client()
bucket = client.bucket("noaa-passive-bioacoustic")
```

New dependency: `google-cloud-storage` (to add to `pyproject.toml`).

### `whalu/data/noaa.py` — key functions
```python
from collections.abc import Iterator

import numpy as np

def list_deployments(program: str, site: str) -> list[str]:
    """List deployments, e.g. list_deployments("nrs", "01") -> ["nrs_01_2014-2015", ...]."""
    ...

def list_files(program: str, site: str, deployment: str) -> list[str]:
    """Return sorted GCS blob names for all FLAC files in a deployment."""
    ...

def download_audio(blob_name: str, target_sr: int, limit_s: float | None) -> tuple[np.ndarray, float]:
    """Download a FLAC blob to a tempfile and load it with librosa (handles FLAC natively)."""
    ...

def stream_chunks(blob_name: str, target_sr: int, chunk_s: float = 3600.0) -> Iterator[...]:
    """Stream long recordings (NRS ~4h files) in chunks of chunk_s seconds."""
    ...
```
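As a rough illustration of how `list_files` might sit on top of the GCS client, the sketch below filters a prefix listing down to FLAC blobs. It assumes the path layout from the tables above. The names `audio_prefix` and `flac_blob_names` are hypothetical, and the client is passed in as an argument so the function can be exercised without network access.

```python
BUCKET = "noaa-passive-bioacoustic"


def audio_prefix(program: str, site: str, deployment: str) -> str:
    """Build the audio prefix from the path layout described in this issue (hypothetical helper)."""
    return f"{program}/audio/{site}/{deployment}/audio/"


def flac_blob_names(client, program: str, site: str, deployment: str) -> list[str]:
    """Return sorted names of all FLAC blobs under one deployment.

    `client` is expected to behave like google.cloud.storage.Client:
    list_blobs(bucket_name, prefix=...) yields objects with a `.name` attribute.
    """
    prefix = audio_prefix(program, site, deployment)
    blobs = client.list_blobs(BUCKET, prefix=prefix)
    # Keep only audio files; deployments may hold other artefacts alongside them.
    return sorted(b.name for b in blobs if b.name.lower().endswith(".flac"))
```

With the real library, the client would come from `storage.Client.create_anonymous_client()`, as in the data access snippet above. Whether the `site` argument should be a bare number or a full station ID depends on the actual bucket layout and would need checking.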

### Sample rate considerations
- NRS at 5 kHz: the Perch `multispecies_whale` model expects 24 kHz input. librosa can upsample to that rate, but a 5 kHz recording only carries energy up to its 2.5 kHz Nyquist limit, so detection sensitivity for higher-frequency calls (e.g. orca clicks) will be reduced. Low-frequency species (blue, fin, humpback) should be largely unaffected.
- SanctSound at 48-96 kHz: downsampling to 24 kHz is straightforward and preserves all content below 12 kHz, the upper limit of the model's input band.
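The bandwidth arithmetic behind these two points can be sketched as follows. The helper name `usable_bandwidth_hz` is hypothetical; the sample rates are the ones quoted in this issue.

```python
def usable_bandwidth_hz(source_sr: float, target_sr: float) -> float:
    """Highest frequency that survives resampling from source_sr to target_sr.

    Upsampling cannot add content above the source Nyquist limit, and
    downsampling discards everything above the target Nyquist limit.
    """
    return min(source_sr, target_sr) / 2.0


MODEL_SR = 24_000  # input rate this issue states for the Perch model

for label, sr in [("NRS", 5_000), ("SanctSound (low)", 48_000), ("SanctSound (high)", 96_000)]:
    band = usable_bandwidth_hz(sr, MODEL_SR)
    print(f"{label}: {sr} Hz source -> usable band 0-{band:.0f} Hz")
```

NRS recordings top out at 2.5 kHz after upsampling, while both SanctSound rates retain the full 12 kHz the model can see.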

### Timestamp parsing
NRS files use `NRS01_YYYYMMDD_HHMMSS.flac` — a different naming scheme from MBARI's `MARS-YYYYMMDDTHHMMSSZ-16kHz.wav`. `add_timestamps()` in `analysis.py` will need to handle this pattern (or source names should be normalised at ingest time).
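One possible shape for the filename handling is sketched below, covering both NOAA naming schemes shown in the tables above. The function name `parse_noaa_timestamp` is hypothetical, the patterns are inferred only from the two example filenames in this issue, and treating NRS timestamps as UTC is an assumption that should be checked against the deployment metadata.

```python
import re
from datetime import datetime, timezone

# Patterns inferred from the example filenames in this issue.
_NRS_RE = re.compile(r"^NRS\d{2}_(\d{8})_(\d{6})\.flac$")
_SANCTSOUND_RE = re.compile(r"^SanctSound_.+_(\d{8}T\d{6})Z\.flac$")


def parse_noaa_timestamp(filename: str) -> datetime:
    """Extract the recording start time from an NRS or SanctSound filename."""
    name = filename.rsplit("/", 1)[-1]  # accept full blob names as well
    if m := _NRS_RE.match(name):
        # Assumption: NRS timestamps are UTC; the filename carries no zone marker.
        stamp = datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")
        return stamp.replace(tzinfo=timezone.utc)
    if m := _SANCTSOUND_RE.match(name):
        # The trailing "Z" in SanctSound names marks UTC explicitly.
        stamp = datetime.strptime(m.group(1), "%Y%m%dT%H%M%S")
        return stamp.replace(tzinfo=timezone.utc)
    raise ValueError(f"unrecognised NOAA filename: {filename!r}")
```

Raising on unknown patterns, rather than returning `None`, would surface naming variations early if other deployments deviate from these two examples.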

### CLI additions
```bash
# List available NRS deployments
uv run whalu info noaa-nrs

# Scan a specific NRS station
uv run whalu scan noaa --program nrs --site 05 --start 2023-01 --output-dir data/detections/noaa

# Scan SanctSound Monterey Bay
uv run whalu scan noaa --program sanctsound --site mb01 --output-dir data/detections/noaa
```

### `sources.py`
Add `NOAA_NRS` and `NOAA_SANCTSOUND` entries to `SOURCE_REGISTRY`.

## Use Cases

- Multi-region comparison: Pacific coast vs Atlantic vs Hawaii species distributions
- Stellwagen Bank NRS09 for North Atlantic right whale (critically endangered) detection
- Long time series (2014-present NRS) for seasonal and inter-annual trends
- SanctSound labeled detection data (available via ERDDAP) as ground truth for model validation

## Component Impact

- [x] Core functionality (`whalu/data/noaa.py`, `whalu/sources.py`)
- [x] CLI (`whalu/cli/scan.py` — new `scan noaa` subcommand)
- [x] Documentation
- [ ] API
- [ ] Docker/Infrastructure

## Additional Context

NOAA also exposes species presence/absence detections (no audio processing needed) via ERDDAP for SanctSound sites:
```
https://coastwatch.pfeg.noaa.gov/erddap/griddap/noaaSanctSound_MB01_01_bluewhale_1d
```
This could be a fast path to validated ground-truth data for benchmarking the Perch model against human annotators.

Metadata JSON per deployment is available at:
`gs://noaa-passive-bioacoustic/{program}/audio/{site}/{deployment}/metadata/*.json`
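A minimal sketch of reading those metadata files, assuming the path template above holds for every deployment. The function name `load_deployment_metadata` is hypothetical, and nothing is assumed about the JSON schema; each document is returned as parsed. The client is injected so the logic can be tested with a stand-in object.

```python
import json

BUCKET = "noaa-passive-bioacoustic"


def load_deployment_metadata(client, program: str, site: str, deployment: str) -> dict[str, object]:
    """Map each metadata blob name to its parsed JSON content.

    `client` is expected to behave like google.cloud.storage.Client, whose
    list_blobs() yields blobs offering `.name` and `.download_as_text()`.
    """
    prefix = f"{program}/audio/{site}/{deployment}/metadata/"
    result: dict[str, object] = {}
    for blob in client.list_blobs(BUCKET, prefix=prefix):
        if blob.name.endswith(".json"):
            result[blob.name] = json.loads(blob.download_as_text())
    return result
```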

## Priority

- [x] Important for my use case
