Automatic P-wave first-break picking for active-source, vertical-component land seismic data, designed to produce physically consistent traveltimes that are ready for traveltime inversion (pyGIMLi `.sgt`) without further manual editing.
Trained on 33 manually picked shot gathers and applied to the full 117-gather survey.
The pipeline is classical first, refined by learning, and physically regularized:
- SEG-Y reader (`pick/segy_io.py`): parses Stryde source gathers, reads geometry from the dataset's custom trace-header bytes, and honours the −100 ms recording delay.
- Classical baseline (`pick/picker.py`): STA/LTA + AIC + energy ratio with a velocity-prior search window. Unbiased, but limited to ~3 ms MAD, because energy detectors lag the human's first-motion convention and occasionally cycle-skip.
- Supervised picker (`pick/ml_pick.py`, `pick/train_cv.py`): a 1-D U-Net (PhaseNet-style) that regresses an onset-probability bump, learning the human first-break convention from the 33 labelled gathers. Trained on an Apple Silicon GPU (MPS). An ensemble provides a disagreement signal for rejection.
- Gating, "fewer but correct" (`pick/gate.py`): a pick is kept only if it clears a confidence floor, the ensemble agrees, and it fits a robust local moveout trend.
- Moveout regularization (`pick/moveout.py`): per shot, a confidence-weighted LOESS + isotonic fit turns jittery independent picks into a smooth, monotonic, physically consistent first-break curve and drops off-curve picks. This also removes the ±1-sample grid jitter: with 0.15 m receiver spacing, adjacent traveltimes differ by less than one 1 ms sample for any P velocity above 150 m/s, so independent picks snap to the sampling grid and jump irregularly between neighbours.
- Export (`pick/dataset.py`): writes a pyGIMLi `.sgt` that reuses the original point indices; physically invalid picks (t ≤ 0) are removed.
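A minimal sketch of the classical-baseline idea, assuming a single trace sampled every `dt` seconds whose first sample sits at time `t0`: an STA/LTA trigger is searched only inside the window implied by a velocity prior (`v_min`, `v_max`), and a variance-based AIC then refines the onset around the trigger. The function names, window lengths and threshold are hypothetical, and the energy-ratio term used in `pick/picker.py` is omitted.

```python
import numpy as np


def sta_lta(x, n_sta, n_lta):
    """Trailing-window STA/LTA of signal energy; zero until the LTA window is full."""
    e = np.asarray(x, dtype=float) ** 2
    c = np.concatenate(([0.0], np.cumsum(e)))  # c[k] = sum of e[:k]
    ratio = np.zeros_like(e)
    i = np.arange(n_lta - 1, len(e))  # last sample of each window
    sta = (c[i + 1] - c[i + 1 - n_sta]) / n_sta
    lta = (c[i + 1] - c[i + 1 - n_lta]) / n_lta
    ratio[i] = sta / (lta + 1e-12)
    return ratio


def aic_onset(w):
    """Variance-based AIC (Maeda-style) on a short window; returns the split index."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    aic = np.full(n, np.inf)
    for k in range(2, n - 2):  # skip edges where one side has < 2 samples
        aic[k] = (k * np.log(np.var(w[:k]) + 1e-12)
                  + (n - k - 1) * np.log(np.var(w[k:]) + 1e-12))
    return int(np.argmin(aic))


def pick_trace(x, dt, offset, v_min, v_max, t0=0.0,
               n_sta=5, n_lta=40, threshold=5.0, half_aic=20):
    """First-break time in seconds, or None if nothing triggers in the window.

    Hypothetical parameters: t0 is the time of the first sample (negative if
    recording starts before the shot); v_min/v_max bound the expected apparent
    velocity and define the search window offset/v_max .. offset/v_min.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    i_lo = max(int((offset / v_max - t0) / dt), n_lta)
    i_hi = min(int((offset / v_min - t0) / dt) + 1, n)
    if i_lo >= i_hi:
        return None
    ratio = sta_lta(x, n_sta, n_lta)
    above = np.flatnonzero(ratio[i_lo:i_hi] > threshold)
    if above.size == 0:
        return None
    trig = i_lo + int(above[0])  # first trigger inside the prior window
    a, b = max(trig - half_aic, 0), min(trig + half_aic, n)
    return t0 + (a + aic_onset(x[a:b])) * dt  # AIC pulls the pick back to the onset
```

Restricting the trigger search to the velocity-prior window keeps the energy detector from firing on noise bursts or later arrivals; the AIC step then moves the pick back toward the first departure from noise, which an energy trigger on its own tends to lag.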
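For the supervised picker, the "onset-probability bump" can be pictured as follows: PhaseNet-style pickers are commonly trained against a Gaussian bump centred on the manual pick, and a pick plus a confidence is read back from the predicted curve. This sketch assumes a Gaussian width `sigma` in samples, uses the peak height as the confidence and refines the peak with a three-point parabola; these choices are illustrative assumptions rather than what `pick/ml_pick.py` is known to do, and the U-Net itself is not shown.

```python
import numpy as np


def onset_target(n_samples, pick_index, sigma=2.0):
    """Training label: Gaussian bump (peak 1) centred on the manual pick index."""
    i = np.arange(n_samples)
    return np.exp(-0.5 * ((i - pick_index) / sigma) ** 2)


def decode_pick(prob, dt, t0=0.0, min_conf=0.5):
    """Return (time or None, confidence) from a predicted onset-probability curve."""
    prob = np.asarray(prob, dtype=float)
    k = int(np.argmax(prob))
    conf = float(prob[k])  # assumption: peak height serves as the confidence
    if conf < min_conf:
        return None, conf
    shift = 0.0
    if 0 < k < len(prob) - 1:
        # vertex of the parabola through the peak and its two neighbours
        y0, y1, y2 = prob[k - 1], prob[k], prob[k + 1]
        denom = y0 - 2.0 * y1 + y2
        if denom != 0.0:
            shift = 0.5 * (y0 - y2) / denom
    return t0 + (k + shift) * dt, conf
```

With several trained models, decoding each model's output in this way gives one time per model, and the spread of those times is a natural disagreement signal for the ensemble.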
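The three gates described above can be sketched for one shot as follows, assuming `ens_picks` holds one row of pick times per ensemble member and `conf` one confidence per trace. The max-minus-min spread across members stands in for the ensemble-agreement test, and an 11-trace running median over offset stands in for the robust local moveout trend; `gate_picks` and every threshold are hypothetical placeholders, not the values used in `pick/gate.py`.

```python
import numpy as np


def gate_picks(offsets, ens_picks, conf, conf_min=0.5, spread_max=0.002,
               trend_tol=0.003, half_win=5):
    """Keep-mask and consensus times for one shot (times in seconds).

    ens_picks: array (n_models, n_traces); conf: one confidence per trace.
    All thresholds are hypothetical placeholders.
    """
    offsets = np.asarray(offsets, dtype=float)
    ens = np.asarray(ens_picks, dtype=float)
    t = np.median(ens, axis=0)                  # consensus pick per trace
    spread = ens.max(axis=0) - ens.min(axis=0)  # ensemble disagreement
    keep = (np.asarray(conf, dtype=float) >= conf_min) & (spread <= spread_max)

    # Robust local trend: running median of consensus picks ordered by offset.
    order = np.argsort(offsets, kind="stable")
    ts = t[order]
    trend = np.empty_like(t)
    for j in range(len(ts)):
        lo, hi = max(0, j - half_win), min(len(ts), j + half_win + 1)
        trend[order[j]] = np.median(ts[lo:hi])
    keep &= np.abs(t - trend) <= trend_tol
    return keep, t
```

A median trend is used because a single cycle-skipped pick shifts a window median by at most one rank, whereas it would pull a local mean or least-squares line toward itself.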
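The isotonic half of the moveout regularization can be sketched with the pool-adjacent-violators (PAV) algorithm, weighting each pick by its confidence. This assumes first-break time should not decrease with source-receiver distance within a shot; the LOESS smoothing step is left out, and `regularize_shot` and `tol` are hypothetical.

```python
import numpy as np


def weighted_isotonic(y, w):
    """Pool-adjacent-violators: non-decreasing f minimising sum(w * (y - f)**2)."""
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # merge backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    if not blocks:
        return np.empty(0)
    return np.concatenate([np.full(n, m) for m, _, n in blocks])


def regularize_shot(dist, t, conf, tol=0.002):
    """Monotonic first-break curve for one shot plus a mask of on-curve picks."""
    dist = np.asarray(dist, dtype=float)
    t = np.asarray(t, dtype=float)
    w = np.maximum(np.asarray(conf, dtype=float), 1e-6)  # avoid zero total weight
    order = np.argsort(dist, kind="stable")
    fit = np.empty_like(t)
    fit[order] = weighted_isotonic(t[order], w[order])
    keep = np.abs(t - fit) <= tol  # drop picks that sit far off the curve
    return fit, keep
```

Because pooled blocks take confidence-weighted means, a low-confidence outlier barely moves the fitted curve and is then rejected on its residual; with equal weights the same outlier can drag its neighbours off the curve as well.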
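The export step can be sketched under the assumption that pyGIMLi's unified data format is laid out as we understand it: a sensor count, a `# x y z` token line and the positions, then a data count, a `# s g t` token line and one row per pick with 1-based sensor indices and times in seconds. Passing the original sensor list through unchanged is what preserves the point indices of the manual file. `format_sgt` is a hypothetical helper; check the exact layout against the pyGIMLi documentation.

```python
import math


def format_sgt(positions, src, rec, times):
    """Return .sgt text, dropping picks with t <= 0 or non-finite t.

    positions: sequence of (x, y, z); src/rec: 1-based indices into positions.
    """
    n = len(positions)
    rows = []
    for s, g, t in zip(src, rec, times):
        if not (1 <= s <= n and 1 <= g <= n):
            raise ValueError(f"sensor index out of range: {s}, {g}")
        if math.isfinite(t) and t > 0:  # physically invalid picks are removed
            rows.append((int(s), int(g), float(t)))
    out = [str(n), "# x y z"]
    out += [f"{x:.4f} {y:.4f} {z:.4f}" for x, y, z in positions]
    out += [str(len(rows)), "# s g t"]
    out += [f"{s} {g} {t:.6f}" for s, g, t in rows]
    return "\n".join(out) + "\n"
```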
Leave-shots-out cross-validation against the manual picks:
| Metric | Full coverage | Confidence-gated (50%) |
|---|---|---|
| Median residual (bias) | +0.16 ms | n/a |
| MAD | 1.39 ms | 1.05 ms |
| Picks within ±2 ms | 63% | 72% |
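The table's statistics can be computed along these lines. This sketch makes three assumptions that the table does not state: residual = auto − manual in seconds, MAD read as the median absolute deviation about the median residual, and the 50% confidence-gated column read as keeping the most confident half of the picks. `pick_metrics` is a hypothetical helper.

```python
import numpy as np


def pick_metrics(auto, manual, conf=None, coverage=1.0, tol=0.002):
    """Bias, MAD and fraction within +/- tol of (auto - manual) residuals."""
    r = np.asarray(auto, dtype=float) - np.asarray(manual, dtype=float)
    if conf is not None and coverage < 1.0:
        conf = np.asarray(conf, dtype=float)
        k = max(1, int(round(coverage * len(r))))
        r = r[np.argsort(-conf, kind="stable")[:k]]  # keep the k most confident
    bias = float(np.median(r))
    mad = float(np.median(np.abs(r - bias)))
    within = float(np.mean(np.abs(r) <= tol))
    return {"bias": bias, "mad": mad, "within_tol": within, "n": len(r)}
```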
Label-free checks on the exported survey:
- Reciprocity (source↔receiver): median |Δt| = 1.68 ms (P90 = 5 ms).
- Moveout monotonic fraction = 0.96 for the auto picks vs 0.66 for the raw manual picks; the auto curves are smoother and more physically plausible than the manual ones.
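Both label-free checks can be sketched without any manual picks, assuming sources and receivers are indexed in one shared point list (which reciprocal pairs require) and that first-break time should be non-decreasing with source-receiver distance within a shot. `reciprocity_errors` and `monotonic_fraction` are hypothetical helpers, not the functions in `pick/verify.py`.

```python
from collections import defaultdict

import numpy as np


def reciprocity_errors(src, rec, t):
    """|t(s->g) - t(g->s)| for every reciprocal pair, each pair counted once."""
    tt = {(int(s), int(g)): float(x) for s, g, x in zip(src, rec, t)}
    return np.array([abs(x - tt[(g, s)])
                     for (s, g), x in tt.items() if s < g and (g, s) in tt])


def monotonic_fraction(src, dist, t):
    """Fraction of consecutive picks (sorted by distance, per shot) that do not decrease."""
    by_shot = defaultdict(list)
    for s, d, x in zip(src, dist, t):
        by_shot[s].append((d, x))
    steps = ok = 0
    for rows in by_shot.values():
        rows.sort()
        dt = np.diff([x for _, x in rows])
        steps += dt.size
        ok += int(np.sum(dt >= 0))
    return ok / steps if steps else float("nan")
```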
See out/fig_traveltimes.png, out/fig_gather_auto.png, out/fig_gather_manual.png.
Repository layout:
- `pick/`: pipeline (reader, picker, ML, gates, moveout, export, render, verify, diagnostics)
- `out/`: exported picks (`.sgt`), QC CSV, figures
- `stacked_segy_groups_of_3/`: 117 source-gather SEG-Y files
- `3D_Picking_1-33.sgt`: manual first-break picks (training labels, pyGIMLi format)
To reproduce:
- Set up the environment: `python -m venv .venv && .venv/bin/pip install -r requirements.txt`
- Cross-validated accuracy: `.venv/bin/python pick/train_cv.py`
- Train the ensemble and pick all 117 gathers, writing `out/picks_full_survey.sgt`: `.venv/bin/python pick/run_all.py`
- Format, reciprocity and moveout QC: `.venv/bin/python pick/verify.py`
- Figures: `.venv/bin/python pick/render.py`

Output for inversion: `out/picks_full_survey.sgt` (for pyGIMLi's `TravelTimeManager`).