EchoFlow

Do you have terabytes of unprocessed sonar data? Use EchoFlow for easy viewing and inference!

EchoFlow is a three-stage containerised pipeline that converts raw Kongsberg EK80 echosounder files into echograms and DINO ViT attention maps.

Stages:

Conversion (raw) — decodes .raw pings to volume-backscattering strength NetCDF files via pyEcholab.
Pre-processing (preprocessing) — contrast-stretches and tiles echograms as PNGs.
Inference (infer) — runs a DINO Vision Transformer to produce per-patch attention heat-maps.
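
As a rough illustration of what the pre-processing stage describes, the sketch below contrast-stretches a 2-D backscatter array by percentile clipping and cuts it into fixed-width tiles. It is not EchoFlow's actual code: the function names, the 2nd/98th percentile limits, the (depth, pings) array layout and the tile width are all assumptions made for the example, and the input is synthetic.

```python
import numpy as np


def contrast_stretch(sv, low_pct=2.0, high_pct=98.0):
    """Clip to the given percentiles and rescale to 0..255 (uint8)."""
    lo, hi = np.nanpercentile(sv, [low_pct, high_pct])
    if hi <= lo:  # flat image: avoid division by zero
        return np.zeros(sv.shape, dtype=np.uint8)
    scaled = (np.clip(sv, lo, hi) - lo) / (hi - lo)
    # NaN samples (if any) are mapped to 0 before the integer cast.
    return (np.nan_to_num(scaled) * 255).astype(np.uint8)


def tile_columns(image, tile_width):
    """Split an image of shape (depth, pings) into tiles of `tile_width` pings.

    A trailing remainder narrower than `tile_width` is dropped.
    """
    n_tiles = image.shape[1] // tile_width
    return [image[:, i * tile_width:(i + 1) * tile_width] for i in range(n_tiles)]


# Synthetic stand-in for an Sv array in dB (hypothetical values).
rng = np.random.default_rng(0)
sv = rng.uniform(-90.0, -30.0, size=(64, 250))
stretched = contrast_stretch(sv)
tiles = tile_columns(stretched, tile_width=100)
```

Percentile clipping is used here because a few very strong echoes would otherwise compress the rest of the dynamic range; the real stage may use a different stretch.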

Features

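Conversion: Decodes EK80 .raw files into volume-backscattering strength NetCDF files with pyEcholab.
Preprocessing: Contrast-stretches and tiles echograms into PNGs ready for inference.
Inference: Utilises a DINO Vision Transformer for visual inference and attention inspection.
Docker Support: Run the pipeline in an isolated Docker environment for consistency and ease of use.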
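
To make "attention inspection" more concrete: a ViT splits the image into square patches, and the attention that the [CLS] token pays to each patch can be reshaped into a coarse grid and upsampled to image size. The sketch below does this with NumPy on random data. The array layout (heads, tokens, tokens) with [CLS] at index 0 and the function name are assumptions; this has not been checked against the scripts in inference/.

```python
import numpy as np


def cls_attention_maps(attn, image_hw, patch_size=8):
    """Turn a self-attention array into one heat-map per head.

    attn: array of shape (heads, 1 + n_patches, 1 + n_patches), where
          token 0 is assumed to be the [CLS] token.
    image_hw: (height, width) of the input image, multiples of patch_size.
    """
    height, width = image_hw
    grid_h, grid_w = height // patch_size, width // patch_size
    n_heads = attn.shape[0]
    if attn.shape[1] != 1 + grid_h * grid_w:
        raise ValueError("token count does not match image and patch size")
    # Attention paid by [CLS] to each patch token (skip [CLS] itself).
    cls_to_patches = attn[:, 0, 1:].reshape(n_heads, grid_h, grid_w)
    # Nearest-neighbour upsampling back to pixel resolution.
    block = np.ones((1, patch_size, patch_size), dtype=attn.dtype)
    return np.kron(cls_to_patches, block)


# Random, row-normalised attention for a 32x48 image and 8x8 patches.
rng = np.random.default_rng(0)
n_tokens = 1 + (32 // 8) * (48 // 8)
attn = rng.random((6, n_tokens, n_tokens))
attn /= attn.sum(axis=-1, keepdims=True)
maps = cls_attention_maps(attn, image_hw=(32, 48), patch_size=8)
```

Each of the six resulting maps has the same height and width as the input image, so it can be overlaid on the echogram tile it was computed from.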

Repository Structure

.
├── raw_consumer/                 # process raw to xarray
│   ├── Dockerfile.raw            # Dockerfile for volumetric backscatter computing with pyEcholab
│   └── preprocessing.py          # Script for converting raw file to volumetric backscatter cubes
├── preprocessing/                # Preprocessing components
│   ├── Dockerfile.preprocessing  # Dockerfile for preprocessing
│   └── preprocessing.py          # Script for preprocessing input data
├── inference/                    # Inference components
│   ├── Dockerfile.infer          # Dockerfile for inference
│   ├── attention_inspect.py      # Script for inspecting attention maps
│   ├── inspect_attention.py      # Main script for running inference and inspection
│   ├── requirements.txt          # Python dependencies for inference demo
│   ├── utils.py                  # Utility functions for inference demo
│   └── vision_transformer.py     # DINO Vision Transformer model implementation
├── docker-compose.yml            # Docker Compose file to run the entire pipeline
├── entrypoint.sh                 # Entrypoint script for Docker container
├── infer.py                      # Main script to run inference outside Docker
├── run_docker.sh                 # Script to run the pipeline using Docker
└── watchdog.py                   # Script to watch for changes in the pipeline

Installation

Prerequisites

Docker ≥ 24
Docker Compose v2 (docker compose — note: no hyphen)
Git
AWS CLI (for downloading the sample input file)

Clone with submodules

git clone --recurse-submodules https://github.com/erlingdevold/EchoFlow.git

If you have already cloned without submodules:

git submodule update --init --recursive

Without Docker

Each stage has its own requirements.txt. Install the dependencies for the stages you need:

pip install -r raw_consumer/requirements.txt
pip install -r preprocessing/requirements.txt
pip install -r inference/requirements.txt

Populate input

The commands below fetch a publicly available NOAA EK80 test file (~105 MB) into data/input/, create an empty placeholder file at inference/checkpoint.pth, and synchronise the git submodule URLs before the Docker images are built.

aws s3 cp --no-sign-request \
  "s3://noaa-wcsd-pds/data/raw/Bell_M._Shimada/SH2306/EK80/Hake-D20230811-T165727.raw" \
  data/input/

touch ./inference/checkpoint.pth

git submodule sync --recursive

Setup

Running with Docker

Run the full pipeline

docker compose up --build

This builds and starts all three stages (Conversion, Pre-processing, Inference) plus the progress monitor.

Run individual stages

You can also run each stage independently:

Stage	Command
Stage 1 — Conversion	`docker compose up --build raw`
Stage 2 — Pre-processing	`docker compose up --build preprocessing`
Stage 3 — Inference	`docker compose up --build infer`
All stages (no monitor)	`docker compose up --build raw preprocessing infer`

Each stage reads from and writes to bind-mounted directories under ./data/, so stages can be run in sequence without rebuilding upstream containers.
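
If you prefer to drive the stages from a script, the commands in the table above can be wrapped in a small helper. This is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes each service exits once its work is done. If a stage keeps watching its input directory, `subprocess.run` will block on it and the next stage will not start.

```python
import subprocess

# Compose service names as listed in the table above.
STAGES = ("raw", "preprocessing", "infer")


def compose_command(stage, build=True):
    """Return the `docker compose` argument list for one stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    cmd = ["docker", "compose", "up"]
    if build:
        cmd.append("--build")
    cmd.append(stage)
    return cmd


def run_stages(stages=STAGES, dry_run=False):
    """Run stages one after another; stop at the first failure."""
    for stage in stages:
        cmd = compose_command(stage)
        print("running:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
```

Calling `run_stages(dry_run=True)` only prints the commands, which is a quick way to check the sequence before starting any containers.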

Monitor dashboard

Once the pipeline is running, a progress monitor is available at http://localhost:8050.

The dashboard is a pipeline progress monitor — it tracks file counts and tail-logs for each stage. It is not an inference viewer. Actual outputs are written to:

data/raw_consumer/ — converted NetCDF files (Stage 1 output)
data/preprocessing/ — echogram PNGs (Stage 2 output)
data/inference/ — attention map PNGs (Stage 3 output)
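
For a quick check without the dashboard, the output directories listed above can be counted directly. The sketch below is an illustration only and is not the monitor's implementation; the `.nc` suffix for NetCDF files and the helper name are assumptions.

```python
from pathlib import Path

# Output directory and file suffix per stage, following the list above.
# The ".nc" suffix for NetCDF output is an assumption.
STAGE_OUTPUTS = {
    "raw": ("raw_consumer", ".nc"),
    "preprocessing": ("preprocessing", ".png"),
    "infer": ("inference", ".png"),
}


def count_outputs(data_root="data"):
    """Count output files per stage under `data_root`."""
    root = Path(data_root)
    counts = {}
    for stage, (subdir, suffix) in STAGE_OUTPUTS.items():
        stage_dir = root / subdir
        if not stage_dir.is_dir():
            counts[stage] = 0  # stage has not produced anything yet
            continue
        # rglob also finds files inside per-input subdirectories.
        counts[stage] = sum(
            1 for p in stage_dir.rglob(f"*{suffix}") if p.is_file()
        )
    return counts
```

Running `count_outputs()` from the repository root returns a dictionary with one count per stage.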

Performance / parallelism

EchoFlow uses process pools within each stage to parallelise work across .raw files. Because EK80 .raw files are large binary datagram files, file I/O can become the primary bottleneck; adding CPU cores within a node increases throughput only until storage bandwidth is saturated. This is the extent of the "HPC" scaling: parallelism is intra-node only and there is no built-in cluster or scheduler integration.
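
The per-file process-pool pattern described above can be sketched with the standard library. This is an assumption about the general shape, not EchoFlow's code: `file_size` is a placeholder for the real per-file conversion, and `process_files` is a hypothetical name.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from pathlib import Path


def file_size(path):
    """Placeholder for per-file work; a real worker would convert the file."""
    return Path(path).name, os.path.getsize(path)


def process_files(input_dir, worker=file_size, max_workers=None,
                  executor_cls=ProcessPoolExecutor):
    """Apply `worker` to every .raw file in `input_dir`, one task per file."""
    paths = sorted(str(p) for p in Path(input_dir).glob("*.raw"))
    with executor_cls(max_workers=max_workers) as pool:
        # map() returns results in the same order as `paths`.
        return list(pool.map(worker, paths))
```

With `max_workers=None` the pool size follows the machine's CPU count. The `executor_cls` parameter lets you swap in `ThreadPoolExecutor` for a quick local check. On platforms that start worker processes with spawn, call `process_files` from under an `if __name__ == "__main__":` guard.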

ENV variables

watchdog.py:
- LOG_DIR (default: "/data/log")
- INPUT_DIR (default: "/data/sonar")
- OUTPUT_DIR (default: "/data/processed")
inspect_attention.py:
- INPUT_DIR (default: "/data/test_imgs")
- OUTPUT_DIR (default: "/data/inference")
- LOG_DIR (default: ".")
- PATCH_SZ (default: 8)
- ARCH (default: 'vit_small')
- DOWNSAMPLE_SIZE (default: 5000)
preprocessing.py:
- INPUT_DIR (default: "/data/processed")
- OUTPUT_DIR (default: "/data/test_imgs")
- LOG_DIR (default: ".")
raw.py:
- INPUT_DIR (default: "/data/sonar")
- OUTPUT_DIR (default: "/data/processed")
- LOG_DIR (default: "log")
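
A minimal sketch of how such variables are typically read, using the inspect_attention.py defaults listed above. The helper name and the choice to parse PATCH_SZ and DOWNSAMPLE_SIZE as integers are assumptions; the scripts themselves may read the environment differently.

```python
import os

# Defaults copied from the inspect_attention.py list above.
INFER_DEFAULTS = {
    "INPUT_DIR": "/data/test_imgs",
    "OUTPUT_DIR": "/data/inference",
    "LOG_DIR": ".",
    "PATCH_SZ": "8",
    "ARCH": "vit_small",
    "DOWNSAMPLE_SIZE": "5000",
}
# Keys parsed as integers (an assumption for this example).
INT_KEYS = {"PATCH_SZ", "DOWNSAMPLE_SIZE"}


def load_config(defaults=INFER_DEFAULTS, environ=None):
    """Read settings from the environment, falling back to the defaults."""
    environ = os.environ if environ is None else environ
    config = {}
    for key, default in defaults.items():
        value = environ.get(key, default)
        config[key] = int(value) if key in INT_KEYS else value
    return config
```

For example, `load_config(environ={"PATCH_SZ": "16"})` overrides the patch size and keeps every other default.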

Output

The output of the inference step, including generated attention maps and transformed images, will be saved in data/inference/. Each run creates a subdirectory named after the input file for organised output management.

Contributing

See CONTRIBUTING.md for guidelines on reporting bugs, suggesting features, and submitting pull requests.

License

Licensed under the MIT License — see LICENSE for details.

Acknowledgements

This pipeline uses the DINO Vision Transformer for attention-based image analysis. The implementation is based on research from the original DINO paper by Facebook AI Research (FAIR).
