Supervised machine learning pipeline for classifying subsurface geological units from 3D seismic reflection data. Trains a CNN and a Random Forest on the same dataset, then compares them on held-out test sections.
Dataset: F3 Netherlands Seismic Benchmark — Alaudah et al. (2019)
Task: 5-class seismic facies classification
Best result: CNN macro F1 = 0.955 (validation), 0.819 (test)
| Metric | CNN | Random Forest |
|---|---|---|
| Accuracy (val) | 96.4% | 91.1% |
| Macro F1 (val) | 0.955 | 0.883 |
| Macro F1 (test) | 0.819 | 0.631 |
Per-class F1 on held-out test section:
| Facies class | CNN | RF | Gap |
|---|---|---|---|
| Upper North Sea | 0.90 | 0.76 | +0.14 |
| Middle North Sea | 0.91 | 0.84 | +0.07 |
| Lower North Sea | 0.74 | 0.57 | +0.16 |
| Rijnland / Chalk | 0.73 | 0.40 | +0.33 |
| Scruff | 0.65 | 0.39 | +0.25 |
The CNN's advantage is largest for structurally defined facies — Rijnland/Chalk (+0.33) and Scruff (+0.25) — where spatial pattern recognition provides information that aggregate statistics cannot.
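Per-class and macro F1 scores of the kind reported above can be derived from a confusion matrix. The following is a minimal NumPy sketch, not the notebooks' own evaluation code; the function name `per_class_f1` and the toy labels are illustrative assumptions.

```python
import numpy as np

def per_class_f1(y_true, y_pred, n_classes):
    """Return an array of F1 scores, one per class (hypothetical helper)."""
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)

    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp   # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp   # belonging to the class, but missed

    # F1 = 2TP / (2TP + FP + FN); classes with no samples and no predictions get 0.
    denom = 2 * tp + fp + fn
    return np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)

# Toy example with three zero-based classes (made-up labels).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
f1 = per_class_f1(y_true, y_pred, n_classes=3)
macro_f1 = f1.mean()  # macro F1 is the unweighted mean over classes
print(f1, macro_f1)
```

Because macro F1 weights every class equally, a rare class such as Scruff influences the score as much as a dominant one.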
Left to right: raw seismic amplitude, expert ground truth, CNN prediction, Random Forest prediction. The CNN produces spatially coherent facies maps close to the ground truth. The RF breaks down in structurally complex zones, producing fragmented predictions where the geology is most challenging.
This project uses the F3 Netherlands Seismic Benchmark prepared by:
Alaudah, Y., Michałowicz, P., Alfarraj, M., & Alregib, G. (2019). A machine-learning benchmark for facies classification. Interpretation, 7(3), SE175–SE187.
Available at: https://github.com/yalaudah/facies_classification_benchmark
The benchmark provides seismic amplitudes and facies labels as clean NumPy arrays, split into training and test sections. The F3 Netherlands survey covers the Dutch North Sea sector and uses the same six-class labelling scheme as the Parihaka benchmark, making results comparable to published work.
Why not raw Parihaka SEGY? Parsing the raw SEG-Y binary format — extracting trace headers, reconstructing 3D geometry, aligning with label files — is substantial domain-specific engineering. This benchmark provides verified, peer-reviewed data in a format that lets the project focus on the ML pipeline rather than format parsing. Using a citable benchmark also makes results directly comparable to published baselines.
| Class | Name | Geological character |
|---|---|---|
| 1 | Upper North Sea | Young marine sediments; smooth, continuous reflectors |
| 2 | Middle North Sea | Intermediate marine clays/sands; moderate amplitude |
| 3 | Lower North Sea | Deeper sands; high amplitude, parallel layering |
| 4 | Rijnland / Chalk | Carbonate group; bright top reflection, dim below |
| 5 | Scruff | Unconformity zone; chaotic, disrupted reflectors |
Class 0 (Unknown) and Class 6 (Zechstein) are excluded from training — class 0 is unlabelled boundary voxels, class 6 is absent from the training volume.
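One way to apply this exclusion is a boolean mask over the label array. The sketch below is illustrative only: the helper names are hypothetical, and remapping classes 1–5 to zero-based targets 0–4 is an assumption made so the labels suit a sparse categorical loss.

```python
import numpy as np

VALID_CLASSES = (1, 2, 3, 4, 5)  # classes 0 and 6 are dropped

def select_labelled(labels):
    """Return (mask, targets): mask marks usable voxels, targets are 0..4."""
    mask = np.isin(labels, VALID_CLASSES)
    # Assumption: shift class ids 1..5 down to 0..4 for training.
    targets = labels[mask].astype(np.int64) - 1
    return mask, targets

def class_fractions(targets, n_classes=5):
    """Fraction of labelled voxels per class, useful for spotting imbalance."""
    counts = np.bincount(targets, minlength=n_classes)
    return counts / counts.sum()

# Toy label slice (made-up values) containing both excluded classes.
labels = np.array([[0, 1, 2],
                   [5, 6, 3]])
mask, targets = select_labelled(labels)
print(mask.sum(), targets, class_fractions(targets))
```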
seismic-facies-classification/
├── data/
│ ├── raw/
│ │ └── facies_classification_benchmark/ ← benchmark data (not tracked)
│ └── processed/ ← extracted patches and features (not tracked)
├── notebooks/
│ ├── 01_eda.ipynb ← data loading and visualisation
│ ├── 02_preprocessing.ipynb ← patch extraction and class balancing
│ ├── 03_cnn.ipynb ← CNN training and evaluation
│ ├── 04_random_forest.ipynb ← Random Forest training and evaluation
│ └── 05_comparison.ipynb ← side-by-side results and facies maps
├── outputs/
│ ├── figures/ ← all saved plots
│ └── models/ ← saved model weights (not tracked)
├── requirements.txt
└── README.md
Each labelled voxel in the training volume becomes one training sample. A 33×33 patch is extracted from the inline slice centred on that voxel, giving the model 16 samples of spatial context in every direction. The time axis is extended with reflection padding so that patches centred on voxels near the volume boundary remain a full 33×33.
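The patch extraction described here can be sketched with `np.pad`. This is a minimal illustration under stated assumptions, not the code from `02_preprocessing.ipynb`: the function name is hypothetical, the inline slice is assumed to be a 2D array of shape `(n_traces, n_time)`, and voxels closer than 16 traces to the lateral edge are simply rejected because only the time axis is padded.

```python
import numpy as np

HALF = 16  # 33 = 2 * 16 + 1, i.e. 16 samples of context on each side

def extract_patch(section, trace_idx, time_idx, half=HALF):
    """Extract a (2*half+1) x (2*half+1) patch centred on one voxel.

    section: 2D inline slice, assumed shape (n_traces, n_time).
    """
    n_traces = section.shape[0]
    if trace_idx < half or trace_idx >= n_traces - half:
        # Assumption: no lateral padding, so edge traces cannot be centred.
        raise ValueError("trace index too close to the lateral edge")

    # Reflection padding along the time axis only.
    padded = np.pad(section, ((0, 0), (half, half)), mode="reflect")
    t = time_idx + half  # centre position in the padded array
    return padded[trace_idx - half:trace_idx + half + 1,
                  t - half:t + half + 1]

# Toy section (made-up amplitudes): a patch centred on the first time sample.
section = np.arange(40 * 50, dtype=float).reshape(40, 50)
patch = extract_patch(section, trace_idx=20, time_idx=0)
print(patch.shape)
```

With `mode="reflect"` the boundary sample itself is not duplicated, so the padded region mirrors the samples just inside the volume.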
Training data is sampled with fixed per-class targets to address the severe class imbalance in the raw volume (Middle North Sea = 48.6%, Scruff = 1.5%). Final training set: 45,000 patches across 5 classes. Class weights are applied during CNN training to further penalise misclassification of rare classes.
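A rough sketch of per-class target sampling and class weighting follows. The actual per-class targets and weighting formula are not stated here, so both are assumptions: the targets in the example are placeholders, rare classes are sampled with replacement when they have fewer voxels than the target, and the weights use the common "balanced" heuristic `n_samples / (n_classes * count)`.

```python
import numpy as np

def sample_per_class(labels, targets, seed=0):
    """Pick indices so each class reaches its target count.

    labels:  1D array of class ids for every candidate voxel.
    targets: dict {class_id: n_samples}; the values are hypothetical.
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for cls, n_target in targets.items():
        idx = np.flatnonzero(labels == cls)
        # Assumption: oversample with replacement if the class is too small.
        replace = idx.size < n_target
        chosen.append(rng.choice(idx, size=n_target, replace=replace))
    return np.concatenate(chosen)

def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = labels.size / (classes.size * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Toy imbalanced label vector (made-up counts).
labels = np.array([1] * 100 + [2] * 10)
idx = sample_per_class(labels, {1: 20, 2: 20})
print(np.bincount(labels[idx]), balanced_class_weights(labels))
```

A dictionary of this form can be passed as `class_weight` to Keras `model.fit`, provided its keys match the integer labels used for training.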
Two convolutional blocks (32 and 64 filters, kernel size 3×3), each followed by batch normalisation, ReLU activation, 2×2 max pooling, and dropout (0.25). A dense head with 128 units and 0.4 dropout feeds a 5-class softmax output. Trained with Adam (lr=0.001), sparse categorical crossentropy loss, and early stopping on validation loss. Best weights restored from epoch 14 of 19.
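The architecture described above could be written in Keras roughly as follows. This is a sketch rather than the notebook's exact model: `padding="same"`, the ReLU activation on the dense layer, the single-channel input shape, and the early-stopping patience of 5 (consistent with the best epoch being 14 of 19, but not stated) are all assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(patch_size=33, n_classes=5):
    """Two conv blocks plus a dense head, following the description above."""
    model = models.Sequential([
        layers.Input(shape=(patch_size, patch_size, 1)),
        # Block 1: 32 filters, 3x3 kernel (padding is an assumption).
        layers.Conv2D(32, (3, 3), padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        # Block 2: 64 filters, 3x3 kernel.
        layers.Conv2D(64, (3, 3), padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        # Dense head: 128 units, 0.4 dropout, 5-class softmax.
        layers.Flatten(),
        layers.Dense(128, activation="relu"),  # activation is an assumption
        layers.Dropout(0.4),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

def make_callbacks(patience=5):
    """Early stopping on validation loss, restoring the best weights."""
    return [tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=patience, restore_best_weights=True)]

model = build_cnn()
model.summary()
```

Training would then call `model.fit(...)` with `callbacks=make_callbacks()` and a `class_weight` dictionary, using zero-based integer labels to match the sparse loss.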
200 decision trees with max depth 20, trained on 18 handcrafted features per patch: amplitude statistics (mean, std, min, max, range, median), RMS energy, skewness, kurtosis, zero-crossing rate, horizontal/vertical gradient energy, quadrant means, and centre-surround contrast. Features are standardised before training. Class weights set to balanced.
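The handcrafted features can be sketched in NumPy as below. The definitions are assumptions, not the notebook's code: in particular the skewness and kurtosis formulas, the axis conventions (axis 0 as horizontal/trace, axis 1 as vertical/time), and the size of the centre window are guesses. The features listed above add up to 17 in this sketch; the 18th is not specified and is left out rather than invented.

```python
import numpy as np

def patch_features(patch):
    """Summary statistics for one 2D patch (illustrative definitions)."""
    p = np.asarray(patch, dtype=np.float64)
    flat = p.ravel()
    eps = 1e-12  # guards against division by zero on constant patches

    mean, std = flat.mean(), flat.std()
    lo, hi = flat.min(), flat.max()
    centred = flat - mean
    skew = np.mean(centred ** 3) / (std ** 3 + eps)
    kurt = np.mean(centred ** 4) / (std ** 4 + eps) - 3.0  # excess kurtosis
    rms = np.sqrt(np.mean(flat ** 2))

    # Zero-crossing rate along axis 1 (assumed to be time).
    signs = np.sign(p)
    zcr = np.mean(signs[:, 1:] * signs[:, :-1] < 0)

    # Gradient energy along each axis.
    g_axis0, g_axis1 = np.gradient(p)
    grad_h, grad_v = np.mean(g_axis0 ** 2), np.mean(g_axis1 ** 2)

    # Quadrant means.
    h, w = p.shape
    ch, cw = h // 2, w // 2
    quads = [p[:ch, :cw].mean(), p[:ch, cw:].mean(),
             p[ch:, :cw].mean(), p[ch:, cw:].mean()]

    # Centre-surround contrast: centre window mean minus surround mean.
    q = h // 4
    centre = p[ch - q:ch + q + 1, cw - q:cw + q + 1]
    surround_mean = (flat.sum() - centre.sum()) / (flat.size - centre.size)
    contrast = centre.mean() - surround_mean

    return np.array([mean, std, lo, hi, hi - lo, np.median(flat), rms,
                     skew, kurt, zcr, grad_h, grad_v, *quads, contrast])

# Toy patch of random amplitudes (made-up data).
rng = np.random.default_rng(0)
print(patch_features(rng.normal(size=(33, 33))).shape)
```

Feature vectors like these would be standardised and then passed to a classifier such as scikit-learn's `RandomForestClassifier(n_estimators=200, max_depth=20, class_weight="balanced")`.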
The comparison is the scientific contribution. The CNN sees raw spatial patterns: it learns what a Rijnland/Chalk reflector looks like geometrically. The RF sees 18 summary statistics: it learns that Rijnland/Chalk has moderate mean amplitude and high RMS energy. The performance gap quantifies how much spatial pattern recognition matters for each facies class. For Upper and Middle North Sea the gap is small (0.07–0.14 in F1); for Rijnland/Chalk and Scruff it is large (0.25–0.33 in F1), because those classes are defined by spatial geometry rather than bulk amplitude properties.
cd data/raw
git clone https://github.com/yalaudah/facies_classification_benchmark

conda create -n seismic-facies python=3.10 -y
conda activate seismic-facies
pip install -r requirements.txt
python -m ipykernel install --user --name seismic-facies --display-name "Seismic Facies"

01_eda.ipynb ← explore and visualise the data
02_preprocessing.ipynb ← extract patches (takes ~10 minutes)
03_cnn.ipynb ← train CNN (takes ~30–60 minutes on CPU)
04_random_forest.ipynb ← train RF (takes ~5 minutes)
05_comparison.ipynb ← generate facies maps and comparison figures
All training was done on CPU. CNN training took approximately 2–3 minutes per epoch (19 epochs total). If you have a CUDA-enabled GPU, replace tensorflow in requirements.txt with tensorflow[and-cuda] and expect 5–10× speedup.
| Figure | Description |
|---|---|
| outputs/figures/facies_map_test1.png | Main comparison: seismic, ground truth, CNN, RF |
| outputs/figures/cnn_vs_rf_per_class_f1.png | Per-class F1 bar chart |
| outputs/figures/cnn_training_history.png | Loss and accuracy curves |
| outputs/figures/cnn_confusion_matrix.png | CNN normalised confusion matrix |
| outputs/figures/rf_confusion_matrix.png | RF normalised confusion matrix |
| outputs/figures/rf_feature_importance.png | Which features matter most for the RF |
Alaudah, Y., Michałowicz, P., Alfarraj, M., & Alregib, G. (2019). A machine-learning benchmark for facies classification. Interpretation, 7(3), SE175–SE187. https://doi.org/10.1190/INT-2018-0249.1
Author: Olasunkanmi
Task: Seismic facies classification — CNN vs Random Forest
Dataset: F3 Netherlands Seismic Benchmark (Alaudah et al., 2019)
