Hybrid Multi-Scale Feature-Fusion CNN with Ridge Stacking for Shear-Wave Velocity Prediction from Conventional Well Logs
A reproducible MATLAB pipeline for shear-wave velocity ($V_s$) prediction from conventional well logs.
This repository accompanies the paper:
Wibowo, R.C., Handoyo, Kumalasari, I.N., Rambe, S.A., Winardhy, I.S., Amijaya, H., Normansyah, Sarkowi, M. (2026). Hybrid Multi-Scale Feature-Fusion Convolutional Network with Ridge Stacking for Shear-Wave Velocity Prediction from Conventional Well Logs: An Out-of-Distribution-Aware Framework. Artificial Intelligence in Geosciences. DOI: [pending].
- Overview
- Key results
- How to cite
- Repository structure
- Requirements
- Installation
- Quick start
- Configuration
- Reproducing paper results
- Data availability
- Output structure
- Documentation
- Testing
- License
- Contact
- Acknowledgments
Reliable shear-wave velocity logging supports reservoir characterization, AVO inversion, and geomechanical caprock assessment, all essential workflows for CCS/CCUS prospect evaluation in mature hydrocarbon basins. However, dipole sonic measurement is expensive and absent from most legacy onshore wells. This pipeline addresses that gap by predicting $V_s$ from conventional logs with machine learning, while embedding uncertainty quantification and out-of-distribution (OOD) safeguards that make cross-formation deployment auditable.
The pipeline implements:
- Data ingestion & preprocessing — LAS / Excel reader, IQR + Z-score outlier detection, kNN imputation, Savitzky-Golay denoising, z-score normalization, depth resampling
- Feature selection — mRMR (with target binning for regression), LASSO with 10-fold CV, SHAP attribution, and four-scenario ablation
- Base learners — Probabilistic Neural Network (PNN), Multi-Layer Feedforward Neural Network (MLFFNN), Deep Feedforward Neural Network (DFFNN), 1D Convolutional Neural Network (CNN1D)
- Meta-learners — I-CNN stacker (baseline), Ridge stacker (deployment model), Hybrid Multi-Scale Feature-Fusion CNN (architectural novelty)
- Hyperparameter optimization — Bayesian Optimization with Gaussian Process surrogate, 5-fold cross-validation, validation-loss early stopping
- Safety layer — Monte Carlo Dropout uncertainty quantification (T = 200), OOD detection (|z| > 3), physical clipping ([0.2, 4.5] km/s)
- Deployment & indirect validation — Vp–Vs crossplot vs Castagna & Greenberg-Castagna, geomechanical post-processing (ν, G, K)
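As a rough illustration of the last two items, the sketch below clips predicted $V_s$ to the [0.2, 4.5] km/s envelope and then derives the isotropic elastic moduli. It is a minimal sketch, not the code in physicalClipping.m or computeGeomechanics.m: the input values are invented, and the use of bulk density (RHOB) together with Vp is an assumption based on the standard isotropic relations.

```matlab
% Illustrative inputs (not field data): Vp, Vs in km/s, bulk density in g/cm^3
vp   = [3.0; 3.5; 6.5];
vs   = [0.1; 1.9; 5.0];      % raw predictions, two of them outside the envelope
rhob = [2.3; 2.4; 2.5];

% Physical clipping to the [0.2, 4.5] km/s envelope
vsClipped = min(max(vs, 0.2), 4.5);

% Standard isotropic relations; g/cm^3 * (km/s)^2 gives GPa directly
G  = rhob .* vsClipped.^2;                                    % shear modulus
K  = rhob .* (vp.^2 - (4/3) .* vsClipped.^2);                 % bulk modulus
nu = (vp.^2 - 2*vsClipped.^2) ./ (2*(vp.^2 - vsClipped.^2));  % Poisson's ratio

disp(table(vsClipped, G, K, nu));
```

Clipping alone does not guarantee a physical Vp/Vs ratio, so a Poisson's ratio outside (0, 0.5) is still worth checking after this step.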
Applied to a well pair in the North East Java Basin (Indonesia) — training well BT-4 (Late Oligocene Kujung carbonate, 1415–1993 m) and blind well BTS-1 (Pliocene–Pleistocene clastic overburden, 35–629 m):
| Model | $R^2$ | Domain | Note |
|---|---|---|---|
| Ridge stacker | 0.886 | z-score | ⭐ Deployment model |
| PNN | 0.879 | z-score | Best base learner |
| MLFFNN | 0.869 | z-score | |
| DFFNN | 0.816 | z-score | |
| I-CNN stacker | 0.726 | z-score | Baseline deep stacking |
| Hybrid I-CNN | 0.544 | z-score | Proposed novelty (negative result) |
| CNN1D | −0.066 | z-score | Catastrophic on small data |
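For readers unfamiliar with linear stacking, the sketch below shows how a ridge meta-learner can combine base-learner predictions in closed form. It is a minimal sketch on synthetic numbers, not the code in trainRidgeStacker.m: the three noisy "base learners", the single `lambda` value, and the unpenalized intercept are all assumptions made for illustration.

```matlab
rng(7);                                  % fixed seed for repeatability
n = 200;
y = randn(n, 1);                         % synthetic z-scored target

% Hypothetical out-of-fold predictions from three base learners
Z = [y + 0.3*randn(n,1), y + 0.4*randn(n,1), y + 0.6*randn(n,1)];

lambda = 1e-2;                           % one candidate from a grid such as logspace(-6, 2, 25)
A = [ones(n,1), Z];                      % prepend an intercept column
P = eye(size(A,2));
P(1,1) = 0;                              % leave the intercept unpenalized
w = (A'*A + lambda*P) \ (A'*y);          % closed-form ridge solution

yHat = A*w;
r2 = 1 - sum((y - yHat).^2) / sum((y - mean(y)).^2);
fprintf('Stacking weights: %s, R^2 = %.3f\n', mat2str(w(2:end)', 3), r2);
```

In practice `lambda` would be chosen by cross-validation over the grid rather than fixed, and the score would be reported on held-out data rather than on the fitting set as done here.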
Three principal findings:
- Linear stacking beats deep meta-learning on small tabular data: the Ridge stacker ($R^2 = 0.886$) outperforms both deep meta-learner variants (I-CNN, $R^2 = 0.726$; Hybrid I-CNN, $R^2 = 0.544$). This is consistent with Grinsztajn et al., 2022.
- mRMR-only selection outperforms the mRMR ∩ LASSO intersection ($R^2 = 0.739$ vs $0.727$ in ablation). The intersection rule discards gamma-ray, which carries lithology-discriminative information not subsumed by other logs.
- Statistical OOD detection is necessary but not sufficient for cross-formation transfer. 52.8% of BTS-1 samples were flagged as OOD; even within the non-OOD samples, predicted $V_s$ shows a +44% systematic upward bias relative to the Castagna baseline, caused by rock-physics regime mismatch (a carbonate-trained model applied to clastics). This identifies a structural failure mode of conventional OOD safeguards.
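The third finding combines two checks that can be sketched in a few lines: a per-sample OOD flag based on training-set z-scores, and a bias estimate against the Castagna mudrock line ($V_s = 0.8621\,V_p - 1.1724$, km/s). This is a minimal sketch with invented numbers. The "any feature exceeds |z| > 3" rule and the choice of the mudrock line as the baseline are assumptions, and detectOOD.m and castagnaCheck.m may differ.

```matlab
% Illustrative numbers only (not field data)
muTrain = [60, 2.45, 4.2];               % hypothetical training means: GR, RHOB, VP
sdTrain = [15, 0.08, 0.5];               % hypothetical training standard deviations
Xblind  = [55, 2.40, 4.0;                % in-distribution sample
           70, 2.50, 3.6;                % in-distribution sample
           95, 2.05, 2.1];               % sample far from the training regime
vsPred  = [2.3; 2.1; 1.4];               % hypothetical predicted Vs, km/s

z     = (Xblind - muTrain) ./ sdTrain;   % z-score against training statistics
isOOD = any(abs(z) > 3, 2);              % flag a sample if any feature exceeds |z| > 3

vp         = Xblind(:, 3);
vsCastagna = 0.8621*vp - 1.1724;         % mudrock line (Castagna et al., 1985)
biasPct    = 100 * (vsPred - vsCastagna) ./ vsCastagna;

fprintf('OOD fraction: %.1f%%\n', 100*mean(isOOD));
fprintf('Mean bias on non-OOD samples: %+.1f%%\n', mean(biasPct(~isOOD)));
```

A sample can pass the z-score test and still carry a large bias, which is the failure mode the finding describes: the flag tests feature ranges, not whether the rock-physics relation learned in training still holds.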
If you use this code in your research, please cite both the paper and the software:
@article{Wibowo2026Vs,
author = {Wibowo, Rahmat Catur and Handoyo and Kumalasari, Isti Nur and
Rambe, Sibgha Alfirdausi and Winardhy, Ignatius Sonny and Amijaya,
Hendra and Normansyah and Sarkowi, Muh},
title = {Hybrid Multi-Scale Feature-Fusion Convolutional Network with
Ridge Stacking for Shear-Wave Velocity Prediction from
Conventional Well Logs: An Out-of-Distribution-Aware Framework},
journal = {Artificial Intelligence in Geosciences},
year = {2026},
volume = {?},
pages = {???--???},
doi = {10.xxxx/xxxxx},
publisher = {KeAi / Elsevier}
}

@software{VsPredictionStacking2026,
author = {Wibowo, Rahmat Catur and Handoyo and Kumalasari, Isti Nur and
Rambe, Sibgha Alfirdausi and Winardhy, Ignatius Sonny and
Amijaya, Hendra and Normansyah and Sarkowi, Muh},
title = {{VsPrediction-Stacking: MATLAB pipeline for Vs prediction with
stacking ensemble and OOD safeguards}},
year = {2026},
publisher = {GitHub},
url = {https://github.com/rcw3712/VsPrediction-Stacking},
version = {1.0.0}
}

A CITATION.cff file is included for GitHub's automatic citation widget.
VsPrediction-Stacking/
│
├── README.md ← this file
├── LICENSE ← MIT license
├── CITATION.cff ← citation metadata (GitHub auto-citation)
├── .gitignore ← MATLAB-aware ignore rules
│
├── main_pipeline.m ← end-to-end driver script
│
├── config/
│ └── default_config.m ← all hyperparameters + paths + flags
│
├── data/
│ ├── README.md ← data confidentiality note
│ ├── BT-4_template.csv ← column structure template (empty)
│ ├── BTS-1_template.csv ← column structure template (empty)
│ └── generate_synthetic_wells.m ← synthetic data with BT-4/BTS-1-like statistics
│
├── src/
│ ├── preprocessing/ ← Section 2.2 of paper
│ │ ├── preprocessLogs.m
│ │ ├── detectOutliers.m
│ │ ├── imputeMissing.m
│ │ ├── denoiseSavGol.m
│ │ ├── normalizeZScore.m
│ │ └── resampleDepth.m
│ │
│ ├── features/ ← Section 2.3
│ │ ├── computeMRMR.m
│ │ ├── runLASSO.m
│ │ ├── computeSHAP.m
│ │ └── selectFeatures.m
│ │
│ ├── models/ ← Sections 2.4 & 2.5
│ │ ├── trainPNN.m ← Probabilistic Neural Network
│ │ ├── trainMLFFNN.m ← Multi-Layer FFNN (2 hidden layers)
│ │ ├── trainDFFNN.m ← Deep FFNN (3 hidden layers)
│ │ ├── trainCNN1D.m ← 1D Convolutional Neural Network
│ │ ├── trainICNN.m ← I-CNN stacker meta-learner
│ │ ├── trainRidgeStacker.m ← ★ Ridge stacker (deployment)
│ │ └── trainHybridICNN.m ← Hybrid Multi-Scale Feature-Fusion CNN
│ │
│ ├── evaluation/ ← Section 2.8
│ │ ├── computeMetrics.m ← R², RMSE, MAE, MAPE
│ │ ├── crossPlot.m
│ │ ├── residualAnalysis.m
│ │ ├── learningCurves.m
│ │ └── taylorDiagram.m
│ │
│ ├── uncertainty/ ← Section 2.7
│ │ ├── mcDropoutPredict.m ← T = 200 stochastic passes
│ │ ├── detectOOD.m ← |z| > 3 criterion
│ │ ├── physicalClipping.m ← [0.2, 4.5] km/s envelope
│ │ └── computeGeomechanics.m ← ν, G, K from Vs
│ │
│ ├── ablation/ ← Section 3.3
│ │ └── runAblation.m ← S1 / S2 / S3 / S4 scenarios
│ │
│ └── deployment/ ← Section 3.5
│ ├── deployToBT1.m ← inference + UQ + OOD on blind well
│ ├── castagnaCheck.m ← Vp–Vs empirical overlay
│ └── plotDeployment.m ← multitrack figure
│
├── docs/
│ ├── METHODOLOGY.md ← detailed equations + design choices
│ ├── HYPERPARAMETERS.md ← full Bayesian search ranges
│ └── figures/
│ ├── Fig1_workflow.png ← pipeline workflow (paper Fig 1)
│ └── Fig2_architecture.png ← Hybrid I-CNN architecture (paper Fig 2)
│
├── results/ ← generated by main_pipeline
│ ├── .gitkeep
│ ├── preprocessing/
│ ├── feature_selection/
│ ├── base_models/
│ ├── meta_models/
│ ├── uncertainty/
│ ├── ablation/
│ ├── deployment/
│ ├── figures/ ← all PNG/PDF outputs
│ ├── tables/ ← XLSX summary tables
│ └── logs/ ← timestamped run logs
│
└── tests/
├── runTests.m ← top-level test runner
├── testPreprocessing.m
├── testModels.m
├── testUncertainty.m
└── testDeployment.m
- MATLAB R2024b or later (older releases untested; some neural-net syntax requires R2024a+)
- A standards-compliant git client for cloning
The following toolboxes are required:
| Toolbox | Used for |
|---|---|
| Statistics and Machine Learning Toolbox | mRMR, LASSO, k-NN imputation, cross-validation |
| Deep Learning Toolbox | CNN1D, I-CNN, Hybrid I-CNN, MC-Dropout |
| Optimization Toolbox | Hyperparameter optimization (note that bayesopt itself ships with the Statistics and Machine Learning Toolbox) |
| Signal Processing Toolbox | Savitzky-Golay filter, depth resampling |
| Parallel Computing Toolbox (optional) | Parallel hyperparameter search and ablation |
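The required toolboxes in the table can also be checked programmatically. The snippet below is a small sketch using the built-in `ver` function; the product names are taken from the table above and are assumed to match the names MATLAB reports on your installation.

```matlab
required = {'Statistics and Machine Learning Toolbox', ...
            'Deep Learning Toolbox', ...
            'Optimization Toolbox', ...
            'Signal Processing Toolbox'};

v = ver;                                    % struct array of installed products
installed = ismember(required, {v.Name});   % one logical flag per required toolbox

for i = 1:numel(required)
    fprintf('%-45s %s\n', required{i}, mat2str(installed(i)));
end
```

This reports what is installed, which is not the same as what is licensed; a toolbox can be present on disk without a valid license checkout.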
To verify available toolboxes in MATLAB:
ver

Tested on:
- ✅ macOS 14 (Sonoma) — Apple Silicon
- ✅ Windows 11 — x86_64
- ✅ Ubuntu 22.04 LTS
GPU is optional; pipeline runs on CPU in approximately 45 minutes for the full BT-4 dataset (3530 samples, 5 features).
git clone https://github.com/rcw3712/VsPrediction-Stacking.git
cd VsPrediction-Stacking

In MATLAB:
cd /path/to/VsPrediction-Stacking
addpath(genpath('src'));
addpath('config');
savepath(); % optional, to persist the path

Or add this to your startup.m:
addpath(genpath(fullfile(userpath, 'VsPrediction-Stacking', 'src')));

runTests % runs all unit tests; expect "All tests passed."

The pipeline is driven by a single configuration struct. A minimal working example:
% 1. Load default configuration
cfg = default_config();
% 2. Point to your well-log files
cfg.training.dataPath = 'data/your_training_well.csv';
cfg.deployment.dataPath = 'data/your_blind_well.csv';
% 3. Run the full pipeline (preprocessing → models → deployment)
results = main_pipeline(cfg);
% 4. Inspect key results
disp(results.meta.ridge.metrics)
% R²: 0.886
% RMSE: 0.342 (z-score) | 0.178 (km/s)
% MAE: 0.199 (z-score) | 0.104 (km/s)
% MAPE: 115
% 5. Deployment-well predictions
fprintf('OOD-flagged samples: %.1f%%\n', results.deployment.ood_pct);
fprintf('Mean predicted Vs (non-OOD): %.2f km/s\n', ...
        results.deployment.mean_vs_nonOOD);

After the run finishes, all outputs are saved under results/ (see Output structure).
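The four metrics printed in the quick start can be sketched as follows. This is a minimal sketch with invented values, not the code in computeMetrics.m. It also shows one plausible reading of the large MAPE in the example output: on z-scored targets some true values sit close to zero, so the percentage error is dominated by those samples.

```matlab
y    = [-1.2; -0.1; 0.05; 0.9; 1.4];    % z-scored targets (illustrative)
yHat = [-1.0; -0.3; 0.20; 0.8; 1.1];    % predictions (illustrative)

res  = y - yHat;
r2   = 1 - sum(res.^2) / sum((y - mean(y)).^2);
rmse = sqrt(mean(res.^2));
mae  = mean(abs(res));
mape = 100 * mean(abs(res ./ y));       % inflates when y is close to zero

fprintf('R2 = %.3f, RMSE = %.3f, MAE = %.3f, MAPE = %.1f%%\n', r2, rmse, mae, mape);
```

For that reason MAPE is usually more informative in physical units (km/s), where the target stays well away from zero.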
All settings live in config/default_config.m. Key fields:
function cfg = default_config()
% ─── Data ──────────────────────────────────────────────
cfg.training.dataPath = 'data/BT-4_logs.csv';
cfg.deployment.dataPath = 'data/BTS-1_logs.csv';
cfg.targetColumn = 'Vs';
cfg.featureColumns = {'GR', 'RHOB', 'NPHI', 'PHIE', 'VP'};
% ─── Preprocessing ────────────────────────────────────
cfg.preproc.outlierIQR = 1.5;
cfg.preproc.outlierZ = 3.0;
cfg.preproc.imputerK = 5; % kNN imputation
cfg.preproc.savgolOrder = 2;
cfg.preproc.savgolWindow = 11;
cfg.preproc.resampleDepth = 0.15; % meters
% ─── Feature selection ───────────────────────────────
cfg.fs.combineRule = 'mrmr_only'; % 'intersect' | 'mrmr_only' | 'lasso_only' | 'none'
% ─── Train/test split ────────────────────────────────
cfg.split.testFraction = 0.20;
cfg.split.kfold = 5;
cfg.split.fixedSeed = 7;
% ─── Base learners ──────────────────────────────────
cfg.base.PNN.spreadRange = [0.05, 1.5];
cfg.base.MLFFNN.hiddenRange = {[16 128], [16 128]};
cfg.base.DFFNN.hiddenRange = {[8 64], [8 64], [8 64]};
cfg.base.CNN1D.windowSize = 16;
cfg.base.CNN1D.fixedSeed = 7;
% ─── Meta-learners ──────────────────────────────────
cfg.meta.runRidge = true;
cfg.meta.runICNN = true;
cfg.meta.runHybridICNN = true;
cfg.meta.deploymentChoice = 'ridge'; % 'ridge' | 'icnn' | 'hybrid'
cfg.meta.ridge.lambdaGrid = logspace(-6, 2, 25);
% ─── Hybrid I-CNN architecture ─────────────────────
cfg.icnn.windowSize = 16;
cfg.icnn.kernelsMulti = [3 5 7]; % multi-scale branches
cfg.icnn.fusion = 'concat'; % 'concat' | 'attention'
% ─── Hyperparameter optimization ──────────────────
cfg.opt.maxObjEvals = 30;
cfg.opt.useGridSearch = true;
cfg.opt.maxTimeSec = 3600;
% ─── Uncertainty + OOD ───────────────────────────
cfg.uq.mcDropoutSamples = 200;
cfg.uq.alphaPI = 0.95;
cfg.ood.zThreshold = 3;
cfg.ood.clipMin = 0.2; % km/s
cfg.ood.clipMax = 4.5; % km/s
% ─── Output ──────────────────────────────────────
cfg.output.rootDir = 'results';
cfg.output.figureDPI = 300;
cfg.output.saveMatFiles = true;
cfg.output.saveExcel = true;
end

To customize, copy default_config.m to config/my_config.m, rename the function inside to my_config, edit the fields, then run:
cfg = my_config();
results = main_pipeline(cfg);

The paper results were generated with the configuration shipped in config/default_config.m and the fixed seed rng(7). To reproduce:
% 1. Generate synthetic data with the same statistical properties as the paper's
%    training and blind wells (since the original data are confidential).
%    The generator lives in data/, which the installation step does not add to the path.
addpath('data');
generate_synthetic_wells('data/', 'BT-4-like.csv', 'BTS-1-like.csv', 7);
% 2. Point config at the synthetic files
cfg = default_config();
cfg.training.dataPath = 'data/BT-4-like.csv';
cfg.deployment.dataPath = 'data/BTS-1-like.csv';
% 3. Run
results = main_pipeline(cfg);

Expected outcomes on the synthetic data (statistically equivalent within ±2% to the paper):
- Ridge stacker $R^2 \approx 0.88$
- mRMR-only beats intersection
- Approximately 50% of blind-well samples flagged as OOD
- Systematic positive bias of predicted Vs relative to Castagna baseline (typically +30 to +50%)
The synthetic data preserves the cross-formation regime mismatch by design; this ensures the central scientific finding is reproducible even without access to the confidential field data.
The original well-log dataset (BT-4 and BTS-1, Field RCW, North East Java Basin) cannot be publicly distributed due to operator confidentiality. To preserve scientific reproducibility, this repository includes:
- Column-structure templates (data/BT-4_template.csv, data/BTS-1_template.csv): empty headers showing the expected log names and depth-step convention
- Synthetic data generator (data/generate_synthetic_wells.m): produces well-log sequences with statistical properties (means, variances, lag-1 correlations, cross-correlations) calibrated to match BT-4 and BTS-1
- Pre-computed result tables (results/tables/): the final $R^2$, RMSE, ablation, and deployment numbers reported in the paper, distributed as Excel files
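To make the idea of statistically calibrated synthetic logs concrete, the sketch below generates a single curve with a prescribed mean, standard deviation, and lag-1 autocorrelation using a first-order autoregressive (AR(1)) process. It is a simplified illustration, not the algorithm in generate_synthetic_wells.m: the target statistics are hypothetical, and cross-correlations between logs are not modeled here.

```matlab
rng(7);
n   = 3530;          % sample count quoted for BT-4 in this README
mu  = 2.45;          % hypothetical target mean (e.g. RHOB, g/cm^3)
sd  = 0.08;          % hypothetical target standard deviation
phi = 0.9;           % hypothetical target lag-1 autocorrelation

x = zeros(n, 1);
x(1) = randn;                                   % start from the stationary distribution
for k = 2:n
    x(k) = phi*x(k-1) + sqrt(1 - phi^2)*randn;  % unit-variance AR(1) recursion
end
logCurve = mu + sd*x;                           % rescale to the target mean and spread

c = corrcoef(logCurve(1:end-1), logCurve(2:end));
fprintf('mean %.3f, std %.3f, lag-1 corr %.3f\n', ...
        mean(logCurve), std(logCurve), c(1,2));
```

Matching cross-correlations between several logs would additionally require correlated innovations, for example by multiplying independent noise by a Cholesky factor of the target correlation matrix.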
Researchers seeking access to the original well-log data may contact the corresponding author with a formal data-sharing request, subject to operator approval.
After running main_pipeline, the results/ directory is populated as:
results/
├── logs/
│ └── run_20260515_143022.log ← timestamped pipeline log
│
├── preprocessing/
│ ├── BT-4_clean.mat
│ ├── BTS-1_clean.mat
│ └── preproc_summary.xlsx
│
├── feature_selection/
│ ├── FS_mRMR_scores.png
│ ├── FS_LASSO_path.png
│ ├── FS_SHAP_summary.png
│ └── feature_selection_summary.xlsx
│
├── base_models/
│ ├── PNN_model.mat
│ ├── MLFFNN_model.mat
│ ├── DFFNN_model.mat
│ ├── CNN1D_model.mat
│ ├── EVAL_learning_curves.png
│ └── base_learners_metrics.xlsx
│
├── meta_models/
│ ├── ICNN_stacker_model.mat
│ ├── Ridge_stacker_model.mat ← ★ deployment model
│ ├── Hybrid_ICNN_model.mat
│ ├── META_ridge_pred_vs_actual.png
│ ├── META_icnn_pred_vs_actual.png
│ ├── META_hybrid_pred_vs_actual.png
│ └── meta_learners_metrics.xlsx
│
├── ablation/
│ ├── ablation_radar.png
│ └── ablation_summary.xlsx
│
├── uncertainty/
│ ├── UQ_band.png
│ ├── UQ_distribution.png
│ └── uncertainty_predictions.xlsx
│
├── deployment/
│ ├── BT1_deployment_multitrack.png
│ ├── BT1_vpvs_crossplot.png
│ ├── BT1_geomech.png
│ └── BT1_deployment_predictions.xlsx
│
└── tables/
├── final_model_ranking.xlsx ← Table 3 of paper
├── best_hyperparameters.xlsx ← Table 2 of paper
└── icnn_metrics.xlsx
All figures are 300 DPI PNG by default. To also export PDF/SVG:
cfg.output.figureFormats = {'png', 'pdf', 'svg'};

- METHODOLOGY.md: full mathematical derivations, design-choice rationale, and pseudocode for each module
- HYPERPARAMETERS.md: full Bayesian Optimization search ranges, prior justifications, and optimized values
- Paper §2–§4: comprehensive narrative description, equations, and discussion of results
- In-source comments: every .m file is documented with a header block, parameter descriptions, and references to the relevant paper sections
runTests

This runs unit tests covering the preprocessing, model training, uncertainty estimation, and deployment modules on synthetic data. Expected runtime: ~3 minutes on a modern laptop.
Individual modules can be tested separately:
testPreprocessing
testModels
testUncertainty
testDeployment

Contributions are welcome. If you find a bug, please open an issue with:
- A minimal reproducible example
- The MATLAB version and operating system
- The full error message and stack trace
For feature additions (new base learners, alternative meta-learners, additional rock-physics constraints), please:
- Open an issue to discuss before submitting a PR
- Follow the existing code style (camelCase verbNoun function names, header blocks, structured config)
- Add a corresponding unit test
- Update docs/METHODOLOGY.md if the change is methodological
This project is licensed under the MIT License — see LICENSE for the full text.
The MIT License is permissive: you may use, modify, and distribute this code for any purpose (academic or commercial), as long as the original copyright notice and license text are retained.
Corresponding author
Rahmat Catur Wibowo
Geological Engineering Department, Universitas Lampung
Bandar Lampung, Indonesia
Email: rahmat.caturwibowo@eng.unila.ac.id
For code-specific issues, the GitHub issues page is preferred:
https://github.com/rcw3712/VsPrediction-Stacking/issues
We thank the operators of Field RCW for providing well-log access under confidentiality terms, and the geophysics community for the foundational empirical relations [Castagna et al., 1985; Greenberg and Castagna, 1992] that underpin the indirect validation framework. The reviewers of Artificial Intelligence in Geosciences are gratefully acknowledged for their constructive feedback.
This work was supported by Kemdiktisaintek Indonesia under contract number 357/UN26.21/PN/2026.
The open-source community is also acknowledged: the pipeline builds on MATLAB's Deep Learning Toolbox, Statistics and Machine Learning Toolbox, and the SHAP attribution implementation inspired by Lundberg and Lee, 2017.
If this code helps your research, please consider citing the paper above and starring ⭐ the repository.
