Machine Learning & Data Engineer
Brazil
Building statistically grounded machine learning systems and structured data platforms for real-world problems.
Portfolio • LinkedIn • Email
I design and implement machine learning and data systems with a strong foundation in statistics, optimization, and relational data architecture.
My work spans:
- Probabilistic modeling and uncertainty-aware learning
- Deep learning systems built from first principles
- Bayesian inference and MCMC sampling
- Time series modeling on real-world datasets
- Structured SQL data platforms with integrity enforcement
I combine mathematical rigor with production-style engineering to build reliable, reproducible ML workflows.
My approach to ML and Data Engineering is guided by:
- Statistical rigor over unnecessary model complexity
- Reproducibility over ad-hoc experimentation
- Data architecture as part of the modeling lifecycle
- Validation and reconciliation as first-class system components
I treat models and databases as systems — not scripts.
| Machine Learning & Probabilistic Systems | Data Engineering & Analytical Systems |
|---|---|
| FilinGPT: Byte-level financial language model built from scratch in NumPy with structured ETL and training pipeline. | ChinookAnalytics: Layered SQL analytical platform (stg → core → marts) with financial reconciliation and executive reporting. |
| ProbNN: Heteroscedastic probabilistic neural network for uncertainty-aware regression using likelihood-based optimization. | RetailSQL: Normalized relational data platform enforcing business rules and integrity at the storage layer. |
| GPredict: Gaussian Process regression framework implementing Bayesian non-parametric modeling and posterior inference. | ParamInsight: Custom Metropolis–Hastings MCMC engine for Bayesian parameter inference and posterior diagnostics. |
| Probabilistic ML Thesis: Unified probabilistic ML pipeline integrating neural networks, Gaussian processes, Bayesian inference, and MCMC sampling. | OptLearn: Numerical optimization framework benchmarking SGD, Momentum, RMSProp, and Adam using finite-difference gradients. |
| Time Series Distance Estimation: Large-scale irregular time series processing and regression pipeline validated against benchmark datasets. | |
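
As a rough illustration of the likelihood-based optimization mentioned for ProbNN, the sketch below computes a heteroscedastic Gaussian negative log-likelihood, where each prediction carries its own variance. It is not taken from the project: the function name `gaussian_nll`, the log-variance parameterization, and the toy numbers are assumptions made for this example, and it uses only the Python standard library.

```python
import math


def gaussian_nll(targets, means, log_vars):
    """Average negative log-likelihood of targets under N(mean, exp(log_var)).

    Hypothetical helper for illustration; not the ProbNN implementation.
    """
    total = 0.0
    for y, mu, log_var in zip(targets, means, log_vars):
        # Working with log-variance keeps the variance positive without constraints.
        total += 0.5 * (
            math.log(2 * math.pi) + log_var + (y - mu) ** 2 / math.exp(log_var)
        )
    return total / len(targets)


# Toy data (made up): the second point lies far from its predicted mean.
targets = [0.0, 3.0]
means = [0.0, 0.0]

# Same unit variance everywhere vs. a larger predicted variance on the noisy point.
fixed = gaussian_nll(targets, means, [0.0, 0.0])
adaptive = gaussian_nll(targets, means, [0.0, math.log(9.0)])
print(adaptive < fixed)
```

Under this loss, assigning a larger variance to the poorly fit point lowers the penalty on its squared error at the cost of the log-variance term, which is the trade-off that lets a model of this kind express input-dependent uncertainty.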
- Supervised Learning (Regression & Classification)
- Deep Learning & Neural Networks
- Probabilistic Modeling & Uncertainty Quantification
- Gaussian Processes & Bayesian Inference
- Time Series Analysis
- Likelihood-Based Optimization
- Relational Modeling (3NF)
- SQL Data Architecture
- ETL / ELT Pipelines
- Data Validation & Reconciliation
- Analytics Engineering
- End-to-End ML Pipelines
- Reproducible Experimentation
- Modular Architecture
- Gradient-Based Optimization
- Performance & Numerical Stability
Machine Learning: PyTorch • TensorFlow • Keras • Scikit-learn
Scientific Computing: NumPy • SciPy • Pandas • Matplotlib
Data & Databases: SQL • PostgreSQL
Infrastructure & Workflow: Docker • Git • Linux • Jupyter
B.Sc. in Physics, Federal University of Espírito Santo (UFES), Brazil (2018–2023). Thesis in Probabilistic Machine Learning and Statistical Modeling. CNPq-funded applied research in predictive modeling and time series analysis.
Technical Degree in IT Support & Systems, Federal Institute of Espírito Santo (IFES), Brazil (2016–2017). Training in systems architecture, infrastructure, and structured technical problem-solving.