
Kevin Mota da Costa

Machine Learning & Data Engineer

Brazil

Building statistically grounded machine learning systems and structured data platforms for real-world problems.

Portfolio • LinkedIn • Email


About

I design and implement machine learning and data systems with a strong foundation in statistics, optimization, and relational data architecture.

My work spans:

  • Probabilistic modeling and uncertainty-aware learning
  • Deep learning systems built from first principles
  • Bayesian inference and MCMC sampling
  • Time series modeling on real-world datasets
  • Structured SQL data platforms with integrity enforcement

I combine mathematical rigor with production-style engineering to build reliable, reproducible ML workflows.


Engineering Philosophy

My approach to ML and Data Engineering is guided by:

  • Statistical rigor over unnecessary model complexity
  • Reproducibility over ad-hoc experimentation
  • Data architecture as part of the modeling lifecycle
  • Validation and reconciliation as first-class system components

I treat models and databases as systems, not scripts.


Selected Projects

Machine Learning & Probabilistic Systems

  • FilinGPT: Byte-level financial language model built from scratch in NumPy with structured ETL and training pipeline.
  • ProbNN: Heteroscedastic probabilistic neural network for uncertainty-aware regression using likelihood-based optimization.
  • GPredict: Gaussian Process regression framework implementing Bayesian non-parametric modeling and posterior inference.
  • ParamInsight: Custom Metropolis–Hastings MCMC engine for Bayesian parameter inference and posterior diagnostics.
  • Probabilistic ML Thesis: Unified probabilistic ML pipeline integrating neural networks, Gaussian processes, Bayesian inference, and MCMC sampling.
  • OptLearn: Numerical optimization framework benchmarking SGD, Momentum, RMSProp, and Adam using finite-difference gradients.
  • Time Series Distance Estimation: Large-scale irregular time series processing and regression pipeline validated against benchmark datasets.

Data Engineering & Analytical Systems

  • ChinookAnalytics: Layered SQL analytical platform (stg → core → marts) with financial reconciliation and executive reporting.
  • RetailSQL: Normalized relational data platform enforcing business rules and integrity at the storage layer.
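
For readers unfamiliar with the Metropolis–Hastings sampling named in the ParamInsight entry, here is a minimal sketch of the random-walk variant. It is not code from ParamInsight: the function names, the standard-normal target, the step size, and the burn-in length are all assumptions chosen for illustration.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D unnormalized log density.

    Illustrative only; names and defaults are assumptions, not ParamInsight's API.
    """
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)), evaluated in log space.
        # 1.0 - random() lies in (0, 1], which keeps log() well defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


def log_standard_normal(x):
    # Unnormalized log density of N(0, 1); the normalizing constant is not needed.
    return -0.5 * x * x


samples, acceptance_rate = metropolis_hastings(
    log_standard_normal, x0=0.0, n_samples=20000
)
kept = samples[2000:]  # discard an assumed burn-in period
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"mean={mean:.3f} var={var:.3f} acceptance={acceptance_rate:.2f}")
```

For this target the sample mean and variance should land near 0 and 1. In practice the step size is usually tuned so that the acceptance rate is neither close to 0 nor close to 1.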

Core Competencies

Machine Learning

  • Supervised Learning (Regression & Classification)
  • Deep Learning & Neural Networks
  • Probabilistic Modeling & Uncertainty Quantification
  • Gaussian Processes & Bayesian Inference
  • Time Series Analysis
  • Likelihood-Based Optimization

Data Engineering & Analytics

  • Relational Modeling (3NF)
  • SQL Data Architecture
  • ETL / ELT Pipelines
  • Data Validation & Reconciliation
  • Analytics Engineering

Systems & Optimization

  • End-to-End ML Pipelines
  • Reproducible Experimentation
  • Modular Architecture
  • Gradient-Based Optimization
  • Performance & Numerical Stability

Tech Stack

Machine Learning: PyTorch • TensorFlow • Keras • Scikit-learn

Scientific Computing: NumPy • SciPy • Pandas • Matplotlib

Data & Databases: SQL • PostgreSQL

Infrastructure & Workflow: Docker • Git • Linux • Jupyter


Education

B.Sc. in Physics, Federal University of Espírito Santo (UFES), Brazil (2018–2023)
Thesis in Probabilistic Machine Learning and Statistical Modeling. CNPq-funded applied research in predictive modeling and time series analysis.

Technical Degree in IT Support & Systems, Federal Institute of Espírito Santo (IFES), Brazil (2016–2017)
Training in systems architecture, infrastructure, and structured technical problem-solving.

Pinned

  1. FilinGPT

    Byte-level autoregressive financial language model built from scratch in NumPy, integrating structured SEC 10-K ETL pipelines, custom training loops, and reproducible Dockerized experimentation.

    Python

  2. ProbNN

    Heteroscedastic probabilistic neural network for likelihood-based regression, jointly learning predictive mean and input-dependent uncertainty with calibrated residual diagnostics.

    Python

  3. GPredict

    Modular Gaussian Process regression framework implementing non-parametric Bayesian inference with customizable kernels, posterior prediction, and calibrated uncertainty estimation.

    Python

  4. ChinookAnalytics

    Production-style SQL analytical platform built in PostgreSQL using layered architecture (stg → core → marts) with strict integrity enforcement and revenue reconciliation validation.

    PLpgSQL

  5. RetailSQL

    Production-style PostgreSQL relational data platform modeling retail operations with 3NF schema design, strict constraint enforcement, and integrity-driven architecture.

    PLpgSQL

  6. time-series-distance-estimation

    Published large-scale irregular time series regression pipeline (arXiv:2311.04470) integrating Lomb–Scargle period detection, feature engineering, and validated statistical modeling.

    Jupyter Notebook
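
The ProbNN description mentions jointly learning a predictive mean and an input-dependent uncertainty through a likelihood-based objective. A common way to do this is to minimize a Gaussian negative log-likelihood in which the model outputs both a mean and a log-variance. The sketch below shows that loss only. It is not taken from the ProbNN repository, and the function names and log-variance parameterization are assumptions.

```python
import math


def gaussian_nll(y, mean, log_var):
    """Per-sample negative log-likelihood of y under N(mean, exp(log_var)).

    Predicting log-variance (an assumed parameterization) keeps the
    variance positive without explicit constraints.
    """
    return 0.5 * (
        math.log(2.0 * math.pi) + log_var + (y - mean) ** 2 / math.exp(log_var)
    )


def mean_nll(targets, means, log_vars):
    # Average the per-sample loss over a batch of predictions.
    total = sum(gaussian_nll(y, m, lv) for y, m, lv in zip(targets, means, log_vars))
    return total / len(targets)


# With a residual of 2, a predicted variance of 4 is penalized less than a
# predicted variance of 1: the loss rewards honest uncertainty estimates.
overconfident = gaussian_nll(2.0, 0.0, 0.0)
calibrated = gaussian_nll(2.0, 0.0, math.log(4.0))
print(overconfident > calibrated)
```

The log-variance term stops the model from inflating its uncertainty everywhere, while the scaled squared error penalizes overconfident predictions on points it fits poorly.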