Hi, I'm Siddharth 👋

Data Analyst | Python · SQL · dbt · Snowflake · AI Automation | Belfast, UK

I build end-to-end data pipelines, machine learning models and AI-powered automation systems. Currently working at NI Water as a Data Analyst, where I apply analytics and automation to real-world water utility challenges.

I'm actively seeking new opportunities in Data Engineering, Analytics Engineering or Data Science roles.


🛠️ Tech Stack

Languages

Python R SQL

Data Engineering & Analytics

dbt Snowflake Power BI Streamlit

AI & Machine Learning

LangChain Pandas scikit-learn

Automation & Tools

n8n Git SQLite


🚀 Featured Projects

AI-powered water leakage anomaly detection system built for water utilities. Detects pipe bursts and anomalies across 10 DMA zones using Z-score analysis and Groq Llama 3.3 70B for automated plain-English field reports. Deployed as a live Streamlit dashboard with n8n + Gmail automated delivery.

Python Streamlit LangChain dbt SQLite Groq AI n8n
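As a rough illustration of the Z-score step described above, here is a minimal sketch using only the standard library. The function name, the sample flow values and the 2.5 threshold are assumptions made for this example; they are not taken from the project code.

```python
from statistics import mean, stdev


def zscore_anomalies(readings, threshold=2.5):
    """Return (index, value, z) for readings whose |z-score| exceeds threshold.

    Hypothetical helper for illustration; the threshold is an assumed value.
    """
    mu = mean(readings)
    sigma = stdev(readings)  # sample standard deviation
    if sigma == 0:
        return []  # a flat series has no outliers by this measure
    flagged = []
    for i, x in enumerate(readings):
        z = (x - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, x, z))
    return flagged


# Made-up flow readings for one zone, with a spike at the end.
flows = [10.2, 9.8, 10.1, 10.0, 9.9, 10.3, 10.1, 9.7, 10.0, 25.0]
anomalies = zscore_anomalies(flows)
for index, value, z in anomalies:
    print(f"reading {index}: value={value}, z={z:.2f}")
```

One design note: with the sample standard deviation, |z| cannot exceed (n - 1) / sqrt(n), which is about 2.85 for 10 points, so a threshold of 3 would never fire on a window this short. Longer windows or a separate baseline period avoid that limit.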


End-to-end ELT analytics engineering pipeline processing 3.5M rows across 112 partitions. Built on a Modern Data Stack with Snowflake, dbt and Power BI. Implements Kimball Star Schema, automated dbt testing and chunked Python ingestion with zero credential leakage.

Snowflake dbt Python Power BI SQL DAX
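The chunked ingestion idea can be sketched as follows. This is an illustration under stated assumptions, not the project's loader: SQLite stands in for Snowflake so the example runs anywhere, and the table name, column names and chunk size are hypothetical. Credential handling is not covered here.

```python
import csv
import io
import sqlite3
from itertools import islice


def iter_chunks(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def ingest(csv_file, conn, chunk_size=1000):
    """Load a CSV file object into the (hypothetical) readings table in chunks."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    total = 0
    for chunk in iter_chunks(reader, chunk_size):
        conn.executemany(
            "INSERT INTO readings (meter_id, kwh) VALUES (?, ?)", chunk
        )
        conn.commit()  # commit per chunk so memory use stays bounded
        total += len(chunk)
    return total


# In-memory stand-in for the warehouse, with synthetic rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (meter_id TEXT, kwh REAL)")
sample = io.StringIO(
    "meter_id,kwh\n" + "".join(f"m{i},{i * 0.5}\n" for i in range(2500))
)
loaded = ingest(sample, conn, chunk_size=1000)
print(f"loaded {loaded} rows")
```

Only one chunk is held in memory at a time, which is the property that matters when the source runs to millions of rows.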


Forecasting S&P 500 and FTSE All-Share indices across 30 years of data using five time series models — ARIMA, SES, Moving Averages, Simple Forecasts and STL Decomposition. ARIMA achieved RMSE of 82.26 and MAPE of 6.37% on the 1990–2000 horizon.

R ARIMA Time Series ggplot2 Bloomberg Data
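For readers unfamiliar with the two error metrics quoted above, here is a small sketch of how RMSE and MAPE are computed. The project itself is written in R; this is a Python illustration and the sample numbers are made up, not taken from the forecasts.

```python
from math import sqrt


def rmse(actual, forecast):
    """Root mean squared error, in the same units as the series."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return sqrt(sum(squared) / len(squared))


def mape(actual, forecast):
    """Mean absolute percentage error; actual values must be non-zero."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(ratios) / len(ratios)


# Made-up index levels and forecasts.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(f"RMSE: {rmse(actual, forecast):.3f}")
print(f"MAPE: {mape(actual, forecast):.2f}%")
```

RMSE depends on the scale of the index, while MAPE is scale-free, which is why the two are usually reported together when comparing models across series.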


OLS regression using the Capital Market Approach to quantify GBP/USD and GBP/EUR exposure for a FTSE-listed oil & gas company. The model explains 87% of stock return variance. Full econometric diagnostics, including Breusch-Pagan heteroskedasticity tests, Newey-West robust standard errors and Jarque-Bera normality tests.

R OLS Regression Econometrics Bloomberg Terminal Robust SE


Binary classification of product safety outcomes across 50,646 records using Lasso feature selection, Logistic Regression and KNN. Achieved 99.08% accuracy on Subset 2 and an AUC of 0.918 on Subset 3. Full 10-fold cross-validation pipeline.

R Lasso Logistic Regression KNN caret glmnet
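The 10-fold cross-validation split mentioned above can be sketched in a few lines. The project uses R (caret); this Python version is only an illustration, the function name is hypothetical, and it splits contiguous indices without shuffling, which a real pipeline would normally add.

```python
def k_fold_indices(n, k=10):
    """Split range(n) into k (train, test) index pairs with near-equal test folds."""
    # The first n % k folds get one extra record so every index is used once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


# Record count taken from the project description above.
folds = k_fold_indices(50646, k=10)
for i, (train, test) in enumerate(folds, start=1):
    print(f"fold {i}: train={len(train)}, test={len(test)}")
```

Each record appears in exactly one test fold, so every model is scored on data it was not fitted to.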


Classifying online business scores using GAMs and Decision Trees across three field-segmented subsets. Decision Trees outperformed GAMs with up to 94% sensitivity. Lasso used as a two-stage feature selector feeding into both models.

R tidymodels GAM Decision Tree Lasso mgcv


🎓 Education & Credentials

🎓 MSc Financial Analytics — Queen's University Belfast (2023–24)

📜 Bloomberg Market Concepts (BMC) — Bloomberg certification


📫 Let's Connect

LinkedIn Email


Open to Data Analyst, Analytics Engineer and Data Science roles — based in Belfast, open to remote.

Pinned

  1. Leakage-AI-Detector

    AI-powered water leakage detection system - Z-score anomaly detection across 10 DMA zones with Groq Llama 3.3 70B report generation, Streamlit dashboard and automated n8n email delivery.

    Python

  2. UK-Energy-Analytics-Engineering

    End-to-end ELT analytics pipeline processing 3.5M rows across 112 partitions - Snowflake + dbt + Power BI Modern Data Stack with automated data quality testing, Kimball Star Schema and executive BI…

    Python

  3. STOCK-MARKET-RETURN-PREDICTION-USING-R

    Time series forecasting of S&P500 & FTSE indices using ARIMA, SES, Moving Averages and STL Decomposition across 30 years of data.

    HTML

  4. PREDICTIVE-MODELING-FOR-BUSINESS-SCORE-CLASSIFICATION-USING-GAMS-AND-DECISION-TREES.

    Binary classification of online business scores using Lasso feature selection, GAMs and Decision Trees across 3 field subsets - Decision Trees outperformed GAMs with 94% sensitivity.

  5. PREDICTIVE-MODELING-FOR-PRODUCT-OUTCOME-CLASSIFICATION-USING-MACHINE-LEARNING

    Binary classification of product safety outcomes using Lasso feature selection, Logistic Regression and KNN across 50,646 records - evaluated with AUC and 10-fold cross-validation.

  6. FOREIGN-EXCHANGE-RISK-ANALYSIS-FOR-ENERGEAN-PLC-USING-REGRESSION-MODELING

    Assessed foreign exchange risk using multiple regression on market index and exchange rates. Explained 87% of stock return variance using regression diagnostics and Bloomberg Terminal data. OLS reg…