JoaoSaraiva99/README.md

About Me

Hi, I’m João Saraiva, a Data Analyst with a strong interest in analytics, process improvement, and data-driven problem solving.

At the moment, I’m working on a risk prevention project focused on the analysis of operational incidents. By applying text mining techniques such as LDA and BERTopic, I aim to uncover patterns, detect recurring themes, and generate insights that can help strengthen prevention and decision-making processes.

My main tools include Python, KNIME, Power BI, Excel, SPSS, and Power Automate.

You can contact me via email at joaomariosaraiva.99@gmail.com or connect with me on LinkedIn.

João Saraiva's Portfolio

Project description: This is an HR analytics project where I explored employee performance evaluation, workforce segmentation, and rater-effect validation using a combination of dashboarding, clustering, and regression analysis.

  • Data was based on an HR evaluation dataset covering multiple fiscal periods
  • Built an interactive dashboard in Power BI to analyse workforce structure, score distribution, visibility needs, promotion patterns, and supervisory trends
  • Applied K-Modes clustering to segment employees into distinct categorical profiles
  • Used statistical tests and regression models to validate whether the Nine Block Score was meaningfully linked to talent-related outcomes
  • Main tools used: Python, Pandas, Power BI, KModes, Logistic Regression
  • Key findings showed clear employee segments with different performance, development, and promotion patterns, while also highlighting the limitations of the score in predicting attrition risk
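The K-Modes step can be illustrated with a minimal pure-Python sketch. It is not the project's code (which lists the KModes library among its tools): the employee records, attribute names, and the choice of initial modes below are hypothetical, and the loop only shows the two ideas behind the algorithm, the simple matching dissimilarity and the per-attribute mode update.

```python
from collections import Counter

def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(rows, init_modes, max_iter=20):
    """Cluster categorical rows around modes, starting from init_modes."""
    modes = [tuple(m) for m in init_modes]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each row joins its closest mode.
        labels = [
            min(range(len(modes)), key=lambda j: matching_distance(row, modes[j]))
            for row in rows
        ]
        # Update step: each mode becomes the most frequent value per attribute.
        new_modes = []
        for j, old_mode in enumerate(modes):
            members = [row for row, label in zip(rows, labels) if label == j]
            if not members:
                new_modes.append(old_mode)  # keep an empty cluster's mode as is
                continue
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_modes == modes:  # converged: modes no longer change
            break
        modes = new_modes
    return labels, modes

# Hypothetical records: (performance band, potential band, promoted?)
employees = [
    ("high", "high", "yes"),
    ("high", "high", "no"),
    ("high", "medium", "yes"),
    ("low", "low", "no"),
    ("low", "medium", "no"),
    ("low", "low", "no"),
]
labels, modes = k_modes(employees, init_modes=[employees[0], employees[3]])
print(labels)  # cluster index per employee
print(modes)   # one categorical profile per cluster
```

Each resulting mode reads directly as a categorical profile, which is what makes K-Modes convenient for describing segments of categorical HR data.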

Project description:
This is a wine analytics project where I explored product segmentation, catalogue profiling, and cluster interpretation using a combination of exploratory analysis, dimensionality reduction, clustering, and AI-assisted interpretation.

  • Data was based on a wine catalogue dataset containing 178 wines described across 13 chemical attributes
  • Performed data quality checks, descriptive analysis, and normalisation to prepare the dataset for modelling
  • Applied correlation analysis and PCA to reduce dimensionality and identify the most relevant variables for segmentation
  • Used K-Means clustering to segment the catalogue into 4 distinct wine profiles
  • Complemented the analysis with silhouette plots, Self-Organizing Maps (SOMs), and a decision tree to validate and interpret the clusters
  • Used the ChatGPT API to help translate technical outputs into business-oriented interpretations related to wine quality, classification, and marketing potential
  • Main tools used: Python, Pandas, Scikit-learn, Seaborn, Plotly, PCA, K-Means, SOMs, Decision Trees, OpenAI API
  • Key findings showed that the catalogue can be segmented into 4 distinct wine profiles, providing a clearer basis for targeted advertising, stronger product positioning, and more efficient catalogue promotion
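The normalisation and K-Means steps can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the project's notebook: the two attributes and their values are hypothetical, only 2 clusters are used instead of the 4 reported above, and the centroids are seeded with the first k points to keep the run deterministic.

```python
from statistics import mean, pstdev

def standardise(rows):
    """Z-score each column so that no attribute dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, max_iter=50):
    """Lloyd's algorithm, seeded deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels, centroids

# Hypothetical wines: (alcohol %, colour intensity)
wines = [(14.2, 5.6), (12.3, 2.1), (13.9, 5.9), (12.0, 1.9), (14.0, 6.1), (12.5, 2.4)]
scaled = standardise(wines)
labels, centroids = k_means(scaled, k=2)
print(labels)
```

Standardising first matters because K-Means relies on Euclidean distance, so an attribute measured on a larger scale would otherwise dominate the clustering.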

Project description: This is a customer analytics project where I explored customer behavior, feature engineering, and predictive modeling to estimate the likelihood of purchasing a transfer service, using a combination of data preprocessing, classification models, and performance evaluation.

  • Data was based on real-world operational datasets from a short-term rental company, combining reservations, apartments, and transfer records into a unified analytical dataset
  • Performed extensive data cleaning, including handling missing values, standardizing categorical variables, and resolving inconsistencies typical of real business data
  • Engineered relevant features such as check-in time categories, distance to airport, booking lead time, and seasonality indicators to capture behavioral patterns
  • Applied one-hot encoding and dataset balancing techniques (RandomOverSampler) to improve model performance on imbalanced data
  • Tested multiple classification models, including Decision Trees, Bagging, Random Forest, Gradient Boosting, and XGBoost
  • Main tools used: Python, Pandas, Scikit-learn, XGBoost, Imbalanced-learn
  • Key findings showed that customer purchase behavior is strongly influenced by booking timing, check-in period, and location-related factors. Gradient Boosting achieved the best performance in identifying potential buyers, enabling more effective targeting strategies and supporting data-driven decision-making for service promotion.
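Two of the steps above, feature engineering and random oversampling, can be sketched with the standard library alone. This is a hedged illustration rather than the project's pipeline: the field names (`booked_on`, `check_in`, `hour`, `bought_transfer`), the time-of-day cut-offs, and the sample reservations are all hypothetical, and `random_oversample` is a hand-written stand-in for the idea behind imbalanced-learn's `RandomOverSampler`, not a call to it.

```python
import random
from datetime import date

def lead_time_days(booked_on, check_in):
    """Booking lead time: days between the booking date and check-in."""
    return (check_in - booked_on).days

def check_in_period(hour):
    """Bucket a check-in hour into a coarse category (cut-offs are assumed)."""
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"

def random_oversample(rows, label_key, seed=0):
    """Duplicate randomly chosen minority rows until all classes match in size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced

# Hypothetical reservations: one buyer and four non-buyers.
reservations = [
    {"booked_on": date(2024, 5, 1), "check_in": date(2024, 6, 15), "hour": 9, "bought_transfer": 1},
    {"booked_on": date(2024, 6, 10), "check_in": date(2024, 6, 12), "hour": 15, "bought_transfer": 0},
    {"booked_on": date(2024, 6, 1), "check_in": date(2024, 6, 20), "hour": 20, "bought_transfer": 0},
    {"booked_on": date(2024, 6, 5), "check_in": date(2024, 6, 8), "hour": 13, "bought_transfer": 0},
    {"booked_on": date(2024, 6, 7), "check_in": date(2024, 6, 9), "hour": 22, "bought_transfer": 0},
]
for r in reservations:
    r["lead_time"] = lead_time_days(r["booked_on"], r["check_in"])
    r["period"] = check_in_period(r["hour"])

balanced = random_oversample(reservations, "bought_transfer")
print(len(balanced))
```

In practice, oversampling is usually applied to the training split only, so that duplicated rows cannot leak into the evaluation data.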

Project description:
This is a fraud analytics project where I explored anomaly detection and classification techniques using a combination of statistical methods, machine learning models, and deployment workflows in KNIME.

  • Applied outlier detection (IQR), Decision Trees, Random Forest, Gradient Boosting, Logistic Regression, and Autoencoders
  • Performed feature engineering and model-specific preprocessing to improve detection performance
  • Designed a 3-layer hybrid architecture combining Autoencoder, Decision Tree, and Quartile-based validation
  • Simulated deployment with email alerts for suspicious transactions
  • Main tools used: KNIME Analytics Platform, Machine Learning, Feature Engineering
  • Key findings showed that combining models improves fraud detection performance, with a strong focus on identifying the minority class and enabling real-time monitoring
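The quartile-based (IQR) check can be expressed in a few lines. The project itself was built in KNIME, so the Python below is only a sketch of the rule under stated assumptions: the transaction amounts are made up, and the conventional multiplier of 1.5 is assumed rather than taken from the workflow.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences of the IQR rule."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Values falling outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]

# Hypothetical transaction amounts with one suspicious value.
amounts = [12.0, 15.5, 14.2, 13.8, 16.1, 15.0, 14.7, 950.0]
print(flag_outliers(amounts))  # -> [950.0]
```

A rule like this is cheap to evaluate per transaction, which fits its role as a validation layer alongside the model-based detectors.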

Project description:
This project focuses on automating customer support email handling using AI and workflow automation tools, improving efficiency, reducing manual effort, and enhancing response time.

  • Automated classification of emails into complaints or information requests using NLP
  • Performed sentiment analysis, language detection, urgency detection, and TIN extraction
  • Generated structured request summaries to support operational decision-making
  • Implemented automated workflows with Power Automate for email processing and task orchestration
  • Integrated AI Builder to extract invoice data from PDF attachments
  • Stored and managed data using Excel and OneDrive
  • Created tasks and approval flows for customer support teams
  • Sent real-time alerts via Microsoft Teams for urgent requests
  • Main tools used: Power Automate, AI Builder, NLP, Excel, OneDrive, Microsoft Teams
  • The system streamlines customer support operations by automating repetitive tasks, improving data consistency, and enabling faster and more structured request resolution
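The triage logic (category, urgency, and TIN extraction) can be illustrated with a small rule-based sketch. This is a deliberately simple stand-in, not the Power Automate and AI Builder flow described above: the keyword lists are hypothetical, and the TIN is assumed to be a standalone 9-digit number.

```python
import re

# Hypothetical keyword lists; a real system would use a trained NLP model.
COMPLAINT_WORDS = {"complaint", "refund", "unacceptable", "damaged", "wrong"}
URGENT_WORDS = {"urgent", "immediately", "asap"}
TIN_PATTERN = re.compile(r"\b\d{9}\b")  # assumption: TIN is exactly 9 digits

def triage(body):
    """Classify an email body and pull out the fields needed for routing."""
    words = set(re.findall(r"[a-z]+", body.lower()))
    tin = TIN_PATTERN.search(body)
    return {
        "category": "complaint" if words & COMPLAINT_WORDS else "information request",
        "urgent": bool(words & URGENT_WORDS),
        "tin": tin.group() if tin else None,
    }

email = "URGENT: my order arrived damaged. My TIN is 123456789, please refund."
result = triage(email)
print(result)
```

The returned dictionary mirrors the kind of structured summary that a downstream step could use to create a task or send an alert for urgent requests.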

Project description:
This is a customer analytics and machine learning project focused on predicting airline passenger satisfaction and identifying the key drivers behind customer experience.

  • Performed EDA, correlation analysis, and PCA, identifying 6 latent components of service experience
  • Tested multiple models including Logistic Regression, Decision Trees, Random Forest, Bagging, Gradient Boosting, XGBoost and Neural Networks
  • Applied GridSearchCV for model tuning
  • Main tools used: Python, Pandas, Scikit-learn, XGBoost, SHAP
  • Key findings showed that satisfaction is primarily driven by Seat comfort, Customer Type, Type of Travel and service-related variables, while PCA reduced model performance and neural networks were not necessary given the strong performance of tree-based models.
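What grid search with cross-validation does can be shown with a hand-written sketch. It is not scikit-learn's `GridSearchCV` and not the project's models: a toy one-feature k-nearest-neighbour classifier stands in for the real estimators, the data (seat comfort rating, satisfied flag) is hypothetical, and folds are assigned by row index to keep the run deterministic.

```python
from statistics import mean

def knn_predict(train, x, k):
    """Majority vote among the k training rows whose feature is closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def cv_accuracy(data, k, n_folds=3):
    """Mean accuracy of k-NN over n_folds held-out folds."""
    scores = []
    for fold in range(n_folds):
        test = [row for i, row in enumerate(data) if i % n_folds == fold]
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return mean(scores)

def grid_search(data, grid, n_folds=3):
    """Score every candidate in the grid and return the best one with all scores."""
    results = {k: cv_accuracy(data, k, n_folds) for k in grid}
    best = max(results, key=results.get)
    return best, results

# Hypothetical passengers: (seat comfort rating 1-5, satisfied 0/1)
passengers = [
    (1, 0), (5, 1), (2, 0), (4, 1), (1, 0), (5, 1),
    (2, 0), (4, 1), (3, 0), (3, 1), (2, 0), (4, 1),
]
best_k, scores = grid_search(passengers, grid=[1, 3, 5])
print(best_k, scores)
```

Every candidate value is scored only on rows held out from fitting, which is what lets the search compare hyperparameters without rewarding overfitting.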

Certifications

1st Place in the Identifying Fraud Challenge - Spring 2025

Credly Badge

Advanced Proficiency in KNIME Analytics Platform

Credly Badge

Popular repositories

  1. JoaoSaraiva99 Public

    Talent Segmentation and Workforce Profiling using KModes

  2. Talent-Segmentation-and-Workforce-Profiling-using-K-Modes Public

    HR analytics project focused on workforce segmentation, performance evaluation, and rater-effect validation using dashboarding, K-Modes clustering, and regression analysis.

    Jupyter Notebook

  3. Data-Driven-Customer-Segmentation-for-Wine-Catalog Public

    This project applies segmentation algorithms to a catalogue of 178 wines to identify distinct profiles for targeted advertising campaigns. The methodology combined normalisation, correlation analys…

    Jupyter Notebook

  4. Predicting-Customer-Transfer-Service-Adoption-Using-Logistic-Regression Public

    A company managing tourist real estate aims to offer a transfer service. This project uses logistic regression to analyze customer data and identify profiles most likely to purchase the service, en…

    Jupyter Notebook

  5. Fraud-Analytics-with-KNIME Public

    A fraud detection project developed in KNIME using transaction data, combining statistical methods and machine learning models to identify anomalous patterns and support real-time fraud monitoring …

  6. Smart-Complaint-Management-Email-Automation-System Public

    AI-powered system to automate customer support emails, improving efficiency, reducing errors, and enhancing response time and service quality.