João Saraiva JoaoSaraiva99

About Me

Hi, I’m João Saraiva, a Data Analyst with a strong interest in analytics, process improvement, and data-driven problem solving.

At the moment, I’m working on a risk prevention project focused on the analysis of operational incidents. By applying text mining techniques such as LDA and BERTopic, I aim to uncover patterns, detect recurring themes, and generate insights that can help strengthen prevention and decision-making processes.

My main tools include Python, KNIME, Power BI, Excel, SPSS, and Power Automate.

You can contact me via email at joaomariosaraiva.99@gmail.com or connect with me on LinkedIn.

João Saraiva's Portfolio

Project: 1 - Talent Segmentation and Workforce Profiling using K-Modes

Project description: This is an HR analytics project where I explored employee performance evaluation, workforce segmentation, and rater-effect validation using a combination of dashboarding, clustering, and regression analysis.

Data was based on an HR evaluation dataset covering multiple fiscal periods
Built an interactive dashboard in Power BI to analyse workforce structure, score distribution, visibility needs, promotion patterns, and supervisory trends
Applied K-Modes clustering to segment employees into distinct categorical profiles
Used statistical tests and regression models to validate whether the Nine Block Score was meaningfully linked to talent-related outcomes
Main tools used: Python, Pandas, Power BI, KModes, Logistic Regression
Key findings showed clear employee segments with different performance, development, and promotion patterns, while also highlighting the limitations of the score in predicting attrition risk
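
As a rough illustration of the K-Modes step, the sketch below clusters categorical records by Hamming distance and updates each cluster centre to the most frequent category per column. It is a minimal pure-Python version of the technique, not the KModes library call used in the project, and the employee fields (`nine_block`, `visibility`, `promotion`) and their values are hypothetical.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each column of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, k, max_iter=20, seed=0):
    """Cluster categorical records; returns (labels, modes)."""
    rng = random.Random(seed)
    # Initialise the modes with k distinct records chosen at random.
    modes = rng.sample(sorted(set(records)), k)
    labels = None
    for _ in range(max_iter):
        # Assign every record to the nearest mode (fewest mismatches).
        new_labels = [
            min(range(k), key=lambda j: hamming(r, modes[j])) for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Recompute each mode from its current members.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical records: (nine_block, visibility, promotion)
records = [
    ("High", "Yes", "Promoted"),
    ("High", "Yes", "Promoted"),
    ("High", "No", "Promoted"),
    ("Low", "No", "Not promoted"),
    ("Low", "No", "Not promoted"),
    ("Low", "Yes", "Not promoted"),
]
labels, modes = k_modes(records, k=2)
print(labels, modes)
```

Because every attribute is categorical, the cluster centre is a mode rather than a mean, which is what distinguishes K-Modes from K-Means.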

Project: 2 - Data-Driven Customer Segmentation for Wine Catalog

Project description:
This is a wine analytics project where I explored product segmentation, catalogue profiling, and cluster interpretation using a combination of exploratory analysis, dimensionality reduction, clustering, and AI-assisted interpretation.

Data was based on a wine catalogue dataset containing 178 wines described across 13 chemical attributes
Performed data quality checks, descriptive analysis, and normalisation to prepare the dataset for modelling
Applied correlation analysis and PCA to reduce dimensionality and identify the most relevant variables for segmentation
Used K-Means clustering to segment the catalogue into 4 distinct wine profiles
Complemented the analysis with silhouette plots, Self-Organizing Maps (SOMs), and a decision tree to validate and interpret the clusters
Used the ChatGPT API to help translate technical outputs into business-oriented interpretations related to wine quality, classification, and marketing potential
Main tools used: Python, Pandas, Scikit-learn, Seaborn, Plotly, PCA, K-Means, SOMs, Decision Trees, OpenAI API
Key findings showed that the catalogue can be segmented into 4 distinct wine profiles, providing a clearer basis for targeted advertising, stronger product positioning, and more efficient catalogue promotion
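
The scaling, PCA, and K-Means steps above could look roughly like the sketch below. It assumes the catalogue resembles the classic Wine dataset bundled with scikit-learn, which also has 178 rows and 13 chemical attributes; that is an assumption, since the project does not name its data source. `StandardScaler` and two principal components are likewise illustrative choices, not the project's confirmed settings.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Wine data as a stand-in for the catalogue.
X = load_wine().data

# Put all 13 attributes on a comparable scale before PCA and clustering.
X_scaled = StandardScaler().fit_transform(X)

# Project onto two principal components (an illustrative choice).
pca = PCA(n_components=2, random_state=0)
X_pca = pca.fit_transform(X_scaled)

# Compare candidate cluster counts with the silhouette score.
scores = {}
for k in range(2, 7):
    candidate = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_pca)
    scores[k] = silhouette_score(X_pca, candidate)

# Fit the 4-cluster solution described in the project.
wine_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_pca)

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Silhouette by k:", scores)
```

The silhouette scores only compare cluster counts on this stand-in data; they say nothing about the results reported for the actual catalogue.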

Project: 3 - Transfer Service Purchase Prediction using Machine Learning

Project description: This is a customer analytics project where I explored customer behavior, feature engineering, and predictive modeling to estimate the likelihood of purchasing a transfer service, using a combination of data preprocessing, classification models, and performance evaluation.

Data was based on real-world operational datasets from a short-term rental company, combining reservations, apartments, and transfer records into a unified analytical dataset
Performed extensive data cleaning, including handling missing values, standardizing categorical variables, and resolving inconsistencies typical of real business data
Engineered relevant features such as check-in time categories, distance to airport, booking lead time, and seasonality indicators to capture behavioral patterns
Applied one-hot encoding and dataset balancing techniques (RandomOverSampler) to improve model performance on imbalanced data
Tested multiple classification models, including Decision Trees, Bagging, Random Forest, Gradient Boosting, and XGBoost
Main tools used: Python, Pandas, Scikit-learn, XGBoost, Imbalanced-learn
Key findings showed that customer purchase behavior is strongly influenced by booking timing, check-in period, and location-related factors. Gradient Boosting achieved the best performance in identifying potential buyers, enabling more effective targeting strategies and supporting data-driven decision-making for service promotion.
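
To make the feature engineering and balancing steps concrete, here is a small standard-library sketch. The time-of-day buckets, season mapping, and field names are assumptions for illustration, and `random_oversample` is a simplified stand-in for the idea behind imbalanced-learn's `RandomOverSampler`, not its implementation.

```python
import random
from datetime import date, datetime


def checkin_period(hour):
    """Bucket a check-in hour into a coarse time-of-day category (assumed cut-offs)."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def season(month):
    """Northern-hemisphere meteorological season for a month number."""
    seasons = {12: "winter", 1: "winter", 2: "winter",
               3: "spring", 4: "spring", 5: "spring",
               6: "summer", 7: "summer", 8: "summer"}
    return seasons.get(month, "autumn")


def engineer(booked_on, checkin):
    """Derive lead time, check-in period, and season for one reservation."""
    return {
        "lead_time_days": (checkin.date() - booked_on).days,
        "checkin_period": checkin_period(checkin.hour),
        "season": season(checkin.month),
    }


def random_oversample(rows, labels, seed=0):
    """Duplicate rows of smaller classes at random until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical reservation: booked on 1 June, check-in on 15 July at 22:30.
features = engineer(date(2024, 6, 1), datetime(2024, 7, 15, 22, 30))
print(features)

# Hypothetical imbalanced labels: four non-buyers (0) and one buyer (1).
rows = ["r1", "r2", "r3", "r4", "r5"]
balanced_rows, balanced_labels = random_oversample(rows, [0, 0, 0, 0, 1])
print(balanced_labels)
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.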

Project: 4 - Fraud Analytics with KNIME

Project description:
This is a fraud analytics project where I explored anomaly detection and classification techniques using a combination of statistical methods, machine learning models, and deployment workflows in KNIME.

Applied outlier detection (IQR), Decision Trees, Random Forest, Gradient Boosting, Logistic Regression, and Autoencoders
Performed feature engineering and model-specific preprocessing to improve detection performance
Designed a 3-layer hybrid architecture combining Autoencoder, Decision Tree, and Quartile-based validation
Simulated deployment with email alerts for suspicious transactions
Main tools used: KNIME Analytics Platform, Machine Learning, Feature Engineering
Key findings showed that combining models improves fraud detection performance, with a strong focus on identifying the minority class and enabling real-time monitoring
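
The quartile-based layer can be illustrated outside KNIME with a few lines of Python. This sketch covers only the IQR rule; the 1.5 multiplier is the conventional default and the transaction amounts are made up, so neither should be read as the workflow's actual configuration.

```python
from statistics import quantiles


def iqr_flags(values, k=1.5):
    """Flag values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Hypothetical transaction amounts with one obvious outlier.
amounts = [10, 12, 11, 13, 12, 11, 10, 500]
flags = iqr_flags(amounts)
suspicious = [a for a, flagged in zip(amounts, flags) if flagged]
print(suspicious)
```

In a layered design such as the one described, a rule like this would typically act as a cheap sanity check alongside the model-based layers rather than as a detector on its own.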

Project: 5 - Smart Complaint Management & Email Automation System

Project description:
This project focuses on automating customer support email handling using AI and workflow automation tools, improving efficiency, reducing manual effort, and enhancing response time.

Automated classification of emails into complaints or information requests using NLP
Performed sentiment analysis, language detection, urgency detection, and TIN extraction
Generated structured request summaries to support operational decision-making
Implemented automated workflows with Power Automate for email processing and task orchestration
Integrated AI Builder to extract invoice data from PDF attachments
Stored and managed data using Excel and OneDrive
Created tasks and approval flows for customer support teams
Sent real-time alerts via Microsoft Teams for urgent requests
Main tools used: Power Automate, AI Builder, NLP, Excel, OneDrive, Microsoft Teams
The system streamlines customer support operations by automating repetitive tasks, improving data consistency, and enabling faster and more structured request resolution
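
The triage logic can be sketched in plain Python to show the shape of the output each email produces. The keyword lists are a deliberately simple stand-in for the NLP classification used in the project, and the 9-digit TIN pattern is an assumption about the identifier format.

```python
import re

# Hypothetical keyword lists; a real system would use a trained classifier.
COMPLAINT_WORDS = {"complaint", "refund", "unacceptable", "damaged", "wrong", "disappointed"}
URGENT_WORDS = {"urgent", "immediately", "asap", "today"}

# Assumption: a TIN is a standalone run of exactly nine digits.
TIN_PATTERN = re.compile(r"\b\d{9}\b")


def triage(email_text):
    """Return category, urgency flag, and extracted TIN for one email body."""
    words = set(re.findall(r"[a-z]+", email_text.lower()))
    tin = TIN_PATTERN.search(email_text)
    return {
        "category": "complaint" if words & COMPLAINT_WORDS else "information request",
        "urgent": bool(words & URGENT_WORDS),
        "tin": tin.group() if tin else None,
    }


result = triage("My order arrived damaged. Please refund me immediately. TIN: 123456789")
print(result)
```

A structured record like this is what a downstream flow could use to route the request, create a task, or trigger an alert for urgent cases.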

Project: 6 - Airline Customer Satisfaction Prediction

Project description:
This is a customer analytics and machine learning project focused on predicting airline passenger satisfaction and identifying the key drivers behind customer experience.

Performed EDA, correlation analysis, and PCA, identifying 6 latent components of service experience
Tested multiple models including Logistic Regression, Decision Trees, Random Forest, Bagging, Gradient Boosting, XGBoost and Neural Networks
Applied GridSearchCV for model tuning
Main tools used: Python, Pandas, Scikit-learn, XGBoost, SHAP
Key findings showed that satisfaction is primarily driven by Seat comfort, Customer Type, Type of Travel and service-related variables, while PCA reduced model performance and neural networks were not necessary given the strong performance of tree-based models.
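
For readers unfamiliar with `GridSearchCV`, the tuning step might look like the sketch below. It runs on synthetic data from `make_classification`, and the model, parameter grid, and scoring metric are illustrative assumptions rather than the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the passenger satisfaction data.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small illustrative grid; every combination is scored with 3-fold cross-validation.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# The best estimator is refit on the full training split by default.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

Keeping the test split out of the search means the reported accuracy is measured on data the tuning never saw.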

Certifications

1st Place in the Identifying Fraud Challenge - Spring 2025

Advanced Proficiency in KNIME Analytics Platform
