yievia/README.md

Hi there, I'm Xin Yie 👋

🎓 MSc Data Science, Universiti Malaya
🌏 Based in Malaysia
📊 Data science, machine learning, and analytics for real-world decision making


🚀 About Me

With a background in biotechnology and scientific research, I apply data science methods to analyze complex systems and support data-driven decisions.

My work spans time-series modeling, machine learning, and data analytics, combining structured analysis with domain knowledge. I focus on building reproducible workflows, interpretable models, and practical analytics solutions.


🧠 Core Skills

  • Languages: Python, R, SQL
  • Libraries: Pandas, NumPy, Scikit-learn, Statsmodels, Matplotlib, Seaborn
  • Data & BI Tools: Power BI, Excel, Jupyter Notebook, MySQL Workbench
  • Tools & Platforms: Git, VS Code
  • Techniques: Data Cleaning, Feature Engineering, Machine Learning (Classification), Time Series Forecasting (ARIMA, ETS), Econometric Modeling (ARDL, VAR), SQL Window Functions, Data Visualization
  • Other: Agile, Google-Certified Project Management, Scientific Documentation

📌 Featured Projects

🔹 Fuel Price Pass-Through & Inflation in Malaysia

Research project analyzing how global oil prices and exchange rates influence Malaysian fuel prices and transport inflation.

  • Built a multi-source dataset integrating oil prices, FX rates, fuel prices, and CPI
  • Developed time-series forecasting models (ARIMA, ETS)
  • Estimated pass-through effects using ARDL and VAR models
  • Evaluated models using RMSE, MAE, and MAPE
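
As a rough illustration of the three error metrics named above, the sketch below computes RMSE, MAE, and MAPE in plain Python. The function name `forecast_errors` and the price values are hypothetical and are not taken from the project; the thesis code may well use library implementations instead.

```python
import math


def forecast_errors(actual, predicted):
    """Return RMSE, MAE, and MAPE (in percent) for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # MAPE divides by the actual value, so zeros make it undefined.
        raise ValueError("MAPE is undefined when an actual value is zero")

    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"rmse": rmse, "mae": mae, "mape": mape}


# Hypothetical prices for illustration only (not project data).
actual = [2.00, 2.10, 2.20, 2.50]
predicted = [2.10, 2.10, 2.00, 2.50]
metrics = forecast_errors(actual, predicted)
print(metrics)
```

RMSE penalizes large misses more heavily than MAE, while MAPE expresses the error relative to the actual level, which is why the three are often reported together.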

🔹 Logistics Inventory Data Analysis (SQL + Power BI)

SQL and Power BI analysis of shipment lead times, delay rates, inventory days, and SKU performance for a retail logistics context.
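
The project itself uses SQL and Power BI. Purely to illustrate what two of these measures can mean, here is a small Python sketch under assumed definitions: lead time as the number of days from order to delivery, and delay rate as the share of shipments delivered after their promised date. The records and the helper `lead_time_days` are hypothetical.

```python
from datetime import date

# Hypothetical shipment records: (order date, promised date, delivered date).
shipments = [
    (date(2024, 1, 1), date(2024, 1, 5), date(2024, 1, 4)),
    (date(2024, 1, 2), date(2024, 1, 6), date(2024, 1, 9)),
    (date(2024, 1, 3), date(2024, 1, 8), date(2024, 1, 8)),
    (date(2024, 1, 4), date(2024, 1, 7), date(2024, 1, 10)),
]


def lead_time_days(ordered, delivered):
    """Days between placing the order and delivery (assumed definition)."""
    return (delivered - ordered).days


lead_times = [lead_time_days(ordered, delivered) for ordered, _, delivered in shipments]
avg_lead_time = sum(lead_times) / len(lead_times)

# A shipment counts as delayed only if it arrives after the promised date.
delayed = sum(1 for _, promised, delivered in shipments if delivered > promised)
delay_rate = delayed / len(shipments)

print(f"Average lead time: {avg_lead_time:.2f} days, delay rate: {delay_rate:.0%}")
```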

🔹 Recipe Site Traffic Prediction (Machine Learning + KPI)

Classified high-traffic recipes using Logistic Regression, Decision Tree, and Random Forest. Defined a business KPI, High Traffic Conversion Rate (HTCR), to align model precision with strategy.
→ Best model: Logistic Regression (Precision = 0.88, HTCR = 7.13)
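
For readers unfamiliar with the precision figure quoted above, the sketch below shows how precision for the "High" class could be computed from labels and predictions. The labels are made up for illustration, and the function is a plain-Python stand-in rather than the project's code. HTCR is not sketched here because its formula is not stated in this README.

```python
def precision(y_true, y_pred, positive="High"):
    """Share of predicted-positive items that are truly positive: TP / (TP + FP)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if true_pos + false_pos == 0:
        raise ValueError("precision is undefined without positive predictions")
    return true_pos / (true_pos + false_pos)


# Hypothetical labels for illustration only (not project data).
y_true = ["High", "High", "Low", "High", "Low", "Low"]
y_pred = ["High", "High", "High", "Low", "Low", "Low"]
score = precision(y_true, y_pred)
print(f"Precision for 'High': {score:.2f}")
```

Precision is a natural fit when the cost of promoting a recipe that turns out to be low-traffic is higher than the cost of missing a popular one.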

🔹 Telecom Customer Churn Analysis

Predictive model to identify customers at risk of churn using billing and usage patterns.
F1 score: 0.85 | Key tools: Python, Random Forest, Seaborn

🔹 Insurance Claim Outcome Modeling

Built classifiers to predict insurance claims and explored risk segmentation.
Accuracy > 75% | SMOTE for class balancing

🔹 Netflix Content Trends

EDA and visualization of global trends across genres, ratings, and durations.
Clear dashboards to support content strategy decisions

🔹 Penguin Clustering (PCA + K-Means)

Unsupervised learning project that groups penguins into natural clusters based on biometric traits.

🔹 Crop Yield Prediction (Regression)

Modeled yield based on environmental factors to support precision agriculture.

🧠 More projects are available in the Repositories tab


📈 Currently Exploring

  • Dashboard deployment with Streamlit
  • Power BI data storytelling
  • Model interpretability (feature importance, SHAP)
  • Data pipelines and workflow automation

📬 Get in Touch


Pinned

  1. fuel-subsidy-pass-through-malaysia (Public)

    Reproducible Python code for Master of Data Science thesis on fuel price pass-through and inflation in Malaysia (2020–2025)

    Jupyter Notebook

  2. recipe-traffic-prediction (Public)

    Predicting recipe website traffic (High vs Low) using nutritional and categorical features with Logistic Regression, Decision Tree, and Random Forest.

    Jupyter Notebook

  3. logistics-inventory-data-analysis (Public)

    SQL and Power BI analysis of shipment lead times, delay rates, inventory days, and SKU performance for a retail logistics context.

  4. telecom-customer-churn (Public)

    This project uses real-world telecom customer data to predict churn behavior using machine learning. It includes data cleaning, exploratory data analysis (EDA), feature engineering, model training …

    Jupyter Notebook

  5. car-insurance-claim-predictor (Public)

    Predicting car insurance claims using single-feature logistic regression. Identify the most informative predictor to build simple, production-ready models.

    Jupyter Notebook

  6. penguin-species-clustering (Public)

    Identify natural groupings of Antarctic penguins using k-means clustering based on physical measurements.

    Jupyter Notebook