MSc Data Science, Universiti Malaya
Based in Malaysia
Data science, machine learning, and analytics for real-world decision making
With a background in biotechnology and scientific research, I apply data science methods to analyze complex systems and support data-driven decisions.
My work spans time-series modeling, machine learning, and data analytics, combining structured analysis with domain knowledge. I focus on building reproducible workflows, interpretable models, and practical analytics solutions.
- Languages: Python, R, SQL
- Libraries: Pandas, NumPy, Scikit-learn, Statsmodels, Matplotlib, Seaborn
- Data & BI Tools: Power BI, Excel, Jupyter Notebook, MySQL Workbench
- Tools & Platforms: Git, VS Code
- Techniques: Data Cleaning, Feature Engineering, Machine Learning (Classification), Time Series Forecasting (ARIMA, ETS), Econometric Modeling (ARDL, VAR), SQL Window Functions, Data Visualization
- Other: Agile, Google-Certified Project Management, Scientific Documentation
🔹 Fuel Price Pass-Through & Inflation in Malaysia
Research project analyzing how global oil prices and exchange rates influence Malaysian fuel prices and transport inflation.
- Built a multi-source dataset integrating oil prices, FX rates, fuel prices, and CPI
- Developed time-series forecasting models (ARIMA, ETS)
- Estimated pass-through effects using ARDL and VAR models
- Evaluated models using RMSE, MAE, and MAPE
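
The three error metrics named above can be computed as in the sketch below. It is a minimal, standard-library illustration and not code from the project: the function name `forecast_errors` and the sample numbers are assumptions made up for the example.

```python
import math


def forecast_errors(actual, predicted):
    """Return RMSE, MAE, and MAPE (in percent) for two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # MAPE divides by the actual values, so it is undefined if any of them is zero.
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")

    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"rmse": rmse, "mae": mae, "mape": mape}


# Made-up numbers for illustration only (not results from the project).
actual = [2.0, 4.0, 5.0, 10.0]
predicted = [2.5, 3.0, 5.0, 8.0]
metrics = forecast_errors(actual, predicted)
print(metrics)
```

RMSE penalizes large misses more heavily than MAE, while MAPE is scale-free, which is one reason the three are often reported together when comparing forecasting models.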
🔹 Logistics Inventory Data Analysis (SQL + Power BI)
SQL and Power BI analysis of shipment lead times, delay rates, inventory days, and SKU performance for a retail logistics context.
🔹 Recipe Site Traffic Prediction (Machine Learning + KPI)
Classified high-traffic recipes using Logistic Regression, Decision Tree, and Random Forest. Defined a business KPI, High Traffic Conversion Rate (HTCR), to align model precision with strategy. Best model: Logistic Regression (Precision = 0.88, HTCR = 7.13)
🔹 Telecom Customer Churn Analysis
Predictive model to identify customers at risk of churn using billing and usage patterns.
F1 Score: 0.85 | Key tools: Python, Random Forest, Seaborn
🔹 Insurance Claim Outcome Modeling
Built classifiers to predict insurance claims and explored risk segmentation.
Accuracy > 75% | SMOTE for class balancing
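
For readers unfamiliar with SMOTE, the sketch below shows the core idea: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority neighbours. It is a simplified, standard-library illustration, not the project's code; the function name `smote_sketch`, its parameters, and the sample points are all hypothetical. The source does not say which implementation was used; in practice a library such as imbalanced-learn is the usual choice.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, oversampling of this kind should be applied to the training split only, so that no interpolated points leak into evaluation data.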
EDA and visualization of global trends across genres, ratings, and durations.
Clear dashboards to support content strategy decisions
🔹 Penguin Clustering (PCA + K-Means)
Unsupervised learning project that groups penguins into candidate species clusters based on biometric traits.
🔹 Crop Yield Prediction (Regression)
Modeled yield based on environmental factors to support precision agriculture.
More projects are available in the Repositories tab.
- Dashboard deployment with Streamlit
- Power BI data storytelling
- Model interpretability (feature importance, SHAP)
- Data pipelines and workflow automation
- Email: yievia@gmail.com
- LinkedIn: View Here
- DataCamp Portfolio: View Here
