Biostatistician · Statistical Analyst · Healthcare Research & Data Analytics
MS Business Analytics & AI student at UT Dallas and co-author of a faculty-led manuscript currently under peer review, applying statistical analysis plans, regression modeling, and reproducible Python/SQL workflows to healthcare operational datasets. Previously spent 2 years as a BI Developer at Samsung SDS, delivering enterprise sales dashboards and ETL pipelines across national retail operations.
I focus on the intersection of statistical rigor and real-world healthcare impact — from SAP development and clinical data analysis to dashboard delivery and data governance.
- Co-Author, Manuscript Under Peer Review (UTD Optimization & Scheduling Lab): developed SAPs, regression models (linear and logistic, 13+ assumptions), and reproducible analytics pipelines on 3+ healthcare demand datasets
- Samsung SDS BI Developer: built QlikView/Tableau dashboards, wrote enterprise SQL on the SAMS data warehouse, and standardized ETL; reduced reporting workload by 30% and preprocessing overhead by 20%
- Statistical Analysis Plans (SAPs): population-level healthcare demand modeling with version-controlled, auditable code
- 95%+ variable-dictionary completion across 3+ demand sources with full documentation and data governance
| Project | Description | Stack |
|---|---|---|
| Healthcare Analytics SQL | SQL-driven analysis of healthcare operational datasets surfacing clinical and demand insights | Python · SQL |
| AI Nutrition Companion | ML classification across 20+ population health variables; interactive web dashboard for health insights | Python · Flask · ML |
| SBA Loan Default Prediction | XGBoost binary classifier with SHAP explainability; 15+ engineered features, optimized for AUCPR | Python · XGBoost · SHAP |
| Conagra Market Analysis | Lasso/ElasticNet regression to quantify CPG sales drivers; ranked feature coefficients for commercial action | Python · scikit-learn |
| Flight & Weather Delay Analysis | Merged DOT flight performance data with NOAA weather records to model seasonal delay patterns | Python · Pandas |
| Transit Operations Study | ETL across 7+ longitudinal datasets; statistical analysis raised system efficiency by 10% | Python · NumPy |
Statistical & Research Methods
Statistical Analysis Plans (SAPs) · Regression Modeling · Hypothesis Testing · Biostatistics · QA/QC · Manuscript Preparation
Programming & Querying
R · Python · SAS · SQL · Stata
Visualization & BI
Tableau · QlikView · Matplotlib · Seaborn · Plotly
Databases & Tools
MySQL · PostgreSQL · Git · ETL Pipelines · Google Analytics
Open to Biostatistician, Statistical Analyst, and Healthcare Data Analytics roles in the Dallas–Fort Worth area and beyond. Reach me at mvpk240054@gmail.com or connect on LinkedIn.