zhengbrody/README.md

Zheng Dong

UC San Diego ECE M.S. | Machine Learning & Data Science | LLM Systems, Trustworthy ML, Data-centric AI

LinkedIn GitHub

Based in San Diego. I am looking for Fall 2026 research opportunities with UC San Diego labs working on reliable AI systems, LLM agents, data-centric machine learning, and ML for scientific or high-stakes decision support.


Research Profile

I build machine learning systems where model outputs need to be grounded in source data, deterministic computation, and measurable evaluation. My background combines UCSD graduate coursework in machine learning for physical applications, statistical learning, big network data, and computational evolutionary biology with applied projects in LLM agents, RAG, entity extraction, risk analytics, and high-throughput ML services.

Current research interests:

  • Reliable LLM and agent systems: tool use, retrieval, evaluation, hallucination control, and auditable decision pipelines.
  • Trustworthy and responsible ML: confidence calibration, human-in-the-loop review, interpretability, fairness monitoring, and robust evaluation.
  • Data-centric AI and ML systems: document/entity extraction, weak supervision, data pipelines, experiment tracking, and scalable model serving.
  • ML for physical and high-stakes systems: time-series anomaly detection, portfolio risk analytics, energy dispatch, and scientific machine learning.

Research-Relevant Projects

| Project | Focus | Why it matters |
| --- | --- | --- |
| MindMarket / PersonalFinancialRiskManagement | LLM-grounded risk analytics | Combines deterministic VaR/CVaR, factor beta, and stress-scenario computation with an AI copilot that is grounded on computed evidence rather than free-form numerical generation. Relevant to reliable AI for high-stakes decision support. |
| Multimodal RAG System | Retrieval, embeddings, and evaluation | Integrates CLIP vision embeddings, text embeddings, rank-fusion reranking, FastAPI serving, and Recall@5 evaluation. Relevant to LLM reliability, retrieval quality, and grounded generation. |
| EDAgent | Tool-using optimization agent | Uses a ReAct-style agent with CVXPY tools for economic dispatch, separating numerical optimization from language generation. Relevant to AI agents for scientific/engineering workflows. |
| Kaggle LMSYS Chatbot Arena | LLM preference modeling | Kaggle Silver Medal project involving large-scale model inference, preference prediction, and performance optimization. Relevant to LLM evaluation and ML systems. |
| Amazon Review Deception Detection | NLP classification | Applies text classification to deceptive-review detection, connecting NLP, trust, and high-signal evaluation. Relevant to data-centric NLP and trustworthy ML. |
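
To illustrate what "deterministic VaR/CVaR" means in the first project above, here is a minimal stand-alone sketch of historical VaR and CVaR computed from a return series. It is not the code in PersonalFinancialRiskManagement: the function name `historical_var_cvar`, the sample returns, and the choice of a plain empirical quantile without interpolation are all assumptions made for illustration.

```python
import math


def historical_var_cvar(returns, alpha=0.95):
    """Return (VaR, CVaR) at confidence level alpha, both as positive losses.

    Hypothetical helper: uses a plain empirical quantile without interpolation.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")

    # Convert returns to losses and sort from smallest to largest loss.
    losses = sorted(-r for r in returns)

    # Position of the alpha-quantile loss in the sorted sample.
    idx = max(math.ceil(alpha * len(losses)) - 1, 0)

    var = losses[idx]
    tail = losses[idx:]            # losses at or beyond VaR
    cvar = sum(tail) / len(tail)   # average loss in that tail
    return var, cvar


# Made-up daily returns, for illustration only.
sample_returns = [-0.10, -0.05, -0.02, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
var_80, cvar_80 = historical_var_cvar(sample_returns, alpha=0.8)
print(f"80% VaR: {var_80:.4f}, 80% CVaR: {cvar_80:.4f}")
```

Because the numbers come from a fixed computation like this one, a language model can be asked to explain them rather than produce them, which is the separation the project description refers to.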

Selected Experience

| Role | Research-relevant work |
| --- | --- |
| Machine Learning Engineer Intern, Permitfolio | Built an auditable LLM-assisted asset-classification service for RegTech compliance with human-in-the-loop review, confidence calibration, review flags, model-version tracking, and reproducible tests. |
| Machine Learning Research Assistant, Penn State | Built an NLP entity-extraction pipeline with spaCy, Transformers/BERT fine-tuning, Kafka streaming, MLflow experiment tracking, Optuna tuning, and evaluation metrics. |
| Machine Learning Engineer Intern, Allianz Insurance | Built fraud detection and model-serving pipelines with privacy-preserving feature engineering, fairness monitoring, Redis-backed serving, and latency optimization. |

Technical Stack

| Area | Tools |
| --- | --- |
| ML / AI | PyTorch, Transformers, BERT, LLMs, RAG, CLIP, embeddings, scikit-learn, MLflow, Optuna |
| Data / Systems | Python, SQL, Kafka, FastAPI, Flask, PostgreSQL, Redis, PySpark, Docker, Linux |
| Evaluation | Recall@K, F1, precision/false-positive analysis, confidence calibration, fairness metrics, pytest, JMeter |
| Engineering | TypeScript, Next.js, GitHub Actions, AWS EC2/Lambda/S3, CI/CD, observability |
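
As a concrete reference for the Recall@K metric listed under Evaluation, the sketch below computes it for a single query and averages it over several. It assumes the "fraction of relevant items found in the top K" definition; some retrieval benchmarks instead use a hit-rate variant (1 if any relevant item appears in the top K), and this profile does not say which one the projects use. The function names and document IDs are hypothetical.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant items that appear in the top-k retrieved items."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant must contain at least one item")
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]                       # ranked list, best match first
    hits = len(relevant.intersection(top_k))    # relevant items actually retrieved
    return hits / len(relevant)


def mean_recall_at_k(queries, k=5):
    """Average recall@k over an iterable of (retrieved, relevant) pairs."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in queries]
    if not scores:
        raise ValueError("queries must be non-empty")
    return sum(scores) / len(scores)


# Made-up document IDs, for illustration only.
ranked = ["d3", "d7", "d1", "d9", "d2", "d4"]
gold = {"d1", "d4"}
print(recall_at_k(ranked, gold, k=5))
```

In this example only `d1` falls inside the top five results, so half of the relevant set is recovered at K = 5.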

Coursework

UC San Diego ECE M.S. coursework includes Machine Learning for Physical Applications, Computational Evolutionary Biology, Machine Learning for Music, Statistical Learning, Big Network Data, Linear Algebra and Applications, and Smart Grids.


Connect

I am open to UC San Diego research collaborations starting Fall 2026, especially projects that can lead to a workshop, conference, or systems paper.


Copyright © 2026 Zheng Dong. This profile README is personal portfolio content and is not licensed for reuse, redistribution, or impersonation.

Pinned

  1. multimodal-rag-system (Public)

    Multimodal RAG system with CLIP/text embeddings, rank-fusion reranking, FastAPI serving, and Recall@5 evaluation.

    Python, 41 stars, 4 forks

  2. kaggle-LMSYS---Chatbot-Arena-Human-Preference-Predictions (Public)

    Kaggle Silver Medal LLM preference prediction project with large-scale inference and performance optimization.

    Jupyter Notebook, 1 star

  3. PersonalFinancialRiskManagement (Public)

    LLM-grounded portfolio risk analytics with deterministic VaR/CVaR, stress scenarios, and evidence-based AI copilot.

    Python, 1 star

  4. Amazon-Review-Deception-Detection (Public)

    NLP text-classification project for deceptive-review detection, focused on trust, evaluation, and data quality.

    Jupyter Notebook, 1 star

  5. ED-agent (Public)

    ReAct-based economic dispatch agent using CVXPY tools for reliable optimization-grounded AI workflows.

    Python

  6. Real-Time-Personalized-Fitness-Plan-Optimization-with-RL (Public)

    Jupyter Notebook