This repository contains materials developed for a hands-on workshop that introduces machine learning concepts to biomedical researchers, with practical applications to cancer transcriptomic data.
The workshop was designed to bridge foundational ML theory and real-world genomic datasets, enabling scientists to understand and implement supervised learning models in biological contexts.
Topics covered:
- Logistic Regression
- Perceptron
- Support Vector Machines (SVM)
- AdaBoost
- Model evaluation (confusion matrix, performance metrics)
- Decision boundary visualization
- Concepts of overfitting and generalization
All examples are applied to gene expression data derived from TCGA via the UCSC Xena platform.
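
As a rough illustration of how the topics above fit together, the sketch below trains the four classifiers with scikit-learn and reports train accuracy, test accuracy, and a confusion matrix for each. It is not one of the repository's scripts. The data are synthetic (`make_classification`) and only stand in for an expression matrix; the sample and feature counts, the hyperparameters, and the `results` dictionary are assumptions made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix: 200 samples x 500 "genes".
# Real TCGA data would replace X and y here.
X, y = make_classification(
    n_samples=200, n_features=500, n_informative=20,
    n_redundant=0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit the scaler on the training split only, so no test information leaks in.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Hyperparameters are illustrative defaults, not tuned values.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "perceptron": Perceptron(random_state=0),
    "linear_svm": SVC(kernel="linear"),
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train_s))
    y_pred = model.predict(X_test_s)
    test_acc = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
    results[name] = {"train_acc": train_acc, "test_acc": test_acc, "confusion_matrix": cm}
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f}")
    print(cm)
```

With many more features than samples, a gap between train and test accuracy is a common sign of overfitting, which is why both numbers are reported side by side.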
Transcriptomic data were obtained from the UCSC Xena Browser (https://xenabrowser.net).
Users can select tumor types of interest from the TCGA hub.
Adjustments to the scripts are required when working with tumor types other than the one used in the workshop.
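
A minimal sketch of preparing a downloaded matrix for scikit-learn is shown below. It assumes a tab-separated file with genes as rows and TCGA sample barcodes as columns; the tiny in-memory table, the gene names, the barcodes, and the helper `sample_type_code` are made up for illustration and are not taken from the repository. Labels are derived from the TCGA sample type code in the fourth barcode field, where 01-09 denote tumor samples and 10-19 denote normal samples.

```python
import io

import pandas as pd

# Tiny made-up table in the assumed layout: genes as rows, samples as columns.
# In practice this would be a file downloaded from the TCGA hub.
tsv = io.StringIO(
    "sample\tTCGA-AA-0001-01\tTCGA-AA-0002-01\tTCGA-AA-0003-11\n"
    "GENE_A\t5.1\t6.3\t2.0\n"
    "GENE_B\t0.4\t0.9\t3.7\n"
)

expr = pd.read_csv(tsv, sep="\t", index_col=0)

# scikit-learn expects samples as rows and features (genes) as columns.
X = expr.T


def sample_type_code(barcode: str) -> int:
    """Return the two-digit TCGA sample type code (fourth barcode field)."""
    return int(barcode.split("-")[3][:2])


# Codes 01-09 are tumor samples; 10-19 are normal samples.
codes = X.index.map(sample_type_code)
y = pd.Series((codes < 10).astype(int), index=X.index, name="is_tumor")

print(X)
print(y)
```

Tumor types with few or no normal samples would need a different labeling scheme, such as a clinical or subtype annotation.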
    python3 scripts/preprocessamento.py
    python3 scripts/classificador.py
    python3 scripts/perceptron_code.py
    python3 scripts/exemplo_perceptron_sklearn.py
    python3 scripts/exemplo_ada_sklearn.py
    python3 scripts/exemplo_log_sklearn.py
    python3 scripts/exemplo_svm_sklearn.py

    jupyter notebook

Open the notebook inside the notebooks/ directory and execute interactively.
This workshop was designed to:
- Provide intuition behind linear classifiers
- Demonstrate model behavior in high-dimensional biological data
- Introduce supervised learning workflows
- Promote computational literacy in genomics research environments
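
To give a feel for the intuition behind linear classifiers, here is a small from-scratch perceptron in NumPy. It is a sketch written for this README, not the code in `scripts/perceptron_code.py`; the function names, the toy two-dimensional data, the learning rate, and the epoch count are all assumptions. The model keeps a weight vector `w` and a bias `b`, and nudges them toward any sample it misclassifies.

```python
import numpy as np


def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic perceptron rule; y must contain the labels -1 and +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            # A sample is misclassified when it lies on the wrong side
            # of (or exactly on) the hyperplane w.x + b = 0.
            if yi * (xi @ w + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return w, b


def predict(X, w, b):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return np.where(X @ w + b >= 0, 1, -1)


# Toy, well-separated data: two Gaussian clouds in two dimensions.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=2.0, scale=0.5, size=(20, 2))
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [-1] * 20)

w, b = train_perceptron(X, y)
accuracy = (predict(X, w, b) == y).mean()
print("weights:", w, "bias:", b, "training accuracy:", accuracy)
```

The decision boundary is the line `w[0] * x1 + w[1] * x2 + b = 0`. On data that are not linearly separable the updates never settle, which is one motivation for the margin-based and ensemble methods covered in the workshop.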
Notes
This repository reflects an educational initiative within a research laboratory setting and is intended for instructional and demonstration purposes.