Applied Machine Learning for Transcriptomic Data

This repository contains materials developed for a hands-on workshop introducing machine learning concepts to biomedical researchers, with practical applications in cancer transcriptomic data.

The workshop was designed to bridge foundational ML theory and real-world genomic datasets, enabling scientists to understand and implement supervised learning models in biological contexts.

Overview

Topics covered:

Logistic Regression
Perceptron
Support Vector Machines (SVM)
AdaBoost
Model evaluation (confusion matrix, performance metrics)
Decision boundary visualization
Concepts of overfitting and generalization

All examples are applied to gene expression data derived from TCGA via the UCSC Xena platform.
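The evaluation topics listed above (confusion matrix, performance metrics) can be sketched without any library. The following is a minimal illustration on invented labels, not code taken from the workshop scripts; the function names `confusion_counts` and `summarize` are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Derive common metrics; a ratio with a zero denominator is reported as 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,  # also called sensitivity
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy labels (1 = tumor, 0 = normal), invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(*counts)
print(counts)
print(metrics)
```

Reporting recall and specificity alongside accuracy matters when tumor and normal samples are imbalanced, which is common in TCGA cohorts.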

Data Source

Transcriptomic data were obtained from:

UCSC Xena Browser
https://xenabrowser.net

Users can select tumor types of interest from the TCGA hub.

⚠️ The current preprocessing scripts are configured for gastrointestinal tumor samples.
They must be adjusted before use with other tumor types.
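One step that tends to carry over between tumor types is deriving tumor/normal labels from the sample identifiers. The sketch below assumes the downloaded matrix uses standard TCGA barcodes as sample IDs, where the fourth dash-separated field begins with a two-digit sample type code (01 to 09 for tumor, 10 to 19 for normal). It is not taken from scripts/preprocessamento.py; the function name `sample_class` and the example barcodes are hypothetical.

```python
def sample_class(barcode):
    """Return 'tumor', 'normal', or 'other' for a sample-level TCGA barcode."""
    fields = barcode.split("-")
    # The sample field (4th) starts with the sample type code, e.g. "01" or "11A".
    if len(fields) < 4 or not fields[3][:2].isdigit():
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    code = int(fields[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return "other"  # e.g. control samples


# Hypothetical sample IDs, made up for illustration.
samples = [
    "TCGA-XX-0001-01",
    "TCGA-XX-0001-11",
    "TCGA-XX-0002-01A",
    "TCGA-XX-0003-20",
]
labels = {s: sample_class(s) for s in samples}
print(labels)
```

Samples labelled "other" would normally be dropped before training a tumor-versus-normal classifier.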

Running the Analysis

1. Preprocess Data

python3 scripts/preprocessamento.py

2. Train Logistic Regression Model

python3 scripts/classificador.py
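For readers who want to see what training a logistic regression model involves, here is a minimal sketch using batch gradient descent on the log-loss. It runs on synthetic data standing in for an expression matrix and is not the implementation in scripts/classificador.py; the names `fit_logistic`, `lr`, and `epochs` are assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        error = sigmoid(X @ w + b) - y      # gradient of the loss w.r.t. the score
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b


# Synthetic stand-in for expression data: 100 samples x 5 "genes".
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (rng.random(100) < 0.5).astype(float)
X[:, 0] += 2.0 * y                          # only gene 0 differs between classes

w, b = fit_logistic(X, y)
pred = (sigmoid(X @ w + b) >= 0.5).astype(float)
train_accuracy = (pred == y).mean()
print(w, b, train_accuracy)
```

The learned weight for the first feature should come out positive, since that is the only feature shifted between the two classes.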

3. Train Perceptron

python3 scripts/perceptron_code.py
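The perceptron learning rule itself is short enough to show in full. The sketch below is a generic version of the classic mistake-driven update on a tiny, linearly separable toy set; it is an assumption-laden illustration and not the contents of scripts/perceptron_code.py. The function name `train_perceptron` is hypothetical.

```python
import numpy as np


def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic perceptron; labels in y must be -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            # A sample is misclassified (or on the boundary) when yi * score <= 0.
            if yi * (xi @ w + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
                mistakes += 1
        if mistakes == 0:   # converged: every sample is on the correct side
            break
    return w, b


# Linearly separable toy data (two features), invented for illustration.
X = np.array([[2.0, 1.0], [3.0, 2.0], [2.5, 3.0],
              [-1.0, -2.0], [-2.0, -1.0], [-3.0, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

w, b = train_perceptron(X, y)
pred = np.where(X @ w + b > 0, 1, -1)
print(w, b, pred)
```

On data that is not linearly separable the loop never reaches zero mistakes, so the `epochs` cap is what stops training.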

4. Run scikit-learn Examples

python3 scripts/exemplo_perceptron_sklearn.py
python3 scripts/exemplo_ada_sklearn.py
python3 scripts/exemplo_log_sklearn.py
python3 scripts/exemplo_svm_sklearn.py
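As a rough picture of what the four scikit-learn examples cover, the sketch below fits each model family on the same synthetic dataset and collects accuracy and a confusion matrix. It is an assumption about the general workflow, not a reproduction of the scripts above; the dataset, hyperparameters, and the `results` dictionary are all made up for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix: 200 samples x 50 features.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "perceptron": Perceptron(random_state=0),
    "svm_linear": SVC(kernel="linear"),
    "adaboost": AdaBoostClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # Scaling is fitted on the training split only, inside the pipeline.
    clf = make_pipeline(StandardScaler(), model)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion": confusion_matrix(y_test, y_pred),
    }

for name, res in results.items():
    print(name, round(res["accuracy"], 3))
    print(res["confusion"])
```

Wrapping the scaler and the model in one pipeline keeps test-set statistics out of the preprocessing step, which matters when comparing models fairly.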

5. Interactive Notebook

jupyter notebook

Open Hands-On Machine Learning.ipynb (at the repository root) and run its cells interactively.

Educational Objective

This workshop was designed to:

Provide intuition behind linear classifiers
Demonstrate model behavior in high-dimensional biological data
Introduce supervised learning workflows
Promote computational literacy in genomics research environments
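The point about high-dimensional data can be made concrete with a small experiment: when there are far more features than samples, a linear model can fit labels that are pure noise. The sketch below uses a minimum-norm least-squares classifier as a simple stand-in (it is not one of the workshop models) on random data with random labels; all sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, n_test, n_genes = 20, 200, 200   # many more features than training samples

# Features and labels are independent noise: there is nothing real to learn.
X_train = rng.normal(size=(n_train, n_genes))
X_test = rng.normal(size=(n_test, n_genes))
y_train = rng.choice([-1.0, 1.0], size=n_train)
y_test = rng.choice([-1.0, 1.0], size=n_test)

# Minimum-norm least-squares solution; with n_genes > n_train it can
# reproduce the training labels exactly.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_acc = np.mean(np.sign(X_train @ w) == y_train)
test_acc = np.mean(np.sign(X_test @ w) == y_test)
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

Perfect training accuracy alongside roughly chance-level test accuracy is the signature of overfitting, and it is why held-out evaluation is essential for transcriptomic classifiers.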

Notes

This repository reflects an educational initiative within a research laboratory setting and is intended for instructional and demonstration purposes.
