helenabea/applied-machine-learning-for-transcriptomics

Applied Machine Learning for Transcriptomic Data

This repository contains materials developed for a hands-on workshop introducing machine learning concepts to biomedical researchers, with practical applications in cancer transcriptomic data.

The workshop was designed to bridge foundational ML theory and real-world genomic datasets, enabling scientists to understand and implement supervised learning models in biological contexts.


Overview

Topics covered:

  • Logistic Regression
  • Perceptron
  • Support Vector Machines (SVM)
  • AdaBoost
  • Model evaluation (confusion matrix, performance metrics)
  • Decision boundary visualization
  • Concepts of overfitting and generalization

All examples use gene expression data from The Cancer Genome Atlas (TCGA), obtained through the UCSC Xena platform.
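As a concrete illustration of the model evaluation topic, the sketch below counts the four cells of a binary confusion matrix and derives the usual metrics from them. The labels are made up for illustration (1 = tumor, 0 = normal is an assumed encoding), and the helper names `confusion_counts` and `summarize` are hypothetical; they are not taken from the workshop scripts.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Derive accuracy, precision, recall and F1 from the four counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for ten samples (assumed encoding: 1 = tumor, 0 = normal).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
scores = summarize(*counts)
print(counts)  # (3, 1, 1, 5)
print(scores)
```

With imbalanced classes, which are common when normal tissue samples are scarce, precision and recall are usually more informative than accuracy alone.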


Data Source

Transcriptomic data were obtained from:

UCSC Xena Browser
https://xenabrowser.net

Users can select tumor types of interest from the TCGA hub.

⚠️ The preprocessing scripts are currently configured for gastrointestinal tumor samples.
They must be adjusted before use with other tumor types.


Running the Analysis

1. Preprocess Data

python3 scripts/preprocessamento.py
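
What `preprocessamento.py` does internally is not described here, so the following is only a sketch of one common preprocessing step, offered under assumptions: deriving tumor/normal labels from TCGA sample barcodes. In the TCGA barcode scheme the fourth hyphen-separated field starts with a two-digit sample type code, where 01 to 09 denote tumor samples and 10 to 19 denote normal samples. The function names and the example barcodes below are hypothetical.

```python
def sample_type_code(barcode):
    """Extract the two-digit sample type code from a TCGA barcode."""
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    # The fourth field may carry a vial letter (e.g. "01A"); keep the digits.
    return int(parts[3][:2])


def label_from_barcode(barcode):
    """Return 1 for tumor, 0 for normal, None for anything else."""
    code = sample_type_code(barcode)
    if 1 <= code <= 9:
        return 1
    if 10 <= code <= 19:
        return 0
    return None  # control samples and other codes are left unlabeled


# Hypothetical barcodes, for illustration only.
barcodes = ["TCGA-XX-0001-01", "TCGA-XX-0002-11A", "TCGA-XX-0003-20"]
labels = [label_from_barcode(b) for b in barcodes]
print(labels)  # [1, 0, None]
```

Samples that map to `None` would typically be dropped before training a binary classifier.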

2. Train Logistic Regression Model

python3 scripts/classificador.py
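
For readers who want the idea before running the script, here is a minimal sketch of logistic regression trained by batch gradient descent on synthetic data. It is not the code in `classificador.py`; the function `fit_logistic`, the learning rate, the iteration count and the simulated two-class data are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Fit weights and bias by gradient descent on the mean log-loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        error = sigmoid(X @ w + b) - y      # predicted probability minus label
        w -= lr * (X.T @ error) / n_samples  # gradient w.r.t. the weights
        b -= lr * error.mean()               # gradient w.r.t. the bias
    return w, b


# Synthetic stand-in for expression data: two classes, five features.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-1.0, 1.0, size=(50, 5)),  # class 0
    rng.normal(1.0, 1.0, size=(50, 5)),   # class 1
])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = fit_logistic(X, y)
pred = (sigmoid(X @ w + b) >= 0.5).astype(int)
train_accuracy = (pred == y).mean()
print(f"training accuracy: {train_accuracy:.2f}")
```

The accuracy printed here is measured on the training data, so it says nothing about generalization; a held-out test set is needed for that.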

3. Train Perceptron

python3 scripts/perceptron_code.py
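
The perceptron learning rule itself fits in a few lines. The sketch below is an illustrative implementation on a tiny, linearly separable toy dataset, not the contents of `perceptron_code.py`; the function `fit_perceptron` and the six data points are assumptions. Labels are encoded as +1 and -1.

```python
import numpy as np


def fit_perceptron(X, y, lr=1.0, max_epochs=100):
    """Classic perceptron rule: update only on misclassified samples."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # wrong side of (or on) the boundary
                w += lr * yi * xi
                b += lr * yi
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: converged
            break
    return w, b


# Toy, linearly separable data (made up for illustration).
X = np.array([[2.0, 1.0], [1.0, 3.0], [3.0, 3.0],
              [-1.0, -2.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

w, b = fit_perceptron(X, y)
pred = np.where(X @ w + b > 0, 1, -1)
print(w, b)  # [2. 1.] 1.0
```

The learned decision boundary is the line where `w @ x + b` equals zero. On data that are not linearly separable the loop never reaches a mistake-free pass, which is why `max_epochs` caps the number of passes.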

4. Run scikit-learn Examples

python3 scripts/exemplo_perceptron_sklearn.py
python3 scripts/exemplo_ada_sklearn.py
python3 scripts/exemplo_log_sklearn.py
python3 scripts/exemplo_svm_sklearn.py
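
To give a sense of how the four model families compare under one workflow, the sketch below fits scikit-learn's `Perceptron`, `LogisticRegression`, `SVC` and `AdaBoostClassifier` on the same synthetic dataset and reports held-out accuracy. It is an assumption-laden illustration rather than a reproduction of the `exemplo_*` scripts: the data come from `make_classification`, and the hyperparameters are arbitrary defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x features).
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Linear models and SVMs are scale-sensitive, so they are wrapped in a
# pipeline that standardizes features using training-set statistics only.
models = {
    "Perceptron": make_pipeline(StandardScaler(), Perceptron(random_state=0)),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM (linear kernel)": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: {test_accuracy[name]:.2f}")
```

Putting the scaler inside the pipeline keeps test-set information out of the fitting step, which matters when the reported accuracy is meant to estimate generalization.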

5. Interactive Notebook

jupyter notebook

Open the notebook in the notebooks/ directory and run its cells interactively.

Educational Objective

This workshop was designed to:

  • Provide intuition behind linear classifiers
  • Demonstrate model behavior in high-dimensional biological data
  • Introduce supervised learning workflows
  • Promote computational literacy in genomics research environments
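
One behavior worth seeing first-hand in high-dimensional data is how easily a linear model memorizes noise when features far outnumber samples, as is typical of transcriptomic datasets. The sketch below is a simulation under stated assumptions, not workshop code: it draws random "expression" values and random labels, then fits a minimum-norm least-squares linear classifier (a stand-in chosen for brevity, not one of the workshop models).

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, n_test, n_genes = 20, 200, 500  # far more features than samples

# Pure noise: features and labels are drawn independently,
# so there is nothing real to learn.
X_train = rng.normal(size=(n_train, n_genes))
X_test = rng.normal(size=(n_test, n_genes))
y_train = rng.choice([-1.0, 1.0], size=n_train)
y_test = rng.choice([-1.0, 1.0], size=n_test)

# Minimum-norm least-squares solution; with more features than samples
# it can reproduce the training labels exactly.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_accuracy = np.mean(np.sign(X_train @ w) == y_train)
test_accuracy = np.mean(np.sign(X_test @ w) == y_test)
print(f"train accuracy: {train_accuracy:.2f}")
print(f"test accuracy:  {test_accuracy:.2f}")
```

The training accuracy is perfect even though the labels are random, while the test accuracy stays near chance. This gap is the reason held-out evaluation, regularization and feature selection matter so much with gene expression data.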

Notes

This repository reflects an educational initiative within a research laboratory setting and is intended for instructional and demonstration purposes.

About

This repository is a brief introduction to supervised learning in Python.
