GitHub - isha2106/Cancer-Classifier-ML-Model

Detection of Cancer Type based on Gene Expression Profiles

Introduction

Gene expression profiles are widely used in bioinformatics, in applications ranging from cancer identification to the discovery of therapeutic targets. The primary goal of this project is to use gene expression data to categorize different types of cancer. Machine learning techniques are used for this task because they can offer an efficient and fast approach to cancer type identification. This project could pave the way for more personalized medicine approaches and contribute to ongoing efforts in oncology research and treatment strategies.

Objective

To determine the type of cancer based on the gene expression profile using machine learning models.

Formulated data mining approach

This is a supervised learning problem in which the input variables are the gene expression values and the target variable is the type of cancer. The challenge involves processing the high-dimensional dataset, selecting the most relevant features, and choosing machine learning algorithms that can classify the cancer types effectively. Additionally, the project will explore combining models through ensemble methods to improve predictive accuracy and robustness.
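One simple way to combine models, as mentioned above, is hard (majority) voting over their predicted labels. The sketch below is written in Python for illustration only and is not taken from the project's code; the function name `majority_vote`, the tie-breaking rule, and the example predictions are all assumptions.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label predictions by hard (majority) voting.

    predictions_per_model: list of equal-length label lists, one per model.
    Ties go to the tied label predicted by the earliest model in the list
    (an arbitrary choice made for this sketch).
    """
    if not predictions_per_model:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*...) yields, for each sample, the tuple of labels voted by the models
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        best = max(counts.values())
        # first label, in model order, that reaches the highest vote count
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


# Hypothetical predictions from three models for four patients
svm_pred = ["BRCA", "KIRC", "LUAD", "PRAD"]
rf_pred = ["BRCA", "COAD", "LUAD", "PRAD"]
gb_pred = ["BRCA", "KIRC", "COAD", "LUAD"]

ensemble_pred = majority_vote([svm_pred, rf_pred, gb_pred])
print(ensemble_pred)
```

In this example each patient receives the label that at least two of the three models agree on. Other ensemble schemes, such as averaging predicted class probabilities or stacking, are equally plausible readings of the approach described above.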

Requirements

RNA-Seq (HiSeq) PANCAN dataset, in which the instances are patients and the features are gene expression levels

Source: Fiorini, S. (2016). gene expression cancer RNA-Seq [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5R88H.

Outline

This project uses the RNA-Seq (HiSeq) PANCAN dataset from the UC Irvine Machine Learning Repository to determine the type of cancer (BRCA, KIRC, COAD, LUAD, PRAD) from patients' gene expression values. The classification model uses a supervised machine learning approach. The project will explore Support Vector Machine (SVM), Random Forest, and Gradient Boosting algorithms, chosen for their ability to handle high-dimensional data, capture feature interactions, and mitigate skewness. Using data normalization and feature selection, the project aims for precise cancer classification, evaluated with cross-validation and accuracy/F1 scores.
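To make the evaluation metrics concrete, the following is a minimal Python sketch of accuracy and macro-averaged F1 for a multi-class problem. It is illustrative only and not taken from the project's code; the helper names `accuracy` and `macro_f1`, the choice of macro averaging, and the example labels are assumptions, since the text does not say which F1 variant is used.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 if the denominator is 0
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical true and predicted labels for four patients
y_true = ["BRCA", "BRCA", "KIRC", "LUAD"]
y_pred = ["BRCA", "KIRC", "KIRC", "LUAD"]

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Under k-fold cross-validation, these scores would typically be computed on each held-out fold and then averaged across folds.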
