Skip to content

kvkrishn/CS510_Project

Repository files navigation

CS510_Project

Github Repository for CS 510 Project: Analysis of Different Predictive Models for the Prediction of Aromatic Rings of Cancer SRC Kinase Inhibitors Author: Keerthi Krishnan

Premise: SRC Kinase Inhibitors are important protein inhibitors and potential drug candidates for cancer proteins. They play an important role in suppressing tumor growth and oncogenesis in tumor proteins. A distinction factor of SRC Kinase Inhibitors and their existence is their ring count and it acts as a very important feature for these molecules. Potentially, the ring count can affect binding modes of these inhibitors to different cancer proteins. Therefore, I plan on predicting aromatic ring count given a list of 20 aggregate chemical features and using different machine learning models for comparison analysis. I compare 4 models: Linear Regression, Log-Linear Regression, Random Forest, and Neural Network Model to see which out of the 4 do the best in prediction of number of aromatic rings in the dataset. The main goal is to identify the best model to predicting the number of rings in kinase inhibitors.

Data: SRC Kinase inhibitor Dataset(xlsx file) contained in data file Contains multiple(~20) chemical features that describe the molecule Prediction variable: Rings

Details: Data contains ~3000 molecules with smile strings and ~20 aggregate chemical features

Output: Initial visualization of data and correlation plots. Summary statistics and accuracy measures of different models: Linear Regression, Log-Linear Regression, RF, and Neural Network model on training and testing(plots, variance level, rmse, degrees of freedom, etc)

All code is contained in the R notebook and if run, should contain comments and information on how to load dataset and what each step is doing and functioning. Make sure the dataset is in the same directory as the R notebook for the code to load dataset properly.

About

Github Repository for CS 510 Project: Prediction of Aromatic Rings for Cancer Kinase Inhibitors Using Random Forest Model

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages