🔬 Reaction Class Predictor is a web application built using Streamlit that predicts the class of a chemical reaction based on its SMILES representation. Utilizing machine learning techniques and molecular fingerprints, this application provides a user-friendly interface for chemists and researchers to quickly analyze chemical reactions.
You can access the live application here: Reaction Class Predictor on Heroku
- Features
- Technologies Used
- Installation
- Usage
- How It Works
- Exploring the Training Code
- Important Notes on Deployment
- Contributing
- License
- User Input for SMILES: Enter chemical reaction SMILES strings to predict their classes.
- Reaction Visualization: Automatically generates visual representations of the input reactions.
- Model Predictions: Provides predicted reaction classes along with confidence levels.
- LIME Explainer: Displays bit-level importance of molecular features influencing the predictions.
- Alternative Suggestions: If the confidence level is low, alternative reaction classes are suggested.
- Streamlit - Framework for building web apps
- RDKit - Collection of cheminformatics and machine learning tools
- scikit-learn - Machine learning library for Python
- LIME - Local Interpretable Model-agnostic Explanations
- joblib - Lightweight pipelining in Python
- Pandas - Data manipulation and analysis
- NumPy - Numerical computing in Python
To run this project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/yourusername/reaction-class-predictor.git
  cd reaction-class-predictor
  If you do not have Git installed, you can download the project as a ZIP file from GitHub and extract it to access the folder.
- Open your terminal and navigate to the project directory (if not already there).
- Create a new virtual environment:
  python -m venv venv
- Activate the virtual environment:
  - On Windows:
    venv\Scripts\activate
  - On macOS/Linux:
    source venv/bin/activate
- Install the required packages:
  pip install -r requirements.txt
- Ensure you have the following files in your project directory:
  - label_encoder.pkl
  - final_model_svc.pkl.zip
  - lime_explainer.dill
- Start the Streamlit app:
  streamlit run app.py
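Before starting the app, it can help to confirm that the model files listed above are actually in place. The following is a minimal sketch, not part of the project: the helper name `missing_artifacts` is hypothetical, and only the three file names come from the list above.

```python
from pathlib import Path

# File names taken from the installation steps; the helper itself is hypothetical.
REQUIRED_FILES = (
    "label_encoder.pkl",
    "final_model_svc.pkl.zip",
    "lime_explainer.dill",
)


def missing_artifacts(project_dir="."):
    """Return the required files that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model files found.")
```

Running the script from the project directory prints any files that still need to be added before `streamlit run app.py` is likely to work.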
- Open your browser and navigate to http://localhost:8501 to view the app.
- Enter a reaction in SMILES format in the input box (e.g., C1=CC=CC=C1>>C1=CC=C(C=C1)C(=O)O).
- Press Enter to see the predicted reaction class and visualizations.
- Review the bit-level importance and confidence levels in the prediction.
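A reaction SMILES string has the form reactants>agents>products, with molecules on each side separated by dots. The sketch below shows one way to split and sanity-check such a string before passing it on. It is an illustration only: the function name `split_reaction_smiles` is hypothetical, it does not reflect how the app itself parses input, and it checks only the overall shape of the string, not the chemical validity of each molecule.

```python
def split_reaction_smiles(reaction):
    """Split 'reactants>agents>products' into three lists of molecule SMILES."""
    parts = reaction.strip().split(">")
    if len(parts) != 3:
        raise ValueError(
            "expected a reaction SMILES of the form reactants>agents>products"
        )
    # Molecules on the same side of the reaction are separated by '.'
    reactants, agents, products = (
        [mol for mol in part.split(".") if mol] for part in parts
    )
    if not reactants or not products:
        raise ValueError("a reaction needs at least one reactant and one product")
    return reactants, agents, products


if __name__ == "__main__":
    # Example string from the usage steps above (no agents given).
    print(split_reaction_smiles("C1=CC=CC=C1>>C1=CC=C(C=C1)C(=O)O"))
```

For the example input, the agents list is empty because nothing appears between the two `>` characters.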
The application performs the following steps:
- Input Handling: Accepts user input in the form of SMILES strings.
- ECFP Conversion: Converts the SMILES representation to Extended Connectivity Fingerprints (ECFP) for model input.
- Model Prediction: Uses a pre-trained machine learning model to predict the reaction class and confidence level.
- Reaction Visualization: Generates a visual representation of the chemical reaction using RDKit.
- LIME Explanation: Utilizes LIME to explain the model's predictions by identifying important molecular features.
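The prediction step above reports a class together with a confidence level, and alternatives when confidence is low. A minimal sketch of that logic follows, assuming the model exposes per-class probabilities (for example through scikit-learn's `predict_proba`). The function name, the 0.5 threshold, and the number of alternatives are illustrative assumptions, not values taken from the app.

```python
def summarize_prediction(class_names, probabilities, threshold=0.5, n_alternatives=2):
    """Pick the most probable class and, if confidence is low, list alternatives.

    threshold and n_alternatives are assumed defaults for illustration only.
    """
    ranked = sorted(
        zip(class_names, probabilities), key=lambda pair: pair[1], reverse=True
    )
    (best_class, best_prob), rest = ranked[0], ranked[1:]
    # Only suggest alternatives when the top class is not convincing.
    alternatives = rest[:n_alternatives] if best_prob < threshold else []
    return {
        "predicted_class": best_class,
        "confidence": best_prob,
        "alternatives": alternatives,
    }


if __name__ == "__main__":
    # Hypothetical class names and probabilities.
    print(summarize_prediction(["class_a", "class_b", "class_c"], [0.2, 0.45, 0.35]))
```

Keeping the threshold as a parameter makes it easy to tune how often alternatives are shown without touching the ranking logic.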
To understand how the models were created, check out the training code in this repository. It includes explanations and methodologies used for model training and evaluation.
- The application has been successfully deployed on Heroku. Users can access the live app online for predictions.
- Due to memory limitations on the deployed instance, the number of samples used for the LIME explainer was reduced so that the application runs smoothly in production. When running the application locally, you can increase the sample size to improve explanation quality.
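One way to make the LIME sample size adjustable between the deployed and local setups is to read it from an environment variable. The sketch below is an assumption about how this could be wired, not the app's actual code: the variable name `LIME_NUM_SAMPLES`, the default of 500, and the helper name are all hypothetical. LIME's `explain_instance` does accept a `num_samples` argument, which is where the value would be passed.

```python
import os

# Placeholder default; NOT the value used by the deployed app.
DEFAULT_NUM_SAMPLES = 500


def lime_num_samples(env=None):
    """Read the LIME sample count from LIME_NUM_SAMPLES (hypothetical variable)."""
    env = os.environ if env is None else env
    raw = env.get("LIME_NUM_SAMPLES")
    if raw is None:
        return DEFAULT_NUM_SAMPLES
    value = int(raw)
    if value <= 0:
        raise ValueError("LIME_NUM_SAMPLES must be a positive integer")
    return value


# Hypothetical call site, assuming `explainer`, `fingerprint`, and `model` exist:
# explanation = explainer.explain_instance(
#     fingerprint, model.predict_proba, num_samples=lime_num_samples()
# )
```

Locally, setting `LIME_NUM_SAMPLES=5000` before `streamlit run app.py` would then request more samples, at the cost of more memory and time per explanation.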