This project builds LLMs to analyze breast cancer clinical trial reports (CTRs), with the goal of supporting healthcare professionals in decision-making. We use MedBERT, MedRoBERTa, and Longformer, models pre-trained on medical data, to evaluate the truthfulness of statements within CTRs. Finally, we use ensemble learning with logistic regression to combine the strengths of each model. The aim is to improve the reliability of AI in the medical domain and to help healthcare professionals analyze the data.
- The main task is to predict whether a statement is entailed by or contradicts the data in the CTR.
The repository contains the code and models, along with the links for submitting your results.
- Upload `training_data.zip` to your drive.
- Run the files on a GPU (A100) to fine-tune the models.
For more information about the dataset, see SemEval 2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials.
Submit your predictions as a `results.json.zip` file, produced by loading all the predictions into `NLP_results.ipynb`.
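The packaging step can be sketched roughly as follows. This is a minimal illustration, not the code in `NLP_results.ipynb`: the helper name `write_submission` is hypothetical, and the JSON layout (one `{"Prediction": ...}` object per statement ID, with the labels `Entailment` and `Contradiction`) is an assumption that should be checked against the task's submission instructions.

```python
import json
import zipfile
from pathlib import Path

# Assumed label set for the task; verify against the official instructions.
ALLOWED_LABELS = {"Entailment", "Contradiction"}


def write_submission(predictions, out_dir="."):
    """Write {statement_id: label} predictions to results.json and zip it.

    `write_submission` is a hypothetical helper, not part of the repository.
    Returns the path of the created results.json.zip archive.
    """
    payload = {}
    for statement_id, label in predictions.items():
        if label not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label {label!r} for {statement_id}")
        # Assumed layout: one {"Prediction": label} object per statement ID.
        payload[statement_id] = {"Prediction": label}

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    json_path = out_dir / "results.json"
    json_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")

    zip_path = out_dir / "results.json.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Store the file at the archive root under the name results.json.
        archive.write(json_path, arcname="results.json")
    return zip_path
```

Calling `write_submission({"id-1": "Entailment", "id-2": "Contradiction"}, "submission")` would create `submission/results.json.zip`; the IDs here are placeholders for the statement identifiers in the dataset.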
- Clone the repository: https://github.com/RamanaNani/LLMs-in-Health-Science.git
- Access data: download the training_data and set the correct path to load the data.
- Import all the required libraries.
- Run `NLP_Basemodels.ipynb` to get the baseline predictions and see how the preprocessing works.
- Fine-tune the models, starting from the Hugging Face weights that were pretrained on medical corpora.
- Run `NLP_MedBert.ipynb` to fine-tune MedBERT and get the predictions.
- Run `NLP_MedRoBERTa.ipynb` to fine-tune MedRoBERTa and get the predictions.
- Run `NLP_Longformer.ipynb` to fine-tune Longformer and get the predictions.
- Save the weights of all three fine-tuned models.
- Load the weights in `NLP_Ensemble_Learning.ipynb` and run it to get the predictions of the ensemble model with logistic regression.
- Use `NLP_results.ipynb` to convert the predictions into the JSON file format required by SemEval 2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials.
- Submit the results and check the faithfulness and consistency scores.
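The ensembling step can be illustrated with a small stacking sketch. It is not the code in `NLP_Ensemble_Learning.ipynb`. It assumes that each fine-tuned model outputs an entailment probability per statement and that these three probabilities are the input features of the logistic regression. The numbers are made-up toy values, and the logistic regression is written out in plain Python only to keep the example dependency-free; in practice a library implementation such as scikit-learn's `LogisticRegression` would be the usual choice.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Return 1 (Entailment) or 0 (Contradiction) for each row of X."""
    return [
        int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
        for xi in X
    ]


# Toy held-out data (made up): each row holds the assumed entailment
# probabilities from MedBERT, MedRoBERTa, and Longformer for one statement.
X_val = [
    [0.9, 0.8, 0.7],
    [0.2, 0.3, 0.1],
    [0.6, 0.9, 0.8],
    [0.4, 0.1, 0.3],
    [0.7, 0.4, 0.9],
    [0.3, 0.6, 0.2],
]
y_val = [1, 0, 1, 0, 1, 0]  # 1 = Entailment, 0 = Contradiction

weights, bias = fit_logistic_regression(X_val, y_val)
print(predict([[0.95, 0.9, 0.85], [0.1, 0.05, 0.1]], weights, bias))
```

Fitting the meta-classifier on held-out predictions rather than on the data used for fine-tuning is a common design choice, since it keeps the ensemble from simply trusting whichever base model overfits most.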
Models pretrained on medical corpora achieve better results than the base models. Data augmentation is worth trying and may further improve the predictions.
