GitHub - RamanaNani/LLMs-in-Health-Science

LLMs in Health Science

This project applies LLMs to the analysis of breast cancer clinical trial reports (CTRs), with the aim of supporting healthcare professionals in decision-making. We use MedBERT, MedRoBERTa, and Longformer, models pre-trained on medical data, to evaluate the truthfulness of statements about CTRs. Finally, we use ensemble learning with logistic regression to combine the strengths of each model. The goal is to improve the reliability of AI in the medical domain and to help healthcare professionals analyze trial data.

The main task is to predict whether a statement contradicts or is entailed by the data in the CTR.
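To make the task concrete, here is a minimal sketch of how one instance and its label could be represented. The `Example` class, its field names, the label strings, and the sample text are all assumptions made for illustration; they are not taken from the notebooks in this repository.

```python
from dataclasses import dataclass

# Assumed label strings; index 0 = Contradiction, index 1 = Entailment.
LABELS = ("Contradiction", "Entailment")


@dataclass
class Example:
    """One hypothetical instance: CTR text, a statement about it, and a label."""
    premise: str    # text taken from a section of the CTR
    statement: str  # claim to check against the premise
    label: str      # "Entailment" or "Contradiction"


def encode_label(label: str) -> int:
    """Map a label string to the class index a classifier would be trained on."""
    return LABELS.index(label)


# Invented text, for illustration only.
example = Example(
    premise="Adverse events: nausea 3/50 (6.00%)",
    statement="Fewer than 10% of participants experienced nausea.",
    label="Entailment",
)
target = encode_label(example.label)
```

A binary classifier then receives the premise and the statement as a text pair and is trained to output the class index.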

The repository contains the code and models, along with links for submitting your results.

Prerequisites

Upload training_data.zip to your Drive.
Run the notebooks on a GPU (A100) to fine-tune the models.

More about Dataset and submission of results

For more information about the dataset, see SemEval 2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials.

To submit, load all the predictions into NLP_results.ipynb and upload the resulting results.json.zip file.
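The packaging step can be sketched with the standard library alone. This is an illustration, not the code in NLP_results.ipynb: the function name `write_submission` is hypothetical, and the JSON layout (statement id mapped to an object with a "Prediction" key) is an assumption that should be checked against the task's submission instructions.

```python
import json
import zipfile
from pathlib import Path


def write_submission(predictions, out_dir="."):
    """Write results.json and wrap it in results.json.zip.

    predictions: dict mapping a statement id to "Entailment" or "Contradiction".
    The {"Prediction": ...} layout is an assumed submission format.
    """
    out_dir = Path(out_dir)
    payload = {uid: {"Prediction": label} for uid, label in predictions.items()}

    json_path = out_dir / "results.json"
    json_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")

    zip_path = out_dir / "results.json.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Store the file at the archive root under the name results.json.
        zf.write(json_path, arcname="results.json")
    return zip_path
```

Calling `write_submission({"some-id": "Entailment"})` would create both files in the current directory.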

Setup for the model

Clone the repository

https://github.com/RamanaNani/LLMs-in-Health-Science.git

Access the data: download training_data and set the correct path to load it.
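Reading the data straight from the archive is one way to avoid path mistakes. The sketch below assumes only that training_data.zip contains JSON files; the helper names are hypothetical, and the actual member names inside the archive are not stated here, so list them first.

```python
import json
import zipfile


def list_json_members(zip_path):
    """Return the sorted names of all .json files inside the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(name for name in zf.namelist() if name.endswith(".json"))


def load_json_from_zip(zip_path, member):
    """Parse one JSON file from the archive without extracting it to disk."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as fh:
            return json.load(fh)
```

For example, `list_json_members("training_data.zip")` shows which files are available, and any of the returned names can be passed to `load_json_from_zip`.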

Training and Fine-tuning the model

Import all the required libraries
Run NLP_Basemodels.ipynb to get the baseline predictions and see how the preprocessing works.
Fine-tune the models starting from Hugging Face weights that were pretrained on medical corpora.
Run NLP_MedBert.ipynb to fine-tune MedBERT and get its predictions.
Run NLP_MedRoBERTa.ipynb to fine-tune MedRoBERTa and get its predictions.
Run NLP_Longformer.ipynb to fine-tune Longformer and get its predictions.
Save the weights of all three fine-tuned models.
Load the weights in NLP_Ensemble_Learning.ipynb and run it to get the predictions of the logistic regression ensemble.
Use NLP_results.ipynb to convert the predictions into the JSON file format required by SemEval 2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials.
Submit the results to see the faithfulness and consistency scores.
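The ensembling step above can be illustrated with a small stacking sketch: each fine-tuned model contributes its predicted probability of entailment, and a logistic regression is fitted on those three numbers. To keep the example dependency-free it uses a plain-Python gradient-descent logistic regression as a stand-in; the notebook may well use a library implementation instead. The function names, the hyperparameters, and the probabilities in `X_val` are invented for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Return 1 (Entailment) where the ensemble probability is at least 0.5, else 0."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


# Invented held-out probabilities of "Entailment", one column per base model:
# [MedBERT, MedRoBERTa, Longformer]. In the last two rows one model disagrees.
X_val = [
    [0.90, 0.80, 0.95],
    [0.10, 0.20, 0.05],
    [0.85, 0.90, 0.80],
    [0.15, 0.10, 0.20],
    [0.80, 0.30, 0.90],
    [0.20, 0.70, 0.10],
]
y_val = [1, 0, 1, 0, 1, 0]  # 1 = Entailment, 0 = Contradiction

w, b = fit_logistic_regression(X_val, y_val)
ensemble_predictions = predict(X_val, w, b)
```

The learned weights show how much the ensemble relies on each base model. In practice the meta-classifier should be fitted on predictions for data the base models were not trained on, so that it does not simply learn their overconfidence.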

Additional Information

Models pretrained on medical corpora achieve better results than the base models. Data augmentation may improve the predictions further and is worth trying.
