This repository provides a technical comparison of four methods for the same NLP task on the same dataset: sentiment analysis, i.e. classifying text as positive or negative. The methods are Logistic Regression, an MLP, BERT, and DistilBERT. Each implementation lives in its own Jupyter Notebook and covers data preprocessing, feature extraction, model definition, hyperparameter tuning with Optuna, and submission generation.
- `Logistic_Regression_NLP_Sentiment_Analysis.ipynb`: A classical machine learning approach using TF-IDF features with a Logistic Regression classifier.
- `MLP_NLP_Sentiment_Analysis.ipynb`: A deep learning approach using a Multi-Layer Perceptron (MLP) built with PyTorch, leveraging pre-trained GloVe embeddings.
- `BeRT_NLP_Sentiment_Analysis.ipynb`: A Transformer-based approach that fine-tunes the `bert-base-uncased` model for sequence classification.
- `DistilBeRT_NLP_Sentiment_Analysis.ipynb`: A lighter, faster Transformer-based approach that fine-tunes the `distilbert-base-uncased` model.
All notebooks share a common set of dependencies and a general workflow. Before running, ensure you have the required Python libraries installed.
- `pandas`
- `numpy`
- `scikit-learn`
- `nltk`
- `torch` (required for MLP, BERT, DistilBERT)
- `transformers` (required for BERT, DistilBERT)
- `gensim` (required for the MLP notebook to load GloVe)
- `optuna` (used in all notebooks for hyperparameter tuning)
- `contractions` (used in BERT, DistilBERT for text preprocessing)
You can install the primary dependencies using pip:
```
pip install pandas numpy scikit-learn nltk torch transformers gensim optuna contractions
```

The notebooks also require NLTK data packages (`wordnet`, `punkt`, `stopwords`); they include commands (`nltk.download(...)`) to download these automatically.
These notebooks are configured to run in an environment (like Kaggle) where the dataset is located at a path like /kaggle/input/ai-2-dl-for-nlp.../.
To run these locally, you will need to download the dataset and adjust the file paths in the "Initialize Datasets" section of each notebook to point to your local train_dataset.csv, val_dataset.csv, and test_dataset.csv files.
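The path adjustment for a local run can be sketched as follows. Only the three CSV filenames come from the notebooks; the `DATA_DIR` location is an assumption you should change to wherever you placed the dataset:

```python
from pathlib import Path
import pandas as pd

# Hypothetical local directory -- adjust to wherever you downloaded the dataset.
DATA_DIR = Path("data")

# The notebooks expect train/val/test splits with these filenames.
paths = {split: DATA_DIR / f"{split}_dataset.csv" for split in ("train", "val", "test")}

# Load each split that is present; report missing files instead of crashing.
datasets = {}
for split, path in paths.items():
    if path.exists():
        datasets[split] = pd.read_csv(path)
    else:
        print(f"Missing {path} -- download the dataset and fix DATA_DIR.")
```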
Each notebook is self-contained. For best results, especially with the MLP and Transformer models, run on a machine with a GPU (e.g., Google Colab, Kaggle).
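If you are unsure whether a GPU is visible to PyTorch, a common device-selection snippet looks like this (the Transformer notebooks set `device = "cuda"` directly; the CPU fallback here is an addition for machines without CUDA):

```python
import torch

# Prefer a CUDA GPU when available; fall back to CPU otherwise.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
```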
- File: `Logistic_Regression_NLP_Sentiment_Analysis.ipynb`
- Key Features: Uses `TfidfVectorizer` for feature extraction.
- Run: Execute the cells sequentially. The `optuna` study will run to find the best hyperparameters for the `LogisticRegression` model.
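The core TF-IDF + Logistic Regression pipeline can be sketched as follows. The toy sentences and the pipeline settings are illustrative assumptions, not the notebook's actual data or tuned hyperparameters:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data standing in for the real train_dataset.csv texts.
texts = [
    "I love this movie",
    "great film, loved it",
    "terrible plot, awful acting",
    "I hated every minute",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# TfidfVectorizer turns raw text into sparse TF-IDF features,
# which LogisticRegression then classifies.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(texts, labels)
print(clf.predict(["what a great movie", "awful film"]))
```

In the notebook, Optuna searches over hyperparameters such as the regularization strength instead of using the defaults shown here.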
- File: `MLP_NLP_Sentiment_Analysis.ipynb`
- Key Features: Uses `gensim.downloader` to fetch `glove-twitter-200` embeddings. Tweets are vectorized by averaging the embeddings of their tokens. The model is a custom PyTorch neural network.
- Run:
  - Execute the cells. The notebook will first download the GloVe embeddings (this may take several minutes).
  - The `optuna` study will run to find the best hyperparameters, and the best model is re-trained.
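The averaging scheme can be illustrated with toy vectors. These stand in for `glove-twitter-200` (which is 200-dimensional); the dimensions and numbers here are made up for readability:

```python
import numpy as np

DIM = 4  # glove-twitter-200 uses 200 dimensions; 4 keeps the toy example readable.

# Toy embedding table standing in for the downloaded GloVe vectors.
embeddings = {
    "good":  np.array([0.9, 0.1, 0.0, 0.2]),
    "movie": np.array([0.1, 0.8, 0.3, 0.0]),
}

def vectorize(tokens, embeddings, dim=DIM):
    """Average the embeddings of known tokens; zero vector if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

# Out-of-vocabulary tokens are simply skipped.
print(vectorize(["good", "movie", "unknownword"], embeddings))
```

Each tweet thus becomes a single fixed-length vector, which is what the PyTorch MLP consumes.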
- File: `BeRT_NLP_Sentiment_Analysis.ipynb`
- Key Features: Fine-tunes the `bert-base-uncased` model. Uses the `BertTokenizer` and `BertForSequenceClassification` classes from the `transformers` library.
- Run:
  - This notebook requires a GPU. Ensure your environment is configured for CUDA (`device = "cuda"`).
  - Execute cells sequentially. The `transformers` library will download the pre-trained model weights.
  - The `optuna` study will run to find the best hyperparameters for fine-tuning, and the best model is re-trained.
- File: `DistilBeRT_NLP_Sentiment_Analysis.ipynb`
- Key Features: A lighter alternative to BERT. Fine-tunes the `distilbert-base-uncased` model using `DistilBertTokenizer` and `DistilBertForSequenceClassification`.
- Run:
  - This notebook also requires a GPU.
  - Execute cells sequentially. The pre-trained model weights will be downloaded.
  - The `optuna` study will run to find the best hyperparameters for fine-tuning, and the best model is re-trained.