Text Classification

Dataset

There are many datasets available freely on the web to perform text classification. The dataset that was used is the Cornell sentiment analysis dataset.

Link: [http://www.cs.cornell.edu/people/pabo/movie-review-data/ ]

In this page you will be able to find a lot of datasets and the one that was used is "polarity_dataset_v2.0" which was introduced in Pang/Lee ACL 2004.

This dataset contains 1000 positive and 1000 negative reviews.

Implementation

The corpus was created by using regular expression formatting and was converted into Bag-Of-Words(BOW) model. This was converted into a TF-IDF model. The corpus was split into train and test data and several models were trained and were tested.

Results

Accuracy -

Logistic Regression - 0.87

SVM - 0.84

Random Forest - 0.84

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
dataset		dataset
models		models
.gitignore		.gitignore
README.md		README.md
Text_Classification.ipynb		Text_Classification.ipynb
create_dataset.py		create_dataset.py
train_model.py		train_model.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Text Classification

Dataset

Implementation

Results

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Text Classification

Dataset

Implementation

Results

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages