# 📝 Automated Essay Scoring (AES) – Kaggle Project
Automated Essay Scoring (AES) is an intelligent system designed to evaluate written essays using machine learning and deep learning techniques. This project focuses on developing a reliable scoring model using the Kaggle AES dataset, applying modern NLP preprocessing, feature engineering, and state-of-the-art neural architectures. The goal is to create a scoring pipeline that performs consistently, reduces human bias, and delivers accurate, fast, and scalable evaluations of student writing.
## 📚 Table of Contents

- 📖 Overview
- 📁 Repository Structure
- 🎯 Objectives
- 🚀 Features
- 🛠️ Installation
- 📊 Usage
- 🧹 Data Preprocessing
- 🧠 Models Used
- 📈 Evaluation
- 📂 Dataset
- 📚 References
- 📝 License
## 📖 Overview
Automated Essay Scoring (AES) is transforming the way learning assessments are conducted by improving the speed, consistency, and fairness of scoring. Manual grading is slow and labor-intensive, especially in areas with limited educational resources. AES applies machine learning and NLP techniques to automate the scoring process.
This project provides a comparative study of multiple AES models trained on one of the largest publicly accessible essay datasets aligned with modern educational standards.
**Model Performance Summary**

| Model | Cohen's Kappa |
| --- | --- |
| Linear Regression | 0.6540 |
| XGBoost | 0.7100 |
| LightGBM (LGBM) | 0.7210 |
| LSTM | 0.7710 |
| BERT | 0.7806 |
Deep learning models — especially LSTM and BERT — significantly outperform traditional machine learning methods in prediction accuracy and reproducibility.
The goal of this work is to propose a publicly available AES system that improves teacher workflow efficiency and provides students with fast, objective feedback.
## 📁 Repository Structure

```
├── Dataset
│   ├── train.csv
│   ├── test.csv
├── Notebook
│   ├── AES_model.ipynb
│   ├── AES_inference.py
├── References
│   ├── research_paper.pdf
├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
```
## 🎯 Objectives

1. Automate essay scoring using ML and NLP
2. Compare traditional and deep learning approaches
3. Extract linguistic & semantic features
4. Build train-ready and inference-ready scripts
5. Improve feedback speed for students
## 🚀 Features

- ✔ Full AES pipeline (preprocessing → training → prediction)
- ✔ Multiple model comparisons
- ✔ Deep learning support (LSTM, BERT)
- ✔ Clean notebook + Python script
- ✔ `requirements.txt` for easy setup
- ✔ MIT licensed
## 🛠️ Installation

**1️⃣ Clone the repository**

```bash
git clone https://github.com/YourUsername/Automated-Essay-Scoring.git
cd Automated-Essay-Scoring
```

**2️⃣ Install dependencies**

```bash
pip install -r requirements.txt
```
## 📊 Usage

**Run the Jupyter notebook**

```bash
jupyter notebook
```

Then open `Notebook/AES_model.ipynb`.

**Run the Python inference script**

```bash
python Notebook/AES_inference.py
```
## 🧹 Data Preprocessing

The pipeline includes:

1. Lowercasing
2. Punctuation removal
3. Stopword removal
4. Lemmatization
5. Tokenization
6. Word & sentence statistics
7. Readability metrics
8. TF-IDF feature extraction
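A minimal sketch of several of these steps using only the standard library. The real notebook would rely on NLTK or spaCy for stopwords and lemmatization and scikit-learn for TF-IDF; the stopword set below is a tiny illustrative subset:

```python
import re
import string

# Tiny illustrative stopword list; the full pipeline would use NLTK's list
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of"}

def preprocess(essay: str) -> dict:
    """Clean an essay and compute simple word/sentence statistics."""
    # Sentence split on terminal punctuation (before it is stripped)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    text = essay.lower()                                              # 1. lowercasing
    text = text.translate(str.maketrans("", "", string.punctuation))  # 2. punctuation removal
    tokens = text.split()                                             # 5. tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]                # 3. stopword removal
    return {
        "tokens": tokens,
        "word_count": len(tokens),                                    # 6. statistics
        "sentence_count": len(sentences),
        "avg_sentence_len": len(tokens) / max(len(sentences), 1),
    }

features = preprocess("The model scores essays. It is fast and fair!")
```

The returned statistics feed directly into the feature matrix alongside readability metrics and TF-IDF vectors.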
## 🧠 Models Used

| Model | Description |
| --- | --- |
| Linear Regression | Baseline scoring model |
| RandomForest / XGBoost / LGBM | Strong for tabular + text stats |
| LSTM | Sequential deep learning model |
| BERT | Transformer-based, best semantic understanding |
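As a hedged illustration (not the notebook's exact code), the baseline setup can be sketched with scikit-learn: TF-IDF features feeding a Linear Regression scorer. The essays and scores here are toy placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression

# Toy stand-ins for the Kaggle training data
essays = [
    "A clear thesis supported by strong evidence and structure.",
    "short text no ideas",
    "A well organized argument supported by concrete examples.",
    "bad grammar many error here",
]
scores = [5, 2, 5, 2]

# TF-IDF features over the (already cleaned) essays
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(essays)

# Baseline; the project also compares RandomForest/XGBoost/LGBM, LSTM, and BERT
baseline = LinearRegression()
baseline.fit(X, scores)
preds = baseline.predict(X)
```

Tree-based and deep models slot into the same pattern: fit on features (or raw token sequences for LSTM/BERT), then predict a continuous score that is rounded to the 1–6 scale.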
## 📈 Evaluation

**Primary Metric:** Quadratic Weighted Kappa (Cohen's Kappa with quadratic weights), used in Kaggle AES competitions to measure agreement between model predictions and human graders.
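A minimal pure-Python sketch of the metric, equivalent to scikit-learn's `cohen_kappa_score(y_true, y_pred, weights="quadratic")`:

```python
def quadratic_weighted_kappa(y_true, y_pred, min_rating=1, max_rating=6):
    """QWK: 1 - (weighted observed disagreement / weighted expected disagreement)."""
    n = max_rating - min_rating + 1
    # Observed agreement (confusion) matrix
    O = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        O[t - min_rating][p - min_rating] += 1
    num_items = len(y_true)
    # Marginal histograms of human and model scores
    hist_true = [sum(row) for row in O]
    hist_pred = [sum(O[i][j] for i in range(n)) for j in range(n)]
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            w = ((i - j) ** 2) / ((n - 1) ** 2)       # quadratic penalty
            e = hist_true[i] * hist_pred[j] / num_items  # expected by chance
            numerator += w * O[i][j]
            denominator += w * e
    return 1.0 - numerator / denominator
```

Perfect agreement yields 1.0, chance-level agreement yields 0, and the quadratic weights penalize a prediction of 1 against a human score of 6 far more than an off-by-one error.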
## 📂 Dataset

The dataset contains:

- Essay text
- Essay set ID
- Human-graded score

You must download the dataset from Kaggle: 👉 [Kaggle → Automated Essay Scoring Dataset](https://www.kaggle.com/competitions/learning-agency-lab-automated-essay-scoring-2/data)

- **Size:** 24,000 argumentative essays
- **Score Range:** 1 to 6
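Once downloaded, the CSVs load with pandas. The snippet below uses an in-memory sample; the column names (`essay_id`, `full_text`, `score`) are assumptions based on the competition's data page and should be checked against the downloaded files:

```python
import io
import pandas as pd

# In-memory stand-in for Dataset/train.csv;
# real code would call pd.read_csv("Dataset/train.csv")
sample_csv = io.StringIO(
    "essay_id,full_text,score\n"
    'e1,"A sample argumentative essay.",4\n'
    'e2,"Another short essay.",2\n'
)
train = pd.read_csv(sample_csv)

# Basic sanity check before training: scores stay on the 1-6 scale
assert train["score"].between(1, 6).all()
```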
## 📚 References

- Kaggle AES Competition
- NLP Educational Research
- Included research PDF in `/References/`
## 📝 License
This project is licensed under the MIT License. See the LICENSE file.

