Council of Classifiers: CEFR Ensemble Text Classifier

A production-ready ensemble system that classifies English text into five CEFR proficiency bands (A1, A2, B1, B2, and a combined C1/C2). The "Council of Classifiers" runs three distinct machine learning models on each input and combines their outputs into a consensus prediction.

Key Features

Three-Model Ensemble: Combines Naive Bayes, Doc2Vec, and fine-tuned BERT for robust predictions
Dual Prediction Methods: Majority voting and mean probability aggregation
Modern Web Interface: React + Vite frontend with real-time classification
REST API: JSON endpoints for programmatic access
Production Ready: Deployed on Google Cloud Run with models hosted on HuggingFace Hub

Installation

# Clone the repository
git clone https://github.com/luantran/CouncilOfClassifiers.git
cd CouncilOfClassifiers

# Install Python dependencies
pip install -r requirements.txt

# Install frontend dependencies (for development)
cd frontend
npm install
cd ..

Running the Application

Development Mode (Frontend + Backend)

# Terminal 1: Start Flask backend
python run.py

# Terminal 2: Start Vite dev server
cd frontend
npm run dev

Access the application at http://localhost:5173

Production Mode (Unified Flask App)

# Build frontend
cd frontend
npm run build
cd ..

# Set environment to production
export FLASK_ENV=production

# Run Flask app (serves both API and built React frontend)
python run.py

Access at http://localhost:5000

API Usage

Predict Endpoint

POST /api/predict
Content-Type: application/json

{
  "text": "This is a sample text to classify by CEFR level."
}

Response:

{
  "text": "This is a sample text to classify by CEFR level.",
  "predictions": {
    "Naive Bayes": 2,
    "Doc2Vec": 2,
    "BERT": 3
  },
  "probabilities": {
    "Naive Bayes": [0.05, 0.10, 0.65, 0.15, 0.05],
    "Doc2Vec": [0.03, 0.08, 0.72, 0.12, 0.05],
    "BERT": [0.02, 0.07, 0.38, 0.45, 0.08]
  },
  "majority_vote": 2,
  "mean_probabilities": [0.033, 0.083, 0.583, 0.24, 0.06],
  "mean_pred": 2,
  "mean_pred_proba": 0.583,
  "use_majority_vote": true,
  "confidence": 0.67,
  "stats": {
    "num_models": 3,
    "agreement_count": 2,
    "all_agree": false
  }
}
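
The ensemble fields in a response like this can be reproduced from the per-model probability vectors. The sketch below is an illustration, not the server's actual code: the function name `aggregate` is hypothetical, each model's prediction is assumed to be the argmax of its probabilities, and `confidence` is assumed to be the fraction of models that agree with the majority vote (which would explain a value of 0.67 when 2 of 3 models agree). The rule that sets `use_majority_vote` is not documented here, so it is left out.

```python
from collections import Counter


def aggregate(probabilities):
    """Combine per-model probability vectors into ensemble fields.

    `probabilities` maps a model name to a list of class probabilities.
    Assumption: every model reports one probability per CEFR class.
    """
    # Each model's own prediction is the index of its largest probability.
    predictions = {
        name: max(range(len(p)), key=p.__getitem__)
        for name, p in probabilities.items()
    }

    # Majority voting: the most common class and how many models chose it.
    # On a full tie, Counter returns the class that was seen first.
    majority_vote, agreement_count = Counter(predictions.values()).most_common(1)[0]

    # Mean probability aggregation: average each class across the models.
    num_models = len(probabilities)
    means = [sum(column) / num_models for column in zip(*probabilities.values())]
    mean_pred = max(range(len(means)), key=means.__getitem__)

    return {
        "predictions": predictions,
        "majority_vote": majority_vote,
        "mean_probabilities": [round(m, 3) for m in means],
        "mean_pred": mean_pred,
        "mean_pred_proba": round(means[mean_pred], 3),
        # Assumed definition: share of models that back the majority vote.
        "confidence": round(agreement_count / num_models, 2),
        "stats": {
            "num_models": num_models,
            "agreement_count": agreement_count,
            "all_agree": agreement_count == num_models,
        },
    }
```

Majority voting ignores how confident each model is, while mean aggregation lets one very confident model outweigh two uncertain ones. Reporting both makes disagreements between the two methods visible to the caller.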

CEFR Level Mapping:

0 = A1 (Beginner)
1 = A2 (Elementary)
2 = B1 (Intermediate)
3 = B2 (Upper Intermediate)
4 = C1/C2 (Advanced/Proficient)
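
A minimal client sketch for the endpoint and mapping above, using only the Python standard library. The helper names (`build_predict_request`, `final_level`, `classify`) are hypothetical, the default base URL is the production-mode address from this README, and the reading of `use_majority_vote` as "the final answer is `majority_vote`, otherwise `mean_pred`" is an assumption about the response, not documented behavior.

```python
import json
import urllib.request

# Index-to-label table copied from the CEFR level mapping above.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1/C2"]


def build_predict_request(text, base_url="http://localhost:5000"):
    """Build (but do not send) a POST request for /api/predict."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def final_level(response):
    """Return the ensemble's CEFR label from a parsed response.

    Assumption: use_majority_vote selects between the two ensemble outputs.
    """
    key = "majority_vote" if response.get("use_majority_vote") else "mean_pred"
    return CEFR_LEVELS[response[key]]


def classify(text, base_url="http://localhost:5000"):
    """Send text to a running server and return its CEFR label."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=30) as reply:
        return final_level(json.load(reply))
```

With the Flask app running locally, `classify("This is a sample text to classify by CEFR level.")` would return one of the five labels in `CEFR_LEVELS`.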

Model Details

Models: HuggingFace Profile

Model Source Code: GitHub Repo

Note: Training data is proprietary and not included in this repository.
