A production-ready ensemble system for classifying English text by CEFR proficiency levels (A1, A2, B1, B2, C1/C2). The "Council of Classifiers" combines three distinct machine learning approaches to provide robust, accurate predictions through consensus.
- Three-Model Ensemble: Combines Naive Bayes, Doc2Vec, and fine-tuned BERT for robust predictions
- Dual Prediction Methods: Majority voting and mean probability aggregation
- Modern Web Interface: React + Vite frontend with real-time classification
- REST API: JSON endpoints for programmatic access
- Production Ready: Deployed on Google Cloud Run with models hosted on HuggingFace Hub
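The two aggregation methods can be sketched in Python as follows. This is a minimal illustration, not the repository's code. It assumes each model returns a five-element probability vector over the CEFR classes. The function name `aggregate`, the tie-breaking rule and the `confidence` formula are assumptions, chosen only to mirror the field names in the sample `/api/predict` response.

```python
from collections import Counter
from typing import Dict, List


def aggregate(probabilities: Dict[str, List[float]], use_majority_vote: bool = True) -> dict:
    """Hypothetical ensemble step: combine per-model class probabilities.

    Assumes every model emits a probability vector over the same classes
    (0=A1 ... 4=C1/C2). Output keys mirror the /api/predict response.
    """
    num_models = len(probabilities)
    num_classes = len(next(iter(probabilities.values())))

    # Each model's own prediction is the index of its highest probability.
    predictions = {
        name: max(range(num_classes), key=p.__getitem__)
        for name, p in probabilities.items()
    }

    # Majority vote: the most common predicted class. Counter.most_common
    # orders equal counts by first occurrence (an assumed tie-break rule).
    majority_vote, agreement_count = Counter(predictions.values()).most_common(1)[0]

    # Mean probability: average the vectors element-wise, then take the argmax.
    mean_probabilities = [
        sum(p[i] for p in probabilities.values()) / num_models
        for i in range(num_classes)
    ]
    mean_pred = max(range(num_classes), key=mean_probabilities.__getitem__)

    # Assumed confidence: share of models agreeing with the majority when
    # voting is used, otherwise the averaged probability of the winning class.
    if use_majority_vote:
        confidence = agreement_count / num_models
    else:
        confidence = mean_probabilities[mean_pred]

    return {
        "predictions": predictions,
        "majority_vote": majority_vote,
        "mean_probabilities": mean_probabilities,
        "mean_pred": mean_pred,
        "mean_pred_proba": mean_probabilities[mean_pred],
        "use_majority_vote": use_majority_vote,
        "confidence": confidence,
        "stats": {
            "num_models": num_models,
            "agreement_count": agreement_count,
            "all_agree": agreement_count == num_models,
        },
    }
```

Voting rewards agreement between models, while averaging lets one very confident model outweigh two lukewarm ones. Exposing both gives the caller a way to compare them on the same input.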
# Clone the repository
git clone https://github.com/luantran/CouncilOfClassifiers.git
cd CouncilOfClassifiers
# Install Python dependencies
pip install -r requirements.txt
# Install frontend dependencies (for development)
cd frontend
npm install
cd ..

# Terminal 1: Start Flask backend
python run.py
# Terminal 2: Start Vite dev server
cd frontend
npm run dev

Access the application at http://localhost:5173.
# Build frontend
cd frontend
npm run build
cd ..
# Set environment to production
export FLASK_ENV=production
# Run Flask app (serves both API and built React frontend)
python run.py

Access the application at http://localhost:5000.
POST /api/predict
Content-Type: application/json

{
  "text": "This is a sample text to classify by CEFR level."
}

Response:
{
  "text": "This is a sample text to classify by CEFR level.",
  "predictions": {
    "Naive Bayes": 2,
    "Doc2Vec": 2,
    "BERT": 3
  },
  "probabilities": {
    "Naive Bayes": [0.05, 0.10, 0.65, 0.15, 0.05],
    "Doc2Vec": [0.03, 0.08, 0.72, 0.12, 0.05],
    "BERT": [0.02, 0.07, 0.38, 0.45, 0.08]
  },
  "majority_vote": 2,
  "mean_probabilities": [0.033, 0.083, 0.583, 0.24, 0.06],
  "mean_pred": 2,
  "mean_pred_proba": 0.583,
  "use_majority_vote": true,
  "confidence": 0.67,
  "stats": {
    "num_models": 3,
    "agreement_count": 2,
    "all_agree": false
  }
}

CEFR Level Mapping:
- 0 = A1 (Beginner)
- 1 = A2 (Elementary)
- 2 = B1 (Intermediate)
- 3 = B2 (Upper Intermediate)
- 4 = C1/C2 (Advanced/Proficient)
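A minimal Python client for the endpoint might look like the following. It uses only the standard library (`urllib.request`, `json`) and translates class indices with the mapping above. The default URL assumes the production setup on port 5000. The helper names `build_request`, `decode_levels` and `classify` are hypothetical, as is the rule that `use_majority_vote` selects between `majority_vote` and `mean_pred` for the final label.

```python
import json
import urllib.request

# CEFR mapping from the README: class index -> level label.
CEFR_LEVELS = {0: "A1", 1: "A2", 2: "B1", 3: "B2", 4: "C1/C2"}

# Assumption: Flask backend running locally in production mode on port 5000.
DEFAULT_URL = "http://localhost:5000/api/predict"


def build_request(text: str, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build a JSON POST request for the /api/predict endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_levels(response: dict) -> dict:
    """Translate the integer class indices in a response into CEFR labels."""
    # Assumed rule: the flag decides which ensemble output is the final answer.
    if response["use_majority_vote"]:
        final = response["majority_vote"]
    else:
        final = response["mean_pred"]
    return {
        "per_model": {name: CEFR_LEVELS[idx] for name, idx in response["predictions"].items()},
        "final": CEFR_LEVELS[final],
        "confidence": response["confidence"],
    }


def classify(text: str, url: str = DEFAULT_URL, timeout: float = 30.0) -> dict:
    """POST text to the API and return decoded CEFR labels (needs a running server)."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return decode_levels(json.load(resp))
```

With the backend running, `classify("Some English text to grade.")` returns each model's label, the ensemble label and the confidence value. Pass a different `url` to target another host, such as a Cloud Run deployment.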
Models: HuggingFace profile
Model source code: GitHub repository
Note: Training data is proprietary and not included in this repository.