🌍 Linguist AI - Language Detection System

Linguist AI is a machine learning application that identifies the language of a given text. Built with Multinomial Naive Bayes and Natural Language Processing (NLP) techniques, it detects 22 languages and reaches over 91% accuracy on unseen test data.

🚀 Key Features

Multilingual Support: Detects 22 languages including English, Hindi, Spanish, French, Chinese, and more.
Micro-Cleaning Engine: Preprocessing that strips noise (numbers and special characters) while leaving each language's letters intact.
Premium Web Interface: A sleek, glassmorphic UI built with Flask for real-time interaction.
Technical Rigor: Follows a full ML lifecycle, from exploratory data analysis (EDA) through training and evaluation to deployment.

🛠️ Technical Architecture

1. The Algorithm: Multinomial Naive Bayes

The core detection engine uses the Multinomial Naive Bayes (MNB) classifier.

How it works: MNB is based on Bayes' Theorem and is particularly suited for text classification with discrete features (like word counts).
Probabilistic Logic: For each language, it combines the language's prior probability with how often the words of the input text occur in that language's training samples, then picks the language with the highest resulting probability.
Efficiency: Compared with deep learning models, MNB trains and predicts quickly, and it works well on medium-sized text datasets.
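The arithmetic behind MNB can be illustrated with a small pure-Python sketch. This is not the project's code (which presumably relies on scikit-learn's classifier); the function names `train_mnb` and `predict_mnb`, the toy sentences, and the smoothing value `alpha=1.0` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_mnb(samples, alpha=1.0):
    """Count words per language. `samples` is a list of (text, language) pairs."""
    word_counts = defaultdict(Counter)  # language -> word -> count
    doc_counts = Counter()              # language -> number of training texts
    vocab = set()
    for text, lang in samples:
        words = text.lower().split()
        word_counts[lang].update(words)
        doc_counts[lang] += 1
        vocab.update(words)
    return {"word_counts": word_counts, "doc_counts": doc_counts,
            "vocab": vocab, "alpha": alpha}


def predict_mnb(model, text):
    """Return the language with the highest log posterior for `text`."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    scores = {}
    for lang, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][lang]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior P(language)
        for word in text.lower().split():
            if word not in model["vocab"]:
                continue  # words never seen in training carry no evidence
            # Laplace-smoothed log likelihood P(word | language)
            score += math.log((counts[word] + alpha)
                              / (total_words + alpha * vocab_size))
        scores[lang] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set, two sentences per language.
toy_samples = [
    ("the cat is on the table", "English"),
    ("where is the station", "English"),
    ("el gato está en la mesa", "Spanish"),
    ("dónde está la estación", "Spanish"),
    ("le chat est sur la table", "French"),
    ("où est la gare", "French"),
]
toy_model = train_mnb(toy_samples)
print(predict_mnb(toy_model, "the station is here"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a single unseen word from zeroing out a language's score.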

2. NLP Technique: Bag of Words (BoW)

To convert text into numerical data that the algorithm can understand, we use CountVectorizer:

It creates a Vocabulary of all unique words across 22,000 samples.
Every input text is converted into a Sparse Matrix representing word frequencies.
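As a rough illustration of the Bag of Words step, the sketch below fits scikit-learn's `CountVectorizer` on two made-up sentences. The sentences and variable names are assumptions; the real vocabulary is built from the project's dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two hypothetical input texts.
texts = ["the cat sat", "the cat saw the dog"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)  # SciPy sparse matrix, one row per text

# vocabulary_ maps each word to its column index; sort words by that index.
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

print(vocab)
print(X.toarray())  # dense view, only sensible for tiny examples
```

Each row holds the word counts of one text, and each column corresponds to one vocabulary word. Most entries are zero on a real dataset, which is why the result is stored as a sparse matrix.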

3. Data Preprocessing

Before training, the raw dataset undergoes "Data Cleaning":

Normalization: Converting all text to lowercase.
Noise Removal: Stripping numbers and special characters to focus on alphabetic patterns unique to each language.
Standardization: Removing extra whitespace for consistent vectorization.
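The three cleaning steps above could look roughly like the following. This is a minimal sketch, not the project's actual cleaning code: the function name `clean_text` is hypothetical, and treating "special characters" as ASCII punctuation is an assumption.

```python
import re
import string

# Digits (any script) plus ASCII punctuation are treated as noise here.
_NOISE = re.compile(r"[\d" + re.escape(string.punctuation) + r"]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text):
    """Lowercase, strip digits and punctuation, collapse whitespace."""
    text = text.lower()                   # normalization
    text = _NOISE.sub(" ", text)          # noise removal
    text = _WHITESPACE.sub(" ", text)     # standardization
    return text.strip()


print(clean_text("Hello, World! 123"))
```

Removing an explicit set of noise characters, rather than keeping only `\w` matches, avoids accidentally deleting combining marks such as Hindi vowel signs, which Python's `\w` does not match.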

4. Evaluation Metrics

The model is evaluated using:

Accuracy Score: Achieving over 91% accuracy on unseen test data.
Confusion Matrix: Visualizing precisely where the model might confuse similar languages.
Classification Report: Precision, Recall, and F1-score for every individual language.
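The three metrics map onto functions in `sklearn.metrics`. The sketch below runs them on a handful of invented labels, so the numbers it prints say nothing about the real model.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true and predicted labels for five test texts.
y_true = ["English", "English", "Spanish", "Spanish", "French"]
y_pred = ["English", "English", "Spanish", "French", "French"]
labels = ["English", "French", "Spanish"]  # fixes the row/column order

accuracy = accuracy_score(y_true, y_pred)
# Rows are true languages, columns are predicted languages.
matrix = confusion_matrix(y_true, y_pred, labels=labels)
report = classification_report(y_true, y_pred, labels=labels, zero_division=0)

print(f"Accuracy: {accuracy:.2f}")
print(matrix)
print(report)
```

Off-diagonal entries of the confusion matrix show which language pairs get mixed up; here the one Spanish text predicted as French appears in the Spanish row, French column.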

📦 Project Structure

language detection.ipynb: The research and development notebook (the "ML back end").
app.py: The Flask production server for the web application.
train_model.py: Utility script for model retraining and persistence.
model.pkl & vectorizer.pkl: The serialized trained classifier and fitted vectorizer, loaded at startup for fast inference.
language.csv: The core dataset (22,000 rows).
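How the training script and the two `.pkl` files relate can be sketched as follows. This is an assumption about the general shape of `train_model.py`, not a copy of it: the toy sentences stand in for `language.csv`, and the files are written to a temporary directory rather than the project folder.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-in for the rows of language.csv.
texts = [
    "the cat is on the table", "where is the station",
    "el gato está en la mesa", "dónde está la estación",
    "le chat est sur la table", "où est la gare",
]
labels = ["English", "English", "Spanish", "Spanish", "French", "French"]

# Training: fit the vectorizer, then fit the classifier on the word counts.
vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(texts), labels)

# Persistence: both objects are needed later, so both are saved.
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "model.pkl")
vectorizer_path = os.path.join(out_dir, "vectorizer.pkl")
joblib.dump(model, model_path)
joblib.dump(vectorizer, vectorizer_path)

# Inference, as a server could do it: load once, then transform and predict.
loaded_model = joblib.load(model_path)
loaded_vectorizer = joblib.load(vectorizer_path)
features = loaded_vectorizer.transform(["where is the cat"])
prediction = loaded_model.predict(features)[0]
print(prediction)
```

The vectorizer has to be saved alongside the classifier because the classifier only understands the column layout of the vocabulary it was trained on.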

💻 Installation & Setup

Clone the repository (or navigate to the project folder).

Install Dependencies:

```
pip install pandas numpy scikit-learn flask joblib matplotlib seaborn
```

Run the Training (Optional):
```
python train_model.py
```
Launch the Application:
```
python app.py
```
Access the UI: Open http://127.0.0.1:5000 in your browser.

🌐 Supported Languages

The system supports 22 languages, including but not limited to:

English
Hindi
Spanish
French
Chinese
Russian
Arabic
Dutch
Turkish
...and many more!

Built with ❤️
