A fine-tuned DistilBERT model that classifies text as human-written or AI-generated. Comes with a Gradio web interface for easy testing.
I built this as part of my work on AI literacy at VU Amsterdam. Detecting AI-generated text is a real and growing problem in education, and I wanted to see how well a simple transformer classifier could do.
The model is a DistilBERT (66M params) fine-tuned on the GPT-wiki-intro dataset, which contains ~150k pairs of human-written and GPT-generated Wikipedia introductions. After 3 epochs of training it gets around 98% accuracy on the validation set.
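Under the hood, the classification head produces two logits (human, AI), and a softmax turns them into the probabilities the model reports as confidence. A minimal sketch of that step in plain Python (independent of the repo, with made-up logit values):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from the (human, AI) classification head
probs = softmax([-1.2, 3.4])
print(probs)  # second entry = probability the text is AI-generated
```

The predicted label is just the argmax of these two probabilities, and the larger one is what gets reported as confidence.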
Obviously this won't catch everything. It's trained on Wikipedia-style text and older GPT output, so newer models like GPT-4 or Claude will be harder to detect. But it's a solid demo of how transfer learning works for this kind of task.
```bash
git clone https://github.com/jasp-nerd/ai-text-detector.git
cd ai-text-detector
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

The dataset downloads automatically from Hugging Face when you first run training.
```bash
python train.py
```

Takes about 30-60 min on a GPU, or 2-3 hours on CPU. The trained model gets saved to `./model/`.
Web interface:

```bash
python app.py
# opens at http://localhost:7860
```

Command line:

```bash
python predict.py
```

In your own code:
```python
from predict import TextDetectorPredictor

p = TextDetectorPredictor('./model')
result = p.predict("Some text to check...")
print(result['prediction'], result['confidence'])
```

Limitations:

- Trained on Wikipedia text, so it may underperform on other domains (tweets, essays, code, etc.)
- Trained on older, GPT-2-style output; newer models are harder to detect
- Short texts (<100 chars) are unreliable
- Binary classification only; it doesn't tell you which model generated the text
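Given these caveats, calling code may want to gate on input length and confidence rather than trusting every label. A small hypothetical wrapper around the predictor's output (the threshold values are illustrative, not tuned):

```python
MIN_CHARS = 100        # below this, predictions are unreliable
MIN_CONFIDENCE = 0.9   # illustrative cutoff, not tuned

def gated_prediction(result, text):
    """Return the model's label only when it is worth trusting.

    `result` is the dict returned by TextDetectorPredictor.predict,
    assumed to contain 'prediction' and 'confidence' keys.
    """
    if len(text) < MIN_CHARS:
        return "too-short"
    if result["confidence"] < MIN_CONFIDENCE:
        return "uncertain"
    return result["prediction"]
```

In practice you'd treat "too-short" and "uncertain" as abstentions rather than counting them as either class.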