SamBroomy/is-it-slop

is-it-slop

Unsure if the article you just read was AI-generated slop?

is-it-slop is a small, fast, and accurate text classifier that detects AI-generated text using classic ML: TF-IDF and logistic regression on token n-grams.

No transformers, no GPU, no Python runtime required. Just a single ~60 MB Rust binary with embedded model artifacts.

Inspired by Magika's approach of serving a small, fast model via ONNX Runtime in Rust.

Features

  • Fast: Rust-based multi-threaded preprocessing and batched ONNX Runtime inference
  • Small: ~11.4 MB of model artifacts, no GPU or transformers needed
  • Portable: Single ~60 MB binary with embedded model and no Python runtime required
  • Accurate: 95.6% accuracy on holdout test set (F1 0.958, MCC 0.912)
  • Chunk-aware: Handles long documents via overlapping 150-token chunks with weighted aggregation
  • Cross-platform: macOS (ARM64), Linux (x86_64, ARM64), Windows (x86_64)

Installation

Command Line Tool

Via Python/PyPI (recommended — includes both CLI and library):

pip install is-it-slop
# or with pipx:
pipx install is-it-slop
# or with uv:
uv tool install is-it-slop
# or run directly:
uvx is-it-slop "Your text here"
# or add to project:
uv add is-it-slop

Via cargo-binstall (pre-built binaries, no compilation):

cargo binstall is-it-slop

Via cargo install (build from source):

cargo install is-it-slop --locked --features cli

Model artifacts (~11.4 MB) download automatically during build and are embedded in the binary. No runtime downloads, no Python required.

Python Library

uv add is-it-slop
# or
pip install is-it-slop

Rust Library

cargo add is-it-slop

Quick Start

Command Line

Default output is human-readable with probabilities and confidence metrics:

$ is-it-slop "Your text here"
Classification: Human
Probabilities:
  Human: 91.2%
  AI:    8.8%

Confidence Metrics:
  Model:     91.2%
  Threshold: 87.3%
  Entropy:   64.1%
  Overall:   83.5%
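The entropy metric above can be read as distance from a 50/50 coin flip. A minimal sketch of one plausible definition based on binary entropy (an assumption for illustration; the CLI's actual formula is not documented here and the overall score combines several metrics):

```python
import math

def entropy_confidence(p_ai: float) -> float:
    """Confidence derived from binary entropy: near 1.0 when the model is
    certain (p close to 0 or 1), 0.0 at p = 0.5. One plausible definition,
    not necessarily the CLI's exact formula."""
    p = min(max(p_ai, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
    h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return 1.0 - h
```

A prediction sitting near 0.5 would score close to zero confidence under this definition, regardless of which side of the threshold it lands on.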

Other output modes:

# Classification label only
$ is-it-slop "Your text" --label
Human

# Label with AI probability score
$ is-it-slop "Your text" --label --score
Human (0.0880)

# Bare float for shell scripting
$ is-it-slop "Your text" --score
0.0880

# Full JSON (includes chunk predictions and confidence metrics)
$ is-it-slop "Your text" --json
{"status":"ok","class":"Human",...}

# Batch from file (auto-detects .json vs line-delimited)
$ is-it-slop -b texts.txt

# Custom classification threshold
$ is-it-slop "Your text" --threshold 0.7

Python

from is_it_slop import is_this_slop

result = is_this_slop("Your text here")
print(result.classification)   # 'Human' or 'AI'
print(f"AI: {result.ai_probability:.1%}")  # AI: 8.8%
print(f"Chunks: {result.num_chunks}, Agreement: {result.chunk_agreement:.1%}")

Rust

use is_it_slop::Predictor;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let predictor = Predictor::new();
    let result = predictor.predict("Your text here")?;
    println!("AI probability: {:.2}%", result.prediction.ai_probability() * 100.0);
    Ok(())
}

How It Works

Training (Python):

Texts → Clean → Tokenize (BPE) → Chunk → TF-IDF → Stacked Ensemble → ONNX

Inference (Rust):

Text → Clean → Tokenize → Chunk (150 tokens, 15 overlap) → TF-IDF per chunk → ONNX → Aggregate → Result

Why BPE Tokenization?

We use tiktoken's o200k_base BPE encoding to tokenize text, then extract n-grams of 2-4 consecutive tokens as features. This captures sub-word patterns that character or word n-grams miss, which is particularly useful for the predictable token sequences that AI models produce.

The idea is that LLMs operate on tokens, so token-level n-grams can capture patterns that character or word n-grams might miss, especially in AI-generated text. Humans tend to have more varied token usage, while AI-generated text often follows more predictable token sequences.
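As a sketch, extracting 2-4 token n-grams looks like this (the `token_ngrams` helper and the token IDs are illustrative, not the project's actual code; in practice the IDs come from tiktoken's o200k_base encoding):

```python
def token_ngrams(token_ids, n_min=2, n_max=4):
    """Extract every n-gram of length n_min..n_max from a token ID
    sequence. Each distinct n-gram becomes one TF-IDF feature."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(token_ids) - n + 1):
            grams.append(tuple(token_ids[i:i + n]))
    return grams

# Illustrative placeholder IDs, not real o200k_base output.
ids = [101, 7, 2025, 44, 901]
grams = token_ngrams(ids)
# 5 tokens yield 4 bigrams + 3 trigrams + 2 four-grams = 9 n-grams
```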

Why Chunking?

Variable-length documents (50-5000 tokens) lose information in fixed-size feature vectors. Splitting into overlapping 150-token chunks ensures consistent feature extraction regardless of document length. Chunk predictions are aggregated via weighted mean.
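A minimal sketch of the chunk-and-aggregate scheme described above. The `chunk_tokens` and `aggregate` helpers are illustrative; the source doesn't specify the weighting, and chunk length is just one natural choice of weight:

```python
def chunk_tokens(token_ids, size=150, overlap=15):
    """Split a token sequence into overlapping fixed-size chunks.
    Consecutive chunks share `overlap` tokens (stride = size - overlap)."""
    stride = size - overlap
    return [token_ids[start:start + size]
            for start in range(0, max(len(token_ids) - overlap, 1), stride)]

def aggregate(chunk_probs, chunk_weights):
    """Weighted mean of per-chunk AI probabilities -> document probability."""
    total = sum(chunk_weights)
    return sum(p * w for p, w in zip(chunk_probs, chunk_weights)) / total

chunks = chunk_tokens(list(range(400)))  # 3 chunks: 150, 150, 130 tokens
doc_prob = aggregate([0.2, 0.7, 0.9], [len(c) for c in chunks])
```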

Why Separate Artifacts?

  • TF-IDF preprocessing in Rust: Avoids complex sklearn-to-ONNX conversion and keeps preprocessing during inference fast without Python dependencies.
  • sklearn → ONNX model: Portable format, no Python at inference
  • Two-stage text cleaning: Universal (always) + dataset artifacts (training only to remove dataset-specific noise)


We try to clean specific artifacts from the training datasets (e.g. "HuggingFace", "arXiv", "Film Reviews") to prevent the model from learning dataset-specific patterns that wouldn't generalize. While I have tried my best to ensure the model learns generalizable features of AI-generated text, some residual dataset-specific artifacts may remain and could be cleaned in future iterations. The two-stage cleaning process removes universal noise while also targeting specific artifacts from the training data.
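A sketch of what the two stages could look like; the artifact patterns below are hypothetical examples in the spirit of the ones named above, not the project's actual list:

```python
import re

# Stage 1 - universal cleaning, applied at both training and inference time.
def clean_universal(text: str) -> str:
    text = text.replace("\u00a0", " ")        # normalize non-breaking spaces
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace

# Stage 2 - dataset-artifact removal, applied only during training so the
# model can't key on dataset-source strings. Illustrative patterns only.
DATASET_ARTIFACTS = [
    re.compile(r"(?i)\barXiv:\s*\d{4}\.\d{4,5}\b"),  # arXiv IDs
    re.compile(r"(?i)\bhugging\s*face\b"),           # dataset-source mentions
]

def clean_training(text: str) -> str:
    for pattern in DATASET_ARTIFACTS:
        text = pattern.sub("", text)
    return clean_universal(text)
```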

Architecture

crates/
├── is-it-slop-preprocessing/  # Text → TF-IDF pipeline (PyO3 bindings for training)
│   ├── cleaner.rs            # Two-stage text cleaning
│   ├── tokenizer.rs          # tiktoken BPE (o200k_base)
│   ├── chunker.rs            # Token-based chunking
│   ├── ngrams.rs             # Token n-gram extraction
│   └── vectorizer/           # TF-IDF vectorizer with rkyv serialization
└── is-it-slop/               # ONNX inference + CLI
    ├── bin/                  # CLI binary entrypoint
    ├── cli/                  # Command-line argument parsing
    ├── model/                # Embedded artifacts (build.rs downloads)
    ├── pipeline/             # Prediction, aggregation, error types
    └── lib.rs                # Predictor, Threshold, public re-exports

python/                       # Two PyO3 packages (inference + preprocessing)
notebooks/                    # Dataset curation + training

Training

Dataset

Trained on 25+ diverse datasets (~687K samples, with 118K held out for test and 95K for validation):

  • Human sources: News (newswire, ag_news, imdb), essays (ivy panda, ASAP, PERSUADE), quotes, reviews
  • AI sources: GPT-3.5/4, Claude, Llama 3.1/3.2, Gemini 2, SmolLM2, Qwen 2.5
  • Class balance: ~48% human, ~52% AI

Data quality caveat: Model performance depends on dataset label accuracy. We assume training data labels are correct (human text is genuinely human-written, AI text is genuinely AI-generated), but mislabelled examples may exist.

See notebooks/dataset_curation.ipynb for details.

Embedding visualization

Training Pipeline

See notebooks/train.ipynb for the complete training pipeline.

Model Architecture

The classifier is a stacked ensemble of calibrated linear models trained on token n-gram TF-IDF features:

  1. Base models (4 classifiers):

    • SGD Classifier (stochastic gradient descent)
    • Logistic Regression
    • Calibrated Linear SVC (with probability calibration)
    • Multinomial Naive Bayes
  2. Meta-learner: Logistic Regression combines base model predictions via 5-fold stacking

  3. Feature extraction: Token n-grams (2-4 tokens) → TF-IDF vectors

    • Uses tiktoken's o200k_base BPE encoding
    • Captures subword patterns across ~105k features (2-4 grams, min_df=0.07%, 99.9% sparse)

Why this works: AI-generated text exhibits predictable token sequence patterns. By combining multiple linear models with different learning characteristics, the ensemble captures these patterns robustly across diverse writing styles.
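As a toy sketch of the feature-extraction step, here is TF-IDF over n-gram "documents" using sklearn-style smoothed IDF (an assumption; the project's exact TF-IDF variant, normalization, and ~105k-feature vocabulary are not reproduced here):

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of n-gram lists (one per chunk). Returns one sparse dict
    per doc mapping n-gram -> tf * idf, with smoothed IDF
    idf(t) = ln((1 + N) / (1 + df(t))) + 1 as in sklearn's default."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each n-gram once per doc
    idf = {g: math.log((1 + n) / (1 + df[g])) + 1.0 for g in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)    # raw term frequency within the chunk
        vecs.append({g: count * idf[g] for g, count in tf.items()})
    return vecs
```

N-grams that appear in every chunk get the minimum IDF weight, so rarer, more distinctive token sequences dominate the feature vector.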

Model Artifacts

Exported artifacts (embedded at build time):

  • tfidf_vectorizer.rkyv - Vectorizer with vocabulary
  • slop-classifier.onnx - Stacked ensemble model
  • classification_threshold.txt - Document-level threshold
  • chunk_classification_threshold.txt - Per-chunk threshold
  • token_chunker_config.json - Chunking parameters

Not embedded but also available in model_artifacts/:

  • model_metadata.json - Metadata (training datasets, performance metrics)

slop-classifier.onnx

Training pipeline visualization

The diagram shows the full ONNX graph: input → 5 parallel classifiers → probability calibration → meta-learner → final prediction.

Additional visualizations:

See plots/ for embedding visualizations, feature distributions, and model analysis.

Development

# Build
cargo build --release -p is-it-slop --features cli

# Test (295 tests)
just test

# Full CI check (fmt + clippy + tests)
just check

# Training pipeline
just model-pipeline

License

MIT