GeRaCl: General Rapid text Classifier

GeRaCl is an open‑source framework for building, training, and evaluating efficient zero‑shot text classifiers on top of any BERT‑like sentence-encoder. It is inspired by the GLiNER framework.

✨ Why GeRaCl?

Feature	What it means for you
Zero‑shot by design	Classify with arbitrary label sets that you decide at run‑time — just pass a list of strings.
One forward pass	As fast as ordinary text classification; no pairwise loops like in NLI‑based approaches.
Model‑agnostic	Works with any Hugging Face sentence-encoder.
155 M reference checkpoint	A lean baseline (155M parameters) that beats much larger sentence‑encoders (300-500M parameters).
All‑in‑one toolkit	Training/eval scripts, HF Hub and WandB integration.

🚀 Quick Start

Clone and install directly from GitHub:

git clone https://github.com/deepvk/geracl
cd geracl

pip install -r requirements.txt

Verify your installation:

import geracl
print(geracl.__version__)

🧑‍💻 Usage Examples

Single classification scenario

from transformers import AutoTokenizer
from geracl import GeraclHF, ZeroShotClassificationPipeline

model = GeraclHF.from_pretrained('deepvk/GeRaCl-USER2-base').to('cuda').eval()
tokenizer  = AutoTokenizer.from_pretrained('deepvk/GeRaCl-USER2-base')

pipe = ZeroShotClassificationPipeline(model, tokenizer, device="cuda")

text = "Утилизация катализаторов: как неплохо заработать"
labels = ["экономика", "происшествия", "политика", "культура", "наука", "спорт"]
result = pipe(text, labels, batch_size=1)[0]

print(labels[result])

Multiple classification scenarios

from transformers import AutoTokenizer
from geracl import GeraclHF, ZeroShotClassificationPipeline

model = GeraclHF.from_pretrained('deepvk/GeRaCl-USER2-base').to('cuda').eval()
tokenizer  = AutoTokenizer.from_pretrained('deepvk/GeRaCl-USER2-base')

pipe = ZeroShotClassificationPipeline(model, tokenizer, device="cuda")

texts = [
  "Утилизация катализаторов: как неплохо заработать",
  "Мне не понравился этот фильм."
]
labels = [
  ["экономика", "происшествия", "политика", "культура", "наука", "спорт"],
  ["нейтральный", "позитивный", "негативный"]
]
results = pipe(texts, labels, batch_size=2)

for i in range(len(labels)):
    print(labels[i][results[i]])

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github/workflows		.github/workflows
geracl		geracl
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
requirements.dev.txt		requirements.dev.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

GeRaCl: General Rapid text Classifier

✨ Why GeRaCl?

🚀 Quick Start

🧑‍💻 Usage Examples

Single classification scenario

Multiple classification scenarios

About

Uh oh!

Contributors 2

Uh oh!

Languages

License

deepvk/geracl

Folders and files

Latest commit

History

Repository files navigation

GeRaCl: General Rapid text Classifier

✨ Why GeRaCl?

🚀 Quick Start

🧑‍💻 Usage Examples

Single classification scenario

Multiple classification scenarios

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors 2

Uh oh!

Languages