GeRaCl is an open-source framework for building, training, and evaluating efficient zero-shot text classifiers on top of any BERT-like sentence encoder. It is inspired by the GLiNER framework.
| Feature | What it means for you |
|---|---|
| Zero‑shot by design | Classify with arbitrary label sets that you decide at run‑time — just pass a list of strings. |
| One forward pass | As fast as ordinary text classification; no pairwise loops like in NLI‑based approaches. |
| Model‑agnostic | Works with any Hugging Face sentence-encoder. |
| 155M reference checkpoint | A lean baseline (155M parameters) that beats much larger sentence encoders (300–500M parameters). |
| All‑in‑one toolkit | Training/eval scripts, HF Hub and WandB integration. |
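To make the "one forward pass" row concrete, here is a rough cost sketch. It assumes an NLI-style classifier scores every (text, label) pair in its own forward pass, while a single-pass classifier encodes each text once together with its label set. The function names are hypothetical and are not part of GeRaCl; batching and sequence length are ignored.

```python
# Rough cost model (an assumption, not a benchmark): count encoder forward passes.

def nli_style_passes(label_sets):
    """NLI-style zero-shot: one forward pass per (text, label) pair."""
    return sum(len(labels) for labels in label_sets)


def single_pass_style_passes(label_sets):
    """Single-pass zero-shot: one forward pass per text, whatever the label count."""
    return len(label_sets)


# One label set per text: two texts with 6 and 3 candidate labels.
label_sets = [
    ["economy", "incidents", "politics", "culture", "science", "sports"],
    ["neutral", "positive", "negative"],
]

nli_passes = nli_style_passes(label_sets)
single_passes = single_pass_style_passes(label_sets)
print(f"NLI-style: {nli_passes} passes, single-pass: {single_passes} passes")
```

Under this model the gap grows linearly with the number of labels per text, which is where the single-pass design pays off most.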
Clone and install directly from GitHub:

    git clone https://github.com/deepvk/geracl
    cd geracl
    pip install -r requirements.txt

Verify your installation:

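    import geracl
    print(geracl.__version__)

Classify a single text against a label set chosen at run time:

    from transformers import AutoTokenizer
    from geracl import GeraclHF, ZeroShotClassificationPipeline

    # This example assumes a CUDA-capable GPU is available.
    model = GeraclHF.from_pretrained('deepvk/GeRaCl-USER2-base').to('cuda').eval()
    tokenizer = AutoTokenizer.from_pretrained('deepvk/GeRaCl-USER2-base')
    pipe = ZeroShotClassificationPipeline(model, tokenizer, device="cuda")

    # Russian input: "Recycling catalysts: how to make good money"
    text = "Утилизация катализаторов: как неплохо заработать"
    # Labels: economy, incidents, politics, culture, science, sports
    labels = ["экономика", "происшествия", "политика", "культура", "наука", "спорт"]

    # The result for a text is the index of the predicted label in `labels`.
    result = pipe(text, labels, batch_size=1)[0]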
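    print(labels[result])

Classify several texts at once, each with its own label set:

    from transformers import AutoTokenizer
    from geracl import GeraclHF, ZeroShotClassificationPipeline

    # This example assumes a CUDA-capable GPU is available.
    model = GeraclHF.from_pretrained('deepvk/GeRaCl-USER2-base').to('cuda').eval()
    tokenizer = AutoTokenizer.from_pretrained('deepvk/GeRaCl-USER2-base')
    pipe = ZeroShotClassificationPipeline(model, tokenizer, device="cuda")

    texts = [
        # "Recycling catalysts: how to make good money"
        "Утилизация катализаторов: как неплохо заработать",
        # "I didn't like this movie."
        "Мне не понравился этот фильм."
    ]
    # One label list per text, in the same order as `texts`.
    labels = [
        # economy, incidents, politics, culture, science, sports
        ["экономика", "происшествия", "политика", "культура", "наука", "спорт"],
        # neutral, positive, negative
        ["нейтральный", "позитивный", "негативный"]
    ]

    results = pipe(texts, labels, batch_size=2)
    # Map each predicted index back to its label string.
    for i in range(len(labels)):
        print(labels[i][results[i]])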
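
The batch example maps each predicted index back to a label by hand. If you do this in several places, a small helper keeps it tidy. The sketch below is hypothetical (`indices_to_labels` is not part of geracl) and uses placeholder indices instead of real model output, so it runs without the model or a GPU.

```python
def indices_to_labels(label_sets, indices):
    """Turn per-text predicted indices into label strings.

    label_sets: one list of candidate labels per text.
    indices: one predicted index per text (plain ints or int-like scalars).
    """
    if len(label_sets) != len(indices):
        raise ValueError("need exactly one predicted index per label set")
    # int() also accepts int-like scalars that implement __int__.
    return [labels[int(idx)] for labels, idx in zip(label_sets, indices)]


label_sets = [
    ["economy", "incidents", "politics", "culture", "science", "sports"],
    ["neutral", "positive", "negative"],
]
# Placeholder indices standing in for the pipeline's output; not real predictions.
stand_in_results = [0, 2]

predicted = indices_to_labels(label_sets, stand_in_results)
print(predicted)
```

The length check catches the easy mistake of passing one shared label list for a batch when one list per text is expected.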