# Contributing to RabbitLLM

Thank you for your interest in contributing. This document explains how to set up the development environment, run tests, and add support for new models.

## Development setup

RabbitLLM uses uv as the package manager.

```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/manuelslemos/rabbitllm
cd rabbitllm

# Install all dependencies including dev tools
uv sync --extra dev
# or: make install
```

This creates a virtual environment at `.venv/` with ruff, mypy, pytest, and pytest-cov.

## Running tests

```bash
# All tests (skips GPU-only compression test)
make test
# or: uv run pytest tests/

# With coverage report
make test-cov
# or: uv run pytest tests/ --cov=rabbitllm --cov-report=term-missing

# Single test module
uv run python -m unittest tests.test_model_registry

# GPU tests (requires CUDA)
uv run pytest tests/test_compression.py
```

Most tests run without a GPU. The compression test (`test_compression.py`) requires CUDA and is skipped automatically in CI.
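
If you contribute a GPU-only test, guard it so CPU-only machines and CI skip it cleanly. A minimal sketch of such a guard (the actual mechanism in `test_compression.py` may differ, and the test name below is made up):

```python
import pytest
import torch

# Skip every test in this module when no CUDA device is present (e.g. in CI).
pytestmark = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="requires a CUDA GPU"
)


def test_compressed_layer_roundtrip() -> None:
    ...  # placeholder body; a real test would exercise the compression path
```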

## Code style

RabbitLLM uses ruff for linting and formatting.

```bash
make lint      # ruff check
make format    # ruff format
make typecheck # mypy
```

Line length is 100 characters. Imports are sorted (isort via ruff). Type hints are required on all public APIs.
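
For example, a public helper in the expected style would look like this (a hypothetical function, shown only to illustrate the annotation and formatting conventions):

```python
from pathlib import Path


def split_checkpoint(
    model_path: str | Path,
    out_dir: str | Path,
    *,
    max_shard_gb: float = 2.0,
) -> list[Path]:
    """Split a checkpoint into per-layer shards and return the shard paths."""
    ...  # hypothetical helper; only the signature style matters here
```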

## Project structure

```
src/rabbitllm/
├── engine/          # Core layer-streaming engine
│   ├── base.py          # RabbitLLMBaseModel — the main class
│   ├── attention.py     # Attention implementation resolution
│   ├── model_init.py    # Model skeleton creation
│   ├── layer_loading.py # Layer I/O and async transfer
│   ├── forward_utils.py # Attention mask, position ids, KV extraction
│   └── mlx_engine.py    # macOS/MLX implementation
├── models/          # Architecture-specific subclasses
│   ├── registry.py      # AutoModel factory
│   ├── llama.py         # Llama2/3, Gemma, Phi, DeepSeek
│   ├── qwen.py          # Qwen v1
│   ├── qwen2.py         # Qwen2/2.5/3
│   ├── mistral.py       # Mistral
│   ├── mixtral.py       # Mixtral
│   ├── chatglm.py       # ChatGLM
│   ├── baichuan.py      # Baichuan
│   └── internlm.py      # InternLM
├── persist/         # Layer serialization (safetensors / mlx)
├── utils/           # Splitting, compression, memory, platform
├── compat/          # Tokenizer compatibility shims
└── profiler.py      # Per-layer timing
```
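
As an orientation aid, here is a rough sketch of how the `AutoModel` factory in `models/registry.py` ties this layout together: it reads the checkpoint's `config.json`, matches the declared architecture to a module/class pair, and imports the subclass lazily. The class and module names below are illustrative guesses based on the layout above, not the exact implementation:

```python
import importlib
import json
from pathlib import Path


def resolve_model_class(model_path: str):
    """Pick the RabbitLLM subclass for a checkpoint's declared architecture."""
    config = json.loads((Path(model_path) / "config.json").read_text())
    arch = config.get("architectures", [""])[0]

    # Each branch mirrors an entry in models/registry.py (names illustrative).
    if "LlamaForCausalLM" in arch:
        module_name, class_name = "rabbitllm.models.llama", "RabbitLLMLlamaModel"
    elif "Qwen2ForCausalLM" in arch:
        module_name, class_name = "rabbitllm.models.qwen2", "RabbitLLMQwen2Model"
    else:
        raise ValueError(f"unsupported architecture: {arch!r}")

    # Import lazily so unused architectures never load their dependencies.
    return getattr(importlib.import_module(module_name), class_name)
```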

## Adding a new model

1. Check if the existing Llama-based class works. Many Llama-like architectures (Gemma, Phi, DeepSeek) share the same layer naming convention and work without a dedicated subclass. Try `AutoModel.from_pretrained("your/model")` first.

2. Create a subclass if the architecture uses different layer names:

   ```python
   # src/rabbitllm/models/mymodel.py
   from ..engine.base import RabbitLLMBaseModel


   class RabbitLLMMyModel(RabbitLLMBaseModel):
       def set_layer_names_dict(self) -> None:
           self.layer_names_dict = {
               "embed": "transformer.wte",      # embedding layer
               "layer_prefix": "transformer.h", # decoder layers
               "norm": "transformer.ln_f",      # final norm
               "lm_head": "lm_head",            # language model head
           }
   ```

3. Register the architecture in `src/rabbitllm/models/registry.py`:

   ```python
   if "MyModelForCausalLM" in arch:
       return "rabbitllm.models.mymodel", "RabbitLLMMyModel"
   ```

   Check the model's `config.json` on HuggingFace to find the exact `architectures` value.

4. Export the class in `src/rabbitllm/__init__.py`:

   ```python
   from .models.mymodel import RabbitLLMMyModel
   ```

5. Add a test in `tests/test_model_registry.py` (see the sketch after this list):

   ```python
   "org/my-model": "RabbitLLMMyModel",
   ```

6. Verify with a small model (e.g. a 1B–7B parameter variant):

   ```python
   from rabbitllm import AutoModel

   model = AutoModel.from_pretrained("org/my-model-7b")
   input_ids = model.tokenizer(["Hello"], return_tensors="pt")["input_ids"].cuda()
   out = model.generate(input_ids, max_new_tokens=10)
   print(model.tokenizer.decode(out.sequences[0]))
   ```
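
The registry test entry in step 5 lives inside an existing mapping. If you want a standalone check, here is a minimal sketch, assuming a `resolve_architecture` helper (a placeholder name; adapt it to whatever `registry.py` actually exposes):

```python
# Sketch only: the real tests/test_model_registry.py may be structured
# differently, and "resolve_architecture" stands in for the actual helper.
import unittest

from rabbitllm.models.registry import resolve_architecture


class TestModelRegistry(unittest.TestCase):
    def test_my_model_resolves(self) -> None:
        module_path, class_name = resolve_architecture("MyModelForCausalLM")
        self.assertEqual(module_path, "rabbitllm.models.mymodel")
        self.assertEqual(class_name, "RabbitLLMMyModel")
```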

## Submitting a pull request

- Open an issue first for large changes to discuss the approach.
- Include tests for any new functionality.
- Run `make lint && make test` before opening the PR.
- Keep PRs focused; one feature or fix per PR.

## Reporting issues

Please include:

- Python version, PyTorch version, transformers version
- GPU model and VRAM (if applicable)
- The model repo ID being used
- The full traceback
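
The version details can be collected with a short snippet like this (assumes torch and transformers are importable in the environment you are reporting on):

```python
import platform
import sys

import torch
import transformers

print("Python:", sys.version.split()[0], "on", platform.platform())
print("PyTorch:", torch.__version__)
print("transformers:", transformers.__version__)
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 2**30:.1f} GiB VRAM")
```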