A lightweight CLI to estimate hardware requirements and quantization compatibility for Hugging Face models.
- Hardware Detection: Automatically detects your CPU/GPU and available VRAM/RAM.
- Memory Estimation: Estimates the memory required to run a given Hugging Face model.
- Quantization Analysis: Checks compatibility for different quantization levels (e.g., 4-bit, 8-bit, 16-bit).
- Simple CLI & API: Easy to use from the command line or integrate into your Python projects.
You can install canirun using pip:
```bash
pip install canirun
```

The `canirun` command allows you to quickly check a model from your terminal:

```bash
canirun <model_id> [OPTIONS]
```

Let's check if meta-llama/Meta-Llama-3-8B can run on the local hardware:

```bash
canirun meta-llama/Meta-Llama-3-8B --ctx 4096
```

This will produce a report like this:
```text
ANALYSIS REPORT: meta-llama/Meta-Llama-3-8B
Context Length : 4096
Device         : NVIDIA GeForce RTX 3090
VRAM / RAM     : 24.0 GB / 64.0 GB
┌──────────────┬────────────┬───────────┬───────────────┐
│ Quantization │ Total Est. │ KV Cache  │ Compatibility │
├──────────────┼────────────┼───────────┼───────────────┤
│ FP16         │ 16.96 GB   │ 512.00 MB │ ✅ GPU        │
│ INT8         │ 9.48 GB    │ 512.00 MB │ ✅ GPU        │
│ 4-bit        │ 6.30 GB    │ 512.00 MB │ ✅ GPU        │
│ 2-bit        │ 4.34 GB    │ 512.00 MB │ ✅ GPU        │
└──────────────┴────────────┴───────────┴───────────────┘
```
You can also use canirun programmatically in your Python code.
```python
from canirun import canirun

model_id = "mistralai/Mistral-7B-v0.1"

# Analyze the model
result = canirun(model_id, context_length=2048)

if result and result.is_supported:
    print(f"'{model_id}' is supported on your hardware!")

    # Get the detailed report
    report = result.report()
    for quant_result in report:
        print(f"- {quant_result['quant']}: {quant_result['status']}")
else:
    print(f"'{model_id}' is not supported on your hardware.")
```

canirun works by:
- Fetching the model's configuration from the Hugging Face Hub.
- Calculating the memory required for the model's parameters.
- Estimating the size of the KV cache based on the context length and model architecture.
- Comparing the estimated memory requirements with your system's available VRAM (if a GPU is detected) or RAM.
The tool repeats this comparison for each quantization level, so it can tell you whether a smaller, quantized version of the model would fit even when the full-precision weights do not.
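The steps above can be sketched roughly as follows. This is a simplified illustration under common rules of thumb, not canirun's actual implementation; the bytes-per-parameter values and the standard KV-cache formula (2 tensors per layer × KV heads × head dim × tokens × bytes per element) are assumptions, and the Llama-3-8B-like shapes in the usage example are approximate:

```python
# Simplified sketch of the estimation logic described above (not canirun's
# actual code): weight memory plus KV cache, checked per quantization level.

BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "4-bit": 0.5, "2-bit": 0.25}

def estimate_memory_gb(num_params, num_layers, num_kv_heads, head_dim,
                       context_length, quant="FP16", kv_bytes=2):
    """Estimate total memory in GB: quantized weights + FP16 KV cache."""
    weight_bytes = num_params * BYTES_PER_PARAM[quant]
    # KV cache: 2 tensors (K and V) per layer, one entry per token position.
    kv_cache_bytes = (2 * num_layers * num_kv_heads * head_dim
                      * context_length * kv_bytes)
    return (weight_bytes + kv_cache_bytes) / 1024**3

# Approximate Llama-3-8B shapes: ~8B params, 32 layers, 8 KV heads, head_dim 128
for quant in BYTES_PER_PARAM:
    total = estimate_memory_gb(8_030_000_000, 32, 8, 128, 4096, quant)
    verdict = "fits on GPU" if total <= 24.0 else "exceeds VRAM"
    print(f"{quant}: {total:.2f} GB -> {verdict}")
```

With these shapes the KV-cache term works out to 512 MB at a 4096-token context, matching the cache column in the report above; the weight estimates differ slightly from the report because real estimators account for exact parameter counts and quantization overheads.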
This project is licensed under the MIT License - see the LICENSE file for details.