canirun

A lightweight CLI to estimate hardware requirements and quantization compatibility for Hugging Face models.


Key Features

  • Hardware Detection: Automatically detects your CPU/GPU and available VRAM/RAM.
  • Memory Estimation: Estimates the memory required to run a given Hugging Face model.
  • Quantization Analysis: Checks compatibility for different quantization levels (e.g., 4-bit, 8-bit, 16-bit).
  • Simple CLI & API: Easy to use from the command line or integrate into your Python projects.

Installation

You can install canirun using pip:

pip install canirun

CLI Usage

The canirun command allows you to quickly check a model from your terminal.

canirun <model_id> [OPTIONS]

Example

Let's check whether meta-llama/Meta-Llama-3-8B can run on your local hardware:

canirun meta-llama/Meta-Llama-3-8B --ctx 4096

This produces a report like the following:

 πŸ” ANALYSIS REPORT: meta-llama/Meta-Llama-3-8B 
 Context Length  : 4096
 Device          : NVIDIA GeForce RTX 3090
 VRAM / RAM      : 24.0 GB / 64.0 GB

╒════════════════╀══════════════╀════════════╀════════════════════════╕
β”‚ Quantization   β”‚   Total Est. β”‚   KV Cache β”‚ Compatibility          β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•ͺ══════════════β•ͺ════════════β•ͺ════════════════════════║
β”‚ FP16           β”‚     16.96 GB β”‚  512.00 MB β”‚ βœ… GPU                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€€
β”‚ INT8           β”‚      9.48 GB β”‚  512.00 MB β”‚ βœ… GPU                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€€
β”‚ 4-bit          β”‚      6.30 GB β”‚  512.00 MB β”‚ βœ… GPU                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€€
β”‚ 2-bit          β”‚      4.34 GB β”‚  512.00 MB β”‚ βœ… GPU                 β”‚
β•˜β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•›

API Usage

You can also use canirun programmatically in your Python code.

from canirun import canirun

model_id = "mistralai/Mistral-7B-v0.1"

# Analyze the model
result = canirun(model_id, context_length=2048)

if result and result.is_supported:
    print(f"'{model_id}' is supported on your hardware!")

    # Get the detailed report
    report = result.report()
    for quant_result in report:
        print(f"- {quant_result['quant']}: {quant_result['status']}")
else:
    print(f"'{model_id}' is not supported on your hardware.")

How It Works

canirun works by:

  1. Fetching the model's configuration from the Hugging Face Hub.
  2. Calculating the memory required for the model's parameters.
  3. Estimating the size of the KV cache based on the context length and model architecture.
  4. Comparing the estimated memory requirements with your system's available VRAM (if a GPU is detected) or RAM.

The tool checks for different levels of quantization to see if a smaller, quantized version of the model could fit.
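The steps above can be sketched numerically. The following is a hypothetical illustration, not canirun's actual implementation; the architecture values (32 layers, 8 KV heads, head dimension 128, roughly 8.03B parameters) are the published config for Meta-Llama-3-8B, and the KV cache formula assumes FP16 K/V tensors with grouped-query attention.

```python
# Hypothetical sketch of the estimation pipeline -- not canirun's actual code.
# Architecture values below are the published config for Meta-Llama-3-8B.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Step 3: one K and one V tensor per layer, stored at FP16 (2 bytes each)."""
    return 2 * num_layers * num_kv_heads * head_dim * ctx_len * bytes_per_elem

def weight_bytes(num_params, bits_per_param):
    """Step 2: parameter memory at a given quantization level."""
    return num_params * bits_per_param / 8

def fits(required_bytes, available_bytes):
    """Step 4: compare the estimate against available VRAM (or RAM)."""
    return required_bytes <= available_bytes

ctx = 4096
kv = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, ctx_len=ctx)
print(f"KV cache: {kv / 2**20:.2f} MB")         # 512.00 MB, matching the report above

weights = weight_bytes(num_params=8.03e9, bits_per_param=16)
print(f"FP16 weights: {weights / 1e9:.2f} GB")  # 16.06 GB, before runtime overhead

print(fits(weights + kv, 24e9))                 # True on a 24 GB GPU
```

Note that the report's "Total Est." column is larger than the raw weight size, since a real estimate also accounts for activation and runtime overhead.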

License

This project is licensed under the MIT License - see the LICENSE file for details.
