ZipLLM is an efficient LLM storage system that significantly reduces storage cost through tensor-level deduplication and BitX compression.
Tested Environment:
- Platform: AWS EC2 c5.12xlarge instance
- Memory: 96GB RAM
- vCPUs: 48 cores
- Storage: High-performance SSD

Memory notes:
- Processing a model that exceeds available memory may crash the program
- Large language models (7B+ parameters) require significant RAM for tensor operations
- Consider using smaller models or adding swap space if memory is limited
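As a rough guard against the crashes noted above, a script can check available memory before launching the pipeline. This is a hypothetical pre-flight sketch, not part of ZipLLM; the 2x headroom factor is an assumption.

```python
# Hypothetical pre-flight memory check -- not part of ZipLLM. Reads
# MemAvailable from /proc/meminfo (Linux only) and compares it against a
# rough estimate of what a BF16 model needs in RAM.
def available_ram_bytes() -> int:
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024  # value is reported in kB
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

def fits_in_memory(param_count: int, headroom: float = 2.0) -> bool:
    # BF16 weights take 2 bytes per parameter; the headroom factor for
    # working buffers is an assumption, not a measured ZipLLM figure.
    return param_count * 2 * headroom <= available_ram_bytes()

if __name__ == "__main__":
    # A 7B-parameter BF16 model holds ~14 GB of weights before headroom
    print("7B model fits:", fits_in_memory(7_000_000_000))
```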
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
# Verify installation
rustc --version
cargo --version

# Install required Python packages
pip3 install -r requirements.txt

# Set your HF token for model downloads
export HF_TOKEN=your_token_here

# Configure paths (edit if needed)
vim config.json
# Download test models and generate base-finetune mapping
./setup_test_models.sh

# Build the release binaries
cargo build --release

# Run with config path (required)
RUST_LOG=info ./target/release/zipllm ./config.json

# Restore using real model ID format (config path is required)
RUST_LOG=info ./target/release/restore ./config.json meta-llama/Llama-3.1-8B-Instruct /tmp/output
# Verify restoration
ls -la /tmp/output/

Test restore performance across different thread counts:
cd analysis
# Run experiments with different thread counts (config path is required)
python3 restore_throughput_analysis.py --threads 1 2 4 8 16 32 48 --config "../config.json"
# Only analyze existing results
python3 restore_throughput_analysis.py --analyze-only --config "../config.json"
# Run experiments and analyze results automatically
python3 restore_throughput_analysis.py --threads 1 4 8 16 --analyze --config "../config.json"
# Optionally drop system caches before each run (requires sudo)
python3 restore_throughput_analysis.py --threads 1 2 4 --drop-cache --config "../config.json"

Features:
- Tests restore performance with different thread configurations
- Generates CSV reports and visualization plots
- Shows throughput vs thread count relationships
- Analyzes threading efficiency compared to single-thread baseline
- Optionally clears the system cache before each experiment when using --drop-cache
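The efficiency figure compared against the single-thread baseline can be sketched as speedup divided by thread count. The timing numbers below are illustrative, not real ZipLLM measurements.

```python
# Sketch of a threading-efficiency metric: speedup relative to the
# single-thread baseline, divided by the thread count. Perfect linear
# scaling would give an efficiency of 1.0 at every thread count.
def threading_efficiency(times: dict) -> dict:
    """times maps thread count -> restore wall time in seconds."""
    baseline = times[1]
    return {t: (baseline / secs) / t for t, secs in sorted(times.items())}

measured = {1: 120.0, 4: 36.0, 16: 12.0}  # hypothetical wall times
print(threading_efficiency(measured))
# 4 threads: speedup 120/36 ~ 3.33, so efficiency ~ 0.83
```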
Benchmark BitX compression performance with different thread counts:
cd analysis
# Run compression benchmark with thread count variations
python3 compression_throughput_analysis.py /path/to/base/model /path/to/finetune/model
# Specify thread counts to test
python3 compression_throughput_analysis.py /path/to/base/model /path/to/finetune/model --threads 1,4,8,16,32
# Skip decompression test
python3 compression_throughput_analysis.py /path/to/base/model /path/to/finetune/model --no-decompress
# Custom output prefix for result files
python3 compression_throughput_analysis.py /path/to/base/model /path/to/finetune/model --output custom_prefix
# Run each test multiple times
python3 compression_throughput_analysis.py /path/to/base/model /path/to/finetune/model --repeat 3

Features:
- Measures BitX compression and decompression throughput
- Tests multiple thread configurations
- Generates CSV reports with detailed metrics
- Creates throughput vs thread count plots
- Validates bit-exact decompression
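The core idea behind BitX-style differential compression can be sketched as: XOR the base and fine-tuned BF16 weights, split each 16-bit XOR word into its high byte (sign plus most of the exponent) and low byte (mantissa bits), and compress the two streams separately. The sketch below uses stdlib zlib in place of zstd and assumes raw little-endian u16 input; it is an illustration, not ZipLLM's Rust implementation.

```python
# BitX-style sketch: XOR base and fine-tuned BF16 weights, then split the
# XOR words into high-byte (sign + most exponent bits) and low-byte
# (mantissa bits) streams, which compress better separately. zlib stands
# in for zstd so the example stays stdlib-only.
import zlib

def bitx_compress(base: bytes, fine: bytes) -> tuple:
    assert len(base) == len(fine) and len(base) % 2 == 0
    xored = bytes(a ^ b for a, b in zip(base, fine))
    hi = xored[1::2]  # little-endian u16: high byte of each word
    lo = xored[0::2]  # low byte of each word
    return zlib.compress(hi), zlib.compress(lo)

def bitx_decompress(base: bytes, hi_c: bytes, lo_c: bytes) -> bytes:
    hi, lo = zlib.decompress(hi_c), zlib.decompress(lo_c)
    # re-interleave low/high bytes, then XOR against the base to recover
    # the fine-tuned weights bit-exactly
    xored = bytes(b for pair in zip(lo, hi) for b in pair)
    return bytes(a ^ b for a, b in zip(base, xored))
```

Because fine-tuned weights stay numerically close to the base, the XOR stream is mostly zeros, which is what makes the split streams highly compressible.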
Run content-defined chunking and neural network compression experiments:
cd experiment
# Configure paths in config.json
vim config.json
# Run both FastCDC chunking and ZipNN compression
python3 run_exp.py
# View results and get figures
jupyter notebook plots.ipynb

Features:
- Processes safetensors files with FastCDC content-defined chunking
- Compresses models with ZipNN compression
- Saves results for later analysis
- Visualizes data reduction ratios with plots.ipynb
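To illustrate what content-defined chunking does (chunk boundaries follow content, so an insertion only shifts nearby chunks rather than all downstream ones), here is a toy gear-hash chunker in the spirit of FastCDC. The gear table, boundary mask, and size limits are simplified assumptions, not the fastcdc_exp implementation.

```python
# Toy gear-hash content-defined chunker, FastCDC-style. The gear table,
# boundary mask, and size limits below are illustrative simplifications.
import random

_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK = (1 << 13) - 1             # ~8 KiB average chunk size
MIN_SIZE, MAX_SIZE = 2048, 65536

def chunks(data: bytes):
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        size = i - start + 1
        # cut where the rolling hash hits the mask, within size bounds
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]
```

Identical regions of different files then hash to identical chunks, which is what enables cross-model deduplication.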
zipllm_rust/
├── src/
│ ├── main.rs # Main compression pipeline
│ ├── restore.rs # Model restoration binary
│ ├── config.rs # Configuration loader
│ ├── storage.rs # Storage backend
│ ├── pipeline.rs # Processing pipeline
│ ├── deduplication.rs # Tensor deduplication
│ ├── compression.rs # Compression strategies
│ └── bitx/bitx_bytes.rs # BitX differential compression
├── examples/
│ ├── bitx.rs # Standalone BitX tool
│ └── restore_example.rs # API usage example
├── experiment/
│ ├── run_exp.py # Run FastCDC and ZipNN experiments
│ ├── zipnn_exp.py # ZipNN compression tool
│ ├── fastcdc_exp/ # FastCDC chunking implementation
│ └── plots.ipynb # Visualization notebook for results
├── analysis/
│ ├── restore_throughput_analysis.py # Restore performance testing
│ └── compression_throughput_analysis.py # Compression benchmarking
├── py_lib/
│ ├── download.py # Model downloader
│ └── generate_base_ft.py # Base-finetune mapper
├── config.json # Configuration file
├── test_models.txt # Test model list
├── setup_test_models.sh # Automated setup
├── models/ # Downloaded models
└── storage/ # Compressed data
Edit config.json to customize paths and performance settings:
{
"model_dir": "./models",
"storage_dir": "./HF_storage",
"models_to_process": "./test_models.txt",
"base_ft_path": "./base_ft.json",
"threads": 48
}

Configuration Options:
- model_dir: Directory containing downloaded models
- storage_dir: Directory for compressed tensor storage
- models_to_process: Text file listing models to process
- base_ft_path: JSON file mapping base models to finetunes
- threads: Number of parallel threads (auto-detects the system maximum if not specified)
Important Path Resolution:
- All paths in the config file are resolved relative to the config file's location, not the current working directory
- This means you can run commands from any directory and the paths will still work correctly
- For example, when running from the analysis directory with --config "../config.json", paths like ./models are resolved relative to the parent directory
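That resolution rule can be sketched as follows; load_config and PATH_KEYS are hypothetical names for illustration, not ZipLLM's actual loader.

```python
# Hypothetical loader showing the rule: relative paths in the config are
# anchored at the config file's own directory, not the current working
# directory, so commands work no matter where they are launched from.
import json
from pathlib import Path

PATH_KEYS = ("model_dir", "storage_dir", "models_to_process", "base_ft_path")

def load_config(config_path: str) -> dict:
    config_file = Path(config_path).resolve()
    cfg = json.loads(config_file.read_text())
    for key in PATH_KEYS:
        if key in cfg:
            # relative entries become absolute, anchored at the config's dir;
            # already-absolute entries pass through unchanged
            cfg[key] = str((config_file.parent / cfg[key]).resolve())
    return cfg
```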
Edit experiment/config.json for FastCDC and ZipNN experiments:
{
"output_root": "./results",
"model_root": "../models",
"fastcdc_avg_size": 65536
}

Experiment Configuration Options:
- output_root: Directory for experiment output files
- model_root: Directory containing safetensors files to process
- fastcdc_avg_size: Average chunk size for FastCDC (in bytes)
# Build and use BitX for file compression
cargo build --release --example bitx
./target/release/examples/bitx file1.bin file2.bin --compress \
    --compressed-exp exp.zst --compressed-sm mantissa.zst

Notes:
- Config Path Required: All commands require a config path as the first argument
- Supported Dtype: ⚠️ The current version only supports BF16
- Logging: Use RUST_LOG=info to see runtime progress and performance metrics
- Test Models: The first line in test_models.txt must be the base model
- Model IDs: Use the real Hugging Face format (org/model-name); conversion to the storage format is automatic
- HF Token: To download models, don't forget to set HF_TOKEN in your environment
- Path Resolution: All paths in config.json are resolved relative to the config file's location