Declarative Hugging Face model and dataset management for Nix. nix-hug pins
models to exact revisions, fetches only the files you need, builds
offline-compatible HuggingFace Hub caches, and supports importing from and
exporting to the local HuggingFace cache.
The CLI is used to download models into the Nix store:
$ nix run github:longregen/nix-hug -- fetch MiniMaxAI/MiniMax-M2.5
nix-hug-lib.fetchModel {
  url = "MiniMaxAI/MiniMax-M2.5";
  rev = "abc123...";
  fileTreeHash = "sha256-...";
};

The output can then be used in Nix:
# Smoke test: an app that just loads the model in Python
let
  minimax = nix-hug-lib.fetchModel {
    url = "MiniMaxAI/MiniMax-M2.5";
    rev = "abc123...";
    fileTreeHash = "sha256-...";
  };
  cache = nix-hug-lib.buildCache {
    models = [ minimax ];
  };
  python = pkgs.python3.withPackages (p: [ p.transformers p.torch ]);
in
pkgs.writeShellApplication {
  name = "say-minimax-inefficiently";
  runtimeInputs = [ python ];
  text = ''
    export HF_HUB_CACHE=${cache}
    export TRANSFORMERS_OFFLINE=1
    python -c "
    from transformers import AutoModelForCausalLM
    model = AutoModelForCausalLM.from_pretrained('MiniMaxAI/MiniMax-M2.5')
    print(model)
    "
  '';
}

Add nix-hug to your flake inputs:
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs";
    nix-hug.url = "github:longregen/nix-hug";
  };
}

Use the CLI to fetch a model. It resolves the revision, computes hashes, and prints a Nix expression you can paste into your configuration:
$ nix-hug fetch mistralai/Mistral-7B-Instruct-v0.3 --include '*.safetensors'

Use the output in your flake to build an offline HuggingFace Hub cache:
let
  nix-hug-lib = nix-hug.lib.${system};
  mistral = nix-hug-lib.fetchModel {
    url = "mistralai/Mistral-7B-Instruct-v0.3";
    rev = "abc123..."; # pinned commit hash from CLI output
    filters = { include = [ ".*\\.safetensors" ]; };
    fileTreeHash = "sha256-...";
  };
  cache = nix-hug-lib.buildCache {
    models = [ mistral ];
  };
in
pkgs.mkShell {
  HF_HUB_CACHE = cache;
  TRANSFORMERS_OFFLINE = "1";
}

Running Python within this shell will find the model without network
access (the transformers lib reads the env variable HF_HUB_CACHE):
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

nix-hug has two parts: a bash-based CLI and a Nix library. The CLI's fetch
subcommand resolves the git ref to a commit hash via the Hugging Face API. It
then fetches the repository's file tree metadata and computes a SHA-256 hash of
the directory structure as it will be laid out for consumption by HuggingFace
libraries. The output of the CLI is a Nix expression that pins that
"fileTreeHash" along with the resolved revision.
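As a rough illustration of what such a pinned hash looks like, here is a Python sketch that hashes canonicalized file-tree metadata into the SRI format Nix uses. The exact metadata nix-hug canonicalizes is internal to the tool; the tree below is hypothetical.

```python
import base64
import hashlib
import json

def sri_sha256(data: bytes) -> str:
    """Encode a SHA-256 digest in the SRI format Nix uses ("sha256-<base64>")."""
    digest = hashlib.sha256(data).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")

# Hypothetical file-tree metadata. Sorting keys and stripping whitespace makes
# the serialization canonical, so the same inputs always yield the same hash.
tree = [
    {"path": "config.json", "size": 571},
    {"path": "model.safetensors", "size": 4943162336},
]
canonical = json.dumps(tree, sort_keys=True, separators=(",", ":")).encode()
print(sri_sha256(canonical))
```

Because the hash covers the file-tree metadata rather than the file contents, it can be computed before any large weights are downloaded.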
When consuming it, the Nix library evaluates that expression and executes the
same steps as the bash-based CLI: fetchGit clones the Hugging Face repository
at the pinned revision. This retrieves all small files (configs, tokenizer
data, etc.) but only LFS pointer files for large weights. For each LFS file,
fetchurl then downloads it from HuggingFace's CDN, using the LFS SHA256 OID as
the content hash. Filters can be provided to selectively download only some of
these large files, in case the repository contains many model files you don't
need (for example, you might want a single ".safetensors" file from a
repository that also contains ONNX files, or several quantizations side by
side in the same repo). A derivation assembles the result: the git checkout
with the real model files replacing the LFS pointers.
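For intuition, a Git LFS pointer file is just a small text stub standing in for the real blob, carrying its SHA-256 OID and size. A minimal Python sketch of parsing one (the OID below is made up):

```python
# A Git LFS pointer: key-value lines, one per field.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9f0c3b2a6e1d4c5b8a7f0e9d2c1b4a3f6e5d8c7b0a9f8e7d6c5b4a3f2e1d0c9b
size 4943162336
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split each line on the first space to recover the pointer fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, oid = fields["oid"].split(":", 1)
    return {"algo": algo, "oid": oid, "size": int(fields["size"])}

info = parse_lfs_pointer(POINTER)
print(info["algo"], info["oid"][:12], info["size"])
```

The OID is exactly the SHA-256 of the real file, which is why it can serve directly as a fixed-output content hash for fetchurl.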
buildCache takes fetched models and datasets and arranges them into the
directory layout that HuggingFace Hub's Python libraries expect:
models--org--repo/
  refs/
    main           # contains the pinned commit hash
  snapshots/
    <rev>/         # the actual model files
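A Python sketch of assembling that layout with symlinks (illustrative only; nix-hug does this inside a Nix derivation):

```python
import pathlib
import tempfile

def build_cache(root: pathlib.Path, org: str, repo: str,
                rev: str, snapshot_src: pathlib.Path) -> None:
    """Arrange one model into the HF Hub cache layout shown above."""
    base = root / f"models--{org}--{repo}"
    (base / "refs").mkdir(parents=True)
    (base / "refs" / "main").write_text(rev)   # pin the ref to a commit
    snap = base / "snapshots" / rev
    snap.parent.mkdir(parents=True)
    snap.symlink_to(snapshot_src)              # symlink: no data duplication

root = pathlib.Path(tempfile.mkdtemp())
src = pathlib.Path(tempfile.mkdtemp())         # stands in for the fetched model
build_cache(root, "openai-community", "gpt2", "abc123", src)
print((root / "models--openai-community--gpt2" / "refs" / "main").read_text())
```

Because the snapshot directory is a symlink into the store path of the fetched model, multiple caches can share the same weights.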
Set HF_HUB_CACHE to this store path and any library that reads from the Hub
cache (transformers, diffusers, sentence-transformers) will find the model
without making network requests. Note that the datasets library is known to
cause problems in some cases (contributions welcome).
Everything is content-addressed. The same inputs produce the same store paths. Models can be shared across machines, cached in CI, and pinned in lockfiles the same way as any other Nix dependency.
nix-collect-garbage removes store paths not referenced by a GC root. For
large models, re-downloading after collection is expensive. The export
command copies a model from the Nix store into the local HuggingFace cache
directory, and import copies it back. This uses the same directory layout
that transformers, diffusers, and other HF libraries read from. The cache
location is determined by $HF_HUB_CACHE, $HF_HOME/hub, or defaults to
$XDG_CACHE_HOME/huggingface/hub/.
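That lookup order can be sketched as follows (illustrative Python, not the actual implementation, which is bash):

```python
import os
import pathlib

def hf_cache_dir(env: dict) -> pathlib.Path:
    """Resolve the HF cache directory with the precedence described above:
    $HF_HUB_CACHE, then $HF_HOME/hub, then $XDG_CACHE_HOME/huggingface/hub."""
    if "HF_HUB_CACHE" in env:
        return pathlib.Path(env["HF_HUB_CACHE"])
    if "HF_HOME" in env:
        return pathlib.Path(env["HF_HOME"]) / "hub"
    xdg = env.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
    return pathlib.Path(xdg) / "huggingface" / "hub"

print(hf_cache_dir({"HF_HOME": "/data/hf"}))   # /data/hf/hub
```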
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs";
    nix-hug.url = "github:longregen/nix-hug";
  };
  outputs = { nixpkgs, nix-hug, ... }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
      nix-hug-lib = nix-hug.lib.${system};
      my-model = nix-hug-lib.fetchModel {
        url = "stas/tiny-random-llama-2";
        rev = "3579d71fd57e04f5a364d824d3a2ec3e913dbb67";
        fileTreeHash = "sha256-mD+VYvxsLFH7+jiumTZYcE3f3kpMKeimaR0eElkT7FI=";
      };
      model-cache = nix-hug-lib.buildCache {
        models = [ my-model ];
      };
    in {
      packages.${system} = {
        inherit my-model model-cache;
        default = nix-hug.packages.${system}.default;
      };
      devShells.${system}.default = pkgs.mkShell {
        buildInputs = [ nix-hug.packages.${system}.default ];
      };
    };
}

$ nix run github:longregen/nix-hug -- fetch mistralai/Mistral-7B-Instruct-v0.3

Global options:
--debug: enable verbose logging
--version: print version
--help: show help
Downloads a model or dataset from Hugging Face and prints a Nix expression with pinned revision and hashes.
$ nix-hug fetch <url> [options]

Options:
--ref REF: git reference to resolve (default: main)
--include PATTERN: include files matching a glob pattern
--exclude PATTERN: exclude files matching a glob pattern
--file FILENAME: include a specific file by name
--dry-run: show what would be fetched without downloading
# Fetch only safetensors weights
$ nix-hug fetch mistralai/Mistral-7B-Instruct-v0.3 --include '*.safetensors'
# Fetch a dataset
$ nix-hug fetch rajpurkar/squad --include '*.json'
# Fetch a single config file
$ nix-hug fetch google-bert/bert-base-uncased --file config.json

The CLI auto-detects whether a repository is a model or dataset by querying the Hugging Face API.
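The --include/--exclude/--file options behave like shell-style glob selection. A rough Python sketch of that matching (illustrative only; it omits the rule, described under the library filters, that non-LFS files are always kept):

```python
from fnmatch import fnmatch

def select(files, include=None, exclude=None, names=None):
    """Apply glob include, glob exclude, and exact-name filters in turn."""
    out = files
    if include:
        out = [f for f in out if any(fnmatch(f, p) for p in include)]
    if exclude:
        out = [f for f in out if not any(fnmatch(f, p) for p in exclude)]
    if names:
        out = [f for f in out if f in names]
    return out

files = ["config.json", "model.safetensors", "model.onnx"]
print(select(files, include=["*.safetensors"]))   # ['model.safetensors']
```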
Lists files in a repository without downloading anything. Accepts the same
filter options as fetch.
$ nix-hug ls mistralai/Mistral-7B-Instruct-v0.3
$ nix-hug ls stanfordnlp/imdb --include '*.parquet'

Fetches a model or dataset and copies it into the local HuggingFace cache
directory. This makes the model available to transformers, diffusers,
and other HF libraries, and preserves it outside the Nix store (surviving
garbage collection).
The cache location is determined by $HF_HUB_CACHE, $HF_HOME/hub, or
defaults to $XDG_CACHE_HOME/huggingface/hub/.
Accepts the same filter options as fetch.
$ nix-hug export openai-community/gpt2
$ nix-hug export openai-community/gpt2 --include '*.safetensors'

Imports a model or dataset from the local HuggingFace cache into the Nix
store. If you already have models downloaded by transformers, diffusers,
or huggingface-cli, this avoids re-downloading files that are already on
disk. Use nix-hug scan to see what's available before importing.
The imported store path has the same layout as nix-hug fetch, so the
output can be used with buildCache and nix build.
The cache location is determined by $HF_HUB_CACHE, $HF_HOME/hub, or
defaults to $XDG_CACHE_HOME/huggingface/hub/.
$ nix-hug import <url> [options]

Options:
--ref REF: match a specific revision
--include PATTERN: include files matching a glob pattern
--exclude PATTERN: exclude files matching a glob pattern
--file FILENAME: include a specific file by name
$ nix-hug import openai-community/gpt2
$ nix-hug import openai-community/gpt2 --include '*.safetensors'

Lists all models and datasets in the local HuggingFace cache. Useful for
discovering what's available before running import.
The cache location is determined by $HF_HUB_CACHE, $HF_HOME/hub, or
defaults to $XDG_CACHE_HOME/huggingface/hub/.
$ nix-hug scan

Shows each cached repository with its type, revision, size, file count, whether it's already in the Nix store, and any ref labels.
The library is available as nix-hug.lib.${system} from the flake output.
Fetches a model or dataset from Hugging Face and returns a derivation.
nix-hug-lib.fetchModel {
  url = "stas/tiny-random-llama-2";
  rev = "3579d71fd57e04f5a364d824d3a2ec3e913dbb67";
  fileTreeHash = "sha256-mD+VYvxsLFH7+jiumTZYcE3f3kpMKeimaR0eElkT7FI=";
}

fetchDataset has the same interface:
nix-hug-lib.fetchDataset {
  url = "rajpurkar/squad";
  rev = "abc123...";
  filters = { include = [ ".*\\.json" ]; };
  fileTreeHash = "sha256-...";
}

Parameters:
url (required): repository identifier (see URL Formats)
rev (required): git commit hash (40 characters)
fileTreeHash (required): SHA256 hash of the HF API file tree response
filters (optional): filter object with include, exclude, or files
The filters attribute accepts one of three forms:
{ include = [ "regex" ... ]; } keeps only matching LFS files
{ exclude = [ "regex" ... ]; } skips matching LFS files
{ files = [ "filename" ... ]; } selects specific files by exact name
Non-LFS files (configs, tokenizer files) are always included unless files
is used.
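A Python sketch of these semantics, using a hypothetical file list (illustrative only; the library implements this in Nix):

```python
import re

def filter_lfs(entries, filters):
    """Regex filters apply to LFS files only; non-LFS files always pass
    unless an exact `files` list is given."""
    def keep(e):
        if "files" in filters:
            return e["path"] in filters["files"]
        if not e["lfs"]:
            return True
        if "include" in filters:
            return any(re.fullmatch(p, e["path"]) for p in filters["include"])
        if "exclude" in filters:
            return not any(re.fullmatch(p, e["path"]) for p in filters["exclude"])
        return True
    return [e["path"] for e in entries if keep(e)]

entries = [
    {"path": "config.json", "lfs": False},
    {"path": "model.safetensors", "lfs": True},
    {"path": "model.onnx", "lfs": True},
]
print(filter_lfs(entries, {"include": [r".*\.safetensors"]}))
```

Note how config.json survives the include filter: it is not an LFS file, so the regex never applies to it.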
Combines fetched models and datasets into a HuggingFace Hub-compatible cache directory using symlinks (no data duplication).
nix-hug-lib.buildCache {
  models = [ my-model another-model ];
  datasets = [ my-dataset ];
}

Use the result as HF_HUB_CACHE:
$ export HF_HUB_CACHE=/nix/store/...-hf-hub-cache
$ export TRANSFORMERS_OFFLINE=1
$ python your_script.py

Models:
mistralai/Mistral-7B-Instruct-v0.3
hf:mistralai/Mistral-7B-Instruct-v0.3
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
Datasets:
rajpurkar/squad
hf-datasets:rajpurkar/squad
datasets/rajpurkar/squad
https://huggingface.co/datasets/rajpurkar/squad
When you use a bare org/repo path, the CLI queries the Hugging Face API to
determine whether the repository is a model or dataset.
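A rough sketch of how those formats might be normalized (illustrative Python; the real CLI is bash, and the prefixes below simply mirror the formats listed above):

```python
def parse_repo(url: str):
    """Normalize an accepted URL format to (kind, org/repo), where kind is
    "dataset", "model", or None for a bare path (type resolved via the API)."""
    for prefix, kind in [
        ("hf-datasets:", "dataset"),
        ("hf:", "model"),
        ("https://huggingface.co/datasets/", "dataset"),
        ("https://huggingface.co/", "model"),
        ("datasets/", "dataset"),
    ]:
        if url.startswith(prefix):
            return kind, url[len(prefix):]
    return None, url   # bare org/repo: ask the Hugging Face API

print(parse_repo("hf-datasets:rajpurkar/squad"))
print(parse_repo("mistralai/Mistral-7B-Instruct-v0.3"))
```

Ordering matters: the datasets URL prefix must be checked before the plain huggingface.co prefix, or every dataset URL would be classified as a model.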
$ nix develop
$ ./cli/nix-hug --help

Run the tests:
$ nix flake check

This software is provided free under the MIT License.