-
Notifications
You must be signed in to change notification settings - Fork 1
Closed
Labels
Description
Project
vgrep
Description
The model download function in src/cli/commands.rs lines 865-894 downloads models from HuggingFace but does not validate the downloaded files. If the download is interrupted, network issues cause corruption, or disk space runs out, the corrupted model file is saved and used anyway, causing cryptic errors later.
Error Message
Various errors when loading corrupted model:
Error: Failed to load embedding model
or
Error: Invalid GGUF file format
or silent incorrect behavior (garbage embeddings).Debug Logs
System Information
- Bounty Version: 0.1.0
- OS: Ubuntu 24.04 LTS
- Rust: 1.75+Screenshots
No response
Steps to Reproduce
- Start model download:
vgrep models download - Interrupt with Ctrl+C during download
- Run
vgrep serveorvgrep search - Observe cryptic error from corrupted model file
Expected Behavior
After downloading:
- Verify file size matches expected
- Validate checksum/hash of downloaded file
- Attempt to load model to verify it's valid GGUF
- Only then save path to config
- If validation fails, delete corrupted file and show clear error
Actual Behavior
- File is downloaded
- Path is saved to config immediately
- No size check
- No hash verification
- No format validation
- Corrupted file causes later errors
Additional Context
HuggingFace Hub API might provide checksums that could be verified. The hf-hub crate may have built-in validation options.
Users who experience download issues will:
- See "Downloaded successfully!" message
- Later get cryptic model loading errors
- Not realize the downloaded file is corrupted
- Waste time debugging
A simple file size check and format validation would catch most issues.