[BUG] Downloaded Models Not Validated #133

@olddev94

Project

vgrep

Description

The model download function in src/cli/commands.rs (lines 865-894) downloads models from HuggingFace but does not validate the downloaded files. If the download is interrupted, the network corrupts the transfer, or the disk runs out of space, the truncated or corrupted model file is still saved and used, and the failure only surfaces later as a cryptic error.

Error Message

Loading a corrupted model produces various errors:

Error: Failed to load embedding model

or

Error: Invalid GGUF file format

or silent incorrect behavior (garbage embeddings).

Debug Logs

System Information

- Bounty Version: 0.1.0
- OS: Ubuntu 24.04 LTS
- Rust: 1.75+

Screenshots

No response

Steps to Reproduce

  1. Start model download: vgrep models download
  2. Interrupt with Ctrl+C during download
  3. Run vgrep serve or vgrep search
  4. Observe cryptic error from corrupted model file

Expected Behavior

After downloading:

  1. Verify file size matches expected
  2. Validate checksum/hash of downloaded file
  3. Attempt to load model to verify it's valid GGUF
  4. Only then save path to config
  5. If validation fails, delete corrupted file and show clear error
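
The size and format checks above can be sketched with the standard library alone. This is a minimal sketch, not vgrep's actual code: `ModelCheckError`, `check_model_file`, and `validate_or_remove` are hypothetical names, and the expected size is assumed to come from wherever the download metadata is available. The only format fact relied on is that a GGUF file begins with the four ASCII bytes `GGUF`. Checksum verification and a full model load are left out.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// A GGUF file begins with these four ASCII bytes.
const GGUF_MAGIC: [u8; 4] = *b"GGUF";

/// Hypothetical error type for this sketch; not part of vgrep.
#[derive(Debug, PartialEq)]
pub enum ModelCheckError {
    SizeMismatch { expected: u64, actual: u64 },
    BadMagic,
    Io(String),
}

impl From<io::Error> for ModelCheckError {
    fn from(e: io::Error) -> Self {
        ModelCheckError::Io(e.to_string())
    }
}

/// Checks the file size (when an expected size is known) and the GGUF magic.
pub fn check_model_file(
    path: &Path,
    expected_size: Option<u64>,
) -> Result<(), ModelCheckError> {
    let actual = fs::metadata(path)?.len();
    if let Some(expected) = expected_size {
        if actual != expected {
            return Err(ModelCheckError::SizeMismatch { expected, actual });
        }
    }

    let mut file = File::open(path)?;
    let mut magic = [0u8; 4];
    // read_exact fails on files shorter than four bytes, which also
    // counts as "not a GGUF file" here.
    if file.read_exact(&mut magic).is_err() || magic != GGUF_MAGIC {
        return Err(ModelCheckError::BadMagic);
    }
    Ok(())
}

/// Runs the checks and deletes the file if any of them fail, so a
/// corrupted download is never left behind for later commands to load.
pub fn validate_or_remove(
    path: &Path,
    expected_size: Option<u64>,
) -> Result<(), ModelCheckError> {
    let result = check_model_file(path, expected_size);
    if result.is_err() {
        // Best-effort cleanup; the validation error is what gets reported.
        let _ = fs::remove_file(path);
    }
    result
}
```

A matching magic only proves the header arrived intact. A download truncated after the first bytes would still pass the magic check, so the size comparison (or a checksum) is what catches interrupted transfers.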

Actual Behavior

  1. File is downloaded
  2. Path is saved to config immediately
  3. No size check
  4. No hash verification
  5. No format validation
  6. Corrupted file causes later errors
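
One possible way to reorder the sequence above is to stage the download under a temporary name and move it to its final path only after validation passes, so an interrupted download never leaves a file where later commands look for the model. This is a sketch under assumptions: `partial_path` and `finalize_download` are hypothetical helpers, the `.part` suffix is an arbitrary choice, and it presumes the download writes to a path vgrep controls, which may not hold if the hf-hub crate manages its own cache.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: the path used while a download is in progress,
/// e.g. `model.gguf` becomes `model.gguf.part`.
pub fn partial_path(final_path: &Path) -> PathBuf {
    let mut name = final_path
        .file_name()
        .map(|n| n.to_os_string())
        .unwrap_or_default();
    name.push(".part");
    final_path.with_file_name(name)
}

/// Moves a finished download into place only if `validate` accepts it.
/// On rejection the staged file is deleted and an error is returned, so
/// the caller can skip writing the path to the config.
pub fn finalize_download<F>(final_path: &Path, validate: F) -> io::Result<()>
where
    F: FnOnce(&Path) -> bool,
{
    let part = partial_path(final_path);
    if !validate(&part) {
        // Best-effort cleanup of the rejected file.
        let _ = fs::remove_file(&part);
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "downloaded model failed validation; partial file removed",
        ));
    }
    // Rename within the same directory, so the final path only ever
    // appears once the file has been accepted.
    fs::rename(&part, final_path)
}
```

With this ordering, the config would be updated only after `finalize_download` returns `Ok`, and a Ctrl+C during the transfer leaves at most a `.part` file that the next download attempt can overwrite.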

Additional Context

The HuggingFace Hub API might provide checksums that could be verified after download. The hf-hub crate may also have built-in validation options.

Users who experience download issues will:

  1. See "Downloaded successfully!" message
  2. Later get cryptic model loading errors
  3. Not realize the downloaded file is corrupted
  4. Waste time debugging

A simple file size check and format validation would catch most issues.

Labels: bug, valid, vgrep