Status: Closed
Labels: bug, invalid
Project
vgrep
Description
The should_index() function includes an empty string "" in its extension match pattern, causing all files without extensions to be indexed. This includes compiled binaries, core dumps, lock files, and other non-text files that should be excluded.
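Here is a minimal sketch of why the empty-string arm catches every extensionless file. It assumes should_index() derives ext with Path::extension() and falls back to "" when there is none. The issue does not show that part of the code. The name should_index_buggy and the two-entry extension list are hypothetical stand-ins for the elided original.

```rust
use std::path::Path;

// Hypothetical reconstruction of the buggy check. The real extension list is
// elided in the issue, so only a few entries are shown here.
fn should_index_buggy(path: &Path) -> bool {
    // Assumption: the extension is extracted like this, so a path with no
    // extension collapses to the empty string.
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("")
        .to_lowercase();
    // The trailing "" alternative matches every extensionless path.
    matches!(ext.as_str(), "rs" | "py" | "")
}

fn main() {
    // extension() returns None when the file name has no dot...
    assert_eq!(Path::new("my_binary").extension(), None);
    // ...so binaries such as my_binary or core fall through to the "" arm.
    for name in ["my_binary", "core", "test.rs", "data.bin"] {
        println!("{name}: {}", should_index_buggy(Path::new(name)));
    }
}
```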
Error Message
# May produce errors like:
Failed to read file: stream did not contain valid UTF-8
# Or silently corrupt the index with binary content
Debug Logs
$ RUST_LOG=debug vgrep index
# Shows attempts to read binary files without extensions
System Information
OS: Ubuntu 22.04
vgrep version: 0.1.0
Screenshots
No response
Steps to Reproduce
- Create a directory with mixed files:
  echo "valid source" > test.rs
  cp /bin/ls ./my_binary  # Or any binary without extension
- Run vgrep index
- Observe that vgrep attempts to index my_binary
- Check logs/output for UTF-8 errors or observe the binary in the database
Expected Behavior
Only known text/source file types should be indexed. Files without extensions should only be indexed if they match specific known names (Dockerfile, Makefile, etc.).
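The filename allowlist for extensionless files could be a small helper along these lines. This is a sketch only. The issue names Dockerfile and Makefile. The remaining entries, the constant KNOWN_EXTENSIONLESS_NAMES, and the function is_known_extensionless_file are hypothetical.

```rust
use std::path::Path;

// Hypothetical allowlist. Dockerfile and Makefile come from the issue; the
// other entries are illustrative assumptions.
const KNOWN_EXTENSIONLESS_NAMES: &[&str] =
    &["Dockerfile", "Makefile", "Rakefile", "Gemfile", "LICENSE", "README"];

/// Returns true only for files that have no extension AND whose bare file
/// name appears in the allowlist (exact, case-sensitive match).
fn is_known_extensionless_file(path: &Path) -> bool {
    path.extension().is_none()
        && path
            .file_name()
            .and_then(|n| n.to_str())
            .map_or(false, |n| KNOWN_EXTENSIONLESS_NAMES.contains(&n))
}

fn main() {
    for name in ["Dockerfile", "src/Makefile", "my_binary", "Makefile.bak"] {
        println!("{name}: {}", is_known_extensionless_file(Path::new(name)));
    }
}
```

An allowlist keeps the default "skip" for any unknown extensionless file, which is the safe direction for compiled binaries and core dumps.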
Actual Behavior
All files without extensions match the | "" alternative in the extension pattern, so vgrep attempts to index them.
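As a defense in depth, the indexer could sniff the first few kilobytes of each candidate file and skip anything that looks binary before decoding it as UTF-8. This check works independently of the extension filter. A minimal sketch follows. The name looks_binary, the 8 KiB window, and the NUL-byte heuristic are assumptions and are not part of vgrep.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// Assumed sniffing window: only the first 8 KiB of the file is inspected.
const SNIFF_LEN: u64 = 8192;

/// Heuristically reports whether a file looks binary: it contains a NUL byte
/// or invalid UTF-8 within the first SNIFF_LEN bytes.
fn looks_binary(path: &Path) -> io::Result<bool> {
    let mut buf = Vec::new();
    File::open(path)?.take(SNIFF_LEN).read_to_end(&mut buf)?;
    // NUL bytes are rare in source text but common in executables and dumps.
    if buf.contains(&0) {
        return Ok(true);
    }
    match std::str::from_utf8(&buf) {
        Ok(_) => Ok(false),
        // error_len() is None when the only problem is a multi-byte character
        // cut off at the end of the window, which does not indicate binary.
        Err(e) => Ok(e.error_len().is_some()),
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let text = dir.join("vgrep_sniff_demo.rs");
    let bin = dir.join("vgrep_sniff_demo_bin");
    std::fs::write(&text, "fn main() {}\n")?;
    // ELF-like header bytes, including a NUL.
    std::fs::write(&bin, [0x7f, b'E', b'L', b'F', 0x02, 0x01, 0x01, 0x00])?;
    println!("{}: {}", text.display(), looks_binary(&text)?);
    println!("{}: {}", bin.display(), looks_binary(&bin)?);
    std::fs::remove_file(&text)?;
    std::fs::remove_file(&bin)?;
    Ok(())
}
```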
Additional Context
Files affected:
- src/core/indexer.rs → should_index() (line 310)
- src/watcher.rs → should_index() (line 244)
Problematic code:
matches!(
ext.as_str(),
"rs" | "py" | ... | "" // Matches ALL extensionless files
)
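One way to remove the catch-all is to keep the extension as an Option and handle the no-extension case explicitly. With that change, an extensionless file can no longer match by accident. This sketch assumes a path-based signature. The abbreviated extension list, the ASCII case-folding, and the Dockerfile/Makefile fallback are all illustrative. The same change would need to be applied to both copies of should_index() listed above.

```rust
use std::path::Path;

// Sketch of a corrected check. The extension list is abbreviated because the
// issue elides it; case-folding and the name fallback are assumptions.
fn should_index(path: &Path) -> bool {
    // Keep the Option instead of collapsing a missing extension into "".
    match path.extension().and_then(|e| e.to_str()) {
        Some(ext) => matches!(ext.to_ascii_lowercase().as_str(), "rs" | "py"),
        // No extension: index only well-known text files, matched by name.
        None => matches!(
            path.file_name().and_then(|n| n.to_str()),
            Some("Dockerfile" | "Makefile")
        ),
    }
}

fn main() {
    for name in ["test.rs", "my_binary", "Dockerfile", "image.png"] {
        println!("{name}: {}", should_index(Path::new(name)));
    }
}
```

Because the None case needs its own match arm, the compiler forces an explicit decision about extensionless files, and no extension-list edit can reintroduce the problem.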