epcgrs/rs-precommit-fix-encode

RSFE - Rust Source File Encoding Fixer

Read in English | Leia em Português

A Rust command-line tool for automatic validation and correction of file encodings in projects, designed to run as a pre-commit hook via Husky.

Features

  • Automatic encoding detection: Detects each file's current encoding using chardetng
  • Smart conversion: Converts only when necessary to avoid unnecessary changes
  • Flexible configuration: Supports configuration file with glob patterns
  • Optimized performance: Parallel processing using Rayon
  • Git integration: Processes only staged files or the entire project
  • Interactive interface: Allows encoding selection when there's no configuration
  • Respects .gitignore: Automatically ignores files listed in .gitignore

Installation

Prerequisites

  • Rust 1.70+ (install via rustup)
  • Git (for pre-commit integration)
  • Node.js and npm (optional, for Husky integration)

Quick installation

# Clone or copy the project
git clone <your-repository>
cd rs-precommit-fix-encode

# Run the installation script
chmod +x install.sh
./install.sh

The installation script will:

  1. Compile the project in release mode
  2. Automatically configure Husky (if Node.js project detected)
  3. Create the pre-commit hook

Manual installation

# Compile
cargo build --release

# Binary will be at: ./target/release/rsfe

Configuration

rsfe.conf file

Create an rsfe.conf file in your project root. Format:

# Comments start with #
<glob-pattern> <ENCODING>

# Example:
** UTF-8                    # Default for all files
**/*.js UTF-8               # JavaScript files
**/*.py UTF-8               # Python files
legacy/** AUTO              # Auto-detection for legacy folder
old-data/**/*.txt WINDOWS-1252  # Specific encoding
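The format above is simple enough to parse line by line. The following is a minimal sketch of such a parser using only the standard library, not RSFE's actual code: the `Rule` struct and `parse_conf` function are hypothetical names, and the sketch assumes that `#` starts a comment anywhere on a line and that patterns contain no spaces.

```rust
/// One rule from rsfe.conf: a glob pattern and the encoding name assigned to it.
/// (Hypothetical type, used only for this illustration.)
#[derive(Debug, PartialEq)]
pub struct Rule {
    pub pattern: String,
    pub encoding: String,
}

/// Parse the text of an rsfe.conf file into rules, in file order.
/// Assumption: `#` starts a comment anywhere on a line, and the pattern and
/// encoding are separated by whitespace.
pub fn parse_conf(text: &str) -> Vec<Rule> {
    text.lines()
        .filter_map(|line| {
            // Drop everything after the first '#', then surrounding whitespace.
            let line = line.split('#').next().unwrap_or("").trim();
            let mut parts = line.split_whitespace();
            match (parts.next(), parts.next(), parts.next()) {
                // Exactly two fields: pattern and encoding.
                (Some(pattern), Some(encoding), None) => Some(Rule {
                    pattern: pattern.to_string(),
                    encoding: encoding.to_string(),
                }),
                // Blank, comment-only, or malformed lines are skipped.
                _ => None,
            }
        })
        .collect()
}
```

Calling `parse_conf` on the example above would yield five rules, starting with the pattern `**` mapped to `UTF-8`. How RSFE resolves overlapping patterns is not stated here, so the sketch only preserves file order.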

Supported encodings

  • UTF-8 (recommended default)
  • UTF-16LE (UTF-16 Little Endian)
  • UTF-16BE (UTF-16 Big Endian)
  • ISO-8859-1 (Latin-1)
  • WINDOWS-1252 (CP1252)
  • AUTO (automatic detection)
  • Any encoding supported by the encoding_rs library

Configuration example

Copy the example file:

cp rsfe.conf.example rsfe.conf

Edit as needed for your project.

Usage

Standalone mode

# Process entire project
./target/release/rsfe

# If no rsfe.conf exists, you'll be prompted to choose the default encoding

With Git (staged files)

# Stage files
git add .

# Run rsfe (will process only staged files)
./target/release/rsfe

As pre-commit hook with Husky

If you installed via install.sh, the hook is already configured. Each commit will:

  1. Run rsfe on staged files
  2. Convert encodings if necessary
  3. Re-stage modified files
  4. Proceed with the commit

# Normal git usage
git add .
git commit -m "My message"
# rsfe will run automatically

Manual Git hook configuration

If not using Husky, add to .git/hooks/pre-commit:

#!/bin/sh

# Run rsfe; if it fails, cancel the commit
if ! ./target/release/rsfe; then
    echo "❌ rsfe failed. Fix errors before committing."
    exit 1
fi

# Re-stage modified files
# Note: git add -u stages every modified tracked file, not only the ones rsfe converted
git add -u

exit 0

Make the hook executable:

chmod +x .git/hooks/pre-commit

How it works

Execution flow

  1. Load configuration: Reads rsfe.conf or prompts for default encoding
  2. Collect files:
    • If in a Git repository with staged files: process only staged files
    • Otherwise: process entire project
  3. Filter files: Ignore binaries and respect .gitignore
  4. Process in parallel: Uses Rayon for multi-threaded processing
  5. For each file:
    • Determine target encoding based on rules
    • Detect current encoding
    • Check if conversion is needed
    • Convert if necessary
  6. Report results: Show how many files were converted
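Step 2 of the flow above can be sketched as follows. This is an illustration under stated assumptions, not RSFE's actual code: it assumes staged files are obtained by running `git diff --cached --name-only --diff-filter=ACM`, and the function names `parse_staged` and `staged_files` are hypothetical.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Turn the output of `git diff --cached --name-only` into paths, one per line.
/// Caveat: git may quote paths containing special characters unless `-z` is used.
pub fn parse_staged(output: &str) -> Vec<PathBuf> {
    output
        .lines()
        .filter(|line| !line.is_empty())
        .map(PathBuf::from)
        .collect()
}

/// Ask git for the staged files (added, copied, or modified).
/// Returns None when git cannot be run, the command fails, or nothing is
/// staged, so the caller can fall back to scanning the entire project.
pub fn staged_files() -> Option<Vec<PathBuf>> {
    let out = Command::new("git")
        .args(["diff", "--cached", "--name-only", "--diff-filter=ACM"])
        .output()
        .ok()?;
    if !out.status.success() {
        return None;
    }
    let files = parse_staged(&String::from_utf8_lossy(&out.stdout));
    if files.is_empty() {
        None
    } else {
        Some(files)
    }
}
```

Returning `None` for "no staged files" keeps the fallback to a full project scan in one place at the call site.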

Conversion detection

RSFE converts a file only when at least one of these checks indicates a mismatch:

  1. Decoding the file's bytes with the target encoding produces errors
  2. Re-encoding the decoded text produces bytes that differ from the original

This avoids unnecessary commits of files already in the correct encoding.
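The two checks can be illustrated for a UTF-8 target using only the standard library. This is a sketch, not RSFE's implementation (which relies on encoding_rs and chardetng): the function names are hypothetical, and the source encoding is assumed to be ISO-8859-1 whenever the bytes are not valid UTF-8.

```rust
/// Returns true when `bytes` cannot be kept as-is for a UTF-8 target.
/// Check 1: decoding as UTF-8 must succeed without errors.
/// Check 2: re-encoding the decoded text must reproduce the original bytes.
pub fn needs_conversion_to_utf8(bytes: &[u8]) -> bool {
    match std::str::from_utf8(bytes) {
        // For UTF-8 the round trip is the identity, so check 2 passes
        // trivially here; it is kept to mirror the description.
        Ok(text) => text.as_bytes() != bytes,
        Err(_) => true,
    }
}

/// Convert ISO-8859-1 bytes to a UTF-8 string.
/// Each Latin-1 byte maps to the Unicode code point with the same value.
pub fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Hypothetical per-file step: return new contents only if a rewrite is needed.
/// Assumption: anything that is not valid UTF-8 is treated as ISO-8859-1.
pub fn fix_file_bytes(bytes: &[u8]) -> Option<Vec<u8>> {
    if needs_conversion_to_utf8(bytes) {
        Some(latin1_to_utf8(bytes).into_bytes())
    } else {
        None
    }
}
```

Because `fix_file_bytes` returns `None` for files that are already valid, a caller would write to disk only when there is something to change, which is what keeps untouched files out of the commit.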

Performance

RSFE is optimized for performance:

  • Parallel processing: Uses all CPU cores via Rayon
  • Smart reading: Automatically ignores binary files
  • Fast detection: chardetng detects encodings efficiently
  • Conditional conversion: Only converts when truly necessary

Typical benchmark (hardware dependent):

  • ~10,000 files processed in ~2-5 seconds
  • Necessary conversions: adds ~1-2ms per file
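The parallel-processing idea can be sketched without external crates. RSFE uses Rayon; the example below swaps in `std::thread::scope` (stable since Rust 1.63) purely so that it is self-contained. The function name `count_parallel` and its parameters are hypothetical.

```rust
use std::thread;

/// Count how many items satisfy `needs_fix`, splitting the work across
/// up to `workers` scoped threads. A stand-in for a Rayon parallel iterator.
pub fn count_parallel<T, F>(items: &[T], workers: usize, needs_fix: F) -> usize
where
    T: Sync,
    F: Fn(&T) -> bool + Sync,
{
    let workers = workers.max(1);
    // Ceiling division so every item lands in some chunk; at least 1 per chunk.
    let chunk_size = ((items.len() + workers - 1) / workers).max(1);
    let needs_fix = &needs_fix;

    thread::scope(|s| {
        // Spawn one thread per chunk; each returns its local count.
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().filter(|item| needs_fix(*item)).count())
            })
            .collect();

        // Join all threads and add up the per-chunk counts.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}
```

A work-stealing pool such as Rayon's balances uneven workloads better than fixed chunks; the fixed-chunk split here is only the simplest way to show the shape of the computation.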

Ignored files

By default, RSFE ignores:

Directories

  • node_modules
  • target
  • dist
  • build
  • .git
  • .idea
  • .vscode
  • Any directory in .gitignore

Extensions (binaries)

  • Images: .png, .jpg, .jpeg, .gif
  • Documents: .pdf
  • Compressed files: .zip, .tar, .gz
  • Executables: .exe, .dll, .so, .dylib
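The default ignore rules listed above could be expressed roughly as follows. This is a sketch rather than the code in src/main.rs: the constant and function names are hypothetical, extension matching is assumed to be case-insensitive, and .gitignore handling is left out.

```rust
use std::path::Path;

/// Directory names skipped by default (from the list above).
const IGNORED_DIRS: &[&str] = &[
    "node_modules", "target", "dist", "build", ".git", ".idea", ".vscode",
];

/// File extensions treated as binary (from the list above).
const BINARY_EXTENSIONS: &[&str] = &[
    "png", "jpg", "jpeg", "gif", "pdf", "zip", "tar", "gz", "exe", "dll", "so", "dylib",
];

/// Returns true when `path` lies inside an ignored directory or has a
/// binary extension. Assumption: extensions are compared case-insensitively.
pub fn is_ignored(path: &Path) -> bool {
    // Only parent components are checked, so a file merely named
    // "target" is not mistaken for the directory.
    let in_ignored_dir = path.parent().map_or(false, |parent| {
        parent.components().any(|c| {
            c.as_os_str()
                .to_str()
                .map_or(false, |name| IGNORED_DIRS.contains(&name))
        })
    });

    let has_binary_ext = path
        .extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| {
            BINARY_EXTENSIONS.contains(&ext.to_ascii_lowercase().as_str())
        });

    in_ignored_dir || has_binary_ext
}
```

Under these assumptions, `node_modules/pkg/index.js` and `assets/logo.PNG` are skipped, while `src/main.rs` is processed.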

Usage examples

JavaScript/TypeScript project

# rsfe.conf
** UTF-8
**/*.js UTF-8
**/*.ts UTF-8
**/*.jsx UTF-8
**/*.tsx UTF-8
**/*.json UTF-8

Python project

# rsfe.conf
** UTF-8
**/*.py UTF-8
**/*.md UTF-8
requirements.txt UTF-8

Project with legacy files

# rsfe.conf
** UTF-8
legacy/** AUTO           # Auto-detect
docs/old/*.txt WINDOWS-1252

Troubleshooting

"Rust is not installed"

Install Rust via rustup:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

"Compilation failed"

Check Rust version:

rustc --version  # Should be 1.70+

Update if necessary:

rustup update

Binary files being processed

Add the extension to the ignore list at src/main.rs:251.

Unsupported encoding

Check the encoding_rs documentation for supported encodings.

Development

Project structure

rs-precommit-fix-encode/
├── src/
│   └── main.rs          # Main code
├── Cargo.toml           # Dependencies
├── rsfe.conf.example    # Configuration example
├── install.sh           # Installation script
├── .husky/
│   └── pre-commit       # Husky hook
├── README.md            # English documentation
└── LEIAME.md            # Portuguese documentation

Run in debug mode

cargo run

Tests

cargo test

Optimized build

# Requires the musl target: rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl  # Static Linux binary

Contributing

  1. Fork the project
  2. Create a branch for your feature (git checkout -b feature/MyFeature)
  3. Commit your changes (git commit -m 'Add MyFeature')
  4. Push to the branch (git push origin feature/MyFeature)
  5. Open a Pull Request

License

MIT License - see the LICENSE file for details.

Author

Created to ensure encoding consistency in projects and avoid special character issues in commits.
