Issue 277 reed-solomon#316
Conversation
Add `reed-solomon-erasure = \"5\"` to provide Reed–Solomon encoding/decoding support for future error-correction feature.
Add `reed-solomon-erasure = \"5\"` so the Reed–Solomon helper is available as `crate::crypto::rs`."
radumarias
left a comment
There was a problem hiding this comment.
Review from claude code
▐▛███▜▌ Claude Code v2.0.65
▝▜█████▛▘ Opus 4.5 · Claude API
▘▘ ▝▝ ~/dev/xorio/rencfs
Code Review: PR #316 - Reed-Solomon Error Correction
Overview
This PR implements Reed-Solomon error correction for rencfs (Issue #277), allowing data recovery from parity shards when the main file is corrupted or missing. The implementation adds:
- New module: src/crypto/rs.rs - Core RS encoding/decoding logic
- Integration: RS encoding on file release, reconstruction on file open
- Configuration: Optional RsConfig struct with data/parity shard counts
- Example: examples/rs_example.rs demonstrating usage
- Tests: Unit test for the RS encoder and integration test for full workflow
Changes: +398 additions, -7 deletions across 11 files
Code Quality Analysis
✅ Strengths
- Opt-in design: RS is optional via Option, respecting users who don't want storage overhead
- Clean separation: RS logic isolated in its own module (src/crypto/rs.rs)
- Comprehensive test: Integration test simulates file corruption and recovery
- Length prefix encoding: Smart approach to handle arbitrary data sizes
❌ Issues & Concerns
- Critical: Incorrect Data Sharding Model (src/encryptedfs.rs:1831-1893)
The reconstruction logic assumes the original file is shard 0, but the encoding splits data across ALL data shards:
// Encoding: data split across data_shards (0, 1, 2)
// Reconstruction: only loads original file as shard[0], nothing for shards 1, 2
shards_opt[0] = Some(data); // Only shard 0!
Problem: With data_shards=3, reconstruction requires 3 shards minimum, but the code only provides 1 data shard + parity shards. This means reconstruction will always fail for the current implementation.
Recommendation: Either:
- Store all data shards separately (not just the original file)
- Or use data_shards=1 model where the original file IS the single data shard
- Memory Leak via Box::leak (src/encryptedfs.rs:1880-1888)
return Err(FsError::Other(Box::leak(format!("...").into_boxed_str())));
Using Box::leak creates permanent memory leaks. The project already has FsError::Other(&'static str) which expects static strings.
Recommendation: Add a new error variant for RS failures:
#[error("Reed-Solomon error: {0}")]
ReedSolomonError(String),
- Blocking I/O in Async Context (src/encryptedfs.rs:1531-1565, 1838-1893)
if let Ok(content) = std::fs::read(&content_path) { // Blocking!
Using synchronous std::fs::read/write in async methods blocks the Tokio runtime.
Recommendation: Use tokio::fs::read / tokio::fs::write with .await
- Missing Error Propagation (src/encryptedfs.rs:1553-1559)
if let Err(e) = std::fs::write(&shard_path, shard) {
warn!("Failed to write parity shard {:?}: {}", shard_path, e);
}
Encoding failures are silently logged but not propagated. User may think data is protected when it isn't.
Recommendation: Consider making parity write failures a hard error or at least return a warning to the caller.
- Inconsistent use of warn! vs error!
- Line 1559: warn! for parity write failure
- Line 1563: warn! for encode failure
- Line 1874: error! for reconstruction failure
These all seem like similar severity levels; consider consistent logging.
Security Considerations
- Parity shards are unencrypted data fragments: The encrypted content is split into shards. While the content is already encrypted, the existence of .parity.X files leaks metadata about which files have RS protection enabled.
- No integrity verification on shards: Maliciously crafted parity shards could potentially cause reconstruction to produce garbage data. Consider adding checksums.
- Shard files not cleaned up on delete: When a file is deleted, the corresponding .parity.X files should also be removed to avoid dangling files and potential data leakage.
Test Coverage
✅ Good
- Unit test for RsEncoder::encode/reconstruct
- Integration test test_reed_solomon_error_correction covers full workflow
❌ Missing
- No test for partial corruption (file exists but is corrupted)
- No test for insufficient shards scenario
- No test for parity shard cleanup on file deletion
- No test for set_len (truncation) interaction with RS
- No negative test for when data_shards > 1 (which will fail with current impl)
Performance Implications
- Every file release triggers RS encoding + parity writes - could be slow for large files
- Synchronous reads of entire file content into memory (std::fs::read) - problematic for large files
- Older parking_lot version (0.11.2) pulled in by reed-solomon-erasure - adds duplicate dependency
Specific Recommendations
Must Fix
- Fix the sharding model: Current implementation won't work with data_shards > 1. Either:
- Document that only data_shards=1 is supported, or
- Store all data shards separately - Remove Box::leak: Add proper error variant to avoid memory leaks
- Use async I/O: Replace std::fs::{read,write} with tokio::fs
Should Fix
- Add parity cleanup in remove_file / deletion logic
- Handle large files with streaming instead of loading entire file into memory
- Propagate encoding errors to caller instead of silent logging
Nice to Have
- Add more comprehensive tests for edge cases
- Consider adding a checksum to each shard for integrity verification
- Add CLI flag to enable/configure RS settings
Summary
| Category | Rating |
|---|---|
| Code Correctness | |
| Code Style | ✅ Good |
| Security | |
| Tests | |
| Performance |
Verdict: The PR shows good understanding of Reed-Solomon concepts and clean code structure, but has a critical bug in the sharding model that would cause reconstruction to fail in practice. The memory leak from Box::leak should also be fixed before merge.
Recommend requesting changes to address the critical issues before approval.
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
? for shortcuts 51102 tokens
globalVersion: 2.0.65 · latestVersion: 2.0.69
● PR Review Summary: Reed-Solomon Error Correction (issue-277-reed-solomon) Overview This PR implements Reed-Solomon error correction for file recovery, adding ~400 lines across 11 files. The feature is opt-in via an RsConfig parameter. Critical Issues (4 found) — Must fix before merge
Important Issues (8 found) — Should fix
Suggestions (6 found) — Nice to have
Strengths
Recommended Action Plan Phase 1: Fix Critical Issues (Blocking)
Phase 2: Address Important Issues
Phase 3: Polish
Re-run Review After Fixes After addressing critical issues, run: |
Critical fixes: - Eliminate Box::leak memory leaks (add FsError::ReedSolomonError) - Replace blocking std::fs with async tokio::fs - Delete parity shards on file removal to prevent data leakage Important fixes: - Add input validation for RsEncoder (data_shards/parity_shards > 0) - Document data_shards=1 limitation with debug_assert - Respect read-only mode during reconstruction - Only reconstruct missing files, not empty ones
There was a problem hiding this comment.
Pull request overview
This PR adds Reed-Solomon error correction functionality to enable file recovery from parity shards in case of data corruption or loss. The implementation uses the reed-solomon-erasure crate to create parity shards that can reconstruct missing or corrupted encrypted files.
Key changes:
- Introduced a new
RsEncodermodule with encode/reconstruct capabilities for Reed-Solomon error correction - Extended
EncryptedFsto support optional Reed-Solomon configuration, automatically creating parity shards on file write and reconstructing files on read when needed - Updated all
EncryptedFs::new()call sites to include the new optionalrs_configparameter
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 23 comments.
Show a summary per file
| File | Description |
|---|---|
| src/crypto/rs.rs | New module implementing Reed-Solomon encoding and reconstruction logic |
| src/encryptedfs.rs | Added rs_config field, parity shard creation on release(), reconstruction on open(), and parity cleanup on deletion |
| src/encryptedfs/test.rs | Added comprehensive test for Reed-Solomon reconstruction scenario |
| src/test_common.rs | Updated EncryptedFs::new() call to include None for rs_config parameter |
| src/mount/linux.rs | Updated EncryptedFs::new() call to include None for rs_config parameter |
| examples/rs_example.rs | New standalone example demonstrating Reed-Solomon encode/reconstruct |
| examples/file_handling.rs | Updated EncryptedFs::new() call to include None for rs_config parameter |
| examples/encryptedfs.rs | Updated EncryptedFs::new() call to include None for rs_config parameter |
| src/crypto.rs | Added rs module export and removed unused error import from tracing |
| src/crypto/read.rs | Removed redundant pub(crate) use of decrypt_block macro |
| Cargo.toml | Added reed-solomon-erasure dependency |
| Cargo.lock | Updated with new dependencies for reed-solomon-erasure |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let mut available_shards = if shards_opt[0].is_some() { 1 } else { 0 }; | ||
| for i in 0..rs_config.parity_shards { | ||
| let shard_path = content_path.parent() | ||
| .unwrap() |
There was a problem hiding this comment.
Error handling issue: Using .unwrap() on parent() can panic. Consider using .ok_or() or .expect() with a descriptive message to make debugging easier if this path operation ever fails.
| .unwrap() | |
| .ok_or_else(|| FsError::ReedSolomonError(format!( | |
| "Cannot determine parent directory for content path: {:?}", content_path | |
| )))? |
| // Write parity shards (skip data shards which are the original file) | ||
| for (i, shard) in shards.iter().enumerate().skip(rs_config.data_shards) { | ||
| let shard_path = content_path.parent() | ||
| .unwrap() |
There was a problem hiding this comment.
Error handling issue: Using .unwrap() on parent() can panic. Consider using .ok_or() or .expect() with a descriptive message to make debugging easier if this path operation ever fails.
| .unwrap() | |
| .expect(&format!("content_path {:?} has no parent directory", content_path)) |
| pub data_shards: usize, | ||
| /// Number of parity shards to create (for recovery). | ||
| pub parity_shards: usize, | ||
| } | ||
|
|
||
| pub struct RsEncoder { |
There was a problem hiding this comment.
Missing validation: RsConfig lacks validation in its creation. The fields data_shards and parity_shards should be validated to ensure they are greater than 0 and that data_shards == 1 (as per the current implementation constraint). Consider adding a constructor method that validates these constraints or implementing validation in the EncryptedFs::new method.
| pub data_shards: usize, | |
| /// Number of parity shards to create (for recovery). | |
| pub parity_shards: usize, | |
| } | |
| pub struct RsEncoder { | |
| data_shards: usize, | |
| /// Number of parity shards to create (for recovery). | |
| parity_shards: usize, | |
| } |
| let mut payload = Vec::with_capacity(8 + data.len()); | ||
| payload.extend_from_slice(&(data.len() as u64).to_le_bytes()); | ||
| payload.extend_from_slice(data); | ||
|
|
||
| let shard_size = (payload.len() + self.data_shards - 1) / self.data_shards; | ||
| let total_shards = self.data_shards + self.parity_shards; | ||
|
|
||
| let mut shards: Vec<Vec<u8>> = vec![vec![0u8; shard_size]; total_shards]; | ||
|
|
||
| for i in 0..self.data_shards { | ||
| let start = i * shard_size; | ||
| let end = std::cmp::min(start + shard_size, payload.len()); | ||
| if start < payload.len() { | ||
| shards[i][..end - start].copy_from_slice(&payload[start..end]); | ||
| } | ||
| } | ||
|
|
||
| let mut shard_refs: Vec<&mut [u8]> = shards.iter_mut().map(|v| v.as_mut_slice()).collect(); | ||
| r.encode(&mut shard_refs)?; | ||
|
|
||
| Ok(shards) | ||
| } | ||
|
|
||
| pub fn reconstruct( | ||
| &self, | ||
| shards_opt: &mut [Option<Vec<u8>>], | ||
| ) -> Result<Vec<u8>, Box<dyn Error>> { | ||
| let r = ReedSolomon::<Gf8>::new(self.data_shards, self.parity_shards)?; | ||
| let total_shards = self.data_shards + self.parity_shards; | ||
|
|
||
| if shards_opt.len() != total_shards { | ||
| return Err("shards length mismatch".into()); | ||
| } | ||
|
|
||
| let shard_len = shards_opt | ||
| .iter() | ||
| .find_map(|s| s.as_ref().map(|v| v.len())) | ||
| .ok_or("no shards available")?; | ||
|
|
||
| for slot in shards_opt.iter_mut() { | ||
| if let Some(v) = slot { | ||
| if v.len() < shard_len { | ||
| v.resize(shard_len, 0u8); | ||
| } else if v.len() > shard_len { | ||
| return Err("inconsistent shard lengths".into()); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| r.reconstruct(shards_opt)?; | ||
|
|
||
| let mut payload = Vec::with_capacity(shard_len * self.data_shards); | ||
| for i in 0..self.data_shards { | ||
| let slice = shards_opt[i] | ||
| .as_ref() | ||
| .ok_or("missing shard after reconstruct")?; | ||
| payload.extend_from_slice(slice); | ||
| } | ||
|
|
||
| if payload.len() < 8 { | ||
| return Err("payload too small".into()); | ||
| } | ||
| let orig_len = u64::from_le_bytes(payload[0..8].try_into().unwrap()) as usize; | ||
| if 8 + orig_len > payload.len() { | ||
| return Err("original length exceeds reconstructed payload".into()); | ||
| } | ||
| Ok(payload[8..8 + orig_len].to_vec()) |
There was a problem hiding this comment.
Magic number: The value 8 appears multiple times (lines 34, 35, 93, 96, 100) representing the size of the u64 length prefix. This should be defined as a named constant (e.g., LENGTH_PREFIX_SIZE) to improve maintainability and make the code's intent clearer.
| #[tokio::test] | ||
| #[traced_test] | ||
| async fn test_reed_solomon_error_correction() { | ||
| // Create filesystem with Reed-Solomon enabled (3 data shards, 2 parity shards) | ||
| // Currently only `data_shards = 1` is supported. | ||
| // The original encrypted file acts as the single data shard. | ||
| // Parity shards are used only for recovery. | ||
| let rs_config = Some(crate::crypto::rs::RsConfig { | ||
| data_shards: 1, | ||
| parity_shards: 2, | ||
| }); | ||
|
|
||
| let temp_dir = tempfile::tempdir().unwrap(); | ||
| let data_dir = temp_dir.path().to_path_buf(); | ||
|
|
||
| let fs = EncryptedFs::new( | ||
| data_dir.clone(), | ||
| Box::new(PasswordProviderImpl {}), | ||
| Cipher::ChaCha20Poly1305, | ||
| rs_config, | ||
| false, | ||
| ) | ||
| .await | ||
| .unwrap(); | ||
|
|
||
| // Create and write a test file | ||
| let test_file = SecretString::from_str("test-rs-file").unwrap(); | ||
| let (fh, attr) = fs | ||
| .create( | ||
| ROOT_INODE, | ||
| &test_file, | ||
| create_attr(FileType::RegularFile), | ||
| false, | ||
| true, | ||
| ) | ||
| .await | ||
| .unwrap(); | ||
|
|
||
| let original_data = b"Hello, Reed-Solomon! This is test data for error correction."; | ||
| write_all_bytes_to_fs(&fs, attr.ino, 0, original_data, fh).await.unwrap(); | ||
|
|
||
| // Release the file handle (this should trigger RS encoding and create parity shards) | ||
| fs.release(fh).await.unwrap(); | ||
|
|
||
| // Verify the main content file exists | ||
| let content_path = fs.contents_path(attr.ino); | ||
| assert!(content_path.exists(), "Main content file should exist"); | ||
|
|
||
| // Verify parity shard files were created | ||
| for i in 0..2 { | ||
| let parity_path = content_path.parent().unwrap().join(format!("{}.parity.{}", attr.ino, i)); | ||
| assert!(parity_path.exists(), "Parity shard {} should exist", i); | ||
| } | ||
|
|
||
| // Simulate file corruption by deleting the main content file | ||
| std::fs::remove_file(&content_path).unwrap(); | ||
| assert!(!content_path.exists(), "Main content file should be deleted"); | ||
|
|
||
| // Try to read the file - this should trigger reconstruction from parity shards | ||
| let read_fh = fs.open(attr.ino, true, false).await.unwrap(); | ||
|
|
||
| // Read the reconstructed content | ||
| let mut buffer = vec![0u8; original_data.len()]; | ||
| let bytes_read = fs.read(attr.ino, 0, &mut buffer, read_fh).await.unwrap(); | ||
| assert_eq!(bytes_read, original_data.len(), "Should read all original bytes"); | ||
| assert_eq!(&buffer[..bytes_read], original_data, "Reconstructed data should match original"); | ||
|
|
||
| // Verify the main file was restored | ||
| assert!(content_path.exists(), "Main content file should be restored after reconstruction"); | ||
|
|
||
| fs.release(read_fh).await.unwrap(); | ||
| } |
There was a problem hiding this comment.
Missing test coverage: The test only covers the happy path where file reconstruction succeeds. It should also test error scenarios such as: insufficient parity shards available for reconstruction, corrupted parity shards, missing all shards, and the read-only mode preventing reconstruction write-back.
| debug!("RS: Wrote {} parity shards for inode {}", rs_config.parity_shards, ctx.ino); | ||
| } | ||
| Err(e) => { | ||
| warn!("{}", e); |
There was a problem hiding this comment.
Inconsistent error handling: When RS encoding fails (line 1570), the error is converted to a String, then later at line 1589 it's only logged as a warning. This means encoding failures are silently ignored and parity shards won't be created. Consider either propagating this error or at least using error! level logging and documenting this behavior, as users may expect parity shards to always be created when RS is enabled.
| warn!("{}", e); | |
| // RS encoding failed: parity shards will NOT be created for this file. | |
| error!("RS: Parity shard creation failed: {}. Parity shards will NOT be created for inode {}.", e, ctx.ino); |
| // Create filesystem with Reed-Solomon enabled (3 data shards, 2 parity shards) | ||
| // Currently only `data_shards = 1` is supported. | ||
| // The original encrypted file acts as the single data shard. | ||
| // Parity shards are used only for recovery. | ||
| let rs_config = Some(crate::crypto::rs::RsConfig { | ||
| data_shards: 1, | ||
| parity_shards: 2, | ||
| }); | ||
|
|
||
| let temp_dir = tempfile::tempdir().unwrap(); | ||
| let data_dir = temp_dir.path().to_path_buf(); | ||
|
|
||
| let fs = EncryptedFs::new( | ||
| data_dir.clone(), | ||
| Box::new(PasswordProviderImpl {}), | ||
| Cipher::ChaCha20Poly1305, | ||
| rs_config, | ||
| false, | ||
| ) | ||
| .await | ||
| .unwrap(); | ||
|
|
||
| // Create and write a test file | ||
| let test_file = SecretString::from_str("test-rs-file").unwrap(); | ||
| let (fh, attr) = fs | ||
| .create( | ||
| ROOT_INODE, | ||
| &test_file, | ||
| create_attr(FileType::RegularFile), | ||
| false, | ||
| true, | ||
| ) | ||
| .await | ||
| .unwrap(); | ||
|
|
||
| let original_data = b"Hello, Reed-Solomon! This is test data for error correction."; | ||
| write_all_bytes_to_fs(&fs, attr.ino, 0, original_data, fh).await.unwrap(); | ||
|
|
||
| // Release the file handle (this should trigger RS encoding and create parity shards) | ||
| fs.release(fh).await.unwrap(); | ||
|
|
||
| // Verify the main content file exists | ||
| let content_path = fs.contents_path(attr.ino); | ||
| assert!(content_path.exists(), "Main content file should exist"); | ||
|
|
||
| // Verify parity shard files were created | ||
| for i in 0..2 { | ||
| let parity_path = content_path.parent().unwrap().join(format!("{}.parity.{}", attr.ino, i)); | ||
| assert!(parity_path.exists(), "Parity shard {} should exist", i); | ||
| } | ||
|
|
||
| // Simulate file corruption by deleting the main content file | ||
| std::fs::remove_file(&content_path).unwrap(); | ||
| assert!(!content_path.exists(), "Main content file should be deleted"); | ||
|
|
||
| // Try to read the file - this should trigger reconstruction from parity shards | ||
| let read_fh = fs.open(attr.ino, true, false).await.unwrap(); | ||
|
|
||
| // Read the reconstructed content | ||
| let mut buffer = vec![0u8; original_data.len()]; | ||
| let bytes_read = fs.read(attr.ino, 0, &mut buffer, read_fh).await.unwrap(); | ||
| assert_eq!(bytes_read, original_data.len(), "Should read all original bytes"); | ||
| assert_eq!(&buffer[..bytes_read], original_data, "Reconstructed data should match original"); | ||
|
|
||
| // Verify the main file was restored | ||
| assert!(content_path.exists(), "Main content file should be restored after reconstruction"); | ||
|
|
||
| fs.release(read_fh).await.unwrap(); |
There was a problem hiding this comment.
Test pattern inconsistency: This test doesn't follow the run_test pattern used by other tests in the file (e.g., test_read_only_write). Consider refactoring to use the run_test helper for consistency, proper cleanup, and to leverage the existing test infrastructure.
| // Create filesystem with Reed-Solomon enabled (3 data shards, 2 parity shards) | |
| // Currently only `data_shards = 1` is supported. | |
| // The original encrypted file acts as the single data shard. | |
| // Parity shards are used only for recovery. | |
| let rs_config = Some(crate::crypto::rs::RsConfig { | |
| data_shards: 1, | |
| parity_shards: 2, | |
| }); | |
| let temp_dir = tempfile::tempdir().unwrap(); | |
| let data_dir = temp_dir.path().to_path_buf(); | |
| let fs = EncryptedFs::new( | |
| data_dir.clone(), | |
| Box::new(PasswordProviderImpl {}), | |
| Cipher::ChaCha20Poly1305, | |
| rs_config, | |
| false, | |
| ) | |
| .await | |
| .unwrap(); | |
| // Create and write a test file | |
| let test_file = SecretString::from_str("test-rs-file").unwrap(); | |
| let (fh, attr) = fs | |
| .create( | |
| ROOT_INODE, | |
| &test_file, | |
| create_attr(FileType::RegularFile), | |
| false, | |
| true, | |
| ) | |
| .await | |
| .unwrap(); | |
| let original_data = b"Hello, Reed-Solomon! This is test data for error correction."; | |
| write_all_bytes_to_fs(&fs, attr.ino, 0, original_data, fh).await.unwrap(); | |
| // Release the file handle (this should trigger RS encoding and create parity shards) | |
| fs.release(fh).await.unwrap(); | |
| // Verify the main content file exists | |
| let content_path = fs.contents_path(attr.ino); | |
| assert!(content_path.exists(), "Main content file should exist"); | |
| // Verify parity shard files were created | |
| for i in 0..2 { | |
| let parity_path = content_path.parent().unwrap().join(format!("{}.parity.{}", attr.ino, i)); | |
| assert!(parity_path.exists(), "Parity shard {} should exist", i); | |
| } | |
| // Simulate file corruption by deleting the main content file | |
| std::fs::remove_file(&content_path).unwrap(); | |
| assert!(!content_path.exists(), "Main content file should be deleted"); | |
| // Try to read the file - this should trigger reconstruction from parity shards | |
| let read_fh = fs.open(attr.ino, true, false).await.unwrap(); | |
| // Read the reconstructed content | |
| let mut buffer = vec![0u8; original_data.len()]; | |
| let bytes_read = fs.read(attr.ino, 0, &mut buffer, read_fh).await.unwrap(); | |
| assert_eq!(bytes_read, original_data.len(), "Should read all original bytes"); | |
| assert_eq!(&buffer[..bytes_read], original_data, "Reconstructed data should match original"); | |
| // Verify the main file was restored | |
| assert!(content_path.exists(), "Main content file should be restored after reconstruction"); | |
| fs.release(read_fh).await.unwrap(); | |
| let rs_config = Some(crate::crypto::rs::RsConfig { | |
| data_shards: 1, | |
| parity_shards: 2, | |
| }); | |
| run_test( | |
| Some(Cipher::ChaCha20Poly1305), | |
| rs_config, | |
| false, | |
| |fs| Box::pin(async move { | |
| // Create and write a test file | |
| let test_file = SecretString::from_str("test-rs-file").unwrap(); | |
| let (fh, attr) = fs | |
| .create( | |
| ROOT_INODE, | |
| &test_file, | |
| create_attr(FileType::RegularFile), | |
| false, | |
| true, | |
| ) | |
| .await | |
| .unwrap(); | |
| let original_data = b"Hello, Reed-Solomon! This is test data for error correction."; | |
| write_all_bytes_to_fs(&fs, attr.ino, 0, original_data, fh).await.unwrap(); | |
| // Release the file handle (this should trigger RS encoding and create parity shards) | |
| fs.release(fh).await.unwrap(); | |
| // Verify the main content file exists | |
| let content_path = fs.contents_path(attr.ino); | |
| assert!(content_path.exists(), "Main content file should exist"); | |
| // Verify parity shard files were created | |
| for i in 0..2 { | |
| let parity_path = content_path.parent().unwrap().join(format!("{}.parity.{}", attr.ino, i)); | |
| assert!(parity_path.exists(), "Parity shard {} should exist", i); | |
| } | |
| // Simulate file corruption by deleting the main content file | |
| std::fs::remove_file(&content_path).unwrap(); | |
| assert!(!content_path.exists(), "Main content file should be deleted"); | |
| // Try to read the file - this should trigger reconstruction from parity shards | |
| let read_fh = fs.open(attr.ino, true, false).await.unwrap(); | |
| // Read the reconstructed content | |
| let mut buffer = vec![0u8; original_data.len()]; | |
| let bytes_read = fs.read(attr.ino, 0, &mut buffer, read_fh).await.unwrap(); | |
| assert_eq!(bytes_read, original_data.len(), "Should read all original bytes"); | |
| assert_eq!(&buffer[..bytes_read], original_data, "Reconstructed data should match original"); | |
| // Verify the main file was restored | |
| assert!(content_path.exists(), "Main content file should be restored after reconstruction"); | |
| fs.release(read_fh).await.unwrap(); | |
| }), | |
| ) | |
| .await; |
| pub fn encode(&self, data: &[u8]) -> Result<Vec<Vec<u8>>, Box<dyn Error>> { | ||
| let r = ReedSolomon::<Gf8>::new(self.data_shards, self.parity_shards)?; | ||
|
|
||
| // prefix with original length so we can trim when reconstructing | ||
| let mut payload = Vec::with_capacity(8 + data.len()); | ||
| payload.extend_from_slice(&(data.len() as u64).to_le_bytes()); | ||
| payload.extend_from_slice(data); | ||
|
|
||
| let shard_size = (payload.len() + self.data_shards - 1) / self.data_shards; | ||
| let total_shards = self.data_shards + self.parity_shards; | ||
|
|
||
| let mut shards: Vec<Vec<u8>> = vec![vec![0u8; shard_size]; total_shards]; | ||
|
|
||
| for i in 0..self.data_shards { | ||
| let start = i * shard_size; | ||
| let end = std::cmp::min(start + shard_size, payload.len()); | ||
| if start < payload.len() { | ||
| shards[i][..end - start].copy_from_slice(&payload[start..end]); | ||
| } | ||
| } | ||
|
|
||
| let mut shard_refs: Vec<&mut [u8]> = shards.iter_mut().map(|v| v.as_mut_slice()).collect(); | ||
| r.encode(&mut shard_refs)?; | ||
|
|
||
| Ok(shards) | ||
| } |
There was a problem hiding this comment.
Missing documentation: The encode method lacks documentation. It should document what it does, explain the length-prefixing behavior, describe the returned shard structure, and document potential errors.
| let shard_path = content_path.parent() | ||
| .unwrap() | ||
| .join(format!("{}.parity.{}", ctx.ino, i - rs_config.data_shards)); |
There was a problem hiding this comment.
Security issue: The parity shard path construction uses format! with user-controlled inode number without validation. While inode is typically an internal number, ensure this cannot be exploited for path traversal. Consider using a more explicit path join operation or validating the inode format.
| let shard_path = content_path.parent() | |
| .unwrap() | |
| .join(format!("{}.parity.{}", ctx.ino, i - rs_config.data_shards)); | |
| let shard_filename = format!("{}.parity.{}", ctx.ino.to_string(), (i - rs_config.data_shards).to_string()); | |
| let shard_path = content_path.parent() | |
| .unwrap() | |
| .join(shard_filename); |
| let content_path = self.contents_path(ctx.ino); | ||
| debug!("RS: Attempting to encode file for inode {} at {:?}", ctx.ino, content_path); | ||
|
|
||
| match tokio::fs::read(&content_path).await { | ||
| Ok(content) => { | ||
| debug!("RS: Read {} bytes from {:?}", content.len(), content_path); | ||
| let encoder = crate::crypto::rs::RsEncoder::new( | ||
| rs_config.data_shards, | ||
| rs_config.parity_shards, | ||
| ); | ||
|
|
||
| let shards = encoder.encode(&content) | ||
| .map_err(|e| format!("RS encoding failed for inode {}: {}", ctx.ino, e)); | ||
|
|
||
| match shards { | ||
| Ok(shards) => { | ||
| // Write parity shards (skip data shards which are the original file) | ||
| for (i, shard) in shards.iter().enumerate().skip(rs_config.data_shards) { | ||
| let shard_path = content_path.parent() | ||
| .unwrap() | ||
| .join(format!("{}.parity.{}", ctx.ino, i - rs_config.data_shards)); | ||
| if let Err(e) = tokio::fs::write(&shard_path, shard).await { | ||
| warn!("RS: Failed to write parity shard {:?}: {}", shard_path, e); | ||
| } else { | ||
| debug!("RS: Wrote parity shard {} for inode {} at {:?}", | ||
| i - rs_config.data_shards, ctx.ino, shard_path); | ||
| } | ||
| } | ||
| debug!("RS: Wrote {} parity shards for inode {}", rs_config.parity_shards, ctx.ino); | ||
| } | ||
| Err(e) => { | ||
| warn!("{}", e); | ||
| } | ||
| } | ||
| } | ||
| Err(e) => { | ||
| debug!("RS: Failed to read content file {:?}: {}", content_path, e); |
There was a problem hiding this comment.
Performance issue: The entire file content is read into memory (line 1561) for Reed-Solomon encoding on every file release. For large files, this could cause significant memory pressure and I/O overhead. Consider implementing streaming/chunked encoding or adding a file size threshold to avoid encoding very large files.
| let content_path = self.contents_path(ctx.ino); | |
| debug!("RS: Attempting to encode file for inode {} at {:?}", ctx.ino, content_path); | |
| match tokio::fs::read(&content_path).await { | |
| Ok(content) => { | |
| debug!("RS: Read {} bytes from {:?}", content.len(), content_path); | |
| let encoder = crate::crypto::rs::RsEncoder::new( | |
| rs_config.data_shards, | |
| rs_config.parity_shards, | |
| ); | |
| let shards = encoder.encode(&content) | |
| .map_err(|e| format!("RS encoding failed for inode {}: {}", ctx.ino, e)); | |
| match shards { | |
| Ok(shards) => { | |
| // Write parity shards (skip data shards which are the original file) | |
| for (i, shard) in shards.iter().enumerate().skip(rs_config.data_shards) { | |
| let shard_path = content_path.parent() | |
| .unwrap() | |
| .join(format!("{}.parity.{}", ctx.ino, i - rs_config.data_shards)); | |
| if let Err(e) = tokio::fs::write(&shard_path, shard).await { | |
| warn!("RS: Failed to write parity shard {:?}: {}", shard_path, e); | |
| } else { | |
| debug!("RS: Wrote parity shard {} for inode {} at {:?}", | |
| i - rs_config.data_shards, ctx.ino, shard_path); | |
| } | |
| } | |
| debug!("RS: Wrote {} parity shards for inode {}", rs_config.parity_shards, ctx.ino); | |
| } | |
| Err(e) => { | |
| warn!("{}", e); | |
| } | |
| } | |
| } | |
| Err(e) => { | |
| debug!("RS: Failed to read content file {:?}: {}", content_path, e); | |
| const RS_MAX_FILE_SIZE: u64 = 10 * 1024 * 1024; // 10 MB | |
| let content_path = self.contents_path(ctx.ino); | |
| debug!("RS: Attempting to encode file for inode {} at {:?}", ctx.ino, content_path); | |
| match tokio::fs::metadata(&content_path).await { | |
| Ok(metadata) => { | |
| if metadata.len() > RS_MAX_FILE_SIZE { | |
| warn!("RS: Skipping Reed-Solomon encoding for inode {} at {:?} (file size {} bytes exceeds threshold of {} bytes)", | |
| ctx.ino, content_path, metadata.len(), RS_MAX_FILE_SIZE); | |
| } else { | |
| match tokio::fs::read(&content_path).await { | |
| Ok(content) => { | |
| debug!("RS: Read {} bytes from {:?}", content.len(), content_path); | |
| let encoder = crate::crypto::rs::RsEncoder::new( | |
| rs_config.data_shards, | |
| rs_config.parity_shards, | |
| ); | |
| let shards = encoder.encode(&content) | |
| .map_err(|e| format!("RS encoding failed for inode {}: {}", ctx.ino, e)); | |
| match shards { | |
| Ok(shards) => { | |
| // Write parity shards (skip data shards which are the original file) | |
| for (i, shard) in shards.iter().enumerate().skip(rs_config.data_shards) { | |
| let shard_path = content_path.parent() | |
| .unwrap() | |
| .join(format!("{}.parity.{}", ctx.ino, i - rs_config.data_shards)); | |
| if let Err(e) = tokio::fs::write(&shard_path, shard).await { | |
| warn!("RS: Failed to write parity shard {:?}: {}", shard_path, e); | |
| } else { | |
| debug!("RS: Wrote parity shard {} for inode {} at {:?}", | |
| i - rs_config.data_shards, ctx.ino, shard_path); | |
| } | |
| } | |
| debug!("RS: Wrote {} parity shards for inode {}", rs_config.parity_shards, ctx.ino); | |
| } | |
| Err(e) => { | |
| warn!("{}", e); | |
| } | |
| } | |
| } | |
| Err(e) => { | |
| debug!("RS: Failed to read content file {:?}: {}", content_path, e); | |
| } | |
| } | |
| } | |
| } | |
| Err(e) => { | |
| debug!("RS: Failed to stat content file {:?}: {}", content_path, e); |
eeb16e5 to
2d7ffa8
Compare
2d7ffa8 to
a3d52c4
Compare
Use Reed-Solomon error correction (issue #277 )
Description
Add a minimal Reed-Solomon error-correction example algorithm to see how the data can be reconstructed. It provides a mechanism to create parity shards for files so the project can tolerate disk/device failures.
Type of change
Checklist: