
Performance improvement: avoid unnecessary hashing by pre-indexing files using filename + size #27

@mervenator

Description

The current implementation hashes every orphan and candidate file via `digest, err := getDigest(path)`. This becomes very expensive when processing large archives with many large files. However, many datasets contain files that are already uniquely identifiable by filename + size alone.

Example: DSC_1023.JPG (5.2 MB). In most real-world archives, this name/size combination uniquely identifies the file.
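
For context, the kind of helper the fast path avoids calling looks roughly like the sketch below. This is a hedged illustration, not the repository's actual getDigest: the SHA-256 choice and the exact signature are assumptions; the point is that every call has to stream the entire file from disk.

    import (
        "crypto/sha256"
        "encoding/hex"
        "io"
        "os"
    )

    // getDigest streams a file through SHA-256 so memory use stays constant,
    // but it still reads every byte, which is what dominates scan time.
    func getDigest(path string) (string, error) {
        f, err := os.Open(path)
        if err != nil {
            return "", err
        }
        defer f.Close()

        h := sha256.New()
        if _, err := io.Copy(h, f); err != nil {
            return "", err
        }
        return hex.EncodeToString(h.Sum(nil)), nil
    }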

Proposed Optimization

Add a fast pre-filter index, keyed on filename + size, before hashing (see the snippets below). When a key resolves to exactly one candidate, the match is accepted without hashing; if zero or multiple matches exist, fall back to the current digest-based approach.

Benefits

Typical archive restructuring scenario:

  • 100k files
  • 90% unique by filename + size

Result:

  • roughly 90% fewer digest computations (about 90k of the 100k files skip hashing entirely)
  • faster scanning
  • improved scalability for large media archives

Safety

The optimization only skips hashing when exactly one filename + size candidate exists. If the match is ambiguous (zero or multiple candidates), hashing still runs as before, so the results are identical to the current behavior.
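
To make the ambiguous case concrete, here is a hypothetical collision that forces the fallback; the paths and sizes are invented for illustration:

    // Two different photos can share both name and size; only their
    // content digests can tell them apart.
    //
    //   dest/2019/IMG_0001.JPG  (4,194,304 bytes)
    //   dest/2020/IMG_0001.JPG  (4,194,304 bytes)
    //
    // fastIndex[FastKey{Name: "IMG_0001.JPG", Size: 4194304}] holds both
    // paths, so len(candidates) == 2 and the digest comparison runs as before.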

Solution Code Snippet

Fast index structure:

    // FastKey identifies a file by base name and size; both come from
    // directory metadata, so building a key never touches file contents.
    type FastKey struct {
        Name string
        Size int64
    }

    // Build the pre-filter index over all destination files in a single pass.
    fastIndex := map[FastKey][]string{}
    for path, meta := range destinationFiles {
        key := FastKey{
            Name: filepath.Base(path),
            Size: meta.Size,
        }
        fastIndex[key] = append(fastIndex[key], path)
    }
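
Building the index is one O(n) pass over destinationFiles and each lookup is O(1) on average, so the pre-filter adds negligible overhead even in the worst case where every key is ambiguous and all files end up hashed anyway.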

Fast lookup before hashing:

    key := FastKey{
        Name: filepath.Base(orphanAtSource),
        Size: sourceFiles[orphanAtSource].Size,
    }
    candidates := fastIndex[key]
    if len(candidates) == 1 {
        // Unambiguous match: emit the move action without hashing either file.
        candidateAtDestination := candidates[0]
        actions = append(actions, action.MoveFileAction{
            BasePath:         destinationDirPath,
            RelativeFromPath: candidateAtDestination,
            RelativeToPath:   orphanAtSource,
        })
        continue
    }
    // Zero or multiple candidates: fall through to the existing digest logic.
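
For completeness, a hedged sketch of the fall-through path. It assumes the enclosing loop from the current code; findByDigest is a hypothetical stand-in for whatever digest-based candidate search the implementation already performs:

    // Fallback: the fast index was ambiguous or empty, so hash as before.
    digest, err := getDigest(orphanAtSource)
    if err != nil {
        return nil, err
    }
    // findByDigest is a placeholder for the existing digest-based lookup.
    if candidateAtDestination, ok := findByDigest(digest, destinationFiles); ok {
        actions = append(actions, action.MoveFileAction{
            BasePath:         destinationDirPath,
            RelativeFromPath: candidateAtDestination,
            RelativeToPath:   orphanAtSource,
        })
    }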
