[high] Content-addressed upload can permanently bind wrong bytes to a hash #107

Description

@mbertschler

Summary

The content-addressed upload path can permanently bind the wrong bytes to a content hash, after which the true bytes can never reach the destination. Separately, the indexer can mint an immutable contents row whose size_bytes disagrees with the actual content.

Where

  • sync/content_addressed.go, uploadOneObject: the drift guard stats the file before rclone reads it (stat→copy race), the post-upload check compares size only, and HasRemoteObject suppresses any future re-upload of that hash.
  • index/index.go: size/mtime are captured at walk time, but the hash is computed later from a fresh open with no re-stat (store/files.go lookupContentTx then errors forever on the honest size).

Scenario

Wrong bytes under a hash: a file is edited in place (size+mtime preserved, or changed within the stat→read window) during a content-addressed push. rclone uploads bytes that don't match the recorded hash; InsertRemoteObject lands; HasRemoteObject now suppresses re-upload forever. If a clean duplicate of the true content exists elsewhere and is later offloaded, "recovery" yields the poisoned object.

Immutable wrong size: a file is appended to between the walker's d.Info() and the worker's hashFile. The contents row binds the new digest to the old size. Because contents is immutable, every later observation of those exact bytes fails the size cross-check — aborting the whole ApplyIndexBatch repeatedly and refusing the content-addressed push forever.

Fix shape

  • In the upload path, hash the bytes as uploaded (stream-hash, or stage a copy and hash it) and refuse on mismatch with the indexed hash — never record a remote_objects row for unverified bytes. (This also dovetails with the scan-back fingerprint work.)
  • In the indexer, Stat the open handle after hashing and build the contents row from that size/mtime, so hash and metadata describe the same inode state. Optionally isolate a single poisoned entry instead of failing the whole batch.

Adversarial audit of offload-v1 (auditor D F4, auditor B MEDIUM-6).

Metadata

Labels

bug: Something isn't working
data-loss: Could cause silent data loss
security: Security / data-integrity finding
