
[medium] Confirm/repair S3 multipart ETag capture for scan-back fingerprints #118

Description

@mbertschler

Summary

squirrel's scan-back fingerprint reads the ciphertext checksum from the underlying remote with rclone lsjson --hash. For the s3 backend it treats the value in the md5 hash slot as the object's ETag. Per rclone's S3 documentation, that equivalence holds only for single-part uploads, plus multipart objects that carry an MD5 in their metadata. A plain multipart object's ETag has the form <hex>-<parts>, and rclone does not surface it in the md5 hash slot; it returns the metadata MD5 if one is present, otherwise an empty hash. Large objects are exactly the ones uploaded in multiple parts (anything above the multipart upload cutoff, which covers the multi-GB media case), so their fingerprints may stay pending instead of capturing the ETag.
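The <hex>-<parts> shape follows from how S3 builds multipart ETags. Below is a minimal Go sketch (Go is used only for illustration) assuming the convention commonly observed on AWS S3: the MD5 of the concatenated binary per-part MD5 digests, suffixed with the part count. That convention is not a documented guarantee, other S3-compatible backends may differ, and the helper names are hypothetical rather than squirrel code.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// compositeETag reproduces the multipart ETag convention commonly observed on
// AWS S3 (an assumption, not a documented guarantee): MD5 over the
// concatenated 16-byte MD5 digests of each part, then "-<number of parts>".
func compositeETag(parts [][]byte) string {
	concat := make([]byte, 0, md5.Size*len(parts))
	for _, p := range parts {
		sum := md5.Sum(p) // per-part digest, 16 bytes
		concat = append(concat, sum[:]...)
	}
	outer := md5.Sum(concat)
	return fmt.Sprintf("%s-%d", hex.EncodeToString(outer[:]), len(parts))
}

// singlePartETag is what a single-part PUT reports: the plain hex MD5 of the body.
func singlePartETag(body []byte) string {
	sum := md5.Sum(body)
	return hex.EncodeToString(sum[:])
}

func main() {
	partA := []byte("first part of a large object")
	partB := []byte("second part of a large object")
	whole := append(append([]byte{}, partA...), partB...)

	fmt.Println("single-part ETag:", singlePartETag(whole))
	fmt.Println("multipart ETag:  ", compositeETag([][]byte{partA, partB}))
}
```

Even for identical bytes the two values differ, and the composite one also depends on the part size chosen at upload time, so it cannot be checked against a content MD5; at best it is an opaque, upload-specific identifier.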

The current code handles this safely: an empty hash leaves the remote_objects pair NULL and logs a "fingerprint stays pending" warning; it never records a fake value. What remains unconfirmed is how well ETag capture covers multipart objects on a live S3-compatible backend.
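The safe capture path can be sketched in Go as follows. This is hypothetical (captureFingerprints is not squirrel's function): the struct decodes only the Path, IsDir and Hashes fields of rclone lsjson output, matches the md5 key case-insensitively rather than assuming its exact spelling, and treats a missing key and an empty string the same way. The sample JSON in main is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lsjsonEntry holds the subset of `rclone lsjson --hash` output used here;
// all other fields in the JSON are ignored by encoding/json.
type lsjsonEntry struct {
	Path   string            `json:"Path"`
	IsDir  bool              `json:"IsDir"`
	Hashes map[string]string `json:"Hashes"`
}

// captureFingerprints (hypothetical) maps each object path to its md5-slot
// value, or to nil when no hash came back, so the caller stores NULL
// (fingerprint pending) instead of inventing a value.
func captureFingerprints(raw []byte) (map[string]*string, []string, error) {
	var entries []lsjsonEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, nil, fmt.Errorf("parse lsjson output: %w", err)
	}
	out := make(map[string]*string, len(entries))
	var warnings []string
	for _, e := range entries {
		if e.IsDir {
			continue // directories have no object fingerprint
		}
		var md5 string // fresh variable per iteration
		// Match the key case-insensitively rather than assume its spelling.
		for k, v := range e.Hashes {
			if strings.EqualFold(k, "md5") {
				md5 = v
			}
		}
		if md5 == "" {
			out[e.Path] = nil // stays NULL: fingerprint pending
			warnings = append(warnings, fmt.Sprintf("%s: no md5 hash; fingerprint stays pending", e.Path))
			continue
		}
		out[e.Path] = &md5
	}
	return out, warnings, nil
}

func main() {
	sample := []byte(`[
  {"Path":"small.bin","IsDir":false,"Hashes":{"md5":"900150983cd24fb0d6963f7d28e17f72"}},
  {"Path":"large.bin","IsDir":false,"Hashes":{"md5":""}}
]`)
	fps, warnings, err := captureFingerprints(sample)
	if err != nil {
		panic(err)
	}
	for path, fp := range fps {
		if fp == nil {
			fmt.Println(path, "-> NULL (pending)")
		} else {
			fmt.Println(path, "->", *fp)
		}
	}
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```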

What to confirm / decide

  • Against a real S3-compatible backend with multipart uploads: what does rclone lsjson --hash actually return for a multipart object's md5 hash? Empty, the metadata MD5, or the composite ETag?
  • If the composite ETag is needed as the fingerprint, the right rclone surface is likely lsf --format / --metadata (the ETag is metadata, not a hash), not lsjson --hash. Decide whether to switch the s3 capture path to read the ETag via metadata, or to set an MD5 in object metadata at upload time so single-hash capture works.
  • The design doc deferred S3 additional checksums (rclone can't set them); this issue is the read-side counterpart.
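For the live-backend probe, each md5-slot value falls into one of a few recognizable shapes. A hypothetical Go classifier (names are illustrative; hex is matched case-insensitively as a precaution, since backends other than AWS may vary):

```go
package main

import (
	"fmt"
	"regexp"
)

// hashKind classifies what came back in the md5 slot for one object.
type hashKind int

const (
	kindEmpty     hashKind = iota // no hash: fingerprint stays pending
	kindPlainMD5                  // 32 hex chars: single-part ETag OR metadata MD5 (indistinguishable here)
	kindComposite                 // "<hex>-<parts>": composite multipart ETag
	kindOther                     // anything else: unexpected, record for investigation
)

func (k hashKind) String() string {
	return [...]string{"empty", "plain-md5", "composite-etag", "other"}[k]
}

var (
	plainMD5Re  = regexp.MustCompile(`(?i)^[0-9a-f]{32}$`)
	compositeRe = regexp.MustCompile(`(?i)^[0-9a-f]{32}-[1-9][0-9]*$`)
)

// classify sorts one md5-slot value into the probe categories above.
func classify(md5Slot string) hashKind {
	switch {
	case md5Slot == "":
		return kindEmpty
	case plainMD5Re.MatchString(md5Slot):
		return kindPlainMD5
	case compositeRe.MatchString(md5Slot):
		return kindComposite
	default:
		return kindOther
	}
}

func main() {
	samples := []string{
		"",
		"900150983cd24fb0d6963f7d28e17f72",
		"0cc175b9c0f1b6a831c399e269772661-3",
	}
	for _, s := range samples {
		fmt.Printf("%q -> %s\n", s, classify(s))
	}
}
```

A plain-md5 result on a multi-GB object is ambiguous: it is more likely the metadata MD5 than an ETag, so the probe should also record object size to tell the cases apart. For the upload-time option, note that rclone's S3 backend normally computes the MD5 before uploading and stores it in object metadata unless --s3-disable-checksum is set, so it is worth checking first whether squirrel's uploads disable that.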

Behavior is correct today (pending + warning); this tracks closing the multipart ETag coverage gap so content-addressed offsite objects on s3 can actually gate offload via a recorded fingerprint.

Raised during the standards review of the scan-back fingerprint PR (#116).

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request), security (Security / data-integrity finding)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
