Skip to content

feat: add checksum verification library (SHA-256, SHA-1, MD5)#324

Open
mvanhorn wants to merge 4 commits intoSurgeDM:mainfrom
mvanhorn:feat/checksum-verification
Open

feat: add checksum verification library (SHA-256, SHA-1, MD5)#324
mvanhorn wants to merge 4 commits intoSurgeDM:mainfrom
mvanhorn:feat/checksum-verification

Conversation

@mvanhorn
Copy link
Copy Markdown
Contributor

@mvanhorn mvanhorn commented Apr 5, 2026

Summary

Add a checksum verification library that computes file hashes and compares them against expected values. Also parses HTTP Digest response headers (RFC 3230) for server-provided checksums.

Why this matters

aria2 has --checksum=sha-256=HASH. wget verifies checksums. When downloading Linux ISOs or software releases, users expect integrity verification. Surge currently downloads files with no integrity check.

Changes

  • Add VerifyChecksum(filepath, algorithm, expected) supporting MD5, SHA-1, SHA-256
  • Add ParseDigestHeader(header) to extract checksums from RFC 3230 HTTP Digest headers (base64 and hex)
  • Both functions are in internal/processing/checksum.go
  • This is the core library. Wiring into the download lifecycle and CLI --checksum flag will follow in a separate PR.

Testing

Tests

go test ./... -count=1   # All 18 packages pass

This contribution was developed with AI assistance (Codex + Claude Code).

Greptile Summary

This PR adds a new internal/processing/checksum.go library providing VerifyChecksum (file hash computation + comparison for MD5, SHA-1, SHA-256) and ParseDigestHeader (RFC 3230 HTTP Digest header parsing with hex and all four base64 encodings). Most issues from the previous review round have been addressed, but ParseDigestHeader silently returns ("", "") when called with a standard RFC 3230 comma-separated multi-value header (e.g. sha-256=HASH, md5=HASH2), which would cause the caller to skip checksum verification for a fully valid server response.

Confidence Score: 4/5

Safe to merge after fixing the multi-value RFC 3230 header parsing; the remaining finding is a P2 test coverage gap.

One P1 logic bug remains: ParseDigestHeader silently returns ("", "") for comma-separated RFC 3230 Digest headers, which the PR description explicitly claims to support. All other previously flagged issues have been resolved. The P2 finding (missing tests for unpadded/URL-safe base64 and SHA-1 base64 in ParseDigestHeader) is not blocking but should be addressed before the wiring PR lands.

internal/processing/checksum.go — the ParseDigestHeader function needs to handle comma-separated multi-value headers.

Important Files Changed

Filename Overview
internal/processing/checksum.go New checksum library; most prior review issues resolved, but ParseDigestHeader silently returns ("", "") for valid RFC 3230 multi-value headers (e.g. sha-256=HASH, md5=HASH).
internal/processing/checksum_test.go Good coverage added for MD5, SHA-1 (with alias), and exact hash assertions; missing tests for unpadded/URL-safe base64 and SHA-1 base64 paths in ParseDigestHeader.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant ParseDigestHeader
    participant VerifyChecksum
    participant fs as File System

    Caller->>ParseDigestHeader: header (e.g. sha-256=base64hash)
    ParseDigestHeader->>ParseDigestHeader: SplitN on = (n=2)
    ParseDigestHeader->>ParseDigestHeader: normalize algo (sha-256→sha256)
    ParseDigestHeader->>ParseDigestHeader: try hex fast-path (length check)
    ParseDigestHeader->>ParseDigestHeader: try base64 variants (Std/URL/Raw×2)
    ParseDigestHeader-->>Caller: (algorithm, hexHash)

    Caller->>VerifyChecksum: (filePath, algorithm, hexHash)
    VerifyChecksum->>fs: os.Open(filePath)
    fs-->>VerifyChecksum: file handle
    VerifyChecksum->>VerifyChecksum: io.Copy → hash.Sum
    VerifyChecksum->>VerifyChecksum: hex.EncodeToString(actual)
    VerifyChecksum->>VerifyChecksum: actual == expected?
    VerifyChecksum-->>Caller: ChecksumResult{Match, Algorithm, Actual, Expected}
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: internal/processing/checksum.go
Line: 71-75

Comment:
**Silent failure on multi-value RFC 3230 Digest headers**

`SplitN(header, "=", 2)` puts everything after the first `=` into `parts[1]`. When a server sends a standard RFC 3230 comma-separated header like `sha-256=HASH, md5=HASH2`, `parts[1]` becomes `"HASH, md5=HASH2"`. The comma makes hex and base64 decoding both fail, so the function returns `("", "")` — silently skipping checksum verification for a fully valid header. The PR description explicitly claims RFC 3230 support, but this case isn't handled.

```go
// Split on comma first to handle multi-algorithm RFC 3230 headers,
// then pick the first entry that matches a supported algorithm.
func ParseDigestHeader(header string) (algorithm string, hexHash string) {
    for _, entry := range strings.Split(header, ",") {
        entry = strings.TrimSpace(entry)
        if algo, h := parseSingleDigestEntry(entry); algo != "" {
            return algo, h
        }
    }
    return "", ""
}
```

Alternatively, at minimum the docstring should document that callers must pass a single-algorithm value so the limitation is explicit.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: internal/processing/checksum_test.go
Line: 86-109

Comment:
**Untested base64 variants and algorithm paths in `ParseDigestHeader`**

The implementation was updated to try all four base64 encodings (`StdEncoding`, `URLEncoding`, `RawStdEncoding`, `RawURLEncoding`) to handle real-world servers, but only `StdEncoding` with padded SHA-256 is exercised. The `RawStdEncoding`/`RawURLEncoding` (unpadded) and `URLEncoding` (URL-safe) code paths — and SHA-1 base64 entirely — are never tested. Per the edge-case testing requirement, each new branch should have at least one test.

Suggested additions:

```go
func TestParseDigestHeader_SHA256UnpaddedBase64(t *testing.T) {
    // RawStdEncoding — no trailing '='
    algo, hash := ParseDigestHeader("sha-256=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU")
    assert.Equal(t, "sha256", algo)
    assert.Equal(t, "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", hash)
}

func TestParseDigestHeader_SHA1Base64(t *testing.T) {
    // sha1("") in standard padded base64
    algo, hash := ParseDigestHeader("sha-1=2jmj7l5rSw0yVb/vlWAYkK/YBwk=")
    assert.Equal(t, "sha1", algo)
    assert.Equal(t, "da39a3ee5e6b4b0d3255bfef95601890afd80709", hash)
}
```

**Rule Used:** What: All code changes must include tests for edge... ([source](https://app.greptile.com/review/custom-context?memory=2b22782d-3452-4d55-b059-e631b2540ce8))

How can I resolve this? If you propose a fix, please make it concise.

Reviews (4): Last reviewed commit: "fix: remove redundant hex fallback and s..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

Add VerifyChecksum() to compute file hashes and compare against expected
values, and ParseDigestHeader() to extract checksums from HTTP Digest
response headers (RFC 3230). Supports hex and base64 encoded hashes.

This is the core library for download integrity verification. Wiring
into the download lifecycle and CLI flags will follow in a separate PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mvanhorn
Copy link
Copy Markdown
Contributor Author

mvanhorn commented Apr 6, 2026

Fixed the hex fallback - it now validates hash length before accepting, preventing wrong-length hashes from silently passing through. Also added support for unpadded base64 (RawStdEncoding/RawURLEncoding) since some servers omit padding in Digest headers. Pushed in 0ec5370.

mvanhorn and others added 2 commits April 9, 2026 19:06
Remove dead code in ParseDigestHeader (hex fallback already handled
by the earlier hex check). Strengthen base64 test assertion with exact
expected hash. Add MD5 and SHA-1 happy-path tests with algorithm
normalization verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mvanhorn
Copy link
Copy Markdown
Contributor Author

Addressed the greptile findings in 62ac7bd - removed the redundant hex fallback in ParseDigestHeader, strengthened the base64 test assertion with the exact expected hash, and added MD5/SHA-1 happy-path tests. The unpadded base64 and algorithm normalization findings were already handled in the existing implementation.

Comment on lines +71 to +75
func ParseDigestHeader(header string) (algorithm string, hexHash string) {
parts := strings.SplitN(header, "=", 2)
if len(parts) != 2 {
return "", ""
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Silent failure on multi-value RFC 3230 Digest headers

SplitN(header, "=", 2) puts everything after the first = into parts[1]. When a server sends a standard RFC 3230 comma-separated header like sha-256=HASH, md5=HASH2, parts[1] becomes "HASH, md5=HASH2". The comma makes hex and base64 decoding both fail, so the function returns ("", "") — silently skipping checksum verification for a fully valid header. The PR description explicitly claims RFC 3230 support, but this case isn't handled.

// Split on comma first to handle multi-algorithm RFC 3230 headers,
// then pick the first entry that matches a supported algorithm.
func ParseDigestHeader(header string) (algorithm string, hexHash string) {
    for _, entry := range strings.Split(header, ",") {
        entry = strings.TrimSpace(entry)
        if algo, h := parseSingleDigestEntry(entry); algo != "" {
            return algo, h
        }
    }
    return "", ""
}

Alternatively, at minimum the docstring should document that callers must pass a single-algorithm value so the limitation is explicit.

Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/processing/checksum.go
Line: 71-75

Comment:
**Silent failure on multi-value RFC 3230 Digest headers**

`SplitN(header, "=", 2)` puts everything after the first `=` into `parts[1]`. When a server sends a standard RFC 3230 comma-separated header like `sha-256=HASH, md5=HASH2`, `parts[1]` becomes `"HASH, md5=HASH2"`. The comma makes hex and base64 decoding both fail, so the function returns `("", "")` — silently skipping checksum verification for a fully valid header. The PR description explicitly claims RFC 3230 support, but this case isn't handled.

```go
// Split on comma first to handle multi-algorithm RFC 3230 headers,
// then pick the first entry that matches a supported algorithm.
func ParseDigestHeader(header string) (algorithm string, hexHash string) {
    for _, entry := range strings.Split(header, ",") {
        entry = strings.TrimSpace(entry)
        if algo, h := parseSingleDigestEntry(entry); algo != "" {
            return algo, h
        }
    }
    return "", ""
}
```

Alternatively, at minimum the docstring should document that callers must pass a single-algorithm value so the limitation is explicit.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant