feat: add checksum verification library (SHA-256, SHA-1, MD5)#324
feat: add checksum verification library (SHA-256, SHA-1, MD5)#324mvanhorn wants to merge 4 commits intoSurgeDM:mainfrom
Conversation
Add VerifyChecksum() to compute file hashes and compare against expected values, and ParseDigestHeader() to extract checksums from HTTP Digest response headers (RFC 3230). Supports hex and base64 encoded hashes. This is the core library for download integrity verification. Wiring into the download lifecycle and CLI flags will follow in a separate PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Fixed the hex fallback - it now validates hash length before accepting, preventing wrong-length hashes from silently passing through. Also added support for unpadded base64 (RawStdEncoding/RawURLEncoding) since some servers omit padding in Digest headers. Pushed in 0ec5370. |
Remove dead code in ParseDigestHeader (hex fallback already handled by the earlier hex check). Strengthen base64 test assertion with exact expected hash. Add MD5 and SHA-1 happy-path tests with algorithm normalization verification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Addressed the greptile findings in 62ac7bd - removed the redundant hex fallback in ParseDigestHeader, strengthened the base64 test assertion with the exact expected hash, and added MD5/SHA-1 happy-path tests. The unpadded base64 and algorithm normalization findings were already handled in the existing implementation. |
| func ParseDigestHeader(header string) (algorithm string, hexHash string) { | ||
| parts := strings.SplitN(header, "=", 2) | ||
| if len(parts) != 2 { | ||
| return "", "" | ||
| } |
There was a problem hiding this comment.
Silent failure on multi-value RFC 3230 Digest headers
SplitN(header, "=", 2) puts everything after the first = into parts[1]. When a server sends a standard RFC 3230 comma-separated header like sha-256=HASH, md5=HASH2, parts[1] becomes "HASH, md5=HASH2". The comma makes hex and base64 decoding both fail, so the function returns ("", "") — silently skipping checksum verification for a fully valid header. The PR description explicitly claims RFC 3230 support, but this case isn't handled.
// Split on comma first to handle multi-algorithm RFC 3230 headers,
// then pick the first entry that matches a supported algorithm.
func ParseDigestHeader(header string) (algorithm string, hexHash string) {
for _, entry := range strings.Split(header, ",") {
entry = strings.TrimSpace(entry)
if algo, h := parseSingleDigestEntry(entry); algo != "" {
return algo, h
}
}
return "", ""
}Alternatively, at minimum the docstring should document that callers must pass a single-algorithm value so the limitation is explicit.
Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/processing/checksum.go
Line: 71-75
Comment:
**Silent failure on multi-value RFC 3230 Digest headers**
`SplitN(header, "=", 2)` puts everything after the first `=` into `parts[1]`. When a server sends a standard RFC 3230 comma-separated header like `sha-256=HASH, md5=HASH2`, `parts[1]` becomes `"HASH, md5=HASH2"`. The comma makes hex and base64 decoding both fail, so the function returns `("", "")` — silently skipping checksum verification for a fully valid header. The PR description explicitly claims RFC 3230 support, but this case isn't handled.
```go
// Split on comma first to handle multi-algorithm RFC 3230 headers,
// then pick the first entry that matches a supported algorithm.
func ParseDigestHeader(header string) (algorithm string, hexHash string) {
for _, entry := range strings.Split(header, ",") {
entry = strings.TrimSpace(entry)
if algo, h := parseSingleDigestEntry(entry); algo != "" {
return algo, h
}
}
return "", ""
}
```
Alternatively, at minimum the docstring should document that callers must pass a single-algorithm value so the limitation is explicit.
How can I resolve this? If you propose a fix, please make it concise.
Summary
Add a checksum verification library that computes file hashes and compares them against expected values. Also parses HTTP Digest response headers (RFC 3230) for server-provided checksums.
Why this matters
aria2 has
--checksum=sha-256=HASH. wget verifies checksums. When downloading Linux ISOs or software releases, users expect integrity verification. Surge currently downloads files with no integrity check.Changes
VerifyChecksum(filepath, algorithm, expected)supporting MD5, SHA-1, SHA-256ParseDigestHeader(header)to extract checksums from RFC 3230 HTTP Digest headers (base64 and hex)internal/processing/checksum.go--checksumflag will follow in a separate PR.Testing
This contribution was developed with AI assistance (Codex + Claude Code).
Greptile Summary
This PR adds a new
internal/processing/checksum.golibrary providingVerifyChecksum(file hash computation + comparison for MD5, SHA-1, SHA-256) andParseDigestHeader(RFC 3230 HTTP Digest header parsing with hex and all four base64 encodings). Most issues from the previous review round have been addressed, butParseDigestHeadersilently returns("", "")when called with a standard RFC 3230 comma-separated multi-value header (e.g.sha-256=HASH, md5=HASH2), which would cause the caller to skip checksum verification for a fully valid server response.Confidence Score: 4/5
Safe to merge after fixing the multi-value RFC 3230 header parsing; the remaining finding is a P2 test coverage gap.
One P1 logic bug remains:
ParseDigestHeadersilently returns("", "")for comma-separated RFC 3230 Digest headers, which the PR description explicitly claims to support. All other previously flagged issues have been resolved. The P2 finding (missing tests for unpadded/URL-safe base64 and SHA-1 base64 inParseDigestHeader) is not blocking but should be addressed before the wiring PR lands.internal/processing/checksum.go — the
ParseDigestHeaderfunction needs to handle comma-separated multi-value headers.Important Files Changed
ParseDigestHeadersilently returns("", "")for valid RFC 3230 multi-value headers (e.g.sha-256=HASH, md5=HASH).ParseDigestHeader.Sequence Diagram
sequenceDiagram participant Caller participant ParseDigestHeader participant VerifyChecksum participant fs as File System Caller->>ParseDigestHeader: header (e.g. sha-256=base64hash) ParseDigestHeader->>ParseDigestHeader: SplitN on = (n=2) ParseDigestHeader->>ParseDigestHeader: normalize algo (sha-256→sha256) ParseDigestHeader->>ParseDigestHeader: try hex fast-path (length check) ParseDigestHeader->>ParseDigestHeader: try base64 variants (Std/URL/Raw×2) ParseDigestHeader-->>Caller: (algorithm, hexHash) Caller->>VerifyChecksum: (filePath, algorithm, hexHash) VerifyChecksum->>fs: os.Open(filePath) fs-->>VerifyChecksum: file handle VerifyChecksum->>VerifyChecksum: io.Copy → hash.Sum VerifyChecksum->>VerifyChecksum: hex.EncodeToString(actual) VerifyChecksum->>VerifyChecksum: actual == expected? VerifyChecksum-->>Caller: ChecksumResult{Match, Algorithm, Actual, Expected}Prompt To Fix All With AI
Reviews (4): Last reviewed commit: "fix: remove redundant hex fallback and s..." | Re-trigger Greptile