diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
index 83bad4c..3f11806 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -6,6 +6,9 @@ on:
 
 permissions:
   contents: write
+  packages: none
+  issues: none
+  pull-requests: none
 
 jobs:
   build-and-release:
@@ -28,8 +31,12 @@ jobs:
       - name: Create artifact bundle
         run: ./Scripts/release-artifactbundle.sh "${{ env.TAG_NAME }}"
 
+      - name: Generate checksums
+        run: shasum -a 256 xpbc-macos.artifactbundle.zip > checksums.txt
+
       - name: Upload release assets
         uses: softprops/action-gh-release@v2
         with:
           files: |
             xpbc-macos.artifactbundle.zip
+            checksums.txt
diff --git a/README.md b/README.md
index 7f84cd9..d34fdc8 100644
--- a/README.md
+++ b/README.md
@@ -47,7 +47,7 @@ make install PREFIX=~/.local
 ## Usage
 
 ```
-xpbc [-pboard {general|ruler|find|font}] [--help] [--version]
+xpbc [-pboard {general|ruler|find|font}] [--no-validate] [--help] [--version]
 ```
 
 Pipe any data into `xpbc` via stdin. It automatically detects the format and copies accordingly.
@@ -90,7 +90,7 @@ Anything that doesn't match a known image signature is copied as text.
 | Code | Meaning |
 |------|---------|
 | 0 | Success |
-| 1 | Known error (empty input, input too large, invalid argument, pasteboard write failure) |
+| 1 | Known error (empty input, input too large, invalid argument, validation failure, pasteboard write failure) |
 | 2 | Unexpected error |
 
 ### Options
@@ -98,6 +98,7 @@ Anything that doesn't match a known image signature is copied as text.
 | Flag | Description |
 |------|-------------|
 | `-pboard NAME` | Target pasteboard: `general` (default), `ruler`, `find`, or `font` |
+| `--no-validate` | Skip structural validation of image headers |
 | `-h`, `--help` | Print usage |
 | `-v`, `--version` | Print version |
 
@@ -118,6 +119,17 @@ make clean # Clean build artifacts
 - Input size is capped at 100 MB (read in 64 KB chunks to prevent OOM)
 - stdin-only input (no file path arguments, no path traversal risk)
 - Written in memory-safe Swift with no `Unsafe` pointer usage
+- Structural validation of image headers is enabled by default (use `--no-validate` to skip)
+
+### Important limitations
+
+**xpbc does not guarantee the safety of clipboard contents.** While structural validation checks that image headers are well-formed, it cannot detect:
+
+- **Crafted exploit payloads** — A structurally valid image (valid headers, correct dimensions) can still contain malicious data that exploits vulnerabilities in the application where you paste it (e.g., heap overflows in image decoders such as libwebp or ImageIO).
+- **Decompression bombs** — An image with valid headers but compressed data that expands to an extreme size, causing the paste target to run out of memory and crash.
+- **PDF active content** — While xpbc blocks PDFs containing known dangerous keywords (`/JS`, `/JavaScript`, `/OpenAction`, `/AA`, `/Launch`), obfuscated or novel techniques may bypass this check.
+
+**Do not pipe untrusted data** (e.g., `curl | xpbc`) **without understanding the risk.** The clipboard contents will be processed by whatever application you paste into, and xpbc cannot protect against vulnerabilities in those applications.
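To make the "crafted exploit payloads" limitation concrete, here is a minimal Swift sketch of a header-only check. `looksLikePNGHeader` is a hypothetical, simplified stand-in that mirrors the PNG rules listed for `PNGValidator` (signature, `IHDR` chunk type at offset 12, nonzero width and height); it is not the project's code. It accepts any buffer whose first 29 bytes look right, no matter what follows them.

```swift
import Foundation

// Hypothetical, simplified header-only check modeled on the PNG rules described
// for PNGValidator: 8-byte signature, "IHDR" at offset 12, width/height > 0.
// It never inspects anything past byte 29.
func looksLikePNGHeader(_ bytes: [UInt8]) -> Bool {
    let signature: [UInt8] = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]
    guard bytes.count >= 29, Array(bytes[0..<8]) == signature else { return false }
    guard Array(bytes[12..<16]) == [0x49, 0x48, 0x44, 0x52] else { return false } // "IHDR"

    // Read a big-endian UInt32 starting at index i.
    func be32(_ i: Int) -> UInt32 {
        bytes[i..<(i + 4)].reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
    }
    return be32(16) > 0 && be32(20) > 0  // width, height
}

let header: [UInt8] = [
    0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,  // PNG signature
    0x00, 0x00, 0x00, 0x0D,                          // IHDR data length (13)
    0x49, 0x48, 0x44, 0x52,                          // "IHDR"
    0x00, 0x00, 0x00, 0x01,                          // width = 1
    0x00, 0x00, 0x00, 0x01,                          // height = 1
    0x08, 0x06, 0x00, 0x00, 0x00,                    // depth, color type, compression, filter, interlace
]

// Arbitrary bytes after the header are never looked at.
let payload = [UInt8](repeating: 0xFF, count: 1024)
print(looksLikePNGHeader(header + payload))  // prints "true"
```

Each image check described in this PR works the same way: it reads a handful of fixed header offsets, so the bytes an image decoder actually parses are never examined.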
 
 ## Architecture
 
diff --git a/Scripts/install.sh b/Scripts/install.sh
index 5bb38ea..979b604 100755
--- a/Scripts/install.sh
+++ b/Scripts/install.sh
@@ -4,25 +4,52 @@ set -euo pipefail
 REPO="chigichan24/xpbc"
 ASSET_NAME="xpbc-macos.artifactbundle.zip"
 ASSET_URL="https://github.com/$REPO/releases/latest/download/$ASSET_NAME"
+CHECKSUM_URL="https://github.com/$REPO/releases/latest/download/checksums.txt"
 INSTALL_DIR="${1:-$HOME/.local/bin}"
 
+# Clean up intermediate files on any exit
+cleanup() {
+  rm -f "$ASSET_NAME" checksums.txt
+  rm -rf extracted_files
+}
+trap cleanup EXIT
+
+# Validate install directory
+case "$INSTALL_DIR" in
+  /*) ;;
+  *) echo "Error: install directory must be an absolute path: $INSTALL_DIR" >&2; exit 1 ;;
+esac
+case "$INSTALL_DIR" in
+  *..*) echo "Error: install directory must not contain '..': $INSTALL_DIR" >&2; exit 1 ;;
+esac
+
 # Download zip file
 echo "Downloading latest xpbc..."
-curl -sL -o "$ASSET_NAME" "$ASSET_URL"
+curl -fsSL -o "$ASSET_NAME" "$ASSET_URL"
+curl -fsSL -o checksums.txt "$CHECKSUM_URL"
+
+# Verify checksum
+echo "Verifying checksum..."
+shasum -a 256 -c checksums.txt --ignore-missing || {
+  echo "Error: checksum verification failed!" >&2
+  exit 1
+}
+
 unzip -qo "$ASSET_NAME" -d extracted_files
-rm "$ASSET_NAME"
 
 VERSION=$(ls ./extracted_files/xpbc.artifactbundle | sed -n 's/^xpbc-\([^-]*\)-macos$/\1/p' | head -n 1)
 if [ -z "$VERSION" ]; then
-  echo "Error: version not found in the artifact bundle."
-  rm -rf extracted_files
+  echo "Error: version not found in the artifact bundle." >&2
+  exit 1
+fi
+if ! echo "$VERSION" | grep -qE '^v?[0-9]+\.[0-9]+\.[0-9]+$'; then
+  echo "Error: unexpected version format: $VERSION" >&2
   exit 1
 fi
 
 mkdir -p "$INSTALL_DIR"
 cp -f "./extracted_files/xpbc.artifactbundle/xpbc-$VERSION-macos/bin/xpbc" "$INSTALL_DIR/xpbc"
 chmod +x "$INSTALL_DIR/xpbc"
-rm -rf extracted_files
 
 echo "Installed xpbc $VERSION to $INSTALL_DIR/xpbc"
 echo "Please make sure $INSTALL_DIR is in your \$PATH"
diff --git a/Scripts/release-artifactbundle.sh b/Scripts/release-artifactbundle.sh
index b48f33c..90fd37e 100755
--- a/Scripts/release-artifactbundle.sh
+++ b/Scripts/release-artifactbundle.sh
@@ -3,9 +3,15 @@ set -euo pipefail
 
 VERSION_STRING="$1"
 
+# Validate version format
+if ! echo "$VERSION_STRING" | grep -qE '^v[0-9]+\.[0-9]+\.[0-9]+$'; then
+  echo "Error: invalid version format '$VERSION_STRING' (expected vX.Y.Z)" >&2
+  exit 1
+fi
+
 mkdir -p "xpbc.artifactbundle/xpbc-$VERSION_STRING-macos/bin"
 
-sed "s/__VERSION__/$VERSION_STRING/g" ./Scripts/info.json > "xpbc.artifactbundle/info.json"
+sed "s|__VERSION__|$VERSION_STRING|g" ./Scripts/info.json > "xpbc.artifactbundle/info.json"
 
 cp -f "./.build/apple/Products/Release/xpbc" "xpbc.artifactbundle/xpbc-$VERSION_STRING-macos/bin"
 
diff --git a/Sources/XPBCCore/DataValidator.swift b/Sources/XPBCCore/DataValidator.swift
new file mode 100644
index 0000000..911a60f
--- /dev/null
+++ b/Sources/XPBCCore/DataValidator.swift
@@ -0,0 +1,26 @@
+import Foundation
+
+public struct DataValidator: Sendable {
+    public static func validate(_ data: Data, as type: DataType) -> ValidationResult {
+        switch type {
+        case .text:
+            return .valid
+        case .png:
+            return PNGValidator().validate(data)
+        case .jpeg:
+            return JPEGValidator().validate(data)
+        case .gif:
+            return GIFValidator().validate(data)
+        case .tiff:
+            return TIFFValidator().validate(data)
+        case .bmp:
+            return BMPValidator().validate(data)
+        case .webp:
+            return WebPValidator().validate(data)
+        case .heic, .avif:
+            return FtypValidator().validate(data)
+        case .pdf:
+            return PDFValidator().validate(data)
+        }
+    }
+}
diff --git a/Sources/XPBCCore/FormatDetector.swift b/Sources/XPBCCore/FormatDetector.swift
index 3347841..076b4e2 100644
--- a/Sources/XPBCCore/FormatDetector.swift
+++ b/Sources/XPBCCore/FormatDetector.swift
@@ -1,6 +1,6 @@
 import Foundation
 
-public enum DataType: Equatable, Sendable {
+public enum DataType: Equatable, Sendable, CustomStringConvertible {
     case png
     case jpeg
     case gif
@@ -11,6 +11,21 @@ public enum DataType: Equatable, Sendable {
     case avif
     case pdf
     case text
+
+    public var description: String {
+        switch self {
+        case .png: return "PNG"
+        case .jpeg: return "JPEG"
+        case .gif: return "GIF"
+        case .tiff: return "TIFF"
+        case .bmp: return "BMP"
+        case .webp: return "WebP"
+        case .heic: return "HEIC"
+        case .avif: return "AVIF"
+        case .pdf: return "PDF"
+        case .text: return "text"
+        }
+    }
 }
 
 protocol FormatDetector: Sendable {
diff --git a/Sources/XPBCCore/FormatValidator.swift b/Sources/XPBCCore/FormatValidator.swift
new file mode 100644
index 0000000..ca595f2
--- /dev/null
+++ b/Sources/XPBCCore/FormatValidator.swift
@@ -0,0 +1,10 @@
+import Foundation
+
+public enum ValidationResult: Equatable, Sendable {
+    case valid
+    case invalid(reason: String)
+}
+
+protocol FormatValidator: Sendable {
+    func validate(_ data: Data) -> ValidationResult
+}
diff --git a/Sources/XPBCCore/PasteboardWriter.swift b/Sources/XPBCCore/PasteboardWriter.swift
index 8ac25da..4dbf14e 100644
--- a/Sources/XPBCCore/PasteboardWriter.swift
+++ b/Sources/XPBCCore/PasteboardWriter.swift
@@ -23,14 +23,30 @@ public struct PasteboardWriter: Sendable {
     }
 
     private func decodeText(from data: Data) -> String {
+        let decoded: String
         if let utf8 = String(data: data, encoding: .utf8) {
-            return utf8
+            decoded = utf8
         } else {
             FileHandle.standardError.write(
                 Data("xpbc: warning: input is not valid UTF-8, falling back to Latin-1\n".utf8)
             )
             // Latin-1 can decode any byte sequence, so this never returns nil
-            return String(data: data, encoding: .isoLatin1)!
+            decoded = String(data: data, encoding: .isoLatin1)!
+        }
+        return stripControlCharacters(decoded)
+    }
+
+    /// Strip C0 control characters (except tab, newline, carriage return) and DEL
+    /// to prevent terminal escape sequence injection.
+    /// Checks all Unicode scalars in each Character to handle multi-scalar graphemes.
+    func stripControlCharacters(_ text: String) -> String {
+        text.filter { ch in
+            ch.unicodeScalars.allSatisfy { scalar in
+                let v = scalar.value
+                if v == 0x09 || v == 0x0A || v == 0x0D { return true }
+                if v < 0x20 || v == 0x7F { return false }
+                return true
+            }
         }
     }
 
diff --git a/Sources/XPBCCore/Validators/BMPValidator.swift b/Sources/XPBCCore/Validators/BMPValidator.swift
new file mode 100644
index 0000000..cd3c6bd
--- /dev/null
+++ b/Sources/XPBCCore/Validators/BMPValidator.swift
@@ -0,0 +1,18 @@
+import Foundation
+
+struct BMPValidator: FormatValidator {
+    private static let validDIBSizes: Set<UInt32> = [12, 40, 52, 56, 108, 124]
+
+    func validate(_ data: Data) -> ValidationResult {
+        // BMP file header is 14 bytes, then DIB header starts with its size (LE u32)
+        guard let dibSize = data.readLittleEndianUInt32(at: 14) else {
+            return .invalid(reason: "too short for DIB header size (need >= 18 bytes)")
+        }
+
+        guard Self.validDIBSizes.contains(dibSize) else {
+            return .invalid(reason: "invalid DIB header size \(dibSize)")
+        }
+
+        return .valid
+    }
+}
diff --git a/Sources/XPBCCore/Validators/ByteReader.swift b/Sources/XPBCCore/Validators/ByteReader.swift
new file mode 100644
index 0000000..07f1c45
--- /dev/null
+++ b/Sources/XPBCCore/Validators/ByteReader.swift
@@ -0,0 +1,28 @@
+import Foundation
+
+extension Data {
+    func readBigEndianUInt32(at offset: Int) -> UInt32? {
+        guard offset >= 0, count - offset >= 4 else { return nil }
+        let start = startIndex + offset
+        return UInt32(self[start]) << 24
+            | UInt32(self[start + 1]) << 16
+            | UInt32(self[start + 2]) << 8
+            | UInt32(self[start + 3])
+    }
+
+    func readLittleEndianUInt32(at offset: Int) -> UInt32? {
+        guard offset >= 0, count - offset >= 4 else { return nil }
+        let start = startIndex + offset
+        return UInt32(self[start])
+            | UInt32(self[start + 1]) << 8
+            | UInt32(self[start + 2]) << 16
+            | UInt32(self[start + 3]) << 24
+    }
+
+    func readLittleEndianUInt16(at offset: Int) -> UInt16? {
+        guard offset >= 0, count - offset >= 2 else { return nil }
+        let start = startIndex + offset
+        return UInt16(self[start])
+            | UInt16(self[start + 1]) << 8
+    }
+}
diff --git a/Sources/XPBCCore/Validators/FtypValidator.swift b/Sources/XPBCCore/Validators/FtypValidator.swift
new file mode 100644
index 0000000..3704c98
--- /dev/null
+++ b/Sources/XPBCCore/Validators/FtypValidator.swift
@@ -0,0 +1,24 @@
+import Foundation
+
+struct FtypValidator: FormatValidator {
+    func validate(_ data: Data) -> ValidationResult {
+        guard let boxSize = data.readBigEndianUInt32(at: 0) else {
+            return .invalid(reason: "too short for ftyp box (need >= 4 bytes)")
+        }
+
+        // Per ISO BMFF, boxSize == 0 means "box extends to EOF" and boxSize == 1 means
+        // "64-bit extended size follows". Both are valid but rejected here for simplicity
+        // since typical ftyp boxes have a concrete small size.
+        // Minimum 12: box header (8) + major brand (4). Full ftyp also has minor_version (4)
+        // but we check for 12 as the bare minimum for a recognizable ftyp box.
+        guard boxSize >= 12 else {
+            return .invalid(reason: "ftyp box size \(boxSize) is less than minimum (12)")
+        }
+
+        guard Int(boxSize) <= data.count else {
+            return .invalid(reason: "ftyp box size \(boxSize) exceeds data size \(data.count)")
+        }
+
+        return .valid
+    }
+}
diff --git a/Sources/XPBCCore/Validators/GIFValidator.swift b/Sources/XPBCCore/Validators/GIFValidator.swift
new file mode 100644
index 0000000..d5e9312
--- /dev/null
+++ b/Sources/XPBCCore/Validators/GIFValidator.swift
@@ -0,0 +1,21 @@
+import Foundation
+
+struct GIFValidator: FormatValidator {
+    func validate(_ data: Data) -> ValidationResult {
+        guard let width = data.readLittleEndianUInt16(at: 6) else {
+            return .invalid(reason: "too short for Logical Screen Descriptor (need >= 8 bytes)")
+        }
+        guard width > 0 else {
+            return .invalid(reason: "logical screen width is 0")
+        }
+
+        guard let height = data.readLittleEndianUInt16(at: 8) else {
+            return .invalid(reason: "too short for Logical Screen Descriptor (need >= 10 bytes)")
+        }
+        guard height > 0 else {
+            return .invalid(reason: "logical screen height is 0")
+        }
+
+        return .valid
+    }
+}
diff --git a/Sources/XPBCCore/Validators/JPEGValidator.swift b/Sources/XPBCCore/Validators/JPEGValidator.swift
new file mode 100644
index 0000000..8a69a2f
--- /dev/null
+++ b/Sources/XPBCCore/Validators/JPEGValidator.swift
@@ -0,0 +1,25 @@
+import Foundation
+
+struct JPEGValidator: FormatValidator {
+    func validate(_ data: Data) -> ValidationResult {
+        // After SOI (FF D8), next should be FF xx where xx in 0xC0...0xFE
+        guard data.count >= 4 else {
+            return .invalid(reason: "too short for marker after SOI (need >= 4 bytes)")
+        }
+
+        guard data[data.startIndex + 2] == 0xFF else {
+            return .invalid(
+                reason: "expected 0xFF at offset 2, got 0x\(String(format: "%02X", data[data.startIndex + 2]))"
+            )
+        }
+
+        let marker = data[data.startIndex + 3]
+        guard (0xC0...0xFE).contains(marker) else {
+            return .invalid(
+                reason: "invalid marker 0xFF\(String(format: "%02X", marker)) at offset 2"
+            )
+        }
+
+        return .valid
+    }
+}
diff --git a/Sources/XPBCCore/Validators/PDFValidator.swift b/Sources/XPBCCore/Validators/PDFValidator.swift
new file mode 100644
index 0000000..0adc7a9
--- /dev/null
+++ b/Sources/XPBCCore/Validators/PDFValidator.swift
@@ -0,0 +1,47 @@
+import Foundation
+
+struct PDFValidator: FormatValidator {
+    // PDF delimiter characters that follow a name object per ISO 32000.
+    private static let pdfDelimiters: CharacterSet = CharacterSet(charactersIn: " \t\r\n<>()[]/%")
+
+    private static let dangerousKeywords: [String] = [
+        "/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch",
+    ]
+
+    // NOTE: This check does not cover hex-encoded PDF name objects (e.g., /#4A#53 for /JS).
+    // Full coverage would require decoding PDF name hex escapes before matching.
+    func validate(_ data: Data) -> ValidationResult {
+        // isoLatin1 can decode any byte sequence, so this guard is defensive only.
+        guard let content = String(data: data, encoding: .ascii)
+            ?? String(data: data, encoding: .isoLatin1)
+        else {
+            return .invalid(reason: "unable to decode PDF content for inspection")
+        }
+
+        for keyword in Self.dangerousKeywords {
+            if containsKeywordAtBoundary(content, keyword: keyword) {
+                return .invalid(reason: "contains potentially dangerous keyword '\(keyword)'")
+            }
+        }
+
+        return .valid
+    }
+
+    /// Check if the keyword appears in content followed by a PDF delimiter or at end of string.
+    /// This reduces false positives from names like "/JSActions" or "/AABattery".
+    private func containsKeywordAtBoundary(_ content: String, keyword: String) -> Bool {
+        var searchRange = content.startIndex.. ValidationResult {
+        // PNG: 8-byte signature + IHDR chunk (4-byte length + 4-byte "IHDR" + 13-byte data)
+        guard data.count >= 29 else {
+            return .invalid(reason: "too short for IHDR chunk (need >= 29 bytes, got \(data.count))")
+        }
+
+        let ihdr: [UInt8] = [0x49, 0x48, 0x44, 0x52]
+        guard data[data.startIndex + 12.. 0 else {
+            return .invalid(reason: "width is 0 or unreadable")
+        }
+
+        guard let height = data.readBigEndianUInt32(at: 20), height > 0 else {
+            return .invalid(reason: "height is 0 or unreadable")
+        }
+
+        return .valid
+    }
+}
diff --git a/Sources/XPBCCore/Validators/TIFFValidator.swift b/Sources/XPBCCore/Validators/TIFFValidator.swift
new file mode 100644
index 0000000..3acd8bf
--- /dev/null
+++ b/Sources/XPBCCore/Validators/TIFFValidator.swift
@@ -0,0 +1,31 @@
+import Foundation
+
+struct TIFFValidator: FormatValidator {
+    func validate(_ data: Data) -> ValidationResult {
+        guard data.count >= 8 else {
+            return .invalid(reason: "too short for IFD offset (need >= 8 bytes)")
+        }
+
+        let isLittleEndian = data[data.startIndex] == 0x49 // 'I'
+        let ifdOffset: UInt32?
+        if isLittleEndian {
+            ifdOffset = data.readLittleEndianUInt32(at: 4)
+        } else {
+            ifdOffset = data.readBigEndianUInt32(at: 4)
+        }
+
+        guard let offset = ifdOffset else {
+            return .invalid(reason: "unable to read IFD offset")
+        }
+
+        guard offset >= 8 else {
+            return .invalid(reason: "IFD offset \(offset) is less than minimum (8)")
+        }
+
+        guard Int(offset) < data.count else {
+            return .invalid(reason: "IFD offset \(offset) exceeds data size \(data.count)")
+        }
+
+        return .valid
+    }
+}
diff --git a/Sources/XPBCCore/Validators/WebPValidator.swift b/Sources/XPBCCore/Validators/WebPValidator.swift
new file mode 100644
index 0000000..7ace922
--- /dev/null
+++ b/Sources/XPBCCore/Validators/WebPValidator.swift
@@ -0,0 +1,24 @@
+import Foundation
+
+struct WebPValidator: FormatValidator {
+    private static let vp8: [UInt8] = [0x56, 0x50, 0x38, 0x20] // "VP8 "
+    private static let vp8l: [UInt8] = [0x56, 0x50, 0x38, 0x4C] // "VP8L"
+    private static let vp8x: [UInt8] = [0x56, 0x50, 0x38, 0x58] // "VP8X"
+
+    func validate(_ data: Data) -> ValidationResult {
+        guard data.count >= 16 else {
+            return .invalid(reason: "too short for chunk header (need >= 16 bytes)")
+        }
+
+        let chunkID = data[data.startIndex + 12.. ValidationResult
+}
+```
+
+`DataValidator.validate(_:as:)` uses an exhaustive `switch` on `DataType` to dispatch to the appropriate validator. This ensures the compiler catches missing validators when new formats are added. Text data (`.text`) is always valid and skips validation.
+
+### Validator Checks
+
+| Format | Validator | Checks |
+|--------|-----------|--------|
+| PNG | PNGValidator | IHDR chunk present, width/height > 0 |
+| JPEG | JPEGValidator | Valid marker after SOI (0xC0–0xFE range) |
+| GIF | GIFValidator | Logical Screen Descriptor width/height > 0 |
+| TIFF | TIFFValidator | IFD offset within valid range (≥ 8, < data size) |
+| BMP | BMPValidator | DIB header size is a known valid value (12, 40, 52, 56, 108, 124) |
+| WebP | WebPValidator | VP8/VP8L/VP8X chunk header present |
+| HEIC/AVIF | FtypValidator | ftyp box size ≥ 12 (8-byte box header + 4-byte major brand, per the ISO/IEC 14496-12 box layout) and ≤ data size |
+| PDF | PDFValidator | Rejects files containing dangerous keywords at PDF name boundaries (`/JS`, `/JavaScript`, `/OpenAction`, `/AA`, `/Launch`) |
+
+### Byte Reading
+
+Validators use safe byte-reading methods defined as a `Data` extension in `ByteReader.swift`:
+
+```swift
+extension Data {
+    func readBigEndianUInt32(at offset: Int) -> UInt32?
+    func readLittleEndianUInt32(at offset: Int) -> UInt32?
+    func readLittleEndianUInt16(at offset: Int) -> UInt16?
+}
+```
+
+All methods perform boundary checks and return `nil` if the offset is out of range, preventing out-of-bounds crashes regardless of caller behavior.
+
+### PDF Validation
+
+`PDFValidator` scans for dangerous PDF keywords with boundary-aware matching: a keyword must be followed by a PDF delimiter character (whitespace, `<`, `>`, `(`, `)`, `[`, `]`, `/`, `%`) or appear at the end of the file. This reduces false positives from names like `/JSActions` or `/AABattery`.
+
+Known limitation: hex-encoded PDF name objects (e.g., `/#4A#53` for `/JS`) are not decoded before matching. This is documented in the source.
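The boundary rule can be sketched in a few lines of Swift. This is a hypothetical reimplementation for illustration, not the project's `containsKeywordAtBoundary` (whose full body is not shown here); it assumes Foundation's `String.range(of:options:range:locale:)` for the substring search and reuses the delimiter set quoted in `PDFValidator`.

```swift
import Foundation

// Delimiters that may follow a PDF name object, as listed in PDFValidator.
let pdfDelimiters = CharacterSet(charactersIn: " \t\r\n<>()[]/%")

/// Hypothetical helper: true if `keyword` occurs in `content` and that
/// occurrence is followed by a delimiter or by the end of the string.
func containsKeywordAtBoundary(_ content: String, keyword: String) -> Bool {
    var searchRange = content.startIndex..<content.endIndex
    while let match = content.range(of: keyword, range: searchRange) {
        // A keyword at the very end of the content counts as a hit.
        guard match.upperBound < content.endIndex else { return true }
        // Inspect the first Unicode scalar after the match.
        if pdfDelimiters.contains(content.unicodeScalars[match.upperBound]) {
            return true
        }
        // Not at a boundary (e.g. "/JSActions"); resume after this match.
        searchRange = match.upperBound..<content.endIndex
    }
    return false
}

print(containsKeywordAtBoundary("<< /JS (app.alert(1)) >>", keyword: "/JS"))  // true
print(containsKeywordAtBoundary("<< /JSActions 1 >>", keyword: "/JS"))        // false
print(containsKeywordAtBoundary("/#4A#53 (x)", keyword: "/JS"))               // false
```

The last call shows the limitation the section names: a PDF reader decodes `/#4A#53` to `/JS`, but a literal search never sees it.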
+
 ## Pasteboard Writing
 
 `PasteboardWriter` maps `DataType` to `NSPasteboard.PasteboardType` (UTI strings) and writes raw bytes via `NSPasteboard.setData(_:forType:)`. For text, it decodes via UTF-8 with a Latin-1 fallback (which can decode any byte sequence) and uses `setString(_:forType:)`.
 
-Key design decisions:
+### Control Character Stripping
+
+All text (both UTF-8 and Latin-1 fallback) is sanitized by `stripControlCharacters` before being placed on the clipboard. This removes C0 control characters (U+0000–U+001F except tab, newline, carriage return) and DEL (U+007F) to prevent terminal escape sequence injection. The filter checks all Unicode scalars in each `Character` to correctly handle multi-scalar graphemes.
+
+### Key Design Decisions
 
 - **No image decoders**: `NSImage`, `CGImageSource`, and `NSBitmapImageRep` are never used. This avoids exposure to vulnerabilities in ImageIO/CoreGraphics (e.g., CVE-2021-30860 FORCEDENTRY, CVE-2023-41064 BLASTPASS).
 - **clearContents timing**: The pasteboard is cleared immediately before writing, after all validation and data preparation is complete.
@@ -84,41 +134,64 @@ All errors are modeled as `XPBCError`, a `LocalizedError` enum:
 | `inputTooLarge(size:maxMB:)` | Input exceeds 100 MB limit |
 | `pasteboardWriteFailed` | `NSPasteboard.setData/setString` returned false |
 | `invalidArgument(String)` | Unrecognized CLI flag or pasteboard name |
+| `validationFailed(format:reason:)` | Structural validation of image header failed |
 
-The CLI distinguishes expected errors (`XPBCError` -> exit 1) from unexpected errors (exit 2) for easier debugging.
+The CLI distinguishes expected errors (`XPBCError` -> exit 1) from unexpected errors (exit 2) for easier debugging. `DataType` conforms to `CustomStringConvertible` so that validation error messages display stable, human-readable format names (e.g., "PNG", "JPEG").
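The decode-then-sanitize order described under Control Character Stripping can be exercised without a pasteboard. Below is a minimal sketch: `sanitizedText(from:)` is a hypothetical stand-in whose filter mirrors the `stripControlCharacters` logic in the diff, combined with the UTF-8/Latin-1 decoding step.

```swift
import Foundation

/// Hypothetical stand-in for PasteboardWriter's text path (no pasteboard
/// involved): decode as UTF-8, fall back to Latin-1, then drop C0 controls
/// other than TAB/LF/CR, plus DEL.
func sanitizedText(from data: Data) -> String {
    // Latin-1 maps every byte to a scalar, so the force-unwrap cannot fail.
    let decoded = String(data: data, encoding: .utf8)
        ?? String(data: data, encoding: .isoLatin1)!
    return decoded.filter { ch in
        // A Character may hold several scalars (e.g. "\r\n"); check them all.
        ch.unicodeScalars.allSatisfy { scalar in
            switch scalar.value {
            case 0x09, 0x0A, 0x0D: return true      // keep tab, newline, CR
            case 0x00..<0x20, 0x7F: return false    // drop other C0 controls and DEL
            default: return true
            }
        }
    }
}

// ESC [ 3 1 m is an ANSI "red foreground" sequence; BEL (0x07) rings the bell.
let raw = Data("ok\u{1B}[31m red\u{07}\tdone\r\n".utf8)
let clean = sanitizedText(from: raw)
print(clean == "ok[31m red\tdone\r\n")  // prints "true"
```

Only the ESC and BEL scalars are removed. The literal text `[31m` survives, but without the leading ESC a terminal treats it as ordinary characters.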
 
 ## Module Boundaries
 
 | Component | Access Level | Rationale |
 |-----------|-------------|-----------|
 | `DataType` | `public` | Used by both library and executable |
-| `DataTypeDetector.detect(from:)` | `public` | Primary API |
+| `DataTypeDetector.detect(from:)` | `public` | Primary detection API |
+| `DataValidator.validate(_:as:)` | `public` | Primary validation API |
+| `ValidationResult` | `public` | Returned by validation API |
 | `StdinReader.read()` | `public` | Called from executable |
 | `PasteboardWriter` | `public` | Called from executable |
 | `XPBCError` | `public` | Caught in executable |
 | `NSPasteboard.Name.from(userInput:)` | `public` | CLI argument parsing |
 | `FormatDetector` protocol | `internal` | Implementation detail |
+| `FormatValidator` protocol | `internal` | Implementation detail |
 | `detectors` array | `internal` | Implementation detail |
 | `maxInputSize` / `maxInputSizeMB` | `internal` | Implementation detail |
 | All concrete detectors | `internal` | Implementation detail |
+| All concrete validators | `internal` | Implementation detail |
+| `Data` byte-reading extension | `internal` | Implementation detail |
 
 ## Adding a New Format
 
-1. Create a new struct conforming to `FormatDetector` in `Sources/XPBCCore/Detectors/` (or use `FtypDetector` for ISOBMFF-based formats)
-2. Add it to `DataTypeDetector.detectors` in the appropriate position by signature length
-3. Add a case to `DataType` enum
-4. Add UTI mapping in `PasteboardWriter.pasteboardType(for:)` and the case list in `write(_:as:)`
-5. Add tests in `DataTypeDetectorTests`
+1. Add a case to the `DataType` enum (also add its `description` in the `CustomStringConvertible` conformance)
+2. Create a new struct conforming to `FormatDetector` in `Sources/XPBCCore/Detectors/` (or use `FtypDetector` for ISOBMFF-based formats)
+3. Add it to `DataTypeDetector.detectors` in the appropriate position by signature length
+4. Create a new struct conforming to `FormatValidator` in `Sources/XPBCCore/Validators/`
+5. Add a case in `DataValidator.validate(_:as:)` for the new type
+6. Add UTI mapping in `PasteboardWriter.pasteboardType(for:)` and the case list in `write(_:as:)`
+7. Add detection tests in `DataTypeDetectorTests` and validation tests in `DataValidatorTests`
 
-The compiler will guide steps 4 via exhaustive switch errors.
+The compiler will guide steps 1, 5, and 6 via exhaustive switch errors.
 
 ## Testing
 
-Tests cover `DataTypeDetector.detect(from:)` with 24 test cases:
+Tests cover two suites with 74 test cases total:
+
+### DataTypeDetectorTests (24 tests)
 
 - **Format detection** (11): one per supported format, including both GIF versions and both TIFF endiannesses
 - **Text fallback** (2): ASCII and Japanese UTF-8
 - **Edge cases** (6): empty data, single byte, partial headers, RIFF non-WebP, ftyp non-image, random binary
 - **Security** (5): oversized data with valid header, all-zero bytes, all-0xFF bytes, partial JPEG signatures
 
+### DataValidatorTests (50 tests)
+
+- **PNG validation** (7): valid, too short, missing IHDR, zero width/height, exact minimum size (29 bytes), one byte below minimum
+- **JPEG validation** (4): valid, no marker prefix, invalid marker range, too short
+- **GIF validation** (4): valid, zero width/height, too short
+- **TIFF validation** (5): valid LE/BE, IFD offset too small/exceeds data, too short
+- **BMP validation** (4): valid DIB sizes (40, 124), invalid DIB size, too short
+- **WebP validation** (5): VP8/VP8L/VP8X valid, unknown chunk, too short
+- **HEIC/AVIF validation** (6): valid HEIC/AVIF, box size too small, exceeds data, just below minimum (11), exact minimum (12)
+- **PDF validation** (9): valid, /JS, /JavaScript, /OpenAction, /AA, /Launch, false positive tests (/JSActions, /AABattery), keyword at end of file
+- **Control character stripping** (4): ESC removal, tab/newline/CR preservation, NUL removal, multi-scalar grapheme passthrough
+- **Text passthrough** (2): always valid, empty data valid
+
 Test data uses in-memory byte arrays (no fixture files needed).
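Because each validator reads only a few fixed offsets, in-memory fixtures can stay a few bytes long. The sketch below shows what such tests might look like for the TIFF rules. It uses a hypothetical stand-in, `tiffIFDOffsetIsValid`, that mirrors the checks in `TIFFValidator` so the example is self-contained, and plain `precondition` calls instead of XCTest assertions.

```swift
import Foundation

// Hypothetical stand-in for TIFFValidator's rule: the byte-order mark picks
// endianness ("II" = little, otherwise big), and the IFD offset stored at
// bytes 4..<8 must be >= 8 and < the data size.
func tiffIFDOffsetIsValid(_ bytes: [UInt8]) -> Bool {
    guard bytes.count >= 8 else { return false }
    let b = bytes[4..<8].map { UInt32($0) }  // b is an Array indexed 0..<4
    let offset: UInt32
    if bytes[0] == 0x49 {  // 'I': little-endian
        offset = b[0] | b[1] << 8 | b[2] << 16 | b[3] << 24
    } else {               // 'M': big-endian
        offset = b[0] << 24 | b[1] << 16 | b[2] << 8 | b[3]
    }
    return offset >= 8 && Int(offset) < bytes.count
}

// In-memory fixtures: an 8-byte header plus two bytes standing in for the IFD.
let littleEndian: [UInt8]   = [0x49, 0x49, 0x2A, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00]
let bigEndian: [UInt8]      = [0x4D, 0x4D, 0x00, 0x2A, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00]
let offsetTooSmall: [UInt8] = [0x49, 0x49, 0x2A, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00]
let offsetPastEnd: [UInt8]  = [0x49, 0x49, 0x2A, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00] // 0x100

precondition(tiffIFDOffsetIsValid(littleEndian))
precondition(tiffIFDOffsetIsValid(bigEndian))
precondition(!tiffIFDOffsetIsValid(offsetTooSmall))
precondition(!tiffIFDOffsetIsValid(offsetPastEnd))
precondition(!tiffIFDOffsetIsValid(Array(littleEndian.prefix(7))))
print("TIFF checks passed")
```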