zkdiff is a zero-knowledge implementation of the Myers diff algorithm. It proves that two files were compared correctly and, optionally, that a specific set of line-level changes was computed, without exposing the full file contents.
- Zero-Knowledge Proofs: Generate verifiable proofs without revealing file contents
- Selective Redaction: Hide sensitive portions while proving diff structure
- Myers Algorithm: Efficient diff computation with optimal edit distance
- Cryptographic Integrity: SHA256 hashing ensures file authenticity
- Tamper-Proof: Proofs are cryptographically verifiable and cannot be forged
- Privacy-Preserving: Redacted content never leaves the secure computation environment
- Rust toolchain (1.70 or later)
- RISC Zero zkVM dependencies (automatically handled by cargo)
cargo install --git https://github.com/tanctl/zkdiff
git clone https://github.com/tanctl/zkdiff.git
cd zkdiff
cargo build --release

The binary will be available at target/release/zkdiff.
Compare two files and generate a zero-knowledge proof:
zkdiff generate -a file1.txt -b file2.txt -o proof.json

Hide specific lines while proving the diff structure:
zkdiff generate -a file1.txt -b file2.txt -r "delete:5-10,insert:15-20" -o proof.json

Cryptographically verify a proof file:
zkdiff verify proof.json

zkdiff <COMMAND>
Commands:
generate Generate a zero-knowledge diff proof
verify Verify a zero-knowledge diff proof
help Print this message or the help of the given subcommand(s)
zkdiff generate [OPTIONS] --file-a <FILE> --file-b <FILE>
Options:
-a, --file-a <FILE> First file to compare
-b, --file-b <FILE> Second file to compare
-r, --redact <RANGES> Redaction ranges [default: ""]
-o, --output <FILE> Output file for the proof [default: zkdiff.proof]

zkdiff verify <PROOF_FILE>
Arguments:
<PROOF_FILE> Proof file to verify

Redaction ranges specify which lines to hide based on operation type:
operation:start-end,operation:start-end
- delete or d: Redact deleted lines
- insert or i: Redact inserted lines
- replace or r: Redact replaced lines
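The range syntax above is regular enough to parse in a few lines. The following is a minimal sketch of such a parser, not the code zkdiff itself uses: the names `Op`, `RedactRange`, and `parse_redactions` are hypothetical, and the sketch assumes that both bounds are inclusive and that a range with start greater than end is rejected.

```rust
/// Operation kinds accepted in a redaction spec (long or short form).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Delete,
    Insert,
    Replace,
}

/// One parsed `operation:start-end` entry (hypothetical type).
/// Assumption: both bounds are inclusive line numbers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RedactRange {
    pub op: Op,
    pub start: usize,
    pub end: usize,
}

/// Parses a comma-separated spec such as "delete:5-10,i:15-20".
/// An empty spec (the CLI default) yields no ranges.
pub fn parse_redactions(spec: &str) -> Result<Vec<RedactRange>, String> {
    let mut ranges = Vec::new();
    for entry in spec.split(',').map(str::trim).filter(|e| !e.is_empty()) {
        // Split "operation:start-end" into the operation and the span.
        let (op_str, span) = entry
            .split_once(':')
            .ok_or_else(|| format!("missing ':' in '{entry}'"))?;
        let op = match op_str.trim() {
            "delete" | "d" => Op::Delete,
            "insert" | "i" => Op::Insert,
            "replace" | "r" => Op::Replace,
            other => return Err(format!("unknown operation '{other}'")),
        };
        // Split "start-end" into its two bounds.
        let (start_str, end_str) = span
            .split_once('-')
            .ok_or_else(|| format!("missing '-' in '{entry}'"))?;
        let start: usize = start_str
            .trim()
            .parse()
            .map_err(|_| format!("bad start in '{entry}'"))?;
        let end: usize = end_str
            .trim()
            .parse()
            .map_err(|_| format!("bad end in '{entry}'"))?;
        // Assumption: a reversed range is treated as an error.
        if start > end {
            return Err(format!("start exceeds end in '{entry}'"));
        }
        ranges.push(RedactRange { op, start, end });
    }
    Ok(ranges)
}
```

Under these assumptions, `parse_redactions("d:5-5")` yields a single one-line range, and an unknown operation such as `x:1-2` is reported as an error rather than silently ignored.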
# Redact deleted lines 5-10 and inserted lines 15-20
zkdiff generate -a file1.txt -b file2.txt -r "delete:5-10,insert:15-20" -o proof.json
# Short form syntax
zkdiff generate -a file1.txt -b file2.txt -r "d:1-3,i:7-9,r:12-15" -o proof.json
# Single line redaction
zkdiff generate -a file1.txt -b file2.txt -r "d:5-5" -o proof.json

zkdiff uses a host-guest architecture with the RISC Zero zkVM to provide zero-knowledge file comparison:
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ zkdiff System │
├─────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────┐ ┌─────────────────────────────────┐ │
│ │ Host Program │ │ RISC Zero zkVM │ │
│ │ (host/src/main.rs) │ │ │ │
│ │ │ │ ┌───────────────────────────┐ │ │
│ │ ┌─────────────────────────┐ │ │ │ Guest Program │ │ │
│ │ │ CLI Interface │ │ │ │ (methods/guest/main.rs) │ │ │
│ │ │ • generate command │ │ │ │ │ │ │
│ │ │ • verify command │ │ │ │ ┌─────────────────────┐ │ │ │
│ │ │ • argument parsing │ │ │ │ │ Myers Algorithm │ │ │ │
│ │ └─────────────────────────┘ │ │ │ │ • diff computation │ │ │ │
│ │ │ │ │ │ • edit sequence │ │ │ │
│ │ ┌─────────────────────────┐ │ Input │ │ └─────────────────────┘ │ │ │
│ │ │ File Operations │ │────────▶ │ │ │ │ │
│ │ │ • read files A & B │ │ │ │ ┌─────────────────────┐ │ │ │
│ │ │ • SHA256 hashing │ │ │ │ │ Redaction Logic │ │ │ │
│ │ │ • proof serialization │ │ │ │ │ • selective hiding │ │ │ │
│ │ └─────────────────────────┘ │ │ │ │ • range filtering │ │ │ │
│ │ │ │ │ └─────────────────────┘ │ │ │
│ │ ┌─────────────────────────┐ │ Output │ │ │ │ │
│ │ │ Proof Verification │ │◀──────── │ │ ┌─────────────────────┐ │ │ │
│ │ │ • cryptographic check │ │ │ │ │ Proof Generation │ │ │ │
│ │ │ • method ID validation │ │ │ │ │ • integrity hash │ │ │ │
│ │ │ • receipt validation │ │ │ │ │ • structured output│ │ │ │
│ │ └─────────────────────────┘ │ │ │ └─────────────────────┘ │ │ │
│ └─────────────────────────────────┘ │ └───────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────┘ │
│ │
├─────────────────────────────────────────────────────────────────────────────────────┤
│ Data Flow │
├─────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ Input Files ──┐ ┌── Cryptographic │
│ │ │ Proof Output │
│ file_a.txt ├── SHA256 ──┐ ┌─────────┴───────────────────── │
│ file_b.txt │ Hashing │ │ │ │
│ │ │ │ ┌─────────────────────────┐ │ │
│ Redaction ───┘ │ │ │ Proof Structure │ │ │
│ Ranges ▼ │ │ • file_a_hash │ │ │
│ ┌───────────────┐ │ │ • file_b_hash │ │ │
│ │ DiffInput │ │ │ • diff_lines[] │ │ │
│ │ Structure │────────────▶│ │ • proof_hash │ │ │
│ │ │ │ │ • method_id │ │ │
│ └───────────────┘ │ │ • receipt (zkProof) │ │ │
│ │ └─────────────────────────┘ │ │
│ └───────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────┘
- Host Program (host/src/main.rs:1-301) - Untrusted execution environment:
  - CLI Interface: Command parsing and user interaction
  - File Operations: Reading input files and generating SHA256 hashes
  - Proof Management: Serialization/deserialization of proof data
  - Verification Engine: Cryptographic validation of zkVM receipts
- Guest Program (methods/guest/src/main.rs:1-293) - Trusted execution inside zkVM:
  - Myers Algorithm: Optimal diff computation with O((N+M)D) complexity
  - Redaction Engine: Selective content hiding based on operation ranges
  - Integrity Verification: Hash validation and proof generation
- Methods Bridge (methods/src/lib.rs:1) - Interface layer:
  - METHOD_ELF: Compiled guest program bytecode
  - METHOD_ID: Cryptographic identifier for guest program version
- RISC Zero zkVM - Zero-knowledge execution environment:
  - Deterministic Execution: Guarantees reproducible computation
  - Privacy Preservation: Redacted content never leaves secure environment
  - Cryptographic Proofs: STARK-based verification without re-execution
Input Files → Hash Verification → Myers Diff → Redaction → ZK Proof → Verification
- File Processing: Files are read and hashed for integrity
- Diff Computation: Myers algorithm calculates optimal edit sequence
- Redaction Applied: Specified ranges are hidden from output
- Proof Generation: zkVM creates cryptographic proof of computation
- Verification: Proof can be verified without access to original files
- Computational Integrity: RISC Zero zkVM ensures correct execution
- Privacy Preservation: Redacted content never leaves the secure environment
- Tamper Resistance: Proofs are cryptographically bound to inputs
- Hash Verification: SHA256 hashes bind each proof to the exact input files, so any change to an input is detectable
Proofs are stored as JSON files containing:
{
"verified": true,
"output": {
"file_a_hash": "sha256_hash_of_file_a",
"file_b_hash": "sha256_hash_of_file_b",
"diff_lines": [
{
"line_number_a": 1,
"line_number_b": null,
"operation": "Delete",
"content": null,
"redacted_length": 42
}
],
"proof_hash": "integrity_hash"
},
"method_id": "guest_program_identifier",
"proof_generated": true,
"receipt": "cryptographic_proof_data"
}

- File Hashes: SHA256 hashes of original files
- Diff Lines: Structured diff output with redactions
- Proof Hash: Integrity hash of the entire proof
- Receipt: Cryptographic proof from RISC Zero zkVM
- Method ID: Identifier for the guest program version
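To make the `diff_lines` entries concrete, here is a minimal sketch of how one entry and its redaction could be modelled in Rust. It mirrors the JSON field names shown above, but the types `Operation` and `DiffLine` and the function `redact_if_in_range` are hypothetical. The sketch assumes that delete ranges refer to line numbers in file A, that insert ranges refer to line numbers in file B, and that `redacted_length` is the byte length of the hidden content; the actual guest program may define these differently.

```rust
/// Edit operations as they appear in the "operation" field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    Insert,
    Delete,
}

/// One entry of "diff_lines", mirroring the JSON fields shown above
/// (hypothetical Rust type; JSON `null` maps to `None`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DiffLine {
    pub line_number_a: Option<usize>,
    pub line_number_b: Option<usize>,
    pub operation: Operation,
    pub content: Option<String>,
    pub redacted_length: Option<usize>,
}

/// Hides the content of `line` when its operation matches `op` and its
/// line number falls inside the inclusive range `start..=end`.
/// Assumption: only the byte length of the hidden text is kept.
pub fn redact_if_in_range(line: &mut DiffLine, op: Operation, start: usize, end: usize) {
    if line.operation != op {
        return;
    }
    // Assumption: deletions are numbered in file A, insertions in file B.
    let number = match op {
        Operation::Delete => line.line_number_a,
        Operation::Insert => line.line_number_b,
    };
    let in_range = matches!(number, Some(n) if (start..=end).contains(&n));
    if in_range {
        // Remove the text and record how long it was.
        if let Some(text) = line.content.take() {
            line.redacted_length = Some(text.len());
        }
    }
}
```

A redacted entry then has the same shape as the JSON example: `content` is empty and `redacted_length` carries the size of what was hidden.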
# Create test files
echo -e "line1\nline2\nline3" > file1.txt
echo -e "line1\nmodified\nline3\nline4" > file2.txt
# Generate proof
zkdiff generate -a file1.txt -b file2.txt -o example.proof
# Verify proof
zkdiff verify example.proof

# Compare code files, hiding sensitive function implementations
zkdiff generate \
-a old_code.rs \
-b new_code.rs \
-r "insert:50-75,delete:20-30" \
-o code_review.proof
# Reviewer can verify changes without seeing redacted sections
zkdiff verify code_review.proof

# Prove vulnerability was fixed without revealing details
zkdiff generate \
-a vulnerable.c \
-b patched.c \
-r "d:123-145,i:150-175" \
-o security_fix.proof
# Verify the fix without seeing sensitive code
zkdiff verify security_fix.proof

# Debug build
cargo build
# Release build
cargo build --release
# Run tests
cargo test
# Format code
cargo fmt
# Check for linting issues
cargo clippy

zkdiff/
├── Cargo.toml              # Workspace configuration
├── host/                   # Host program (CLI interface)
│   ├── Cargo.toml
│   └── src/
│       └── main.rs
├── methods/                # Guest program compilation
│   ├── Cargo.toml
│   ├── build.rs
│   ├── src/
│   │   └── lib.rs
│   └── guest/              # Guest program (runs in zkVM)
│       ├── Cargo.toml
│       └── src/
│           └── main.rs
├── target/                 # Build artifacts
└── rust-toolchain.toml
Given two files A and B, and a set of redaction rules, zkdiff proves the following inside a zero-knowledge virtual machine:
- The SHA-256 hashes of file_a_content and file_b_content match the claimed file_a_hash and file_b_hash.
- The Myers diff was run on the files line-by-line and the correct edit trace was produced.
- The output diff contains the correct line differences, where some lines may be redacted in accordance with user-specified ranges.
- The output includes a proof_hash that cryptographically commits to all revealed and redacted edits, making verification of redacted diffs possible without revealing the redacted lines.
The diff output is based on a variant of the O(ND) Myers algorithm which computes the shortest edit script to convert lines from file A to file B. Each edit operation is tracked as:
- Insert: A new line appears in B but not A
- Delete: A line is removed in B that was present in A
- Keep (internal use): A line that exists in both A and B (not included in final diff)
- Replace: Not explicitly emitted; encoded via Delete+Insert pairs on adjacent lines
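For readers unfamiliar with the algorithm, the sketch below shows the textbook greedy form of Myers' shortest-edit-script search over lines, followed by a backtrack that recovers the edits. It is an illustration only: zkdiff is described as using a variant, so the guest program may differ, and the names `Edit` and `myers_diff` are hypothetical. As in the list above, a replacement shows up as a Delete followed by an Insert.

```rust
/// One step of the edit script (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Edit {
    Keep(String),
    Delete(String),
    Insert(String),
}

/// Computes a shortest edit script turning `a` into `b`.
pub fn myers_diff(a: &[&str], b: &[&str]) -> Vec<Edit> {
    let n = a.len() as isize;
    let m = b.len() as isize;
    let max = n + m;
    let offset = max + 1; // shifts diagonal k into a valid index
    // v[k + offset] holds the furthest x reached on diagonal k = x - y.
    let mut v = vec![0isize; (2 * max + 3) as usize];
    let mut trace: Vec<Vec<isize>> = Vec::new();

    // Forward pass: try edit scripts of length d = 0, 1, 2, ...
    'search: for d in 0..=max {
        trace.push(v.clone()); // state before round d, used for backtracking
        let mut k = -d;
        while k <= d {
            let idx = (k + offset) as usize;
            let mut x = if k == -d || (k != d && v[idx - 1] < v[idx + 1]) {
                v[idx + 1] // move down: take a line from b (insert)
            } else {
                v[idx - 1] + 1 // move right: drop a line from a (delete)
            };
            let mut y = x - k;
            // Follow the diagonal while lines are equal.
            while x < n && y < m && a[x as usize] == b[y as usize] {
                x += 1;
                y += 1;
            }
            v[idx] = x;
            if x >= n && y >= m {
                break 'search;
            }
            k += 2;
        }
    }

    // Backward pass: walk from (n, m) back to (0, 0) through the trace.
    let (mut x, mut y) = (n, m);
    let mut edits = Vec::new();
    for d in (0..trace.len() as isize).rev() {
        let v = &trace[d as usize];
        let k = x - y;
        let down = k == -d
            || (k != d && v[(k - 1 + offset) as usize] < v[(k + 1 + offset) as usize]);
        let prev_k = if down { k + 1 } else { k - 1 };
        let prev_x = v[(prev_k + offset) as usize];
        let prev_y = prev_x - prev_k;
        // Undo the diagonal run of unchanged lines.
        while x > prev_x && y > prev_y {
            x -= 1;
            y -= 1;
            edits.push(Edit::Keep(a[x as usize].to_string()));
        }
        // Undo the single insert or delete that started this round.
        if d > 0 {
            if x == prev_x {
                y -= 1;
                edits.push(Edit::Insert(b[y as usize].to_string()));
            } else {
                x -= 1;
                edits.push(Edit::Delete(a[x as usize].to_string()));
            }
        }
    }
    edits.reverse();
    edits
}
```

Storing a copy of `v` per round keeps the sketch short at the cost of O(D(N+M)) memory; an implementation running inside a zkVM would likely want something more frugal.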
RISC Zero zkVM provides:
- Deterministic Execution: Same inputs always produce same proofs
- Computational Integrity: Proofs guarantee correct execution
- Privacy: Intermediate values never leave the secure environment
- Verifiability: Proofs can be verified without re-execution
- Diff Algorithm: O((N+M)D), where N and M are the line counts of the two files and D is the edit distance
- Proof Generation: ~1000x slower than native execution (zkVM overhead)
- Verification: Effectively constant time in practice, largely independent of file size
| File Size | Lines | Proof Generation | Verification |
|---|---|---|---|
| 1KB | 50 | ~2 seconds | ~0.1 seconds |
| 10KB | 500 | ~15 seconds | ~0.1 seconds |
| 100KB | 5000 | ~2 minutes | ~0.1 seconds |

*Performance varies by system configuration.*
- Malicious Prover: Cannot generate false proofs
- Untrusted Verifier: Cannot access redacted content
- Network Adversary: Proofs are self-contained and verifiable
- Quantum Resistance: STARK proofs rely on hash functions and are believed to be post-quantum secure
- Proof Size: Proofs are larger than original diffs
- Generation Time: Significant computational overhead
- Trust Assumptions: Requires trust in RISC Zero zkVM
- Side Channels: Timing attacks may reveal information
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- RISC Zero for the zero-knowledge virtual machine
- Eugene Myers for the optimal diff algorithm
zkdiff - Trustless diffing with zero-knowledge proofs