Standardizing the Chain of Custody for Human & Artificial Intelligence.
Rhodi is a document format and a "truth engine" designed to promote trustworthy content. It extends Markdown, using YAML frontmatter to store all the metadata needed for cryptographic verification. It is written in Rust, with plans to deploy via WASM and Python packages.
This is an experiment in accountability. Rhodi was born from a growing concern about information corruption in the digital age. As AI-generated content proliferates and attention economies reward engagement over truth, I wanted to explore the subject by building something tangible rather than just thinking abstractly about these problems. It explores how we might verify the origins of information and track the chain of custody for ideas.
I've chosen to release this as open source because:
- Feedback is welcome – Recommendations, alternative approaches, and critiques will only make this better
- Debate is essential – The problems we're trying to solve are complex and require diverse perspectives
- Collaboration strengthens – Information integrity is a shared concern, not a solo endeavor
This is an alpha implementation. It's a starting point, not a finished solution. If you're interested in these problems—whether from journalism, research, AI safety, cryptography, or just curiosity—I encourage you to try it, break it, and share your thoughts.
This project stems from my experience working in innovation and research consulting. I observed that by the time information reached its final stage, it was often corrupted and arbitrary—not through malice, but through accumulated distortion in the chain of custody.
Trace Protocol emerged from years of interacting with large organizations. In high-stakes environments, the "chain of custody" for an idea is often non-existent. As insights move from raw data to final executive summaries, they undergo a process of information entropy: a gradual decay where nuance is lost, and claims are stripped of their evidence.
Traditional document formats (.docx, .pdf, .pptx) are opaque binary blobs that prioritize visual layout over data integrity. They are not "machine-readable" in a meaningful way, making it nearly impossible to audit how a specific claim evolved or who is responsible for a change.
The current research landscape suffers from five critical failures:
- The "Blob" Problem: Proprietary formats are black boxes. They cannot be easily version-controlled (via diff), making the evolution of a document invisible.
- The Accountability Gap: Content can be altered or fabricated without leaving a trace. There is no cryptographic link between a claim and its author.
- The Metadata Divorce: Technical metadata (who, when, where) is stored separately from the content, making it easy to strip, fake, or lose during file transfers.
- Information Entropy: Manual "copy-pasting" between tools causes data degradation. By the final report, a "suggested trend" often becomes "hard fact" through sheer repetition.
- Biological vs. Synthetic Reasoning: As AI agents enter the workflow, we lack a standard to distinguish between human-verified insights and AI-generated synthesis. This leads to "reasoning pollution," where hallucinations are ingested into the research chain as truth.
Trace Protocol is a response to this crisis. It is a transition from opaque, "dead" files to a ledger of truth: a system where every claim is signed, every edit is hashed, and the lineage of an idea is baked into its very structure.
In modern research—both commercial and academic—Content is inextricably mixed with Form. Documents are heavy, opaque binary blobs (DOCX, PDF) that do not "speak" to each other.
Worse, as AI agents enter the workflow, we face a crisis of reasoning pollution. When hallucinated or anecdotal information is ingested into a research process without flags, it corrupts the entire output.
Trace is a document format and engine designed to standardize research workflows, with particular attention to agentic and hybrid (AI + human) collaboration.
It is not just a file format; it is a ledger of truth.
- Immutability: Every edit creates a new version hash. History is preserved, never overwritten.
- Identity First: Every document is signed. We explicitly distinguish between Human Authorship (biological reasoning) and Agent Generation (synthetic reasoning).
- Backward Compatibility Only: To prevent circular logic, documents can only cite versions that existed before them.
- Separation of Concerns:
- The Envelope (YAML Frontmatter): Contains the metadata, lineage, and signatures.
- The Payload (Markdown Body): Contains the pure narrative and evidence blocks.
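For illustration, a minimal `.tmd` document under this Envelope/Payload split might look like the sketch below. Field names are drawn from elsewhere in this README; the authoritative layout is the JSON Schema, and the hash and signature values here are placeholders, not real output:

````markdown
---
protocol_version: "1.0"
doc_version: 1
title: "Market Analysis Q4 2025"
author: "Research Team"
doc_status: Published
prev_version_hash: null
version_hash: "<sha-256 placeholder>"
signature: "<ed25519 placeholder>"
---

The market shows signs of saturation in the consumer electronics sector.

```trace
source: ./data/market_report.md
expected: "15% decline"
method: automatic
```
````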
This protocol is designed for contexts where minimizing bias and information corruption is critical:
- Commercial and academic research
- Journalism
- Regulatory compliance
- Any domain requiring auditable reasoning trails
We follow a "Core + Bindings" philosophy to ensure speed, safety, and ubiquity.
The heart of the protocol is written in Rust. This ensures:
- Memory Safety: Critical for handling untrusted data inputs.
- Performance: Fast hashing and signing operations for large datasets.
- Single Source of Truth: The logic for "Is this document valid?" exists in only one place.
Current Implementation:
- Canonicalization: Deterministic normalization (LF line endings, sorted YAML keys, Unicode control character stripping)
- Hashing: SHA-256 for content integrity
- Signing: Ed25519 for cryptographic authenticity
- Trace Verification: Granular evidence locking for Markdown sources
- Protocol Versioning: Version field in frontmatter with registry (Current/Deprecated/Obsolete)
- Document Versioning: Auto-incrementing version with previous hash chaining
- CLI: Full command-line tool for document management
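For intuition, the canonicalization step above could be sketched roughly as follows. This is an illustrative re-implementation, not the actual core code, and it omits YAML key sorting:

```rust
/// Illustrative sketch of canonicalization: normalize line endings to LF,
/// strip Unicode control characters, and drop trailing whitespace.
/// (The real core also sorts YAML keys; that part is omitted here.)
fn canonicalize(input: &str) -> String {
    input
        .replace("\r\n", "\n") // CRLF -> LF
        .replace('\r', "\n")   // stray CR -> LF
        .lines()
        .map(|line| {
            line.chars()
                .filter(|c| !c.is_control()) // strip control characters
                .collect::<String>()
                .trim_end()                  // drop trailing whitespace
                .to_string()
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    // The same logical content always canonicalizes to the same bytes,
    // which is what makes the SHA-256 hash deterministic.
    assert_eq!(canonicalize("title: Draft \r\nbody\u{0007}\r\n"), "title: Draft\nbody");
    println!("canonical form is deterministic");
}
```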
The core is exposed via FFI (Foreign Function Interface) to:
- Python: For data scientists and AI researchers (planned)
- WASM/TypeScript: For web-based editors and visualizations (planned)
- CLI: For easy interaction via terminal (✅ implemented)
```rust
use core::{TracedDocument, KeyPair, DocStatus};

// 1. Create a new document
let mut doc = TracedDocument::new(
    "Market Analysis Q4 2025",
    "The market shows signs of saturation in the consumer electronics sector.",
)
.author("Research Team")
.set_status(DocStatus::Draft);

// 2. Add evidence with a trace block
doc.body.push_str("\n\n```trace\nsource: ./data/market_report.md\nexpected: \"15% decline\"\nmethod: automatic\n```\n");

// 3. Update all trace hashes (verify sources exist and compute SHA-256)
doc.update_all_traces(&std::env::current_dir().unwrap()).unwrap();

// 4. Seal the document (compute version_hash, sign with Ed25519, set status to Published).
//    This also increments doc_version and chains prev_version_hash.
let keypair = KeyPair::generate();
doc = doc.seal(&keypair);

// 5. Verify the document's integrity and authenticity
doc.verify(&keypair.verifying_key).expect("Document verification failed");

// 6. The document is now cryptographically sealed
println!("Version Hash: {:?}", doc.frontmatter.version_hash);
println!("Status: {:?}", doc.frontmatter.doc_status);                 // Published
println!("Protocol Version: {:?}", doc.frontmatter.protocol_version); // "1.0"
println!("Document Version: {:?}", doc.frontmatter.doc_version);      // 1
```

Detailed documentation for the protocol and its implementation:
- Concept: The philosophy and "Truth Engine" vision behind the project.
- Architecture: High-level system design, data flow, and Mermaid diagrams.
- Trace Protocol: Detailed specification of the `trace` block for evidence verification.
- Include Protocol: Specification for modular document composition using `include` blocks.
- Versioning: Protocol and document versioning strategy.
- JSON Schema: Formal schema definition for Traced Markdown Documents.
- Sample Document: An example of a `.tmd` file following the protocol.
| Status | Hash Required | Signature Required | Use Case |
|---|---|---|---|
| Notes | No | No | Brainstorming, raw ideas |
| Draft | Recommended | No | Work in progress, assembling evidence |
| Published | Yes | Yes | Final, immutable, auditable documents |
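The gates in this table can be read as a simple predicate per status. As a sketch (status names taken from the API example above; the predicate itself is illustrative, not the core's actual API):

```rust
enum DocStatus {
    Notes,
    Draft,
    Published,
}

/// Returns (hash_required, signature_required) per the status table above.
fn requirements(status: &DocStatus) -> (bool, bool) {
    match status {
        DocStatus::Notes => (false, false),
        DocStatus::Draft => (false, false), // hash is recommended, not required
        DocStatus::Published => (true, true),
    }
}

fn main() {
    assert_eq!(requirements(&DocStatus::Published), (true, true));
    assert_eq!(requirements(&DocStatus::Draft), (false, false));
    assert_eq!(requirements(&DocStatus::Notes), (false, false));
    println!("status gates encoded");
}
```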
When a document is "sealed", it:
- Updates the document status to `Published` and sets the `modified_at` timestamp
- Canonicalizes the content (normalizes line endings to LF, strips trailing whitespace, sorts YAML keys)
- Computes a SHA-256 `version_hash` of the frontmatter (excluding `version_hash`, `prev_version_hash`, and `signature`) + body
- Signs the hash with the author's Ed25519 private key
Any modification to the document after sealing will cause verification to fail.
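To make the exclusion rule concrete, here is a rough sketch of how the hash input could be assembled. The helper and its flat key/value representation are hypothetical; the real core works over its typed frontmatter struct and then hashes the result with SHA-256:

```rust
/// Hypothetical helper: build the byte string that would be hashed when
/// sealing — sorted frontmatter keys, minus the self-referential fields,
/// followed by the document body. Illustrative only.
fn hash_input(frontmatter: &[(&str, &str)], body: &str) -> String {
    // These fields depend on the hash/signature themselves, so they
    // cannot be part of their own hash input.
    const EXCLUDED: [&str; 3] = ["version_hash", "prev_version_hash", "signature"];

    let mut fields: Vec<_> = frontmatter
        .iter()
        .filter(|(k, _)| !EXCLUDED.contains(k))
        .collect();
    fields.sort_by_key(|(k, _)| *k); // deterministic key order

    let mut input = String::new();
    for (key, value) in fields {
        input.push_str(key);
        input.push_str(": ");
        input.push_str(value);
        input.push('\n');
    }
    input.push_str(body);
    input
}

fn main() {
    let fm = [
        ("title", "Q4 Analysis"),
        ("author", "Research Team"),
        ("version_hash", "deadbeef"), // excluded from its own hash input
    ];
    let input = hash_input(&fm, "Body text.");
    assert!(!input.contains("deadbeef"));
    assert!(input.starts_with("author: ")); // keys are sorted
    println!("{input}");
}
```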
Alpha. Core Rust implementation complete with:
- ✅ Document hashing and canonicalization
- ✅ Ed25519 signing and verification
- ✅ Trace block parsing and granular hash updates
- ✅ Full seal-and-verify workflow
- ✅ Protocol versioning (Current/Deprecated/Obsolete)
- ✅ Document versioning with hash chaining
- ✅ CLI tool with seal, verify, status, init, keygen commands
- ✅ Include block for modular composition
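The hash-chaining item above implies a simple invariant: each version's `prev_version_hash` must equal the `version_hash` of the version before it. A minimal sketch of that check (illustrative; the real core verifies real SHA-256 hashes, not opaque strings):

```rust
/// Sketch: verify that a sequence of document versions forms an unbroken
/// chain. Each tuple is (prev_version_hash, version_hash), oldest first.
fn chain_is_intact(versions: &[(Option<&str>, &str)]) -> bool {
    let mut expected_prev: Option<&str> = None; // the first version has no predecessor
    for (prev, hash) in versions {
        if *prev != expected_prev {
            return false; // broken or reordered chain
        }
        expected_prev = Some(*hash);
    }
    true
}

fn main() {
    let chain = [(None, "h1"), (Some("h1"), "h2"), (Some("h2"), "h3")];
    assert!(chain_is_intact(&chain));

    let broken = [(None, "h1"), (Some("bogus"), "h2")];
    assert!(!chain_is_intact(&broken));
    println!("chain checks pass");
}
```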
Next Steps:
- Python bindings via PyO3
- WASM bindings for browser-based validation
- External version registry support
- Citation chain verification (warn when citing deprecated docs)
```sh
# Clone the repository
git clone https://github.com/dimitriberti/rhodi.git
cd rhodi/core

# Run tests
cargo test

# Build the CLI
cargo build --release

# Or run directly
cargo run -- init my-document.tmd --title "My Research"
```

```sh
# Create a new document
rhodi init doc.tmd --title "Research Notes" --author "Your Name"

# Generate a signing key
rhodi keygen --name default

# Seal the document (hash + sign)
rhodi seal doc.tmd

# Verify integrity
rhodi verify doc.tmd

# Check document status
rhodi status doc.tmd
```

For more details, see the CLI help: `rhodi --help`
Contributions are welcome! Please read AGENTS.md for development guidelines, then:
- Fork the repo
- Create a feature branch
- Make your changes
- Run `cargo test` and `cargo clippy -- -D warnings`
- Submit a pull request
Dual licensed under MIT (open source) and commercial terms. See LICENSE file for details.
As AI agents increasingly participate in knowledge work, the "Chain of Custody" for information is becoming critical for trust. If you cannot trace a piece of reasoning back to a verified human or a signed data source, the output becomes unreliable.