C4 — Universal Content Identification

Everything in Unix is a file, except for the filesystem itself. Until now.

C4 is a file management system designed to simplify distributed systems. It decouples file storage and transport from organization and metadata by providing a standardized content-derived identifier, and using these identifiers in place of file content in a simple, readable, text-based filesystem format.

$ echo "hello" | c4
c447Fm3BJZQ62765jMZJH4m28hrDM7Szbj9CUmj4F4gnvyDYXYz4WfnK2nYRhFvRgYEectEXYBYWLDpLo6XGNAfKdt

That 90-character string is a SMPTE ST 2114:2017 identifier — an international standard built on SHA-512. Run it on any machine, in any language, ten years from now. Same input, same ID. Unlike a UUID, which is assigned arbitrarily and requires coordination to be meaningful, a C4 ID is discovered from the data itself. If two people have a file with the same ID, they have identical copies of the same file.

C4 extends this idea to entire filesystems. c4 id produces a c4m file — a plain-text description that captures the identity of every file and directory in a tree:

$ c4 id ./project/
-rw-r--r-- 2026-03-18T22:22:57Z 13 README.md c44iCq6un9W47x7ydjJSWp4arMJ...
drwxr-xr-x 2026-03-18T22:22:57Z 66 src/ c44nbgL6nkBWsEBDCUCr4LufsjVhJt...
  -rw-r--r-- 2026-03-18T22:22:57Z 66 main.go c43Q4j81SxGkV9FhbeW23YrMTj6...

A c4m file is just text. Pipe it, grep it, diff it, email it. Each line looks like ls -l with a content ID at the end. The format is designed to compose with awk, sort, grep, and the rest of the Unix toolkit — no special tools required, though C4 provides some that are useful.

Install

Homebrew (recommended — includes c4 and c4sh)

brew install mrjoshuak/tap/c4

Binary downloads

Pre-built archives for macOS, Linux, and Windows: c4toolkit releases

From source

go install github.com/Avalanche-io/c4/cmd/c4@latest
go install github.com/Avalanche-io/c4sh@latest

Other languages

pip install c4py                    # Python
npm install @avalanche-io/c4        # TypeScript / JavaScript

See c4toolkit for the full suite and version matrix.

What can you do with it?

Know what changed. Snapshot a directory, come back later, compare:

c4 id ./deliverables/ > monday.c4m
# ... work happens ...
c4 diff monday.c4m ./deliverables/

Build history. Append diffs to a c4m file to create a version chain:

c4 id ./project/ > project.c4m                                  # snapshot
c4 diff project.c4m ./project/ >> project.c4m                    # append changes

c4 log project.c4m          # see what changed
c4 patch -n 1 project.c4m   # recover the original state

Make a directory match a description. Declarative — you say what the result should be, C4 figures out how to get there:

c4 patch target.c4m ./dir/

Creates, moves, and removes files as needed. Nothing starts until all required content is confirmed available. Safe to re-run after interruption.

Store and retrieve content by ID:

c4 id -s ./final/ > delivery.c4m     # identify + store
c4 cat c44iCq6un9W47...              # retrieve by ID

The store can be a local directory, an S3-compatible object store, or both. Content goes in once, comes out by ID.

Undo what you just did:

c4 patch -s new_state.c4m ./dir/ > changeset.c4m    # apply + preserve
c4 patch -r changeset.c4m ./dir/                     # revert

Work with Unix tools. One entry per line, fields in predictable positions:

awk '{print $NF}' project.c4m | sort | uniq -d       # find duplicates
grep '\.exr ' project.c4m | wc -l                    # count EXR files

Merge two trees. Combine branches, deliveries, or any two c4m files into one:

c4 merge branch-a.c4m branch-b.c4m > merged.c4m

Conflicts (same path, different content) are reported. Non-overlapping entries are combined.

Scan fast, hash later. Structure-only scans skip content hashing. Edit the manifest, then hash only what survived:

c4 id -m s ./project/ > scan.c4m     # names only
vi scan.c4m                           # remove what you don't want
c4 id -c scan.c4m ./project/          # hash the rest

Commands

Command	What it does
`c4 id`	Identify files, directories, or c4m files
`c4 cat`	Retrieve/display content by C4 ID or c4m file path
`c4 diff`	Compare two states (c4m files or directories)
`c4 patch`	Apply a target state: reconcile directories, resolve chains
`c4 merge`	Combine two or more trees
`c4 log`	Show patch history
`c4 split`	Split a patch chain
`c4 explain`	Human-readable narration of what a command would do
`c4 paths`	Convert between c4m format and plain path lists
`c4 intersect`	Find common entries between two c4m files

Every command that takes a c4m file also takes a directory, and vice versa. c4 <path> is a shortcut for c4 id -s (identify and store). echo "data" | c4 produces a bare ID from stdin and stores it.

The c4m format

A c4m file is a complete filesystem description in plain text. Each line has permissions, timestamp, size, name, and C4 ID. Readable, editable, diffable, composable.

A 10,000-entry c4m file is about 1.4 MB. It describes the identity of every file in the tree regardless of how large those files are. The description is the lightweight handle; the content is the heavy thing it refers to.

Go library

import "github.com/Avalanche-io/c4"

id := c4.Identify(strings.NewReader("hello"))
fmt.Println(id)
// c447Fm3BJZQ62765jMZJH4m28hrDM7Szbj9CUmj4F4gnvyDYXYz4WfnK2nYRhFvRgYEectEXYBYWLDpLo6XGNAfKdt

C4 IDs are 90-character base58 strings — SHA-512 with a c4 prefix. URL-safe, filename-safe, double-click selectable.

Zero external dependencies. Go 1.16+.

Links

C4 Framework Universal Asset ID (video)
C4 ID Whitepaper
CLI Reference
Getting Started
FAQ — design decisions (SHA-512 permanence, c4m format, store scaling)

The C4 toolkit

C4 is part of a cross-language ecosystem. Every tool reads and writes the same c4m format and produces identical C4 IDs:

Tool	Language	What it does
c4	Go	CLI — identify, diff, patch, merge
c4sh	Go	Shell — cd into c4m files, browse, copy
c4py	Python	Library — scan, diff, verify, store
c4git	Go	Git filter — version large files
c4ts	TypeScript	Browser + Node.js — zero dependencies
c4-swift	Swift	Apple platforms — Sendable, Codable
libc4	C	Embed in any application

See c4toolkit for the full suite, install options, and version matrix.

License

Apache 2.0

Name		Name	Last commit message	Last commit date
Latest commit History 271 Commits
.github/workflows		.github/workflows
c4m		c4m
cmd/c4		cmd/c4
design		design
docs		docs
reconcile		reconcile
scan		scan
store		store
.gitignore		.gitignore
ARCHITECTURAL_CATALOG.md		ARCHITECTURAL_CATALOG.md
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
CONTRIBUTORS		CONTRIBUTORS
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
doc.go		doc.go
go.mod		go.mod
go.sum		go.sum
id.go		id.go
id_test.go		id_test.go
internals_test.go		internals_test.go
tree.go		tree.go
tree_test.go		tree_test.go
treeslice_bench_test.go		treeslice_bench_test.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

C4 — Universal Content Identification

Install

Homebrew (recommended — includes c4 and c4sh)

Binary downloads

From source

Other languages

What can you do with it?

Commands

The c4m format

Go library

Links

The C4 toolkit

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors 5

Languages

Folders and files

Latest commit

History

Repository files navigation

C4 — Universal Content Identification

Install

Homebrew (recommended — includes c4 and c4sh)

Binary downloads

From source

Other languages

What can you do with it?

Commands

The c4m format

Go library

Links

The C4 toolkit

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors 5

Languages

Packages