ARIA Protocol

Autonomous Responsible Intelligence Architecture

An AI that's truly yours.

Using capable AI today means renting it from a handful of cloud providers who own the model, the compute, your data, and the memory of every conversation you have. ARIA gives that back. It's an open-source, peer-to-peer protocol that runs AI on hardware you already own — the model you choose, the compute you control, and the data and memory that never leave your machine.

ARIA runs 1-bit/ternary models CPU-first, scales up through standard quantized and specialist models when you want more, and links nodes into a peer-to-peer network. That network gives you two things: access to models bigger than your own hardware can hold, and a resilient, license-gated library of open weights.

What you actually own

Sovereignty here means ownership across the four things cloud AI centralizes:

The model — open weights you choose and can inspect, not a black box behind an API.
The compute — your hardware, CPU-first, GPU-optional.
The data & memory — they stay on your device; nothing is uploaded to be mined.
The governance — MIT-licensed, auditable, community-governed, with a public no-rug-pull commitment.

Private, offline inference is table stakes — tools like llama.cpp already give you that. Sovereignty is the larger claim: ownership and control end to end, including the parts the local-AI world still leaves centralized — model distribution, long-term memory, and governance.

Pillar 1 — Inference that fits your hardware

ARIA runs the same protocol across tiers, so a vintage laptop and a modern workstation each get the best model they can actually run:

Tier	Models (representative)	Backend	Memory floor	Use case
🌱 Efficiency	BitNet b1.58 2B-4T · Falcon-E 1B/3B · Falcon3 1.58-bit 1B–10B	`bitnet.cpp`	0.4 GB RAM	Always-on chat on any CPU; low-power laptops; background nodes
⚡ Quality	Gemma 4 · Qwen 3.5 · GLM-4 9B · Granite 4.0 · Phi-4 mini · SmolLM3 · OLMo 3 · Ministral 3	mainline `llama.cpp`	1.6 GB RAM	Multilingual chat, longer context, multimodal (vision + audio)
🛠️ Specialist	Qwen2.5-Coder 7B · DeepSeek-R1 0528 8B · Granite 3.3 8B · Phi-4 mini reasoning · Qwen3-VL 2B	mainline `llama.cpp`	1.5 GB RAM	Code, reasoning, math, vision — routed on demand
🚀 Accelerated	Qwen3 8B/14B · Phi-4 14B · Mistral Nemo 12B · DeepSeek-R1 14B	`llama.cpp` + GPU offload	4 GB VRAM	Dense models on a discrete GPU / Apple Silicon (≈6–12 GB VRAM)

A node operator picks one of the profile presets (minimal · efficient (default) · balanced · full · specialist_only), and the profile decides which tiers light up. The default keeps nodes lean (Efficiency only); heavier profiles add Quality and Specialist; the Accelerated tier engages automatically when a supported GPU is detected.
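The profile-to-tier relationship described above can be sketched as a small lookup. This is an illustration only: the README states that `efficient` enables the Efficiency tier alone and that a detected GPU adds Accelerated, but the exact tier sets for `minimal`, `balanced`, `full` and `specialist_only` below are assumptions, as are the names `Tier`, `PROFILE_TIERS` and `active_tiers`.

```python
from enum import Enum


class Tier(Enum):
    EFFICIENCY = "efficiency"
    QUALITY = "quality"
    SPECIALIST = "specialist"
    ACCELERATED = "accelerated"


# Hypothetical mapping. Only the "efficient" row is stated in the README;
# the other rows are plausible guesses, not ARIA's actual configuration.
PROFILE_TIERS = {
    "minimal": {Tier.EFFICIENCY},
    "efficient": {Tier.EFFICIENCY},
    "balanced": {Tier.EFFICIENCY, Tier.QUALITY},
    "full": {Tier.EFFICIENCY, Tier.QUALITY, Tier.SPECIALIST},
    "specialist_only": {Tier.SPECIALIST},
}


def active_tiers(profile: str = "efficient", gpu_detected: bool = False) -> set[Tier]:
    """Return the tiers a node would enable for a profile."""
    tiers = set(PROFILE_TIERS[profile])  # copy so the table is never mutated
    if gpu_detected:
        # Accelerated engages automatically when a supported GPU is present.
        tiers.add(Tier.ACCELERATED)
    return tiers
```

Keeping the GPU check outside the table mirrors the text: the operator chooses a profile, while the Accelerated tier depends on what the hardware probe finds.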

Full catalog, license matrix and model URLs: docs/MODELS.md.

Honest about the limits:

CPU-first means accessible and efficient, not "no GPU, ever." Most people have some GPU, and ARIA uses it when present (the Accelerated tier). We don't pretend the CPU is always fastest; we make running on the CPU good enough.
1-bit models are great for chat and simple tasks, but not yet reliable for agentic tool-use. That's a known capability gap we're actively working on, not a solved feature. We'd rather tell you up front than oversell it.

Pillar 2 — A library bigger than your machine

The peer-to-peer layer gives you two things a purely local setup can't:

Models bigger than your hardware — distributed inference lets a model that wouldn't fit on your machine run across willing peers.
A resilient, community-hosted library of open models — no central registry to throttle, paywall, or take down.

The license gate is a feature, not an afterthought. Only green-licensed weights propagate across the network — MIT, Apache 2.0, BSD, and the TII Falcon license (see docs/MODELS.md for the full green/yellow/red breakdown). A transparent denylist blocks illegal content. That's what keeps a community-hosted library both legal and trustworthy.
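As a rough sketch of the propagation rule above, a peer could combine the green-license check with the denylist before sharing a weight file. The license identifier spellings, the use of a SHA-256 content hash as the denylist key, and the function name `may_propagate` are all assumptions made for illustration; the real rules live in docs/MODELS.md and the codebase.

```python
# Green licenses named in this README; the identifier spellings are assumed.
GREEN_LICENSES = frozenset({"MIT", "Apache-2.0", "BSD", "TII-Falcon"})


def may_propagate(license_id: str, sha256_hex: str, denylist: set[str]) -> bool:
    """Decide whether a weight file may be shared with peers.

    Hypothetical logic: the denylist is checked first, then the license.
    """
    if sha256_hex in denylist:
        return False  # blocked content never propagates, whatever its license
    return license_id in GREEN_LICENSES
```

Checking the denylist first means a blocked file stays blocked even if it is labeled with a green license.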

Built on infrastructure already in the codebase: Kademlia DHT, NAT traversal, Ed25519 identities, TLS transport.
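For readers new to Kademlia, the core idea is that peers and content share one ID space and "closeness" is the XOR of two IDs. The sketch below shows that metric using only the standard library. How ARIA actually derives node IDs is not stated here, so hashing a public key with SHA-256 is an assumption, and random 32-byte strings stand in for real Ed25519 public keys.

```python
import hashlib
import os


def node_id(public_key: bytes) -> int:
    """Hypothetical ID derivation: SHA-256 of the key, read as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(public_key).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR of two IDs."""
    return a ^ b


def closest_peers(target: int, peers: list[int], k: int = 3) -> list[int]:
    """Return the k peer IDs closest to the target under XOR distance."""
    return sorted(peers, key=lambda p: xor_distance(p, target))[:k]


# Stand-ins for Ed25519 public keys (which are 32 bytes long).
peer_ids = [node_id(os.urandom(32)) for _ in range(8)]
# Look up the peers responsible for a model, keyed by its catalog ID.
nearest = closest_peers(node_id(b"BitNet-b1.58-2B-4T"), peer_ids)
```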

Honest about the limits: the network's value is real even at small scale — a resilient library and bigger-than-your-hardware models don't need thousands of nodes. We deliberately under-promise on network size; ARIA is built to be useful to one person on one laptop first.

Pillar 3 — A memory that's yours (planned — not yet shipped)

Status: roadmap. This describes where ARIA is going, not what it does today.

The goal: an AI that actually remembers you — your context, your preferences, the things you asked it to follow up on — entirely on your device, with nothing in the cloud. It maps to the five types of human memory, including prospective memory (remembering to do things later), which no major open-source agent framework ships.

It's sequenced after the core (inference + library) lands, and we'll describe it as planned until it's actually built. Memory is the most personal data a system holds — keeping it 100% local is the strongest expression of "an AI that's yours."

Quick start

Install

pip install aria-protocol

Start a node

# Default profile is "efficient" — 1-bit only, low RAM, any CPU
aria node start --port 8765

# OpenAI-compatible API
aria api start --port 3000

Load a model

# Efficiency tier (default)
aria model download BitNet-b1.58-2B-4T

# Quality tier (after switching profile)
aria node profile set balanced
aria model download Gemma-4-E2B

# Specialist tier (after switching profile)
aria node profile set full
aria model download Qwen2.5-Coder-7B-Instruct

Use with the OpenAI client

from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="aria")

response = client.chat.completions.create(
    model="BitNet-b1.58-2B-4T",
    messages=[{"role": "user", "content": "What is quantum computing?"}],
)
print(response.choices[0].message.content)

The router picks a model automatically when none is specified — pass a catalog ID when you want a specific one. Full walkthrough: docs/getting-started.md.

Supported models

ARIA v0.9.5 ships with 30 active models across four tiers. Every entry passes a strict license gate at import — the catalog only contains models under MIT, Apache 2.0, or TII Falcon licenses, so peer-to-peer redistribution stays friction-free. Models considered and rejected on licensing grounds (e.g. Llama 3.x, Gemma 3, Mistral research) are listed in docs/MODELS.md with the rejection reasoning.

Tier	# models	License surface
🌱 Efficiency	8	MIT · TII Falcon 2.0
⚡ Quality	10	Apache 2.0 · MIT
🛠️ Specialist	5	Apache 2.0 · MIT
🚀 Accelerated	7	Apache 2.0 · MIT

Counts are active models (30 total). The catalog source (aria/model_catalog.py) carries 33 entries. The three extras are two superseded models (hidden in the desktop UI) and Whisper Large v3 Turbo, which is reserved for the v1.0 audio backend.

Adding a model is a pull request against aria/model_catalog.py. The gate refuses non-permissive licenses at import time, so the roster cannot drift.
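One way an import-time gate like this can work is to validate the license when each catalog entry is constructed, so a bad entry fails as soon as the module is imported. The sketch below is not the real `ModelEntry`: its fields, the allowed-license spellings and `LicenseGateError` are assumptions for illustration.

```python
from dataclasses import dataclass

# Licenses the README says the catalog contains; spellings are assumed.
ALLOWED_LICENSES = frozenset({"MIT", "Apache-2.0", "TII-Falcon"})


class LicenseGateError(ValueError):
    """Raised when a catalog entry carries a non-permitted license."""


@dataclass(frozen=True)
class ModelEntry:
    # Hypothetical fields; the real class lives in aria/model_catalog.py.
    model_id: str
    tier: str
    license: str

    def __post_init__(self) -> None:
        # Runs on construction, so a module-level catalog fails at import.
        if self.license not in ALLOWED_LICENSES:
            raise LicenseGateError(
                f"{self.model_id}: license {self.license!r} is not permitted"
            )


CATALOG = [ModelEntry("BitNet-b1.58-2B-4T", "efficiency", "MIT")]
```

Because the check runs in `__post_init__`, a pull request that adds a non-permitted entry breaks the import, and therefore the test suite, before it can merge.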

Hardware

ARIA detects the local CPU/GPU/NPU at startup and ships the snapshot in the peer handshake, so routers can prefer hardware-friendly peers:

aria hardware info

CPU detection covers Intel / AMD / Apple Silicon / Qualcomm Snapdragon, including the AVX-512 capability bitnet.cpp uses for native ternary kernels. Discrete GPUs are used by the Accelerated tier.
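To give a feel for what a hardware snapshot might contain, here is a minimal standard-library sketch. It is not ARIA's detection code or handshake format: the field names are hypothetical, and real detection of AVX-512, GPUs and NPUs needs platform-specific probing that this example leaves out.

```python
import json
import os
import platform


def hardware_snapshot() -> dict:
    """Collect a few portable facts about the host (hypothetical field names)."""
    return {
        "arch": platform.machine(),      # e.g. "x86_64" or "arm64"
        "system": platform.system(),     # e.g. "Linux", "Darwin", "Windows"
        "logical_cpus": os.cpu_count(),  # may be None on unusual platforms
    }


# Serialized form that could travel inside a handshake message.
payload = json.dumps(hardware_snapshot())
```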

NPU: detection ships today; acceleration is on the roadmap. ARIA recognizes AMD XDNA/XDNA2, Intel NPU, Qualcomm Hexagon and Apple ANE devices, but inference still runs on CPU/GPU — real NPU acceleration is a v1.0 target (OpenVINO, QNN, Core ML). See docs/NPU_SUPPORT.md.

Benchmarks

Real numbers, reproducible from the repo. Each table is measured on a single host, so its rows are directly comparable to each other; when comparing across hosts, treat absolute throughput as indicative only.

Ecosystem benchmark (Zen 4)

Hardware: AMD Ryzen 9 7845HX (12C/24T, Zen 4, 64 GB DDR5)
Build: bitnet.cpp + Clang, AVX-512 enabled · 8 threads, 256 tokens, 5 runs, median

Model	Params	Type	tok/s
BitNet-b1.58-large	0.7B	post-quantized	118.25
Falcon-E-1B-Instruct	1.0B	native 1-bit	80.19
Falcon3-1B-Instruct	1.0B	post-quantized	56.31
Falcon-E-3B-Instruct	3.0B	native 1-bit	49.80
BitNet-b1.58-2B-4T	2.4B	native 1-bit	37.76
Falcon3-3B-Instruct	3.0B	post-quantized	33.21
Falcon3-7B-Instruct	7.0B	post-quantized	19.89
Falcon3-10B-Instruct	10.0B	post-quantized	15.12

Key finding: models natively trained in 1-bit (Falcon-E) outperform post-training quantized models by +42% at 1B and +50% at 3B on identical hardware. Native ternary training matters more than parameter count below 7B.
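The percentages in the key finding follow directly from the table. This short check recomputes them from the published tok/s values; the helper name `speedup_pct` is just for illustration.

```python
# Throughput values copied from the Zen 4 table above (tok/s).
TOK_S = {
    "Falcon-E-1B-Instruct": 80.19,  # native 1-bit
    "Falcon3-1B-Instruct": 56.31,   # post-quantized
    "Falcon-E-3B-Instruct": 49.80,  # native 1-bit
    "Falcon3-3B-Instruct": 33.21,   # post-quantized
}


def speedup_pct(native: float, post_quantized: float) -> float:
    """Relative throughput gain of the native model, in percent."""
    return (native / post_quantized - 1.0) * 100.0


gain_1b = speedup_pct(TOK_S["Falcon-E-1B-Instruct"], TOK_S["Falcon3-1B-Instruct"])
gain_3b = speedup_pct(TOK_S["Falcon-E-3B-Instruct"], TOK_S["Falcon3-3B-Instruct"])
print(f"1B: +{gain_1b:.0f}%  3B: +{gain_3b:.0f}%")  # prints "1B: +42%  3B: +50%"
```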

Cross-generation (Zen 5)

AMD Ryzen AI 9 HX 370 (Zen 5, AVX-512 enabled). Average improvement: +35% across the 7 models measured on both hosts.

Model	Zen 4 (t/s)	Zen 5 (t/s)	Δ
Falcon-E-1B	80.19	103.59	+29%
Falcon3-1B	56.31	78.16	+39%
BitNet-2B-4T	37.76	51.82	+37%
Falcon-E-3B	49.80	65.19	+31%
Falcon3-3B	33.21	46.77	+41%
Falcon3-7B	19.89	28.45	+43%
Falcon3-10B	15.12	19.39	+28%
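The Δ column and the +35% average can be reproduced from the two throughput columns. The snippet below does so with the values from the table; nothing in it is specific to ARIA's benchmark harness.

```python
from statistics import mean

# model: (Zen 4 tok/s, Zen 5 tok/s), copied from the table above
RESULTS = {
    "Falcon-E-1B": (80.19, 103.59),
    "Falcon3-1B": (56.31, 78.16),
    "BitNet-2B-4T": (37.76, 51.82),
    "Falcon-E-3B": (49.80, 65.19),
    "Falcon3-3B": (33.21, 46.77),
    "Falcon3-7B": (19.89, 28.45),
    "Falcon3-10B": (15.12, 19.39),
}

# Per-model relative gain in percent.
deltas = {model: (zen5 / zen4 - 1.0) * 100.0 for model, (zen4, zen5) in RESULTS.items()}
average = mean(list(deltas.values()))

for model, delta in deltas.items():
    print(f"{model:14s} +{delta:.0f}%")
print(f"average        +{average:.0f}%")
```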

Hybrid-core (big.LITTLE-style) CPUs need model-size-aware thread tuning: throughput for a 1B model peaks around 6 threads, while a 7B model peaks around 20. Full results and reproduction harness: benchmarks/.
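Thread tuning of this kind amounts to a sweep: measure throughput at several thread counts and keep the best. The sketch below shows the shape of such a sweep. The `measure` callable is a placeholder for whatever actually runs the benchmark, and `fake_measure` is a synthetic curve used only so the example runs; it is not measured data and not part of the harness in benchmarks/.

```python
from typing import Callable, Iterable


def best_thread_count(
    measure: Callable[[int], float], candidates: Iterable[int]
) -> tuple[int, float]:
    """Run measure(threads) for each candidate and return (threads, tok/s) of the best."""
    results = {threads: measure(threads) for threads in candidates}
    best = max(results, key=lambda t: results[t])
    return best, results[best]


def fake_measure(threads: int) -> float:
    """Synthetic stand-in for a real run: a curve that peaks at 6 threads."""
    return 100.0 - (threads - 6) ** 2


best, tok_s = best_thread_count(fake_measure, range(2, 25, 2))
```

In practice `measure` would launch the inference backend with the given thread count and return the median tok/s over several runs, matching the methodology stated for the tables above.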

Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                         ARIA PROTOCOL v0.9.5                         │
├──────────────────────────────────────────────────────────────────────┤
│ SERVICE    OpenAI-compatible API · Desktop App · CLI · Dashboard     │
├──────────────────────────────────────────────────────────────────────┤
│ CONSENSUS  Provenance Ledger · Proof of Useful Work ·                │
│            Proof of Sobriety · Consent Contracts                     │
├──────────────────────────────────────────────────────────────────────┤
│ COMPUTE    SmartRouter -> BackendDispatcher -> two llama-servers     │
│              - bitnet.cpp  :8081   Efficiency (1.58-bit ternary)     │
│              - llama.cpp   :8082   Quality / Specialist (CPU);       │
│                                    Accelerated = + CUDA/GPU offload  │
├──────────────────────────────────────────────────────────────────────┤
│ NETWORK    P2P · WebSocket · Kademlia DHT · NAT traversal ·          │
│            Ed25519 identities · TLS                                  │
└──────────────────────────────────────────────────────────────────────┘

The router is a pure function: it takes a query, classifies it, picks (tier, model_id) from a routing table, and returns a routing decision with a fallback chain. Backends are independent processes the router dispatches to by model ID. Detailed view: docs/architecture.md. P2P wire format: docs/protocol-spec.md.
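The description above can be sketched in a few lines. This is a toy illustration of the "pure function" shape, not the code in aria/smart_router.py: the keyword classifier, the table contents and the names `RoutingDecision`, `ROUTING_TABLE`, `classify` and `route` are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingDecision:
    tier: str
    model_id: str
    fallbacks: tuple[str, ...]  # model IDs to try if the first choice fails


# Hypothetical routing table: category -> ordered (tier, model_id) candidates.
ROUTING_TABLE = {
    "code": [
        ("specialist", "Qwen2.5-Coder-7B-Instruct"),
        ("efficiency", "BitNet-b1.58-2B-4T"),
    ],
    "chat": [("efficiency", "BitNet-b1.58-2B-4T")],
}

CODE_KEYWORDS = ("def ", "function", "bug", "compile", "traceback")


def classify(query: str) -> str:
    """Toy keyword classifier; a real one would be far more careful."""
    lowered = query.lower()
    return "code" if any(word in lowered for word in CODE_KEYWORDS) else "chat"


def route(query: str) -> RoutingDecision:
    """Pure function: the same query always yields the same decision."""
    (tier, model_id), *rest = ROUTING_TABLE[classify(query)]
    return RoutingDecision(tier, model_id, tuple(m for _, m in rest))
```

Because `route` reads only its argument and a constant table, it has no side effects and can be unit-tested without starting any backend process.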

Documentation

Document	Description
Getting Started	Install, first node, per-tier examples
Models	Full catalog, license matrix, model URLs
NPU Support	Detection today, acceleration roadmap
Architecture	Tiers, backends, P2P
Protocol Spec	Peer-to-peer wire protocol
API Reference	OpenAI-compatible HTTP endpoints
Threat Model	Security analysis
Smart Router	Routing table, classifier, fallback
Benchmarks	Methodology and full result sets
Roadmap	All versions and tasks

Desktop app

Download latest release — Windows, macOS (Intel + Apple Silicon), Linux.

A chat-centric interface with a tier badge, profile-preset switcher, hardware panel, and layout presets; a mode switch separates Chat from Node (Dashboard, Models, Energy, Network). See desktop/README.md for build instructions.

Why ARIA?

Problem	What ARIA does
You don't own cloud AI — you rent it	Runs on hardware you own; the model, data and memory are yours
Capable AI seems to require expensive GPUs	1-bit/ternary models run efficiently CPU-first; GPU is optional
Your hardware caps what you can run	The P2P network reaches models bigger than your machine
Model distribution is centralized and gatekept	A resilient, license-gated peer library of open weights
Outputs are untraceable	Every inference is recorded on the provenance ledger
One provider means one dependency	An open catalog from 8+ organizations under permissive licenses

Principles

CPU-first · explicit consent · energy sobriety · useful work · provenance · transparency (MIT, auditable) · accessibility (one-click node).

No token. Ever. ARIA is founded on reputation and contribution, not on a token or any financial instrument. Contributors earn a contribution score for useful work; standing on the network is a reputation requirement, not a stake.

Contributing

Pull requests welcome. To keep ARIA trustworthy as it grows, the project commits up front to: a permissive license (MIT/Apache-2.0) on the core, contributions under a DCO (not a CLA, so contributors keep their rights), upstream attribution, and the enforced license gate on the model catalog.

Areas where help is most useful:

New models — add a ModelEntry to aria/model_catalog.py (the license gate enforces P2P-compatible licenses at import).
Routing improvements — better classifiers or routing tables in aria/smart_router.py.
Backends — wiring acceleration for GPU/NPU under aria/backends/.
Docs and examples — every example in examples/ is welcome.

Development setup

git clone https://github.com/spmfrance-cloud/aria-protocol.git
cd aria-protocol
pip install -e ".[dev]"
make test

Code style: PEP 8, type hints on public APIs, focused functions, tests alongside the change.

License

MIT. See LICENSE.

Citation

@misc{aria2026,
  author = {Anthony MURGO},
  title  = {ARIA: Autonomous Responsible Intelligence Architecture},
  year   = {2026},
  url    = {https://github.com/spmfrance-cloud/aria-protocol}
}

Acknowledgments

Microsoft Research BitNet — 1-bit ternary research and bitnet.cpp
TII Falcon — Falcon-Edge and Falcon3 1.58-bit families
ggml-org — mainline llama.cpp and GGUF builds
Qwen team — Qwen families
DeepSeek — distilled reasoning weights
OpenBMB — MiniCPM-V vision

An AI that's truly yours.
