ARIA Protocol

Autonomous Responsible Intelligence Architecture

An AI that's truly yours.

Using capable AI today means renting it from a handful of cloud providers who own the model, the compute, your data, and the memory of every conversation you have. ARIA gives that back. It's an open-source, peer-to-peer protocol that runs AI on hardware you already own: the model you choose, the compute you control, and the data and memory that never leave your machine.

ARIA runs 1-bit/ternary models CPU-first, scales up through standard quantized and specialist models when you want more, and links nodes into a peer-to-peer network so you can reach models bigger than your own hardware, along with a resilient, license-gated library of open weights.


What you actually own

Sovereignty here means ownership across the four things cloud AI centralizes:

  • The model: open weights you choose and can inspect, not a black box behind an API.
  • The compute: your hardware, CPU-first, GPU-optional.
  • The data & memory: they stay on your device; nothing is uploaded to be mined.
  • The governance: MIT-licensed, auditable, community-governed, with a public no-rug-pull commitment.

Private, offline inference is table stakes; tools like llama.cpp already give you that. Sovereignty is the larger claim: ownership and control end to end, including the parts the local-AI world still leaves centralized, namely model distribution, long-term memory, and governance.


Pillar 1: Inference that fits your hardware

ARIA runs the same protocol across tiers, so a vintage laptop and a modern workstation each get the best model they can actually run:

| Tier | Models (representative) | Backend | Memory floor | Use case |
|---|---|---|---|---|
| 🌱 Efficiency | BitNet b1.58 2B-4T · Falcon-E 1B/3B · Falcon3 1.58-bit 1B–10B | bitnet.cpp | 0.4 GB RAM | Always-on chat on any CPU; low-power laptops; background nodes |
| ⚡ Quality | Gemma 4 · Qwen 3.5 · GLM-4 9B · Granite 4.0 · Phi-4 mini · SmolLM3 · OLMo 3 · Ministral 3 | mainline llama.cpp | 1.6 GB RAM | Multilingual chat, longer context, multimodal (vision + audio) |
| 🛠️ Specialist | Qwen2.5-Coder 7B · DeepSeek-R1 0528 8B · Granite 3.3 8B · Phi-4 mini reasoning · Qwen3-VL 2B | mainline llama.cpp | 1.5 GB RAM | Code, reasoning, math, vision, routed on demand |
| 🚀 Accelerated | Qwen3 8B/14B · Phi-4 14B · Mistral Nemo 12B · DeepSeek-R1 14B | llama.cpp + GPU offload | 4 GB VRAM | Dense models on a discrete GPU / Apple Silicon (≈6–12 GB VRAM) |

A node operator picks one of the profile presets (minimal · efficient (default) · balanced · full · specialist_only), and the profile decides which tiers light up. The default keeps nodes lean (Efficiency only); heavier profiles add Quality and Specialist; the Accelerated tier engages automatically when a supported GPU is detected.
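As a rough illustration of how a profile could translate into active tiers, here is a minimal sketch. The names `Tier`, `PROFILE_TIERS` and `active_tiers` are hypothetical, not ARIA's API. Only the rows for `efficient`, `balanced` and `full` follow from this README; the rows for `minimal` and `specialist_only` are guesses.

```python
from enum import Enum


class Tier(Enum):
    EFFICIENCY = "efficiency"
    QUALITY = "quality"
    SPECIALIST = "specialist"
    ACCELERATED = "accelerated"


# Hypothetical mapping. "minimal" and "specialist_only" are assumptions;
# the other three rows mirror what the README states.
PROFILE_TIERS = {
    "minimal": {Tier.EFFICIENCY},
    "efficient": {Tier.EFFICIENCY},
    "balanced": {Tier.EFFICIENCY, Tier.QUALITY},
    "full": {Tier.EFFICIENCY, Tier.QUALITY, Tier.SPECIALIST},
    "specialist_only": {Tier.SPECIALIST},
}


def active_tiers(profile: str = "efficient", gpu_detected: bool = False) -> set[Tier]:
    """Return the tiers a node would serve for a given profile."""
    tiers = set(PROFILE_TIERS[profile])  # copy so the table is never mutated
    if gpu_detected:
        # The Accelerated tier is driven by hardware, not by the profile.
        tiers.add(Tier.ACCELERATED)
    return tiers
```

Keeping the GPU check outside the profile table reflects the statement that the Accelerated tier engages on detection rather than by operator choice.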

Full catalog, license matrix and model URLs: docs/MODELS.md.

Honest about the limits:

  • CPU-first means accessible and efficient, not "no GPU, ever." Most people have some GPU, and ARIA uses it when present (the Accelerated tier). We don't pretend the CPU is always fastest; we make sure the CPU alone is enough.
  • 1-bit models are great for chat and simple tasks, but not yet reliable for agentic tool-use. That's a known capability gap we're actively working on, not a solved feature. We'd rather tell you up front than oversell it.

Pillar 2: A library bigger than your machine

The peer-to-peer layer gives you two things a purely local setup can't:

  1. Models bigger than your hardware: distributed inference lets a model that wouldn't fit on your machine run across willing peers.
  2. A resilient, community-hosted library of open models, with no central registry to throttle, paywall, or take down.

The license gate is a feature, not an afterthought. Only green-licensed weights propagate across the network: MIT, Apache 2.0, BSD, and the TII Falcon license (see docs/MODELS.md for the full green/yellow/red breakdown). A transparent denylist blocks illegal content. That's what keeps a community-hosted library both legal and trustworthy.
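The two checks described above (green license, not denylisted) can be pictured as a single predicate. This is a sketch only: the function name, the license identifiers and the idea of keying the denylist by content hash are assumptions, not ARIA's actual implementation.

```python
# Assumed identifiers for the green licenses named in the text.
GREEN_LICENSES = frozenset({"MIT", "Apache-2.0", "BSD", "TII-Falcon"})


def may_propagate(license_id: str, content_hash: str, denylist: frozenset[str]) -> bool:
    """Return True only for green-licensed weights that are not denylisted."""
    return license_id in GREEN_LICENSES and content_hash not in denylist
```

For example, `may_propagate("MIT", "abc123", frozenset())` is true, while the same call with `"abc123"` in the denylist is false regardless of the license.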

Built on infrastructure already in the codebase: Kademlia DHT, NAT traversal, Ed25519 identities, TLS transport.
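For readers new to Kademlia, the core idea is that peers are ordered by the XOR of their IDs. The sketch below shows that metric in isolation. Deriving a 256-bit node ID by hashing the Ed25519 public key with SHA-256 is an assumption made for illustration; ARIA's actual ID scheme is defined in docs/protocol-spec.md.

```python
import hashlib


def node_id(public_key: bytes) -> int:
    """Assumed ID scheme: SHA-256 of the node's public key, as an integer."""
    return int.from_bytes(hashlib.sha256(public_key).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR of two node IDs."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Index of the k-bucket for a peer: position of the highest differing bit."""
    distance = xor_distance(own_id, other_id)
    if distance == 0:
        raise ValueError("a node does not store itself in a bucket")
    return distance.bit_length() - 1


def closest_peers(target: int, peers: list[int], k: int = 3) -> list[int]:
    """Return the k known peers nearest to the target under XOR distance."""
    return sorted(peers, key=lambda peer: xor_distance(peer, target))[:k]
```

With small IDs the behaviour is easy to check by hand: for target 8 and peers 1, 9, 12 and 15, the two closest are 9 (distance 1) and 12 (distance 4).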

Honest about the limits: the network's value is real even at small scale; a resilient library and bigger-than-your-hardware models don't need thousands of nodes. We deliberately under-promise on network size; ARIA is built to be useful to one person on one laptop first.


Pillar 3: A memory that's yours (planned, not yet shipped)

Status: roadmap. This describes where ARIA is going, not what it does today.

The goal: an AI that actually remembers you (your context, your preferences, the things you asked it to follow up on) entirely on your device, with nothing in the cloud. It maps to the five types of human memory, including prospective memory (remembering to do things later), which no major open-source agent framework ships.

It's sequenced after the core (inference + library) lands, and we'll describe it as planned until it's actually built. Memory is the most personal data a system holds; keeping it 100% local is the strongest expression of "an AI that's yours."


Quick start

Install

pip install aria-protocol

Start a node

# Default profile is "efficient": 1-bit only, low RAM, any CPU
aria node start --port 8765

# OpenAI-compatible API
aria api start --port 3000

Load a model

# Efficiency tier (default)
aria model download BitNet-b1.58-2B-4T

# Quality tier (after switching profile)
aria node profile set balanced
aria model download Gemma-4-E2B

# Specialist tier (after switching profile)
aria node profile set full
aria model download Qwen2.5-Coder-7B-Instruct

Use with the OpenAI client

from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="aria")

response = client.chat.completions.create(
    model="BitNet-b1.58-2B-4T",
    messages=[{"role": "user", "content": "What is quantum computing?"}],
)
print(response.choices[0].message.content)

The router picks a model automatically when none is specified; pass a catalog ID when you want a specific one. Full walkthrough: docs/getting-started.md.
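The OpenAI Python client requires a `model` argument, so letting the router choose means sending the request yourself. The sketch below builds such a request with the standard library. It assumes the server accepts a chat completion body with no `model` field, which is inferred from the sentence above rather than confirmed; `build_chat_request` and `post_chat` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(prompt: str, model: Optional[str] = None) -> dict:
    """Build a chat completion body; omit "model" to defer to the router."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    if model is not None:
        payload["model"] = model  # pin a specific catalog ID
    return payload


def post_chat(payload: dict, base_url: str = "http://localhost:3000/v1") -> dict:
    """POST the body to the local API started with `aria api start`."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer aria"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a node and API running, `post_chat(build_chat_request("What is quantum computing?"))` would leave model selection to the router, while passing `model="BitNet-b1.58-2B-4T"` pins it.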


Supported models

ARIA v0.9.5 ships with 30 active models across four tiers. Every entry passes a strict license gate at import: the catalog contains only models under MIT, Apache 2.0, or TII Falcon licenses, so peer-to-peer redistribution stays friction-free. Models considered and rejected on licensing grounds (e.g. Llama 3.x, Gemma 3, Mistral research) are listed in docs/MODELS.md with the rejection reasoning.

| Tier | # models | License surface |
|---|---|---|
| 🌱 Efficiency | 8 | MIT · TII Falcon 2.0 |
| ⚡ Quality | 10 | Apache 2.0 · MIT |
| 🛠️ Specialist | 5 | Apache 2.0 · MIT |
| 🚀 Accelerated | 7 | Apache 2.0 · MIT |

Counts are active models (30 total). The catalog source (aria/model_catalog.py) carries 33 entries; the three extras are two superseded models (hidden in the desktop UI) and Whisper Large v3 Turbo, which is reserved for the v1.0 audio backend.

Adding a model is a pull request against aria/model_catalog.py. The gate refuses non-permissive licenses at import time, so the roster cannot drift.
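One way an import-time gate like this can work is to validate each entry as it is constructed, so that a bad license makes the catalog module fail to import. The sketch below illustrates that pattern only; the `ModelEntry` fields, the license identifiers and `LicenseGateError` are hypothetical and may not match aria/model_catalog.py.

```python
from dataclasses import dataclass

# Assumed identifiers for the licenses the catalog accepts.
ALLOWED_LICENSES = frozenset({"MIT", "Apache-2.0", "TII-Falcon"})


class LicenseGateError(ValueError):
    """Raised when a catalog entry carries a non-permissive license."""


@dataclass(frozen=True)
class ModelEntry:
    model_id: str
    tier: str
    license: str

    def __post_init__(self) -> None:
        # Runs on construction, so a module-level catalog is checked at import.
        if self.license not in ALLOWED_LICENSES:
            raise LicenseGateError(
                f"{self.model_id}: license {self.license!r} is not redistributable"
            )


# Building the catalog at module level is what makes the gate fire on import.
CATALOG = (
    ModelEntry("BitNet-b1.58-2B-4T", "efficiency", "MIT"),
    ModelEntry("Qwen2.5-Coder-7B-Instruct", "specialist", "Apache-2.0"),
)
```

Under this design a pull request that adds an entry with a rejected license breaks the test suite immediately, instead of relying on reviewers to notice.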


Hardware

ARIA detects the local CPU/GPU/NPU at startup and ships the snapshot in the peer handshake, so routers can prefer hardware-friendly peers:

aria hardware info

CPU detection covers Intel / AMD / Apple Silicon / Qualcomm Snapdragon, including the AVX-512 capability bitnet.cpp uses for native ternary kernels. Discrete GPUs are used by the Accelerated tier.

NPU: detection ships today; acceleration is on the roadmap. ARIA recognizes AMD XDNA/XDNA2, Intel NPU, Qualcomm Hexagon and Apple ANE devices, but inference still runs on CPU/GPU; real NPU acceleration is a v1.0 target (OpenVINO, QNN, Core ML). See docs/NPU_SUPPORT.md.
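To make the idea of shipping a hardware snapshot in the peer handshake concrete, here is a deliberately small sketch using only the standard library. The message shape and every field name are hypothetical; the real snapshot (including AVX-512, GPU and NPU detection) is whatever `aria hardware info` reports and docs/protocol-spec.md defines.

```python
import json
import os
import platform


def hardware_snapshot() -> dict:
    """Collect a minimal, portable description of the local machine."""
    return {
        "arch": platform.machine(),
        "system": platform.system(),
        "logical_cpus": os.cpu_count(),
        # GPU and NPU probing is out of scope for this sketch.
        "gpu": None,
        "npu": None,
    }


def handshake_message(node_id: str) -> str:
    """Serialize a hypothetical hello message that carries the snapshot."""
    return json.dumps(
        {"type": "hello", "node_id": node_id, "hardware": hardware_snapshot()}
    )
```

A router receiving such messages could then rank peers by core count or accelerator presence before dispatching a request.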


Benchmarks

Real numbers, reproducible from the repo. Each table below was measured on a single host, so figures within a table are directly comparable; treat absolute throughput as indicative when comparing against other hosts.

Ecosystem benchmark (Zen 4)

Hardware: AMD Ryzen 9 7845HX (12C/24T, Zen 4, 64 GB DDR5)
Build: bitnet.cpp + Clang, AVX-512 enabled · 8 threads, 256 tokens, 5 runs, median

| Model | Params | Type | tok/s |
|---|---|---|---|
| BitNet-b1.58-large | 0.7B | post-quantized | 118.25 |
| Falcon-E-1B-Instruct | 1.0B | native 1-bit | 80.19 |
| Falcon3-1B-Instruct | 1.0B | post-quantized | 56.31 |
| Falcon-E-3B-Instruct | 3.0B | native 1-bit | 49.80 |
| BitNet-b1.58-2B-4T | 2.4B | native 1-bit | 37.76 |
| Falcon3-3B-Instruct | 3.0B | post-quantized | 33.21 |
| Falcon3-7B-Instruct | 7.0B | post-quantized | 19.89 |
| Falcon3-10B-Instruct | 10.0B | post-quantized | 15.12 |

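Key finding: on identical hardware, models natively trained in 1-bit (Falcon-E) generate tokens faster than post-training quantized models of the same size, by +42% at 1B (80.19 vs 56.31 tok/s) and +50% at 3B (49.80 vs 33.21 tok/s). Native ternary training matters more than parameter count below 7B: the native 3B model runs within 12% of the post-quantized 1B model.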
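The two percentages can be re-derived from the Zen 4 table with a few lines of arithmetic. The snippet below is only a check of the published figures; the helper name `speedup_pct` is ours.

```python
# Zen 4 throughput in tok/s, copied from the ecosystem benchmark table.
ZEN4_TOK_S = {
    "Falcon-E-1B-Instruct": 80.19,
    "Falcon3-1B-Instruct": 56.31,
    "Falcon-E-3B-Instruct": 49.80,
    "Falcon3-3B-Instruct": 33.21,
}

# (native 1-bit model, post-quantized model) at each size.
PAIRS = {
    "1B": ("Falcon-E-1B-Instruct", "Falcon3-1B-Instruct"),
    "3B": ("Falcon-E-3B-Instruct", "Falcon3-3B-Instruct"),
}


def speedup_pct(native: float, post_quantized: float) -> float:
    """Relative throughput gain of the native model, in percent."""
    return (native / post_quantized - 1.0) * 100.0


for size, (native, post) in PAIRS.items():
    gain = speedup_pct(ZEN4_TOK_S[native], ZEN4_TOK_S[post])
    print(f"{size}: {gain:+.0f}%")  # prints "1B: +42%" then "3B: +50%"
```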

Cross-generation (Zen 5)

AMD Ryzen AI 9 HX 370 (Zen 5, native 512-bit AVX-512). Average improvement: +35% across 7 models.

| Model | Zen 4 (t/s) | Zen 5 (t/s) | Δ |
|---|---|---|---|
| Falcon-E-1B | 80.19 | 103.59 | +29% |
| Falcon3-1B | 56.31 | 78.16 | +39% |
| BitNet-2B-4T | 37.76 | 51.82 | +37% |
| Falcon-E-3B | 49.80 | 65.19 | +31% |
| Falcon3-3B | 33.21 | 46.77 | +41% |
| Falcon3-7B | 19.89 | 28.45 | +43% |
| Falcon3-10B | 15.12 | 19.39 | +28% |
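The Δ column and the +35% average follow directly from the two throughput columns. A short check, with variable names of our own choosing:

```python
from statistics import mean

# (Zen 4 tok/s, Zen 5 tok/s), copied from the cross-generation table.
CROSS_GEN = {
    "Falcon-E-1B": (80.19, 103.59),
    "Falcon3-1B": (56.31, 78.16),
    "BitNet-2B-4T": (37.76, 51.82),
    "Falcon-E-3B": (49.80, 65.19),
    "Falcon3-3B": (33.21, 46.77),
    "Falcon3-7B": (19.89, 28.45),
    "Falcon3-10B": (15.12, 19.39),
}


def gain_pct(zen4: float, zen5: float) -> float:
    """Relative Zen 5 gain over Zen 4, in percent."""
    return (zen5 / zen4 - 1.0) * 100.0


gains = {model: gain_pct(*pair) for model, pair in CROSS_GEN.items()}
for model, gain in gains.items():
    print(f"{model:<14}{gain:+.0f}%")
print(f"average: {mean(list(gains.values())):+.0f}%")  # prints "average: +35%"
```

The average rounds to +35% whether it is taken over the unrounded gains or over the rounded per-model percentages.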

CPUs that mix performance and efficiency cores (big.LITTLE-style) need model-size-aware thread tuning: throughput peaks around 6 threads for 1B models and around 20 threads for 7B models. Full results and reproduction harness: benchmarks/.


Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                         ARIA PROTOCOL v0.9.5                         │
├──────────────────────────────────────────────────────────────────────┤
│ SERVICE    OpenAI-compatible API · Desktop App · CLI · Dashboard     │
├──────────────────────────────────────────────────────────────────────┤
│ CONSENSUS  Provenance Ledger · Proof of Useful Work ·                │
│            Proof of Sobriety · Consent Contracts                     │
├──────────────────────────────────────────────────────────────────────┤
│ COMPUTE    SmartRouter -> BackendDispatcher -> two llama-servers     │
│              - bitnet.cpp  :8081   Efficiency (1.58-bit ternary)     │
│              - llama.cpp   :8082   Quality / Specialist (CPU);       │
│                                    Accelerated = + CUDA/GPU offload  │
├──────────────────────────────────────────────────────────────────────┤
│ NETWORK    P2P · WebSocket · Kademlia DHT · NAT traversal ·          │
│            Ed25519 identities · TLS                                  │
└──────────────────────────────────────────────────────────────────────┘

The router is a pure function: it takes a query, classifies it, picks (tier, model_id) from a routing table, and returns a routing decision with a fallback chain. Backends are independent processes the router dispatches to by model ID. Detailed view: docs/architecture.md. P2P wire format: docs/protocol-spec.md.
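A toy version of that flow (classify, look up the routing table, return a decision with fallbacks) might look like the sketch below. Everything in it is illustrative: the keyword classifier, the two-row routing table and the `RoutingDecision` fields are assumptions, and the real logic lives in aria/smart_router.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingDecision:
    category: str
    tier: str
    model_id: str
    fallbacks: tuple[str, ...]


# Hypothetical routing table: category -> (tier, model_id, fallback chain).
ROUTING_TABLE = {
    "code": (
        "specialist",
        "Qwen2.5-Coder-7B-Instruct",
        ("Gemma-4-E2B", "BitNet-b1.58-2B-4T"),
    ),
    "chat": ("efficiency", "BitNet-b1.58-2B-4T", ()),
}

# Crude keyword hints standing in for a real classifier.
CODE_HINTS = ("def ", "class ", "traceback", "compile", "function")


def classify(query: str) -> str:
    """Label a query as "code" or "chat" from keyword hints."""
    lowered = query.lower()
    return "code" if any(hint in lowered for hint in CODE_HINTS) else "chat"


def route(query: str) -> RoutingDecision:
    """Pure function: the same query always yields the same decision."""
    category = classify(query)
    tier, model_id, fallbacks = ROUTING_TABLE[category]
    return RoutingDecision(category, tier, model_id, fallbacks)
```

Because `route` reads no state beyond its argument and a constant table, it can be unit-tested without starting any backend, and a dispatcher can walk `fallbacks` in order if the first model is unavailable.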


Documentation

| Document | Description |
|---|---|
| Getting Started | Install, first node, per-tier examples |
| Models | Full catalog, license matrix, model URLs |
| NPU Support | Detection today, acceleration roadmap |
| Architecture | Tiers, backends, P2P |
| Protocol Spec | Peer-to-peer wire protocol |
| API Reference | OpenAI-compatible HTTP endpoints |
| Threat Model | Security analysis |
| Smart Router | Routing table, classifier, fallback |
| Benchmarks | Methodology and full result sets |
| Roadmap | All versions and tasks |

Desktop app

Download latest release: Windows, macOS (Intel + Apple Silicon), Linux.

A chat-centric interface with a tier badge, profile-preset switcher, hardware panel, and layout presets; a mode switch separates Chat from Node (Dashboard, Models, Energy, Network). See desktop/README.md for build instructions.


Why ARIA?

| Problem | What ARIA does |
|---|---|
| You don't own cloud AI; you rent it | Runs on hardware you own; the model, data and memory are yours |
| Capable AI seems to require expensive GPUs | 1-bit/ternary models run efficiently CPU-first; GPU is optional |
| Your hardware caps what you can run | The P2P network reaches models bigger than your machine |
| Model distribution is centralized and gatekept | A resilient, license-gated peer library of open weights |
| Outputs are untraceable | Every inference is recorded on the provenance ledger |
| One provider means one dependency | An open catalog from 8+ organizations under permissive licenses |

Principles

CPU-first · explicit consent · energy sobriety · useful work · provenance · transparency (MIT, auditable) · accessibility (one-click node).

No token. Ever. ARIA is founded on reputation and contribution, not on a token or any financial instrument. Contributors earn a contribution score for useful work; standing on the network is a reputation requirement, not a stake.


Contributing

Pull requests welcome. To keep ARIA trustworthy as it grows, the project commits up front to: a permissive license (MIT/Apache-2.0) on the core, contributions under a DCO (not a CLA, so contributors keep their rights), upstream attribution, and the enforced license gate on the model catalog.

Areas where help is most useful:

  • New models: add a ModelEntry to aria/model_catalog.py (the license gate enforces P2P-compatible licenses at import).
  • Routing improvements: better classifiers or routing tables in aria/smart_router.py.
  • Backends: wiring acceleration for GPU/NPU under aria/backends/.
  • Docs and examples: every example in examples/ is welcome.

Development setup

git clone https://github.com/spmfrance-cloud/aria-protocol.git
cd aria-protocol
pip install -e ".[dev]"
make test

Code style: PEP 8, type hints on public APIs, focused functions, tests alongside the change.


License

MIT. See LICENSE.

Citation

@misc{aria2026,
  author = {Anthony MURGO},
  title  = {ARIA: Autonomous Responsible Intelligence Architecture},
  year   = {2026},
  url    = {https://github.com/spmfrance-cloud/aria-protocol}
}
