Work state: SCAFFOLD · Progress: ███░░░░░░░ 25%
LLM capacity planner + fleet ledger + desktop inference runtime (Rust core + per-OS native GUIs). Pre-alpha Phase 0; ambitious 11-crate plan, 42 files committed so far (mostly docs/scaffold). · updated 2026-06-02
LLM capacity planner + fleet ledger + desktop inference runtime.
Not a financial ledger. hwLedger tracks hardware fleet audit and provenance for machine learning workloads. It provides per-layer VRAM estimation for LLMs, reconciles predictions against live telemetry from inference engines (MLX, mistral.rs, llama.cpp, vLLM, TGI), and maintains an event-sourced audit log for heterogeneous compute fleets (Apple Silicon, NVIDIA/AMD, cloud rentals).
Status: pre-alpha, Phase 0 bootstrap. See PLAN.md for the implementation roadmap.
hwLedger is an Apache-2.0 desktop app + agent/server pair that:
- Plans VRAM and throughput for any HF / GGUF / MLX / Ollama model, correctly handling dense, MoE, MLA, GQA, sliding-window, SSM/Mamba, and hybrid-attention architectures — with a slider UX over a live per-layer breakdown.
- Reconciles predictions against live telemetry from MLX, mistral.rs, llama.cpp, vLLM, or TGI.
- Runs inference locally on Apple Silicon via a forked oMlx sidecar with SSD-paged KV cache.
- Ledgers a heterogeneous fleet — local NVIDIA/AMD boxes, Apple Silicon laptops, cheap cloud rentals (Vast.ai, RunPod, Lambda) — with a shared event-sourced audit log, dispatch planner, and spot-price-aware cost model.
- Ships as per-OS native GUIs (SwiftUI / WinUI 3 / Qt 6 + Slint) over a shared Rust FFI core.
A hobbyist-sized fleet with enterprise bones.
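To make the "event-sourced audit log" idea concrete, here is a minimal sketch of the general pattern: fleet state is never mutated in place but derived by replaying an append-only list of events. All names here (`FleetEvent`, `AgentState`, `Ledger`) are hypothetical illustrations, not types from the hwLedger crates, and a real ledger would also persist, timestamp, and sign its events.

```rust
use std::collections::BTreeMap;

/// Hypothetical fleet events; the real hwLedger event schema may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum FleetEvent {
    AgentRegistered { agent: String, vram_gib: u32 },
    JobDispatched { agent: String, vram_gib: u32 },
    JobFinished { agent: String, vram_gib: u32 },
    AgentRetired { agent: String },
}

/// State derived for one agent by replaying the log.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct AgentState {
    pub total_vram_gib: u32,
    pub used_vram_gib: u32,
}

/// Append-only log: events are only ever pushed, never edited or removed.
#[derive(Default)]
pub struct Ledger {
    events: Vec<FleetEvent>,
}

impl Ledger {
    pub fn append(&mut self, event: FleetEvent) {
        self.events.push(event);
    }

    /// Rebuild the current fleet view from scratch by folding over every event.
    pub fn replay(&self) -> BTreeMap<String, AgentState> {
        let mut fleet: BTreeMap<String, AgentState> = BTreeMap::new();
        for event in &self.events {
            match event {
                FleetEvent::AgentRegistered { agent, vram_gib } => {
                    fleet.insert(
                        agent.clone(),
                        AgentState { total_vram_gib: *vram_gib, used_vram_gib: 0 },
                    );
                }
                FleetEvent::JobDispatched { agent, vram_gib } => {
                    if let Some(state) = fleet.get_mut(agent) {
                        state.used_vram_gib += *vram_gib;
                    }
                }
                FleetEvent::JobFinished { agent, vram_gib } => {
                    if let Some(state) = fleet.get_mut(agent) {
                        // Saturate so a duplicated "finished" event cannot underflow.
                        state.used_vram_gib = state.used_vram_gib.saturating_sub(*vram_gib);
                    }
                }
                FleetEvent::AgentRetired { agent } => {
                    fleet.remove(agent);
                }
            }
        }
        fleet
    }
}
```

Because the log is the source of truth, an audit can replay any prefix of it to see what the fleet looked like at that point, which is the main reason to prefer this shape over a mutable state table.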
Install the CLI from source — this is the fastest path to a working hwLedger today:
    cargo install --path crates/hwledger-cli
    hwledger --help

Web fallback (no compilation needed): A Streamlit web interface is available at `apps/streamlit/`. Run `cargo run -p hwledger-devtools -- up` to launch it locally on localhost:8501 along with the API server.
macOS DMG note: Native macOS DMG distribution is currently blocked on Apple Developer certificate renewal — see WP21 Apple Developer secrets for setup instructions. Use the Streamlit fallback or CLI above for now.
Every existing public VRAM calculator (HF Accelerate, can-it-run-llm, LM Studio's gauge) gets MoE and MLA wrong — they under-count KV cache and over-count MoE throughput. hwLedger's math core is architecture-keyed: it dispatches per AttentionKind (MHA / GQA / MQA / MLA / Sliding / SSM / Hybrid / Sink) and treats resident-vs-active parameters separately for MoE. See PLAN.md §5.
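As a rough illustration of what "architecture-keyed" dispatch can look like, the sketch below matches on an attention kind to pick a per-layer KV-cache formula, and keeps resident and active parameter counts separate for MoE. The formulas are the commonly used approximations, and every type, field, and function name is an assumption made for this example rather than the actual `hwledger-core` API. Hybrid and Sink are left out; a hybrid model can be approximated by summing per-layer results over layers of different kinds.

```rust
/// Attention families, with the dimensions each formula needs.
/// Hypothetical shape; the real `AttentionKind` may carry different data.
#[derive(Debug, Clone, Copy)]
pub enum AttentionKind {
    /// Multi-head: every query head has its own K and V head.
    Mha { heads: u64, head_dim: u64 },
    /// Grouped-query: `kv_heads` K/V heads shared among the query heads.
    Gqa { kv_heads: u64, head_dim: u64 },
    /// Multi-query: a single shared K/V head.
    Mqa { head_dim: u64 },
    /// Multi-head latent: one compressed latent plus a decoupled RoPE key per token.
    Mla { latent_dim: u64, rope_dim: u64 },
    /// Sliding window: the cache stops growing after `window` tokens.
    Sliding { kv_heads: u64, head_dim: u64, window: u64 },
    /// State-space layer: fixed-size recurrent state, no per-token cache.
    Ssm { state_bytes: u64 },
}

/// Approximate cache bytes for ONE layer at `seq_len` tokens,
/// with `dtype_bytes` bytes per stored element.
pub fn kv_bytes_per_layer(kind: AttentionKind, seq_len: u64, dtype_bytes: u64) -> u64 {
    match kind {
        // The leading 2 accounts for storing both K and V.
        AttentionKind::Mha { heads, head_dim } => 2 * heads * head_dim * seq_len * dtype_bytes,
        AttentionKind::Gqa { kv_heads, head_dim } => {
            2 * kv_heads * head_dim * seq_len * dtype_bytes
        }
        AttentionKind::Mqa { head_dim } => 2 * head_dim * seq_len * dtype_bytes,
        // MLA caches the latent and the RoPE key once per token, not per head.
        AttentionKind::Mla { latent_dim, rope_dim } => {
            (latent_dim + rope_dim) * seq_len * dtype_bytes
        }
        AttentionKind::Sliding { kv_heads, head_dim, window } => {
            2 * kv_heads * head_dim * seq_len.min(window) * dtype_bytes
        }
        AttentionKind::Ssm { state_bytes } => state_bytes,
    }
}

/// Parameter counts for a mixture-of-experts model (hypothetical struct).
#[derive(Debug, Clone, Copy)]
pub struct MoeParams {
    pub shared: u64,
    pub per_expert: u64,
    pub experts: u64,
    pub active_experts: u64,
}

impl MoeParams {
    /// Everything that must sit in memory: all experts count toward VRAM.
    pub fn resident_params(&self) -> u64 {
        self.shared + self.per_expert * self.experts
    }

    /// What one token actually touches: only the routed experts count toward compute.
    pub fn active_params(&self) -> u64 {
        self.shared + self.per_expert * self.active_experts
    }
}
```

For instance, a GQA layer with 8 KV heads of dimension 128 at 4096 tokens in fp16 comes to 2 · 8 · 128 · 4096 · 2 = 16,777,216 bytes (16 MiB). Sizing memory from `active_params` or throughput from `resident_params` is the kind of mix-up the separation is meant to prevent.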
- Core: Rust workspace (`hwledger-core`, `-arch`, `-ingest`, `-probe`, `-inference`, `-ledger`, `-fleet-proto`, `-agent`, `-server`, `-cli`, `-ffi`)
- Sidecar: `sidecars/omlx-fork/` (fat fork of jundot/omlx, Apache-2.0)
- Native apps: `apps/macos/` (SwiftUI + UniFFI + XCFramework), `apps/windows/` (WinUI 3 + .NET 9 + csbindgen), `apps/linux-qt/` (Qt 6 + cxx-qt + QML), `apps/linux-slint/` (Rust-native)
- Fleet wire: Axum + rustls mTLS for agents; russh + deadpool for SSH agentless; reqwest for Vast/RunPod/Lambda/Modal; `tailscale status --json` for tailnet discovery
See the component diagram in PLAN.md §4.1.
`KooshaPari/pheno-capacity`: VRAM estimation, model-fit scoring, Chinchilla tokens, optimizer state. `no_std`-compatible Rust crate. Active since 2026-06-18 (L5-105). Streamlit Planner/WhatIf pages will consume this in Phase 2 (see docs/integrations/cost-model-migration.md).
One-liner to build FFI + launch server, docs-site, and Streamlit:
    cargo run -p hwledger-devtools -- up

See docs-site/getting-started/dev-setup.md for ports, log locations, and troubleshooting (FFI auto-build, Swift "engine missing" sheet, Streamlit hot-reload).
- PLAN.md — phased WBS + DAG + risks + reuse opportunities
- PRD.md — product requirements (forthcoming)
- ADR.md — index of architecture decisions (see `docs/adr/`)
- CHARTER.md — scope + principles (forthcoming)
- AGENTS.md — AI-agent operating notes (forthcoming)
- docs/research/ — archived Haiku research briefs (oMlx, MLX IPC, inference engines, KV cache formulas, config ingestion, GPU telemetry, Swift/WinUI/Qt FFI, fleet wire, competitor survey)
| Phase | Status |
|---|---|
| P0 Foundation | in progress |
| P1 Math core | planned |
| P2 Ingestion + probe | planned |
| P3 macOS GUI MVP | planned |
| P4 Inference | planned (macOS only in MVP) |
| P5 Fleet | planned |
| P6 Windows GUI | deferred |
| P7 Linux GUI | deferred |
| WP21 macOS Release | code complete (awaiting notarization credentials) |
Tracked in AgilePlus: feature hwledger-v1-macos-mvp (see agileplus status).
WP21 deliverables (macOS distribution):
- Codesigning infrastructure: READY (Developer ID cert installed, entitlements defined, scripts complete)
- GitHub Actions release workflow: READY (release.yml deployed)
- DMG + notarization flow: READY (scripts deployed, awaiting App Store Connect credentials)
- Sparkle integration: READY (Package.swift updated, updater wired, key generation documented)
- Documentation: READY (docs/reports/WP21-APPLE-DEV-SECRETS.md with step-by-step setup)
Apache-2.0. See LICENSE.
Journey: `install-cargo` — Install hwledger from source with cargo, then verify version and help
Intent: Terminal prompt, about to run cargo install. Verified: pass.
Full recorded journey: apps/cli-journeys/manifests/install-cargo/manifest.verified.json
Journey: `first-plan` — Run your first plan with colored output showing token distribution for 4 users
Intent: Full VRAM breakdown table — weights · KV cache · activations · overhead. Verified: overall score 0.92.
Annotated keyframe (VRAM fits indicator):
Full manifest: apps/cli-journeys/manifests/first-plan/manifest.verified.json
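The four rows named in the intent above (weights, KV cache, activations, overhead) and the "fits" indicator can be sketched as a simple sum checked against device memory. This is an illustrative assumption about how such a breakdown composes, not the CLI's actual code: `VramBreakdown`, `for_users`, and `GIB` are hypothetical names, and the sketch scales only the KV cache with the number of concurrent users, which is a simplification.

```rust
/// Bytes in one GiB, used to keep the example readable.
pub const GIB: u64 = 1 << 30;

/// Byte counts for the four rows of a VRAM breakdown (hypothetical struct).
#[derive(Debug, Clone, Copy)]
pub struct VramBreakdown {
    pub weights: u64,
    pub kv_cache: u64,
    pub activations: u64,
    pub overhead: u64,
}

impl VramBreakdown {
    /// Build a breakdown where each concurrent user holds their own KV cache.
    /// Weights, activations, and overhead are treated as shared here.
    pub fn for_users(
        weights: u64,
        kv_per_user: u64,
        users: u64,
        activations: u64,
        overhead: u64,
    ) -> Self {
        Self { weights, kv_cache: kv_per_user * users, activations, overhead }
    }

    /// Total predicted VRAM in bytes.
    pub fn total(&self) -> u64 {
        self.weights + self.kv_cache + self.activations + self.overhead
    }

    /// True when the prediction fits within the device's memory.
    pub fn fits(&self, device_bytes: u64) -> bool {
        self.total() <= device_bytes
    }
}
```

With 14 GiB of weights, 1 GiB of KV cache per user for 4 users, and 1 GiB each for activations and overhead, the total is 20 GiB, which fits a 24 GiB device but not a 16 GiB one.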
Journey: `fleet-register` — Register a new agent with the fleet, then verify it appears in fleet status
Intent: Device announces GPU inventory, receives mTLS cert, joins gossip network.
Annotated keyframe (registration confirmed):
Full manifest: apps/cli-journeys/manifests/fleet-register/manifest.verified.json
Journey: `traceability-report` — Generate a markdown traceability report with coverage data and inspect the output
Full MP4 (richer quality): apps/cli-journeys/recordings/traceability-report/traceability-report.rich.mp4
Annotated keyframe (traceability runner start):
Full manifest: apps/cli-journeys/manifests/traceability-report/manifest.verified.json
Nearest recorded journey: `plan-mla-deepseek` — Show MLA classification and KV sequence invariance across 2K, 32K, 128K sequences (dedicated vram-reconcile journey not yet recorded; this shows the prediction side)
Full MP4: apps/cli-journeys/recordings/plan-mla-deepseek/plan-mla-deepseek.rich.mp4
Annotated keyframe (MLA classification + KV invariance):
Full manifest: apps/cli-journeys/manifests/plan-mla-deepseek/manifest.verified.json
Nearest recorded journey: `ingest-local-gguf` — Ingest a local GGUF model file and output JSON metadata (dedicated inference-run journey not yet recorded; this shows the ingest/load side)
Full manifest: apps/cli-journeys/manifests/ingest-local-gguf/manifest.verified.json
Journey: `probe-list` — List all available probes in both table and JSON formats
Full MP4: apps/cli-journeys/recordings/probe-list/probe-list.rich.mp4
Annotated keyframe (probe table output):
Full manifest: apps/cli-journeys/manifests/probe-list/manifest.verified.json
Nearest recorded journey: `probe-watch` — Watch probe metrics update in real time (dedicated cost-model journey not yet recorded; this shows live fleet telemetry)
Annotated keyframe (probe watch start):
Full manifest: apps/cli-journeys/manifests/probe-watch/manifest.verified.json
Journey: `fleet-audit` — Audit the fleet with a 3-agent limit to see agent metadata and status
Full MP4: apps/cli-journeys/recordings/fleet-audit/fleet-audit.rich.mp4
Annotated keyframe (fleet audit agent metadata):
Full manifest: apps/cli-journeys/manifests/fleet-audit/manifest.verified.json
Nearest recorded journey: `fleet-register` — Register a new agent with the fleet (dedicated fleet-dispatch journey not yet recorded; registration shows the fleet membership side of dispatch)
Full manifest: apps/cli-journeys/manifests/fleet-register/manifest.verified.json
Journey: `ingest-local-gguf` — Ingest a local GGUF model file and output JSON metadata
See also: `ingest-error` journey (error path).
Annotated keyframe (ingest error path):
Full manifests: ingest-local-gguf · ingest-error
Nearest recorded journey: `probe-watch` — Watch probe metrics update in real time with 1-second refresh intervals (dedicated telemetry-sync journey not yet recorded)
Full manifest: apps/cli-journeys/manifests/probe-watch/manifest.verified.json
Journey: `plan-mla-deepseek` — Show MLA classification and KV sequence invariance across 2K, 32K, 128K sequences
Full MP4: apps/cli-journeys/recordings/plan-mla-deepseek/plan-mla-deepseek.rich.mp4
Intent: MLA latent projection compresses KV by 16x vs full-rank GQA — sequence length invariant.
Annotated keyframe:
Full manifest: apps/cli-journeys/manifests/plan-mla-deepseek/manifest.verified.json
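One way to read the "sequence length invariant" claim in this journey's intent is that the compression ratio between GQA and MLA caches does not depend on context length. A short derivation under the usual approximations, with symbols introduced here for illustration: $n_{kv}$ KV heads of dimension $d_h$ for GQA, latent dimension $d_c$ and decoupled RoPE key dimension $d_r$ for MLA, $b$ bytes per element, $L$ layers, and $S$ cached tokens.

```latex
\begin{align*}
M_{\text{GQA}}(S) &= 2 \, n_{kv} \, d_h \, b \, L \, S \\
M_{\text{MLA}}(S) &= (d_c + d_r) \, b \, L \, S \\
\frac{M_{\text{GQA}}(S)}{M_{\text{MLA}}(S)} &= \frac{2 \, n_{kv} \, d_h}{d_c + d_r}
\end{align*}
```

Both caches grow linearly in $S$, so $S$ (along with $b$ and $L$) cancels in the ratio; the same factor holds at 2K, 32K, and 128K tokens. The specific value, such as the 16x quoted above, depends on the model's $n_{kv}$, $d_h$, $d_c$, and $d_r$ and is not derived here.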
Nearest recorded journey: `plan-hf-resolve` — Plan via HF resolver: bare repo id, full HF URL, and gold fixture shortcut (dedicated spot-price-scan journey not yet recorded; HF resolve shows the model-to-hardware cost estimation entry point)
Full manifest: apps/cli-journeys/manifests/plan-hf-resolve/manifest.verified.json


















