PrismLLM

Any model. Any hardware. Any size.

PrismLLM is a hardware-agnostic LLM inference library built for speed. Run any model — from 1B to 671B parameters — on any device, from a Raspberry Pi to a B200 cluster, through the Sparse Oracle Architecture.

Quick Start

from prismllm import PrismLLM

model = PrismLLM.load("bartowski/Meta-Llama-3.1-8B-Instruct-GGUF")

for token in model.stream("Hello, my name is"):
    print(token, end="", flush=True)

Install

pip install prismllm

Performance

Hardware	Model	tok/s
NVIDIA B200 (192GB HBM3e)	DeepSeek 671B	600–800
NVIDIA H100 (80GB)	DeepSeek 671B	250–350
NVIDIA RTX 5090	DeepSeek 671B	150–200
AMD Ryzen AI 9 365 APU	DeepSeek 671B	40–55

Sparse Oracle Architecture

For MoE models like DeepSeek 671B (activates 8/256 experts per token):

PRISM Quantization — top-20% experts at Q6_K, bottom-80% at Q2_K
NEXUS Compression — SVD low-rank compression: 10.65 MB → 1.18 MB per expert
PHANTOM Cache — 2-layer shadow router predicts expert activations 1–2 layers ahead
STRATUM Execution — Coalesced 8-expert DMA read + fused parallel GEMM
HELIOS Allocator — NUMA-aware memory partitioning for APU systems
TENSOR BRIDGE — 3-stage NPU/CPU/iGPU pipeline for AMD APUs
ORACLE Speculation — Expert fingerprint trie; 68% acceptance rate
CASCADE Pipeline — Inter-token overlap: token N at stage 3 while N+1 at stage 1

CLI

prismllm load <model>       # Interactive chat
prismllm serve <model>      # OpenAI-compatible server on :8000
prismllm bench <model>      # Benchmark tok/s
prismllm info <model>       # Model metadata

Supported Formats

GGUF, Safetensors, ONNX, PyTorch .bin, AWQ, GPTQ

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
dev		dev
docs/plans		docs/plans
examples		examples
prismllm-core		prismllm-core
prismllm		prismllm
tests		tests
.gitignore		.gitignore
Cargo.toml		Cargo.toml
README.md		README.md
pyproject.toml		pyproject.toml
setup_b200.sh		setup_b200.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PrismLLM

Quick Start

Install

Performance

Sparse Oracle Architecture

CLI

Supported Formats

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PrismLLM

Quick Start

Install

Performance

Sparse Oracle Architecture

CLI

Supported Formats

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages