OSAI - Open Source AI Kit

Find, recommend, and deploy the best open-source LLM for your stack. One command to go from repo analysis to a running local model — no API keys required.

  ___  ____    _    ___
 / _ \/ ___|  / \  |_ _|
| | | \___ \ / _ \  | |
| |_| |___) / ___ \ | |
 \___/|____/_/   \_\___|

Quick start

# Recommend + install + serve the best model for your project
npx osaikit run local --repo .

# Analyze a remote GitHub repo (no install)
npx osaikit --repo https://github.com/user/repo

# Refresh leaderboard data from 5 live sources
npx osaikit refresh

# Interactive wizard
npx osaikit

`run local` — one command to a running model

Analyzes your repo, picks the best open-source LLM, installs it via Ollama, and verifies the API is serving.

osaikit run local --repo .                    # auto-detect → deploy
osaikit run local --model qwen3-8b            # specific model
osaikit run local --model devstral-small-2    # SOTA coding model
osaikit run local                             # general-purpose defaults

What it does:

Scans your codebase (languages, frameworks, project size)
Scores 52 models across 7 dimensions and picks the best fit
Ensures Ollama is installed and running
Pulls the model
Verifies the API at http://localhost:11434/v1

Requires Ollama to be installed locally.
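The final verification step can be approximated as below. This is a minimal sketch that assumes Node.js 18+ (built-in `fetch`). It queries the `/v1/models` listing that Ollama exposes on its OpenAI-compatible API and checks whether a given model tag appears. The helper name `isModelServed` and its injectable `fetchImpl` parameter are hypothetical, not osaikit's actual code. Ollama tags use a `name:tag` form (e.g. `qwen3:8b`), which may differ from the ids accepted by `--model`.

```javascript
// Hypothetical helper (not osaikit's code): returns true when a local Ollama server
// answers on its OpenAI-compatible API and lists the given model tag.
// Requires Node.js 18+ for the built-in fetch; fetchImpl can be swapped out in tests.
async function isModelServed(tag, baseUrl = "http://localhost:11434/v1", fetchImpl = fetch) {
  let res;
  try {
    res = await fetchImpl(`${baseUrl}/models`);
  } catch {
    return false; // network error, e.g. connection refused because Ollama is not running
  }
  if (!res.ok) return false;
  const body = await res.json();
  const ids = (body.data ?? []).map((m) => m.id);
  // Ollama stores an untagged model name such as "llama3" as "llama3:latest".
  return ids.some((id) => id === tag || (!tag.includes(":") && id === `${tag}:latest`));
}

// Usage: isModelServed("qwen3:8b").then((ok) => console.log(ok ? "serving" : "not serving"));
```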

Recommend mode

Scores 52 open-source models across seven weighted dimensions to find the best fit for your project. Two modes:

`--repo <path>` — Auto-detect mode

Point at any local repo or remote GitHub URL. The analyzer scans your codebase and auto-detects languages, frameworks, runtime, platform, and project size — then skips straight to recommendations.

osaikit --repo .
osaikit --repo ~/projects/my-api
osaikit --repo https://github.com/user/repo   # clones + analyzes remotely
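A common way to auto-detect languages is to count file extensions. The sketch below assumes that a flat list of file paths has already been collected. The `EXTENSION_LANGUAGES` table and the `detectLanguages` function are illustrative and may not match how `src/analyzer/repo.js` actually works.

```javascript
// Illustrative extension -> language table (an assumption, not osaikit's mapping).
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const EXTENSION_LANGUAGES = new Map([
  ["js", "JavaScript"], ["jsx", "JavaScript"], ["mjs", "JavaScript"],
  ["ts", "TypeScript"], ["tsx", "TypeScript"],
  ["py", "Python"], ["go", "Go"], ["rs", "Rust"], ["java", "Java"], ["kt", "Kotlin"],
]);

// Count files per known language and return them sorted by file count, most common first.
function detectLanguages(filePaths) {
  const counts = new Map();
  for (const file of filePaths) {
    const match = /\.([^./\\]+)$/.exec(file); // extension after the last dot, if any
    const language = match ? EXTENSION_LANGUAGES.get(match[1].toLowerCase()) : undefined;
    if (!language) continue; // unknown or missing extension
    counts.set(language, (counts.get(language) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([language, files]) => ({ language, files }));
}
```

For example, `detectLanguages(["src/app.ts", "src/cli.ts", "scripts/build.py", "README.md"])` returns TypeScript (2 files) followed by Python (1 file). Detecting frameworks and runtimes takes more than extension counting, for example reading manifests such as `package.json`.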

Interactive wizard

Answer six quick questions about your development environment:

Role — web, backend, mobile, or game development
Tech stack — languages, frameworks, runtime, platform
Constraints — memory limit, budget, deployment mode, privacy
Use cases — code generation, debugging, architecture, review, docs, testing
License — permissive, copyleft, or any
Context — codebase size and whether it's an existing project
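The codebase-size answer feeds the RAG recommendation listed under What you get. A rough sketch of that comparison follows. It relies on two assumptions that are not osaikit's actual numbers: the common rule of thumb of about four characters per token (with one byte per character), and a quarter of the context window reserved for instructions and output. The `needsRag` name is hypothetical.

```javascript
// Rough sketch: estimate whether a codebase fits in a model's context window.
// Assumptions (not osaikit's logic): ~4 characters per token, 1 byte per character,
// and reserveRatio of the window kept free for instructions and the model's answer.
function needsRag(codebaseBytes, contextWindowTokens, { charsPerToken = 4, reserveRatio = 0.25 } = {}) {
  const estimatedTokens = Math.ceil(codebaseBytes / charsPerToken);
  const usableTokens = Math.floor(contextWindowTokens * (1 - reserveRatio));
  return { estimatedTokens, usableTokens, needsRag: estimatedTokens > usableTokens };
}
```

With these defaults, a 2 MB codebase (2,000,000 bytes) comes to about 500,000 tokens. That is far beyond the 98,304 usable tokens of a 131,072-token window, so `needsRag` is true.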

What you get

Primary recommendation with score breakdown across 7 dimensions
Quick start commands — copy-paste ollama run, vLLM Docker, llama.cpp, HuggingFace TGI, or LM Studio setup
License guidance — commercial use, fine-tuning rights, output ownership, training data provenance, and action items
Integration snippet — framework-specific code to wire the model into your stack (Express, FastAPI, Gin, Axum, etc.)
Fallback model from a different family
On-device option for local/offline use
Enterprise readiness score — managed hosting providers, SLA availability, SDK quality, community size
Tuned inference config (temperature, top-p, max tokens)
Starter prompt template for your use case
Cost and latency estimates (local vs. cloud)
RAG recommendation when your context exceeds the model's window
Warnings about license, age, memory, and latency tradeoffs
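The tuned inference config and starter prompt map directly onto an OpenAI-compatible chat completion request. The sketch below builds a request body for Ollama's `/v1/chat/completions` endpoint using the standard `model`, `messages`, `temperature`, `top_p`, and `max_tokens` fields. The `buildChatRequest` function, its `config` field names, and the sample values are hypothetical placeholders, not what osaikit generates.

```javascript
// Sketch: combine a tuned inference config and a starter prompt into an
// OpenAI-compatible chat completion request body.
function buildChatRequest(model, config, systemPrompt, userMessage) {
  return {
    model,
    messages: [
      { role: "system", content: systemPrompt }, // starter prompt template
      { role: "user", content: userMessage },
    ],
    temperature: config.temperature,
    top_p: config.topP,
    max_tokens: config.maxTokens,
  };
}

const request = buildChatRequest(
  "qwen3:8b",
  { temperature: 0.2, topP: 0.9, maxTokens: 1024 }, // hypothetical tuned values
  "You are a senior backend engineer. Review code for bugs and security issues.",
  "Review the request handler in src/server.js.",
);

// Sending it (requires a running Ollama server):
// fetch("http://localhost:11434/v1/chat/completions", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(request),
// });
```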

Install

npx osaikit

Or install globally:

npm install -g osaikit
osaikit

Development

git clone https://github.com/patio-coop/osaikit.git
cd osaikit
npm install
npm run dev    # build + run
npm test       # run tests

How scoring works

Models are first filtered by hard constraints (license, RAM, privacy, budget, deployment). Remaining candidates are scored across seven dimensions:

Dimension	Weight	What it measures
Capability fit	3.0x	Strength/weakness match to your role and use cases
Context match	2.0x	Context window vs. your codebase size needs
Benchmarks	2.0x	HumanEval, SWE-bench, Aider polyglot, LiveCodeBench, BigCodeBench
Compute footprint	1.5x	Latency, on-device viability, budget fit
Ecosystem	1.0x	Tooling support (Ollama, llama.cpp, vLLM, etc.)
Enterprise readiness	0.75x	Managed hosting, SLA, VPC, SDK quality, community, docs
Fine-tuning	0.5x	LoRA/adapter support, fine-tunability
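The two-stage process above (hard filters first, then a weighted sum) can be sketched as follows. Only the weights come from the table. The following are assumptions made for illustration:

- the model fields `minRamGb`, `licenseType`, and `scores`
- the [0, 1] range for per-dimension scores
- normalizing by the total weight

The sketch checks only two of the five hard constraints.

```javascript
// Dimension weights from the scoring table; the rest of this sketch is hypothetical.
const WEIGHTS = {
  capabilityFit: 3.0,
  contextMatch: 2.0,
  benchmarks: 2.0,
  computeFootprint: 1.5,
  ecosystem: 1.0,
  enterpriseReadiness: 0.75,
  fineTuning: 0.5,
};

// Hard constraints remove a model outright (only RAM and license are shown here).
function passesHardConstraints(model, constraints) {
  if (model.minRamGb > constraints.maxRamGb) return false;
  if (constraints.license !== "any" && model.licenseType !== constraints.license) return false;
  return true;
}

// Weighted average of per-dimension scores in [0, 1]; missing dimensions count as 0.
function weightedScore(dimensionScores) {
  let total = 0;
  let weightSum = 0;
  for (const [dimension, weight] of Object.entries(WEIGHTS)) {
    total += weight * (dimensionScores[dimension] ?? 0);
    weightSum += weight;
  }
  return total / weightSum; // weightSum is 10.75
}

// Filter, score, and sort candidates from best to worst.
function rank(models, constraints) {
  return models
    .filter((m) => passesHardConstraints(m, constraints))
    .map((m) => ({ id: m.id, score: weightedScore(m.scores) }))
    .sort((a, b) => b.score - a.score);
}
```

Normalizing by the total weight keeps final scores in [0, 1]. A model that is perfect on capability fit but scores zero everywhere else ends up at 3 / 10.75 ≈ 0.28.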

Live leaderboard data from five sources (HuggingFace, SWE-bench Verified, Aider Polyglot, LiveCodeBench, and BigCodeBench) is fetched in parallel to enrich results — but the tool works fully offline using its built-in model database. Run osaikit refresh to force-update leaderboard data.
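A parallel fetch that degrades gracefully can be built on `Promise.allSettled`, which waits for every source and does not reject when one of them fails. In this sketch, the fetcher functions, the per-source timeout, and the returned `{ data, status }` shape are hypothetical. A failed source simply leaves the built-in database values in place.

```javascript
// Hypothetical sketch: run every leaderboard fetcher in parallel with a timeout and
// record which sources failed, so the caller can keep built-in data for them.
async function fetchLeaderboards(fetchers, timeoutMs = 5000) {
  const names = Object.keys(fetchers);
  const results = await Promise.allSettled(
    // Promise.resolve().then(...) turns a synchronous throw into a rejection.
    names.map((name) => withTimeout(Promise.resolve().then(() => fetchers[name]()), timeoutMs)),
  );
  const data = {};
  const status = {};
  results.forEach((result, i) => {
    const name = names[i];
    if (result.status === "fulfilled") {
      data[name] = result.value;
      status[name] = "ok";
    } else {
      status[name] = "failed"; // offline, timed out, or bad response
    }
  });
  return { data, status };
}

// Reject if the promise does not settle within ms milliseconds; always clear the timer.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```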

Architecture

src/
├── cli.js                 # Entry point, flag parsing, command dispatch
├── run.js                 # `run local` — ollama install + serve flow
├── app.js                 # State machine (welcome → wizard → loading → results)
├── theme.js               # OSAI design system tokens
├── analyzer/
│   └── repo.js            # Repository scanner (languages, frameworks, runtime)
├── components/
│   ├── wizard.js          # 6-step questionnaire flow
│   ├── steps.js           # Individual step components
│   ├── loading.js         # Loading screen with per-source status
│   └── results.js         # Recommendation display (12 sections)
├── engine/
│   ├── models.js          # Database of 52 open LLMs with enterprise metadata
│   ├── rules.js           # Scoring engine (7 dimensions), prompt templates, costs
│   ├── quickstart.js      # Copy-paste run commands per ecosystem tool
│   ├── modelcards.js      # Structured model cards (limitations, failure modes)
│   ├── licensing.js       # License guidance and risk assessment
│   ├── integration.js     # Framework-specific integration code snippets
│   ├── compliance.js      # Compliance report generation
│   └── safety.js          # Safety recommendations (Llama Guard, etc.)
└── api/
    ├── index.js           # Parallel fetcher + fuzzy model matching (5 sources)
    ├── huggingface.js     # HuggingFace model catalog (v2 leaderboard)
    ├── swebench.js        # SWE-bench Verified leaderboard
    ├── aider.js           # Aider Polyglot benchmark
    ├── livecodebench.js   # LiveCodeBench (contamination-free coding)
    └── bigcodebench.js    # BigCodeBench (function-level coding)
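`api/index.js` is described as doing fuzzy model matching across sources. The sketch below shows one possible approach. It normalizes names, then picks the longest database id whose normalized form is a prefix of the leaderboard name. The normalization rules and function names are assumptions, not the module's real implementation.

```javascript
// Hypothetical normalization: drop an "org/" prefix, lowercase, strip non-alphanumerics.
// "Qwen/Qwen3-8B-Instruct" -> "qwen38binstruct"
function normalizeModelName(name) {
  const base = name.includes("/") ? name.slice(name.lastIndexOf("/") + 1) : name;
  return base.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Return the database id whose normalized form is the longest prefix of the normalized
// leaderboard name (an exact match is the longest possible prefix), or null if none match.
function matchModel(leaderboardName, databaseIds) {
  const target = normalizeModelName(leaderboardName);
  let best = null;
  let bestLength = 0;
  for (const id of databaseIds) {
    const key = normalizeModelName(id);
    if (key.length > bestLength && target.startsWith(key)) {
      best = id;
      bestLength = key.length;
    }
  }
  return best;
}
```

Prefix matching is deliberately naive. An id like `llama3` would also claim `llama3.1-8b`, so a sturdier matcher would need size and version checks or an explicit alias table.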

Roadmap — from recommendation CLI to open-source AI distro

osaikit today recommends the right model and tells you how to set it up. The goal is to evolve it into an opinionated distribution of the open-source AI stack — one that provisions the full deployment, not just the model.

The open-source AI stack has all the components but none of the packaging. Models are competitive (3 months behind the closed frontier and closing fast), ML frameworks are production-grade, and inference engines like vLLM are battle-tested. But everything around the model (compliance, observability, safety, developer experience) scores 2 out of 5 on enterprise readiness. The gap isn't capability. It's the wrapper.

osaikit closes that gap the way a Linux distro closes the gap between the kernel and a working desktop: opinionated defaults, everything wired together, profiles for different needs.

What's built

Layer	Status	What it does
Model recommendation	Done	Scores 52 models across 7 dimensions, auto-detects project needs
Local deployment	Done	`run local` provisions Ollama + model + verifies API
Leaderboard aggregation	Done	Live data from 5 sources (HuggingFace, SWE-bench, Aider, LiveCodeBench, BigCodeBench)
License guidance	Done	Risk assessment, commercial use flags, training data provenance
Compliance reporting	Done	ToS templates, acceptable use policies, regulatory flags (GDPR, SOC2, HIPAA, EU AI Act, CCPA, export controls)
Safety assessment	Done	Per-model safety profiles, guardrail recommendations, code-specific risk analysis
Model cards	Done	Structured limitations, failure modes, intended use, evaluation gaps
Output configuration	Done	Tuned inference defaults (temperature, top-p, max tokens) per model
System prompts	Done	Starter prompt templates per use case

What's next — the distro gap

The shift from "recommend" to "provision." Each row below has an open-source component that works — osaikit needs to wire it in.

Layer	Enterprise readiness gap	Component to wire	Work
Observability	2/5	Langfuse or OpenLIT	Auto-provision monitoring, token tracking, cost attribution
Content filtering	2/5	Llama Guard, NeMo Guardrails, any-guardrail	Auto-provision input/output safety rails; any-guardrail provides a unified interface across guardrail providers (Llama Guard, ShieldGemma, Alinia)
Code security	2/5	Semgrep, CodeShield	Auto-scan generated code for vulnerabilities
Audit trail	2/5	Langfuse + structured logging	Auto-provision SOC2/GDPR-ready audit logging
UI/API	2/5	Open WebUI, LibreChat	`osaikit ui` — provision a chat interface
Production deployment	3/5	vLLM, Docker, OpenAI-compatible proxy	`osaikit run production` — deploy with monitoring + safety
Evaluation	2/5	LM Eval Harness, Inspect AI, Lumigator	`osaikit eval` — benchmark models on your own data; Lumigator adds metric-based model comparison (BERTScore, ROUGE, METEOR) with UI
Provider abstraction	2/5	any-llm	Unified LLM provider interface — switch between Ollama, OpenAI, Anthropic, Mistral without code changes; simplifies `osaikit run production` multi-backend support
Agent scaffolding	2/5	LangChain, LlamaIndex, any-agent	`osaikit agent` — scaffold agent projects on open models; any-agent provides a single interface across Agno, ADK, LangChain, LlamaIndex, OpenAI, smolagents with built-in evaluation
MCP tooling	2/5	mcpd	Declarative MCP server management — lifecycle, secret injection, dev-to-prod config; powers `osaikit agent` tool-calling setup
Shared agent learning	—	cq	Collective knowledge for agents — persist and share solutions across sessions so agents stop rediscovering failures; future `osaikit agent` enhancement
Quantization	2/5	GGUF, AWQ, GPTQ via Ollama/llama.cpp	Auto-select quantization for available hardware
Federated inference	—	Mesh routing to community GPU nodes	Future: `osaikit run federated` as a backend option
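For the Quantization row, a first approximation is that model weights need roughly parameters × bits ÷ 8 bytes. The sketch below walks from highest to lowest precision and returns the first GGUF quantization type that fits. It rests on two assumptions, not measured values or osaikit behavior:

- the nominal bit widths (real GGUF formats store some extra scale data per block)
- the 1.2× overhead factor for KV cache and runtime

```javascript
// Nominal bits per weight for common GGUF types (approximate; real files are slightly larger).
const QUANT_LEVELS = [
  { name: "F16", bits: 16 },
  { name: "Q8_0", bits: 8 },
  { name: "Q6_K", bits: 6 },
  { name: "Q5_K_M", bits: 5 },
  { name: "Q4_K_M", bits: 4 },
];

// Return the highest-precision level whose estimated footprint fits in availableGb,
// or null if even 4-bit weights do not fit. overhead (1.2x) is an assumed fudge factor.
function chooseQuantization(paramsBillions, availableGb, overhead = 1.2) {
  for (const level of QUANT_LEVELS) {
    // 1e9 parameters at 8 bits is about 1 GB, so GB ≈ billions × bits / 8.
    const requiredGb = ((paramsBillions * level.bits) / 8) * overhead;
    if (requiredGb <= availableGb) return { ...level, requiredGb };
  }
  return null; // suggest a smaller model instead
}
```

Under these assumptions, an 8B model on a machine with 16 GB available gets `Q8_0` (about 9.6 GB), because `F16` would need about 19.2 GB.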

Profiles

Different stacks for different needs, all provisioned with one command:

osaikit init --profile local-dev      # Ollama + model + basic safety
osaikit init --profile startup        # vLLM + monitoring + compliance report
osaikit init --profile enterprise     # Full stack: safety rails, audit logging,
                                      # content filtering, compliance docs
osaikit init --profile research       # Eval harness + fine-tuning tools
osaikit init --profile agent          # Agent framework + tool-calling config
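Profiles are on the roadmap rather than implemented. One simple way to model them is as plain data that `init` resolves to a component list. The `PROFILES` table and the `resolveProfile` helper below are hypothetical. The component names paraphrase the comments next to each profile above.

```javascript
// Hypothetical profile table; component names paraphrase the roadmap comments.
const PROFILES = {
  "local-dev": ["ollama", "model", "basic-safety"],
  startup: ["vllm", "monitoring", "compliance-report"],
  enterprise: ["safety-rails", "audit-logging", "content-filtering", "compliance-docs"],
  research: ["eval-harness", "fine-tuning-tools"],
  agent: ["agent-framework", "tool-calling-config"],
};

// Look up a profile by name; own-property check avoids matching keys like "toString".
function resolveProfile(name) {
  if (!Object.prototype.hasOwnProperty.call(PROFILES, name)) {
    throw new Error(`Unknown profile "${name}". Available: ${Object.keys(PROFILES).join(", ")}`);
  }
  return PROFILES[name];
}
```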

Context

This roadmap is informed by the OSAI gap map — a scoring of 42 subcategories of the open-source AI stack against closed-source equivalents. The models aren't the problem. Enterprise readiness averages 2.3 out of 5. Terms of Service scored 1. The packaging gap is a "years" problem, and almost all the energy in the ecosystem is going to the part (models) that's already closest to parity.

osaikit focuses on the other part.

Development

# Build + run with local changes
npm run dev

# Build + run with --repo flag (local path or remote URL)
npm run dev:repo -- <path-or-url>

# Examples:
npm run dev:repo -- .                                    # analyze current dir
npm run dev:repo -- https://github.com/user/repo         # clone + analyze remote

Run tests:

npm test

See ARCHITECTURE.md for the full project structure and contributor guide.

Tech stack

Ink + React — terminal UI framework
esbuild — bundler
Node.js built-in fetch — zero HTTP dependencies

License

MIT
