██████╗ ██████╗ ████████╗██╗███╗ ███╗ ██████╗ █████╗ ██╗ ██████╗ ██████╗
██╔═══██╗██╔══██╗╚══██╔══╝██║████╗ ████║██╔═══██╗ ██╔══██╗██║ ██╔════╝ ██╔═══██╗
██║ ██║██████╔╝ ██║ ██║██╔████╔██║██║ ██║ ███████║██║ ██║ ███╗██║ ██║
██║ ██║██╔═══╝ ██║ ██║██║╚██╔╝██║██║ ██║ ██╔══██║██║ ██║ ██║██║ ██║
╚██████╔╝██║ ██║ ██║██║ ╚═╝ ██║╚██████╔╝ ██║ ██║███████╗╚██████╔╝╚██████╔╝
╚═════╝ ╚═╝ ╚═╝ ╚═╝╚═╝ ╚═╝ ╚═════╝ ╚═╝ ╚═╝╚══════╝ ╚═════╝ ╚═════╝
The Missing Context Layer for Local AI Development.
Stop burning tokens. Start building smarter.
```shell
pip install git+https://github.com/itsmorethancv/Optimo-Algo.git
```

Every modern AI coding agent — Claude, GPT-4, Gemini — starts a session the same way: it reads every single file in your project before it can help you with anything.
For a medium-sized codebase of 50 files and ~13,000 lines of code, that's 12,000–15,000 tokens burned before a single line of new code is written. At frontier API rates, that adds up to thousands of dollars a year just in context loading.
And the irony? The AI doesn't even need most of it. It needs to know what your files do and how they connect — not the 500 lines of implementation inside each one.
Optimo-Algo solves this.
Optimo-Algo is a local context compression engine. Before you hand your project off to a powerful cloud AI, Optimo-Algo runs a small, fast local model (via Ollama) across your entire codebase and distills it into a single file: context.toon.
This .toon file is a hyper-compressed, LLM-optimized snapshot of your entire project — its structure, its logic, its dependencies, and how every file connects to every other file.
The result: up to 84.3% fewer tokens. Same architectural context. Zero cloud cost for the scan.
```
Your Project (12,916 tokens) → context.toon (2,022 tokens) → Cloud AI Agent
                                      ↑
                            84.3% token reduction
                         Scanned 100% locally, privately
```
The cloud agent — GPT-4, Claude, Gemini — now has full project awareness at a fraction of the cost.
context.toon uses TOON (Token Oriented Object Notation) — a compact, LLM-native format designed specifically for tokenizer efficiency.
Unlike JSON or YAML, TOON uses abbreviated keys (f for file, c for class, m for method, s for summary) and eliminates whitespace bloat. It packs maximum architectural context into minimum tokens.
A 500-line authentication file becomes this:
```json
{"f":"src/auth.py","c":[{"n":"AuthHandler","m":[{"n":"login","a":["user","pass"],"r":"bool","s":"Authenticates user credentials and returns a signed JWT token."}]}],"upstream":["src/db.py","src/config.py"],"downstream":["main.py","src/middleware.py"]}
```

That's ~30 tokens. The original file: ~800 tokens. 97% reduction on a single file.
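The compaction idea is easy to sketch. The snippet below is an illustration of the key-abbreviation principle described above, not Optimo-Algo's actual serializer; the `KEY_MAP` entries are assumptions based on the abbreviations shown (`f`, `c`, `m`, `s`, `n`):

```python
import json

# Illustrative key map based on the abbreviations described above.
# "a" (args) and "r" (returns) are assumed for this sketch.
KEY_MAP = {"file": "f", "classes": "c", "methods": "m", "summary": "s",
           "name": "n", "args": "a", "returns": "r"}

def to_toon(obj):
    """Recursively shorten keys; whitespace is dropped at dump time."""
    if isinstance(obj, dict):
        return {KEY_MAP.get(k, k): to_toon(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_toon(v) for v in obj]
    return obj

verbose = {
    "file": "src/auth.py",
    "classes": [{"name": "AuthHandler",
                 "methods": [{"name": "login", "args": ["user", "pass"],
                              "returns": "bool",
                              "summary": "Authenticates user credentials."}]}],
}

pretty = json.dumps(verbose, indent=2)
compact = json.dumps(to_toon(verbose), separators=(",", ":"))
print(len(compact) < len(pretty))  # → True: the compact form is markedly shorter
```

Short keys plus whitespace-free dumping is where most of the per-record savings come from; the semantic content is unchanged.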
Optimo-Algo doesn't just summarize files in isolation — it builds a full dependency graph of your project.
Every file in context.toon carries two trace fields:
- `upstream` — files that this file imports or depends on
- `downstream` — files that import or reference this file
This means when an AI agent modifies db.py, it instantly knows that main.py, server.py, and auth.py all depend on it — and can proactively check and fix those files too. No more broken imports. No more "works in isolation, fails in integration."
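The `downstream` field is simply the import graph inverted. A minimal sketch with hypothetical module names (not the tool's real parser):

```python
from collections import defaultdict

# upstream[x] = files that x imports (hypothetical example project)
upstream = {
    "main.py": ["src/db.py", "src/auth.py"],
    "server.py": ["src/db.py"],
    "src/auth.py": ["src/db.py", "src/config.py"],
    "src/db.py": [],
    "src/config.py": [],
}

def invert(upstream):
    """Build downstream edges: for each file, who depends on it."""
    downstream = defaultdict(list)
    for src, deps in upstream.items():
        for dep in deps:
            downstream[dep].append(src)
    return dict(downstream)

downstream = invert(upstream)
print(sorted(downstream["src/db.py"]))  # → ['main.py', 'server.py', 'src/auth.py']
```

With both directions recorded per file, an agent editing `src/db.py` can look up its dependents in O(1) instead of re-scanning the whole project.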
Tested on the Optimo-Algo codebase itself (19 files, production Python):
| Metric | Value |
|---|---|
| Files Scanned | 19 |
| Original Token Count | 12,916 |
| Compressed Token Count | 2,022 |
| Token Reduction | 84.3% |
For every 100 KB of source code, Optimo-Algo sends only ~15 KB to your AI agent — letting you fit 6× more project context into the same context window.
Before installing, you need:
- Python 3.9+
- Ollama — The local AI runtime that powers the scan engine.
Install Ollama:
```shell
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows
# Download the installer from https://ollama.ai/download
```

Pull a model (we recommend qwen2.5:1.5b — fast, accurate, low RAM):

```shell
ollama pull qwen2.5:1.5b
```

Model Guide:

- `qwen2.5:0.5b` — Ultra-fast. Best for large projects on limited RAM (4GB+).
- `qwen2.5:1.5b` — Recommended. Best balance of speed and summary quality (6GB+).
- `qwen2.5-coder:7b` — Highest-quality summaries. Requires 10GB+ RAM.
```shell
pip install git+https://github.com/itsmorethancv/Optimo-Algo.git
```

That's it. The `optimo` command is now available globally in your terminal.
Run this once after installation. It verifies your environment and installs all dependencies:
```shell
optimo init
```

You'll see dependency checks run and a confirmation that everything is ready.
Navigate to your project directory and run:
```shell
cd /your/project
optimo build
```

Optimo-Algo will scan every file, run the local AI summarizer, trace all dependencies, and generate `context.toon` in your project root.
Sample output:
```
Starting Optimo-Algo v1.0
Target: /your/project
Model: qwen2.5:1.5b

✓ Found 34 scannable files.
⠸ Parsing: src/auth.py...
⠼ Parsing: src/database.py...
⠦ Parsing: src/api/routes.py...
...
✓ Done: 34/34 files

Compressed: 18,432 tokens → 2,901 tokens
Saved: 84.3% tokens

context.toon is ready at: /your/project/context.toon
```
Attach context.toon to any AI agent session as context. Your agent now has full structural awareness of your entire project at a fraction of the cost.
Scans your project and generates context.toon.
```shell
optimo build
optimo build --workers 8      # Use 8 parallel threads for faster processing
optimo build --output my.toon # Custom output filename
optimo --path ./backend build # Target a specific directory
```

How it works internally:
- Scans your directory using `.gitignore`-aware filters (skips `venv/`, `node_modules/`, `.git/`, binaries)
- Splits large files (200+ lines) into overlapping segments to respect the local model's context window
- Sends each segment to your local Ollama model using an Inverted Prompt (code first, then instructions) for better focus
- Extracts classes, methods, and summaries into TOON objects
- Falls back to regex parsing if the LLM produces malformed output (guarantees 0% data loss)
- Builds the full dependency graph and writes `context.toon`
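The overlapping-segment step can be sketched in a few lines. The 200-line threshold comes from the description above; the 20-line overlap here is an assumed value for illustration, not necessarily what the tool uses:

```python
def chunk_lines(lines, size=200, overlap=20):
    """Split a long file into overlapping segments so the local model
    never sees a definition cut off with no surrounding context.
    The 20-line overlap is an assumption for this sketch."""
    if len(lines) <= size:
        return [lines]
    chunks, start = [], 0
    while start < len(lines):
        chunks.append(lines[start:start + size])
        if start + size >= len(lines):
            break
        start += size - overlap  # step forward, re-reading the overlap
    return chunks

source = [f"line {i}" for i in range(450)]
chunks = chunk_lines(source)
print(len(chunks))                   # → 3
print(chunks[1][0] == source[180])   # → True: segment 2 starts 20 lines back
```

The overlap means a class whose body straddles a chunk boundary still appears whole in at least one segment, at the cost of summarizing a few lines twice.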
Runs as a background daemon. Automatically rebuilds context.toon whenever a file changes — keeping your context always in sync during active development.
```shell
optimo watch
optimo watch --workers 2 # Lower workers to save RAM during dev
optimo --path ./src watch
```

Features a 2-second debounce — save a file 10 times in 5 seconds, and it rebuilds only once.
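A debounce like this boils down to resetting a timer on every event. A stand-alone sketch of the pattern using only the standard library (not the tool's actual watcher; a 0.1-second delay is used here so the example runs quickly):

```python
import threading
import time

class Debouncer:
    """Coalesce a burst of file-save events into one rebuild call."""
    def __init__(self, delay, action):
        self.delay, self.action = delay, action
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # a newer save supersedes the pending one
            self._timer = threading.Timer(self.delay, self.action)
            self._timer.start()

rebuilds = []
d = Debouncer(0.1, lambda: rebuilds.append("rebuild"))
for _ in range(10):   # 10 rapid "saves"
    d.trigger()
time.sleep(0.3)       # wait past the debounce window
print(rebuilds)       # → ['rebuild']  (only one rebuild fired)
```

Each new save cancels the pending timer, so the rebuild only fires once the file has been quiet for the full delay.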
Shows a detailed token compression report comparing your raw codebase against the generated .toon file.
```shell
optimo stats
```

Sample output:

```
┌─────────────────────────────────────────┐
│ Optimo-Algo Statistics │
├──────────────────────┬──────────────────┤
│ Files Scanned │ 34 │
│ Original Tokens │ 18,432 │
│ Compressed Tokens │ 2,901 │
│ Saved │ 84.3% │
└──────────────────────┴──────────────────┘
```
Launches the Nodal Architecture Visualizer — a GUI that renders your project as an interactive dependency graph. Every file is a node; arrows show which files import which.
```shell
optimo view
```

Useful for spotting tightly coupled modules, circular dependencies, or understanding an unfamiliar codebase visually.
Requires a display environment. Does not work in headless/SSH sessions.
Boots an interactive expert chat session powered by your local Ollama model, pre-loaded with the full Optimo-Algo documentation. Ask it anything about commands, flags, or workflow.
```shell
optimo chat
```

Responses are capped at 2–3 lines for speed. Chat history lives only in RAM — the session exits cleanly with no data written to disk.
Shows all Ollama models available on your machine, with size and version info.
```shell
optimo listmodels
```

```
┌────────────────────────┬───────────┬──────────────┬────────────┐
│ Model Name │ Size (GB) │ ID │ Modified │
├────────────────────────┼───────────┼──────────────┼────────────┤
│ qwen2.5:1.5b │ 1.00 │ a42b25d8c... │ 2025-03-12 │
│ qwen2.5-coder:7b │ 4.68 │ f3c91e2d1... │ 2025-02-28 │
└────────────────────────┴───────────┴──────────────┴────────────┘
```
Displays the currently configured active model.
```shell
optimo model
# Active Ollama Model: qwen2.5:1.5b
```

Deletes the `context.toon` file from your project directory. Use this to reset before a full rebuild.

```shell
optimo clean
```

Installs all required Python dependencies. Run this once after `pip install`. Safe to re-run at any time.

```shell
optimo init
```

Displays the full in-terminal documentation panel.

```shell
optimo help
```

These flags work with any command:
| Flag | Description | Example |
|---|---|---|
| `--path <dir>` | Target a different directory instead of `.` | `optimo --path ./backend build` |
| `--output <name>` | Custom name for the generated TOON file | `optimo --output api.toon build` |
| `--setmodel <name>` | Switch the active Ollama model permanently | `optimo --setmodel qwen2.5-coder:7b build` |
| `--workers <int>` | Number of parallel summarization threads | `optimo --workers 8 build` |
| `--ignore <patterns>` | Permanently add files/folders to the ignore list | `optimo --ignore tests/ docs/ build` |
The --workers flag is the most impactful performance control in Optimo-Algo.
| Setup | Recommended Workers | Notes |
|---|---|---|
| MacBook / laptop, 8GB RAM | 2 | Avoids memory pressure with IDE + browser open |
| Desktop, 16GB RAM, no GPU | 4 | Default. Good balance. |
| Desktop / workstation, GPU | 8 | Maximum throughput. Watch VRAM usage. |
| CI / headless server | 1 | Safest for constrained environments. |
Setting workers higher than your hardware supports will cause Ollama timeouts and actually slow down the build. Start at `4` and adjust.
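The `--workers` flag maps to a thread-pool size. A minimal sketch of the pattern — `summarize` here is a hypothetical stand-in for the per-file Ollama call, not the tool's real worker:

```python
from concurrent.futures import ThreadPoolExecutor

def summarize(path):
    """Hypothetical stand-in for sending one file to the local model."""
    return path, f"summary of {path}"

files = [f"src/module_{i}.py" for i in range(8)]

# --workers controls the size of a pool like this one; threads are a
# good fit because each worker mostly waits on the Ollama HTTP call.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(summarize, files))

print(len(results))  # → 8
```

Because the work is I/O-bound (waiting on the local model), more threads help only until the Ollama backend itself saturates — which is exactly why oversizing the pool causes the timeouts warned about above.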
Optimo-Algo stores its configuration in .optimo-algo.json in your project root, auto-created on first run.
```json
{
  "model": "qwen2.5:1.5b",
  "ignore": ["tests/", "docs/", "*.log"]
}
```

You can edit this file directly or use CLI flags to update it:

```shell
optimo --setmodel qwen2.5-coder:7b build # Updates model in config
optimo --ignore migrations/ build        # Adds to ignore list in config
```

```
Directory
    │
    ▼
[scanner.py] ──── .gitignore-aware traversal, binary detection
    │
    ▼
[llm.py] ──────── Inverted prompt → Ollama → JSON extraction
    │             200-line chunking for large files
    │             Regex fallback if LLM output malforms
    ▼
[compiler.py] ─── Path normalization, dependency graph construction
    │             Forward trace + back trace per file
    ▼
[context.toon] ── Compressed, portable, LLM-ready project snapshot
```
```
Standard prompting: [INSTRUCTIONS] ... [CODE]
Optimo-Algo:        [CODE] ... [INSTRUCTIONS]
```
Small models (0.5B–1.5B parameters) tend to lose track of long instruction blocks if they come first. By showing the code before the task, the model's attention remains anchored to the actual content being summarized. This alone improves summary accuracy noticeably on smaller models.
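In practice the inversion is just prompt ordering. A hypothetical builder — the instruction wording and separator are illustrative, not Optimo-Algo's real prompt template:

```python
def build_inverted_prompt(code: str, instructions: str) -> str:
    """Place the code before the task so a small model's attention
    stays anchored on the content it must summarize."""
    return f"{code}\n\n---\n\n{instructions}"

code = "def login(user, pw):\n    ...\n"
task = "Summarize the classes and methods above as compact JSON."
prompt = build_inverted_prompt(code, task)
print(prompt.index(code) < prompt.index(task))  # → True: code comes first
```

The same two strings, concatenated in the opposite order, would be a standard prompt; the inversion costs nothing and only changes what the model reads first.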
JSON is designed for human readability. TOON is designed for tokenizer efficiency. Abbreviated keys (f, c, m, s) map to fewer tokens in modern LLM tokenizers. Removed whitespace eliminates padding tokens. The result is the same semantic information at ~40–60% of the JSON token cost.
Optimo-Algo is not magic — it's a deliberate trade-off. You should know exactly what you're getting.
The first scan of a large project (50+ files) can take 2–10 minutes depending on your hardware and chosen model. This is the "Slow Build, Fast Chat" trade-off: you pay the compute cost once locally so that every subsequent AI session is faster and cheaper. After the first build, watch mode keeps things in sync incrementally.
Optimo-Algo is a summarization engine — it intentionally compresses. A bug hiding inside a specific function implementation won't be visible in the .toon file. The AI agent reading context.toon knows the architecture, not the implementation. When the agent needs to actually fix or write code inside a file, it should request the full file content — which is the intended workflow.
Effectiveness scales with your machine. Running a 7B model on 8GB of RAM while VS Code and Chrome are open will cause memory pressure. Use smaller models (0.5b, 1.5b) on constrained hardware — their summaries are less nuanced but still structurally accurate.
Models under 1B parameters occasionally produce vague summaries ("This file handles logic"). This is why Optimo-Algo includes the regex fallback — even if the AI summary is poor, the structural data (classes, methods, dependencies) is always extracted accurately from source.
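A regex fallback of this kind can be sketched simply — these two patterns are an illustration of the idea for Python source, not the tool's actual fallback parser:

```python
import re

# Fallback structural extraction: even with no usable LLM summary,
# class and function names can be pulled straight from the source text.
CLASS_RE = re.compile(r"^class\s+(\w+)", re.MULTILINE)
DEF_RE = re.compile(r"^\s*def\s+(\w+)", re.MULTILINE)

source = '''
class AuthHandler:
    def login(self, user, password):
        return True

def helper():
    pass
'''

classes = CLASS_RE.findall(source)
functions = DEF_RE.findall(source)
print(classes, functions)  # → ['AuthHandler'] ['login', 'helper']
```

Pattern matching can't produce a prose summary, but it reliably recovers the structural skeleton (names, nesting) that the dependency graph and TOON records need.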
Optimo-Algo requires the Ollama daemon to be running. If Ollama crashes or the model isn't pulled, the build will fail. Always verify with optimo listmodels before running optimo build for the first time.
```shell
# 1. First-time setup — run once
pip install git+https://github.com/itsmorethancv/Optimo-Algo.git
optimo init

# 2. Start a new project session
cd /your/project
optimo build

# 3. During active development — keep context live
optimo watch &

# 4. Before an AI session — check your savings
optimo stats

# 5. Pass context.toon to your AI agent of choice
#    Attach it as a file, paste it into the system prompt, or reference it via API

# 6. Explore the dependency graph visually
optimo view

# 7. Clean up when done
optimo clean
```

Contributions are welcome. If you find a bug, have a feature idea, or want to improve TOON format support for additional languages, open an issue or PR.
```shell
git clone https://github.com/itsmorethancv/Optimo-Algo.git
cd Optimo-Algo
pip install -e .
optimo init
```

Areas actively looking for contributions:
- Language support beyond Python (JavaScript/TypeScript `import` parsing, Go, Rust)
- VS Code extension for inline `context.toon` generation
- Benchmark suite comparing TOON vs raw context on real agent tasks
- Support for cloud model backends as an alternative to Ollama
MIT License — see LICENSE for details.
Built to make local AI development practical, private, and affordable.
⭐ Star this repo if Optimo-Algo saves you tokens.