AI Cost Tracking

This project uses AI-generated code. Total cost: $7.5000 with 57 AI commits.

Generated on 2026-06-29 using openrouter/qwen/qwen3-coder-next

preLLM gives you one function that runs a small LLM as a preprocessing step before a large LLM executes the query. It works like litellm.completion(), with a preprocessing layer in front.

from prellm import preprocess_and_execute

result = await preprocess_and_execute(
    query="Deploy app to production",
    small_llm="ollama/qwen2.5:3b",     # local, fast, cheap
    large_llm="anthropic/claude-sonnet-4-20250514",  # cloud, powerful
)
print(result.content)
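
The snippet above uses top-level `await`, which only works inside an async context such as a notebook or an async REPL. In a plain script, one option is to wrap the call in a coroutine and pass it to `asyncio.run`. The sketch below uses a hypothetical stand-in coroutine, `fake_preprocess_and_execute`, and a hypothetical `FakeResult` class so that it runs without models or API keys; in real use you would call `preprocess_and_execute` in its place.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in for the response object; only `content` is modeled."""
    content: str


async def fake_preprocess_and_execute(query: str, **kwargs) -> FakeResult:
    # Stand-in for prellm.preprocess_and_execute: no model is called here.
    await asyncio.sleep(0)
    return FakeResult(content=f"echo: {query}")


async def main() -> str:
    result = await fake_preprocess_and_execute(
        query="Deploy app to production",
        small_llm="ollama/qwen2.5:3b",
        large_llm="anthropic/claude-sonnet-4-20250514",
    )
    return result.content


if __name__ == "__main__":
    # asyncio.run creates an event loop, runs main() to completion, and closes the loop.
    print(asyncio.run(main()))
```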

Install & Run in 60 Seconds

pip install prellm

# CLI — zero config
prellm query "Zdeployuj apkę na prod" --small ollama/qwen2.5:3b --large gpt-5.4-mini  # Polish: "Deploy the app to prod"

# With strategy
prellm query "Refaktoryzuj kod" --strategy structure --json  # Polish: "Refactor the code"

# Two-agent pipeline (v0.3)
prellm query "Deploy app" --pipeline dual_agent_full

# Docker
docker run prellm/prellm query "Deploy app" --small ollama/qwen2.5:3b --large gpt-5.4-mini

Interactive Configuration (repo)

make install-dev
make config         # guided wizard + diagnostics
source .env
prellm doctor --live
make examples       # runs all example scripts (real-time)

v0.4: Persistent Context for Small LLMs

New in v0.4: preLLM automatically collects the runtime environment, compresses the codebase, persists sessions, and filters sensitive data, so no manual pre-prompts are needed. Docs: Persistent Context · Session Persistence · Sensitive Data · Flow Graphs

from prellm import preprocess_and_execute

# Bielik with full persistent context — zero manual pre-prompts
result = await preprocess_and_execute(
    query="Zoptymalizuj monitoring ESP32",  # Polish: "Optimize ESP32 monitoring"
    small_llm="ollama/bielik:7b",
    large_llm="openrouter/google/gemini-3-flash-preview",
    strategy="auto",                        # auto-select best strategy (NEW default)
    collect_runtime=True,                   # full env/shell/process snapshot (NEW)
    session_path=".prellm/sessions.db",     # persistent history across restarts (NEW)
    codebase_path=".",                      # compress project → context (NEW)
    sanitize=True,                          # filter API keys before large-LLM (NEW default)
)
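
To make the `sanitize=True` idea concrete, here is a minimal sketch of masking API-key-like strings before a prompt leaves the machine. The regex patterns and the `mask_secrets` helper are illustrative assumptions, not preLLM's actual filtering rules.

```python
import re

# Hypothetical patterns for key-like tokens; a real rule set would be broader.
SECRET_PATTERNS = [
    re.compile(r"\bsk-ant-[A-Za-z0-9_-]{8,}"),  # Anthropic-style keys
    re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"),      # OpenAI-style keys
    re.compile(r"\bgsk_[A-Za-z0-9]{8,}"),       # Groq-style keys
]


def mask_secrets(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of a known secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


prompt = "Use OPENAI_API_KEY=sk-abc123def456ghi to call the API"
print(mask_secrets(prompt))
```

The `\b` anchor keeps ordinary words such as `task-runner-service` from being treated as keys.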

Inspect runtime context

prellm context show                  # formatted runtime context
prellm context show --json           # as JSON
prellm context show --codebase .     # include compressed project

Manage persistent sessions

prellm session list                  # recent interactions
prellm session export backup.json    # export to JSON
prellm session import backup.json    # import from JSON
prellm session clear                 # clear history
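
As a rough picture of what persisting and exporting sessions can involve, the sketch below stores interactions in SQLite and round-trips them through JSON. The table layout and the helper names (`open_store`, `add_interaction`, `export_sessions`, `import_sessions`) are assumptions for illustration; preLLM's actual schema and export format may differ.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a session store with a single hypothetical table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interactions ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, query TEXT, response TEXT)"
    )
    return conn


def add_interaction(conn: sqlite3.Connection, query: str, response: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO interactions (query, response) VALUES (?, ?)",
            (query, response),
        )


def export_sessions(conn: sqlite3.Connection) -> str:
    """Serialize all interactions, oldest first, as a JSON array."""
    rows = conn.execute(
        "SELECT query, response FROM interactions ORDER BY id"
    ).fetchall()
    return json.dumps([{"query": q, "response": r} for q, r in rows])


def import_sessions(conn: sqlite3.Connection, payload: str) -> int:
    """Insert records from a JSON export and return how many were added."""
    records = json.loads(payload)
    with conn:
        conn.executemany(
            "INSERT INTO interactions (query, response) VALUES (:query, :response)",
            records,
        )
    return len(records)


source = open_store()
add_interaction(source, "Deploy app", "Plan: build, test, release")
backup = export_sessions(source)

target = open_store()
print(import_sessions(target, backup))  # number of imported records
```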


---

## How It Works

```text
User Query
  → Small LLM (≤24B, local)    → classify / structure / enrich    → optimized prompt
    Qwen2.5 / Phi3 / Gemma       PromptPipeline (YAML)
  → Large LLM (cloud)          → execute with full context        → validated response
    GPT-4 / Claude / Llama       ResponseValidator (YAML schema)
```

Result: 70–80% token savings + enterprise-quality output for the price of a small LLM call.

One Function — Two Execution Paths

from prellm import preprocess_and_execute

# PATH A: Strategy-based (v0.2, default)
result = await preprocess_and_execute(
    query="Deploy app to production",
    small_llm="ollama/qwen2.5:3b",
    large_llm="anthropic/claude-sonnet-4-20250514",
    strategy="structure",                 # classify|structure|split|enrich|passthrough
    user_context="gdansk_embedded_python",
)

# PATH B: Pipeline-based two-agent (v0.3)
result = await preprocess_and_execute(
    query="Deploy app to production",
    small_llm="ollama/qwen2.5:3b",
    large_llm="anthropic/claude-sonnet-4-20250514",
    pipeline="dual_agent_full",           # any pipeline from pipelines.yaml
)

print(result.content)              # Large LLM response
print(result.decomposition)        # Small LLM analysis
print(result.model_used)           # Which large model answered
print(result.small_model_used)     # Which small model preprocessed

Sync Version

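from prellm import preprocess_and_execute, preprocess_and_execute_sync

# Blocking call, for code that has no running event loop
result = preprocess_and_execute_sync("Deploy app", large_llm="gpt-5.4-mini")

# Async call relying on the defaults: small=ollama/qwen2.5:3b, large=claude-sonnet, strategy=classify
result = await preprocess_and_execute("Refaktoryzuj kod")  # Polish: "Refactor the code"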

LLM Provider Examples

preLLM uses LiteLLM under the hood, so any model string supported by LiteLLM works.

Pull model: ollama pull qwen2.5:3b

result = await preprocess_and_execute(
    query="Explain Kubernetes pods",
    small_llm="ollama/qwen2.5:3b",   # local small model
    large_llm="ollama/llama3:70b",   # local large model
)

Ollama + OpenAI (hybrid)

result = await preprocess_and_execute(
    query="Review my Python code",
    small_llm="ollama/qwen2.5:3b",       # local preprocessing
    large_llm="gpt-5.4-mini",             # OpenAI execution
)
### Ollama + Anthropic (hybrid)

```python
result = await preprocess_and_execute(
    query="Deploy microservices to K8s",
    small_llm="ollama/phi3:mini",         # local preprocessing
    large_llm="anthropic/claude-sonnet-4-20250514",  # Anthropic execution
)
```

OpenAI only

result = await preprocess_and_execute(
    query="Analyze sales data",
    small_llm="gpt-5.4-mini",             # cheap OpenAI preprocessing
    large_llm="gpt-4o",                  # powerful OpenAI execution
)

Anthropic only

result = await preprocess_and_execute(
    query="Write a compliance report",
    small_llm="anthropic/claude-haiku",
    large_llm="anthropic/claude-sonnet-4-20250514",
)

Groq (fast inference)

result = await preprocess_and_execute(
    query="Summarize meeting notes",
    small_llm="groq/llama-3.1-8b-instant",   # fast Groq preprocessing
    large_llm="groq/llama-3.3-70b-versatile", # fast Groq execution
)

Mistral

result = await preprocess_and_execute(
    query="Translate technical docs",
    small_llm="mistral/mistral-small-latest",
    large_llm="mistral/mistral-large-latest",
)

OpenRouter (multi-provider + vision)

result = await preprocess_and_execute(
    query="Analyze this UI screenshot",
    small_llm="ollama/qwen2.5:3b",
    large_llm="openrouter/qwen/qwen3-vl-32b-instruct",
)

Azure OpenAI

result = await preprocess_and_execute(
    query="Generate quarterly report",
    small_llm="azure/gpt-5.4-mini-deployment",
    large_llm="azure/gpt-4o-deployment",
)

AWS Bedrock

result = await preprocess_and_execute(
    query="Optimize Lambda function",
    small_llm="bedrock/anthropic.claude-haiku",
    large_llm="bedrock/anthropic.claude-sonnet",
)

Full provider list: See LiteLLM docs — preLLM supports all 100+ providers.
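
The model strings in the examples above follow LiteLLM's `provider/model` convention: the text before the first slash names the provider, and anything after it is passed along as the model name. Strings without a prefix, such as `gpt-4o`, are left for LiteLLM to resolve. As a small illustration (the helper `split_model_string` is hypothetical and not part of either library):

```python
from typing import Optional, Tuple


def split_model_string(model: str) -> Tuple[Optional[str], str]:
    """Split 'provider/model' at the first slash; provider is None if absent."""
    provider, sep, name = model.partition("/")
    if not sep:
        return None, model
    return provider, name


for model in (
    "ollama/qwen2.5:3b",
    "openrouter/qwen/qwen3-vl-32b-instruct",
    "gpt-4o",
):
    print(model, "->", split_model_string(model))
```

Splitting only at the first slash matters for routers such as OpenRouter, whose model names contain a slash of their own.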

Drop-in Enhancement

If you already use LiteLLM, preLLM adds preprocessing with a one-line change:

# BEFORE — direct litellm call
import litellm
response = await litellm.acompletion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Deploy app to production"}],
)

# AFTER — preLLM preprocessing + same litellm execution
from prellm import preprocess_and_execute
result = await preprocess_and_execute(
    query="Deploy app to production",
    large_llm="gpt-4o",  # same model, now with preprocessing
)
### Use Your Existing `.env`

preLLM reads the same environment variables as LiteLLM:

```bash
# .env — works with both litellm and prellm
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GROQ_API_KEY=gsk_...

# preLLM-specific (optional)
PRELLM_SMALL_DEFAULT=ollama/qwen2.5:3b
PRELLM_LARGE_DEFAULT=anthropic/claude-sonnet-4-20250514
PRELLM_STRATEGY=classify

# Legacy names still supported
SMALL_MODEL=ollama/qwen2.5:3b
LARGE_MODEL=gpt-5.4-mini
```
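
The text above does not say which name wins when both the `PRELLM_*` variable and its legacy counterpart are set. The sketch below assumes the `PRELLM_*` name takes precedence, then the legacy name, then a built-in default; both that order and the `resolve_model` helper are assumptions for illustration.

```python
import os
from typing import Mapping


def resolve_model(
    env: Mapping[str, str],
    preferred: str,
    legacy: str,
    default: str,
) -> str:
    """Return the first non-empty value among preferred, legacy, and default."""
    # Assumed precedence: new name, then legacy name, then hard-coded default.
    return env.get(preferred) or env.get(legacy) or default


small = resolve_model(
    os.environ, "PRELLM_SMALL_DEFAULT", "SMALL_MODEL", "ollama/qwen2.5:3b"
)
print(small)
```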

LiteLLM Proxy Integration

If you run a LiteLLM proxy, point preLLM at it:

import os
os.environ["OPENAI_API_BASE"] = "http://localhost:4000"  # your litellm proxy

result = await preprocess_and_execute(
    query="Deploy app",
    small_llm="openai/small-model",   # routed through litellm proxy
    large_llm="openai/large-model",   # routed through litellm proxy
)

OpenAI SDK-Compatible Server

preLLM ships an OpenAI-compatible proxy — use it from any OpenAI SDK client:

# Start preLLM server
prellm serve --port 8080 --small ollama/qwen2.5:3b --large gpt-5.4-mini

# Use from OpenAI Python SDK
import openai
client = openai.OpenAI(base_url="http://localhost:8080/v1", api_key="any")
response = client.chat.completions.create(
    model="prellm:default",
    messages=[{"role": "user", "content": "Deploy app to production"}],
)

# Use from curl
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"prellm:qwen→claude","messages":[{"role":"user","content":"Deploy app"}]}'

# Use v0.3 pipeline via API
curl http://localhost:8080/v1/chat/completions \
  -d '{"model":"prellm:default","messages":[{"role":"user","content":"Deploy app"}],"prellm":{"pipeline":"dual_agent_full"}}'

Two-Agent Architecture (v0.3)

The pipeline= parameter activates the new two-agent architecture:

USER QUERY
    │
    ▼
┌─────────────────────────────────────┐
│  PREPROCESSOR AGENT (small LLM)     │
│  PromptRegistry (YAML prompts)      │
│  PromptPipeline (YAML steps)        │
│  → classify → structure → compose   │
│  → IntermediateValidator            │
└──────────────┬──────────────────────┘
               │ structured executor_input
               ▼
┌─────────────────────────────────────┐
│  EXECUTOR AGENT (large LLM)         │
│  → execute with full context        │
│  → ResponseValidator (YAML schema)  │
│  → PreLLMResponse (typed)           │
└─────────────────────────────────────┘
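
The flow in the diagram can be sketched with two plain callables standing in for the models. Everything here (`preprocess`, `execute`, the stub lambdas, and the dictionary shape of `executor_input`) is hypothetical and only mirrors the diagram; it does not reproduce preLLM's classes.

```python
from typing import Callable, Dict

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def preprocess(query: str, small_llm: LLM) -> Dict[str, str]:
    """Preprocessor agent: classify, then compose a structured executor input."""
    intent = small_llm(f"classify: {query}")
    composed = f"[intent={intent}] {query}"
    executor_input = {"intent": intent, "prompt": composed}
    # Stand-in for the intermediate validation step in the diagram.
    if not executor_input["intent"]:
        raise ValueError("preprocessor produced no intent")
    return executor_input


def execute(executor_input: Dict[str, str], large_llm: LLM) -> str:
    """Executor agent: run the large model on the composed prompt."""
    return large_llm(executor_input["prompt"])


# Stub models so the sketch runs offline.
small = lambda prompt: "deploy" if "Deploy" in prompt else "other"
large = lambda prompt: f"answer for: {prompt}"

print(execute(preprocess("Deploy app", small), large))
```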

Custom Pipelines (YAML)

Define your own preprocessing pipeline — no Python code changes needed:

# configs/pipelines.yaml
pipelines:
  my_pipeline:
    description: "Custom 3-step pipeline"
    steps:
      - name: classify
        prompt: classify          # from configs/prompts.yaml
        output: classification
      - name: extract
        prompt: structure
        output: fields
      - name: compose
        prompt: compose
        input: [query, classification, fields]
        output: composed_prompt

result = await preprocess_and_execute(
    query="Deploy app",
    pipeline="my_pipeline",  # uses your custom YAML pipeline
)
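
One way to read the YAML above: each step runs a prompt and stores its result under `output`, and later steps can list earlier outputs in `input`. The sketch below shows that data flow with a hypothetical `run_pipeline` helper and a stub in place of the small LLM. It assumes that a step without `input` receives only `query`, which the YAML does not state.

```python
from typing import Callable, Dict, List

# Mirrors the YAML above as plain Python data.
PIPELINE: List[dict] = [
    {"name": "classify", "prompt": "classify", "output": "classification"},
    {"name": "extract", "prompt": "structure", "output": "fields"},
    {
        "name": "compose",
        "prompt": "compose",
        "input": ["query", "classification", "fields"],
        "output": "composed_prompt",
    },
]


def run_pipeline(
    steps: List[dict],
    query: str,
    call_llm: Callable[[str, Dict[str, str]], str],
) -> Dict[str, str]:
    """Run steps in order, storing each result in a shared state dict."""
    state: Dict[str, str] = {"query": query}
    for step in steps:
        # Assumption: a step without `input` sees only the original query.
        wanted = step.get("input", ["query"])
        inputs = {key: state[key] for key in wanted}
        state[step["output"]] = call_llm(step["prompt"], inputs)
    return state


def stub_llm(prompt_name: str, inputs: Dict[str, str]) -> str:
    # Stand-in for a small-LLM call: echoes which inputs it received.
    return f"{prompt_name}({', '.join(inputs)})"


final = run_pipeline(PIPELINE, "Deploy app", stub_llm)
print(final["composed_prompt"])
```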

Available Pipelines

| Pipeline | Steps | Best for |
|----------|-------|----------|
| `classify` | classify | Quick intent routing |
| `structure` | classify → structure → compose | DevOps, API calls |
| `split` | classify → split → compose | Complex multi-part queries |
| `enrich` | classify → enrich | Incomplete prompts |
| `dual_agent_full` | context → decompose → optimize → format | Maximum quality |
| `passthrough` | (none) | Direct forwarding |

Custom Prompts (YAML)

All system prompts are in configs/prompts.yaml with Jinja2 templating:

# configs/prompts.yaml
prompts:
  classify:
    system: |
      You are a query classifier.
      Intents: {{ intents | default("deploy, query, create, delete") }}
      Respond ONLY with JSON: {"intent": "...", "confidence": 0.0-1.0}
    max_tokens: 256
    temperature: 0.1

Response Validation (YAML)

Validate LLM outputs with schemas — no code changes:

# configs/response_schemas.yaml
schemas:
  classification:
    required_fields: [intent, confidence]
    types:
      intent: string
      confidence: float
    constraints:
      confidence: {min: 0.0, max: 1.0}
      intent: {enum: [deploy, query, create, delete, other]}
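
To show what checking a response against such a schema might look like, here is a minimal sketch that applies `required_fields`, `types`, and `constraints` to a parsed response. The `validate` function and the `TYPE_MAP` table are hypothetical; preLLM's ResponseValidator may behave differently.

```python
from typing import Any, Dict, List

# The schema above, expressed as plain Python data.
SCHEMA: Dict[str, Any] = {
    "required_fields": ["intent", "confidence"],
    "types": {"intent": "string", "confidence": "float"},
    "constraints": {
        "confidence": {"min": 0.0, "max": 1.0},
        "intent": {"enum": ["deploy", "query", "create", "delete", "other"]},
    },
}

# Hypothetical mapping from schema type names to Python types.
TYPE_MAP = {"string": str, "float": (int, float)}


def validate(data: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable violations; an empty list means valid."""
    errors: List[str] = []
    for field in schema.get("required_fields", []):
        if field not in data:
            errors.append(f"missing field: {field}")
    for field, type_name in schema.get("types", {}).items():
        if field in data and not isinstance(data[field], TYPE_MAP[type_name]):
            errors.append(f"wrong type for {field}: expected {type_name}")
    for field, rule in schema.get("constraints", {}).items():
        if field not in data:
            continue
        value = data[field]
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{field} not in allowed values")
        is_number = isinstance(value, (int, float))
        if "min" in rule and is_number and value < rule["min"]:
            errors.append(f"{field} below minimum")
        if "max" in rule and is_number and value > rule["max"]:
            errors.append(f"{field} above maximum")
    return errors


print(validate({"intent": "deploy", "confidence": 0.92}, SCHEMA))
print(validate({"intent": "reboot", "confidence": 1.5}, SCHEMA))
```

Returning a list of violations rather than raising on the first one lets a caller report every problem in a single pass.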

5 Decomposition Strategies (v0.2)

| Strategy | What it does | Best for |
|----------|--------------|----------|
| `classify` | Classify intent + domain | General queries, routing |
| `structure` | Extract action, target, params | DevOps commands, API calls |
| `split` | Break into sub-queries | Complex multi-part requests |
| `enrich` | Add missing context | Incomplete prompts, safety |
| `passthrough` | No preprocessing | Simple/direct queries |

With Domain Rules

result = await preprocess_and_execute(
    query="Usuń bazę danych klientów",  # Polish: "Delete the customer database"
    small_llm="ollama/qwen2.5:3b",
    large_llm="gpt-5.4-mini",
    domain_rules=[{
        "name": "destructive_db",
        "keywords": ["delete", "drop", "usuń"],
        "required_fields": ["target_database", "backup_confirmed"],
        "severity": "critical",
    }],
)
print(result.decomposition.missing_fields)  # ["target_database", "backup_confirmed"]
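
A rough sketch of how a rule like this could be evaluated: match the keywords against the query, then report which required fields are absent from the extracted parameters. `rule_matches` and `missing_fields_for` are hypothetical helpers, and simple case-insensitive substring matching is an assumption.

```python
from typing import Dict, List

RULE = {
    "name": "destructive_db",
    "keywords": ["delete", "drop", "usuń"],
    "required_fields": ["target_database", "backup_confirmed"],
    "severity": "critical",
}


def rule_matches(query: str, rule: dict) -> bool:
    """True if any keyword occurs in the query (case-insensitive substring)."""
    lowered = query.lower()
    return any(keyword.lower() in lowered for keyword in rule["keywords"])


def missing_fields_for(
    query: str, extracted: Dict[str, object], rule: dict
) -> List[str]:
    """List required fields absent from `extracted`, or [] if the rule does not apply."""
    if not rule_matches(query, rule):
        return []
    return [field for field in rule["required_fields"] if field not in extracted]


print(missing_fields_for("Usuń bazę danych klientów", {}, RULE))
print(missing_fields_for("Drop table users", {"target_database": "crm"}, RULE))
```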

1. Code Refactoring

result = await preprocess_and_execute(
    query="Popraw mój projekt z hardcode'em",  # Polish: "Fix my project that has hardcoded values"
    small_llm="ollama/qwen2.5:3b",
    large_llm="anthropic/claude-sonnet-4-20250514",
    strategy="structure",
    user_context="gdansk_embedded_python",
)
### 2. Kubernetes Diagnostics

```python
result = await preprocess_and_execute(
    query="Zdiagnozuj problem z K8s podami",  # Polish: "Diagnose a problem with K8s pods"
    small_llm="ollama/qwen2.5:3b",
    large_llm="gpt-5.4-mini",
    pipeline="structure",
    user_context={"cluster": "k8s-prod", "namespace": "backend"},
)
```

### 3. Business Automation

```python
result = await preprocess_and_execute(
    query="Zautomatyzuj kalkulację leasingu dla camper van",  # Polish: "Automate lease calculation for a camper van"
    small_llm="ollama/qwen2.5:3b",
    large_llm="anthropic/claude-sonnet-4-20250514",
    pipeline="enrich",
    user_context="PL_automotive_leasing",
)
```

# configs/prellm_config.yaml
small_model:
  model: "ollama/qwen2.5:3b"
  fallback: ["phi3:mini"]
  max_tokens: 512

large_model:
  model: "gpt-5.4-mini"
  fallback: ["llama3", "mistral"]
  max_tokens: 2048

default_strategy: classify

domain_rules:
  - name: production_deploy
    keywords: ["deploy", "push", "release"]
    required_fields: ["environment", "version"]
    severity: critical
    strategy: structure

Per-Domain Defaults

Ready-to-use configs in configs/defaults/:

| Domain | File | Covers |
|--------|------|--------|
| DevOps | `configs/defaults/devops.yaml` | deploy, K8s, monitoring, CI/CD |
| Coding | `configs/defaults/coding.yaml` | refactoring, review, debugging |
| Business | `configs/defaults/business.yaml` | leasing, invoicing, compliance |
| Embedded | `configs/defaults/embedded.yaml` | RPi, ESP32, sensors, IoT |

Process Chains (DevOps Workflows)

from prellm import PreLLM, ProcessChain

engine = PreLLM("configs/prellm_config.yaml")
chain = ProcessChain("configs/deploy.yaml", engine=engine)
result = await chain.execute(env="production", dry_run=True)

for step in result.steps:
    print(f"{step.step_name}: {step.status}")
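
To illustrate what a dry run over a chain of steps can mean, the sketch below walks a list of named async actions and records a status per step, without executing anything when `dry_run=True`. The `run_chain` function, the `StepResult` class, and the status strings are assumptions made for this example; they are not the `ProcessChain` implementation.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Tuple

Step = Tuple[str, Callable[[], Awaitable[None]]]  # hypothetical: (name, async action)


@dataclass
class StepResult:
    step_name: str
    status: str


async def run_chain(steps: List[Step], dry_run: bool = True) -> List[StepResult]:
    """Run steps in order and stop at the first failure. Dry runs execute nothing."""
    results: List[StepResult] = []
    for name, action in steps:
        if dry_run:
            results.append(StepResult(name, "dry_run"))
            continue
        try:
            await action()
        except Exception as exc:
            results.append(StepResult(name, f"failed: {exc}"))
            break
        results.append(StepResult(name, "ok"))
    return results


async def build() -> None:
    await asyncio.sleep(0)  # placeholder for real work


async def release() -> None:
    raise RuntimeError("registry unreachable")  # simulated failure


steps: List[Step] = [("build", build), ("release", release)]
for step in asyncio.run(run_chain(steps, dry_run=True)):
    print(f"{step.step_name}: {step.status}")
```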

Architecture

preprocess_and_execute(query, small_llm, large_llm, strategy= | pipeline=)
    │
    ├── [strategy path — v0.2]
    │   ├── ContextEngine (env/git/system)
    │   ├── QueryDecomposer (small LLM)
    │   │   └── classify → structure → split → enrich → compose
    │   └── LLMProvider (large LLM via litellm)
    │
    ├── [pipeline path — v0.3]
    │   ├── PreprocessorAgent
    │   │   ├── PromptRegistry (YAML, Jinja2)
    │   │   ├── PromptPipeline (YAML-configurable steps)
    │   │   │   ├── LLM steps (small LLM calls)
    │   │   │   └── Algorithmic steps (validation, formatting)
    │   │   └── ContextEngine + UserMemory (SQLite)
    │   ├── ExecutorAgent
    │   │   ├── LLMProvider (large LLM via litellm)
    │   │   └── ResponseValidator (YAML schemas)
    │   └── 100+ models via LiteLLM
    │
    └── PreLLMResponse (Pydantic v2 validated)

Examples

Ready-to-run examples in examples/:

| Example | File | Config |
|---------|------|--------|
| Quick Start | `examples/quick_start.py` | default env (no config) |
| K8s Debugging | `examples/k8s_debug.py` | `configs/domains/devops_k8s.yaml` |
| Polish Leasing | `examples/polish_leasing.py` | `configs/domains/polish_finance.yaml` |
| Embedded/IoT | `examples/embedded_refactor.py` | `configs/domains/embedded.yaml` |
| Providers | `examples/providers.py` | env keys per provider |
| Python SDK | `examples/python_sdk.py` | env keys per provider |
| CLI + API | `examples/cli_examples.sh`, `examples/curl_api.sh` | server running |

Run single example

python examples/quick_start.py
python examples/k8s_debug.py
python examples/polish_leasing.py

CLI + curl demos (server must be running)

bash examples/cli_examples.sh
bash examples/curl_api.sh


### K8s Debugging

```python
from prellm import preprocess_and_execute

result = await preprocess_and_execute(
    query="Pod backend-api restartuje sie z CrashLoopBackOff",  # Polish: "Pod backend-api keeps restarting with CrashLoopBackOff"
    config_path="configs/domains/devops_k8s.yaml",
    strategy="structure",
    user_context={"cluster": "k8s-prod", "namespace": "backend"},
)
```

Polish Leasing Calculator

result = await preprocess_and_execute(
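    # Polish: "Calculate the operating-lease installment for a camper van at 250000 PLN net, 48 months"
    query="Oblicz rate leasingu operacyjnego camper van za 250000 PLN netto, 48 miesiecy",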
    config_path="configs/domains/polish_finance.yaml",
    strategy="structure",
)

Embedded/IoT Refactoring

result = await preprocess_and_execute(
    # Polish: "Refactor ESP32 monitoring: too many hardcoded values, no OTA"
    query="Zrefaktoruj ESP32 monitoring - za duzo hardcode'ow, brak OTA",
    config_path="configs/domains/embedded.yaml",
    strategy="structure",
    user_context={"mcu": "ESP32-S3", "flash": "8MB", "ram": "512KB"},
)

Documentation

| Doc | Description |
|-----|-------------|
| Persistent Context | v0.4 architecture: RuntimeContext, auto-env, codebase compression |
| Session Persistence | Export/import sessions, RAG retrieval, auto-inject, auto-learn |
| Sensitive Data | 3-level filtering (safe/masked/blocked), YAML rules, integration |
| Flow Graphs | Mermaid diagrams: pipelines, context flow, streaming, Docker |
| CHANGELOG | Version history with detailed v0.4.0 entry |
| ROADMAP | 12-month plan with completed milestones |

Development

git clone https://github.com/wronai/prellm
cd prellm
poetry install
poetry run pytest                 # 376 tests (core + v0.4 context + examples)
poetry run pytest --cov           # coverage report
poetry run ruff check prellm/     # linting

Roadmap

See ROADMAP.md for the full plan.

License

Licensed under Apache-2.0.

Status

Last updated by taskill at 2026-04-25 13:43 UTC

| Metric | Value |
|--------|-------|
| HEAD | `5a7fbf0` |
| Coverage | — |
| Failing tests | — |
| Commits in last cycle | 50 |

Multiple commits introduce a configuration management system, deepen the code analysis engine (with supporting modules), and add CLI/env improvements (AppDefaults, LiteLLMEnv, delegate env file I/O). Documentation and tests were updated and several refactors applied across config, docs, examples, and tests.
