Model Quartermaster
Model Quartermaster (MQM) is a learning-based model selection engine that dynamically routes each request to the most appropriate LLM based on task characteristics, historical performance, cost constraints, and learned patterns. It is implemented in `packages/infra/src/model-quartermaster/` and registered as a pipeline hook at the `pre-llm` and `post-llm` stages.
The legacy Quartermaster (packages/infra/src/quartermaster/) handles tool orchestration learning (predicting which tool to use next) and is documented separately.
| Signal | Source | Purpose |
|---|---|---|
| Historical performance | Per-task category stats | Which model performed best for this task type |
| Episodic memory hits | Memory system | Similar past requests and their outcomes |
| Cost optimization | Provider cost data | Token/$ efficiency |
| Quality estimation | Reflection feedback | Confidence and correctness scores |
| Trajectory patterns | Recent model usage | Context from the current session |
| Reflection feedback | Post-turn analysis | Self-assessment of response quality |
Signals are fused via weighted combination to predict the best model before each LLM call.
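
A weighted combination of this kind can be sketched as follows. This is a minimal illustration, not MQM's actual code: the `fuseSignals` name, the per-model score maps, and the choice to report the normalized winning score as the confidence are all assumptions.

```typescript
// Hypothetical shape: each signal scores every candidate model in [0, 1].
type SignalScores = Record<string, number>;

interface Prediction {
  model: string;
  confidence: number;
}

// Combine per-signal scores into one weighted score per model and
// return the highest-scoring model, or null when nothing can be scored.
function fuseSignals(
  signals: Record<string, SignalScores>,
  weights: Record<string, number>,
): Prediction | null {
  const totals: Record<string, number> = {};
  let weightSum = 0;

  for (const [signal, scores] of Object.entries(signals)) {
    const weight = weights[signal] ?? 0; // unknown signals contribute nothing
    weightSum += weight;
    for (const [model, score] of Object.entries(scores)) {
      totals[model] = (totals[model] ?? 0) + weight * score;
    }
  }
  if (weightSum === 0) return null;

  let best: Prediction | null = null;
  for (const [model, total] of Object.entries(totals)) {
    const confidence = total / weightSum; // normalize back into [0, 1]
    if (best === null || confidence > best.confidence) {
      best = { model, confidence };
    }
  }
  return best;
}
```

Normalizing by the sum of weights keeps the result comparable to the confidence thresholds even as individual weights drift during learning.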
| Mode | Confidence | Behavior |
|---|---|---|
| `enforce` | > 0.85 | Override model selection entirely |
| `suggest` | > 0.65 | Inject hint into system prompt |
| `defer` | ≤ 0.65 | Use default provider |
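
The thresholds in the table map onto a simple comparison chain. The sketch below assumes a hypothetical `selectMode` helper; its default values mirror the `enforceConfidence` and `suggestConfidence` settings shown in the configuration.

```typescript
type Mode = "enforce" | "suggest" | "defer";

// Map a prediction confidence to an operating mode.
// Both comparisons are strict, so exactly 0.65 defers and exactly 0.85 suggests.
function selectMode(
  confidence: number,
  enforceConfidence = 0.85,
  suggestConfidence = 0.65,
): Mode {
  if (confidence > enforceConfidence) return "enforce";
  if (confidence > suggestConfidence) return "suggest";
  return "defer";
}
```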
Signal weights update via EMA (Exponential Moving Average):
new_weight = old_weight + learning_rate × (reward - old_weight)
The learning rate decays with the number of observations: 0.05 × 0.995^observations. Updates are driven by quality and cost-efficiency feedback after each turn.
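
As a worked illustration of the update rule and the decay schedule, here is a short sketch. The function names are hypothetical, and the reward is assumed to be a value in [0, 1] derived from the per-turn feedback.

```typescript
const BASE_LEARNING_RATE = 0.05;
const DECAY = 0.995;

// Learning rate after a given number of observations: 0.05 * 0.995^observations.
function learningRate(observations: number): number {
  return BASE_LEARNING_RATE * Math.pow(DECAY, observations);
}

// One EMA step: move the weight toward the reward by a fraction
// equal to the current learning rate.
function updateWeight(oldWeight: number, reward: number, observations: number): number {
  return oldWeight + learningRate(observations) * (reward - oldWeight);
}
```

With no prior observations, a weight of 0.5 and a reward of 1.0 move the weight to 0.5 + 0.05 × 0.5 = 0.525. Later updates move it by progressively smaller steps as the learning rate decays.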
MQM starts in observe-only mode. It records performance data for the first 10 LLM calls before activating and making predictions; the window is configurable through `observeThreshold`. The short warm-up lets MQM start learning quickly and become productive after just a handful of turns.
| Strategy | Preference | Confidence Required | Best For |
|---|---|---|---|
| `conservative` | Cheaper models | High | Cost-sensitive deployments |
| `balanced` | Cost/quality balance | Standard | General use (default) |
| `aggressive` | Highest quality | Lower | Quality-critical tasks |
Auto mode enables per-turn dynamic model selection using a configurable model pool managed through the Quartermaster UI. This extends beyond MQM to allow explicit user control over available models.
- Model Pool: a list of `{provider, model, enabled}` entries curated in the Quartermaster settings UI
- Pool Filtering: at `pre-llm` time, only enabled models are considered
- MQM Prediction: if enabled, MQM predicts the best model from the filtered pool
- Heuristic Fallback: if MQM is disabled or confidence is low, falls back to complexity-based selection (simple → fast model, complex → powerful model)
- Per-Turn Metadata: resolved model info is reported in the WebSocket `done` payload: `requestedModelMode`, `resolvedProvider`, `resolvedModel`, `autoFallback`, `autoFallbackReason`
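
The steps above can be sketched as a single resolution function. Everything except the payload field names and the fallback reason strings is an assumption made for illustration: the `Predictor` signature, the `0.65` cut-off for "low confidence", the `"auto"` value of `requestedModelMode`, and the convention that the pool is ordered from fastest to most powerful.

```typescript
interface PoolEntry {
  provider: string;
  model: string;
  enabled: boolean;
}

// Hypothetical predictor: picks one candidate and reports a confidence in [0, 1].
type Predictor = (candidates: PoolEntry[]) => { entry: PoolEntry; confidence: number };

interface TurnMetadata {
  requestedModelMode: string;
  resolvedProvider: string;
  resolvedModel: string;
  autoFallback: boolean;
  autoFallbackReason?: string;
}

function resolveAutoModel(
  pool: PoolEntry[],
  predictor: Predictor | null, // null stands for "MQM disabled"
  isComplex: boolean,
  minConfidence = 0.65, // assumed threshold for "low confidence"
): TurnMetadata {
  // Pool filtering: only enabled entries are candidates.
  const candidates = pool.filter((entry) => entry.enabled);
  if (candidates.length === 0) {
    throw new Error("auto model pool has no enabled entries");
  }

  // MQM prediction over the filtered pool.
  if (predictor !== null) {
    const { entry, confidence } = predictor(candidates);
    if (confidence > minConfidence) {
      return {
        requestedModelMode: "auto",
        resolvedProvider: entry.provider,
        resolvedModel: entry.model,
        autoFallback: false,
      };
    }
  }

  // Heuristic fallback. Assumes candidates are ordered fastest to most powerful.
  const entry = isComplex ? candidates[candidates.length - 1] : candidates[0];
  return {
    requestedModelMode: "auto",
    resolvedProvider: entry.provider,
    resolvedModel: entry.model,
    autoFallback: true,
    autoFallbackReason: predictor === null ? "MQM disabled" : "low confidence",
  };
}
```

Returning the metadata object directly from the resolver keeps the fallback reason next to the decision that produced it, which is what the chat header and warning toasts need.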
```json
{
  "modelSelection": {
    "autoModelPool": [
      { "provider": "anthropic", "model": "claude-sonnet-4-5", "enabled": true },
      { "provider": "openai", "model": "gpt-4o", "enabled": true },
      { "provider": "ollama", "model": "llama3.2", "enabled": false }
    ]
  }
}
```

- Model selector gains an Auto option alongside manual provider/model selection
- When Auto is selected, the chat header displays the resolved model per turn
- Auto fallback reasons (e.g., "MQM disabled", "low confidence") shown in warning toasts
When Quartermaster tool prediction is active, the follow-up instruction after tool execution includes a hint suggesting which tool to use next based on learned patterns. This nudges the model toward productive actions (e.g., suggesting file_write when the model is stuck reading files).
Requests are automatically classified into 5 categories using heuristic keyword matching:
| Category | Triggers | Example Queries |
|---|---|---|
| `code` | Code blocks, function/class keywords, file paths | "Write a Python script to..." |
| `analysis` | Data, summarize, evaluate, compare | "Analyze this log output..." |
| `creative` | Write, design, create, story | "Write a blog post about..." |
| `factual` | What is, who, when, define, explain | "What is the capital of..." |
| `conversation` | Greetings, thanks, general chat | "Hello, how are you?" |
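
A heuristic keyword classifier along these lines might look like the sketch below. The regular expressions, the first-match-wins ordering, the file-extension list, and the `conversation` default are illustrative assumptions built only from the triggers in the table. The real matcher evidently recognizes more cues, since the table files "Write a Python script to..." under `code`.

```typescript
type Category = "code" | "analysis" | "creative" | "factual" | "conversation";

// Rules are checked in order; the first matching pattern wins.
const RULES: [Category, RegExp][] = [
  // Fenced code block, function/class keyword, or a path with a common source extension.
  ["code", /`{3}|\b(function|class)\b|[\w./-]+\.(ts|js|py|rs|go)\b/i],
  ["analysis", /\b(data|summarize|evaluate|compare)\b/i],
  ["creative", /\b(write|design|create|story)\b/i],
  ["factual", /\b(what is|who|when|define|explain)\b/i],
];

function classify(message: string): Category {
  for (const [category, pattern] of RULES) {
    if (pattern.test(message)) return category;
  }
  // Greetings, thanks, and anything unmatched fall through to general chat.
  return "conversation";
}
```

Because the first match wins, rule order acts as a priority list: a message that mentions both a file path and the word "compare" is treated as `code`.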
For pattern matching, MQM extracts multiple context features from each request: message length, code detection, question count, complexity estimation, tool round depth, file count, error context, and session age.
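
To make the feature list concrete, here is a hedged sketch of what such an extractor could look like. The `TurnContext` and `Features` shapes and every field name are hypothetical. Complexity estimation is left out because the source does not describe how it is computed.

```typescript
// Hypothetical per-turn context supplied by the agent loop.
interface TurnContext {
  toolRoundDepth: number;
  fileCount: number;
  hasErrorContext: boolean;
  sessionStartedAt: number; // epoch milliseconds
}

interface Features {
  messageLength: number;
  hasCode: boolean;
  questionCount: number;
  toolRoundDepth: number;
  fileCount: number;
  hasErrorContext: boolean;
  sessionAgeMs: number;
}

function extractFeatures(message: string, ctx: TurnContext, now: number = Date.now()): Features {
  return {
    messageLength: message.length,
    // Crude code detection: a fenced block or a function/class keyword.
    hasCode: /`{3}|\b(function|class)\b/.test(message),
    questionCount: (message.match(/\?/g) ?? []).length,
    toolRoundDepth: ctx.toolRoundDepth,
    fileCount: ctx.fileCount,
    hasErrorContext: ctx.hasErrorContext,
    sessionAgeMs: now - ctx.sessionStartedAt,
  };
}
```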
5 tables in cortex.db (migration 019):
| Table | Purpose |
|---|---|
| `mqm_model_stats` | Per-model performance: success rate, avg tokens, avg cost, task category breakdown |
| `mqm_signal_weights` | Learned signal importance with EMA history |
| `mqm_decisions` | Full audit trail: predicted vs actual, confidence, mode, correctness |
| `mqm_session_state` | Per-session tracking and context |
| `mqm_patterns` | Learned model selection patterns by task + context |
MQM runs as a pipeline hook (@cortex/model-quartermaster, priority 5):
- `pre-llm`: computes prediction, enforces model override if confidence > 0.85
- `post-llm`: records outcome, updates weights, logs lens audit event
5 lens audit event types:
- `mqm_prediction`: model selection prediction made
- `mqm_observation`: outcome recorded after LLM call
- `mqm_weight_updated`: signal weight adjusted
- `mqm_pattern_learned`: new pattern detected
- `mqm_mode_changed`: arbiter strategy or mode switched
```bash
cortex mqm stats       # Performance statistics per model
cortex mqm decisions   # Recent routing decisions
cortex mqm weights     # Current signal weights
cortex mqm accuracy    # Prediction accuracy metrics
```

The Quartermaster unified page provides three panes:
- Tool Orchestration — QM patterns, decisions, tool stats (from legacy Quartermaster)
- Model Intelligence — MQM model stats, accuracy trends chart, signal weight bars, top-10 tool stats with success rate bars
- Settings — Enable/disable Model Intelligence, pin dedicated QM provider + model, choose strategy, observe threshold
```json
{
  "modelSelection": {
    "enabled": false,
    "mode": "balanced",
    "observeThreshold": 50,
    "enforceConfidence": 0.85,
    "suggestConfidence": 0.65,
    "costBudget": 1.0,
    "qualityThreshold": 0.7,
    "allowedProviders": ["anthropic", "openai", "ollama"],
    "quartermasterProvider": "google",
    "quartermasterModel": "gemini-2.0-flash",
    "autoModelPool": [
      { "provider": "anthropic", "model": "claude-sonnet-4-5", "enabled": true },
      { "provider": "openai", "model": "gpt-4o", "enabled": true }
    ]
  }
}
```

| Method | Path | Description |
|---|---|---|
| `GET` | `/api/qm/config` | Current MQM config |
| `POST` | `/api/qm/config` | Update MQM config |
| `POST` | `/api/qm/reset` | Reset all learned state |
| `GET` | `/api/mqm/summary` | High-level MQM summary |
| `GET` | `/api/mqm/accuracy` | Prediction accuracy |
| `GET` | `/api/mqm/stats` | Per-model statistics |
| `GET` | `/api/mqm/decisions` | Decision audit log |
| `GET` | `/api/mqm/weights` | Current signal weights |
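
As a rough sketch, the read-only `/api/mqm/*` endpoints could be queried with a small helper like the one below. The helper names are hypothetical, the base URL is whatever host and port the Cortex server listens on, and the responses are typed as `unknown` because their shapes are not documented here.

```typescript
type MqmResource = "summary" | "accuracy" | "stats" | "decisions" | "weights";

// Minimal structural types so a fake can stand in for fetch in tests.
interface JsonResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string) => Promise<JsonResponse>;

// Build the endpoint URL, tolerating trailing slashes on the base URL.
function mqmUrl(baseUrl: string, resource: MqmResource): string {
  return `${baseUrl.replace(/\/+$/, "")}/api/mqm/${resource}`;
}

// GET one of the MQM endpoints and return the parsed JSON body.
async function getMqm(
  baseUrl: string,
  resource: MqmResource,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const url = mqmUrl(baseUrl, resource);
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`GET ${url} failed with status ${response.status}`);
  }
  return response.json();
}
```

In Deno or a browser, the global `fetch` can be passed as `fetchImpl`, for example `await getMqm(baseUrl, "stats", fetch)`.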
- Model Routing — Cascade and threshold routing strategies
- LLM Providers — All 30 supported providers
- Configuration — MQM config options
- Agent Loop — How MQM integrates into agent turns
CortexPrism — Open-source AI agent operating system · Discord · Apache 2.0 License · Built with Deno 2.x + TypeScript