Model Quartermaster (MQM)

MQM is a learning-based model selection engine that dynamically routes each request to the most appropriate LLM based on task characteristics, historical performance, cost constraints, and learned patterns. It is implemented in packages/infra/src/model-quartermaster/ and registered as a pipeline hook at the pre-llm and post-llm stages.

The legacy Quartermaster (packages/infra/src/quartermaster/) handles tool orchestration learning (predicting which tool to use next) and is documented separately.

Architecture

6-Signal Prediction Engine

Signal	Source	Purpose
Historical performance	Per-task category stats	Which model performed best for this task type
Episodic memory hits	Memory system	Similar past requests and their outcomes
Cost optimization	Provider cost data	Token/$ efficiency
Quality estimation	Reflection feedback	Confidence and correctness scores
Trajectory patterns	Recent model usage	Context from the current session
Reflection feedback	Post-turn analysis	Self-assessment of response quality

Signals are fused via weighted combination to predict the best model before each LLM call.
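As a minimal sketch of what weighted fusion could look like (not the actual MQM implementation): each signal scores every candidate model in [0, 1], and the scores are combined using normalized signal weights. The function names, the signal keys, and the normalization step are assumptions made for illustration.

```python
from typing import Dict, Tuple


def fuse_signals(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Combine per-signal model scores into one fused score per model.

    scores:  signal name -> {model name -> score in [0, 1]}
    weights: signal name -> non-negative weight (hypothetical layout)
    """
    # Assumption: weights are normalized over the signals that are present.
    total_weight = sum(weights.get(signal, 0.0) for signal in scores)
    if total_weight <= 0:
        return {}

    fused: Dict[str, float] = {}
    for signal, per_model in scores.items():
        share = weights.get(signal, 0.0) / total_weight
        for model, score in per_model.items():
            # A model that a signal does not score simply gets no contribution.
            fused[model] = fused.get(model, 0.0) + share * score
    return fused


def best_model(fused: Dict[str, float]) -> Tuple[str, float]:
    """Return the (model, fused score) pair with the highest fused score."""
    return max(fused.items(), key=lambda item: item[1])
```

In this sketch the fused score of the winning model would play the role of the prediction confidence; how MQM actually derives confidence is not specified here.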

Decision Modes

Mode	Confidence	Behavior
`enforce`	> 0.85	Override model selection entirely
`suggest`	> 0.65	Inject hint into system prompt
`defer`	≤ 0.65	Use default provider
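The thresholds in the table can be read as a simple cascade. The sketch below assumes strict greater-than comparisons, as the table suggests; the function and parameter names are hypothetical.

```python
def decision_mode(
    confidence: float,
    enforce_confidence: float = 0.85,
    suggest_confidence: float = 0.65,
) -> str:
    """Map a prediction confidence to one of the three decision modes."""
    if confidence > enforce_confidence:
        return "enforce"  # override model selection entirely
    if confidence > suggest_confidence:
        return "suggest"  # inject a hint into the system prompt
    return "defer"        # keep the default provider
```

A confidence of exactly 0.85 therefore falls into `suggest`, and exactly 0.65 into `defer`.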

Adaptive Learning

Signal weights update via EMA (Exponential Moving Average):

new_weight = old_weight + learning_rate × (reward - old_weight)

The learning rate decays with the number of observations: 0.05 × 0.995^observations. Updates are driven by quality and cost-efficiency feedback collected after each turn.
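The update rule and decay schedule above can be sketched as follows. The constants come from the formulas in this section; the function names are illustrative assumptions, and the reward is assumed to be a value in [0, 1].

```python
BASE_LEARNING_RATE = 0.05
DECAY_PER_OBSERVATION = 0.995


def learning_rate(observations: int) -> float:
    """Learning rate after a given number of observations."""
    return BASE_LEARNING_RATE * DECAY_PER_OBSERVATION ** observations


def update_weight(old_weight: float, reward: float, observations: int) -> float:
    """One EMA step: move the weight toward the reward by the current rate."""
    rate = learning_rate(observations)
    return old_weight + rate * (reward - old_weight)
```

Because the rate shrinks geometrically, early observations move a weight noticeably while later ones only fine-tune it.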

Observation-First Startup

MQM starts in observe-only mode: it records performance data for the first 10 LLM calls and only then begins making predictions. The low threshold lets learned routing take effect after just a handful of turns. The threshold is configurable via observeThreshold.
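A minimal sketch of such an observation gate, assuming predictions become active once the configured number of calls has been recorded. The class and attribute names are hypothetical.

```python
class ObservationGate:
    """Tracks recorded LLM calls and reports when predictions may start."""

    def __init__(self, observe_threshold: int = 10) -> None:
        self.observe_threshold = observe_threshold
        self.calls_seen = 0

    def record_call(self) -> None:
        """Count one observed LLM call."""
        self.calls_seen += 1

    @property
    def active(self) -> bool:
        """True once enough calls have been observed to start predicting."""
        return self.calls_seen >= self.observe_threshold
```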

Arbiter Strategies

Strategy	Preference	Confidence Required	Best For
`conservative`	Cheaper models	High	Cost-sensitive deployments
`balanced`	Cost/quality balance	Standard	General use (default)
`aggressive`	Highest quality	Lower	Quality-critical tasks

Auto Model Selection Mode (v0.46+)

Auto mode enables per-turn dynamic model selection using a configurable model pool managed through the Quartermaster UI. This extends beyond MQM to allow explicit user control over available models.

How It Works

Model Pool — a list of {provider, model, enabled} entries curated in the Quartermaster settings UI
Pool Filtering — at pre-llm time, only enabled models are considered
MQM Prediction — if enabled, MQM predicts the best model from the filtered pool
Heuristic Fallback — if MQM is disabled or confidence is low, falls back to complexity-based selection (simple → fast model, complex → powerful model)
Per-Turn Metadata — resolved model info is reported in the WebSocket done payload: requestedModelMode, resolvedProvider, resolvedModel, autoFallback, autoFallbackReason
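The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual resolver: the pool is assumed to be ordered from fastest to most powerful model, the prediction is passed in as a (provider, model, confidence) tuple, and the "predicted model not in pool" reason string is hypothetical. Only the metadata field names are taken from the list above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PoolEntry:
    provider: str
    model: str
    enabled: bool


def resolve_model(
    pool: List[PoolEntry],
    prediction: Optional[Tuple[str, str, float]],  # None means MQM is disabled
    is_complex: bool,
    min_confidence: float = 0.65,
) -> Dict[str, object]:
    """Pick a model for this turn and return the per-turn metadata."""
    enabled = [entry for entry in pool if entry.enabled]  # pool filtering
    if not enabled:
        raise ValueError("auto model pool has no enabled entries")

    if prediction is None:
        reason = "MQM disabled"
    else:
        provider, model, confidence = prediction
        in_pool = any(e.provider == provider and e.model == model for e in enabled)
        if confidence <= min_confidence:
            reason = "low confidence"
        elif not in_pool:
            reason = "predicted model not in pool"  # hypothetical reason string
        else:
            return {
                "requestedModelMode": "auto",
                "resolvedProvider": provider,
                "resolvedModel": model,
                "autoFallback": False,
                "autoFallbackReason": None,
            }

    # Heuristic fallback. Assumption: first entry is fastest, last is strongest.
    chosen = enabled[-1] if is_complex else enabled[0]
    return {
        "requestedModelMode": "auto",
        "resolvedProvider": chosen.provider,
        "resolvedModel": chosen.model,
        "autoFallback": True,
        "autoFallbackReason": reason,
    }
```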

Configuration

{
  "modelSelection": {
    "autoModelPool": [
      { "provider": "anthropic", "model": "claude-sonnet-4-5", "enabled": true },
      { "provider": "openai", "model": "gpt-4o", "enabled": true },
      { "provider": "ollama", "model": "llama3.2", "enabled": false }
    ]
  }
}

Chat UI

Model selector gains an Auto option alongside manual provider/model selection
When Auto is selected, the chat header displays the resolved model per turn
Auto fallback reasons (e.g., "MQM disabled", "low confidence") shown in warning toasts

Tool Prediction with Auto Mode

When Quartermaster tool prediction is active, the follow-up instruction after tool execution includes a hint suggesting which tool to use next based on learned patterns. This nudges the model toward productive actions (e.g., suggesting file_write when the model is stuck reading files).

Task Categorization

Requests are automatically classified into 5 categories using heuristic keyword matching:

Category	Triggers	Example Queries
`code`	Code blocks, function/class keywords, file paths	"Write a Python script to..."
`analysis`	Data, summarize, evaluate, compare	"Analyze this log output..."
`creative`	Write, design, create, story	"Write a blog post about..."
`factual`	What is, who, when, define, explain	"What is the capital of..."
`conversation`	Greetings, thanks, general chat	"Hello, how are you?"
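A keyword heuristic of this kind might look like the sketch below. The trigger words are the ones listed in the table; the regular expressions, the file-extension list, the evaluation order (table order, with `conversation` as the default), and the function name are assumptions for illustration.

```python
import re

# Hypothetical patterns derived from the trigger column of the table.
CODE_PATTERN = re.compile(
    r"```|\b(?:function|class)\b|\b[\w-]+\.(?:py|ts|js|json)\b",
    re.IGNORECASE,
)
KEYWORD_PATTERNS = [
    ("analysis", re.compile(r"\b(?:data|summarize|evaluate|compare)\b", re.IGNORECASE)),
    ("creative", re.compile(r"\b(?:write|design|create|story)\b", re.IGNORECASE)),
    ("factual", re.compile(r"\b(?:what is|who|when|define|explain)\b", re.IGNORECASE)),
]


def categorize(message: str) -> str:
    """Classify a request into one of the five task categories."""
    if CODE_PATTERN.search(message):
        return "code"
    for category, pattern in KEYWORD_PATTERNS:
        if pattern.search(message):
            return category
    # Nothing matched: treat the message as general chat.
    return "conversation"
```

First-match-wins ordering keeps the sketch simple, but it means a request such as "Write a function..." is classified as `code` rather than `creative` only because the code check runs first.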

Context Fingerprinting

Multi-feature context extraction for pattern matching: message length, code detection, question count, complexity estimation, tool round depth, file count, error context, and session age.
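To make the feature list concrete, here is a hedged sketch of a fingerprint record and an extractor. The field names mirror the features listed above, but the detection rules (fence detection for code, counting question marks, the error keywords, and the complexity cutoffs) are illustrative assumptions rather than the actual heuristics.

```python
import re
from dataclasses import dataclass

ERROR_PATTERN = re.compile(r"\b(?:error|exception|traceback)\b", re.IGNORECASE)


@dataclass(frozen=True)
class ContextFingerprint:
    message_length: int
    has_code: bool
    question_count: int
    complexity: str          # "simple" or "complex" (hypothetical scale)
    tool_round_depth: int
    file_count: int
    has_error_context: bool
    session_age_seconds: float


def fingerprint(
    message: str,
    tool_round_depth: int = 0,
    file_count: int = 0,
    session_age_seconds: float = 0.0,
) -> ContextFingerprint:
    """Extract a context fingerprint from a message and session counters."""
    has_code = "```" in message
    # Assumed rule: long messages, code, or deep tool rounds count as complex.
    is_complex = len(message) > 500 or has_code or tool_round_depth > 2
    return ContextFingerprint(
        message_length=len(message),
        has_code=has_code,
        question_count=message.count("?"),
        complexity="complex" if is_complex else "simple",
        tool_round_depth=tool_round_depth,
        file_count=file_count,
        has_error_context=ERROR_PATTERN.search(message) is not None,
        session_age_seconds=session_age_seconds,
    )
```

Making the record frozen (and therefore hashable) would allow it to serve as a lookup key when matching learned patterns.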

Database Schema

5 tables in cortex.db (migration 019):

Table	Purpose
`mqm_model_stats`	Per-model performance: success rate, avg tokens, avg cost, task category breakdown
`mqm_signal_weights`	Learned signal importance with EMA history
`mqm_decisions`	Full audit trail: predicted vs actual, confidence, mode, correctness
`mqm_session_state`	Per-session tracking and context
`mqm_patterns`	Learned model selection patterns by task + context

Pipeline Integration

MQM runs as a pipeline hook (@cortex/model-quartermaster, priority 5):

pre-llm — computes prediction, enforces model override if confidence > 0.85
post-llm — records outcome, updates weights, logs lens audit event
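The two stages can be pictured with the following sketch. The hook name, priority, and the 0.85 enforcement threshold come from this section; the class shape, method signatures, and the in-memory outcome list are hypothetical stand-ins for the real pipeline interface and database writes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Prediction:
    model: str
    confidence: float


@dataclass
class ModelQuartermasterHook:
    name: str = "@cortex/model-quartermaster"
    priority: int = 5
    enforce_confidence: float = 0.85
    outcomes: List[Dict[str, object]] = field(default_factory=list)

    def pre_llm(self, default_model: str, prediction: Optional[Prediction]) -> str:
        """Return the model to call, overriding only on high confidence."""
        if prediction is not None and prediction.confidence > self.enforce_confidence:
            return prediction.model
        return default_model

    def post_llm(self, model: str, success: bool) -> None:
        """Record the outcome; weight updates and audit logging would go here."""
        self.outcomes.append({"model": model, "success": success})
```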

Observatory Events

5 lens audit event types:

mqm_prediction — model selection prediction made
mqm_observation — outcome recorded after LLM call
mqm_weight_updated — signal weight adjusted
mqm_pattern_learned — new pattern detected
mqm_mode_changed — arbiter strategy or mode switched

CLI

cortex mqm stats            # Performance statistics per model
cortex mqm decisions        # Recent routing decisions
cortex mqm weights          # Current signal weights
cortex mqm accuracy         # Prediction accuracy metrics

Web UI

The Quartermaster unified page provides three panes:

Tool Orchestration — QM patterns, decisions, tool stats (from legacy Quartermaster)
Model Intelligence — MQM model stats, accuracy trends chart, signal weight bars, top-10 tool stats with success rate bars
Settings — Enable/disable Model Intelligence, pin dedicated QM provider + model, choose strategy, observe threshold

Configuration

{
  "modelSelection": {
    "enabled": false,
    "mode": "balanced",
    "observeThreshold": 50,
    "enforceConfidence": 0.85,
    "suggestConfidence": 0.65,
    "costBudget": 1.0,
    "qualityThreshold": 0.7,
    "allowedProviders": ["anthropic", "openai", "ollama"],
    "quartermasterProvider": "google",
    "quartermasterModel": "gemini-2.0-flash",
    "autoModelPool": [
      { "provider": "anthropic", "model": "claude-sonnet-4-5", "enabled": true },
      { "provider": "openai", "model": "gpt-4o", "enabled": true }
    ]
  }
}
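A configuration like the one above could be sanity-checked before use. The sketch below is an assumption about which checks are useful, not a description of built-in validation: it verifies that `mode` is one of the three arbiter strategies, that the two confidence thresholds are ordered, and that each pool entry carries the three documented keys. The defaults used when a key is absent are taken from the sample values.

```python
from typing import Any, Dict, List

STRATEGIES = {"conservative", "balanced", "aggressive"}
REQUIRED_POOL_KEYS = {"provider", "model", "enabled"}


def validate_model_selection(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means no issues found."""
    problems: List[str] = []
    selection = config.get("modelSelection", {})

    mode = selection.get("mode", "balanced")
    if mode not in STRATEGIES:
        problems.append(f"unknown mode: {mode!r}")

    enforce = selection.get("enforceConfidence", 0.85)
    suggest = selection.get("suggestConfidence", 0.65)
    if not 0 <= suggest < enforce <= 1:
        problems.append("expected 0 <= suggestConfidence < enforceConfidence <= 1")

    for index, entry in enumerate(selection.get("autoModelPool", [])):
        missing = REQUIRED_POOL_KEYS - set(entry)
        if missing:
            problems.append(f"autoModelPool[{index}] is missing {sorted(missing)}")

    return problems
```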

API Endpoints

Method	Path	Description
`GET`	`/api/qm/config`	Current MQM config
`POST`	`/api/qm/config`	Update MQM config
`POST`	`/api/qm/reset`	Reset all learned state
`GET`	`/api/mqm/summary`	High-level MQM summary
`GET`	`/api/mqm/accuracy`	Prediction accuracy
`GET`	`/api/mqm/stats`	Per-model statistics
`GET`	`/api/mqm/decisions`	Decision audit log
`GET`	`/api/mqm/weights`	Current signal weights
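For scripted access to the read-only MQM endpoints, a small client along these lines may be enough. The paths are the ones listed in the table; the base URL is whatever address your deployment listens on, and the assumption that responses are JSON is not confirmed by this page.

```python
import json
from typing import Any
from urllib.request import urlopen

MQM_RESOURCES = ("summary", "accuracy", "stats", "decisions", "weights")


def mqm_url(base_url: str, resource: str) -> str:
    """Build the URL for one of the GET /api/mqm/* endpoints."""
    if resource not in MQM_RESOURCES:
        raise ValueError(f"unknown MQM resource: {resource!r}")
    return f"{base_url.rstrip('/')}/api/mqm/{resource}"


def fetch_mqm(base_url: str, resource: str, timeout: float = 10.0) -> Any:
    """GET an MQM endpoint and parse the body (assumed to be JSON)."""
    with urlopen(mqm_url(base_url, resource), timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_mqm(base_url, "weights")` would request the current signal weights, the same data that `cortex mqm weights` reports on the command line.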
