Commit 930ff96
Enhance multi-model token usage tracking
- Rename `--metrics` flag to `--tokens` for clarity
- Add `--cost` flag to enable cost estimation for each model
- Update README with comprehensive multi-model comparison example
- Include new CLI options in configuration and help documentation
- Improve documentation to highlight token and cost tracking benefits

This change introduces more granular insight into model interactions, allowing users to:

- Compare token consumption across different models
- Estimate computational costs
- Make informed decisions about model selection
- Understand resource utilization during multi-model comparisons
1 parent cc19594 commit 930ff96

10 files changed

Lines changed: 232 additions & 10 deletions


CHANGELOG.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -240,7 +240,7 @@ Now with the fix:
 ### Usage Examples
 ```bash
 # Multi-model chat now properly isolates each model's context
-bin/aia --chat --model lms/openai/gpt-oss-20b,ollama/gpt-oss:20b --metrics
+bin/aia --chat --model lms/openai/gpt-oss-20b,ollama/gpt-oss:20b --tokens
 
 > pick a random language and say hello
 # LMS: "Habari!" (Swahili)
@@ -375,7 +375,7 @@ aia --model ollama/llama3.2 --chat
 
 ## [0.9.13] 2025-09-02
 ### New Features
-- **NEW FEATURE**: Added `--metrics` flag to show token counts for each model
+- **NEW FEATURE**: Added `--tokens` flag to show token counts for each model
 - **NEW FEATURE**: Added `--cost` flag to enable cost estimation for each model
 
 ### Improvements
````

README.md

Lines changed: 78 additions & 0 deletions
````diff
@@ -79,6 +79,47 @@ For more information on AIA visit these locations:
 
 ```
 
+---
+
+## Concurrent Multi-Model Comparison
+
+One of AIA's most powerful features is the ability to send a single prompt to multiple AI models simultaneously and compare their responses side by side, complete with token usage and cost tracking.
+
+```bash
+# Compare responses from 3 models with token counts and cost estimates
+aia --chat -m gpt-4o,claude-3-5-sonnet,gemini-1.5-pro --tokens --cost
+```
+
+**Example output:**
+```
+You: What's the best approach for handling database migrations in a microservices architecture?
+
+from: gpt-4o
+Use a versioned migration strategy with backward compatibility...
+
+from: claude-3-5-sonnet
+Consider the Expand-Contract pattern for zero-downtime migrations...
+
+from: gemini-1.5-pro
+Implement a schema registry with event-driven synchronization...
+
+┌───────────────────┬──────────────┬───────────────┬─────────┐
+│ Model             │ Input Tokens │ Output Tokens │ Cost    │
+├───────────────────┼──────────────┼───────────────┼─────────┤
+│ gpt-4o            │ 156          │ 342           │ $0.0089 │
+│ claude-3-5-sonnet │ 156          │ 418           │ $0.0063 │
+│ gemini-1.5-pro    │ 156          │ 387           │ $0.0041 │
+└───────────────────┴──────────────┴───────────────┴─────────┘
+```
+
+**Why this matters:**
+- **Compare reasoning approaches** - See how different models tackle the same problem
+- **Identify blind spots** - One model might catch something others miss
+- **Cost optimization** - Find the best price/performance ratio for your use case
+- **Consensus building** - Use `--consensus` to synthesize the best answer from all models
+
+---
+
 <!-- Tocer[start]: Auto-generated, don't remove. -->
 
 ## Table of Contents
@@ -225,6 +266,8 @@ aia --fuzzy
 | `--list-roles` | List available role files | `aia --list-roles` |
 | `--output FILE` | Specify output file | `aia --output results.md` |
 | `--fuzzy` | Use fuzzy search for prompts | `aia --fuzzy` |
+| `--tokens` | Display token usage in chat mode | `aia --chat --tokens` |
+| `--cost` | Include cost calculations with token usage | `aia --chat --cost` |
 | `--help` | Show complete help | `aia --help` |
 
 ### Directory Structure
@@ -332,6 +375,8 @@ Your prompt content here...
 | system_prompt | --system-prompt | | AIA_SYSTEM_PROMPT |
 | temperature | -t, --temperature | 0.7 | AIA_LLM__TEMPERATURE |
 | terse | --terse | false | AIA_FLAGS__TERSE |
+| tokens | --tokens | false | AIA_FLAGS__TOKENS |
+| cost | --cost | false | AIA_FLAGS__COST |
 | tool_paths | --tools | [] | AIA_TOOLS__PATHS |
 | allowed_tools | --at, --allowed-tools | nil | AIA_TOOLS__ALLOWED |
 | rejected_tools | --rt, --rejected-tools | nil | AIA_TOOLS__REJECTED |
@@ -580,6 +625,39 @@ Model Details:
 - **Error Handling**: Invalid models are reported but don't prevent valid models from working
 - **Batch Mode Support**: Multi-model responses are properly formatted in output files
 
+#### Token Usage and Cost Tracking
+
+Monitor token consumption and estimate costs across all models with `--tokens` and `--cost`:
+
+```bash
+# Display token usage for each model
+aia my_prompt -m gpt-4o,claude-3-sonnet --tokens
+
+# Include cost estimates (automatically enables --tokens)
+aia my_prompt -m gpt-4o,claude-3-sonnet --cost
+
+# In chat mode with full tracking
+aia --chat -m gpt-4o,claude-3-sonnet,gemini-pro --cost
+```
+
+**Token Usage Output:**
+```
+from: gpt-4o
+Here's my analysis of the code...
+
+from: claude-3-sonnet
+Looking at this code, I notice...
+
+Tokens: gpt-4o: input=245, output=312 | claude-3-sonnet: input=245, output=287
+Cost: gpt-4o: $0.0078 | claude-3-sonnet: $0.0045 | Total: $0.0123
+```
+
+**Use Cases for Token/Cost Tracking:**
+- **Budget management** - Monitor API costs in real-time during development
+- **Model comparison** - Identify which models are most cost-effective for your tasks
+- **Optimization** - Find the right balance between response quality and cost
+- **Billing insights** - Track usage patterns across different model providers
+
 ### Local Model Support
 
 AIA supports running local AI models through Ollama and LM Studio, providing privacy, offline capability, and cost savings.
````

docs/cli-reference.md

Lines changed: 18 additions & 0 deletions
````diff
@@ -48,6 +48,24 @@ aia --terse my_prompt
 aia --terse --chat
 ```
 
+### `--tokens`
+Display token usage information after each response in chat mode. Shows input tokens, output tokens, and model ID.
+
+```bash
+aia --chat --tokens
+aia --chat --tokens --model gpt-4
+```
+
+### `--cost`
+Include cost calculations with token usage. Automatically enables `--tokens`. Shows estimated cost based on the model's pricing.
+
+```bash
+aia --chat --cost
+aia --chat --cost --model gpt-4,claude-3-sonnet
+```
+
+**Note**: `--cost` implies `--tokens`, so you don't need to specify both.
+
 ## Adapter Options
 
 ### `--adapter ADAPTER`
````

docs/configuration.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -135,6 +135,8 @@ export GOOGLE_API_KEY="your_key_here"
 export AIA_FLAGS__CHAT="true"
 export AIA_FLAGS__VERBOSE="true"
 export AIA_FLAGS__DEBUG="false"
+export AIA_FLAGS__TOKENS="true"
+export AIA_FLAGS__COST="true"
 
 # Output settings (nested under output:)
 export AIA_OUTPUT__FILE="/tmp/aia_output.md"
```

docs/guides/chat.md

Lines changed: 46 additions & 0 deletions
````diff
@@ -421,6 +421,52 @@ class ChatCommands < RubyLLM::Tool
 end
 ```
 
+## Token Usage and Cost Tracking
+
+### Displaying Token Usage
+Use the `--tokens` flag to see token usage after each response:
+
+```bash
+# Enable token usage display
+aia --chat --tokens
+
+# Example output after a response:
+# AI: Here's my response to your question...
+#
+# Tokens: input=125, output=89, model=gpt-4o-mini
+```
+
+### Cost Estimation
+Use the `--cost` flag to include cost calculations with token usage:
+
+```bash
+# Enable cost estimation (automatically enables --tokens)
+aia --chat --cost
+
+# Example output after a response:
+# AI: Here's my response to your question...
+#
+# Tokens: input=125, output=89, model=gpt-4o-mini
+# Cost: $0.0003 (input: $0.0002, output: $0.0001)
+```
+
+### Multi-Model Token Tracking
+When using multiple models, token usage is displayed for each model:
+
+```bash
+aia --chat --tokens --model gpt-4,claude-3-sonnet
+
+# Example output:
+# from: gpt-4
+# Here's my response...
+#
+# from: claude-3-sonnet
+# Here's my alternative response...
+#
+# Model: gpt-4 - Tokens: input=125, output=89
+# Model: claude-3-sonnet - Tokens: input=125, output=112
+```
+
 ## Troubleshooting Chat Mode
 
 ### Common Issues
````
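The per-response cost line documented above (`Cost: $0.0003 (input: $0.0002, output: $0.0001)`) is ordinary rate-times-tokens arithmetic. The following is a minimal standalone sketch of that calculation; the per-million-token rates in `RATES` are illustrative placeholders, not aia's actual pricing table, and `estimate_cost` is a hypothetical helper, not part of the aia codebase.

```ruby
# Sketch of the cost arithmetic behind the --cost output.
# RATES holds assumed USD prices per 1M tokens (placeholder values).
RATES = {
  "gpt-4o-mini" => { input: 0.15, output: 0.60 }
}.freeze

def estimate_cost(model, input_tokens, output_tokens)
  rate = RATES.fetch(model)
  input_cost  = input_tokens  * rate[:input]  / 1_000_000.0
  output_cost = output_tokens * rate[:output] / 1_000_000.0
  { input: input_cost, output: output_cost, total: input_cost + output_cost }
end

# Mirrors the token counts from the example output above.
cost = estimate_cost("gpt-4o-mini", 125, 89)
puts format("Cost: $%.4f (input: $%.4f, output: $%.4f)",
            cost[:total], cost[:input], cost[:output])
```

With small token counts the rounded figures collapse toward fractions of a cent, which is why the displayed values carry four decimal places.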

docs/guides/models.md

Lines changed: 78 additions & 0 deletions
````diff
@@ -167,6 +167,84 @@ Model Details:
 - **Error Handling**: Invalid models are reported but don't prevent valid models from working
 - **Batch Mode Support**: Multi-model responses are properly formatted in output files
 
+### Token Usage and Cost Tracking
+
+One of AIA's most powerful capabilities is real-time tracking of token usage and cost estimates across multiple models. This enables informed decisions about model selection based on both quality and cost.
+
+#### Enabling Token Tracking
+
+```bash
+# Display token usage for each model
+aia my_prompt -m gpt-4o,claude-3-sonnet --tokens
+
+# Include cost estimates (automatically enables --tokens)
+aia my_prompt -m gpt-4o,claude-3-sonnet --cost
+
+# In chat mode with full tracking
+aia --chat -m gpt-4o,claude-3-sonnet,gemini-pro --cost
+```
+
+#### Multi-Model Comparison with Metrics
+
+```bash
+# Compare 3 models with cost tracking
+aia --chat -m gpt-4o,claude-3-5-sonnet,gemini-1.5-pro --cost
+```
+
+**Example Output:**
+```
+You: Explain the CAP theorem and its implications for distributed databases.
+
+from: gpt-4o
+The CAP theorem states that a distributed system can only guarantee two of three properties...
+
+from: claude-3-5-sonnet
+CAP theorem, proposed by Eric Brewer, describes fundamental trade-offs in distributed systems...
+
+from: gemini-1.5-pro
+The CAP theorem is a cornerstone principle in distributed computing that states...
+
+┌───────────────────┬──────────────┬───────────────┬─────────┐
+│ Model             │ Input Tokens │ Output Tokens │ Cost    │
+├───────────────────┼──────────────┼───────────────┼─────────┤
+│ gpt-4o            │ 42           │ 287           │ $0.0068 │
+│ claude-3-5-sonnet │ 42           │ 312           │ $0.0053 │
+│ gemini-1.5-pro    │ 42           │ 298           │ $0.0038 │
+└───────────────────┴──────────────┴───────────────┴─────────┘
+Total: $0.0159
+```
+
+#### Use Cases for Token/Cost Tracking
+
+| Use Case | Description |
+|----------|-------------|
+| **Budget Management** | Monitor API costs in real-time during development |
+| **Model Evaluation** | Compare quality vs. cost across different providers |
+| **Cost Optimization** | Identify the most cost-effective model for your tasks |
+| **Usage Auditing** | Track token consumption for billing and optimization |
+| **A/B Testing** | Compare model performance with objective metrics |
+
+#### Combining with Consensus Mode
+
+```bash
+# Get consensus response with cost breakdown
+aia my_prompt -m gpt-4o,claude-3-sonnet,gemini-pro --consensus --cost
+
+# The consensus response shows combined metrics:
+# Tokens: input=126 (total), output=892 (consensus + individual)
+# Cost: $0.0189 (all models combined)
+```
+
+#### Environment Variables
+
+```bash
+# Enable token tracking by default
+export AIA_FLAGS__TOKENS=true
+
+# Enable cost tracking by default
+export AIA_FLAGS__COST=true
+```
+
 ### Per-Model Roles
 
 Assign specific roles to each model in multi-model mode to get diverse perspectives on your prompts. Each model receives a prepended role prompt that shapes its perspective.
````

lib/aia/config.rb

Lines changed: 1 addition & 1 deletion
```diff
@@ -174,7 +174,7 @@ def setup(cli_overrides = {})
       chat: [:flags, :chat],
       cost: [:flags, :cost],
       fuzzy: [:flags, :fuzzy],
-      metrics: [:flags, :metrics],
+      tokens: [:flags, :tokens],
       no_mcp: [:flags, :no_mcp],
       terse: [:flags, :terse],
       debug: [:flags, :debug],
```

lib/aia/config/cli_parser.rb

Lines changed: 4 additions & 4 deletions
```diff
@@ -290,13 +290,13 @@ def setup_utility_options(opts, options)
         options[:completion] = shell
       end
 
-      opts.on("--metrics", "Display token usage in chat mode") do
-        options[:metrics] = true
+      opts.on("--tokens", "Display token usage in chat mode") do
+        options[:tokens] = true
       end
 
-      opts.on("--cost", "Include cost calculations with metrics") do
+      opts.on("--cost", "Include cost calculations with token usage") do
         options[:cost] = true
-        options[:metrics] = true # --cost implies --metrics
+        options[:tokens] = true # --cost implies --tokens
       end
 
       opts.on("--mcp FILE", "Load MCP server(s) from JSON file (can be used multiple times)") do |file|
```
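The "`--cost` implies `--tokens`" behavior changed above can be exercised in isolation. The following is a minimal standalone sketch using plain `OptionParser` from Ruby's standard library, not aia's actual `CLIParser` class:

```ruby
require "optparse"

# Standalone sketch of the --cost implies --tokens wiring.
options = {}
parser = OptionParser.new do |opts|
  opts.on("--tokens", "Display token usage in chat mode") do
    options[:tokens] = true
  end
  opts.on("--cost", "Include cost calculations with token usage") do
    options[:cost]   = true
    options[:tokens] = true # --cost implies --tokens
  end
end

# Passing only --cost still turns token display on.
parser.parse(["--cost"])
puts options.inspect
```

Handling the implication inside the `--cost` handler keeps it a pure CLI-parsing concern, so downstream code only ever checks `options[:tokens]`.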

lib/aia/config/defaults.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -127,7 +127,7 @@ flags:
   debug: false
   verbose: false
   fuzzy: false
-  metrics: false
+  tokens: false
   no_mcp: false
   speak: false
   terse: false
```

lib/aia/session.rb

Lines changed: 2 additions & 2 deletions
```diff
@@ -384,8 +384,8 @@ def run_chat_loop
 
       @ui_presenter.display_ai_response(content)
 
-      # Display metrics if enabled and available (chat mode only)
-      if AIA.config.flags.metrics
+      # Display token usage if enabled and available (chat mode only)
+      if AIA.config.flags.tokens
        if multi_metrics
          # Display metrics for each model in multi-model mode
          @ui_presenter.display_multi_model_metrics(multi_metrics)
```
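The branch above dispatches to a per-model display in multi-model mode and a single-record display otherwise. The following is an illustrative sketch of that dispatch; the data shapes and the `render_metrics` helper are assumptions for illustration, not aia's real `UIPresenter` API:

```ruby
# Sketch of the single-vs-multi metrics dispatch (assumed data shapes).
def render_metrics(multi_metrics, single_metrics)
  if multi_metrics
    # One line per model, as in the multi-model chat output.
    multi_metrics.map do |model, m|
      format("Model: %s - Tokens: input=%d, output=%d",
             model, m[:input], m[:output])
    end
  else
    [format("Tokens: input=%d, output=%d",
            single_metrics[:input], single_metrics[:output])]
  end
end

lines = render_metrics({ "gpt-4" => { input: 125, output: 89 } }, nil)
puts lines
```

Keeping the branch in the presenter layer means the chat loop only has to hand over whichever metrics structure the adapter produced.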
