Commit 930ff96
Enhance multi-model token usage tracking
- Rename `--metrics` flag to `--tokens` for clarity
- Add `--cost` flag to enable cost estimation for each model
- Update README with comprehensive multi-model comparison example
- Include new CLI options in configuration and help documentation
- Improve documentation to highlight token and cost tracking benefits

This change introduces more granular insight into model interactions, allowing users to:

- Compare token consumption across different models
- Estimate computational costs
- Make informed decisions about model selection
- Understand resource utilization during multi-model comparisons
1 parent cc19594 commit 930ff96

10 files changed

Lines changed: 232 additions & 10 deletions


CHANGELOG.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -240,7 +240,7 @@ Now with the fix:
 ### Usage Examples
 ```bash
 # Multi-model chat now properly isolates each model's context
-bin/aia --chat --model lms/openai/gpt-oss-20b,ollama/gpt-oss:20b --metrics
+bin/aia --chat --model lms/openai/gpt-oss-20b,ollama/gpt-oss:20b --tokens
 
 > pick a random language and say hello
 # LMS: "Habari!" (Swahili)
@@ -375,7 +375,7 @@ aia --model ollama/llama3.2 --chat
 
 ## [0.9.13] 2025-09-02
 ### New Features
-- **NEW FEATURE**: Added `--metrics` flag to show token counts for each model
+- **NEW FEATURE**: Added `--tokens` flag to show token counts for each model
 - **NEW FEATURE**: Added `--cost` flag to enable cost estimation for each model
 
 ### Improvements
````

README.md

Lines changed: 78 additions & 0 deletions
````diff
@@ -79,6 +79,47 @@ For more information on AIA visit these locations:
 
 ```
 
+---
+
+## Concurrent Multi-Model Comparison
+
+One of AIA's most powerful features is the ability to send a single prompt to multiple AI models simultaneously and compare their responses side by side, complete with token usage and cost tracking.
+
+```bash
+# Compare responses from 3 models with token counts and cost estimates
+aia --chat -m gpt-4o,claude-3-5-sonnet,gemini-1.5-pro --tokens --cost
+```
+
+**Example output:**
+```
+You: What's the best approach for handling database migrations in a microservices architecture?
+
+from: gpt-4o
+Use a versioned migration strategy with backward compatibility...
+
+from: claude-3-5-sonnet
+Consider the Expand-Contract pattern for zero-downtime migrations...
+
+from: gemini-1.5-pro
+Implement a schema registry with event-driven synchronization...
+
+┌───────────────────┬──────────────┬───────────────┬─────────┐
+│ Model             │ Input Tokens │ Output Tokens │ Cost    │
+├───────────────────┼──────────────┼───────────────┼─────────┤
+│ gpt-4o            │ 156          │ 342           │ $0.0089 │
+│ claude-3-5-sonnet │ 156          │ 418           │ $0.0063 │
+│ gemini-1.5-pro    │ 156          │ 387           │ $0.0041 │
+└───────────────────┴──────────────┴───────────────┴─────────┘
+```
+
+**Why this matters:**
+- **Compare reasoning approaches** - See how different models tackle the same problem
+- **Identify blind spots** - One model might catch something others miss
+- **Cost optimization** - Find the best price/performance ratio for your use case
+- **Consensus building** - Use `--consensus` to synthesize the best answer from all models
+
+---
+
 <!-- Tocer[start]: Auto-generated, don't remove. -->
 
 ## Table of Contents
@@ -225,6 +266,8 @@ aia --fuzzy
 | `--list-roles` | List available role files | `aia --list-roles` |
 | `--output FILE` | Specify output file | `aia --output results.md` |
 | `--fuzzy` | Use fuzzy search for prompts | `aia --fuzzy` |
+| `--tokens` | Display token usage in chat mode | `aia --chat --tokens` |
+| `--cost` | Include cost calculations with token usage | `aia --chat --cost` |
 | `--help` | Show complete help | `aia --help` |
 
 ### Directory Structure
@@ -332,6 +375,8 @@ Your prompt content here...
 | system_prompt | --system-prompt | | AIA_SYSTEM_PROMPT |
 | temperature | -t, --temperature | 0.7 | AIA_LLM__TEMPERATURE |
 | terse | --terse | false | AIA_FLAGS__TERSE |
+| tokens | --tokens | false | AIA_FLAGS__TOKENS |
+| cost | --cost | false | AIA_FLAGS__COST |
 | tool_paths | --tools | [] | AIA_TOOLS__PATHS |
 | allowed_tools | --at, --allowed-tools | nil | AIA_TOOLS__ALLOWED |
 | rejected_tools | --rt, --rejected-tools | nil | AIA_TOOLS__REJECTED |
@@ -580,6 +625,39 @@ Model Details:
 - **Error Handling**: Invalid models are reported but don't prevent valid models from working
 - **Batch Mode Support**: Multi-model responses are properly formatted in output files
 
+#### Token Usage and Cost Tracking
+
+Monitor token consumption and estimate costs across all models with `--tokens` and `--cost`:
+
+```bash
+# Display token usage for each model
+aia my_prompt -m gpt-4o,claude-3-sonnet --tokens
+
+# Include cost estimates (automatically enables --tokens)
+aia my_prompt -m gpt-4o,claude-3-sonnet --cost
+
+# In chat mode with full tracking
+aia --chat -m gpt-4o,claude-3-sonnet,gemini-pro --cost
+```
+
+**Token Usage Output:**
+```
+from: gpt-4o
+Here's my analysis of the code...
+
+from: claude-3-sonnet
+Looking at this code, I notice...
+
+Tokens: gpt-4o: input=245, output=312 | claude-3-sonnet: input=245, output=287
+Cost: gpt-4o: $0.0078 | claude-3-sonnet: $0.0045 | Total: $0.0123
+```
+
+**Use Cases for Token/Cost Tracking:**
+- **Budget management** - Monitor API costs in real-time during development
+- **Model comparison** - Identify which models are most cost-effective for your tasks
+- **Optimization** - Find the right balance between response quality and cost
+- **Billing insights** - Track usage patterns across different model providers
+
 ### Local Model Support
 
 AIA supports running local AI models through Ollama and LM Studio, providing privacy, offline capability, and cost savings.
````

docs/cli-reference.md

Lines changed: 18 additions & 0 deletions
````diff
@@ -48,6 +48,24 @@ aia --terse my_prompt
 aia --terse --chat
 ```
 
+### `--tokens`
+Display token usage information after each response in chat mode. Shows input tokens, output tokens, and model ID.
+
+```bash
+aia --chat --tokens
+aia --chat --tokens --model gpt-4
+```
+
+### `--cost`
+Include cost calculations with token usage. Automatically enables `--tokens`. Shows estimated cost based on the model's pricing.
+
+```bash
+aia --chat --cost
+aia --chat --cost --model gpt-4,claude-3-sonnet
+```
+
+**Note**: `--cost` implies `--tokens`, so you don't need to specify both.
+
 ## Adapter Options
 
 ### `--adapter ADAPTER`
````

docs/configuration.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -135,6 +135,8 @@ export GOOGLE_API_KEY="your_key_here"
 export AIA_FLAGS__CHAT="true"
 export AIA_FLAGS__VERBOSE="true"
 export AIA_FLAGS__DEBUG="false"
+export AIA_FLAGS__TOKENS="true"
+export AIA_FLAGS__COST="true"
 
 # Output settings (nested under output:)
 export AIA_OUTPUT__FILE="/tmp/aia_output.md"
```

docs/guides/chat.md

Lines changed: 46 additions & 0 deletions
````diff
@@ -421,6 +421,52 @@ class ChatCommands < RubyLLM::Tool
 end
 ```
 
+## Token Usage and Cost Tracking
+
+### Displaying Token Usage
+Use the `--tokens` flag to see token usage after each response:
+
+```bash
+# Enable token usage display
+aia --chat --tokens
+
+# Example output after a response:
+# AI: Here's my response to your question...
+#
+# Tokens: input=125, output=89, model=gpt-4o-mini
+```
+
+### Cost Estimation
+Use the `--cost` flag to include cost calculations with token usage:
+
+```bash
+# Enable cost estimation (automatically enables --tokens)
+aia --chat --cost
+
+# Example output after a response:
+# AI: Here's my response to your question...
+#
+# Tokens: input=125, output=89, model=gpt-4o-mini
+# Cost: $0.0003 (input: $0.0002, output: $0.0001)
+```
+
+### Multi-Model Token Tracking
+When using multiple models, token usage is displayed for each model:
+
+```bash
+aia --chat --tokens --model gpt-4,claude-3-sonnet
+
+# Example output:
+# from: gpt-4
+# Here's my response...
+#
+# from: claude-3-sonnet
+# Here's my alternative response...
+#
+# Model: gpt-4 - Tokens: input=125, output=89
+# Model: claude-3-sonnet - Tokens: input=125, output=112
+```
+
 ## Troubleshooting Chat Mode
 
 ### Common Issues
````
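The per-response cost line documented above (`Cost: $0.0003 (input: $0.0002, output: $0.0001)`) is ordinary rate-times-tokens arithmetic. The following is a minimal standalone sketch of that calculation; the per-million-token rates in `RATES` are illustrative placeholders, not aia's actual pricing table, and `estimate_cost` is a hypothetical helper, not part of the aia codebase.

```ruby
# Sketch of the cost arithmetic behind the --cost output.
# RATES holds assumed USD prices per 1M tokens (placeholder values).
RATES = {
  "gpt-4o-mini" => { input: 0.15, output: 0.60 }
}.freeze

def estimate_cost(model, input_tokens, output_tokens)
  rate = RATES.fetch(model)
  input_cost  = input_tokens  * rate[:input]  / 1_000_000.0
  output_cost = output_tokens * rate[:output] / 1_000_000.0
  { input: input_cost, output: output_cost, total: input_cost + output_cost }
end

# Mirrors the token counts from the example output above.
cost = estimate_cost("gpt-4o-mini", 125, 89)
puts format("Cost: $%.4f (input: $%.4f, output: $%.4f)",
            cost[:total], cost[:input], cost[:output])
```

With small token counts the rounded figures collapse toward fractions of a cent, which is why the displayed values carry four decimal places.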

docs/guides/models.md

Lines changed: 78 additions & 0 deletions
````diff
@@ -167,6 +167,84 @@ Model Details:
 - **Error Handling**: Invalid models are reported but don't prevent valid models from working
 - **Batch Mode Support**: Multi-model responses are properly formatted in output files
 
+### Token Usage and Cost Tracking
+
+One of AIA's most powerful capabilities is real-time tracking of token usage and cost estimates across multiple models. This enables informed decisions about model selection based on both quality and cost.
+
+#### Enabling Token Tracking
+
+```bash
+# Display token usage for each model
+aia my_prompt -m gpt-4o,claude-3-sonnet --tokens
+
+# Include cost estimates (automatically enables --tokens)
+aia my_prompt -m gpt-4o,claude-3-sonnet --cost
+
+# In chat mode with full tracking
+aia --chat -m gpt-4o,claude-3-sonnet,gemini-pro --cost
+```
+
+#### Multi-Model Comparison with Metrics
+
+```bash
+# Compare 3 models with cost tracking
+aia --chat -m gpt-4o,claude-3-5-sonnet,gemini-1.5-pro --cost
+```
+
+**Example Output:**
+```
+You: Explain the CAP theorem and its implications for distributed databases.
+
+from: gpt-4o
+The CAP theorem states that a distributed system can only guarantee two of three properties...
+
+from: claude-3-5-sonnet
+CAP theorem, proposed by Eric Brewer, describes fundamental trade-offs in distributed systems...
+
+from: gemini-1.5-pro
+The CAP theorem is a cornerstone principle in distributed computing that states...
+
+┌───────────────────┬──────────────┬───────────────┬─────────┐
+│ Model             │ Input Tokens │ Output Tokens │ Cost    │
+├───────────────────┼──────────────┼───────────────┼─────────┤
+│ gpt-4o            │ 42           │ 287           │ $0.0068 │
+│ claude-3-5-sonnet │ 42           │ 312           │ $0.0053 │
+│ gemini-1.5-pro    │ 42           │ 298           │ $0.0038 │
+└───────────────────┴──────────────┴───────────────┴─────────┘
+Total: $0.0159
+```
+
+#### Use Cases for Token/Cost Tracking
+
+| Use Case | Description |
+|----------|-------------|
+| **Budget Management** | Monitor API costs in real-time during development |
+| **Model Evaluation** | Compare quality vs. cost across different providers |
+| **Cost Optimization** | Identify the most cost-effective model for your tasks |
+| **Usage Auditing** | Track token consumption for billing and optimization |
+| **A/B Testing** | Compare model performance with objective metrics |
+
+#### Combining with Consensus Mode
+
+```bash
+# Get consensus response with cost breakdown
+aia my_prompt -m gpt-4o,claude-3-sonnet,gemini-pro --consensus --cost
+
+# The consensus response shows combined metrics:
+# Tokens: input=126 (total), output=892 (consensus + individual)
+# Cost: $0.0189 (all models combined)
+```
+
+#### Environment Variables
+
+```bash
+# Enable token tracking by default
+export AIA_FLAGS__TOKENS=true
+
+# Enable cost tracking by default
+export AIA_FLAGS__COST=true
+```
+
 ### Per-Model Roles
 
 Assign specific roles to each model in multi-model mode to get diverse perspectives on your prompts. Each model receives a prepended role prompt that shapes its perspective.
````

lib/aia/config.rb

Lines changed: 1 addition & 1 deletion
```diff
@@ -174,7 +174,7 @@ def setup(cli_overrides = {})
       chat: [:flags, :chat],
       cost: [:flags, :cost],
       fuzzy: [:flags, :fuzzy],
-      metrics: [:flags, :metrics],
+      tokens: [:flags, :tokens],
       no_mcp: [:flags, :no_mcp],
       terse: [:flags, :terse],
       debug: [:flags, :debug],
```

lib/aia/config/cli_parser.rb

Lines changed: 4 additions & 4 deletions
```diff
@@ -290,13 +290,13 @@ def setup_utility_options(opts, options)
         options[:completion] = shell
       end
 
-      opts.on("--metrics", "Display token usage in chat mode") do
-        options[:metrics] = true
+      opts.on("--tokens", "Display token usage in chat mode") do
+        options[:tokens] = true
       end
 
-      opts.on("--cost", "Include cost calculations with metrics") do
+      opts.on("--cost", "Include cost calculations with token usage") do
         options[:cost] = true
-        options[:metrics] = true # --cost implies --metrics
+        options[:tokens] = true # --cost implies --tokens
       end
 
       opts.on("--mcp FILE", "Load MCP server(s) from JSON file (can be used multiple times)") do |file|
```
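The "`--cost` implies `--tokens`" behavior changed above can be exercised in isolation. The following is a minimal standalone sketch using plain `OptionParser` from Ruby's standard library, not aia's actual `CLIParser` class:

```ruby
require "optparse"

# Standalone sketch of the --cost implies --tokens wiring.
options = {}
parser = OptionParser.new do |opts|
  opts.on("--tokens", "Display token usage in chat mode") do
    options[:tokens] = true
  end
  opts.on("--cost", "Include cost calculations with token usage") do
    options[:cost]   = true
    options[:tokens] = true # --cost implies --tokens
  end
end

# Passing only --cost still turns token display on.
parser.parse(["--cost"])
puts options.inspect
```

Handling the implication inside the `--cost` handler keeps it a pure CLI-parsing concern, so downstream code only ever checks `options[:tokens]`.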

lib/aia/config/defaults.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -127,7 +127,7 @@ flags:
   debug: false
   verbose: false
   fuzzy: false
-  metrics: false
+  tokens: false
   no_mcp: false
   speak: false
   terse: false
```

lib/aia/session.rb

Lines changed: 2 additions & 2 deletions
```diff
@@ -384,8 +384,8 @@ def run_chat_loop
 
       @ui_presenter.display_ai_response(content)
 
-      # Display metrics if enabled and available (chat mode only)
-      if AIA.config.flags.metrics
+      # Display token usage if enabled and available (chat mode only)
+      if AIA.config.flags.tokens
        if multi_metrics
          # Display metrics for each model in multi-model mode
          @ui_presenter.display_multi_model_metrics(multi_metrics)
```
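The branch above dispatches to a per-model display in multi-model mode and a single-record display otherwise. The following is an illustrative sketch of that dispatch; the data shapes and the `render_metrics` helper are assumptions for illustration, not aia's real `UIPresenter` API:

```ruby
# Sketch of the single-vs-multi metrics dispatch (assumed data shapes).
def render_metrics(multi_metrics, single_metrics)
  if multi_metrics
    # One line per model, as in the multi-model chat output.
    multi_metrics.map do |model, m|
      format("Model: %s - Tokens: input=%d, output=%d",
             model, m[:input], m[:output])
    end
  else
    [format("Tokens: input=%d, output=%d",
            single_metrics[:input], single_metrics[:output])]
  end
end

lines = render_metrics({ "gpt-4" => { input: 125, output: 89 } }, nil)
puts lines
```

Keeping the branch in the presenter layer means the chat loop only has to hand over whichever metrics structure the adapter produced.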
