
Commit b313ace

Merge pull request #5 from jgarzik/updates
Compaction, cost tracking, -O optimization and more
2 parents 5f56903 + 87bde0a commit b313ace

31 files changed

Lines changed: 2588 additions & 355 deletions

Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -15,6 +15,7 @@ clap = { version = "4", features = ["derive", "env"] }
 dirs = "5"
 dotenvy = "0.15"
 glob = "0.3"
+once_cell = "1"
 regex = "1"
 rustyline = { version = "17", features = ["with-file-history"] }
 serde = { version = "1", features = ["derive"] }
```

IDEAS.md

Lines changed: 183 additions & 0 deletions

# Future Optimization Ideas for `-O` Mode

This document captures ideas for future enhancements to the `-O` optimization flag, building on the research that shorter, denser prompts improve LLM performance.

## Research Foundation

- **Context Rot**: Accuracy degrades as prompts grow longer (Chroma study on 18 models)
- **LLMLingua**: 20x compression with only 1.5% performance loss
- **Positive Framing**: "Do this" outperforms "don't do this" in prompts
- **Signal Density**: "Find the smallest set of high-signal tokens that maximize the likelihood of your desired outcome"

Sources:
- https://github.com/microsoft/LLMLingua
- https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- https://gritdaily.com/impact-prompt-length-llm-performance/

---

## Implemented Layers

### Layer 1: Terse System Prompt
- Reduced from ~60 tokens to ~15 tokens
- Positive framing: "AI-to-AI mode. Maximum information density. Structure over prose. No narration."

### Layer 2: Compressed Tool Schemas
- Tool descriptions shortened (e.g., "Read file content. Paths relative to project root." → "Read file")
- Parameter descriptions stripped in optimize mode
- Uses `SchemaOptions` struct for extensibility (see the sketch below)
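A minimal sketch of how that switch could look, assuming each tool carries both a full and a terse description. Only `SchemaOptions` and its `optimize` flag come from this changeset; `description_for` and the standalone struct are invented for illustration:

```rust
// Illustrative only: the real SchemaOptions is constructed via
// SchemaOptions::new(optimize); this standalone version just shows the idea.
pub struct SchemaOptions {
    pub optimize: bool,
}

// Hypothetical helper: choose the terse description in optimize mode.
fn description_for(full: &'static str, terse: &'static str, opts: &SchemaOptions) -> &'static str {
    if opts.optimize { terse } else { full }
}

fn main() {
    let opts = SchemaOptions { optimize: true };
    assert_eq!(
        description_for(
            "Read file content. Paths relative to project root.",
            "Read file",
            &opts
        ),
        "Read file"
    );
}
```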
---

## Future Layers

### Layer 3: Tool Result Compression

**Concept**: Strip metadata from tool results in `-O` mode.

Current Read result:
```json
{
  "path": "foo.rs",
  "offset": 0,
  "truncated": false,
  "content": "...",
  "sha256": "abc123",
  "lines": 42
}
```

Optimized result:
```json
{"content": "..."}
```

**Implementation** (sketched below):
- Add `optimize` flag to `tools::execute()`
- Conditionally strip fields: `path`, `offset`, `truncated`, `sha256`, `lines`
- Keep only essential data needed for task completion

**Estimated token savings**: 30-50% per tool result
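A minimal sketch of the stripping step, assuming tool results are `serde_json::Value` objects; the function name is illustrative:

```rust
// Illustrative sketch of Layer 3: drop metadata fields from a tool
// result when -O is active. Field names follow the Read example above.
use serde_json::{json, Value};

fn compress_tool_result(mut result: Value, optimize: bool) -> Value {
    if !optimize {
        return result;
    }
    if let Some(obj) = result.as_object_mut() {
        // Keep only the payload the model needs; drop bookkeeping fields.
        for key in ["path", "offset", "truncated", "sha256", "lines"] {
            obj.remove(key);
        }
    }
    result
}

fn main() {
    let full = json!({
        "path": "foo.rs", "offset": 0, "truncated": false,
        "content": "...", "sha256": "abc123", "lines": 42
    });
    assert_eq!(compress_tool_result(full, true), json!({"content": "..."}));
}
```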
---

### Layer 4: History Summarization

**Concept**: Compress older conversation turns to maintain context while reducing tokens.

**Approaches**:
1. **Sliding Window**: Keep only last N turns in full, summarize older ones (sketched below)
2. **Semantic Compression**: Use small model to compress verbose assistant responses
3. **Result Deduplication**: Merge repeated tool results (e.g., multiple Read calls on same file)

**Implementation ideas**:
- Add `conversation_compressor` module
- Trigger compression when context exceeds threshold
- Preserve tool call/result structure for agent continuity

**Research reference**: LLMLingua-2 achieves 3-6x faster compression with task-agnostic distillation
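A sketch of the sliding-window approach, assuming messages are `serde_json::Value` chat entries; `summarize` stands in for whatever compressor is eventually chosen:

```rust
// Sketch of approach 1: keep the last `keep_last` messages verbatim and
// collapse everything older into a single summary message.
use serde_json::{json, Value};

fn compact_history(
    messages: &[Value],
    keep_last: usize,
    summarize: impl Fn(&[Value]) -> String,
) -> Vec<Value> {
    if messages.len() <= keep_last {
        return messages.to_vec();
    }
    let (older, recent) = messages.split_at(messages.len() - keep_last);
    // One summary message replaces the older turns.
    let mut out = vec![json!({
        "role": "system",
        "content": format!("Summary of earlier turns: {}", summarize(older)),
    })];
    out.extend_from_slice(recent);
    out
}
```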
---

### Layer 5: Output Style Enforcement

**Concept**: Enforce structured output format in `-O` mode.

**Current**: LLM outputs natural language explanations mixed with actions
**Optimized**: Pure structured output, no prose

**Implementation ideas**:
1. **Structured Output Schema**: Add JSON schema for responses
2. **Response Format Instruction**: "Respond only with tool calls or structured JSON"
3. **Post-processing**: Strip explanation text, keep only actions (sketched below)

**Example transformation**:
```
Before: "I'll read the config file to understand the settings. Let me use the Read tool..."
After: [tool_call: Read, path: "config.toml"]
```

**Trade-off**: May reduce transparency for human review, but ideal for AI-to-AI pipelines
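A minimal sketch of the post-processing idea (option 3); the function name and shape are assumptions, not existing code:

```rust
// Illustrative post-processor: when -O is active and the reply already
// contains tool calls, treat accompanying prose as narration and drop it.
fn enforce_output_style(
    content: Option<String>,
    has_tool_calls: bool,
    optimize: bool,
) -> Option<String> {
    if optimize && has_tool_calls {
        None // the tool calls carry the signal; prose is redundant
    } else {
        content
    }
}
```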
---

### Layer 6: Dynamic Tool Injection

**Concept**: Only include tool schemas likely needed for the current task.

**Current**: All 8 tools included in every request
**Optimized**: Analyze prompt, inject relevant subset

**Heuristics**:
- "read", "view", "show" → Read, Grep, Glob
- "edit", "modify", "change" → Read, Edit, Write
- "run", "execute", "test", "build" → Bash
- "find", "search" → Grep, Glob
- "delegate", "subagent" → Task

**Implementation** (sketched below):
- Add `infer_tools_from_prompt(prompt: &str) -> Vec<ToolName>`
- Apply before schema generation
- Fall back to full toolset if uncertain
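A sketch of those heuristics; `ToolName` mirrors the proposed signature and is not an existing type:

```rust
// Illustrative keyword-based tool inference following the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToolName {
    Read, Edit, Write, Bash, Grep, Glob, Task,
}

fn infer_tools_from_prompt(prompt: &str) -> Vec<ToolName> {
    use ToolName::*;
    let p = prompt.to_lowercase();
    let rules: &[(&[&str], &[ToolName])] = &[
        (&["read", "view", "show"], &[Read, Grep, Glob]),
        (&["edit", "modify", "change"], &[Read, Edit, Write]),
        (&["run", "execute", "test", "build"], &[Bash]),
        (&["find", "search"], &[Grep, Glob]),
        (&["delegate", "subagent"], &[Task]),
    ];
    let mut tools = Vec::new();
    for (keywords, names) in rules {
        if keywords.iter().any(|k| p.contains(*k)) {
            for &tool in *names {
                if !tools.contains(&tool) {
                    tools.push(tool);
                }
            }
        }
    }
    if tools.is_empty() {
        // Uncertain: fall back to the full toolset.
        tools = vec![Read, Edit, Write, Bash, Grep, Glob, Task];
    }
    tools
}
```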
---

### Layer 7: CodeAgents-Style Pseudocode

**Concept**: Use structured pseudocode instead of natural language for reasoning.

**Research**: CodeAgents framework reduces tokens by 55-87%.

**Current**:
```
I need to first read the file to understand its structure, then I'll make the edit...
```

**Optimized**:
```
PLAN: Read("src/main.rs") -> Edit(find="old", replace="new")
```

**Implementation** (see the parsing sketch below):
- Add `--reasoning-format=pseudocode` option
- Train/prompt model to use structured planning notation
- Parse pseudocode for execution
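To make "parse pseudocode for execution" concrete, here is a toy parse of the PLAN notation above that recovers only the tool names; a real parser would also decode the argument lists:

```rust
// Toy parser for the illustrative PLAN notation: split the chain on "->"
// and extract each step's tool name. Argument parsing is left out.
fn parse_plan(line: &str) -> Option<Vec<String>> {
    let body = line.strip_prefix("PLAN:")?.trim();
    Some(
        body.split("->")
            .filter_map(|step| step.trim().split('(').next())
            .map(|name| name.to_string())
            .collect(),
    )
}

fn main() {
    let plan = r#"PLAN: Read("src/main.rs") -> Edit(find="old", replace="new")"#;
    assert_eq!(parse_plan(plan), Some(vec!["Read".to_string(), "Edit".to_string()]));
}
```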
---

## Measurement & Validation

To validate optimization effectiveness:

1. **Token Counting**: Compare input/output tokens with and without `-O`
2. **Task Success Rate**: Ensure optimizations don't reduce accuracy
3. **Latency**: Measure time-to-first-token improvement
4. **Cost**: Calculate API cost savings

**Suggested benchmarks**:
- Simple file read/edit tasks
- Multi-step refactoring tasks
- Codebase exploration tasks

---

## Configuration Ideas

Future `SchemaOptions` extensions:
```rust
pub struct SchemaOptions {
    pub optimize: bool,
    // Future fields:
    pub compress_results: bool,
    pub dynamic_tools: bool,
    pub pseudocode_reasoning: bool,
    pub max_history_turns: Option<usize>,
}
```

Command-line exposure:
```
yo -O                   # Enable all optimizations
yo -O --no-compress     # Optimize schemas but not results
yo --optimize-level=2   # Granular control
```

fixtures/mcp_calc_server/src/main.rs

Lines changed: 0 additions & 2 deletions
```diff
@@ -6,8 +6,6 @@ use std::io::{self, BufRead, Write};
 
 #[derive(Deserialize)]
 struct JsonRpcRequest {
-    #[allow(dead_code)]
-    jsonrpc: String,
     id: Option<u64>,
     method: String,
     params: Option<Value>,
```

src/agent.rs

Lines changed: 73 additions & 18 deletions
```diff
@@ -2,7 +2,7 @@
 
 use crate::{
     cli::Context,
-    llm,
+    llm::{self, LlmClient},
     plan::{self, PlanPhase},
     policy::Decision,
     tools,
@@ -34,6 +34,15 @@ impl CommandStats {
     }
 }
 
+/// Result of a turn, including stats and continuation info
+#[derive(Debug, Default, Clone)]
+pub struct TurnResult {
+    pub stats: CommandStats,
+    /// If true, a Stop hook requested continuation with the given prompt
+    pub force_continue: bool,
+    pub continue_prompt: Option<String>,
+}
+
 const SYSTEM_PROMPT: &str = r#"You are an agentic coding assistant running locally.
 You can only access files via tools. All paths are relative to the project root.
 Use Glob/Grep to find files before Read. Before Edit/Write, explain what you will change.
@@ -53,12 +62,8 @@ fn verbose(ctx: &Context, message: &str) {
     }
 }
 
-pub fn run_turn(
-    ctx: &Context,
-    user_input: &str,
-    messages: &mut Vec<Value>,
-) -> Result<CommandStats> {
-    let mut stats = CommandStats::default();
+pub fn run_turn(ctx: &Context, user_input: &str, messages: &mut Vec<Value>) -> Result<TurnResult> {
+    let mut result = TurnResult::default();
     let _ = ctx.transcript.borrow_mut().user_message(user_input);
 
     messages.push(json!({
@@ -111,9 +116,10 @@ pub fn run_turn(
     }
 
     // Get built-in tool schemas (including Task for main agent) and add MCP tools
+    let schema_opts = tools::SchemaOptions::new(ctx.args.optimize);
     let mut tool_schemas = if in_planning_mode {
         // In planning mode, only provide read-only tools
-        tools::schemas()
+        tools::schemas(&schema_opts)
             .into_iter()
             .filter(|schema| {
                 if let Some(name) = schema
@@ -128,7 +134,7 @@ pub fn run_turn(
             })
             .collect()
     } else {
-        tools::schemas_with_task()
+        tools::schemas_with_task(&schema_opts)
     };
 
     // Only add MCP tools if not in planning mode
@@ -187,6 +193,11 @@ pub fn run_turn(
         SYSTEM_PROMPT.to_string()
     };
 
+    // Add optimization mode instructions if -O flag is set
+    if ctx.args.optimize {
+        system_prompt.push_str("\n\nAI-to-AI mode. Maximum information density. Structure over prose. No narration.");
+    }
+
     // Add skill pack index
     let skill_index = ctx.skill_index.borrow();
     let skill_prompt = skill_index.format_for_prompt(50);
@@ -222,8 +233,25 @@ pub fn run_turn(
 
         // Track token usage from this LLM call
         if let Some(usage) = &response.usage {
-            stats.input_tokens += usage.prompt_tokens;
-            stats.output_tokens += usage.completion_tokens;
+            result.stats.input_tokens += usage.prompt_tokens;
+            result.stats.output_tokens += usage.completion_tokens;
+
+            // Record cost for this operation
+            let turn_number = *ctx.turn_counter.borrow();
+            let op = ctx.session_costs.borrow_mut().record_operation(
+                turn_number,
+                &target.model,
+                usage.prompt_tokens,
+                usage.completion_tokens,
+            );
+
+            // Log token usage to transcript
+            let _ = ctx.transcript.borrow_mut().token_usage(
+                &target.model,
+                usage.prompt_tokens,
+                usage.completion_tokens,
+                op.cost_usd,
+            );
         }
 
         if response.choices.is_empty() {
@@ -234,6 +262,13 @@ pub fn run_turn(
         let choice = &response.choices[0];
         let msg = &choice.message;
 
+        // Warn if response was truncated due to length limit
+        if choice.finish_reason.as_deref() == Some("length") {
+            eprintln!(
+                "⚠️ Response truncated (max tokens reached). Consider increasing max_tokens or using /compact."
+            );
+        }
+
         if let Some(content) = &msg.content {
             if !content.is_empty() {
                 println!("{}", content);
@@ -311,7 +346,7 @@ pub fn run_turn(
             let args: Value = serde_json::from_str(&tc.function.arguments).unwrap_or(json!({}));
 
             // Count this tool use
-            stats.tool_uses += 1;
+            result.stats.tool_uses += 1;
 
             trace(
                 ctx,
@@ -411,9 +446,9 @@ pub fn run_turn(
                 }
             } else if name == "Task" {
                 // Execute Task tool (subagent delegation)
-                let (result, sub_stats) = tools::task::execute(args.clone(), ctx)?;
-                stats.merge(&sub_stats);
-                result
+                let (task_result, sub_stats) = tools::task::execute(args.clone(), ctx)?;
+                result.stats.merge(&sub_stats);
+                task_result
             } else if name.starts_with("mcp.") {
                 // Execute MCP tool
                 let start = std::time::Instant::now();
@@ -508,8 +543,28 @@ pub fn run_turn(
         }
     }
 
-    // Run Stop hooks (note: force_continue not implemented yet)
-    let _ = ctx.hooks.borrow().on_stop("end_turn", None);
+    // Run Stop hooks - may request continuation
+    let last_assistant_message = messages.iter().rev().find_map(|m| {
+        if m["role"].as_str() == Some("assistant") {
+            m["content"].as_str().map(|s| s.to_string())
+        } else {
+            None
+        }
+    });
+
+    let (force_continue, continue_prompt) = ctx
+        .hooks
+        .borrow()
+        .on_stop("end_turn", last_assistant_message.as_deref());
+
+    // If force_continue is requested, signal to caller to run another turn
+    if force_continue {
+        if let Some(prompt) = continue_prompt {
+            result.force_continue = true;
+            result.continue_prompt = Some(prompt);
+            verbose(ctx, "Stop hook requested continuation");
+        }
+    }
 
-    Ok(stats)
+    Ok(result)
 }
```
