1.2.0-rc.1 #67

Merged
Windpicker-owo merged 6 commits into main from dev
Jun 15, 2026

@Windpicker-owo Windpicker-owo commented Jun 14, 2026

Contributor

Summary by Sourcery

Update LLM streaming, usage accounting, and configuration defaults while improving watchdog robustness and tool execution observability.

New Features:

  • Add support for extracting and propagating detailed usage statistics, including reasoning and cache tokens, from OpenAI and Anthropic responses.
  • Expose a stream_events API on LLMResponse to stream raw events while keeping response state (message, reasoning, tool calls) incrementally updated.
  • Allow agents to supply instance-level dynamic usables and propagate task observers through tool execution to track background tasks.

Bug Fixes:

  • Run message distribution logic for incoming messages as a background task to avoid EventBus timeouts cancelling critical operations.
  • Feed the stream watchdog from a dedicated background task so long LLM calls do not cause false-positive timeouts.
  • Ensure reasoning content snapshots are available incrementally during streaming without consuming internal accumulator state.

Enhancements:

  • Tune default model configuration to use newer model identifiers, larger context windows, updated pricing, and lower retry counts.
  • Improve cost calculation to correctly account for reasoning tokens without double-charging when completion tokens already include reasoning.
  • Respect chatter-level configuration for disabling message buffering when appropriate (e.g. interactive TUI environments).
  • Expose the current LLMUsable execution task via observers to integrate with external task tracking or lifecycle management.

Tests:

  • Add unit tests for OpenAI and Anthropic usage extraction and cost calculation with reasoning and cache tokens.
  • Add tests verifying incremental reasoning updates during streaming responses and the new stream_events API behavior.
  • Add tests for session usage_total persistence across save/load cycles, including backward compatibility with legacy session files.

Chores:

  • Bump project and core configuration versions to 1.2.0-rc.1 and export WaitResumeEvent in the plugin system base API.

sourcery-ai Bot commented Jun 14, 2026

Contributor

Reviewer's Guide

Implements 1.2.0-rc.1 with streaming stability, better reasoning/usage accounting, updated default models, and new tooling observability, while adding tests around token usage, cost calculation, reasoning streaming, and session usage persistence.

Sequence diagram for LLM usable execution task observers

sequenceDiagram
    participant Agent
    participant Chatter
    participant Tooling as llm_tool_call
    participant Exec as LLMUsableExecution
    participant Observer

    Agent->>Agent: execute_local_usable(task_observer)
    Agent->>Chatter: run_tool_call(..., task_observer)
    Chatter->>Tooling: run_tool_call(..., task_observer)
    Tooling->>Tooling: exec_llm_usable(..., task_observer)
    Tooling->>Tooling: create_llm_usable_execution(..., task_observer)
    Tooling->>Exec: __init__(execution)
    Tooling->>Exec: execution.add_task_observer(task_observer)
    Exec->>Exec: _set_task(asyncio.create_task(...))
    Exec-->>Observer: task_observer(task)

File-Level Changes

Change Details Files
Move message dispatch side-effects to a fire-and-forget background task and add per-chatter control over message buffering.
  • Wrap stream creation, message persistence, and stream loop startup in an inner async function executed via asyncio.create_task to avoid EventBus wait_for timeouts cancelling dispatch work.
  • Lazily import stream and chatter managers inside the background task instead of at event handler entry.
  • Inspect chatter class allow_message_buffer to optionally disable message buffering for certain chatters (e.g. interactive TUIs) and persist this on the stream context.
  • Preserve existing behavior of adding messages, updating last_message_time, and conditionally starting the StreamLoopManager when running.
src/core/transport/distribution/distributor.py
Add Anthropic and OpenAI usage extraction with reasoning and cache awareness, and ensure cost calculation treats reasoning tokens correctly.
  • Introduce _extract_anthropic_usage to normalize Anthropic usage (prompt/completion, cache hit/miss/write, completion_includes_reasoning).
  • Extend Anthropic non-stream and stream create paths to return a usage dict, aggregating message_start input usage with message_delta output tokens via StreamEvent.usage.
  • Extend OpenAI _extract_usage_from_obj to read reasoning_tokens from completion/output_tokens_details and set completion_includes_reasoning accordingly.
  • Update calculate_request_cost to handle reasoning_tokens and completion_includes_reasoning, avoiding double-charging when completion tokens already include reasoning.
  • Propagate reasoning_tokens into LLM stats collection and request records.
src/kernel/llm/model_client/anthropic_client.py
src/kernel/llm/model_client/openai_client.py
src/kernel/llm/observation.py
src/kernel/llm/stats/collector.py
test/kernel/llm/test_openai_usage.py
test/kernel/llm/test_anthropic_usage.py
test/kernel/llm/test_calculate_cost.py
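The double-charging rule can be illustrated with a small sketch. The function below is hypothetical and is not the project's `calculate_request_cost`: it assumes prices are quoted per million tokens, that reasoning tokens are billed at the output rate, and that a missing `completion_includes_reasoning` flag means reasoning is reported separately.

```python
def estimate_cost(usage: dict, price_in: float, price_out: float) -> float:
    """Estimate request cost; prices are per million tokens (assumption)."""
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    reasoning = usage.get("reasoning_tokens", 0)

    if usage.get("completion_includes_reasoning", False):
        # Reasoning is already counted inside completion_tokens,
        # so adding it again would charge for it twice.
        billable_output = completion
    else:
        # Reasoning is reported separately and must be added once.
        billable_output = completion + reasoning

    return (prompt * price_in + billable_output * price_out) / 1_000_000


included = {
    "prompt_tokens": 1000,
    "completion_tokens": 500,
    "reasoning_tokens": 200,
    "completion_includes_reasoning": True,
}
separate = dict(included, completion_includes_reasoning=False)

cost_included = estimate_cost(included, price_in=1.0, price_out=2.0)
cost_separate = estimate_cost(separate, price_in=1.0, price_out=2.0)
```

With these inputs the first case bills 500 output tokens and the second bills 700, so the flag alone decides whether the 200 reasoning tokens are added.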
Improve streaming response handling so reasoning content is tracked incrementally and exposed via both raw events and text deltas.
  • Change LLMReasoningAccumulator to support snapshotting current reasoning blocks without consuming state, and update LLMStreamReducer.finalize to use snapshot instead of finalize for reasoning_parts.
  • Refactor LLMResponse to add a stream_events() iterator that consumes the underlying stream once, updates message/reasoning/call_list state on each event via LLMStreamReducer, and yields raw StreamEvent objects.
  • Implement __aiter__ in LLMResponse in terms of stream_events(), yielding only text_delta segments while still updating reasoning content.
  • Add tests ensuring asynchronous iteration over responses exposes incremental reasoning_content and final message content in both pytest and runtime-style tests.
src/kernel/llm/stream_state.py
src/kernel/llm/response.py
test/kernel/llm/test_response.py
test/kernel/llm/test_response_streaming_reasoning.py
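The relationship between stream_events() and plain async iteration might look like the sketch below. `Event` and `Response` are simplified, hypothetical stand-ins for StreamEvent and LLMResponse, and the real classes track more state (tool calls, reasoning parts) through LLMStreamReducer.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class Event:  # hypothetical stand-in for StreamEvent
    text_delta: Optional[str] = None
    reasoning_delta: Optional[str] = None


class Response:  # hypothetical stand-in for LLMResponse
    def __init__(self, source: AsyncIterator[Event]) -> None:
        self._source = source
        self.message = ""
        self.reasoning_content = ""

    async def stream_events(self) -> AsyncIterator[Event]:
        # Consume the underlying stream once, update state on every event,
        # then hand the raw event to the caller.
        async for event in self._source:
            if event.reasoning_delta:
                self.reasoning_content += event.reasoning_delta
            if event.text_delta:
                self.message += event.text_delta
            yield event

    def __aiter__(self) -> AsyncIterator[str]:
        return self._text_deltas()

    async def _text_deltas(self) -> AsyncIterator[str]:
        # Built on stream_events(), so reasoning state still advances even
        # though only text deltas are yielded.
        async for event in self.stream_events():
            if event.text_delta:
                yield event.text_delta


async def _fake_stream() -> AsyncIterator[Event]:
    for event in (
        Event(reasoning_delta="think"),
        Event(text_delta="an"),
        Event(text_delta="swer"),
    ):
        yield event


async def demo() -> tuple:
    response = Response(_fake_stream())
    chunks = [chunk async for chunk in response]
    return chunks, response.message, response.reasoning_content
```

Note that plain iteration yields two chunks for three events, which is why a test that needs one snapshot per event would iterate over stream_events() instead.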
Expose background LLMUsable execution tasks to callers and allow external observers to track created asyncio tasks, including from Agent and Chatter tool calls.
  • Enhance LLMUsableExecution to keep a list of task observers, notify them whenever a new internal task is created or updated via _set_task, and expose add_task_observer for registration.
  • Ensure all internal task creations/resumes (_advance_iterator, _await_result, resume) go through _set_task.
  • Plumb an optional task_observer down from Agent.execute_local_usable and Chatter.run_tool_call through create_llm_usable_execution and exec_llm_usable into run_tool_call so callers can subscribe to usable execution tasks.
src/kernel/llm/payload/tooling.py
src/core/components/base/agent.py
src/core/components/base/chatter.py
src/core/utils/llm_tool_call.py
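A rough sketch of the observer mechanism follows. `Execution` is a hypothetical, reduced stand-in for LLMUsableExecution; only the names add_task_observer and _set_task come from the description above, and the doubling coroutine is a placeholder for the real usable.

```python
import asyncio
from typing import Callable, List, Optional

TaskObserver = Callable[[asyncio.Task], None]


class Execution:  # hypothetical stand-in for LLMUsableExecution
    def __init__(self) -> None:
        self._observers: List[TaskObserver] = []
        self._task: Optional[asyncio.Task] = None

    def add_task_observer(self, observer: TaskObserver) -> None:
        self._observers.append(observer)

    def _set_task(self, task: asyncio.Task) -> None:
        # Single choke point: every task creation goes through here, so
        # observers never miss a newly created or replaced task.
        self._task = task
        for observer in self._observers:
            observer(task)

    async def run(self, value: int) -> int:
        self._set_task(asyncio.create_task(self._work(value)))
        return await self._task

    async def _work(self, value: int) -> int:
        await asyncio.sleep(0)  # placeholder for the real usable call
        return value * 2


async def demo() -> tuple:
    seen: list = []
    execution = Execution()
    execution.add_task_observer(seen.append)
    result = await execution.run(21)
    return result, len(seen), seen[0].done()
```

A caller such as an agent could pass `seen.append` (or a cancel-on-shutdown hook) as the observer to keep track of tasks it did not create itself.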
Add an independent watchdog feeder within the stream loop to prevent long-running LLM calls from triggering spurious watchdog timeouts.
  • Introduce a background _watchdog_feeder coroutine inside the stream loop that periodically feeds the watchdog for the current stream_id based on configured bot.tick_interval (with a minimum interval).
  • Start the feeder as an asyncio task when the loop starts and ensure it is cancelled and awaited in the loop’s finally block to avoid leaks.
  • Keep the existing tick loop semantics unchanged aside from the extra heartbeat.
src/core/transport/distribution/loop.py
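The feeder lifecycle can be sketched as below. `Watchdog`, `stream_loop`, and the 0.01 s minimum interval are assumptions made for illustration; the sleep inside the `try` block stands in for a long LLM call during which the regular tick would not run.

```python
import asyncio
import contextlib


class Watchdog:  # hypothetical stand-in that just counts heartbeats
    def __init__(self) -> None:
        self.feeds = 0

    def feed(self, stream_id: str) -> None:
        self.feeds += 1


async def _watchdog_feeder(watchdog: Watchdog, stream_id: str, interval: float) -> None:
    # Heartbeat that runs independently of the main tick loop.
    while True:
        watchdog.feed(stream_id)
        await asyncio.sleep(interval)


async def stream_loop(
    watchdog: Watchdog, stream_id: str, tick_interval: float, call_seconds: float = 0.1
) -> asyncio.Task:
    interval = max(tick_interval, 0.01)  # assumed minimum interval
    feeder = asyncio.create_task(_watchdog_feeder(watchdog, stream_id, interval))
    try:
        await asyncio.sleep(call_seconds)  # stand-in for a long LLM call
    finally:
        # Cancel and await the feeder so no task outlives the loop.
        feeder.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await feeder
    return feeder
```

Awaiting the cancelled feeder in `finally` is what prevents a leaked task when the loop exits, whether normally or through an exception.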
Update default model configuration, retries, and pricing for new model lineup and larger contexts.
  • Lower APIProviderSection.max_retry default from 3 to 2 and align SiliconFlow provider’s max_retry accordingly.
  • Increase ModelInfoSection.max_context default from 32768 to 131072 tokens.
  • Swap task model defaults to deepSeek-v4-flash and qwen3.5-4b for core tasks and qwen3.6-35b-a3b for multimodal tasks, and update tool_use model to deepSeek-v4-flash.
  • Adjust ModelInfoSection list to match new names, identifiers, and pricing including cache_hit_price_in for deepSeek-v4-flash and cheaper vision model pricing.
src/core/config/model_config.py
Track per-session usage totals in the coding agent plugin and ensure backward-compatible persistence of existing sessions.
  • Add a usage_total field to SessionData (from plugins.coding_agent.session_store) with default empty dict for new sessions.
  • Ensure SessionStore.save/load round-trips usage_total correctly and gracefully handles existing JSON files without this field by defaulting to {}.
  • Add tests covering default usage_total, multi-model usage maps, and migration behavior for old session files.
test/plugins/test_session_usage_persistence.py
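The backward-compatible load can be sketched as follows. This `SessionData` is a cut-down, hypothetical version with only two fields, and `dump_session` / `load_session` are illustrative helpers rather than the SessionStore API.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SessionData:  # cut-down, hypothetical version of the plugin's class
    session_id: str
    usage_total: dict = field(default_factory=dict)


def dump_session(data: SessionData) -> str:
    return json.dumps(asdict(data))


def load_session(raw: str) -> SessionData:
    payload = json.loads(raw)
    # Legacy files have no usage_total key, so fall back to an empty map.
    return SessionData(
        session_id=payload["session_id"],
        usage_total=payload.get("usage_total") or {},
    )


legacy = load_session('{"session_id": "old"}')
current = load_session(
    dump_session(SessionData("new", {"gpt-4o": {"prompt_tokens": 5000, "request_count": 8}}))
)
```

Using `default_factory=dict` avoids sharing one mutable default across sessions, and the `.get(...) or {}` fallback also covers a stored `null`.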
Expose additional plugin event type and bump core/project version for the new release.
  • Export WaitResumeEvent from the plugin_system.base package for plugin authors.
  • Align core version string and project version to 1.2.0-rc.1 in configuration and packaging metadata.
src/app/plugin_system/base/__init__.py
src/core/config/core_config.py
pyproject.toml

@sourcery-ai sourcery-ai Bot left a comment

Contributor

Hey - I've found 4 issues, and left some high level feedback:

  • In AnthropicChatClient._create_non_stream, _extract_anthropic_usage is called with the full response object instead of its usage field; this will miss the actual token stats on typical Anthropic responses (where usage is nested), so consider passing response.usage (or _get_attr(response, "usage")) instead.
  • In the Anthropic streaming path (_create_stream), when you merge message_delta usage into input_usage you update completion_tokens but never recompute total_tokens (and derived fields like cache_miss_tokens), so it would be safer to either call _extract_anthropic_usage on a combined usage object or explicitly recalculate total_tokens after setting completion_tokens.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="src/kernel/llm/model_client/anthropic_client.py" line_range="565" />
<code_context>
         if not reasoning_parts and reasoning_content:
             reasoning_parts = [ReasoningText(reasoning_content)]
-        return message_text, tool_calls, None, reasoning_parts or None
+        usage = _extract_anthropic_usage(response)
+        return message_text, tool_calls, None, reasoning_parts or None, usage

</code_context>
<issue_to_address>
**issue (bug_risk):** Non-stream Anthropic usage is extracted from the whole response object instead of its `usage` field

`response` is the full message object, but `_extract_anthropic_usage` expects the Anthropic `usage` payload. As written, this will almost always fall back to default values instead of real token counts, skewing cost and usage metrics. You likely want to pass `response.usage` (or similar), e.g.:

```python
usage = _extract_anthropic_usage(getattr(response, "usage", None))
```
</issue_to_address>

### Comment 2
<location path="src/kernel/llm/model_client/anthropic_client.py" line_range="593-602" />
<code_context>
+                        continue
+
+                    # ── message_delta: merge output-side usage and yield a StreamEvent ──
+                    if event_type == "message_delta":
+                        delta_usage = _get_attr(event, "usage")
+                        if delta_usage is not None:
+                            output_tokens = _get_attr(delta_usage, "output_tokens", 0) or 0
+                            input_usage["completion_tokens"] = output_tokens
+                            # Anthropic output_tokens already includes reasoning tokens
+                            input_usage["completion_includes_reasoning"] = True
+                        yield StreamEvent(usage=dict(input_usage) if input_usage else None)
+                        continue
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Anthropic streaming usage aggregation does not update `total_tokens` when output tokens arrive

In the streaming path, `input_usage` is initialized from `message_start` with `total_tokens = prompt_tokens + completion_tokens`. When handling `message_delta`, you overwrite `completion_tokens` but never recompute `total_tokens`, so it will be stale and won’t match the final output token count.

Consider updating `total_tokens` whenever you set `completion_tokens`, e.g.:

```python
if delta_usage is not None:
    output_tokens = _get_attr(delta_usage, "output_tokens", 0) or 0
    input_usage["completion_tokens"] = output_tokens
    input_usage["total_tokens"] = input_usage.get("prompt_tokens", 0) + output_tokens
    input_usage["completion_includes_reasoning"] = True
```

This keeps streaming and non-streaming usage accounting consistent.

```suggestion
                    # ── message_start: extract input-side usage ──
                    if event_type == "message_start":
                        msg_obj = _get_attr(event, "message")
                        if msg_obj is not None:
                            msg_usage = _get_attr(msg_obj, "usage")
                            if msg_usage is not None:
                                input_usage = _extract_anthropic_usage(msg_usage)
                        continue

                    # ── message_delta: merge output-side usage and yield a StreamEvent ──
                    if event_type == "message_delta":
                        delta_usage = _get_attr(event, "usage")
                        if delta_usage is not None:
                            output_tokens = _get_attr(delta_usage, "output_tokens", 0) or 0
                            input_usage["completion_tokens"] = output_tokens
                            # Update total_tokens so it stays consistent with non-streaming accounting
                            input_usage["total_tokens"] = input_usage.get("prompt_tokens", 0) + output_tokens
                            # Anthropic output_tokens already includes reasoning tokens
                            input_usage["completion_includes_reasoning"] = True
                        yield StreamEvent(usage=dict(input_usage) if input_usage else None)
                        continue
```
</issue_to_address>

### Comment 3
<location path="test/kernel/llm/test_response.py" line_range="260-269" />
<code_context>
+    def store(self, tmp_path: Path) -> SessionStore:
+        return SessionStore(str(tmp_path))
+
+    @pytest.mark.asyncio
+    async def test_save_and_load_with_usage(self, store: SessionStore) -> None:
+        usage = {
+            "gpt-4o": {
+                "prompt_tokens": 5000,
+                "completion_tokens": 2000,
+                "cache_hit_tokens": 500,
+                "cache_write_tokens": 0,
+                "reasoning_tokens": 300,
+                "cost": 0.05,
+                "request_count": 8,
+            }
+        }
+        data = SessionData(
+            session_id="sess-1",
+            working_directory="/tmp/project",
+            title="Test Session",
+            created_at=1000.0,
+            phase="ready",
+            usage_total=usage,
+        )
+        await store.save("sess-1", data)
+
+        loaded = await store.load("sess-1")
</code_context>
<issue_to_address>
**issue (testing):** The new async reasoning snapshot test doesn’t match the updated LLMResponse iteration semantics

This test iterates over `response` (i.e. `LLMResponse.__aiter__`), which after the `stream_events` refactor now yields only `text_delta` chunks. With only two `text_delta` events ("答" and "案"), `snapshots` will only have two entries, so the expected `['先', '先', '先想', '先想']` will not be met. To validate per-event reasoning snapshots, iterate over `response.stream_events()` (as in `test_response_streaming_reasoning.py`) and assert across all four events. If instead `__aiter__` is meant to expose reasoning for every event, its implementation and associated tests should be updated to reflect that contract.
</issue_to_address>

### Comment 4
<location path="test/kernel/llm/test_anthropic_client.py" line_range="250-253" />
<code_context>
     monkeypatch.setattr(client, "_get_client", lambda **_: fake_client)

-    message, tool_calls, stream_iter, reasoning = await client.create(
+    message, tool_calls, stream_iter, reasoning, usage = await client.create(
         model_name="claude-sonnet-4-6",
         payloads=[LLMPayload(ROLE.USER, Text("hello")), LLMPayload(ROLE.TOOL, MockTool)],
         tools=[],
</code_context>
<issue_to_address>
**suggestion (testing):** The updated Anthropic client tests don’t assert anything about the new usage return value

Since `AnthropicChatClient.create` now returns `usage`, these tests only unpack it without verifying anything. To exercise `_extract_anthropic_usage` in both non-stream and stream paths, consider making `fake_client.messages.create` return an object with a `usage` field and assert that it has the expected token and cache fields plus `completion_includes_reasoning=True`. For the streaming case, you could also assert that `stream_iter` yields an event with a non-`None` `.usage` matching the merged `message_start` and `message_delta` usage.
</issue_to_address>

Comment thread src/kernel/llm/model_client/anthropic_client.py Outdated
Comment thread src/kernel/llm/model_client/anthropic_client.py
Comment thread test/kernel/llm/test_response.py
Comment thread test/kernel/llm/test_anthropic_client.py
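feat: support default values when extracting usage in AnthropicChatClient and improve token accounting in the streaming path
test: add assertions on the usage return value to ensure token counts are extracted and computed correctly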
@Windpicker-owo Windpicker-owo merged commit bc43cbb into main Jun 15, 2026
2 checks passed
1 participant