1.2.0-rc.2 #71

Merged: Windpicker-owo merged 9 commits into main from dev on Jun 19, 2026

@Windpicker-owo commented Jun 19, 2026

Summary by Sourcery

Add robust streaming resilience, richer reasoning/usage handling, and improved shutdown/authentication behavior for LLM and runtime components.

New Features:

  • Introduce automatic retry and request reconstruction for LLM streaming responses across multiple consumption methods.
  • Expose streaming finish/stop reasons and redacted reasoning metadata in StreamEvent and propagate them through reducers and responses.
  • Add WebSocket token verification sharing the same API-key validation logic as HTTP endpoints.
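
The shared validation described in the last item could look roughly like the sketch below. It is not the project's code: the `x-api-key` header, the `token` query parameter, the function signatures, and the `FakeWebSocket` stand-in are all assumptions made for illustration. Only the idea (one validator reused by the HTTP and WebSocket paths, with a policy-violation close on failure) comes from the summary above.

```python
import asyncio
import hmac
from dataclasses import dataclass
from typing import Dict, Optional, Set

POLICY_VIOLATION = 1008  # RFC 6455 close code for a policy violation


def validate_api_key(candidate: Optional[str], valid_keys: Set[str]) -> bool:
    """Return True only if `candidate` matches one of the configured keys."""
    if not candidate:
        return False  # empty or missing keys are rejected up front
    # compare_digest avoids leaking the match position through timing.
    return any(
        hmac.compare_digest(candidate.encode(), key.encode()) for key in valid_keys
    )


def verify_http_key(headers: Dict[str, str], valid_keys: Set[str]) -> bool:
    """HTTP path: read the key from a header (header name is an assumption)."""
    return validate_api_key(headers.get("x-api-key"), valid_keys)


@dataclass
class FakeWebSocket:
    """Hypothetical stand-in for a framework WebSocket object."""

    query_params: Dict[str, str]
    closed_with: Optional[int] = None

    async def close(self, code: int) -> None:
        self.closed_with = code


async def verify_websocket_token(ws: FakeWebSocket, valid_keys: Set[str]) -> bool:
    """WebSocket path: same validator, but close the socket on failure."""
    if validate_api_key(ws.query_params.get("token"), valid_keys):
        return True
    await ws.close(code=POLICY_VIOLATION)
    return False


ws = FakeWebSocket(query_params={"token": "wrong"})
accepted = asyncio.run(verify_websocket_token(ws, {"secret-key"}))
print(accepted, ws.closed_with)  # False 1008
```

Keeping the comparison in one function means the HTTP and WebSocket paths cannot drift apart in how they treat empty or malformed keys.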

Bug Fixes:

  • Ensure OpenAI usage extraction ignores non-usage-like mock objects and coerces numeric fields safely.
  • Avoid synthesizing fake thinking blocks and tool names in Anthropic message conversion, and properly carry native thinking parameters.
  • Preserve and backfill reasoning_content history only when explicitly allowed via reasoning_history_mode to avoid polluting OpenAI-compatible requests.
  • Fix tool_result payload ordering in context so assistant responses are appended before tool results, preventing context validation failures.
  • Correct handling of stream_options so they remain a top-level OpenAI parameter and default to include_usage in streaming mode.
  • Fix potential misuse of Anthropic streaming stop_reason and ensure stop_reason is reflected in aggregated stream results.
  • Handle empty or missing API keys more safely in logging when validation fails.
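
To make the first fix in this list concrete, here is a minimal sketch of usage extraction that ignores mock objects and coerces numeric fields. The helper names echo the ones mentioned in the commit message, but the bodies, the field list, and the coercion rules are assumptions, not the merged implementation.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional
from unittest.mock import MagicMock

USAGE_FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def coerce_usage_number(value: Any) -> Optional[int]:
    """Return an int for ints, integral floats, and numeric strings; else None."""
    if isinstance(value, bool):  # bool is an int subclass, so reject it explicitly
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        return int(value) if value.is_integer() else None
    if isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            return None
    return None  # MagicMock attributes and other objects end up here


def _read(obj: Any, name: str) -> Any:
    """Read a field from either a dict or an attribute-style object."""
    if isinstance(obj, dict):
        return obj.get(name)
    return getattr(obj, name, None)


def looks_like_usage(obj: Any) -> bool:
    """True if at least one known usage field holds a usable number."""
    if obj is None:
        return False
    return any(coerce_usage_number(_read(obj, n)) is not None for n in USAGE_FIELDS)


def extract_usage(obj: Any) -> Dict[str, int]:
    """Return normalized usage counts, or an empty dict for non-usage objects."""
    if not looks_like_usage(obj):
        return {}
    usage: Dict[str, int] = {}
    for name in USAGE_FIELDS:
        number = coerce_usage_number(_read(obj, name))
        if number is not None:
            usage[name] = number
    return usage


real = SimpleNamespace(prompt_tokens=10, completion_tokens="5", total_tokens=15.0)
print(extract_usage(real))         # {'prompt_tokens': 10, 'completion_tokens': 5, 'total_tokens': 15}
print(extract_usage(MagicMock()))  # {}
```

A `MagicMock` answers every attribute lookup with another mock, so checking that a field coerces to a number is what keeps mocks from producing bogus token statistics.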

Enhancements:

  • Track LLMResponse stop reasons and reasoning-only states to distinguish visible vs. internal model output.
  • Improve OpenAI thinking detection to honor reasoning_effort and nested reasoning.effort flags.
  • Normalize OpenAI tool call IDs in streaming (synthesizing stable IDs when providers only send indexes) and ensure async stream closing is awaited.
  • Tighten Anthropic streaming handling to emit thinking and redacted_thinking blocks with associated redacted data and usage.
  • Refine DNS/threadpool management in the bot runtime by introducing a dedicated DNS executor, restoring patched DNS functions, and explicitly shutting down executors and lingering asyncio tasks on shutdown.
  • Align schema normalization tests with shared helpers and broaden reasoning history tests for different providers and modes.
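
The tool-call ID normalization mentioned in the list above can be illustrated with a small sketch. The class name, the `call_<index>` format, and the "first ID seen for an index wins" rule are assumptions chosen for the example; the PR only states that stable IDs are synthesized when a provider sends indexes without IDs.

```python
from typing import Dict, Optional


class ToolCallIdNormalizer:
    """Map streaming tool-call indexes to one stable ID per index."""

    def __init__(self, prefix: str = "call") -> None:
        self._prefix = prefix
        self._ids: Dict[int, str] = {}

    def resolve(self, index: int, provider_id: Optional[str]) -> str:
        """Return the ID for `index`, synthesizing one if none was provided."""
        if index not in self._ids:
            # First delta for this index: trust the provider ID when present,
            # otherwise derive a deterministic ID from the index.
            self._ids[index] = provider_id or f"{self._prefix}_{index}"
        # Later deltas reuse the first ID so argument fragments stay grouped.
        return self._ids[index]


normalizer = ToolCallIdNormalizer()
print(normalizer.resolve(0, None))        # call_0
print(normalizer.resolve(1, "call_abc"))  # call_abc
print(normalizer.resolve(1, None))        # call_abc
```

Deriving the ID from the index rather than from a random value keeps retries and tests deterministic, at the cost of IDs that are only unique within a single response.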

Build:

  • Bump project and core versions to 1.2.0-rc.2.

Tests:

  • Add extensive streaming tests for the OpenAI client covering finish reasons, tool-call ID synthesis, stream_options, and reducer behavior.
  • Extend Anthropic client streaming tests to validate stop_reason propagation, redacted thinking round-tripping, and usage aggregation.
  • Add tests for reasoning enablement flags, reasoning history modes, reasoning-only response state, non-streaming usage exposure, and WebSocket/HTTP behavior where applicable.

Windpicker-owo and others added 9 commits June 17, 2026 14:17
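- Add `reasoning_redacted_data`, `finish_reason`, and `stop_reason` fields to `StreamEvent` to improve event handling.
- Update `_thinking_enabled` to recognize new reasoning parameters, including `reasoning_effort`.
- Introduce the `_coerce_usage_number` and `_looks_like_usage_obj` helpers to improve usage-data handling.
- Enhance `_extract_usage_from_obj` to normalize usage fields and support reasoning tokens.
- Modify `OpenAIChatClient` to manage the reasoning history mode and ensure reasoning content is handled correctly.
- Add test cases covering the reasoning-only state, reasoning history modes, and streaming behavior.
- Improve tool-call ID handling in streaming so behavior is consistent across providers.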
…bSocket-auth-token-check

feat: enhance the security validation module to support auth-token verification for WebSocket connections
@sourcery-ai Bot commented Jun 19, 2026

Reviewer's Guide

Introduce robust streaming retry support, richer reasoning/usage propagation, and safer shutdown/auth handling across the LLM stack, while tightening OpenAI/Anthropic client behavior and expanding test coverage for streaming, reasoning, and security paths.
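
As a rough illustration of the streaming retry idea, the sketch below collects a full response from an async stream, discards partial state when a transient error occurs, and reopens the stream. Everything here is hypothetical: `TransientStreamError`, the error classification, and the `max_retries`/`delay` parameters stand in for whatever the project reads from its configuration.

```python
import asyncio
from typing import AsyncIterator, Callable, List


class TransientStreamError(Exception):
    """Hypothetical error type standing in for a dropped connection."""


def is_retryable_stream_error(exc: BaseException) -> bool:
    """Classify errors worth retrying (the exact set is an assumption)."""
    return isinstance(exc, (TransientStreamError, ConnectionError, asyncio.TimeoutError))


async def collect_with_retry(
    open_stream: Callable[[], AsyncIterator[str]],
    max_retries: int = 2,
    delay: float = 0.0,
) -> str:
    """Collect the whole stream, rebuilding it after transient failures."""
    for attempt in range(max_retries + 1):
        parts: List[str] = []  # state is reset at the start of every attempt
        try:
            async for chunk in open_stream():  # a fresh request per attempt
                parts.append(chunk)
            return "".join(parts)
        except Exception as exc:
            if attempt == max_retries or not is_retryable_stream_error(exc):
                raise
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")


calls = {"n": 0}


async def flaky_stream() -> AsyncIterator[str]:
    """Fails midway on the first call, succeeds on the second."""
    calls["n"] += 1
    yield "Hel"
    if calls["n"] == 1:
        raise TransientStreamError("connection dropped")
    yield "lo"


result = asyncio.run(collect_with_retry(flaky_stream))
print(result, calls["n"])  # Hello 2
```

Collecting is the easy case because nothing has been handed to the caller before the failure. Paths that yield chunks as they arrive have to decide what to do about content already emitted, which this sketch does not address.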

File-Level Changes

Change Details Files
Add automatic, configurable retry handling for streaming LLM responses, including propagation of finish/stop reasons and reasoning metadata through reducers and response objects.
  • Introduce _is_retryable_stream_error and helper methods on LLMResponse to read retry configuration from model_set, reset state, and resend requests for streaming paths.
  • Wrap stream_events, _collect_full_response, stream_with_callback, stream_events_with_callback, and stream_with_buffer in retry loops that rebuild the request, log warnings, and resume streaming after transient failures.
  • Extend StreamEvent, LLMStreamReducer, StreamReductionResult, and LLMResponse to carry finish_reason/stop_reason and redacted reasoning metadata end-to-end.
  • Ensure original payloads are preserved on LLMResponse for use during stream retries and refine _apply_stream_result to update stop_reason and reasoning fields.
src/kernel/llm/response.py
src/kernel/llm/stream_state.py
src/kernel/llm/model_client/base.py
src/kernel/llm/request_execution.py
test/kernel/llm/test_llm_response.py
test/kernel/llm/test_response_advanced.py
test/kernel/llm/test_response.py
test/kernel/llm/test_openai_client_streaming.py
Refine OpenAI client behavior for reasoning/thinking, usage extraction, streaming tool-calls, and parameter shaping, with corresponding tests.
  • Broaden _thinking_enabled to detect multiple reasoning-related knobs and gate reasoning history backfill on an explicit reasoning_history_mode flag.
  • Normalize and validate usage objects via _looks_like_usage_obj and _coerce_usage_number to avoid bogus token stats from mocks or non-numeric fields.
  • Strip or backfill assistant reasoning_content based on reasoning_history_mode (none/deepseek/kimi/auto) and model; ensure stream_options stay a top-level param and include usage by default for streams.
  • Improve streaming mapping: emit finish_reason events, synthesize stable tool_call ids when providers omit them, and correctly handle async/sync close of underlying streams.
src/kernel/llm/model_client/openai_client.py
test/kernel/llm/test_openai_client.py
test/kernel/llm/test_openai_client_streaming.py
Tighten Anthropic client message conversion and streaming, including native thinking param support, redacted thinking propagation, and stop_reason emission.
  • Stop synthesizing fake thinking blocks after tool_result and stop sending tool_name on tool_result content blocks.
  • Expose Anthropic-native thinking parameters directly on the request instead of tunneling via extra_params and avoid overriding them when temperature is set.
  • Enhance streaming iterator to emit usage+stop_reason on message_delta and to translate redacted_thinking blocks (with their data) into StreamEvent reasoning metadata.
  • Extend tests to cover redacted_thinking round-trip, stop_reason propagation, and removal of synthesized thinking behavior.
src/kernel/llm/model_client/anthropic_client.py
test/kernel/llm/test_anthropic_client.py
Improve security and runtime lifecycle: shared API key validation for HTTP and WebSocket, DNS-threadpool lifecycle management, and graceful shutdown of background loops and tasks.
  • Factor API key validation into _validate_api_key and reuse it in a new verify_websocket_token helper that authenticates WebSocket connections and closes them with policy-violation codes on failure.
  • Track DNS-specific ThreadPoolExecutor and original loop getaddrinfo/getnameinfo in Bot, restoring and shutting them (and the default executor) during shutdown to avoid leaks.
  • Stop StreamLoopManager on shutdown and cancel/await all remaining asyncio tasks before reporting shutdown completion.
src/core/utils/security/__init__.py
src/app/runtime/bot.py
Adjust configuration and tests to align with the new version and behavior.
  • Bump project and core versions from 1.2.0-rc.1 to 1.2.0-rc.2.
  • Update tests to import shared schema normalization helpers, assert usage presence/absence, and rename/clarify certain behavior expectations for thinking backfill and tool_result ordering.
pyproject.toml
src/core/config/core_config.py
test/kernel/llm/test_openai_client.py
test/kernel/llm/test_llm_response.py
test/kernel/llm/test_anthropic_client.py

@Windpicker-owo Windpicker-owo merged commit 04b13de into main Jun 19, 2026
3 checks passed

@sourcery-ai Bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • In OpenAIChatClient.create, the new reasoning_history_mode handling appears inconsistent with the tests and docstrings: you default it to True and only enable reasoning history when == True, but the new tests expect the default mode to strip reasoning_content and string modes like "deepseek"/"kimi" to enable it; consider changing the default and checking for explicit modes/truthiness instead of == True so the behavior matches the intended modes.
  • In request_execution.execute_request, resp._original_payloads = list(trimmed_payloads) is assigned twice in a row with the same value; this looks like an accidental duplication and can be safely reduced to a single assignment.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `OpenAIChatClient.create`, the new `reasoning_history_mode` handling appears inconsistent with the tests and docstrings: you default it to `True` and only enable reasoning history when `== True`, but the new tests expect the default mode to strip `reasoning_content` and string modes like `"deepseek"`/`"kimi"` to enable it; consider changing the default and checking for explicit modes/truthiness instead of `== True` so the behavior matches the intended modes.
- In `request_execution.execute_request`, `resp._original_payloads = list(trimmed_payloads)` is assigned twice in a row with the same value; this looks like an accidental duplication and can be safely reduced to a single assignment.

## Individual Comments

### Comment 1
<location path="src/kernel/llm/response.py" line_range="556-562" />
<code_context>
-        except Exception as exc:
-            stream_error = exc
-
-        if buffer:
-            yield "".join(buffer)
-
-        self._apply_stream_result(reducer.finalize(stream_error))
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Flushing the buffer before deciding to retry can cause partial, failed-attempt chunks to be emitted.

In `stream_with_buffer`, the `buffer` is flushed even when an exception occurs, so consumers can see a final partial chunk from a failed attempt followed by overlapping content after a retry. To keep retries transparent, only flush the buffer when `stream_error is None`; on error, discard the partial buffer and let the retry re-emit content. At least, separate the success path from the error+retry path so failed attempts never emit a trailing chunk.

```suggestion
        except Exception as exc:
            stream_error = exc

        # Only flush the buffer when the stream completed successfully.
        # On error, discard any partial buffered content so retries remain transparent.
        if stream_error is None:
            if buffer:
                yield "".join(buffer)
                buffer.clear()
        else:
            buffer.clear()

        self._apply_stream_result(reducer.finalize(stream_error))
```
</issue_to_address>
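
To see the effect of the suggestion in Comment 1 in isolation, here is a self-contained sketch of a buffered stream that flushes its tail only after a successful attempt. It is a simplified stand-in, not the project's `stream_with_buffer`: the `open_stream` factory, `buffer_size`, `max_retries`, and the use of `ConnectionError` as the retryable error are assumptions.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def stream_with_buffer(
    open_stream: Callable[[], Iterable[str]],
    buffer_size: int = 4,
    max_retries: int = 1,
) -> Iterator[str]:
    """Yield buffered chunks; drop the partial tail of a failed attempt."""
    for attempt in range(max_retries + 1):
        buffer: List[str] = []
        buffered_len = 0
        stream_error: Optional[Exception] = None
        try:
            for chunk in open_stream():
                buffer.append(chunk)
                buffered_len += len(chunk)
                if buffered_len >= buffer_size:
                    yield "".join(buffer)
                    buffer.clear()
                    buffered_len = 0
        except ConnectionError as exc:
            stream_error = exc

        if stream_error is None:
            # Success path: flush whatever is left and stop.
            if buffer:
                yield "".join(buffer)
            return

        # Error path: discard the partial tail so the retry starts clean.
        buffer.clear()
        if attempt == max_retries:
            raise stream_error


attempts = {"n": 0}


def flaky() -> Iterator[str]:
    """Fails after the first chunk on the first call only."""
    attempts["n"] += 1
    yield "ab"
    if attempts["n"] == 1:
        raise ConnectionError("dropped")
    yield "cd"
    yield "e"


print(list(stream_with_buffer(flaky)))  # ['abcd', 'e']
```

Without the `stream_error is None` guard, the first attempt would emit a trailing `"ab"` before the retry re-sent the same text. Note that the sketch only protects the unflushed tail: chunks already yielded before a failure are not withdrawn.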

### Comment 2
<location path="src/kernel/llm/model_client/openai_client.py" line_range="837-840" />
<code_context>
         # force_sync_http is deprecated; remove it so it is not passed to the API
         extra_params.pop("force_sync_http", None)
+        # Controls whether the reasoning_content history field is sent to compatible providers
+        reasoning_history_mode = extra_params.pop("reasoning_history_mode", True)

         client = self._get_client(
</code_context>
<issue_to_address>
**suggestion:** The equality check for `reasoning_history_mode` is very strict and may ignore truthy values like 'true' or 1.

`allow_reasoning_history` will only be enabled when `reasoning_history_mode` is exactly `True`. Common truthy values like `'true'`, `'on'`, or `1` will be treated as `False`. If this is a config/CLI/env-style flag, consider a more permissive check (e.g. casting to `bool` or normalizing strings). If you really require a strict boolean, it would be better to validate the type and fail fast on non-bool values instead of silently treating them as `False`.

Suggested implementation:

```python
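        # Controls whether the reasoning_content history field is sent to compatible providers
        # Accept common truthy values (True, 1, "true", "on", "yes", etc.) so that values from config/CLI/env are not ignored by a strict `== True` check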
        raw_reasoning_history_mode = extra_params.pop("reasoning_history_mode", True)
        if isinstance(raw_reasoning_history_mode, str):
            reasoning_history_mode = raw_reasoning_history_mode.strip().lower() in {
                "1",
                "true",
                "yes",
                "y",
                "on",
                "t",
            }
        else:
            reasoning_history_mode = bool(raw_reasoning_history_mode)

```

```python
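        # By default, do not send the non-standard reasoning_content history field to OpenAI-compatible providers,
        # to avoid polluting the request structure; enable it explicitly via reasoning_history_mode when needed.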
        allow_reasoning_history = reasoning_history_mode

```
</issue_to_address>
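
The suggestion in Comment 2 treats `reasoning_history_mode` as a boolean, while the overall comments and the reviewer's guide describe named modes (none/deepseek/kimi/auto) and a default that strips `reasoning_content`. One possible way to reconcile the two is sketched below. The function names, the mapping of truthy values to `"auto"`, and the fail-fast behavior on unknown values are assumptions, not the merged code.

```python
from typing import Any

KNOWN_MODES = {"none", "deepseek", "kimi", "auto"}
TRUTHY = {"1", "true", "yes", "y", "on", "t"}
FALSY = {"0", "false", "no", "n", "off", "f", ""}


def normalize_reasoning_history_mode(raw: Any = None) -> str:
    """Map booleans, flag-like strings, and mode names to a known mode."""
    if raw is None:
        return "none"  # default: strip reasoning_content from history
    if isinstance(raw, int) and raw in (0, 1):  # also covers True/False
        return "auto" if raw else "none"
    if isinstance(raw, str):
        value = raw.strip().lower()
        if value in KNOWN_MODES:
            return value
        if value in TRUTHY:
            return "auto"
        if value in FALSY:
            return "none"
    # Fail fast instead of silently treating unknown values as disabled.
    raise ValueError(f"unsupported reasoning_history_mode: {raw!r}")


def allow_reasoning_history(raw: Any = None) -> bool:
    """In this sketch, every mode other than "none" enables history backfill."""
    return normalize_reasoning_history_mode(raw) != "none"


print(normalize_reasoning_history_mode("DeepSeek "))  # deepseek
print(allow_reasoning_history())                      # False
print(allow_reasoning_history(True))                  # True
```

A real implementation would likely also consult the model name when the mode is `"auto"`; that decision is left out here because the source does not describe it.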


