feat(tool_parser): add DeepSeek V3.2 DSML tool call parser #1030

Open
key4ng wants to merge 11 commits into main from keyang/deepseek_3_2_tool_call

Conversation

Collaborator

@key4ng key4ng commented Apr 3, 2026

Description

Problem

DeepSeek V3.2 introduces a new XML-like DSML format for tool calls, replacing the special-token approach used in V3/V3.1. The gateway has no parser for this format, so V3.2 models cannot use tool calling through the gRPC streaming path.

Solution

Add a new DeepSeek32Parser that handles the DSML format with incremental streaming support, following the SGLang DeepSeekV32Detector pattern.

Changes

  • New parser: crates/tool_parser/src/parsers/deepseek32.rs — handles DSML format with regex-based parsing
    • Supports both XML parameter tags (<|DSML|parameter>) and direct JSON fallback inside invoke blocks
    • Type-aware argument reconstruction: string="true" → string, string="false" → parsed JSON
    • Incremental streaming with argument diffing
    • Partial DSML prefix detection to avoid flushing incomplete tags
  • Factory registration: deepseek32 parser with model mappings
    • deepseek-v3.2* / deepseek-ai/DeepSeek-V3.2* → deepseek32 (DSML format)
    • deepseek-v3.2-exp* / deepseek-ai/DeepSeek-V3.2-Exp* → deepseek31 (V3.2-Exp uses V3.1 format)
  • 13 integration tests: complete parsing, streaming, factory registration
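
For context, a complete DSML invoke block can be recognized with plain string scanning. The following is a minimal, std-only sketch; the tag and attribute shapes are assumptions inferred from the PR description and SGLang's detector, and the real parser is regex-based with streaming state:

```rust
// Minimal std-only sketch of extracting a tool name and string parameters
// from a complete DSML invoke block. Tag/attribute shapes are assumptions
// based on the PR description; the actual parser is regex-based and streaming.
fn parse_invoke(block: &str) -> Option<(String, Vec<(String, String)>)> {
    const INVOKE: &str = "<|DSML|invoke name=\"";
    const PARAM: &str = "<|DSML|parameter name=\"";
    const PARAM_END: &str = "</|DSML|parameter>";

    // Tool name sits between the invoke marker and the closing quote.
    let start = block.find(INVOKE)? + INVOKE.len();
    let end = start + block[start..].find('"')?;
    let name = block[start..end].to_string();

    // Each parameter carries its name attribute and a tag-delimited body.
    let mut params = Vec::new();
    let mut rest = &block[end..];
    while let Some(p) = rest.find(PARAM) {
        let r = &rest[p + PARAM.len()..];
        let key_end = r.find('"')?;
        let key = r[..key_end].to_string();
        let body_start = r.find('>')? + 1;
        let body_end = body_start + r[body_start..].find(PARAM_END)?;
        params.push((key, r[body_start..body_end].to_string()));
        rest = &r[body_end..];
    }
    Some((name, params))
}

fn main() {
    let block = concat!(
        "<|DSML|invoke name=\"get_weather\">",
        "<|DSML|parameter name=\"city\" string=\"true\">Paris</|DSML|parameter>",
        "</|DSML|invoke>",
    );
    let (name, params) = parse_invoke(block).unwrap();
    println!("{name} {params:?}");
    // → get_weather [("city", "Paris")]
}
```

The real implementation additionally handles the direct-JSON fallback, `string="false"` JSON parsing, and partial tags arriving mid-stream.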

Test Plan

cargo test -p tool-parser --test tool_parser_deepseek32
# 13 passed; 0 failed

cargo test -p tool-parser
# 347 passed; 0 failed (no regressions)

cargo clippy -p tool-parser --all-targets --all-features -- -D warnings
# clean
Checklist
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • New Features

    • Added support for DeepSeek V3.2 model format, enabling extraction of DSML tool calls and streaming of tool names plus incremental argument deltas.
    • Model detection updated so V3.2 variants use the new parser while V3.2-Exp variants continue to use prior V3.1-format handling.
  • Tests

    • Added comprehensive tests for complete and incremental/streaming parsing, multiple invokes, parameter types, nested JSON bodies, edge cases, and model-to-parser mappings.

@github-actions github-actions bot added the tests (Test changes) and tool-parser (Tool/function call parser changes) labels on Apr 3, 2026

coderabbitai bot commented Apr 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a new DeepSeek V3.2 parser (DeepSeek32Parser) with complete and incremental DSML parsing, registers and maps it in the parser factory for V3.2 model patterns, re-exports the parser, and adds integration tests covering parsing and factory resolution.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Parser Implementation `crates/tool_parser/src/parsers/deepseek32.rs` | New DeepSeek32Parser implementing ToolParser: DSML detection, complete parsing, incremental/streaming parsing with buffering/state, JSON argument reconstruction, partial-parameter handling, and reset/helpers. |
| Parsers Module & Re-exports `crates/tool_parser/src/parsers/mod.rs`, `crates/tool_parser/src/lib.rs` | Added deepseek32 module and publicly re-exported DeepSeek32Parser. |
| Factory & Model Mapping `crates/tool_parser/src/factory.rs` | Registered "deepseek32" in ParserFactory::new(); mapped deepseek-v3.2* and deepseek-ai/DeepSeek-V3.2* to it; mapped deepseek-v3.2-exp* / deepseek-ai/DeepSeek-V3.2-Exp* to the existing "deepseek31" parser. |
| Tests `crates/tool_parser/tests/tool_parser_deepseek32.rs` | New tests for DeepSeek32Parser and factory mappings: complete parsing (XML-like params, JSON payloads, mixed/nested types), incremental streaming across chunks, marker detection, and model→parser resolution. |
| Factory Re-exports/Registration `crates/tool_parser/src/factory.rs`, `crates/tool_parser/src/lib.rs` | Updated imports/registrations and public re-exports to include DeepSeek32Parser. |

Sequence Diagram(s)

(Skipped)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • slin1237
  • CatherineSue

Poem

🐇 I nibble DSML crumbs beneath the moon,
Invokes turned to JSON, tidy and soon,
Chunks hop and settle into my nest,
Names then args stream out at my best,
V3.2 carrots—what a cozy fest!

🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and concisely summarizes the main change: adding a DeepSeek V3.2 DSML tool call parser. It accurately reflects the primary objective of the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%. |



Comment @coderabbitai help to get the list of available commands and usage tips.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1cc2419261

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the DeepSeek32Parser to support the DeepSeek V3.2 DSML tool-calling format, providing both complete and incremental parsing. The parser is integrated into the ParserFactory with mappings for V3.2 and V3.2-Exp models, and integration tests are included. Feedback suggests improving the robustness of parameter parsing and using warning-level logging for invalid tool names.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Line 313: Remove the explicit drop(captures); statement in deepseek32.rs: the
local variable captures (which borrows buf_snapshot) will go out of scope
naturally, so delete the drop call and ensure there are no further references to
captures after its intended use (verify the surrounding code in the function
where captures and buf_snapshot are used).
- Around line 370-398: The argument diff logic in argument_diff can be
simplified: when is_complete is false and you have a prev_args (from
self.prev_tool_call_arr) and DSML parameters only ever accumulate, replace the
find_common_prefix-based branching inside the else-if that checks let Some(prev)
= &prev_args with a direct slice from sent_len into current_args (i.e., treat
the new content as current_args[sent_len..].to_string()); this removes the
prefix computation while preserving behavior for monotonic accumulation—keep the
existing handling for the is_complete branch and the None cases, and only change
the block that currently calls helpers::find_common_prefix and compares
prefix.len() to sent_len.
- Around line 316-324: The invalid-tool branch currently breaks leaving parser
state stale and preventing processing of remaining invokes; change it to follow
the pattern used in other parsers: when func_name is invalid and is_complete is
true, advance self.buffer via match_end (if Some(end)) and then reset parser
state (clear streamed_args_for_tool and set current_tool_name_sent = false) and
continue the loop instead of break; when func_name is invalid and is_complete is
false, reset the same state (clear streamed_args_for_tool and set
current_tool_name_sent = false) and return/exit early so partial invokes are
dropped and state is clean for the next chunk.
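
The diff simplification in the second comment (when DSML arguments only ever accumulate, slice directly from the already-sent length) can be sketched as follows; the helper name and signature here are illustrative, not the PR's actual code:

```rust
// Sketch of the suggested monotonic diff: if the serialized arguments only
// ever grow, the undelivered delta is just the tail past `sent_len`, the
// number of bytes already streamed for this tool call. No common-prefix
// computation is needed under that assumption.
fn argument_delta(current_args: &str, sent_len: usize) -> Option<String> {
    if sent_len < current_args.len() {
        Some(current_args[sent_len..].to_string())
    } else {
        None // nothing new to emit yet
    }
}

fn main() {
    // 9 bytes of `{"city":"` were already streamed; emit only the new tail.
    assert_eq!(
        argument_delta("{\"city\":\"Par", 9),
        Some("Par".to_string())
    );
    assert_eq!(argument_delta("{\"city\":\"", 9), None);
}
```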

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6a76df48-375a-4a91-898f-0a879f03acd3

📥 Commits

Reviewing files that changed from the base of the PR and between f6beb69 and 1cc2419.

📒 Files selected for processing (5)
  • crates/tool_parser/src/factory.rs
  • crates/tool_parser/src/lib.rs
  • crates/tool_parser/src/parsers/deepseek32.rs
  • crates/tool_parser/src/parsers/mod.rs
  • crates/tool_parser/tests/tool_parser_deepseek32.rs

key4ng added 3 commits April 2, 2026 18:46
…r stripping

Signed-off-by: key4ng <rukeyang@gmail.com>
…ation

Signed-off-by: key4ng <rukeyang@gmail.com>
…reaking

Signed-off-by: key4ng <rukeyang@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c7c1c2f6c



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Around line 336-381: The bug is that prev_tool_call_arr stores "arguments" as
an object but the code expects a string, so prev_args becomes None and the first
partial chunk is dropped; update the logic in the block computing argument_diff
(around parse_parameters_from_dsml, streamed_args_for_tool, prev_tool_call_arr
and helpers::find_common_prefix) to treat a missing/non-string previous
arguments as an empty string (or initialize "arguments" as an empty string when
setting prev_tool_call_arr) and then compute the diff from sent_len (i.e., if
prev_args is None treat prev = "" and emit current_args[sent_len..] when
!is_complete or when appropriate), ensuring the first partial arguments are
returned instead of None.


📥 Commits

Reviewing files that changed from the base of the PR and between 1cc2419 and 2c7c1c2.

📒 Files selected for processing (1)
  • crates/tool_parser/src/parsers/deepseek32.rs


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cdff14236c


Signed-off-by: key4ng <rukeyang@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
crates/tool_parser/src/parsers/deepseek32.rs (1)

346-349: ⚠️ Potential issue | 🟠 Major

Emit the first partial argument delta.

Line 348 seeds arguments with a non-string value, but Lines 363-367 only recover previous arguments via .as_str(). On the first incomplete invoke, prev_args is therefore missing, so Lines 389-390 return None and the stream emits the tool name without any initial parameter bytes.

🛠️ Possible fix
-            let prev_args = if tool_id < self.prev_tool_call_arr.len() {
-                self.prev_tool_call_arr[tool_id]
-                    .get("arguments")
-                    .and_then(|v| v.as_str())
-                    .map(|s| s.to_string())
-            } else {
-                None
-            };
+            let prev_args = if tool_id < self.prev_tool_call_arr.len() {
+                self.prev_tool_call_arr[tool_id]
+                    .get("arguments")
+                    .and_then(|v| v.as_str())
+                    .unwrap_or_default()
+                    .to_string()
+            } else {
+                String::new()
+            };
 
             let argument_diff = if is_complete {
                 if sent_len < current_args.len() {
                     Some(current_args[sent_len..].to_string())
                 } else {
                     Some(String::new())
                 }
-            } else if let Some(prev) = &prev_args {
-                if current_args == *prev {
+            } else if prev_args.is_empty() {
+                if sent_len < current_args.len() {
+                    Some(current_args[sent_len..].to_string())
+                } else {
+                    None
+                }
+            } else if current_args == prev_args {
                     None
                 } else {
-                    let prefix = helpers::find_common_prefix(prev, &current_args);
+                    let prefix = helpers::find_common_prefix(&prev_args, &current_args);
                     if prefix.len() > sent_len {
                         Some(prefix[sent_len..].to_string())
                     } else {
                         None
                     }
                 }
-            } else {
-                None
             };

Also applies to: 363-390

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/tool_parser/src/parsers/deepseek32.rs` around lines 346 - 349, The
code seeds prev_tool_call_arr[tool_id] with "arguments": {} (in deepseek32
parser) but later logic expects a string via .as_str(), so on the first partial
invoke prev_args is missing and no initial parameter bytes are emitted; fix by
initializing the seeded value to an empty string (e.g., set "arguments" to ""
instead of {}) or alternatively update the prev_args recovery (where .as_str()
is used) to handle non-string JSON by calling .as_str().unwrap_or_else(||
json_value.to_string().as_str()) or using to_string()/unwrap_or_default() so the
first partial argument delta is emitted; modify the code around
prev_tool_call_arr, tool_id and the prev_args extraction to ensure arguments are
a string when consumed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Around line 244-270: The partial-marker detection in the has_partial_prefix
check is too narrow (only '<', '<|', '</', '</|') causing longer truncated
tokens (e.g. '<|DSML' or '<|DSML|fun') to be treated as normal_text; update the
computation of has_partial_prefix in the parsing function (the variables
current_text, has_dsml, has_partial_prefix and the early-return branch that
yields StreamingParseResult::default()) so it detects any trailing incomplete
tag: find the last '<' (or "</") in current_text and treat it as a partial
prefix if there is no matching closing '>' after it (or use a regex like
r"</?[^>]*$" to detect an unterminated tag); this will ensure such longer
partial DSML fragments are buffered instead of flushed as normal_text.
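
The broader check suggested above (buffer any trailing unterminated tag instead of four hard-coded short suffixes) can be sketched as a small std-only predicate; the function name is illustrative:

```rust
// Sketch of the widened partial-prefix check: any trailing "<..." with no
// closing '>' after it may be a truncated DSML tag and should stay buffered
// rather than being flushed as normal text.
fn has_unterminated_tag(text: &str) -> bool {
    match text.rfind('<') {
        Some(i) => !text[i..].contains('>'),
        None => false,
    }
}

fn main() {
    assert!(has_unterminated_tag("hello <|DSML"));     // truncated marker
    assert!(has_unterminated_tag("hello <|DSML|inv")); // longer truncation
    assert!(!has_unterminated_tag("done <x>"));        // tag is closed
    assert!(!has_unterminated_tag("plain text"));      // no tag at all
}
```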

---

Duplicate comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Around line 346-349: The code seeds prev_tool_call_arr[tool_id] with
"arguments": {} (in deepseek32 parser) but later logic expects a string via
.as_str(), so on the first partial invoke prev_args is missing and no initial
parameter bytes are emitted; fix by initializing the seeded value to an empty
string (e.g., set "arguments" to "" instead of {}) or alternatively update the
prev_args recovery (where .as_str() is used) to handle non-string JSON by
calling .as_str().unwrap_or_else(|| json_value.to_string().as_str()) or using
to_string()/unwrap_or_default() so the first partial argument delta is emitted;
modify the code around prev_tool_call_arr, tool_id and the prev_args extraction
to ensure arguments are a string when consumed.


📥 Commits

Reviewing files that changed from the base of the PR and between 2c7c1c2 and cdff142.

📒 Files selected for processing (1)
  • crates/tool_parser/src/parsers/deepseek32.rs


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7fa9333830



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
crates/tool_parser/src/parsers/deepseek32.rs (2)

348-351: ⚠️ Potential issue | 🟠 Major

Still dropping the first partial argument delta.

prev_tool_call_arr[tool_id]["arguments"] is initialized as {}, but Lines 365-369 only read strings. On the first incomplete invoke, prev_args becomes None, so the !is_complete path returns None and no parameters are streamed until a later chunk. Initialize "arguments" as "" or treat a non-string previous value as empty.

🛠️ Minimal fix
                 self.prev_tool_call_arr[tool_id] = serde_json::json!({
                     "name": func_name,
-                    "arguments": {},
+                    "arguments": "",
                 });
...
-            } else {
-                None
+            } else if sent_len < current_args.len() {
+                Some(current_args[sent_len..].to_string())
+            } else {
+                None
             };

Also applies to: 365-392

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/tool_parser/src/parsers/deepseek32.rs` around lines 348 - 351, The
current initialization of self.prev_tool_call_arr[tool_id]["arguments"] as an
object causes the first partial argument delta to be dropped because prev_args
(read elsewhere in deepseek32.rs around the prev_args / is_complete logic)
expects a string; change the initialization in the prev_tool_call_arr entry for
func_name to set "arguments" to an empty string "" (or alternatively update the
code that reads prev_args to treat non-string values as empty string) so that
the first incomplete chunk is appended/streamed correctly; update any logic that
merges incoming argument chunks (the code paths around prev_args and
is_complete) to handle and coerce non-string previous values to "" before
concatenation.

246-272: ⚠️ Potential issue | 🟠 Major

Longer truncated DSML prefixes are still flushed as normal text.

Lines 249-252 only preserve <, <|, </, and </|. A chunk ending with <|DSML or <|DSML|inv still reaches Line 254 and gets emitted as normal_text, so the next chunk can never reconstruct the tag. Buffer any trailing unterminated <... fragment instead of hard-coding four suffixes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/tool_parser/src/parsers/deepseek32.rs` around lines 246 - 272, The
code currently only treats four very short suffixes as partial DSML prefixes
(has_partial_prefix) so longer truncated tags like "<|DSML" get flushed as
normal_text; change has_partial_prefix to detect any trailing unterminated '<'
fragment by finding the last '<' in current_text and checking if there is no
corresponding '>' after it (i.e., an open tag that runs to the end of the
chunk), and if so treat that suffix as a partial prefix; when producing
normal_text (in the branch that strips end tokens and returns
StreamingParseResult), remove that trailing unterminated fragment from
normal_text and put it back into self.buffer so the fragment is preserved for
the next chunk; update uses of has_partial_prefix, current_text, buffer, and
StreamingParseResult accordingly instead of hard-coding the four suffixes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Around line 51-67: strip_dsml_trailing currently trims by character set and
can remove legitimate argument characters; change it to only remove an actual
trailing substring that is a prefix of the full DSML closing tag. For each
fragment group (DSML_PARAM_END_FRAGMENTS and DSML_INVOKE_END_FRAGMENTS) build
the full closing string by concatenating the fragments (e.g.
"</|DSML|parameter>"), then for the input string find the longest k>0 such that
result.ends_with(&full[..k]) and remove exactly that suffix (no per-character
trimming). Update strip_dsml_trailing to perform this suffix-prefix check and
removal using the concatenated full tag rather than fragment.contains(c); keep
references to DSML_PARAM_END_FRAGMENTS and DSML_INVOKE_END_FRAGMENTS and the
function name strip_dsml_trailing for locating the change.
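
The suffix-prefix removal described above can be sketched as follows, where the full closing tag is the concatenated fragment group (e.g. "</|DSML|parameter>"); the helper name here is illustrative:

```rust
// Sketch of the suggested fix: strip only the longest trailing substring
// that is literally a prefix of the full closing tag. Character-set trimming
// (the current behavior) would also eat legitimate argument bytes, e.g.
// turning the partial value "bar" into "b".
fn strip_partial_closing_tag(s: &str, full_tag: &str) -> String {
    // Longest k > 0 such that `s` ends with the first k bytes of `full_tag`.
    for k in (1..=full_tag.len().min(s.len())).rev() {
        if s.ends_with(&full_tag[..k]) {
            return s[..s.len() - k].to_string();
        }
    }
    s.to_string()
}

fn main() {
    let tag = "</|DSML|parameter>";
    // A truncated closing tag is removed exactly.
    assert_eq!(strip_partial_closing_tag("bar</|DS", tag), "bar");
    // Plain argument bytes are left untouched.
    assert_eq!(strip_partial_closing_tag("bar", tag), "bar");
}
```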

---

Duplicate comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Around line 348-351: The current initialization of
self.prev_tool_call_arr[tool_id]["arguments"] as an object causes the first
partial argument delta to be dropped because prev_args (read elsewhere in
deepseek32.rs around the prev_args / is_complete logic) expects a string; change
the initialization in the prev_tool_call_arr entry for func_name to set
"arguments" to an empty string "" (or alternatively update the code that reads
prev_args to treat non-string values as empty string) so that the first
incomplete chunk is appended/streamed correctly; update any logic that merges
incoming argument chunks (the code paths around prev_args and is_complete) to
handle and coerce non-string previous values to "" before concatenation.
- Around line 246-272: The code currently only treats four very short suffixes
as partial DSML prefixes (has_partial_prefix) so longer truncated tags like
"<|DSML" get flushed as normal_text; change has_partial_prefix to detect any
trailing unterminated '<' fragment by finding the last '<' in current_text and
checking if there is no corresponding '>' after it (i.e., an open tag that runs
to the end of the chunk), and if so treat that suffix as a partial prefix; when
producing normal_text (in the branch that strips end tokens and returns
StreamingParseResult), remove that trailing unterminated fragment from
normal_text and put it back into self.buffer so the fragment is preserved for
the next chunk; update uses of has_partial_prefix, current_text, buffer, and
StreamingParseResult accordingly instead of hard-coding the four suffixes.


📥 Commits

Reviewing files that changed from the base of the PR and between cdff142 and 7fa9333.

📒 Files selected for processing (1)
  • crates/tool_parser/src/parsers/deepseek32.rs

key4ng added 2 commits April 6, 2026 13:39
…m DSML block

Signed-off-by: key4ng <rukeyang@gmail.com>
…id invoke abort

Signed-off-by: key4ng <rukeyang@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bf8518dd3b


Collaborator Author

key4ng commented Apr 6, 2026

SGLang Alignment Review (Corrected)

Replaces the previous comparison after fact-checking against the actual SGLang source.

This parser follows SGLang's DeepSeekV32Detector implementation with targeted improvements.

Regex Patterns

| Purpose | SGLang | Ours | Match? |
| --- | --- | --- | --- |
| function_calls block | `(.*?)` with `re.DOTALL` | Same, with inline `(?s)` | Yes |
| Complete invoke | Reuses streaming regex (with `\|$` fallback) | Dedicated stricter regex (no `\|$`) | Ours is stricter |
| Complete parameter | `string="([^"]+)"` | `string="(true\|false)"` | Stricter per spec |
| Partial parameter | `string="([^"]+)"` | `string="(true\|false)"` | Stricter per spec |
| Invoke (streaming) | `(.*?)(end_tag\|$)` | Same | Yes |

DSML Fragment Stripping

| Aspect | SGLang | Ours | Match? |
| --- | --- | --- | --- |
| Param fragments | `["</", "\|DSML\|", "parameter"]` | Same | Yes |
| Invoke fragments | `["</", "\|DSML\|", "inv", "oke"]` | Same | Yes |
| Stripping method | `rstrip(chars)` in reverse | `trim_end_matches` in reverse | Equivalent |
| Where applied (params) | Strip `remaining_content` BEFORE partial regex | Same | Yes |
| Where applied (JSON) | Strip invoke-end from direct JSON | Same | Yes |

Parameter Parsing

| Aspect | SGLang | Ours | Match? |
| --- | --- | --- | --- |
| Direct JSON detection | `starts_with("{")` | Same | Yes |
| Direct JSON partial | Strip invoke-end, return | Same | Yes |
| Direct JSON complete | Check `ends_with("}")`, return | Same | Yes |
| `string="true"` | `value.strip()` (always trims whitespace) | `Value::String(value)` (no trim in complete path) | Minor diff — SGLang trims |
| `string="false"` | `json.loads` with fallback | `serde_json::from_str` with fallback | Yes |
| Partial: strip before regex | Strip `remaining[last_match_end:]`, then regex | Same | Yes |
| Partial: incomplete JSON | `_partial_json_loads` dependency | `serde_json::from_str` with string fallback | Functional gap for partial non-string values; diff algorithm compensates |
| Return format | `json.dumps(parameters)` | `serde_json::to_string` | Yes |

Streaming Logic

| Aspect | SGLang | Ours | Match? |
| --- | --- | --- | --- |
| Buffer accumulation | `self._buffer += new_text` | `self.buffer.push_str(chunk)` | Yes |
| DSML marker detection | `bot_token` or `<\|DSML\|invoke` | Same | Yes |
| Non-DSML flush | Strip `eot_token`, `invoke_end_token` | Strip 4 end tokens | Yes |
| Invoke loop | `while True` + `re.search` | `loop` + captures | Yes |
| Complete detection | `bool(group(3))` | `.is_some_and()` | Yes |
| Tool name emit | `ToolCallItem(name=func_name)` | Same | Yes |
| Arg parsing call | `allow_partial=not is_tool_end` | Same | Yes |
| Diff (complete) | `current_params[sent_len:]` | Same | Yes |
| Diff (partial, has prev) | `_find_common_prefix` + `> sent_len` | Same | Yes |
| Diff (partial, no prev) | Falls through — no emission | Emits from `sent_len` | Ours is better |
| Update prev state | `{"name": ..., "arguments": ...}` | Same | Yes |
| Complete: advance buffer | `self._buffer = text[match.end():]` | Same | Yes |
| Complete: advance | `tool_id += 1`, reset, continue | Same | Yes |
| Partial: break | `break` | Same | Yes |
| Tool name validation | No validation — all names forwarded as-is | Validates against `tool_indices`, skips invalid | Ours is better |

State Management

| Field | SGLang | Ours | Match? |
| --- | --- | --- | --- |
| Buffer | `self._buffer` | `self.buffer` | Yes |
| Tool index | `self.current_tool_id` (starts -1) | Same | Yes |
| Name sent flag | `self.current_tool_name_sent` | Same | Yes |
| Previous tool calls | `self.prev_tool_call_arr` (list of dicts) | Same (Vec of Value) | Yes |
| Streamed args | `self.streamed_args_for_tool` (list of strings) | Same | Yes |

Improvements Over SGLang

| Area | SGLang | Ours |
| --- | --- | --- |
| First partial args | Silently dropped (`prev_args` is None) | Emits from `sent_len` — fixes one-chunk delay |
| Tool name validation | No validation — invalid names forwarded to client | Validates against tools list; skips invalid invokes |
| Complete invoke regex | Reuses streaming regex with end-of-string fallback | Dedicated stricter regex for `parse_complete` |
| string attribute regex | Accepts any quoted value | Strict `true` or `false` only, per official spec |

Acceptable Differences

  • Partial JSON for string="false": SGLang uses _partial_json_loads dependency; we use serde_json::from_str with string fallback. The diff algorithm (common-prefix) handles structure changes safely when the closing tag arrives.
  • potentially_dsml mid-buffer check: SGLang checks for |DSML| substring anywhere; we only check ends_with. DSML tokens arrive atomically from the tokenizer.
  • Whitespace trimming on string="true" values: SGLang calls .strip() on complete parameter values; our complete path does not. Partial path does trim.
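
The common-prefix helper referenced throughout the streaming comparison (`_find_common_prefix` in SGLang, `helpers::find_common_prefix` in the PR) has a straightforward shape; this is a sketch of an equivalent, not the PR's actual code:

```rust
// Sketch of a common-prefix helper: the longest shared prefix of the
// previously parsed and newly parsed argument strings bounds what is safe
// to emit, since anything past the divergence point may still change.
fn find_common_prefix(a: &str, b: &str) -> String {
    a.chars()
        .zip(b.chars())
        .take_while(|(x, y)| x == y)
        .map(|(x, _)| x)
        .collect()
}

fn main() {
    let prev = "{\"city\":\"Par";
    let curr = "{\"city\":\"Paris\"}";
    assert_eq!(find_common_prefix(prev, curr), "{\"city\":\"Par");
}
```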


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
crates/tool_parser/src/parsers/deepseek32.rs (2)

57-67: ⚠️ Potential issue | 🟠 Major

strip_dsml_trailing still removes real argument bytes.

Line 64 trims by character set, not by DSML suffix. A partial value like bar becomes b, and direct-JSON fragments can also lose legitimate trailing bytes before the diffing code sees them. Strip only the longest trailing substring that is actually a prefix of the closing tag.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/tool_parser/src/parsers/deepseek32.rs` around lines 57 - 67,
strip_dsml_trailing currently uses trim_end_matches with a character predicate
and thus deletes any trailing characters that appear anywhere in a fragment;
instead detect and remove full DSML closing-tag substrings only: in
strip_dsml_trailing, stop using trim_end_matches(|c| ...) and replace with logic
that finds the longest fragment from fragments that is a suffix of the current
result (use ends_with(fragment)) and then chop off that exact fragment (once)
from the end; iterate in reverse or repeatedly as before but always remove
whole-fragment suffixes only so legitimate trailing bytes are not lost.
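A minimal Python sketch of the suffix-only stripping the reviewer suggests (the helper name and fragment list are illustrative, not the actual Rust code):

```python
# Closing tags whose prefixes may trail a streaming chunk (illustrative list).
CLOSING_TAGS = ["</|DSML|parameter>", "</|DSML|invoke>", "</|DSML|function_calls>"]

def strip_dsml_suffix(s: str) -> str:
    """Remove only the longest trailing substring that is a prefix of a closing tag."""
    best = 0
    for tag in CLOSING_TAGS:
        # Try the longest prefix first; stop at the first match for this tag.
        for n in range(len(tag), 0, -1):
            if s.endswith(tag[:n]):
                best = max(best, n)
                break
    return s[: len(s) - best] if best else s

print(strip_dsml_suffix("Tokyo</|DSML|para"))  # Tokyo
print(strip_dsml_suffix("bar"))                # bar (real argument bytes survive)
```

Unlike character-set trimming, this cannot eat legitimate trailing bytes such as the `r` in `bar`, because it only removes exact prefixes of known closing tags.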

246-252: ⚠️ Potential issue | 🟠 Major

Buffer any unterminated DSML tag, not just four 1–2 byte suffixes.

Lines 249-252 only preserve <, <|, </, and </|. A chunk ending with <|DSML, <|DSML|function_cal, or <|DSML|inv falls through Line 254 as normal_text, so the next chunk can no longer reconstruct the marker. Detect any trailing <... without a closing > instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/tool_parser/src/parsers/deepseek32.rs` around lines 246 - 252, The
current partial-prefix check only matches four specific short suffixes and
misses longer unterminated DSML fragments; update the logic that computes
has_partial_prefix (used alongside has_dsml and current_text in this parser) to
detect any trailing opening tag without a closing '>' instead of only exact
suffixes — e.g., consider the last index of '<' versus the last index of '>' in
current_text and treat as a partial if there's an unmatched '<' (and ensure this
works with the existing self.has_tool_markers check so longer fragments like
"<|DSML" or "<|DSML|function_cal" are buffered rather than emitted as
normal_text).
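The unmatched-`<` check the prompt describes can be sketched as follows (function name is illustrative):

```python
def ends_in_partial_tag(text: str) -> bool:
    """True if the buffer ends inside a possibly unterminated tag:
    the last '<' has no '>' after it."""
    lt = text.rfind("<")
    gt = text.rfind(">")
    return lt != -1 and lt > gt

assert ends_in_partial_tag("some text <|DSML|function_cal")    # buffer, don't flush
assert not ends_in_partial_tag('<|DSML|invoke name="f">')      # tag closed
assert not ends_in_partial_tag("plain text")                   # safe to flush
```

This covers arbitrarily long fragments like `<|DSML|function_cal`, not just the four short suffixes the current code preserves.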
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Around line 378-398: The code currently emits the placeholder "{}" as an
argument diff when prev_args is None (first partial chunk) because
parse_parameters_from_dsml(..., true) returns "{}" for an empty/incomplete
payload; update the first-partial fallback in the argument_diff computation (the
branch that now does `else if sent_len < current_args.len() { /* First partial
chunk */ Some(current_args[sent_len..].to_string()) }`) to only emit when
current_args contains actual payload bytes (e.g., require current_args != "{}"
or current_args.len() > 2) so the placeholder isn't streamed; keep the existing
complete-path (is_complete) behavior so a truly complete empty-object call can
still be sent.
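The placeholder guard the comment asks for can be sketched like this (helper name is hypothetical; the real logic is inside the Rust argument-diff branch):

```python
def first_partial_diff(current_args: str, sent_len: int):
    """Emit the first partial chunk only when it carries real payload,
    not the bare "{}" an empty or incomplete parse produces."""
    if current_args == "{}":        # placeholder, nothing to stream yet
        return None
    if sent_len < len(current_args):
        return current_args[sent_len:]
    return None

assert first_partial_diff("{}", 0) is None
assert first_partial_diff('{"city":"Tok', 0) == '{"city":"Tok'
```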


📥 Commits

Reviewing files that changed from the base of the PR and between 7fa9333 and bf8518d.

📒 Files selected for processing (1)
  • crates/tool_parser/src/parsers/deepseek32.rs

@key4ng
Collaborator Author

key4ng commented Apr 6, 2026

Parser Walkthrough

What the model outputs

DeepSeek V3.2 uses an XML-like "DSML" format for tool calls:

I'll check the weather for you.

<|DSML|function_calls>
<|DSML|invoke name="get_weather">
<|DSML|parameter name="city" string="true">Tokyo</|DSML|parameter>
<|DSML|parameter name="date" string="false">16</|DSML|parameter>
</|DSML|invoke>
</|DSML|function_calls>

The parser turns this into: name: "get_weather", arguments: '{"city":"Tokyo","date":16}'

  • string="true" → JSON string value
  • string="false" → parse as raw JSON (number, bool, array, object)
  • Also handles a fallback format where the model outputs raw JSON inside <invoke> instead of <parameter> tags
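The type-aware mapping above can be sketched in Python (the regex and helper name are illustrative; the actual parser is Rust):

```python
import json
import re

# Illustrative regex for complete parameter tags; the real parser's regex differs.
PARAM_RE = re.compile(
    r'<\|DSML\|parameter name="([^"]+)" string="(true|false)">(.*?)</\|DSML\|parameter>',
    re.DOTALL,
)

def params_to_arguments(invoke_body: str) -> str:
    args = {}
    for name, is_string, raw in PARAM_RE.findall(invoke_body):
        if is_string == "true":
            args[name] = raw                    # keep raw text as a JSON string
        else:
            try:
                args[name] = json.loads(raw)    # number, bool, array, or object
            except json.JSONDecodeError:
                args[name] = raw                # fallback: treat as string
    return json.dumps(args, separators=(",", ":"))

body = (
    '<|DSML|parameter name="city" string="true">Tokyo</|DSML|parameter>\n'
    '<|DSML|parameter name="date" string="false">16</|DSML|parameter>'
)
print(params_to_arguments(body))  # {"city":"Tokyo","date":16}
```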

Two modes

  • parse_complete — called with the entire model output at once. Extracts normal text before the DSML block, parses all invoke blocks, returns tool calls.
  • parse_incremental — called once per streaming chunk. Accumulates tokens in a buffer, emits tool names and argument diffs incrementally.
DSML Fragment Stripping

During streaming, a chunk may end mid-closing-tag:

chunk 1: ...name="city" string="true">Tokyo</|DSML|para
chunk 2: meter>

The captured value from chunk 1 would be Tokyo</|DSML|para. We strip trailing DSML fragments using character-level right-trimming (same approach as SGLang's rstrip):

Fragments: ["</", "|DSML|", "parameter"]   (applied in reverse)

"Tokyo</|DSML|para"
  → strip chars in "parameter": a,r,a,p  → "Tokyo</|DSML|"
  → strip chars in "|DSML|": |,L,M,S,D,| → "Tokyo</"
  → strip chars in "</": /,<              → "Tokyo"
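The walkthrough's trimming maps directly onto Python's `str.rstrip`, which treats its argument as a set of characters (a sketch of the approach as described, not the Rust code):

```python
def strip_fragments(s: str) -> str:
    # Apply fragment character sets right to left, as in the example above.
    for chars in ["parameter", "|DSML|", "</"]:
        s = s.rstrip(chars)  # rstrip strips any trailing chars from the set
    return s

print(strip_fragments("Tokyo</|DSML|para"))  # Tokyo
```

Because this trims by character set rather than exact suffix, a value that happens to end in those characters (e.g. `bar`) would also lose bytes, which is the edge case flagged in review.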
Argument Reconstruction (parse_parameters_from_dsml)

Converts DSML parameter tags into a JSON arguments string. Two paths:

Direct JSON path: If invoke content starts with {, treat as raw JSON. Strip trailing DSML fragments if streaming.

XML parameter path:

  1. Match all complete <parameter> tags → build a Map<String, Value>
  2. string="true"Value::String("Tokyo")
  3. string="false" → try serde_json::from_str("42")Value::Number(42), fallback to string
  4. If streaming (allow_partial): find text after the last complete parameter, strip DSML fragments, try to match a partial <parameter> tag and add it
  5. Serialize map → {"city":"Tokyo","date":16}
Streaming Engine (parse_incremental)

Phase 1 — Buffer or flush?

Each chunk is appended to the buffer, then:

  • No DSML markers, no partial tag prefix → flush as normal_text
  • Ends with <, <|, </, </| → might be start of DSML tag, buffer and wait
  • Has DSML content → enter the invoke processing loop

Phase 2 — Invoke processing loop

Processes invoke blocks one at a time from the buffer:

Buffer: "<invoke name="search">...complete...</invoke><invoke name="weather">...partial..."
         ├──── complete: process + consume ────┤├──── partial: process + break ────┤

For each invoke match:

  1. Validate tool name against provided tools list. Skip invalid complete invokes, reset on invalid partial.
  2. Emit tool name on first encounter → client knows "get_weather is starting"
  3. Parse current args via parse_parameters_from_dsml(content, allow_partial)
  4. Compute diff against what we've already sent:
    • Complete → send everything from sent_len to end
    • Partial with previous → use find_common_prefix to find stable prefix, send new stable portion
    • Partial without previous (first chunk) → send from sent_len
  5. Advance or wait: complete invoke → slice buffer past it, increment tool_id, continue. Partial → break and wait for more chunks.
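Step 4's diffing can be sketched as follows (function names illustrative; the real implementation lives in the Rust parser):

```python
def find_common_prefix(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def argument_diff(prev_args, current_args, sent_len, is_complete):
    if is_complete:
        # Complete invoke: flush everything not yet sent.
        return current_args[sent_len:] or None
    if prev_args is not None:
        # Only the prefix shared with the previous parse is stable.
        stable = find_common_prefix(prev_args, current_args)
        return current_args[sent_len:stable] if stable > sent_len else None
    # First partial chunk: emit from sent_len.
    return current_args[sent_len:] if sent_len < len(current_args) else None

assert argument_diff(None, '{"city":"Tok', 0, False) == '{"city":"Tok'
assert argument_diff('{"city":"Tok', '{"city":"Tokyo"}', 12, False) is None
assert argument_diff('{"city":"Tokyo"}', '{"city":"Tokyo"}', 12, True) == 'yo"}'
```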
End-to-end streaming example

Model calls get_weather(city="Tokyo"), tokens arrive as:

| Chunk | Action | Emitted |
|---|---|---|
| `"Let me check.\n\n"` | No DSML → flush | `normal_text: "Let me check."` |
| `"<\|DSML\|function_calls>\n"` | Has DSML, no invoke match yet | nothing |
| `"<\|DSML\|invoke name=\"get_weather\">\n"` | Invoke matched (partial). Emit name | `name: "get_weather"` |
| `"<\|DSML\|parameter name=\"city\" string=\"true\">"` | Param tag opened, no value yet | nothing |
| `"Tokyo"` | Partial param value → args = `{"city":"Tokyo"}` | `params: '{"city":"Tokyo"}'` |
| `"</\|DSML\|parameter>\n"` | Param complete, no new diff | nothing |
| `"</\|DSML\|invoke>\n"` | Invoke COMPLETE. Send remaining diff. Advance `tool_id`. | `params: "}"` |
| `"</\|DSML\|function_calls>"` | No invoke match. End tokens stripped. | nothing |

Client receives tool call get_weather with arguments {"city":"Tokyo"} streamed incrementally.

key4ng added 2 commits April 8, 2026 11:20
…o prevent delta corruption

Signed-off-by: key4ng <rukeyang@gmail.com>
…ing for DSML fragments

Signed-off-by: key4ng <rukeyang@gmail.com>
@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09abcf7018


@key4ng
Collaborator Author

key4ng commented Apr 8, 2026

E2E Validation: DeepSeek V3.2-Exp

Tested against a live DeepSeek V3.2-Exp (FP8) deployment on 8x H200.
V3.2-Exp uses V3.1 tool call format — auto-detected as deepseek31 parser via factory mapping deepseek-ai/DeepSeek-V3.2-Exp* → deepseek31.

Setup

sglang backend (gRPC mode):

/home/ubuntu/sglang_venv/bin/python -m sglang.launch_server \
  --model-path /raid/models/deepseek-ai/DeepSeek-V3.2-Exp \
  --served-model-name deepseek-ai/DeepSeek-V3.2-Exp \
  --tp 8 --trust-remote-code --port 30000 --grpc-mode

smg router (V3.1 vLLM template, no --tool-call-parser needed):

./target/debug/smg \
  --worker-urls grpc://localhost:30000 \
  --model-path deepseek-ai/DeepSeek-V3.2-Exp \
  --tokenizer-path /raid/models/deepseek-ai/DeepSeek-V3.2-Exp \
  --chat-template e2e_test/fixtures/chat_templates/tool_chat_template_deepseekv31.jinja \
  --port 8080

Run tests:

SMG_BASE_URL=http://localhost:8080 SMG_MODEL=deepseek-ai/DeepSeek-V3.2-Exp \
  python -m pytest e2e_test/chat_completions/test_deepseek32_tool_calling.py \
  -v --tb=short --no-header --rootdir=/tmp --noconftest

Test result

============================= test session starts ==============================
collected 17 items

TestDeepSeek32NonStreaming::test_single_tool_call_required           PASSED
TestDeepSeek32NonStreaming::test_tool_call_arguments_are_valid_json  PASSED
TestDeepSeek32NonStreaming::test_tool_call_has_id                    PASSED
TestDeepSeek32NonStreaming::test_tool_call_finish_reason             PASSED
TestDeepSeek32NonStreaming::test_model_picks_correct_tool            PASSED
TestDeepSeek32NonStreaming::test_parallel_tool_calls                 PASSED
TestDeepSeek32NonStreaming::test_single_tool_call_auto               PASSED
TestDeepSeek32NonStreaming::test_tool_choice_none                    PASSED
TestDeepSeek32NonStreaming::test_usage_stats_present                 PASSED
TestDeepSeek32NonStreaming::test_unicode_in_tool_arguments           PASSED
TestDeepSeek32Streaming::test_streaming_single_tool_call             PASSED
TestDeepSeek32Streaming::test_streaming_arguments_arrive_incrementally PASSED
TestDeepSeek32Streaming::test_streaming_finish_reason                PASSED
TestDeepSeek32Streaming::test_streaming_single_tool_call_auto        PASSED
TestDeepSeek32Streaming::test_streaming_parallel_tool_calls          PASSED
TestDeepSeek32MultiTurn::test_tool_result_followup                   PASSED
TestDeepSeek32MultiTurn::test_tool_result_followup_streaming         PASSED

============================== 17 passed in 9.70s ==============================
Full test file: e2e_test/chat_completions/test_deepseek32_tool_calling.py
"""DeepSeek V3.2 Tool Calling E2E Tests.

Tests for the DeepSeek V3.2 DSML-format tool parser via the SMG gateway.
Tests both non-streaming and streaming modes against a live sglang backend.

IMPORTANT: DeepSeek V3.2 has no built-in Jinja chat template in tokenizer_config.json.
The DSML template must be provided via --chat-template. Without it:
- tool_choice=required works (uses JSON schema constrained decoding, bypasses tool parser)
- tool_choice=auto fails (model output not parsed by deepseek32 DSML parser)

Usage:
    SMG_BASE_URL=http://localhost:8080 pytest \
        e2e_test/chat_completions/test_deepseek32_tool_calling.py -v \
        --rootdir=/tmp --noconftest

Setup:
    # sglang
    python -m sglang.launch_server \
        --model-path /raid/models/deepseek-ai/DeepSeek-V3.2 \
        --served-model-name deepseek-ai/DeepSeek-V3.2 \
        --tp 8 --trust-remote-code --port 30000 --grpc-mode

    # smg (DSML template required for tool_choice=auto tests)
    ./target/debug/smg \
        --worker-urls grpc://localhost:30000 \
        --model-path deepseek-ai/DeepSeek-V3.2 \
        --tokenizer-path /raid/models/deepseek-ai/DeepSeek-V3.2 \
        --tool-call-parser deepseek32 \
        --chat-template e2e_test/fixtures/chat_templates/tool_chat_template_deepseekv32.jinja \
        --port 8080
"""

from __future__ import annotations

import json
import logging
import os

import openai
import pytest

logger = logging.getLogger(__name__)

BASE_URL = os.environ.get("SMG_BASE_URL", "http://localhost:8080")
MODEL = os.environ.get("SMG_MODEL", "deepseek-ai/DeepSeek-V3.2")

# =============================================================================
# Client fixture
# =============================================================================


@pytest.fixture(scope="module")
def client():
    return openai.OpenAI(base_url=f"{BASE_URL}/v1", api_key="dummy")


# =============================================================================
# Tool definitions
# =============================================================================

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco'",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit",
                },
            },
            "required": ["location"],
        },
    },
}

SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "Search for information on the web.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query string",
                },
            },
            "required": ["query"],
        },
    },
}

TRANSLATE_TOOL = {
    "type": "function",
    "function": {
        "name": "translate",
        "description": "Translate text from one language to another.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "Text to translate"},
                "target_language": {"type": "string", "description": "Target language code"},
            },
            "required": ["text", "target_language"],
        },
    },
}

ALL_TOOLS = [WEATHER_TOOL, SEARCH_TOOL, TRANSLATE_TOOL]


# =============================================================================
# Helpers
# =============================================================================


def assert_valid_tool_call(tool_call, expected_name=None):
    assert tool_call.function.name, "Tool call must have a function name"
    assert tool_call.function.arguments, "Tool call must have arguments"
    args = json.loads(tool_call.function.arguments)
    assert isinstance(args, dict), "Arguments must be a JSON object"
    if expected_name:
        assert tool_call.function.name == expected_name
    return args


def collect_streaming_tool_calls(stream):
    tool_calls = {}
    chunks_count = 0
    finish_reason = None
    for chunk in stream:
        chunks_count += 1
        delta = chunk.choices[0].delta if chunk.choices else None
        if not delta:
            continue
        if chunk.choices[0].finish_reason:
            finish_reason = chunk.choices[0].finish_reason
        if delta.tool_calls:
            for tc in delta.tool_calls:
                idx = tc.index
                if idx not in tool_calls:
                    tool_calls[idx] = {"name": "", "arguments": ""}
                if tc.function and tc.function.name:
                    tool_calls[idx]["name"] = tc.function.name
                if tc.function and tc.function.arguments:
                    tool_calls[idx]["arguments"] += tc.function.arguments
    return tool_calls, chunks_count, finish_reason


# =============================================================================
# Non-Streaming Tests
# =============================================================================


class TestDeepSeek32NonStreaming:
    """Non-streaming tool call tests for DeepSeek V3.2 DSML parser."""

    def test_single_tool_call_required(self, client):
        """tool_choice=required forces a tool call."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls, "Expected tool calls with tool_choice=required"
        args = assert_valid_tool_call(msg.tool_calls[0], "get_weather")
        assert "location" in args
        logger.info("Tool args: %s", args)

    def test_tool_call_arguments_are_valid_json(self, client):
        """Tool call arguments must be parseable JSON objects."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Search for 'best restaurants in Tokyo'"}],
            tools=[SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        args = json.loads(msg.tool_calls[0].function.arguments)
        assert isinstance(args, dict)
        assert "query" in args

    def test_tool_call_has_id(self, client):
        """Each tool call should have a unique ID."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Check weather in Berlin"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        assert msg.tool_calls[0].id, "Tool call should have an ID"
        assert msg.tool_calls[0].type == "function"

    def test_tool_call_finish_reason(self, client):
        """finish_reason should be 'tool_calls' when tools are returned."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Weather in London?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        if response.choices[0].message.tool_calls:
            assert response.choices[0].finish_reason == "tool_calls"

    def test_model_picks_correct_tool(self, client):
        """With multiple tools, model should pick the right one."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "Translate 'hello' to French. Use the translate tool."}
            ],
            tools=ALL_TOOLS,
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        assert msg.tool_calls[0].function.name == "translate"

    def test_parallel_tool_calls(self, client):
        """Model can return multiple tool calls in a single response."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": (
                        "Do two things: 1) Get weather in Tokyo "
                        "2) Search for 'Tokyo travel guide'. Call both tools in parallel."
                    ),
                }
            ],
            tools=[WEATHER_TOOL, SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=1024,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        assert len(msg.tool_calls) >= 2
        names = {tc.function.name for tc in msg.tool_calls}
        assert "get_weather" in names
        assert "search" in names

    def test_single_tool_call_auto(self, client):
        """tool_choice=auto exercises the deepseek32 DSML parser path.

        Unlike tool_choice=required (which uses JSON schema constrained decoding
        and bypasses the parser), auto mode lets the model output freely and
        relies on the deepseek32 parser to detect and parse DSML markers.
        Requires the DSML chat template (--chat-template) to be set.
        """
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Use the get_weather tool to check the weather in Tokyo.",
                }
            ],
            tools=[WEATHER_TOOL],
            tool_choice="auto",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls, "Model should call tool when explicitly asked (auto mode)"
        args = assert_valid_tool_call(msg.tool_calls[0], "get_weather")
        assert "location" in args
        logger.info("Auto DSML tool args: %s", args)

    def test_tool_choice_none(self, client):
        """tool_choice=none should prevent tool calls."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in NYC?"}],
            tools=[WEATHER_TOOL],
            tool_choice="none",
            temperature=0,
            max_tokens=256,
        )

        msg = response.choices[0].message
        assert not msg.tool_calls
        assert msg.content

    def test_usage_stats_present(self, client):
        """Response should include usage statistics."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Check weather in NYC"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=256,
        )

        assert response.usage is not None
        assert response.usage.prompt_tokens > 0
        assert response.usage.completion_tokens > 0

    def test_unicode_in_tool_arguments(self, client):
        """Tool arguments with unicode content."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "Translate 'こんにちは' to English using the translate tool."}
            ],
            tools=[TRANSLATE_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        args = json.loads(msg.tool_calls[0].function.arguments)
        assert "text" in args


# =============================================================================
# Streaming Tests
# =============================================================================


class TestDeepSeek32Streaming:
    """Streaming tool call tests for DeepSeek V3.2 DSML parser."""

    def test_streaming_single_tool_call(self, client):
        """Streaming delivers tool call name and arguments across chunks."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
            stream=True,
        )

        tool_calls, chunks_count, finish_reason = collect_streaming_tool_calls(stream)

        assert chunks_count > 1, "Streaming should return multiple chunks"
        assert len(tool_calls) >= 1
        tc = tool_calls[0]
        assert tc["name"] == "get_weather"
        args = json.loads(tc["arguments"])
        assert "location" in args

    def test_streaming_arguments_arrive_incrementally(self, client):
        """Arguments should arrive across multiple chunks."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "Search for 'comprehensive guide to machine learning'"}
            ],
            tools=[SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
            stream=True,
        )

        arg_chunk_count = 0
        for chunk in stream:
            delta = chunk.choices[0].delta if chunk.choices else None
            if delta and delta.tool_calls:
                for tc in delta.tool_calls:
                    if tc.function and tc.function.arguments:
                        arg_chunk_count += 1

        assert arg_chunk_count > 1, f"Expected incremental args, got {arg_chunk_count} chunks"

    def test_streaming_finish_reason(self, client):
        """Streaming should end with a finish_reason."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Weather in London"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=256,
            stream=True,
        )

        _, _, finish_reason = collect_streaming_tool_calls(stream)
        assert finish_reason is not None

    def test_streaming_single_tool_call_auto(self, client):
        """Streaming with tool_choice=auto exercises the DSML incremental parser.

        This is the most important streaming test — it validates parse_incremental
        with real DSML token output. All other streaming tests use required mode
        which bypasses the parser via JSON schema constrained decoding.
        """
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Use the get_weather tool to check the weather in Tokyo.",
                }
            ],
            tools=[WEATHER_TOOL],
            tool_choice="auto",
            temperature=0,
            max_tokens=512,
            stream=True,
        )

        tool_calls, chunks_count, _ = collect_streaming_tool_calls(stream)

        assert chunks_count > 1, "Streaming should return multiple chunks"
        assert len(tool_calls) >= 1, "Model should call tool when explicitly asked (auto streaming)"
        tc = tool_calls[0]
        assert tc["name"] == "get_weather"
        args = json.loads(tc["arguments"])
        assert "location" in args
        logger.info("Streaming auto DSML tool args: %s", args)

    def test_streaming_parallel_tool_calls(self, client):
        """Streaming should handle multiple tool calls when model emits them."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": (
                        "Do two things at once: "
                        "1) Get weather in Paris "
                        "2) Search for 'Paris travel tips'. "
                        "You MUST call BOTH get_weather AND search tools in parallel."
                    ),
                }
            ],
            tools=[WEATHER_TOOL, SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=1024,
            stream=True,
        )

        tool_calls, _, _ = collect_streaming_tool_calls(stream)

        assert len(tool_calls) >= 1, "Should have at least one streaming tool call"
        for idx, tc in tool_calls.items():
            assert tc["name"], f"Tool call {idx} should have a name"
            args = json.loads(tc["arguments"])
            assert isinstance(args, dict), f"Tool call {idx} args should be valid JSON object"

        names = {tc["name"] for tc in tool_calls.values()}
        logger.info("Streaming parallel tool names: %s (count: %d)", names, len(tool_calls))

        if len(tool_calls) >= 2:
            assert "get_weather" in names
            assert "search" in names


# =============================================================================
# Multi-Turn Tests
# =============================================================================


class TestDeepSeek32MultiTurn:
    """Multi-turn conversations with tool results."""

    def test_tool_result_followup(self, client):
        """Model should use tool result to form a final text response."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        tool_call = msg.tool_calls[0]

        response2 = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "What's the weather in Tokyo?"},
                {
                    "role": "assistant",
                    "tool_calls": [
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": tool_call.function.arguments,
                            },
                        }
                    ],
                },
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(
                        {"temperature": 22, "unit": "celsius", "condition": "sunny"}
                    ),
                },
            ],
            tools=[WEATHER_TOOL],
            temperature=0,
            max_tokens=512,
        )

        msg2 = response2.choices[0].message
        assert msg2.content, "Model should reply with text after receiving tool result"

    def test_tool_result_followup_streaming(self, client):
        """Streaming follow-up with tool result should produce text content."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Paris?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=256,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        tool_call = msg.tool_calls[0]

        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "What's the weather in Paris?"},
                {
                    "role": "assistant",
                    "tool_calls": [
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": tool_call.function.arguments,
                            },
                        }
                    ],
                },
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(
                        {"temperature": 18, "unit": "celsius", "condition": "cloudy"}
                    ),
                },
            ],
            tools=[WEATHER_TOOL],
            temperature=0,
            max_tokens=256,
            stream=True,
        )

        content_parts = []
        for chunk in stream:
            delta = chunk.choices[0].delta if chunk.choices else None
            if delta and delta.content:
                content_parts.append(delta.content)

        assert "".join(content_parts), "Streaming follow-up should produce text content"

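# Illustrative sketch (not part of the gateway parser): a minimal regex-based
# extraction of DSML-style invoke blocks, to document the format these tests
# exercise end-to-end. The exact tag syntax below is an assumption for
# illustration, loosely modeled on the <|DSML|...> tags and the
# string="true"/"false" type-aware reconstruction described in this PR; the
# real Rust parser in crates/tool_parser/src/parsers/deepseek32.rs is the
# authoritative implementation.

import re

_DSML_SAMPLE = (
    '<|DSML|invoke name="get_weather">'
    '<|DSML|parameter name="location" string="true">Tokyo</|DSML|parameter>'
    '<|DSML|parameter name="unit" string="true">celsius</|DSML|parameter>'
    "</|DSML|invoke>"
)

_INVOKE_RE = re.compile(
    r'<\|DSML\|invoke name="([^"]+)">(.*?)</\|DSML\|invoke>', re.S
)
_PARAM_RE = re.compile(
    r'<\|DSML\|parameter name="([^"]+)" string="(true|false)">'
    r"(.*?)</\|DSML\|parameter>",
    re.S,
)


def _parse_dsml(text):
    """Return [(tool_name, arguments_dict), ...] from a DSML fragment.

    string="true" keeps the raw value as a Python string; string="false"
    parses the value as JSON (mirroring the type-aware argument
    reconstruction described in the PR, as understood here).
    """
    calls = []
    for name, body in _INVOKE_RE.findall(text):
        args = {}
        for pname, is_string, value in _PARAM_RE.findall(body):
            args[pname] = value if is_string == "true" else json.loads(value)
        calls.append((name, args))
    return calls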

# =============================================================================
# Run directly
# =============================================================================

if __name__ == "__main__":
    import sys

    sys.exit(
        pytest.main([__file__, "-v", "--tb=short", "-x", "--no-header", *sys.argv[1:]])
    )
