feat(tool_parser): add DeepSeek V3.2 DSML tool call parser#1030
Signed-off-by: key4ng <rukeyang@gmail.com>
Walkthrough
Adds a new DeepSeek V3.2 parser (
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1cc2419261
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Code Review
This pull request introduces the DeepSeek32Parser to support the DeepSeek V3.2 DSML tool-calling format, providing both complete and incremental parsing. The parser is integrated into the ParserFactory with mappings for V3.2 and V3.2-Exp models, and integration tests are included. Feedback suggests improving the robustness of parameter parsing and using warning-level logging for invalid tool names.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Line 313: Remove the explicit drop(captures); statement in deepseek32.rs: the
local variable captures (which borrows buf_snapshot) will go out of scope
naturally, so delete the drop call and ensure there are no further references to
captures after its intended use (verify the surrounding code in the function
where captures and buf_snapshot are used).
- Around line 370-398: The argument diff logic in argument_diff can be
simplified: when is_complete is false and you have a prev_args (from
self.prev_tool_call_arr) and DSML parameters only ever accumulate, replace the
find_common_prefix-based branching inside the else-if that checks let Some(prev)
= &prev_args with a direct slice from sent_len into current_args (i.e., treat
the new content as current_args[sent_len..].to_string()); this removes the
prefix computation while preserving behavior for monotonic accumulation—keep the
existing handling for the is_complete branch and the None cases, and only change
the block that currently calls helpers::find_common_prefix and compares
prefix.len() to sent_len.
- Around line 316-324: The invalid-tool branch currently breaks leaving parser
state stale and preventing processing of remaining invokes; change it to follow
the pattern used in other parsers: when func_name is invalid and is_complete is
true, advance self.buffer via match_end (if Some(end)) and then reset parser
state (clear streamed_args_for_tool and set current_tool_name_sent = false) and
continue the loop instead of break; when func_name is invalid and is_complete is
false, reset the same state (clear streamed_args_for_tool and set
current_tool_name_sent = false) and return/exit early so partial invokes are
dropped and state is clean for the next chunk.
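The invalid-tool handling described in the last item can be sketched as follows. This is a minimal illustration only: `ParserState`, `Flow`, and `handle_invalid_invoke` are hypothetical names, the field types are assumptions, and leaving the buffer untouched for an incomplete invoke is one possible reading of "partial invokes are dropped".

```rust
/// Hypothetical stand-in for the parser's streaming state.
/// Field names follow the review comment; the types are assumptions.
#[derive(Debug, Default)]
struct ParserState {
    buffer: String,
    streamed_args_for_tool: Vec<String>,
    current_tool_name_sent: bool,
}

/// What the invoke-processing loop should do next (hypothetical).
#[derive(Debug, PartialEq)]
enum Flow {
    /// Skip the bad invoke and keep scanning the buffer.
    KeepScanning,
    /// Stop for now and wait for the next chunk.
    WaitForMoreInput,
}

/// Resets per-call state for an invoke whose function name is not a known tool.
fn handle_invalid_invoke(
    state: &mut ParserState,
    match_end: Option<usize>,
    is_complete: bool,
) -> Flow {
    // Reset in both cases so the next invoke starts from a clean state.
    state.streamed_args_for_tool.clear();
    state.current_tool_name_sent = false;

    if !is_complete {
        // Incomplete invalid invoke: exit early; the buffer is left as-is here.
        return Flow::WaitForMoreInput;
    }
    if let Some(end) = match_end {
        // Only advance on a valid boundary; drain would panic otherwise.
        if state.buffer.is_char_boundary(end) {
            state.buffer.drain(..end);
        }
    }
    Flow::KeepScanning
}
```

A caller would `continue` its loop on `Flow::KeepScanning` and return an empty result on `Flow::WaitForMoreInput`, so later valid invokes in the same buffer are still processed.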
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6a76df48-375a-4a91-898f-0a879f03acd3
📒 Files selected for processing (5)
crates/tool_parser/src/factory.rs
crates/tool_parser/src/lib.rs
crates/tool_parser/src/parsers/deepseek32.rs
crates/tool_parser/src/parsers/mod.rs
crates/tool_parser/tests/tool_parser_deepseek32.rs
…r stripping Signed-off-by: key4ng <rukeyang@gmail.com>
…ation Signed-off-by: key4ng <rukeyang@gmail.com>
…reaking Signed-off-by: key4ng <rukeyang@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2c7c1c2f6c
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Around line 336-381: The bug is that prev_tool_call_arr stores "arguments" as
an object but the code expects a string, so prev_args becomes None and the first
partial chunk is dropped; update the logic in the block computing argument_diff
(around parse_parameters_from_dsml, streamed_args_for_tool, prev_tool_call_arr
and helpers::find_common_prefix) to treat a missing/non-string previous
arguments as an empty string (or initialize "arguments" as an empty string when
setting prev_tool_call_arr) and then compute the diff from sent_len (i.e., if
prev_args is None treat prev = "" and emit current_args[sent_len..] when
!is_complete or when appropriate), ensuring the first partial arguments are
returned instead of None.
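The diff rule described above can be sketched in isolation. This is a rough illustration, not the parser's code: `argument_delta` and `common_prefix_len` are hypothetical helpers (the latter standing in for `helpers::find_common_prefix`), and the previous arguments are passed as `Option<&str>` rather than read from `prev_tool_call_arr`.

```rust
/// Byte length of the longest common prefix of `a` and `b`,
/// always ending on a character boundary.
fn common_prefix_len(a: &str, b: &str) -> usize {
    a.char_indices()
        .zip(b.chars())
        .find(|((_, ca), cb)| ca != cb)
        .map(|((i, _), _)| i)
        .unwrap_or_else(|| a.len().min(b.len()))
}

/// Computes the argument bytes to emit for this chunk.
/// A missing previous value is treated as the empty string, so the
/// first partial chunk is emitted instead of being dropped.
fn argument_delta(
    prev_args: Option<&str>,
    current_args: &str,
    sent_len: usize,
    is_complete: bool,
) -> Option<String> {
    if is_complete {
        // Complete invoke: flush whatever has not been sent yet.
        return Some(current_args.get(sent_len..).unwrap_or("").to_string());
    }
    let prev = prev_args.unwrap_or("");
    if prev.is_empty() {
        // First partial chunk: emit everything past `sent_len`, if anything.
        return current_args
            .get(sent_len..)
            .filter(|rest| !rest.is_empty())
            .map(str::to_string);
    }
    if prev == current_args {
        return None;
    }
    // Only emit the part that is stable between the two snapshots.
    let stable = common_prefix_len(prev, current_args);
    if stable > sent_len {
        current_args.get(sent_len..stable).map(str::to_string)
    } else {
        None
    }
}
```

With `prev_args = None` and `sent_len = 0`, the whole partial string is emitted once; later chunks fall through to the common-prefix branch.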
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 574ade0e-dcfd-4409-a2d4-abea50e43965
📒 Files selected for processing (1)
crates/tool_parser/src/parsers/deepseek32.rs
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cdff14236c
Signed-off-by: key4ng <rukeyang@gmail.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
crates/tool_parser/src/parsers/deepseek32.rs (1)
346-349: ⚠️ Potential issue | 🟠 Major
Emit the first partial argument delta.
Line 348 seeds `arguments` with a non-string value, but Lines 363-367 only recover previous arguments via `.as_str()`. On the first incomplete invoke, `prev_args` is therefore missing, so Lines 389-390 return `None` and the stream emits the tool name without any initial parameter bytes.
🛠️ Possible fix
    - let prev_args = if tool_id < self.prev_tool_call_arr.len() {
    -     self.prev_tool_call_arr[tool_id]
    -         .get("arguments")
    -         .and_then(|v| v.as_str())
    -         .map(|s| s.to_string())
    - } else {
    -     None
    - };
    + let prev_args = if tool_id < self.prev_tool_call_arr.len() {
    +     self.prev_tool_call_arr[tool_id]
    +         .get("arguments")
    +         .and_then(|v| v.as_str())
    +         .unwrap_or_default()
    +         .to_string()
    + } else {
    +     String::new()
    + };
      let argument_diff = if is_complete {
          if sent_len < current_args.len() {
              Some(current_args[sent_len..].to_string())
          } else {
              Some(String::new())
          }
    - } else if let Some(prev) = &prev_args {
    -     if current_args == *prev {
    + } else if prev_args.is_empty() {
    +     if sent_len < current_args.len() {
    +         Some(current_args[sent_len..].to_string())
    +     } else {
    +         None
    +     }
    + } else if current_args == prev_args {
              None
          } else {
    -         let prefix = helpers::find_common_prefix(prev, &current_args);
    +         let prefix = helpers::find_common_prefix(&prev_args, &current_args);
              if prefix.len() > sent_len {
                  Some(prefix[sent_len..].to_string())
              } else {
                  None
              }
          }
    - } else {
    -     None
      };
Also applies to: 363-390
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/tool_parser/src/parsers/deepseek32.rs` around lines 346 - 349, The code seeds prev_tool_call_arr[tool_id] with "arguments": {} (in deepseek32 parser) but later logic expects a string via .as_str(), so on the first partial invoke prev_args is missing and no initial parameter bytes are emitted; fix by initializing the seeded value to an empty string (e.g., set "arguments" to "" instead of {}) or alternatively update the prev_args recovery (where .as_str() is used) to handle non-string JSON by calling .as_str().unwrap_or_else(|| json_value.to_string().as_str()) or using to_string()/unwrap_or_default() so the first partial argument delta is emitted; modify the code around prev_tool_call_arr, tool_id and the prev_args extraction to ensure arguments are a string when consumed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Around line 244-270: The partial-marker detection in the has_partial_prefix
check is too narrow (only '<', '<|', '</', '</|') causing longer truncated
tokens (e.g. '<|DSML' or '<|DSML|fun') to be treated as normal_text; update the
computation of has_partial_prefix in the parsing function (the variables
current_text, has_dsml, has_partial_prefix and the early-return branch that
yields StreamingParseResult::default()) so it detects any trailing incomplete
tag: find the last '<' (or "</") in current_text and treat it as a partial
prefix if there is no matching closing '>' after it (or use a regex like
r"</?[^>]*$" to detect an unterminated tag); this will ensure such longer
partial DSML fragments are buffered instead of flushed as normal_text.
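The "last `<` without a closing `>`" check suggested above could look roughly like this. It is a sketch under assumptions: `partial_tag_start` and `split_flushable` are hypothetical helper names, and the real parser would combine this with its existing `has_dsml` and marker checks.

```rust
/// Returns the byte index where a trailing, unterminated tag starts,
/// or `None` if the text does not end inside a tag.
fn partial_tag_start(text: &str) -> Option<usize> {
    let open = text.rfind('<')?;
    if text[open..].contains('>') {
        None // the last tag is closed, nothing to hold back
    } else {
        Some(open)
    }
}

/// Splits a buffer into (text safe to flush, fragment to keep buffered).
fn split_flushable(text: &str) -> (&str, &str) {
    match partial_tag_start(text) {
        Some(idx) => text.split_at(idx),
        None => (text, ""),
    }
}
```

One trade-off of this approach: ordinary text that happens to end with `<` (for example `a <`) is also held back until the next chunk arrives, so the caller needs to flush any leftover fragment when the stream ends.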
---
Duplicate comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Around line 346-349: The code seeds prev_tool_call_arr[tool_id] with
"arguments": {} (in deepseek32 parser) but later logic expects a string via
.as_str(), so on the first partial invoke prev_args is missing and no initial
parameter bytes are emitted; fix by initializing the seeded value to an empty
string (e.g., set "arguments" to "" instead of {}) or alternatively update the
prev_args recovery (where .as_str() is used) to handle non-string JSON by
calling .as_str().unwrap_or_else(|| json_value.to_string().as_str()) or using
to_string()/unwrap_or_default() so the first partial argument delta is emitted;
modify the code around prev_tool_call_arr, tool_id and the prev_args extraction
to ensure arguments are a string when consumed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 09577cc6-67fa-45f1-936b-c32a8acefbd2
📒 Files selected for processing (1)
crates/tool_parser/src/parsers/deepseek32.rs
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7fa9333830
Actionable comments posted: 1
♻️ Duplicate comments (2)
crates/tool_parser/src/parsers/deepseek32.rs (2)
348-351: ⚠️ Potential issue | 🟠 Major
Still dropping the first partial argument delta.
`prev_tool_call_arr[tool_id]["arguments"]` is initialized as `{}`, but Lines 365-369 only read strings. On the first incomplete invoke, `prev_args` becomes `None`, so the `!is_complete` path returns `None` and no parameters are streamed until a later chunk. Initialize `"arguments"` as `""` or treat a non-string previous value as empty.
🛠️ Minimal fix
      self.prev_tool_call_arr[tool_id] = serde_json::json!({
          "name": func_name,
    -     "arguments": {},
    +     "arguments": "",
      });
      ...
    - } else {
    -     None
    + } else if sent_len < current_args.len() {
    +     Some(current_args[sent_len..].to_string())
    + } else {
    +     None
      };
Also applies to: 365-392
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/tool_parser/src/parsers/deepseek32.rs` around lines 348 - 351, The current initialization of self.prev_tool_call_arr[tool_id]["arguments"] as an object causes the first partial argument delta to be dropped because prev_args (read elsewhere in deepseek32.rs around the prev_args / is_complete logic) expects a string; change the initialization in the prev_tool_call_arr entry for func_name to set "arguments" to an empty string "" (or alternatively update the code that reads prev_args to treat non-string values as empty string) so that the first incomplete chunk is appended/streamed correctly; update any logic that merges incoming argument chunks (the code paths around prev_args and is_complete) to handle and coerce non-string previous values to "" before concatenation.
246-272: ⚠️ Potential issue | 🟠 Major
Longer truncated DSML prefixes are still flushed as normal text.
Lines 249-252 only preserve `<`, `<|`, `</`, and `</|`. A chunk ending with `<|DSML` or `<|DSML|inv` still reaches Line 254 and gets emitted as `normal_text`, so the next chunk can never reconstruct the tag. Buffer any trailing unterminated `<...` fragment instead of hard-coding four suffixes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/tool_parser/src/parsers/deepseek32.rs` around lines 246 - 272, The code currently only treats four very short suffixes as partial DSML prefixes (has_partial_prefix) so longer truncated tags like "<|DSML" get flushed as normal_text; change has_partial_prefix to detect any trailing unterminated '<' fragment by finding the last '<' in current_text and checking if there is no corresponding '>' after it (i.e., an open tag that runs to the end of the chunk), and if so treat that suffix as a partial prefix; when producing normal_text (in the branch that strips end tokens and returns StreamingParseResult), remove that trailing unterminated fragment from normal_text and put it back into self.buffer so the fragment is preserved for the next chunk; update uses of has_partial_prefix, current_text, buffer, and StreamingParseResult accordingly instead of hard-coding the four suffixes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Around line 51-67: strip_dsml_trailing currently trims by character set and
can remove legitimate argument characters; change it to only remove an actual
trailing substring that is a prefix of the full DSML closing tag. For each
fragment group (DSML_PARAM_END_FRAGMENTS and DSML_INVOKE_END_FRAGMENTS) build
the full closing string by concatenating the fragments (e.g.
"</|DSML|parameter>"), then for the input string find the longest k>0 such that
result.ends_with(&full[..k]) and remove exactly that suffix (no per-character
trimming). Update strip_dsml_trailing to perform this suffix-prefix check and
removal using the concatenated full tag rather than fragment.contains(c); keep
references to DSML_PARAM_END_FRAGMENTS and DSML_INVOKE_END_FRAGMENTS and the
function name strip_dsml_trailing for locating the change.
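The suffix-prefix removal described above can be sketched like this. It is an illustration under assumptions: the two tag strings are copied from the review text as they appear here (the real constants are fragment lists and may use different delimiter characters), `strip_partial_suffix` is a hypothetical helper, and stripping the invoke tag before the parameter tag is a choice made for this sketch.

```rust
// Full closing tags as written in the review text (assumed spelling).
const PARAM_END: &str = "</|DSML|parameter>";
const INVOKE_END: &str = "</|DSML|invoke>";

/// Removes the longest suffix of `value` that is a non-empty prefix of
/// `full_tag`. Nothing else is trimmed, so values like "bar" are untouched.
fn strip_partial_suffix<'a>(value: &'a str, full_tag: &str) -> &'a str {
    // Try the longest candidate first so a complete tag wins over "</".
    for k in (1..=full_tag.len()).rev() {
        // Skip split points inside a multi-byte character.
        if !full_tag.is_char_boundary(k) {
            continue;
        }
        if let Some(stripped) = value.strip_suffix(&full_tag[..k]) {
            return stripped;
        }
    }
    value
}

/// Strips a trailing (possibly partial) invoke close, then a parameter close.
fn strip_dsml_trailing(value: &str) -> &str {
    let value = strip_partial_suffix(value, INVOKE_END);
    strip_partial_suffix(value, PARAM_END)
}
```

Because every candidate suffix starts with `<`, a value that legitimately ends in `<` or `</` would still be shortened; that ambiguity is inherent to stripping partial tags and cannot be resolved until the next chunk arrives.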
---
Duplicate comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Around line 348-351: The current initialization of
self.prev_tool_call_arr[tool_id]["arguments"] as an object causes the first
partial argument delta to be dropped because prev_args (read elsewhere in
deepseek32.rs around the prev_args / is_complete logic) expects a string; change
the initialization in the prev_tool_call_arr entry for func_name to set
"arguments" to an empty string "" (or alternatively update the code that reads
prev_args to treat non-string values as empty string) so that the first
incomplete chunk is appended/streamed correctly; update any logic that merges
incoming argument chunks (the code paths around prev_args and is_complete) to
handle and coerce non-string previous values to "" before concatenation.
- Around line 246-272: The code currently only treats four very short suffixes
as partial DSML prefixes (has_partial_prefix) so longer truncated tags like
"<|DSML" get flushed as normal_text; change has_partial_prefix to detect any
trailing unterminated '<' fragment by finding the last '<' in current_text and
checking if there is no corresponding '>' after it (i.e., an open tag that runs
to the end of the chunk), and if so treat that suffix as a partial prefix; when
producing normal_text (in the branch that strips end tokens and returns
StreamingParseResult), remove that trailing unterminated fragment from
normal_text and put it back into self.buffer so the fragment is preserved for
the next chunk; update uses of has_partial_prefix, current_text, buffer, and
StreamingParseResult accordingly instead of hard-coding the four suffixes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 08b90a6f-e740-4a73-b664-9be71109e9f4
📒 Files selected for processing (1)
crates/tool_parser/src/parsers/deepseek32.rs
…m DSML block Signed-off-by: key4ng <rukeyang@gmail.com>
…id invoke abort Signed-off-by: key4ng <rukeyang@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf8518dd3b
SGLang Alignment Review (Corrected)
This parser follows SGLang's Regex Patterns
DSML Fragment Stripping
Parameter Parsing
Streaming Logic
State Management
Improvements Over SGLang
Acceptable Differences
Actionable comments posted: 1
♻️ Duplicate comments (2)
crates/tool_parser/src/parsers/deepseek32.rs (2)
57-67: ⚠️ Potential issue | 🟠 Major
`strip_dsml_trailing` still removes real argument bytes.
Line 64 trims by character set, not by DSML suffix. A partial value like `bar` becomes `b`, and direct-JSON fragments can also lose legitimate trailing bytes before the diffing code sees them. Strip only the longest trailing substring that is actually a prefix of the closing tag.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/tool_parser/src/parsers/deepseek32.rs` around lines 57 - 67, strip_dsml_trailing currently uses trim_end_matches with a character predicate and thus deletes any trailing characters that appear anywhere in a fragment; instead detect and remove full DSML closing-tag substrings only: in strip_dsml_trailing, stop using trim_end_matches(|c| ...) and replace with logic that finds the longest fragment from fragments that is a suffix of the current result (use ends_with(fragment)) and then chop off that exact fragment (once) from the end; iterate in reverse or repeatedly as before but always remove whole-fragment suffixes only so legitimate trailing bytes are not lost.
246-252: ⚠️ Potential issue | 🟠 Major
Buffer any unterminated DSML tag, not just four 1–2 byte suffixes.
Lines 249-252 only preserve `<`, `<|`, `</`, and `</|`. A chunk ending with `<|DSML`, `<|DSML|function_cal`, or `<|DSML|inv` falls through Line 254 as `normal_text`, so the next chunk can no longer reconstruct the marker. Detect any trailing `<...` without a closing `>` instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/tool_parser/src/parsers/deepseek32.rs` around lines 246 - 252, The current partial-prefix check only matches four specific short suffixes and misses longer unterminated DSML fragments; update the logic that computes has_partial_prefix (used alongside has_dsml and current_text in this parser) to detect any trailing opening tag without a closing '>' instead of only exact suffixes — e.g., consider the last index of '<' versus the last index of '>' in current_text and treat as a partial if there's an unmatched '<' (and ensure this works with the existing self.has_tool_markers check so longer fragments like "<|DSML" or "<|DSML|function_cal" are buffered rather than emitted as normal_text).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Around line 378-398: The code currently emits the placeholder "{}" as an
argument diff when prev_args is None (first partial chunk) because
parse_parameters_from_dsml(..., true) returns "{}" for an empty/incomplete
payload; update the first-partial fallback in the argument_diff computation (the
branch that now does `else if sent_len < current_args.len() { /* First partial
chunk */ Some(current_args[sent_len..].to_string()) }`) to only emit when
current_args contains actual payload bytes (e.g., require current_args != "{}"
or current_args.len() > 2) so the placeholder isn't streamed; keep the existing
complete-path (is_complete) behavior so a truly complete empty-object call can
still be sent.
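A small sketch of the guard described above, assuming the partial parse yields the literal string `{}` when no parameter bytes have arrived yet. `first_partial_delta` and `complete_delta` are hypothetical helpers, not functions from the parser.

```rust
/// Placeholder assumed to be produced for an empty or incomplete payload.
const EMPTY_ARGS_PLACEHOLDER: &str = "{}";

/// Delta for the first partial chunk: never stream the bare placeholder.
fn first_partial_delta(current_args: &str, sent_len: usize) -> Option<String> {
    if current_args == EMPTY_ARGS_PLACEHOLDER {
        return None; // no real payload bytes yet
    }
    current_args
        .get(sent_len..)
        .filter(|rest| !rest.is_empty())
        .map(str::to_string)
}

/// Delta for a complete invoke: an empty-object call is still sent.
fn complete_delta(current_args: &str, sent_len: usize) -> String {
    current_args.get(sent_len..).unwrap_or("").to_string()
}
```

Keeping the two paths separate means a tool that genuinely takes no arguments still receives `{}` once its invoke block is complete, while a half-received call never emits a placeholder that later bytes would contradict.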
---
Duplicate comments:
In `@crates/tool_parser/src/parsers/deepseek32.rs`:
- Around line 57-67: strip_dsml_trailing currently uses trim_end_matches with a
character predicate and thus deletes any trailing characters that appear
anywhere in a fragment; instead detect and remove full DSML closing-tag
substrings only: in strip_dsml_trailing, stop using trim_end_matches(|c| ...)
and replace with logic that finds the longest fragment from fragments that is a
suffix of the current result (use ends_with(fragment)) and then chop off that
exact fragment (once) from the end; iterate in reverse or repeatedly as before
but always remove whole-fragment suffixes only so legitimate trailing bytes are
not lost.
- Around line 246-252: The current partial-prefix check only matches four
specific short suffixes and misses longer unterminated DSML fragments; update
the logic that computes has_partial_prefix (used alongside has_dsml and
current_text in this parser) to detect any trailing opening tag without a
closing '>' instead of only exact suffixes — e.g., consider the last index of
'<' versus the last index of '>' in current_text and treat as a partial if
there's an unmatched '<' (and ensure this works with the existing
self.has_tool_markers check so longer fragments like "<|DSML" or
"<|DSML|function_cal" are buffered rather than emitted as normal_text).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: bef2d1e5-bd1e-4e34-8949-4c1dbfac94ab
📒 Files selected for processing (1)
crates/tool_parser/src/parsers/deepseek32.rs
Parser Walkthrough
What the model outputs
DeepSeek V3.2 uses an XML-like "DSML" format for tool calls:
The parser turns this into:
Two modes
DSML Fragment Stripping
During streaming, a chunk may end mid-closing-tag:
The captured value from chunk 1 would be
Argument Reconstruction (parse_parameters_from_dsml)
Converts DSML parameter tags into a JSON arguments string. Two paths:
Direct JSON path: If invoke content starts with
XML parameter path:
Streaming Engine (parse_incremental)
Phase 1: Buffer or flush? Each chunk is appended to the buffer, then:
Phase 2: Invoke processing loop. Processes invoke blocks one at a time from the buffer:
For each invoke match:
End-to-end streaming example
Model calls
Client receives tool call
…o prevent delta corruption Signed-off-by: key4ng <rukeyang@gmail.com>
…ing for DSML fragments Signed-off-by: key4ng <rukeyang@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 09abcf7018
E2E Validation: DeepSeek V3.2-Exp
Tested against a live DeepSeek V3.2-Exp (FP8) deployment on 8x H200.
Setup
sglang backend (gRPC mode):
    /home/ubuntu/sglang_venv/bin/python -m sglang.launch_server \
      --model-path /raid/models/deepseek-ai/DeepSeek-V3.2-Exp \
      --served-model-name deepseek-ai/DeepSeek-V3.2-Exp \
      --tp 8 --trust-remote-code --port 30000 --grpc-mode
smg router (V3.1 vLLM template, no --tool-call-parser needed):
    ./target/debug/smg \
      --worker-urls grpc://localhost:30000 \
      --model-path deepseek-ai/DeepSeek-V3.2-Exp \
      --tokenizer-path /raid/models/deepseek-ai/DeepSeek-V3.2-Exp \
      --chat-template e2e_test/fixtures/chat_templates/tool_chat_template_deepseekv31.jinja \
      --port 8080
Run tests:
    SMG_BASE_URL=http://localhost:8080 SMG_MODEL=deepseek-ai/DeepSeek-V3.2-Exp \
      python -m pytest e2e_test/chat_completions/test_deepseek32_tool_calling.py \
      -v --tb=short --no-header --rootdir=/tmp --noconftest
Test result
Full test file: e2e_test/chat_completions/test_deepseek32_tool_calling.py
"""DeepSeek V3.2 Tool Calling E2E Tests.

Tests for the DeepSeek V3.2 DSML-format tool parser via the SMG gateway.
Tests both non-streaming and streaming modes against a live sglang backend.

IMPORTANT: DeepSeek V3.2 has no built-in Jinja chat template in tokenizer_config.json.
The DSML template must be provided via --chat-template. Without it:
  - tool_choice=required works (uses JSON schema constrained decoding, bypasses tool parser)
  - tool_choice=auto fails (model output not parsed by deepseek32 DSML parser)

Usage:
    SMG_BASE_URL=http://localhost:8080 pytest \
        e2e_test/chat_completions/test_deepseek32_tool_calling.py -v \
        --rootdir=/tmp --noconftest

Setup:
    # sglang
    python -m sglang.launch_server \
        --model-path /raid/models/deepseek-ai/DeepSeek-V3.2 \
        --served-model-name deepseek-ai/DeepSeek-V3.2 \
        --tp 8 --trust-remote-code --port 30000 --grpc-mode

    # smg (DSML template required for tool_choice=auto tests)
    ./target/debug/smg \
        --worker-urls grpc://localhost:30000 \
        --model-path deepseek-ai/DeepSeek-V3.2 \
        --tokenizer-path /raid/models/deepseek-ai/DeepSeek-V3.2 \
        --tool-call-parser deepseek32 \
        --chat-template e2e_test/fixtures/chat_templates/tool_chat_template_deepseekv32.jinja \
        --port 8080
"""

from __future__ import annotations

import json
import logging
import os

import openai
import pytest

logger = logging.getLogger(__name__)

BASE_URL = os.environ.get("SMG_BASE_URL", "http://localhost:8080")
MODEL = os.environ.get("SMG_MODEL", "deepseek-ai/DeepSeek-V3.2")

# =============================================================================
# Client fixture
# =============================================================================


@pytest.fixture(scope="module")
def client():
    return openai.OpenAI(base_url=f"{BASE_URL}/v1", api_key="dummy")


# =============================================================================
# Tool definitions
# =============================================================================

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco'",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit",
                },
            },
            "required": ["location"],
        },
    },
}

SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "Search for information on the web.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query string",
                },
            },
            "required": ["query"],
        },
    },
}

TRANSLATE_TOOL = {
    "type": "function",
    "function": {
        "name": "translate",
        "description": "Translate text from one language to another.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "Text to translate"},
                "target_language": {"type": "string", "description": "Target language code"},
            },
            "required": ["text", "target_language"],
        },
    },
}

ALL_TOOLS = [WEATHER_TOOL, SEARCH_TOOL, TRANSLATE_TOOL]

# =============================================================================
# Helpers
# =============================================================================


def assert_valid_tool_call(tool_call, expected_name=None):
    assert tool_call.function.name, "Tool call must have a function name"
    assert tool_call.function.arguments, "Tool call must have arguments"
    args = json.loads(tool_call.function.arguments)
    assert isinstance(args, dict), "Arguments must be a JSON object"
    if expected_name:
        assert tool_call.function.name == expected_name
    return args


def collect_streaming_tool_calls(stream):
    tool_calls = {}
    chunks_count = 0
    finish_reason = None
    for chunk in stream:
        chunks_count += 1
        delta = chunk.choices[0].delta if chunk.choices else None
        if not delta:
            continue
        if chunk.choices[0].finish_reason:
            finish_reason = chunk.choices[0].finish_reason
        if delta.tool_calls:
            for tc in delta.tool_calls:
                idx = tc.index
                if idx not in tool_calls:
                    tool_calls[idx] = {"name": "", "arguments": ""}
                if tc.function and tc.function.name:
                    tool_calls[idx]["name"] = tc.function.name
                if tc.function and tc.function.arguments:
                    tool_calls[idx]["arguments"] += tc.function.arguments
    return tool_calls, chunks_count, finish_reason
# =============================================================================
# Non-Streaming Tests
# =============================================================================
class TestDeepSeek32NonStreaming:
    """Non-streaming tool call tests for DeepSeek V3.2 DSML parser."""

    def test_single_tool_call_required(self, client):
        """tool_choice=required forces a tool call."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls, "Expected tool calls with tool_choice=required"
        args = assert_valid_tool_call(msg.tool_calls[0], "get_weather")
        assert "location" in args
        logger.info("Tool args: %s", args)

    def test_tool_call_arguments_are_valid_json(self, client):
        """Tool call arguments must be parseable JSON objects."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Search for 'best restaurants in Tokyo'"}],
            tools=[SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        args = json.loads(msg.tool_calls[0].function.arguments)
        assert isinstance(args, dict)
        assert "query" in args

    def test_tool_call_has_id(self, client):
        """Each tool call should have a unique ID."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Check weather in Berlin"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        assert msg.tool_calls[0].id, "Tool call should have an ID"
        assert msg.tool_calls[0].type == "function"

    def test_tool_call_finish_reason(self, client):
        """finish_reason should be 'tool_calls' when tools are returned."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Weather in London?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        if response.choices[0].message.tool_calls:
            assert response.choices[0].finish_reason == "tool_calls"

    def test_model_picks_correct_tool(self, client):
        """With multiple tools, model should pick the right one."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "Translate 'hello' to French. Use the translate tool."}
            ],
            tools=ALL_TOOLS,
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        assert msg.tool_calls[0].function.name == "translate"

    def test_parallel_tool_calls(self, client):
        """Model can return multiple tool calls in a single response."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": (
                        "Do two things: 1) Get weather in Tokyo "
                        "2) Search for 'Tokyo travel guide'. Call both tools in parallel."
                    ),
                }
            ],
            tools=[WEATHER_TOOL, SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=1024,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        assert len(msg.tool_calls) >= 2
        names = {tc.function.name for tc in msg.tool_calls}
        assert "get_weather" in names
        assert "search" in names

    def test_single_tool_call_auto(self, client):
        """tool_choice=auto exercises the deepseek32 DSML parser path.

        Unlike tool_choice=required (which uses JSON schema constrained decoding
        and bypasses the parser), auto mode lets the model output freely and
        relies on the deepseek32 parser to detect and parse DSML markers.
        Requires the DSML chat template (--chat-template) to be set.
        """
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Use the get_weather tool to check the weather in Tokyo.",
                }
            ],
            tools=[WEATHER_TOOL],
            tool_choice="auto",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls, "Model should call tool when explicitly asked (auto mode)"
        args = assert_valid_tool_call(msg.tool_calls[0], "get_weather")
        assert "location" in args
        logger.info("Auto DSML tool args: %s", args)

    def test_tool_choice_none(self, client):
        """tool_choice=none should prevent tool calls."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in NYC?"}],
            tools=[WEATHER_TOOL],
            tool_choice="none",
            temperature=0,
            max_tokens=256,
        )
        msg = response.choices[0].message
        assert not msg.tool_calls
        assert msg.content

    def test_usage_stats_present(self, client):
        """Response should include usage statistics."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Check weather in NYC"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=256,
        )
        assert response.usage is not None
        assert response.usage.prompt_tokens > 0
        assert response.usage.completion_tokens > 0

    def test_unicode_in_tool_arguments(self, client):
        """Tool arguments with unicode content."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "Translate 'こんにちは' to English using the translate tool."}
            ],
            tools=[TRANSLATE_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        args = json.loads(msg.tool_calls[0].function.arguments)
        assert "text" in args
# =============================================================================
# Streaming Tests
# =============================================================================
class TestDeepSeek32Streaming:
    """Streaming tool call tests for DeepSeek V3.2 DSML parser."""

    def test_streaming_single_tool_call(self, client):
        """Streaming delivers tool call name and arguments across chunks."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
            stream=True,
        )
        tool_calls, chunks_count, finish_reason = collect_streaming_tool_calls(stream)
        assert chunks_count > 1, "Streaming should return multiple chunks"
        assert len(tool_calls) >= 1
        tc = tool_calls[0]
        assert tc["name"] == "get_weather"
        args = json.loads(tc["arguments"])
        assert "location" in args

    def test_streaming_arguments_arrive_incrementally(self, client):
        """Arguments should arrive across multiple chunks."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "Search for 'comprehensive guide to machine learning'"}
            ],
            tools=[SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
            stream=True,
        )
        arg_chunk_count = 0
        for chunk in stream:
            delta = chunk.choices[0].delta if chunk.choices else None
            if delta and delta.tool_calls:
                for tc in delta.tool_calls:
                    if tc.function and tc.function.arguments:
                        arg_chunk_count += 1
        assert arg_chunk_count > 1, f"Expected incremental args, got {arg_chunk_count} chunks"

    def test_streaming_finish_reason(self, client):
        """Streaming should end with a finish_reason."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Weather in London"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=256,
            stream=True,
        )
        _, _, finish_reason = collect_streaming_tool_calls(stream)
        assert finish_reason is not None

    def test_streaming_single_tool_call_auto(self, client):
        """Streaming with tool_choice=auto exercises the DSML incremental parser.

        This is the most important streaming test: it validates parse_incremental
        with real DSML token output. All other streaming tests use required mode
        which bypasses the parser via JSON schema constrained decoding.
        """
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Use the get_weather tool to check the weather in Tokyo.",
                }
            ],
            tools=[WEATHER_TOOL],
            tool_choice="auto",
            temperature=0,
            max_tokens=512,
            stream=True,
        )
        tool_calls, chunks_count, _ = collect_streaming_tool_calls(stream)
        assert chunks_count > 1, "Streaming should return multiple chunks"
        assert len(tool_calls) >= 1, "Model should call tool when explicitly asked (auto streaming)"
        tc = tool_calls[0]
        assert tc["name"] == "get_weather"
        args = json.loads(tc["arguments"])
        assert "location" in args
        logger.info("Streaming auto DSML tool args: %s", args)

    def test_streaming_parallel_tool_calls(self, client):
        """Streaming should handle multiple tool calls when model emits them."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": (
                        "Do two things at once: "
                        "1) Get weather in Paris "
                        "2) Search for 'Paris travel tips'. "
                        "You MUST call BOTH get_weather AND search tools in parallel."
                    ),
                }
            ],
            tools=[WEATHER_TOOL, SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=1024,
            stream=True,
        )
        tool_calls, _, _ = collect_streaming_tool_calls(stream)
        assert len(tool_calls) >= 1, "Should have at least one streaming tool call"
        for idx, tc in tool_calls.items():
            assert tc["name"], f"Tool call {idx} should have a name"
            args = json.loads(tc["arguments"])
            assert isinstance(args, dict), f"Tool call {idx} args should be valid JSON object"
        names = {tc["name"] for tc in tool_calls.values()}
        logger.info("Streaming parallel tool names: %s (count: %d)", names, len(tool_calls))
        if len(tool_calls) >= 2:
            assert "get_weather" in names
            assert "search" in names
# =============================================================================
# Multi-Turn Tests
# =============================================================================
class TestDeepSeek32MultiTurn:
    """Multi-turn conversations with tool results."""

    def test_tool_result_followup(self, client):
        """Model should use tool result to form a final text response."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        tool_call = msg.tool_calls[0]
        response2 = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "What's the weather in Tokyo?"},
                {
                    "role": "assistant",
                    "tool_calls": [
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": tool_call.function.arguments,
                            },
                        }
                    ],
                },
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(
                        {"temperature": 22, "unit": "celsius", "condition": "sunny"}
                    ),
                },
            ],
            tools=[WEATHER_TOOL],
            temperature=0,
            max_tokens=512,
        )
        msg2 = response2.choices[0].message
        assert msg2.content, "Model should reply with text after receiving tool result"

    def test_tool_result_followup_streaming(self, client):
        """Streaming follow-up with tool result should produce text content."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Paris?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=256,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        tool_call = msg.tool_calls[0]
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "What's the weather in Paris?"},
                {
                    "role": "assistant",
                    "tool_calls": [
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": tool_call.function.arguments,
                            },
                        }
                    ],
                },
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(
                        {"temperature": 18, "unit": "celsius", "condition": "cloudy"}
                    ),
                },
            ],
            tools=[WEATHER_TOOL],
            temperature=0,
            max_tokens=256,
            stream=True,
        )
        content_parts = []
        for chunk in stream:
            delta = chunk.choices[0].delta if chunk.choices else None
            if delta and delta.content:
                content_parts.append(delta.content)
        assert "".join(content_parts), "Streaming follow-up should produce text content"
# =============================================================================
# Run directly
# =============================================================================
if __name__ == "__main__":
    import sys

    sys.exit(
        pytest.main([__file__, "-v", "--tb=short", "-x", "--no-header", *sys.argv[1:]])
    )
Description
Problem
DeepSeek V3.2 introduces a new XML-like DSML format for tool calls, replacing the special-token approach used in V3/V3.1. The gateway has no parser for this format, so V3.2 models cannot use tool calling through the gRPC streaming path.
Solution
Add a new `DeepSeek32Parser` that handles the DSML format with incremental streaming support, following the SGLang `DeepSeekV32Detector` pattern.
Changes
- `crates/tool_parser/src/parsers/deepseek32.rs`: handles the DSML format with regex-based parsing
- Handles `<|DSML|parameter>` tags and a direct JSON fallback inside invoke blocks
- `string="true"` → string, `string="false"` → parsed JSON
- `deepseek32` parser with model mappings
  - `deepseek-v3.2*` / `deepseek-ai/DeepSeek-V3.2*` → `deepseek32` (DSML format)
  - `deepseek-v3.2-exp*` / `deepseek-ai/DeepSeek-V3.2-Exp*` → `deepseek31` (V3.2-Exp uses V3.1 format)
Test Plan
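To make the `string` attribute rule from the Changes list concrete, here is a minimal Python sketch (not the Rust implementation in this PR). The tag spelling, the `name` attribute, and the helper names `coerce_param` and `parse_params` are assumptions for illustration, modeled on the `<|DSML|parameter>` tag mentioned above; the fallback to raw text on invalid JSON is likewise an assumption.

```python
import json
import re

# Hypothetical tag layout modeled on the `<|DSML|parameter>` tag named in the
# description; the tokens the real parser matches may differ.
PARAM_RE = re.compile(
    r'<\|DSML\|parameter name="(?P<name>[^"]+)" string="(?P<string>true|false)">'
    r"(?P<value>.*?)</\|DSML\|parameter>",
    re.DOTALL,
)


def coerce_param(raw: str, is_string: bool):
    """string="true" keeps the raw text; string="false" parses it as JSON."""
    if is_string:
        return raw
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Assumed fallback: keep the raw text when it is not valid JSON.
        return raw


def parse_params(block: str) -> dict:
    """Collect every parameter found in an invoke block into an arguments dict."""
    return {
        m.group("name"): coerce_param(m.group("value"), m.group("string") == "true")
        for m in PARAM_RE.finditer(block)
    }


example = (
    '<|DSML|parameter name="location" string="true">Tokyo</|DSML|parameter>'
    '<|DSML|parameter name="days" string="false">3</|DSML|parameter>'
)
print(parse_params(example))
```

Under these assumptions, `location` stays the string `"Tokyo"` while `days` becomes the integer `3`, which is the shape the integration tests expect after `json.loads` on the tool call arguments.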
Checklist
- `cargo +nightly fmt` passes
- `cargo clippy --all-targets --all-features -- -D warnings` passes
Summary by CodeRabbit
New Features
Tests