-
Notifications
You must be signed in to change notification settings - Fork 49
[WIP] Connorli/fix func call for KimiK2.5 #1070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
ConnorLi96
wants to merge
16
commits into
main
Choose a base branch
from
connorli/fix-func-call-all
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
16 commits
Select commit
Hold shift + click to select a range
ea47938
read engine metrics from PROMETHEUS_MULTIPROC_DIR in grpc mode
65b14bb
lint and format
d8616a8
fix(tool_parser): fix function call parsing for models with native to…
ConnorLi96 f58f15b
fix(gateway): comprehensive func call and response quality fixes
ConnorLi96 bfab5ad
fix(trtllm): add <|im_end|> stop string for TRT-LLM gRPC requests
ConnorLi96 46b284f
fix(tokenizer): load merged EOS token IDs from config.json + generati…
ConnorLi96 98bd903
fix(metrics): guard against empty prometheus multiproc metrics output
ConnorLi96 de6058e
fix(tokenizer): reduce tiktoken partial UTF-8 decode log from warn to…
ConnorLi96 9cdaa5d
fix(gateway): graceful degradation for unsupported multimodal requests
ConnorLi96 bc34b81
Revert "fix(gateway): graceful degradation for unsupported multimodal…
ConnorLi96 20aa536
feat(multimodal): add Kimi-K2.5 vision model spec and image processor
ConnorLi96 2bc8274
feat(protocol): add thinking param to Chat API and support bare strin…
ConnorLi96 28218b4
feat(observability): add worker selection and backend response INFO logs
ConnorLi96 4be6380
feat(metrics): merge engine metrics into :29000/metrics endpoint
ConnorLi96 58fe788
fix: resolve all clippy warnings for clean --all-targets build
ConnorLi96 7f21625
fix(reasoning): run reasoning parser before JSON/tool post-processing…
ConnorLi96 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,167 @@ | ||
| use std::collections::HashMap; | ||
|
|
||
| use serde_json::{json, Value}; | ||
|
|
||
| use crate::{ | ||
| registry::{ModelMetadata, ModelProcessorSpec, ModelRegistryError, RegistryResult}, | ||
| types::{FieldLayout, Modality, PromptReplacement, TokenId}, | ||
| vision::image_processor::PreprocessedImages, | ||
| }; | ||
|
|
||
| pub(super) struct KimiK25Spec; | ||
|
|
||
| impl KimiK25Spec { | ||
| fn media_placeholder_token_id(metadata: &ModelMetadata) -> RegistryResult<TokenId> { | ||
| metadata | ||
| .config_u32(&["media_placeholder_token_id"]) | ||
| .map(|v| v as TokenId) | ||
| .ok_or_else(|| ModelRegistryError::MissingConfigField { | ||
| field: "media_placeholder_token_id".to_string(), | ||
| }) | ||
| } | ||
| } | ||
|
|
||
| impl ModelProcessorSpec for KimiK25Spec { | ||
| fn name(&self) -> &'static str { | ||
| "kimi_k25" | ||
| } | ||
|
|
||
| fn matches(&self, metadata: &ModelMetadata) -> bool { | ||
| metadata | ||
| .config_model_type() | ||
| .is_some_and(|mt| mt == "kimi_k25") | ||
| || { | ||
| let id = metadata.model_id.to_ascii_lowercase(); | ||
| id.contains("kimi") && (id.contains("k2.5") || id.contains("k25")) | ||
| } | ||
| } | ||
|
|
||
| fn placeholder_token(&self, _metadata: &ModelMetadata) -> RegistryResult<String> { | ||
| Ok("<|media_pad|>".to_string()) | ||
| } | ||
|
|
||
| fn placeholder_token_id(&self, metadata: &ModelMetadata) -> RegistryResult<TokenId> { | ||
| Self::media_placeholder_token_id(metadata) | ||
| } | ||
|
|
||
| fn modality_limits( | ||
| &self, | ||
| _metadata: &ModelMetadata, | ||
| ) -> RegistryResult<HashMap<Modality, usize>> { | ||
| Ok(HashMap::from([(Modality::Image, 10)])) | ||
| } | ||
|
|
||
| fn processor_kwargs(&self, _metadata: &ModelMetadata) -> RegistryResult<Value> { | ||
| Ok(json!({})) | ||
| } | ||
|
|
||
| fn prompt_replacements( | ||
| &self, | ||
| metadata: &ModelMetadata, | ||
| preprocessed: &PreprocessedImages, | ||
| ) -> RegistryResult<Vec<PromptReplacement>> { | ||
| let pad_token_id = Self::media_placeholder_token_id(metadata)?; | ||
| let placeholder_token = self.placeholder_token(metadata)?; | ||
| // Keep 1 placeholder per image — TRT-LLM's KimiK25InputProcessor | ||
| // handles expansion to N vision tokens server-side based on grid_thws. | ||
| // SMG must NOT pre-expand or TRT-LLM will see N placeholders and | ||
| // attempt to expand each one again. | ||
| Ok(preprocessed | ||
| .num_img_tokens | ||
| .iter() | ||
| .map(|_| { | ||
| let tokens = vec![pad_token_id; 1]; | ||
| PromptReplacement::sequence(Modality::Image, &placeholder_token, tokens) | ||
| }) | ||
| .collect()) | ||
| } | ||
|
|
||
| fn field_layouts(&self) -> HashMap<String, FieldLayout> { | ||
| // pixel_values is patchified: [total_patches, C, patch_h, patch_w]. | ||
| // grid_thws is [num_images, 3] with (t, h, w) per image. | ||
| HashMap::from([ | ||
| ( | ||
| "pixel_values".to_string(), | ||
| FieldLayout::flat("patches_per_image"), | ||
| ), | ||
| ("grid_thws".to_string(), FieldLayout::Batched), | ||
| ("patches_per_image".to_string(), FieldLayout::Batched), | ||
| ]) | ||
| } | ||
|
|
||
| fn keep_on_cpu_keys(&self) -> Vec<String> { | ||
| vec!["grid_thws".to_string()] | ||
| } | ||
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { | ||
| use serde_json::json; | ||
|
|
||
| use crate::{ | ||
| registry::{test_helpers::*, ModelMetadata, ModelRegistry}, | ||
| types::ImageSize, | ||
| }; | ||
|
|
||
| #[test] | ||
| fn kimi_k25_matches_by_model_type() { | ||
| let tokenizer = TestTokenizer::new(&[("<|media_pad|>", 163605)]); | ||
| let config = json!({ | ||
| "model_type": "kimi_k25", | ||
| "media_placeholder_token_id": 163605, | ||
| }); | ||
| let metadata = ModelMetadata { | ||
| model_id: "some-custom-name", | ||
| tokenizer: &tokenizer, | ||
| config: &config, | ||
| }; | ||
| let registry = ModelRegistry::new(); | ||
| let spec = registry.lookup(&metadata).expect("should match kimi_k25"); | ||
| assert_eq!(spec.name(), "kimi_k25"); | ||
| } | ||
|
|
||
| #[test] | ||
| fn kimi_k25_matches_by_model_id() { | ||
| let tokenizer = TestTokenizer::new(&[("<|media_pad|>", 163605)]); | ||
| let config = json!({ | ||
| "model_type": "unknown", | ||
| "media_placeholder_token_id": 163605, | ||
| }); | ||
| let metadata = ModelMetadata { | ||
| model_id: "nvidia/Kimi-K2.5-NVFP4", | ||
| tokenizer: &tokenizer, | ||
| config: &config, | ||
| }; | ||
| let registry = ModelRegistry::new(); | ||
| let spec = registry | ||
| .lookup(&metadata) | ||
| .expect("should match kimi by name"); | ||
| assert_eq!(spec.name(), "kimi_k25"); | ||
| } | ||
|
|
||
| #[test] | ||
| fn kimi_k25_prompt_replacements() { | ||
| let tokenizer = TestTokenizer::new(&[("<|media_pad|>", 163605)]); | ||
| let config = json!({ | ||
| "model_type": "kimi_k25", | ||
| "media_placeholder_token_id": 163605, | ||
| }); | ||
| let metadata = ModelMetadata { | ||
| model_id: "nvidia/Kimi-K2.5-NVFP4", | ||
| tokenizer: &tokenizer, | ||
| config: &config, | ||
| }; | ||
| let registry = ModelRegistry::new(); | ||
| let spec = registry.lookup(&metadata).expect("kimi spec"); | ||
| // 1 placeholder per image (TRT-LLM expands server-side) | ||
| let replacements = spec | ||
| .prompt_replacements( | ||
| &metadata, | ||
| &test_preprocessed_with_tokens(&[ImageSize::new(448, 448)], &[256]), | ||
| ) | ||
| .unwrap(); | ||
| assert_eq!(replacements.len(), 1); | ||
| assert_eq!(replacements[0].tokens.len(), 1); | ||
| assert_eq!(replacements[0].tokens[0], 163605); | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| //! Kimi-K2.5 image processor. | ||
| //! | ||
| //! Kimi-K2.5 uses MoonViT3d with a NaViT-style architecture very similar | ||
| //! to Qwen VL: patch_size=14, merge_kernel_size=(2,2), dynamic resolution. | ||
| //! We reuse the QwenVLProcessorBase with Kimi-specific defaults. | ||
|
|
||
| use image::DynamicImage; | ||
|
|
||
| use super::qwen_vl_base::{QwenVLConfig, QwenVLProcessorBase}; | ||
| use crate::vision::{ | ||
| image_processor::{ImagePreProcessor, PreprocessedImages}, | ||
| preprocessor_config::PreProcessorConfig, | ||
| transforms::TransformError, | ||
| }; | ||
|
|
||
| pub struct KimiK25Processor { | ||
| inner: QwenVLProcessorBase, | ||
| } | ||
|
|
||
| impl Default for KimiK25Processor { | ||
| fn default() -> Self { | ||
| Self::new() | ||
| } | ||
| } | ||
|
|
||
| impl KimiK25Processor { | ||
| pub fn new() -> Self { | ||
| Self { | ||
| inner: QwenVLProcessorBase::new(QwenVLConfig { | ||
| patch_size: 14, | ||
| merge_size: 2, | ||
| min_pixels: 14 * 14 * 4, | ||
| max_pixels: 14 * 14 * 16384, | ||
| temporal_patch_size: 1, | ||
| mean: [0.5, 0.5, 0.5], | ||
| std: [0.5, 0.5, 0.5], | ||
| model_name: "kimi_k25", | ||
| }), | ||
| } | ||
| } | ||
| } | ||
|
|
||
| impl ImagePreProcessor for KimiK25Processor { | ||
| fn default_mean(&self) -> [f64; 3] { | ||
| [0.5, 0.5, 0.5] | ||
| } | ||
|
|
||
| fn default_std(&self) -> [f64; 3] { | ||
| [0.5, 0.5, 0.5] | ||
| } | ||
|
|
||
| fn preprocess( | ||
| &self, | ||
| images: &[DynamicImage], | ||
| config: &PreProcessorConfig, | ||
| ) -> Result<PreprocessedImages, TransformError> { | ||
| self.inner.preprocess(images, config) | ||
| } | ||
|
|
||
| fn calculate_num_tokens(&self, width: u32, height: u32, config: &PreProcessorConfig) -> usize { | ||
| self.inner.calculate_num_tokens(width, height, config) | ||
| } | ||
|
|
||
| fn model_name(&self) -> &'static str { | ||
| "kimi_k25" | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unconditionally copies merged EOS IDs into
stop_token_ids, butstop_token_idsare hard stop conditions whileignore_eosis meant to continue generation past EOS. As written, requests withignore_eos=truewill still terminate when an EOS token appears, which breaks the documentedignore_eosbehavior for TRT-LLM chat/harmony requests.Useful? React with 👍 / 👎.