diff --git a/.gitignore b/.gitignore index 5191c3f..844a1b2 100644 --- a/.gitignore +++ b/.gitignore @@ -3,7 +3,7 @@ files/ outputs/ test_venv/ backup*/ -demo/ +resources/ # Python cache __pycache__/ diff --git a/AGENTS.md b/AGENTS.md index ba24d00..3068ddb 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,84 +1,11 @@ -# CLAUDE.md +# Agent Integration -This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. - -## Project Overview - -CNLLM is a unified adapter layer that translates Chinese LLM vendor APIs (MiniMax, DeepSeek, KIMI, Doubao, GLM, Xiaomi) into OpenAI-compatible request/response formats, enabling seamless integration with LangChain, LlamaIndex, and other OpenAI-compatible frameworks. - -## Running Tests +CNLLM now provides a dedicated Agent Skill following the Claude Skills / Agent Skills standard. +📦 **Install the skill**: ```bash -# Unit tests only (no API keys needed) -pytest tests/test_*.py - -# All tests including integration tests (require API keys in env vars) -pytest tests/ - -# Single test file -pytest tests/test_adapter_config.py -v -``` - -API-key-dependent tests live in `tests/key_needed/` and are gated by the presence of environment variables (e.g., `MINIMAX_API_KEY`, `XIAOMI_API_KEY`). - -## Architecture - +npx skills add kanchengw/cnllm-skill ``` -CNLLM (client) → ChatNamespace / EmbeddingsNamespace → create/batch - → BaseAdapter._build_payload() [YAML-driven field mapping] - → BaseHttpClient (httpx) [HTTP execution] - → Responder.to_openai_format() [vendor response → OpenAI format] - → Accumulator [field accumulation, stream handling] -``` - -### Three-component vendor pattern - -Each vendor in `cnllm/core/vendor/{vendor}.py` implements three classes: - -1. **`{Vendor}Adapter(BaseAdapter)`** — builds request payload (`_build_payload`), performs format conversion (`_to_openai_format`, `_do_to_openai_stream_format`), and registers with `_register()` -2. 
**`{Vendor}Responder(Responder)`** — maps vendor response fields to OpenAI standard fields via `configs/{vendor}/response_{vendor}.yaml`; usually just sets `CONFIG_DIR` -3. **`{Vendor}VendorError(VendorError)`** — parses vendor-specific error responses via `from_response()` and registers with `VendorErrorRegistry.register()` - -The vendor module is also where you place any subclass overrides for logic that can't be expressed in YAML (e.g., MiniMax's stream chunk dedup, Xiaomi's `thinking` transform). - -### YAML-driven request/response mapping - -**Request config** (`configs/{vendor}/request_{vendor}.yaml`) drives: -- `required_fields` — mandatory parameters and validation -- `optional_fields` — optional parameters, including field name `map` (rename), `transform` (value conversion), `skip` (exclude from body, e.g. for headers) -- `model_mapping` — short model alias → vendor model name -- `error_check` — vendor error code → CNLLM exception type mapping - -**Response config** (`configs/{vendor}/response_{vendor}.yaml`) drives: -- `fields` — vendor response path → OpenAI field path (e.g. `"content": "choices[0].message.content"`) -- `stream_fields` — same for streaming chunks (`content_path`, `tool_calls_path`, `reasoning_content_path`) -- `defaults` — fallback values when vendor omits fields -- `error_check` — sensitive content detection paths - -The parameter processing order is: `resolve_default` (read scope defaults) → `validate_for_scope` (PARAM_REGISTRY + YAML + drop_params) → `_validate_one_of` → `_check_image_support` → `_build_payload` (YAML field mapping + get_vendor_model) → `get_base_url` + `get_api_path` → `get_header_mappings`. - -### Field accumulation - -Streaming responses accumulate into `adapter._cnllm_extra`: -- `_thinking` — raw reasoning/thinking content -- `_still` — cleaned final response content -- `_tools` — accumulated tool_calls - -Accessible via `client.chat.think`, `client.chat.still`, `client.chat.tools`, `client.chat.raw`. 
- -### FallbackManager - -`FallbackManager` is only invoked when `chat.create()` is called **without** a `model` argument (or with `model=""`). If the primary model fails, it iterates through `fallback_models` in order. If a model is passed directly to `chat.create()`, no fallback occurs. - -### Sync/async relationship - -`CNLLM` (sync) holds an internal `AsyncCNLLM` engine and delegates async operations to it. The `LangChainRunnable` integration uses the async engine directly. - -## Adding a New Vendor - -1. Create `configs/{vendor}/request_{vendor}.yaml` and `configs/{vendor}/response_{vendor}.yaml` -2. Create `cnllm/core/vendor/{vendor}.py` with the three-component pattern; call `{Vendor}Adapter._register()` at the bottom -3. Add model alias → vendor name mappings to the YAML `model_mapping.chat` section -4. Write tests: unit tests in `tests/test_*.py` (no API key), integration tests in `tests/key_needed/` (with key assignment at the top: `MODEL = "..."; API_KEY = os.getenv("...")`) -Full walkthrough: see `docs/CONTRIBUTOR.md`. 
\ No newline at end of file +📖 For full documentation and examples, visit the dedicated skill repository: +https://github.com/kanchengw/cnllm-skill \ No newline at end of file diff --git a/README.md b/README.md index 225b4a4..870d707 100644 --- a/README.md +++ b/README.md @@ -124,20 +124,23 @@ CNLLM 为中文大模型提供了一个**统一的 OpenAI 兼容接口层**与 ### 1.1 安装 -#### 1.1.1 SDK 安装 +#### 1.1.1 作为 Agent Skill 安装 (推荐) + +CNLLM 遵循 Claude Skills 规范提供标准 Agent Skill。 + +**安装 Skill**: ```bash -pip install cnllm +npx skills add kanchengw/cnllm-skill ``` -#### 1.1.2 作为 Agent Skill 安装 +📖 完整文档和示例,请访问 CNLLM Skill 仓库: +https://github.com/kanchengw/cnllm-skill -**一键安装**: +#### 1.1.2 SDK 安装 ```bash -npx skills add https://github.com/kanchengw/cnllm +pip install cnllm ``` -或手动将项目根目录的 `SKILL.md` 文件复制到 Agent 的技能目录下,在**调用中文大模型时, 会优先使用 CNLLM**。 - ### 1.2 客户端初始化 #### 1.2.1 同步客户端 @@ -266,7 +269,7 @@ print(resp.raw) # 完整累积后的模型原生响应 | **raw**: 模型原生响应 | `resp.raw` | `List[Dict]` | `[模型原生流式 chunks 列表]` | **repr():** -流式调用中,展示**chunks 合并和字段累积的实时结果**,而非流式 chunks 列表;不改变流式响应对象类型,即包含所有标准流式 chunks 的**迭代器**。 +以类似非流式响应的**字典结构**展示流式响应的**字段名聚合和字段值累积的实时结果**;不改变流式响应对象类型,即包含所有标准流式 chunks 的**迭代器**。 ```python for chunk in resp: print(resp) @@ -275,7 +278,7 @@ for chunk in resp: ### 2.2 chat completions 批量调用 -可通过`prompt`和`messages`参数输入并快速配置全局参数,也可以通过`requests`参数为单个请求进行独立配置。 +可通过 `prompt` 和 `messages` 参数输入并快速配置全局参数,也可以通过 `requests` 参数为单个请求进行独立配置。 **prompt 参数:** @@ -325,15 +328,12 @@ BatchResponse 外层结构,其中 `results[request_id]` 字段下的每条响 ```python { "status": {"elapsed": "3.42s", "success_count": 2, "fail_count": 1, "total": 3}, # 统计信息 - "usage": {"prompt_tokens": 5, "total_tokens": 5}, # 批处理的总用量信息 - "errors": {"request_2": "error message"}, # 所有失败请求的 request_id 和错误信息映射 - "results": { # 所有成功请求的 request_id 和标准响应映射 - "request_0": {...}, - "request_1": {...} - }, + "usage": {"prompt_tokens": 5, "total_tokens": 5}, # 批处理的总用量信息 + "errors": {"request_2": "error message"}, # 所有失败请求的 request_id 和错误信息映射 + "results": {"request_0": {...}, 
"request_1": {...}}, # 所有成功请求的 request_id 和标准响应映射 "think": {"request_0": "...", "request_1": "..."}, "still": {"request_0": "...", "request_1": "..."}, - "tools": {"request_0": [...], "request_1": [...]}, + "tools": {"request_0": {...}, "request_1": {...}}, "raw": {"request_0": {...}, "request_1": {...}} } ``` @@ -429,22 +429,12 @@ BatchEmbeddingResponse 外层结构,其中 `results[request_id]` 字段下每 ```python { - "status": { - "elapsed": "3.35s", "success_count": 1, "fail_count": 1, "total": 2 - }, - "batch_info": { - "batch_size": 2, "batch_count": 2, "dimension": 1024 - }, + "status": {"elapsed": "3.35s", "success_count": 1, "fail_count": 1, "total": 2}, + "batch_info": {"batch_size": 2, "batch_count": 2, "dimension": 1024}, "usage": {"prompt_tokens": 5, "total_tokens": 5}, - "errors": {"request_1": "error message"}, - "results": { - "request_0": { - "object": "list", - "data": [{"object": "embedding","embedding": [0.1, 0.2, ...], "index": 0}], - "model": "embedding-2" - } - } - "vectors": {"request_0": [...]} + "results": {"request_0": {...}, "request_1": {...}} + "errors": {"request_2": "error message"}, + "vectors": {"request_0": [...]} # 所有成功请求的 request_id 和嵌入向量映射 } ``` @@ -622,11 +612,8 @@ client = CNLLM(..., keep=["vectors"]) | 静默忽略模式 | `drop_params="ignore"` | 静默丢弃未知参数,不产生任何日志 | **说明:** --进行批量调用时,若全局参数中包含未知参数,`drop_params="strict"` 直接抛出异常,不实际启动批量任务; -若批量任务中的单个请求包含未知参数,`drop_params="strict"` 直接将该请求归入 `errors` 字段,不实际执行该请求,并继续执行后续的批量任务。 - -- 特别地,当配置`drop_params="strict"` 且 `stop_on_error=True` 时,批量请求中遭遇第一个错误时会立即中断批量任务,同时返回已处理的请求结果,详见 [遇错停止](#253-遇错停止)。 -- `drop_params` 参数支持客户端配置以及所有调用方式(包括 `create` 单条调用方式)。 +- 进行批量调用时,若全局参数中包含未知参数,`drop_params="strict"` 直接抛出异常,不实际启动批量任务; +- 若批量任务中的单个请求包含未知参数,`drop_params="strict"` 直接将该请求归入 `errors` 字段,不实际执行该请求,并继续执行后续的批量任务。 ## 3. 
CNLLM 标准响应格式 @@ -717,7 +704,7 @@ CNLLM 请求参数与**OpenAI 标准参数**基本一致,覆盖范围基于国 | 参数 | 类型 | 默认值 | 说明 | | ------------------- | ------------------------------- | ------------------------------- | ------------------------------------------------------ | -| `model` | `str` | - | 模型名称,客户端初始化必填,调用入口可覆盖 | +| `model` | `str` | - | 模型名称,模型名见[支持的模型](#支持的模型) | | `api_key` | `str` | - | API 密钥 | | `base_url` | `str` | 自动适配 | 可自定义 API 地址 | | `messages` | `list[dict]`/`list[list[dict]]` | - | `chat()` 输入参数,支持上下文管理/图片识别(仅支持调用入口配置) | @@ -783,7 +770,7 @@ CNLLM 内部定义的参数,控制内部执行的行为或策略,不向 API | `max_retries` | `int` | `3` | 最大重试次数 | | `retry_delay` | `float` | `1.0` | 重试延迟(秒) | | `fallback_models`¹ | `dict` | - | 备用模型(仅支持客户端初始化配置),见下方说明 | -| `drop_params` | `str` | `"warn"` | 见 [未知参数处理策略](#255) | +| `drop_params` | `str` | `"warn"` | 见 [未知参数处理策略](#255-未知参数处理策略) | ¹`fallback_models` 模型降级策略: @@ -819,7 +806,7 @@ fallback_models = { | `stop_on_error` | `bool` | `False` | 遇错时停止后续请求,返回已处理结果 | | `callbacks` | `list` | - | 进度回调函数列表 | | `custom_ids` | `list[str]` | - | 自定义请求 ID 列表 | -| `keep` | `set/list` | 见 [字段存储控制](#254) | 迭代后保留的数据字段 | +| `keep` | `set/list` | 见 [字段存储控制](#254-字段存储控制) | 迭代后保留的数据字段 | ## 5. 框架集成 diff --git a/README_en.md b/README_en.md index 83a66ba..d2df6a5 100644 --- a/README_en.md +++ b/README_en.md @@ -124,20 +124,23 @@ Project Documentation: ### 1.1 Installation -#### 1.1.1 SDK Installation +#### 1.1.1 Install as Agent Skill (Recommended) + +CNLLM now provides a dedicated Agent Skill following the Claude Skills / Agent Skills standard. 
+ +**Install the skill**: ```bash -pip install cnllm +npx skills add kanchengw/cnllm-skill ``` -#### 1.1.2 Install as Agent Skill +📖 For full documentation and examples, visit the dedicated skill repository: +https://github.com/kanchengw/cnllm-skill -**One-Click Install**: +#### 1.1.2 SDK Installation ```bash -npx skills add https://github.com/kanchengw/cnllm +pip install cnllm ``` -Or manually copy the `SKILL.md` file from the project root to your agent's skill directory. When **calling Chinese LLMs, CNLLM will be used as the preferred option**. - ### 1.2 Client Initialization #### 1.2.1 Sync Client @@ -266,7 +269,7 @@ In streaming calls, access via `for` loop with **real-time accumulation** for re | **raw**: model native response | `resp.raw` | `Dict` | `{"id": "...", "choices": [...], ...}` | **repr():** -During streaming, displays **real-time merged chunks and accumulated field results**, not the real-time streaming chunks list; does not change the streaming response object type, which is an **iterator** containing all standard streaming chunks. +Displays the **real-time aggregation of field keys and accumulation of field values** in a **dictionary format** similar to a non-streaming response; this does not change the streaming response object type, which remains an **iterator** containing all standard streaming chunks. 
```python for chunk in resp: print(resp) @@ -326,15 +329,12 @@ BatchResponse outer structure, where each response under `results[request_id]` i ```python { "status": {"elapsed": "3.42s", "success_count": 2, "fail_count": 1, "total": 3}, # Statistics - "usage": {"prompt_tokens": 5, "total_tokens": 5}, # Batch processing total usage info - "errors": {"request_2": "error message"}, # Mapping of all failed requests' request_id and error messages - "results": { # Mapping of all successful requests' request_id and standard responses - "request_0": {...}, - "request_1": {...} - }, + "usage": {"prompt_tokens": 5, "total_tokens": 5}, # Batch processing total usage info + "errors": {"request_2": "error message"}, # Mapping of all failed requests' request_id and error messages + "results": {"request_0": {...}, "request_1": {...}}, # Mapping of all successful requests' request_id and standard responses "think": {"request_0": "...", "request_1": "..."}, "still": {"request_0": "...", "request_1": "..."}, - "tools": {"request_0": [...], "request_1": [...]}, + "tools": {"request_0": {...}, "request_1": {...}}, "raw": {"request_0": {...}, "request_1": {...}} } ``` @@ -426,6 +426,21 @@ resp.to_dict() # Default: keeps vectors field + metadata (status/u resp.to_dict(results=True) # Keeps results field + metadata (status/usage/batch_info) ``` +#### 2.3.3 Embeddings Batch Response Structure + +BatchEmbeddingResponse outer structure, where each response under `results[request_id]` is in **OpenAI standard Embeddings response structure**: + +```python +{ + "status": {"elapsed": "3.35s", "success_count": 1, "fail_count": 1, "total": 2}, + "batch_info": {"batch_size": 2, "batch_count": 2, "dimension": 1024}, + "usage": {"prompt_tokens": 5, "total_tokens": 5}, + "results": {"request_0": {...}, "request_1": {...}}, + "errors": {"request_2": "error message"}, + "vectors": {"request_0": [...]} # Mapping of all successful requests' request_id and embedding vectors +} +``` + ### 2.4 Batch Call 
Control Parameters Batch calls support **retry strategy, concurrency control** parameter configuration: @@ -544,10 +559,7 @@ Use `drop_params` to control the handling behavior of **incompatible parameters **Notes:** - When doing batch calls, if global parameters contain unknown parameters, `drop_params="strict"` directly throws an exception without actually starting the batch task; -If a single request within the batch task contains unknown parameters, `drop_params="strict"` directly puts that request into the `errors` field without actually executing that request, and continues executing subsequent batch tasks. - -- Specifically, when configured with `drop_params="strict"` and `stop_on_error=True`, the first error encountered in batch requests immediately interrupts the batch task while returning already processed request results. See [Stop on Error](#253-stop-on-error). -- The `drop_params` parameter supports client configuration and all calling methods (including `create` single-call method). +- If a single request within the batch task contains unknown parameters, `drop_params="strict"` records that request in the `errors` field without executing it, and the batch continues with the remaining requests. ## 3. CNLLM Standard Response Format @@ -638,7 +650,7 @@ Note: Not all supported models support all request parameters. 
Please refer to v | Parameter | Type | Default | Description | | ------------------- | ------------------------------- | ------------------------------- | ------------------------------------------------------ | -| `model` | `str` | - | Model name, required at client initialization, can be overridden at call entry | +| `model` | `str` | - | Model name, see [Supported Models](#supported-models)| | `api_key` | `str` | - | API key | | `base_url` | `str` | Auto-adapted | Customizable API address | | `messages` | `list[dict]`/`list[list[dict]]` | - | `chat()` input parameter, supports context management/image recognition (call entry configuration only) | diff --git a/SKILL.md b/SKILL.md index a6f6835..3068ddb 100644 --- a/SKILL.md +++ b/SKILL.md @@ -1,371 +1,11 @@ ---- -name: cnllm-chinese-llm-adapter -version: 1.0.4 -description: >- - 为中文大模型定制的通用增强 SDK / unified adapter for Chinese LLMs: DeepSeek, - GLM/Zhipu (智谱), KIMI/Moonshot (月之暗面), MiniMax (稀宇), - Doubao/ByteDance (豆包/字节), Xiaomi mimo (小米). 
- 统一接口替代多厂商自研 SDK 或 OpenAI SDL/LiteLLM,并将模型响应封装为 OpenAI 标准格式响应。 - 深度适配 LangChain,LlamaIndex,Autoogen,Haystack,Deepeval 等 LLM 应用框架。 - CNLLM 支持模型的原生参数和功能,解决传参可能静默失效或缺少参数支持,提供高透明度和响应可控性。 - 支持同步/异步、流式/非流式、批量/非批量及混合流式策略的批量 chat 和 embeddings 调用。 - 提供纯净回复、推理内容、工具调用快捷访问入口,无需额外解析。 - 工程化能力:参数验证、未知参数处理策略控制、备用模型自动降级等,提高模型调用的稳定性和可靠性。 - 批量高级功能:单个请求独立配置、实时进度、可配置的失败策略和内存控制、回调、自定义索引。 ---- +# Agent Integration -# CNLLM: Chinese LLM Unified Adapter - -## When to Use - -- Calling **DeepSeek**, **GLM (Zhipu)**, **KIMI (Moonshot)**, **MiniMax**, **Doubao (ByteDance)**, or **Xiaomi** models -- **Multi-model workflows** where different stages use different models (e.g., embedding with MiniMax → reasoning with DeepSeek-reasoner → generation with GLM in a single LangChain pipeline) -- **Multi-model evaluation / LLM-as-Judge** — same input sent to multiple models in one batch call to compare or score outputs -- **Transparency & control** — CNLLM's YAML-driven config explicitly declares every supported param, its mapping, and its behavior. Unlike OpenAI-compatible interfaces where `extra_body` params may silently fail with no feedback, CNLLM gives you deterministic, visible parameter handling -- **Streaming lifecycle inspection** with real-time access to automated content accumulation (`.think`, `.still`, `.tools` properties). -- **Multi-model fallback** for production resilience — auto-retry with different providers on failure -- **Batch data processing** — high-throughput labeling, classification, translation, or synthetic data generation. 
CNLLM supports **real-time progress monitoring** (request_counts updated per-request), **per-request independent configuration** (different model/params/thinking for each item), **real-time progress statitic**, **configurable failure policy** and **memory mannagement**, **callbacks**, **custom IDs** — features the OpenAI batch API does not offer -- Project uses **LangChain**, **LlamaIndex**, or **LiteLLM** and needs Chinese LLM integration -- Codebase already has `cnllm` in dependencies (check `requirements.txt`, `pyproject.toml`, or existing imports) - -## Anti-Patterns (Do NOT) - -- **Do NOT** use vendor-native SDKs (`zhipuai`, `dashscope`, `volcengine`) — CNLLM handles format conversion -- **Do NOT** implement custom retry/fallback — use CNLLM's built-in `fallback_models` -- **Do NOT** write vendor-specific error parsing — use CNLLM's OpenAI-style exceptions - -## Installation +CNLLM now provides a dedicated Agent Skill following the Claude Skills / Agent Skills standard. +📦 **Install the skill**: ```bash -pip install cnllm -``` - -## Import - -```python -from cnllm import CNLLM -``` - -## Quick Reference - -### 1. Basic Chat - -```python -from cnllm import CNLLM - -client = CNLLM(model="deepseek-chat", api_key="your_key") -# Standard OpenAI-style messages -resp = client.chat.create( - messages=[{"role": "user", "content": "Hello"}] -) -print(resp.choices[0].message.content) - -# Prompt shortcut (single user message) -resp = client.chat.create(prompt="Hello", stream=True) -``` - -### 2. 
Streaming with Thinking Content - -```python -client = CNLLM(model="deepseek-reasoner", api_key="your_key") -resp = client.chat.create( - messages=[{"role": "user", "content": "Think step by step"}], - stream=True -) -for chunk in resp: - # Real-time accumulated access during streaming: - pass - -# After iteration, access accumulated results: -resp.think # str — reasoning/thinking content -resp.still # str — final response -resp.tools # dict — accumulated tool_calls -resp.raw # dict — raw vendor response -``` - -### 4. Multi-Model Fallback - -```python -client = CNLLM( - model="deepseek-chat", - api_key="primary_key", - fallback_models={ - "glm-4.7-flash": {"api_key": "glm_key"}, # with different key - "deepseek-reasoner": {"api_key": "ds_key", "base_url": "https://api.deepseek.com/v1"}, # with key + endpoint - } -) -# No model argument → triggers FallbackManager -resp = client.chat.create(prompt="Hello") -# Auto-falls back if primary fails; raises FallbackError if all fail -``` - -### 5. Batch Processing - -```python -# Simple batch — same params for all -resp = client.chat.batch( - prompt=["Hello", "How are you?", "What is AI?"], - stream=True -) -print(resp.still["request_0"]) # per-request response text -print(resp.status) # real-time success/fail/total/elapsed - -# Per-request config -resp = client.chat.batch( - requests=[ - {"prompt": "Hi", "model": "deepseek-chat", "thinking": True}, - {"prompt": "1+1=", "model": "glm-4.7-flash"}, - ], - max_concurrent=3 -) - -# Advanced: callbacks + custom_ids + stop-on-error -def on_complete(request_id, status): - print(f"[{request_id}] {status}") - -resp = client.chat.batch( - prompt=["Task A", "Task B", "Task C"], - custom_ids=["job_001", "job_002", "job_003"], - callbacks=[on_complete], - stop_on_error=True, - max_concurrent=5, - timeout=60 -) -``` - -### 6. 
Embeddings - -```python -# Single -resp = client.embeddings.create(input="Hello world") - -# Batch -resp = client.embeddings.batch( - input=["Hello", "world", "你好"], - custom_ids=["doc_1", "doc_2", "doc_3"] -) -``` - -### 7. LangChain Runnable - -```python -from cnllm.core.framework import LangChainRunnable -from langchain_core.prompts import ChatPromptTemplate - -client = CNLLM(model="deepseek-chat", api_key="your_key") -runnable = LangChainRunnable(client) - -chain = ChatPromptTemplate.from_messages([ - ("system", "You are helpful"), - ("human", "{input}") -]) | runnable - -resp = chain.invoke({"input": "Hello"}) # sync -for chunk in chain.stream({"input": "Count to 5"}): # streaming - print(chunk.content, end="") -import asyncio -asyncio.run(chain.ainvoke({"input": "Hi"})) # async -``` - -### 7.1. LlamaIndex - -```python -from cnllm import CNLLM -from llama_index.core.llms import ChatMessage, MessageRole - -client = CNLLM(model="deepseek-chat", api_key="your_key") -resp = client.chat.create(prompt="Introduce yourself") - -msg = ChatMessage(role=MessageRole.ASSISTANT, content=resp.still) -print(msg.content) -``` - -### 7.2. AutoGen - -```python -from cnllm import CNLLM -from autogen_agentchat.messages import TextMessage - -client = CNLLM(model="deepseek-chat", api_key="your_key") -resp = client.chat.create(prompt="1+1=?") - -msg = TextMessage(content=resp.still, source="assistant") -print(msg.content) -``` - -### 7.3. Haystack - -```python -from cnllm import CNLLM -from haystack import Document -from haystack.dataclasses import ChatMessage - -client = CNLLM(model="deepseek-chat", api_key="your_key") - -text = "CNLLM is a Chinese LLM adapter" -resp = client.embeddings.create(input=text) -doc = Document(content=text, embedding=resp["data"][0]["embedding"]) - -resp = client.chat.create(prompt="1+1=?") -msg = ChatMessage.from_assistant(resp.still) -print(msg.text) -``` - -### 7.4. 
DeepEval - -```python -from cnllm import CNLLM -from deepeval.test_case import LLMTestCase - -client = CNLLM(model="deepseek-chat", api_key="your_key") -resp = client.chat.create(messages=[{"role": "user", "content": "1+1=?"}]) - -test_case = LLMTestCase( - input="1+1=?", actual_output=resp.still, expected_output="2", -) -print(test_case.actual_output) -``` - -### 8. Asynchronous Client - -```python -from cnllm import asyncCNLLM -import asyncio - -async def main(): - client = asyncCNLLM(model="deepseek-chat", api_key="your_key") - # Use await for all asyncCNLLM methods - resp = await client.chat.create(prompt="Hello", stream=True) - async for chunk in resp: - print(chunk) - -asyncio.run(main()) -``` - -### 9. Context Management - -```python -# Persistent — close manually -client = CNLLM(model="deepseek-chat", api_key="key") -resp = client.chat.create(prompt="Hello") -client.close() - -# Temporary — auto-closes -with CNLLM(model="deepseek-chat", api_key="key") as client: - resp = client.chat.create(prompt="Hello") -``` - -## Response Reference - -### Chat Completion Response - -```python -resp = client.chat.create(messages=[...]) - -# Direct CNLLM accessors (preferred): -resp.still # str — response content -resp.think # str — reasoning/thinking content (if any) -resp.tools # dict — tool_calls (if any) -resp.raw # dict — full raw vendor response in OpenAI-compatible format -``` - -### Streaming Access - -```python -resp = client.chat.create(messages=[...], stream=True) -for chunk in resp: - # Real-time accumulated access during streaming: - pass - -# After/during iteration, same accessors on the response: -resp.still # str — accumulated response content -resp.think # str — accumulated reasoning content -resp.tools # dict — accumulated tool_calls -resp.raw # dict — accumulated raw vendor response -``` - -### Batch Response (Chat & Embeddings) - -```python -resp = client.chat.batch(prompt=[...]) -# Also accessible via client.chat.batch_result.* in either sync/async - 
-# Top-level fields: -resp.status # dict — {success_count, fail_count, total, elapsed} -resp.errors # dict — {request_id: error_msg} -resp.usage # dict — {prompt_tokens, completion_tokens, total_tokens} - -# Per-request access (chat): -resp.results["request_0"] # OpenAI-format response per request -resp.think["request_0"] # reasoning content (chat only) -resp.still["request_0"] # response text (chat only) -resp.tools["request_0"] # tool_calls (chat only) -resp.raw["request_0"] # raw vendor response - -# Embeddings-only extra: -resp.batch_info["dimension"] # int — embedding dimension -``` - -### Error Handling - -```python -from cnllm import CNLLMError, AuthenticationError, RateLimitError, \ - TimeoutError, NetworkError, ServerError, InvalidRequestError, \ - ContentFilteredError, ModelNotSupportedError, FallbackError -try: - resp = client.chat.create(prompt="Hi") -except RateLimitError: - # handle rate limit -except ContentFilteredError: - # sensitive content detected -except FallbackError: - # all fallback models failed -``` - -## Supported Vendors - -| Vendor | Chat Models | Embeddings Models | -|-------------|-----------------------------------------------------------------------------|------------------------------------| -| **DeepSeek** | deepseek-chat, deepseek-reasoner, deepseek-v4-pro, deepseek-v4-flash | — | -| **KIMI** | kimi-k2.6, kimi-k2.5, moonshot-v1-8k/32k/128k, moonshot-v1-vision-preview | — | -| **GLM** | glm-4.6, glm-4.7, glm-4.7-flash, glm-4.7-flashx, glm-5, glm-5.1, glm-4.5 series, glm-5v-turbo, glm-4.5v, glm-4.6v, glm-4.6v-flash | embedding-2, embedding-3, embedding-3-pro | -| **MiniMax** | MiniMax-M2, MiniMax-M2.1, MiniMax-M2.5, MiniMax-M2.5-highspeed, MiniMax-M2.7, MiniMax-M2.7-highspeed | embo-01 | -| **Doubao** | doubao-seed-2-0-pro-260215 (doubao-seed-2-0-pro), doubao-seed-2-0-mini-260215 (doubao-seed-2-0-mini), doubao-seed-2-0-lite-260215 (doubao-seed-2-0-lite), doubao-seed-2-0-code-preview-260215 (doubao-seed-2-0-code), 
doubao-seed-1-8-251228 (doubao-seed-1-8), doubao-seed-1-6-251015 (doubao-seed-1-6), doubao-seed-1-6-flash-250828 (doubao-seed-1-6-flash), doubao-seed-1-6-vision-250815 (doubao-seed-1-6-vision), doubao-1-5-vision-pro-32k-250115 (doubao-1-5-vision-pro), doubao-seed-1-5-lite-32k-250115 (doubao-seed-1-5-lite), doubao-seed-1-5-pro-32k-250115 (doubao-seed-1-5-pro-32k), doubao-seed-1-5-pro-256k-250115 (doubao-seed-1-5-pro) | — | -| **Xiaomi** | mimo-v2-pro, mimo-v2-omni, mimo-v2-flash, mimo-v2.5-pro, mimo-v2.5 | — | - -## Key Parameters - -All **OpenAI-standard parameters** are supported: `temperature`, `max_tokens`, `max_completion_tokens`, `top_p`, `tools`, `tool_choice`, `thinking`, `response_format`, `stop`, `presence_penalty`, `frequency_penalty`, `user`, `timeout`, `max_retries`. - -Two notable CNLLM extensions: -- **`thinking`**: `True`/`False`/`"auto"` — controls reasoning/thinking. Maps to each vendor's native thinking param -- **`fallback_models`**: dict of `{model_name: {"api_key": ..., "base_url": ...}}` — each fallback model has its own api_key (required) and optional base_url. 
Only active when `chat.create()` is called without a `model` argument - -**Batch-specific parameters** (set at the batch level, not per-request): -- **`max_concurrent`**: `int` — max concurrent requests (default: 3 for chat, 12 for embeddings) -- **`rps`**: `float` — requests per second rate limit -- **`timeout`**: `int` — per-request timeout in seconds (default: 30) -- **`max_retries`**: `int` — max retry attempts on failure (default: 3) -- **`retry_delay`**: `float` — delay between retries in seconds (default: 1.0) -- **`custom_ids`**: `list[str]` — meaningful request IDs for each input -- **`callbacks`**: `list[callable]` — invoked on each request completion for real-time tracking -- **`stop_on_error`**: `bool` — if True, halts all remaining requests on first failure - -Parameters passable at init (shared across calls) or overridden per-call: - -```python -client = CNLLM(model="...", api_key="...", temperature=0.7) -resp = client.chat.create(prompt="Hi", temperature=0.3) # overrides -``` - -## Error Handling (OpenAI-style) - -```python -from cnllm import ( - CNLLMError, AuthenticationError, RateLimitError, TimeoutError, - NetworkError, ServerError, InvalidRequestError, ContentFilteredError, - ModelNotSupportedError, FallbackError, TokenLimitError -) +npx skills add kanchengw/cnllm-skill ``` -# \ No newline at end of file +📖 For full documentation and examples, visit the dedicated skill repository: +https://github.com/kanchengw/cnllm-skill \ No newline at end of file