kanchengw · kanchengw · May 26, 2026 · May 26, 2026
diff --git a/.gitignore b/.gitignore
@@ -3,7 +3,7 @@ files/
 outputs/
 test_venv/
 backup*/
-demo/
+resources/
 
 # Python cache
 __pycache__/

diff --git a/AGENTS.md b/AGENTS.md
@@ -1,84 +1,11 @@
-# CLAUDE.md
+# Agent Integration
 
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
-## Project Overview
-
-CNLLM is a unified adapter layer that translates Chinese LLM vendor APIs (MiniMax, DeepSeek, KIMI, Doubao, GLM, Xiaomi) into OpenAI-compatible request/response formats, enabling seamless integration with LangChain, LlamaIndex, and other OpenAI-compatible frameworks.
-
-## Running Tests
+CNLLM now provides a dedicated Agent Skill following the Claude Skills / Agent Skills standard.
 
+📦 **Install the skill**:
 ```bash
-# Unit tests only (no API keys needed)
-pytest tests/test_*.py
-
-# All tests including integration tests (require API keys in env vars)
-pytest tests/
-
-# Single test file
-pytest tests/test_adapter_config.py -v
-```
-
-API-key-dependent tests live in `tests/key_needed/` and are gated by the presence of environment variables (e.g., `MINIMAX_API_KEY`, `XIAOMI_API_KEY`).
-
-## Architecture
-
+npx skills add kanchengw/cnllm-skill
 ```
-CNLLM (client) → ChatNamespace / EmbeddingsNamespace → create/batch
-  → BaseAdapter._build_payload()  [YAML-driven field mapping]
-  → BaseHttpClient (httpx)        [HTTP execution]
-  → Responder.to_openai_format()  [vendor response → OpenAI format]
-  → Accumulator                   [field accumulation, stream handling]
-```
-
-### Three-component vendor pattern
-
-Each vendor in `cnllm/core/vendor/{vendor}.py` implements three classes:
-
-1. **`{Vendor}Adapter(BaseAdapter)`** — builds request payload (`_build_payload`), performs format conversion (`_to_openai_format`, `_do_to_openai_stream_format`), and registers with `_register()`
-2. **`{Vendor}Responder(Responder)`** — maps vendor response fields to OpenAI standard fields via `configs/{vendor}/response_{vendor}.yaml`; usually just sets `CONFIG_DIR`
-3. **`{Vendor}VendorError(VendorError)`** — parses vendor-specific error responses via `from_response()` and registers with `VendorErrorRegistry.register()`
-
-The vendor module is also where you place any subclass overrides for logic that can't be expressed in YAML (e.g., MiniMax's stream chunk dedup, Xiaomi's `thinking` transform).
-
-### YAML-driven request/response mapping
-
-**Request config** (`configs/{vendor}/request_{vendor}.yaml`) drives:
-- `required_fields` — mandatory parameters and validation
-- `optional_fields` — optional parameters, including field name `map` (rename), `transform` (value conversion), `skip` (exclude from body, e.g. for headers)
-- `model_mapping` — short model alias → vendor model name
-- `error_check` — vendor error code → CNLLM exception type mapping
-
-**Response config** (`configs/{vendor}/response_{vendor}.yaml`) drives:
-- `fields` — vendor response path → OpenAI field path (e.g. `"content": "choices[0].message.content"`)
-- `stream_fields` — same for streaming chunks (`content_path`, `tool_calls_path`, `reasoning_content_path`)
-- `defaults` — fallback values when vendor omits fields
-- `error_check` — sensitive content detection paths
-
-The parameter processing order is: `resolve_default` (read scope defaults) → `validate_for_scope` (PARAM_REGISTRY + YAML + drop_params) → `_validate_one_of` → `_check_image_support` → `_build_payload` (YAML field mapping + get_vendor_model) → `get_base_url` + `get_api_path` → `get_header_mappings`.
-
-### Field accumulation
-
-Streaming responses accumulate into `adapter._cnllm_extra`:
-- `_thinking` — raw reasoning/thinking content
-- `_still` — cleaned final response content
-- `_tools` — accumulated tool_calls
-
-Accessible via `client.chat.think`, `client.chat.still`, `client.chat.tools`, `client.chat.raw`.
-
-### FallbackManager
-
-`FallbackManager` is only invoked when `chat.create()` is called **without** a `model` argument (or with `model=""`). If the primary model fails, it iterates through `fallback_models` in order. If a model is passed directly to `chat.create()`, no fallback occurs.
-
-### Sync/async relationship
-
-`CNLLM` (sync) holds an internal `AsyncCNLLM` engine and delegates async operations to it. The `LangChainRunnable` integration uses the async engine directly.
-
-## Adding a New Vendor
-
-1. Create `configs/{vendor}/request_{vendor}.yaml` and `configs/{vendor}/response_{vendor}.yaml`
-2. Create `cnllm/core/vendor/{vendor}.py` with the three-component pattern; call `{Vendor}Adapter._register()` at the bottom
-3. Add model alias → vendor name mappings to the YAML `model_mapping.chat` section
-4. Write tests: unit tests in `tests/test_*.py` (no API key), integration tests in `tests/key_needed/` (with key assignment at the top: `MODEL = "..."; API_KEY = os.getenv("...")`)
 
-Full walkthrough: see `docs/CONTRIBUTOR.md`.
+📖 For full documentation and examples, visit the dedicated skill repository:
+https://github.com/kanchengw/cnllm-skill
diff --git a/README.md b/README.md
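@@ -124,20 +124,23 @@ CNLLM provides a **unified OpenAI-compatible interface layer** for Chinese LLMs and
 
 ### 1.1 Installation
 
-#### 1.1.1 SDK Installation
+#### 1.1.1 Install as Agent Skill (Recommended)
+
+CNLLM provides a standard Agent Skill that follows the Claude Skills specification.
+
+**Install the Skill**: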
 ```bash
-pip install cnllm
+npx skills add kanchengw/cnllm-skill
 ```
 
-#### 1.1.2 Install as Agent Skill
+📖 For full documentation and examples, visit the CNLLM Skill repository:
+https://github.com/kanchengw/cnllm-skill
 
-**One-Click Install**:
+#### 1.1.2 SDK Installation
 ```bash
-npx skills add https://github.com/kanchengw/cnllm
+pip install cnllm
 ```
 
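-Or manually copy the `SKILL.md` file from the project root to your agent's skill directory; **when calling Chinese LLMs, CNLLM will then be preferred**.
-
 ### 1.2 Client Initialization
 
 #### 1.2.1 Sync Client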
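@@ -266,7 +269,7 @@ print(resp.raw)  # Model-native response after full accumulation
 | **raw**: model-native response   | `resp.raw`   | `List[Dict]`      | `[list of model-native streaming chunks]` |
 
 **repr():** 
-During streaming, displays the **real-time result of merged chunks and accumulated fields** rather than the list of streaming chunks; this does not change the type of the streaming response object, which remains an **iterator** over all standard streaming chunks.
+Displays the **real-time result of field-name aggregation and field-value accumulation** for a streaming response in a **dictionary structure** resembling a non-streaming response; this does not change the type of the streaming response object, which remains an **iterator** over all standard streaming chunks.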
 ```python
 for chunk in resp:
     print(resp)
@@ -275,7 +278,7 @@ for chunk in resp:
 
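 ### 2.2 chat completions Batch Calls
 
-Pass input via the`prompt`and`messages`parameters to configure global parameters quickly, or use the`requests`parameter to configure individual requests independently.
+Pass input via the `prompt` and `messages` parameters to configure global parameters quickly, or use the `requests` parameter to configure individual requests independently.
 
 **prompt parameter:**
 
@@ -325,15 +328,12 @@ BatchResponse outer structure, where each response under `results[request_id]`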
 ```python
 {
     "status": {"elapsed": "3.42s", "success_count": 2, "fail_count": 1, "total": 3},  # Statistics
-    "usage": {"prompt_tokens": 5, "total_tokens": 5},  # Total usage for the batch
-    "errors": {"request_2": "error message"},  # Maps each failed request_id to its error message
-    "results": {     # Maps each successful request_id to its standard response
-        "request_0": {...}, 
-        "request_1": {...}  
-    },
+    "usage": {"prompt_tokens": 5, "total_tokens": 5},     # Total usage for the batch
+    "errors": {"request_2": "error message"},             # Maps each failed request_id to its error message
+    "results": {"request_0": {...}, "request_1": {...}},  # Maps each successful request_id to its standard response
     "think": {"request_0": "...", "request_1": "..."},
     "still": {"request_0": "...", "request_1": "..."},
-    "tools": {"request_0": [...], "request_1": [...]},
+    "tools": {"request_0": {...}, "request_1": {...}},
     "raw": {"request_0": {...}, "request_1": {...}}
 }
 ```
@@ -429,22 +429,12 @@ BatchEmbeddingResponse outer structure, where each response under `results[request_id]`
 
 ```python
 {   
-    "status": {
-        "elapsed": "3.35s", "success_count": 1, "fail_count": 1, "total": 2
-    },
-    "batch_info": {
-        "batch_size": 2, "batch_count": 2, "dimension": 1024
-    },
+    "status": {"elapsed": "3.35s", "success_count": 1, "fail_count": 1, "total": 2},
+    "batch_info": {"batch_size": 2, "batch_count": 2, "dimension": 1024},
     "usage": {"prompt_tokens": 5, "total_tokens": 5},
-    "errors": {"request_1": "error message"},
-    "results": {
-        "request_0": {
-            "object": "list",
-            "data": [{"object": "embedding","embedding": [0.1, 0.2, ...], "index": 0}],
-            "model": "embedding-2"
-        }
-    }
-    "vectors": {"request_0": [...]}
+    "results": {"request_0": {...}},
+    "errors": {"request_1": "error message"},
+    "vectors": {"request_0": [...]}    # Maps each successful request_id to its embedding vectors
 }
 ```
 
@@ -622,11 +612,8 @@ client = CNLLM(..., keep=["vectors"])
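 | Silent ignore mode | `drop_params="ignore"` | Silently drops unknown parameters without producing any log |
 
 **Notes:**
--In batch calls, if the global parameters contain unknown parameters, `drop_params="strict"` raises an exception immediately and the batch task is never started;
-If a single request in the batch contains unknown parameters, `drop_params="strict"` places that request directly in the `errors` field without executing it, and the remaining requests in the batch continue to run.
-
-- In particular, with `drop_params="strict"` and `stop_on_error=True`, the batch task is interrupted at the first error and the results of the requests processed so far are returned; see [Stop on Error](#253-遇错停止).
-- The `drop_params` parameter can be set on the client and in every call style (including single `create` calls).
+- In batch calls, if the global parameters contain unknown parameters, `drop_params="strict"` raises an exception immediately and the batch task is never started;
+- If a single request in the batch contains unknown parameters, `drop_params="strict"` places that request directly in the `errors` field without executing it, and the remaining requests in the batch continue to run.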
 
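 ## 3. CNLLM Standard Response Format
 
@@ -717,7 +704,7 @@ CNLLM request parameters are largely consistent with the **OpenAI standard parameters**; coverage is based on
 
 | Parameter           | Type                            | Default                         | Description                                            | 
 | ------------------- | ------------------------------- | ------------------------------- | ------------------------------------------------------ | 
-| `model`             | `str`                           | -                               | Model name; required at client initialization, can be overridden at the call entry | 
+| `model`             | `str`                           | -                               | Model name; see [Supported Models](#支持的模型)       | 
 | `api_key`           | `str`                           | -                               | API key                                                | 
 | `base_url`          | `str`                           | Auto-adapted                    | Custom API address                                     | 
 | `messages`          | `list[dict]`/`list[list[dict]]` | -                               | `chat()` input parameter; supports context management and image recognition (configurable at the call entry only) | 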
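@@ -783,7 +770,7 @@ Parameters defined internally by CNLLM that control internal execution behavior or strategy and are not passed to the API
 | `max_retries`     | `int`   | `3`      | Maximum number of retries |
 | `retry_delay`     | `float` | `1.0`    | Retry delay (seconds)     |
 | `fallback_models`¹ | `dict`  | -        | Fallback models (configurable at client initialization only); see the note below |
-| `drop_params`     | `str`   | `"warn"` | See [Unknown Parameter Handling](#255) |
+| `drop_params`     | `str`   | `"warn"` | See [Unknown Parameter Handling](#255-未知参数处理策略) |
 
 ¹`fallback_models` model fallback strategy: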
 
@@ -819,7 +806,7 @@ fallback_models = {
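 | `stop_on_error`  | `bool`      | `False`                      | Stop subsequent requests on error and return the results processed so far |
 | `callbacks`      | `list`      | -                            | List of progress callback functions |
 | `custom_ids`     | `list[str]` | -                            | List of custom request IDs          |
-| `keep`           | `set/list`  | See [Field Storage Control](#254)             | Data fields retained after iteration |
+| `keep`           | `set/list`  | See [Field Storage Control](#254-字段存储控制)             | Data fields retained after iteration |
 
 ## 5. Framework Integration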

diff --git a/README_en.md b/README_en.md
@@ -124,20 +124,23 @@ Project Documentation:
 
 ### 1.1 Installation
 
-#### 1.1.1 SDK Installation
+#### 1.1.1 Install as Agent Skill (Recommended)
+
+CNLLM now provides a dedicated Agent Skill following the Claude Skills / Agent Skills standard.
+
+**Install the skill**:
 ```bash
-pip install cnllm
+npx skills add kanchengw/cnllm-skill
 ```
 
-#### 1.1.2 Install as Agent Skill
+📖 For full documentation and examples, visit the dedicated skill repository:
+https://github.com/kanchengw/cnllm-skill
 
-**One-Click Install**:
+#### 1.1.2 SDK Installation
 ```bash
-npx skills add https://github.com/kanchengw/cnllm
+pip install cnllm
 ```
 
-Or manually copy the `SKILL.md` file from the project root to your agent's skill directory. When **calling Chinese LLMs, CNLLM will be used as the preferred option**.
-
 ### 1.2 Client Initialization
 
 #### 1.2.1 Sync Client
@@ -266,7 +269,7 @@ In streaming calls, access via `for` loop with **real-time accumulation** for re
 | **raw**: model native response | `resp.raw`   | `Dict`            | `{"id": "...", "choices": [...], ...}`           |
 
 **repr():** 
-During streaming, displays **real-time merged chunks and accumulated field results**, not the real-time streaming chunks list; does not change the streaming response object type, which is an **iterator** containing all standard streaming chunks.
+Displays the **real-time aggregation of field keys and accumulation of field values** in a **dictionary format** resembling a non-streaming response; this does not change the type of the streaming response object, which remains an **iterator** containing all standard streaming chunks.
 ```python
 for chunk in resp:
     print(resp)
@@ -326,15 +329,12 @@ BatchResponse outer structure, where each response under `results[request_id]` i
 ```python
 {
     "status": {"elapsed": "3.42s", "success_count": 2, "fail_count": 1, "total": 3},  # Statistics
-    "usage": {"prompt_tokens": 5, "total_tokens": 5},  # Batch processing total usage info
-    "errors": {"request_2": "error message"},  # Mapping of all failed requests' request_id and error messages
-    "results": {     # Mapping of all successful requests' request_id and standard responses
-        "request_0": {...},
-        "request_1": {...}
-    },
+    "usage": {"prompt_tokens": 5, "total_tokens": 5},     # Batch processing total usage info
+    "errors": {"request_2": "error message"},             # Mapping of all failed requests' request_id and error messages
+    "results": {"request_0": {...}, "request_1": {...}},  # Mapping of all successful requests' request_id and standard responses
     "think": {"request_0": "...", "request_1": "..."},
     "still": {"request_0": "...", "request_1": "..."},
-    "tools": {"request_0": [...], "request_1": [...]},
+    "tools": {"request_0": {...}, "request_1": {...}},
     "raw": {"request_0": {...}, "request_1": {...}}
 }
 ```
@@ -426,6 +426,21 @@ resp.to_dict()               # Default: keeps vectors field + metadata (status/u
 resp.to_dict(results=True)   # Keeps results field + metadata (status/usage/batch_info)
 ```
 
+#### 2.3.3 Embeddings Batch Response Structure
+
+BatchEmbeddingResponse outer structure, where each response under `results[request_id]` is in **OpenAI standard Embeddings response structure**:
+
+```python
+{   
+    "status": {"elapsed": "3.35s", "success_count": 1, "fail_count": 1, "total": 2},
+    "batch_info": {"batch_size": 2, "batch_count": 2, "dimension": 1024},
+    "usage": {"prompt_tokens": 5, "total_tokens": 5},
+    "results": {"request_0": {...}},
+    "errors": {"request_1": "error message"},
+    "vectors": {"request_0": [...]}    # Mapping of all successful requests' request_id and embedding vectors
+}
+```
+
 ### 2.4 Batch Call Control Parameters
 
 Batch calls support **retry strategy, concurrency control** parameter configuration:
@@ -544,10 +559,7 @@ Use `drop_params` to control the handling behavior of **incompatible parameters
 
 **Notes:**
 - When doing batch calls, if global parameters contain unknown parameters, `drop_params="strict"` directly throws an exception without actually starting the batch task;
-If a single request within the batch task contains unknown parameters, `drop_params="strict"` directly puts that request into the `errors` field without actually executing that request, and continues executing subsequent batch tasks.
-
-- Specifically, when configured with `drop_params="strict"` and `stop_on_error=True`, the first error encountered in batch requests immediately interrupts the batch task while returning already processed request results. See [Stop on Error](#253-stop-on-error).
-- The `drop_params` parameter supports client configuration and all calling methods (including `create` single-call method).
+- If a single request within the batch task contains unknown parameters, `drop_params="strict"` directly puts that request into the `errors` field without actually executing that request, and continues executing subsequent batch tasks.
 
 ## 3. CNLLM Standard Response Format
 
@@ -638,7 +650,7 @@ Note: Not all supported models support all request parameters. Please refer to v
 
 | Parameter | Type | Default | Description |
 | ------------------- | ------------------------------- | ------------------------------- | ------------------------------------------------------ |
-| `model`             | `str`                           | -                               | Model name, required at client initialization, can be overridden at call entry         |
+| `model`             | `str`                           | -                               | Model name, see [Supported Models](#supported-models) |
 | `api_key`           | `str`                           | -                               | API key                                                 |
 | `base_url`          | `str`                           | Auto-adapted                            | Customizable API address                                            |
 | `messages`          | `list[dict]`/`list[list[dict]]` | -                               | `chat()` input parameter, supports context management/image recognition (call entry configuration only)                           |
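
The two `drop_params="strict"` rules described in the Notes hunk above (unknown global parameters abort the whole batch, unknown per-request parameters send only that request to `errors`) can be sketched as below. This is an illustration under stated assumptions, not CNLLM's implementation: `KNOWN_PARAMS` and `strict_precheck` are hypothetical names, and the parameter set is invented for the example.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical set of recognised parameter names, for illustration only.
KNOWN_PARAMS = {"model", "messages", "prompt", "stream"}


def strict_precheck(
    global_params: Dict[str, Any],
    requests: List[Dict[str, Any]],
) -> Tuple[Dict[str, Dict[str, Any]], Dict[str, str]]:
    """Split a batch into runnable requests and per-request errors."""
    # Rule 1: an unknown global parameter stops the batch before it starts.
    unknown_global = sorted(set(global_params) - KNOWN_PARAMS)
    if unknown_global:
        raise ValueError(f"unknown global parameters: {unknown_global}")

    runnable: Dict[str, Dict[str, Any]] = {}
    errors: Dict[str, str] = {}
    for index, request in enumerate(requests):
        request_id = f"request_{index}"
        # Rule 2: an unknown per-request parameter fails only that request.
        unknown = sorted(set(request) - KNOWN_PARAMS)
        if unknown:
            errors[request_id] = f"unknown parameters: {unknown}"
        else:
            runnable[request_id] = request
    return runnable, errors


runnable, errors = strict_precheck(
    {"model": "example-model"},
    [{"prompt": "hi"}, {"prompt": "hi", "bogus": 1}, {"prompt": "bye"}],
)
print(sorted(runnable), errors)
```

Here the second request is recorded under `errors` while the first and third remain runnable, mirroring the "continue executing subsequent batch tasks" behaviour the README describes.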