diff --git a/.gitignore b/.gitignore
index 5191c3f..844a1b2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,7 +3,7 @@ files/
 outputs/
 test_venv/
 backup*/
-demo/
+resources/
 
 # Python cache
 __pycache__/
diff --git a/AGENTS.md b/AGENTS.md
index ba24d00..3068ddb 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,84 +1,11 @@
-# CLAUDE.md
+# Agent Integration
 
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
-## Project Overview
-
-CNLLM is a unified adapter layer that translates Chinese LLM vendor APIs (MiniMax, DeepSeek, KIMI, Doubao, GLM, Xiaomi) into OpenAI-compatible request/response formats, enabling seamless integration with LangChain, LlamaIndex, and other OpenAI-compatible frameworks.
-
-## Running Tests
+CNLLM now provides a dedicated Agent Skill following the Claude Skills / Agent Skills standard.
 
+📦 **Install the skill**:
 ```bash
-# Unit tests only (no API keys needed)
-pytest tests/test_*.py
-
-# All tests including integration tests (require API keys in env vars)
-pytest tests/
-
-# Single test file
-pytest tests/test_adapter_config.py -v
-```
-
-API-key-dependent tests live in `tests/key_needed/` and are gated by the presence of environment variables (e.g., `MINIMAX_API_KEY`, `XIAOMI_API_KEY`).
-
-## Architecture
-
+npx skills add kanchengw/cnllm-skill
 ```
-CNLLM (client) → ChatNamespace / EmbeddingsNamespace → create/batch
-  → BaseAdapter._build_payload()  [YAML-driven field mapping]
-  → BaseHttpClient (httpx)        [HTTP execution]
-  → Responder.to_openai_format()  [vendor response → OpenAI format]
-  → Accumulator                   [field accumulation, stream handling]
-```
-
-### Three-component vendor pattern
-
-Each vendor in `cnllm/core/vendor/{vendor}.py` implements three classes:
-
-1. **`{Vendor}Adapter(BaseAdapter)`** — builds request payload (`_build_payload`), performs format conversion (`_to_openai_format`, `_do_to_openai_stream_format`), and registers with `_register()`
-2. **`{Vendor}Responder(Responder)`** — maps vendor response fields to OpenAI standard fields via `configs/{vendor}/response_{vendor}.yaml`; usually just sets `CONFIG_DIR`
-3. **`{Vendor}VendorError(VendorError)`** — parses vendor-specific error responses via `from_response()` and registers with `VendorErrorRegistry.register()`
-
-The vendor module is also where you place any subclass overrides for logic that can't be expressed in YAML (e.g., MiniMax's stream chunk dedup, Xiaomi's `thinking` transform).
-
-### YAML-driven request/response mapping
-
-**Request config** (`configs/{vendor}/request_{vendor}.yaml`) drives:
-- `required_fields` — mandatory parameters and validation
-- `optional_fields` — optional parameters, including field name `map` (rename), `transform` (value conversion), `skip` (exclude from body, e.g. for headers)
-- `model_mapping` — short model alias → vendor model name
-- `error_check` — vendor error code → CNLLM exception type mapping
-
-**Response config** (`configs/{vendor}/response_{vendor}.yaml`) drives:
-- `fields` — vendor response path → OpenAI field path (e.g. `"content": "choices[0].message.content"`)
-- `stream_fields` — same for streaming chunks (`content_path`, `tool_calls_path`, `reasoning_content_path`)
-- `defaults` — fallback values when vendor omits fields
-- `error_check` — sensitive content detection paths
-
-The parameter processing order is: `resolve_default` (read scope defaults) → `validate_for_scope` (PARAM_REGISTRY + YAML + drop_params) → `_validate_one_of` → `_check_image_support` → `_build_payload` (YAML field mapping + get_vendor_model) → `get_base_url` + `get_api_path` → `get_header_mappings`.
-
-### Field accumulation
-
-Streaming responses accumulate into `adapter._cnllm_extra`:
-- `_thinking` — raw reasoning/thinking content
-- `_still` — cleaned final response content
-- `_tools` — accumulated tool_calls
-
-Accessible via `client.chat.think`, `client.chat.still`, `client.chat.tools`, `client.chat.raw`.
-
-### FallbackManager
-
-`FallbackManager` is only invoked when `chat.create()` is called **without** a `model` argument (or with `model=""`). If the primary model fails, it iterates through `fallback_models` in order. If a model is passed directly to `chat.create()`, no fallback occurs.
-
-### Sync/async relationship
-
-`CNLLM` (sync) holds an internal `AsyncCNLLM` engine and delegates async operations to it. The `LangChainRunnable` integration uses the async engine directly.
-
-## Adding a New Vendor
-
-1. Create `configs/{vendor}/request_{vendor}.yaml` and `configs/{vendor}/response_{vendor}.yaml`
-2. Create `cnllm/core/vendor/{vendor}.py` with the three-component pattern; call `{Vendor}Adapter._register()` at the bottom
-3. Add model alias → vendor name mappings to the YAML `model_mapping.chat` section
-4. Write tests: unit tests in `tests/test_*.py` (no API key), integration tests in `tests/key_needed/` (with key assignment at the top: `MODEL = "..."; API_KEY = os.getenv("...")`)
 
-Full walkthrough: see `docs/CONTRIBUTOR.md`.
\ No newline at end of file
+📖 For full documentation and examples, visit the dedicated skill repository:
+https://github.com/kanchengw/cnllm-skill
\ No newline at end of file
diff --git a/README.md b/README.md
index 225b4a4..870d707 100644
--- a/README.md
+++ b/README.md
@@ -124,20 +124,23 @@ CNLLM 为中文大模型提供了一个**统一的 OpenAI 兼容接口层**与
 
 ### 1.1 安装
 
-#### 1.1.1 SDK 安装
+#### 1.1.1 作为 Agent Skill 安装 （推荐）
+
+CNLLM 遵循 Claude Skills 规范提供标准 Agent Skill。
+
+**安装 Skill**：
 ```bash
-pip install cnllm
+npx skills add kanchengw/cnllm-skill
 ```
 
-#### 1.1.2 作为 Agent Skill 安装
+📖 完整文档和示例，请访问 CNLLM Skill 仓库：
+https://github.com/kanchengw/cnllm-skill
 
-**一键安装**：
+#### 1.1.2 SDK 安装
 ```bash
-npx skills add https://github.com/kanchengw/cnllm
+pip install cnllm
 ```
 
-或手动将项目根目录的 `SKILL.md` 文件复制到 Agent 的技能目录下，在**调用中文大模型时， 会优先使用 CNLLM**。
-
 ### 1.2 客户端初始化
 
 #### 1.2.1 同步客户端
@@ -266,7 +269,7 @@ print(resp.raw)  # 完整累积后的模型原生响应
 | **raw**: 模型原生响应            | `resp.raw`   | `List[Dict]`      | `[模型原生流式 chunks 列表]`           |
 
 **repr():** 
-流式调用中，展示**chunks 合并和字段累积的实时结果**，而非流式 chunks 列表；不改变流式响应对象类型，即包含所有标准流式 chunks 的**迭代器**。
+以类似非流式响应的**字典结构**展示流式响应的**字段名聚合和字段值累积的实时结果**；不改变流式响应对象类型，即包含所有标准流式 chunks 的**迭代器**。
 ```python
 for chunk in resp:
     print(resp)
@@ -275,7 +278,7 @@ for chunk in resp:
 
 ### 2.2 chat completions 批量调用
 
-可通过`prompt`和`messages`参数输入并快速配置全局参数，也可以通过`requests`参数为单个请求进行独立配置。
+可通过 `prompt` 和 `messages` 参数输入并快速配置全局参数，也可以通过 `requests` 参数为单个请求进行独立配置。
 
 **prompt 参数：**
 
@@ -325,15 +328,12 @@ BatchResponse 外层结构，其中 `results[request_id]` 字段下的每条响
 ```python
 {
     "status": {"elapsed": "3.42s", "success_count": 2, "fail_count": 1, "total": 3},  # 统计信息
-    "usage": {"prompt_tokens": 5, "total_tokens": 5},  # 批处理的总用量信息
-    "errors": {"request_2": "error message"},  # 所有失败请求的 request_id 和错误信息映射
-    "results": {     # 所有成功请求的 request_id 和标准响应映射
-        "request_0": {...}, 
-        "request_1": {...}  
-    },
+    "usage": {"prompt_tokens": 5, "total_tokens": 5},     # 批处理的总用量信息
+    "errors": {"request_2": "error message"},             # 所有失败请求的 request_id 和错误信息映射
+    "results": {"request_0": {...}, "request_1": {...}},  # 所有成功请求的 request_id 和标准响应映射
     "think": {"request_0": "...", "request_1": "..."},
     "still": {"request_0": "...", "request_1": "..."},
-    "tools": {"request_0": [...], "request_1": [...]},
+    "tools": {"request_0": {...}, "request_1": {...}},
     "raw": {"request_0": {...}, "request_1": {...}}
 }
 ```
@@ -429,22 +429,12 @@ BatchEmbeddingResponse 外层结构，其中 `results[request_id]` 字段下每
 
 ```python
 {   
-    "status": {
-        "elapsed": "3.35s", "success_count": 1, "fail_count": 1, "total": 2
-    },
-    "batch_info": {
-        "batch_size": 2, "batch_count": 2, "dimension": 1024
-    },
+    "status": {"elapsed": "3.35s", "success_count": 1, "fail_count": 1, "total": 2},
+    "batch_info": {"batch_size": 2, "batch_count": 2, "dimension": 1024},
     "usage": {"prompt_tokens": 5, "total_tokens": 5},
-    "errors": {"request_1": "error message"},
-    "results": {
-        "request_0": {
-            "object": "list",
-            "data": [{"object": "embedding","embedding": [0.1, 0.2, ...], "index": 0}],
-            "model": "embedding-2"
-        }
-    }
-    "vectors": {"request_0": [...]}
+    "results": {"request_0": {...}},
+    "errors": {"request_1": "error message"},
+    "vectors": {"request_0": [...]}    # 所有成功请求的 request_id 和嵌入向量映射
 }
 ```
 
@@ -622,11 +612,8 @@ client = CNLLM(..., keep=["vectors"])
 | 静默忽略模式   | `drop_params="ignore"` | 静默丢弃未知参数，不产生任何日志              |
 
 **说明：**
--进行批量调用时，若全局参数中包含未知参数，`drop_params="strict"` 直接抛出异常，不实际启动批量任务；
-若批量任务中的单个请求包含未知参数，`drop_params="strict"` 直接将该请求归入 `errors` 字段，不实际执行该请求，并继续执行后续的批量任务。
-
-- 特别地，当配置`drop_params="strict"` 且 `stop_on_error=True` 时，批量请求中遭遇第一个错误时会立即中断批量任务，同时返回已处理的请求结果，详见 [遇错停止](#253-遇错停止)。
-- `drop_params` 参数支持客户端配置以及所有调用方式（包括 `create` 单条调用方式）。
+- 进行批量调用时，若全局参数中包含未知参数，`drop_params="strict"` 直接抛出异常，不实际启动批量任务；
+- 若批量任务中的单个请求包含未知参数，`drop_params="strict"` 直接将该请求归入 `errors` 字段，不实际执行该请求，并继续执行后续的批量任务。
 
 ## 3. CNLLM 标准响应格式
 
@@ -717,7 +704,7 @@ CNLLM 请求参数与**OpenAI 标准参数**基本一致，覆盖范围基于国
 
 | 参数                  | 类型                              | 默认值                             | 说明                                                     | 
 | ------------------- | ------------------------------- | ------------------------------- | ------------------------------------------------------ | 
-| `model`             | `str`                           | -                               | 模型名称，客户端初始化必填，调用入口可覆盖         | 
+| `model`             | `str`                           | -                               | 模型名称，模型名见[支持的模型](#支持的模型)       | 
 | `api_key`           | `str`                           | -                               | API 密钥                                                 | 
 | `base_url`          | `str`                           | 自动适配                            | 可自定义 API 地址                                            | 
 | `messages`          | `list[dict]`/`list[list[dict]]` | -                               | `chat()` 输入参数，支持上下文管理/图片识别（仅支持调用入口配置）                           | 
@@ -783,7 +770,7 @@ CNLLM 内部定义的参数，控制内部执行的行为或策略，不向 API
 | `max_retries`     | `int`   | `3`      | 最大重试次数             |
 | `retry_delay`     | `float` | `1.0`    | 重试延迟（秒）            |
 | `fallback_models`¹ | `dict`  | -        | 备用模型（仅支持客户端初始化配置），见下方说明 |
-| `drop_params`     | `str`   | `"warn"` | 见 [未知参数处理策略](#255) |
+| `drop_params`     | `str`   | `"warn"` | 见 [未知参数处理策略](#255-未知参数处理策略) |
 
 ¹`fallback_models` 模型降级策略：
 
@@ -819,7 +806,7 @@ fallback_models = {
 | `stop_on_error`  | `bool`      | `False`                      | 遇错时停止后续请求，返回已处理结果     |
 | `callbacks`      | `list`      | -                            | 进度回调函数列表              |
 | `custom_ids`     | `list[str]` | -                            | 自定义请求 ID 列表           |
-| `keep`           | `set/list`  | 见 [字段存储控制](#254)             | 迭代后保留的数据字段            |
+| `keep`           | `set/list`  | 见 [字段存储控制](#254-字段存储控制)             | 迭代后保留的数据字段            |
 
 ## 5. 框架集成
 
diff --git a/README_en.md b/README_en.md
index 83a66ba..d2df6a5 100644
--- a/README_en.md
+++ b/README_en.md
@@ -124,20 +124,23 @@ Project Documentation:
 
 ### 1.1 Installation
 
-#### 1.1.1 SDK Installation
+#### 1.1.1 Install as Agent Skill (Recommended)
+
+CNLLM now provides a dedicated Agent Skill following the Claude Skills / Agent Skills standard.
+
+**Install the skill**:
 ```bash
-pip install cnllm
+npx skills add kanchengw/cnllm-skill
 ```
 
-#### 1.1.2 Install as Agent Skill
+📖 For full documentation and examples, visit the dedicated skill repository:
+https://github.com/kanchengw/cnllm-skill
 
-**One-Click Install**:
+#### 1.1.2 SDK Installation
 ```bash
-npx skills add https://github.com/kanchengw/cnllm
+pip install cnllm
 ```
 
-Or manually copy the `SKILL.md` file from the project root to your agent's skill directory. When **calling Chinese LLMs, CNLLM will be used as the preferred option**.
-
 ### 1.2 Client Initialization
 
 #### 1.2.1 Sync Client
@@ -266,7 +269,7 @@ In streaming calls, access via `for` loop with **real-time accumulation** for re
 | **raw**: model native response | `resp.raw`   | `Dict`            | `{"id": "...", "choices": [...], ...}`           |
 
 **repr():** 
-During streaming, displays **real-time merged chunks and accumulated field results**, not the real-time streaming chunks list; does not change the streaming response object type, which is an **iterator** containing all standard streaming chunks.
+Displays the **real-time aggregation of field keys and accumulation of field values** in a **dictionary format** similar to a non-streaming response; this does not change the type of the streaming response object, which remains an **iterator** over all standard streaming chunks.
 ```python
 for chunk in resp:
     print(resp)
@@ -326,15 +329,12 @@ BatchResponse outer structure, where each response under `results[request_id]` i
 ```python
 {
     "status": {"elapsed": "3.42s", "success_count": 2, "fail_count": 1, "total": 3},  # Statistics
-    "usage": {"prompt_tokens": 5, "total_tokens": 5},  # Batch processing total usage info
-    "errors": {"request_2": "error message"},  # Mapping of all failed requests' request_id and error messages
-    "results": {     # Mapping of all successful requests' request_id and standard responses
-        "request_0": {...},
-        "request_1": {...}
-    },
+    "usage": {"prompt_tokens": 5, "total_tokens": 5},     # Batch processing total usage info
+    "errors": {"request_2": "error message"},             # Mapping of all failed requests' request_id and error messages
+    "results": {"request_0": {...}, "request_1": {...}},  # Mapping of all successful requests' request_id and standard responses
     "think": {"request_0": "...", "request_1": "..."},
     "still": {"request_0": "...", "request_1": "..."},
-    "tools": {"request_0": [...], "request_1": [...]},
+    "tools": {"request_0": {...}, "request_1": {...}},
     "raw": {"request_0": {...}, "request_1": {...}}
 }
 ```
@@ -426,6 +426,21 @@ resp.to_dict()               # Default: keeps vectors field + metadata (status/u
 resp.to_dict(results=True)   # Keeps results field + metadata (status/usage/batch_info)
 ```
 
+#### 2.3.3 Embeddings Batch Response Structure
+
+BatchEmbeddingResponse outer structure, where each response under `results[request_id]` is in **OpenAI standard Embeddings response structure**:
+
+```python
+{   
+    "status": {"elapsed": "3.35s", "success_count": 1, "fail_count": 1, "total": 2},
+    "batch_info": {"batch_size": 2, "batch_count": 2, "dimension": 1024},
+    "usage": {"prompt_tokens": 5, "total_tokens": 5},
+    "results": {"request_0": {...}},
+    "errors": {"request_1": "error message"},
+    "vectors": {"request_0": [...]}    # Mapping of all successful requests' request_id and embedding vectors
+}
+```
+
 ### 2.4 Batch Call Control Parameters
 
 Batch calls support **retry strategy, concurrency control** parameter configuration:
@@ -544,10 +559,7 @@ Use `drop_params` to control the handling behavior of **incompatible parameters
 
 **Notes:**
 - When doing batch calls, if global parameters contain unknown parameters, `drop_params="strict"` directly throws an exception without actually starting the batch task;
-If a single request within the batch task contains unknown parameters, `drop_params="strict"` directly puts that request into the `errors` field without actually executing that request, and continues executing subsequent batch tasks.
-
-- Specifically, when configured with `drop_params="strict"` and `stop_on_error=True`, the first error encountered in batch requests immediately interrupts the batch task while returning already processed request results. See [Stop on Error](#253-stop-on-error).
-- The `drop_params` parameter supports client configuration and all calling methods (including `create` single-call method).
+- If a single request within the batch task contains unknown parameters, `drop_params="strict"` directly puts that request into the `errors` field without actually executing that request, and continues executing subsequent batch tasks.
 
 ## 3. CNLLM Standard Response Format
 
@@ -638,7 +650,7 @@ Note: Not all supported models support all request parameters. Please refer to v
 
 | Parameter | Type | Default | Description |
 | ------------------- | ------------------------------- | ------------------------------- | ------------------------------------------------------ |
-| `model`             | `str`                           | -                               | Model name, required at client initialization, can be overridden at call entry         |
+| `model`             | `str`                           | -                               | Model name, see [Supported Models](#supported-models) |
 | `api_key`           | `str`                           | -                               | API key                                                 |
 | `base_url`          | `str`                           | Auto-adapted                            | Customizable API address                                            |
 | `messages`          | `list[dict]`/`list[list[dict]]` | -                               | `chat()` input parameter, supports context management/image recognition (call entry configuration only)                           |
diff --git a/SKILL.md b/SKILL.md
index a6f6835..3068ddb 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -1,371 +1,11 @@
----
-name: cnllm-chinese-llm-adapter
-version: 1.0.4
-description: >-
-  为中文大模型定制的通用增强 SDK / unified adapter for Chinese LLMs: DeepSeek,
-  GLM/Zhipu (智谱), KIMI/Moonshot (月之暗面), MiniMax (稀宇),
-  Doubao/ByteDance (豆包/字节), Xiaomi mimo (小米).
-  统一接口替代多厂商自研 SDK 或 OpenAI SDL/LiteLLM，并将模型响应封装为 OpenAI 标准格式响应。
-  深度适配 LangChain,LlamaIndex,Autoogen,Haystack,Deepeval 等 LLM 应用框架。
-  CNLLM 支持模型的原生参数和功能，解决传参可能静默失效或缺少参数支持，提供高透明度和响应可控性。
-  支持同步/异步、流式/非流式、批量/非批量及混合流式策略的批量 chat 和 embeddings 调用。
-  提供纯净回复、推理内容、工具调用快捷访问入口，无需额外解析。
-  工程化能力：参数验证、未知参数处理策略控制、备用模型自动降级等，提高模型调用的稳定性和可靠性。
-  批量高级功能：单个请求独立配置、实时进度、可配置的失败策略和内存控制、回调、自定义索引。
----
+# Agent Integration
 
-# CNLLM: Chinese LLM Unified Adapter
-
-## When to Use
-
-- Calling **DeepSeek**, **GLM (Zhipu)**, **KIMI (Moonshot)**, **MiniMax**, **Doubao (ByteDance)**, or **Xiaomi** models
-- **Multi-model workflows** where different stages use different models (e.g., embedding with MiniMax → reasoning with DeepSeek-reasoner → generation with GLM in a single LangChain pipeline)
-- **Multi-model evaluation / LLM-as-Judge** — same input sent to multiple models in one batch call to compare or score outputs
-- **Transparency & control** — CNLLM's YAML-driven config explicitly declares every supported param, its mapping, and its behavior. Unlike OpenAI-compatible interfaces where `extra_body` params may silently fail with no feedback, CNLLM gives you deterministic, visible parameter handling
-- **Streaming lifecycle inspection** with real-time access to automated content accumulation (`.think`, `.still`, `.tools` properties).
-- **Multi-model fallback** for production resilience — auto-retry with different providers on failure
-- **Batch data processing** — high-throughput labeling, classification, translation, or synthetic data generation. CNLLM supports **real-time progress monitoring** (request_counts updated per-request), **per-request independent configuration** (different model/params/thinking for each item), **real-time progress statitic**, **configurable failure policy** and **memory mannagement**, **callbacks**, **custom IDs** — features the OpenAI batch API does not offer
-- Project uses **LangChain**, **LlamaIndex**, or **LiteLLM** and needs Chinese LLM integration
-- Codebase already has `cnllm` in dependencies (check `requirements.txt`, `pyproject.toml`, or existing imports)
-
-## Anti-Patterns (Do NOT)
-
-- **Do NOT** use vendor-native SDKs (`zhipuai`, `dashscope`, `volcengine`) — CNLLM handles format conversion
-- **Do NOT** implement custom retry/fallback — use CNLLM's built-in `fallback_models`
-- **Do NOT** write vendor-specific error parsing — use CNLLM's OpenAI-style exceptions
-
-## Installation
+CNLLM now provides a dedicated Agent Skill following the Claude Skills / Agent Skills standard.
 
+📦 **Install the skill**:
 ```bash
-pip install cnllm
-```
-
-## Import
-
-```python
-from cnllm import CNLLM
-```
-
-## Quick Reference
-
-### 1. Basic Chat
-
-```python
-from cnllm import CNLLM
-
-client = CNLLM(model="deepseek-chat", api_key="your_key")
-# Standard OpenAI-style messages
-resp = client.chat.create(
-    messages=[{"role": "user", "content": "Hello"}]
-)
-print(resp.choices[0].message.content)
-
-# Prompt shortcut (single user message)
-resp = client.chat.create(prompt="Hello", stream=True)
-```
-
-### 2. Streaming with Thinking Content
-
-```python
-client = CNLLM(model="deepseek-reasoner", api_key="your_key")
-resp = client.chat.create(
-    messages=[{"role": "user", "content": "Think step by step"}],
-    stream=True
-)
-for chunk in resp:
-    # Real-time accumulated access during streaming:
-    pass
-
-# After iteration, access accumulated results:
-resp.think   # str — reasoning/thinking content
-resp.still   # str — final response
-resp.tools   # dict — accumulated tool_calls
-resp.raw     # dict — raw vendor response
-```
-
-### 4. Multi-Model Fallback
-
-```python
-client = CNLLM(
-    model="deepseek-chat",
-    api_key="primary_key",
-    fallback_models={
-        "glm-4.7-flash": {"api_key": "glm_key"},                            # with different key
-        "deepseek-reasoner": {"api_key": "ds_key", "base_url": "https://api.deepseek.com/v1"},  # with key + endpoint
-    }
-)
-# No model argument → triggers FallbackManager
-resp = client.chat.create(prompt="Hello")
-# Auto-falls back if primary fails; raises FallbackError if all fail
-```
-
-### 5. Batch Processing
-
-```python
-# Simple batch — same params for all
-resp = client.chat.batch(
-    prompt=["Hello", "How are you?", "What is AI?"],
-    stream=True
-)
-print(resp.still["request_0"])   # per-request response text
-print(resp.status)                 # real-time success/fail/total/elapsed
-
-# Per-request config
-resp = client.chat.batch(
-    requests=[
-        {"prompt": "Hi", "model": "deepseek-chat", "thinking": True},
-        {"prompt": "1+1=", "model": "glm-4.7-flash"},
-    ],
-    max_concurrent=3
-)
-
-# Advanced: callbacks + custom_ids + stop-on-error
-def on_complete(request_id, status):
-    print(f"[{request_id}] {status}")
-
-resp = client.chat.batch(
-    prompt=["Task A", "Task B", "Task C"],
-    custom_ids=["job_001", "job_002", "job_003"],
-    callbacks=[on_complete],
-    stop_on_error=True,
-    max_concurrent=5,
-    timeout=60
-)
-```
-
-### 6. Embeddings
-
-```python
-# Single
-resp = client.embeddings.create(input="Hello world")
-
-# Batch
-resp = client.embeddings.batch(
-    input=["Hello", "world", "你好"],
-    custom_ids=["doc_1", "doc_2", "doc_3"]
-)
-```
-
-### 7. LangChain Runnable
-
-```python
-from cnllm.core.framework import LangChainRunnable
-from langchain_core.prompts import ChatPromptTemplate
-
-client = CNLLM(model="deepseek-chat", api_key="your_key")
-runnable = LangChainRunnable(client)
-
-chain = ChatPromptTemplate.from_messages([
-    ("system", "You are helpful"),
-    ("human", "{input}")
-]) | runnable
-
-resp = chain.invoke({"input": "Hello"})           # sync
-for chunk in chain.stream({"input": "Count to 5"}):  # streaming
-    print(chunk.content, end="")
-import asyncio
-asyncio.run(chain.ainvoke({"input": "Hi"}))       # async
-```
-
-### 7.1. LlamaIndex
-
-```python
-from cnllm import CNLLM
-from llama_index.core.llms import ChatMessage, MessageRole
-
-client = CNLLM(model="deepseek-chat", api_key="your_key")
-resp = client.chat.create(prompt="Introduce yourself")
-
-msg = ChatMessage(role=MessageRole.ASSISTANT, content=resp.still)
-print(msg.content)
-```
-
-### 7.2. AutoGen
-
-```python
-from cnllm import CNLLM
-from autogen_agentchat.messages import TextMessage
-
-client = CNLLM(model="deepseek-chat", api_key="your_key")
-resp = client.chat.create(prompt="1+1=?")
-
-msg = TextMessage(content=resp.still, source="assistant")
-print(msg.content)
-```
-
-### 7.3. Haystack
-
-```python
-from cnllm import CNLLM
-from haystack import Document
-from haystack.dataclasses import ChatMessage
-
-client = CNLLM(model="deepseek-chat", api_key="your_key")
-
-text = "CNLLM is a Chinese LLM adapter"
-resp = client.embeddings.create(input=text)
-doc = Document(content=text, embedding=resp["data"][0]["embedding"])
-
-resp = client.chat.create(prompt="1+1=?")
-msg = ChatMessage.from_assistant(resp.still)
-print(msg.text)
-```
-
-### 7.4. DeepEval
-
-```python
-from cnllm import CNLLM
-from deepeval.test_case import LLMTestCase
-
-client = CNLLM(model="deepseek-chat", api_key="your_key")
-resp = client.chat.create(messages=[{"role": "user", "content": "1+1=?"}])
-
-test_case = LLMTestCase(
-    input="1+1=?", actual_output=resp.still, expected_output="2",
-)
-print(test_case.actual_output)
-```
-
-### 8. Asynchronous Client
-
-```python
-from cnllm import asyncCNLLM
-import asyncio
-
-async def main():
-    client = asyncCNLLM(model="deepseek-chat", api_key="your_key")
-    # Use await for all asyncCNLLM methods
-    resp = await client.chat.create(prompt="Hello", stream=True)
-    async for chunk in resp:
-        print(chunk)
-
-asyncio.run(main())
-```
-
-### 9. Context Management
-
-```python
-# Persistent — close manually
-client = CNLLM(model="deepseek-chat", api_key="key")
-resp = client.chat.create(prompt="Hello")
-client.close()
-
-# Temporary — auto-closes
-with CNLLM(model="deepseek-chat", api_key="key") as client:
-    resp = client.chat.create(prompt="Hello")
-```
-
-## Response Reference
-
-### Chat Completion Response
-
-```python
-resp = client.chat.create(messages=[...])
-
-# Direct CNLLM accessors (preferred):
-resp.still   # str — response content
-resp.think   # str — reasoning/thinking content (if any)
-resp.tools   # dict — tool_calls (if any)
-resp.raw     # dict — full raw vendor response in OpenAI-compatible format
-```
-
-### Streaming Access
-
-```python
-resp = client.chat.create(messages=[...], stream=True)
-for chunk in resp:
-    # Real-time accumulated access during streaming:
-    pass
-
-# After/during iteration, same accessors on the response:
-resp.still   # str — accumulated response content
-resp.think   # str — accumulated reasoning content
-resp.tools   # dict — accumulated tool_calls
-resp.raw     # dict — accumulated raw vendor response
-```
-
-### Batch Response (Chat & Embeddings)
-
-```python
-resp = client.chat.batch(prompt=[...])
-# Also accessible via client.chat.batch_result.* in either sync/async
-
-# Top-level fields:
-resp.status           # dict — {success_count, fail_count, total, elapsed}
-resp.errors           # dict — {request_id: error_msg}
-resp.usage            # dict — {prompt_tokens, completion_tokens, total_tokens}
-
-# Per-request access (chat):
-resp.results["request_0"]    # OpenAI-format response per request
-resp.think["request_0"]      # reasoning content (chat only)
-resp.still["request_0"]      # response text (chat only)
-resp.tools["request_0"]      # tool_calls (chat only)
-resp.raw["request_0"]        # raw vendor response
-
-# Embeddings-only extra:
-resp.batch_info["dimension"]       # int — embedding dimension
-```
-
-### Error Handling
-
-```python
-from cnllm import CNLLMError, AuthenticationError, RateLimitError, \
-    TimeoutError, NetworkError, ServerError, InvalidRequestError, \
-    ContentFilteredError, ModelNotSupportedError, FallbackError
-try:
-    resp = client.chat.create(prompt="Hi")
-except RateLimitError:
-    # handle rate limit
-except ContentFilteredError:
-    # sensitive content detected
-except FallbackError:
-    # all fallback models failed
-```
-
-## Supported Vendors
-
-| Vendor      | Chat Models                                                                 | Embeddings Models                  |
-|-------------|-----------------------------------------------------------------------------|------------------------------------|
-| **DeepSeek**  | deepseek-chat, deepseek-reasoner, deepseek-v4-pro, deepseek-v4-flash      | —                                  |
-| **KIMI**      | kimi-k2.6, kimi-k2.5, moonshot-v1-8k/32k/128k, moonshot-v1-vision-preview | — |
-| **GLM**       | glm-4.6, glm-4.7, glm-4.7-flash, glm-4.7-flashx, glm-5, glm-5.1, glm-4.5 series, glm-5v-turbo, glm-4.5v, glm-4.6v, glm-4.6v-flash | embedding-2, embedding-3, embedding-3-pro |
-| **MiniMax**   | MiniMax-M2, MiniMax-M2.1, MiniMax-M2.5, MiniMax-M2.5-highspeed, MiniMax-M2.7, MiniMax-M2.7-highspeed | embo-01 |
-| **Doubao**    | doubao-seed-2-0-pro-260215 (doubao-seed-2-0-pro), doubao-seed-2-0-mini-260215 (doubao-seed-2-0-mini), doubao-seed-2-0-lite-260215 (doubao-seed-2-0-lite), doubao-seed-2-0-code-preview-260215 (doubao-seed-2-0-code), doubao-seed-1-8-251228 (doubao-seed-1-8), doubao-seed-1-6-251015 (doubao-seed-1-6), doubao-seed-1-6-flash-250828 (doubao-seed-1-6-flash), doubao-seed-1-6-vision-250815 (doubao-seed-1-6-vision), doubao-1-5-vision-pro-32k-250115 (doubao-1-5-vision-pro), doubao-seed-1-5-lite-32k-250115 (doubao-seed-1-5-lite), doubao-seed-1-5-pro-32k-250115 (doubao-seed-1-5-pro-32k), doubao-seed-1-5-pro-256k-250115 (doubao-seed-1-5-pro) | — |
-| **Xiaomi**    | mimo-v2-pro, mimo-v2-omni, mimo-v2-flash, mimo-v2.5-pro, mimo-v2.5       | —                                  |
-
-## Key Parameters
-
-All **OpenAI-standard parameters** are supported: `temperature`, `max_tokens`, `max_completion_tokens`, `top_p`, `tools`, `tool_choice`, `thinking`, `response_format`, `stop`, `presence_penalty`, `frequency_penalty`, `user`, `timeout`, `max_retries`.
-
-Two notable CNLLM extensions:
-- **`thinking`**: `True`/`False`/`"auto"` — controls reasoning/thinking. Maps to each vendor's native thinking param
-- **`fallback_models`**: dict of `{model_name: {"api_key": ..., "base_url": ...}}` — each fallback model has its own api_key (required) and optional base_url. Only active when `chat.create()` is called without a `model` argument
-
-**Batch-specific parameters** (set at the batch level, not per-request):
-- **`max_concurrent`**: `int` — max concurrent requests (default: 3 for chat, 12 for embeddings)
-- **`rps`**: `float` — requests per second rate limit
-- **`timeout`**: `int` — per-request timeout in seconds (default: 30)
-- **`max_retries`**: `int` — max retry attempts on failure (default: 3)
-- **`retry_delay`**: `float` — delay between retries in seconds (default: 1.0)
-- **`custom_ids`**: `list[str]` — meaningful request IDs for each input
-- **`callbacks`**: `list[callable]` — invoked on each request completion for real-time tracking
-- **`stop_on_error`**: `bool` — if True, halts all remaining requests on first failure
-
-Parameters passable at init (shared across calls) or overridden per-call:
-
-```python
-client = CNLLM(model="...", api_key="...", temperature=0.7)
-resp = client.chat.create(prompt="Hi", temperature=0.3)  # overrides
-```
-
-## Error Handling (OpenAI-style)
-
-```python
-from cnllm import (
-    CNLLMError, AuthenticationError, RateLimitError, TimeoutError,
-    NetworkError, ServerError, InvalidRequestError, ContentFilteredError,
-    ModelNotSupportedError, FallbackError, TokenLimitError
-)
+npx skills add kanchengw/cnllm-skill
 ```
 
-#
\ No newline at end of file
+📖 For full documentation and examples, visit the dedicated skill repository:
+https://github.com/kanchengw/cnllm-skill
\ No newline at end of file