2 changes: 1 addition & 1 deletion .gitignore
@@ -3,7 +3,7 @@ files/
outputs/
test_venv/
backup*/
demo/
resources/

# Python cache
__pycache__/
85 changes: 6 additions & 79 deletions AGENTS.md
@@ -1,84 +1,11 @@
# CLAUDE.md
# Agent Integration

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

CNLLM is a unified adapter layer that translates Chinese LLM vendor APIs (MiniMax, DeepSeek, KIMI, Doubao, GLM, Xiaomi) into OpenAI-compatible request/response formats, enabling seamless integration with LangChain, LlamaIndex, and other OpenAI-compatible frameworks.

## Running Tests
CNLLM now provides a dedicated Agent Skill following the Claude Skills / Agent Skills standard.

📦 **Install the skill**:
```bash
# Unit tests only (no API keys needed)
pytest tests/test_*.py

# All tests including integration tests (require API keys in env vars)
pytest tests/

# Single test file
pytest tests/test_adapter_config.py -v
```

API-key-dependent tests live in `tests/key_needed/` and are gated by the presence of environment variables (e.g., `MINIMAX_API_KEY`, `XIAOMI_API_KEY`).

## Architecture

npx skills add kanchengw/cnllm-skill
```
CNLLM (client) → ChatNamespace / EmbeddingsNamespace → create/batch
→ BaseAdapter._build_payload() [YAML-driven field mapping]
→ BaseHttpClient (httpx) [HTTP execution]
→ Responder.to_openai_format() [vendor response → OpenAI format]
→ Accumulator [field accumulation, stream handling]
```

### Three-component vendor pattern

Each vendor in `cnllm/core/vendor/{vendor}.py` implements three classes:

1. **`{Vendor}Adapter(BaseAdapter)`** — builds request payload (`_build_payload`), performs format conversion (`_to_openai_format`, `_do_to_openai_stream_format`), and registers with `_register()`
2. **`{Vendor}Responder(Responder)`** — maps vendor response fields to OpenAI standard fields via `configs/{vendor}/response_{vendor}.yaml`; usually just sets `CONFIG_DIR`
3. **`{Vendor}VendorError(VendorError)`** — parses vendor-specific error responses via `from_response()` and registers with `VendorErrorRegistry.register()`

The vendor module is also where you place any subclass overrides for logic that can't be expressed in YAML (e.g., MiniMax's stream chunk dedup, Xiaomi's `thinking` transform).
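
The three-component pattern described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the base classes here are simplified stand-ins defined inline, the `acme` vendor is hypothetical, and the real signatures of `_register()`, `from_response()`, and `VendorErrorRegistry.register()` may differ.

```python
from typing import Any, Dict, Optional, Type


# Simplified stand-ins for the library's base classes (assumed shapes).
class BaseAdapter:
    _registry: Dict[str, Type["BaseAdapter"]] = {}
    VENDOR = ""

    @classmethod
    def _register(cls) -> None:
        # Make the adapter discoverable by vendor name.
        BaseAdapter._registry[cls.VENDOR] = cls

    def _build_payload(self, params: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class Responder:
    CONFIG_DIR = ""


class VendorError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"[{code}] {message}")
        self.code = code
        self.message = message


class VendorErrorRegistry:
    _errors: Dict[str, Type[VendorError]] = {}

    @classmethod
    def register(cls, vendor: str, error_cls: Type[VendorError]) -> None:
        cls._errors[vendor] = error_cls


# The three components for a hypothetical vendor named "acme".
class AcmeAdapter(BaseAdapter):
    VENDOR = "acme"

    def _build_payload(self, params: Dict[str, Any]) -> Dict[str, Any]:
        return {"model": params["model"], "messages": params["messages"]}


class AcmeResponder(Responder):
    CONFIG_DIR = "acme"  # usually the only thing a responder sets


class AcmeVendorError(VendorError):
    @classmethod
    def from_response(cls, raw: Dict[str, Any]) -> Optional["AcmeVendorError"]:
        # Return None when the response carries no error object.
        err = raw.get("error")
        if not err:
            return None
        return cls(err.get("code", "unknown"), err.get("message", ""))


# Registration at the bottom of the vendor module.
AcmeAdapter._register()
VendorErrorRegistry.register("acme", AcmeVendorError)
```

Keeping registration at module import time means that adding a vendor only requires importing its module; no central list has to be edited.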

### YAML-driven request/response mapping

**Request config** (`configs/{vendor}/request_{vendor}.yaml`) drives:
- `required_fields` — mandatory parameters and validation
- `optional_fields` — optional parameters, including field name `map` (rename), `transform` (value conversion), `skip` (exclude from body, e.g. for headers)
- `model_mapping` — short model alias → vendor model name
- `error_check` — vendor error code → CNLLM exception type mapping

**Response config** (`configs/{vendor}/response_{vendor}.yaml`) drives:
- `fields` — vendor response path → OpenAI field path (e.g. `"content": "choices[0].message.content"`)
- `stream_fields` — same for streaming chunks (`content_path`, `tool_calls_path`, `reasoning_content_path`)
- `defaults` — fallback values when vendor omits fields
- `error_check` — sensitive content detection paths
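
To make the config-driven mapping more concrete, here is a small sketch of how `map`, `transform`, `skip`, and `model_mapping` could be applied to a request, and how a path such as `choices[0].message.content` could be resolved on a response. The config is an inline dict standing in for a parsed YAML file, and the schema, the `bool_to_enabled` transform, and the function names are assumptions made for illustration only.

```python
import re
from typing import Any, Callable, Dict

# Hypothetical stand-in for a parsed request_{vendor}.yaml; the real schema may differ.
REQUEST_CONFIG: Dict[str, Any] = {
    "required_fields": ["model", "messages"],
    "optional_fields": {
        "max_tokens": {"map": "max_completion_tokens"},  # rename
        "thinking": {"transform": "bool_to_enabled"},    # value conversion
        "api_version": {"skip": True},                   # excluded from body
    },
    "model_mapping": {"chat": {"demo-small": "vendor-demo-small-v1"}},
}

# Hypothetical named transforms referenced from the config.
TRANSFORMS: Dict[str, Callable[[Any], Any]] = {
    "bool_to_enabled": lambda v: {"type": "enabled" if v else "disabled"},
}


def build_payload(params: Dict[str, Any], config: Dict[str, Any] = REQUEST_CONFIG) -> Dict[str, Any]:
    """Apply required-field validation, skip/transform/map rules, and model aliases."""
    missing = [f for f in config["required_fields"] if f not in params]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    payload: Dict[str, Any] = {}
    for name, value in params.items():
        rule = config["optional_fields"].get(name, {})
        if rule.get("skip"):
            continue
        if "transform" in rule:
            value = TRANSFORMS[rule["transform"]](value)
        payload[rule.get("map", name)] = value
    aliases = config["model_mapping"]["chat"]
    payload["model"] = aliases.get(payload["model"], payload["model"])
    return payload


_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve_path(data: Any, path: str, default: Any = None) -> Any:
    """Follow a path like 'choices[0].message.content' through dicts and lists."""
    current = data
    for key, index in _TOKEN.findall(path):
        try:
            current = current[int(index)] if index else current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current
```

For example, `resolve_path(response, "choices[0].message.content")` returns the message text when the path exists and the `default` otherwise, which is one way the `defaults` section could be honored.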

The parameter processing order is: `resolve_default` (read scope defaults) → `validate_for_scope` (PARAM_REGISTRY + YAML + drop_params) → `_validate_one_of` → `_check_image_support` → `_build_payload` (YAML field mapping + get_vendor_model) → `get_base_url` + `get_api_path` → `get_header_mappings`.

### Field accumulation

Streaming responses accumulate into `adapter._cnllm_extra`:
- `_thinking` — raw reasoning/thinking content
- `_still` — cleaned final response content
- `_tools` — accumulated tool_calls

Accessible via `client.chat.think`, `client.chat.still`, `client.chat.tools`, `client.chat.raw`.
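
As a rough illustration of the accumulation idea, the sketch below folds OpenAI-style streaming chunks into the three buckets named above. It is a simplification under stated assumptions: the `accumulate` function is hypothetical, the chunk shape follows the OpenAI streaming delta format, and tool-call fragments are appended as-is rather than merged by index as a full implementation would likely do.

```python
from typing import Any, Dict, Iterable


def accumulate(chunks: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold streaming deltas into _thinking / _still / _tools buckets."""
    extra: Dict[str, Any] = {"_thinking": "", "_still": "", "_tools": []}
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a trailing usage-only chunk
        delta = choices[0].get("delta", {})
        extra["_thinking"] += delta.get("reasoning_content") or ""
        extra["_still"] += delta.get("content") or ""
        extra["_tools"].extend(delta.get("tool_calls") or [])
    return extra
```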

### FallbackManager

`FallbackManager` is only invoked when `chat.create()` is called **without** a `model` argument (or with `model=""`). If the primary model fails, it iterates through `fallback_models` in order. If a model is passed directly to `chat.create()`, no fallback occurs.
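
The fallback rule above can be sketched as a small function. This is not the actual `FallbackManager`: `create_with_fallback` and its `call` argument are hypothetical, and `fallback_models` is assumed to be any ordered iterable of model names (iterating a dict yields its keys in insertion order).

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def create_with_fallback(
    call: Callable[[str], T],
    primary: str,
    fallback_models: Iterable[str],
    model: Optional[str] = None,
) -> T:
    """Try the primary model, then each fallback in order.

    If the caller passes an explicit, non-empty `model`, no fallback happens.
    """
    if model:
        return call(model)
    last_error: Optional[Exception] = None
    for candidate in [primary, *fallback_models]:
        try:
            return call(candidate)
        except Exception as exc:  # a real implementation would catch narrower errors
            last_error = exc
    assert last_error is not None  # at least the primary model was attempted
    raise last_error
```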

### Sync/async relationship

`CNLLM` (sync) holds an internal `AsyncCNLLM` engine and delegates async operations to it. The `LangChainRunnable` integration uses the async engine directly.
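
One common way to build a sync client on top of an async engine is shown below. The class names are hypothetical and the sketch only illustrates the delegation idea; how `CNLLM` actually drives its internal `AsyncCNLLM` (for example, whether it uses `asyncio.run` or a dedicated event loop) is not stated here.

```python
import asyncio
from typing import Dict


class AsyncEngine:
    """Hypothetical async engine; a real one would perform HTTP requests."""

    async def create(self, prompt: str) -> Dict[str, str]:
        await asyncio.sleep(0)  # stand-in for network I/O
        return {"content": prompt.upper()}


class SyncFacade:
    """Hypothetical sync wrapper that delegates every call to the async engine."""

    def __init__(self) -> None:
        self._engine = AsyncEngine()

    def create(self, prompt: str) -> Dict[str, str]:
        # asyncio.run raises RuntimeError if an event loop is already running
        # in this thread, so this simple form only suits plain sync callers.
        return asyncio.run(self._engine.create(prompt))
```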

## Adding a New Vendor

1. Create `configs/{vendor}/request_{vendor}.yaml` and `configs/{vendor}/response_{vendor}.yaml`
2. Create `cnllm/core/vendor/{vendor}.py` with the three-component pattern; call `{Vendor}Adapter._register()` at the bottom
3. Add model alias → vendor name mappings to the YAML `model_mapping.chat` section
4. Write tests: unit tests in `tests/test_*.py` (no API key), integration tests in `tests/key_needed/` (with key assignment at the top: `MODEL = "..."; API_KEY = os.getenv("...")`)

Full walkthrough: see `docs/CONTRIBUTOR.md`.
📖 For full documentation and examples, visit the dedicated skill repository:
https://github.com/kanchengw/cnllm-skill
65 changes: 26 additions & 39 deletions README.md
@@ -124,20 +124,23 @@ CNLLM provides a **unified OpenAI-compatible interface layer** for Chinese LLMs and

### 1.1 Installation

#### 1.1.1 SDK Installation
#### 1.1.1 Install as Agent Skill (Recommended)

CNLLM provides a standard Agent Skill that follows the Claude Skills specification.

**Install the Skill**:
```bash
pip install cnllm
npx skills add kanchengw/cnllm-skill
```

#### 1.1.2 Install as Agent Skill
📖 For full documentation and examples, visit the CNLLM Skill repository:
https://github.com/kanchengw/cnllm-skill

**One-click install**:
#### 1.1.2 SDK Installation
```bash
npx skills add https://github.com/kanchengw/cnllm
pip install cnllm
```

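Or manually copy the `SKILL.md` file from the project root into your Agent's skill directory; **CNLLM will then be preferred when calling Chinese LLMs**.

### 1.2 Client Initialization

#### 1.2.1 Sync Client
@@ -266,7 +269,7 @@ print(resp.raw) # Fully accumulated native model response
| **raw**: native model response | `resp.raw` | `List[Dict]` | `[list of native streaming chunks]` |

**repr():**
In streaming calls, shows the **real-time result of merged chunks and accumulated fields** rather than the list of streaming chunks; the streaming response object type is unchanged and remains an **iterator** over all standard streaming chunks.
Shows the **real-time result of field-name aggregation and field-value accumulation** for the streaming response in a **dictionary structure** similar to a non-streaming response; the streaming response object type is unchanged and remains an **iterator** over all standard streaming chunks.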
```python
for chunk in resp:
print(resp)
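@@ -275,7 +278,7 @@ for chunk in resp:

### 2.2 chat completions Batch Calls

Input can be passed via the`prompt`or`messages`parameter with global parameters configured quickly, or each request can be configured independently via the`requests`parameter.
Input can be passed via the `prompt` or `messages` parameter with global parameters configured quickly, or each request can be configured independently via the `requests` parameter.

**prompt parameter:**

@@ -325,15 +328,12 @@ BatchResponse outer structure, where each response under the `results[request_id]` field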
```python
{
"status": {"elapsed": "3.42s", "success_count": 2, "fail_count": 1, "total": 3}, # Statistics
"usage": {"prompt_tokens": 5, "total_tokens": 5}, # Total usage for the batch
"errors": {"request_2": "error message"}, # Mapping of request_id to error message for all failed requests
"results": { # Mapping of request_id to standard response for all successful requests
"request_0": {...},
"request_1": {...}
},
"usage": {"prompt_tokens": 5, "total_tokens": 5}, # Total usage for the batch
"errors": {"request_2": "error message"}, # Mapping of request_id to error message for all failed requests
"results": {"request_0": {...}, "request_1": {...}}, # Mapping of request_id to standard response for all successful requests
"think": {"request_0": "...", "request_1": "..."},
"still": {"request_0": "...", "request_1": "..."},
"tools": {"request_0": [...], "request_1": [...]},
"tools": {"request_0": {...}, "request_1": {...}},
"raw": {"request_0": {...}, "request_1": {...}}
}
```
@@ -429,22 +429,12 @@ BatchEmbeddingResponse outer structure, where each response under the `results[request_id]` field

```python
{
"status": {
"elapsed": "3.35s", "success_count": 1, "fail_count": 1, "total": 2
},
"batch_info": {
"batch_size": 2, "batch_count": 2, "dimension": 1024
},
"status": {"elapsed": "3.35s", "success_count": 1, "fail_count": 1, "total": 2},
"batch_info": {"batch_size": 2, "batch_count": 2, "dimension": 1024},
"usage": {"prompt_tokens": 5, "total_tokens": 5},
"errors": {"request_1": "error message"},
"results": {
"request_0": {
"object": "list",
"data": [{"object": "embedding","embedding": [0.1, 0.2, ...], "index": 0}],
"model": "embedding-2"
}
}
"vectors": {"request_0": [...]}
"results": {"request_0": {...}, "request_1": {...}},
"errors": {"request_2": "error message"},
"vectors": {"request_0": [...]} # Mapping of request_id to embedding vector for all successful requests
}
```

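@@ -622,11 +612,8 @@ client = CNLLM(..., keep=["vectors"])
| Silent ignore mode | `drop_params="ignore"` | Silently drops unknown parameters without producing any logs |

**Notes:**
-For batch calls, if the global parameters contain unknown parameters, `drop_params="strict"` raises an exception immediately without starting the batch task;
If a single request within the batch contains unknown parameters, `drop_params="strict"` places that request in the `errors` field without executing it and continues with the remaining batch tasks.

- In particular, when `drop_params="strict"` and `stop_on_error=True` are both configured, the first error encountered interrupts the batch task immediately and the results of already-processed requests are returned; see [Stop on Error](#253-遇错停止).
- The `drop_params` parameter can be configured on the client and in all call methods (including single `create` calls).
- For batch calls, if the global parameters contain unknown parameters, `drop_params="strict"` raises an exception immediately without starting the batch task;
- If a single request within the batch contains unknown parameters, `drop_params="strict"` places that request in the `errors` field without executing it and continues with the remaining batch tasks.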

## 3. CNLLM Standard Response Format

@@ -717,7 +704,7 @@ CNLLM request parameters are largely consistent with **OpenAI standard parameters**; coverage is based on

| Parameter | Type | Default | Description |
| ------------------- | ------------------------------- | ------------------------------- | ------------------------------------------------------ |
| `model` | `str` | - | Model name; required at client initialization, can be overridden at the call entry |
| `model` | `str` | - | Model name; see [Supported Models](#支持的模型) |
| `api_key` | `str` | - | API key |
| `base_url` | `str` | Auto-adapted | Customizable API address |
| `messages` | `list[dict]`/`list[list[dict]]` | - | `chat()` input parameter; supports context management/image recognition (call-entry configuration only) |
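@@ -783,7 +770,7 @@ Parameters defined internally by CNLLM that control internal execution behavior or strategy; not to the API
| `max_retries` | `int` | `3` | Maximum number of retries |
| `retry_delay` | `float` | `1.0` | Retry delay (seconds) |
| `fallback_models`¹ | `dict` | - | Fallback models (client initialization only); see the note below |
| `drop_params` | `str` | `"warn"` | See [Unknown Parameter Handling Strategy](#255) |
| `drop_params` | `str` | `"warn"` | See [Unknown Parameter Handling Strategy](#255-未知参数处理策略) |

¹`fallback_models` model fallback strategy: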

@@ -819,7 +806,7 @@ fallback_models = {
| `stop_on_error` | `bool` | `False` | Stop subsequent requests on error and return already-processed results |
| `callbacks` | `list` | - | List of progress callback functions |
| `custom_ids` | `list[str]` | - | List of custom request IDs |
| `keep` | `set/list` | See [Field Storage Control](#254) | Data fields retained after iteration |
| `keep` | `set/list` | See [Field Storage Control](#254-字段存储控制) | Data fields retained after iteration |

## 5. Framework Integration

52 changes: 32 additions & 20 deletions README_en.md
@@ -124,20 +124,23 @@ Project Documentation:

### 1.1 Installation

#### 1.1.1 SDK Installation
#### 1.1.1 Install as Agent Skill (Recommended)

CNLLM now provides a dedicated Agent Skill following the Claude Skills / Agent Skills standard.

**Install the skill**:
```bash
pip install cnllm
npx skills add kanchengw/cnllm-skill
```

#### 1.1.2 Install as Agent Skill
📖 For full documentation and examples, visit the dedicated skill repository:
https://github.com/kanchengw/cnllm-skill

**One-Click Install**:
#### 1.1.2 SDK Installation
```bash
npx skills add https://github.com/kanchengw/cnllm
pip install cnllm
```

Or manually copy the `SKILL.md` file from the project root to your agent's skill directory. When **calling Chinese LLMs, CNLLM will be used as the preferred option**.

### 1.2 Client Initialization

#### 1.2.1 Sync Client
@@ -266,7 +269,7 @@ In streaming calls, access via `for` loop with **real-time accumulation** for re
| **raw**: model native response | `resp.raw` | `Dict` | `{"id": "...", "choices": [...], ...}` |

**repr():**
During streaming, displays **real-time merged chunks and accumulated field results**, not the real-time streaming chunks list; does not change the streaming response object type, which is an **iterator** containing all standard streaming chunks.
Displays the **real-time aggregation of field keys and accumulation of field values** in a **dictionary format** similar to a non-streaming response; this does not change the streaming response object type, which remains an **iterator** containing all standard streaming chunks.
```python
for chunk in resp:
print(resp)
@@ -326,15 +329,12 @@ BatchResponse outer structure, where each response under `results[request_id]` i
```python
{
"status": {"elapsed": "3.42s", "success_count": 2, "fail_count": 1, "total": 3}, # Statistics
"usage": {"prompt_tokens": 5, "total_tokens": 5}, # Batch processing total usage info
"errors": {"request_2": "error message"}, # Mapping of all failed requests' request_id and error messages
"results": { # Mapping of all successful requests' request_id and standard responses
"request_0": {...},
"request_1": {...}
},
"usage": {"prompt_tokens": 5, "total_tokens": 5}, # Batch processing total usage info
"errors": {"request_2": "error message"}, # Mapping of all failed requests' request_id and error messages
"results": {"request_0": {...}, "request_1": {...}}, # Mapping of all successful requests' request_id and standard responses
"think": {"request_0": "...", "request_1": "..."},
"still": {"request_0": "...", "request_1": "..."},
"tools": {"request_0": [...], "request_1": [...]},
"tools": {"request_0": {...}, "request_1": {...}},
"raw": {"request_0": {...}, "request_1": {...}}
}
```
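
As a usage illustration, the helper below walks a plain dict shaped like the structure above and produces a short report. It is a sketch under assumptions: `summarize_batch` is a hypothetical function, and it operates on a dict with the same keys rather than on the library's actual `BatchResponse` object.

```python
from typing import Any, Dict, List


def summarize_batch(batch: Dict[str, Any]) -> List[str]:
    """Build a human-readable summary from a BatchResponse-shaped dict."""
    status = batch["status"]
    lines = [f'{status["success_count"]}/{status["total"]} succeeded in {status["elapsed"]}']
    # Failed requests carry only an error message.
    for request_id, message in batch.get("errors", {}).items():
        lines.append(f"{request_id} failed: {message}")
    # Successful requests expose their cleaned content under "still".
    for request_id in batch.get("results", {}):
        lines.append(f'{request_id}: {batch.get("still", {}).get(request_id, "")}')
    return lines
```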
@@ -426,6 +426,21 @@ resp.to_dict() # Default: keeps vectors field + metadata (status/u
resp.to_dict(results=True) # Keeps results field + metadata (status/usage/batch_info)
```

#### 2.3.3 Embeddings Batch Response Structure

BatchEmbeddingResponse outer structure, where each response under `results[request_id]` is in **OpenAI standard Embeddings response structure**:

```python
{
"status": {"elapsed": "3.35s", "success_count": 1, "fail_count": 1, "total": 2},
"batch_info": {"batch_size": 2, "batch_count": 2, "dimension": 1024},
"usage": {"prompt_tokens": 5, "total_tokens": 5},
"results": {"request_0": {...}, "request_1": {...}},
"errors": {"request_2": "error message"},
"vectors": {"request_0": [...]} # Mapping of all successful requests' request_id and embedding vectors
}
```

### 2.4 Batch Call Control Parameters

Batch calls support **retry strategy, concurrency control** parameter configuration:
@@ -544,10 +559,7 @@ Use `drop_params` to control the handling behavior of **incompatible parameters

**Notes:**
- When doing batch calls, if global parameters contain unknown parameters, `drop_params="strict"` directly throws an exception without actually starting the batch task;
If a single request within the batch task contains unknown parameters, `drop_params="strict"` directly puts that request into the `errors` field without actually executing that request, and continues executing subsequent batch tasks.

- Specifically, when configured with `drop_params="strict"` and `stop_on_error=True`, the first error encountered in batch requests immediately interrupts the batch task while returning already processed request results. See [Stop on Error](#253-stop-on-error).
- The `drop_params` parameter supports client configuration and all calling methods (including `create` single-call method).
- If a single request within the batch task contains unknown parameters, `drop_params="strict"` directly puts that request into the `errors` field without actually executing that request, and continues executing subsequent batch tasks.
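
The batch behavior described in these notes can be sketched as follows. Everything here is illustrative: `filter_params`, `run_batch`, and `UnknownParameterError` are hypothetical names, the set of known parameters is passed in explicitly, and it is assumed that `"warn"` drops unknown parameters after logging a warning.

```python
import logging
from typing import Any, Collection, Dict, List

logger = logging.getLogger("drop_params_sketch")


class UnknownParameterError(ValueError):
    """Hypothetical exception type; the library's real exception may differ."""


def filter_params(params: Dict[str, Any], known: Collection[str], drop_params: str = "warn") -> Dict[str, Any]:
    """Drop unknown parameters according to the selected mode."""
    unknown = sorted(k for k in params if k not in known)
    if unknown:
        if drop_params == "strict":
            raise UnknownParameterError(f"unknown parameters: {unknown}")
        if drop_params == "warn":
            logger.warning("dropping unknown parameters: %s", unknown)
        # "ignore": drop silently
    return {k: v for k, v in params.items() if k in known}


def run_batch(
    global_params: Dict[str, Any],
    requests: List[Dict[str, Any]],
    known: Collection[str],
    drop_params: str = "warn",
) -> Dict[str, Dict[str, Any]]:
    # Global parameters are checked first: in strict mode this raises
    # before any request is processed, so the batch never starts.
    shared = filter_params(global_params, known, drop_params)
    results: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for i, request in enumerate(requests):
        request_id = f"request_{i}"
        try:
            # A real client would send the merged parameters to the vendor here.
            results[request_id] = {**shared, **filter_params(request, known, drop_params)}
        except UnknownParameterError as exc:
            # Per-request failures are recorded and the batch continues.
            errors[request_id] = str(exc)
    return {"results": results, "errors": errors}
```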

## 3. CNLLM Standard Response Format

@@ -638,7 +650,7 @@ Note: Not all supported models support all request parameters. Please refer to v

| Parameter | Type | Default | Description |
| ------------------- | ------------------------------- | ------------------------------- | ------------------------------------------------------ |
| `model` | `str` | - | Model name, required at client initialization, can be overridden at call entry |
| `model` | `str` | - | Model name, see [Supported Models](#supported-models)|
| `api_key` | `str` | - | API key |
| `base_url` | `str` | Auto-adapted | Customizable API address |
| `messages` | `list[dict]`/`list[list[dict]]` | - | `chat()` input parameter, supports context management/image recognition (call entry configuration only) |