From ba8c5b31b7122ddc02d16cf447ab9855031c956b Mon Sep 17 00:00:00 2001 From: hlin99 Date: Mon, 6 Apr 2026 21:01:15 +0800 Subject: [PATCH] =?UTF-8?q?docs:=20unified=20structure=20=E2=80=94=20LICEN?= =?UTF-8?q?SE,=20README,=20CONTRIBUTING,=20bot/?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add Apache 2.0 LICENSE - Rewrite README.md (concise, one-screen, links to guide) - Add CONTRIBUTING.md - Move bot files to bot/ (DESIGN_PRINCIPLES, DEV_LOOP, REVIEW_POLICY, iterations/) - Add documentation update rules to DEV_LOOP.md --- CONTRIBUTING.md | 33 +++++ LICENSE | 202 ++++++++++++++++++++++++++++ README.md | 61 +++++---- REVIEW_POLICY.md | 62 --------- bot/AUTHOR_POLICY.md | 55 ++++++++ bot/BOT_POLICY.md | 23 ++++ bot/DESIGN_PRINCIPLES.md | 37 +++++ bot/DEV_LOOP.md | 34 +++++ bot/ENTRY.md | 18 +++ bot/REVIEW_POLICY.md | 32 +++++ {docs => bot}/iterations/current.md | 0 docs/DESIGN_PRINCIPLES.md | 90 ------------- docs/DEV_LOOP.md | 85 ------------ 13 files changed, 466 insertions(+), 266 deletions(-) create mode 100644 CONTRIBUTING.md create mode 100644 LICENSE delete mode 100644 REVIEW_POLICY.md create mode 100644 bot/AUTHOR_POLICY.md create mode 100644 bot/BOT_POLICY.md create mode 100644 bot/DESIGN_PRINCIPLES.md create mode 100644 bot/DEV_LOOP.md create mode 100644 bot/ENTRY.md create mode 100644 bot/REVIEW_POLICY.md rename {docs => bot}/iterations/current.md (100%) delete mode 100644 docs/DESIGN_PRINCIPLES.md delete mode 100644 docs/DEV_LOOP.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..31c5fea --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,33 @@ +# Contributing to xPyD-bench + +## Development Setup + +```bash +git clone https://github.com/xPyD-hub/xPyD-bench +cd xPyD-bench +pip install -e ".[dev]" +``` + +## Running Tests + +```bash +pytest tests/ -q +``` + +## Code Style + +- Python 3.10+ +- Ruff for linting: `ruff check xpyd_bench/ tests/` +- All PRs must pass CI (lint + tests 
on 3.10/3.11/3.12 + integration trigger) + +## PR Process + +1. Create a branch from `main` +2. Make changes, add tests +3. Push and open PR +4. CI runs: unit tests + integration tests (via trigger) +5. Review and merge + +## Bot Development + +See [bot/](bot/) for automated development policies and iteration records. diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. 
+ + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. 
+ + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/README.md b/README.md index d0a47a1..0438b70 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,17 @@ -📖 **[完整使用指南 → docs/guide.md](docs/guide.md)** - # xPyD-bench -Benchmarking & PD ratio planning tool for [xPyD-proxy](https://github.com/xPyD-hub/xPyD-proxy). +**Benchmarking & PD ratio planning tool for LLM inference endpoints.** -## Features +xPyD-bench measures the performance of OpenAI-compatible LLM serving endpoints with detailed latency, throughput, and quality metrics. Built as a superset of vLLM bench with full CLI compatibility. -- **`xpyd-bench`** — Benchmark xPyD proxy with configurable concurrency, request patterns, and both `/v1/completions` and `/v1/chat/completions` endpoints +## Key Features -For PD ratio planning, see [xPyD-plan](https://github.com/xPyD-hub/xPyD-plan). +- **vLLM bench compatible CLI** — drop-in replacement, same arguments +- **Rich metrics** — TTFT, TPOT, ITL, P50/P90/P95/P99, throughput +- **Flexible load patterns** — constant, burst, ramp, poisson, custom +- **Multiple datasets** — JSONL, CSV, JSON, synthetic generation +- **Advanced analysis** — comparison, regression detection, SLA validation, cost estimation +- **Reports** — JSON, CSV, Markdown, HTML dashboard, JUnit XML, Prometheus ## Install @@ -16,37 +19,37 @@ For PD ratio planning, see [xPyD-plan](https://github.com/xPyD-hub/xPyD-plan). 
pip install xpyd-bench ``` -## Quick Start +Or as part of the full xPyD toolkit: + +```bash +pip install xpyd +``` -### Benchmark +## Quick Start ```bash -# Run benchmark against a running xPyD proxy -xpyd-bench --target http://localhost:8080 \ - --endpoint chat \ - --concurrency 16 \ - --num-requests 200 \ - --output results.json - -# Use completion endpoint -xpyd-bench --target http://localhost:8080 \ - --endpoint completion \ - --concurrency 8 \ - --num-requests 100 +# Benchmark a running endpoint +xpyd-bench --base-url http://localhost:8080 \ + --model my-model \ + --dataset-name random \ + --num-prompts 100 + +# Compare two runs +xpyd-bench compare baseline.json candidate.json ``` -## Configuration +## Part of xPyD -See [examples/](examples/) for sample configs and scenarios. +xPyD-bench is part of the [xPyD ecosystem](https://github.com/xPyD-hub/xPyD) for PD-disaggregated LLM serving: -## Output Metrics +| Component | Description | +|-----------|-------------| +| [xpyd-proxy](https://github.com/xPyD-hub/xPyD-proxy) | Prefill-Decode disaggregated proxy | +| [xpyd-sim](https://github.com/xPyD-hub/xPyD-sim) | OpenAI-compatible inference simulator | +| **xpyd-bench** | Benchmarking & planning tool | -- **TTFT** — Time to first token -- **TPS** — Tokens per second (per request & aggregate) -- **Latency** — P50 / P90 / P99 end-to-end latency -- **Throughput** — Total requests/sec and tokens/sec -- **Error rate** — Failed requests count and percentage +📖 **[Full Guide →](docs/guide.md)** | 💡 **[Examples →](examples/)** | 🏗️ **[Contributing →](CONTRIBUTING.md)** ## License -TBD +Apache 2.0 — see [LICENSE](LICENSE) diff --git a/REVIEW_POLICY.md b/REVIEW_POLICY.md deleted file mode 100644 index 4e3ee8a..0000000 --- a/REVIEW_POLICY.md +++ /dev/null @@ -1,62 +0,0 @@ -# Review Policy - -## Roles - -| Role | GitHub Account | Action | -|------|---------------|--------| -| Implementer | `hlin99` | Write code, submit PRs, fix issues | -| Reviewer 
1 | `hlin99-Review-Bot` | Review PRs: approve / request changes / close | -| Reviewer 2 | `hlin99-Review-BotX` | Review PRs: approve / request changes / close | - -## Timing - -| Parameter | Value | -|-----------|-------| -| Iteration interval | 10 minutes | -| PR wait for review | max 15 minutes | -| Fix after request changes | max 10 minutes | -| Reviewer check frequency | every 5 minutes | -| Reviewer response deadline | 15 minutes after assign | -| Reviewer timeout action | close PR (iteration failed) | -| Total round timeout | 1 hour from PR creation | -| Round timeout action | close PR (iteration failed) | - -## Review Criteria - -Reviewers evaluate each PR on two dimensions: - -### 1. Idea Value -- Is the direction/approach valuable for the project? -- Does it align with the project goals? -- **If NO → close PR immediately** (one close = PR rejected) - -### 2. Code Quality -- Is the code correct? -- Are tests included/passing? -- Is `docs/iterations/current.md` updated with clear description? -- Does `docs/guide.md` reflect changes (if applicable)? 
-- **If idea is good but code has issues → request changes** - -## Decision Rules - -| Scenario | Action | -|----------|--------| -| Both reviewers approve | Auto-merge | -| One approves, one requests changes | Implementer fixes, reviewers re-review | -| Either reviewer closes | PR closed, iteration failed | -| Both approve after fixes | Auto-merge | -| Timeout (15min no review) | PR closed, iteration failed | -| Total timeout (1 hour) | PR closed, iteration failed | - -## Iteration Record - -Every PR MUST update `docs/iterations/current.md` with: -- What was done this iteration -- Result: merged / closed (with reason) -- Reviewer scores/comments summary - -## Auto-Merge Requirements - -- 2 approvals from designated reviewers -- CI passes (all checks green) -- No unresolved review comments diff --git a/bot/AUTHOR_POLICY.md b/bot/AUTHOR_POLICY.md new file mode 100644 index 0000000..9f28aa9 --- /dev/null +++ b/bot/AUTHOR_POLICY.md @@ -0,0 +1,55 @@ + + + +# Author Policy — xPyD-bench + +Rules for the bot that writes code and submits PRs. + +## Identity + +| Role | GitHub Account | +|------|---------------| +| Author | `hlin99` | + +## Before Coding + +1. Pull latest main: `git pull origin main` +2. Create feature branch: `git checkout -b /` +3. Read [DESIGN_PRINCIPLES.md](DESIGN_PRINCIPLES.md) for architecture constraints. + +## Code Quality + +1. Run lint: `ruff check xpyd_bench tests` +2. Run tests: `pytest tests/ -q` +3. Rebase before push: `git pull --rebase origin main` + +## PR Submission + +1. **One PR per task.** Don't bundle unrelated changes. +2. **Descriptive title.** Format: `type: short description` (e.g., `feat: add SLA validation`). +3. **PR body must include:** what changed, why, test coverage, breaking changes. +4. **All CI must pass** before requesting review. + +## Responding to Review + +1. Fix all blockers before re-requesting review. +2. Reply to every comment — "Fixed in " or explain disagreement with evidence. +3. 
Push new commits (don't force push over reviewer comments). +4. **Never force push.** If the branch is too messy, close the PR and open a new one. + +## Documentation Updates + +Every PR must update relevant documentation: + +| Change Type | Update | +|---|---| +| New feature / CLI argument | `docs/guide.md` — add usage section | +| Architecture change | `docs/architecture.md` — update descriptions | +| Design decision | `docs/design.md` — append decision record | +| Quick Start affected | `README.md` — update (keep it one screen max, link to guide.md) | +| PR completed | `bot/iterations/current.md` — append summary | + +`docs/guide.md` is the source of truth for how to use the tool. +`docs/architecture.md` and `docs/design.md` are append-only — never delete history. + +When current iteration is complete, rename `bot/iterations/current.md` to `YYYY-MM-DD-.md` and create a fresh `current.md`. diff --git a/bot/BOT_POLICY.md b/bot/BOT_POLICY.md new file mode 100644 index 0000000..934ebfc --- /dev/null +++ b/bot/BOT_POLICY.md @@ -0,0 +1,23 @@ + + + +# Bot Policy — xPyD-bench + +## Language +- **English only** — all code, docs, issues, PRs, comments on GitHub must be in English. No Chinese characters. + +## Code Rules +- All changes go through PR. Never push directly to main. +- Every PR must have tests. No untested code. +- CI must be 100% green before merge. No skips allowed. +- No test may be skipped. If a test can't run, fix it or remove it. +- Rebase to latest main before pushing. + +## Testing +- Unit tests in `tests/` — pure bench logic, no external dependencies. +- Integration tests in [xPyD-integration](https://github.com/xPyD-hub/xPyD-integration) — cross-component tests. + +## Architecture +- Bench is a pure client tool. No server components. +- All inference backend interaction goes through xPyD-integration tests. +- Follow vLLM bench CLI compatibility (see [DESIGN_PRINCIPLES.md](DESIGN_PRINCIPLES.md)). 
diff --git a/bot/DESIGN_PRINCIPLES.md b/bot/DESIGN_PRINCIPLES.md new file mode 100644 index 0000000..69ff7f9 --- /dev/null +++ b/bot/DESIGN_PRINCIPLES.md @@ -0,0 +1,37 @@ + + + +# xPyD-bench Design Principles + +## Core Positioning +A comprehensive benchmarking tool for LLM inference endpoints, built as an enhancement on top of vLLM bench. + +## CLI Compatibility +- **CLI arguments must be fully compatible with vLLM bench** — users can switch from `python benchmark_serving.py` to `xpyd-bench` without changing their command line +- Extended features beyond vLLM bench CLI should be configured via **YAML config file** (`--config config.yaml`) +- Basic usage = CLI only (vLLM bench compatible), advanced usage = CLI + YAML + +## Alignment with vLLM Bench +- CLI arguments must align with vLLM bench where applicable +- Output format must align with vLLM bench +- We build incremental improvements on top of vLLM bench, not a replacement + +## Areas of Enhancement +- **Full OpenAI API coverage**: every parameter matters — all 4 input formats, temperature, top_k, top_p, frequency_penalty, presence_penalty, stop sequences, logprobs, etc. No omissions. +- **Flexible request rate patterns**: vLLM bench only supports per-second rate. Support per-5s, per-10s, burst patterns, ramp-up/ramp-down, Poisson distribution, custom patterns. +- **Rich dataset input**: support JSONL, JSON, CSV — let users bring their own data easily. +- **Extended metrics**: beyond what vLLM bench provides. + +## Architecture +- Bench is a **pure client tool** — it sends requests and measures responses. +- No built-in server or simulator. Backend simulation is handled by [xPyD-sim](https://github.com/xPyD-hub/xPyD-sim). +- Integration tests (bench + sim, bench + proxy + sim) live in [xPyD-integration](https://github.com/xPyD-hub/xPyD-integration). +- Code in `xpyd_bench/`, tests in `tests/`. 
+ +## Code Organization +- `xpyd_bench/bench/` — core benchmark runner, metrics, rate patterns +- `xpyd_bench/reporting/` — output formats (JSON, CSV, HTML, Prometheus) +- `xpyd_bench/scenarios/` — preset configurations +- `xpyd_bench/distributed/` — multi-worker coordination +- `xpyd_bench/plugins/` — backend plugin system +- `tests/` — unit tests only (no external dependencies) diff --git a/bot/DEV_LOOP.md b/bot/DEV_LOOP.md new file mode 100644 index 0000000..06b4eb2 --- /dev/null +++ b/bot/DEV_LOOP.md @@ -0,0 +1,34 @@ + + + +# Development Loop — xPyD-bench + +Autonomous iteration loop. References policies for rules — this file only describes the operational workflow. + +## Rules + +All rules are defined in policy files. Read them first: +- [BOT_POLICY.md](BOT_POLICY.md) — hard constraints +- [AUTHOR_POLICY.md](AUTHOR_POLICY.md) — code quality, PR process, doc updates +- [REVIEW_POLICY.md](REVIEW_POLICY.md) — timing, review standards + +## Setup (every iteration) +``` +git config user.email "tony.lin@intel.com" +git config user.name "hlin99" +``` + +## Each Iteration + +1. Pull latest code +2. Read `ROADMAP.md` — find the next incomplete milestone +3. Read `DESIGN_PRINCIPLES.md` — follow the rules +4. Check open issues/PRs — handle unmerged PRs first +5. Create GitHub Issue: problem, solution, acceptance criteria, tests +6. Create branch, implement code + tests +7. Verify locally (lint + tests per AUTHOR_POLICY.md) +8. Push, create PR, request review +9. Wait for review (timing per REVIEW_POLICY.md) +10. Fix review comments, iterate until approved +11. Merge and update `bot/iterations/current.md` +12. Next iteration diff --git a/bot/ENTRY.md b/bot/ENTRY.md new file mode 100644 index 0000000..3cf7125 --- /dev/null +++ b/bot/ENTRY.md @@ -0,0 +1,18 @@ + + + + +# Bot Entry Point + +Read this file first when starting any automated task on this repo. + +## Required Reading (in order) + +1. **[BOT_POLICY.md](BOT_POLICY.md)** — Hard rules. 
Must follow. +2. **[AUTHOR_POLICY.md](AUTHOR_POLICY.md)** — Rules for writing code and submitting PRs. +3. **[REVIEW_POLICY.md](REVIEW_POLICY.md)** — Rules for reviewing PRs. +4. **[DESIGN_PRINCIPLES.md](DESIGN_PRINCIPLES.md)** — Architecture constraints and design rules. +5. **[DEV_LOOP.md](DEV_LOOP.md)** — Development workflow (operational steps, references policies above). +6. **[iterations/current.md](iterations/current.md)** — Current task context. + +Files 1-4 are mandatory for all repos. Files 5-6 are repo-specific. diff --git a/bot/REVIEW_POLICY.md b/bot/REVIEW_POLICY.md new file mode 100644 index 0000000..9448264 --- /dev/null +++ b/bot/REVIEW_POLICY.md @@ -0,0 +1,32 @@ + + + +# Review Policy — xPyD-bench + +## Roles + +| Role | GitHub Account | Action | +|------|---------------|--------| +| Implementer | `hlin99` | Write code, submit PRs, fix issues | +| Reviewer 1 | `hlin99-Review-Bot` | Review PRs: approve / request changes / close | +| Reviewer 2 | `hlin99-Review-BotX` | Review PRs: approve / request changes / close | + +## Timing Parameters + +These are the single source of truth for all timing values: + +| Parameter | Value | +|-----------|-------| +| Iteration interval | 10 minutes | +| PR wait for review | max 15 minutes | +| Fix after request changes | max 10 minutes | +| Reviewer check frequency | every 5 minutes | +| Reviewer response deadline | 15 minutes after assign | +| Reviewer timeout action | close PR (iteration failed) | + +## Review Standards + +- At least 1 approval required to merge. +- Blockers (🔴) must be fixed. No exceptions. +- Yellow (🟡) issues should be fixed unless author provides good reason. +- All CI checks must pass. 
diff --git a/docs/iterations/current.md b/bot/iterations/current.md similarity index 100% rename from docs/iterations/current.md rename to bot/iterations/current.md diff --git a/docs/DESIGN_PRINCIPLES.md b/docs/DESIGN_PRINCIPLES.md deleted file mode 100644 index 01853b7..0000000 --- a/docs/DESIGN_PRINCIPLES.md +++ /dev/null @@ -1,90 +0,0 @@ -# xPyD-bench Design Principles - -## Core Positioning -A comprehensive benchmarking tool for LLM inference endpoints, built as an enhancement on top of vLLM bench. - -## CLI Compatibility -- **CLI arguments must be fully compatible with vLLM bench** — users can switch from `python benchmark_serving.py` to `xpyd-bench` without changing their command line -- Extended features beyond vLLM bench CLI should be configured via **YAML config file** (`--config config.yaml`) -- Basic usage = CLI only (vLLM bench compatible), advanced usage = CLI + YAML - -## Alignment with vLLM Bench -- CLI arguments must align with vLLM bench where applicable -- Output format must align with vLLM bench -- We build incremental improvements on top of vLLM bench, not a replacement - -## Areas of Enhancement (think creatively, these are examples) -- **Full OpenAI API coverage**: every parameter matters — all 4 input formats, temperature, top_k, top_p, frequency_penalty, presence_penalty, stop sequences, logprobs, etc. No omissions. -- **Flexible request rate patterns**: vLLM bench only supports per-second rate. Support per-5s, per-10s, burst patterns, ramp-up/ramp-down, Poisson distribution, custom patterns. -- **Rich dataset input**: support JSONL, JSON, CSV — let users bring their own data easily. -- **Extended metrics**: beyond what vLLM bench provides. -- Think about what else users need that vLLM bench doesn't offer. - -## Dummy Server — vLLM Boundary Rule (IMPORTANT) - -The dummy server simulates a **vLLM backend** for bench validation. It must stay strictly within vLLM's API surface: - -### Hard Rules -1. 
**Only implement features that vLLM actually supports.** If vLLM doesn't have it, the dummy server must not have it. -2. **OpenAI API parameters**: only those that vLLM's OpenAI-compatible server accepts (including vLLM extensions like `best_of`, `top_k`, `min_p`, etc.). -3. **Response format**: must match vLLM's response structure, including vLLM-specific fields (`stop_reason`, `service_tier`, `kv_transfer_params`). -4. **No test-only hacks**: features like gzip decompression, rate-limit simulation (429/X-RateLimit headers), custom header echo, speculative decoding metadata injection, or online /v1/batches API do NOT belong in the dummy server — they are not vLLM behaviors. - -### API Compatibility Levels - -The dummy server implements two levels of API compatibility, matching xPyD-sim: - -**Level 1: OpenAI API Spec** -- Accept and validate all OpenAI API parameters -- Response format matches OpenAI spec -- Parameter range validation (temperature, top_p, penalties) -- response_format (json_object / json_schema) -- Embedding encoding_format (float / base64) - -**Level 2: vLLM Backend Extensions** -- Accept all vLLM-specific sampling params without error -- Response includes vLLM fields: `stop_reason`, `service_tier` -- base64 encoding uses little-endian byte order - -### What Belongs Here vs. 
Elsewhere -| Need | Where to implement | -|---|---| -| Simulating vLLM inference behavior | ✅ Dummy server | -| Testing bench's own features (compression, rate-limit tracking, header injection) | ❌ NOT dummy server — use a separate test fixture/middleware | -| Features vLLM doesn't support | ❌ NOT dummy server | - -### API Compatibility Levels - -The dummy server implements two levels of API compatibility, matching xPyD-sim: - -**Level 1: OpenAI API Spec** -- Accept and validate all OpenAI API parameters -- Response format matches OpenAI spec -- Parameter range validation (temperature, top_p, penalties) -- response_format (json_object / json_schema) -- Embedding encoding_format (float / base64, little-endian byte order) - -**Level 2: vLLM Backend Extensions** -- Accept all vLLM-specific sampling params without error -- Response includes vLLM fields: `stop_reason`, `service_tier` -- Responses match vLLM's shape, not just OpenAI's - -### Co-Evolution with xPyD-sim -- The dummy server will eventually be **replaced by xPyD-sim** as the canonical vLLM simulator. -- Any feature added to the dummy must also exist (or be planned) in xPyD-sim. -- When in doubt, check vLLM's source: `vllm/entrypoints/openai/` is the reference. -- Code must be decoupled from bench — separate module, no imports between them. 
- -## Principles -- **Independent thinking**: reference vLLM bench for alignment, but design our own enhancements -- **Data-driven**: all metrics from real measurements -- **User-friendly**: easy CLI, sensible defaults, clear output -- **Rigorous**: every parameter, every edge case matters - -## Rules -- Committer must be `hlin99 ` -- All code, docs, issues, PRs in English -- Commit messages: conventional commits format -- Code in `xpyd_bench/`, tests in `tests/` -- Dummy server in `xpyd_bench/dummy/` (decoupled from bench code) -- Follow pyproject.toml ruff/isort config diff --git a/docs/DEV_LOOP.md b/docs/DEV_LOOP.md deleted file mode 100644 index a889be4..0000000 --- a/docs/DEV_LOOP.md +++ /dev/null @@ -1,85 +0,0 @@ -# Development Loop - -Autonomous infinite loop. Runs until explicitly stopped. - -## Setup (every iteration) -``` -git config user.email "tony.lin@intel.com" -git config user.name "hlin99" -``` - -## Each Iteration - -1. Pull latest code -2. Read `ROADMAP.md` — find the next incomplete milestone -3. Read `DESIGN_PRINCIPLES.md` — follow the rules -4. Check open issues/PRs — handle unmerged PRs first (fix CI failures, address review comments) -5. If no milestone left, create new ones (see Phase 2 below) -6. Create GitHub Issue: problem, solution, acceptance criteria, tests -7. Create branch, implement code + tests -8. Pass lint: `ruff check src tests && isort --check src tests` -9. Update `docs/iterations/current.md` with what you did this iteration -10. Create PR (body contains `Closes #N`) -11. Wait for CI green. Fix failures. Never merge red CI. -12. **Wait for reviewer bots** — do NOT self-merge. Two reviewer bots (`hlin99-Review-Bot` and `hlin99-Review-BotX`) will be auto-assigned. -13. 
Handle review result: - - **2 approvals** → auto-merge → update ROADMAP.md → go to step 1 - - **request changes** → fix code, push to same PR → wait for re-review (max 10 min to fix) - - **closed by reviewer** → iteration failed → push update to `docs/iterations/current.md` on main recording the failure (what was attempted, why rejected, reviewer comments) → go to step 1 with a different task -14. Go to step 1 - -## Review Rules (see REVIEW_POLICY.md) - -- 2 reviewer bots are auto-assigned on PR creation -- Either reviewer can close the PR (idea rejected) — one close = PR dead -- Both must approve for merge -- Reviewer timeout: 15 minutes → PR auto-closed -- Total round timeout: 1 hour → PR auto-closed -- Implementer (hlin99) must NEVER approve or merge their own PR - -## Timing - -| Parameter | Value | -|-----------|-------| -| Iteration interval | 10 minutes | -| PR wait for review | max 15 minutes | -| Fix after request changes | max 10 minutes | -| Total round timeout | 1 hour | - -## Deliverables (every iteration) - -Every PR MUST include: -- Code changes (if any) -- Tests for new code -- Updated `docs/iterations/current.md` describing what was done - -## Rules -- Committer must be `hlin99 ` — always set git config before any commit -- All code, docs, issues, PRs in English -- Commit messages: conventional commits format -- Never self-merge — wait for reviewer bots - -## Phase 1: Roadmap-Driven -Follow ROADMAP.md milestones in order. - -## Phase 2: Continuous Evolution -When all milestones are done: -1. Review the project — find limitations, improvements, new scenarios -2. Create new milestones in ROADMAP.md -3. 
Return to Phase 1 - -## Iteration Tracking - -`docs/iterations/current.md` must maintain a running log at the bottom: - -```markdown -## Iteration History - -| # | Date | Task | Result | Reviewer Comments | -|---|------|------|--------|-------------------| -| 1 | 2026-04-06 | Added X feature | ✅ merged | Both approved | -| 2 | 2026-04-06 | Refactored Y | ❌ closed | BotX: idea not valuable | -| 3 | 2026-04-06 | Fixed Z bug | ✅ merged | Bot requested changes, fixed | -``` - -This table is the source of truth for iteration success/failure rate.
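
Reviewer note (not part of the patch): the removed `docs/DEV_LOOP.md` describes the iteration history table as the source of truth for the success/failure rate. A minimal sketch of how that rate could be derived from such a table, assuming the five-column layout shown in the removed file and that the Result cell contains the word "merged" or "closed". The function name `iteration_stats` and the `SAMPLE` text are hypothetical and not part of xPyD-bench.

```python
from typing import Dict


def iteration_stats(markdown: str) -> Dict[str, float]:
    """Count merged/closed rows in an iteration history table (hypothetical helper)."""
    merged = closed = 0
    for line in markdown.splitlines():
        # Split a Markdown table row into its cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Data rows have five cells and start with a numeric iteration number;
        # this skips the header row, the separator row, and any prose.
        if len(cells) != 5 or not cells[0].isdigit():
            continue
        result = cells[3].lower()
        if "merged" in result:
            merged += 1
        elif "closed" in result:
            closed += 1
    total = merged + closed
    rate = merged / total if total else 0.0
    return {"merged": merged, "closed": closed, "success_rate": rate}


# Assumed sample mirroring the table layout from the removed DEV_LOOP.md.
SAMPLE = """\
| # | Date | Task | Result | Reviewer Comments |
|---|------|------|--------|-------------------|
| 1 | 2026-04-06 | Added X feature | merged | Both approved |
| 2 | 2026-04-06 | Refactored Y | closed | BotX: idea not valuable |
| 3 | 2026-04-06 | Fixed Z bug | merged | Bot requested changes, fixed |
"""

if __name__ == "__main__":
    print(iteration_stats(SAMPLE))
```

The sketch matches on the words "merged" and "closed" rather than on the status emoji, so it does not depend on how the marker characters are encoded. A reviewer comment containing a literal `|` would break the cell count and the row would be skipped.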