OpenAI-compatible LLM inference simulator for testing and benchmarking.
xPyD-sim simulates prefill and decode nodes with realistic latency behavior, enabling testing of xPyD-proxy and xPyD-bench without real GPU hardware.
- Prefill/Decode simulation — separate modes with configurable latency
- Full OpenAI API — /v1/completions, /v1/chat/completions, /v1/embeddings, /v1/models
- vLLM compatible — accepts all vLLM-specific parameters
- Scheduling simulation — batch formation, decode iteration, queue depth
- Calibration tool — fit latency curves from real hardware measurements
- Prometheus metrics — /metrics endpoint for monitoring
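Because the endpoints listed above follow the OpenAI wire format, any plain HTTP client can drive the simulator. The sketch below uses only the Python standard library. It assumes the standard OpenAI response shapes (`data[0].id` from `/v1/models`, `choices[0].text` from `/v1/completions`) and the Prometheus text exposition format for `/metrics`; the helper names (`build_request`, `complete`, `parse_metrics`, and so on) are hypothetical and are not part of xpyd-sim.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical client helpers for a running xpyd-sim server. They assume the
# standard OpenAI request/response shapes; none of these names ship with xpyd-sim.


def build_request(base_url: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET request (no payload) or a JSON POST request."""
    url = base_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_json(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def first_model_id(models: dict) -> str:
    # Assumed OpenAI list shape: {"object": "list", "data": [{"id": ...}, ...]}
    return models["data"][0]["id"]


def completion_text(completion: dict) -> str:
    # Assumed OpenAI completion shape: {"choices": [{"text": ...}, ...]}
    return completion["choices"][0]["text"]


def complete(base_url: str, prompt: str, max_tokens: int = 16) -> str:
    """Ask the server for its first model, then request one completion."""
    model = first_model_id(send_json(build_request(base_url, "/v1/models")))
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return completion_text(
        send_json(build_request(base_url, "/v1/completions", payload)))


def parse_metrics(text: str) -> dict:
    """Parse Prometheus text format into {sample name with labels: value}.

    Simplified: assumes label values contain no '}' characters.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        if "{" in line:
            end = line.index("}") + 1
            key, rest = line[:end], line[end:]
        else:
            key, _, rest = line.partition(" ")
        samples[key] = float(rest.split()[0])  # drop optional timestamp
    return samples


def fetch_metrics(base_url: str, timeout: float = 10.0) -> dict:
    """GET /metrics and parse the plain-text body."""
    req = build_request(base_url, "/metrics")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

With a dual-mode server started by `xpyd-sim --mode dual --port 8000`, `complete("http://localhost:8000", "Hello")` should return whatever text the simulator generates, and `fetch_metrics("http://localhost:8000")` a dict of the exported samples; the actual metric names depend on the simulator.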
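```bash
pip install xpyd-sim
```

Or as part of the full xPyD toolkit:

```bash
pip install xpyd
```

Start the simulator either as a single dual-mode server or as separate prefill and decode nodes for PD-disaggregated testing:

```bash
# Start dual mode (prefill + decode)
xpyd-sim --mode dual --port 8000

# Start PD-disaggregated mode: separate prefill and decode nodes
xpyd-sim --mode prefill --port 8001
xpyd-sim --mode decode --port 8002
```

xpyd-sim is one component of the xPyD toolkit:

| Component | Description |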
|---|---|
| xpyd-proxy | PD-disaggregated proxy |
| xpyd-sim | OpenAI-compatible inference simulator |
| xpyd-bench | Benchmarking & planning tool |
Licensed under Apache 2.0; see LICENSE.