Background
PR #852 promotes the llmapi (PyTorch/LLM API) backend as the simpler getting-started path. Reviewer @yinggeh correctly noted that, unlike the inflight_batcher_llm path, the llmapi path currently has no automated CI coverage.
The existing tests reference:
- L0_backend_trtllm: uses inflight_batcher_llm
- L0_openai_trtllm: uses inflight_batcher_llm
Goal
Add an equivalent end-to-end CI test for the llmapi path, covering:
- Launch via launch_triton_server.py --model_repo=.../llmapi/
- Basic generate request (/v2/models/tensorrt_llm/generate)
- Optionally: request cancellation
Scope
This likely lives in NVIDIA/TensorRT-LLM under triton_backend/ci/ (mirroring L0_backend_trtllm), coordinated with the server repo's L0_openai_trtllm if OpenAI-compatible endpoint coverage is also needed.
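As a starting point for the "basic generate request" item under Goal, here is a minimal sketch of what the check could look like once the server is up. It is not the existing test harness. The endpoint path comes from this issue, and port 8000 is Triton's default HTTP port. The request fields (text_input, sampling_param_max_tokens), the response field (text_output), and the helper names build_generate_request and check_generate are assumptions made for illustration and should be verified against the llmapi model's actual config.

```python
import json
import urllib.request

# Assumption: server reachable on localhost at Triton's default HTTP port.
BASE_URL = "http://localhost:8000"
# Endpoint named in this issue.
GENERATE_PATH = "/v2/models/tensorrt_llm/generate"


def build_generate_request(prompt, max_tokens=16, base_url=BASE_URL):
    """Build a POST request for the generate endpoint (hypothetical helper).

    The field names "text_input" and "sampling_param_max_tokens" are
    assumptions; adjust them to match the llmapi model's inputs.
    """
    payload = {"text_input": prompt, "sampling_param_max_tokens": max_tokens}
    return urllib.request.Request(
        base_url + GENERATE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_generate(prompt="The future of AI is", timeout=60.0):
    """Send one generate request and fail if the response looks wrong.

    Assumes the response JSON carries the completion in "text_output".
    """
    req = build_generate_request(prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        assert resp.status == 200, f"unexpected HTTP status {resp.status}"
        body = json.loads(resp.read().decode("utf-8"))
    assert body.get("text_output"), f"missing or empty text_output: {body}"
    return body
```

In a CI script, check_generate() would run after the launch step has reported the server ready. The cancellation case would need a streaming or asynchronous client and is left out of this sketch.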