test(completions): add E2E tests for /v1/completions gRPC endpoint #1021
vschandramourya wants to merge 7 commits into `main` from
Conversation
Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>
…streaming Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>
📝 Walkthrough
Adds a new e2e completions test package with non-streaming and streaming pytest cases for the OpenAI Completions API.
Sequence Diagram(s): (omitted)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Code Review
This pull request introduces a comprehensive suite of end-to-end tests for the OpenAI Completions API, covering both streaming and non-streaming modes. The tests validate core functionalities such as stop sequences, echo, suffixes, and parallel sampling across different backends. Review feedback focuses on enhancing the robustness of streaming tests by ensuring consistent assertions on response object types and finish reasons, and suggests refactoring repetitive streaming logic into shared helper functions.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@e2e_test/completions/test_basic.py`:
- Around lines 178-188: extract the repeated loop that consumes a streaming response into a reusable helper (e.g., `parse_stream` or `collect_texts_and_reasons`) that takes the stream iterable and returns the accumulated `texts` and `finish_reasons` lists. Replace the inline loops in test_basic.py that iterate over the stream and inspect `chunk.object`, `chunk.choices`, `choice.text`, and `choice.finish_reason` (the blocks that build `texts` and `finish_reasons`) with calls to this helper. Ensure the helper asserts `chunk.object == "text_completion"` as before and preserves ordering and behavior.
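A minimal sketch of the helper this prompt describes — the name `collect_texts_and_reasons` and the chunk attributes come from the comment itself; the rest is an assumption, not the PR's actual code:

```python
def collect_texts_and_reasons(stream):
    """Consume a completions stream, asserting each chunk's object type.

    Returns (texts, finish_reasons) accumulated in arrival order.
    """
    texts, finish_reasons = [], []
    for chunk in stream:
        # every streaming chunk should identify itself as a text completion
        assert chunk.object == "text_completion"
        for choice in chunk.choices:
            if choice.text:
                texts.append(choice.text)
            if choice.finish_reason is not None:
                finish_reasons.append(choice.finish_reason)
    return texts, finish_reasons
```

Each test then reduces to one call plus its own assertions on the returned lists, which keeps the per-chunk invariants in a single place.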
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 75e4c2ce-58ff-4c75-8df4-c9691c6e5d4e
📒 Files selected for processing (2)
- e2e_test/completions/__init__.py
- e2e_test/completions/test_basic.py
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: de67579a90
e2e_test/completions/test_basic.py (Outdated)

```python
full_text = "".join(c.choices[0].text for c in stream if c.choices and c.choices[0].text)

assert len(full_text) > 0
assert "Paris" in full_text
```
Avoid hard-coding exact token in streaming concatenation test
This test is meant to validate that streaming chunks concatenate correctly, but `assert "Paris" in full_text` couples it to one exact model phrasing. In the inspected `e2e_test/completions/test_basic.py` path (engines sglang/vllm), valid outputs with different casing or alternate wording can make CI fail even when streaming assembly is correct, so this introduces unnecessary flakiness in E2E coverage.
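One possible relaxation, as a sketch — the helper name and the case-insensitive, alternate-wording check are assumptions about what the reviewer would accept, not code from the PR:

```python
def assert_streaming_assembly(full_text: str) -> None:
    """Validate concatenated streaming output without pinning one exact token."""
    assert len(full_text) > 0, "stream produced no text"
    # accept any casing or an alternate phrasing instead of the literal "Paris"
    lowered = full_text.lower()
    assert "paris" in lowered or "france" in lowered, (
        f"unexpected completion content: {full_text!r}"
    )
```

The structural property (chunks concatenate into non-empty text) stays a hard assertion; only the content check is loosened.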
- Extract `_collect_stream` helper to deduplicate streaming loop
- Assert `chunk.object == "text_completion"` on every streaming chunk
- Strengthen stop sequence assertions (exactly 1 finish_reason + text)
- Accept "stop" or "length" for `max_tokens=0` finish_reason
- Add finish_reason assertion to non-streaming `max_tokens=0` test

Refs: #1021
Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@e2e_test/completions/test_basic.py`:
- Around line 219-233: The test test_streaming_collects_full_text has a flaky
hard assertion that "Paris" appears in the completion; either remove that
specific content assertion or weaken it (e.g., assert for presence of "France"
or a lowercase-agnostic substring) or add a clarifying comment explaining why
the exact string "Paris" is required; update the
test_streaming_collects_full_text function accordingly and reference the
streaming helper _collect_stream if you need to inspect how the output is
gathered, and consider relying on or linking to test_streaming_basic which
already asserts non-empty output.
- Around line 9-13: The file declares an unused logger variable (logger =
logging.getLogger(__name__)) which creates dead code; remove that declaration
line from e2e_test/completions/test_basic.py or, if you want to keep it for
future debugging, add an inline comment next to the logger symbol explaining
it's intentionally reserved (e.g., "# kept for debug logging in future tests")
so linters and reviewers understand it's intentional.
- Around line 214-217: The assertion that the stop delimiter ("," ) is absent
from the concatenated streaming output is brittle because vLLM may not trim stop
sequences in streaming mode while SGLang does; update the test that checks
finish_reasons and full_text so it conditionally validates the absence of the
stop delimiter only for backends that trim stops (i.e., skip or relax the assert
"," not in full_text when running against vLLM). Locate the variables
finish_reasons and full_text in the test (e.g., in test_basic.py) and add a
backend/runtime check (detect vLLM vs SGLang via your existing backend flag or
client config) to either skip the comma check for vLLM or assert its
presence/acceptance accordingly.
- Around line 66-78: The test test_non_streaming_stop_sequence assumes stop
sequences are trimmed, but backends like sglang and vllm do not trim in
non-streaming mode; add a boolean flag STOP_SEQUENCE_TRIMMED (set to False for
those backends) and change the assertions on response.choices[0].text (and any
full_text checks) to branch: if STOP_SEQUENCE_TRIMMED assert the stop char (",")
is not in the text, else assert the text endswith the stop char (or contains the
stop as a suffix) so the test passes for both trimming and non-trimming engines.
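The branching these comments ask for could look like this sketch — the `backend` string flag and the trimming rules are assumptions drawn from the review thread, not the suite's real fixtures:

```python
def check_stop_trimming(text: str, stop: str, backend: str, streaming: bool) -> None:
    """Assert stop-sequence handling according to backend behavior.

    Assumed rules (from the review thread): in non-streaming mode neither
    sglang nor vllm trims the stop sequence; in streaming mode only vllm
    leaves the stop sequence in the output.
    """
    trimmed = streaming and backend != "vllm"
    if trimmed:
        assert stop not in text, f"stop {stop!r} should be trimmed from: {text!r}"
    else:
        assert text.endswith(stop), f"stop {stop!r} should be the suffix of: {text!r}"
```

Each test would pass its backend flag and mode instead of consulting a static class constant, so the same assertion works across engine variants.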
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e7f21f6b-d208-44e8-8e03-4537e7c0f78a
📒 Files selected for processing (1)
e2e_test/completions/test_basic.py
New file `e2e_test/completions/test_basic.py` (`@@ -0,0 +1,252 @@`), beginning:

```python
"""Basic tests for OpenAI Completions API (/v1/completions).
```
Is this test run at all in CI? In pr-test-rust.yaml today we define which folders to run, and since this is a new folder I don't see any execution of it yet.
Good catch, wired it up now. Added an e2e-1gpu-completions job to pr-test-rust.yml, with some cleanup in the test file.
♻️ Duplicate comments (1)
e2e_test/completions/test_basic.py (1)
19-20: ⚠️ Potential issue | 🟠 Major — Stop-sequence trimming expectations are hardcoded and backend-inaccurate.

With `@pytest.mark.engine("sglang", "vllm")`, a fixed `STOP_SEQUENCE_TRIMMED = True` makes stop assertions incorrect for known backend behavior differences (non-streaming and streaming), which can fail CI on vLLM/sglang paths. Please switch to backend/mode-conditional assertions (the same pattern used in `e2e_test/chat_completions/test_openai_server.py`) instead of a static class constant.

Based on learnings: in lightseekorg/smg PR #606, non-streaming stop trimming is `False` for both `is_vllm()` and `is_sglang()`, while streaming overrides only `is_vllm()` to `False`.

Also applies to: 77-80, 170-171, 221-227
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e_test/completions/test_basic.py` around lines 19 - 20, Replace the hardcoded STOP_SEQUENCE_TRIMMED = True with backend-and-mode conditional assertions: remove the static constant and instead determine expected trimming by calling is_vllm() and is_sglang() (and checking streaming mode where applicable) as done in e2e_test/chat_completions/test_openai_server.py; implement logic so non-streaming returns False for both is_vllm() and is_sglang(), and streaming only overrides is_vllm() to False, and update the other occurrences (the other STOP_SEQUENCE_TRIMMED checks in this test) to use the same conditional pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@e2e_test/completions/test_basic.py`:
- Around line 19-20: Replace the hardcoded STOP_SEQUENCE_TRIMMED = True with
backend-and-mode conditional assertions: remove the static constant and instead
determine expected trimming by calling is_vllm() and is_sglang() (and checking
streaming mode where applicable) as done in
e2e_test/chat_completions/test_openai_server.py; implement logic so
non-streaming returns False for both is_vllm() and is_sglang(), and streaming
only overrides is_vllm() to False, and update the other occurrences (the other
STOP_SEQUENCE_TRIMMED checks in this test) to use the same conditional pattern.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d8c9282a-c93a-46ac-82ab-ba56cc365ab6
📒 Files selected for processing (2)
.github/workflows/pr-test-rust.ymle2e_test/completions/test_basic.py
…y stop seq) Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>
♻️ Duplicate comments (1)
e2e_test/completions/test_basic.py (1)
19-19: ⚠️ Potential issue | 🟠 Major — Stop-sequence trimming expectation is hardcoded and will fail on backend variants.

`STOP_SEQUENCE_TRIMMED` is fixed to `True` in both classes, but test execution is parametrized for sglang and vllm, whose stop-trimming behavior differs by mode. This makes the assertions brittle and backend-incorrect.

♻️ Proposed fix
```diff
 class TestCompletionBasic:
     """Tests for OpenAI-compatible /v1/completions API (non-streaming)."""

     STOP_SEQUENCE_TRIMMED = True

-    def test_non_streaming_stop_sequence(self, model, api_client):
+    def test_non_streaming_stop_sequence(self, model, api_client, setup_backend):
         """Test that stop sequences cause the model to stop generating."""
+        backend_name, *_ = setup_backend
         response = api_client.completions.create(
             model=model,
             prompt="Count: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10",
             max_tokens=200,
             temperature=0,
             stop=[","],
         )
         assert response.choices[0].finish_reason == "stop"
         text = response.choices[0].text
-        if self.STOP_SEQUENCE_TRIMMED:
+        stop_sequence_trimmed = (
+            False if backend_name in ("sglang", "vllm") else self.STOP_SEQUENCE_TRIMMED
+        )
+        if stop_sequence_trimmed:
             assert "," not in text, f"Stop sequence ',' should not appear in output: {text}"
         else:
             assert text.endswith(","), f"Stop sequence ',' should be the suffix of output: {text}"


 class TestCompletionStreaming:
     """Tests for streaming /v1/completions API."""

     STOP_SEQUENCE_TRIMMED = True

-    def test_streaming_stop_sequence(self, model, api_client):
+    def test_streaming_stop_sequence(self, model, api_client, setup_backend):
         """Test that stop sequences work in streaming mode."""
+        backend_name, *_ = setup_backend
         stream = api_client.completions.create(
             model=model,
             prompt="Count: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10",
             max_tokens=200,
             temperature=0,
             stop=[","],
             stream=True,
         )
         full_text, finish_reasons = self._collect_stream(stream)
         assert len(finish_reasons) == 1
         assert finish_reasons[0] == "stop"
-        if self.STOP_SEQUENCE_TRIMMED:
+        stop_sequence_trimmed = (
+            False if backend_name == "vllm" else self.STOP_SEQUENCE_TRIMMED
+        )
+        if stop_sequence_trimmed:
             assert "," not in full_text, (
                 f"Stop sequence ',' should not appear in output: {full_text}"
             )
         else:
             assert full_text.endswith(","), (
                 f"Stop sequence ',' should be the suffix of output: {full_text}"
             )
```
Based on learnings: in lightseekorg/smg PR #606, non-streaming stop trimming is `False` for both `is_vllm()` and `is_sglang()`, while streaming stop trimming is `False` only for `is_vllm()` (the asymmetry is intentional).

Also applies to: 64-81, 171, 205-228
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e_test/completions/test_basic.py` at line 19, STOP_SEQUENCE_TRIMMED is hardcoded true but must reflect backend and streaming mode; change its assignment to compute dynamically using the existing helpers: set STOP_SEQUENCE_TRIMMED = streaming and (not is_vllm()) so that non-streaming is always False and streaming is False only for vllm (use is_vllm() and is_sglang() where available). Update both class constants (e.g., in TestBasicCompletionSglang / TestBasicCompletionVllm) and the other occurrences noted (lines around 64-81, 171, 205-228) to use this computed value instead of True. Ensure tests refer to this variable everywhere assertions expect stop-sequence trimming behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@e2e_test/completions/test_basic.py`:
- Line 19: STOP_SEQUENCE_TRIMMED is hardcoded true but must reflect backend and
streaming mode; change its assignment to compute dynamically using the existing
helpers: set STOP_SEQUENCE_TRIMMED = streaming and (not is_vllm()) so that
non-streaming is always False and streaming is False only for vllm (use
is_vllm() and is_sglang() where available). Update both class constants (e.g.,
in TestBasicCompletionSglang / TestBasicCompletionVllm) and the other
occurrences noted (lines around 64-81, 171, 205-228) to use this computed value
instead of True. Ensure tests refer to this variable everywhere assertions
expect stop-sequence trimming behavior.
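The rule in this prompt can be captured in one small function; `is_vllm()` below is a stand-in stub for the suite's real backend detector, so this is a sketch rather than the actual helper:

```python
def is_vllm() -> bool:
    # stand-in stub for the suite's real backend detector (assumption)
    return False


def stop_sequence_trimmed(streaming: bool) -> bool:
    """Expected stop-sequence trimming per the review's rule.

    Non-streaming: no backend trims the stop sequence.
    Streaming: every backend except vLLM trims it.
    """
    if not streaming:
        return False
    return not is_vllm()
```

Computing the expectation from backend and mode replaces the static class constant, so one assertion helper serves both test classes.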
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f87d8f37-13b1-4038-a041-9f47515cccf9
📒 Files selected for processing (1)
e2e_test/completions/test_basic.py
Summary
Add E2E tests for the `/v1/completions` gRPC endpoint, covering non-streaming and streaming paths. PR 7 in the Completions API gRPC pipeline series.
What changed
New files
- `e2e_test/completions/__init__.py` — module docstring
- `e2e_test/completions/test_basic.py` — 13 tests across two classes

TestCompletionBasic (non-streaming)
- basic response structure (`id`, `object`, `model`, `choices`, `usage`)
- `max_tokens` length limiting (`finish_reason: "length"`)
- stop sequences (`finish_reason: "stop"`, text trimmed)
- `echo=True` (prompt prepended to output)
- `suffix` (appended to output)
- parallel sampling (`n=1`, `n=2`)
- `echo=True` with `max_tokens=0` (returns just the prompt)

TestCompletionStreaming (streaming)
- chunk assembly and final `finish_reason`
- `echo=True` with `max_tokens=0` (prompt emitted from Complete path)

Prior PRs in series
- `CompletionPreparationStage`
- `CompletionRequestBuildingStage`
- `CompletionResponseProcessingStage` (non-streaming)

Test plan
- `pytest --collect-only` — 13 tests collected, no import errors