
test(completions): add E2E tests for /v1/completions gRPC endpoint#1021

Open
vschandramourya wants to merge 7 commits into main from mourya/cmp-6

Conversation

Collaborator

@vschandramourya vschandramourya commented Apr 1, 2026

Summary

Add E2E tests for the /v1/completions gRPC endpoint, covering non-streaming and streaming paths.

PR 7 in the Completions API gRPC pipeline series.

What changed

New files

  • e2e_test/completions/__init__.py — module docstring
  • e2e_test/completions/test_basic.py — 13 tests across two classes

TestCompletionBasic (non-streaming)

  • Basic response structure (id, object, model, choices, usage)
  • max_tokens length limiting (finish_reason: "length")
  • Stop sequences (finish_reason: "stop", text trimmed)
  • echo=True (prompt prepended to output)
  • suffix (appended to output)
  • Parallel sampling (n=1, n=2)
  • Usage statistics validation
  • echo=True with max_tokens=0 (returns just the prompt)
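The usage-statistics check in the bullets above can be sketched as a small standalone helper. This is an illustrative sketch, not code from the PR: the function name and the hard-coded token counts are hypothetical, and the field relationships follow the OpenAI completions response schema (`total_tokens = prompt_tokens + completion_tokens`).

```python
def assert_usage_consistent(prompt_tokens, completion_tokens, total_tokens):
    """Validate the usage block of a completions response (hypothetical helper)."""
    # A non-empty prompt always consumes at least one token.
    assert prompt_tokens > 0
    # echo=True with max_tokens=0 legitimately yields zero completion tokens.
    assert completion_tokens >= 0
    # Per the OpenAI schema, total_tokens is the sum of the other two fields.
    assert total_tokens == prompt_tokens + completion_tokens

assert_usage_consistent(7, 16, 23)
assert_usage_consistent(5, 0, 5)  # the echo=True, max_tokens=0 case
```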

TestCompletionStreaming (streaming)

  • Basic SSE chunks with text deltas and single finish_reason
  • Stop sequences in streaming mode
  • Full text collection from stream
  • echo=True with max_tokens=0 (prompt emitted from Complete path)

Prior PRs in series

Test plan

  • pytest --collect-only — 13 tests collected, no import errors
  • All 11 original tests verified against live Llama-3.1-8B-Instruct

Summary by CodeRabbit

  • Tests
    • Added comprehensive GPU-marked E2E tests for Completions covering non-streaming and streaming flows, stop/echo/suffix behavior, max_tokens truncation, parallel sampling, response structure, finish reasons, and usage metrics.
  • Documentation
    • Added top-level package documentation describing the scope and scenarios covered by the Completions E2E tests.
  • Chores
    • CI updated to detect Completions changes and run GPU E2E jobs for Completions.

Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>
…streaming

Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Walkthrough

Adds a new e2e completions test package with non‑streaming and streaming pytest cases for the OpenAI /v1/completions API and updates CI to detect completion-related changes and run GPU E2E jobs for the new tests.

Changes

  • E2E Test Package (e2e_test/completions/__init__.py): New package init with a top-level docstring describing the scope of the completions E2E tests.
  • E2E Completion Tests (e2e_test/completions/test_basic.py): New pytest module adding TestCompletionBasic (non-streaming validations: response fields, max_tokens, stop/echo/suffix, parallel n sampling, usage checks) and TestCompletionStreaming (stream delta collection, finish_reason tracking, streaming stop/echo behaviors, runtime skips for known backend limits).
  • CI Workflow (.github/workflows/pr-test-rust.yml): Added completions output to detect-changes, registered the e2e-1gpu-completions GPU E2E job (matrix over engines, e2e_test/completions test dir), and wired the job into the finish/dependency and failure logic.


Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

grpc

Suggested reviewers

  • CatherineSue
  • key4ng
  • XinyueZhang369

Poem

🐇 I hopped through prompts and streams tonight,
Collected deltas in the pale test light,
Stops and echoes twirled in code,
Tokens tallied on my road,
A cheerful rabbit cheers—CI green and bright!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title clearly and specifically describes the main change: adding E2E tests for the /v1/completions gRPC endpoint, which aligns directly with the changeset.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.



@github-actions github-actions bot added the tests Test changes label Apr 1, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive suite of end-to-end tests for the OpenAI Completions API, covering both streaming and non-streaming modes. The tests validate core functionalities such as stop sequences, echo, suffixes, and parallel sampling across different backends. Review feedback focuses on enhancing the robustness of streaming tests by ensuring consistent assertions on response object types and finish reasons, and suggests refactoring repetitive streaming logic into shared helper functions.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@e2e_test/completions/test_basic.py`:
- Around line 178-188: Extract the repeated loop that consumes a streaming
response into a reusable helper (e.g., parse_stream or
collect_texts_and_reasons) that takes the stream iterable and returns the
accumulated texts list and finish_reasons list; replace the inline loops that
iterate over stream and inspect chunk.object, chunk.choices, choice.text and
choice.finish_reason (the blocks that build texts and finish_reasons) with calls
to this helper in test_basic.py (the occurrences around lines where
texts/finish_reasons are built). Ensure the helper asserts chunk.object ==
"text_completion" as before and preserves ordering and behavior.
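One possible shape for the helper this comment suggests: it consumes the stream once, asserts `chunk.object == "text_completion"` on every chunk, and returns the accumulated texts and finish reasons in order. The name `collect_texts_and_reasons` is one of the names the comment proposes; the `SimpleNamespace` objects below are stand-ins for real OpenAI-client chunk objects, used only so the sketch is self-contained.

```python
from types import SimpleNamespace

def collect_texts_and_reasons(stream):
    """Drain a completions stream, returning (texts, finish_reasons) in order."""
    texts, finish_reasons = [], []
    for chunk in stream:
        # Preserve the existing per-chunk object-type assertion.
        assert chunk.object == "text_completion"
        for choice in chunk.choices:
            if choice.text:
                texts.append(choice.text)
            if choice.finish_reason is not None:
                finish_reasons.append(choice.finish_reason)
    return texts, finish_reasons

# Fake stream standing in for the chunks an api_client stream would yield.
fake_stream = [
    SimpleNamespace(object="text_completion",
                    choices=[SimpleNamespace(text="1, ", finish_reason=None)]),
    SimpleNamespace(object="text_completion",
                    choices=[SimpleNamespace(text="2", finish_reason="stop")]),
]
texts, reasons = collect_texts_and_reasons(fake_stream)
assert texts == ["1, ", "2"]
assert reasons == ["stop"]
```

Each test would then call the helper and assert on the joined text and on `len(finish_reasons)`, rather than repeating the loop inline.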

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 75e4c2ce-58ff-4c75-8df4-c9691c6e5d4e

📥 Commits

Reviewing files that changed from the base of the PR and between 4654e67 and de67579.

📒 Files selected for processing (2)
  • e2e_test/completions/__init__.py
  • e2e_test/completions/test_basic.py


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: de67579a90


full_text = "".join(c.choices[0].text for c in stream if c.choices and c.choices[0].text)

assert len(full_text) > 0
assert "Paris" in full_text
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Avoid hard-coding exact token in streaming concatenation test

This test is meant to validate that streaming chunks concatenate correctly, but assert "Paris" in full_text couples it to one exact model phrasing. In the inspected e2e_test/completions/test_basic.py path (engines sglang/vllm), valid outputs like different casing or alternate wording can make CI fail even when streaming assembly is correct, so this introduces unnecessary flakiness in E2E coverage.

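One way to relax the assertion Codex flags: check case-insensitively and accept either the city or the country, so the test still verifies that streamed deltas concatenate into coherent text without pinning one exact phrasing. The helper name and accepted substrings are illustrative, not taken from the PR.

```python
def assert_mentions_answer(full_text):
    """Loosely verify the assembled stream answers the prompt (hypothetical helper)."""
    assert len(full_text) > 0, "stream produced no text"
    lowered = full_text.lower()
    # Accept common valid phrasings instead of one exact token.
    assert "paris" in lowered or "france" in lowered, (
        f"Expected a plausible answer in streamed text: {full_text!r}"
    )

assert_mentions_answer("The capital of France is Paris.")
assert_mentions_answer("PARIS is the capital.")
```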

- Extract _collect_stream helper to deduplicate streaming loop
- Assert chunk.object == "text_completion" on every streaming chunk
- Strengthen stop sequence assertions (exactly 1 finish_reason + text)
- Accept "stop" or "length" for max_tokens=0 finish_reason
- Add finish_reason assertion to non-streaming max_tokens=0 test

Refs: #1021
Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@e2e_test/completions/test_basic.py`:
- Around line 219-233: The test test_streaming_collects_full_text has a flaky
hard assertion that "Paris" appears in the completion; either remove that
specific content assertion or weaken it (e.g., assert for presence of "France"
or a lowercase-agnostic substring) or add a clarifying comment explaining why
the exact string "Paris" is required; update the
test_streaming_collects_full_text function accordingly and reference the
streaming helper _collect_stream if you need to inspect how the output is
gathered, and consider relying on or linking to test_streaming_basic which
already asserts non-empty output.
- Around line 9-13: The file declares an unused logger variable (logger =
logging.getLogger(__name__)) which creates dead code; remove that declaration
line from e2e_test/completions/test_basic.py or, if you want to keep it for
future debugging, add an inline comment next to the logger symbol explaining
it's intentionally reserved (e.g., "# kept for debug logging in future tests")
so linters and reviewers understand it's intentional.
- Around line 214-217: The assertion that the stop delimiter ("," ) is absent
from the concatenated streaming output is brittle because vLLM may not trim stop
sequences in streaming mode while SGLang does; update the test that checks
finish_reasons and full_text so it conditionally validates the absence of the
stop delimiter only for backends that trim stops (i.e., skip or relax the assert
"," not in full_text when running against vLLM). Locate the variables
finish_reasons and full_text in the test (e.g., in test_basic.py) and add a
backend/runtime check (detect vLLM vs SGLang via your existing backend flag or
client config) to either skip the comma check for vLLM or assert its
presence/acceptance accordingly.
- Around line 66-78: The test test_non_streaming_stop_sequence assumes stop
sequences are trimmed, but backends like sglang and vllm do not trim in
non-streaming mode; add a boolean flag STOP_SEQUENCE_TRIMMED (set to False for
those backends) and change the assertions on response.choices[0].text (and any
full_text checks) to branch: if STOP_SEQUENCE_TRIMMED assert the stop char (",")
is not in the text, else assert the text endswith the stop char (or contains the
stop as a suffix) so the test passes for both trimming and non-trimming engines.
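The branching assertion this comment describes can be factored into one helper that both the non-streaming and streaming tests share. This is a sketch under the comment's assumptions: the `trimmed` flag would be derived per backend (False for sglang/vllm in non-streaming mode), and the function name is hypothetical.

```python
def check_stop_sequence(text, stop, trimmed):
    """Assert stop-sequence handling for trimming vs non-trimming backends."""
    if trimmed:
        # Trimming backends strip the stop sequence from the output entirely.
        assert stop not in text, (
            f"Stop sequence {stop!r} should not appear in output: {text}"
        )
    else:
        # Non-trimming backends (e.g. sglang/vllm non-streaming) leave it as a suffix.
        assert text.endswith(stop), (
            f"Stop sequence {stop!r} should be the suffix of output: {text}"
        )

check_stop_sequence("Count: 1", ",", trimmed=True)
check_stop_sequence("Count: 1,", ",", trimmed=False)
```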

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e7f21f6b-d208-44e8-8e03-4537e7c0f78a

📥 Commits

Reviewing files that changed from the base of the PR and between de67579 and 0041ae8.

📒 Files selected for processing (1)
  • e2e_test/completions/test_basic.py

@@ -0,0 +1,252 @@
"""Basic tests for OpenAI Completions API (/v1/completions).
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test run at all in CI? In pr-test-rust.yml today, we define which folders to run, and since this is a new folder I don't see any execution of it yet.

Collaborator Author


Good catch, wired it up now. Added the e2e-1gpu-completions job to pr-test-rust.yml, along with some cleanup in the test file.

Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>
Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>
@github-actions github-actions bot added the ci CI/CD configuration changes label Apr 8, 2026

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
e2e_test/completions/test_basic.py (1)

19-20: ⚠️ Potential issue | 🟠 Major

Stop-sequence trimming expectations are hardcoded and backend-inaccurate.

With @pytest.mark.engine("sglang", "vllm"), a fixed STOP_SEQUENCE_TRIMMED = True makes stop assertions incorrect for known backend behavior differences (non-streaming and streaming), which can fail CI on vLLM/sglang paths. Please switch to backend/mode-conditional assertions (same pattern used in e2e_test/chat_completions/test_openai_server.py) instead of a static class constant.

Based on learnings: in lightseekorg/smg PR #606, non-streaming stop trimming is False for both is_vllm() and is_sglang(), while streaming overrides only is_vllm() to False.

Also applies to: 77-80, 170-171, 221-227

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e_test/completions/test_basic.py` around lines 19 - 20, Replace the
hardcoded STOP_SEQUENCE_TRIMMED = True with backend-and-mode conditional
assertions: remove the static constant and instead determine expected trimming
by calling is_vllm() and is_sglang() (and checking streaming mode where
applicable) as done in e2e_test/chat_completions/test_openai_server.py;
implement logic so non-streaming returns False for both is_vllm() and
is_sglang(), and streaming only overrides is_vllm() to False, and update the
other occurrences (the other STOP_SEQUENCE_TRIMMED checks in this test) to use
the same conditional pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@e2e_test/completions/test_basic.py`:
- Around line 19-20: Replace the hardcoded STOP_SEQUENCE_TRIMMED = True with
backend-and-mode conditional assertions: remove the static constant and instead
determine expected trimming by calling is_vllm() and is_sglang() (and checking
streaming mode where applicable) as done in
e2e_test/chat_completions/test_openai_server.py; implement logic so
non-streaming returns False for both is_vllm() and is_sglang(), and streaming
only overrides is_vllm() to False, and update the other occurrences (the other
STOP_SEQUENCE_TRIMMED checks in this test) to use the same conditional pattern.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d8c9282a-c93a-46ac-82ab-ba56cc365ab6

📥 Commits

Reviewing files that changed from the base of the PR and between 0041ae8 and b69b044.

📒 Files selected for processing (2)
  • .github/workflows/pr-test-rust.yml
  • e2e_test/completions/test_basic.py

…y stop seq)

Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
e2e_test/completions/test_basic.py (1)

19-19: ⚠️ Potential issue | 🟠 Major

Stop-sequence trimming expectation is hardcoded and will fail on backend variants.

STOP_SEQUENCE_TRIMMED is fixed to True in both classes, but test execution is parametrized for sglang and vllm, whose stop-trimming behavior differs by mode. This makes the assertions brittle and backend-incorrect.

♻️ Proposed fix
 class TestCompletionBasic:
     """Tests for OpenAI-compatible /v1/completions API (non-streaming)."""

     STOP_SEQUENCE_TRIMMED = True

-    def test_non_streaming_stop_sequence(self, model, api_client):
+    def test_non_streaming_stop_sequence(self, model, api_client, setup_backend):
         """Test that stop sequences cause the model to stop generating."""
+        backend_name, *_ = setup_backend

         response = api_client.completions.create(
             model=model,
             prompt="Count: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10",
             max_tokens=200,
             temperature=0,
             stop=[","],
         )

         assert response.choices[0].finish_reason == "stop"
         text = response.choices[0].text
-        if self.STOP_SEQUENCE_TRIMMED:
+        stop_sequence_trimmed = (
+            False if backend_name in ("sglang", "vllm") else self.STOP_SEQUENCE_TRIMMED
+        )
+        if stop_sequence_trimmed:
             assert "," not in text, f"Stop sequence ',' should not appear in output: {text}"
         else:
             assert text.endswith(","), f"Stop sequence ',' should be the suffix of output: {text}"


 class TestCompletionStreaming:
     """Tests for streaming /v1/completions API."""

     STOP_SEQUENCE_TRIMMED = True

-    def test_streaming_stop_sequence(self, model, api_client):
+    def test_streaming_stop_sequence(self, model, api_client, setup_backend):
         """Test that stop sequences work in streaming mode."""
+        backend_name, *_ = setup_backend

         stream = api_client.completions.create(
             model=model,
             prompt="Count: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10",
             max_tokens=200,
             temperature=0,
             stop=[","],
             stream=True,
         )

         full_text, finish_reasons = self._collect_stream(stream)

         assert len(finish_reasons) == 1
         assert finish_reasons[0] == "stop"
-        if self.STOP_SEQUENCE_TRIMMED:
+        stop_sequence_trimmed = (
+            False if backend_name == "vllm" else self.STOP_SEQUENCE_TRIMMED
+        )
+        if stop_sequence_trimmed:
             assert "," not in full_text, (
                 f"Stop sequence ',' should not appear in output: {full_text}"
             )
         else:
             assert full_text.endswith(","), (
                 f"Stop sequence ',' should be the suffix of output: {full_text}"
             )

Based on learnings: in lightseekorg/smg PR #606, non-streaming stop trimming is False for both is_vllm() and is_sglang(), while streaming stop trimming is False only for is_vllm() (the asymmetry is intentional).

Also applies to: 64-81, 171-171, 205-228

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e_test/completions/test_basic.py` at line 19, STOP_SEQUENCE_TRIMMED is
hardcoded true but must reflect backend and streaming mode; change its
assignment to compute dynamically using the existing helpers: set
STOP_SEQUENCE_TRIMMED = streaming and (not is_vllm()) so that non-streaming is
always False and streaming is False only for vllm (use is_vllm() and is_sglang()
where available). Update both class constants (e.g., in
TestBasicCompletionSglang / TestBasicCompletionVllm) and the other occurrences
noted (lines around 64-81, 171, 205-228) to use this computed value instead of
True. Ensure tests refer to this variable everywhere assertions expect
stop-sequence trimming behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@e2e_test/completions/test_basic.py`:
- Line 19: STOP_SEQUENCE_TRIMMED is hardcoded true but must reflect backend and
streaming mode; change its assignment to compute dynamically using the existing
helpers: set STOP_SEQUENCE_TRIMMED = streaming and (not is_vllm()) so that
non-streaming is always False and streaming is False only for vllm (use
is_vllm() and is_sglang() where available). Update both class constants (e.g.,
in TestBasicCompletionSglang / TestBasicCompletionVllm) and the other
occurrences noted (lines around 64-81, 171, 205-228) to use this computed value
instead of True. Ensure tests refer to this variable everywhere assertions
expect stop-sequence trimming behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f87d8f37-13b1-4038-a041-9f47515cccf9

📥 Commits

Reviewing files that changed from the base of the PR and between b69b044 and cf10d40.

📒 Files selected for processing (1)
  • e2e_test/completions/test_basic.py

