
test(completions): add E2E tests for /v1/completions gRPC endpoint#1021

Open
vschandramourya wants to merge 7 commits into main from mourya/cmp-6

Conversation

Collaborator

@vschandramourya vschandramourya commented Apr 1, 2026

Summary

Add E2E tests for the /v1/completions gRPC endpoint, covering non-streaming and streaming paths.

PR 7 in the Completions API gRPC pipeline series.

What changed

New files

  • e2e_test/completions/__init__.py — module docstring
  • e2e_test/completions/test_basic.py — 13 tests across two classes

TestCompletionBasic (non-streaming)

  • Basic response structure (id, object, model, choices, usage)
  • max_tokens length limiting (finish_reason: "length")
  • Stop sequences (finish_reason: "stop", text trimmed)
  • echo=True (prompt prepended to output)
  • suffix (appended to output)
  • Parallel sampling (n=1, n=2)
  • Usage statistics validation
  • echo=True with max_tokens=0 (returns just the prompt)
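The usage-statistics check in the bullets above can be sketched as a small standalone helper. This is an illustrative sketch, not code from the PR: the function name and the hard-coded token counts are hypothetical, and the field relationships follow the OpenAI completions response schema (`total_tokens = prompt_tokens + completion_tokens`).

```python
def assert_usage_consistent(prompt_tokens, completion_tokens, total_tokens):
    """Validate the usage block of a completions response (hypothetical helper)."""
    # A non-empty prompt always consumes at least one token.
    assert prompt_tokens > 0
    # echo=True with max_tokens=0 legitimately yields zero completion tokens.
    assert completion_tokens >= 0
    # Per the OpenAI schema, total_tokens is the sum of the other two fields.
    assert total_tokens == prompt_tokens + completion_tokens

assert_usage_consistent(7, 16, 23)
assert_usage_consistent(5, 0, 5)  # the echo=True, max_tokens=0 case
```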

TestCompletionStreaming (streaming)

  • Basic SSE chunks with text deltas and single finish_reason
  • Stop sequences in streaming mode
  • Full text collection from stream
  • echo=True with max_tokens=0 (prompt emitted from Complete path)

Prior PRs in series

Test plan

  • pytest --collect-only — 13 tests collected, no import errors
  • All 11 original tests verified against live Llama-3.1-8B-Instruct

Summary by CodeRabbit

  • Tests
    • Added comprehensive GPU-marked E2E tests for Completions covering non-streaming and streaming flows, stop/echo/suffix behavior, max_tokens truncation, parallel sampling, response structure, finish reasons, and usage metrics.
  • Documentation
    • Added top-level package documentation describing the scope and scenarios covered by the Completions E2E tests.
  • Chores
    • CI updated to detect Completions changes and run GPU E2E jobs for Completions.

Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>
…streaming

Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Walkthrough

Adds a new e2e completions test package with non‑streaming and streaming pytest cases for the OpenAI /v1/completions API and updates CI to detect completion-related changes and run GPU E2E jobs for the new tests.

Changes

  • E2E Test Package (e2e_test/completions/__init__.py): New package init with a top-level docstring describing the scope of the completions E2E tests.
  • E2E Completion Tests (e2e_test/completions/test_basic.py): New pytest module adding TestCompletionBasic (non-streaming validations: response fields, max_tokens, stop/echo/suffix, parallel n sampling, usage checks) and TestCompletionStreaming (stream delta collection, finish_reason tracking, streaming stop/echo behaviors, runtime skips for known backend limits).
  • CI Workflow (.github/workflows/pr-test-rust.yml): Added completions output to detect-changes, registered the e2e-1gpu-completions GPU E2E job (matrix over engines, e2e_test/completions test dir), and wired the job into the finish/dependency and failure logic.


Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

grpc

Suggested reviewers

  • CatherineSue
  • key4ng
  • XinyueZhang369

Poem

🐇 I hopped through prompts and streams tonight,
Collected deltas in the pale test light,
Stops and echoes twirled in code,
Tokens tallied on my road,
A cheerful rabbit cheers—CI green and bright!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title clearly and specifically describes the main change: adding E2E tests for the /v1/completions gRPC endpoint, which aligns directly with the changeset.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.



@github-actions github-actions bot added the tests Test changes label Apr 1, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive suite of end-to-end tests for the OpenAI Completions API, covering both streaming and non-streaming modes. The tests validate core functionalities such as stop sequences, echo, suffixes, and parallel sampling across different backends. Review feedback focuses on enhancing the robustness of streaming tests by ensuring consistent assertions on response object types and finish reasons, and suggests refactoring repetitive streaming logic into shared helper functions.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@e2e_test/completions/test_basic.py`:
- Around line 178-188: Extract the repeated loop that consumes a streaming
response into a reusable helper (e.g., parse_stream or
collect_texts_and_reasons) that takes the stream iterable and returns the
accumulated texts list and finish_reasons list; replace the inline loops that
iterate over stream and inspect chunk.object, chunk.choices, choice.text and
choice.finish_reason (the blocks that build texts and finish_reasons) with calls
to this helper in test_basic.py (the occurrences around lines where
texts/finish_reasons are built). Ensure the helper asserts chunk.object ==
"text_completion" as before and preserves ordering and behavior.
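One possible shape for the helper this comment suggests: it consumes the stream once, asserts `chunk.object == "text_completion"` on every chunk, and returns the accumulated texts and finish reasons in order. The name `collect_texts_and_reasons` is one of the names the comment proposes; the `SimpleNamespace` objects below are stand-ins for real OpenAI-client chunk objects, used only so the sketch is self-contained.

```python
from types import SimpleNamespace

def collect_texts_and_reasons(stream):
    """Drain a completions stream, returning (texts, finish_reasons) in order."""
    texts, finish_reasons = [], []
    for chunk in stream:
        # Preserve the existing per-chunk object-type assertion.
        assert chunk.object == "text_completion"
        for choice in chunk.choices:
            if choice.text:
                texts.append(choice.text)
            if choice.finish_reason is not None:
                finish_reasons.append(choice.finish_reason)
    return texts, finish_reasons

# Fake stream standing in for the chunks an api_client stream would yield.
fake_stream = [
    SimpleNamespace(object="text_completion",
                    choices=[SimpleNamespace(text="1, ", finish_reason=None)]),
    SimpleNamespace(object="text_completion",
                    choices=[SimpleNamespace(text="2", finish_reason="stop")]),
]
texts, reasons = collect_texts_and_reasons(fake_stream)
assert texts == ["1, ", "2"]
assert reasons == ["stop"]
```

Each test would then call the helper and assert on the joined text and on `len(finish_reasons)`, rather than repeating the loop inline.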

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 75e4c2ce-58ff-4c75-8df4-c9691c6e5d4e

📥 Commits

Reviewing files that changed from the base of the PR and between 4654e67 and de67579.

📒 Files selected for processing (2)
  • e2e_test/completions/__init__.py
  • e2e_test/completions/test_basic.py


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: de67579a90


full_text = "".join(c.choices[0].text for c in stream if c.choices and c.choices[0].text)

assert len(full_text) > 0
assert "Paris" in full_text
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Avoid hard-coding exact token in streaming concatenation test

This test is meant to validate that streaming chunks concatenate correctly, but assert "Paris" in full_text couples it to one exact model phrasing. In the inspected e2e_test/completions/test_basic.py path (engines sglang/vllm), valid outputs like different casing or alternate wording can make CI fail even when streaming assembly is correct, so this introduces unnecessary flakiness in E2E coverage.

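One way to relax the assertion Codex flags: check case-insensitively and accept either the city or the country, so the test still verifies that streamed deltas concatenate into coherent text without pinning one exact phrasing. The helper name and accepted substrings are illustrative, not taken from the PR.

```python
def assert_mentions_answer(full_text):
    """Loosely verify the assembled stream answers the prompt (hypothetical helper)."""
    assert len(full_text) > 0, "stream produced no text"
    lowered = full_text.lower()
    # Accept common valid phrasings instead of one exact token.
    assert "paris" in lowered or "france" in lowered, (
        f"Expected a plausible answer in streamed text: {full_text!r}"
    )

assert_mentions_answer("The capital of France is Paris.")
assert_mentions_answer("PARIS is the capital.")
```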

- Extract _collect_stream helper to deduplicate streaming loop
- Assert chunk.object == "text_completion" on every streaming chunk
- Strengthen stop sequence assertions (exactly 1 finish_reason + text)
- Accept "stop" or "length" for max_tokens=0 finish_reason
- Add finish_reason assertion to non-streaming max_tokens=0 test

Refs: #1021
Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@e2e_test/completions/test_basic.py`:
- Around line 219-233: The test test_streaming_collects_full_text has a flaky
hard assertion that "Paris" appears in the completion; either remove that
specific content assertion or weaken it (e.g., assert for presence of "France"
or a lowercase-agnostic substring) or add a clarifying comment explaining why
the exact string "Paris" is required; update the
test_streaming_collects_full_text function accordingly and reference the
streaming helper _collect_stream if you need to inspect how the output is
gathered, and consider relying on or linking to test_streaming_basic which
already asserts non-empty output.
- Around line 9-13: The file declares an unused logger variable (logger =
logging.getLogger(__name__)) which creates dead code; remove that declaration
line from e2e_test/completions/test_basic.py or, if you want to keep it for
future debugging, add an inline comment next to the logger symbol explaining
it's intentionally reserved (e.g., "# kept for debug logging in future tests")
so linters and reviewers understand it's intentional.
- Around line 214-217: The assertion that the stop delimiter ("," ) is absent
from the concatenated streaming output is brittle because vLLM may not trim stop
sequences in streaming mode while SGLang does; update the test that checks
finish_reasons and full_text so it conditionally validates the absence of the
stop delimiter only for backends that trim stops (i.e., skip or relax the assert
"," not in full_text when running against vLLM). Locate the variables
finish_reasons and full_text in the test (e.g., in test_basic.py) and add a
backend/runtime check (detect vLLM vs SGLang via your existing backend flag or
client config) to either skip the comma check for vLLM or assert its
presence/acceptance accordingly.
- Around line 66-78: The test test_non_streaming_stop_sequence assumes stop
sequences are trimmed, but backends like sglang and vllm do not trim in
non-streaming mode; add a boolean flag STOP_SEQUENCE_TRIMMED (set to False for
those backends) and change the assertions on response.choices[0].text (and any
full_text checks) to branch: if STOP_SEQUENCE_TRIMMED assert the stop char (",")
is not in the text, else assert the text endswith the stop char (or contains the
stop as a suffix) so the test passes for both trimming and non-trimming engines.
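The branching assertion this comment describes can be factored into one helper that both the non-streaming and streaming tests share. This is a sketch under the comment's assumptions: the `trimmed` flag would be derived per backend (False for sglang/vllm in non-streaming mode), and the function name is hypothetical.

```python
def check_stop_sequence(text, stop, trimmed):
    """Assert stop-sequence handling for trimming vs non-trimming backends."""
    if trimmed:
        # Trimming backends strip the stop sequence from the output entirely.
        assert stop not in text, (
            f"Stop sequence {stop!r} should not appear in output: {text}"
        )
    else:
        # Non-trimming backends (e.g. sglang/vllm non-streaming) leave it as a suffix.
        assert text.endswith(stop), (
            f"Stop sequence {stop!r} should be the suffix of output: {text}"
        )

check_stop_sequence("Count: 1", ",", trimmed=True)
check_stop_sequence("Count: 1,", ",", trimmed=False)
```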

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e7f21f6b-d208-44e8-8e03-4537e7c0f78a

📥 Commits

Reviewing files that changed from the base of the PR and between de67579 and 0041ae8.

📒 Files selected for processing (1)
  • e2e_test/completions/test_basic.py

@@ -0,0 +1,252 @@
"""Basic tests for OpenAI Completions API (/v1/completions).
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test run at all in CI? In pr-test-rust.yml today, we define which folders to run, and since this is a new folder I don't see any execution of it yet.

Collaborator Author


Good catch, wired it up now. Added the e2e-1gpu-completions job to pr-test-rust.yml, along with some cleanup in the test file.

Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>
Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>
@github-actions github-actions bot added the ci CI/CD configuration changes label Apr 8, 2026

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
e2e_test/completions/test_basic.py (1)

19-20: ⚠️ Potential issue | 🟠 Major

Stop-sequence trimming expectations are hardcoded and backend-inaccurate.

With @pytest.mark.engine("sglang", "vllm"), a fixed STOP_SEQUENCE_TRIMMED = True makes stop assertions incorrect for known backend behavior differences (non-streaming and streaming), which can fail CI on vLLM/sglang paths. Please switch to backend/mode-conditional assertions (same pattern used in e2e_test/chat_completions/test_openai_server.py) instead of a static class constant.

Based on learnings: in lightseekorg/smg PR #606, non-streaming stop trimming is False for both is_vllm() and is_sglang(), while streaming overrides only is_vllm() to False.

Also applies to: 77-80, 170-171, 221-227

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e_test/completions/test_basic.py` around lines 19 - 20, Replace the
hardcoded STOP_SEQUENCE_TRIMMED = True with backend-and-mode conditional
assertions: remove the static constant and instead determine expected trimming
by calling is_vllm() and is_sglang() (and checking streaming mode where
applicable) as done in e2e_test/chat_completions/test_openai_server.py;
implement logic so non-streaming returns False for both is_vllm() and
is_sglang(), and streaming only overrides is_vllm() to False, and update the
other occurrences (the other STOP_SEQUENCE_TRIMMED checks in this test) to use
the same conditional pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@e2e_test/completions/test_basic.py`:
- Around line 19-20: Replace the hardcoded STOP_SEQUENCE_TRIMMED = True with
backend-and-mode conditional assertions: remove the static constant and instead
determine expected trimming by calling is_vllm() and is_sglang() (and checking
streaming mode where applicable) as done in
e2e_test/chat_completions/test_openai_server.py; implement logic so
non-streaming returns False for both is_vllm() and is_sglang(), and streaming
only overrides is_vllm() to False, and update the other occurrences (the other
STOP_SEQUENCE_TRIMMED checks in this test) to use the same conditional pattern.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d8c9282a-c93a-46ac-82ab-ba56cc365ab6

📥 Commits

Reviewing files that changed from the base of the PR and between 0041ae8 and b69b044.

📒 Files selected for processing (2)
  • .github/workflows/pr-test-rust.yml
  • e2e_test/completions/test_basic.py

…y stop seq)

Signed-off-by: VS Chandra Mourya <msrinivasa@together.ai>

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
e2e_test/completions/test_basic.py (1)

19-19: ⚠️ Potential issue | 🟠 Major

Stop-sequence trimming expectation is hardcoded and will fail on backend variants.

STOP_SEQUENCE_TRIMMED is fixed to True in both classes, but test execution is parametrized for sglang and vllm, whose stop-trimming behavior differs by mode. This makes the assertions brittle and backend-incorrect.

♻️ Proposed fix
 class TestCompletionBasic:
     """Tests for OpenAI-compatible /v1/completions API (non-streaming)."""

     STOP_SEQUENCE_TRIMMED = True

-    def test_non_streaming_stop_sequence(self, model, api_client):
+    def test_non_streaming_stop_sequence(self, model, api_client, setup_backend):
         """Test that stop sequences cause the model to stop generating."""
+        backend_name, *_ = setup_backend

         response = api_client.completions.create(
             model=model,
             prompt="Count: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10",
             max_tokens=200,
             temperature=0,
             stop=[","],
         )

         assert response.choices[0].finish_reason == "stop"
         text = response.choices[0].text
-        if self.STOP_SEQUENCE_TRIMMED:
+        stop_sequence_trimmed = (
+            False if backend_name in ("sglang", "vllm") else self.STOP_SEQUENCE_TRIMMED
+        )
+        if stop_sequence_trimmed:
             assert "," not in text, f"Stop sequence ',' should not appear in output: {text}"
         else:
             assert text.endswith(","), f"Stop sequence ',' should be the suffix of output: {text}"


 class TestCompletionStreaming:
     """Tests for streaming /v1/completions API."""

     STOP_SEQUENCE_TRIMMED = True

-    def test_streaming_stop_sequence(self, model, api_client):
+    def test_streaming_stop_sequence(self, model, api_client, setup_backend):
         """Test that stop sequences work in streaming mode."""
+        backend_name, *_ = setup_backend

         stream = api_client.completions.create(
             model=model,
             prompt="Count: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10",
             max_tokens=200,
             temperature=0,
             stop=[","],
             stream=True,
         )

         full_text, finish_reasons = self._collect_stream(stream)

         assert len(finish_reasons) == 1
         assert finish_reasons[0] == "stop"
-        if self.STOP_SEQUENCE_TRIMMED:
+        stop_sequence_trimmed = (
+            False if backend_name == "vllm" else self.STOP_SEQUENCE_TRIMMED
+        )
+        if stop_sequence_trimmed:
             assert "," not in full_text, (
                 f"Stop sequence ',' should not appear in output: {full_text}"
             )
         else:
             assert full_text.endswith(","), (
                 f"Stop sequence ',' should be the suffix of output: {full_text}"
             )

Based on learnings: in lightseekorg/smg PR #606, non-streaming stop trimming is False for both is_vllm() and is_sglang(), while streaming stop trimming is False only for is_vllm() (the asymmetry is intentional).

Also applies to: 64-81, 171-171, 205-228

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e_test/completions/test_basic.py` at line 19, STOP_SEQUENCE_TRIMMED is
hardcoded true but must reflect backend and streaming mode; change its
assignment to compute dynamically using the existing helpers: set
STOP_SEQUENCE_TRIMMED = streaming and (not is_vllm()) so that non-streaming is
always False and streaming is False only for vllm (use is_vllm() and is_sglang()
where available). Update both class constants (e.g., in
TestBasicCompletionSglang / TestBasicCompletionVllm) and the other occurrences
noted (lines around 64-81, 171, 205-228) to use this computed value instead of
True. Ensure tests refer to this variable everywhere assertions expect
stop-sequence trimming behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@e2e_test/completions/test_basic.py`:
- Line 19: STOP_SEQUENCE_TRIMMED is hardcoded true but must reflect backend and
streaming mode; change its assignment to compute dynamically using the existing
helpers: set STOP_SEQUENCE_TRIMMED = streaming and (not is_vllm()) so that
non-streaming is always False and streaming is False only for vllm (use
is_vllm() and is_sglang() where available). Update both class constants (e.g.,
in TestBasicCompletionSglang / TestBasicCompletionVllm) and the other
occurrences noted (lines around 64-81, 171, 205-228) to use this computed value
instead of True. Ensure tests refer to this variable everywhere assertions
expect stop-sequence trimming behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f87d8f37-13b1-4038-a041-9f47515cccf9

📥 Commits

Reviewing files that changed from the base of the PR and between b69b044 and cf10d40.

📒 Files selected for processing (1)
  • e2e_test/completions/test_basic.py

