Add huggingface/pytorch/tei/docker/1.9.3 #192

Open
alvarobartt wants to merge 6 commits into awslabs:main from alvarobartt:add-text-embeddings-inference-1.9.3

@alvarobartt
Contributor

Description of changes:

This PR adds the Text Embeddings Inference (TEI) v1.9.3 container, to be released on both AWS SageMaker and AWS EC2, following the latest upstream release at https://github.com/huggingface/text-embeddings-inference/releases/v1.9.3.

This PR is very similar to earlier releases such as #141 or #153, with some subtle differences:

  • The entrypoint.sh no longer lives in https://github.com/huggingface/text-embeddings-inference; it now lives in this repository.
  • The text-embeddings-router command in entrypoint.sh no longer requires the --port 8080 flag, because the PORT environment variable is now set to 8080 instead of 80.
  • The entrypoint.sh for both CPU and NVIDIA GPUs now runs the text-embeddings-router binary with exec so that the router process receives signals directly. Without exec, text-embeddings-router would run as a child of the shell rather than as PID 1, and so would not receive the signals sent to the container.
  • Both Dockerfiles drop the gRPC layer, which was being compiled and built even though it was never used.
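
To illustrate the exec point above, here is a minimal sketch that compares PIDs with and without exec. It is not taken from the PR's entrypoint.sh: the helper names (same_pid, with_exec, without_exec) are hypothetical, and `sh -c "echo \$\$"` merely stands in for text-embeddings-router.

```shell
# Illustration only: `sh -c "echo \$\$"` stands in for text-embeddings-router.

# same_pid OUTPUT: print "same" if the two PIDs in OUTPUT (one per line) match.
same_pid() {
    first=$(printf '%s\n' "$1" | sed -n 1p)
    second=$(printf '%s\n' "$1" | sed -n 2p)
    if [ "$first" = "$second" ]; then echo same; else echo different; fi
}

# With exec, the inner command replaces the wrapper shell and keeps its PID.
with_exec() {
    same_pid "$(sh -c 'echo $$; exec sh -c "echo \$\$"')"
}

# Without exec, the inner command runs as a child process with a new PID.
# The trailing `true` keeps shells from implicitly exec-ing the last command.
without_exec() {
    same_pid "$(sh -c 'echo $$; sh -c "echo \$\$"; true')"
}

with_exec      # prints: same
without_exec   # prints: different
```

In a container the entrypoint script starts as PID 1, so the "same" case corresponds to the router itself becoming PID 1 and receiving the signals delivered to the container.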

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Note

By submitting this PR, I disclose that all the code in this PR was written entirely by me, @alvarobartt, without the use of any coding assistants or third-party agentic tools.

@alvarobartt requested a review from a team as a code owner on May 18, 2026 15:08
@Jyothirmaikottu
Contributor

The final stage is declared as AS base, but the build system hardcodes --target sagemaker (release_utils.py:390), which is what causes the CI failure. Previous versions use FROM base AS sagemaker as the final stage; please add that here too, or rename AS base to AS sagemaker.

Affects both 1.9.3/gpu/Dockerfile (line 109) and 1.9.3/cpu/Dockerfile (line 64).
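
As a quick local check before pushing, a sketch like the following could confirm that a Dockerfile declares the stage the build system targets. The has_stage helper is hypothetical (it is not part of this repository or of release_utils.py) and only pattern-matches FROM ... AS lines; it does not invoke Docker.

```shell
# has_stage FILE STAGE: succeed if FILE declares a build stage named STAGE.
# Hypothetical helper for illustration; matches lines of the form
# `FROM <image> AS <stage>`, ignoring case and surrounding whitespace.
has_stage() {
    grep -Eiq "^[[:space:]]*FROM[[:space:]]+.*[[:space:]]+AS[[:space:]]+$2[[:space:]]*$" "$1"
}
```

For example, `has_stage 1.9.3/gpu/Dockerfile sagemaker || echo "no sagemaker stage"` would flag the situation described above, assuming it is run from the directory that contains the 1.9.3 folder.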

Comment thread huggingface/pytorch/tei/docker/1.9.3/cpu/Dockerfile Outdated
Comment thread huggingface/pytorch/tei/docker/1.9.3/gpu/Dockerfile Outdated
Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com>
@Jyothirmaikottu
Contributor

TEI 1.9.3 SM Test Failure — Root Cause Analysis

The CannotStartContainerError on the BAAI/bge-m3 test is caused by two issues in the new entrypoint.sh:

Issue 1: SageMaker passes serve as CMD argument

SageMaker launches containers with docker run <image> serve. In the 1.9.3 entrypoint, this gets passed through $@ to the router binary:

exec text-embeddings-router-80 "$@"   # $@ = "serve" from SageMaker

text-embeddings-router does not recognize serve as an argument, so it crashes immediately.

Issue 2: Missing --port 8080

Even without the serve issue, the router doesn't get --port 8080. The PORT=8080 env var in the Dockerfile is not read by the router binary; it needs the CLI flag explicitly. SageMaker health-checks on port 8080, gets no response, and marks the container as failed.

Compare with working 1.8.2

The 1.8.2 entrypoint (sagemaker-entrypoint-cuda-all.sh) hardcodes the args:

text-embeddings-router-80 --port 8080 --json-output

It never uses $@, so SageMaker's serve argument is safely ignored.
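
The difference between the two entrypoint styles can be sketched with a stub in place of the real binary. The names router_stub, entrypoint_forwarding, and entrypoint_hardcoded are hypothetical and exist only for this illustration; the stub just prints the arguments it receives.

```shell
# Stand-in for text-embeddings-router-80: prints its arguments (illustration only).
router_stub() { echo "router args: $*"; }

# 1.9.3-style entrypoint: forwards whatever the container was started with.
entrypoint_forwarding() { router_stub "$@"; }

# 1.8.2-style entrypoint: ignores its own arguments and hardcodes the router flags.
entrypoint_hardcoded() { router_stub --port 8080 --json-output; }

entrypoint_forwarding serve   # prints: router args: serve
entrypoint_hardcoded serve    # prints: router args: --port 8080 --json-output
```

With forwarding, the serve argument supplied by SageMaker reaches the router unchanged; with hardcoded arguments it is dropped before the router starts.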

Fix

Update huggingface/pytorch/tei/docker/1.9.3/gpu/entrypoint.sh — replace each exec line:

# Before (broken):
exec text-embeddings-router-80 "$@"

# After (fixed):
exec text-embeddings-router-80 --port 8080 --json-output

Apply the same change to all compute capability branches (75, 80, 90, 100, 120) and to the CPU entrypoint.

The CMD ["--json-output"] in the Dockerfile can then be removed since the entrypoint hardcodes the args.
