Based on the Python script below, we’re unable to deploy the model because `diffusers` ≥ 0.28 no longer accepts the `"auto"` device-map strategy. Consequently, the GPU isn’t detected when the endpoint starts.
```python
import io

import boto3
import sagemaker
from PIL import Image
from sagemaker.huggingface import HuggingFaceModel

iam = boto3.client("iam")
role = iam.get_role(RoleName="my-ml-sagemaker-role")["Role"]["Arn"]

# Hub model configuration – see https://huggingface.co/models
hub = {
    "HF_MODEL_ID": "stable-diffusion-v1-5/stable-diffusion-v1-5",
    "HF_TASK": "text-to-image",
}

# Create a Hugging Face model
huggingface_model = HuggingFaceModel(
    transformers_version="4.49.0",
    pytorch_version="2.6.0",
    py_version="py312",
    env=hub,
    role=role,
)

# Deploy the model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,  # number of instances
    instance_type="ml.g4dn.4xlarge",  # EC2 instance type
)

image_bytes = predictor.predict({"inputs": "Astronaut riding a horse"})

# Display the generated image with PIL
image = Image.open(io.BytesIO(image_bytes))
```
```bash
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name huggingface-pytorch-inference-2025-05-14-17-08-18-830 \
  --body "fileb://input_file.txt" output_file.txt
```

This returns:

```
An error occurred (ModelError) when calling the InvokeEndpoint operation:
Received client error (400) from primary with message:
{
  "code": 400,
  "type": "InternalServerException",
  "message": "auto not supported. Supported strategies are: balanced"
}
```
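To reproduce the failing call from Python instead of the CLI (for example, to inspect the error body programmatically), a minimal sketch using boto3's `sagemaker-runtime` `invoke_endpoint` is shown below. It assumes `input_file.txt` contains the same JSON payload that `predictor.predict()` sends, i.e. `{"inputs": "<prompt>"}`; the helper names `build_invoke_request` and `invoke` are hypothetical. The client is passed in rather than created inside, so the sketch runs without AWS credentials.

```python
import json
from typing import Any, Dict


def build_invoke_request(endpoint_name: str, prompt: str) -> Dict[str, Any]:
    """Build keyword arguments for sagemaker-runtime InvokeEndpoint.

    Assumption: the payload mirrors predictor.predict({"inputs": prompt}).
    """
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps({"inputs": prompt}),
    }


def invoke(client: Any, endpoint_name: str, prompt: str) -> bytes:
    """Send the request with a sagemaker-runtime client and return the raw body."""
    response = client.invoke_endpoint(**build_invoke_request(endpoint_name, prompt))
    # The response "Body" is a streaming object; read() returns the raw bytes.
    return response["Body"].read()
```

In practice, `client` would be `boto3.client("sagemaker-runtime")`; wrapping the call in `try`/`except client.exceptions.ModelError` exposes the same 400 message shown above.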
It looks like the SageMaker Hugging Face inference toolkit needs an update? It uses `balanced` only if there are more than 2 GPUs:
https://github.com/aws/sagemaker-huggingface-inference-toolkit/blob/main/src/sagemaker_huggingface_inference_toolkit/diffusers_utils.py#L43
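For illustration, placement logic that never hands `"auto"` to diffusers could look like the sketch below. `choose_placement` is a hypothetical helper, not part of the toolkit or of diffusers; it assumes, per the error message, that diffusers pipelines accept only `device_map="balanced"`, and its multi-GPU threshold (`> 1`) is an illustrative choice rather than the toolkit's actual rule.

```python
from typing import Optional, Tuple


def choose_placement(device: str, gpu_count: int) -> Tuple[Optional[str], str]:
    """Hypothetical helper returning (device_map, target_device).

    device_map would be passed to the pipeline's from_pretrained();
    target_device would be used with .to() when device_map is None.
    """
    if device == "cuda" and gpu_count > 1:
        # Several GPUs: shard pipeline components with the only supported strategy.
        return "balanced", "cuda"
    if device == "cuda" and gpu_count == 1:
        # Single GPU: load without a device_map, then move the whole pipeline.
        return None, "cuda"
    # No usable GPU: stay on CPU.
    return None, "cpu"
```

On a single-GPU instance such as `ml.g4dn.4xlarge`, `choose_placement("cuda", 1)` returns `(None, "cuda")`, which corresponds to loading the pipeline without a `device_map` and then calling `.to("cuda")` on it.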
This issue might be related, so I'm linking it: huggingface/diffusers#11555
Any guidance on how to resolve this SageMaker deployment issue would be greatly appreciated.
Thank you!