Support load_lora_weights in inference API deploy #131

@haktan-suren

Currently there is no way to call load_lora_weights when deploying a model to an inference endpoint:

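import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# IAM role with permissions to create an endpoint
# (get_execution_role assumes this runs inside a SageMaker notebook or Studio)
role = sagemaker.get_execution_role()

# Hub model configuration
hub = {
    'HF_MODEL_ID': 'black-forest-labs/FLUX.1-dev',
    'HF_TASK': 'text-to-image',
    'HF_TOKEN': 'TOKEN'  # placeholder for a real Hugging Face access token
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,                       # configuration for loading model from Hub
    role=role,                     # IAM role with permissions to create an endpoint
    transformers_version="4.26",   # Transformers version used
    pytorch_version="1.13",        # PyTorch version used
    py_version='py39',             # Python version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge"
)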

Maybe hub could accept a new env var, such as "HF_LORA_MODEL", naming the LoRA weights to load on top of the base model.

A similar implementation exists here: aws-samples/sagemaker-stablediffusion-quick-kit@bd37fe9...2d1c43b
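
To make the request concrete, here is a rough sketch of how the proposed variable might be consumed. It is not the toolkit's actual behavior: HF_LORA_MODEL is the hypothetical variable suggested above, lora_repo_from_env is a made-up helper name, and the sketch assumes a custom inference.py whose model_fn(model_dir) hook is picked up by the container, with a diffusers version recent enough to load FLUX and to provide load_lora_weights on the pipeline. The dtype and device are also assumptions and would need adjusting for the instance type.

```python
import os
from typing import Mapping, Optional


def lora_repo_from_env(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the LoRA repo id from the hypothetical HF_LORA_MODEL variable, or None."""
    value = env.get("HF_LORA_MODEL", "").strip()
    return value or None


def model_fn(model_dir: str):
    """Load the base pipeline, then apply LoRA weights if HF_LORA_MODEL is set."""
    # Imported lazily so the helper above can be used without torch/diffusers installed.
    import torch
    from diffusers import DiffusionPipeline

    # Fall back to the local model directory when HF_MODEL_ID is not set.
    base = os.environ.get("HF_MODEL_ID", model_dir)
    # bfloat16 is an assumption; pick a dtype the target GPU supports.
    pipe = DiffusionPipeline.from_pretrained(base, torch_dtype=torch.bfloat16)

    lora_repo = lora_repo_from_env()
    if lora_repo:
        # Only reached when the (hypothetical) variable is present and non-empty.
        pipe.load_lora_weights(lora_repo)

    return pipe.to("cuda")
```

With something like this in place, the deployment would only need one extra entry in hub, e.g. 'HF_LORA_MODEL': '<lora-repo-id>', and deployments that leave it unset would behave as they do today.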
