
Server reruns same task multiple times #133

@kurtgdl

I deployed the model with:

from sagemaker.huggingface import HuggingFaceModel

# model_name, role and path_to_s3 are defined elsewhere.
deploy = HuggingFaceModel(
    name=model_name,
    role=role,
    code_location="abc",
    model_data=path_to_s3,
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
    model_server_workers=1,  # a single model server worker
)
emb = deploy.deploy(
    endpoint_name=model_name,
    initial_instance_count=1,
    instance_type="ml.c5.4xlarge",
    container_startup_health_check_timeout=300,  # seconds
)

The custom inference script was:

def model_fn(model_dir):
    processor = DataProcess() # A class that contains logic for processing each file.
    return processor

def predict_fn(data, model):
    text = model.process_file(data)
    return {"output": text}
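
One way to check whether the repeated runs come from the same request being delivered more than once is to log a fingerprint of the payload in predict_fn. The following is a minimal sketch, not the script actually deployed: `fingerprint` is a hypothetical helper, and it assumes the payload can be hashed via its string form. Depending on how the container configures logging, `print` may be needed instead of `logger.info`.

```python
import hashlib
import logging
import time

logger = logging.getLogger(__name__)


def fingerprint(data) -> str:
    """Return a short hash of the payload (hypothetical helper for log correlation)."""
    raw = data if isinstance(data, bytes) else str(data).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:12]


def predict_fn(data, model):
    # Same logic as the original predict_fn, plus start/end log lines
    # tagged with the payload fingerprint.
    tag = fingerprint(data)
    start = time.perf_counter()
    logger.info("predict_fn start payload=%s", tag)
    text = model.process_file(data)
    elapsed = time.perf_counter() - start
    logger.info("predict_fn done payload=%s elapsed=%.1fs", tag, elapsed)
    return {"output": text}
```

If several "start" lines show the same fingerprint for a single client call, the request itself is arriving more than once, rather than the handler looping internally.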

The input data is the content of a file, encoded as a base64 string.
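
For reference, a request body of this kind could be built roughly as follows. This is only a sketch: the `"inputs"` key and both helper names are assumptions, not taken from the actual client code. Note that base64 grows the data by about one third, so a 1.5 MB file becomes roughly a 2 MB string.

```python
import base64
import json


def build_payload(file_bytes: bytes) -> str:
    """Base64-encode the file content and wrap it in a JSON body.

    The "inputs" key is an assumption for illustration.
    """
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return json.dumps({"inputs": encoded})


def decode_payload(body: str) -> bytes:
    """Reverse build_payload: recover the original file bytes."""
    return base64.b64decode(json.loads(body)["inputs"])


if __name__ == "__main__":
    sample = bytes(1_500_000)  # stand-in for a 1.5 MB file
    body = build_payload(sample)
    print(f"file: {len(sample)} bytes, payload: {len(body)} bytes")
```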
It's strange that when the file is small (under 1 MB), the server runs model_fn and predict_fn once, and processing takes around 30 seconds. But when I send a larger file of around 1.5 MB, it runs model_fn and predict_fn multiple times, each run lasting around 2 minutes. I know this because a single request produces multiple copies of the following log output:

 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Preprocess time - 5.128383636474609 ms
 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Predict time - 162199.17178153992 ms
 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Postprocess time - 0.00762939453125 ms
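
To count how many times prediction ran and how long each run took, the "Predict time" lines can be pulled out of the exported log text. A small sketch, assuming the log format shown above; `predict_times_seconds` is a hypothetical helper name:

```python
import re
from typing import List

# Matches the "Predict time - <milliseconds> ms" part of a worker log line.
PREDICT_RE = re.compile(r"Predict time - ([0-9.]+) ms")


def predict_times_seconds(log_text: str) -> List[float]:
    """Return the duration in seconds of every predict call found in the log."""
    return [float(ms) / 1000.0 for ms in PREDICT_RE.findall(log_text)]


if __name__ == "__main__":
    sample = (
        "[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle"
        " - Predict time - 162199.17178153992 ms"
    )
    times = predict_times_seconds(sample)
    print(len(times), "predict call(s):", [round(t, 1) for t in times])
```

The length of the returned list is the number of predict runs recorded in that log excerpt.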

It's probably unorthodox to use an inference endpoint for a data processing job, but which configuration did I miss?

Related: aws/amazon-sagemaker-examples#1073
