
torch.OutOfMemoryError: CUDA out of memory occurred when inferring an image or a video #262

@Iris-wmx

Description

Hi, I tried to run inference on an image using the following command, but unfortunately an out-of-memory error occurred.

```shell
vila-infer \
    --model-path /home/iris/ModelCkps/NVLA15B \
    --conv-mode vicuna_v1 \
    --text "Please describe the image" \
    --media /home/iris/Datasets/imgs/bike.jpeg
```

```
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
Traceback (most recent call last):
  File "/home/iris/anaconda3/envs/vila/bin/vila-infer", line 8, in <module>
    sys.exit(main())
  File "/home/iris/Projects/scene/VILA-main/llava/cli/infer.py", line 125, in main
    model = llava.load(args.model_path, model_base=None)
  File "/home/iris/Projects/scene/VILA-main/llava/entry.py", line 53, in load
    model = load_pretrained_model(model_path, model_name, model_base, **kwargs)[1]
  File "/home/iris/Projects/scene/VILA-main/llava/model/builder.py", line 142, in load_pretrained_model
    model.resize_token_embeddings(len(tokenizer))
  File "/home/iris/Projects/scene/VILA-main/llava/model/llava_arch.py", line 408, in resize_token_embeddings
    self.get_llm().resize_token_embeddings(embed_size)
  File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2116, in resize_token_embeddings
    model_embeds = self._resize_token_embeddings(new_num_tokens, pad_to_multiple_of, mean_resizing)
  File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2141, in _resize_token_embeddings
    new_embeddings = self._get_resized_embeddings(
  File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2295, in _get_resized_embeddings
    self._init_added_embeddings_weights_with_mean(
  File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2469, in _init_added_embeddings_weights_with_mean
    old_centered_embeddings = old_embeddings_weight - mean_embeddings
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.89 GiB. GPU 0 has a total capacity of 31.36 GiB of which 249.94 MiB is free. Including non-PyTorch memory, this process has 28.41 GiB memory in use. Of the allocated memory 27.83 GiB is allocated by PyTorch, and 1.01 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```

After setting `mean_resizing=False` in xxx/VILA-main/llava/model/llava_arch.py (line 408) and running `export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` in the shell, the torch.OutOfMemoryError occurred again.
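For reference, this is roughly how I set the allocator option (the allocator reads this environment variable at startup, so it has to be exported in the shell before launching `vila-infer`, not set from inside llava_arch.py):

```shell
# Must be exported before the Python process starts, so that
# PyTorch's CUDA caching allocator picks it up at initialization.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
# Then re-run the same inference command in this shell, e.g.:
# vila-infer --model-path /home/iris/ModelCkps/NVLA15B ...
```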

```
Traceback (most recent call last):
  File "/home/iris/anaconda3/envs/vila/bin/vila-infer", line 8, in <module>
    sys.exit(main())
  File "/home/iris/Projects/scene/VILA-main/llava/cli/infer.py", line 125, in main
    model = llava.load(args.model_path, model_base=None)
  File "/home/iris/Projects/scene/VILA-main/llava/entry.py", line 53, in load
    model = load_pretrained_model(model_path, model_name, model_base, **kwargs)[1]
  File "/home/iris/Projects/scene/VILA-main/llava/model/builder.py", line 143, in load_pretrained_model
    model.resize_token_embeddings(len(tokenizer))
  File "/home/iris/Projects/scene/VILA-main/llava/model/llava_arch.py", line 408, in resize_token_embeddings
    self.get_llm().resize_token_embeddings(embed_size, mean_resizing=False)  # "mean_resizing=False" added by iris
  File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2116, in resize_token_embeddings
    model_embeds = self._resize_token_embeddings(new_num_tokens, pad_to_multiple_of, mean_resizing)
  File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2141, in _resize_token_embeddings
    new_embeddings = self._get_resized_embeddings(
  File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2265, in _get_resized_embeddings
    new_embeddings = nn.Embedding(
  File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 169, in __init__
    torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.45 GiB. GPU 0 has a total capacity of 31.36 GiB of which 613.50 MiB is free. Including non-PyTorch memory, this process has 28.01 GiB memory in use. Of the allocated memory 27.50 GiB is allocated by PyTorch, and 18.13 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```

Can you help me solve this problem? Thanks!

Environment: NVIDIA GeForce RTX 5090 (32 GB), PyTorch 2.9.0, CUDA 12.8.
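In case it helps the discussion, a workaround I am considering is to build the enlarged embedding table on CPU, copy the old rows over, and only then move it to the GPU, so the resize never needs a large GPU allocation. Below is a minimal sketch of that technique with plain `torch.nn.Embedding` (hypothetical helper name, not the actual VILA or transformers code):

```python
import torch
import torch.nn as nn


def resize_embedding_on_cpu(old: nn.Embedding, new_num_embeddings: int) -> nn.Embedding:
    """Return an enlarged copy of `old` built on CPU (caller moves it to GPU later)."""
    # nn.Embedding allocates on CPU by default, so this avoids the large
    # torch.empty(...) on the GPU that triggered the OOM in the traceback.
    new = nn.Embedding(new_num_embeddings, old.embedding_dim)
    with torch.no_grad():
        n = min(old.num_embeddings, new_num_embeddings)
        new.weight[:n] = old.weight[:n].cpu()  # copy existing rows over
    return new


# Demo with small sizes; a real model would then do new.to("cuda").
old = nn.Embedding(10, 4)
new = resize_embedding_on_cpu(old, 12)
```

The extra rows here keep the default random initialization, which matches what `mean_resizing=False` would do.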
