Hi, I tried to run inference on an image with the following command, but unfortunately an out-of-memory error occurred.
vila-infer \
    --model-path /home/iris/ModelCkps/NVLA15B \
    --conv-mode vicuna_v1 \
    --text "Please describe the image" \
    --media /home/iris/Datasets/imgs/bike.jpeg
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
Traceback (most recent call last):
File "/home/iris/anaconda3/envs/vila/bin/vila-infer", line 8, in <module>
sys.exit(main())
File "/home/iris/Projects/scene/VILA-main/llava/cli/infer.py", line 125, in main
model = llava.load(args.model_path, model_base=None)
File "/home/iris/Projects/scene/VILA-main/llava/entry.py", line 53, in load
model = load_pretrained_model(model_path, model_name, model_base, **kwargs)[1]
File "/home/iris/Projects/scene/VILA-main/llava/model/builder.py", line 142, in load_pretrained_model
model.resize_token_embeddings(len(tokenizer))
File "/home/iris/Projects/scene/VILA-main/llava/model/llava_arch.py", line 408, in resize_token_embeddings
self.get_llm().resize_token_embeddings(embed_size) # , mean_resizing=False add “mean_resizing=False” by iris
File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2116, in resize_token_embeddings
model_embeds = self._resize_token_embeddings(new_num_tokens, pad_to_multiple_of, mean_resizing)
File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2141, in _resize_token_embeddings
new_embeddings = self._get_resized_embeddings(
File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2295, in _get_resized_embeddings
self._init_added_embeddings_weights_with_mean(
File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2469, in _init_added_embeddings_weights_with_mean
old_centered_embeddings = old_embeddings_weight - mean_embeddings
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.89 GiB. GPU 0 has a total capacity of 31.36 GiB of which 249.94 MiB is free. Including non-PyTorch memory, this process has 28.41 GiB memory in use. Of the allocated memory 27.83 GiB is allocated by PyTorch, and 1.01 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
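For reference, the first OOM seems to come from the mean-based resizing mentioned in the warning above. A minimal sketch of that step, as far as I can tell from the transformers code path in the traceback (the dimensions below are placeholders, not my model's actual sizes):

import torch

# _init_added_embeddings_weights_with_mean centers the old embedding matrix
# before estimating its covariance, which materializes a second tensor the
# same size as the full embedding table on the GPU.
vocab_size, hidden_size = 32000, 5120  # placeholder sizes, not the real ones
old_embeddings_weight = torch.empty(vocab_size, hidden_size, device="cuda")
mean_embeddings = old_embeddings_weight.mean(dim=0)
old_centered_embeddings = old_embeddings_weight - mean_embeddings  # the 2.89 GiB allocation fails here

With the checkpoint weights already occupying most of the 32 GiB card, that extra buffer is apparently what pushes it over.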
After setting mean_resizing=False in xxx/VILA-main/llava/model/llava_arch.py (line 408) and running export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, the torch.OutOfMemoryError occurred again.
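For clarity, this is the exact change I made (a minimal sketch; the call is the one at line 408 shown in the traceback below, and the export was run in the shell before invoking vila-infer):

# llava/model/llava_arch.py, line 408, my local edit:
# skip the mean/covariance-based initialization of the new embedding rows
self.get_llm().resize_token_embeddings(embed_size, mean_resizing=False)

# in the shell, before running vila-infer:
#   export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True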
Traceback (most recent call last):
File "/home/iris/anaconda3/envs/vila/bin/vila-infer", line 8, in <module>
sys.exit(main())
File "/home/iris/Projects/scene/VILA-main/llava/cli/infer.py", line 125, in main
model = llava.load(args.model_path, model_base=None)
File "/home/iris/Projects/scene/VILA-main/llava/entry.py", line 53, in load
model = load_pretrained_model(model_path, model_name, model_base, **kwargs)[1]
File "/home/iris/Projects/scene/VILA-main/llava/model/builder.py", line 143, in load_pretrained_model
model.resize_token_embeddings(len(tokenizer))
File "/home/iris/Projects/scene/VILA-main/llava/model/llava_arch.py", line 408, in resize_token_embeddings
self.get_llm().resize_token_embeddings(embed_size, mean_resizing=False) # , mean_resizing=False add “mean_resizing=False” by iris
File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2116, in resize_token_embeddings
model_embeds = self._resize_token_embeddings(new_num_tokens, pad_to_multiple_of, mean_resizing)
File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2141, in _resize_token_embeddings
new_embeddings = self._get_resized_embeddings(
File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2265, in _get_resized_embeddings
new_embeddings = nn.Embedding(
File "/home/iris/anaconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 169, in __init__
torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.45 GiB. GPU 0 has a total capacity of 31.36 GiB of which 613.50 MiB is free. Including non-PyTorch memory, this process has 28.01 GiB memory in use. Of the allocated memory 27.50 GiB is allocated by PyTorch, and 18.13 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Can you help me solve this problem? Thanks!!
Using an NVIDIA GeForce RTX 5090 (32 GB), PyTorch 2.9.0, CUDA 12.8.