
Running auto caption fails with a tensor size mismatch #25

Description

@ssuukk

When trying to run auto caption, the script fails with:

```
Windows detected, using asyncio.WindowsSelectorEventLoopPolicy
starting
input_dir:  input
Downloading model to .cache/model_base_caption_capfilt_large.pth... please wait
Model cached to: .cache/model_base_caption_capfilt_large.pth
Downloading (…)solve/main/vocab.txt: 100%|██████████████████████████████| 232k/232k [00:00<00:00, 6.17MB/s]
Downloading (…)okenizer_config.json: 100%|██████████████████████████████| 28.0/28.0 [00:00<00:00, 14.0kB/s]
Downloading (…)lve/main/config.json: 100%|█████████████████████████████████| 570/570 [00:00<00:00, 228kB/s]
load checkpoint from .cache/model_base_caption_capfilt_large.pth
loading model to cuda
working image:  input\00012-1722407061-gigapixel-standard-height-1024px.jpg
Traceback (most recent call last):
  File ".\scripts\auto_caption.py", line 217, in <module>
    asyncio.run(main(opt))
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\asyncio\runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\asyncio\base_events.py", line 608, in run_until_complete
    return future.result()
  File ".\scripts\auto_caption.py", line 157, in main
    captions = blip_decoder.generate(image, sample=sample, num_beams=16, min_length=opt.min_length, \
  File "scripts/BLIP\models\blip.py", line 156, in generate
    outputs = self.text_decoder.generate(input_ids=input_ids,
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\transformers\generation\utils.py", line 1524, in generate
    return self.beam_search(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\transformers\generation\utils.py", line 2810, in beam_search
    outputs = self(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "scripts/BLIP\models\med.py", line 886, in forward
    outputs = self.bert(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "scripts/BLIP\models\med.py", line 781, in forward
    encoder_outputs = self.encoder(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "scripts/BLIP\models\med.py", line 445, in forward
    layer_outputs = layer_module(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "scripts/BLIP\models\med.py", line 361, in forward
    cross_attention_outputs = self.crossattention(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "scripts/BLIP\models\med.py", line 277, in forward
    self_outputs = self.self(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "scripts/BLIP\models\med.py", line 178, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (16) must match the size of tensor b (256) at non-singleton dimension 0
(dl) PS D:\Projekty\EveryDream>
```
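
For anyone hitting the same thing: the 16 vs 256 in the error looks like a double beam-search expansion of the encoder states (16 beams applied twice, 16 × 16 = 256), which makes the cross-attention batch dimensions non-broadcastable. Here is a minimal sketch that reproduces the broadcast failure in isolation; every shape except the two batch sizes is assumed for illustration, not taken from the log:

```python
# Minimal sketch of the failing shape arithmetic. Beam search expands the
# text queries once to batch 16 (1 image * 16 beams), but the encoder
# hidden states end up at batch 256 (expanded by 16 twice), so the
# cross-attention matmul cannot broadcast. Head/seq/dim sizes below are
# assumed BERT-base-like values, not taken from the issue.
import torch

num_beams = 16
heads, q_len, k_len, head_dim = 12, 5, 577, 64

query_layer = torch.randn(num_beams, heads, q_len, head_dim)            # batch 16
key_layer = torch.randn(num_beams * num_beams, heads, k_len, head_dim)  # batch 256

try:
    torch.matmul(query_layer, key_layer.transpose(-1, -2))
except RuntimeError as e:
    # The size of tensor a (16) must match the size of tensor b (256)
    # at non-singleton dimension 0
    print(e)
```

If that diagnosis is right, the likely culprit is a transformers release newer than the one the BLIP repo pins (its requirements.txt pins transformers==4.15.0): newer versions of `generate` repeat-interleave tensor model kwargs such as `encoder_hidden_states` by `num_beams` themselves, while BLIP's blip.py already repeats the image embeddings before calling `generate`, so the states get expanded twice. Downgrading transformers, or removing one of the two expansions, would be worth trying.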
