Contextual Bandits LinUCB errors during training due to mismatched action and state tensor sizes #125

@kuds

Description

Issue

When running contextual_bandits_tutorial.ipynb, LinUCB fails to train due to an issue with tensors being on different devices.

However, when training strictly on a CPU, you get an error when the optimize method is called due to a mismatch in tensor sizes between the action and the state.

Batch State Shape - torch.Size([1, 16])
Batch Action Shape - torch.Size([1, 1, 10])
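For reference, the size mismatch can be reproduced in isolation: torch.cat requires all of its inputs to have the same number of dimensions, so concatenating the 2-D state with the 3-D action fails. This is a standalone sketch of the shape problem, not Pearl's actual code path:

```python
import torch

# Shapes reported above: state [1, 16], action [1, 1, 10].
state = torch.zeros(1, 16)
action = torch.zeros(1, 1, 10)

# torch.cat requires every input to have the same number of dimensions,
# so mixing a 2-D state with a 3-D action raises a RuntimeError.
try:
    torch.cat([state, action], dim=1)
except RuntimeError as e:
    print("cat failed:", e)
```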

Environment/Libraries

Environment: Google Colab
Python Version: 3.12.11
Torch Version: 2.8.0+cu126
Is Cuda Available: True
Cuda Version: 12.6
GPU Device: Tesla T4
Gymnasium Version: 1.2.0
Numpy Version: 2.0.2
Matplotlib Version: 3.10.0
Pearl Version: 0.1.0

Solutions

Squeezing the batch action so that it becomes a 2-dimensional tensor ([1, 10]) resolves the issue and allows the model to train. However, I would like to dig further into how the one-hot encoding is generating the action tensor.

input_features = torch.cat([batch.state, torch.squeeze(batch.action, dim=1)], dim=1)

Also, this fix does not address the tensors being on different devices. I believe that issue stems from the action and the labels being on different devices; more research is needed.
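A minimal sketch of the kind of helper that would cover both problems at once, moving the action onto the state's device before squeezing and concatenating. The function name and structure are hypothetical, since the exact Pearl internals still need investigation:

```python
import torch

# Hypothetical helper: align devices first, then fix the extra dimension.
def build_input_features(state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    action = action.to(state.device)       # avoid CPU/GPU device mismatches
    action = torch.squeeze(action, dim=1)  # [1, 1, 10] -> [1, 10]
    return torch.cat([state, action], dim=1)

state = torch.zeros(1, 16)
action = torch.zeros(1, 1, 10)
print(build_input_features(state, action).shape)  # torch.Size([1, 26])
```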

I would be interested in fixing this. Let me know your thoughts!
