Issue
When running the contextual_bandits_tutorial.ipynb, it fails to train LinUCB due to an issue with tensors being on different devices.
However, when training strictly on a CPU, you get an error when the optimize method is called due to a mismatch in tensor sizes between the action and state.
Batch State Shape - torch.Size([1, 16])
Batch Action Shape - torch.Size([1, 1, 10])
Environment/Libraries
Environment: Google Colab
Python Version: 3.12.11
Torch Version: 2.8.0+cu126
Is Cuda Available: True
Cuda Version: 12.6
GPU Device: Tesla T4
Gymnasium Version: 1.2.0
Numpy Version: 2.0.2
Matplotlib Version: 3.10.0
Pearl Version: 0.1.0
Solutions
Squeezing the batch action so that it becomes a 2-dimensional tensor ([1, 10]) resolves the issue and allows the model to train. However, I would like to dig further into how the one-hot encoding is generating the action tensor.
input_features = torch.cat([batch.state, torch.squeeze(batch.action, dim=1)], dim=1)
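The shape mismatch and the fix can be sketched with NumPy stand-ins for the tensors (only the shapes matter here; they are taken from the report above, and the dummy zero arrays are purely illustrative):

```python
import numpy as np

# Shapes from the report: state is [1, 16], one-hot action is [1, 1, 10].
batch_state = np.zeros((1, 16))
batch_action = np.zeros((1, 1, 10))

# Concatenating along dim=1 fails as-is: the state is 2-D but the
# action is 3-D, so the ranks do not match.
# Squeezing out the extra middle dimension aligns the ranks.
squeezed_action = np.squeeze(batch_action, axis=1)  # shape (1, 10)

# Equivalent of torch.cat([state, action], dim=1): join on the feature axis.
input_features = np.concatenate([batch_state, squeezed_action], axis=1)
print(input_features.shape)  # (1, 26)
```

The same steps apply verbatim with torch.squeeze and torch.cat; NumPy is used here only to keep the sketch dependency-free.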
Also, this does not resolve the tensors being on different devices. I believe that issue stems from the action and labels being placed on different devices. More research is needed.
I would be interested in fixing this. Let me know your thoughts!