Issue
When running the contextual_bandits_tutorial.ipynb, it fails to train LinUCB due to an issue with tensors being on different devices.
However, when training strictly on a CPU, you get an error when the optimize method is called due to a mismatch in tensor sizes between the action and state.
Batch State Shape - torch.Size([1, 16])
Batch Action Shape - torch.Size([1, 1, 10])
Environment/Libraries
Environment: Google Colab
Python Version: 3.12.11
Torch Version: 2.8.0+cu126
Is Cuda Available: True
Cuda Version: 12.6
GPU Device: Tesla T4
Gymnasium Version: 1.2.0
Numpy Version: 2.0.2
Matplotlib Version: 3.10.0
Pearl Version: 0.1.0
Solutions
Squeezing the batch action so that it becomes a 2-dimensional tensor ([1, 10]) resolves the issue and allows the model to train. However, I would like to dig further into how the one-hot encoding is generating the action tensor.
input_features = torch.cat([batch.state, torch.squeeze(batch.action, dim=1)], dim=1)
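The shape mismatch and the fix can be sketched with NumPy stand-ins for the tensors (only the shapes matter here; they are taken from the report above, and the dummy zero arrays are purely illustrative):

```python
import numpy as np

# Shapes from the report: state is [1, 16], one-hot action is [1, 1, 10].
batch_state = np.zeros((1, 16))
batch_action = np.zeros((1, 1, 10))

# Concatenating along dim=1 fails as-is: the state is 2-D but the
# action is 3-D, so the ranks do not match.
# Squeezing out the extra middle dimension aligns the ranks.
squeezed_action = np.squeeze(batch_action, axis=1)  # shape (1, 10)

# Equivalent of torch.cat([state, action], dim=1): join on the feature axis.
input_features = np.concatenate([batch_state, squeezed_action], axis=1)
print(input_features.shape)  # (1, 26)
```

The same steps apply verbatim with torch.squeeze and torch.cat; NumPy is used here only to keep the sketch dependency-free.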
Also, this does not resolve the tensors being on different devices. I believe that issue stems from the action and labels being placed on different devices. More research is needed.
I would be interested in fixing this. Let me know your thoughts!