Training on custom dataset

Hi, I am trying to train on a custom dataset. I have written my config and dataset file, but when I try to train the baseline model using:

python3 train.py --config_file configs/SportMOT/vit_base.yml MODEL.DEVICE_ID "('0')"

I get:

===========building transformer===========
using soft triplet loss for training
2025-08-12 20:14:50,259 transreid.train INFO: start training
/space/users//TransReID/processor/processor.py:41: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  scaler = amp.GradScaler()
/space/users//TransReID/processor/processor.py:57: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with amp.autocast(enabled=True):
../aten/src/ATen/native/cuda/Loss.cu:250: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [16,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:250: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [17,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:250: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [18,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:250: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [19,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "/space/users//TransReID/train.py", line 76, in <module>
    do_train(
  File "/space/users//TransReID/processor/processor.py", line 59, in do_train
    loss = loss_fn(score, feat, target, target_cam)
  File "/space/users//TransReID/loss/make_loss.py", line 69, in loss_func
    TRI_LOSS = triplet(feat, target)[0]
  File "/space/users//TransReID/loss/triplet_loss.py", line 124, in __call__
    dist_mat = euclidean_dist(global_feat, global_feat)
  File "/space/users//TransReID/loss/triplet_loss.py", line 25, in euclidean_dist
    xx = torch.pow(x, 2).sum(1, keepdim=True).expand(m, n)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


Any input is appreciated. Thanks!
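
One possible cause, judging only from the assertion `t >= 0 && t < n_classes` in the log above: the ID loss received a target label outside the range the classifier was built for. The traceback points at the triplet loss, but the log itself warns that CUDA errors are reported asynchronously, so that location may be misleading. With a custom dataset this often happens when raw person IDs (for example 7, 42, 103) are used as labels without being remapped to 0..N-1. The sketch below is not taken from the TransReID code. The `(img_path, pid, camid)` tuple layout and the helper names `relabel_pids` and `check_labels` are assumptions for illustration, and the real dataset class may carry extra fields.

```python
def relabel_pids(samples):
    """Map arbitrary person IDs to contiguous labels 0..N-1.

    `samples` is a list of (img_path, pid, camid) tuples.
    This layout is an assumption, not the library's confirmed format.
    """
    pids = sorted({pid for _, pid, _ in samples})
    pid2label = {pid: label for label, pid in enumerate(pids)}
    relabeled = [(path, pid2label[pid], camid) for path, pid, camid in samples]
    return relabeled, len(pids)


def check_labels(samples, num_classes):
    """Return the sorted labels that fall outside [0, num_classes)."""
    return sorted({pid for _, pid, _ in samples if not 0 <= pid < num_classes})


# Hypothetical training list with raw, non-contiguous person IDs.
raw = [("a.jpg", 7, 0), ("b.jpg", 7, 1), ("c.jpg", 42, 0), ("d.jpg", 103, 2)]

# Three identities, but every raw ID is >= 3, so all are out of range.
print(check_labels(raw, num_classes=3))

train, num_classes = relabel_pids(raw)

# After relabeling, every label lies in [0, num_classes).
print(check_labels(train, num_classes))
```

If a check like this reports out-of-range labels for the training split, remapping them in the dataset file and making sure the number of classes passed to the model matches the number of distinct training identities would be the first thing to try. Rerunning with `CUDA_LAUNCH_BLOCKING=1`, as the log suggests, should also make the traceback point at the call that actually failed.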

Training on custom dataset #93

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Training on custom dataset #93

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions