Pretrained checkpoint hardcodes data path, causing NotImplementedError when directory structure is different #22

@CiSong10

Description

Issue Description:

I'm trying to use the provided pretrained model (the PointGroup-PAPER.pt checkpoint) to run prediction on my own point cloud dataset, but I hit a NotImplementedError during execution. The cause appears to be a dataroot hardcoded in the config stored in the checkpoint, which assumes the dataset is located at data_set1_5classes.

Since I had reorganized the data directory into a different structure (e.g., ./data/data_set1_5classes/), the dataset loader failed to find the data and fell back to download(), which is not implemented, so the run crashed.
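
To illustrate the fallback described above, here is a minimal sketch of the raw-file check as I understand it. It is a simplified stand-in written in plain Python, not the actual torch_geometric or torch_points3d code; the class name RawFileDataset, the helper try_load, and the file name example.ply are hypothetical.

```python
import os
import tempfile


class RawFileDataset:
    """Simplified stand-in for a dataset that expects raw files under <root>/raw."""

    raw_file_names = ["example.ply"]  # hypothetical file name, for illustration only

    def __init__(self, root):
        self.raw_dir = os.path.join(root, "raw")
        self._download()

    def _download(self):
        # If every expected raw file is present, nothing needs to be fetched.
        expected = [os.path.join(self.raw_dir, name) for name in self.raw_file_names]
        if all(os.path.exists(path) for path in expected):
            return
        # Otherwise the raw directory is created and download() is called.
        os.makedirs(self.raw_dir, exist_ok=True)
        self.download()

    def download(self):
        # No download logic is provided, mirroring the error in the trace.
        raise NotImplementedError


def try_load(root):
    """Return 'ok' if the dataset initialises, else the exception class name."""
    try:
        RawFileDataset(root)
        return "ok"
    except NotImplementedError as exc:
        return type(exc).__name__


with tempfile.TemporaryDirectory() as tmp:
    print(try_load(tmp))                      # raw files missing
    print(os.path.isdir(os.path.join(tmp, "raw")))  # raw directory was still created
```

If the real loader behaves like this sketch, a wrong root produces both the NotImplementedError and an empty raw directory at the stored path.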

Error Trace

[2025-06-11 14:35:17,201][torch_points3d.trainer][INFO] - DEVICE : cuda
[2025-06-11 14:35:18,132][torch_points3d.metrics.model_checkpoint][INFO] - Loading checkpoint from /home/cisong/ForAINet/outputs/pretrained/PointGroup-PAPER.pt
Traceback (most recent call last):
  File "PointCloudSegmentation/predict.py", line 13, in main
    trainer = Trainer(cfg)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/trainer.py", line 48, in __init__
    self._initialize_trainer()
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/trainer.py", line 90, in _initialize_trainer
    self._dataset: BaseDataset = instantiate_dataset(self._checkpoint.data_config)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/dataset_factory.py", line 46, in instantiate_dataset
    dataset = dataset_cls(dataset_config)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/panoptic/treeins_set1.py", line 633, in __init__
    self.test_dataset = dataset_cls(
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 599, in __init__
    super().__init__(root, grid_size, *args, **kwargs)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 239, in __init__
    super(TreeinsOriginalFused, self).__init__(root, transform, pre_transform, pre_filter)

...

  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 625, in download
    super().download()
  File "/usr/local/lib/python3.8/dist-packages/torch_geometric/data/in_memory_dataset.py", line 50, in download
    raise NotImplementedError
NotImplementedError

My Analysis

I debugged this in VS Code. After the checkpoint is loaded, near line 90 of trainer.py, self._checkpoint.data_config.dataroot shows data_set1_5classes, and a new ./data_set1_5classes/treeinsfused/raw directory is created. So I assume this fixed dataset configuration (data_config) prevents the model from being easily ported to other data directory structures.

This is also probably the cause of issue #1 (comment), and the reason the file-path workaround proposed in prs-eth/PanopticSegForLargeScalePointCloud#10 (comment) works.

Possible Improvements

If my analysis above is correct (excuse me if not), then ideally checkpoints should store only model weights and architecture-related configs, not absolute or relative paths to training datasets.
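
One possible shape for such a change is to override the stored path at load time rather than trusting the checkpoint. The sketch below assumes the stored data config behaves like a mutable mapping with a dataroot key, which is what the debugger output suggests but which I have not verified against the checkpoint format. The function with_dataroot and the other_key entry are hypothetical, and this is not wired into Trainer.

```python
import copy


def with_dataroot(data_config, new_dataroot):
    """Return a deep copy of data_config with its 'dataroot' entry replaced.

    The original mapping is left untouched so the checkpoint contents
    are not modified in memory.
    """
    patched = copy.deepcopy(data_config)
    patched["dataroot"] = new_dataroot
    return patched


# Hypothetical stand-in for the data section stored in the checkpoint.
stored = {"dataroot": "data_set1_5classes", "other_key": "unchanged"}

patched = with_dataroot(stored, "./data/data_set1_5classes")
print(patched["dataroot"])
print(stored["dataroot"])
```

The patched config could then be handed to the dataset factory in place of the stored one, assuming the factory accepts it unchanged.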

Current Workaround

For now I put the /data_set1_5classes/treeinsfused/raw dataset folder back where it was, which solved this issue. I don't quite understand why, if I am deploying the pretrained model weights on my own dataset, it needs to see the treeinsfused dataset. Or am I doing something wrong in the configuration?

Additional Issue

After applying the workaround, the NotImplementedError was resolved, but another error arose:

[2025-06-12 16:04:32,709][torch_points3d.trainer][INFO] - DEVICE : cuda
[2025-06-12 16:04:33,722][torch_points3d.metrics.model_checkpoint][INFO] - Loading checkpoint from /home/cisong/ForAINet/outputs/pretrained/PointGroup-PAPER.pt
Processing...
Traceback (most recent call last):
  File "/home/cisong/ForAINet/PointCloudSegmentation/predict.py", line 13, in main
    trainer = Trainer(cfg)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/trainer.py", line 48, in __init__
    self._initialize_trainer()
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/trainer.py", line 90, in _initialize_trainer
    self._dataset: BaseDataset = instantiate_dataset(self._checkpoint.data_config)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/dataset_factory.py", line 46, in instantiate_dataset
    dataset = dataset_cls(dataset_config)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/panoptic/treeins_set1.py", line 633, in __init__
    self.test_dataset = dataset_cls(
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 599, in __init__
    super().__init__(root, grid_size, *args, **kwargs)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 239, in __init__
    super(TreeinsOriginalFused, self).__init__(root, transform, pre_transform, pre_filter)
  File "/usr/local/lib/python3.8/dist-packages/torch_geometric/data/in_memory_dataset.py", line 60, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "/usr/local/lib/python3.8/dist-packages/torch_geometric/data/dataset.py", line 86, in __init__
    self._process()
  File "/usr/local/lib/python3.8/dist-packages/torch_geometric/data/dataset.py", line 165, in _process
    self.process()
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/panoptic/treeins_set1.py", line 557, in process
    super().process()
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 622, in process
    super().process_test(self.test_area)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 492, in process_test
    xyz, semantic_labels, instance_labels = read_treeins_format(
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 73, in read_treeins_format
    semantic_labels = data['semantic_seg'].astype(np.int64)-1
ValueError: no field of name semantic_seg

Can you help me with this? Why is Python looking for a ['semantic_seg'] field in the data? My main Python script is predict.py and the configuration file I use is predict.yaml.
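
For context, the message "no field of name semantic_seg" is what NumPy raises when a structured array is indexed with a field it does not contain, so the reader seems to expect label fields that an unlabeled point cloud would not have. Below is a minimal sketch of checking for the field before reading it. The helper labels_or_placeholder and the placeholder value -1 are hypothetical, and I do not know whether the rest of the pipeline accepts placeholder labels.

```python
import numpy as np


def labels_or_placeholder(data, field, placeholder=-1):
    """Return data[field] as int64, or a placeholder array if the field is absent."""
    names = data.dtype.names or ()  # dtype.names is None for non-structured arrays
    if field in names:
        return data[field].astype(np.int64)
    # Field is missing: return one placeholder label per point.
    return np.full(len(data), placeholder, dtype=np.int64)


# Hypothetical unlabeled point cloud: coordinates only, no label fields.
points = np.zeros(3, dtype=[("x", "f4"), ("y", "f4"), ("z", "f4")])

print(points.dtype.names)
print(labels_or_placeholder(points, "semantic_seg"))
```

Printing dtype.names on the loaded file is a quick way to see which fields my data actually provides.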
