Hi developers,
I am testing GARF on a custom dataset of real-world objects that I have personally scanned, and I would like to clarify two specific behaviors I encountered during my experiments.
First, regarding segmentation versus reconstruction quality: even when the fracture-aware pretraining module produces poor segmentation (it fails to identify or highlight the fracture surfaces on the fragments), the final reconstruction sometimes still places some fragments in the correct position. Is this robustness due to the reassembly module leveraging the 64-channel latent features extracted by the PTv3 backbone, rather than relying strictly on the binary output of the segmentation head?
Second, I noticed that if I center each fragment individually at the origin (0, 0, 0) before passing it to the model, assembly performance degrades significantly and the model fails to reconstruct the object correctly. Since the framework integrates point cloud coordinates, normals, and scale information as pose-invariant shape priors in the position embedding, I would like to know whether centering the fragments individually breaks the reassembly process by removing necessary spatial cues, or whether the model expects a specific initial spatial distribution of the input point clouds.
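To make the second point concrete, here is a minimal sketch of what I mean by centering each fragment individually, next to a joint centering that preserves the relative offsets between fragments. It uses NumPy only; the function names and the `(N, 3)` array layout are illustrative assumptions, not part of GARF's API or its actual preprocessing.

```python
import numpy as np

def center_fragments_individually(fragments):
    """Shift each fragment so that its own centroid sits at the origin."""
    return [pts - pts.mean(axis=0, keepdims=True) for pts in fragments]

def center_fragments_jointly(fragments):
    """Shift all fragments by the centroid of the whole set (hypothetical alternative)."""
    global_centroid = np.concatenate(fragments, axis=0).mean(axis=0, keepdims=True)
    return [pts - global_centroid for pts in fragments]

# Two synthetic fragments, each an (N, 3) array of point coordinates.
rng = np.random.default_rng(0)
frag_a = rng.normal(size=(100, 3)) + np.array([1.0, 0.0, 0.0])
frag_b = rng.normal(size=(100, 3)) + np.array([-1.0, 0.0, 0.0])

individual = center_fragments_individually([frag_a, frag_b])
joint = center_fragments_jointly([frag_a, frag_b])

# Offset between the two fragment centroids before and after each centering.
offset_original = frag_a.mean(axis=0) - frag_b.mean(axis=0)
offset_individual = individual[0].mean(axis=0) - individual[1].mean(axis=0)
offset_joint = joint[0].mean(axis=0) - joint[1].mean(axis=0)
```

With individual centering every fragment centroid ends up at (0, 0, 0), so `offset_individual` is zero and the relative placement from my scan is discarded. Joint centering leaves `offset_joint` equal to `offset_original`.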
I would appreciate any guidance on the expected input format and further clarification on the internal dependencies between the segmentation labels and the flow-based reassembly module.