Dear authors,
Thank you for your wonderful work. I have some questions and would greatly appreciate your guidance.
- Since PixelLM does not rely on SAM, does it require more training data than SAM-based approaches (e.g., LISA)?
- In the paper's demo, multiple tokens sometimes appear to be inserted within the model’s text response, but in the released code only one token seems to be appended at the end for multi-object segmentation. Could you clarify which behavior is intended?
Thank you very much for your time.
Best,
Xinyan