Hi, congrats on the NeurIPS acceptance, and thanks a lot for the open-source work!
I found that the GroundingDINO baseline itself already gives very strong performance.
- Is this mainly because of using the Swin backbone?
- Are there any extra fine-tuning details or tips you could share?
Thanks!