Hello @feymanpriv,
Thanks for providing this PyTorch implementation of DELG! This can be very helpful to the community.
I was looking at the retrieval evaluation code, and I have a question about the experimental protocol on the Revisited Oxford and Paris datasets:
I don't see where query images are cropped before feature extraction. If I am not mistaken, feature extraction is performed here, and that code does not seem to distinguish between query and index images (query images should be cropped, while index images should not). Are the query images perhaps pre-cropped before image loading? (That would be uncommon, and as far as I can tell it is not done in the dataset preparation code in this repo.)
The need for query image cropping is described in the Revisited Oxford/Paris paper. See Section 2.3:
"Only the cropped regions are to be used as queries; never the full image, since the ground-truth labeling strictly considers only the visual content inside the query region."
The authors also provide example code that shows query cropping, see here. As another example, here's how we do it in the DELG TF codebase (cropping before extraction if images are queries).
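To make the protocol I have in mind concrete, here is a minimal sketch. It is not the code from either repository: the function name `prepare_for_extraction` and its parameters are hypothetical, and it assumes the image object exposes a PIL-style `crop((left, upper, right, lower))` method and that the query bounding box is given as four numbers in that order.

```python
from typing import Optional, Sequence


def prepare_for_extraction(image, is_query: bool,
                           bbox: Optional[Sequence[float]] = None):
    """Return the image that should be passed to the feature extractor.

    Hypothetical helper for illustration only. `image` is assumed to expose
    a PIL-style `crop((left, upper, right, lower))` method.
    """
    if not is_query:
        # Index (database) images are used in full, without cropping.
        return image
    if bbox is None:
        # A query without its region cannot follow the protocol.
        raise ValueError("query images need a bounding box")
    # Round to integer pixel coordinates before cropping.
    left, upper, right, lower = (int(round(v)) for v in bbox)
    return image.crop((left, upper, right, lower))
```

With something like this in the loading path, a query would go through `prepare_for_extraction(img, True, bbox)` and an index image through `prepare_for_extraction(img, False)`, so that only queries are restricted to their annotated region.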
Naturally, if query images are not cropped before feature extraction, the measured performance should be much higher, since the full image provides much more context than the query region alone.
I am also wondering whether this was the same protocol used in your DOLG paper. There, I see huge gains from a simple reimplementation of DELG (e.g., +14-18pp improvement for results on the R1M large-scale dataset), and those look a bit suspicious.
Again, thanks a lot for your work here. My goal is not to remove merit from your work at all (I find DOLG very interesting!) -- I just really want to clarify the protocol and make sure we would be comparing results apples-to-apples.
Best,
Andre