1. Overview (basic ideas)
A framework for unsupervised phrase grounding that uses scene graphs and linguistic knowledge injection.
2. Novelty
3. Method (Technical details)
4. Results
5. Links to papers, code, etc.
https://openaccess.thecvf.com/content_CVPRW_2020/papers/w56/Parcalabescu_Exploring_Phrase_Grounding_Without_Training_Contextualisation_and_Extension_to_Text-Based_CVPRW_2020_paper.pdf
6. Thoughts, Comments
I was very surprised, because the core of this framework is very similar to news2meme. They were much cleverer in extracting tags from the images, but the relationship is computed only at the word level (even when a phrase contains multiple words). I wonder how this framework would perform if we included the subspace representation.
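To make the word-level versus subspace contrast concrete, here is a minimal sketch in plain Python. It is not the paper's method and not the news2meme implementation. The toy 3-d vectors are made up, and the function names (`word_level_score`, `subspace_score`) are hypothetical. It assumes a multi-word phrase is represented by the span of its word vectors, and that an image tag is scored by the length of its projection onto that span.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def orthonormal_basis(vectors, eps=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = norm(w)
        if n > eps:  # skip vectors that are (numerically) linearly dependent
            basis.append([wi / n for wi in w])
    return basis

def word_level_score(phrase_vecs, tag_vec):
    """Best cosine similarity between any single phrase word and the tag."""
    return max(cosine(p, tag_vec) for p in phrase_vecs)

def subspace_score(phrase_vecs, tag_vec):
    """Length of the projection of the unit tag vector onto the phrase subspace."""
    basis = orthonormal_basis(phrase_vecs)
    n = norm(tag_vec)
    unit = [t / n for t in tag_vec]
    return math.sqrt(sum(dot(unit, b) ** 2 for b in basis))

# Toy embeddings, invented for illustration only.
phrase = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two words of one phrase
tag = [1.0, 1.0, 0.0]                        # one detected image tag

word_score = word_level_score(phrase, tag)
sub_score = subspace_score(phrase, tag)
print(f"word-level: {word_score:.3f}, subspace: {sub_score:.3f}")
```

In this toy case the tag lies between the two words, so each word alone matches it only partially while the phrase subspace contains it fully. Whether that effect helps on real phrase-grounding data is an open question.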
7. BibTeX
@inproceedings{parcalabescu2020exploring,
title={Exploring Phrase Grounding without Training: Contextualisation and Extension to Text-Based Image Retrieval},
author={Parcalabescu, Letitia and Frank, Anette},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
pages={962--963},
year={2020}
}
8. Related Papers