In the task of near-similar image search, features from a Deep Neural Network are often used to compare images and measure similarity. Our system searches this data by computing the k-nearest neighbors using both image and text (both derived from the image and provided by the dataset/user). We reduce the problem of searching over multiple modes to a single space by drawing apt correlations between the modes. Algorithmic details can be found in [1] and report.pdf.
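As a rough illustration of the single-space search, the sketch below concatenates a visual and a textual embedding into one vector and indexes it with Annoy. The dimensions, the plain-concatenation fusion, and the random placeholder data are assumptions for illustration only, not the repository's exact method.

```python
import numpy as np
from annoy import AnnoyIndex

# Illustrative dimensions and random placeholder data; the real visual and
# textual feature sizes depend on the chosen backbone and embedding scheme.
VISUAL_DIM, TEXT_DIM = 2048, 300
JOINT_DIM = VISUAL_DIM + TEXT_DIM

def joint_embedding(visual_vec, text_vec):
    """Fuse the two modalities into a single vector so one k-NN index
    can be searched (plain concatenation shown here as a sketch)."""
    return np.concatenate([visual_vec, text_vec]).astype("float32")

rng = np.random.default_rng(0)
database = [(rng.standard_normal(VISUAL_DIM), rng.standard_normal(TEXT_DIM))
            for _ in range(100)]

# Build an approximate nearest-neighbor index over the database items.
index = AnnoyIndex(JOINT_DIM, "angular")
for item_id, (v_vec, t_vec) in enumerate(database):
    index.add_item(item_id, joint_embedding(v_vec, t_vec))
index.build(10)  # number of trees: speed/accuracy trade-off

# Query with a fused image + text vector and retrieve the 5 nearest items.
query_visual, query_text = database[0]
neighbours = index.get_nns_by_vector(joint_embedding(query_visual, query_text), 5)
print(neighbours)
```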
- Install the `pytorch`, `torchvision`, `pickle`, `annoy` and `tkinter` libraries.
- Ensure the paths are correct, vis-à-vis the `model_path` and `pwd` variables in all the files of the relevant subdirectory (see the sketch after this list).
- When executing for the first time, the models will be saved in the `models/` directory. Later, they can be loaded from the corresponding `.pth` files.
- Update the execution environment in `user.py` and run `python user.py` to receive a prompt for uploading a query image and text, and voila!
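The `model_path` and `pwd` variables mentioned above are plain Python variables near the top of the scripts. The values below are only illustrative; point them at your local checkout and preferred checkpoint location (the actual filenames in the repository may differ).

```python
# Illustrative values only; adjust to your local checkout.
pwd = "/home/you/one-shot-item-search"       # repository root / working directory
model_path = pwd + "/models/encoder.pth"     # hypothetical checkpoint filename
```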
- `user.py` - top-level user interface; prompts the user for a query image and text, and presents the retrieved results
- `metric.py` - computes a similarity index to analyse the quality of the retrieved results (see the sketch below)
- `text_embedding.ipynb` - contains functionality to generate the text embeddings from a text file
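As a rough stand-in for what `metric.py` might compute, the sketch below scores a retrieved set by its average cosine similarity to the query in feature space; the repository's actual similarity index may be defined differently.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def similarity_index(query_vec, retrieved_vecs):
    """Average query-to-result similarity, a rough quality score for a
    retrieved set (a stand-in, not necessarily metric.py's definition)."""
    return float(np.mean([cosine_similarity(query_vec, r) for r in retrieved_vecs]))
```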
Contains files for the DeepFashion dataset, trained on the inception_v3 model. w = 2000 is set as default. Here:
- `captions.json` - contains the per-image captions, generated from the label files
- `train.py` - houses functionality to generate the learnt database, compressing and storing visual and textual features
- `query.py` - houses functionality to read and query the database for an image-text pair
- `classifier.py` - implementation of the model for generating one-hot encodings and embeddings for features inferred from the image
- `visual.py` - implementation of the Inception framework for extracting visual features of the image (see the sketch below)
- `preprocess_words.py` - for stopword elimination and text -> vector conversion
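For reference, a minimal feature-extraction sketch along the lines of what `visual.py` is described as doing: a pretrained `inception_v3` with its classification head removed, yielding a 2048-d vector per image. The preprocessing pipeline and weight-loading call are standard torchvision usage, not necessarily the repository's exact code (older torchvision versions use `pretrained=True` instead of `weights=...`).

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Load inception_v3 and drop the classification head, keeping pooled features.
model = models.inception_v3(weights="IMAGENET1K_V1")
model.fc = torch.nn.Identity()
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(342),
    transforms.CenterCrop(299),          # inception_v3 expects 299x299 inputs
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def visual_features(image_path):
    """Return a 2048-d visual feature vector for one image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(img).squeeze(0)     # shape: (2048,)
```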
Contains files for the DeepFashion dataset, trained on the inception_v3 model. The caption file used here has colors appended to it; this had to be done manually, since the dataset does not provide color labels. A separate multi-label classifier was trained to extract 11 colors, and the colors generated per image were appended to the caption file. The files used for this are bundled in the color/ subdirectory. w = 200 is set by default. The other files are similar to inception_df/. This lets the user also query for terms like [blue] or [brown]!
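A minimal sketch of the color-appending step, assuming a sigmoid multi-label head over 11 color classes and a fixed threshold; the actual color vocabulary, model, and threshold used in `color/` may differ.

```python
import torch

# Hypothetical 11-color vocabulary; the color/ model's actual label set may differ.
COLORS = ["black", "white", "red", "blue", "green", "yellow",
          "brown", "grey", "pink", "purple", "orange"]

def colors_for_image(logits, threshold=0.5):
    """Turn multi-label logits into color words that can be appended
    to an image's caption."""
    probs = torch.sigmoid(logits)
    return [c for c, p in zip(COLORS, probs.tolist()) if p >= threshold]

# e.g.: caption = caption + " " + " ".join(colors_for_image(color_model(img)))
```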
Contains files for the DeepFashion dataset, trained on the VGG19 model. This was done to analyse the performance difference of the retrieval system by contrasting with the inception_v3 model, and to experiment with the vector size of the visual features used in the dataset. w = 500 is set by default.
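For contrast with the inception_v3 sketch above, one common VGG19 feature choice is the 4096-d penultimate fully connected layer; whether the repository keeps this layer or a different one (and hence a different vector size) is an assumption here.

```python
import torch
from torchvision import models

# Sketch of a VGG19 feature extractor: keep the 4096-d penultimate fc layer
# instead of inception_v3's 2048-d pooled features.
vgg = models.vgg19(weights="IMAGENET1K_V1")   # older torchvision: pretrained=True
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])
vgg.eval()

with torch.no_grad():
    feats = vgg(torch.randn(1, 3, 224, 224))  # shape: (1, 4096)
```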
Contains files for the Fashion 200k dataset (w = 200 as default), where:
- `captions_200.json` - contains the per-image captions, generated from label files
- `train_200.py` - houses functionality to generate the learnt database, compressing and storing visual and textual features
- `query_200.py` - houses functionality to read and query the database for an image-text pair
- `models_200.py` - implementation of the visual encoder used for this dataset
- `preprocess_words.py` - for stopword elimination and text -> vector conversion (see the sketch below)
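A minimal sketch of stopword elimination and a bag-of-words text -> vector conversion in the spirit of `preprocess_words.py`; the stopword list, vocabulary, and vectorisation scheme here are placeholders, not the repository's actual ones.

```python
import re
from collections import Counter

# Placeholder stopword list; preprocess_words.py may use a different one.
STOPWORDS = {"a", "an", "the", "with", "and", "of", "in", "for"}

def tokenize(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def text_to_vector(text, vocabulary):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [counts.get(word, 0) for word in vocabulary]

# Example: text_to_vector("blue denim jacket with hood", ["blue", "denim", "hood"])
```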
[1] Jonghwa Yim, Junghun James Kim, Daekyu Shin, “One-Shot Item Search with Multimodal Data”
[2] The DeepFashion - Multimodal Dataset
[4] Spotify's Annoy - Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
A team project by Atharv Dabli and Gurarmaan S. Panjeta.

