This project demonstrates how to generate captions for images using the BLIP (Bootstrapped Language-Image Pretraining) model by Salesforce, powered by the 🤗 Hugging Face Transformers library.
It is designed to run in Google Colab and uses a dataset of images (such as a subset of Flickr8k) to generate natural language captions.
- ✅ Uses `Salesforce/blip-image-captioning-base` for image captioning
- ✅ Automatically loads and processes images from a ZIP file
- ✅ GPU-accelerated via Google Colab
- ✅ Shows sample outputs using `matplotlib`
- ✅ Clean and modular Python code
The dataset used is a 2,000-image subset of the Flickr8k dataset.
📥 Download here:
https://www.kaggle.com/datasets/sanjeetbeniwal/flicker8k-2k
Expected structure inside the ZIP file:
```
Flickr8k_2k.zip
└── Flicker8k_2kDataset/
    ├── image1.jpg
    ├── image2.jpg
    └── ...
```
Upload this ZIP file to your Colab environment before running the notebook.
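Extracting the uploaded archive can be sketched as below; the paths match the structure shown above, though the notebook itself may perform this step differently:

```python
# Extract the uploaded dataset ZIP into the Colab working directory.
# Assumes Flickr8k_2k.zip has already been uploaded as described above.
import zipfile
from pathlib import Path

zip_path = Path("Flickr8k_2k.zip")
if zip_path.exists():
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(".")  # produces Flicker8k_2kDataset/
    n_images = len(list(Path("Flicker8k_2kDataset").glob("*.jpg")))
    print(f"Extracted {n_images} images")
else:
    print("Upload Flickr8k_2k.zip before running this cell.")
```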
The following Python packages are required:

```
pip install torch torchvision torchaudio
pip install transformers
pip install matplotlib
```

All dependencies are automatically installed in the Colab notebook.
- Setup: Install required libraries and enable GPU runtime.
- Dataset Unzipping: Upload and extract the dataset in Colab.
- Model Loading: Load BLIP processor and model to GPU.
- Captioning: Select and caption random images.
- Visualization: Display images with generated captions using `matplotlib`.
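The model-loading and captioning stages above can be sketched roughly as follows; `caption_image` is an illustrative helper name, not code taken from the notebook:

```python
# Sketch: load BLIP onto the GPU (when available) and caption one image.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained processor and model, then move the model to the device.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)

def caption_image(path: str) -> str:
    """Generate a natural-language caption for a single image file."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```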
Below is an example of the model generating a caption for an image from the dataset:
Image: screenshot_20250721_235853.jpg
Generated Caption: `a child sitting in a play area`
- Model: `Salesforce/blip-image-captioning-base`
- Library: Hugging Face Transformers
- Pretrained for general image-to-text tasks.
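The visualization step can be sketched as a small `matplotlib` helper; `show_captioned` and the placeholder captions below are illustrative (in the notebook the captions come from the BLIP model):

```python
# Display (image, caption) pairs side by side with matplotlib.
import random
from pathlib import Path

import matplotlib.pyplot as plt
from PIL import Image

def show_captioned(pairs):
    """Show each (image_path, caption) pair in a row of subplots."""
    fig, axes = plt.subplots(1, len(pairs), figsize=(4 * len(pairs), 4),
                             squeeze=False)
    for ax, (path, caption) in zip(axes[0], pairs):
        ax.imshow(Image.open(path))
        ax.set_title(caption, fontsize=9)
        ax.axis("off")
    fig.tight_layout()
    return fig

paths = sorted(Path("Flicker8k_2kDataset").glob("*.jpg"))
if paths:
    sample = random.sample(paths, k=min(3, len(paths)))
    # Placeholder captions; replace with real BLIP output.
    show_captioned([(p, "BLIP caption goes here") for p in sample])
    plt.show()
```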
- Open the notebook in Google Colab.
- Upload your dataset ZIP file to Colab (`Flickr8k_2k.zip`).
- Set runtime to GPU: Runtime → Change runtime type → GPU.
- Run all cells sequentially.
- View the images and their generated captions.
This project is for educational and research purposes. It uses publicly available pretrained models under their respective licenses.
