This repository contains the code and models for image captioning using the COCO 2017 dataset.
You can find the project dependencies in requirements.txt.
To download the COCO 2017 dataset, please follow the instructions on the official website.
To train a model, first download the dataset, then preprocess the images and captions with the preprocess.py script:

```
python preprocess.py --data-root /path/to/coco2017 --output /path/to/output
```

This will create an HDF5 file containing the preprocessed images and captions.
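As an illustration of the caption side of preprocessing, the sketch below builds a word vocabulary and encodes captions as token id sequences. This is a hypothetical example, not the actual preprocess.py logic: the special tokens, tokenizer, and vocabulary rules are assumptions.

```python
import re
from collections import Counter

def build_vocab(captions, min_freq=1):
    """Build a word-to-id mapping from caption strings.

    Hypothetical sketch of the tokenization step preprocess.py might
    perform; the real script's vocabulary rules are not specified here.
    """
    counter = Counter()
    for caption in captions:
        counter.update(re.findall(r"[a-z']+", caption.lower()))
    # Reserve ids 0-3 for special tokens commonly used in captioning models.
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for word, freq in sorted(counter.items()):
        if freq >= min_freq:
            vocab[word] = len(vocab)
    return vocab

def encode(caption, vocab):
    """Convert one caption into token ids, wrapped in <start>/<end>."""
    words = re.findall(r"[a-z']+", caption.lower())
    ids = [vocab.get(w, vocab["<unk>"]) for w in words]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]

captions = ["A dog runs on the beach.", "A dog catches a ball."]
vocab = build_vocab(captions)
print(encode("A dog runs.", vocab))  # → [1, 4, 8, 10, 2]
```

The encoded sequences (plus the resized images) are what an HDF5 file like the one produced above would typically store.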
Next, run the train.py script to train the model:

```
python train.py --input /path/to/preprocessed/data.hdf5 --output /path/to/output
```

To generate captions for new images with a trained model, run the generate.py script:
```
python generate.py --model /path/to/trained/model.pth --image /path/to/image.jpg
```

- Plotting examples: [pycocoDemo.ipynb](https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoDemo.ipynb)
- COCO 2017 dataset: https://cocodataset.org/#download
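The generation step above typically amounts to greedy decoding: at each step the model scores candidate next tokens and the highest-scoring one is appended until an end token appears. The sketch below shows the idea with a toy stand-in for the model; `step_fn`, the token ids, and the scoring interface are all assumptions, not generate.py's actual API.

```python
def greedy_decode(step_fn, start_id, end_id, max_len=20):
    """Greedy caption decoding: at each step, pick the highest-scoring
    next token until the end token is produced or max_len is reached.

    step_fn(prefix) stands in for a trained model's next-token scores;
    the real generate.py interface is not specified in this README.
    """
    tokens = [start_id]
    for _ in range(max_len):
        scores = step_fn(tokens)            # dict: token id -> score
        next_id = max(scores, key=scores.get)
        tokens.append(next_id)
        if next_id == end_id:
            break
    return tokens

# Toy "model": after <start> (1) emit 4, then 8, then <end> (2).
transitions = {1: 4, 4: 8, 8: 2}
def toy_step(prefix):
    nxt = transitions[prefix[-1]]
    return {tid: (1.0 if tid == nxt else 0.0) for tid in (2, 4, 8)}

print(greedy_decode(toy_step, start_id=1, end_id=2))  # → [1, 4, 8, 2]
```

A real model would map the decoded ids back to words with the vocabulary produced during preprocessing; beam search is a common alternative to the greedy loop shown here.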
This project is licensed under the MIT License. See the LICENSE file for details.