Image Captioning API (FastAPI: BLIP, BLIP2, Gemma 3, InternVLM)

A production-style FastAPI service for automatic image captioning with multiple vision-language models. It includes Redis caching, a minimal browser test page, and instructions for running in production.

Framework: FastAPI (auto‑docs at /docs)
Python: 3.11 (recommended and assumed)
Models: BLIP, BLIP2, Gemma 3, InternVLM (select per request via ?model=)
Caching: Redis with TTL and namespaced keys
Admin: cache reset route protected by an API key (the key check is skipped when DEBUG=true)

Supported models

The model is selected per request with the query parameter ?model=blip|blip2|gemma|intern_vlm. If the parameter is omitted, blip is used.

Requirements

Python 3.11
Redis 7.x (local or container)

Quick start (local dev)

Create and activate a Conda environment (Python 3.11)

conda create -n vlm-api python=3.11 -y
conda activate vlm-api

Install dependencies

pip install -U pip
pip install -r requirements.txt

Start Redis (Docker example)

docker run -d --name redis -p 6379:6379 redis:7-alpine

Configure environment

cp .env.example .env
# edit .env as needed (API_KEY, Redis host, etc.)

Run the API (development)

uvicorn --env-file .env app.main:app --reload --port 8000

Alternatively, use the helper script:

sh scripts/start-local.sh

Open the test page and docs

Test page (upload images): http://localhost:8000/
API docs (Swagger UI): http://localhost:8000/docs
Health check: http://localhost:8000/healthz
Metrics (Prometheus): http://localhost:8000/metrics

Environment variables

See .env.example for all options. Common values:

ENV=development|production|test
DEBUG=true|false
PORT=8000
API_KEY=change-me # required for /api/admin/* when DEBUG=false
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
MODEL_CAPACITY=1 # max distinct models kept in memory
CACHE_TTL_SECONDS=86400 # Redis TTL for cache entries
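
As an illustration of how these variables might be read at startup, here is a minimal sketch using only the standard library. The Settings class, the load_settings function, and the fallback values (which mirror the sample values above) are assumptions made for this example; the service's actual configuration code is not shown in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings such as "true", "1", or "yes"."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; field names follow the variables above."""
    env: str
    debug: bool
    port: int
    api_key: str
    redis_host: str
    redis_port: int
    redis_db: int
    model_capacity: int
    cache_ttl_seconds: int


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of environment variables.

    Fallbacks are assumptions that mirror the sample values in this README.
    """
    return Settings(
        env=environ.get("ENV", "development"),
        debug=_as_bool(environ.get("DEBUG", "false")),
        port=int(environ.get("PORT", "8000")),
        api_key=environ.get("API_KEY", ""),
        redis_host=environ.get("REDIS_HOST", "localhost"),
        redis_port=int(environ.get("REDIS_PORT", "6379")),
        redis_db=int(environ.get("REDIS_DB", "0")),
        model_capacity=int(environ.get("MODEL_CAPACITY", "1")),
        cache_ttl_seconds=int(environ.get("CACHE_TTL_SECONDS", "86400")),
    )
```

Passing a plain dict instead of os.environ, for example load_settings({"DEBUG": "true"}), makes the loader easy to exercise in tests.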

Endpoints

POST /api/caption-images

Multipart form: images (one or more files)
Query params:
- model = blip|blip2|gemma|intern_vlm (default blip)
- caption_prompt (optional)
- flag_caption_prompt (optional)
Accepts: image/jpeg, image/png, image/webp
Response:

{
  "results": [
    { "filename": "img.jpg", "caption": "...", "tags": ["..."], "flagged": false, "cache": false }
  ]
}

Example:

curl -X POST \
  -F "images=@path/to/image.jpg" \
  "http://localhost:8000/api/caption-images?model=blip&caption_prompt=a%20photo%20of"
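
The same call can be made from Python. The sketch below builds the multipart request with the standard library only, so it runs without extra dependencies. The helper name build_caption_request and its argument layout are assumptions for illustration; the endpoint path, the images form field, and the query parameters come from the description above.

```python
import uuid
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_caption_request(base_url, images, model="blip", caption_prompt=None):
    """Return a POST Request for /api/caption-images (hypothetical helper).

    `images` is a list of (filename, content_type, raw_bytes) tuples.
    """
    params = {"model": model}
    if caption_prompt:
        params["caption_prompt"] = caption_prompt
    # quote_via=quote encodes spaces as %20, matching the curl example.
    query = urlencode(params, quote_via=quote)
    url = f"{base_url.rstrip('/')}/api/caption-images?{query}"

    # Each file becomes one multipart part under the form field "images".
    boundary = uuid.uuid4().hex
    body = bytearray()
    for filename, content_type, data in images:
        body += f"--{boundary}\r\n".encode()
        body += (
            f'Content-Disposition: form-data; name="images"; '
            f'filename="{filename}"\r\n'
        ).encode()
        body += f"Content-Type: {content_type}\r\n\r\n".encode()
        body += data + b"\r\n"
    body += f"--{boundary}--\r\n".encode()

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return Request(url, data=bytes(body), headers=headers, method="POST")
```

With the service running, the request can be sent with urllib.request.urlopen(req) and the body decoded with json.load; each entry in "results" should carry the fields shown in the response above.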

POST /api/caption-collective-images

Multipart form: images (multiple files)
Supported only with model=gemma or model=intern_vlm
Query params: same as for /api/caption-images
Response:

{
  "collective_caption": "...",
  "count": 3,
  "tags": ["..."],
  "flagged": false
}

Example:

curl -X POST \
  -F "images=@img1.jpg" -F "images=@img2.jpg" \
  "http://localhost:8000/api/caption-collective-images?model=gemma"
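
Because this endpoint is limited to gemma and intern_vlm, a client may want to reject other models before uploading anything. The following is a small sketch of such a guard; COLLECTIVE_MODELS and collective_caption_url are hypothetical names, and how the server itself responds to an unsupported model is not specified here.

```python
from urllib.parse import urlencode

# Models listed above as supporting collective captions.
COLLECTIVE_MODELS = frozenset({"gemma", "intern_vlm"})


def collective_caption_url(base_url: str, model: str) -> str:
    """Return the collective-caption URL, or raise if the model is unsupported."""
    if model not in COLLECTIVE_MODELS:
        allowed = ", ".join(sorted(COLLECTIVE_MODELS))
        raise ValueError(
            f"model {model!r} does not support collective captions; use one of: {allowed}"
        )
    query = urlencode({"model": model})
    return f"{base_url.rstrip('/')}/api/caption-collective-images?{query}"
```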

POST /api/admin/reset-cache

Header: X-API-Key: <your-key> (not required when DEBUG=true)
Effect: Flushes the Redis database used by this service.
Example:

curl -X POST -H "X-API-Key: change-me" http://localhost:8000/api/admin/reset-cache
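
The access rule described above (key required unless DEBUG=true) can be sketched as a small pure function. This is an illustration under assumptions, not the service's actual code: the function name is hypothetical, and the constant-time comparison via hmac.compare_digest is a choice made for this example.

```python
import hmac
from typing import Optional


def is_admin_request_allowed(
    debug: bool, expected_key: str, provided_key: Optional[str]
) -> bool:
    """Decide whether a request to /api/admin/* may proceed.

    debug        -- value of DEBUG; when true the key check is skipped
    expected_key -- value of API_KEY from the environment
    provided_key -- value of the X-API-Key header, or None if absent
    """
    if debug:
        return True
    # Refuse when no key is configured or none was sent.
    if not expected_key or provided_key is None:
        return False
    # compare_digest avoids leaking key contents through timing differences.
    return hmac.compare_digest(provided_key.encode(), expected_key.encode())
```

Since the route flushes the whole Redis database used by the service, leaving DEBUG=true on a reachable host would let any caller clear the cache.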

Caching details

Single image cache key: v1:img:{sha256(image_bytes)}
Collective cache key: v1:collection:{sha256(concatenated_hashes)}
TTL: CACHE_TTL_SECONDS (default 86400 seconds)
Redis failures are tolerated: requests still proceed without cache.
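
To make the key scheme and the fail-open behaviour concrete, here is a minimal sketch. It assumes that "concatenated_hashes" means the hex SHA-256 digests of the images joined in upload order, and it uses two hypothetical callables, cache_get and cache_set, as stand-ins for the Redis client; the real implementation may differ on both points.

```python
import hashlib
from typing import Any, Callable, Iterable, Optional, Tuple

KEY_VERSION = "v1"  # namespace prefix shown in the key formats above


def image_cache_key(image_bytes: bytes) -> str:
    """Key for a single image: v1:img:{sha256(image_bytes)}."""
    return f"{KEY_VERSION}:img:{hashlib.sha256(image_bytes).hexdigest()}"


def collection_cache_key(images: Iterable[bytes]) -> str:
    """Key for a set of images: v1:collection:{sha256(concatenated_hashes)}.

    Assumption: per-image hex digests are concatenated in upload order.
    """
    concatenated = "".join(hashlib.sha256(b).hexdigest() for b in images)
    digest = hashlib.sha256(concatenated.encode()).hexdigest()
    return f"{KEY_VERSION}:collection:{digest}"


def cached_or_compute(
    key: str,
    compute: Callable[[], Any],
    cache_get: Callable[[str], Optional[Any]],
    cache_set: Callable[[str, Any, int], None],
    ttl_seconds: int = 86400,
) -> Tuple[Any, bool]:
    """Return (value, cache_hit); cache errors never fail the request."""
    try:
        hit = cache_get(key)
    except Exception:
        hit = None  # cache unavailable: treat as a miss
    if hit is not None:
        return hit, True
    value = compute()
    try:
        cache_set(key, value, ttl_seconds)
    except Exception:
        pass  # cache unavailable: still return the fresh result
    return value, False
```

Under the order-dependent assumption, uploading the same images in a different order produces a different collection key and therefore a cache miss.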

Test page

Served at / via Jinja2 template app/templates/index.html with assets under app/static/
Lets you pick the model, optional prompts, and upload one or more images
Calls the endpoints above and renders the results

Running in production (Gunicorn + Uvicorn workers)

Use the provided config in scripts/gunicorn_conf.py.

ENV=production API_KEY=change-me \
  gunicorn -c scripts/gunicorn_conf.py app.main:app

By default the server binds to 0.0.0.0:8000. GPU workloads typically run one worker per GPU.
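
For orientation, a Gunicorn config file for this kind of setup might look like the sketch below. This is not the content of scripts/gunicorn_conf.py, which is not reproduced in this README; the timeout value and the use of WEB_CONCURRENCY are assumptions, while bind, workers, worker_class, and timeout are standard Gunicorn settings.

```python
import os

# Listen on all interfaces; PORT matches the environment variable above.
bind = f"0.0.0.0:{os.environ.get('PORT', '8000')}"

# One worker by default; raise this to one per GPU if several are available.
workers = int(os.environ.get("WEB_CONCURRENCY", "1"))

# Run the ASGI app under Uvicorn-managed workers.
worker_class = "uvicorn.workers.UvicornWorker"

# Assumed value: model loading and first-time weight downloads can be slow,
# so allow more than Gunicorn's default of 30 seconds.
timeout = 120
```

Each worker is a separate process with its own loaded models, so memory use grows with the worker count.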

Docker (CPU baseline)

Build and run:

docker build -t vlm-api:latest .
docker run --rm -p 8000:8000 --env-file .env vlm-api:latest

For GPU, base on an NVIDIA CUDA runtime image and install matching Torch CUDA wheels; run with --gpus all.

Notes and tips

The first request to a given model downloads its weights; start with ?model=blip for a quick first run.
Per‑request model selection avoids global mutable state.
Model registry capacity (MODEL_CAPACITY) limits the number of simultaneously loaded models to avoid OOM.
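
One way a capacity-limited registry can work is least-recently-used eviction, sketched below. The ModelRegistry class, its loader callables, and the LRU policy itself are assumptions for illustration; this README only states that MODEL_CAPACITY bounds the number of loaded models.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, List


class ModelRegistry:
    """Keeps at most `capacity` models loaded, evicting the least recently used."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]], capacity: int = 1):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._loaders = loaders
        self._capacity = capacity
        self._loaded: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, name: str) -> Any:
        """Return the named model, loading it (and evicting others) if needed."""
        if name not in self._loaders:
            raise KeyError(f"unknown model: {name}")
        if name in self._loaded:
            # Mark as most recently used.
            self._loaded.move_to_end(name)
            return self._loaded[name]
        # Evict before loading so memory never holds capacity + 1 models.
        while len(self._loaded) >= self._capacity:
            self._loaded.popitem(last=False)
        model = self._loaders[name]()
        self._loaded[name] = model
        return model

    def loaded_names(self) -> List[str]:
        """Names of loaded models, least recently used first."""
        return list(self._loaded)
```

With capacity 1, alternating between two models reloads one of them on every switch, so a larger MODEL_CAPACITY trades memory for latency.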
Logging uses Python stdlib logging.basicConfig(...) initialized in app/main.py.

Development and tests

Linting, formatting, and type checking are optional; the focus is on a working API.
Basic tests live under tests/.
Run tests (pytest):

pytest -q
