A Python web app that searches for images matching a given text
Note: The slides/ directory contains slides for a 25-minute talk about version 1 of this code, as given at PyCon UK 2025. That version was organised around a single app and its container file.
There are four components in use here:

- A PostgreSQL® database, with the `pgvector` extension installed. This is used to store image and text embeddings.
- A FastAPI application that can take a text, or the URL for an image file, and use the CLIP model to calculate the vector embedding for that text or image.
- A script that sets the database up. It makes sure that `pgvector` is set up, and then uses the CLIP app to calculate the embedding for each image in the `photos` directory. It adds an entry in the database for each image name/URL and its embedding.
- A FastAPI application that allows the user to enter a text string. It uses the CLIP app to calculate the embedding for the text string, and then looks in the database for images with a similar embedding, so that it can present the four closest images to the user.
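The final step — finding the stored images whose embeddings are closest to the text embedding — can be sketched in plain Python. In the real app, pgvector does the equivalent work in SQL with its distance operators; the function names and toy two-dimensional vectors below are purely illustrative (CLIP embeddings have hundreds of dimensions):

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two embedding vectors (what pgvector's <=> operator computes)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def closest_images(query_embedding, stored, k=4):
    """Return the k image names whose embeddings are closest to the query embedding."""
    ranked = sorted(stored, key=lambda item: cosine_distance(query_embedding, item[1]))
    return [name for name, _ in ranked[:k]]

# Toy 2-d embeddings standing in for CLIP's high-dimensional vectors (made-up data)
stored = [("cat.jpg", [1.0, 0.0]), ("dog.jpg", [0.8, 0.6]), ("car.jpg", [0.0, 1.0])]
print(closest_images([0.9, 0.1], stored, k=2))  # → ['cat.jpg', 'dog.jpg']
```

With pgvector, the same ranking is a single `ORDER BY embedding <=> $1 LIMIT 4` query, so the distance calculation never leaves the database.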
The app can be run in four ways:

- As a single self-contained service, complete with its own PostgreSQL database, using the `compose.yaml` file.
- As a single self-contained service with an external PostgreSQL database, using the `compose-implicit-db.yaml` file.
- As three separate services at the command line, using an external PostgreSQL database.
- As three separate containers, using an external PostgreSQL database.
The instructions for the first two are below. A summary of how to do the last two is also below, but details are in the individual README files in each service subdirectory (`clip_app`, `setup_db`, `query_app`).

The following environment variables will be used when creating the database service.
- For bash or other traditional shells:

  ```shell
  export POSTGRES_USER=embeddings_user
  export POSTGRES_PASSWORD=please-do-not-use-this-password
  export POSTGRES_DB=embeddings
  ```

- For the fish shell:

  ```shell
  set -x POSTGRES_USER embeddings_user
  set -x POSTGRES_PASSWORD please-do-not-use-this-password
  set -x POSTGRES_DB embeddings
  ```

- Or set the same values in a `.env` file:

  ```
  POSTGRES_USER=embeddings_user
  POSTGRES_PASSWORD=please-do-not-use-this-password
  POSTGRES_DB=embeddings
  ```
And as it says, please use a proper password 🙂.
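For reference, a service can pick those variables up at runtime with nothing but the standard library. This is a minimal sketch (the function name is hypothetical, not from this repository):

```python
import os

def postgres_settings():
    """Read the POSTGRES_* variables set above (raises KeyError if one is missing)."""
    return {
        "user": os.environ["POSTGRES_USER"],
        "password": os.environ["POSTGRES_PASSWORD"],
        "dbname": os.environ["POSTGRES_DB"],
    }

# Simulate the environment for demonstration purposes
os.environ.update(
    POSTGRES_USER="embeddings_user",
    POSTGRES_PASSWORD="please-do-not-use-this-password",
    POSTGRES_DB="embeddings",
)
print(postgres_settings()["dbname"])  # → embeddings
```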
Then run:

```shell
docker compose up -d
```

And when that's all running, go to http://0.0.0.0:3000/ to find the prompt.
An Aiven for PostgreSQL service will do very well - see the Create a service section in the Aiven documentation.
Since the database already exists, you need to let the other services know how to connect to it. The URL you need should look something like:

```
postgres://<user>:<password>@<host>:<port>/<dbname>?sslmode=require
```

We'll refer to that URL as `<service URI>` in the following notes.
Note: If you're using an Aiven for PostgreSQL service, then you can find this as the Service URI value from the service Overview in the Aiven console.
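If a client library wants the connection details as separate parameters rather than one URL, the standard library can pull the pieces back out. The service URI below is made up for illustration, following the shape shown above:

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical service URI, purely for demonstration
uri = "postgres://avnadmin:secret@pg-demo.example.com:12691/defaultdb?sslmode=require"

parts = urlsplit(uri)
print(parts.username, parts.hostname, parts.port)  # avnadmin pg-demo.example.com 12691
print(parts.path.lstrip("/"))                      # defaultdb
print(parse_qs(parts.query)["sslmode"][0])         # require
```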
- For bash or other traditional shells:

  ```shell
  export DATABASE_URL=<service URI>
  ```

- For the fish shell:

  ```shell
  set -x DATABASE_URL <service URI>
  ```

- Or set the same value in a `.env` file:

  ```
  DATABASE_URL=<service URI>
  ```
Then run:

```shell
docker compose -f compose-implicit-db.yaml up -d
```

And when that's all running, go to http://0.0.0.0:3000/ to find the prompt.
The order in which things are done matters, because the different services depend on each other.

1. Create an external database, as described in One service and an external database, using compose
2. Start the CLIP application, as described in the `clip_app` README
3. Run the database setup script, as described in the `setup_db` README
4. Start the query application, as described in the `query_app` README
And when that's all running, go to http://0.0.0.0:3000/ to find the prompt.
The images in the photos directory are the same as those used in Workshop: Searching for images with vector search - OpenSearch and CLIP model.
They came from Unsplash and have been reduced in size to make them fit within GitHub filesize limits for a repository.
Note: Both `setup_db` and `query_app` retrieve the sample images directly from this GitHub repository. This is not good practice for a production app, as GitHub is not intended to act as an image repository for web apps.
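Serving an image straight from a repository means building a raw-content URL for it. This sketch shows the general shape only; `<owner>` and `<repo>` are deliberate placeholders, not this project's actual coordinates:

```python
from urllib.parse import quote

# Hypothetical base URL; <owner> and <repo> stand in for real repository coordinates
RAW_BASE = "https://raw.githubusercontent.com/<owner>/<repo>/main/photos"

def image_url(name):
    """URL for one sample image, with spaces and other specials percent-encoded."""
    return f"{RAW_BASE}/{quote(name)}"

print(image_url("blue sky.jpg"))  # → .../photos/blue%20sky.jpg
```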
When writing the Dockerfile, the default `FROM python:3.11` downloads much of a Debian installation, which we don't need. We can vastly reduce the size of the image by using `FROM python:3.11-slim`, at the cost of needing to install `git` (needed by the requirements to download `git+https://github.com/openai/CLIP.git`) and `curl`. See https://hub.docker.com/_/python for more about the Python images available.
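The slim-image approach might look roughly like this. This is a hedged sketch, not the repository's actual Dockerfile; the file paths and the `CMD` line are assumptions:

```dockerfile
# Sketch only: paths, requirements file name, and CMD are illustrative
FROM python:3.11-slim

# git is needed so pip can install git+https://github.com/openai/CLIP.git;
# curl is handy for container healthchecks
RUN apt-get update \
    && apt-get install -y --no-install-recommends git curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

Cleaning the apt lists in the same `RUN` layer keeps the package index out of the final image, which is where most of the size saving beyond `-slim` comes from.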
At one point I was running the Dockerised application in an HTTPS context. In order to make the redirect to /search_form also use HTTPS, I needed to tell FastAPI `redirect_slashes=False` (and make sure that the `/search_form` in the `templates/index.html` file didn't end with `/`).
I found the information at FastAPI redirection for trailing slash returns non-SSL link very helpful, particularly this comment.
- The Workshop: Searching for images with vector search - OpenSearch and CLIP model, which does (essentially) the same thing, but using OpenSearch and Jupyter notebooks, and the OpenAI CLIP model.
- Building a movie recommendation system with Tensorflow and PGVector, which searches text, and produces a web app using JavaScript.

For help understanding how to use HTMX:

- Using HTMX with FastAPI
- and, for help understanding how I wanted to use forms, Updating Other Content from the HTMX documentation (I went for option 1, as suggested).
