A Python web app that searches for images matching a given text
Note: The slides/ directory contains slides for a 25-minute talk about version 1 of this code, as given at PyCon UK 2025. That version was organised around a single app and its container file.
There are four components in use here:

- A PostgreSQL® database, with the `pgvector` extension installed. This is used to store image and text embeddings.
- A FastAPI application that can take a text, or the URL for an image file, and use the CLIP model to calculate the vector embedding for that text or image.
- A script that sets the database up. It makes sure that `pgvector` is set up, and then uses the CLIP app to calculate the embedding for each image in the `photos` directory. It adds an entry in the database for each image name/URL and its embedding.
- A FastAPI application that allows the user to enter a text string. It uses the CLIP app to calculate the embedding for the text string, and then looks in the database for images with a similar embedding, so that it can present the four closest images to the user.
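The final step — finding the stored images whose embeddings are closest to the text embedding — can be sketched in plain Python. In the real app, pgvector does the equivalent work in SQL with its distance operators; the function names and toy two-dimensional vectors below are purely illustrative (CLIP embeddings have hundreds of dimensions):

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two embedding vectors (what pgvector's <=> operator computes)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def closest_images(query_embedding, stored, k=4):
    """Return the k image names whose embeddings are closest to the query embedding."""
    ranked = sorted(stored, key=lambda item: cosine_distance(query_embedding, item[1]))
    return [name for name, _ in ranked[:k]]

# Toy 2-d embeddings standing in for CLIP's high-dimensional vectors (made-up data)
stored = [("cat.jpg", [1.0, 0.0]), ("dog.jpg", [0.8, 0.6]), ("car.jpg", [0.0, 1.0])]
print(closest_images([0.9, 0.1], stored, k=2))  # → ['cat.jpg', 'dog.jpg']
```

With pgvector, the same ranking is a single `ORDER BY embedding <=> $1 LIMIT 4` query, so the distance calculation never leaves the database.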
The app can be run in four ways:

- As a single self-contained service, complete with its own PostgreSQL database, using the `compose.yaml` file.
- As a single self-contained service with an external PostgreSQL database, using the `compose-implicit-db.yaml` file.
- As three separate services at the command line, using an external PostgreSQL database.
- As three separate containers, using an external PostgreSQL database.
The instructions for the first two are below. A summary of how to do the last two is also below, but details are in the individual README files in each service subdirectory (`clip_app`, `setup_db`, `query_app`).

The following environment variables will be used when creating the database service.
- For bash or other traditional shells:

  ```shell
  export POSTGRES_USER=embeddings_user
  export POSTGRES_PASSWORD=please-do-not-use-this-password
  export POSTGRES_DB=embeddings
  ```

- For the fish shell:

  ```shell
  set -x POSTGRES_USER embeddings_user
  set -x POSTGRES_PASSWORD please-do-not-use-this-password
  set -x POSTGRES_DB embeddings
  ```

- Or set the same values in a `.env` file:

  ```
  POSTGRES_USER=embeddings_user
  POSTGRES_PASSWORD=please-do-not-use-this-password
  POSTGRES_DB=embeddings
  ```
And as it says, please use a proper password 🙂.
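For reference, a service can pick those variables up at runtime with nothing but the standard library. This is a minimal sketch (the function name is hypothetical, not from this repository):

```python
import os

def postgres_settings():
    """Read the POSTGRES_* variables set above (raises KeyError if one is missing)."""
    return {
        "user": os.environ["POSTGRES_USER"],
        "password": os.environ["POSTGRES_PASSWORD"],
        "dbname": os.environ["POSTGRES_DB"],
    }

# Simulate the environment for demonstration purposes
os.environ.update(
    POSTGRES_USER="embeddings_user",
    POSTGRES_PASSWORD="please-do-not-use-this-password",
    POSTGRES_DB="embeddings",
)
print(postgres_settings()["dbname"])  # → embeddings
```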
Then run:

```shell
docker compose up -d
```

And when that's all running, go to http://0.0.0.0:3000/ to find the prompt.
An Aiven for PostgreSQL service will do very well - see the Create a service section in the Aiven documentation.
Since the database already exists, you need to let the other services know how to connect to it. The URL you need should look something like:

```
postgres://<user>:<password>@<host>:<port>/<dbname>?sslmode=require
```

We'll refer to that URL as `<service URI>` in the following notes.
Note: If you're using an Aiven for PostgreSQL service, then you can find this as the Service URI value from the service Overview in the Aiven console.
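If a client library wants the connection details as separate parameters rather than one URL, the standard library can pull the pieces back out. The service URI below is made up for illustration, following the shape shown above:

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical service URI, purely for demonstration
uri = "postgres://avnadmin:secret@pg-demo.example.com:12691/defaultdb?sslmode=require"

parts = urlsplit(uri)
print(parts.username, parts.hostname, parts.port)  # avnadmin pg-demo.example.com 12691
print(parts.path.lstrip("/"))                      # defaultdb
print(parse_qs(parts.query)["sslmode"][0])         # require
```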
- For bash or other traditional shells:

  ```shell
  export DATABASE_URL=<service URI>
  ```

- For the fish shell:

  ```shell
  set -x DATABASE_URL <service URI>
  ```

- Or set the same value in a `.env` file:

  ```
  DATABASE_URL=<service URI>
  ```
Then run:

```shell
docker compose -f compose-implicit-db.yaml up -d
```

And when that's all running, go to http://0.0.0.0:3000/ to find the prompt.
The order in which things are done matters, because the different services depend on each other.

1. Create an external database, as described in One service and an external database, using compose
2. Start the CLIP application, as described in the `clip_app` README
3. Run the database setup script, as described in the `setup_db` README
4. Start the query application, as described in the `query_app` README
And when that's all running, go to http://0.0.0.0:3000/ to find the prompt.
The images in the photos directory are the same as those used in Workshop: Searching for images with vector search - OpenSearch and CLIP model.
They came from Unsplash and have been reduced in size to make them fit within GitHub filesize limits for a repository.
Note: Both `setup_db` and `query_app` retrieve the sample images directly from this GitHub repository. This is not good practice for a production app, as GitHub is not intended to act as an image repository for web apps.
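Serving an image straight from a repository means building a raw-content URL for it. This sketch shows the general shape only; `<owner>` and `<repo>` are deliberate placeholders, not this project's actual coordinates:

```python
from urllib.parse import quote

# Hypothetical base URL; <owner> and <repo> stand in for real repository coordinates
RAW_BASE = "https://raw.githubusercontent.com/<owner>/<repo>/main/photos"

def image_url(name):
    """URL for one sample image, with spaces and other specials percent-encoded."""
    return f"{RAW_BASE}/{quote(name)}"

print(image_url("blue sky.jpg"))  # → .../photos/blue%20sky.jpg
```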
When writing the Dockerfile, the default `FROM python:3.11` downloads much of a Debian installation, which we don't need. We can vastly reduce the size of the image by using `FROM python:3.11-slim`, at the cost of needing to install `git` (needed by the requirements to download `git+https://github.com/openai/CLIP.git`) and `curl`. See https://hub.docker.com/_/python for more about the Python images available.
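The slim-image approach might look roughly like this. This is a hedged sketch, not the repository's actual Dockerfile; the file paths and the `CMD` line are assumptions:

```dockerfile
# Sketch only: paths, requirements file name, and CMD are illustrative
FROM python:3.11-slim

# git is needed so pip can install git+https://github.com/openai/CLIP.git;
# curl is handy for container healthchecks
RUN apt-get update \
    && apt-get install -y --no-install-recommends git curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

Cleaning the apt lists in the same `RUN` layer keeps the package index out of the final image, which is where most of the size saving beyond `-slim` comes from.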
At one point I was running the Dockerised application in an HTTPS context. In order to make the redirect to /search_form also use HTTPS, I needed to tell FastAPI `redirect_slashes=False` (and make sure that the `/search_form` in the `templates/index.html` file didn't end with `/`).
I found the information at FastAPI redirection for trailing slash returns non-SSL link very helpful, particularly this comment.
- The Workshop: Searching for images with vector search - OpenSearch and CLIP model, which does (essentially) the same thing, but using OpenSearch and Jupyter notebooks, and the OpenAI CLIP model.
- Building a movie recommendation system with Tensorflow and PGVector, which searches text, and produces a web app using JavaScript.

For help understanding how to use HTMX:

- Using HTMX with FastAPI
- and, for help understanding how I wanted to use forms, Updating Other Content from the HTMX documentation (I went for option 1, as suggested).
