When you enable Docling Serve steps that require extra models (such as picture classification, picture description, table detection, code recognition, formula extraction, or vision-language modules), you must make those models available in the runtime environment. The standard container image includes only the default models; any additional models must be downloaded and made available before use. If a required model is missing, Docling Serve raises a runtime error rather than downloading it automatically. This default is deliberate: it guarantees that the service does not call external services.
Docling Serve loads models from the directory specified by the DOCLING_SERVE_ARTIFACTS_PATH environment variable. This path must be consistent across model download and runtime. When running with multiple workers or reload enabled, you must use the environment variable rather than the CLI argument for configuration [source].
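For setups that rely on pre-downloaded models, a quick preflight check can catch a wrong or empty path before the service starts. The following is a minimal sketch, not part of Docling Serve: `check_artifacts_path` is a hypothetical helper, and it only verifies that the directory exists and is non-empty, since it assumes nothing about how Docling lays out model files inside it.

```shell
#!/bin/sh
# Hypothetical preflight helper (not shipped with Docling Serve).
# Checks the path given as $1, or $DOCLING_SERVE_ARTIFACTS_PATH when no
# argument is passed. Returns 0 for an existing, non-empty directory and
# prints a reason to stderr and returns 1 otherwise.
check_artifacts_path() {
    path="${1:-${DOCLING_SERVE_ARTIFACTS_PATH:-}}"
    if [ -z "$path" ]; then
        echo "artifacts path is not set" >&2
        return 1
    fi
    if [ ! -d "$path" ]; then
        echo "artifacts path is not a directory: $path" >&2
        return 1
    fi
    # ls -A also lists hidden entries; empty output means an empty directory.
    if [ -z "$(ls -A "$path")" ]; then
        echo "artifacts path is empty: $path" >&2
        return 1
    fi
    echo "artifacts path looks populated: $path"
}
```

Calling `check_artifacts_path` with no argument in the same environment that starts the service checks the value the service will see. An unset or empty path is reported as a failure here, so skip the check when you intentionally clear the path to download models at startup.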
There are several ways to ensure required models are present:
You can configure the container to download all models at startup by clearing the artifacts path:
podman run -d -p 5001:5001 --name docling-serve \
-e DOCLING_SERVE_ARTIFACTS_PATH="" \
-e DOCLING_SERVE_ENABLE_UI=true \
quay.io/docling-project/docling-serve
This approach is simple for local development but is not recommended for production, as it increases startup time and depends on network availability.
You can create a new image that includes the required models:
FROM quay.io/docling-project/docling-serve
RUN docling-tools models download smolvlm
This method is suitable for production, as it ensures all models are present in the image and avoids runtime downloads.
You can override the entrypoint to download models before starting the service:
podman run -p 5001:5001 -e DOCLING_SERVE_ENABLE_UI=true \
quay.io/docling-project/docling-serve \
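-- sh -c 'docling-tools models download smolvlm && exec docling-serve run'
This is useful for environments where you want to keep the base image unchanged but still automate model preparation. Only the final command uses exec: an exec in front of the download would replace the shell, so the service would never start.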
Download models locally and mount them into the container:
# Download the models locally
docling-tools models download --all -o models
# Start the container with the local models folder
podman run -p 5001:5001 \
-v "$(pwd)/models:/opt/app-root/src/models" \
-e DOCLING_SERVE_ARTIFACTS_PATH="/opt/app-root/src/models" \
-e DOCLING_SERVE_ENABLE_UI=true \
quay.io/docling-project/docling-serve
This approach is robust for both local and production deployments, especially when using persistent storage.
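For repeated local runs, the download step can be skipped when the folder is already populated. The sketch below wraps the download command shown above in a hypothetical `ensure_models` function (not part of docling-tools). It treats any non-empty directory as complete, which is an assumption: a partially downloaded folder would also be reused.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of docling-tools).
# Downloads all models into the target directory only when that directory
# is missing or empty, so later runs do not need network access.
ensure_models() {
    dir="${1:-models}"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
        echo "reusing existing models in $dir"
        return 0
    fi
    mkdir -p "$dir"
    # Same command as in the local download example above.
    docling-tools models download --all -o "$dir"
}
```

Running `ensure_models models` before the `podman run` command keeps the mounted folder ready without downloading again on every start.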
For Kubernetes or OpenShift clusters, the recommended approach is to use a PersistentVolumeClaim (PVC) for model storage, a Kubernetes Job to download models, and mount the volume into the deployment. This ensures models persist across pod restarts and scale-out scenarios.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: docling-model-cache-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
If you don't want to use the default storage class, set a custom storage class as follows:
spec:
  ...
  storageClassName: <Storage Class Name>
Manifest example: docling-model-cache-pvc.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: docling-model-cache-load
spec:
  template:
    spec:
      containers:
        - name: loader
          image: ghcr.io/docling-project/docling-serve-cpu:main
          command:
            - docling-tools
            - models
            - download
            - '--output-dir=/modelcache'
            - 'layout'
            - 'tableformer'
            - 'code_formula'
            - 'picture_classifier'
            - 'smolvlm'
            - 'granite_vision'
            - 'easyocr'
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc
      restartPolicy: Never
The Job mounts the previously created persistent volume and runs a command similar to the one used to download models locally:
docling-tools models download --output-dir <MOUNT-PATH> [LIST_OF_MODELS]
The manifest lists the desired models individually; alternatively, pass the --all parameter to download all models.
Manifest example: docling-model-cache-job.yaml
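The PVC and Job steps can be applied in sequence and the download awaited before the deployment is updated. This is a minimal sketch, assuming the two manifests are saved in the current directory under the file names given above and that `kubectl` points at the target namespace; the `load_model_cache` function name and the 30-minute timeout are illustrative choices, not values from the Docling Serve documentation.

```shell
#!/bin/sh
# Sketch: create the PVC, run the download Job, and wait for it to finish.
# Manifest file names and the 30m timeout are assumptions; adjust as needed.
load_model_cache() {
    kubectl apply -f docling-model-cache-pvc.yaml || return 1
    kubectl apply -f docling-model-cache-job.yaml || return 1
    # Block until the Job reports the Complete condition or the timeout expires.
    if ! kubectl wait --for=condition=complete \
            job/docling-model-cache-load --timeout=30m; then
        # Show the loader output to help diagnose a failed or slow download.
        kubectl logs job/docling-model-cache-load
        return 1
    fi
}
```

Waiting on the Job before rolling out the deployment avoids starting pods against a volume that is still being filled.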
spec:
  template:
    spec:
      containers:
        - name: api
          env:
            - name: DOCLING_SERVE_ARTIFACTS_PATH
              value: '/modelcache'
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc
The value of DOCLING_SERVE_ARTIFACTS_PATH must match the mount path where models are stored.
Now, when docling-serve executes tasks, the underlying Docling installation loads model weights from the mounted volume.
Manifest example: docling-model-cache-deployment.yaml
For local Docker or Podman execution, you can use any of the approaches above. Mounting a local directory with pre-downloaded models is the most reliable for repeated runs and avoids network dependencies.
- If a required model is missing from the artifacts path, Docling Serve will raise a runtime error.
- Always ensure the value of DOCLING_SERVE_ARTIFACTS_PATH matches the directory where models are stored and mounted.
- For production and cluster environments, prefer persistent storage and pre-loading models via a dedicated job.
For more details and YAML manifest examples, see the deployment documentation.