Enabling image captioning yields higher accuracy for questions about images in the ingested documents, at the cost of higher ingestion latency. After you follow the steps in the quick start guide to launch the blueprint, you have two options to enable image captioning support:
- Deploy the VLM model on-prem. You need an H100, A100, or B200 GPU to deploy this model.
export VLM_MS_GPU_ID=<AVAILABLE_GPU_ID>
USERID=$(id -u) docker compose -f deploy/compose/nims.yaml --profile vlm up -d
- Make sure the VLM container is up and running.
docker ps --filter "name=nemo-vlm-microservice" --format "table {{.Names}}\t{{.Status}}"
Example output:
NAMES                   STATUS
nemo-vlm-microservice   Up 5 minutes (healthy)
- Enable image captioning. Export the following environment variables and relaunch the ingestor-server container.
export APP_NVINGEST_EXTRACTIMAGES="True"
export APP_NVINGEST_CAPTIONENDPOINTURL="http://vlm-ms:8000/v1/chat/completions"
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
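
The container check in the steps above can be scripted so that the ingestor-server relaunch waits until the VLM service reports healthy. The following is a minimal sketch, not part of the blueprint: `wait_for_healthy` is a hypothetical helper, the container name `nemo-vlm-microservice` is taken from the example output, and it assumes the container defines a Docker health check (the `(healthy)` status suggests it does). The default timeout and polling interval are arbitrary.

```shell
# Hypothetical helper: poll a container's Docker health status until it is
# "healthy" or the timeout (in seconds) expires.
wait_for_healthy() {
  local name="$1" timeout="${2:-600}" interval="${3:-10}" elapsed=0 status
  while [ "$elapsed" -lt "$timeout" ]; do
    # Empty status if the container does not exist yet or has no health check.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null || true)
    if [ "$status" = "healthy" ]; then
      echo "$name is healthy"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "Timed out waiting for $name" >&2
  return 1
}
```

A possible usage is `wait_for_healthy nemo-vlm-microservice 900 && docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d`, which relaunches the ingestor only after the model has finished loading.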
- To use the model hosted on the API catalog instead of an on-prem deployment, set the caption endpoint and model name as follows.
export APP_NVINGEST_CAPTIONENDPOINTURL="https://integrate.api.nvidia.com/v1/chat/completions"
export APP_NVINGEST_CAPTIONMODELNAME="nvidia/llama-3.1-nemotron-nano-vl-8b-v1"
- Enable image captioning. Export the following environment variable and relaunch the ingestor-server container.
export APP_NVINGEST_EXTRACTIMAGES="True"
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
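
Because `docker compose` only sees variables that are exported in the current shell, it can help to confirm the captioning variables before relaunching the ingestor-server container. The sketch below is an illustration only: `require_exported` is a hypothetical helper, and it relies on `printenv`, which returns a non-zero status for names that are not in the environment.

```shell
# Hypothetical helper: fail if any of the named variables is not exported.
require_exported() {
  local name missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what docker compose inherits.
    if ! printenv "$name" >/dev/null; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the API catalog option, a call such as `require_exported APP_NVINGEST_EXTRACTIMAGES APP_NVINGEST_CAPTIONENDPOINTURL APP_NVINGEST_CAPTIONMODELNAME` reports every variable that was set without `export` or not set at all.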
[!TIP]: For an externally hosted VLM model, you can change the model endpoint and model name by setting the following two environment variables and restarting the ingestion services.
export APP_NVINGEST_CAPTIONENDPOINTURL="<vlm_nim_http_endpoint_url>"
export APP_NVINGEST_CAPTIONMODELNAME="<model_name>"

To enable image captioning in Helm-based deployments by using an on-prem VLM model, use the following procedure.
- In the values.yaml file, in the ingestor-server.envVars section, set the following environment variables.
APP_NVINGEST_EXTRACTIMAGES: "True"
APP_NVINGEST_CAPTIONENDPOINTURL: "http://nim-vlm:8000/v1/chat/completions"
APP_NVINGEST_CAPTIONMODELNAME: "nvidia/llama-3.1-nemotron-nano-vl-8b-v1"
- Enable the VLM image captioning model in your values.yaml file.
nim-vlm:
  enabled: true
- Apply the updated Helm chart by running the following command.
helm upgrade --install rag -n rag https://helm.ngc.nvidia.com/nvidia/blueprint/charts/nvidia-blueprint-rag-v2.2.0.tgz \
  --username '$oauthtoken' \
  --password "${NGC_API_KEY}" \
  --set imagePullSecret.password=$NGC_API_KEY \
  --set ngcApiSecret.password=$NGC_API_KEY \
  -f rag-server/values.yaml
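
After the upgrade, one way to confirm that the captioning settings reached the release is to read back the values Helm recorded. This is a sketch under assumptions rather than a documented step: `show_caption_values` is a hypothetical helper, and the release name `rag` and namespace `rag` are taken from the command above.

```shell
# Hypothetical helper: print the captioning-related values stored for a release.
show_caption_values() {
  local release="${1:-rag}" namespace="${2:-rag}"
  # --all includes chart defaults as well as user-supplied overrides.
  helm get values "$release" -n "$namespace" --all -o yaml \
    | grep -E 'APP_NVINGEST_(EXTRACTIMAGES|CAPTIONENDPOINTURL|CAPTIONMODELNAME)'
}
```

Running `show_caption_values` with no arguments uses the default release and namespace. An empty result with a non-zero exit status means none of the three variables appear in the release values.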
Note
Enabling the on-prem VLM model increases the total GPU requirement to 9xH100 GPUs.
Warning
With image captioning enabled, uploaded files fail to ingest if they do not contain any graphs, charts, tables, or plots. This is a known limitation and will be fixed in a future release.