You can use a paid API like OpenAI, Anthropic, or Google Gemini by configuring your API key; for local hosting, however, you'll need a service to host your VLM model.
1. Install one of the following: LM Studio, vllm, ollama, or any other local LLM service that serves via the "OpenAI API" (most of them do). LM Studio is likely the easiest for most people to get working since it is entirely GUI based, so the extra steps below are only included for LM Studio. If you want to use ollama, vllm, or another service, please refer to that application's documentation for installation. (A short sketch of what calling an OpenAI-compatible endpoint looks like is included after this list.)
2. Download your preferred model inside the service you installed. You should select a model and quant that is a few gigabytes smaller than your VRAM to leave room for context (for example, on a 12 GB card, a quant around 8-9 GB is a reasonable target).
   a. For LM Studio, open the app, go to Discover, search for a model, and download it.
3. Make sure local hosting is enabled:
   a. For LM Studio, enable developer mode (bottom left: `User` - `Power User` - `Developer`, click on `Developer`), then go to the `Developer` section and click the toggle at the top left to enable the service. Make sure to copy the URI shown at the top right (see point 5 below). If you are using the standalone GUI, you will also need to `Enable CORS` to allow the app to call the LM Studio service API.
4. Make sure the service works. You can typically check the `/v1/models` route in any web browser to confirm that the service is running and models are available to serve (e.g. something like `http://192.168.0.5:11434/v1/models` or `http://localhost:1234/v1/models` -- just open it in Chrome). A script-based version of this check is sketched after the list.
5. Paste the IP and port into `caption.yaml` as the `base_url` value, and add `/v1`. You may see `localhost` in place of the IP if you are not configured to host to the rest of your local network. (An example snippet follows this list.)
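For reference, "serves via the OpenAI API" means the local service exposes the same HTTP endpoints the official OpenAI client libraries expect, just at your own address. A minimal sketch using the `openai` Python package, assuming LM Studio's default address of `http://localhost:1234/v1` (your host, port, and model name will differ):

```python
from openai import OpenAI

# A sketch only: point the standard OpenAI client at the local service
# instead of api.openai.com. Local servers generally ignore the API key,
# but the client library still requires some value.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

# "your-model-name" is a placeholder; use a model id reported by /v1/models.
response = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Describe this image captioning setup in one sentence."}],
)
print(response.choices[0].message.content)
```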
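If you prefer to verify the service from a script rather than a browser, here is a small sketch of the point 4 check using only the Python standard library (swap in your own host and port):

```python
import json
import urllib.request

# Same URL you would open in the browser to check the service.
url = "http://localhost:1234/v1/models"

with urllib.request.urlopen(url, timeout=5) as resp:
    data = json.load(resp)

# OpenAI-compatible servers return the available models as a list under "data".
for model in data.get("data", []):
    print(model.get("id"))
```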
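Finally, an example of what the `base_url` line in `caption.yaml` might look like once the address is pasted in and `/v1` appended. The address here is illustrative; use the URI your service shows, and leave any other keys in your `caption.yaml` unchanged:

```yaml
# caption.yaml (excerpt) -- point the app at your local OpenAI-compatible service
base_url: http://192.168.0.5:11434/v1   # or http://localhost:1234/v1 if hosting on this machine only
```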
Congrats! You're running your own offline LLM/VLM server.
Check the documentation for the server/app you are using if you need more information or support on configuring your service. Further info for LM Studio is here.
