vLLM Model Deployment Configurations

This repository contains Docker Compose configurations for deploying various large language models using vLLM with LiteLLM as an API gateway.

Models

Model             Directory               Description
GLM-4.6V-NVFP4    glm46v/                 Vision-language model from Zhipu AI (GLM)
IQuest-Coder 40B  IQuest-Coder/instruct/  Code-focused LLM from Fireworks AI
GPT-OSS 120B      gpt-oss-120b/           120B-parameter open-source model
Qwen3-Coder       w8a8/qwen3-coder/       Code model from Alibaba (W8A8 quantization)

Architecture

Claude Code → LiteLLM (port 4000) → vLLM (port 8000) → Model

Each deployment includes:

  • vLLM: Inference backend serving the model via OpenAI-compatible API
  • LiteLLM: API gateway providing Anthropic-compatible endpoints
  • Tailscale (optional): Network connectivity for distributed setups
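The wiring between the vLLM backend and the LiteLLM gateway can be sketched in a minimal docker-compose.yml. This is an illustrative sketch, not one of the repository's actual files: the image tags, the MODEL_ID placeholder, and the service names are assumptions.

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest                        # illustrative tag; deployments may build their own Dockerfile
    command: ["--model", "${MODEL_ID}", "--port", "8000"] # MODEL_ID is a placeholder; each deployment pins its own model
    environment:
      - HF_TOKEN=${HF_TOKEN}
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    ports:
      - "4000:4000"                                       # only the gateway is exposed to clients
    environment:
      - LITELLM_MASTER_KEY=${LITELLM_MASTER_KEY}
    volumes:
      - ./litellm-config.yaml:/app/config.yaml
    depends_on:
      - vllm
```

Note that vLLM's port 8000 is reachable only inside the Compose network; clients talk to LiteLLM on 4000.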

Quick Start

  1. Navigate to the model directory:

    cd glm46v/  # or any other model directory
  2. Copy .env.example to .env and configure:

    cp .env.example .env
    # Edit .env with your API keys and settings
  3. Start the services:

    docker compose up -d
  4. Verify the deployment (LiteLLM requires the master key once one is configured):

    curl http://localhost:4000/models \
      -H "Authorization: Bearer $LITELLM_MASTER_KEY"
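Once the services are up, a test completion can be sent through the gateway's OpenAI-compatible route. The model name "glm-4.6v" below is an assumption for illustration; it must match a model_name entry in the deployment's litellm-config.yaml.

```shell
# Requires the stack from `docker compose up -d` to be running.
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "glm-4.6v",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'
```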

Configuration

Environment Variables

Variable            Description
HF_TOKEN            Hugging Face access token for model downloads
LITELLM_MASTER_KEY  API key for LiteLLM gateway authentication
TS_AUTHKEY          Tailscale authentication key (if using a tailnet)
HEADSCALE_URL       Headscale server URL (if using a self-hosted control server)
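A filled-in .env might look like the following; every value here is a placeholder, not a working credential.

```shell
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
LITELLM_MASTER_KEY=sk-local-dev-key
TS_AUTHKEY=tskey-auth-xxxxxxxxxxxx
HEADSCALE_URL=https://headscale.example.com
```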

Model-Specific Ports

Service          Port
LiteLLM Gateway  4000
vLLM Backend     8000

Directory Structure

vllm-deployments/
├── glm46v/                    # GLM-4.6V deployment
│   ├── .env                   # Environment variables
│   ├── .env.litellm          # LiteLLM-specific config
│   ├── litellm-config.yaml   # LiteLLM model routing
│   ├── docker-compose.yml    # Service orchestration
│   └── Dockerfile            # vLLM image build
├── IQuest-Coder/
│   ├── instruct/             # IQuest-Coder 40B deployment
│   │   ├── .env
│   │   ├── litellm-config.yaml
│   │   └── docker-compose.yml
│   └── loop/                 # Additional config
├── gpt-oss-120b/             # GPT-OSS 120B deployment
│   ├── .env
│   └── docker-compose.yml
└── w8a8/
    └── qwen3-coder/          # Qwen3-Coder deployment
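The litellm-config.yaml in each deployment maps a public model name to the vLLM backend. A minimal sketch of that routing, using LiteLLM's model_list format; the model and service names are assumptions, not copied from the repo:

```yaml
model_list:
  - model_name: glm-4.6v                       # name clients request through the gateway
    litellm_params:
      model: openai/zai-org/GLM-4.6V-NVFP4     # openai/ prefix routes to an OpenAI-compatible backend
      api_base: http://vllm:8000/v1            # vLLM service name inside the Compose network
      api_key: "dummy"                         # vLLM does not check the key unless configured to
```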

Requirements

  • Docker & Docker Compose
  • NVIDIA GPU with sufficient VRAM
  • CUDA drivers
  • (Optional) Tailscale/Headscale for networking

License

[Add your license here]
