Complete configuration reference for tuning SMG behavior.
SMG can be configured through the following sources, in order of precedence:
1. Command-line arguments (highest priority)
2. Environment variables
3. Default values (lowest priority)
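The precedence order can be sketched as a small lookup. `resolve_option` is a hypothetical helper, not part of SMG. It only illustrates the documented order. Real option parsing also converts types and validates values.

```python
import os


def resolve_option(cli_value, env_var, default, environ=None):
    """Hypothetical helper: return the effective value of one option.

    Precedence follows the list above: command-line argument, then the
    mapped environment variable (if the option has one), then the default.
    """
    environ = os.environ if environ is None else environ
    if cli_value is not None:                 # an explicit flag always wins
        return cli_value
    if env_var is not None and env_var in environ:
        return environ[env_var]               # raw string; no type conversion here
    return default


env = {"REDIS_URL": "redis://cache:6379"}
print(resolve_option(None, "REDIS_URL", None, env))                      # redis://cache:6379
print(resolve_option("redis://localhost:6379", "REDIS_URL", None, env))  # redis://localhost:6379
print(resolve_option(None, None, "0.0.0.0", env))                        # 0.0.0.0
```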
Network interface to bind to.
Option: --host
Environment: none
Default: 0.0.0.0
Value | Description
127.0.0.1 | Localhost only
0.0.0.0 | All IPv4 interfaces
:: | All IPv6 interfaces
::1 | IPv6 localhost
Port for the main API server.
Option: --port
Environment: none
Default: 30000
List of worker URLs to route requests to.
Option: --worker-urls
Environment: none
Default: empty
Format: space-separated URLs
Examples:
--worker-urls http://worker1:8000 http://worker2:8000
--worker-urls http://[::1]:8000 http://192.168.1.1:8000 # IPv6 and IPv4
--worker-urls grpc://worker1:50051 # gRPC mode
Routing Policy Configuration
Controls how requests are distributed across workers.
Option: --policy
Environment: none
Default: cache_aware
Values: random, round_robin, cache_aware, power_of_two, prefix_hash, consistent_hashing, bucket, manual
Policy comparison:
Policy | Use Case | KV Cache | Load Balance
random | Simple deployments | Poor | Fair
round_robin | Uniform workloads | Poor | Good
power_of_two | Variable workloads | Poor | Excellent
cache_aware | LLM inference | Excellent | Good
prefix_hash | Consistent routing by prefix | Good | Good
consistent_hashing | Session affinity via hash ring | Good | Good
bucket | Load balancing with bucket boundaries | Poor | Excellent
manual | Sticky sessions with LRU eviction | Good | Manual
Recommendation: Use cache_aware for LLM workloads to maximize KV cache hit rates.
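The power_of_two policy takes its name from the well-known "power of two choices" technique. Below is a generic sketch of that technique. It assumes each worker's load is a single number, for example one refreshed by the load monitor (see --load-monitor-interval). It does not claim to reproduce SMG's internal load metric.

```python
import random


def pick_power_of_two(loads, rng=random):
    """Generic "power of two choices": sample two distinct workers, keep the less loaded.

    `loads` maps worker URL -> current load (a hypothetical metric for this sketch).
    """
    workers = list(loads)
    if len(workers) == 1:
        return workers[0]
    a, b = rng.sample(workers, 2)             # two distinct random candidates
    return a if loads[a] <= loads[b] else b


loads = {"http://w1:8000": 12, "http://w2:8000": 3, "http://w3:8000": 7}
print(pick_power_of_two(loads))  # never the most loaded worker, w1
```

Comparing just two random candidates keeps each decision constant-time. It still steers traffic away from the busiest worker far more effectively than purely random selection.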
Cache-Aware Policy Options
Option | Description | Default
--cache-threshold | Cache threshold (0.0-1.0) for cache-aware routing | 0.3
--balance-abs-threshold | Absolute threshold for the load-balancing trigger | 64
--balance-rel-threshold | Relative threshold for the load-balancing trigger | 1.5
--eviction-interval | Interval in seconds between cache eviction operations | 120
--max-tree-size | Maximum size of the approximation tree | 67108864
--block-size | KV cache block size for event-driven cache-aware routing | 16
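Here is a simplified sketch of how the first three thresholds could interact in a cache-aware decision. It rests on three assumptions that this reference does not confirm:
- Load balancing takes over only when both the absolute and the relative thresholds are exceeded.
- The match rate is the matched prefix length divided by the request length.
- Each worker's cache is approximated by a list of previously routed prompt strings rather than SMG's approximation tree.

Eviction, tree size limits, and block size are not modeled.

```python
import os


def prefix_len(a, b):
    # Length of the common leading substring (character-level, for illustration).
    return len(os.path.commonprefix([a, b]))


def pick_cache_aware(request, workers, cache_threshold=0.3,
                     balance_abs_threshold=64, balance_rel_threshold=1.5):
    """workers: dict of URL -> {"load": int, "prefixes": list[str]} (hypothetical shape)."""
    loads = {url: w["load"] for url, w in workers.items()}
    max_load, min_load = max(loads.values()), min(loads.values())
    imbalanced = (max_load - min_load > balance_abs_threshold
                  and max_load > balance_rel_threshold * min_load)
    if imbalanced:
        return min(loads, key=loads.get)          # load balancing wins: least loaded worker

    def match(url):
        return max((prefix_len(request, p) for p in workers[url]["prefixes"]), default=0)

    best = max(workers, key=match)
    if match(best) / max(len(request), 1) > cache_threshold:
        return best                               # likely cache hit on this worker
    # Weak match everywhere: prefer the worker holding the least cached text.
    return min(workers, key=lambda url: sum(len(p) for p in workers[url]["prefixes"]))


workers = {
    "http://w1:8000": {"load": 10, "prefixes": ["You are a helpful assistant. Summarize"]},
    "http://w2:8000": {"load": 12, "prefixes": []},
}
print(pick_cache_aware("You are a helpful assistant. Translate this", workers))  # http://w1:8000
```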
Prefix Hash Policy Options
Option | Description | Default
--prefix-token-count | Number of prefix tokens to use for hashing | 256
--prefix-hash-load-factor | Load factor threshold for rebalancing | 1.25
Manual Policy Options
Option | Description | Default
--max-idle-secs | Maximum idle time before eviction | 14400 (4 hours)
--assignment-mode | Mode for assigning new routing keys | random
Assignment modes:
random - Assign to a random worker
min_load - Assign to the worker with the fewest active requests
min_group - Assign to the worker with the fewest routing keys
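The following is a hypothetical sketch of sticky routing in the spirit of the manual policy:
- Each routing key stays pinned to its worker.
- Keys idle longer than --max-idle-secs are evicted in least-recently-used order.
- New keys are placed according to --assignment-mode.

The class, its methods, and the in-flight counter (which a caller would have to maintain) are illustrative assumptions, not SMG's implementation.

```python
import random
import time
from collections import OrderedDict


class StickyRouter:
    """Hypothetical sticky router: routing key -> worker, with LRU idle eviction."""

    def __init__(self, workers, max_idle_secs=14400, assignment_mode="random",
                 clock=time.monotonic):
        self.workers = list(workers)
        self.max_idle_secs = max_idle_secs
        self.mode = assignment_mode
        self.clock = clock
        self.active = {w: 0 for w in self.workers}  # in-flight requests (caller-maintained)
        self.table = OrderedDict()                  # key -> (worker, last_used), LRU first

    def _evict_idle(self, now):
        # Entries are kept in least-recently-used order, so stop at the first fresh one.
        while self.table:
            _, last_used = next(iter(self.table.values()))
            if now - last_used <= self.max_idle_secs:
                break
            self.table.popitem(last=False)

    def _assign(self):
        if self.mode == "min_load":
            return min(self.workers, key=lambda w: self.active[w])
        if self.mode == "min_group":
            counts = {w: 0 for w in self.workers}
            for worker, _ in self.table.values():
                counts[worker] += 1
            return min(self.workers, key=counts.get)
        return random.choice(self.workers)          # "random"

    def route(self, routing_key):
        now = self.clock()
        self._evict_idle(now)
        if routing_key in self.table:
            worker, _ = self.table.pop(routing_key)  # sticky: reuse the pinned worker
        else:
            worker = self._assign()
        self.table[routing_key] = (worker, now)      # (re)insert as most recently used
        return worker
```

In this sketch, a session that keeps sending requests stays on the same worker. An abandoned session frees its entry once it has been idle longer than --max-idle-secs.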
Other Routing Options
Option | Description | Default
--dp-aware | Enable data-parallelism-aware scheduling | false
--enable-igw | Enable IGW (Inference Gateway) mode for multi-model support | false
--dp-minimum-tokens-scheduler | Enable the minimum-tokens scheduler for data-parallel groups | false
--load-monitor-interval | Interval in seconds between load monitor checks for power_of_two routing | 10
PD Disaggregation Configuration
Prefill-decode (PD) disaggregated mode separates prefill and decode operations across different workers.
Option: --pd-disaggregation
Environment: none
Default: false
Option: --prefill
Format: URL [BOOTSTRAP_PORT]
Multiple: yes (specify multiple times)
Examples:
--prefill http://prefill1:30001 9001 \
--prefill http://prefill2:30002 9002 \
--prefill http://prefill3:30003 none # No bootstrap port
Option: --decode
Format: URL
Multiple: yes (specify multiple times)
Example:
--decode http://decode1:30003 \
--decode http://decode2:30004
Option | Description | Default
--prefill-policy | Routing policy for prefill nodes | Value of --policy
--decode-policy | Routing policy for decode nodes | Value of --policy
Worker Startup Configuration
Option | Description | Default
--worker-startup-timeout-secs | Timeout for worker startup and registration | 1800 (30 min)
--worker-startup-check-interval | Interval between worker startup checks | 30
Service Discovery (Kubernetes)
Option: --service-discovery
Environment: none
Default: false
Note: Enabling service discovery automatically enables IGW mode.
Option: --selector
Format: key=value (space-separated for multiple)
Example:
--selector app=sglang-worker tier=inference
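Selector matching can be sketched as follows. The sketch assumes multiple --selector entries are combined with a logical AND, as in Kubernetes equality-based label selectors. The helper names are hypothetical.

```python
def parse_selector(entries):
    """Turn ["app=sglang-worker", "tier=inference"] into a dict of required labels."""
    selector = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {entry!r}")
        selector[key] = value
    return selector


def pod_matches(pod_labels, selector):
    # Every key=value pair must be present on the pod (logical AND).
    return all(pod_labels.get(k) == v for k, v in selector.items())


selector = parse_selector(["app=sglang-worker", "tier=inference"])
print(pod_matches({"app": "sglang-worker", "tier": "inference", "gpu": "a100"}, selector))  # True
print(pod_matches({"app": "sglang-worker"}, selector))                                      # False
```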
Option: --service-discovery-namespace
Environment: none
Default: all namespaces
Option: --service-discovery-port
Environment: none
Default: 80
PD Service Discovery Selectors
Option | Description
--prefill-selector | Label selector for prefill server pods
--decode-selector | Label selector for decode server pods
Option | Description
--router-selector | Label selector for router pod discovery in HA mesh mode (format: key=value)
Per-Worker Model ID Override
Option | Description
--model-id-from | Override each worker's model_id from pod metadata. Accepted values: namespace, label:<key>, or annotation:<key>.
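A small resolver illustrates the three accepted forms. The following are all assumptions made for this sketch:
- the pod metadata dictionary;
- the example label and annotation keys;
- returning None when the key is missing.

```python
def resolve_model_id(spec, pod):
    """Hypothetical resolver for a --model-id-from value against pod metadata."""
    if spec == "namespace":
        return pod["namespace"]
    kind, sep, key = spec.partition(":")   # "label:<key>" or "annotation:<key>"
    if sep and key and kind == "label":
        return pod.get("labels", {}).get(key)
    if sep and key and kind == "annotation":
        return pod.get("annotations", {}).get(key)
    raise ValueError(f"unsupported --model-id-from value: {spec!r}")


pod = {
    "namespace": "llama-team",
    "labels": {"model": "llama-3-8b"},
    "annotations": {"example.com/model-id": "meta-llama/Llama-3-8B-Instruct"},
}
print(resolve_model_id("label:model", pod))                      # llama-3-8b
print(resolve_model_id("annotation:example.com/model-id", pod))  # meta-llama/Llama-3-8B-Instruct
```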
Tokenizer Configuration
Option: --model-path
Environment: none
Default: none
Description: Hugging Face model ID or local path used to load the tokenizer
Option: --tokenizer-path
Environment: none
Default: none
Description: Explicit tokenizer path (overrides the tokenizer from --model-path)
Option: --chat-template
Environment: none
Default: none
Description: Path to a chat template file
Disable Tokenizer Autoload
Option: --disable-tokenizer-autoload
Environment: none
Default: false
Description: Disable automatic tokenizer loading at startup and during worker registration. Useful when tokenizers are loaded on demand via the API.
Tokenizer Cache (L0 - Exact Match)
Option | Description | Default
--tokenizer-cache-enable-l0 | Enable the L0 exact-match cache | false
--tokenizer-cache-l0-max-entries | Maximum entries in the L0 cache | 10000
Tokenizer Cache (L1 - Prefix Matching)
Option | Description | Default
--tokenizer-cache-enable-l1 | Enable the L1 prefix-matching cache | false
--tokenizer-cache-l1-max-memory | Maximum memory for the L1 cache (bytes) | 52428800 (50 MiB)
Option: --reasoning-parser
Environment: none
Default: none
Values: deepseek-r1, qwen3, etc.
Description: Parser for reasoning models that emit thinking tokens
Option: --tool-call-parser
Environment: none
Default: none
Values: json, qwen, etc.
Description: Parser for tool-call (function-calling) interactions
Option: --mcp-config-path
Environment: none
Default: none
Description: Path to the MCP (Model Context Protocol) server configuration file
Option: --backend
Environment: none
Default: none (auto-detected)
Values: sglang, vllm, trtllm, openai, anthropic, gemini
Option: --history-backend
Environment: none
Default: memory
Values: memory, none, oracle, postgres, redis
Oracle History Backend Options
Option | Environment | Description
--oracle-wallet-path | ATP_WALLET_PATH | Path to the Oracle ATP wallet directory
--oracle-tns-alias | ATP_TNS_ALIAS | Oracle TNS alias from tnsnames.ora
--oracle-dsn | ATP_DSN | Oracle connection descriptor/DSN
--oracle-user | ATP_USER | Oracle database username
--oracle-password | ATP_PASSWORD | Oracle database password
--oracle-external-auth | ATP_EXTERNAL_AUTH | Enable Oracle external authentication (default: false)
--oracle-pool-min | ATP_POOL_MIN | Minimum connection pool size (default: 1)
--oracle-pool-max | ATP_POOL_MAX | Maximum connection pool size (default: 16)
--oracle-pool-timeout-secs | ATP_POOL_TIMEOUT_SECS | Pool timeout in seconds (default: 30)
PostgreSQL History Backend Options
Option | Environment | Description | Default
--postgres-db-url | POSTGRES_DB_URL | PostgreSQL connection URL | none
--postgres-pool-max-size | POSTGRES_POOL_MAX | Maximum pool size | 16
Redis History Backend Options
Option | Environment | Description | Default
--redis-url | REDIS_URL | Redis connection URL | none
--redis-pool-max-size | REDIS_POOL_MAX | Maximum pool size | 16
--redis-retention-days | REDIS_RETENTION_DAYS | Data retention in days (-1 for persistent) | 30
Option: --enable-wasm
Environment: none
Default: false
Description: Enable WebAssembly support
Storage Hook WASM Component
Option: --storage-hook-wasm-path
Environment: none
Default: none
Description: Path to a WASM component implementing storage hooks. When set, all storage backends are wrapped with hook-based interceptors.
Option: --schema-config
Environment: none
Default: none
Description: Path to a YAML schema config file for remapping storage tables and columns.
WebRTC Configuration
Option | Description | Default
--webrtc-bind-addr | Bind address for WebRTC UDP sockets (the client-facing ICE candidate IP). Set to 127.0.0.1 for local development on the same machine. | 0.0.0.0 (auto-detect via routing table)
--webrtc-stun-server | STUN server for ICE candidate gathering (host:port). For enterprise deployments that restrict outbound traffic to external STUN servers, set this to your own STUN server. | stun.l.google.com:19302
Mesh Server Configuration
High-availability mesh networking for multi-router coordination.
Option | Description | Default
--enable-mesh | Enable the mesh server for HA multi-router coordination. Requires at least two SMG instances. | false
--mesh-server-name | Name for this mesh node. If not set, a random name is generated (e.g., Mesh_a1b2). | Auto-generated
--mesh-host | Bind address for the mesh server. | 0.0.0.0
--mesh-advertise-host | Routable address advertised to other mesh peers. Required when --mesh-host is an unspecified bind address such as 0.0.0.0. | Value of --mesh-host
--mesh-port | Port for the mesh server. | 39527
--mesh-peer-urls | Peer mesh node addresses to join (format: host:port). Used for initial cluster formation. | none
Example (router-1 at 192.168.1.10 joining a peer at 192.168.1.11):
smg \
--enable-mesh \
--mesh-server-name router-1 \
--mesh-advertise-host 192.168.1.10 \
--mesh-port 39527 \
--mesh-peer-urls 192.168.1.11:39527
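The --mesh-advertise-host rule can be sketched as a small check. The advertised address falls back to --mesh-host. An unspecified bind address such as 0.0.0.0 or :: is rejected, because peers cannot connect to it. The function is hypothetical and only restates the table above.

```python
import ipaddress


def is_unspecified(host):
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:                        # a hostname rather than an IP literal
        return False


def advertised_mesh_address(mesh_host="0.0.0.0", mesh_port=39527, mesh_advertise_host=None):
    """Hypothetical helper: the host:port this node would advertise to its peers."""
    host = mesh_advertise_host or mesh_host   # default: value of --mesh-host
    if is_unspecified(host):
        raise ValueError("--mesh-advertise-host is required when --mesh-host is 0.0.0.0 or ::")
    return f"{host}:{mesh_port}"


print(advertised_mesh_address(mesh_advertise_host="192.168.1.10"))  # 192.168.1.10:39527
```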
Request Handling Configuration
Option: --request-timeout-secs
Environment: none
Default: 1800 (30 minutes)
Description: Maximum time for request processing
Option: --shutdown-grace-period-secs
Environment: none
Default: 180 (3 minutes)
Description: Time to wait for in-flight requests to finish during shutdown
Option: --max-payload-size
Environment: none
Default: 536870912 (512 MiB)
Description: Maximum request payload size in bytes
Option: --cors-allowed-origins
Environment: none
Default: empty
Format: space-separated URLs
Example:
--cors-allowed-origins http://localhost:3000 https://example.com
Request ID Headers
Option: --request-id-headers
Environment: none
Default: none (uses common defaults)
Description: Custom HTTP headers to check for request IDs
Example:
--request-id-headers x-request-id x-trace-id x-correlation-id
Storage Context Headers
Option: --storage-context-headers
Environment: none
Default: empty
Format: space-separated header=context_key entries
Description: Maps request headers into the storage hook request context
Example:
--storage-context-headers x-tenant-id=tenant_id x-user-id=user_id
This lets storage hooks read values such as tenant_id and user_id from the request context without hard-coding specific headers in the gateway.
Only map headers that are injected or sanitized by a trusted upstream. Otherwise, client-supplied headers can spoof storage hook request context values.
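The header=context_key mapping can be sketched as a parser plus a lookup. Two behaviors in this sketch are assumptions: header names are matched case-insensitively (as HTTP header names are), and absent headers are simply skipped. The helper names are hypothetical.

```python
def parse_context_headers(entries):
    """Turn ["x-tenant-id=tenant_id", ...] into {header_name: context_key}."""
    mapping = {}
    for entry in entries:
        header, sep, context_key = entry.partition("=")
        if not sep or not header or not context_key:
            raise ValueError(f"expected header=context_key, got {entry!r}")
        mapping[header.lower()] = context_key
    return mapping


def build_storage_context(request_headers, mapping):
    # Case-insensitive lookup; headers missing from the request are skipped.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    return {key: lowered[h] for h, key in mapping.items() if h in lowered}


mapping = parse_context_headers(["x-tenant-id=tenant_id", "x-user-id=user_id"])
print(build_storage_context({"X-Tenant-Id": "acme", "Accept": "*/*"}, mapping))  # {'tenant_id': 'acme'}
```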
Rate Limiting Configuration
Option: --max-concurrent-requests
Environment: none
Default: -1 (unlimited)
Range: -1 or 1+
Sizing guide:
max_concurrent_requests = num_workers * requests_per_worker_capacity
Worker GPU Memory | Suggested Concurrent Requests per Worker
16GB | 4-8
40GB | 8-16
80GB | 16-32
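Applying the formula to the suggested ranges gives a quick starting point. The dictionary simply restates the table. Treat the result as an initial value to tune under real load, not a measured capacity.

```python
# Suggested per-worker concurrency from the sizing table (low, high).
SUGGESTED_PER_WORKER = {"16GB": (4, 8), "40GB": (8, 16), "80GB": (16, 32)}


def max_concurrent_requests(num_workers, gpu_memory):
    """max_concurrent_requests = num_workers * requests_per_worker_capacity."""
    low, high = SUGGESTED_PER_WORKER[gpu_memory]
    return num_workers * low, num_workers * high


print(max_concurrent_requests(4, "80GB"))  # (64, 128)
```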
Option | Description | Default
--queue-size | Maximum number of requests waiting when the rate limit is reached | 100
--queue-timeout-secs | Maximum time a request can wait in the queue | 60
Token Bucket Rate Limiting
Option: --rate-limit-tokens-per-second
Environment: none
Default: same as --max-concurrent-requests
Description: Token bucket refill rate
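A generic token bucket looks like the sketch below. The refill rate corresponds to --rate-limit-tokens-per-second. This reference does not state the bucket's capacity or its starting level, nor how many tokens a request costs. The sketch therefore assumes the capacity, a bucket that starts full, and one token per request.

```python
import time


class TokenBucket:
    """Generic token bucket: refills continuously, never exceeds capacity."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)   # tokens per second
        self.tokens = float(capacity)           # assumption: start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self, n=1):
        now = self.clock()
        # Credit tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False   # caller would queue (--queue-size) or reject the request
```

Capping the refill at capacity bounds the burst a quiet period can build up. The refill rate sets the sustained throughput.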
Retry Configuration
Option | Description | Default
--retry-max-retries | Maximum retry attempts | 5
--retry-initial-backoff-ms | Initial backoff delay (ms) | 50
--retry-max-backoff-ms | Maximum backoff delay (ms) | 30000
--retry-backoff-multiplier | Exponential backoff multiplier | 1.5
--retry-jitter-factor | Jitter factor (0.0-1.0) | 0.2
--disable-retries | Disable automatic retries | false
Backoff formula:
delay = min(initial_backoff * multiplier^attempt, max_backoff) * (1 + random(0, jitter_factor))
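The formula translates directly into code. Defaults come from the retry table above. Numbering attempts from 0 is an assumption. As written, jitter is applied after the cap, so a delay can exceed --retry-max-backoff-ms by up to the jitter factor. With the defaults, the ceiling is 30000 ms × 1.2 = 36000 ms.

```python
import random


def backoff_delay_ms(attempt, initial_backoff_ms=50, max_backoff_ms=30000,
                     multiplier=1.5, jitter_factor=0.2, rng=random):
    """delay = min(initial_backoff * multiplier^attempt, max_backoff) * (1 + random(0, jitter_factor))"""
    base = min(initial_backoff_ms * multiplier ** attempt, max_backoff_ms)
    return base * (1 + rng.uniform(0, jitter_factor))


# Without jitter the schedule is 50, 75, 112.5, 168.75, 253.125 ms for attempts 0-4.
print([backoff_delay_ms(a, jitter_factor=0.0) for a in range(5)])
```

With the default of five retries, the cap is never reached. The largest base delay is 253.125 ms, or 379.6875 ms if attempts are counted from 1. Reaching the 30000 ms cap would take 16 attempts.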
Circuit Breaker Configuration
Option | Description | Default
--cb-failure-threshold | Failures before the circuit opens | 10
--cb-success-threshold | Successes needed to close the circuit from the half-open state | 3
--cb-timeout-duration-secs | Time before attempting recovery | 60
--cb-window-duration-secs | Sliding window for tracking failures | 120
--disable-circuit-breaker | Disable the circuit breaker | false
Circuit breaker states:
Closed: Normal operation; failures are tracked.
Open: All requests fail fast; the circuit has tripped.
Half-Open: Testing whether the service has recovered.
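The three states can be sketched as a small state machine using the option names above. The sketch makes these simplifications and assumptions:
- Failures are counted within the sliding window.
- Any failure in the half-open state reopens the circuit.
- The clock is injectable for testing.

This is not SMG's implementation.

```python
import time


class CircuitBreaker:
    """Closed -> Open after failure_threshold failures in the window;
    Open -> Half-Open after timeout_secs; Half-Open -> Closed after success_threshold successes."""

    def __init__(self, failure_threshold=10, success_threshold=3,
                 timeout_secs=60, window_secs=120, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_secs = timeout_secs
        self.window_secs = window_secs
        self.clock = clock
        self.state = "closed"
        self.failures = []        # timestamps of recent failures
        self.successes = 0        # successes counted while half-open
        self.opened_at = None

    def allow_request(self):
        if self.state == "open" and self.clock() - self.opened_at >= self.timeout_secs:
            self.state, self.successes = "half_open", 0   # timeout elapsed: probe recovery
        return self.state != "open"                      # open circuits fail fast

    def record_success(self):
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state, self.failures = "closed", []

    def record_failure(self):
        now = self.clock()
        if self.state == "half_open":
            self._open(now)       # assumption: one failed probe reopens the circuit
            return
        # Keep only failures inside the sliding window, then add this one.
        self.failures = [t for t in self.failures if now - t < self.window_secs] + [now]
        if len(self.failures) >= self.failure_threshold:
            self._open(now)

    def _open(self, now):
        self.state, self.opened_at, self.successes = "open", now, 0
```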
Health Check Configuration
Option | Description | Default
--health-failure-threshold | Failures before marking a worker unhealthy | 3
--health-success-threshold | Successes before marking a worker healthy | 2
--health-check-timeout-secs | Timeout for health check requests | 5
--health-check-interval-secs | Interval between health checks | 60
--health-check-endpoint | Health check endpoint path | /health
--disable-health-check | Disable all health checks | false
--remove-unhealthy-workers | Remove workers from the registry when health checks mark them unhealthy. Useful for ephemeral worker pools where failed workers should be deregistered. | false
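Threshold-based health tracking can be sketched with two streak counters. Treating the thresholds as consecutive counts is an assumption. With --remove-unhealthy-workers, the point where record() first returns False is where a worker would be deregistered. The class is hypothetical.

```python
class HealthTracker:
    """Flip to unhealthy after failure_threshold consecutive failed checks,
    and back to healthy after success_threshold consecutive passing checks."""

    def __init__(self, failure_threshold=3, success_threshold=2):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.healthy = True
        self.fail_streak = 0
        self.ok_streak = 0

    def record(self, check_passed):
        if check_passed:
            self.ok_streak, self.fail_streak = self.ok_streak + 1, 0
            if not self.healthy and self.ok_streak >= self.success_threshold:
                self.healthy = True
        else:
            self.fail_streak, self.ok_streak = self.fail_streak + 1, 0
            if self.healthy and self.fail_streak >= self.failure_threshold:
                self.healthy = False
        return self.healthy
```

Under this model, the defaults apply three failed checks at a 60-second interval, each with a 5-second timeout. A worker that stops responding is therefore flagged after roughly three minutes.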
Prometheus Metrics Configuration
Option | Description | Default
--prometheus-port | Port for the Prometheus metrics endpoint | 29000
--prometheus-host | Host for the Prometheus metrics server | 0.0.0.0
--prometheus-duration-buckets | Custom histogram buckets | Default buckets
Example:
--prometheus-duration-buckets 0.001 0.005 0.01 0.025 0.05 0.1 0.25 0.5 1.0 2.5 5.0 10.0
OpenTelemetry Configuration
Option: --enable-trace
Environment: none
Default: false
Option: --otlp-traces-endpoint
Environment: none
Default: localhost:4317
Format: host:port
Example:
smg --enable-trace --otlp-traces-endpoint jaeger:4317
TLS/mTLS Security Configuration
For HTTPS on the gateway:
Option | Description
--tls-cert-path | Path to the server certificate (PEM format)
--tls-key-path | Path to the server private key (PEM format)
For secure communication with workers (Python bindings):
Option | Description
--client-cert-path | Path to the client certificate
--client-key-path | Path to the client private key
--ca-cert-paths | Path(s) to CA certificate(s)
Control Plane Authentication
API Key (Worker Authorization)
Option: --api-key
Environment: none
Default: none
Description: API key for worker authorization (useful with --dp-aware scheduling)
Option: --control-plane-api-keys
Environment: CONTROL_PLANE_API_KEYS
Format: id:name:role:key
Multiple: yes
Example:
--control-plane-api-keys 'key1:Admin:admin:secret123' 'key2:ReadOnly:user:secret456'
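Parsing the id:name:role:key format can be sketched as below. The sketch splits on the first three colons, so the secret itself may contain ':'. That splitting rule is an assumption. The secrets are the placeholder values from the example, and the helper is hypothetical.

```python
def parse_api_keys(entries):
    """Parse "id:name:role:key" entries into dicts (hypothetical helper)."""
    keys = []
    for entry in entries:
        parts = entry.split(":", 3)          # at most 4 fields; key keeps any extra ':'
        if len(parts) != 4 or not all(parts):
            raise ValueError(f"expected id:name:role:key, got {entry!r}")
        key_id, name, role, secret = parts
        keys.append({"id": key_id, "name": name, "role": role, "key": secret})
    return keys


for k in parse_api_keys(["key1:Admin:admin:secret123", "key2:ReadOnly:user:secret456"]):
    print(k["id"], k["name"], k["role"])     # never print k["key"] in real logs
```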
JWT/OIDC Authentication
Option | Environment | Description
--jwt-issuer | JWT_ISSUER | OIDC issuer URL
--jwt-audience | JWT_AUDIENCE | Expected audience claim
--jwt-jwks-uri | JWT_JWKS_URI | Explicit JWKS URI (auto-discovered if not set)
--jwt-role-claim | none | JWT claim containing the role (default: roles)
--jwt-role-mapping | none | Mapping from IdP roles to gateway roles
JWT role mapping example:
--jwt-role-mapping 'Gateway.Admin=admin' 'Gateway.User=user'
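Role mapping can be sketched as a lookup applied to the claim named by --jwt-role-claim. The sketch makes three assumptions: the claim may be either a list or a single string, unmapped IdP roles are ignored, and the helpers are hypothetical.

```python
def parse_role_mapping(entries):
    """Parse "IdpRole=gateway_role" entries into a dict."""
    mapping = {}
    for entry in entries:
        idp_role, sep, gateway_role = entry.partition("=")
        if not sep or not idp_role or not gateway_role:
            raise ValueError(f"expected IdpRole=gateway_role, got {entry!r}")
        mapping[idp_role] = gateway_role
    return mapping


def gateway_roles(claims, mapping, role_claim="roles"):
    roles = claims.get(role_claim, [])
    if isinstance(roles, str):               # tolerate a single-string claim
        roles = [roles]
    return sorted({mapping[r] for r in roles if r in mapping})


mapping = parse_role_mapping(["Gateway.Admin=admin", "Gateway.User=user"])
print(gateway_roles({"roles": ["Gateway.User", "Other.Role"]}, mapping))  # ['user']
```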
Audit Logging
Option: --disable-audit-logging
Environment: none
Default: false (audit logging enabled)
Logging Configuration
Option: --log-level
Environment: RUST_LOG
Default: info
Values: debug, info, warn, error
Per-module logging:
RUST_LOG=smg=debug,hyper=warn smg ...
Option: --log-dir
Environment: none
Default: none (console only)
Description: Directory in which to store log files
Option: --log-json
Environment: none
Default: false
Description: Output structured logs as JSON instead of the default human-readable text.
Example Configurations
Minimal Configuration
smg --worker-urls http://localhost:8000
High-Throughput Configuration
smg \
--worker-urls http://w1:8000 http://w2:8000 http://w3:8000 http://w4:8000 \
--policy cache_aware \
--max-concurrent-requests 200 \
--queue-size 400 \
--queue-timeout-secs 60 \
--retry-max-retries 3
Low-Latency Configuration
smg \
--worker-urls http://w1:8000 http://w2:8000 \
--policy power_of_two \
--max-concurrent-requests 50 \
--queue-size 25 \
--queue-timeout-secs 5 \
--health-check-interval-secs 5 \
--request-timeout-secs 30
PD Disaggregated Configuration
smg \
--pd-disaggregation \
--prefill http://prefill1:30001 9001 \
--prefill http://prefill2:30002 9002 \
--decode http://decode1:30003 \
--decode http://decode2:30004 \
--prefill-policy cache_aware \
--decode-policy round_robin
Kubernetes Service Discovery
smg \
--service-discovery \
--selector app=sglang-worker \
--service-discovery-namespace inference \
--service-discovery-port 8000 \
--policy cache_aware
High-Availability Mesh (Two Routers)
# Router 1
smg \
--enable-mesh \
--mesh-server-name router-1 \
--mesh-advertise-host 192.168.1.10 \
--mesh-port 39527 \
--mesh-peer-urls 192.168.1.11:39527 \
--worker-urls http://worker1:8000
# Router 2
smg \
--enable-mesh \
--mesh-server-name router-2 \
--mesh-advertise-host 192.168.1.11 \
--mesh-port 39527 \
--mesh-peer-urls 192.168.1.10:39527 \
--worker-urls http://worker2:8000
Secure Production Configuration
smg \
--service-discovery \
--selector app=sglang-worker \
--service-discovery-namespace inference \
--policy cache_aware \
--max-concurrent-requests 100 \
--tls-cert-path /etc/certs/server.crt \
--tls-key-path /etc/certs/server.key \
--jwt-issuer https://login.microsoftonline.com/tenant/v2.0 \
--jwt-audience api://smg-gateway \
--jwt-role-mapping 'Gateway.Admin=admin' 'Gateway.User=user' \
--enable-trace \
--otlp-traces-endpoint jaeger:4317 \
--host 0.0.0.0 \
--port 443
With Tokenizer and Parsers
smg \
--worker-urls http://localhost:8000 \
--model-path meta-llama/Llama-3-8B-Instruct \
--tokenizer-cache-enable-l0 \
--tokenizer-cache-l0-max-entries 50000 \
--reasoning-parser deepseek-r1 \
--tool-call-parser json
Persistent History Backends
# PostgreSQL
smg \
--worker-urls http://localhost:8000 \
--history-backend postgres \
--postgres-db-url "postgres://user:pass@localhost:5432/smg" \
--postgres-pool-max-size 32
# Redis
smg \
--worker-urls http://localhost:8000 \
--history-backend redis \
--redis-url "redis://localhost:6379" \
--redis-pool-max-size 32 \
--redis-retention-days 7
Environment Variable Reference
Environment Variable | CLI Option | Description
RUST_LOG | --log-level | Log level
ATP_WALLET_PATH | --oracle-wallet-path | Oracle wallet path
ATP_TNS_ALIAS | --oracle-tns-alias | Oracle TNS alias
ATP_DSN | --oracle-dsn | Oracle DSN
ATP_USER | --oracle-user | Oracle username
ATP_PASSWORD | --oracle-password | Oracle password
ATP_EXTERNAL_AUTH | --oracle-external-auth | Enable Oracle external authentication
ATP_POOL_MIN | --oracle-pool-min | Oracle min pool size
ATP_POOL_MAX | --oracle-pool-max | Oracle max pool size
ATP_POOL_TIMEOUT_SECS | --oracle-pool-timeout-secs | Oracle pool timeout
POSTGRES_DB_URL | --postgres-db-url | PostgreSQL URL
POSTGRES_POOL_MAX | --postgres-pool-max-size | PostgreSQL max pool size
REDIS_URL | --redis-url | Redis URL
REDIS_POOL_MAX | --redis-pool-max-size | Redis max pool size
REDIS_RETENTION_DAYS | --redis-retention-days | Redis retention (days)
JWT_ISSUER | --jwt-issuer | JWT issuer URL
JWT_AUDIENCE | --jwt-audience | JWT audience
JWT_JWKS_URI | --jwt-jwks-uri | JWKS URI
CONTROL_PLANE_API_KEYS | --control-plane-api-keys | Control plane API keys