diff --git a/inference/a4x/disaggregated-serving/dynamo/README.md b/inference/a4x/disaggregated-serving/dynamo/README.md
new file mode 100644
index 00000000..b185b4fb
--- /dev/null
+++ b/inference/a4x/disaggregated-serving/dynamo/README.md
@@ -0,0 +1,292 @@
+# Disaggregated Multi-Node Inference with NVIDIA Dynamo on A4X GKE
+
+This document outlines the steps to deploy and serve Large Language Models (LLMs) using the [NVIDIA Dynamo](https://github.com/ai-dynamo/dynamo) disaggregated inference platform on [A4X GKE node pools](https://cloud.google.com/kubernetes-engine).
+
+Dynamo provides a disaggregated architecture that separates prefill and decode operations for optimized inference performance, supporting both single-node and multi-node configurations; this recipe deploys a multi-node (16 GPU) configuration. Dynamo also supports various inference framework backends, such as [vLLM](https://docs.nvidia.com/dynamo/latest/components/backends/vllm/README.html) and [SGLang](https://docs.nvidia.com/dynamo/latest/components/backends/sglang/README.html). In this recipe, we focus on serving with the SGLang backend.
+
+
+## Table of Contents
+
+* [1. Test Environment](#test-environment)
+* [2. Environment Setup (One-Time)](#environment-setup)
+ * [2.1. Clone the Repository](#clone-repo)
+ * [2.2. Configure Environment Variables](#configure-vars)
+ * [2.3. Connect to your GKE Cluster](#connect-cluster)
+ * [2.4. Create Secrets](#create-secrets)
+ * [2.5. Install Dynamo Platform](#install-platform)
+* [3. Deploy with SGLang Backend](#deploy-sglang)
+ * [3.1. Multi-Node SGLang Deployment (16 GPUs)](#sglang-multi-node)
+* [4. Inference Request](#inference-request)
+* [5. Monitoring and Troubleshooting](#monitoring)
+* [6. Cleanup](#cleanup)
+
+
+## 1. Test Environment
+
+[Back to Top](#table-of-contents)
+
+This recipe has been tested with the following configuration:
+
+* **GKE Cluster**:
+ * GPU node pools with [a4x-highgpu-4g](https://docs.cloud.google.com/compute/docs/gpus#gb200-gpus) machines:
+ * For multi-node deployment: 4 machines with 4 GPUs each (16 GPUs total)
+ * [Workload Identity Federation for GKE](https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity) enabled
+ * [Cloud Storage FUSE CSI driver for GKE](https://cloud.google.com/kubernetes-engine/docs/concepts/cloud-storage-fuse-csi-driver) enabled
+
+> [!IMPORTANT]
+> To prepare the required environment, see the [GKE environment setup guide](../../../../docs/configuring-environment-gke-a4x.md).
+
+
+## 2. Environment Setup (One-Time)
+
+[Back to Top](#table-of-contents)
+
+
+### 2.1. Clone the Repository
+
+```bash
+git clone https://github.com/ai-hypercomputer/gpu-recipes.git
+cd gpu-recipes
+export REPO_ROOT=$(pwd)
+export RECIPE_ROOT=$REPO_ROOT/inference/a4x/disaggregated-serving/dynamo
+```
+
+
+### 2.2. Configure Environment Variables
+
+```bash
+export PROJECT_ID=
+export CLUSTER_REGION=
+export CLUSTER_NAME=
+export NAMESPACE=dynamo-cloud
+export NGC_API_KEY=
+export HF_TOKEN=
+export RELEASE_VERSION=0.7.0
+
+# Set the project for gcloud commands
+gcloud config set project $PROJECT_ID
+```
+
+Replace the following values:
+
+| Variable | Description | Example |
+| -------- | ----------- | ------- |
+| `PROJECT_ID` | Your Google Cloud Project ID | `gcp-project-12345` |
+| `CLUSTER_REGION` | The GCP region where your GKE cluster is located | `us-central1` |
+| `CLUSTER_NAME` | The name of your GKE cluster | `a4x-cluster` |
+| `NGC_API_KEY` | Your NVIDIA NGC API key (get from [NGC](https://ngc.nvidia.com)) | `nvapi-xxx...` |
+| `HF_TOKEN` | Your Hugging Face access token | `hf_xxx...` |
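
Before proceeding, it can help to confirm that none of the required variables above were left empty. A minimal bash sketch (the variable list mirrors the exports above):

```shell
# Report any required variables that are still empty before continuing.
required_vars="PROJECT_ID CLUSTER_REGION CLUSTER_NAME NAMESPACE NGC_API_KEY HF_TOKEN RELEASE_VERSION"
missing=""
for var in $required_vars; do
  # ${!var:-} is bash indirect expansion: the value of the variable named by $var.
  [ -z "${!var:-}" ] && missing="$missing $var"
done
if [ -n "$missing" ]; then
  echo "Missing:$missing"
else
  echo "All required variables are set."
fi
```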
+
+
+### 2.3. Connect to your GKE Cluster
+
+```bash
+gcloud container clusters get-credentials $CLUSTER_NAME --region $CLUSTER_REGION
+```
+
+
+### 2.4. Create Secrets
+
+Create the namespace:
+```bash
+kubectl create namespace ${NAMESPACE}
+kubectl config set-context --current --namespace=$NAMESPACE
+```
+
+Create the Docker registry secret for NVIDIA Container Registry:
+```bash
+kubectl create secret docker-registry nvcr-secret \
+ --namespace=${NAMESPACE} \
+ --docker-server=nvcr.io \
+ --docker-username='$oauthtoken' \
+ --docker-password=${NGC_API_KEY}
+```
+
+Create the secret for the Hugging Face token:
+```bash
+kubectl create secret generic hf-token-secret \
+ --from-literal=HF_TOKEN=${HF_TOKEN} \
+ -n ${NAMESPACE}
+```
+
+
+### 2.5. Install Dynamo Platform
+
+Add the NVIDIA Helm repository:
+```bash
+helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
+ --username='$oauthtoken' --password=${NGC_API_KEY}
+helm repo update
+```
+
+Fetch the Dynamo Helm charts:
+```bash
+helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
+helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
+```
+
+Install the Dynamo CRDs:
+```bash
+helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz \
+ --namespace default \
+ --wait \
+ --atomic
+```
+
+Install the Dynamo Platform with Grove & Kai scheduler enabled:
+```bash
+helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
+ --namespace ${NAMESPACE} --set grove.enabled=true --set kai-scheduler.enabled=true
+```
+
+Verify the installation:
+```bash
+kubectl get pods -n ${NAMESPACE}
+```
+
+Wait until all pods show a `Running` status before proceeding.
+
+
+## 3. Deploy with SGLang Backend
+
+[Back to Top](#table-of-contents)
+
+Deploy Dynamo with SGLang backend for high-performance inference.
+
+
+### 3.1. Multi-Node SGLang Deployment (16 GPUs)
+
+Multi-node deployment uses 16 GPUs across 4 A4X machines, providing increased capacity for larger models or higher throughput.
+
+#### DeepSeekR1 671B Model
+
+Deploy DeepSeek-R1 671B across multiple nodes for production workloads. Note the use of `--set-file prefill_serving_config` and `--set-file decode_serving_config`, which point to the model config files for a multi-node deployment scenario:
+
+```bash
+cd $RECIPE_ROOT
+helm install -f values.yaml \
+--set-file prefill_serving_config=$REPO_ROOT/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-2p2d-prefill.yaml \
+--set-file decode_serving_config=$REPO_ROOT/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-2p2d-decode.yaml \
+$USER-dynamo-a4x-multi-node \
+$REPO_ROOT/src/helm-charts/a4x/inference-templates/dynamo-deployment
+```
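
Helm reads the files passed via `--set-file` at install time and fails with an error if a path is wrong, so it can help to check the paths first. A minimal sketch (`check_files` is a hypothetical helper, demonstrated here with a temporary file rather than the real config paths):

```shell
# Hypothetical helper: report whether each given path is a readable file.
check_files() {
  local status=0
  for f in "$@"; do
    if [ -f "$f" ]; then
      echo "found: $f"
    else
      echo "MISSING: $f"
      status=1
    fi
  done
  return $status
}

# Demonstration with one real temporary file and one bad path:
tmp_cfg=$(mktemp)
check_files "$tmp_cfg" "/no/such/config.yaml" || echo "fix the missing paths before running helm install"
rm -f "$tmp_cfg"
```

In practice, pass the same paths you give to `--set-file`.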
+
+
+## 4. Inference Request
+
+[Back to Top](#table-of-contents)
+
+To test the server, first run a health check against it using `curl`:
+
+```bash
+kubectl exec -it -n ${NAMESPACE} deployment/$USER-dynamo-a4x-multi-node-frontend -- curl http://localhost:8000/health | jq
+```
+
+You should see a status report like the following. This example was captured while the server was still initializing (status `unhealthy` with `"No endpoints available"`); wait until the reported status is `healthy` before sending requests.
+
+```json
+{
+ "instances": [
+ {
+ "component": "backend",
+ "endpoint": "load_metrics",
+ "instance_id": 3994861215823793160,
+ "namespace": "dynamo",
+ "transport": {
+ "nats_tcp": "dynamo_backend.load_metrics-3770991c30298c08"
+ }
+ },
+ {
+ "component": "prefill",
+ "endpoint": "clear_kv_blocks",
+ "instance_id": 3994861215823793153,
+ "namespace": "dynamo",
+ "transport": {
+ "nats_tcp": "dynamo_prefill.clear_kv_blocks-3770991c30298c01"
+ }
+ },
+ {
+ "component": "prefill",
+ "endpoint": "generate",
+ "instance_id": 3994861215823793153,
+ "namespace": "dynamo",
+ "transport": {
+ "nats_tcp": "dynamo_prefill.generate-3770991c30298c01"
+ }
+ }
+ ],
+ "message": "No endpoints available",
+ "status": "unhealthy"
+}
+```
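
For scripting a readiness check, you can reduce the health response to just the overall status with `jq` (a sketch assuming the response shape shown above, applied here to a saved copy of the JSON):

```shell
# Extract the top-level "status" field from a saved health response.
health_json='{"status": "unhealthy", "message": "No endpoints available", "instances": []}'
status=$(echo "$health_json" | jq -r '.status')
echo "server status: $status"
```

The same `jq -r '.status'` filter can be appended to the `curl` health check above to poll until it prints `healthy`.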
+
+Then we can send a benchmark request like this:
+
+```bash
+kubectl exec -n ${NAMESPACE} deployment/$USER-dynamo-a4x-multi-node-frontend -- \
+  python3 -u -m sglang.bench_serving \
+  --backend sglang-oai-chat \
+  --base-url http://localhost:8000 \
+  --model "deepseek-ai/DeepSeek-R1" \
+  --tokenizer /data/model/deepseek-ai/DeepSeek-R1 \
+  --dataset-name random \
+  --num-prompts 2048 \
+  --random-input-len 2048 \
+  --random-output-len 512 \
+  --max-concurrency 512
+```
+
+
+## 5. Monitoring and Troubleshooting
+
+[Back to Top](#table-of-contents)
+
+View logs for the different components. The deployment names below follow the release name used earlier; you can list the exact pod and deployment names with:
+```bash
+kubectl get pods -n ${NAMESPACE}
+```
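
To spot problem pods quickly, you can filter the listing for anything not yet Running (a sketch; the sample listing below is illustrative, not real output):

```shell
# Filter a `kubectl get pods`-style listing for pods not in the Running state.
pod_listing="NAME                      READY   STATUS    RESTARTS   AGE
frontend-7c9f8-abcde      1/1     Running   0          10m
decode-worker-0           0/1     Pending   0          10m
prefill-worker-0          1/1     Running   0          10m"

echo "$pod_listing" | awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
```

In a live cluster, pipe `kubectl get pods -n ${NAMESPACE}` into the same `awk` filter.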
+
+Frontend logs:
+```bash
+kubectl logs -f deployment/$USER-dynamo-a4x-multi-node-frontend -n ${NAMESPACE}
+```
+
+Decode worker logs:
+```bash
+kubectl logs -f deployment/$USER-dynamo-a4x-multi-node-decode-worker -n ${NAMESPACE}
+```
+
+Prefill worker logs:
+```bash
+kubectl logs -f deployment/$USER-dynamo-a4x-multi-node-prefill-worker -n ${NAMESPACE}
+```
+
+Common issues:
+
+* **Pods stuck in Pending**: Check if nodes have sufficient resources (especially for multi-node deployments)
+* **Model download slow**: Large models like DeepSeekR1 671B can take 30 minutes to download
+* **Multi-node issues**: Verify network connectivity between nodes and proper subnet configuration
+
+
+## 6. Cleanup
+
+[Back to Top](#table-of-contents)
+
+List deployed releases:
+```bash
+helm list -n ${NAMESPACE} --filter $USER-dynamo-
+```
+
+Uninstall specific deployments:
+```bash
+helm uninstall $USER-dynamo-a4x-multi-node -n ${NAMESPACE}
+```
+
+Uninstall Dynamo platform (if no longer needed):
+```bash
+helm uninstall dynamo-platform -n ${NAMESPACE}
+helm uninstall dynamo-crds -n default
+```
+
+Delete namespace and secrets:
+```bash
+kubectl delete namespace ${NAMESPACE}
+```
+
+Clean up downloaded charts:
+```bash
+rm -f dynamo-crds-${RELEASE_VERSION}.tgz
+rm -f dynamo-platform-${RELEASE_VERSION}.tgz
+```
+
diff --git a/inference/a4x/disaggregated-serving/dynamo/values.yaml b/inference/a4x/disaggregated-serving/dynamo/values.yaml
new file mode 100644
index 00000000..a047a65f
--- /dev/null
+++ b/inference/a4x/disaggregated-serving/dynamo/values.yaml
@@ -0,0 +1,218 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+dynamo:
+  namespace: dynamo-cloud
+  releaseVersion: "0.7.0"
+  deploymentName: dynamo-disagg-multi-node
+  computeDomain:
+    name: dynamo-a4x-domain
+    numNodes: 4
+    resourceClaimTemplateName: dynamo-a4x-channel
+ serviceAccountName: dynamo-platform-dynamo-operator-component
+ frontend:
+ image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.7.0
+ replicas: 1
+ livenessProbe:
+ initialDelaySeconds: 3000
+ periodSeconds: 60
+ timeoutSeconds: 150
+ failureThreshold: 100
+ readinessProbe:
+ initialDelaySeconds: 3000
+ periodSeconds: 60
+ timeoutSeconds: 300
+ failureThreshold: 100
+ decodeWorker:
+ image: us-central1-docker.pkg.dev/linglinll-gke-dev/dynamo/dynamo-base:dynamo-wideep-gb200-v0.7.0-sglang-0.5.5.post2-timeout
+ nodeCount: 2
+ replicas: 1
+ envs:
+ - name: HF_TOKEN
+ valueFrom:
+ secretKeyRef:
+ name: hf-token-secret
+ key: HF_TOKEN
+ - name: HF_HUB_ENABLE_HF_TRANSFER
+ value: "1"
+ - name: LD_LIBRARY_PATH
+ value: "/usr/local/ucx/lib:/usr/local/ucx/lib/ucx:/opt/nvidia/nvda_nixl/lib/aarch64-linux-gnu:/opt/nvidia/nvda_nixl/lib/aarch64-linux-gnu/plugins:/usr/local/nvidia/lib64"
+ - name: GLOO_SOCKET_IFNAME
+ value: eth0
+ - name: TP_SOCKET_IFNAME
+ value: eth0
+ # - name: SGLANG_ENABLE_JIT_DEEPGEMM
+ # value: "1"
+ - name: DYN_SKIP_SGLANG_LOG_FORMATTING
+ value: "1"
+ - name: SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK
+ value: "256"
+ - name: MC_TE_METRIC
+ value: "true"
+ # - name: SGLANG_ENABLE_FLASHINFER_GEMM
+ # value: "1"
+ - name: SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE
+ value: "100000"
+ - name: SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT
+ value: "100000"
+ - name: SGLANG_DISAGGREGATION_WAITING_TIMEOUT
+ value: "100000"
+ - name: SGLANG_DECODE_BOOTSTRAP_TIMEOUT
+ value: "1000"
+ - name: SGLANG_HACK_SEQ_BOOTSTRAP_ROOM
+ value: "1"
+ - name: SGLANG_MOONCAKE_CUSTOM_MEM_POOL
+ value: "True"
+ - name: MC_FORCE_MNNVL
+ value: "1"
+ - name: NCCL_MNNVL_ENABLE
+ value: "1"
+ - name: NCCL_CUMEM_ENABLE
+ value: "1"
+ - name: SGLANG_USE_MESSAGE_QUEUE_BROADCASTER
+ value: "0"
+ - name: SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK
+ value: "1"
+ - name: PYTHONUNBUFFERED
+ value: "1"
+ # - name: NCCL_DEBUG
+ # value: INFO
+ # - name: NCCL_DEBUG_SUBSYS
+ # value: INIT,BOOTSTRAP,ENV,NET,GRAPH
+ # - name: NCCL_SOCKET_FAMILY
+ # value: "AF_INET"
+ # - name: GLOO_SOCKET_FAMILY
+ # value: "AF_INET"
+ livenessProbe:
+ initialDelaySeconds: 3000
+ periodSeconds: 60
+ timeoutSeconds: 150
+ failureThreshold: 100
+ readinessProbe:
+ initialDelaySeconds: 3000
+ periodSeconds: 60
+ timeoutSeconds: 300
+ failureThreshold: 100
+ startupProbe:
+ initialDelaySeconds: 3000
+ periodSeconds: 60
+ timeoutSeconds: 600
+ failureThreshold: 3000
+ prefillWorker:
+ image: us-central1-docker.pkg.dev/linglinll-gke-dev/dynamo/dynamo-base:dynamo-wideep-gb200-v0.7.0-sglang-0.5.5.post2-timeout
+ nodeCount: 2
+ replicas: 1
+ envs:
+ - name: HF_TOKEN
+ valueFrom:
+ secretKeyRef:
+ name: hf-token-secret
+ key: HF_TOKEN
+ - name: HF_HUB_ENABLE_HF_TRANSFER
+ value: "1"
+ - name: LD_LIBRARY_PATH
+ value: "/usr/local/ucx/lib:/usr/local/ucx/lib/ucx:/opt/nvidia/nvda_nixl/lib/aarch64-linux-gnu:/opt/nvidia/nvda_nixl/lib/aarch64-linux-gnu/plugins:/usr/local/nvidia/lib64"
+ - name: UCX_TLS
+ value: "^tcp"
+ - name: GLOO_SOCKET_IFNAME
+ value: eth0
+ - name: TP_SOCKET_IFNAME
+ value: eth0
+ # - name: SGLANG_ENABLE_JIT_DEEPGEMM
+ # value: "1"
+ - name: DYN_SKIP_SGLANG_LOG_FORMATTING
+ value: "1"
+ - name: MC_TE_METRIC
+ value: "true"
+ # - name: SGLANG_ENABLE_FLASHINFER_GEMM
+ # value: "1"
+ - name: SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE
+ value: "100000"
+ - name: SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT
+ value: "100000"
+ - name: SGLANG_DISAGGREGATION_WAITING_TIMEOUT
+ value: "100000"
+ - name: SGLANG_MOONCAKE_CUSTOM_MEM_POOL
+ value: "True"
+ - name: MC_FORCE_MNNVL
+ value: "1"
+ - name: NCCL_MNNVL_ENABLE
+ value: "1"
+ - name: NCCL_CUMEM_ENABLE
+ value: "1"
+ - name: SGLANG_USE_MESSAGE_QUEUE_BROADCASTER
+ value: "0"
+ - name: SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK
+ value: "1"
+ - name: PYTHONUNBUFFERED
+ value: "1"
+ livenessProbe:
+ initialDelaySeconds: 3000
+ periodSeconds: 60
+ timeoutSeconds: 150
+ failureThreshold: 100
+ readinessProbe:
+ initialDelaySeconds: 3000
+ periodSeconds: 60
+ timeoutSeconds: 300
+ failureThreshold: 100
+ startupProbe:
+ initialDelaySeconds: 3000
+ periodSeconds: 60
+ timeoutSeconds: 600
+ failureThreshold: 3000
+
+
+secrets:
+ ngc:
+ secretName: nvcr-secret
+ huggingface:
+ secretName: hf-token-secret
+ secretData:
+ token: "hf_api_token"
+
+volumes:
+ useGcs: true
+ gcsfuse:
+    bucketName: ""  # Set to the GCS bucket that stores the model weights
+ fileCacheCapacity: "500G"
+ cachePath: "/gcs-cache"
+ ssdMountPath: "/ssd"
+ gcsMounts:
+ mountPath: "/data/model"
+
+service:
+ type: ClusterIP
+ ports:
+ frontend: 8000
+ worker: 9090
+
+workload:
+ model: deepseek-ai/DeepSeek-R1
+ gpus: 16
+ framework: sglang
+ configFile: serving-args.yaml
+ configPath: /workload/configs
+
+network:
+ subnetworks: []
+ gibVersion: us-docker.pkg.dev/gce-ai-infra/gpudirect-gib/nccl-plugin-gib-diagnostic-arm64:v1.0.7
+ ncclSettings:
+ - name: NCCL_DEBUG
+ value: "VERSION"
+
+quantizations:
+ - "fp8"
diff --git a/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-10p8d-decode.yaml b/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-10p8d-decode.yaml
new file mode 100644
index 00000000..bbbdf18f
--- /dev/null
+++ b/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-10p8d-decode.yaml
@@ -0,0 +1,50 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+served-model-name: deepseek-ai/DeepSeek-R1
+disaggregation-mode: decode
+disaggregation-bootstrap-port: "30001"
+host: "0.0.0.0"
+port: "9090"
+trust-remote-code: true
+skip-tokenizer-init: true
+tp-size: "32"
+dp-size: "32"
+ep-size: "32"
+quantization: "fp8"
+# page-size: "1"
+enable-dp-attention: true
+attention-backend: "trtllm_mla"
+kv-cache-dtype: "fp8_e4m3"
+disable-radix-cache: true
+stream-interval: "50"
+# disaggregation-transfer-backend: nixl
+decode-log-interval: "1000"
+max-running-requests: "8192"
+context-length: "9300"
+watchdog-timeout: "1000000"
+disable-shared-experts-fusion: true
+eplb-algorithm: deepseek
+mem-fraction-static: "0.82"
+chunked-prefill-size: "36864"
+moe-a2a-backend: "deepep"
+deepep-mode: "low_latency"
+ep-dispatch-algorithm: static
+moe-dense-tp-size: "1"
+enable-dp-lm-head: true
+prefill-round-robin-balance: true
+ep-num-redundant-experts: "32"
+cuda-graph-max-bs: "256"
+# disable-cuda-graph: true
+deepep-config: '{"normal_dispatch": {"num_sms": 128,"num_max_nvl_chunked_send_tokens": 28,"num_max_nvl_chunked_recv_tokens": 256,"num_max_rdma_chunked_send_tokens": 6,"num_max_rdma_chunked_recv_tokens": 256}, "normal_combine": {"num_sms": 128,"num_max_nvl_chunked_send_tokens": 15,"num_max_nvl_chunked_recv_tokens": 256,"num_max_rdma_chunked_send_tokens": 6,"num_max_rdma_chunked_recv_tokens": 128}}'
\ No newline at end of file
diff --git a/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-10p8d-prefill.yaml b/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-10p8d-prefill.yaml
new file mode 100644
index 00000000..f5748607
--- /dev/null
+++ b/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-10p8d-prefill.yaml
@@ -0,0 +1,50 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+served-model-name: deepseek-ai/DeepSeek-R1
+# log-level: DEBUG
+disaggregation-mode: prefill
+disaggregation-bootstrap-port: "30001"
+host: "0.0.0.0"
+port: "9090"
+trust-remote-code: true
+tp-size: "8"
+dp-size: "8"
+ep-size: "8"
+quantization: "fp8"
+enable-dp-attention: true
+attention-backend: "trtllm_mla"
+kv-cache-dtype: "fp8_e4m3"
+disable-radix-cache: true
+stream-interval: "50"
+max-running-requests: "30000"
+context-length: "9300"
+# decode-log-interval: "1"
+# page-size: "1"
+# disaggregation-transfer-backend: nixl
+watchdog-timeout: "1000000"
+disable-shared-experts-fusion: true
+eplb-algorithm: deepseek
+mem-fraction-static: "0.8"
+max-total-tokens: "524288"
+chunked-prefill-size: "131072"
+load-balance-method: round_robin
+disable-cuda-graph: true
+moe-a2a-backend: deepep
+deepep-mode: normal
+ep-dispatch-algorithm: "dynamic"
+moe-dense-tp-size: "1"
+enable-dp-lm-head: true
+ep-num-redundant-experts: "32"
+deepep-config: '{"normal_dispatch": {"num_sms": 128,"num_max_nvl_chunked_send_tokens": 28,"num_max_nvl_chunked_recv_tokens": 256,"num_max_rdma_chunked_send_tokens": 6,"num_max_rdma_chunked_recv_tokens": 256}, "normal_combine": {"num_sms": 128,"num_max_nvl_chunked_send_tokens": 15,"num_max_nvl_chunked_recv_tokens": 256,"num_max_rdma_chunked_send_tokens": 6,"num_max_rdma_chunked_recv_tokens": 128}}'
\ No newline at end of file
diff --git a/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-2p2d-decode.yaml b/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-2p2d-decode.yaml
new file mode 100644
index 00000000..a2287217
--- /dev/null
+++ b/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-2p2d-decode.yaml
@@ -0,0 +1,45 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+served-model-name: deepseek-ai/DeepSeek-R1
+log-level: DEBUG
+tp: "8"
+dp-size: "8"
+decode-log-interval: "1"
+page-size: "1"
+enable-dp-attention: true
+trust-remote-code: true
+disaggregation-mode: decode
+disaggregation-transfer-backend: nixl
+disaggregation-bootstrap-port: "30001"
+host: "0.0.0.0"
+port: "9090"
+max-running-requests: "36864"
+context-length: "2716"
+disable-radix-cache: true
+moe-a2a-backend: deepep
+prefill-round-robin-balance: true
+deepep-mode: normal
+moe-dense-tp-size: "1"
+enable-dp-lm-head: true
+disable-cuda-graph: true
+cuda-graph-max-bs: "256"
+disable-shared-experts-fusion: true
+ep-num-redundant-experts: "32"
+ep-dispatch-algorithm: static
+eplb-algorithm: deepseek
+attention-backend: cutlass_mla
+watchdog-timeout: "1000000"
+chunked-prefill-size: "36864"
+mem-fraction-static: "0.8"
diff --git a/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-2p2d-prefill.yaml b/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-2p2d-prefill.yaml
new file mode 100644
index 00000000..f2abbcd4
--- /dev/null
+++ b/src/frameworks/a4x/dynamo-configs/deepseekr1-fp8-2p2d-prefill.yaml
@@ -0,0 +1,45 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+served-model-name: deepseek-ai/DeepSeek-R1
+log-level: DEBUG
+tp: "8"
+dp-size: "8"
+trust-remote-code: true
+decode-log-interval: "1"
+page-size: "1"
+enable-dp-attention: true
+disaggregation-mode: prefill
+disaggregation-transfer-backend: nixl
+disaggregation-bootstrap-port: "30001"
+host: "0.0.0.0"
+port: "9090"
+max-running-requests: "6144"
+context-length: "2716"
+disable-radix-cache: true
+moe-a2a-backend: deepep
+load-balance-method: round_robin
+deepep-mode: normal
+moe-dense-tp-size: "1"
+enable-dp-lm-head: true
+disable-shared-experts-fusion: true
+ep-num-redundant-experts: "32"
+ep-dispatch-algorithm: static
+eplb-algorithm: deepseek
+attention-backend: cutlass_mla
+watchdog-timeout: "1000000"
+disable-cuda-graph: true
+chunked-prefill-size: "16384"
+max-total-tokens: "32768"
+mem-fraction-static: "0.8"
diff --git a/src/helm-charts/a4x/inference-templates/dynamo-deployment/Chart.yaml b/src/helm-charts/a4x/inference-templates/dynamo-deployment/Chart.yaml
new file mode 100644
index 00000000..25a2209e
--- /dev/null
+++ b/src/helm-charts/a4x/inference-templates/dynamo-deployment/Chart.yaml
@@ -0,0 +1,20 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: v2
+name: a4x-dynamo-deployment
+description: a4x-dynamo-deployment
+type: application
+version: 0.1.0
+appVersion: "0.4.0"
\ No newline at end of file
diff --git a/src/helm-charts/a4x/inference-templates/dynamo-deployment/templates/dynamo-compute-domain.yaml b/src/helm-charts/a4x/inference-templates/dynamo-deployment/templates/dynamo-compute-domain.yaml
new file mode 100644
index 00000000..dc2ab53a
--- /dev/null
+++ b/src/helm-charts/a4x/inference-templates/dynamo-deployment/templates/dynamo-compute-domain.yaml
@@ -0,0 +1,24 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: resource.nvidia.com/v1beta1
+kind: ComputeDomain
+metadata:
+ name: {{ .Values.dynamo.computeDomain.name }}
+ namespace: {{ .Values.dynamo.namespace }}
+spec:
+ numNodes: {{ .Values.dynamo.computeDomain.numNodes }}
+ channel:
+ resourceClaimTemplate:
+ name: {{ .Values.dynamo.computeDomain.resourceClaimTemplateName }}
diff --git a/src/helm-charts/a4x/inference-templates/dynamo-deployment/templates/dynamo-graph-deployment.yaml b/src/helm-charts/a4x/inference-templates/dynamo-deployment/templates/dynamo-graph-deployment.yaml
new file mode 100644
index 00000000..0ac6cdf5
--- /dev/null
+++ b/src/helm-charts/a4x/inference-templates/dynamo-deployment/templates/dynamo-graph-deployment.yaml
@@ -0,0 +1,408 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: nvidia.com/v1alpha1
+kind: DynamoGraphDeployment
+metadata:
+ name: {{ .Values.dynamo.deploymentName }}
+ namespace: {{ .Values.dynamo.namespace }}
+spec:
+ {{- if .Values.workload.framework }}
+ backendFramework: {{ .Values.workload.framework }}
+ {{- end }}
+ services:
+ Frontend:
+ dynamoNamespace: {{ .Values.dynamo.namespace }}
+ componentType: frontend
+ replicas: {{ .Values.dynamo.frontend.replicas }}
+ resources:
+ requests:
+ cpu: "5"
+ memory: "10Gi"
+ limits:
+ cpu: "5"
+ memory: "10Gi"
+ extraPodMetadata:
+ annotations:
+ {{- if eq .Values.volumes.useGcs true }}
+ gke-gcsfuse/volumes: "true"
+ gke-gcsfuse/cpu-limit: "0"
+ gke-gcsfuse/memory-limit: "0"
+ gke-gcsfuse/ephemeral-storage-limit: "0"
+ gke-gcsfuse/file-cache-capacity: "500Gi"
+ gke-gcsfuse/cache-path: "/gcs-cache"
+ {{- end }}
+ extraPodSpec:
+ tolerations:
+ - key: "kubernetes.io/arch"
+ operator: "Equal"
+ value: "arm64"
+ effect: "NoSchedule"
+ - key: "nvidia.com/gpu"
+ operator: "Exists"
+ effect: "NoSchedule"
+ volumes:
+ - name: local-ssd
+ emptyDir: {}
+ {{- if eq .Values.volumes.useGcs true }}
+ - name: gcs-model-volume
+ csi:
+ driver: gcsfuse.csi.storage.gke.io
+ volumeAttributes:
+ bucketName: {{ .Values.volumes.gcsfuse.bucketName }}
+ mountOptions: "implicit-dirs,file-cache:enable-parallel-downloads:true,file-cache:parallel-downloads-per-file:100,file-cache:max-parallel-downloads:-1,file-cache:download-chunk-size-mb:50,file-cache:max-size-mb:-1"
+ {{- end }}
+ mainContainer:
+ image: {{ .Values.dynamo.frontend.image }}
+ {{- if eq .Values.volumes.useGcs true }}
+ volumeMounts:
+ - name: local-ssd
+ mountPath: /gcs-cache
+ - name: gcs-model-volume
+ mountPath: /data/model
+ readOnly: true
+ {{- end }}
+ resources:
+ requests:
+ ephemeral-storage: "30Gi"
+ limits:
+ ephemeral-storage: "30Gi"
+
+ Decode:
+ multinode:
+ nodeCount: {{ .Values.dynamo.decodeWorker.nodeCount }}
+ dynamoNamespace: {{ .Values.dynamo.namespace }}
+ envFromSecret: {{ .Values.secrets.huggingface.secretName }}
+ componentType: worker
+ subComponentType: decode
+ replicas: {{ .Values.dynamo.decodeWorker.replicas }}
+ livenessProbe:
+ httpGet:
+ path: /live
+ port: system
+ initialDelaySeconds: {{ .Values.dynamo.decodeWorker.livenessProbe.initialDelaySeconds }}
+ periodSeconds: {{ .Values.dynamo.decodeWorker.livenessProbe.periodSeconds }}
+ timeoutSeconds: {{ .Values.dynamo.decodeWorker.livenessProbe.timeoutSeconds }}
+ failureThreshold: {{ .Values.dynamo.decodeWorker.livenessProbe.failureThreshold }}
+ readinessProbe:
+ httpGet:
+ path: /health
+ port: system
+ initialDelaySeconds: {{ .Values.dynamo.decodeWorker.readinessProbe.initialDelaySeconds }}
+ timeoutSeconds: {{ .Values.dynamo.decodeWorker.readinessProbe.timeoutSeconds }}
+ periodSeconds: {{ .Values.dynamo.decodeWorker.readinessProbe.periodSeconds }}
+ failureThreshold: {{ .Values.dynamo.decodeWorker.readinessProbe.failureThreshold }}
+ sharedMemory:
+ size: 80Gi
+ resources:
+ limits:
+ gpu: "4"
+ claims:
+ - name: compute-domain-channel
+ envs:
+ - name: SERVER_ARGS_FILE
+ value: {{ .Values.workload.configPath }}/{{ .Values.workload.configFile }}
+ {{- if eq .Values.volumes.useGcs true }}
+ - name: MODEL_PATH
+ value: {{ .Values.volumes.gcsMounts.mountPath }}/{{ .Values.workload.model }}
+ {{- end }}
+ {{- if .Values.dynamo.decodeWorker.envs }}
+ {{- toYaml .Values.dynamo.decodeWorker.envs | nindent 8 }}
+ {{- end }}
+ extraPodMetadata:
+ annotations:
+ {{- if eq .Values.volumes.useGcs true }}
+ gke-gcsfuse/cpu-limit: "0"
+ gke-gcsfuse/ephemeral-storage-limit: "0"
+ gke-gcsfuse/memory-limit: "0"
+ gke-gcsfuse/volumes: "true"
+ {{- end }}
+ networking.gke.io/default-interface: 'eth0'
+ networking.gke.io/interfaces: |
+ [
+ {"interfaceName":"eth0","network":"default"},
+ {"interfaceName":"eth2","network":"rdma-0"},
+ {"interfaceName":"eth3","network":"rdma-1"},
+ {"interfaceName":"eth4","network":"rdma-2"},
+ {"interfaceName":"eth5","network":"rdma-3"}
+ ]
+ extraPodSpec:
+ {{- if .Values.dynamo.serviceAccountName }}
+ serviceAccountName: {{ .Values.dynamo.serviceAccountName }}
+ {{- end }}
+ resourceClaims:
+ - name: compute-domain-channel
+ resourceClaimTemplateName: {{ .Values.dynamo.computeDomain.resourceClaimTemplateName }}
+ affinity:
+ nodeAffinity:
+ requiredDuringSchedulingIgnoredDuringExecution:
+ nodeSelectorTerms:
+ - matchExpressions:
+ - key: kubernetes.io/arch
+ operator: In
+ values:
+ - arm64
+ mainContainer:
+ securityContext:
+ privileged: true
+ image: {{ .Values.dynamo.decodeWorker.image }}
+ workingDir: /sgl-workspace/dynamo/components/backends/sglang
+ startupProbe:
+ failureThreshold: {{ .Values.dynamo.decodeWorker.startupProbe.failureThreshold }}
+ httpGet:
+ path: /live
+ port: system
+ periodSeconds: {{ .Values.dynamo.decodeWorker.startupProbe.periodSeconds }}
+ timeoutSeconds: {{ .Values.dynamo.decodeWorker.startupProbe.timeoutSeconds }}
+ initialDelaySeconds: {{ .Values.dynamo.decodeWorker.startupProbe.initialDelaySeconds }}
+ command: ["/bin/bash", "-c"]
+ stdin: true
+ tty: true
+ args:
+ - |
+ set -e
+ nvidia-smi
+ . /usr/local/gib/scripts/set_nccl_env.sh
+
+ echo "--- VERIFYING NCCL ENV VARS IN SHELL ---"
+ env | grep NCCL_
+ echo "--- END VERIFICATION ---"
+ pip install hf_transfer
+
+ ARGS=()
+ if [ -n "$MODEL_PATH" ]; then
+ echo "Adding model path from env var: $MODEL_PATH"
+ ARGS+=("--model-path" "$MODEL_PATH")
+ else
+ echo "No MODEL_PATH env var set from gcsfuse, relying on config file for model"
+ ARGS+=("--model" "{{ .Values.workload.model }}")
+ fi
+ if [ -f "$SERVER_ARGS_FILE" ]; then
+ echo "Loading server arguments from ConfigMap"
+ # Parse "key: value" lines; skip blanks/comments, treat bare keys and
+ # "true" values as boolean flags
+ while IFS=': ' read -r key value || [ -n "$key" ]; do
+ [[ -z "$key" || "$key" == \#* ]] && continue
+ key=$(echo "$key" | xargs)
+ value=$(echo "$value" | xargs)
+
+ if [ -n "$key" ]; then
+ if [[ "$value" == "true" ]]; then
+ ARGS+=("--$key")
+ elif [[ "$value" == "false" ]]; then
+ ARGS+=("--$key" "false")
+ elif [ -n "$value" ]; then
+ ARGS+=("--$key" "$value")
+ else
+ ARGS+=("--$key")
+ fi
+ fi
+ done < "$SERVER_ARGS_FILE"
+ fi
+ echo "Running: python3 -m dynamo.sglang ${ARGS[*]}"
+ exec python3 -m dynamo.sglang "${ARGS[@]}"
+
+ volumeMounts:
+ {{- if eq .Values.volumes.useGcs true }}
+ - mountPath: /data/model
+ name: gcs-model-volume
+ {{- end }}
+ - name: library-dir-host
+ mountPath: /usr/local/nvidia
+ - name: gib
+ mountPath: /usr/local/gib
+ - name: serving-configuration
+ mountPath: {{ .Values.workload.configPath | default "/workload/configs" }}
+ volumes:
+ {{- if eq .Values.volumes.useGcs true }}
+ - name: gcs-model-volume
+ csi:
+ driver: gcsfuse.csi.storage.gke.io
+ volumeAttributes:
+ bucketName: {{ .Values.volumes.gcsfuse.bucketName }}
+ mountOptions: implicit-dirs,file-cache:enable-parallel-downloads:true,file-cache:parallel-downloads-per-file:100,file-cache:max-parallel-downloads:-1,file-cache:download-chunk-size-mb:10,file-cache:max-size-mb:-1
+ {{- end }}
+ - name: library-dir-host
+ hostPath:
+ path: /home/kubernetes/bin/nvidia
+ - name: gib
+ hostPath:
+ path: /home/kubernetes/bin/gib
+ - name: serving-configuration
+ configMap:
+ name: "{{ .Release.Name }}-decode-config"
+ items:
+ - key: serving-configuration
+ path: {{ .Values.workload.configFile | default "serving-args.yaml" }}
+
+ Prefill:
+ multinode:
+ nodeCount: {{ .Values.dynamo.prefillWorker.nodeCount }}
+ dynamoNamespace: {{ .Values.dynamo.namespace }}
+ envFromSecret: {{ .Values.secrets.huggingface.secretName }}
+ componentType: worker
+ subComponentType: prefill
+ replicas: {{ .Values.dynamo.prefillWorker.replicas }}
+ livenessProbe:
+ exec:
+ command:
+ - /bin/sh
+ - -c
+ - "exit 0"
+ initialDelaySeconds: {{ .Values.dynamo.prefillWorker.livenessProbe.initialDelaySeconds }}
+ periodSeconds: {{ .Values.dynamo.prefillWorker.livenessProbe.periodSeconds }}
+ timeoutSeconds: {{ .Values.dynamo.prefillWorker.livenessProbe.timeoutSeconds }}
+ failureThreshold: {{ .Values.dynamo.prefillWorker.livenessProbe.failureThreshold }}
+ readinessProbe:
+ httpGet:
+ path: /health
+ port: system
+ initialDelaySeconds: {{ .Values.dynamo.prefillWorker.readinessProbe.initialDelaySeconds }}
+ timeoutSeconds: {{ .Values.dynamo.prefillWorker.readinessProbe.timeoutSeconds }}
+ periodSeconds: {{ .Values.dynamo.prefillWorker.readinessProbe.periodSeconds }}
+ failureThreshold: {{ .Values.dynamo.prefillWorker.readinessProbe.failureThreshold }}
+ sharedMemory:
+ size: 80Gi
+ resources:
+ limits:
+ gpu: "4"
+ claims:
+ - name: compute-domain-channel
+ envs:
+ - name: SERVER_ARGS_FILE
+ value: {{ .Values.workload.configPath }}/{{ .Values.workload.configFile }}
+ {{- if eq .Values.volumes.useGcs true }}
+ - name: MODEL_PATH
+ value: {{ .Values.volumes.gcsMounts.mountPath }}/{{ .Values.workload.model }}
+ {{- end }}
+ {{- if .Values.dynamo.prefillWorker.envs }}
+ {{- toYaml .Values.dynamo.prefillWorker.envs | nindent 8 }}
+ {{- end }}
+ extraPodMetadata:
+ annotations:
+ {{- if eq .Values.volumes.useGcs true }}
+ gke-gcsfuse/cpu-limit: "0"
+ gke-gcsfuse/ephemeral-storage-limit: "0"
+ gke-gcsfuse/memory-limit: "0"
+ gke-gcsfuse/volumes: "true"
+ {{- end }}
+ networking.gke.io/default-interface: 'eth0'
+ networking.gke.io/interfaces: |
+ [
+ {"interfaceName":"eth0","network":"default"},
+ {"interfaceName":"eth2","network":"rdma-0"},
+ {"interfaceName":"eth3","network":"rdma-1"},
+ {"interfaceName":"eth4","network":"rdma-2"},
+ {"interfaceName":"eth5","network":"rdma-3"}
+ ]
+ extraPodSpec:
+ {{- if .Values.dynamo.serviceAccountName }}
+ serviceAccountName: {{ .Values.dynamo.serviceAccountName }}
+ {{- end }}
+ resourceClaims:
+ - name: compute-domain-channel
+ resourceClaimTemplateName: {{ .Values.dynamo.computeDomain.resourceClaimTemplateName }}
+ affinity:
+ nodeAffinity:
+ requiredDuringSchedulingIgnoredDuringExecution:
+ nodeSelectorTerms:
+ - matchExpressions:
+ - key: kubernetes.io/arch
+ operator: In
+ values:
+ - arm64
+ mainContainer:
+ securityContext:
+ privileged: true
+ stdin: true
+ tty: true
+ image: {{ .Values.dynamo.prefillWorker.image }}
+ workingDir: /sgl-workspace/dynamo/components/backends/sglang
+ startupProbe:
+ failureThreshold: {{ .Values.dynamo.prefillWorker.startupProbe.failureThreshold }}
+ httpGet:
+ path: /live
+ port: system
+ periodSeconds: {{ .Values.dynamo.prefillWorker.startupProbe.periodSeconds }}
+ timeoutSeconds: {{ .Values.dynamo.prefillWorker.startupProbe.timeoutSeconds }}
+ initialDelaySeconds: {{ .Values.dynamo.prefillWorker.startupProbe.initialDelaySeconds }}
+ command: ["/bin/bash", "-c"]
+ args:
+ - |
+ set -e
+ nvidia-smi
+ . /usr/local/gib/scripts/set_nccl_env.sh
+ pip install hf_transfer
+
+ ARGS=()
+ if [ -n "$MODEL_PATH" ]; then
+ echo "Adding model path from env var: $MODEL_PATH"
+ ARGS+=("--model-path" "$MODEL_PATH")
+ else
+ echo "No MODEL_PATH env var set from gcsfuse, relying on config file for model"
+ ARGS+=("--model" "{{ .Values.workload.model }}")
+ fi
+ if [ -f "$SERVER_ARGS_FILE" ]; then
+ echo "Loading server arguments from ConfigMap"
+ # Parse "key: value" lines; skip blanks/comments, treat bare keys and
+ # "true" values as boolean flags
+ while IFS=': ' read -r key value || [ -n "$key" ]; do
+ [[ -z "$key" || "$key" == \#* ]] && continue
+ key=$(echo "$key" | xargs)
+ value=$(echo "$value" | xargs)
+
+ if [ -n "$key" ]; then
+ if [[ "$value" == "true" ]]; then
+ ARGS+=("--$key")
+ elif [[ "$value" == "false" ]]; then
+ ARGS+=("--$key" "false")
+ elif [ -n "$value" ]; then
+ ARGS+=("--$key" "$value")
+ else
+ ARGS+=("--$key")
+ fi
+ fi
+ done < "$SERVER_ARGS_FILE"
+ fi
+ echo "Running: python3 -m dynamo.sglang ${ARGS[*]}"
+ exec python3 -m dynamo.sglang "${ARGS[@]}"
+
+ volumeMounts:
+ {{- if eq .Values.volumes.useGcs true }}
+ - mountPath: /data/model
+ name: gcs-model-volume
+ {{- end }}
+ - name: library-dir-host
+ mountPath: /usr/local/nvidia
+ - name: gib
+ mountPath: /usr/local/gib
+ - name: serving-configuration
+ mountPath: {{ .Values.workload.configPath | default "/workload/configs" }}
+ volumes:
+ {{- if eq .Values.volumes.useGcs true }}
+ - name: gcs-model-volume
+ csi:
+ driver: gcsfuse.csi.storage.gke.io
+ volumeAttributes:
+ bucketName: {{ .Values.volumes.gcsfuse.bucketName }}
+ mountOptions: implicit-dirs,file-cache:enable-parallel-downloads:true,file-cache:parallel-downloads-per-file:100,file-cache:max-parallel-downloads:-1,file-cache:download-chunk-size-mb:10,file-cache:max-size-mb:-1
+ {{- end }}
+ - name: library-dir-host
+ hostPath:
+ path: /home/kubernetes/bin/nvidia
+ - name: gib
+ hostPath:
+ path: /home/kubernetes/bin/gib
+ - name: serving-configuration
+ configMap:
+ name: "{{ .Release.Name }}-prefill-config"
+ items:
+ - key: serving-configuration
+ path: {{ .Values.workload.configFile | default "serving-args.yaml" }}
\ No newline at end of file
diff --git a/src/helm-charts/a4x/inference-templates/dynamo-deployment/templates/dynamo-launcher-configmap.yaml b/src/helm-charts/a4x/inference-templates/dynamo-deployment/templates/dynamo-launcher-configmap.yaml
new file mode 100644
index 00000000..01e9b51f
--- /dev/null
+++ b/src/helm-charts/a4x/inference-templates/dynamo-deployment/templates/dynamo-launcher-configmap.yaml
@@ -0,0 +1,28 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: "{{ .Release.Name }}-launcher"
+ namespace: {{ .Values.dynamo.namespace }}
+data:
+ launch-workload.sh: |-
+{{- if .Values.workload_launcher }}
+{{ .Values.workload_launcher | nindent 4 }}
+{{- else }}
+ #!/bin/bash
+ echo "No workload launcher specified"
+ exit 1
+{{- end }}
\ No newline at end of file
diff --git a/src/helm-charts/a4x/inference-templates/dynamo-deployment/templates/dynamo-worker-configmap.yaml b/src/helm-charts/a4x/inference-templates/dynamo-deployment/templates/dynamo-worker-configmap.yaml
new file mode 100644
index 00000000..f82580ae
--- /dev/null
+++ b/src/helm-charts/a4x/inference-templates/dynamo-deployment/templates/dynamo-worker-configmap.yaml
@@ -0,0 +1,35 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+{{- if .Values.prefill_serving_config }}
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: "{{ .Release.Name }}-prefill-config"
+ namespace: {{ .Values.dynamo.namespace }}
+data:
+ serving-configuration: |-
+{{ .Values.prefill_serving_config | nindent 4 }}
+{{- end }}
+---
+{{- if .Values.decode_serving_config }}
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: "{{ .Release.Name }}-decode-config"
+ namespace: {{ .Values.dynamo.namespace }}
+data:
+ serving-configuration: |-
+{{ .Values.decode_serving_config | nindent 4 }}
+{{- end }}
\ No newline at end of file