Red Hat OpenShift AI 3.3 — Installation Manual

Version: 3.3 Self-Managed
Target Platform: OpenShift Container Platform 4.20
Date: March 2026
Classification: Internal / Operations

Overview
Using This Guide with Claude Code or OpenCode
Global Prerequisites
Prerequisite Operators
Installing the Red Hat OpenShift AI Operator
Configuring the DataScienceCluster
TLS Certificate Management
OpenTelemetry Observability for RHOAI
Distributed Inference with llm-d
Model as a Service (MaaS)
Validation and Testing
Appendix A — Quick-Reference Commands
Appendix B — Troubleshooting
Appendix C — Reference Links

1. Overview

Red Hat OpenShift AI (RHOAI) 3.3 is a self-managed AI/ML platform that provides an integrated environment for developing, training, serving, and monitoring models across hybrid cloud environments. This manual covers a full installation plan organized into two tiers.

RHOAI Basic Features:

Dashboard
Data Science Pipelines
Model Serving (KServe single-model serving)
Model Registry
Workbenches
TrustyAI (model monitoring and bias detection)

Note: Multi-Model Serving via ModelMesh is not supported in RHOAI 3.x. KServe is the only supported model-serving platform from RHOAI 3.0 onwards.

Additional Features:

Distributed Inference with llm-d — GA in RHOAI 3.3 (disaggregated prefill/decode, Inference Gateway, KV-cache-aware routing). Requires OCP 4.20 or later.
Model as a Service — MaaS (governed, rate-limited LLM access via Gateway API and Connectivity Link)
Llama Stack Operator (OpenAI-compatible RAG APIs and agentic AI) — documentation in progress

Cross-Cutting Concerns:

OpenTelemetry observability (traces, metrics, and logs for RHOAI and model serving components)
TLS certificate management (via cert-manager Operator or manual certificate generation)

Important: There is no upgrade path from OpenShift AI 2.x to 3.3. This version requires a fresh installation. For distributed inference with llm-d, OCP 4.20 is required.

Official Documentation: see the reference links in Appendix C.

Using This Guide with Claude Code or OpenCode

This repository includes an AGENTS.md file that gives Claude Code (and compatible tools such as OpenCode) full context about the installation phases, required environment variables, wait conditions, and known gotchas — so an AI assistant can co-pilot the deployment rather than just answer questions about it.

What the AI assistant can do for you

Run preflight checks and report failures before you touch anything.
Fill in helm template and oc apply commands with your actual environment variables.
Watch pod and operator status and tell you when it is safe to move to the next phase.
Diagnose errors by reading command output you paste into the chat.
Stop and ask for confirmation before any destructive or cluster-wide action (InstallPlan approvals, RBAC changes).

How to start a session

Open this repository in Claude Code or OpenCode — the tool will read AGENTS.md automatically.
Make sure you are logged in to the cluster (oc whoami).
Tell the assistant which phase you are on and provide any environment variables it asks for:

"I'm on Phase 0. My AWS region is eu-west-1. Let's start the preflight checks."
After each phase the assistant will report a human gate — a set of conditions you need to confirm before it proceeds.

Phase overview

Phase	What happens	Approx. time
0	Cluster validation (OCP version, admin access, StorageClass, no conflicting operators)	5 min
1	ArgoCD + cert-manager + Let's Encrypt certificates for Ingress and API	15–20 min
2	GPU nodes (AWS MachineSets), Node Feature Discovery, NVIDIA GPU Operator	20–40 min
3	Connectivity Link, Leader Worker Set, RHOAI operator, DataScienceCluster	20–30 min
4	Monitoring stack — Tempo, OpenTelemetry, Grafana	10 min
5	llm-d Quick Start — Gateway, namespace, LLMInferenceService, curl smoke test	15–20 min

Resuming after an error

Paste the failing command and its output into the chat and say which phase you were on. The assistant will diagnose the problem and suggest the next step without restarting from scratch.

2. Global Prerequisites

2.1 Cluster Requirements

Requirement	Specification
OpenShift Container Platform	4.20 (required for llm-d)
Worker nodes (base)	Minimum 2 nodes, 8 vCPU / 32 GiB RAM each
Single-node OpenShift	32 vCPU / 128 GiB RAM
GPU nodes (model serving, llm-d)	NVIDIA A100 / H100 / H200 / A10G / L40S or AMD MI250+
Architecture	x86_64 (primary); aarch64, ppc64le, s390x also supported
Cluster admin access	Required for operator installation
OpenShift CLI (`oc`)	Installed and authenticated
Open Data Hub	Must not be installed on the cluster
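The OCP version requirement in the table above can be checked before anything is installed. The following is a minimal sketch, not part of this repository: the helper names version_ge and check_ocp_version are hypothetical, and it assumes a sort that supports -V (GNU coreutils or a recent BSD sort).

```shell
#!/usr/bin/env bash
# Sketch only: version_ge and check_ocp_version are hypothetical helper names.

# Returns 0 when version $1 is greater than or equal to version $2.
# Relies on version-aware sorting (sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reads the cluster's desired version and compares it with the required minimum.
check_ocp_version() {
  local required="${1:-4.20}" current
  current="$(oc get clusterversion version -o jsonpath='{.status.desired.version}')" || return 1
  if version_ge "${current}" "${required}"; then
    echo "OK: OCP ${current} >= ${required}"
  else
    echo "FAIL: OCP ${current} < ${required}" >&2
    return 1
  fi
}

# Example against a live cluster:
#   check_ocp_version 4.20 || exit 1
```

Keeping the comparison in its own function means it can be exercised without cluster access.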

2.2 Storage Requirements

A default StorageClass with dynamic provisioning must be configured. Verify with:

oc get storageclass | grep '(default)'
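For scripted preflight checks it can help to count the default StorageClasses rather than only list them, since zero defaults leaves PVCs without a class and more than one makes the choice ambiguous. A small sketch, assuming the listing is piped in on stdin; count_default_storageclasses is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: counts rows marked "(default)" in `oc get storageclass` output.
# count_default_storageclasses is a hypothetical helper; it reads the listing on stdin.
count_default_storageclasses() {
  # grep -c prints 0 and exits non-zero when nothing matches, so mask the status.
  grep -c '(default)' || true
}

# Example against a live cluster (exactly one default is expected):
#   n="$(oc get storageclass --no-headers | count_default_storageclasses)"
#   [ "${n}" -eq 1 ] || echo "Expected 1 default StorageClass, found ${n}" >&2
```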

S3-compatible object storage is needed for Pipelines, Model Registry, and model artifact storage (OpenShift Data Foundation, MinIO, or AWS S3).

2.3 Network Requirements

Outbound access to registry.redhat.io and quay.io (or a disconnected mirror).
For llm-d with RoCE: RDMA-capable NICs (see Section 8.3).
DNS must be properly configured. In private cloud environments, manually configure DNS A/CNAME records after LoadBalancer IPs become available.

2.4 Credentials

Hugging Face token (HF_TOKEN) for downloading gated model weights used with llm-d and MaaS.
Red Hat pull secret (from console.redhat.com).

2.5 RHOAI Operator Version (Stable 3.x vs. 3.x Early Access)

The Red Hat OpenShift AI operator is installed from OperatorHub via a Subscription. This repository ships a Helm chart at gitops/operators/rhoai so you can pick the OLM channel and startingCSV without editing YAML by hand.

Goal	OLM channel	Example `startingCSV`
GA stable 3.3.x (default for this guide)	`stable-3.x`	`rhods-operator.3.3.2`
3.4 early access	`beta`	`rhods-operator.3.4.ea2`

Early access builds are published on the beta channel; GA releases use stable-3.x. Pin the CSV you want with startingCSV so upgrades are predictable.

Set RHOAI_OLM_PROFILE when rendering the operator chart (defaults to stable if unset):

`RHOAI_OLM_PROFILE`	Effect
`stable` (default)	`channel: stable-3.x`, `startingCSV: rhods-operator.3.3.2`
`ea`	`channel: beta`, `startingCSV: rhods-operator.3.4.ea2`

You can instead edit gitops/operators/rhoai/values.yaml (olmProfile or explicit channel / startingCSV) or pass --set olmProfile=ea to helm template.
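To make the profile table concrete, the sketch below mirrors it as a shell lookup. It is only an illustration of the mapping: resolve_olm_profile is a hypothetical helper, and the chart performs its own resolution from olmProfile.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the RHOAI_OLM_PROFILE table as "<channel> <startingCSV>".
# resolve_olm_profile is a hypothetical helper, not part of the chart.
resolve_olm_profile() {
  case "${1:-stable}" in
    stable) echo "stable-3.x rhods-operator.3.3.2" ;;
    ea)     echo "beta rhods-operator.3.4.ea2" ;;
    *)      echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Example: split the result into channel and startingCSV.
profile_line="$(resolve_olm_profile "${RHOAI_OLM_PROFILE:-stable}")" || exit 1
CHANNEL="${profile_line%% *}"
STARTING_CSV="${profile_line#* }"
echo "channel=${CHANNEL} startingCSV=${STARTING_CSV}"
```

Rejecting unknown profile names early avoids rendering a Subscription with an unintended channel.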

3. Prerequisite Operators

RHOAI 3.3 requires several operators installed before creating the DataScienceCluster. Install them via Operators → OperatorHub in the web console or via CLI Subscription objects.

Note on cert-manager: The cert-manager Operator for Red Hat OpenShift is recommended for automating TLS certificate lifecycle across RHOAI, llm-d, OpenTelemetry, and Llama Stack. It is not a hard requirement — you can provide manually generated certificates wherever TLS is needed. That said, several components document cert-manager as a dependency in their official guides, making it the path of least resistance for most deployments.

Note on Service Mesh: Do not install OpenShift Service Mesh 2.x under any circumstances. It is not supported in RHOAI 3.x and its CRDs conflict with the llm-d gateway component. Service Mesh 3.x is only required if you plan to deploy the Llama Stack Operator — it is not needed for base RHOAI or llm-d.

3.0 ArgoCD (Red Hat OpenShift GitOps)

Go to Ecosystem / Software Catalog, search for gitops, then click Red Hat OpenShift GitOps.

Leave the defaults and click Install.

3.1 Cert-Manager Operator and Let's Encrypt Certificate Issuer

RBAC Permissions for cert-manager and supporting components

Grant cert-manager the permissions it needs for Certificates, CertificateRequests, Orders, Challenges, ClusterIssuers, Issuers, and optional monitoring integration:

CLOUD can be none or aws. Set it to aws when the cluster runs on AWS.

CLOUD=none
helm template gitops/operators/cert-manager-operator-helm/ --set cloud=${CLOUD} --name-template test | oc apply -f -

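If you want to use ArgoCD, first grant its application controller permission to manage CredentialsRequest, ServiceMonitor, and cert-manager resources: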

oc apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: credentialsrequest-manager
rules:
- apiGroups:
  - cloudcredential.openshift.io
  resources:
  - credentialsrequests
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
- apiGroups:
  - monitoring.coreos.com
  resources:
  - servicemonitors
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
- apiGroups:
  - cert-manager.io
  resources:
  - clusterissuers
  - issuers
  - certificates
  - certificaterequests
  - orders
  - challenges
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: argocd-credentialsrequest-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: credentialsrequest-manager
subjects:
- kind: ServiceAccount
  name: openshift-gitops-argocd-application-controller
  namespace: openshift-gitops
EOF

Installing the operator with ArgoCD

cat <<EOF | oc apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  labels:
    app: cert-manager-operator
  name: cert-manager-operator
  namespace: openshift-gitops
spec:
  destination:
    server: 'https://kubernetes.default.svc'
  project: default
  source:
    path: gitops/operators/cert-manager-operator
    repoURL: https://github.com/alpha-hack-program/llm-d-guide.git
    targetRevision: main
    helm:
      values: |
        cloud: ${CLOUD}
  syncPolicy:
    automated:
      prune: false
      selfHeal: false
EOF

Installing Let's Encrypt Cluster Issuers and certificates for OpenShift Ingress and API Server

# 0) Check if logged in with oc
if ! oc whoami &>/dev/null; then
  echo "Error: Not logged in to OpenShift. Please run 'oc login ...' before proceeding."
  exit 1
fi

# 1) Wait for the three cert-manager pods (controller, cainjector, webhook) to be Ready
echo -n "Waiting for cert-manager pods to be ready..."
while [[ $(oc get pods -l app.kubernetes.io/instance=cert-manager -n cert-manager \
  -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}') != "True True True" ]]; do
  echo -n "." && sleep 1
done
echo -e "  [OK]"

# 2) Detect cluster domain and AWS region
CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}')
AWS_DEFAULT_REGION="${AWS_DEFAULT_REGION:=eu-west-1}"

[[ -z "${CLUSTER_DOMAIN}" ]] && { echo "Error: CLUSTER_DOMAIN could not be detected."; exit 1; }
[[ -z "${AWS_DEFAULT_REGION}" ]] && { echo "Error: AWS_DEFAULT_REGION is not set."; exit 1; }

echo "CLUSTER_DOMAIN=${CLUSTER_DOMAIN}"
echo "AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION}"
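The polling loop in the script above has no upper bound, so it spins forever if a pod never becomes Ready. A bounded variant might look like the sketch below. retry_until and cert_manager_ready are hypothetical helper names; the readiness check reuses the same oc query as the script.

```shell
#!/usr/bin/env bash
# Sketch only: bounded polling. retry_until is a hypothetical helper name.
# Usage: retry_until <timeout-seconds> <interval-seconds> <command> [args...]
retry_until() {
  local timeout_s="$1" interval_s="$2" elapsed=0
  shift 2
  until "$@"; do
    if [ "${elapsed}" -ge "${timeout_s}" ]; then
      echo "Timed out after ${timeout_s}s waiting for: $*" >&2
      return 1
    fi
    sleep "${interval_s}"
    elapsed=$((elapsed + interval_s))
  done
}

# True when exactly three cert-manager pods report Ready=True.
cert_manager_ready() {
  [ "$(oc get pods -l app.kubernetes.io/instance=cert-manager -n cert-manager \
    -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}')" = "True True True" ]
}

# Example: wait up to 5 minutes, polling every 5 seconds.
#   retry_until 300 5 cert_manager_ready || exit 1
```

The same helper can wrap any other wait condition in this guide that is currently expressed as an open-ended loop.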

Install the certificate issuers:

cat <<EOF | oc apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  labels:
    app: cert-manager-route53
  name: cert-manager-route53
  namespace: openshift-gitops
spec:
  destination:
    server: 'https://kubernetes.default.svc'
  project: default
  source:
    path: gitops/operators/cert-manager-route53
    repoURL: https://github.com/alpha-hack-program/llm-d-guide.git
    targetRevision: main
    helm:
      parameters:
        - name: clusterDomain
          value: ${CLUSTER_DOMAIN}
        - name: route53.region
          value: ${AWS_DEFAULT_REGION}
  syncPolicy:
    automated:
      prune: false
      selfHeal: false
EOF

Verify certificate status:

oc get certificates.cert-manager.io --all-namespaces \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,STATUS:.status.conditions[0].type,READY:.status.conditions[0].status'
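When many certificates exist, it may be easier to print only those that are not yet ready. The sketch below filters the four-column table produced by the command above; not_ready_certificates is a hypothetical helper name, and it assumes the header row is present as the first line.

```shell
#!/usr/bin/env bash
# Sketch only: not_ready_certificates is a hypothetical helper.
# Reads the NAMESPACE/NAME/STATUS/READY table on stdin and prints
# "<namespace>/<name>" for every row that is not "Ready True".
not_ready_certificates() {
  awk 'NR > 1 && !($3 == "Ready" && $4 == "True") { print $1 "/" $2 }'
}

# Example against a live cluster:
#   oc get certificates.cert-manager.io --all-namespaces \
#     -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,STATUS:.status.conditions[0].type,READY:.status.conditions[0].status' \
#     | not_ready_certificates
```

An empty result means every listed certificate reports Ready as its first condition with status True.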

Note (direct helm apply): The first helm template gitops/operators/cert-manager-operator | oc apply -f - will fail on the CertManager CR because the operator CRD is not registered until the CSV reaches Succeeded. Wait for the CSV, then re-run the command — it applies cleanly on the second pass.

3.2 GPU and Hardware Dependencies

Operator	Channel	Source	Purpose
Node Feature Discovery (NFD) Operator	`stable`	`redhat-operators`	Detects GPU hardware capabilities
NVIDIA GPU Operator	`v26.3` (latest)	`certified-operators`	GPU device plugin, drivers, DCGM

Installing the operators

Option A — CLI (recommended):

Each operator directory (gitops/operators/nfd and gitops/operators/nvidia) contains an install.sh script that queries oc get packagemanifest at runtime to resolve the current default channel and CSV, so the manifests stay valid across OCP releases.

Install NFD first, wait for it to be ready, then install the NVIDIA GPU operator:

# 1. Install NFD operator (resolves channel + CSV dynamically)
bash gitops/operators/nfd/install.sh
oc get csv -n openshift-nfd -w | grep nfd

# Wait for NFD CSV to reach Succeeded
oc wait --for=jsonpath='{.status.phase}'=Succeeded csv \
  -n openshift-nfd -l operators.coreos.com/nfd.openshift-nfd= --timeout=300s

# 2. Install NVIDIA GPU operator (resolves channel + CSV dynamically)
bash gitops/operators/nvidia/install.sh
oc get csv -n nvidia-gpu-operator -w | grep gpu-operator

Option B — OpenShift Console:

Go to Operators → OperatorHub, search for each operator by name, and install with the channel and namespace shown in the table above.

Applying the instance CRs

Once both operator CSVs are Succeeded, create the NodeFeatureDiscovery and ClusterPolicy custom resources:

# Apply NFD instance (NodeFeatureDiscovery CR)
oc apply -k gitops/instance/nfd

# Confirm the NodeFeatureDiscovery CRD is established before applying ClusterPolicy
# (this checks the CRD only; it does not verify that NFD labels are present on nodes)
oc wait --for=condition=Established crd/nodefeaturediscoveries.nfd.openshift.io --timeout=120s

# Apply NVIDIA instance (ClusterPolicy CR)
oc apply -k gitops/instance/nvidia

See: NVIDIA GPU Operator on Red Hat OpenShift Container Platform

Adding A10G GPU nodes in AWS with MachineSets

export INFRA_ID=$(oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}')
export AWS_REGION="${AWS_REGION:=eu-west-1}"
export AMI_ID="${AMI_ID:=ami-0b8c325b7499597c6}"
export AWS_INSTANCE_TYPE="${AWS_INSTANCE_TYPE:=g5.2xlarge}"
export AWS_INSTANCES_PER_AZ=${AWS_INSTANCES_PER_AZ:=1}

echo "INFRA_ID=${INFRA_ID}, AWS_REGION=${AWS_REGION}, AMI_ID=${AMI_ID}, AWS_INSTANCE_TYPE=${AWS_INSTANCE_TYPE}"

for AZ in a b c; do
  helm template gpu-worker ./gitops/instance/machine-sets/gpu-worker \
    --set infrastructureId="${INFRA_ID}" \
    --set region=${AWS_REGION} \
    --set instanceType=${AWS_INSTANCE_TYPE} \
    --set amiId="${AMI_ID}" \
    --set devicePluginConfig="" \
    --set az=${AZ} | oc apply -f -
done
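INFRA_ID is derived from an oc call and is empty if that call fails, in which case the loop above would render MachineSets with a blank infrastructure ID. A guard along these lines could run before the loop. require_vars is a hypothetical helper name, and it assumes the variables are exported, as they are in the snippet above.

```shell
#!/usr/bin/env bash
# Sketch only: require_vars is a hypothetical helper.
# Each named variable must be exported and non-empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "${name}")" ]; then
      echo "Error: ${name} is empty or not exported." >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example, placed before the for-loop:
#   require_vars INFRA_ID AWS_REGION AMI_ID AWS_INSTANCE_TYPE || exit 1
```

Reporting every missing variable in one pass saves repeated runs when several values are unset.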

3.3 Core Dependencies (All Installations)

Operator	Channel	Purpose	Required For
Red Hat — Authorino Operator	`managed-services`	Token auth for single-model serving endpoints	KServe / llm-d
cert-manager Operator for Red Hat OpenShift	`stable-v1`	Automated TLS certificate lifecycle	Recommended (see above)
Red Hat Build of Kueue	`stable`	Distributed workload quota and scheduling	GPUaaS / Distributed Workloads only (not required for llm-d)
Red Hat OpenShift Leader Worker Set Operator	`stable`	Multi-node leader/worker pod sets	llm-d (required)

Note on Serverless: The Red Hat OpenShift Serverless operator (Knative Serving) is not required for RHOAI 3.x. It was a prerequisite for the legacy KServe serverless mode in RHOAI 2.x, but RHOAI 3.x uses KServe in raw deployment mode by default and does not require Serverless.

Note on Service Mesh 3.x: Install OpenShift Service Mesh 3.x only if you intend to use the Llama Stack Operator. It is not a prerequisite for llm-d or base RHOAI model serving.

# 1. Connectivity Link (Authorino + Limitador — required for RHOAI 3.x KServe auth and MaaS)
oc apply -k ./gitops/operators/connectivity-link
# InstallPlan may require manual approval due to dependencies
oc get installplan -n openshift-operators -o custom-columns=NAME:.metadata.name,PHASE:.status.phase | grep -i "requiresapproval"
# If an InstallPlan is pending, approve it:
# oc patch installplan <NAME> -n openshift-operators --type merge -p '{"spec":{"approved":true}}'
oc get csv -n openshift-operators -w | grep -E "rhcl|authorino|limitador"
# Wait for AuthPolicy CRD
oc wait --for=condition=Established crd/authpolicies.kuadrant.io --timeout=300s

# 2. Leader Worker Set (required for llm-d multi-node deployments)
# Apply in a loop to work around potential CRD install race conditions
until oc apply -k ./gitops/operators/leader-worker-set; do
  echo "Waiting for LeaderWorkerSet CRD to become available..."
  sleep 10
done
oc wait --for=condition=Established crd/leaderworkersetoperators.operator.openshift.io --timeout=300s
oc get csv -n openshift-lws-operator -w | grep -E "leader-worker-set"

# 3. Red Hat OpenShift AI Operator
# Choose olmProfile: "stable" (GA, stable-3.x) or "ea" (Early Access, beta channel).
# Before applying, verify startingCSV matches the live packagemanifest:
#   oc get packagemanifest rhods-operator -n openshift-marketplace \
#     -o jsonpath='{.status.channels[?(@.name=="<channel>")].currentCSV}'
# If you need to switch channels after a first install, delete the Subscription and CSV first:
#   oc delete subscription rhods-operator -n redhat-ods-operator
#   oc delete csv <previous-csv> -n redhat-ods-operator
RHOAI_OLM_PROFILE="${RHOAI_OLM_PROFILE:-stable}"
helm template rhoai-operator ./gitops/operators/rhoai \
  --set olmProfile="${RHOAI_OLM_PROFILE}" | oc apply -f -
oc get csv -n redhat-ods-operator -w | grep -E "rhods"

# 4. Configure OpenShift AI (DSCInitialization and DataScienceCluster)
oc wait --for=condition=Established crd/dashboards.components.platform.opendatahub.io --timeout=600s
# Render and apply (chart emits resources across multiple namespaces).
# Note: OdhDashboardConfig CRD may not be ready on the first pass. If the apply fails on
# OdhDashboardConfig, wait for the CRD and re-run:
#   oc wait --for=condition=Established crd/odhdashboardconfigs.opendatahub.io --timeout=120s
helm template rhoai ./gitops/instance/rhoai | oc apply -f -

# Wait for LLMInferenceService CRD and controller pods
oc wait --for=condition=Established crd/llminferenceservices.serving.kserve.io --timeout=300s
oc wait --for=condition=ready pod -l control-plane=odh-model-controller \
  -n redhat-ods-applications --timeout=300s
oc wait --for=condition=ready pod -l control-plane=kserve-controller-manager \
  -n redhat-ods-applications --timeout=300s

# 5. Monitoring stack
# a) Tempo Operator (distributed tracing)
oc apply -k gitops/operators/tempo-operator
oc get csv -n openshift-operators -w | grep -E "tempo"

# b) OpenTelemetry Operator
oc apply -k gitops/operators/opentelemetry-operator
oc get csv -n openshift-operators -w | grep -E "opentelemetry"
oc wait --for=condition=Established crd/instrumentations.opentelemetry.io --timeout=120s

# c) Grafana Operator (optional — for custom dashboards)
oc apply -k gitops/operators/grafana-operator
oc get csv -n grafana-operator -w | grep -E "grafana"
oc wait --for=jsonpath='{.status.phase}'=Succeeded csv -n grafana-operator \
  -l operators.coreos.com/grafana-operator.grafana-operator= --timeout=300s
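The InstallPlan check in step 1 can be turned into a list of names that still need approval. This is a sketch under the assumption that the input is a two-column NAME/PHASE listing; pending_installplans is a hypothetical helper name. Approval itself is a cluster-wide action, so the sketch only prints names and leaves the oc patch to the operator.

```shell
#!/usr/bin/env bash
# Sketch only: pending_installplans is a hypothetical helper.
# Reads "NAME PHASE" rows on stdin and prints the names awaiting approval.
pending_installplans() {
  awk '$2 == "RequiresApproval" { print $1 }'
}

# Example against a live cluster:
#   oc get installplan -n openshift-operators --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase | pending_installplans
```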

Optional — Red Hat Build of Kueue (GPUaaS / Distributed Workloads only)

⚠️ Do NOT install unless you specifically need GPUaaS or distributed workload queue management (Ray, PyTorch distributed training). Installing Kueue causes the RHOAI dashboard to label all new projects with kueue.openshift.io/managed=true. Projects with this label only see hardware profiles with scheduling.type: Queue — standard Node-type profiles become invisible unless matching Queue-type profiles and LocalQueues are also configured.

Known issue (RHOAI 3.3.0): The dashboard does not reload its configuration when disableKueue is toggled in OdhDashboardConfig. Restart the dashboard after any change: oc rollout restart deployment/rhods-dashboard -n redhat-ods-applications

# OPTIONAL — only for GPUaaS / distributed workloads
oc apply -k gitops/operators/kueue-operator
oc get csv -n openshift-operators -w | grep -E "kueue"

# OPTIONAL — wait for Kueue CRDs before configuring ClusterQueue
oc wait --for=condition=Established crd/clusterqueues.kueue.x-k8s.io --timeout=600s
oc wait --for=condition=Established crd/resourceflavors.kueue.x-k8s.io --timeout=600s
oc wait --for=condition=Established crd/localqueues.kueue.x-k8s.io --timeout=600s

# OPTIONAL — set Kueue to Managed in the DataScienceCluster after operator is ready
oc patch datasciencecluster default-dsc \
  --type='merge' \
  -p '{"spec":{"components":{"kueue":{"managementState":"Managed","defaultClusterQueueName":"default","defaultLocalQueueName":"default"}}}}'

# OPTIONAL — minimal ClusterQueue + ResourceFlavor setup
cat <<EOF | oc apply -f -
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: default
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: "64"
      - name: memory
        nominalQuota: "256Gi"
      - name: nvidia.com/gpu
        nominalQuota: "8"
EOF

# OPTIONAL — create a LocalQueue in each Kueue-managed namespace
cat <<EOF | oc apply -f -
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: default
  namespace: <your-namespace>
spec:
  clusterQueue: default
EOF

# OPTIONAL — Queue-type hardware profile for Kueue-managed namespaces
# (Node-type profiles are invisible in namespaces with kueue.openshift.io/managed=true)
cat <<EOF | oc apply -f -
apiVersion: infrastructure.opendatahub.io/v1
kind: HardwareProfile
metadata:
  name: default-cpu-queue
  namespace: redhat-ods-applications
  annotations:
    opendatahub.io/display-name: "Default CPU (Kueue)"
    opendatahub.io/disabled: "false"
spec:
  identifiers:
  - displayName: CPU
    identifier: cpu
    minCount: 1
    maxCount: 4
    defaultCount: 2
    resourceType: CPU
  - displayName: Memory
    identifier: memory
    minCount: 2Gi
    maxCount: 8Gi
    defaultCount: 4Gi
    resourceType: Memory
  scheduling:
    type: Queue
    queue:
      localQueueName: default
EOF

3.4 Pipeline Dependencies

Operator	Channel	Purpose
Red Hat OpenShift Pipelines	`latest`	Tekton pipelines for data science workflows

Note: The OpenShift Pipelines operator is optional for llm-d. It is required only if you plan to use Data Science Pipelines features in RHOAI.

oc apply -k gitops/operators/pipelines

# If the InstallPlan requires manual approval (it can take a few minutes to appear):
INSTALLPLAN_NAME=$(oc get installplan -n openshift-operators -o json | \
  jq -r '.items[] | select(.spec.approved == false) | select(any(.spec.clusterServiceVersionNames[]?; contains("openshift-pipelines-operator-rh"))) | .metadata.name' | head -n 1)
if [[ -n "${INSTALLPLAN_NAME}" ]]; then
  oc patch installplan "${INSTALLPLAN_NAME}" -n openshift-operators \
    --type merge --patch '{"spec":{"approved":true}}'
fi

oc get csv -n openshift-operators -w | grep -E "pipelines"

3.5 Check Operators

./scripts/check-operators.sh

Quick Start Guide to Deploy llm-d

Deploy llm-d on a connected OpenShift 4.20 cluster with RHOAI 3.3.

Prerequisites: Complete all steps in Section 3 before proceeding. In particular, confirm that the LLMInferenceService CRD is available (oc get crd llminferenceservices.serving.kserve.io) and that both odh-model-controller and kserve-controller-manager pods are Running in redhat-ods-applications.

Step 1: Configure the Gateway

Create the GatewayClass and Gateway for llm-d.

Using a LoadBalancer with a pre-existing certificate:

APP_NAME=gateway
GATEWAY_NAME=${GATEWAY_NAME:=openshift-ai-inference}
CLUSTER_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
echo "CLUSTER_DOMAIN=${CLUSTER_DOMAIN}"

helm template gitops/instance/llm-d/gateway \
  --name-template ${APP_NAME} \
  --set gatewayName="${GATEWAY_NAME}" \
  --set clusterDomain="${CLUSTER_DOMAIN}" \
  --set subdomain=inference \
  --set useOpenShiftRoute=false \
  --set tls.secretName=ingress-certs \
  --include-crds | oc apply -f -

Using the OpenShift router and generating a self-signed certificate:

APP_NAME=gateway
GATEWAY_NAME=${GATEWAY_NAME:=openshift-ai-inference}
CLUSTER_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
echo "CLUSTER_DOMAIN=${CLUSTER_DOMAIN}"

helm template gitops/instance/llm-d/gateway \
  --name-template ${APP_NAME} \
  --set gatewayName="${GATEWAY_NAME}" \
  --set clusterDomain="${CLUSTER_DOMAIN}" \
  --set subdomain=inference \
  --set useOpenShiftRoute=true \
  --set tls.secretName="${GATEWAY_NAME}" \
  --set tls.generate=true --include-crds | oc apply -f -

Other gateway configurations: See gitops/instance/llm-d/gateway/README.md for alternative setups (bare metal, self-signed certs, OpenShift Routes).

Verify the Gateway is ready:

oc get gateway -n openshift-ingress

# Expected output:
# NAME                              CLASS                            PROGRAMMED   AGE
# openshift-ai-inference            openshift-ai-inference-class     True         ...
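For scripted checks, the Programmed condition can be read directly instead of inspecting the table. A minimal sketch follows; gateway_programmed is a hypothetical helper name, and the defaults match the gateway name and namespace used in this guide.

```shell
#!/usr/bin/env bash
# Sketch only: gateway_programmed is a hypothetical helper.
# Returns 0 when the Gateway reports condition Programmed=True.
gateway_programmed() {
  local name="${1:-openshift-ai-inference}" ns="${2:-openshift-ingress}" status
  status="$(oc get gateway "${name}" -n "${ns}" \
    -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}')" || return 1
  [ "${status}" = "True" ]
}

# Example against a live cluster:
#   gateway_programmed openshift-ai-inference openshift-ingress \
#     && echo "Gateway is programmed" \
#     || echo "Gateway is not ready yet" >&2
```

Reading the condition by type avoids depending on the column layout of the table output.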

Step 2: Create Namespace

PROJECT="llm-d-demo"

oc new-project ${PROJECT}
oc label namespace ${PROJECT} modelmesh-enabled=false opendatahub.io/dashboard=true

Step 3: Deploy an LLMInferenceService

Option A — Qwen3-8B-FP8 via OCI ModelCar (recommended for air-gapped / registry-cached deployments)

Create a values override file:

cat <<EOF > qwen3-8b-fp8-dynamic-oci.tmp.yaml
deploymentType: intelligent-inference
serviceName: qwen3-8b
replicas: 2
useStartupProbe: true
storage:
  type: oci
  uri: oci://registry.redhat.io/rhelai1/modelcar-qwen3-8b-fp8-dynamic:1.5
model:
  name: alibaba/qwen3-8b
resources:
  limits: { cpu: "4", memory: 16Gi, gpuCount: "1" }
  requests: { cpu: "1", memory: 8Gi, gpuCount: "1" }
env:
  - name: VLLM_ADDITIONAL_ARGS
    value: "--disable-uvicorn-access-log --enable-auto-tool-choice --tool-call-parser hermes"
EOF

Render and apply:

helm template gitops/instance/llm-d/inference \
  --name-template qwen3-8b -n ${PROJECT} \
  -f gitops/instance/llm-d/inference/values.yaml \
  -f qwen3-8b-fp8-dynamic-oci.tmp.yaml \
  --include-crds | oc apply -f -

Option B — Facebook OPT-125m via HuggingFace (quick test with a small public model)

cat <<EOF > facebook-opt-125m-hf.tmp.yaml
deploymentType: intelligent-inference
serviceName: opt-125m
replicas: 1
useStartupProbe: true
storage:
  type: hf
  uri: hf://facebook/opt-125m
model:
  name: facebook/opt-125m
resources:
  limits: { cpu: "2", memory: 8Gi, gpuCount: 1 }
  requests: { cpu: "1", memory: 4Gi, gpuCount: 1 }
EOF

helm template gitops/instance/llm-d/inference \
  --name-template opt-125m -n ${PROJECT} \
  -f gitops/instance/llm-d/inference/values.yaml \
  -f facebook-opt-125m-hf.tmp.yaml \
  --include-crds | oc apply -f -

HuggingFace access: If using a gated model, ensure your HF_TOKEN secret is configured in the namespace before deploying.

Step 4: Verify Deployment

Check LLMInferenceService status

oc get llminferenceservice -w -n ${PROJECT}

# Expected output:
# NAME       URL                                                    READY   AGE
# qwen3-8b   https://<gateway-url>/${PROJECT}/qwen3-8b             True    5m

Check pods

oc get pods -w -n ${PROJECT}

# Expected output:
# NAME                                            READY   STATUS    AGE
# qwen3-8b-kserve-xxxxx-xxxxx                    1/1     Running   3m
# qwen3-8b-kserve-xxxxx-xxxxx                    1/1     Running   3m
# qwen3-8b-kserve-router-scheduler-xxxxx         1/1     Running   3m

Watch pod logs

# vLLM server logs
oc logs -f \
  -l app.kubernetes.io/name=qwen3-8b,app.kubernetes.io/component=llminferenceservice-workload \
  -n ${PROJECT}

# Scheduler logs
oc logs -f \
  -l app.kubernetes.io/name=qwen3-8b,app.kubernetes.io/component=llminferenceservice-router-scheduler \
  -n ${PROJECT}

Step 5: Test the Endpoint

Get the inference URL

INFERENCE_URL=$(oc get gateway openshift-ai-inference -n openshift-ingress \
  -o json | jq -r '.spec.listeners[] | select(.name=="https").hostname')
echo "Inference URL: https://${INFERENCE_URL}"

List available models

curl -s https://${INFERENCE_URL}/${PROJECT}/qwen3-8b/v1/models | jq

Send a completion request

curl -s -X POST https://${INFERENCE_URL}/${PROJECT}/qwen3-8b/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "alibaba/qwen3-8b",
    "prompt": "Explain the difference between supervised and unsupervised learning.",
    "max_tokens": 50,
    "temperature": 0.7
  }' | jq '.choices[0].text'

Send a chat completion request

curl -s -X POST https://${INFERENCE_URL}/${PROJECT}/qwen3-8b/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "alibaba/qwen3-8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant. Be VERY concise"},
      {"role": "user", "content": "Answer to the Ultimate Question of Life, the Universe, and Everything."}
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }' | jq '.choices[0].message.content'
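The requests above can be wrapped in a small smoke test that fails on any non-200 response. This is a sketch with assumptions: model_url and smoke_test are hypothetical helper names, and CURL_INSECURE is an assumed variable that adds -k for gateways using the self-signed certificate option from Step 1.

```shell
#!/usr/bin/env bash
# Sketch only: model_url and smoke_test are hypothetical helpers.

# Joins gateway host, namespace, service name, and API path into one URL.
model_url() {
  printf 'https://%s/%s/%s/%s\n' "$1" "$2" "$3" "${4#/}"
}

# Returns non-zero unless <host>/<namespace>/<service>/v1/models answers HTTP 200.
# Set CURL_INSECURE=1 (assumed variable name) to skip TLS verification.
smoke_test() {
  local url code
  url="$(model_url "$1" "$2" "$3" v1/models)"
  code="$(curl -s -o /dev/null -w '%{http_code}' ${CURL_INSECURE:+-k} "${url}")" || true
  [ "${code}" = "200" ]
}

# Example against a live cluster:
#   smoke_test "${INFERENCE_URL}" "${PROJECT}" qwen3-8b || echo "Endpoint not ready" >&2
```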

Step 6: Deploy Monitoring (Optional)

Deploy Prometheus and Grafana for performance monitoring (TTFT, inter-token latency, KV cache hit rates, GPU utilization):

# Retry until the apply succeeds, pausing between attempts
until oc apply -k gitops/instance/llm-d-monitoring; do sleep 10; done

# Get Grafana URL
oc get route grafana -n llm-d-monitoring -o jsonpath='{.spec.host}'

Access Grafana with default credentials: admin / admin

Quick Start Summary

Step	Command	Verification
1. Configure Gateway	`CLUSTER_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}'); helm template gitops/instance/llm-d/gateway --name-template gateway --set clusterDomain="${CLUSTER_DOMAIN}" --include-crds \| oc apply -f -`	`oc get gateway -n openshift-ingress`
2. Create namespace	`PROJECT=llm-d-demo; oc new-project ${PROJECT}; oc label namespace ${PROJECT} modelmesh-enabled=false opendatahub.io/dashboard=true`	`oc get ns ${PROJECT}`
3. Deploy model	Create override file (see Step 3), then: `helm template gitops/instance/llm-d/inference --name-template qwen3-8b -n ${PROJECT} -f gitops/instance/llm-d/inference/values.yaml -f qwen3-8b-fp8-dynamic-oci.tmp.yaml --include-crds \| oc apply -f -`	`oc get llminferenceservice -n ${PROJECT}`
4. Test endpoint	`INFERENCE_URL=$(oc get gateway openshift-ai-inference -n openshift-ingress -o json \| jq -r '.spec.listeners[] \| select(.name=="https").hostname'); curl -s https://${INFERENCE_URL}/${PROJECT}/qwen3-8b/v1/models \| jq`	JSON response

Cleanup

Resources were applied with helm template ... | oc apply -f - (no Helm release state), so remove them by piping the same template to oc delete -f -:

# Remove inference deployment
helm template gitops/instance/llm-d/inference \
  --name-template qwen3-8b -n ${PROJECT} \
  -f gitops/instance/llm-d/inference/values.yaml \
  -f qwen3-8b-fp8-dynamic-oci.tmp.yaml \
  --include-crds | oc delete -f -

# Remove gateway
CLUSTER_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
helm template gitops/instance/llm-d/gateway \
  --name-template gateway \
  --set clusterDomain="${CLUSTER_DOMAIN}" \
  --include-crds | oc delete -f -

# Delete namespace
oc delete ns ${PROJECT}

To remove only the LLMInferenceService and leave the gateway in place:

oc delete llminferenceservice qwen3-8b -n ${PROJECT}

Appendix A — Quick-Reference Commands

# List operator CSVs that have not reached Succeeded
oc get csv -A | grep -v Succeeded

# Watch RHOAI pods
oc get pods -n redhat-ods-applications -w

# Check llm-d CRD availability
oc get crd | grep llminference

# Describe a failing LLMInferenceService
oc describe llminferenceservice <name> -n <namespace>

# Check gateway status
oc get gateway,httproute -n openshift-ingress

# Stream scheduler logs
oc logs -f -l app.kubernetes.io/component=llminferenceservice-router-scheduler -n <namespace>
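In automation, the CSV check is more useful when it returns a non-zero status if anything is unhealthy. The sketch below assumes the phase is the last column of the oc get csv -A listing; unhealthy_csvs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: unhealthy_csvs is a hypothetical helper.
# Reads `oc get csv -A --no-headers` output on stdin, prints "<namespace>/<name>"
# for each CSV whose last column (phase) is not Succeeded, and exits 1 if any exist.
unhealthy_csvs() {
  awk '$NF != "Succeeded" { print $1 "/" $2; bad = 1 } END { exit bad }'
}

# Example against a live cluster:
#   oc get csv -A --no-headers | unhealthy_csvs || echo "Some operators are not ready" >&2
```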

Appendix B — Troubleshooting

Symptom	Likely Cause	Resolution
`LLMInferenceService` stuck in `Not Ready`	Controller pods not running	Check `odh-model-controller` and `kserve-controller-manager` pods in `redhat-ods-applications`
Gateway not `PROGRAMMED`	Connectivity Link CRDs missing or Authorino not running	Verify `oc get authpolicies.kuadrant.io` and Authorino pod status
`resource mapping not found` during helm apply	CRDs not yet established	Re-run `oc wait --for=condition=Established crd/...` before applying
InstallPlan stuck pending	Manual approval required	`oc patch installplan <NAME> -n openshift-operators --type merge -p '{"spec":{"approved":true}}'`
GPU nodes not scheduling	NFD labels missing	Check `oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true`
cert-manager webhook errors	cert-manager pods not ready	Wait for all 3 cert-manager pods (controller, cainjector, webhook) to be Ready
No hardware profiles in RHOAI dashboard	`kueue.openshift.io/managed=true` on namespace but Kueue not installed or no `Queue`-type profiles exist	Either remove the label (`oc label namespace <ns> kueue.openshift.io/managed-`) or create `Queue`-type hardware profiles with a matching LocalQueue
Hardware profiles missing after toggling `disableKueue`	Dashboard does not reload config automatically	Restart the dashboard: `oc rollout restart deployment/rhods-dashboard -n redhat-ods-applications`
model-catalog API returns 500 errors	PostgreSQL schema empty (migrations did not apply)	Restart model-catalog: `oc rollout restart deployment/model-catalog -n rhoai-model-registries`
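
For the `resource mapping not found` row, the wait can be applied to every CRD whose name matches a pattern instead of naming each CRD by hand. The following is a sketch under stated assumptions: the function name `wait_for_crds` and the 120s default timeout are hypothetical, and the pattern `llminference` is the one used in Appendix A.

```shell
# Hypothetical helper: waits until every CRD whose name matches a pattern
# reports the Established condition. Stops at the first CRD that times out.
wait_for_crds() {
  local pattern="$1" timeout="${2:-120s}" crds crd
  # `oc get crd -o name` prints one <resource>/<name> entry per line.
  crds=$(oc get crd -o name | grep -- "$pattern") || {
    echo "no CRD names match '$pattern'" >&2
    return 1
  }
  for crd in $crds; do
    oc wait --for=condition=Established "$crd" --timeout="$timeout" || return 1
  done
}
```

Usage: `wait_for_crds llminference` before re-running the helm template and oc apply pipeline. A non-zero return means either no CRD matched or one did not become Established within the timeout.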

Appendix C — Reference Links

Resource	URL
RHOAI 3.3 Documentation	https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/3.3
Supported Configurations 3.x	https://access.redhat.com/articles/rhoai-supported-configs-3.x
Supported Hardware Configurations	https://docs.redhat.com/en/documentation/red_hat_ai/3/html/supported_product_and_hardware_configurations/index
llm-d Release Component Versions	https://access.redhat.com/articles/7136620
NVIDIA GPU Operator on OCP	https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html
cert-manager on OpenShift	https://docs.openshift.com/container-platform/4.20/security/cert_manager_operator/index.html
ocp-secured-integration (cert-manager GitOps)	https://github.com/alvarolop/ocp-secured-integration
RHOAI GitOps reference	https://github.com/alvarolop/rhoai-gitops
llm-d upstream project	https://github.com/llm-d/llm-d


