From 0a624197a0ad3b6c568ca15ee61b567ffdf7a2f2 Mon Sep 17 00:00:00 2001 From: Vitaliy Emporopulo Date: Thu, 7 Mar 2024 15:05:17 +0200 Subject: [PATCH] Document DRA driver installation on OpenShift Signed-off-by: Vitaliy Emporopulo --- demo/clusters/openshift/README.md | 53 +++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) create mode 100644 demo/clusters/openshift/README.md diff --git a/demo/clusters/openshift/README.md b/demo/clusters/openshift/README.md new file mode 100644 index 000000000..c1215e8c4 --- /dev/null +++ b/demo/clusters/openshift/README.md @@ -0,0 +1,53 @@ +# Running on Red Hat OpenShift + +This document covers OpenShift-specific requirements for deploying the driver. + +## OpenShift Version + +Ensure you have OpenShift 4.19 or later. +For installation methods, see [OpenShift documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/latest/html/installation_overview/ocp-installation-overview). + +## `TechPreviewNoUpgrade` Feature Set + +On OpenShift 4.20 (Kubernetes 1.33) and earlier, enable DRA through the `TechPreviewNoUpgrade` feature set (includes the `DynamicResourceAllocation` feature gate), either during installation or post-install. +See [Enabling features using FeatureGates](https://docs.redhat.com/en/documentation/openshift_container_platform/latest/html/nodes/working-with-clusters#nodes-cluster-enabling-features-about_nodes-cluster-enabling). + +**WARNING:** Enabling `TechPreviewNoUpgrade` disables future cluster upgrades and is irreversible. + +## Scheduler Profile + +On OpenShift 4.19, enable DRA in the cluster scheduler: + +```console +$ oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster +``` + +## NVIDIA GPU Operator + +Install the NVIDIA GPU Operator for OpenShift. +Ensure you have version 25.10.0 or later. +For installation steps, see [NVIDIA GPU Operator on Red Hat OpenShift Container Platform](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html). + +Ensure CDI (Container Device Interface) mode is enabled in the `ClusterPolicy`: + +```yaml +cdi: + enabled: true +``` + +## NVIDIA GPU Driver Location + +On OpenShift, the location of the driver provided by the GPU operator differs from the default. +Set the following value when installing the Helm chart: + +```yaml +nvidiaDriverRoot: /run/nvidia/driver +``` + +Example using `--set`: + +```console +$ helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \ + <...> + --set nvidiaDriverRoot=/run/nvidia/driver +```