Installing the Red Hat OpenShift AI add-on
Follow the steps to install the OpenShift AI add-on to an existing cluster.
Want to deploy the OpenShift AI operator on a new cluster? Try the OpenShift AI on IBM Cloud deployable architecture.
Minimum requirements
Review the following considerations before setting up the add-on.
- Review the Supported cluster add-on versions and determine which add-on version to install in your cluster.
- Your cluster must have at least 2 worker nodes. Each worker node must have a minimum of 8vCPU and 32GB memory.
- Your worker nodes must use the RHCOS operating system.
- You must allow outbound traffic from your cluster to install the required operators.
Supported versions
Review the supported OpenShift AI add-on versions and the corresponding OpenShift AI and Red Hat OpenShift on IBM Cloud versions.
| OpenShift AI add-on version | Red Hat OpenShift AI version | Supported Red Hat OpenShift on IBM Cloud versions |
|---|---|---|
| 418 | 2.22 | 4.18, 4.19 |
| 417 | 2.22 | 4.17, 4.18 |
| 416 | 2.22 | 4.16, 4.17 |
Considerations
- To use all the capabilities provided by OpenShift AI, at least 1 GPU is recommended.
- Your cluster can have a mix of GPU and non-GPU nodes. However, if you use this configuration, make sure to deploy your app on a GPU node to leverage its resources.
- Beginning with version 2.19.0 of OpenShift AI Operator, KServe is available in either Advanced or Standard mode. By default, when you install KServe using the OpenShift AI using the IBM Cloud add-on, it is installed in Standard mode. If you want to use Advanced mode, you must complete the optional steps after installing the add-on.
Before you begin
-
Optional: If you don't already have one, create a VPC Public Gateway.
-
If you want to use the OpenShift Pipelines, Node Feature Discovery, or NVIDIA GPU operators with the OpenShift AI add-on, you must disable outbound traffic protection. If you do not want to use those operators, skip this step.
Disabling outbound traffic protection permits all external network connections. See Managing outbound traffic protection in VPC clusters for more information.
ibmcloud oc vpc outbound-traffic-protection disable --cluster CLUSTER -
Enable OperatorHub on your cluster.
oc patch operatorhub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": false}]'
Before you begin
-
Optional: If you don't already have one, create a VPC Public Gateway.
-
If you want to use the OpenShift Pipelines, Node Feature Discovery, or NVIDIA GPU operators with the OpenShift AI add-on, you must disable outbound traffic protection. If you do not want to use those operators, skip this step.
Disabling outbound traffic protection permits all external network connections. See Managing outbound traffic protection in VPC clusters for more information.
- In the console, navigate to your Clusters page and click the relevant cluster.
- On the Overview page for the cluster, find the Networking section and select the Outbound traffic protection disabled option.
-
Enable OperatorHub on your cluster.
oc patch operatorhub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": false}]'
Step 1: Choose customization options
You can enhance your Red Hat OpenShift AI projects by specifying different options to include with your add-on installation, such as data pipelines for building portable machine learning workflows or tools for managing and scaling your resources. You can also customize upgrade policies and deletion policies. For descriptions of each available option, see OpenShift AI customization options. If you do not include a specific option when you install the add-on, the default value applies. These options are managed by the Red Hat OpenShift AI platform.
In the CLI, run the command to list all options. In the UI, these options are listed in the Capabilities section during installation.
ibmcloud oc cluster addon options --addon openshift-ai
Step 2: Review additional recommended operators
You can choose to also install additional operators that are recommended for the use of certain OpenShift AI features. If they are not already installed on your cluster, you can choose to include them in the add-on installation. Or, you can install them at anytime by using OperatorHub or by following the operator-specific installation steps. To use these operators, you must disable outbound traffic protection for your cluster.
You are responsible for managing these operators, including but not limited to updating, monitoring, recovery, and re-installation.
The following operators are recommended.
Some of these operators might include additional customizations that you can choose to specify when you install the add-on. Review the list of customizations available for the recommended operators.
Step 3: Install the add-on in the CLI
Run the command to install the Red Hat OpenShift AI add-on. Specify customizations with the format --parameter PARAM=VALUE. For example,
to include the Data Science Pipelines option, specify --parameter oaiDataSciencePipelines=Managed.
To include the recommended operators when installing the add-on with the CLI, specify the following options when you run the installation command.
- OpenShift Pipelines:
--parameter pipelineEnabled=true - Node Feature Discovery:
--parameter nfdEnabled=true - NVIDIA GPU Operator:
--parameter nvidiaEnabled=true
For a list of available versions, see Supported versions.
Installation command.
ibmcloud oc cluster addon enable openshift-ai --cluster CLUSTER [-f] [--param PARAM] [-q] [--version VERSION]
Example command to install the add-on with automatic minor and patch updates, CodeFlare, and KServe enabled.
ibmcloud oc cluster addon enable openshift-ai --cluster CLUSTER --param oaiInstallPlanApproval=Automatic --param oaiCodeflare=Managed --param oaiKserve=Managed
Step 2: Review the additional recommended operators
You can choose to also install additional operators that are recommended for the use of certain OpenShift AI features. If they are not already installed on your cluster, you can choose to include them in the add-on installation. Or, you can install them at anytime by using OperatorHub or by following the operator-specific installation steps. To use these operators, you must disable outbound traffic protection for your cluster.
You are responsible for managing these operators, including but not limited to updating, monitoring, recovery, and re-installation.
The following operators are recommended.
Some of these operators might include additional customizations that you can choose to specify when you install the add-on. Review the list of customizations available for the recommended operators.
Step 3: Install the add-on in the UI
Install the Red Hat OpenShift AI add-on with the UI.
-
Navigate to your cluster page and click the relevant cluster.
-
On the cluster details page, find the Add-ons section. Find the Red Hat OpenShift AI option and click Install.
-
In the Capabilities section, review the description of the available add-on customization options and enable the options you want to include with the installation.
-
In the Additional recommended operators section, click to expand each operator and select the customization options you want to include. These additional operators and customizations are recommended for certain Red Hat OpenShift AI features. You can choose to install these options later by using OperatorHub or by following the operator-specific installation steps.
You are responsible for managing these operators, including but not limited to updating, monitoring, recovery, and re-installation.
-
Click Install.
Step 4: Access the OpenShift AI dashboard
After you install the OpenShift AI add-on, you can access the OpenShift AI dashboard.
- From your cluster overview page, click OpenShift web console.
- From the Application launcher
, select the Red Hat OpenShift AI dashboard option.
- If prompted, click
Log in with OpenShift.
Optional: Setting up KServe in Advanced mode
If you want to use KServe in Advanced mode, you must complete the following steps.
- Install the OpenShift Serverless Operator from OperatorHub.
- Install the OpenShift Service Mesh Operator from OperatorHub.
- Set the service mesh management state to
Managedin your Data Science Cluster Initialization CR.serviceMesh: controlPlane: metricsCollection: Istio name: data-science-smcp namespace: istio-system managementState: Managed - Set the KServe serving management state to
Managedin your Data Science Cluster custom resource.kserve: managementState: Managed serving: managementState: Managed name: knative-serving
OpenShift AI customization options
Review the customization options available for the OpenShift AI add-on. These operators are managed by the Red Hat OpenShift AI platform.
To include an option when you install the OpenShift AI add-on with the CLI, include the option with the --parameter PARAM=VALUE format
when you run the ibmcloud oc cluster addon enable openshift-ai. For example, to install the add-on with the Data Science Pipelines option, specify --parameter oaiDataSciencePipelines=Managed.
To include an option when you install the OpenShift AI add-on with the UI, click to enable the option when prompted. These options are included in the Capabilities section.
| Customization | CLI Parameter | Description | Values | Default value |
|---|---|---|---|---|
| OpenShift AI platform minor and patch updates approval policy | oaiInstallPlanApproval |
Apply minor and patch updates automatically or manually. | Automatic or Manual |
Automatic |
| OpenShift AI deletion policy | oaiDeletePolicy |
Retain or delete any operators or components installed by the add-on if the add-on is removed. | Retain or Delete |
Retain |
| Open Data Hub Dashboard | oaiDashboard |
Enable or disable the component. If enabled, it is managed by OpenShift AI platform. | Managed to enableRemoved to disable |
Managed (enabled) |
| Kueue | oaiKueue |
Enable or disable the component. If enabled, it is managed by OpenShift AI platform. | Managed to enableRemoved to disable |
Managed (enabled) |
| CodeFlare | oaiCodeflare |
Enable or disable the component. If enabled, it is managed by OpenShift AI platform. | Managed to enableRemoved to disable |
Managed (enabled) |
| ModelMesh Serving | oaiModelmeshserving |
Enable or disable the component. If enabled, it is managed by OpenShift AI platform. | Managed to enableRemoved to disable |
Managed (enabled) |
| Workbench | oaiWorkbenches |
Enable or disable the component. If enabled, it is managed by OpenShift AI platform. | Managed to enableRemoved to disable |
Managed (enabled) |
| Data Science Pipelines | oaiDataSciencePipelines |
Enable or disable the component. If enabled, it is managed by OpenShift AI platform. | Managed to enableRemoved to disable |
Managed (enabled) |
| KServe | oaiKserve |
Enable or disable the component. If enabled, it is managed by OpenShift AI platform. | Managed to enableRemoved to disable |
Managed (enabled) |
| Ray | oaiRay |
Enable or disable the component. If enabled, it is managed by OpenShift AI platform. | Managed to enableRemoved to disable |
Managed (enabled) |
Additional recommended operators
Review the recommended operators and the optional customizations you can include during installation. You are responsible for managing these operators, including but not limited to updating, monitoring, recovery, and re-installation.
To include a customization for an operator when you install the OpenShift AI add-on with the CLI, include the option with the --parameter PARAM=VALUE format when you run the ibmcloud oc cluster addon enable openshift-ai. For example, to include the NVIDIA GPUDirect Storage customization for the NVIDIA operator, specify --parameter nvidiaGpuDirectStorageEnabled=true.
To include a customization for an operator when you install the OpenShift AI add-on with the UI, click to enable the option when prompted. Note that there are additional NVIDIA operators with default settings that can only be changed in the CLI, or by editing the ClusterPolicy.
| Customization | CLI Parameter | Description | Value | Default setting |
|---|---|---|---|---|
| Node Feature Discovery deletion policy | nfdDeletePolicy |
Retain or delete the operator if the OpenShift AI add-on is removed. | Retain (default) or Delete |
Retain |
| NVIDIA GPU Operator deletion policy | nvidiaDeletePolicy |
Retain or delete the operator if the OpenShift AI add-on is removed. | Retain (default) or Delete |
Retain |
| NVIDIA CUDA post-installation testing | nvidiaCudaTest |
Enable NVIDIA CUDA testing. | true (enabled, default)false (disabled) |
false (disabled) |
| Pipeline Operator deletion policy | pipelineDeletePolicy |
Retain or delete the operator if the OpenShift AI add-on is removed. | Retain (default) or Delete |
The following NVIDIA GPU operators apply with the default settings. You can only change these settings during installation if you install the add-on with the CLI. After installation, you can change these settings, in either the UI or the CLI, by editing the ClusterPolicy.
| Customization | CLI Parameter | Description | CLI values | UI default setting |
|---|---|---|---|---|
| NVIDIA Sandbox Workloads | nvidiaSandboxWorkloads |
Enable management of additional operands required for sandbox workloads. | true (enabled, default)false (disabled) |
Enabled |
| NVIDIA vGPU Manager | nvidiaVgpuManagerEnabled |
Enable NVIDIA vGPU Manager. | true (enabled, default)false (disabled) |
Enabled |
| NVIDIA GPUDirect Storage | nvidiaGpuDirectStorageEnabled |
Enable GPUDirect Storage. | true (enabled, default)false (disabled) |
Enabled |
| NVIDIA Node Status Exporter | nvidiaNodeStatusExporterEnabled |
Enable Node Status Exporter. | true (enabled, default)false (disabled) |
Enabled |
| NVIDIA Virtual Function I/O Manager | nvidiaVfioManagerEnabled |
Enable VFIO Manager for configuration to deploy VFIO-PCI. | true (enabled, default)false (disabled) |
Enabled |
| NVIDIA Sandbox Device Plug-in | nvidiaSandboxDevicePluginEnabled |
Enable NVIDIA Sandbox Device Plug-in. | true (enabled, default)false (disabled) |
Enabled |
| NVIDA Multi Instance GPU Manager | nvidiaMigManagerEnabled |
Enable NVIDIA Multi Instance GPU Manager. | true (enabled, default)false (disabled) |
Enabled |
| NVIDIA Data Center GPU Manager Hostengine Deployment | nvidiaDcgmEnabled |
Enable deployment of the NVIDIA Data Center GPU Manager Hostengine as a separate pod. | true (enabled, default)false (disabled) |
Enabled |
| NVIDIA vGPU Device Manager | nvidiaVgpuDeviceManagerEnabled |
Enable NVIDIA vGPU Device Manager. | true (enabled, default)false (disabled) |
Enabled |
What's next?
- Try out a tutorial for using OpenShift AI for fraud detection
- See information on managing the OpenShift AI add-on.
- Make sure you understand the update process for the OpenShift AI add-on.