Readme file
Introduction
IBM Cloud Pak for Data is an end-to-end data and AI platform that you can use to modernize how your organization collects, organizes, and analyzes data to infuse AI into your business. Learn more about Cloud Pak for Data.
Features
The features that you can use depend on the services that you install. You can choose which services to install when you install Cloud Pak for Data on IBM Cloud.
- Apache Spark
Use the Analytics Engine powered by Apache Spark as a compute engine to run analytical and machine learning jobs.
- Cognos Dashboards
Use sophisticated visualizations in an analytics project to identify patterns in your data so that you can make timely and effective decisions.
- Data Virtualization
Create data sets from disparate data sources so that you can query and use the data as if it came from a single source. When you provision the service, select the "You must check this box if you updated the kernel semaphore parameter" check box, and use the recommended storage class mentioned in the storage section of the documentation below.
- Db2 Data Gate
Extract, load, and synchronize your mission-critical data from Db2 for z/OS to Cloud Pak for Data for quick access by your new, high volume, read-only transactional, and analytic applications.
- Db2 Data Management Console
Administer, monitor, manage, and optimize the performance of IBM Db2 for Linux, UNIX and Windows databases.
- Db2 Warehouse
Take advantage of in-memory data processing and in-database analytics in an analytics data warehouse that supports automated scaling.
- RStudio Server
Use an integrated development environment for working with R in Watson Studio to create R Shiny applications.
- Scheduling
The scheduling service offers enhancements over the default Kubernetes scheduler, including quota enforcement, co-scheduling of pods, and GPU sharing.
- Watson Knowledge Catalog
Know your data inside and out. Ensure that your data is high quality, aligns with business objectives, and complies with regulations.
- Watson Studio
Unearth the meaning in your data. Build custom models and infuse your business with AI and machine learning.
- Watson Machine Learning
Build analytical models and neural networks that are trained with your data. Then, deploy them into production at scale.
- Watson Machine Learning Accelerator
A deep learning platform that data scientists can use to build, train, and deploy deep learning models.
- Watson OpenScale
Infuse your AI with trust and transparency. Understand how your AI models make decisions to detect and mitigate bias.
Tip: To install a service later in the namespace where Cloud Pak for Data is already deployed, either return to the Deployment values section and set the appropriate parameter to true, or select the service from the Services catalog and follow the installation instructions for that service.
For more information, see Installing services.
Details
License
To deploy Cloud Pak for Data, you must already have a valid license. If your organization has already purchased a valid license, your account administrator must bind the entitlement to your IBM Cloud account before you can assign an entitlement by using the Create tab. If your organization has not yet purchased a license, contact your IBM sales representative. For more information, see Licenses and entitlements.
Prerequisites
To install Cloud Pak for Data on IBM Cloud, you must have a Red Hat OpenShift cluster on IBM Cloud at Version 4.6.1 or a later fix. The automated installation from IBM Cloud Catalog is not supported on an IBM Red Hat OpenShift Satellite cluster. For more information, see Getting started with Red Hat OpenShift on IBM Cloud.
Roles
To install Cloud Pak for Data on IBM Cloud, a user must have the following IAM Roles:
- Account Management > License and Entitlement > Platform Editor role - To assign license
- IAM Services > Schematics > Service Manager role in any resource group - To create workspace
- Classic Infrastructure > Services > Storage Manage, Classic Infrastructure > Account > Add/Upgrade Storage - To modify image registry volume
- IAM Services > Kubernetes Service > Service Manager role - To run pre-install script
- IAM Services > Kubernetes Service > Service Writer role - To run Install script
Storage
The following storage options are supported to install Cloud Pak for Data:
- Single zone classic cluster with storage IBM Cloud File Storage
- Single zone classic cluster with storage IBM Cloud Portworx Enterprise
- Multi zone classic cluster with storage IBM Cloud Portworx Enterprise
- Single zone VPC Gen2 cluster with storage IBM Cloud Portworx Enterprise
- Multi zone VPC Gen2 cluster with storage IBM Cloud Portworx Enterprise
If you are using a single zone classic cluster with IBM Cloud File Storage, IBM Cloud accounts have a default quota of 250 storage volumes. Before you start the installation, ensure that each account has enough storage volumes for Cloud Pak for Data to be installed. For more information, see How many volumes can be ordered?
If you are using a classic cluster with Portworx storage, the cluster must be configured with bare metal worker nodes, because Portworx recommends a 10 Gbps network and virtual machines in a classic cluster provide only 1 Gbps.
If you are using Portworx storage, you must configure IBM Cloud Portworx Enterprise on the cluster before you start the Cloud Pak for Data installation. For more information, see Configure Portworx. You must use the 10 IOPS/GB option for the Endurance block storage used to configure Portworx.
Modify the image registry volume size before installation. For more information, see Complete the Preinstallation on the Create tab. If the OpenShift cluster image registry has images of other applications, you might need to increase the image registry volume size to more than 200 GB.
You must also ensure that your cluster has sufficient resources and is configured to use supported storage.
Resources Required
By default, you provision a 3-node Red Hat OpenShift cluster. Each node is automatically provisioned with a 25 GB SSD primary disk and 100 GB SSD secondary disk. This disk storage is different from persistent storage.
To install only the Cloud Pak for Data control plane, you need 6 VPCs. The minimum recommendation for Cloud Pak for Data is 16 cores, 64 GB RAM, and 1 TB of persistent storage.
This minimum recommendation is not sufficient to install all of the services. You must ensure that you have sufficient resources for the services that you plan to install.
The installation does not verify whether there are sufficient resources on the cluster to install Cloud Pak for Data. If you are running other applications on your Red Hat OpenShift cluster, ensure that you have sufficient resources on the cluster before you install Cloud Pak for Data.
For more information, see System Requirements for IBM Cloud Pak for Data.
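Because the installation does not verify cluster capacity, a quick pre-check can help. The following Python sketch compares the summed worker node capacity against the minimum recommendation stated above (16 cores, 64 GB RAM, 1 TB persistent storage). The WorkerNode type, the capacity_shortfalls function, and the example node sizes are hypothetical and shown for illustration only; they are not part of Cloud Pak for Data or IBM Cloud tooling, and the check ignores the additional requirements of individual services.

```python
from dataclasses import dataclass

# Minimum recommendation quoted in this readme.
MIN_CORES = 16
MIN_RAM_GB = 64
MIN_STORAGE_TB = 1


@dataclass
class WorkerNode:
    """Hypothetical description of one worker node (illustration only)."""
    cores: int
    ram_gb: int


def capacity_shortfalls(nodes, persistent_storage_tb):
    """Return a list of human-readable shortfalls; an empty list means the minimum is met."""
    total_cores = sum(node.cores for node in nodes)
    total_ram_gb = sum(node.ram_gb for node in nodes)

    shortfalls = []
    if total_cores < MIN_CORES:
        shortfalls.append(f"cores: have {total_cores}, need {MIN_CORES}")
    if total_ram_gb < MIN_RAM_GB:
        shortfalls.append(f"RAM: have {total_ram_gb} GB, need {MIN_RAM_GB} GB")
    if persistent_storage_tb < MIN_STORAGE_TB:
        shortfalls.append(
            f"persistent storage: have {persistent_storage_tb} TB, need {MIN_STORAGE_TB} TB"
        )
    return shortfalls


if __name__ == "__main__":
    # Example: a 3-node cluster with assumed node sizes of 4 cores and 16 GB RAM each.
    cluster = [WorkerNode(cores=4, ram_gb=16) for _ in range(3)]
    for line in capacity_shortfalls(cluster, persistent_storage_tb=1):
        print(line)
```

Meeting this minimum is only a starting point. Add the requirements of every service that you plan to install, and of any other applications on the cluster, before you rely on the result.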
Supported storage
When you install your Red Hat OpenShift cluster, IBM Cloud File Storage is set up by default. If you choose to use Portworx storage, you must configure Portworx before you start the Cloud Pak for Data installation.
For more information, see Storing data on classic IBM Cloud File Storage and Storing data on Portworx.
You can choose one of the following options as storage for Cloud Pak for Data:
- EnduranceFileStorage - This option uses the storage class ibmc-file-gold-gid to install Cloud Pak for Data. For more information, see Endurance Storage. You can use the same storage class while provisioning the instances of services.
- PerformanceFileStorage - This option uses the storage class ibmc-file-custom-gold-gid to install Cloud Pak for Data. For more information, see Performance Storage. You can use the same storage class while provisioning the instances of services.
- Portworx - This option uses the storage class mentioned in Storage considerations. You can choose the storage class mentioned in the service instance creation documentation when you provision instances of services.
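The relationship between the storage options above and their storage classes can be summarized as a small lookup. This is a minimal Python sketch for use in your own automation; the storage_class_for function is a hypothetical helper, not an IBM API. Only the two file storage class names come from this readme. The Portworx storage class is deliberately not filled in, because this readme defers it to Storage considerations.

```python
# Storage option -> storage class, as listed in this readme.
STORAGE_CLASS_BY_OPTION = {
    "EnduranceFileStorage": "ibmc-file-gold-gid",
    "PerformanceFileStorage": "ibmc-file-custom-gold-gid",
}


def storage_class_for(option):
    """Return the storage class that the given storage option uses."""
    if option == "Portworx":
        # This readme does not name the Portworx storage class.
        raise LookupError(
            "Portworx storage classes are listed in Storage considerations, not in this readme"
        )
    try:
        return STORAGE_CLASS_BY_OPTION[option]
    except KeyError:
        raise ValueError(f"unsupported storage option: {option!r}") from None


if __name__ == "__main__":
    print(storage_class_for("EnduranceFileStorage"))
```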
Cloud Pak for Data uses dynamic provisioning. You must have sufficient persistent storage for the services that you plan to install.
For more information, see System Requirements for IBM Cloud Pak for Data.
Configuration
When you install Cloud Pak for Data, you can specify which services to install on the Cloud Pak for Data control plane.
- To install a service, set the appropriate parameter to true in the Deployment values section.
- After you install Cloud Pak for Data, log in to the web console with the admin username. To get the default password, connect to your OpenShift cluster and run the following command:
oc extract secret/admin-user-details --keys=initial_admin_password --to=-
- Launch the web console from the workspace by clicking Offering Dashboard.
For more information, see Getting started with Cloud Pak for Data.
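If you script the post-installation steps, the password lookup described above can be wrapped in a few lines of Python. This is a sketch under stated assumptions: the oc CLI is on your PATH, you are already logged in to the cluster, and the current project is the one where Cloud Pak for Data is installed. The get_initial_admin_password function and its run parameter are hypothetical names introduced here; only the oc command itself comes from this readme.

```python
import subprocess

# The command documented in this readme, split into arguments.
OC_EXTRACT_CMD = [
    "oc",
    "extract",
    "secret/admin-user-details",
    "--keys=initial_admin_password",
    "--to=-",
]


def get_initial_admin_password(run=subprocess.run):
    """Run the documented oc command and return the password it writes to standard output.

    The run parameter exists only so that the command runner can be replaced,
    for example in a test. By default it is subprocess.run.
    """
    result = run(OC_EXTRACT_CMD, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Avoid writing the password to shared logs; printing here is for illustration only.
    print(get_initial_admin_password())
```

With check=True, a failed oc call raises subprocess.CalledProcessError instead of returning an empty password.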
Cloud Pak for Data installation through IBM Cloud Catalog uses Express installations. By default, iam_integration is set to false and cert_manager is set to true. For more information, see Express installations.
Limitations
For more information, see Limitations.
Documentation
Documentation for IBM Cloud Pak for Data Version 4.0.0 is available on IBM Knowledge Center.