IBM Cloud Docs
Build, deploy, test and monitor a predictive machine learning model

This tutorial may incur costs. Use the Cost Estimator to generate a cost estimate based on your projected usage.

This tutorial walks you through building a predictive machine learning model, deploying the generated model as an API that your applications can use, and testing the model, all in an integrated and unified self-service experience on IBM Cloud. You then monitor the deployed model with IBM Watson OpenScale.

In this tutorial, the Iris flower data set is used for creating a machine learning model to classify species of flowers.

In the terminology of machine learning, classification is considered an instance of supervised learning, i.e. learning where a training set of correctly identified observations is available.

Watson Studio provides you with the environment and tools to solve your business problems by collaboratively working with data. You can choose the tools you need to analyze and visualize data, to cleanse and shape data, to ingest streaming data, or to create and train machine learning models.

Objectives

  • Import data to a project.
  • Build a machine learning model.
  • Deploy the model and try out the API.
  • Test a machine learning model.
  • Monitor the deployed model.
  • Evaluate the deployed model with new data.

Architecture Diagram
Architecture diagram of the tutorial

  1. The admin uploads a CSV file from a local machine.
  2. The uploaded CSV file is stored in IBM Cloud Object Storage service as a dataset.
  3. The dataset is then used to build and deploy a machine learning model. The deployed model is exposed as an API (scoring-endpoint).
  4. The user makes an API call to predict the outcome with the test data.
  5. The deployed machine learning model is monitored for quality, accuracy and other key parameters with the test data.

Import data to a project

A project is how you organize your resources to achieve a particular goal. Your project resources can include data, collaborators, and analytic tools like Jupyter notebooks and machine learning models.

In a project, you can add data and open a data asset in Data Refinery to cleanse and shape it.

Create a project

  1. From the catalog, create a Watson Studio service instance:
    1. Select a region
    2. Select a Lite pricing plan
    3. Change the Service name to watson-studio-tutorial
    4. Select a resource group and click Create
  2. Click the Launch in twisty and select IBM watsonx.
  3. Create a project by clicking + Create a new project in the Projects section.
  4. Provide iris_project as the project name.
  5. Under Define storage, add a new Object Storage service instance.
  6. Click Create. Your new project opens and you can start adding resources to it.

Import data

As mentioned earlier, you will be using the Iris data set. The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository. This small dataset is often used for testing out machine learning algorithms and visualizations. The aim is to classify Iris flowers among three species (Setosa, Versicolor or Virginica) from measurements of length and width of sepals and petals. The iris data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant.

Iris Example

A sepal is a part of the flower of angiosperms (flowering plants). Usually green, sepals typically function as protection for the flower in bud, and often as support for the petals when in bloom. Petals are modified leaves that surround the reproductive parts of flowers. They are often brightly colored or unusually shaped to attract pollinators. (Source: https://en.wikipedia.org/wiki/Iris_flower_data_set)

Download iris_initial.csv, which contains 40 instances of each species. Make sure the downloaded file is named iris_initial.csv.
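Before uploading, you can confirm that the download is complete from any shell, including the IBM Cloud Shell used later in this tutorial. The following is a minimal sketch, not part of the official steps: count_species is a hypothetical helper, and it assumes that iris_initial.csv has one header row and stores the species label in its last column, since the file layout is not described here. For iris_initial.csv, each of the three species should be reported 40 times.

```shell
# Hypothetical helper: count rows per species in an Iris CSV file.
# Assumes the first line is a header and the species label is the last column.
count_species() {
  awk -F',' 'NR > 1 && NF > 0 {
      label = $NF
      sub(/\r$/, "", label)      # tolerate Windows line endings
      gsub(/"/, "", label)       # tolerate quoted labels
      if (label != "") counts[label]++
    }
    END { for (s in counts) print s, counts[s] }' "$1" | sort
}

# Usage: count_species iris_initial.csv
```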

  1. Select the Assets tab if not already selected.
  2. Under Data in this project, click Drop data files here or browse for files to upload.
  3. Upload the downloaded iris_initial.csv.
  4. Once added, you should see iris_initial.csv under the All assets section of the project.

Associate the Machine Learning service

  1. In the top navigation menu of iris_project, click Manage, then select the Services & integrations section on the left.
  2. Click Associate Service.
  3. If you have an existing Watson Machine Learning service instance, skip to the next step. Otherwise continue with the following steps to create a new instance.
    1. Click New service and then click on the Watson Machine Learning tile.
    2. Select the same region as the Watson Studio service and choose the Lite plan.
    3. Enter machine-learning-tutorial as the Service name and select a resource group.
    4. Click Create to provision a Machine Learning service.
  4. Check the checkbox next to the Machine Learning service and click Associate.

Build a machine learning model

  1. In the top navigation menu, click Assets.
  2. Click on New asset + and search for auto.
    1. Click the Build machine learning models automatically tile.
    2. Set the name to iris_auto.
    3. Under Watson Machine Learning service instance, confirm that the service you associated earlier is selected.
  3. Click Create.

After the experiment is created:

  1. Add training data by clicking Select data from project.

    1. Choose the Data asset under Categories and check iris_initial.csv.
    2. Click Select asset.
  2. If prompted, answer No to Create a time series analysis?

  3. Select Species as the prediction column (What do you want to predict?).

  4. Click Experiment settings.

  5. Select Data source.

  6. Under Training and holdout method, set Holdout data split to 14% by moving the slider.

  7. On the left menu, click Prediction:

    1. Set Prediction type to Multiclass classification.
    2. Set Optimized metric as Accuracy.
    3. Click on Save settings.
  8. Click on Run experiment.

  9. The AutoAI experiment may take up to 5 minutes to select the right algorithm for your model.

    Each model pipeline is scored for a variety of metrics and then ranked. The default ranking metric is the area under the ROC curve for binary classification models, accuracy for multiclass classification models, and the root mean squared error (RMSE) for regression models. The highest-ranked pipelines are displayed in a leaderboard, where you can view more information about them and, after reviewing them, save selected model pipelines.
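For a sense of scale: iris_initial.csv has 120 rows (40 per species), so a 14% holdout sets aside roughly 17 rows and leaves about 103 for training. The arithmetic below is a back-of-the-envelope sketch; holdout_split is a hypothetical helper that assumes rounding to the nearest row, and AutoAI's actual rounding and sampling may differ.

```shell
# Hypothetical helper: estimate training/holdout row counts for a given
# holdout percentage, rounding the holdout size to the nearest whole row.
holdout_split() {
  local rows="$1" pct="$2"
  local holdout=$(( (rows * pct + 50) / 100 ))   # integer round-half-up
  local training=$(( rows - holdout ))
  echo "holdout=$holdout training=$training"
}

holdout_split 120 14   # 3 species x 40 rows; prints: holdout=17 training=103
```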

When the experiment finishes running:

  1. Scroll down to the Pipeline leaderboard.
  2. Click a pipeline to view more detail about its metrics and performance. When finished, dismiss the details by clicking the X.
  3. Next to the pipeline with Rank 1, click Save as:
    1. Select Model.
    2. Keep the default name.
    3. Click Create.
  4. From the received notification, click View in project.

Deploy and test your model

In this section, you deploy the saved model and test the deployed model.

  1. Using the breadcrumb navigation, click on iris_project.
  2. In the Assets tab open Models on the left.
  3. In the Models table, locate the model, click the hamburger menu, and choose Promote to space. You use deployment spaces to deploy models and manage your deployments.
    1. Set the Name to iris_deployment_space.
    2. In the corresponding dropdown, select the Object Storage service used in previous steps.
    3. In the corresponding dropdown, select the machine-learning-tutorial service.
    4. Click Create.
  4. Click on Promote.
  5. From the received notification, navigate to the deployment space.

In the Deployments > iris_deployment_space:

  1. Click on the name of the model you just created.
  2. Click the New deployment button.
  3. Select Online as the Deployment type, provide iris_deployment as the name and then click Create.
  4. On the Deployments tab, once the status changes to Deployed, click the name in the table. The properties of the deployed web service for the model are displayed.

Test the deployed model

  1. On the Test tab of your deployment, click the JSON input icon next to Enter input data and provide the following JSON as input:
       {
         "input_data": [{
           "fields": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
           "values": [
             [5.1,3.5,1.4,0.2], [3.2,1.2,5.2,1.7]
           ]
         }]
       }

  2. Click Predict and you should see the Prediction results in table and JSON view.
  3. You can change the input data and continue testing your model.
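Typing the JSON by hand becomes tedious for more than a few rows. The following sketch builds the same input_data payload from comma-separated feature rows on standard input. to_payload is a hypothetical helper; it assumes each row contains only the four feature values in the order sepal_length, sepal_width, petal_length, petal_width (no header, no species column) and copies the values as-is without validating them.

```shell
# Hypothetical helper: read rows of four comma-separated feature values
# from stdin and print a scoring payload in the input_data format.
to_payload() {
  awk -F',' '
    BEGIN {
      printf "{\"input_data\": [{\"fields\": [\"sepal_length\", \"sepal_width\", \"petal_length\", \"petal_width\"], \"values\": ["
    }
    NF == 4 {
      gsub(/\r/, "")                     # drop Windows carriage returns
      gsub(/ /, "")                      # drop stray spaces around values
      printf "%s[%s]", (n++ ? ", " : ""), $0
    }
    END { print "]}]}" }
  '
}

# Prints the same payload as the Test tab example, on one line.
printf '5.1,3.5,1.4,0.2\n3.2,1.2,5.2,1.7\n' | to_payload
```

You can paste the printed JSON into the Test tab, or use it as the -d argument of a cURL scoring request.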

Try out the API

Besides using the UI, you can make predictions by calling the deployment's scoring endpoint, which exposes the deployed model as an API that your applications can access.

  1. Under API reference tab of the deployment, you can see the Endpoint under Direct link and code snippets in various programming languages.

  2. Copy the Public endpoint to a text file for future reference.

  3. In a browser, launch IBM Cloud Shell and export the scoring endpoint for use in subsequent requests. Make sure you don't close this window or tab.

    export SCORING_ENDPOINT='<SCORING_ENDPOINT_FROM_ABOVE_STEP>'
    

    IBM Cloud Shell is a cloud-based shell workspace that you can access through your browser. It's preconfigured with the full IBM Cloud CLI and many plug-ins and tools that you can use to manage apps, resources, and infrastructure.

  4. To use the Watson Machine Learning REST API, you need an IBM Cloud Identity and Access Management (IAM) token. Run the following commands to retrieve the token, which already includes the Bearer prefix, and export it as IAM_TOKEN for use in subsequent API requests:

    export IAM_TOKEN=$(ibmcloud iam oauth-tokens --output JSON | jq -r .iam_token)
    echo $IAM_TOKEN
    
  5. Run the following cURL command in Cloud Shell to see the prediction results.

    curl -X POST \
    --header 'Content-Type: application/json' \
    --header 'Accept: application/json' \
    --header "Authorization: $IAM_TOKEN" \
    -d '{"input_data": [{"fields": ["sepal_length", "sepal_width", "petal_length","petal_width"],"values": [[5.1,3.5,1.4,0.2], [3.2,1.2,5.2,1.7]]}]}' \
    "$SCORING_ENDPOINT"
    

    This command comes from the cURL code snippet on the API reference tab of the deployment you created above, with the [$ARRAY_OF_INPUT_FIELDS] placeholder replaced by ["sepal_length", "sepal_width", "petal_length","petal_width"], the [$ARRAY_OF_VALUES_TO_BE_SCORED] placeholder replaced by [5.1,3.5,1.4,0.2], and the [$ANOTHER_ARRAY_OF_VALUES_TO_BE_SCORED] placeholder replaced by [3.2,1.2,5.2,1.7].
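To reuse the request from a script, you can wrap the cURL command in a shell function that refuses to run until SCORING_ENDPOINT and IAM_TOKEN are exported. This is a sketch under stated assumptions: auth_header and score are hypothetical helper names, and the standard cURL options --fail, --silent, and --show-error are added here so that an HTTP error, such as a rejected expired token, produces a non-zero exit status instead of being printed like a normal response.

```shell
# Hypothetical helper: make sure the Authorization value carries the
# "Bearer " prefix. The token exported from `ibmcloud iam oauth-tokens`
# already includes it; a raw token gets the prefix added.
auth_header() {
  case "$1" in
    "Bearer "*) printf '%s\n' "$1" ;;
    *)          printf 'Bearer %s\n' "$1" ;;
  esac
}

# Hypothetical wrapper around the scoring request.
# Usage: score '<json payload>'
score() {
  # Fail early if the exports from the previous steps are missing.
  if [ -z "${SCORING_ENDPOINT:-}" ] || [ -z "${IAM_TOKEN:-}" ]; then
    echo "score: export SCORING_ENDPOINT and IAM_TOKEN first" >&2
    return 1
  fi
  # --fail turns HTTP error responses into a non-zero exit status.
  curl --fail --silent --show-error -X POST \
    --header 'Content-Type: application/json' \
    --header 'Accept: application/json' \
    --header "Authorization: $(auth_header "$IAM_TOKEN")" \
    -d "$1" \
    "$SCORING_ENDPOINT"
}

# Example (requires the exports from steps 3 and 4):
# score '{"input_data": [{"fields": ["sepal_length", "sepal_width", "petal_length", "petal_width"], "values": [[5.1,3.5,1.4,0.2]]}]}'
```

If the call starts failing after a while, the IAM token has likely expired; rerun the export command from step 4.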

Monitor your deployed model with IBM Watson OpenScale

IBM® Watson OpenScale tracks and measures outcomes from your AI models, and helps ensure they remain fair, explainable, and compliant wherever your models were built or are running. Watson OpenScale also detects and helps correct the drift in accuracy when an AI model is in production.

For ease of understanding, the tutorial concentrates only on improving the quality (accuracy) of the AI model through Watson OpenScale service.

Provision IBM Watson OpenScale service

In this section, you will create a Watson OpenScale service to monitor the health, performance, accuracy and quality metrics of your deployed machine learning model.

  1. Create an IBM Watson OpenScale (watsonx.governance) service:
    1. Select a region, preferably the same region where you created the Machine Learning service.
    2. Choose Lite plan.
    3. Set the service name to watson-openscale-tutorial.
    4. Select a resource group.
    5. Click Create.
  2. Once the service is provisioned, click Manage on the left pane and click Launch Watson OpenScale.
  3. Click Manual setup to set up the monitors manually.

System setup

In this section, as part of preparing your model for monitoring you will set up and enable monitors for each deployment that you are tracking with IBM Watson OpenScale.

  1. Click Database (it may already be selected). The database stores your model transactions and model evaluation results.
    1. Click the Edit icon on the Database tile
    2. Choose Free lite plan database as your Database type
    3. Click Save.
  2. Click on Machine learning providers
    1. Click on Add machine learning provider and click the edit icon on the Connection tile.
    2. Select Watson Machine Learning (V2) as your service provider type.
    3. In the Deployment space dropdown, select the deployment space iris_deployment_space you created above.
    4. Leave Environment type set to Pre-production.
    5. Click Save.
  3. On the far left pane:
    1. Click the Insights dashboard icon (the first icon) to add a deployment.
    2. Click on Add to dashboard to start the wizard on the Select model location page.
      1. On the Deployment spaces tab click on the iris_deployment_space radio button
      2. Click Next
    3. On the Select deployed model page:
      1. Click iris_deployment
      2. Click Next
    4. On the Provide model information page:
      1. Data type: Numerical/categorical
      2. Algorithm type: Multi-class classification
      3. Click View summary
    5. Click Finish

The iris_deployment pre-production dashboard is now displayed.

Click Actions > Configure monitors.

  1. Click the pencil icon on the Training data tile to start the wizard.
    1. In the Select configuration method page
      1. Click Use manual setup
      2. Click Next
    2. In the Specify training data method page
      1. For Training data option choose Database or cloud storage
      2. For Location choose Cloud Object Storage
      3. For Resource instance ID and API key, run the below command in the Cloud Shell. Make sure to change the value after --instance-name to match the name of the Object Storage instance you have been using for this tutorial.
        ibmcloud resource service-key $(ibmcloud resource service-keys --instance-name "cloud-object-storage-tutorial" | awk '/WDP-Project-Management/ {print $1}')
        
      4. Copy and paste the resource_instance_id value from Credentials. It begins with crn and ends with two colons (::).
      5. Copy and paste the apikey value from Credentials without any trailing spaces.
      6. Click Connect.
      7. Select the Bucket that starts with irisproject-donotdelete-.
      8. Select iris_initial.csv from the Data set dropdown.
      9. Click Next
    3. In the Select the feature columns and label column method page
      1. The defaults should be correct. Species as the Label/Target and the rest as Features.
      2. Click Next
    4. In the Select model output method page
      1. The defaults should be correct, prediction for Prediction and probability for Probability.
      2. Click View summary
    5. Click Finish
  2. Click the pencil icon on the Model output details tile to start the wizard.
    1. In the Specify model output details method page
      1. The defaults should be correct.
      2. Click Save
  3. On the left pane, click Quality under Evaluations, and then click the edit icon on the Quality thresholds tile.
    1. In the Quality thresholds page, set the following values:
      1. Accuracy: 0.98
      2. Click Next
    2. In the Sample size page
      1. Set Minimum sample size to 10
      2. Click Save

On the left pane, click Go to model summary.

The quality monitor (previously known as the accuracy monitor) reveals how well your model predicts outcomes.
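Conceptually, a quality check compares a metric computed on labeled records against the threshold you configured; for accuracy, that metric is the fraction of records whose prediction matches the label. The sketch below mimics such a check using the values set above (accuracy threshold 0.98, minimum sample size 10). It only illustrates the arithmetic and does not reproduce Watson OpenScale's internal logic; check_accuracy is a hypothetical helper that reads label,prediction pairs from standard input.

```shell
# Hypothetical helper: compute accuracy from "label,prediction" lines on stdin
# and compare it with a threshold, honoring a minimum sample size.
# Usage: check_accuracy THRESHOLD MIN_SAMPLE < pairs.csv
check_accuracy() {
  awk -F',' -v threshold="$1" -v min_sample="$2" '
    { sub(/\r$/, "") }                   # tolerate Windows line endings
    NF >= 2 {
      total++
      if ($1 == $2) correct++            # prediction matches the label
    }
    END {
      if (total == 0 || total < min_sample) {
        printf "insufficient sample: %d < %d\n", total, min_sample
        exit 2
      }
      acc = correct / total
      status = (acc >= threshold) ? "pass" : "violation"
      printf "accuracy=%.3f threshold=%s %s\n", acc, threshold, status
    }
  '
}
```

With the 30 records in iris_retrain.csv used in the next section (10 per species), a single misclassification already yields 29/30 ≈ 0.967, which is below the 0.98 threshold.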

As the tutorial uses a small dataset, configuring Fairness and Drift won't have any impact.

Evaluate the deployed model

In this section, you evaluate the model by uploading the iris_retrain.csv file, which contains 10 instances of each species. Download iris_retrain.csv.

  1. Click on Actions and then Evaluate now.
  2. Choose from CSV file as your import option, click Browse, and select the iris_retrain.csv file.
  3. Click Upload and evaluate.
  4. After the evaluation is completed, you should see the dashboard with different metrics.

To understand the quality metrics, refer to Quality metric overview.

Remove resources

  1. Navigate to IBM Cloud® Resource List.
  2. Under Name, enter tutorial in the search box.
  3. Delete the services which you created for this tutorial.

Depending on the resource it might not be deleted immediately, but retained (by default for 7 days). You can reclaim the resource by deleting it permanently or restore it within the retention period. See this document on how to use resource reclamation.
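If you prefer the command line, the same cleanup can be scripted in IBM Cloud Shell. The sketch below is hypothetical (cleanup_tutorial_instances is not an IBM tool): it reads instance names one per line, keeps those containing tutorial (matching the search in step 2), and by default only prints the ibmcloud resource service-instance-delete commands it would run; set DRY_RUN=0 to delete. The listing command in the usage comment assumes that ibmcloud resource service-instances --output JSON returns an array of objects with a name field, and some instances, for example ones that still have service keys, may need extra options (see ibmcloud resource service-instance-delete --help).

```shell
# Hypothetical helper: read service instance names (one per line) on stdin,
# keep the ones containing "tutorial", and delete them with the IBM Cloud CLI.
# DRY_RUN=1 (the default) only prints the commands that would run.
cleanup_tutorial_instances() {
  grep 'tutorial' | while IFS= read -r name; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf 'ibmcloud resource service-instance-delete "%s" -f\n' "$name"
    else
      ibmcloud resource service-instance-delete "$name" -f
    fi
  done
}

# Usage in Cloud Shell (review the dry-run output before deleting anything):
# ibmcloud resource service-instances --output JSON | jq -r '.[].name' | cleanup_tutorial_instances
# ibmcloud resource service-instances --output JSON | jq -r '.[].name' | DRY_RUN=0 cleanup_tutorial_instances
```

Instances retained for the reclamation period should then appear in the output of ibmcloud resource reclamations.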

Related content