Creating custom classification models

The custom classifications feature allows you to train a multi-label text classifier using your own labeled data. Once trained, the model will be automatically deployed in IBM Watson® Natural Language Understanding and available for analyze calls.

Creating classifications model training data

Create and train a custom classifications model using the Natural Language Understanding training API. You can use this example Python notebook that shows how to create a classifications model, or the more advanced notebook that shows how to train and fine-tune your classifications model.

Training data in JSON format

Classifications accepts training data in the following JSON format:

[
  {
    "text": "Example 1",
    "labels": ["label1"]
  },
  {
    "text": "Example 2",
    "labels": ["label1", "label2"]
  }
]

Training data in CSV format

You can also provide training data in comma-separated value (CSV) format.

Example 1,label1
Example 2,label1,label2

In CSV format, a row in the file represents an example record. Each record has two or more columns. The first column is the representative text to classify. The additional columns are classes that apply to that text.

Headers are not expected for the CSV file.
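
The two formats carry the same information, so a CSV file can be turned into the JSON structure with a few lines of Python. The following is a minimal sketch using only the standard library; the function name csv_to_training_json is hypothetical and not part of any IBM SDK.

```python
import csv
import io
import json


def csv_to_training_json(csv_text):
    """Convert headerless CSV rows (text, label1, label2, ...) into
    the list-of-objects structure used by the JSON training format."""
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        text = row[0]
        labels = [label for label in row[1:] if label]  # drop empty trailing cells
        records.append({"text": text, "labels": labels})
    return records


sample = "Example 1,label1\nExample 2,label1,label2\n"
print(json.dumps(csv_to_training_json(sample), indent=2))
```

Because the csv module handles quoting, example text that contains commas is read correctly as long as it is quoted in the CSV file.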

Classifications training data requirements

  • Classifications training data consists of an array of JSON objects.
  • Each JSON object must contain one text field and one labels field.
  • The text field holds a training example, and the labels field holds one or more labels associated with that example.
  • Labels are case-sensitive.
  • Minimum number of unique labels required: 2
  • Maximum number of unique labels allowed: 3000
  • Minimum number of examples required per label: 5
  • Maximum size of each example (training and prediction): 2000 code points
  • Maximum number of examples: 20000
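
It can save a failed training request to check these limits locally before uploading. The following is a rough sketch of such a check, assuming the training data has already been parsed into a Python list of dictionaries; the function check_training_data and the constant names are hypothetical, and the service remains the authority on what it accepts.

```python
from collections import Counter

# Limits copied from the requirements list above.
MIN_UNIQUE_LABELS = 2
MAX_UNIQUE_LABELS = 3000
MIN_EXAMPLES_PER_LABEL = 5
MAX_EXAMPLE_CODEPOINTS = 2000
MAX_EXAMPLES = 20000


def check_training_data(records):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(records) > MAX_EXAMPLES:
        problems.append(f"too many examples: {len(records)} > {MAX_EXAMPLES}")

    label_counts = Counter()
    for index, record in enumerate(records):
        text = record.get("text") if isinstance(record, dict) else None
        labels = record.get("labels") if isinstance(record, dict) else None
        if not isinstance(text, str) or not isinstance(labels, list) or not labels:
            problems.append(f"example {index}: needs one text string and a non-empty labels list")
            continue
        # len() of a Python str counts Unicode code points.
        if len(text) > MAX_EXAMPLE_CODEPOINTS:
            problems.append(f"example {index}: text exceeds {MAX_EXAMPLE_CODEPOINTS} code points")
        # Labels are case-sensitive, so they are counted without normalization.
        label_counts.update(set(labels))

    if len(label_counts) < MIN_UNIQUE_LABELS:
        problems.append(f"fewer than {MIN_UNIQUE_LABELS} unique labels")
    if len(label_counts) > MAX_UNIQUE_LABELS:
        problems.append(f"more than {MAX_UNIQUE_LABELS} unique labels")
    for label, count in sorted(label_counts.items()):
        if count < MIN_EXAMPLES_PER_LABEL:
            problems.append(f"label {label!r} has only {count} example(s)")
    return problems


sample = [{"text": f"Example {i}", "labels": ["label1", "label2"]} for i in range(5)]
print(check_training_data(sample))
```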

Classifications training parameters

Pass the optional training_parameters object to specify characteristics of your classifier. If you omit the object or pass an empty object, the model is trained with the default values.

Supported training parameters:

Keys          Default Value    Optional Values
model_type    multi_label      single_label

Description:

  • model_type: Passing the single_label value will result in a single-label classifier, capable of handling training datasets with only one label per example. The single-label classifier will output normalized confidence scores so that the scores sum up to one. Passing the multi_label value will result in a multi-label classifier, capable of handling training datasets with multiple labels per example. The multi-label classifier will not output normalized confidence scores, in order to account for the added flexibility of associating multiple labels with examples.
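
One way to decide between the two values is to inspect the training data: single_label is only an option when every example carries exactly one label. The helper below is an illustrative heuristic, not an IBM recommendation; its name suggest_training_parameters is hypothetical, and data with one label per example can also be trained with the default multi_label type.

```python
import json


def suggest_training_parameters(records):
    """Suggest single_label only when every example has exactly one label;
    otherwise fall back to the default multi_label."""
    one_label_each = all(len(record["labels"]) == 1 for record in records)
    model_type = "single_label" if one_label_each else "multi_label"
    return {"model_type": model_type}


records = [
    {"text": "Example 1", "labels": ["label1"]},
    {"text": "Example 2", "labels": ["label1", "label2"]},
]
# The JSON string is what would be sent as the training_parameters form field.
print(json.dumps(suggest_training_parameters(records)))
```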

Training a custom classifications model

When your training data is ready, use the Create classifications model method to create your custom classifications model. Make sure to substitute your credentials for {apikey} and {url}, and use the path to your training data file in the training_data parameter. Optionally, you can also specify characteristics of your classifier using training_parameters.

curl -X POST -u "apikey:{apikey}" \
-H "Content-Type: multipart/form-data" \
-F "name=MyClassificationsModel" \
-F "language=en" \
-F "model_version=1.0.1" \
-F 'training_parameters={"model_type": "multi_label"}' \
-F "training_data=@classifications_data.json;type=application/json" \
"{url}/v1/models/classifications?version=2021-03-23"

Use the model_id in the response to check the status of your model.

Checking the status of a classifications model

The following sample request for the Get classifications model method checks the status for the classifications model with ID cb3755ad-d226-4587-b956-43a4a7202202.

curl -X GET -u "apikey:{apikey}" \
"{url}/v1/models/classifications/cb3755ad-d226-4587-b956-43a4a7202202?version=2021-03-23"

To get information for all classifications models deployed to your instance, use the List classifications models method.

curl -X GET -u "apikey:{apikey}" \
"{url}/v1/models/classifications?version=2021-03-23"

When the status is available, the classifications model is ready to use.
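
Training is not instantaneous, so a client typically polls the status until it reaches available. The following sketch shows one possible polling loop. It assumes a caller-supplied function, here called fetch_status (hypothetical), that wraps the Get classifications model request and returns the status string from the response. Treating "error" as a terminal failure status is also an assumption; adjust failure_statuses to match the values your instance reports.

```python
import time


def wait_until_available(fetch_status, interval_seconds=30.0, max_checks=120,
                         failure_statuses=("error",)):
    """Call fetch_status() repeatedly until it returns "available".

    fetch_status is a caller-supplied callable (hypothetical) that returns
    the model's current status string.
    """
    for _ in range(max_checks):
        status = fetch_status()
        if status == "available":
            return status
        if status in failure_statuses:  # assumed failure value(s)
            raise RuntimeError(f"model training failed with status {status!r}")
        time.sleep(interval_seconds)
    raise TimeoutError(f"model not available after {max_checks} checks")


# Usage with a stand-in for the real request; "training" is an assumed
# intermediate status used here only for illustration.
fake_statuses = iter(["training", "training", "available"])
print(wait_until_available(lambda: next(fake_statuses), interval_seconds=0))
```

Passing the request logic in as a callable keeps the loop independent of any particular HTTP client.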

Analyzing text with a custom classifications model

To use your classifications model, specify the model that you deployed in the classifications options of your API request:

  • Example parameters.json file:

    {
      "url": "www.url.example",
      "features": {
        "classifications": {
          "model": "your-model-id-here"
        }
      }
    }
    
  • Example cURL request:

    curl --request POST \
    --header "Content-Type: application/json" \
    --user "apikey:{apikey}" \
    "{url}/v1/analyze?version=2021-03-23" \
    --data @parameters.json
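
The request body can also be generated programmatically, which avoids hand-editing the model ID. The sketch below builds the same structure as the parameters.json example. The function name build_analyze_parameters is hypothetical, and the option to send a text field instead of url is an assumption about the analyze request that you should confirm against the API reference.

```python
import json


def build_analyze_parameters(model_id, text=None, url=None):
    """Build an analyze request body that selects a custom classifications model.

    Exactly one of text or url must be given.
    """
    if (text is None) == (url is None):
        raise ValueError("provide exactly one of text or url")
    payload = {"features": {"classifications": {"model": model_id}}}
    if text is not None:
        payload["text"] = text  # assumed alternative to the url field
    else:
        payload["url"] = url
    return payload


# Placeholder values taken from the example above.
body = build_analyze_parameters("your-model-id-here", url="www.url.example")
print(json.dumps(body, indent=2))
```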
    

Deleting a custom classifications model

To delete a classifications model from your service instance, use the Delete classifications model method. Replace {url} and {apikey} with your service URL and API key, and replace {model_id} with the model ID of the classifications model you want to delete.

  • The following example deletes a classifications model.

    curl --user "apikey:{apikey}" \
    "{url}/v1/models/classifications/{model_id}?version=2021-03-23" \
    --request DELETE
    

Migrating from Natural Language Classifier to Natural Language Understanding

On 9 August 2021, IBM announced the deprecation of the IBM Watson® Natural Language Classifier service. The service will no longer be available from 8 August 2022. As of 9 September 2021, you can't create new instances, and access to free instances will be removed. Existing premium plan instances are supported until 8 August 2022. Any instance that still exists on that date will be deleted. As an alternative, we encourage Natural Language Classifier users to consider migrating to the Natural Language Understanding service.

When training data is available

You can directly use the available training data to train classifications in Natural Language Understanding. Natural Language Understanding accepts the same CSV file format.

When training data is not available

You can fetch the data you used to train Natural Language Classifier from the service. Refer to this tutorial.