Creating custom classification models
The custom classifications feature allows you to train a multi-label text classifier using your own labeled data. Once trained, the model will be automatically deployed in IBM Watson® Natural Language Understanding and available for analyze calls.
Creating classifications model training data
Create and train a custom classifications model using the Natural Language Understanding training API. You can use this example Python notebook that shows how to create a classifications model, or the more advanced notebook that shows how to train and fine-tune your classifications model.
Training data in JSON format
Classifications accepts training data in the following JSON format:
[
  {
    "text": "Example 1",
    "labels": ["label1"]
  },
  {
    "text": "Example 2",
    "labels": ["label1", "label2"]
  }
]
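If your labeled examples already live in Python, the format above can be produced with the standard json module. The following is a minimal sketch; the function name to_training_json and the sample records are illustrative assumptions, not part of the service API.

```python
import json

# Hypothetical sample records in the documented shape:
# one "text" string and one "labels" list per example.
examples = [
    {"text": "Example 1", "labels": ["label1"]},
    {"text": "Example 2", "labels": ["label1", "label2"]},
]


def to_training_json(records):
    """Serialize records as a JSON array of {"text", "labels"} objects."""
    # ensure_ascii=False keeps non-ASCII training text readable in the file.
    return json.dumps(records, ensure_ascii=False, indent=2)


training_json = to_training_json(examples)
```

Writing training_json to a file such as classifications_data.json gives you a file you can pass in the training_data parameter.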
Training data in CSV format
You can also provide training data in comma-separated value (CSV) format.
Example 1,label1
Example 2,label1,label2
In CSV format, a row in the file represents an example record. Each record has two or more columns. The first column is the representative text to classify. The additional columns are classes that apply to that text.
Headers are not expected for the CSV file.
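To illustrate how the two formats relate, the sketch below converts CSV rows of this shape into the JSON record structure. The function name csv_to_records is hypothetical, and dropping empty label cells is an assumption made here for convenience; the service itself might treat such rows differently.

```python
import csv
import io


def csv_to_records(csv_text):
    """Turn header-less CSV rows (text, label, label, ...) into record dicts."""
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        text, *labels = row
        # Assumption: ignore empty trailing cells; labels stay case-sensitive.
        labels = [label for label in labels if label]
        records.append({"text": text, "labels": labels})
    return records


sample = "Example 1,label1\nExample 2,label1,label2\n"
records = csv_to_records(sample)
```

Using the csv module rather than splitting on commas means that quoted text containing commas is read as a single column.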
Classifications training data requirements
- Classifications training data consists of an array containing multiple JSON objects.
- Each of these JSON objects needs to contain one text field and one labels field. text consists of the training example and labels consists of one or more labels associated with that example. labels are case-sensitive.
- Minimum number of unique labels required: 2
- Maximum number of unique labels allowed: 3000
- Minimum number of examples required per label: 5
- Maximum size of each example (training and predict): 2000 codepoints
- Maximum number of examples: 20000
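A local check against these limits can catch problems before you upload a file. The following is a rough sketch, not the service's own validation: the function name validate_training_data and the wording of the messages are assumptions, and the service may apply further checks that are not listed here.

```python
from collections import Counter

# Limits copied from the requirements listed above.
MIN_UNIQUE_LABELS = 2
MAX_UNIQUE_LABELS = 3000
MIN_EXAMPLES_PER_LABEL = 5
MAX_EXAMPLE_CODEPOINTS = 2000
MAX_EXAMPLES = 20000


def validate_training_data(records):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(records) > MAX_EXAMPLES:
        problems.append(f"too many examples: {len(records)} > {MAX_EXAMPLES}")

    label_counts = Counter()
    for index, record in enumerate(records):
        text = record.get("text")
        labels = record.get("labels")
        if not isinstance(text, str):
            problems.append(f"record {index}: missing text field")
        elif len(text) > MAX_EXAMPLE_CODEPOINTS:
            # len() on a Python str counts Unicode code points.
            problems.append(f"record {index}: text exceeds {MAX_EXAMPLE_CODEPOINTS} codepoints")
        if not isinstance(labels, list) or not labels:
            problems.append(f"record {index}: needs one or more labels")
        else:
            # Count each label once per example; keys are case-sensitive.
            label_counts.update(set(labels))

    if not MIN_UNIQUE_LABELS <= len(label_counts) <= MAX_UNIQUE_LABELS:
        problems.append(f"unique label count {len(label_counts)} is outside the allowed range")
    for label, count in sorted(label_counts.items()):
        if count < MIN_EXAMPLES_PER_LABEL:
            problems.append(f"label {label!r} has only {count} examples")
    return problems
```

Returning a list of messages instead of raising on the first failure lets you fix every issue in a dataset in one pass.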
Classifications training parameters
Passing in the optional training_parameters object allows you to specify characteristics of your classifier. If you omit the object or pass an empty object, the model is trained using default values.
Supported training parameters:
Keys | Default Value | Optional Values |
---|---|---|
model_type | multi_label | single_label |
Description:
model_type: Passing the single_label value will result in a single-label classifier, capable of handling training datasets with only one label per example. The single-label classifier will output normalized confidence scores so that the scores sum up to one. Passing the multi_label value will result in a multi-label classifier, capable of handling training datasets with multiple labels per example. The multi-label classifier will not output normalized confidence scores, in order to account for the added flexibility of associating multiple labels with examples.
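Because single_label is described as handling only one label per example, it can help to check your data before choosing it. The sketch below builds the JSON string for the training_parameters field and refuses single_label when any example carries more than one label. The helper name build_training_parameters and the decision to raise an error locally are assumptions; how the service itself reacts to such data is not stated here.

```python
import json

ALLOWED_MODEL_TYPES = ("multi_label", "single_label")


def build_training_parameters(records, model_type="multi_label"):
    """Return the training_parameters value as a JSON string."""
    if model_type not in ALLOWED_MODEL_TYPES:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    if model_type == "single_label":
        # single_label is documented for datasets with one label per example.
        for index, record in enumerate(records):
            if len(record["labels"]) != 1:
                raise ValueError(
                    f"record {index} has {len(record['labels'])} labels; "
                    "single_label expects exactly one"
                )
    return json.dumps({"model_type": model_type})
```

The returned string can be passed as the value of the training_parameters form field.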
Training a custom classifications model
When your training data is ready, use the Create classifications model method to create your custom classifications model. Make sure to substitute your credentials for {apikey} and {url}, and use the path to your training data file in the training_data parameter. Optionally, you can also specify characteristics of your classifier using training_parameters.
curl -X POST -u "apikey:{apikey}" \
-H "Content-Type: multipart/form-data" \
-F "name=MyClassificationsModel" \
-F "language=en" \
-F "model_version=1.0.1" \
-F 'training_parameters={"model_type": "multi_label"}' \
-F "training_data=@classifications_data.json;type=application/json" \
"{url}/v1/models/classifications?version=2021-03-23"
Use the model_id in the response to check the status of your model.
Checking the status of a classifications model
The following sample request for the Get classifications model method checks the status for the classifications model with ID cb3755ad-d226-4587-b956-43a4a7202202.
curl -X GET -u "apikey:{apikey}" \
"{url}/v1/models/classifications/cb3755ad-d226-4587-b956-43a4a7202202?version=2021-03-23"
To get information for all classifications models deployed to your instance, use the List classifications models method.
curl -X GET -u "apikey:{apikey}" \
"{url}/v1/models/classifications?version=2021-03-23"
When the status is available, the classifications model is ready to use.
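Training takes time, so a script typically polls the Get classifications model method until the status is available. The following is a minimal sketch using only the Python standard library. The function names, the polling interval, and the attempt limit are assumptions, as is the idea that the response JSON exposes the status in a top-level status field.

```python
import base64
import json
import time
import urllib.request


def fetch_status(url, apikey, model_id, version="2021-03-23"):
    """GET the model and return its status (assumes a top-level "status" field)."""
    request = urllib.request.Request(
        f"{url}/v1/models/classifications/{model_id}?version={version}"
    )
    # Same basic auth as curl -u "apikey:{apikey}".
    token = base64.b64encode(f"apikey:{apikey}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)["status"]


def wait_until_available(get_status, interval_seconds=30.0, max_attempts=20, sleep=time.sleep):
    """Call get_status() until it returns "available" or attempts run out."""
    for attempt in range(max_attempts):
        if get_status() == "available":
            return True
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next check
    return False
```

A call might look like wait_until_available(lambda: fetch_status(url, apikey, model_id)), where url, apikey, and model_id are your own values. Passing the status lookup in as a callable keeps the polling loop easy to exercise without a network connection.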
Analyzing text with a custom classifications model
To use your classifications model, specify the model that you deployed in the classifications options of your API request:
- Example parameters.json file:
{
  "url": "www.url.example",
  "features": {
    "classifications": {
      "model": "your-model-id-here"
    }
  }
}
- Example cURL request:
curl --request POST \
--header "Content-Type: application/json" \
--user "apikey":"{apikey}" \
"{url}/v1/analyze?version=2021-03-23" \
--data @parameters.json
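If you generate request bodies from code rather than editing parameters.json by hand, the same structure can be built as a Python dictionary. This is a small sketch; the function name build_analyze_parameters is hypothetical and only the fields shown in the example file are covered.

```python
import json


def build_analyze_parameters(model_id, url):
    """Build the analyze request body that selects a custom classifications model."""
    return {
        "url": url,
        "features": {
            "classifications": {"model": model_id},
        },
    }


# Placeholder values mirroring the example parameters.json file.
parameters = build_analyze_parameters("your-model-id-here", "www.url.example")
parameters_json = json.dumps(parameters, indent=2)
```

Saving parameters_json as parameters.json yields a file that the cURL request can send with --data @parameters.json.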
Deleting a custom classifications model
To delete a classifications model from your service instance, use the Delete classifications model method. Replace {url} and {apikey} with your service URL and API key, and replace {model_id} with the model ID of the classifications model you want to delete.
- The following example deletes a classifications model.
curl --user "apikey":"{apikey}" \
"{url}/v1/models/classifications/{model_id}?version=2021-03-23" \
--request DELETE
Migrating from Natural Language Classifier to Natural Language Understanding
On 9 August 2021, IBM announced the deprecation of the IBM Watson® Natural Language Classifier service. The service will no longer be available from 8 August 2022. As of 9 September 2021, you can't create new instances, and access to free instances will be removed. Existing premium plan instances are supported until 8 August 2022. Any instance that still exists on that date will be deleted. As an alternative, we encourage Natural Language Classifier users to consider migrating to the Natural Language Understanding service.
When training data is available
You can directly use the available training data to train classifications in Natural Language Understanding. Natural Language Understanding accepts the same CSV file format.
When training data is not available
You can fetch the data you used to train Natural Language Classifier from the service. Refer to this tutorial.