This documentation is for IBM Watson® Knowledge Studio on IBM Cloud®. To see the documentation for the previous version of Knowledge Studio on IBM Marketplace, click this link.
Pre-annotating documents
This tutorial helps you understand how to pre-annotate documents, which bootstraps the human annotation process.
Learning objectives
After you complete this tutorial, you will know how to pre-annotate documents with a machine learning model.
This tutorial should take approximately 5 minutes to finish. If you explore other concepts related to this tutorial, it could take longer to complete.
Before you begin
- You're using a supported browser. For more information, see Browser requirements.
- You successfully completed Getting started with Knowledge Studio, which covers creating a workspace, creating a type system, and adding a dictionary.
- You successfully completed Creating a machine learning model.
- You have at least one user ID in either the Admin or Project Manager role. For more information about user roles, see User roles in Knowledge Studio.
Results
After completing this tutorial, you will have a set of partially annotated documents. Then, you can assign the documents to human annotators to finish the annotation work.
Lesson 1: Pre-annotating new documents with a machine learning model
In this lesson, you will learn how to use a machine learning model to pre-annotate documents in Knowledge Studio.
About this task
After you train a machine learning model, you can use it to pre-annotate new documents that you add to the corpus.
Do not run a pre-annotator on documents that have been annotated by humans but have not yet been added to the ground truth. If you do, all current annotations will be stripped from the documents.
In this tutorial, you can add a second set of documents by using the documents-ml.csv file. Do not re-add the documents-new.csv file, since this addition would result in duplicate documents in the ground truth. Duplication causes the following problems:
- If the annotations on the duplicate copies of a document do not match, they lower the quality of the machine learning model.
- If the annotations on the duplicate copies of a document match, they over-train the machine learning model on the duplicated files.
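One way to guard against accidental duplication is to compare a new CSV file against the documents already in the workspace before you upload it. The following Python sketch is illustrative only and is not part of Knowledge Studio. It assumes that each CSV row holds a document title in the first column and the document text in the second column, and the helper names (`normalize`, `read_rows`, `find_duplicate_documents`) are hypothetical. The sample rows are inline strings rather than the actual tutorial files.

```python
import csv
import io


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies compare equal."""
    return " ".join(text.split()).lower()


def read_rows(csv_text: str) -> list[tuple[str, str]]:
    """Parse CSV text into (title, text) pairs.

    Assumption: column 1 is the title and column 2 is the document text.
    Rows with fewer than two columns are skipped.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def find_duplicate_documents(existing_rows, new_rows) -> list[str]:
    """Return titles of new documents whose text already exists in existing_rows."""
    seen = {normalize(text) for _title, text in existing_rows}
    return [title for title, text in new_rows if normalize(text) in seen]


# Hypothetical sample data standing in for an already-uploaded file and a new file.
existing = read_rows('doc1,"The car hit a tree."\ndoc2,"A truck was parked."\n')
incoming = read_rows('doc3,"The  car hit a TREE."\ndoc4,"A bus stopped."\n')

duplicates = find_duplicate_documents(existing, incoming)
print(duplicates)
```

If the list is not empty, consider removing the listed documents from the new file before you add it to the workspace. This check catches only exact copies after whitespace and case normalization, so near-duplicates would need a fuzzier comparison.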
For more information about pre-annotating documents, see Bootstrapping annotation. You can also read about other pre-annotation methods.
Procedure
- Log in to Knowledge Studio as the administrator.
- Upload more documents to the workspace. You can use the documents-ml.csv file.
  For more information about adding documents to a workspace, see Adding documents for annotation.
- Create an annotation set that uses the documents-ml.csv file as the base set, and assign it to yourself, the administrator.
  After you complete the following steps to pre-annotate the new documents, you can view the annotation set to see how the machine learning model annotated the documents. Typically, you assign annotation sets to one or more human annotators. For more information about creating and assigning annotation sets, see Adding documents for annotation.
- To pre-annotate the new documents:
  - On the Machine Learning Model > Pre-annotation page, click Run Pre-annotators.
  - Select Machine Learning Model, then click Next.
  - Select the document set that you added to the corpus, documents-ml.csv, and click Run.
- After the pre-annotation is complete, create a human annotation task that includes the annotation set you created.
  For more information about creating an annotation task, see Annotation setup.
- To view the annotations that were applied by the machine learning model to the new documents, open the annotation task.
  Because the new documents were pre-annotated with the machine learning model, human annotation requires less time. For more information about adding annotations by human annotators, see Annotating documents.
Results
By using your machine learning model to pre-annotate new document sets, you can expedite human annotation tasks for those documents.