Migrating Knowledge Studio solutions
Use custom models and other resources that you created in Knowledge Studio by migrating them to Discovery.
Using a model as is
To start using your Knowledge Studio model immediately, export the model from Knowledge Studio and import it to Discovery as a machine learning enrichment.
When you import a Knowledge Studio model to use as is in Discovery, root-level entity types that were defined in the model can be recognized when they occur in your documents. Any mentions of entity subtypes that occur are identified as mentions of the parent entity type. The subtype entities themselves are not preserved. If you want the model to continue to distinguish between different subtypes of an entity, you must take extra steps. For more information, see Retaining subtype information.
You cannot continue to update a model that you import as an ML enrichment.
The following types of models can be imported and used as is:
- Rule-based models created in Knowledge Studio that find entities in documents based on rules that you define. (File format: .pear)
- Machine learning models created in Knowledge Studio that understand the linguistic nuances, meaning, and relationships specific to your industry. (File format: .zip)
The models that you can add depend on your deployment type:
- IBM Cloud: You can add only models that were created with an IBM Watson® Knowledge Studio instance that is hosted in IBM Cloud.
- IBM Cloud Pak for Data: You can add models that were created with an instance of IBM Watson® Knowledge Studio that is hosted on IBM Cloud Pak® for Data or IBM Cloud.
For more information, see Using imported ML models to find custom terms.
Using a corpus as training data
Discovery has an entity extractor tool that you can use to define a type system. The entity extractor user interface is similar to the Knowledge Studio user interface that is used to annotate documents that you add to a corpus for a machine learning model. However, in Discovery, you define root-level entities only, not subtypes or relationships.
As an alternative to importing a Knowledge Studio model as is and applying it as an enrichment, you can import a Knowledge Studio corpus. When you add a Knowledge Studio corpus to the Discovery entity extractor tool, any root-level entities from the corpus are represented as new entities in the Discovery entity extractor workspace. Entity subtypes are not recognized, although you can take extra steps to retain subtype information.
Relations and coreferences from the Knowledge Studio machine learning model are not represented, nor are any custom dictionaries that are associated with the model.
Things to consider when choosing whether to import a model or import a corpus:
- You can continue to edit the type system when you import the corpus. When you import a trained model, you cannot subsequently edit it in Discovery.
- An imported model that you apply to a collection as an enrichment can recognize any relation and coreference information that the original model was trained to recognize in addition to root-level entities. Mentions of entity subtypes are identified as mentions of the parent entity type. An entity extractor enrichment can find and tag entities only.
For more information, see Importing a Knowledge Studio corpus.
Retaining subtype information
When you import a Knowledge Studio model to Discovery, any subtypes that were defined in the model are identified as mentions of the parent entity type. The subtype entities themselves are not preserved. To retain the subtype information, you must flatten your type system by converting entity subtypes into new root-level entity types.
Follow these steps only if you are sure that the subtype distinctions add significant value to the model. In many use cases, using the root-level entity types is sufficient.
You cannot use this procedure to retain subtypes if any of the documents in your corpus were pre-annotated with the Natural Language Understanding service. Make sure that your flattened type system doesn't surpass the allowed number of entity types for your plan. For more information, see Entity extractor limits.
For example, your model might have entity types with the following hierarchy:
APPLIANCES
FURNITURE
    PATIO
    LIVING
    DINING
A flattened version of the type system looks like this:
APPLIANCES
FURNITURE_NONE
FURNITURE_PATIO
FURNITURE_LIVING
FURNITURE_DINING
A useful approach for flattening the type system involves the following changes:
- Add the parent entity type label (FURNITURE) as a prefix to the label of each child subtype to produce a new root-level entity that preserves the hierarchical relationship in its label. For example, FURNITURE_PATIO, FURNITURE_LIVING, and FURNITURE_DINING.
- Append the word NONE to the parent root-level entity label to identify it as the parent. For example, FURNITURE_NONE.
- Leave the labels of entity types that don't have subtypes unchanged. For example, the label APPLIANCES doesn't change.
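The naming convention can be sketched in a few lines of Python. This is only an illustration of the labeling rules, not a tool that ships with either product; the `flatten_labels` function and its dictionary input format are assumptions made for the example.

```python
def flatten_labels(hierarchy):
    """Return flattened root-level labels for a {parent: [subtypes]} mapping.

    The input format is hypothetical; it is not the Knowledge Studio
    type system file format.
    """
    flattened = []
    for parent, subtypes in hierarchy.items():
        if not subtypes:
            # Entity types without subtypes keep their original label.
            flattened.append(parent)
            continue
        # The parent itself is identified by the NONE suffix.
        flattened.append(f"{parent}_NONE")
        # Each subtype becomes a root-level type prefixed with the parent label.
        flattened.extend(f"{parent}_{subtype}" for subtype in subtypes)
    return flattened


hierarchy = {
    "APPLIANCES": [],
    "FURNITURE": ["PATIO", "LIVING", "DINING"],
}
print(flatten_labels(hierarchy))
# ['APPLIANCES', 'FURNITURE_NONE', 'FURNITURE_PATIO', 'FURNITURE_LIVING', 'FURNITURE_DINING']
```

Counting the returned labels is a quick way to compare the flattened type system against the entity type limit for your plan.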
To retain entity subtype information, complete the following steps:
-
Ensure that the annotation and training of the Knowledge Studio model is completed and the model is ready to be deployed.
-
Export the type system that was used to annotate the documents in your corpus from Knowledge Studio as a .json file.
Follow the appropriate steps for exporting based on your Knowledge Studio deployment type:
- IBM Cloud: Uploading resources from another workspace
- IBM Cloud Pak for Data: Uploading resources from another workspace
-
Modify the type system JSON file. For each subtype, add a new root-level entity type.
For example, the original type system might contain the following types:
{ "id":"b9d6caa2-90ac-47ff-91f6-2149b8ffcf20", "label":"FURNITURE", "sireProp":{ "mentionType":null, "subtypes":["PATIO","LIVING","DINING"], "roles":["b9d6caa2-90ac-47ff-91f6-2149b8ffcf20","93ba1f27-173f-4714-b31e-77bdd8cb9932"], "clazz":null, "color":"black", "hotkey":"m", "backGroundColor":"#00FFFF", "active":true, "roleOnly":false}, "creationDate":1610611788484, "source":null, "modifiedDate":0, "typeType":null, "typeClass":null, "typeVersion":null, "typeDesc":null, "typeSuperType":null, "typeSuperTypeId":null, "typeCreateDate":null, "typeUpdateDate":null, "typeProvenance":null, "alchemyAPITypes":null, "nluAPITypes":null},
To convert the subtypes to new root-level types, make the following change:
{ "id":"b9d6caa2-90ac-47ff-91f6-2149b8ffcf20", "label":"FURNITURE_NONE", "sireProp":{ "mentionType":null, "subtypes":null, "roles":["b9d6caa2-90ac-47ff-91f6-2149b8ffcf20","93ba1f27-173f-4714-b31e-77bdd8cb9932"], "clazz":null, "and so on" } },
{ "id":"b9d6caa2-90ac-47ff-91f6-2149b8ffcf20", "label":"FURNITURE_PATIO", "sireProp":{ "mentionType":null, "subtypes":null, "roles":["b9d6caa2-90ac-47ff-91f6-2149b8ffcf20","93ba1f27-173f-4714-b31e-77bdd8cb9932"], "clazz":null, "and so on" } },
{ "id":"b9d6caa2-90ac-47ff-91f6-2149b8ffcf20", "label":"FURNITURE_LIVING", "sireProp":{ "mentionType":null, "subtypes":null, "roles":["b9d6caa2-90ac-47ff-91f6-2149b8ffcf20","93ba1f27-173f-4714-b31e-77bdd8cb9932"], "clazz":null, "and so on" } },
{ "id":"b9d6caa2-90ac-47ff-91f6-2149b8ffcf20", "label":"FURNITURE_DINING", "sireProp":{ "mentionType":null, "subtypes":null, "roles":["b9d6caa2-90ac-47ff-91f6-2149b8ffcf20","93ba1f27-173f-4714-b31e-77bdd8cb9932"], "clazz":null, "and so on" } },
-
Assign a unique ID to each new root-level entity type.
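If the type system contains many subtypes, the type system edit and the ID assignment might be scripted. The following Python sketch assumes that each entry has the "id", "label", and "sireProp"."subtypes" fields shown in the example; the `flatten_entity_type` function is hypothetical. It keeps the original ID on the parent entry and generates UUIDs for the new entries. The source does not say whether the "roles" list must also change, so the sketch copies it as is.

```python
import copy
import uuid


def flatten_entity_type(entity_type):
    """Split one type system entry into flattened root-level entries.

    Assumes the entry layout shown in the exported .json example.
    All fields other than "id", "label", and "subtypes" are copied unchanged.
    """
    subtypes = (entity_type.get("sireProp") or {}).get("subtypes") or []
    if not subtypes:
        # Types without subtypes are left unchanged.
        return [entity_type]

    flattened = []
    for suffix in ["NONE", *subtypes]:
        entry = copy.deepcopy(entity_type)
        entry["label"] = f"{entity_type['label']}_{suffix}"
        entry["sireProp"]["subtypes"] = None
        if suffix != "NONE":
            # Assumption: a random UUID is an acceptable unique ID.
            entry["id"] = str(uuid.uuid4())
        flattened.append(entry)
    return flattened


furniture = {
    "id": "b9d6caa2-90ac-47ff-91f6-2149b8ffcf20",
    "label": "FURNITURE",
    "sireProp": {"subtypes": ["PATIO", "LIVING", "DINING"]},
}
for entry in flatten_entity_type(furniture):
    print(entry["label"], entry["id"])
```

Review the generated file before you upload it, because the sketch does not validate any other part of the type system.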
-
Export the corpus for your machine learning model from Knowledge Studio as a compressed file.
Follow the appropriate steps for exporting based on your Knowledge Studio deployment type:
- IBM Cloud: Uploading resources from another workspace
- IBM Cloud Pak for Data: Uploading resources from another workspace
-
In the downloaded corpus, for all mentions with a subtype defined, update the type information for the mention to specify the new root-level entity type.
For example, the original corpus might include the following mention with the PATIO subtype:
{ "id" : "Blogs_shopper.com_dc5cf4764d91f87575b17ac8a5268462.en-M92", "source" : "IMPORT", "properties" : { "SIRE_ENTITY_CLASS" : "SPC", "SIRE_MENTION_CLASS" : "SPC", "SIRE_ENTITY_LEVEL" : "NONE", "SIRE_ENTITY_SUBTYPE" : "PATIO", "SIRE_MENTION_ROLE" : "FURNITURE", "SIRE_MENTION_TYPE" : "NONE" }, "type" : "FURNITURE", "begin" : 3221, "end" : 3234, "inCoref" : false },
Replace the values of SIRE_MENTION_ROLE and type for the mention with the new root-level entity label, such as FURNITURE_PATIO. Specify NONE as the SIRE_ENTITY_SUBTYPE value.
{ "id" : "Blogs_shopper.com_dc5cf4764d91f87575b17ac8a5268462.en-M92", "source" : "IMPORT", "properties" : { "SIRE_ENTITY_CLASS" : "SPC", "SIRE_MENTION_CLASS" : "SPC", "SIRE_ENTITY_LEVEL" : "NONE", "SIRE_ENTITY_SUBTYPE" : "NONE", "SIRE_MENTION_ROLE" : "FURNITURE_PATIO", "SIRE_MENTION_TYPE" : "NONE" }, "type" : "FURNITURE_PATIO", "begin" : 3221, "end" : 3234, "inCoref" : false },
Don't forget to rename the parent mention labels.
For example, find mentions that specify "SIRE_ENTITY_SUBTYPE" : "OTHER", and then change the value from OTHER to NONE. Change the values of SIRE_MENTION_ROLE and type for the mention to the new parent entity type label.
For example, change the SIRE_MENTION_ROLE and type values for these mentions from FURNITURE to FURNITURE_NONE, and the SIRE_ENTITY_SUBTYPE to NONE.
{ "id" : "Sports_herald.com_be99aca94a7cff5abb74476b844a11b6.en-M75", "source" : "IMPORT", "properties" : { "SIRE_MENTION_CLASS" : "SPC", "SIRE_ENTITY_LEVEL" : "NONE", "SIRE_ENTITY_SUBTYPE" : "NONE", "SIRE_ENTITY_CLASS" : "SPC", "SIRE_MENTION_TYPE" : "NONE", "SIRE_MENTION_ROLE" : "FURNITURE_NONE" }, "type" : "FURNITURE_NONE", "begin" : 2063, "end" : 2071, "inCoref" : false },
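Because a corpus can contain many mentions, the mention updates might also be scripted. The following Python sketch applies the rules from this step to one mention object. The `flatten_mention` function and the `parents_with_subtypes` argument are hypothetical, and the sketch assumes that every mention has the "type" and "properties" fields shown in the examples and that SIRE_MENTION_ROLE matches the entity type, as it does in those examples. How the mentions are stored in the files of the exported corpus is not covered here.

```python
def flatten_mention(mention, parents_with_subtypes):
    """Rewrite one mention in place so that it uses a flattened label.

    parents_with_subtypes is the set of original root-level labels that
    had subtypes, for example {"FURNITURE"}.
    """
    parent = mention["type"]
    if parent not in parents_with_subtypes:
        # Mentions of types without subtypes are left unchanged.
        return mention

    props = mention["properties"]
    subtype = props.get("SIRE_ENTITY_SUBTYPE", "NONE")
    # Parent-level mentions (subtype OTHER or NONE) get the NONE suffix.
    suffix = "NONE" if subtype in ("NONE", "OTHER") else subtype
    new_label = f"{parent}_{suffix}"

    mention["type"] = new_label
    props["SIRE_MENTION_ROLE"] = new_label
    props["SIRE_ENTITY_SUBTYPE"] = "NONE"
    return mention


mention = {
    "properties": {
        "SIRE_ENTITY_SUBTYPE": "PATIO",
        "SIRE_MENTION_ROLE": "FURNITURE",
    },
    "type": "FURNITURE",
}
print(flatten_mention(mention, {"FURNITURE"})["type"])
# FURNITURE_PATIO
```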
-
Add annotations for relationships that are missing based on the new flattened entity types.
-
Create a Knowledge Studio workspace, and then upload the converted type system.
Follow the appropriate steps for uploading a type system based on your Knowledge Studio deployment type:
- IBM Cloud: Adding a type system to the workspace
- IBM Cloud Pak for Data: Adding a type system to the workspace
-
Upload the annotated documents to the workspace. Retain the original file structure of the exported data. Ensure that the compressed file has the same root-level directory as the original exported file.
Follow the appropriate steps for uploading documents based on your Knowledge Studio deployment type:
- IBM Cloud: Adding documents to a workspace
- IBM Cloud Pak for Data: Adding documents to a workspace
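One way to keep the root-level directory when you compress the edited corpus again is to archive the extracted directory itself rather than its contents. The following Python sketch assumes that the export is a ZIP file that unpacked into a single directory; the `repackage_corpus` function and the paths are hypothetical.

```python
import shutil
from pathlib import Path


def repackage_corpus(extracted_root, archive_base):
    """Zip extracted_root so that the directory itself is the top-level entry.

    extracted_root: directory that the exported file unpacked to.
    archive_base: output path without the .zip extension.
    Returns the path of the archive that was created.
    """
    root = Path(extracted_root).resolve()
    return shutil.make_archive(
        str(archive_base),
        "zip",
        root_dir=str(root.parent),  # archive paths are relative to this directory
        base_dir=root.name,         # only this directory is added, with its name kept
    )


# Hypothetical usage:
# repackage_corpus("work/corpus-export", "work/corpus-flattened")
```

Compare the listing of the new archive with the original export before you upload it.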
-
From Knowledge Studio, click Train to retrain the model.
For more information, see the appropriate topic for your deployment type:
- IBM Cloud: Training the machine learning model
- IBM Cloud Pak for Data: Training the machine learning model
-
Now, you're ready to export the model from Knowledge Studio and import it to Discovery to use the model as a machine learning enrichment.
For more information, see Using imported ML models to find custom terms.