Backing up and restoring data
If you need to back up and restore a Knowledge Studio workspace, perform these tasks. Also perform these tasks if you need to migrate data manually from one Knowledge Studio instance to another, such as from an instance on IBM Marketplace to an instance on IBM Cloud.
Manual data migration is the process of backing up your data from one instance and restoring it on another instance.
Knowledge Studio on IBM Cloud uses the term workspace, while Knowledge Studio on IBM Marketplace uses the term project. The functionality is the same. Only the terminology is different.
To back up and restore your data, complete the following steps:
- Understand which data can be backed up
- Prepare for backup
- Download artifacts from the current instance
- Re-create workspaces on the new instance
- Restore the workspace data
- Restore the models
- Restore any incomplete annotation tasks
Data that can be backed up
The following artifacts can be backed up and migrated manually:
- Editable dictionaries
- Type system
- Approved ground truth document sets
The following types of artifacts cannot be backed up and migrated manually:
- In-progress human annotation documents
- Annotation tasks
- Models and snapshots
- Read-only dictionaries
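If you are planning the migration of several workspaces, it can help to keep an inventory of each workspace's artifacts and sort it against the two lists above. The following Python sketch is a hypothetical planning aid, not part of Knowledge Studio. The artifact kind labels are assumptions chosen to mirror the lists.

```python
# Hypothetical labels that mirror the lists above; Knowledge Studio does not define them.
MIGRATABLE = {"editable_dictionary", "type_system", "approved_ground_truth"}
NOT_MIGRATABLE = {
    "in_progress_annotation_document",
    "annotation_task",
    "model",
    "snapshot",
    "read_only_dictionary",
}


def plan_backup(inventory):
    """Split (name, kind) pairs into artifacts to download and artifacts
    that must be finished, recorded, or re-created by hand."""
    to_download, handle_manually = [], []
    for name, kind in inventory:
        if kind in MIGRATABLE:
            to_download.append(name)
        elif kind in NOT_MIGRATABLE:
            handle_manually.append(name)
        else:
            # Fail loudly so that an unclassified artifact is not silently skipped.
            raise ValueError(f"unknown artifact kind: {kind!r}")
    return to_download, handle_manually


if __name__ == "__main__":
    inventory = [
        ("Clinical types", "type_system"),
        ("Drug names", "editable_dictionary"),
        ("Batch 3 task", "annotation_task"),
    ]
    print(plan_backup(inventory))
```

Anything in the second list needs a manual step from the preparation tasks that follow, such as finishing the annotation work or noting where a read-only dictionary came from.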
Preparing for backup
To prepare for backing up and restoring your data, complete the following steps:
-
Complete any work that is in progress in your workspace.
-
Finish any in-progress annotation tasks. Only documents that have been annotated, adjudicated, approved, and promoted to ground truth can be backed up. If you do not finish the annotation work, you will lose any annotation work that is in progress but incomplete.
-
If you created annotation tasks to track work that you want to do, but none of the annotation work has begun and will not take place until after the workspace is restored, then make a list of the outstanding annotation tasks. Be sure to note the document sets that you imported but that have not been added to the ground truth yet. Also, make a note of whom you assigned to annotate each document set. Re-upload these document sets and re-create the annotation tasks after the workspace is restored.
-
Understand tokenizer use.
For machine learning models, workspaces use the machine learning-based tokenizer by default. If you are using a dictionary-based tokenizer and have a specific need to continue doing so, you can configure the workspace to use the dictionary-based tokenizer when you restore it. For more information, see Tokenizers.
-
Manage model resources.
Your model, its versions, and snapshot data cannot be migrated. However, the resources that you used to train them can be migrated, except for read-only dictionaries, which you must re-upload from their original source. Therefore, after the migration, you can re-create the model. Because it is trained on the same resources, the re-created model performs the same as the model that you generated before the migration.
If you have a model that is already deployed and you plan to delete the workspace after you back it up, withdraw the model from deployment. You can rebuild and redeploy the model after you restore the workspace from the backup. For information about undeploying models, see Undeploying machine learning models and Undeploying rule-based models.
If you fail to withdraw the model from deployment, the result is an orphaned deployed model. Orphaned deployed models continue to generate charges on your monthly bills.
-
Manage read-only dictionary information.
Read-only dictionaries cannot be migrated. Find out where the read-only dictionary was imported from, so you can re-upload it to your workspace after the migration.
-
Make a list of current user roles.
This step is optional. It is typically performed when a workspace is migrated across instances and the new workspace must be identical to the original workspace. If you want to migrate only the workspace data into a new workspace, you can skip this step.
If you are migrating workspaces across different instances, consider making a list of users and their roles for the instance that you are backing up. Someone with the Admin role can print the list from the User Account Management page. After the workspaces are re-created on the new instance, someone with the Admin role must add the users and assign their roles.
For more information about roles, see User roles in Knowledge Studio.
-
Make note of workspace information.
While you still have access to the current instance, for each workspace that you want to migrate, make a note of the following information:
- Workspace name
- Workspace description
- Workspace owner
- Language
- Incomplete annotation tasks, if any. Because these tasks cannot be backed up or migrated, note the human annotators who are assigned to them and the task details, such as the task name, due date, and which document sets are assigned to which users.
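Because this information is no longer available once you lose access to the original instance, you might keep the notes in a machine-readable file alongside the downloaded artifacts. The following sketch assumes Python 3.9 or later. The record classes and field names are hypothetical and simply mirror the list above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AnnotationTaskNote:
    # An incomplete task that must be re-created by hand after the restore.
    name: str
    due_date: str                      # free-form; pick one format and keep it
    assignments: dict[str, list[str]]  # annotator -> document set names


@dataclass
class WorkspaceNote:
    name: str
    description: str
    owner: str
    language: str  # must match when the workspace is re-created
    incomplete_tasks: list[AnnotationTaskNote] = field(default_factory=list)


def notes_to_json(notes):
    """Serialize workspace notes; asdict also converts the nested task notes."""
    return json.dumps([asdict(n) for n in notes], indent=2)


def save_notes(notes, path):
    """Write the notes to a file that you store with the other backup artifacts."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(notes_to_json(notes))
```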
Downloading artifacts
For each workspace that you want to migrate, download the following artifacts. Store them in a secure location from which you will be able to upload them into the new instance later.
-
Type system
-
Dictionaries
Note:
- Only editable dictionaries will be downloaded. You cannot download read-only dictionaries.
- For dictionaries, entity type mappings are not migrated. After you restore these artifacts, you will need to map the dictionaries to entity types, as necessary.
-
Documents
For more information about how to download these artifacts so that they can be imported into a new workspace, see Uploading resources from another workspace.
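Before you lose access to the original instance, you might check that each workspace backup is complete. This Python sketch assumes a hypothetical local layout with one subfolder per downloaded artifact type (`type_system`, `dictionaries`, `documents`) under a folder for each workspace; adjust the names to match how you actually store the files.

```python
from pathlib import Path

# Hypothetical folder names, one per artifact type downloaded above.
EXPECTED = ("type_system", "dictionaries", "documents")


def missing_artifacts(workspace_dir):
    """Return the artifact folders that are absent or empty in one workspace backup.

    A workspace that has no editable dictionaries legitimately reports
    "dictionaries", because read-only dictionaries cannot be downloaded.
    """
    root = Path(workspace_dir)
    missing = []
    for name in EXPECTED:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```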
Re-creating workspaces
In this step, the only setting that must match the setting of the original (downloaded) workspace is the language. The rest of the settings can differ from the settings of the original workspace.
Re-create each workspace by copying the following information from the previous instance to the new one:
-
Workspace name
-
Workspace description
-
Language of documents (this setting must match the setting in the original workspace)
-
If you previously used a dictionary-based tokenizer in the workspace and have a valid need to continue using it, you must specify that you want to use the dictionary-based tokenizer instead of the default tokenizer when you create the workspace. For more information about the options, see Tokenizers.
To use a dictionary-based tokenizer, expand the Advanced Options section of the "Create Workspace" window (in IBM Marketplace, the "Create Project" window) and change the tokenizer setting.
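If you recorded the original workspace settings, a quick comparison can confirm that the one required match, the language, holds, and list every other setting that changed so that you can confirm each difference is intentional. The dictionary keys in this Python sketch are hypothetical.

```python
def compare_settings(original, recreated):
    """Compare workspace settings recorded as dicts.

    Returns (errors, differences): a language mismatch is an error; any other
    differing setting, such as the name or tokenizer, is allowed but listed.
    """
    errors, differences = [], []
    for key in sorted(set(original) | set(recreated)):
        old, new = original.get(key), recreated.get(key)
        if old == new:
            continue
        if key == "language":
            errors.append(f"language must match: {old!r} != {new!r}")
        else:
            differences.append(key)
    return errors, differences
```

A tokenizer difference shows up in the second list rather than as an error, because switching to the default tokenizer is permitted.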
Restoring workspace data
After re-creating the workspaces, upload the previously downloaded artifacts:
-
Upload the type system from the previously created type system backup. For more information, see Uploading resources from another workspace.
You must upload the type system before you can upload any other artifacts that you are moving from the backed-up workspace.
-
Upload the dictionaries from the previously created dictionary backup. For more information, see Uploading resources from another workspace.
If you used any read-only dictionaries in the previous version of the workspace, re-upload them into this workspace from their original source.
-
For dictionary pre-annotators, associate the dictionaries with an entity type. Dictionaries that don't have mappings for entity types will not apply annotations when you pre-annotate documents.
-
Upload the documents that you downloaded from the previous version of the workspace into this version of the workspace. For more information, see Uploading resources from another workspace.
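The ordering rules in these steps can be checked before you start uploading. This Python sketch uses hypothetical step names and encodes two constraints: the type system comes first, and dictionaries are uploaded before they are mapped to entity types.

```python
def restore_plan_errors(steps):
    """Return a list of ordering problems in a planned restore sequence."""
    errors = []
    # The type system must precede every other artifact.
    if not steps or steps[0] != "type_system":
        errors.append("upload the type system before any other artifact")
    # Dictionaries can be mapped to entity types only after they are uploaded.
    if "map_dictionaries" in steps and (
        "dictionaries" not in steps
        or steps.index("dictionaries") > steps.index("map_dictionaries")
    ):
        errors.append("upload dictionaries before mapping them to entity types")
    return errors
```

For example, the sequence `["type_system", "dictionaries", "map_dictionaries", "documents"]` produces no errors.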
Restoring models
At this point, all the artifacts that were used to train the model in the previous (backed-up) version of the workspace are available in the new instance.
To redeploy a machine learning model that you deployed in the previous instance, complete the following steps:
-
Train the machine learning model.
Do not run pre-annotators on annotated documents that you migrated to this workspace. If you do, those documents lose the annotations that human annotators added.
-
After creating the model, deploy it again. For more information, see Using the machine learning model.
To redeploy a rule-based model that you deployed in the previous instance, complete the following steps:
Restoring incomplete annotation tasks
If you had any annotation tasks that were created but not completed in the previous workspace, complete the following steps to re-create the incomplete annotation tasks:
- Upload any documents that have not been annotated yet but that you want to add to the ground truth to continue to improve the model.
- From the newly imported and unannotated documents, create annotation sets.
- Re-create the annotation tasks. Give each task the same name as the original task and an appropriate due date, and assign the annotation sets to the appropriate human annotators.
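If you noted which document sets were assigned to which annotators before the migration, those assignments can be translated to the annotation sets that you created on the new workspace. This Python sketch is hypothetical: it assumes that you keep a mapping from each original document set name to the annotation set that replaces it, and it reports any recorded set that has no replacement yet.

```python
def plan_task_recreation(assignments, set_mapping):
    """Translate annotator -> document set assignments to annotation sets.

    assignments: annotator -> list of original document set names
    set_mapping: original document set name -> new annotation set name
    Returns (new_assignments, unmapped), where unmapped lists recorded
    document sets that still need to be uploaded or turned into sets.
    """
    new_assignments, unmapped = {}, []
    for annotator, doc_sets in assignments.items():
        for doc_set in doc_sets:
            if doc_set in set_mapping:
                new_assignments.setdefault(annotator, []).append(set_mapping[doc_set])
            else:
                unmapped.append(doc_set)
    return new_assignments, unmapped
```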