Backing up and restoring data
If you need to backup and restore a workspace for Knowledge Studio, perform these tasks. Also, if you need to perform a manual data migration from one Knowledge Studio instance to another, perform these tasks.
This topic describes backing up and restoring workspace data, such as type systems and ground truth. For information on backing up and restoring databases, see Backing up and restoring databases.
Manual data migration is the process of backing up your data from one instance and restoring it on another instance.
To back up and restore your data complete the following steps:
Data that can be backed up
The following artifacts can be backed up and migrated manually:
- Editable dictionaries
- Type system
- Approved ground truth document sets
The following types of artifacts cannot be backed up and migrated manually:
- In-progress human annotation documents
- Annotation tasks
- Models and snapshots
- Read-only dictionaries
Preparing for backup
To prepare for backing up and restoring your data, complete the following steps:
-
Complete any work that is in progress in your workspace.
-
Important: Finish any in-progress annotation tasks. Only documents that have been annotated, adjudicated, and approved and promoted to ground truth can be backed up. If you do not finish the annotation work, you will lose any annotation effort that is in progress, but not completely done.
-
If you created annotation tasks to track work that you want to do, but none of the annotation work has begun and will not take place until after the workspace is restored, then make a list of the outstanding annotation tasks. Be sure to note the document sets that you imported, but that have not been added to the ground truth yet. Also, make a note of who you assigned to annotate each document set. You will need to reupload these document sets and recreate the annotation tasks after the workspace is restored.
-
-
Understand tokenizer use.
For machine learning models, workspaces use the machine learning-based tokenizer by default. If you are using a dictionary-based tokenizer and have a specific need to continue doing so, you can configure the workspace to use the dictionary-based tokenizer when you restore it. For more information, see Tokenizers.
-
Manage model resources.
Your model, its versions, and snapshot data cannot be migrated. The resources (except read-only dictionaries) that you used to train those artifacts can be migrated. Therefore, after the migration, you can recreate the model. The model that will be produced will perform the same as the models you generated prior to the migration because the resources that are used for training will be the same.
ATTENTION: If you have a model that is already deployed and you plan to delete the workspace after you back it up, withdraw the model from deployment. You can rebuild and redeploy the model after you restore the workspace from the backup. For information about undeploying models, see Undeploying machine learning models and Undeploying rule-based models.
Be aware that if you fail to withdraw the model from deployment, the result is an orphaned deployed model. Orphaned deployed models will continue to generate charges on your monthly bills.
-
Manage read-only dictionary information.
Read-only dictionaries cannot be migrated. Find out where the read-only dictionary was imported from, so you can reupload it to your workspace after the migration.
-
Make a list of current user roles.
NOTE: This step is optional. It is typically performed when a workspace is migrated across instances and the new workspace must be identical to the original workspace. If you want to migrate only the workspace data into a new workspace, you can skip this step.
If you are migrating workspaces across different instances, consider making a list of users and their roles for the instance that you are backing up. Someone with the Admin role can print the list from the User Account Management page. After the workspaces are recreated on the new instance, someone with the Admin role must add the users and assign their roles.
For more information about roles, see User roles in Knowledge Studio.
-
Make note of workspace information.
While you still have access to the current instance, for each workspace that you want to migrate, make a note of the following information:
- Workspace name
- Workspace description
- Workspace owner
- Language
- If you have any incomplete annotation tasks, because they can't be backed up or migrated, note the human annotators who are assigned to incomplete tasks in the workspace. Also note the annotation task details, such as the task name, due date, and which document sets are assigned to which users.
Downloading artifacts
For each workspace that you want to migrate, download the following artifacts. Store them in a secure location from which you will be able to upload them into the new instance later.
-
Type system
-
Dictionaries
Note:
- Only editable dictionaries will be downloaded. You cannot download read-only dictionaries.
- For dictionaries, entity type mappings are not migrated. After you restore these artifacts, you will need to map the dictionaries to entity types, as necessary.
-
Documents
See Uploading resources from another workspace for information about how to download these artifacts so that they can be imported into a new workspace.
Recreating workspaces
NOTE: In this step, the only setting that must match the setting of the original (downloaded) workspace is the language. The rest of the settings can differ from the settings of the original workspace.
Recreate each workspace by copying the following information from the previous instance to the new one:
-
Workspace name
-
Workspace description
-
Language of documents (this setting must match the setting in the original workspace)
-
If you previously used a dictionary-based tokenizer in the workspace, and have a valid need to continue using it, you must specify that you want to use the dictionary-based tokenizer instead of the default tokenizer at the time that you create the workspace. For more information about the options, see Tokenizers.
To use a dictionary-based tokenizer, expand the Advanced Options section of the Create Workspace window and change the tokenizer setting.
Restoring workspace data
After recreating the workspaces, upload the previously downloaded artifacts:
-
Upload the type system from the previously created type system backup. For details, see Uploading resources from another workspace.
Note: You must upload the type system before you can upload any other artifacts that you are moving from the backed up workspace.
-
Upload the dictionaries from the previously created dictionary backup. For details, see Uploading resources from another workspace.
If you used any read-only dictionaries in the previous version of the workspace, reupload them into this workspace from their original source.
-
For dictionary pre-annotators, associate the dictionaries with an entity type. Dictionaries that don't have mappings for entity types will not apply annotations when you pre-annotate documents.
-
Upload the documents that you downloaded from the previous version of the workspace into this version of the workspace. For details, see Uploading resources from another workspace.
Restoring models
At this point, all the artifacts that were used to train the model in the previous (backed up) version of the workspace are now available in this new instance.
To redeploy a machine learning model that you deployed in the previous instance, complete the following steps:
-
After creating the model, deploy it again. For details, see Using the machine learning model.
To redeploy a rule-based model that you deployed in the previous instance, complete the following steps:
Restoring incomplete annotation tasks
If you had any annotation tasks that were created, but not completed in the previous workspace, complete the following steps to recreate the incomplete annotation tasks:
- Upload any documents that have not been annotated yet, but that you want to add to the ground truth to continue to improve the model.
- From the newly imported and unannotated documents, create annotation sets.
- Recreate the annotation tasks. Give the task the same name, an appropriate due date, and assign annotation sets to the appropriate human annotators.