Backing up and restoring databases (version 1.1.2)

You can back up and restore databases in IBM Watson® Knowledge Studio for IBM Cloud Pak® for Data version 1.1.2 by running scripts.

The all-backup-restore.sh script backs up or restores all the databases. It deactivates the pods to prevent access before it runs and reactivates them when it completes. With the individual database scripts, you must deactivate and reactivate the pods yourself.

For more information about backing up databases with previous versions, see v1.1.1, v1.0.1, or v1.0.0. For more information about how to back up and restore workspace data, such as type systems and ground truth, see Backing up and restoring data.

Before you begin

  • Download the scripts.

  • Review information about the scripts' use of the MinIO client. The client is used to run the MinIO commands.

    • The scripts download the client from the MinIO website if the client isn't installed.
    • If you want the script to use your installed version, verify that you can run the client by issuing the command mc on the command line.
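To confirm that an installed client is reachable, you can look for `mc` on the PATH before you run the scripts. The following is a minimal sketch. The `have_cmd` helper is hypothetical and is not part of the downloaded scripts.

```shell
# have_cmd is a hypothetical helper, not part of the Knowledge Studio scripts.
# It succeeds when the named command can be found on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd mc; then
  echo "mc found: the scripts can use the installed MinIO client"
else
  echo "mc not found: the scripts download the client from the MinIO website"
fi
```

Finding a command named `mc` does not prove that it is the MinIO client, because other tools (for example, Midnight Commander) install a binary with the same name.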

all-backup-command script

The all-backup-restore.sh script backs up or restores the MongoDB, PostgreSQL, and MinIO databases and the PVC data.

Unless you need to back up a single database, use the all-backup-command script, which deactivates and reactivates Knowledge Studio.

The script backs up or restores the data in the following order:

  1. MongoDB
  2. PostgreSQL
  3. MinIO
  4. PVC

all-backup-restore.sh backup | restore RELEASE_NAME BACKUP_DIR -n NAMESPACE

Use either the backup or restore command.

backup
Backs up each database to a subdirectory of the `BACKUP_DIR` directory.
restore
Restores the data from each database in the `BACKUP_DIR` directory.

Arguments and options

RELEASE_NAME
The release name that was specified when the Knowledge Studio Helm chart was installed in your cluster. Required.
For version 1.1.2, the value is `wks`.
BACKUP_DIR
The base directory where backups are stored or from which they are restored. Each database is stored in its own subdirectory of the backup directory (`mongodb`, `postgresql`, `minio`, or `pvc`). Required.
-n NAMESPACE
Namespace for the pods.
The default value is `zen`.
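The arguments can be assembled as in the following sketch. The `build_all_cmd` helper and the `/tmp/wks-backup` directory are hypothetical. Only the resulting command line follows the usage that is described in this section. The helper prints the command rather than running it, and it assumes that the script is in the current directory.

```shell
# build_all_cmd is a hypothetical helper. It validates the arguments and
# prints the all-backup-restore.sh command line so that it can be reviewed.
build_all_cmd() {
  bc_command=${1:-}
  bc_release=${2:-}
  bc_dir=${3:-}
  bc_ns=${4:-zen}   # NAMESPACE defaults to zen, as documented above
  case "$bc_command" in
    backup|restore) ;;
    *) echo "command must be backup or restore" >&2; return 1 ;;
  esac
  if [ -z "$bc_release" ] || [ -z "$bc_dir" ]; then
    echo "RELEASE_NAME and BACKUP_DIR are required" >&2
    return 1
  fi
  echo "./all-backup-restore.sh $bc_command $bc_release $bc_dir -n $bc_ns"
}

# /tmp/wks-backup is an example path only.
build_all_cmd backup wks /tmp/wks-backup
# prints: ./all-backup-restore.sh backup wks /tmp/wks-backup -n zen
```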

Output

When the process succeeds, the script returns the following output and indicates whether the backup or restore command was run:

[SUCCESS] MongoDB,PostgreSQL,Minio,PVC

If the process fails, the script displays the following message and indicates whether the backup or restore command was run:

[FAIL] MongoDB,PostgreSQL,Minio,PVC

If the script fails, the data is corrupted. Do not use the corrupted data to restore.
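If you run the script from automation, you can scan its output for these markers. The following is a minimal sketch. The `classify_result` helper is hypothetical, and it assumes that the `[SUCCESS]` or `[FAIL]` text appears somewhere in the captured output.

```shell
# classify_result is a hypothetical helper. It reads script output on stdin
# and reports whether a [FAIL] or [SUCCESS] marker was seen.
classify_result() {
  cr_output=$(cat)
  case "$cr_output" in
    *"[FAIL]"*)    echo "fail";    return 1 ;;
    *"[SUCCESS]"*) echo "success"; return 0 ;;
    *)             echo "unknown"; return 2 ;;
  esac
}

# Demonstration with the success message from this section.
printf '%s\n' '[SUCCESS] MongoDB,PostgreSQL,Minio,PVC' | classify_result
# prints: success
```

In practice, the input would be the piped output of all-backup-restore.sh. A `fail` or `unknown` result is a signal not to use that backup directory for a restore.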

Scripts location

The backup and restore scripts for version 1.1.2 are available from the knowledge-studio/1.1.2 directory of the watson-developer-cloud/doc-tutorial-downloads GitHub project. Download the scripts and the contents of the lib directory.

Backing up and restoring all databases

The all-backup-restore.sh script backs up or restores the databases. The script deactivates the pods before it backs up or restores data and then reactivates the pods when the script completes.

Backing up all databases

Run the all-backup-restore.sh script with the backup command.

Restoring all databases

Run the all-backup-restore.sh script with the restore command.
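Before a restore, it can help to confirm that the backup directory has the expected layout. The following sketch checks for the four subdirectories that are named in the BACKUP_DIR description. The `check_backup_layout` helper is hypothetical and is not part of the downloaded scripts.

```shell
# check_backup_layout is a hypothetical pre-restore check. It verifies that
# BACKUP_DIR contains the mongodb, postgresql, minio, and pvc subdirectories.
check_backup_layout() {
  cl_dir=$1
  cl_missing=0
  for cl_sub in mongodb postgresql minio pvc; do
    if [ ! -d "$cl_dir/$cl_sub" ]; then
      echo "missing: $cl_dir/$cl_sub" >&2
      cl_missing=1
    fi
  done
  return "$cl_missing"
}

# Example use (the path is a placeholder):
#   check_backup_layout /tmp/wks-backup && echo "layout looks complete"
```

The check confirms only that the directories exist. It does not validate the contents of the backup.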

Database-specific scripts

If you need to back up or restore a single database, use one of the database-specific scripts. However, make sure that you deactivate the pods before you run the script and reactivate the pods after the script completes successfully.
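The deactivate, run, and reactivate sequence can be wrapped so that reactivation happens only after the script succeeds. The following is a sketch of that pattern. The `run_guarded` helper and the function names that are passed to it are hypothetical. Leaving the pods stopped after a failure is an assumption of this sketch, not documented behavior.

```shell
# run_guarded is a hypothetical helper. It takes the names of three shell
# functions: one that deactivates the pods, one that runs a backup or
# restore script, and one that reactivates the pods.
run_guarded() {
  rg_deactivate=$1
  rg_task=$2
  rg_reactivate=$3
  "$rg_deactivate" || { echo "deactivation failed" >&2; return 1; }
  if "$rg_task"; then
    "$rg_reactivate"
  else
    # Assumption: keep the pods stopped so the failure can be investigated.
    echo "script failed; pods were not reactivated" >&2
    return 1
  fi
}

# Example wiring (function bodies are placeholders that you supply):
#   backup_mongodb() { ./mongodb-backup-restore.sh backup wks /tmp/wks-backup -n zen; }
#   run_guarded deactivate_ks backup_mongodb reactivate_ks
```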

MongoDB

Use this script instead of all-backup-restore.sh to back up or restore only the MongoDB database.

mongodb-backup-restore.sh backup | restore RELEASE_NAME BACKUP_DIR -n NAMESPACE

Use either the backup or restore command. For more information about the arguments and options, see the all-backup-command script.

Backing up MongoDB

Back up your MongoDB data. Databases named WKSDATA, ENVDATA, and escloud_sbsep store data for Knowledge Studio.

  1. Deactivate Knowledge Studio.
  2. Run the mongodb-backup-restore.sh script with the backup command. The script runs the following operations:
    1. Creates a remote temporary file under the MongoDB pod and extracts the following data: WKSDATA, ENVDATA, and escloud_sbsep.
    2. Copies the WKSDATA, ENVDATA, and escloud_sbsep data to the BACKUP_DIR that you specify and deletes the temporary file.
  3. Reactivate Knowledge Studio.

Restoring MongoDB data

Restore the backed-up data to MongoDB.

  1. Deactivate Knowledge Studio.
  2. Run the mongodb-backup-restore.sh script with the restore command. The script runs the following operations:
    1. Creates a remote temporary file under the MongoDB pod.
    2. Copies the WKSDATA, ENVDATA, and escloud_sbsep data from the BACKUP_DIR that you specify to the remote temporary file.
    3. Restores the data from the temporary file and deletes the temporary file.
  3. Reactivate Knowledge Studio.

PostgreSQL

Use this script instead of all-backup-restore.sh to back up or restore only the PostgreSQL database.

postgresql-backup-restore.sh backup | restore RELEASE_NAME BACKUP_DIR -n NAMESPACE

Use either the backup or restore command. For more information about the arguments and options, see the all-backup-command script.

Backing up PostgreSQL

Back up your PostgreSQL data by getting a data dump. Databases named awt, jobq_RELEASE_NAME_, model_management_api, and model_management_api_v2 store data for Knowledge Studio.

  1. Deactivate Knowledge Studio.
  2. Run the postgresql-backup-restore.sh script with the backup command. The script runs the following operations:
    1. Creates and sets up a .pgpass file.
    2. Dumps the databases. The filenames are the database names with the .custom extension.
    3. Copies the dump files to the BACKUP_DIR that you specify.
    4. Deletes the .pgpass file.
  3. Reactivate Knowledge Studio.

Restoring PostgreSQL data

Restore the backed-up data to PostgreSQL.

  1. Deactivate Knowledge Studio.
  2. Run the postgresql-backup-restore.sh script with the restore command. The script runs the following operations:
    1. Creates and sets up a .pgpass file.
    2. Restores the databases by loading the .custom files from the BACKUP_DIR that you specify.
    3. Deletes the .pgpass file.
  3. Reactivate Knowledge Studio.

MinIO

Use this script instead of all-backup-restore.sh to back up or restore only the MinIO database.

minio-backup-restore.sh backup | restore RELEASE_NAME BACKUP_DIR -n NAMESPACE

Use either the backup or restore command. For more information about the arguments and options, see the all-backup-command script.

Backing up MinIO

Back up your MinIO database by taking a snapshot of the data. A bucket named wks-icp stores data for Knowledge Studio.

  1. Deactivate Knowledge Studio.
  2. Run the minio-backup-restore.sh script with the backup command. The script runs the following operations:
    1. Establishes a connection to the pod RELEASE_NAME-ibm-minio by running kubectl -n NAMESPACE port-forward.
    2. Configures a MinIO alias named wks-minio.
    3. Copies data from wks-minio/wks-icp to the BACKUP_DIR you specify.
    4. Closes the port-forward connection.
  3. Reactivate Knowledge Studio.

Restoring MinIO data

Restore the snapshot data to MinIO. The script deletes the existing data in the MinIO server, and then restores the backup data.

  1. Deactivate Knowledge Studio.
  2. Run the minio-backup-restore.sh script with the restore command. The script runs the following operations:
    1. Establishes a connection to the pod RELEASE_NAME-ibm-minio by running kubectl -n NAMESPACE port-forward.
    2. Configures a MinIO alias named wks-minio.
    3. Copies data from the BACKUP_DIR you specify to wks-minio/wks-icp.
    4. Closes the port-forward connection.
  3. Reactivate Knowledge Studio.

PVC

Use this script instead of all-backup-restore.sh to back up or restore only the persistent volume claim (PVC) data.

pvc-backup-restore.sh backup | restore RELEASE_NAME BACKUP_DIR DOCKERREGISTRY PVC_USER_ID -n NAMESPACE

Arguments for PVC

DOCKERREGISTRY
The same Docker registry as the `RELEASE_NAME-ibm-watson-ks-aql-web-tooling` pod.
PVC_USER_ID
The user ID for the running containers in the `RELEASE_NAME-ibm-watson-ks-aql-web-tooling` pod.

Use either the backup or restore command. For more information about the other arguments and options, see the all-backup-command script.

Backing up PVC

  1. Identify the name of the Docker registry and the user ID before you deactivate Knowledge Studio.
  2. Deactivate Knowledge Studio.
  3. Run the pvc-backup-restore.sh script with the backup command. The script runs the following operations:
    1. Creates a temporary pod at RELEASE_NAME-ibm-watson-ks-aql-web-tooling-backup. Compresses /opt/ibm/watson/aql-web-tooling/target/sandbox and saves it as sandbox.tgz to RELEASE_NAME-ibm-watson-ks-aql-web-tooling-backup.
    2. Copies sandbox.tgz to the BACKUP_DIR that you specify.
    3. Deletes the temporary pod.
  4. Reactivate Knowledge Studio.

Restoring PVC data

Restore data to the PVC. The script deletes the existing data in the sandbox, and then restores the backup data.

  1. Deactivate Knowledge Studio.
  2. Run the pvc-backup-restore.sh script with the restore command. The script runs the following operations:
    1. Copies sandbox.tgz from the BACKUP_DIR that you specify to a temporary pod at RELEASE_NAME-ibm-watson-ks-aql-web-tooling-backup.
    2. Deletes the data in /opt/ibm/watson/aql-web-tooling/target/sandbox.
    3. Decompresses sandbox.tgz to /opt/ibm/watson/aql-web-tooling/target/sandbox.
    4. Deletes the temporary pod.
  3. Reactivate Knowledge Studio.

Deactivate Knowledge Studio

You don't need to deactivate when you run the all-backup-restore.sh script because the script handles the process.

To ensure that users don't have access to Knowledge Studio when you back up or restore a single database, stop the Knowledge Studio front-end pods before you back up or restore data.

  1. Make sure that no training and evaluation processes are running. You can check job status with the following command:

    kubectl -n NAMESPACE get jobs
    

    Training jobs of Knowledge Studio are named in the format wks-train-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, and evaluation jobs are named in the format wks-batch-apply-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. If the COMPLETIONS column of a job reads 0/1, that job is still running. Wait until all of the training and evaluation jobs finish.

  2. List information about the deployment:

    kubectl -n NAMESPACE get deployment RELEASE_NAME-ibm-watson-ks
    kubectl -n NAMESPACE get deployment RELEASE_NAME-sire-training-jobq
    kubectl -n NAMESPACE get deployment RELEASE_NAME-ibm-watson-mma-prod-model-management-api
    kubectl -n NAMESPACE get deployment RELEASE_NAME-ibm-watson-ks-servicebroker
    kubectl -n NAMESPACE get deployment RELEASE_NAME-ibm-watson-ks-aql-web-tooling
    kubectl -n NAMESPACE get deployment RELEASE_NAME-ibm-watson-ks-glimpse-builder
    kubectl -n NAMESPACE get deployment RELEASE_NAME-ibm-watson-ks-glimpse-query
    

    where NAMESPACE is the namespace where Knowledge Studio is deployed and RELEASE_NAME is the name that was specified when the Knowledge Studio Helm chart was installed in your cluster.

    Make sure to note the number of pods in the DESIRED column so you can restore the same number later.

  3. Temporarily stop the pods by issuing the following commands. Make sure you know the number of pods from the previous step.

    kubectl -n NAMESPACE scale deployment RELEASE_NAME-ibm-watson-ks --replicas=0
    kubectl -n NAMESPACE scale deployment RELEASE_NAME-sire-training-jobq --replicas=0
    kubectl -n NAMESPACE scale deployment RELEASE_NAME-ibm-watson-mma-prod-model-management-api --replicas=0
    kubectl -n NAMESPACE scale deployment RELEASE_NAME-ibm-watson-ks-servicebroker --replicas=0
    kubectl -n NAMESPACE scale deployment RELEASE_NAME-ibm-watson-ks-aql-web-tooling --replicas=0
    kubectl -n NAMESPACE scale deployment RELEASE_NAME-ibm-watson-ks-glimpse-builder --replicas=0
    kubectl -n NAMESPACE scale deployment RELEASE_NAME-ibm-watson-ks-glimpse-query --replicas=0
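The job check and the scale-down commands in the preceding steps can be scripted as in the following sketch. Both helper functions are hypothetical. The first one assumes the standard `kubectl get jobs` column layout, in which COMPLETIONS is the second column, and it treats evaluation jobs the same way as training jobs.

```shell
# Hypothetical helpers. Only the kubectl commands follow the steps above.

# Reads `kubectl get jobs` output on stdin and prints the names of training
# or evaluation jobs whose COMPLETIONS column still reads 0/1.
running_wks_jobs() {
  awk 'NR > 1 && $2 == "0/1" && ($1 ~ /^wks-train-/ || $1 ~ /^wks-batch-apply-/) { print $1 }'
}

# Prints one scale-to-zero command for each deployment in step 3.
scale_down_cmds() {
  sd_ns=$1
  sd_release=$2
  for sd_suffix in ibm-watson-ks sire-training-jobq \
      ibm-watson-mma-prod-model-management-api ibm-watson-ks-servicebroker \
      ibm-watson-ks-aql-web-tooling ibm-watson-ks-glimpse-builder \
      ibm-watson-ks-glimpse-query; do
    echo "kubectl -n $sd_ns scale deployment $sd_release-$sd_suffix --replicas=0"
  done
}

# Example use (zen and wks are the documented defaults for version 1.1.2):
#   kubectl -n zen get jobs | running_wks_jobs
#   scale_down_cmds zen wks
```

The second helper only prints the commands, so you can record the current replica counts and review the list before you run anything.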
    

Reactivate Knowledge Studio

You don't need to reactivate when you run the all-backup-restore.sh script because the script handles the process.

To reactivate Knowledge Studio, start the pods that you stopped before you began backing up or restoring data. If you stopped pods with the kubectl scale command, you can start the pods with the following commands:

kubectl -n NAMESPACE scale deployment RELEASE_NAME-sire-training-jobq --replicas=JOBQ_PODS
kubectl -n NAMESPACE scale deployment RELEASE_NAME-ibm-watson-mma-prod-model-management-api --replicas=MMA_PODS
kubectl -n NAMESPACE scale deployment RELEASE_NAME-ibm-watson-ks --replicas=FRONTEND_PODS
kubectl -n NAMESPACE scale deployment RELEASE_NAME-ibm-watson-ks-servicebroker --replicas=BROKER_PODS
kubectl -n NAMESPACE scale deployment RELEASE_NAME-ibm-watson-ks-aql-web-tooling --replicas=AWT_PODS
kubectl -n NAMESPACE scale deployment RELEASE_NAME-ibm-watson-ks-glimpse-builder --replicas=GLIMPSE_BUILDER_PODS
kubectl -n NAMESPACE scale deployment RELEASE_NAME-ibm-watson-ks-glimpse-query --replicas=GLIMPSE_QUERY_PODS

where the values refer to the number of pods that you deactivated earlier:

  • FRONTEND_PODS: The number of front-end pods that you stopped.
  • JOBQ_PODS: The number of SIRE job queue pods that you stopped.
  • MMA_PODS: The number of MMA pods that you stopped.
  • BROKER_PODS: The number of Service Broker pods that you stopped.
  • AWT_PODS: The number of AQL Web Tooling pods that you stopped.
  • GLIMPSE_BUILDER_PODS: The number of Glimpse builder pods that you stopped.
  • GLIMPSE_QUERY_PODS: The number of Glimpse query pods that you stopped.
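If you recorded the replica counts in a simple list, the reactivation commands can be generated from it. The following is a minimal sketch. The `scale_up_cmds` helper, the "SUFFIX REPLICAS" input format, and the example counts are all hypothetical.

```shell
# scale_up_cmds is a hypothetical helper. It reads "SUFFIX REPLICAS" pairs on
# stdin (the counts that you noted before deactivation) and prints the
# matching kubectl scale commands.
scale_up_cmds() {
  su_ns=$1
  su_release=$2
  while read -r su_suffix su_count; do
    [ -n "$su_suffix" ] || continue   # skip empty lines
    echo "kubectl -n $su_ns scale deployment $su_release-$su_suffix --replicas=$su_count"
  done
}

# The counts below are placeholders. Use the values that you recorded.
scale_up_cmds zen wks <<'EOF'
sire-training-jobq 1
ibm-watson-mma-prod-model-management-api 1
ibm-watson-ks 1
EOF
```

Listing the deployments in the same order as the commands above keeps the generated output aligned with the documented sequence.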