IBM Cloud Docs
Backing up and restoring data

Documentation about IBM Watson® Assistant for IBM Cloud Pak® for Data has moved. For the most up-to-date version, see Backing up and restoring data for IBM Cloud Pak for Data.

Backing up and restoring data

You can back up and restore the data that is associated with your Watson Assistant deployment in IBM Cloud Pak for Data.

The primary data storage for Watson Assistant is a Postgres database. Your data, such as workspaces, assistants, and skills, is stored in Postgres. Other internal data, such as trained models, can be recreated from the data in Postgres.

Choose one of the following ways to manage the backup of your data:

  • Kubernetes CronJob: Use the $INSTANCE-store-cronjob cron job that is provided for you.
  • backupPG.sh script: Use the backupPG.sh bash script.
  • pg_dump tool: Run the pg_dump tool on each cluster directly. This is the most manual option, but also affords the most control over the process.

When you back up data with one of these procedures before you upgrade from one version to another, the workspace IDs of your skills are preserved, but the service instance IDs and credentials change.

Before you begin

  • When you create a backup by using this procedure, the backup includes all of the assistants and skills from all of the service instances. This means the backup can include skills and assistants to which you do not have access.
  • The access permissions of the original service instances are not stored in the backup. This means the original access rights, which determine who can and cannot see a service instance, are not preserved.
  • You cannot use this procedure to back up the data that is returned by the search skill. Data that is retrieved by the search skill comes from a data collection in a Discovery instance. See the Discovery documentation to find out how to back up its data.
  • If you back up and restore or otherwise change the Discovery service that your search skill connects to, then you cannot restore the search skill, but must recreate it. When you set up a search skill, you map sections of the assistant's response to fields in a data collection that is hosted by an instance of Discovery on the same cluster. If the Discovery instance changes, your mapping to it is broken. If your Discovery service does not change, then the search skill can continue to connect to the data collection.
  • The tool that restores the data clears the current database before it restores the backup. Therefore, if you might need to revert to the current database, create a backup of it first.
  • The target IBM Cloud Pak for Data cluster where you restore the data must have the same number of provisioned Watson Assistant service instances as the environment from which you back up the database. To verify in the IBM Cloud Pak for Data web client, select Services from the main navigation menu, select Instances, and then open the Provisioned instances tab. If more than one user created instances, ask each of those users to log in and check how many instances they created, and then add up the totals for your deployment. Not even an administrative user can see instances that were created by other users from the web client user interface.

Backing up data by using the CronJob

A CronJob named $INSTANCE-store-cronjob is created and enabled for you automatically when you deploy the service. A CronJob is a type of Kubernetes controller. A CronJob creates Jobs on a repeating schedule. For more information, see CronJob in the Kubernetes documentation.

The jobs that are created by the store cron job are named $INSTANCE-backup-job-$TIMESTAMP. Each job deletes old logs and then backs up the store Postgres database by using pg_dump, a tool that Postgres provides to create a backup by sending the database contents to stdout, where you can write it to a file. The backups are stored in a persistent volume claim (PVC) named $INSTANCE-store-pvc.
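
For example, you might run commands like the following to confirm that the cron job exists, list the backup jobs that it created, and find the PVC that stores the dump files (a minimal sketch based on the naming convention described above):

# List the backup cron job and its schedule
oc get cronjob ${INSTANCE}-store-cronjob

# List the backup jobs that the cron job created
oc get jobs | grep ${INSTANCE}-backup-job

# Confirm the PVC where the backup files are stored
oc get pvc ${INSTANCE}-store-pvc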

You are responsible for moving the backup to a more secure location after its initial creation, preferably a location that can be accessed outside of the cluster and where the backups cannot be deleted easily. Ensure that this happens for all environments, especially for production clusters.

The following configuration values control the backup cron job. After the service is deployed, you can change these settings by running the oc edit cronjob $INSTANCE-store-cronjob command.

  • store.backup.suspend: If True, the cron job does not create any backup jobs. Default: False
  • store.backup.schedule: The time of day at which to run the backup jobs, specified as a cron expression in the form {minute} {hour} {day} {month} {day-of-week}, where {day-of-week} is 0=Sunday, 1=Monday, and so on. The default schedule runs every day at 11 PM. Default: 0 23 * * *
  • store.backup.history.jobs.success: The number of successful jobs to keep. Default: 30
  • store.backup.history.jobs.failed: The number of failed jobs to keep in the job logs. Default: 10
  • store.backup.history.files.weekly_backup_day: The day of the week that is designated as the weekly backup day (0=Sunday, 1=Monday, and so on). Default: 0
  • store.backup.history.files.keep_weekly: The number of backups to keep that were taken on weekly_backup_day. Default: 4
  • store.backup.history.files.keep_daily: The number of backups to keep that were taken on all other days of the week. Default: 6
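
For example, to check the current schedule or temporarily suspend backups without editing the full cron job definition, you can work with the CronJob object directly. This is a minimal sketch; it assumes that the suspend and schedule settings map to the standard fields of the Kubernetes CronJob spec:

# Show the current schedule and suspend status of the backup cron job
oc get cronjob $INSTANCE-store-cronjob -o jsonpath='{.spec.schedule} {.spec.suspend}{"\n"}'

# Suspend backup jobs, for example while you transfer backup files off the cluster
oc patch cronjob $INSTANCE-store-cronjob -p '{"spec":{"suspend":true}}'

# Resume backup jobs
oc patch cronjob $INSTANCE-store-cronjob -p '{"spec":{"suspend":false}}'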

Accessing backed-up files from Portworx

To access the backup files from Portworx, complete the following steps:

  1. Get the name of the persistent volume that is used for the Postgres backup:

    oc get pv |grep $INSTANCE-store
    

    This command returns the name of the persistent volume where the store backup is located, such as pvc-d2b7aa93-3602-4617-acea-e05baba94de3. This name is referred to later in this procedure as $pv_name.

  2. Find nodes where Portworx is running:

    oc get pods -n kube-system -o wide -l name=portworx-api
    
  3. Log in as the core user to one of the nodes where Portworx is running:

    ssh core@<node hostname>
    sudo su -
    
  4. Make sure the persistent volume is in a detached state and that no store backups are scheduled to occur during the time you plan to transfer the backup files.

    Remember, backups occur daily at 11 PM (in the time zone that is configured for the nodes) unless you change the schedule by editing the value of the store.backup.schedule configuration parameter. You can run the oc get cronjobs command to check the current schedule for the $INSTANCE-store-cronjob job. In the following command, $pv_name is the name of the persistent volume that you found in step 1:

    pxctl volume inspect $pv_name |head -40
    
  5. Attach the persistent volume to the host:

    pxctl host attach $pv_name
    
  6. Create a folder where you want to mount the volume:

    mkdir /var/lib/osd/mounts/voldir
    
  7. Mount the volume:

    pxctl host mount $pv_name --path /var/lib/osd/mounts/voldir
    
  8. Change to the /var/lib/osd/mounts/voldir directory and transfer the backup files to a secure location (see the example after this procedure). Afterward, exit the directory and unmount the volume:

    pxctl host unmount --path /var/lib/osd/mounts/voldir $pv_name
    
  9. Detach the volume from the host:

    pxctl host detach $pv_name
    
  10. Make sure the volume is in the detached state. Otherwise, subsequent backups will fail:

    pxctl volume inspect $pv_name |head -40
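
For step 8, the transfer method depends on your environment. For example, while you are still logged in to the node, you might copy the files with scp (a sketch; the backup file name, user, host, and target directory are placeholders):

cd /var/lib/osd/mounts/voldir
# Copy the backup file to a secure host outside the cluster
scp <backup-file-name> user@secure-backup-host:/backups/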
    

Accessing backed-up files from OpenShift Container Storage

To access the backup files from OpenShift Container Storage (OCS), complete the following steps:

  1. Create a volume snapshot of the persistent volume claim that is used for the Postgres backup:

    cat <<EOF | oc apply -f -
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: wa-backup-snapshot
    spec:
      source:
        persistentVolumeClaimName: ${INSTANCE_NAME}-store-pvc
      volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
    EOF
    
  2. Create a persistent volume claim from the volume snapshot:

    cat <<EOF | oc apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: wa-backup-snapshot-pvc
    spec:
      storageClassName: ocs-storagecluster-ceph-rbd
      accessModes:
      - ReadWriteOnce
      volumeMode: Filesystem
      dataSource:
        apiGroup: snapshot.storage.k8s.io
        kind: VolumeSnapshot
        name: wa-backup-snapshot
      resources:
        requests:
          storage: 1Gi     
    EOF
    
  3. Create a pod to access the persistent volume claim:

    cat <<EOF | oc apply -f -
    kind: Pod
    apiVersion: v1
    metadata:
      name: wa-retrieve-backup
    spec:
      volumes:
        - name: backup-snapshot-pvc
          persistentVolumeClaim:
           claimName: wa-backup-snapshot-pvc
      containers:
        - name: retrieve-backup-container
          image: cp.icr.io/cp/watson-assistant/conan-tools:20210630-0901-signed@sha256:e6bee20736bd88116f8dac96d3417afdfad477af21702217f8e6321a99190278
          command: ['sh', '-c', 'echo The pod is running && sleep 360000']
          volumeMounts:
            - mountPath: "/watson_data"
              name: backup-snapshot-pvc
    EOF
    
  4. If you do not know the name of the backup file that you want to extract and are unable to check the most recent backup cron job, run the following command:

    oc exec -it wa-retrieve-backup -- ls /watson_data
    
  5. Transfer the backup files to a secure location:

    kubectl cp wa-retrieve-backup:/watson_data/${FILENAME} ${SECURE_LOCAL_DIRECTORY}/${FILENAME}
    
  6. Run the following commands to clean up the resources that you created to retrieve the files:

    oc delete pod wa-retrieve-backup
    oc delete pvc wa-backup-snapshot-pvc
    oc delete volumesnapshot wa-backup-snapshot
    

Backing up data by using the script

The backupPG.sh script gathers the pod name and credentials for one of your Postgres pods, which is the pod from which the pg_dump command must be run, and then runs the command for you.

To back up data by using the provided script, complete the following steps:

  1. Download the backupPG.sh script.

    Go to GitHub, and open the directory for your version to find the file.

  2. Log in to the OpenShift project namespace where you installed the product.

  3. Run the script:

    ./backupPG.sh --instance ${INSTANCE} > ${BACKUP_DIR}
    

    Replace the following values in the command:

    • ${BACKUP_DIR}: Specify a file where you want to write the downloaded data. Be sure to include a backup directory in the path. For example, /bu/backup-file-name.dump stores the file in a backup directory named bu.
    • --instance ${INSTANCE}: Select the specific instance of Watson Assistant to be backed up.
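
For example, an invocation with concrete values might look like the following (the instance name and the target path are placeholders for illustration):

./backupPG.sh --instance assistant > /bu/assistant-backup-$(date +%Y%m%d).dump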

If you prefer to back up data by using the Postgres tool directly, you can complete the procedure to back up data manually.

Backing up data manually

Complete the steps in this procedure to back up your data by using the Postgres tool directly.

To back up your data, complete these steps:

  1. Fetch a running Postgres pod:

    oc get pods -l app=${INSTANCE}-postgres -o jsonpath="{.items[0].metadata.name}"
    

    Replace ${INSTANCE} with the instance of the Watson Assistant deployment that you want to back up.

  2. Fetch the store VCAP secret name:

    oc get secrets -l component=store,app.kubernetes.io/instance=${INSTANCE} -o=custom-columns=NAME:.metadata.name | grep store-vcap
    
  3. Fetch the Postgres connection values. You pass these values to the command that you run in the next step. You must have jq installed. (A consolidated sketch that gathers all of these values appears after this procedure.)

    • To get the database:

      oc get secret $VCAP_SECRET_NAME -o jsonpath="{.data.vcap_services}" | base64 --decode | jq --raw-output '.["user-provided"][]|.credentials|.database'
      
    • To get the hostname:

      oc get secret $VCAP_SECRET_NAME -o jsonpath="{.data.vcap_services}" | base64 --decode | jq --raw-output '.["user-provided"][]|.credentials|.host'
      
    • To get the username:

      oc get secret $VCAP_SECRET_NAME -o jsonpath="{.data.vcap_services}" | base64 --decode | jq --raw-output '.["user-provided"][]|.credentials|.username'
      
    • To get the password:

      oc get secret $VCAP_SECRET_NAME -o jsonpath="{.data.vcap_services}" | base64 --decode | jq --raw-output '.["user-provided"][]|.credentials|.password'
      
  4. Run the following command:

    oc exec $KEEPER_POD -- bash -c "export PGPASSWORD='$PASSWORD' && pg_dump -Fc -h $HOSTNAME -d $DATABASE -U $USERNAME" > ${BACKUP_DIR}
    

    The following list describes the arguments. You retrieved the values for some of these parameters in the previous step:

    • $KEEPER_POD: Any Postgres pod in your Watson Assistant instance.
    • ${BACKUP_DIR}: Specify a file where you want to write the downloaded data. Be sure to include a backup directory in the path. For example, /bu/backup-file-name.dump stores the file in a backup directory named bu.
    • $DATABASE: The store database name that was retrieved from the Store VCAP secret in step 3.
    • $HOSTNAME: The hostname that was retrieved from the Store VCAP secret in step 3.
    • $USERNAME: The username that was retrieved from the Store VCAP secret in step 3.
    • $PASSWORD: The password that was retrieved from the Store VCAP secret in step 3.

    To see more information about the pg_dump command, you can run this command:

    oc exec -it ${KEEPER_POD} -- pg_dump --help
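
As referenced in step 3, the preceding steps can be combined into a single sequence of shell commands. The following is a minimal sketch, assuming that jq is installed and that ${INSTANCE} and ${BACKUP_DIR} are already set:

# Find a running Postgres pod and the store VCAP secret
KEEPER_POD=$(oc get pods -l app=${INSTANCE}-postgres -o jsonpath="{.items[0].metadata.name}")
VCAP_SECRET_NAME=$(oc get secrets -l component=store,app.kubernetes.io/instance=${INSTANCE} -o=custom-columns=NAME:.metadata.name | grep store-vcap)

# Extract the Postgres connection values from the secret
VCAP=$(oc get secret $VCAP_SECRET_NAME -o jsonpath="{.data.vcap_services}" | base64 --decode)
DATABASE=$(echo "$VCAP" | jq --raw-output '.["user-provided"][]|.credentials|.database')
HOSTNAME=$(echo "$VCAP" | jq --raw-output '.["user-provided"][]|.credentials|.host')
USERNAME=$(echo "$VCAP" | jq --raw-output '.["user-provided"][]|.credentials|.username')
PASSWORD=$(echo "$VCAP" | jq --raw-output '.["user-provided"][]|.credentials|.password')

# Run pg_dump inside the pod and write the backup locally
oc exec $KEEPER_POD -- bash -c "export PGPASSWORD='$PASSWORD' && pg_dump -Fc -h $HOSTNAME -d $DATABASE -U $USERNAME" > ${BACKUP_DIR}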
    

Restoring data

IBM created a restore tool called pgmig. The tool restores your database backup by adding it to a database you choose. It also upgrades the schema to the one that is associated with the version of the product where you restore the data. Before the tool adds the backed-up data, it removes the data for all instances in the current service deployment, so any spares are also removed.

  1. Install the target IBM Cloud Pak for Data cluster to which you want to restore the data.

    From the web client for the target cluster, create one service instance of Watson Assistant for each service instance that was backed up on the old cluster. The target IBM Cloud Pak for Data cluster must have the same number of instances as there were in the environment where you backed up the database.

  2. Back up the current database before you replace it with the backed-up database.

    The tool clears the current database before it restores the backup. So, if you might need to revert to the current database, be sure to create a backup of it first.

  3. Go to the backup directory that you specified in the ${BACKUP_DIR} parameter in the previous procedure.

  4. Run the following commands to download the pgmig tool from the GitHub Watson Developer Cloud Community repository.

    In the first command, update <WA_VERSION> to the version that you want to restore. For example, update <WA_VERSION> to 4.6.0 if you want to restore Watson Assistant 4.6.0.

    wget https://github.com/watson-developer-cloud/community/raw/master/watson-assistant/data/<WA_VERSION>/pgmig
    chmod 755 pgmig
    
  5. Create two configuration files, resourceController.yaml and postgres.yaml, and store them in the same backup directory. For details, see Creating the resourceController.yaml file and Creating the postgres.yaml file later in this topic.

  6. Get the secret:

    oc get secret ${INSTANCE}-postgres-ca -o jsonpath='{.data.ca\.crt}' | base64 -d | tee ${BACKUP_DIR}/ca.crt | openssl x509 -noout -text
    
    • Replace ${INSTANCE} with the name of the Watson Assistant instance that you want to back up.
    • Replace ${BACKUP_DIR} with the directory where the postgres.yaml and resourceController.yaml files are located.
  7. Copy the files that you downloaded and created in the previous steps to any existing directory on a Postgres pod.

    1. Run the following command to find Postgres pods:

      oc get pods | grep ${INSTANCE}-postgres
      
    2. The files that you must copy are pgmig, postgres.yaml, resourceController.yaml, ca.crt (the secret file generated in step 6), and the file that you created for your downloaded data. Run the following commands to copy the files.

      If you are restoring data to a stand-alone IBM Cloud Pak for Data cluster, then replace all references to oc with kubectl in these sample commands.

      oc exec -it ${POSTGRES_POD} -- mkdir /controller/tmp
      oc exec -it ${POSTGRES_POD} -- mkdir /controller/tmp/bu
      oc rsync ${BACKUP_DIR}/ ${POSTGRES_POD}:/controller/tmp/bu/
      
    • Replace ${POSTGRES_POD} with the name of one of the Postgres pods from the previous step.
  8. Stop the store deployment by scaling it down to 0 replicas. First, scale down the operator so that it does not rescale the store, and then list the store deployment:

    oc scale deploy ibm-watson-assistant-operator -n ${OPERATOR_NS} --replicas=0
    oc get deployments -l component=store
    

    Make a note of how many replicas are in the store deployment, and then scale the deployment down to 0 replicas:

    oc scale deployment ${STORE_DEPLOYMENT} --replicas=0
    
  9. Open a remote shell in the Postgres pod:

    oc exec -it ${POSTGRES_POD} -- /bin/bash
    
  10. Run the pgmig tool:

    cd /controller/tmp/bu
    export PG_CA_FILE=/controller/tmp/bu/ca.crt
    ./pgmig --resourceController resourceController.yaml --target postgres.yaml --source <backup-file-name.dump>
    
    • Replace <backup-file-name.dump> with the name of the file that you created for your downloaded data.

    For more command options, see Postgres migration tool details.

    As the script runs, you are prompted for information that includes the instance on the target cluster to which to add the backed-up data. The data on the instance you specify will be removed and replaced. If there are multiple instances in the backup, you are prompted multiple times to specify the target instance information.

  11. Scale the store deployment back up:

    oc scale deployment ${STORE_DEPLOYMENT} --replicas=${ORIGINAL_NUMBER_OF_REPLICAS}
    oc scale deploy ibm-watson-assistant-operator -n ${OPERATOR_NS} --replicas=1
    

    You might need to wait a few minutes before the skills you restored are visible from the web interface.

  12. After you restore the data, you must retrain the back-end models. Reopen only one assistant or dialog skill at a time. Each time you open a dialog skill after its training data changes, training starts automatically. Give the skill time to retrain on the restored data; training usually takes less than 10 minutes. Remember that training a machine learning model requires at least one node with 4 CPUs that can be dedicated to training, so open restored assistants and skills during low-traffic periods and open them one at a time. If an assistant or dialog skill does not respond, modify the workspace (for example, add an intent and then remove it) to trigger training, and then confirm that the skill responds.

Creating the resourceController.yaml file

The resourceController.yaml file contains details about the new environment where you are adding the backed-up data. Add the following information to the file:

accessTokens: 
  - value
  - value2
host: localhost
port: 5000

To add the values that are required but currently missing from the file, complete the following steps:

  1. To get the accessTokens values list, you need to get a list of bearer tokens for the service instances.

    • Log in to the IBM Cloud Pak for Data web client.
    • From the main IBM Cloud Pak for Data web client navigation menu, select My instances.
    • On the Provisioned instances tab, click your Watson Assistant instance.
    • In the Access information of the instance, find the Bearer token. Copy the token and paste it into the accessTokens list.

    A bearer token for an instance can access all instances that are owned by the user. Therefore, if a single user owns all of the instances, then only one bearer token is required.

    If the service has multiple instances, each owned by a different user, then you must gather bearer tokens for each user who owns an instance. You can list multiple bearer token values in the accessTokens section.

  2. To get the host information, you need details for the pod that hosts the Watson Assistant UI component:

    oc describe pod -l component=ui
    

    Look for the section that says, RESOURCE_CONTROLLER_URL: https://${release-name}-addon-assistant-gateway-svc.zen:5000/api/ibmcloud/resource-controller

    For example, you can use a command like this to find it:

    oc describe pod -l component=ui | grep RESOURCE_CONTROLLER_URL
    

    Copy the host that is specified in the RESOURCE_CONTROLLER_URL. The host value is the RESOURCE_CONTROLLER_URL value, excluding the protocol at the beginning and everything from the port to the end of the value. In the previous example, the host is ${release-name}-addon-assistant-gateway-svc.zen.

  3. To get the port information, again check the RESOURCE_CONTROLLER_URL entry. The port is specified after <host>: in the URL. In this sample URL, the port is 5000.

  4. Paste the values that you discovered into the YAML file and save it.
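
For illustration, a completed resourceController.yaml file might look like the following. The bearer token is a placeholder, and the host value is taken from the example RESOURCE_CONTROLLER_URL above:

accessTokens:
  - <bearer token that is copied from the instance Access information>
host: ${release-name}-addon-assistant-gateway-svc.zen
port: 5000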

Creating the postgres.yaml file

The postgres.yaml file contains details about the Postgres pods in your target environment (the environment where you will restore the data). Add the following information to the file:

host: localhost
port: 5432
database: store
username: user
su_username: admin
su_password: password

To add the values that are required but currently missing from the file, complete the following steps:

  1. To get information about the host, you must get the Store VCAP secret.

    oc get secret ${INSTANCE}-store-vcap -o jsonpath='{.data.vcap_services}' | base64 -d
    

    Information for the Redis and Postgres databases is returned. Look for the segment of JSON code for the Postgres database, named pgservice. It looks like this:

    {
      "user-provided":[
        {
          "name": "pgservice",
          "label": "user-provided",
          "credentials":
          {
            "host": "${INSTANCE}-rw",
            "port": 5432,
            "database": "conversation_pprd_${INSTANCE}",
            "username": "${dbadmin}",
            "password": "${password}"
          }
        }
      ],
      ...
    }
    
  2. Copy the values for user-provided credentials (host, port, database, username, and password).

    You can specify the same values that were returned for username and password as the su_username and su_password values.

    The updated file will look something like this:

    host: wa_inst-postgres-rw
    port: 5432
    database: conversation_pprd_wa_inst
    username: dbadmin
    su_username: dbadmin
    su_password: mypassword
    
  3. Save the postgres.yaml file.

Postgres migration tool details

The pgmig tool supports the following arguments:

  • -h, --help: Command usage
  • -f, --force: Erase data if present in the target Store
  • -s, --source string: Backup file name
  • -r, --resourceController string: Resource Controller configuration file name
  • -t, --target string: Target Postgres server configuration file name
  • -m, --mapping string: Service instance-mapping configuration file name (optional)
  • --testRCConnection: Test the connection to the Resource Controller, then exit
  • --testPGConnection: Test the connection to the Postgres server, then exit
  • -v, --version: Get the build version
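
For example, before you run a full restore, you can verify connectivity by using the test flags. This is a sketch that assumes both configuration files described earlier are in the current directory; adjust the arguments for your environment:

# Verify that pgmig can reach the Resource Controller, then exit
./pgmig --resourceController resourceController.yaml --target postgres.yaml --testRCConnection

# Verify that pgmig can reach the target Postgres server, then exit
./pgmig --resourceController resourceController.yaml --target postgres.yaml --testPGConnection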

The mapping configuration file

After you run the script and specify the mappings when prompted, the tool generates a file that is named enteredMapping.yaml in the current directory. This file reflects the mapping of the old cluster details to the new cluster based on the interactive inputs that were provided while the script was running.

For example, the YAML file contains values like this:

instance-mappings:
  00000000-0000-0000-0000-001570184978: 00000000-0000-0000-0000-001570194490

where the first value (00000000-0000-0000-0000-001570184978) is the instance ID in the database backup and the second value (00000000-0000-0000-0000-001570194490) is the ID of a provisioned instance in the Watson Assistant service on the system.

You can pass this file to the tool for subsequent runs in the same environment, or you can edit it for use in other backup and restore operations. The mapping file is optional. If you do not provide one, the tool prompts you for the mapping details based on the information that you provide in the YAML files.
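
For example, to reuse the generated mapping file on a subsequent run, you might pass it with the --mapping argument (a sketch; the backup file name is a placeholder):

./pgmig --resourceController resourceController.yaml --target postgres.yaml --source <backup-file-name.dump> --mapping enteredMapping.yaml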