Crawl documents that are stored in a Salesforce data source.

IBM Cloud only

This information applies only to managed deployments. For more information about connecting to Salesforce from an installed deployment, see Salesforce.

What documents are crawled

During the initial crawl of the content, documents from all of the objects that can be accessed from the URL that you specify are crawled and added to your collection. Knowledge Articles are crawled only if their version is published and their language is en-us.

During subsequent scheduled recrawls, only new and modified documents are crawled and any changes are reflected in your collection. Documents that are deleted from the external data source are not deleted from the collection.
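
The crawl rules described above can be pictured with a small sketch. This is not Discovery's implementation. The function names, the `status` and `language` keys, and the `(modified_stamp, body)` tuples are hypothetical stand-ins chosen only to illustrate the behavior: Knowledge Articles qualify only when published and in en-us, a recrawl picks up new and modified documents, and documents deleted at the source remain in the collection.

```python
def is_crawlable_article(article: dict) -> bool:
    """Knowledge Articles qualify only when published and in en-us.

    The "status" and "language" keys are hypothetical field names.
    """
    return article.get("status") == "published" and article.get("language") == "en-us"


def recrawl(collection: dict, source: dict) -> dict:
    """Merge new and modified source documents into the collection.

    Both arguments map a document ID to a (modified_stamp, body) tuple.
    Documents that no longer exist in the source are left untouched.
    """
    updated = dict(collection)
    for doc_id, (stamp, body) in source.items():
        existing = updated.get(doc_id)
        # Add documents that are new, replace documents that changed.
        if existing is None or existing[0] < stamp:
            updated[doc_id] = (stamp, body)
    return updated


collection = {"a": (1, "old"), "b": (1, "kept")}
source = {"a": (2, "new"), "c": (1, "added")}  # "b" was deleted at the source
result = recrawl(collection, source)
```

In this sketch, `result` still contains document `b` after the recrawl, which mirrors the statement that deletions in the external data source are not propagated to the collection.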

All Discovery data source connectors are read-only. Regardless of the permissions that are granted to the crawl account, Discovery never writes, updates, or deletes any content in the original data source.

Discovery can crawl the following objects:

  • Any default and custom objects that you have access to
  • Accounts
  • Contacts
  • Cases
  • Contracts
  • Knowledge articles
  • Attachments

Data source requirements

In addition to the data source requirements for all managed deployments, your Salesforce data source must meet the following requirements:

  • The instance that you plan to connect to must be part of an Enterprise plan or higher.
  • You must obtain any required service licenses for the data source that you want to connect to. For more information about licenses, contact the system administrator of the data source.

What you need before you begin

You must have the following information ready. If you don't know it, ask your Salesforce administrator to provide the information or consult the Salesforce developer documentation.

Username
The username of an account that has access to the Salesforce site.
Password
The password associated with the username. For example, myP@ssw0rd.
Service token
A valid Salesforce security token. For example, mnaO8jsRET5CiJww9JnURlNN.
URL
The URL of the Salesforce site that you want to crawl.
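
During setup, the password and the service token are entered as a single concatenated value in the Password plus service token field. If you prepare that value ahead of time, a minimal sketch such as the following might help. The function name `build_login_password` is hypothetical, and the values are the sample ones used on this page.

```python
def build_login_password(password: str, security_token: str) -> str:
    """Return the password immediately followed by the security token."""
    if not password or not security_token:
        raise ValueError("both the password and the security token are required")
    # No separator is inserted between the two values.
    return password + security_token


combined = build_login_password("myP@ssw0rd", "mnaO8jsRET5CiJww9JnURlNN")
```

The sketch deliberately leaves both values unmodified, because any trimming or re-encoding could change a valid credential.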

Connecting to the data source

To configure the Salesforce data source, complete the following steps in Discovery:

  1. From the navigation pane, choose Manage collections.

  2. Click New collection.

  3. Click the link next to the Need to connect to a data source? field, click Salesforce, and then click Next.

  4. Add values to the following fields:

    • Username

    • Password plus service token

      To form the password, concatenate the Password and Service token values that you noted earlier. For example, myP@ssw0rdmnaO8jsRET5CiJww9JnURlNN. The password and token values are never returned and are used only when credentials are created or modified.

    • URL

    Click Next.

  5. Name the collection.

  6. If the language of the documents on the site is not English, select the appropriate language.

    For a list of supported languages, see Language support.

  7. Optional: Change the synchronization schedule.

    For more information, see Crawl schedule options.

  8. Select the objects that you want to crawl.

    The more objects that you select, the longer the processing of the documents takes.

  9. If you want to limit the types of files to add to the collection, you can list the file extensions for file types to either include or exclude.

    When you choose to list extensions for file types to exclude, you must add at least one file extension.

    For a list of supported file types, see Supported file types.

  10. If you want the crawler to extract text from images on the site, expand More processing settings, and set Apply optical character recognition (OCR) to On.

    When OCR is enabled and your documents contain images, processing takes longer. For more information, see Optical character recognition.

  11. Click Finish.
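
The file-type filter in step 9 can be pictured as follows. This is a sketch under stated assumptions, not the crawler's actual logic: the function `select_files` and its `mode` parameter are hypothetical, extensions are assumed to be matched without regard to case, and an empty include list is assumed to mean no restriction. The only rule taken from this page is that an exclude list needs at least one extension.

```python
from pathlib import PurePosixPath


def select_files(names, extensions, mode="include"):
    """Return the file names that pass an include or exclude extension list."""
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    if mode == "exclude" and not wanted:
        # An exclude list must name at least one file extension.
        raise ValueError("list at least one file extension to exclude")
    if mode == "include" and not wanted:
        # Assumption: an empty include list applies no restriction.
        return list(names)
    selected = []
    for name in names:
        ext = PurePosixPath(name).suffix.lower().lstrip(".")
        # Keep matches in include mode and non-matches in exclude mode.
        if (ext in wanted) == (mode == "include"):
            selected.append(name)
    return selected


files = ["a.pdf", "b.docx", "c.PNG"]
included = select_files(files, ["pdf"])
excluded = select_files(files, ["png"], mode="exclude")
```

Here `included` holds only the PDF file, and `excluded` holds everything except the PNG file.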

The collection is created quickly. It takes more time for the data to be processed as it is added to the collection.

If you want to check the progress, go to the Activity page. From the navigation pane, click Manage collections, and then click to open the collection.