In the 5.2.1 release, the bundled JVM for the crawler plug-in and custom connector features will transition to IBM Semeru Runtimes, Version 21. If a crawler plug-in or custom connector that you used in version 5.2.0 or earlier relies on any features that are incompatible between IBM SDK, Java Technology Edition, Version 8 and IBM Semeru Runtimes, Version 21, you need to revise your code to ensure compatibility with future releases, and then rebuild and redeploy the package for version 5.2.1.
For more information about JVM migration, see the following pages:
- https://www.ibm.com/support/pages/semeru-runtimes-migration-guide
- https://www.ibm.com/support/pages/semeru-runtimes-security-migration-guide
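Before you revise a plug-in, it can help to confirm which Java major version a development or build environment currently runs. The following is a minimal sketch that parses the version string printed by `java -version`. The function names `java_major` and `installed_java_version` are hypothetical helpers for illustration and are not part of Discovery or the SDK.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the SDK) for checking the local Java version.

# Print the major version for a Java version string.
# Legacy strings such as "1.8.0_391" map to 8; newer ones such as "21.0.1" map to 21.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;           # drop the legacy "1." prefix used up to Java 8
  esac
  printf '%s\n' "${v%%[!0-9]*}"   # keep only the leading digits
}

# Print the quoted version string reported by the java launcher on the PATH.
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Example use (requires java on the PATH):
#   java_major "$(installed_java_version)"
```

A result of 8 suggests that the environment still matches the older bundled JVM, so code that builds there might need review against the migration guides.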
Building a custom crawler plug-in
Discovery gives you the option to build your own crawler plug-in with a Java SDK. By using crawler plug-ins, you can quickly develop solutions that are tailored to your use cases. You can download the SDK from your installed Discovery cluster. For more information, see Obtaining the crawler plug-in SDK package.
IBM Cloud Pak for Data IBM Software Hub
This information applies only to installed deployments.
Any custom code that you use with IBM Watson® Discovery is the responsibility of the developer; IBM Support does not cover any custom code that the developer creates.
The crawler plug-ins support the following functions:
- Update the metadata list of a crawled document
- Update the content of a crawled document
- Exclude a crawled document
- Reference crawler configurations, masking password values
- Show notice messages in the Discovery user interface
- Output log messages to the crawler pod console
However, the crawler plug-ins do not support the following functions:
- Split a crawled document into multiple documents
- Combine content from multiple documents into a single document
- Modify access control lists
Crawler plug-in requirements
Make sure that the following items are installed on the development server that you plan to use to develop a crawler plug-in by using this SDK:
- Java SE Development Kit (JDK) 1.8 or higher
- Gradle
- cURL
- sed (stream editor)
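A quick way to confirm that the listed tools are available on the development server is to look each one up on the PATH. The sketch below assumes a POSIX shell; `missing_tools` is a hypothetical helper name, and checking for `javac` as evidence of a JDK is an assumption rather than an SDK requirement.

```shell
#!/bin/sh
# Print the name of each command that is not found on the PATH, one per line.
# Hypothetical helper; it checks presence only, not versions.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: check the tools from the requirements list.
#   missing_tools javac gradle curl sed
```

Empty output means that every named command was found.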
Obtaining the crawler plug-in SDK package
1. Log in to your Discovery cluster.
2. Enter the following command to obtain your crawler pod name:
       oc get pods | grep crawler
   The following example shows sample output.
       wd-discovery-crawler-57985fc5cf-rxk89 1/1 Running 0 85m
3. Enter the following command to obtain the SDK package name, replacing {crawler-pod-name} with the crawler pod name that you obtained in step 2:
       oc exec {crawler-pod-name} -- ls -l /opt/ibm/wex/zing/resources/ | grep wd-crawler-plugin-sdk
   The following example shows sample output.
       -rw-r--r--. 1 dadmin dadmin 35575 Oct 1 16:51 wd-crawler-plugin-sdk-{build-version}.zip
4. Enter the following command to copy the SDK package to the host server, replacing {build-version} with the build version number from the previous step:
       oc cp {crawler-pod-name}:/opt/ibm/wex/zing/resources/wd-crawler-plugin-sdk-{build-version}.zip wd-crawler-plugin-sdk.zip
5. If necessary, copy the SDK package to the development server.
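The lookup and copy steps above can be chained in a small script. This is a sketch under stated assumptions, not a supported tool: the function names are hypothetical, it takes the first pod whose name contains "crawler" and the first matching package file, and `fetch_sdk` requires the `oc` CLI with an active login to the cluster.

```shell
#!/bin/sh
# Hypothetical helpers that parse the command output shown in the steps above.

# Read `oc get pods` output on stdin; print the name of the first pod whose
# name contains "crawler".
first_crawler_pod() {
  awk '$1 ~ /crawler/ { print $1; exit }'
}

# Read `ls -l` output on stdin; print the first SDK package file name.
sdk_package_name() {
  awk '$NF ~ /^wd-crawler-plugin-sdk.*\.zip$/ { print $NF; exit }'
}

# Copy the SDK package from the crawler pod to the current directory.
# Requires the oc CLI and an active login to the cluster.
fetch_sdk() {
  pod=$(oc get pods | first_crawler_pod)
  [ -n "$pod" ] || { echo "no crawler pod found" >&2; return 1; }
  pkg=$(oc exec "$pod" -- ls -l /opt/ibm/wex/zing/resources/ | sdk_package_name)
  [ -n "$pkg" ] || { echo "no SDK package found" >&2; return 1; }
  oc cp "$pod:/opt/ibm/wex/zing/resources/$pkg" wd-crawler-plugin-sdk.zip
}
```

Keeping the parsing in separate functions makes it possible to try them against saved command output before running anything against the cluster.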
Building a crawler plug-in package
- Extract the SDK compressed file.
- Implement the plug-in logic in src/. Ensure that the dependencies are declared in build.gradle.
- Enter gradle packageCrawlerPlugin to create the plug-in package. The package is generated as build/distributed/wd-crawler-plugin-sample.zip.
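The build steps above might be scripted as follows. This is a minimal sketch: `build_plugin` is a hypothetical helper, it assumes that `unzip` is installed and that build.gradle sits at the top level of the extracted archive, and it expects the plug-in logic to be in place under src/ before the Gradle task runs.

```shell
#!/bin/sh
# Extract the SDK archive into a work directory, run the Gradle task, and
# print the path of the generated package. Hypothetical helper.
build_plugin() {
  sdk_zip="$1"
  work_dir="$2"
  [ -f "$sdk_zip" ] || { echo "SDK archive not found: $sdk_zip" >&2; return 1; }
  mkdir -p "$work_dir" || return 1
  unzip -q -o "$sdk_zip" -d "$work_dir" || return 1
  # Assumption: build.gradle is at the top level of the extracted archive.
  (cd "$work_dir" && gradle packageCrawlerPlugin) || return 1
  pkg="$work_dir/build/distributed/wd-crawler-plugin-sample.zip"
  [ -f "$pkg" ] || { echo "expected package not found: $pkg" >&2; return 1; }
  printf '%s\n' "$pkg"
}

# Example use:
#   build_plugin wd-crawler-plugin-sdk.zip ./plugin-work
```

The function returns a nonzero status if any step fails, so a calling script can stop before it tries to deploy a package that was never produced.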