IBM Cloud Docs
Building a Cloud Pak for Data custom crawler plug-in

Discovery features the option to build your own crawler plug-in with a Java SDK. By using crawler plug-ins, you can quickly develop solutions that are relevant to your use cases. You can download the SDK from your installed Discovery cluster. For more information, see Obtaining the crawler plug-in SDK package.

IBM Cloud Pak for Data only

This information applies only to installed deployments.

Any custom code that you use with IBM Watson® Discovery is the responsibility of the developer; IBM Support does not cover any custom code that the developer creates.

The crawler plug-ins support the following functions:

  • Update the metadata list of a crawled document
  • Update the content of a crawled document
  • Exclude a crawled document
  • Reference crawler configurations, masking password values
  • Show notice messages in the Discovery user interface
  • Output log messages to the crawler pod console

However, the crawler plug-ins cannot support the following functions:

  • Split a crawled document into multiple documents
  • Combine content from multiple documents into a single document
  • Modify access control lists

Crawler plug-in requirements

Make sure that the following software is installed on the development server where you plan to build a crawler plug-in with this SDK:

  • Java SE Development Kit (JDK) 1.8 or higher
  • Gradle
  • cURL
  • sed (stream editor)

Obtaining the crawler plug-in SDK package

  1. Log in to your Discovery cluster.

  2. Enter the following command to obtain your crawler pod name:

    oc get pods | grep crawler
    

    The following example shows sample output.

    wd-discovery-crawler-57985fc5cf-rxk89     1/1     Running     0          85m
    
  3. Enter the following command to obtain the SDK package name, replacing {crawler-pod-name} with the crawler pod name that you obtained in step 2:

    oc exec {crawler-pod-name} -- ls -l /opt/ibm/wex/zing/resources/ | grep wd-crawler-plugin-sdk
    

    The following example shows sample output.

    -rw-r--r--. 1 dadmin dadmin 35575 Oct  1 16:51 wd-crawler-plugin-sdk-${build-version}.zip
    
  4. Enter the following command to copy the SDK package to the host server, replacing ${build-version} with the build version number from the previous step:

    oc cp {crawler-pod-name}:/opt/ibm/wex/zing/resources/wd-crawler-plugin-sdk-${build-version}.zip wd-crawler-plugin-sdk.zip
    
  5. If necessary, copy the SDK package to the development server.
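
The steps above can also be run as one short script. The sketch below is an assumption-laden convenience, not part of the official procedure: it assumes `oc` is already logged in to the cluster, that exactly one crawler pod matches `grep crawler`, and that the pod name appears in the first column of `oc get pods` output, as in the sample above.

```shell
# Sketch: fetch the crawler plug-in SDK in one pass (assumes a logged-in `oc`
# session and a single matching crawler pod).
pod_name=$(oc get pods | grep crawler | awk '{print $1}')
sdk_file=$(oc exec "$pod_name" -- ls /opt/ibm/wex/zing/resources/ | grep wd-crawler-plugin-sdk)
oc cp "$pod_name:/opt/ibm/wex/zing/resources/$sdk_file" wd-crawler-plugin-sdk.zip
```

If more than one crawler pod is running, pick the pod name manually instead of relying on the `grep`/`awk` extraction.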

Building a crawler plug-in package

  1. Extract the SDK compressed file.
  2. Implement the plug-in logic in src/. Ensure that any dependencies are declared in build.gradle.
  3. Enter gradle packageCrawlerPlugin to create the plug-in package. The package is generated as build/distributions/wd-crawler-plugin-sample.zip.
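
In a terminal, the build steps look roughly like the following sketch. The extraction directory name is arbitrary, gradle is assumed to be on PATH, and the package path follows Gradle's standard distributions layout.

```shell
# Sketch: unpack the SDK, implement the plug-in, then package it.
unzip wd-crawler-plugin-sdk.zip -d wd-crawler-plugin-sdk
cd wd-crawler-plugin-sdk
# ... edit the plug-in logic under src/ and declare dependencies in build.gradle ...
gradle packageCrawlerPlugin
ls build/distributions/
```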