IBM Cloud Docs
Adding content with the Data Crawler

As of 17 April 2019, the Data Crawler is no longer supported or available for download. This content is provided for existing installations only. See Connecting to Data Sources for other available connectivity options.

The Data Crawler lets you automate the upload of content to the Discovery service.

Crawling data with the Data Crawler

The Data Crawler is a command line tool that helps you take your documents from the repositories where they reside (for example: file shares, databases) and push them to the cloud, to be used by Discovery.

When to use the Data Crawler

Use the Data Crawler if you want to have a managed upload of a significant number of files from a remote system or if you want to extract content from a supported repository, such as a DB2 database.

The Data Crawler is not intended to be a solution for uploading files from your local drive. To upload files from a local drive, use the Discovery tooling or direct API calls instead. Another option for uploading large numbers of files into Discovery is discovery-files on GitHub.
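As a sketch of the direct-API alternative for local files, the snippet below builds the Discovery v1 add-document endpoint URL and shows (commented out) how a file could be posted to it. The service URL, IDs, credentials, and file name are placeholders, not values from this document.

```python
# Sketch of a direct API upload to Discovery, as an alternative to the
# Data Crawler for files on a local drive. The endpoint shape follows
# the Discovery v1 API; all concrete values below are placeholders.

def document_upload_url(base_url, environment_id, collection_id, version):
    """Build the Discovery v1 add-document endpoint URL."""
    return (f"{base_url}/v1/environments/{environment_id}"
            f"/collections/{collection_id}/documents?version={version}")

url = document_upload_url(
    "https://gateway.watsonplatform.net/discovery/api",  # assumed region URL
    "my-environment-id", "my-collection-id", "2019-04-30")
print(url)

# With the URL in hand, an HTTP library such as `requests` can POST the
# file as multipart form data (uncomment and supply real credentials):
#
# import requests
# with open("my-document.json", "rb") as f:
#     resp = requests.post(url, auth=("apikey", "YOUR_API_KEY"),
#                          files={"file": f})
#     resp.raise_for_status()
```

For more than a handful of files, wrap the upload in a loop with retry handling, or use the discovery-files project mentioned above.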

Using the Data Crawler

  1. Configure Discovery.
  2. Download and install the Data Crawler on a supported Linux system that has access to the content that you want to crawl.
  3. Connect the Data Crawler to your content.
  4. Configure the Data Crawler to connect to the Discovery Service.
  5. Crawl your content.
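Steps 3 and 4 above are driven by the crawler's configuration files. The fragment below is an illustrative sketch of an output-adapter configuration pointing the crawler at a Discovery collection; the key names are patterned on the crawler's bundled template files and may differ from your version, so consult the templates shipped with your installation for the exact option names.

```hocon
# Illustrative output-adapter fragment (placeholder keys and values).
output_adapter {
  config = "discovery_service"
  discovery_service {
    api_url        = "https://gateway.watsonplatform.net/discovery/api"
    username       = "YOUR_USERNAME"
    password       = "YOUR_PASSWORD"
    environment_id = "YOUR_ENVIRONMENT_ID"
    collection_id  = "YOUR_COLLECTION_ID"
  }
}
```

Once the configuration points at your collection, a crawl is typically started from the installation directory with a command such as `crawler crawl --config config/config.conf`.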

You can get started quickly with the Data Crawler by following the example in Getting started with the Data Crawler.