Introduction
IBM Watson™ Discovery v1 is a cognitive search and content analytics engine that you can add to applications to identify patterns, trends, and actionable insights to drive better decision-making. Securely unify structured and unstructured data with pre-enriched content, and use a simplified query language to eliminate the need for manual filtering of results.
This documentation describes Java SDK major version 9. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Node SDK major version 6. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Python SDK major version 5. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Ruby SDK major version 2. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes .NET Standard SDK major version 5. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Go SDK major version 2. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Swift SDK major version 4. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Unity SDK major version 5. For more information about how to update your code from the previous version, see the migration guide.
Discovery v1 is deprecated. As of 11 July 2023, you cannot create new instances. Existing Advanced plan instances are supported until 11 July 2023. Any instances that still exist on that date will be deleted. For more information about Discovery v2, see the v2 API. For more information about how to migrate to Discovery v2, see Getting the most from Discovery.
The IBM Watson Unity SDK has the following requirements.
- The SDK requires Unity version 2018.2 or later to support Transport Layer Security (TLS) 1.2.
- Set the project settings for both the Scripting Runtime Version and the Api Compatibility Level to .NET 4.x Equivalent. For more information, see TLS 1.0 support.
- The SDK doesn't support WebGL projects. Change your build settings to any platform except WebGL.
For more information about how to install and configure the SDK and SDK Core, see https://github.com/watson-developer-cloud/unity-sdk.
The code examples on this tab use the client library that is provided for Java.
Maven
<dependency>
  <groupId>com.ibm.watson</groupId>
  <artifactId>ibm-watson</artifactId>
  <version>11.0.0</version>
</dependency>
Gradle
compile 'com.ibm.watson:ibm-watson:11.0.0'
GitHub
The code examples on this tab use the client library that is provided for Node.js.
Installation
npm install ibm-watson@^8.0.0
GitHub
The code examples on this tab use the client library that is provided for Python.
Installation
pip install --upgrade "ibm-watson>=7.0.0"
GitHub
The code examples on this tab use the client library that is provided for Ruby.
Installation
gem install ibm_watson
GitHub
The code examples on this tab use the client library that is provided for Go.
go get -u github.com/watson-developer-cloud/go-sdk/v2
GitHub
The code examples on this tab use the client library that is provided for Swift.
Cocoapods
pod 'IBMWatsonDiscoveryV1', '~> 5.0.0'
Carthage
github "watson-developer-cloud/swift-sdk" ~> 5.0.0
Swift Package Manager
.package(url: "https://github.com/watson-developer-cloud/swift-sdk", from: "5.0.0")
GitHub
The code examples on this tab use the client library that is provided for .NET Standard.
Package Manager
Install-Package IBM.Watson.Discovery.v1 -Version 7.0.0
.NET CLI
dotnet add package IBM.Watson.Discovery.v1 --version 7.0.0
PackageReference
<PackageReference Include="IBM.Watson.Discovery.v1" Version="7.0.0" />
GitHub
The code examples on this tab use the client library that is provided for Unity.
GitHub
IBM Cloud URLs
The base URLs come from the service instance. To find the URL, view the service credentials by clicking the name of the service in the Resource list. Use the value of the URL. Add the method to form the complete API endpoint for your request.
The following example URL represents a Discovery instance that is hosted in Washington DC:
https://api.us-east.discovery.watson.cloud.ibm.com/instances/6bbda3b3-d572-45e1-8c54-22d6ed9e52c2
The following URLs represent the base URLs for Discovery. When you call the API, use the URL that corresponds to the location of your service instance.
- Dallas:
https://api.us-south.discovery.watson.cloud.ibm.com
- Washington DC:
https://api.us-east.discovery.watson.cloud.ibm.com
- Frankfurt:
https://api.eu-de.discovery.watson.cloud.ibm.com
- Sydney:
https://api.au-syd.discovery.watson.cloud.ibm.com
- Tokyo:
https://api.jp-tok.discovery.watson.cloud.ibm.com
- London:
https://api.eu-gb.discovery.watson.cloud.ibm.com
- Seoul:
https://api.kr-seo.discovery.watson.cloud.ibm.com
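As a sketch of how the pieces above combine into a complete endpoint, the following assembles the base URL, the instance ID from the service credentials, a method path, and the required version parameter. The instance ID is the example value from this documentation, and /v1/environments is used here as an illustrative method path.

```python
# Sketch: composing a full Discovery request URL from the parts described above.
base_url = "https://api.us-east.discovery.watson.cloud.ibm.com"
instance_id = "6bbda3b3-d572-45e1-8c54-22d6ed9e52c2"  # GUID from the service credentials
method_path = "/v1/environments"                      # the API method to call
version = "2019-04-30"                                # required version parameter

endpoint = f"{base_url}/instances/{instance_id}{method_path}?version={version}"
print(endpoint)
```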
Set the correct service URL by calling the setServiceUrl() method of the service instance.
Set the correct service URL by specifying the serviceUrl parameter when you create the service instance.
Set the correct service URL by calling the set_service_url() method of the service instance.
Set the correct service URL by specifying the service_url property of the service instance.
Set the correct service URL by calling the SetServiceURL() method of the service instance.
Set the correct service URL by setting the serviceURL property of the service instance.
Set the correct service URL by calling the SetServiceUrl() method of the service instance.
Set the correct service URL by calling the SetServiceUrl() method of the service instance.
Dallas API endpoint example for services managed on IBM Cloud
curl -X {request_method} -u "apikey:{apikey}" "https://api.us-south.discovery.watson.cloud.ibm.com/instances/{instance_id}"
Your service instance might not use this URL
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
Discovery discovery = new Discovery("{version}", authenticator);
discovery.setServiceUrl("https://api.us-east.discovery.watson.cloud.ibm.com");
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
const DiscoveryV1 = require('ibm-watson/discovery/v1');
const { IamAuthenticator } = require('ibm-watson/auth');
const discovery = new DiscoveryV1({
  version: '{version}',
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: 'https://api.us-east.discovery.watson.cloud.ibm.com',
});
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV1(
    version='{version}',
    authenticator=authenticator
)
discovery.set_service_url('https://api.us-east.discovery.watson.cloud.ibm.com')
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
require "ibm_watson/authenticators"
require "ibm_watson/discovery_v1"
include IBMWatson
authenticator = Authenticators::IamAuthenticator.new(
  apikey: "{apikey}"
)
discovery = DiscoveryV1.new(
  version: "{version}",
  authenticator: authenticator
)
discovery.service_url = "https://api.us-east.discovery.watson.cloud.ibm.com"
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
discovery, discoveryErr := discoveryv1.NewDiscoveryV1(options)
if discoveryErr != nil {
  panic(discoveryErr)
}
discovery.SetServiceURL("https://api.us-east.discovery.watson.cloud.ibm.com")
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let discovery = Discovery(version: "{version}", authenticator: authenticator)
discovery.serviceURL = "https://api.us-east.discovery.watson.cloud.ibm.com"
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
IamAuthenticator authenticator = new IamAuthenticator(
  apikey: "{apikey}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("https://api.us-east.discovery.watson.cloud.ibm.com");
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
var authenticator = new IamAuthenticator(
  apikey: "{apikey}"
);

while (!authenticator.CanAuthenticate())
  yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("https://api.us-east.discovery.watson.cloud.ibm.com");
Disabling SSL verification
All Watson services use Secure Sockets Layer (SSL) (or Transport Layer Security (TLS)) for secure connections between the client and server. The connection is verified against the local certificate store to ensure authentication, integrity, and confidentiality.
If you use a self-signed certificate, you need to disable SSL verification to make a successful connection.
Enabling SSL verification is highly recommended. Disabling SSL jeopardizes the security of the connection and data. Disable SSL only if necessary, and take steps to enable SSL as soon as possible.
To disable SSL verification for a curl request, use the --insecure (-k) option with the request.
To disable SSL verification, create an HttpConfigOptions object and set the disableSslVerification property to true. Then, pass the object to the service instance by using the configureClient method.
To disable SSL verification, set the disableSslVerification parameter to true when you create the service instance.
To disable SSL verification, pass True to the set_disable_ssl_verification method of the service instance.
To disable SSL verification, set the disable_ssl_verification parameter to true in the configure_http_client() method of the service instance.
To disable SSL verification, call the DisableSSLVerification method on the service instance.
To disable SSL verification, call the disableSSLVerification() method on the service instance. You cannot disable SSL verification on Linux.
To disable SSL verification, call the DisableSslVerification method with true on the service instance.
To disable SSL verification, set the DisableSslVerification property to true on the service instance.
Example to disable SSL verification. Replace {apikey} and {url} with your service credentials.
curl -k -X {request_method} -u "apikey:{apikey}" "{url}/{method}"
Example to disable SSL verification
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
Discovery discovery = new Discovery("{version}", authenticator);
discovery.setServiceUrl("{url}");
HttpConfigOptions configOptions = new HttpConfigOptions.Builder()
  .disableSslVerification(true)
  .build();

discovery.configureClient(configOptions);
Example to disable SSL verification
const DiscoveryV1 = require('ibm-watson/discovery/v1');
const { IamAuthenticator } = require('ibm-watson/auth');
const discovery = new DiscoveryV1({
  version: '{version}',
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
  disableSslVerification: true,
});
Example to disable SSL verification
from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV1(
    version='{version}',
    authenticator=authenticator
)
discovery.set_service_url('{url}')
discovery.set_disable_ssl_verification(True)
Example to disable SSL verification
require "ibm_watson/authenticators"
require "ibm_watson/discovery_v1"
include IBMWatson
authenticator = Authenticators::IamAuthenticator.new(
  apikey: "{apikey}"
)
discovery = DiscoveryV1.new(
  version: "{version}",
  authenticator: authenticator
)
discovery.service_url = "{url}"
discovery.configure_http_client(disable_ssl_verification: true)
Example to disable SSL verification
discovery, discoveryErr := discoveryv1.NewDiscoveryV1(options)
if discoveryErr != nil {
  panic(discoveryErr)
}
discovery.SetServiceURL("{url}")
discovery.DisableSSLVerification()
Example to disable SSL verification
let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let discovery = Discovery(version: "{version}", authenticator: authenticator)
discovery.serviceURL = "{url}"
discovery.disableSSLVerification()
Example to disable SSL verification
IamAuthenticator authenticator = new IamAuthenticator(
  apikey: "{apikey}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.DisableSslVerification(true);
Example to disable SSL verification
var authenticator = new IamAuthenticator(
  apikey: "{apikey}"
);

while (!authenticator.CanAuthenticate())
  yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.DisableSslVerification = true;
Authentication
You authenticate to the API by using IBM Cloud Identity and Access Management (IAM).
You can pass either a bearer token in an authorization header or an API key. Tokens support authenticated requests without embedding service credentials in every call. API keys use basic authentication. For more information, see Authenticating to Watson services.
- For testing and development, you can pass an API key directly.
- For production use, unless you use the Watson SDKs, use an IAM token.
If you pass in an API key, use apikey for the username and the value of the API key as the password. For example, if the API key is f5sAznhrKQyvBFFaZbtF60m5tzLbqWhyALQawBg5TjRI in the service credentials, include the credentials in your call like this:
curl -u "apikey:f5sAznhrKQyvBFFaZbtF60m5tzLbqWhyALQawBg5TjRI"
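The -u option is shorthand for sending a basic Authorization header. As a sketch of what curl builds from those credentials ({apikey} is kept as a placeholder here, not a real key):

```python
import base64

# Sketch: the basic Authorization header that curl -u "apikey:{apikey}" sends.
apikey = "{apikey}"  # placeholder for your real API key
credentials = f"apikey:{apikey}"
auth_header = "Basic " + base64.b64encode(credentials.encode("ascii")).decode("ascii")
print(auth_header)  # value of the Authorization request header
```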
For IBM Cloud instances, the SDK provides initialization methods for each form of authentication.
- Use the API key to have the SDK manage the lifecycle of the access token. The SDK requests an access token, ensures that the access token is valid, and refreshes it if necessary.
- Use the access token to manage the lifecycle yourself. You must periodically refresh the token.
For more information, see IAM authentication with the SDK.
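The first lifecycle option above (the SDK requests an access token, checks that it is valid, and refreshes it when necessary) can be sketched as follows. This is a hypothetical illustration, not the SDK's code; fetch_token stands in for the IAM token request the SDK performs.

```python
import time

class TokenManager:
    """Sketch of SDK-style token management: cache an access token and
    refresh it shortly before it expires (hypothetical, not the SDK's code)."""

    def __init__(self, fetch_token, refresh_margin=60):
        self._fetch_token = fetch_token   # callable returning (token, lifetime_seconds)
        self._refresh_margin = refresh_margin
        self._token = None
        self._expires_at = 0.0

    def get_token(self):
        # Refresh when no token is cached or expiry is within the safety margin.
        if self._token is None or time.time() >= self._expires_at - self._refresh_margin:
            self._token, lifetime = self._fetch_token()
            self._expires_at = time.time() + lifetime
        return self._token
```

With the second option (managing the lifecycle yourself), your code must perform this refresh step periodically instead.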
Replace {apikey} and {url} with your service credentials.
curl -X {request_method} -u "apikey:{apikey}" "{url}/v1/{method}"
SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
Discovery discovery = new Discovery("{version}", authenticator);
discovery.setServiceUrl("{url}");
SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
const DiscoveryV1 = require('ibm-watson/discovery/v1');
const { IamAuthenticator } = require('ibm-watson/auth');
const discovery = new DiscoveryV1({
  version: '{version}',
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});
SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV1(
    version='{version}',
    authenticator=authenticator
)
discovery.set_service_url('{url}')
SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
require "ibm_watson/authenticators"
require "ibm_watson/discovery_v1"
include IBMWatson
authenticator = Authenticators::IamAuthenticator.new(
  apikey: "{apikey}"
)
discovery = DiscoveryV1.new(
  version: "{version}",
  authenticator: authenticator
)
discovery.service_url = "{url}"
SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
import (
  "github.com/IBM/go-sdk-core/core"
  "github.com/watson-developer-cloud/go-sdk/v2/discoveryv1"
)

func main() {
  authenticator := &core.IamAuthenticator{
    ApiKey: "{apikey}",
  }

  options := &discoveryv1.DiscoveryV1Options{
    Version:       "{version}",
    Authenticator: authenticator,
  }

  discovery, discoveryErr := discoveryv1.NewDiscoveryV1(options)
  if discoveryErr != nil {
    panic(discoveryErr)
  }

  discovery.SetServiceURL("{url}")
}
SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let discovery = Discovery(version: "{version}", authenticator: authenticator)
discovery.serviceURL = "{url}"
SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
IamAuthenticator authenticator = new IamAuthenticator(
  apikey: "{apikey}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
var authenticator = new IamAuthenticator(
  apikey: "{apikey}"
);

while (!authenticator.CanAuthenticate())
  yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
Access between services
Your application might use more than one Watson service. You can grant access between services and you can grant access to more than one service for your applications.
For IBM Cloud services, the method to grant access between Watson services varies depending on the type of API key. For more information, see IAM access.
- To grant access between IBM Cloud services, create an authorization between the services. For more information, see Granting access between services.
- To grant access to your services by applications without using user credentials, create a service ID, add an API key, and assign access policies. For more information, see Creating and working with service IDs.
When you give a user ID access to multiple services, use an endpoint URL that includes the service instance ID (for example, https://api.us-south.discovery.watson.cloud.ibm.com/instances/6bbda3b3-d572-45e1-8c54-22d6ed9e52c2). You can find the instance ID in two places:
- By clicking the service instance row in the Resource list. The instance ID is the GUID in the details pane.
- By clicking the name of the service instance in the list and looking at the credentials URL.
If you don't see the instance ID in the URL, the credentials predate service IDs. Add new credentials from the Service credentials page and use those credentials.
Versioning
API requests require a version parameter that takes a date in the format version=YYYY-MM-DD. When the API is updated with any breaking changes, the service introduces a new version date for the API.
Send the version parameter with every API request. The service uses the API version for the date you specify, or the most recent version before that date. Don't default to the current date. Instead, specify a date that matches a version that is compatible with your app, and don't change it until your app is ready for a later version.
Specify the version to use on API requests with the version parameter when you create the service instance. The service uses the API version for the date you specify, or the most recent version before that date. Don't default to the current date. Instead, specify a date that matches a version that is compatible with your app, and don't change it until your app is ready for a later version.
This documentation describes the current version of Discovery, 2019-04-30. In some cases, differences in earlier versions are noted in the descriptions of parameters and response models.
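The "most recent version before that date" rule described above can be sketched as follows. The 2018-12-03 date below is illustrative; only 2019-04-30 is cited by this documentation.

```python
def select_api_version(requested, available):
    """Return the latest available version date that is on or before the
    requested date. Dates are ISO YYYY-MM-DD strings, which compare
    correctly as plain strings."""
    candidates = [v for v in sorted(available) if v <= requested]
    if not candidates:
        raise ValueError("no API version exists on or before " + requested)
    return candidates[-1]

# Illustrative set of version dates the service might support.
versions = ["2018-12-03", "2019-04-30"]
print(select_api_version("2019-04-30", versions))  # exact match
print(select_api_version("2021-01-01", versions))  # falls back to the latest earlier version
```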
Error handling
Discovery uses standard HTTP response codes to indicate whether a method completed successfully. HTTP response codes in the 2xx range indicate success. A response code in the 4xx range indicates a failure with the request, and a response code in the 5xx range usually indicates an internal system error that the user cannot resolve. Response codes are listed with each method.
ErrorResponse
Name | Description
---|---
code (integer) | The HTTP response code.
error (string) | General description of an error.
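The response-code ranges described above, as a small sketch:

```python
def classify_status(code):
    """Map an HTTP status code to the outcome categories described above."""
    if 200 <= code < 300:
        return "success"
    if 400 <= code < 500:
        return "client error"   # request failure, e.g. 404 Not Found
    if 500 <= code < 600:
        return "server error"   # internal system error
    return "other"

print(classify_status(200), classify_status(404), classify_status(503))
```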
The Java SDK generates an exception for any unsuccessful method invocation. All methods that accept an argument can also throw an IllegalArgumentException.
Exception | Description |
---|---|
IllegalArgumentException | An invalid argument was passed to the method. |
When the Java SDK receives an error response from the Discovery service, it generates an exception from the com.ibm.watson.developer_cloud.service.exception package. All service exceptions contain the following fields.
Field | Description |
---|---|
statusCode | The HTTP response code that is returned. |
message | A message that describes the error. |
When the Node SDK receives an error response from the Discovery service, it creates an Error object with information that describes the error that occurred. This error object is passed as the first parameter to the callback function for the method. The contents of the error object are as shown in the following table.
Error
Field | Description |
---|---|
code | The HTTP response code that is returned. |
message | A message that describes the error. |
The Python SDK generates an exception for any unsuccessful method invocation. When the Python SDK receives an error response from the Discovery service, it generates an ApiException with the following fields.
Field | Description |
---|---|
code | The HTTP response code that is returned. |
message | A message that describes the error. |
info | A dictionary of additional information about the error. |
When the Ruby SDK receives an error response from the Discovery service, it generates an ApiException with the following fields.
Field | Description |
---|---|
code | The HTTP response code that is returned. |
message | A message that describes the error. |
info | A dictionary of additional information about the error. |
The Go SDK generates an error for any unsuccessful service instantiation and method invocation. You can check for the error immediately. The contents of the error object are as shown in the following table.
Error
Field | Description |
---|---|
code | The HTTP response code that is returned. |
message | A message that describes the error. |
The Swift SDK returns a WatsonError in the completionHandler for any unsuccessful method invocation. This error type is an enum that conforms to LocalizedError and contains an errorDescription property that returns an error message. Some of the WatsonError cases contain associated values that reveal more information about the error.
Field | Description |
---|---|
errorDescription | A message that describes the error. |
When the .NET Standard SDK receives an error response from the Discovery service, it generates a ServiceResponseException with the following fields.
Field | Description |
---|---|
Message | A message that describes the error. |
CodeDescription | The HTTP response code that is returned. |
When the Unity SDK receives an error response from the Discovery service, it generates an IBMError with the following fields.
Field | Description |
---|---|
Url | The URL that generated the error. |
StatusCode | The HTTP response code returned. |
ErrorMessage | A message that describes the error. |
Response | The contents of the response from the server. |
ResponseHeaders | A dictionary of headers returned by the request. |
Example error handling
try {
  // Invoke a method
} catch (NotFoundException e) {
  // Handle Not Found (404) exception
} catch (RequestTooLargeException e) {
  // Handle Request Too Large (413) exception
} catch (ServiceResponseException e) {
  // Base class for all exceptions caused by error responses from the service
  System.out.println("Service returned status code "
      + e.getStatusCode() + ": " + e.getMessage());
}
Example error handling
discovery.method(params)
  .catch(err => {
    console.log('error:', err);
  });
Example error handling
from ibm_watson import ApiException

try:
    # Invoke a method
    pass
except ApiException as ex:
    print("Method failed with status code " + str(ex.code) + ": " + ex.message)
Example error handling
require "ibm_watson"

begin
  # Invoke a method
rescue IBMWatson::ApiException => ex
  print "Method failed with status code #{ex.code}: #{ex.error}"
end
Example error handling
import "github.com/watson-developer-cloud/go-sdk/v2/discoveryv1"

// Instantiate a service
discovery, discoveryErr := discoveryv1.NewDiscoveryV1(options)

// Check for errors
if discoveryErr != nil {
  panic(discoveryErr)
}

// Call a method
result, _, responseErr := discovery.MethodName(&methodOptions)

// Check for errors
if responseErr != nil {
  panic(responseErr)
}
Example error handling
discovery.method() {
  response, error in

  if let error = error {
    switch error {
    case let .http(statusCode, message, metadata):
      switch statusCode {
      case .some(404):
        // Handle Not Found (404) exception
        print("Not found")
      case .some(413):
        // Handle Request Too Large (413) exception
        print("Payload too large")
      default:
        if let statusCode = statusCode {
          print("Error - code: \(statusCode), \(message ?? "")")
        }
      }
    default:
      print(error.localizedDescription)
    }
    return
  }

  guard let result = response?.result else {
    print(error?.localizedDescription ?? "unknown error")
    return
  }

  print(result)
}
Example error handling
try
{
  // Invoke a method
}
catch (ServiceResponseException e)
{
  Console.WriteLine("Error: " + e.Message);
}
catch (Exception e)
{
  Console.WriteLine("Error: " + e.Message);
}
Example error handling
// Invoke a method
discovery.MethodName(Callback, Parameters);
// Check for errors
private void Callback(DetailedResponse<ExampleResponse> response, IBMError error)
{
  if (error == null)
  {
    Log.Debug("ExampleCallback", "Response received: {0}", response.Response);
  }
  else
  {
    Log.Debug("ExampleCallback", "Error received: {0}, {1}, {2}", error.StatusCode, error.ErrorMessage, error.Response);
  }
}
Additional headers
Some Watson services accept special parameters in headers that are passed with the request.
You can pass request header parameters in all requests or in a single request to the service.
To pass a request header, use the --header (-H) option with a curl request.
To pass header parameters with every request, use the setDefaultHeaders method of the service object. See Data collection for an example use of this method. To pass header parameters in a single request, use the addHeader method as a modifier on the request before you execute it.
To pass header parameters with every request, specify the headers parameter when you create the service object. See Data collection for an example use of this method. To pass header parameters in a single request, use the headers method as a modifier on the request before you execute it.
To pass header parameters with every request, use the set_default_headers method of the service object. See Data collection for an example use of this method. To pass header parameters in a single request, include headers as a dict in the request.
To pass header parameters with every request, use the add_default_headers method of the service object. See Data collection for an example use of this method. To pass header parameters in a single request, specify the headers method as a chainable method in the request.
To pass header parameters with every request, use the SetDefaultHeaders method of the service object. See Data collection for an example use of this method. To pass header parameters in a single request, specify Headers as a map in the request.
To pass header parameters with every request, add them to the defaultHeaders property of the service object. See Data collection for an example use of this method. To pass header parameters in a single request, pass the headers parameter to the request method.
To pass header parameters in a single request, use the WithHeader() method as a modifier on the request before you execute it. See Data collection for an example use of this method.
To pass header parameters in a single request, use the WithHeader() method as a modifier on the request before you execute it.
Example header parameter in a request
curl -X {request_method} -H "Request-Header: {header_value}" "{url}/v1/{method}"
Example header parameter in a request
ReturnType returnValue = discovery.methodName(parameters)
  .addHeader("Custom-Header", "{header_value}")
  .execute();
Example header parameter in a request
const parameters = {
  {parameters},
  headers: {
    'Custom-Header': '{header_value}'
  }
};

discovery.methodName(parameters)
  .then(result => {
    console.log(result);
  })
  .catch(err => {
    console.log('error:', err);
  });
Example header parameter in a request
response = discovery.methodName(
    parameters,
    headers={
        'Custom-Header': '{header_value}'
    })
Example header parameter in a request
response = discovery.headers(
  "Custom-Header" => "{header_value}"
).methodName(parameters)
Example header parameter in a request
result, _, responseErr := discovery.MethodName(
  &methodOptions{
    Headers: map[string]string{
      "Accept": "application/json",
    },
  },
)
Example header parameter in a request
let customHeader: [String: String] = ["Custom-Header": "{header_value}"]
discovery.methodName(parameters, headers: customHeader) {
  response, error in
}
Example header parameter in a request
IamAuthenticator authenticator = new IamAuthenticator(
  apikey: "{apikey}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.WithHeader("Custom-Header", "header_value");
Example header parameter in a request
var authenticator = new IamAuthenticator(
  apikey: "{apikey}"
);

while (!authenticator.CanAuthenticate())
  yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.WithHeader("Custom-Header", "header_value");
Response details
The Discovery service might return information to the application in response headers.
To access all response headers that the service returns, include the --include (-i) option with a curl request. To see detailed response data for the request, including request headers, response headers, and extra debugging information, include the --verbose (-v) option with the request.
Example request to access response headers
curl -X {request_method} {authentication_method} --include "{url}/v1/{method}"
To access information in the response headers, use one of the request methods that returns details with the response: executeWithDetails(), enqueueWithDetails(), or rxWithDetails(). These methods return a Response<T> object, where T is the expected response model. Use the getResult() method to access the response object for the method, and use the getHeaders() method to access information in response headers.
Example request to access response headers
Response<ReturnType> response = discovery.methodName(parameters)
.executeWithDetails();
// Access response from methodName
ReturnType returnValue = response.getResult();
// Access information in response headers
Headers responseHeaders = response.getHeaders();
All response data is available in the Response<T> object that is returned by each method. To access information in the response object, use the following properties.

Property | Description
---|---
result | Returns the response for the service-specific method.
headers | Returns the response header information.
status | Returns the HTTP status code.
Example request to access response headers
discovery.methodName(parameters)
.then(response => {
console.log(response.headers);
})
.catch(err => {
console.log('error:', err);
});
The return value from all service methods is a DetailedResponse object. To access information in the result object or response headers, use the following methods.

DetailedResponse

Method | Description
---|---
get_result() | Returns the response for the service-specific method.
get_headers() | Returns the response header information.
get_status_code() | Returns the HTTP status code.
Example request to access response headers
discovery.set_detailed_response(True)
response = discovery.methodName(parameters)
# Access response from methodName
print(json.dumps(response.get_result(), indent=2))
# Access information in response headers
print(response.get_headers())
# Access HTTP response status
print(response.get_status_code())
The return value from all service methods is a DetailedResponse object. To access information in the response object, use the following properties.

DetailedResponse

Property | Description
---|---
result | Returns the response for the service-specific method.
headers | Returns the response header information.
status | Returns the HTTP status code.
Example request to access response headers
response = discovery.methodName(parameters)
# Access response from methodName
print response.result
# Access information in response headers
print response.headers
# Access HTTP response status
print response.status
The return value from all service methods is a DetailedResponse object. To access information in the response object or response headers, use the following methods.

DetailedResponse

Method | Description
---|---
GetResult() | Returns the response for the service-specific method.
GetHeaders() | Returns the response header information.
GetStatusCode() | Returns the HTTP status code.
Example request to access response headers
import (
"github.com/IBM/go-sdk-core/core"
"github.com/watson-developer-cloud/go-sdk/discoveryv1"
)
result, response, responseErr := discovery.MethodName(
&methodOptions{})
// Access result
core.PrettyPrint(response.GetResult(), "Result ")
// Access response headers
core.PrettyPrint(response.GetHeaders(), "Headers ")
// Access status code
core.PrettyPrint(response.GetStatusCode(), "Status Code ")
All response data is available in the WatsonResponse<T> object that is returned in each method's completionHandler.
Example request to access response headers
discovery.methodName(parameters) {
response, error in
guard let result = response?.result else {
print(error?.localizedDescription ?? "unknown error")
return
}
print(result) // The data returned by the service
print(response?.statusCode)
print(response?.headers)
}
The response contains fields for response headers, response JSON, and the status code.

DetailedResponse

Property | Description
---|---
Result | Returns the result for the service-specific method.
Response | Returns the raw JSON response for the service-specific method.
Headers | Returns the response header information.
StatusCode | Returns the HTTP status code.
Example request to access response headers
var results = discovery.MethodName(parameters);
var result = results.Result; // The result object
var responseHeaders = results.Headers; // The response headers
var responseJson = results.Response; // The raw response JSON
var statusCode = results.StatusCode; // The response status code
The response contains fields for response headers, response JSON, and the status code.

DetailedResponse

Property | Description
---|---
Result | Returns the result for the service-specific method.
Response | Returns the raw JSON response for the service-specific method.
Headers | Returns the response header information.
StatusCode | Returns the HTTP status code.
Example request to access response headers
private void Example()
{
discovery.MethodName(Callback, Parameters);
}
private void Callback(DetailedResponse<ResponseType> response, IBMError error)
{
var result = response.Result; // The result object
var responseHeaders = response.Headers; // The response headers
var responseJson = response.Response; // The raw response JSON
var statusCode = response.StatusCode; // The response status code
}
Data labels
You can remove data associated with a specific customer if you label the data with a customer ID when you send a request to the service.
- Use the X-Watson-Metadata header to associate a customer ID with the data. By adding a customer ID to a request, you indicate that it contains data that belongs to that customer. Specify a random or generic string for the customer ID. Do not include personal data, such as an email address. Pass the string customer_id={id} as the argument of the header. Labeling data is used only by methods that accept customer data.
- Use the Delete labeled data method to remove data that is associated with a customer ID.
Use this process of labeling and deleting data only when you want to remove the data that is associated with a single customer, not when you want to remove data for multiple customers. For more information about Discovery and labeling data, see Information security.
For more information about how to pass headers, see Additional headers.
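The label itself is just the string customer_id={id} passed as the X-Watson-Metadata header value. A minimal sketch of building and passing it with the Python SDK follows; methodName and parameters are placeholders, not a real SDK method:

```python
# Build the customer label that is passed in the X-Watson-Metadata header.
# Use a random or generic string; never personal data such as an email address.
customer_id = "abc-123"
label_headers = {"X-Watson-Metadata": "customer_id={0}".format(customer_id)}

print(label_headers["X-Watson-Metadata"])

# Pass the headers on any method that accepts customer data, for example:
# response = discovery.methodName(parameters, headers=label_headers)
```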
Data collection
By default, Discovery service instances that are not part of Premium plans collect data about API requests and their results. This data is collected only to improve the services for future users. The collected data is not shared or made public. Data is not collected for services that are part of Premium plans.
To prevent IBM usage of your data for an API request, set the X-Watson-Learning-Opt-Out header parameter to true. You can also disable request logging at the account level. For more information, see Controlling request logging for Watson services.
You must set the header on each request that you do not want IBM to access for general service improvements.
You can set the header by using the setDefaultHeaders method of the service object.
You can set the header by using the headers parameter when you create the service object.
You can set the header by using the set_default_headers method of the service object.
You can set the header by using the add_default_headers method of the service object.
You can set the header by using the SetDefaultHeaders method of the service object.
You can set the header by adding it to the defaultHeaders property of the service object.
You can set the header by using the WithHeader() method of the service object.
Example request
curl -u "apikey:{apikey}" -H "X-Watson-Learning-Opt-Out: true" "{url}/{method}"
Example request
Map<String, String> headers = new HashMap<String, String>();
headers.put("X-Watson-Learning-Opt-Out", "true");
discovery.setDefaultHeaders(headers);
Example request
const DiscoveryV1 = require('ibm-watson/discovery/v1');
const { IamAuthenticator } = require('ibm-watson/auth');
const discovery = new DiscoveryV1({
version: '{version}',
authenticator: new IamAuthenticator({
apikey: '{apikey}',
}),
serviceUrl: '{url}',
headers: {
'X-Watson-Learning-Opt-Out': 'true'
}
});
Example request
discovery.set_default_headers({'x-watson-learning-opt-out': "true"})
Example request
discovery.add_default_headers(headers: {"x-watson-learning-opt-out" => "true"})
Example request
import "net/http"
headers := http.Header{}
headers.Add("x-watson-learning-opt-out", "true")
discovery.SetDefaultHeaders(headers)
Example request
discovery.defaultHeaders["X-Watson-Learning-Opt-Out"] = "true"
Example request
IamAuthenticator authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.WithHeader("X-Watson-Learning-Opt-Out", "true");
Example request
var authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.WithHeader("X-Watson-Learning-Opt-Out", "true");
Synchronous and asynchronous requests
The Java SDK supports both synchronous (blocking) and asynchronous (non-blocking) execution of service methods. All service methods implement the ServiceCall interface.
- To call a method synchronously, use the execute method of the ServiceCall interface. You can call the execute method directly from an instance of the service.
- To call a method asynchronously, use the enqueue method of the ServiceCall interface to receive a callback when the response arrives. The ServiceCallback interface of the method's argument provides onResponse and onFailure methods that you override to handle the callback.
The Ruby SDK supports both synchronous (blocking) and asynchronous (non-blocking) execution of service methods. All service methods implement the Concurrent::Async module. When you use the synchronous or asynchronous methods, an IVar object is returned. You access the DetailedResponse object by calling ivar_object.value.

For more information about the IVar object, see the IVar class docs.

- To call a method synchronously, either call the method directly or use the .await chainable method of the Concurrent::Async module. Calling a method directly (without .await) returns a DetailedResponse object.
- To call a method asynchronously, use the .async chainable method of the Concurrent::Async module.

You can call the .await and .async methods directly from an instance of the service.
Example synchronous request
ReturnType returnValue = discovery.method(parameters).execute();
Example asynchronous request
discovery.method(parameters).enqueue(new ServiceCallback<ReturnType>() {
@Override public void onResponse(ReturnType response) {
. . .
}
@Override public void onFailure(Exception e) {
. . .
}
});
Example synchronous request
response = discovery.method_name(parameters)
or
response = discovery.await.method_name(parameters)
Example asynchronous request
response = discovery.async.method_name(parameters)
Methods
Create an environment
Creates a new environment for private data. An environment must be created before collections can be created.
Note: You can create only one environment for private data per service instance. An attempt to create another environment results in an error.
POST /v1/environments
ServiceCall<Environment> createEnvironment(CreateEnvironmentOptions createEnvironmentOptions)
createEnvironment(params)
create_environment(
self,
name: str,
*,
description: str = None,
size: str = None,
**kwargs,
) -> DetailedResponse
CreateEnvironment(string name, string description = null, string size = null)
Request
Use the CreateEnvironmentOptions.Builder to create a CreateEnvironmentOptions object that contains the parameter values for the createEnvironment method.

Query Parameters

Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2019-04-30.
An object that defines an environment name and optional description. The fields in this object are not approved for personal information and cannot be deleted based on customer ID.
{
"name": "Example Environment",
"description": "Description of Environment."
}
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Default:
Size of the environment. In the Lite plan the default and only accepted value is LT; in all other plans the default is S.

Allowable values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
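The size value is validated by the service; a quick client-side check against the allowable list can fail fast before a request is sent. This is a sketch for illustration, not part of the SDK:

```python
# Allowable environment sizes, as listed above.
ALLOWED_SIZES = ("LT", "XS", "S", "MS", "M", "ML", "L", "XL", "XXL", "XXXL")

def valid_size(size):
    """Return True if size is an allowable environment size."""
    return size in ALLOWED_SIZES

print(valid_size("LT"))   # True
print(valid_size("XXS"))  # False
```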
The createEnvironment options.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Default:
Size of the environment. In the Lite plan the default and only accepted value is LT; in all other plans the default is S.

Allowable values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
parameters
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Default:
Size of the environment. In the Lite plan the default and only accepted value is LT; in all other plans the default is S.

Allowable values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
parameters
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Default:
Size of the environment. In the Lite plan the default and only accepted value is LT; in all other plans the default is S.

Allowable values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
parameters
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Default:
Size of the environment. In the Lite plan the default and only accepted value is LT; in all other plans the default is S.

Allowable values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
curl -X POST -u "apikey":"{apikey}" -H "Content-Type: application/json" -d '{ "name": "my_environment", "description": "My environment" }' "{url}/v1/environments?version=2019-04-30"
IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );
DiscoveryService discovery = new DiscoveryService("2019-04-30", authenticator);
discovery.SetServiceUrl("{url}");

var result = discovery.CreateEnvironment(
    name: "my_environment",
    description: "My environment"
    );
Console.WriteLine(result.Response);

IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
Discovery discovery = new Discovery("2019-04-30", authenticator);
discovery.setServiceUrl("{url}");

String environmentName = "my_environment";
String environmentDesc = "My environment";
CreateEnvironmentOptions.Builder createOptionsBuilder = new CreateEnvironmentOptions.Builder(environmentName);
createOptionsBuilder.description(environmentDesc);
Environment createResponse = discovery.createEnvironment(createOptionsBuilder.build()).execute().getResult();

const DiscoveryV1 = require('ibm-watson/discovery/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV1({
  version: '2019-04-30',
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const createEnvironmentParams = {
  name: 'my_environment',
  description: 'My environment',
  size: 'LT',
};

discovery.createEnvironment(createEnvironmentParams)
  .then(environment => {
    console.log(JSON.stringify(environment, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });

import json
from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV1(
    version='2019-04-30',
    authenticator=authenticator
)
discovery.set_service_url('{url}')

response = discovery.create_environment(
    name="my_environment",
    description="My environment"
).get_result()
print(json.dumps(response, indent=2))
Response
Details about an environment.
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.

Possible values: [active, pending, maintenance, resizing]

If true, the environment contains read-only collections that are maintained by IBM.

Current size of the environment.

Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]

The new size requested for this environment. Only returned when the environment status is resizing.

Note: Querying and indexing can still be performed during an environment upsize.

Details about the resource usage and capacity of the environment.

Information about the Continuous Relevancy Training for this environment.
Details about an environment.
{
"environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
"name": "test_environment",
"description": "Test environment",
"created": "2016-06-16T10:56:54.957Z",
"updated": "2017-05-16T13:56:54.957Z",
"status": "active",
"read_only": false,
"size": "M",
"index_capacity": {
"documents": {
"indexed": 0,
"maximum_allowed": 1000000
},
"disk_usage": {
"used_bytes": 0,
"maximum_allowed_bytes": 85899345920
},
"collections": {
"available": 1,
"maximum_allowed": 4
}
},
"search_status": [
{
"scope": "environment",
"status": "NO_DATA",
"status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy."
}
]
}
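The index_capacity object can be used to check remaining headroom before adding documents. The following sketch parses a trimmed copy of the example response above; any real application would read these fields from the create_environment result instead:

```python
import json

# Trimmed example create_environment response (see the example above).
env = json.loads("""
{
  "size": "M",
  "index_capacity": {
    "documents": {"indexed": 0, "maximum_allowed": 1000000},
    "disk_usage": {"used_bytes": 0, "maximum_allowed_bytes": 85899345920},
    "collections": {"available": 1, "maximum_allowed": 4}
  }
}
""")

# Remaining document capacity.
docs = env["index_capacity"]["documents"]
remaining_docs = docs["maximum_allowed"] - docs["indexed"]
print(remaining_docs)  # 1000000

# Percentage of disk capacity in use.
disk = env["index_capacity"]["disk_usage"]
pct_used = 100.0 * disk["used_bytes"] / disk["maximum_allowed_bytes"]
print(pct_used)  # 0.0
```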
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.

Possible values: [active, pending, maintenance, resizing]

If true, the environment contains read-only collections that are maintained by IBM.

Current size of the environment.

Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]

The new size requested for this environment. Only returned when the environment status is resizing.

Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- indexCapacity
Summary of the document usage statistics for the environment.
- documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- diskUsage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- searchStatus
Current scope of the training. Always returned as environment.

The current status of Continuous Relevancy Training for this environment.

Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]

Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
Details about an environment.
{
"environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
"name": "test_environment",
"description": "Test environment",
"created": "2016-06-16T10:56:54.957Z",
"updated": "2017-05-16T13:56:54.957Z",
"status": "active",
"read_only": false,
"size": "M",
"index_capacity": {
"documents": {
"indexed": 0,
"maximum_allowed": 1000000
},
"disk_usage": {
"used_bytes": 0,
"maximum_allowed_bytes": 85899345920
},
"collections": {
"available": 1,
"maximum_allowed": 4
}
},
"search_status": [
{
"scope": "environment",
"status": "NO_DATA",
"status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy."
}
]
}
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.

Possible values: [active, pending, maintenance, resizing]

If true, the environment contains read-only collections that are maintained by IBM.

Current size of the environment.

Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]

The new size requested for this environment. Only returned when the environment status is resizing.

Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- index_capacity
Summary of the document usage statistics for the environment.
- documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- disk_usage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- search_status
Current scope of the training. Always returned as environment.

The current status of Continuous Relevancy Training for this environment.

Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]

Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
Details about an environment.
{
"environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
"name": "test_environment",
"description": "Test environment",
"created": "2016-06-16T10:56:54.957Z",
"updated": "2017-05-16T13:56:54.957Z",
"status": "active",
"read_only": false,
"size": "M",
"index_capacity": {
"documents": {
"indexed": 0,
"maximum_allowed": 1000000
},
"disk_usage": {
"used_bytes": 0,
"maximum_allowed_bytes": 85899345920
},
"collections": {
"available": 1,
"maximum_allowed": 4
}
},
"search_status": [
{
"scope": "environment",
"status": "NO_DATA",
"status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy."
}
]
}
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.

Possible values: [active, pending, maintenance, resizing]

If true, the environment contains read-only collections that are maintained by IBM.

Current size of the environment.

Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]

The new size requested for this environment. Only returned when the environment status is resizing.

Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- index_capacity
Summary of the document usage statistics for the environment.
- documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- disk_usage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- search_status
Current scope of the training. Always returned as environment.

The current status of Continuous Relevancy Training for this environment.

Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]

Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
Details about an environment.
{
"environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
"name": "test_environment",
"description": "Test environment",
"created": "2016-06-16T10:56:54.957Z",
"updated": "2017-05-16T13:56:54.957Z",
"status": "active",
"read_only": false,
"size": "M",
"index_capacity": {
"documents": {
"indexed": 0,
"maximum_allowed": 1000000
},
"disk_usage": {
"used_bytes": 0,
"maximum_allowed_bytes": 85899345920
},
"collections": {
"available": 1,
"maximum_allowed": 4
}
},
"search_status": [
{
"scope": "environment",
"status": "NO_DATA",
"status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy."
}
]
}
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.

Possible values: [active, pending, maintenance, resizing]

If true, the environment contains read-only collections that are maintained by IBM.

Current size of the environment.

Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]

The new size requested for this environment. Only returned when the environment status is resizing.

Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- IndexCapacity
Summary of the document usage statistics for the environment.
- Documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- DiskUsage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- Collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- SearchStatus
Current scope of the training. Always returned as environment.

The current status of Continuous Relevancy Training for this environment.

Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]

Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
Status Code
Environment successfully added.
Bad request.
{
  "environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
  "name": "test_environment",
  "description": "Test environment",
  "created": "2016-06-16T10:56:54.957Z",
  "updated": "2017-05-16T13:56:54.957Z",
  "status": "active",
  "read_only": false,
  "size": "M",
  "index_capacity": {
    "documents": {
      "indexed": 0,
      "maximum_allowed": 1000000
    },
    "disk_usage": {
      "used_bytes": 0,
      "maximum_allowed_bytes": 85899345920
    },
    "collections": {
      "available": 1,
      "maximum_allowed": 4
    }
  },
  "search_status": [
    {
      "scope": "environment",
      "status": "NO_DATA",
      "status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy."
    }
  ]
}
List environments
List existing environments for the service instance.
GET /v1/environments
ServiceCall<ListEnvironmentsResponse> listEnvironments(ListEnvironmentsOptions listEnvironmentsOptions)
listEnvironments(params)
list_environments(
self,
*,
name: str = None,
**kwargs,
) -> DetailedResponse
ListEnvironments(string name = null)
Request
Use the ListEnvironmentsOptions.Builder to create a ListEnvironmentsOptions object that contains the parameter values for the listEnvironments method.

Query Parameters

Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2019-04-30.

Show only the environment with the given name.
The listEnvironments options.
Show only the environment with the given name.
parameters
Show only the environment with the given name.
parameters
Show only the environment with the given name.
parameters
Show only the environment with the given name.
curl -u "apikey":"{apikey}" "{url}/v1/environments?version=2019-04-30"
IamAuthenticator authenticator = new IamAuthenticator( apikey: "{apikey}" ); DiscoveryService discovery = new DiscoveryService("2019-04-30", authenticator); discovery.SetServiceUrl("{url}"); var result = discovery.ListEnvironments(); Console.WriteLine(result.Response);
IamAuthenticator authenticator = new IamAuthenticator("{apikey}"); Discovery discovery = new Discovery("2019-04-30", authenticator); discovery.setServiceUrl("{url}"); ListEnvironmentsOptions options = new ListEnvironmentsOptions.Builder().build(); ListEnvironmentsResponse listResponse = discovery.listEnvironments(options).execute().getResult();
const DiscoveryV1 = require('ibm-watson/discovery/v1'); const { IamAuthenticator } = require('ibm-watson/auth'); const discovery = new DiscoveryV1({ version: '2019-04-30', authenticator: new IamAuthenticator({ apikey: '{apikey}', }), serviceUrl: '{url}', }); discovery.listEnvironments() .then(listEnvironmentsResponse => { console.log(JSON.stringify(listEnvironmentsResponse, null, 2)); }) .catch(err => { console.log('error:', err); });
import json from ibm_watson import DiscoveryV1 from ibm_cloud_sdk_core.authenticators import IAMAuthenticator authenticator = IAMAuthenticator('{apikey}') discovery = DiscoveryV1( version='2019-04-30', authenticator=authenticator ) discovery.set_service_url('{url}') environments = discovery.list_environments().get_result() print(json.dumps(environments, indent=2)) system_environments = [x for x in environments['environments'] if x['name'] == 'Watson System Environment'] system_environment_id = system_environments[0]['environment_id'] collections = discovery.list_collections(system_environment_id).get_result() system_collections = [x for x in collections['collections']] print(json.dumps(system_collections, indent=2))
Response
Response object containing an array of configured environments.
An array of environments that are available for the service instance.
Response object containing an array of configured environments.
{
"environments": [
{
"environment_id": "ecbda78e-fb06-40b1-a43f-a039fac0adc6",
"name": "byod_environment",
"description": "Private Data Environment",
"created": "2017-07-14T12:54:40.985Z",
"updated": "2017-07-14T12:54:40.985Z",
"read_only": false
},
{
"environment_id": "system",
"name": "Watson System Environment",
"description": "Watson System environment",
"created": "2017-07-13T01:14:20.761Z",
"updated": "2017-07-13T01:14:20.761Z",
"read_only": true
}
]
}
An array of [environments] that are available for the service instance.
Examples:{ "environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8", "name": "test_environment", "description": "Test environment", "created": "2016-06-16T10:56:54.957Z", "updated": "2017-05-16T13:56:54.957Z", "status": "active", "read_only": false, "size": "M", "index_capacity": { "documents": { "indexed": 0, "maximum_allowed": 1000000 }, "disk_usage": { "used_bytes": 0, "maximum_allowed_bytes": 85899345920 }, "collections": { "available": 1, "maximum_allowed": 4 } }, "search_status": [ { "scope": "environment", "status": "NO_DATA", "status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy." } ] }
- environments
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.
Possible values: [active, pending, maintenance, resizing]
If true, the environment contains read-only collections that are maintained by IBM.
Current size of the environment.
Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
The new size requested for this environment. Only returned when the environment status is resizing.
Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- indexCapacity
Summary of the document usage statistics for the environment.
- documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- diskUsage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- searchStatus
Current scope of the training. Always returned as environment.
The current status of Continuous Relevancy Training for this environment.
Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]
Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
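The index_capacity object above can be summarized client-side. A minimal sketch, using the field names and sample values from the example response in this section:

```python
# Summarize disk usage from an environment's index_capacity block.
# Sample values taken from the example response above.
environment = {
    "index_capacity": {
        "documents": {"indexed": 0, "maximum_allowed": 1000000},
        "disk_usage": {"used_bytes": 0, "maximum_allowed_bytes": 85899345920},
        "collections": {"available": 1, "maximum_allowed": 4},
    }
}

def disk_usage_percent(env):
    """Percentage of the environment's disk capacity currently in use."""
    disk = env["index_capacity"]["disk_usage"]
    return 100.0 * disk["used_bytes"] / disk["maximum_allowed_bytes"]

pct = disk_usage_percent(environment)
```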
Response object containing an array of configured environments.
{
"environments": [
{
"environment_id": "ecbda78e-fb06-40b1-a43f-a039fac0adc6",
"name": "byod_environment",
"description": "Private Data Environment",
"created": "2017-07-14T12:54:40.985Z",
"updated": "2017-07-14T12:54:40.985Z",
"read_only": false
},
{
"environment_id": "system",
"name": "Watson System Environment",
"description": "Watson System environment",
"created": "2017-07-13T01:14:20.761Z",
"updated": "2017-07-13T01:14:20.761Z",
"read_only": true
}
]
}
An array of environments that are available for the service instance.
Examples:{ "environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8", "name": "test_environment", "description": "Test environment", "created": "2016-06-16T10:56:54.957Z", "updated": "2017-05-16T13:56:54.957Z", "status": "active", "read_only": false, "size": "M", "index_capacity": { "documents": { "indexed": 0, "maximum_allowed": 1000000 }, "disk_usage": { "used_bytes": 0, "maximum_allowed_bytes": 85899345920 }, "collections": { "available": 1, "maximum_allowed": 4 } }, "search_status": [ { "scope": "environment", "status": "NO_DATA", "status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy." } ] }
- environments
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.
Possible values: [active, pending, maintenance, resizing]
If true, the environment contains read-only collections that are maintained by IBM.
Current size of the environment.
Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
The new size requested for this environment. Only returned when the environment status is resizing.
Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- index_capacity
Summary of the document usage statistics for the environment.
- documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- disk_usage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- search_status
Current scope of the training. Always returned as environment.
The current status of Continuous Relevancy Training for this environment.
Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]
Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
Response object containing an array of configured environments.
{
"environments": [
{
"environment_id": "ecbda78e-fb06-40b1-a43f-a039fac0adc6",
"name": "byod_environment",
"description": "Private Data Environment",
"created": "2017-07-14T12:54:40.985Z",
"updated": "2017-07-14T12:54:40.985Z",
"read_only": false
},
{
"environment_id": "system",
"name": "Watson System Environment",
"description": "Watson System environment",
"created": "2017-07-13T01:14:20.761Z",
"updated": "2017-07-13T01:14:20.761Z",
"read_only": true
}
]
}
An array of environments that are available for the service instance.
Examples:{ "environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8", "name": "test_environment", "description": "Test environment", "created": "2016-06-16T10:56:54.957Z", "updated": "2017-05-16T13:56:54.957Z", "status": "active", "read_only": false, "size": "M", "index_capacity": { "documents": { "indexed": 0, "maximum_allowed": 1000000 }, "disk_usage": { "used_bytes": 0, "maximum_allowed_bytes": 85899345920 }, "collections": { "available": 1, "maximum_allowed": 4 } }, "search_status": [ { "scope": "environment", "status": "NO_DATA", "status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy." } ] }
- Environments
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.
Possible values: [active, pending, maintenance, resizing]
If true, the environment contains read-only collections that are maintained by IBM.
Current size of the environment.
Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
The new size requested for this environment. Only returned when the environment status is resizing.
Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- IndexCapacity
Summary of the document usage statistics for the environment.
- Documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- DiskUsage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- Collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- SearchStatus
Current scope of the training. Always returned as environment.
The current status of Continuous Relevancy Training for this environment.
Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]
Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
Status Code
Successful response.
Bad request.
{ "environments": [ { "environment_id": "ecbda78e-fb06-40b1-a43f-a039fac0adc6", "name": "byod_environment", "description": "Private Data Environment", "created": "2017-07-14T12:54:40.985Z", "updated": "2017-07-14T12:54:40.985Z", "read_only": false }, { "environment_id": "system", "name": "Watson System Environment", "description": "Watson System environment", "created": "2017-07-13T01:14:20.761Z", "updated": "2017-07-13T01:14:20.761Z", "read_only": true } ] }
{ "environments": [ { "environment_id": "ecbda78e-fb06-40b1-a43f-a039fac0adc6", "name": "byod_environment", "description": "Private Data Environment", "created": "2017-07-14T12:54:40.985Z", "updated": "2017-07-14T12:54:40.985Z", "read_only": false }, { "environment_id": "system", "name": "Watson System Environment", "description": "Watson System environment", "created": "2017-07-13T01:14:20.761Z", "updated": "2017-07-13T01:14:20.761Z", "read_only": true } ] }
Get environment info
GET /v1/environments/{environment_id}
ServiceCall<Environment> getEnvironment(GetEnvironmentOptions getEnvironmentOptions)
getEnvironment(params)
get_environment(
self,
environment_id: str,
**kwargs,
) -> DetailedResponse
GetEnvironment(string environmentId)
Request
Use the GetEnvironmentOptions.Builder
to create a GetEnvironmentOptions
object that contains the parameter values for the getEnvironment
method.
Path Parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2019-04-30.
The getEnvironment options.
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
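The documented constraints on `environment_id` (1 ≤ length ≤ 255, matching `^[a-zA-Z0-9_-]*$`) can be checked before calling the API. A small pre-flight sketch; the service enforces the same rules server-side:

```python
import re

# Regular expression from the path-parameter documentation above.
ENVIRONMENT_ID_RE = re.compile(r"^[a-zA-Z0-9_-]*$")

def is_valid_environment_id(environment_id):
    """True if the ID satisfies the documented length and pattern rules."""
    return (1 <= len(environment_id) <= 255
            and ENVIRONMENT_ID_RE.match(environment_id) is not None)

ok = is_valid_environment_id("system")
empty = is_valid_environment_id("")
bad = is_valid_environment_id("bad id!")
```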
curl -u "apikey":"{apikey}" "{url}/v1/environments/{environment_id}?version=2019-04-30"
IamAuthenticator authenticator = new IamAuthenticator( apikey: "{apikey}" ); DiscoveryService discovery = new DiscoveryService("2019-04-30", authenticator); discovery.SetServiceUrl("{url}"); var result = discovery.GetEnvironment( environmentId: "{environmentId}" ); Console.WriteLine(result.Response);
IamAuthenticator authenticator = new IamAuthenticator("{apikey}"); Discovery discovery = new Discovery("2019-04-30", authenticator); discovery.setServiceUrl("{url}"); String environmentId = "{environment_id}"; GetEnvironmentOptions getOptions = new GetEnvironmentOptions.Builder(environmentId).build(); Environment getResponse = discovery.getEnvironment(getOptions).execute().getResult();
const DiscoveryV1 = require('ibm-watson/discovery/v1'); const { IamAuthenticator } = require('ibm-watson/auth'); const discovery = new DiscoveryV1({ version: '2019-04-30', authenticator: new IamAuthenticator({ apikey: '{apikey}', }), serviceUrl: '{url}', }); const getEnvironmentParams = { environmentId: '{environment_id}', }; discovery.getEnvironment(getEnvironmentParams) .then(environment => { console.log(JSON.stringify(environment, null, 2)); }) .catch(err => { console.log('error:', err); });
import json from ibm_watson import DiscoveryV1 from ibm_cloud_sdk_core.authenticators import IAMAuthenticator authenticator = IAMAuthenticator('{apikey}') discovery = DiscoveryV1( version='2019-04-30', authenticator=authenticator ) discovery.set_service_url('{url}') environment_info = discovery.get_environment( '{environment_id}').get_result() print(json.dumps(environment_info, indent=2))
Response
Details about an environment.
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.
Possible values: [active, pending, maintenance, resizing]
If true, the environment contains read-only collections that are maintained by IBM.
Current size of the environment.
Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
The new size requested for this environment. Only returned when the environment status is resizing.
Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
Information about the Continuous Relevancy Training for this environment.
Details about an environment.
{
"environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
"name": "test_environment",
"description": "Test environment",
"created": "2016-06-16T10:56:54.957Z",
"updated": "2017-05-16T13:56:54.957Z",
"status": "active",
"read_only": false,
"size": "M",
"index_capacity": {
"documents": {
"indexed": 0,
"maximum_allowed": 1000000
},
"disk_usage": {
"used_bytes": 0,
"maximum_allowed_bytes": 85899345920
},
"collections": {
"available": 1,
"maximum_allowed": 4
}
},
"search_status": [
{
"scope": "environment",
"status": "NO_DATA",
"status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy."
}
]
}
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.
Possible values: [active, pending, maintenance, resizing]
If true, the environment contains read-only collections that are maintained by IBM.
Current size of the environment.
Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
The new size requested for this environment. Only returned when the environment status is resizing.
Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- indexCapacity
Summary of the document usage statistics for the environment.
- documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- diskUsage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- searchStatus
Current scope of the training. Always returned as environment.
The current status of Continuous Relevancy Training for this environment.
Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]
Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
Details about an environment.
{
"environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
"name": "test_environment",
"description": "Test environment",
"created": "2016-06-16T10:56:54.957Z",
"updated": "2017-05-16T13:56:54.957Z",
"status": "active",
"read_only": false,
"size": "M",
"index_capacity": {
"documents": {
"indexed": 0,
"maximum_allowed": 1000000
},
"disk_usage": {
"used_bytes": 0,
"maximum_allowed_bytes": 85899345920
},
"collections": {
"available": 1,
"maximum_allowed": 4
}
},
"search_status": [
{
"scope": "environment",
"status": "NO_DATA",
"status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy."
}
]
}
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.
Possible values: [active, pending, maintenance, resizing]
If true, the environment contains read-only collections that are maintained by IBM.
Current size of the environment.
Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
The new size requested for this environment. Only returned when the environment status is resizing.
Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- index_capacity
Summary of the document usage statistics for the environment.
- documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- disk_usage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- search_status
Current scope of the training. Always returned as environment.
The current status of Continuous Relevancy Training for this environment.
Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]
Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
Details about an environment.
{
"environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
"name": "test_environment",
"description": "Test environment",
"created": "2016-06-16T10:56:54.957Z",
"updated": "2017-05-16T13:56:54.957Z",
"status": "active",
"read_only": false,
"size": "M",
"index_capacity": {
"documents": {
"indexed": 0,
"maximum_allowed": 1000000
},
"disk_usage": {
"used_bytes": 0,
"maximum_allowed_bytes": 85899345920
},
"collections": {
"available": 1,
"maximum_allowed": 4
}
},
"search_status": [
{
"scope": "environment",
"status": "NO_DATA",
"status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy."
}
]
}
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.
Possible values: [active, pending, maintenance, resizing]
If true, the environment contains read-only collections that are maintained by IBM.
Current size of the environment.
Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
The new size requested for this environment. Only returned when the environment status is resizing.
Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- IndexCapacity
Summary of the document usage statistics for the environment.
- Documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- DiskUsage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- Collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- SearchStatus
Current scope of the training. Always returned as environment.
The current status of Continuous Relevancy Training for this environment.
Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]
Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
Status Code
Environment fetched.
Bad request.
{ "environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8", "name": "test_environment", "description": "Test environment", "created": "2016-06-16T10:56:54.957Z", "updated": "2017-05-16T13:56:54.957Z", "status": "active", "read_only": false, "size": "M", "index_capacity": { "documents": { "indexed": 0, "maximum_allowed": 1000000 }, "disk_usage": { "used_bytes": 0, "maximum_allowed_bytes": 85899345920 }, "collections": { "available": 1, "maximum_allowed": 4 } }, "search_status": [ { "scope": "environment", "status": "NO_DATA", "status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy." } ] }
Update an environment
Updates an environment. The environment's name and description parameters can be changed. You must specify a name for the environment.
PUT /v1/environments/{environment_id}
ServiceCall<Environment> updateEnvironment(UpdateEnvironmentOptions updateEnvironmentOptions)
updateEnvironment(params)
update_environment(
self,
environment_id: str,
*,
name: str = None,
description: str = None,
size: str = None,
**kwargs,
) -> DetailedResponse
UpdateEnvironment(string environmentId, string name = null, string description = null, string size = null)
Request
Use the UpdateEnvironmentOptions.Builder to create a UpdateEnvironmentOptions object that contains the parameter values for the updateEnvironment method.
Path Parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2019-04-30.
An object that defines the environment's name and, optionally, description.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Default:
Description of the environment.
Default:
Size to change the environment to. Note: Lite plan users cannot change the environment size.
Allowable values: [S, MS, M, ML, L, XL, XXL, XXXL]
The updateEnvironment options.
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Default:
Description of the environment.
Default:
Size to change the environment to. Note: Lite plan users cannot change the environment size.
Allowable values: [S, MS, M, ML, L, XL, XXL, XXXL]
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Default:
Description of the environment.
Default:
Size to change the environment to. Note: Lite plan users cannot change the environment size.
Allowable values: [S, MS, M, ML, L, XL, XXL, XXXL]
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Default:
Description of the environment.
Default:
Size to change the environment to. Note: Lite plan users cannot change the environment size.
Allowable values: [S, MS, M, ML, L, XL, XXL, XXXL]
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Default:
Description of the environment.
Default:
Size to change the environment to. Note: Lite plan users cannot change the environment size.
Allowable values: [S, MS, M, ML, L, XL, XXL, XXXL]
curl -X PUT -u "apikey":"{apikey}" -H "Content-Type: application/json" -d '{ "name": "Updated name", "description": "Updated description" }' "{url}/v1/environments/{environment_id}?version=2019-04-30"
IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

DiscoveryService discovery = new DiscoveryService("2019-04-30", authenticator);
discovery.SetServiceUrl("{url}");

var result = discovery.UpdateEnvironment(
    environmentId: "{environmentId}",
    name: "Updated name",
    description: "Updated description"
    );

Console.WriteLine(result.Response);
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
Discovery discovery = new Discovery("2019-04-30", authenticator);
discovery.setServiceUrl("{url}");

String environmentId = "{environment_id}";
String environmentName = "Updated name";
String envDescription = "Updated description";
UpdateEnvironmentOptions.Builder updateBuilder =
    new UpdateEnvironmentOptions.Builder(environmentId, environmentName);
updateBuilder.description(envDescription);
Environment updateResponse =
    discovery.updateEnvironment(updateBuilder.build()).execute().getResult();
const DiscoveryV1 = require('ibm-watson/discovery/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV1({
  version: '2019-04-30',
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const updateEnvironmentParams = {
  environmentId: '{environment_id}',
  name: '{updated name OR current name if updating description (name is required)}',
  description: '{updated description OR current description if updating just name (description will be set to `null` if not given)}',
};

discovery.updateEnvironment(updateEnvironmentParams)
  .then(environment => {
    console.log(JSON.stringify(environment, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV1(
    version='2019-04-30',
    authenticator=authenticator
)
discovery.set_service_url('{url}')

new_name = discovery.update_environment(
    '{environment_id}',
    name='Updated name',
    description='Updated description').get_result()
print(json.dumps(new_name, indent=2))
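The allowable sizes for a resize form an ordered ladder (S through XXXL), and the response enum additionally includes LT and XS, which are not valid resize targets. A request can be validated client-side before calling the update operation. The helper below is a hypothetical sketch, not part of any Watson SDK:

```python
# Ordered resize targets from the "Allowable values" list above.
# LT and XS can appear in responses but cannot be requested here.
SIZES = ["S", "MS", "M", "ML", "L", "XL", "XXL", "XXXL"]

def next_size(current):
    """Return the next larger allowable size, or None if already at XXXL."""
    if current not in SIZES:
        raise ValueError("not a valid resize target: %s" % current)
    index = SIZES.index(current)
    return SIZES[index + 1] if index + 1 < len(SIZES) else None
```

For example, next_size("M") yields "ML", which could then be passed as the size argument to update_environment. Note that Lite plan users cannot change the environment size at all.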
Response
Details about an environment.
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.
Possible values: [active, pending, maintenance, resizing]
If true, the environment contains read-only collections that are maintained by IBM.
Current size of the environment.
Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
The new size requested for this environment. Only returned when the environment status is resizing.
Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
Information about the Continuous Relevancy Training for this environment.
Details about an environment.
{
"environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
"name": "test_environment",
"description": "Test environment",
"created": "2016-06-16T10:56:54.957Z",
"updated": "2017-05-16T13:56:54.957Z",
"status": "active",
"read_only": false,
"size": "M",
"index_capacity": {
"documents": {
"indexed": 0,
"maximum_allowed": 1000000
},
"disk_usage": {
"used_bytes": 0,
"maximum_allowed_bytes": 85899345920
},
"collections": {
"available": 1,
"maximum_allowed": 4
}
},
"search_status": [
{
"scope": "environment",
"status": "NO_DATA",
"status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy."
}
]
}
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.
Possible values: [active, pending, maintenance, resizing]
If true, the environment contains read-only collections that are maintained by IBM.
Current size of the environment.
Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
The new size requested for this environment. Only returned when the environment status is resizing.
Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- indexCapacity
Summary of the document usage statistics for the environment.
- documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- diskUsage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- searchStatus
Current scope of the training. Always returned as environment.
The current status of Continuous Relevancy Training for this environment.
Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]
Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
Details about an environment.
{
"environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
"name": "test_environment",
"description": "Test environment",
"created": "2016-06-16T10:56:54.957Z",
"updated": "2017-05-16T13:56:54.957Z",
"status": "active",
"read_only": false,
"size": "M",
"index_capacity": {
"documents": {
"indexed": 0,
"maximum_allowed": 1000000
},
"disk_usage": {
"used_bytes": 0,
"maximum_allowed_bytes": 85899345920
},
"collections": {
"available": 1,
"maximum_allowed": 4
}
},
"search_status": [
{
"scope": "environment",
"status": "NO_DATA",
"status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy."
}
]
}
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.
Possible values: [active, pending, maintenance, resizing]
If true, the environment contains read-only collections that are maintained by IBM.
Current size of the environment.
Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
The new size requested for this environment. Only returned when the environment status is resizing.
Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- index_capacity
Summary of the document usage statistics for the environment.
- documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- disk_usage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- search_status
Current scope of the training. Always returned as environment.
The current status of Continuous Relevancy Training for this environment.
Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]
Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
Details about an environment.
{
"environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
"name": "test_environment",
"description": "Test environment",
"created": "2016-06-16T10:56:54.957Z",
"updated": "2017-05-16T13:56:54.957Z",
"status": "active",
"read_only": false,
"size": "M",
"index_capacity": {
"documents": {
"indexed": 0,
"maximum_allowed": 1000000
},
"disk_usage": {
"used_bytes": 0,
"maximum_allowed_bytes": 85899345920
},
"collections": {
"available": 1,
"maximum_allowed": 4
}
},
"search_status": [
{
"scope": "environment",
"status": "NO_DATA",
"status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy."
}
]
}
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.
Possible values: [active, pending, maintenance, resizing]
If true, the environment contains read-only collections that are maintained by IBM.
Current size of the environment.
Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
The new size requested for this environment. Only returned when the environment status is resizing.
Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- index_capacity
Summary of the document usage statistics for the environment.
- documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- disk_usage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- search_status
Current scope of the training. Always returned as environment.
The current status of Continuous Relevancy Training for this environment.
Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]
Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
Details about an environment.
{
"environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
"name": "test_environment",
"description": "Test environment",
"created": "2016-06-16T10:56:54.957Z",
"updated": "2017-05-16T13:56:54.957Z",
"status": "active",
"read_only": false,
"size": "M",
"index_capacity": {
"documents": {
"indexed": 0,
"maximum_allowed": 1000000
},
"disk_usage": {
"used_bytes": 0,
"maximum_allowed_bytes": 85899345920
},
"collections": {
"available": 1,
"maximum_allowed": 4
}
},
"search_status": [
{
"scope": "environment",
"status": "NO_DATA",
"status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy."
}
]
}
Unique identifier for the environment.
Name that identifies the environment.
Possible values: 0 ≤ length ≤ 255
Description of the environment.
Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.
Possible values: [active, pending, maintenance, resizing]
If true, the environment contains read-only collections that are maintained by IBM.
Current size of the environment.
Possible values: [LT, XS, S, MS, M, ML, L, XL, XXL, XXXL]
The new size requested for this environment. Only returned when the environment status is resizing.
Note: Querying and indexing can still be performed during an environment upsize.
Details about the resource usage and capacity of the environment.
- IndexCapacity
Summary of the document usage statistics for the environment.
- Documents
Number of documents indexed for the environment.
Total number of documents allowed in the environment's capacity.
Summary of the disk usage statistics for the environment.
- DiskUsage
Number of bytes within the environment's disk capacity that are currently used to store data.
Total number of bytes available in the environment's disk capacity.
Summary of the collection usage in the environment.
- Collections
Number of active collections in the environment.
Total number of collections allowed in the environment.
Information about the Continuous Relevancy Training for this environment.
- SearchStatus
Current scope of the training. Always returned as environment.
The current status of Continuous Relevancy Training for this environment.
Possible values: [NO_DATA, INSUFFICENT_DATA, TRAINING, TRAINED, NOT_APPLICABLE]
Long description of the current Continuous Relevancy Training status.
The date stamp of the most recent completed training for this environment.
Status Code
Environment successfully updated.
Bad request.
Forbidden. Returned if you attempt to update a read-only environment.
{
  "environment_id": "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
  "name": "test_environment",
  "description": "Test environment",
  "created": "2016-06-16T10:56:54.957Z",
  "updated": "2017-05-16T13:56:54.957Z",
  "status": "active",
  "read_only": false,
  "size": "M",
  "index_capacity": {
    "documents": {
      "indexed": 0,
      "maximum_allowed": 1000000
    },
    "disk_usage": {
      "used_bytes": 0,
      "maximum_allowed_bytes": 85899345920
    },
    "collections": {
      "available": 1,
      "maximum_allowed": 4
    }
  },
  "search_status": [
    {
      "scope": "environment",
      "status": "NO_DATA",
      "status_description": "The system is employing the default strategy for document search natural_language_query. Enable query and event logging so we can initiate relevancy training to improve search accuracy."
    }
  ]
}
Delete environment
DELETE /v1/environments/{environment_id}
ServiceCall<DeleteEnvironmentResponse> deleteEnvironment(DeleteEnvironmentOptions deleteEnvironmentOptions)
deleteEnvironment(params)
delete_environment(
self,
environment_id: str,
**kwargs,
) -> DetailedResponse
DeleteEnvironment(string environmentId)
Request
Use the DeleteEnvironmentOptions.Builder to create a DeleteEnvironmentOptions object that contains the parameter values for the deleteEnvironment method.
Path Parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2019-04-30.
The deleteEnvironment options.
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl -u "apikey":"{apikey}" -X DELETE "{url}/v1/environments/{environment_id}?version=2019-04-30"
IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

DiscoveryService discovery = new DiscoveryService("2019-04-30", authenticator);
discovery.SetServiceUrl("{url}");

var result = discovery.DeleteEnvironment(
    environmentId: "{environmentId}"
    );

Console.WriteLine(result.Response);
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
Discovery discovery = new Discovery("2019-04-30", authenticator);
discovery.setServiceUrl("{url}");

String environmentId = "{environment_id}";
DeleteEnvironmentOptions deleteRequest =
    new DeleteEnvironmentOptions.Builder(environmentId).build();
DeleteEnvironmentResponse deleteResponse =
    discovery.deleteEnvironment(deleteRequest).execute().getResult();
const DiscoveryV1 = require('ibm-watson/discovery/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV1({
  version: '2019-04-30',
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const deleteEnvironmentParams = {
  environmentId: '{environment_id}',
};

discovery.deleteEnvironment(deleteEnvironmentParams)
  .then(deleteEnvironmentResponse => {
    console.log(JSON.stringify(deleteEnvironmentResponse, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV1(
    version='2019-04-30',
    authenticator=authenticator
)
discovery.set_service_url('{url}')

del_env = discovery.delete_environment('{environment_id}').get_result()
print(json.dumps(del_env, indent=2))
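Because 404 is returned any time the environment is not found, even immediately after a successful deletion, a retried delete can safely treat 404 as "already gone". The function below is a hypothetical helper for interpreting the status codes documented for this operation, not part of the SDK:

```python
def delete_succeeded(status_code):
    """Interpret the HTTP status of a DELETE /v1/environments/{id} call."""
    if status_code == 200:
        return True   # environment deleted by this call
    if status_code == 404:
        return True   # environment already gone (e.g. a retry after success)
    if status_code == 403:
        raise PermissionError("read-only environments cannot be deleted")
    return False      # 400 (bad request) and other errors: nothing deleted
```

With this interpretation the operation becomes idempotent from the caller's point of view: issuing the same delete twice is treated as success both times.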
Response
Response object returned when deleting an environment.
The unique identifier for the environment.
Status of the environment.
Possible values: [deleted]
Status Code
Environment successfully deleted.
Bad request. Example error messages:
Invalid environment id. Please check if the format is correct.
Forbidden. Returned if you attempt to delete a read-only environment.
Returned any time the environment is not found (even immediately after the environment was successfully deleted).
Example error message:
An environment with ID '2cd8bc72-d737-46e3-b26b-05a585111111' was not found.
No Sample Response
List fields across collections
Gets a list of the unique fields (and their types) stored in the indexes of the specified collections.
GET /v1/environments/{environment_id}/fields
ServiceCall<ListCollectionFieldsResponse> listFields(ListFieldsOptions listFieldsOptions)
listFields(params)
list_fields(
self,
environment_id: str,
collection_ids: List[str],
**kwargs,
) -> DetailedResponse
ListFields(string environmentId, List<string> collectionIds)
Request
Use the ListFieldsOptions.Builder to create a ListFieldsOptions object that contains the parameter values for the listFields method.
Path Parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2019-04-30.
A comma-separated list of collection IDs to be queried against.
The listFields options.
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
A comma-separated list of collection IDs to be queried against.
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
A comma-separated list of collection IDs to be queried against.
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
A comma-separated list of collection IDs to be queried against.
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
A comma-separated list of collection IDs to be queried against.
curl -u "apikey":"{apikey}" "{url}/v1/environments/{environment_id}/fields?collection_ids={id1},{id2}&version=2019-04-30"
IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

DiscoveryService discovery = new DiscoveryService("2019-04-30", authenticator);
discovery.SetServiceUrl("{url}");

var result = discovery.ListFields(
    environmentId: "{environmentId}",
    collectionIds: new List<string>() { "{collection_id1}", "{collection_id2}" }
    );

Console.WriteLine(result.Response);
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
Discovery discovery = new Discovery("2019-04-30", authenticator);
discovery.setServiceUrl("{url}");

String environmentId = "{environment_id}";
String collectionId = "{collection_id}";
ListFieldsOptions options = new ListFieldsOptions.Builder()
    .environmentId(environmentId)
    .addCollectionIds(collectionId)
    .build();
ListCollectionFieldsResponse response =
    discovery.listFields(options).execute().getResult();
const DiscoveryV1 = require('ibm-watson/discovery/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV1({
  version: '2019-04-30',
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const listFieldsParams = {
  environmentId: '{environment_id}',
  collectionIds: ['{collection id}'],
};

discovery.listFields(listFieldsParams)
  .then(listCollectionFieldsResponse => {
    console.log(JSON.stringify(listCollectionFieldsResponse, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV1(
    version='2019-04-30',
    authenticator=authenticator
)
discovery.set_service_url('{url}')

fields = discovery.list_fields(
    '{environment_id}',
    ['{collection_id1}', '{collection_id2}']).get_result()
print(json.dumps(fields, indent=2))
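The fully qualified field names returned by this method insert .properties segments for fields inside nested objects (for example, warnings.properties.severity refers to the severity property of the warnings object). Assuming no field is itself literally named properties, the query-style path can be recovered by dropping those segments; a minimal sketch, not part of the SDK:

```python
def to_query_path(field_name):
    """Drop the '.properties' segments that listFields inserts for nested objects."""
    parts = field_name.split(".")
    return ".".join(p for p in parts if p != "properties")
```

For example, to_query_path("warnings.properties.severity") produces "warnings.severity", the form used by the query operations; top-level names such as "warnings" pass through unchanged.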
Response
The list of fetched fields.
The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations:
- Fields that contain nested JSON objects are assigned a type of "nested".
- Fields that belong to a nested object are prefixed with .properties (for example, warnings.properties.severity means that the warnings object has a property called severity).
- Fields returned from the News collection are prefixed with v{N}-fullnews-t3-{YEAR}.mappings (for example, v5-fullnews-t3-2016.mappings.text.properties.author).
An array containing information about each field in the collections.
The list of fetched fields.
The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations:
- Fields that contain nested JSON objects are assigned a type of "nested".
- Fields that belong to a nested object are prefixed with .properties (for example, warnings.properties.severity means that the warnings object has a property called severity).
- Fields returned from the News collection are prefixed with v{N}-fullnews-t3-{YEAR}.mappings (for example, v5-fullnews-t3-2016.mappings.text.properties.author).
{
"fields": [
{
"field": "warnings",
"type": "nested"
},
{
"field": "warnings.properties.description",
"type": "string"
},
{
"field": "warnings.properties.phase",
"type": "string"
},
{
"field": "warnings.properties.warning_id",
"type": "string"
}
]
}
An array containing information about each field in the collections.
- fields
The name of the field.
The type of the field.
Possible values: [nested, string, date, long, integer, short, byte, double, float, boolean, binary]
The list of fetched fields.
The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations:
- Fields that contain nested JSON objects are assigned a type of "nested".
- Fields that belong to a nested object are prefixed with .properties (for example, warnings.properties.severity means that the warnings object has a property called severity).
- Fields returned from the News collection are prefixed with v{N}-fullnews-t3-{YEAR}.mappings (for example, v5-fullnews-t3-2016.mappings.text.properties.author).
{
"fields": [
{
"field": "warnings",
"type": "nested"
},
{
"field": "warnings.properties.description",
"type": "string"
},
{
"field": "warnings.properties.phase",
"type": "string"
},
{
"field": "warnings.properties.warning_id",
"type": "string"
}
]
}
An array containing information about each field in the collections.
- fields
The name of the field.
The type of the field.
Possible values: [nested, string, date, long, integer, short, byte, double, float, boolean, binary]
The list of fetched fields.
The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations:
- Fields that contain nested JSON objects are assigned a type of "nested".
- Fields that belong to a nested object are prefixed with .properties (for example, warnings.properties.severity means that the warnings object has a property called severity).
- Fields returned from the News collection are prefixed with v{N}-fullnews-t3-{YEAR}.mappings (for example, v5-fullnews-t3-2016.mappings.text.properties.author).
{
"fields": [
{
"field": "warnings",
"type": "nested"
},
{
"field": "warnings.properties.description",
"type": "string"
},
{
"field": "warnings.properties.phase",
"type": "string"
},
{
"field": "warnings.properties.warning_id",
"type": "string"
}
]
}
An array containing information about each field in the collections.
- fields
The name of the field.
The type of the field.
Possible values: [
nested
,string
,date
,long
,integer
,short
,byte
,double
,float
,boolean
,binary
]
The list of fetched fields.
The fields are returned using a fully qualified name format, however, the format differs slightly from that used by the query operations.
-
Fields which contain nested JSON objects are assigned a type of "nested".
-
Fields which belong to a nested object are prefixed with
.properties
(for example,warnings.properties.severity
means that thewarnings
object has a property calledseverity
). -
Fields returned from the News collection are prefixed with
v{N}-fullnews-t3-{YEAR}.mappings
(for example,v5-fullnews-t3-2016.mappings.text.properties.author
).
{
"fields": [
{
"field": "warnings",
"type": "nested"
},
{
"field": "warnings.properties.description",
"type": "string"
},
{
"field": "warnings.properties.phase",
"type": "string"
},
{
"field": "warnings.properties.warning_id",
"type": "string"
}
]
}
An array containing information about each field in the collections.
- fields
The name of the field.
The type of the field.
Possible values: [nested, string, date, long, integer, short, byte, double, float, boolean, binary]
Status Code
- The list of fetched fields.
The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations:
- Fields that contain nested JSON objects are assigned a type of "nested".
- Fields that belong to a nested object are prefixed with .properties (for example, warnings.properties.severity means that the warnings object has a property called severity).
- Fields returned from the News collection are prefixed with v{N}-fullnews-t3-{YEAR}.mappings (for example, v5-fullnews-t3-2016.mappings.text.properties.author).
- Bad request.
{ "fields": [ { "field": "warnings", "type": "nested" }, { "field": "warnings.properties.description", "type": "string" }, { "field": "warnings.properties.phase", "type": "string" }, { "field": "warnings.properties.warning_id", "type": "string" } ] }
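Because the listed field names insert .properties between a nested object and its children, they differ from the names used in query operations. A minimal Python sketch of converting listed names back to query-style names (the helper to_query_field is our own, not part of any SDK):

```python
def to_query_field(listed_name: str) -> str:
    # The fields endpoint inserts ".properties" between a nested object
    # and its children; query operations omit it.
    return listed_name.replace(".properties.", ".")

response = {
    "fields": [
        {"field": "warnings", "type": "nested"},
        {"field": "warnings.properties.severity", "type": "string"},
    ]
}
query_names = [to_query_field(f["field"]) for f in response["fields"]]
# query_names == ["warnings", "warnings.severity"]
```

Note that this simple substitution does not strip the v{N}-fullnews-t3-{YEAR}.mappings prefix used by the News collection.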
Add configuration
Creates a new configuration.
If the input configuration contains the configuration_id, created, or updated properties, then they are ignored and overridden by the system, and an error is not returned so that the overridden fields do not need to be removed when copying a configuration.
The configuration can contain unrecognized JSON fields. Any such fields are ignored and do not generate an error. This makes it easier to use newer configuration files with older versions of the API and the service. It also makes it possible for the tooling to add additional metadata and information to the configuration.
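Because the service overrides configuration_id, created, and updated on create, a configuration fetched from the service can be resubmitted as-is to make a copy; only the name needs to change. A minimal sketch (clone_config_body is a hypothetical helper, not an SDK method):

```python
import copy

def clone_config_body(existing: dict, new_name: str) -> dict:
    # configuration_id, created, and updated are ignored and overridden
    # by the system on create, so they can stay in the body when copying.
    body = copy.deepcopy(existing)
    body["name"] = new_name  # name must be unique within the environment
    return body

fetched = {
    "configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
    "created": "2015-08-24T18:42:25.324Z",
    "name": "IBM News",
}
new_body = clone_config_body(fetched, "IBM News (copy)")
```

The deep copy keeps the fetched configuration untouched, so it can be cloned more than once.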
POST /v1/environments/{environment_id}/configurations
ServiceCall<Configuration> createConfiguration(CreateConfigurationOptions createConfigurationOptions)
createConfiguration(params)
create_configuration(
self,
environment_id: str,
name: str,
*,
description: str = None,
conversions: 'Conversions' = None,
enrichments: List['Enrichment'] = None,
normalizations: List['NormalizationOperation'] = None,
source: 'Source' = None,
**kwargs,
) -> DetailedResponse
CreateConfiguration(string environmentId, string name, string description = null, Conversions conversions = null, List<Enrichment> enrichments = null, List<NormalizationOperation> normalizations = null, Source source = null)
Request
Use the CreateConfigurationOptions.Builder
to create a CreateConfigurationOptions
object that contains the parameter values for the createConfiguration
method.
Path Parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2019-04-30
.
Input an object that enables you to customize how your content is ingested and what enrichments are added to your data.
name is required and must be unique within the current environment. All other properties are optional.
If the input configuration contains the configuration_id, created, or updated properties, then they will be ignored and overridden by the system (an error is not returned so that the overridden fields do not need to be removed when copying a configuration).
The configuration can contain unrecognized JSON fields. Any such fields will be ignored and will not generate an error. This makes it easier to use newer configuration files with older versions of the API and the service. It also makes it possible for the tooling to add additional metadata and information to the configuration.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "salesforce",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The description of the configuration, if available.
Document conversion settings.
An array of document enrichment settings for the configuration.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
Object containing source parameters for the configuration.
The createConfiguration options.
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of Word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keepContent
An array of XPaths.
Object containing an array of XPaths.
- excludeContent
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Default:
false
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to false in the smart_document_understanding object.
Default: ["h1","h2"]
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field sub-title is specified, then each time the smart document understanding conversion encounters a field of type sub-title in an uploaded document, the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- jsonNormalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
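The five operations above can be sketched as follows for a flat document (apply_normalization is an illustrative helper only; the service applies these rules to arbitrarily nested JSON, not just top-level keys):

```python
def apply_normalization(doc: dict, op: dict) -> dict:
    """Apply one normalization operation to a flat document."""
    kind = op["operation"]
    src = op.get("source_field")
    dst = op.get("destination_field")
    if kind == "copy":
        if src in doc:
            doc[dst] = doc[src]
    elif kind == "move":
        # identical to copy followed by remove of the source field
        if src in doc:
            doc[dst] = doc.pop(src)
    elif kind == "merge":
        # destination becomes an array even when the source is missing,
        # keeping the destination's type consistent across documents
        if dst in doc and not isinstance(doc[dst], list):
            doc[dst] = [doc[dst]]
        doc.setdefault(dst, [])
        if src in doc:
            doc[dst].append(doc.pop(src))
    elif kind == "remove":
        doc.pop(src, None)  # destination_field is ignored
    elif kind == "remove_nulls":
        doc = {k: v for k, v in doc.items() if v is not None}
    return doc

doc = {"extracted_metadata": {"title": "t"}, "html": "<p>x</p>", "junk": None}
doc = apply_normalization(doc, {"operation": "move",
                                "source_field": "extracted_metadata",
                                "destination_field": "metadata"})
doc = apply_normalization(doc, {"operation": "remove", "source_field": "html"})
doc = apply_normalization(doc, {"operation": "remove_nulls"})
# doc is now {"metadata": {"title": "t"}}
```

As in the API, operations are applied in order, and remove_nulls is usually left for last since it scans the whole document.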
When true, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
Default: true
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Default:
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.
Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to natural_language_understanding.
Indicates that the enrichments will overwrite the destination_field field if it already exists.
Default: false
Name of the enrichment service to call. The only supported option is natural_language_understanding. The elements option is deprecated and support ended on 10 July 2020. The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Default: false
Options that are specific to a particular enrichment.
The elements enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence project type instead.
- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When true, sentiment analysis of keywords will be performed on the specified field.
When true, emotion detection of keywords will be performed on the specified field.
The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When true, sentiment analysis of entities will be performed on the specified field.
When true, emotion detection of entities will be performed on the specified field.
The maximum number of entities to extract for each instance of the specified field.
When true, the number of mentions of each identified entity is recorded. The default is false.
When true, the types of mentions for each identified entity are recorded. The default is false.
When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.
The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model alchemy.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When true, sentiment analysis is performed on the entire field.
A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When true, emotion detection is performed on the entire field.
A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semanticRoles
When true, entities are extracted from the identified sentence parts.
When true, keywords are extracted from the identified sentence parts.
The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model en-news.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.
Allowable values: [ar, en, fr, de, it, pt, ru, es, sv]
The element extraction model to use, which can be contract only. The elements enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box indicates the configuration is to connect to an instance of Enterprise Box.
salesforce indicates the configuration is to connect to Salesforce.
sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
web_crawl indicates the configuration is to perform a web page crawl.
cloud_object_storage indicates the configuration is to connect to a cloud object store.
Allowable values: [box, salesforce, sharepoint, web_crawl, cloud_object_storage]
The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; when false and connecting to Salesforce, the source is crawled annually.
Default: true
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
Default: America/New_York
The crawl schedule in the specified time_zone.
five_minutes: Runs every five minutes.
hourly: Runs every hour.
daily: Runs every day between 00:00 and 06:00.
weekly: Runs every week on Sunday between 00:00 and 06:00.
monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Allowable values: [daily, weekly, monthly, five_minutes, hourly]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.
- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.
- objects
The name of the Salesforce document object to crawl. For example, case.
The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to sharepoint.
- siteCollections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to web_crawl.
- urls
The starting URL to crawl.
When true, crawls of the specified URL are limited to the host part of the url field.
Default: true
The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means as many as two URLs are fetched concurrently with a short delay between fetch calls. aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls.
Allowable values: [gentle, normal, aggressive]
Default: normal
When true, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.
Default: false
The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
Default: 2
The maximum milliseconds to wait for a response from the web server.
Default: 30000
When true, the crawler will ignore any robots.txt encountered by the crawler. This should only ever be done when crawling a web site the user owns. This must be set to true when a gateway_id is specified in the credentials.
Default: false
Array of URLs to be excluded while crawling. The crawler will not follow links that contain this string. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to cloud_object_store, and the crawl_all_buckets field is false or not specified.
- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When true, all buckets in the specified cloud object store are crawled. If set to true, the buckets array must not be specified.
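Putting the pieces together, a web_crawl source object using the schedule and crawl options described above might look like the following sketch. The URL and credential_id are placeholders, and the option field names (limit_to_starting_hosts, crawl_speed, maximum_hops, request_timeout) are assumed from the v1 source options schema:

```json
{
  "type": "web_crawl",
  "credential_id": "<your-credential-id>",
  "schedule": {
    "enabled": true,
    "time_zone": "America/New_York",
    "frequency": "daily"
  },
  "options": {
    "urls": [
      {
        "url": "https://example.com/docs",
        "limit_to_starting_hosts": true,
        "crawl_speed": "normal",
        "maximum_hops": 2,
        "request_timeout": 30000
      }
    ]
  }
}
```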
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of Word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keep_content
An array of XPaths.
Object containing an array of XPaths.
- exclude_content
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Default:
false
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to false in the smart_document_understanding object.
Default: ["h1","h2"]
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field sub-title is specified, then each time the smart document understanding conversion encounters a field of type sub-title in an uploaded document, the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- json_normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
When true, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
Default: true
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Default:
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if
text
is a top-level field with no sub-fields, text.foo
is a valid destination but text.foo.bar
is not. Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to
natural_language_understanding
.Indicates that the enrichments will overwrite the destination_field field if it already exists.
Default:
false
Name of the enrichment service to call. The only supported option is
natural_language_understanding
. The elements
option is deprecated and support ended on 10 July 2020. The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Default:
false
Options that are specific to a particular enrichment.
The
elements
enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence
project type instead.
- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When
true
, sentiment analysis of keywords will be performed on the specified field.When
true
, emotion detection of keywords will be performed on the specified field.The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When
true
, sentiment analysis of entities will be performed on the specified field.When
true
, emotion detection of entities will be performed on the specified field.The maximum number of entities to extract for each instance of the specified field.
When
true
, the number of mentions of each identified entity is recorded. The default isfalse
.When
true
, the types of mentions for each identified entity are recorded. The default is false
.When
true
, a list of sentence locations for each instance of each identified entity is recorded. The default isfalse
. The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model
alchemy
.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When
true
, sentiment analysis is performed on the entire field.A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When
true
, emotion detection is performed on the entire field.A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semantic_roles
When
true
, entities are extracted from the identified sentence parts.When
true
, keywords are extracted from the identified sentence parts. The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with
natural_language_understanding
enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model is en-news
.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are
ar
(Arabic),en
(English),fr
(French),de
(German),it
(Italian),pt
(Portuguese),ru
(Russian),es
(Spanish), andsv
(Swedish). Note: Not all features support all languages; automatic detection is recommended. Allowable values: [
ar
,en
,fr
,de
,it
,pt
,ru
,es
,sv
]The element extraction model to use, which can be
contract
only. Theelements
enrichment is deprecated.
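As a sketch of how these parameters fit together, here is an illustrative enrichment entry. The destination_field, source_field, and feature values are examples only, and the nested feature keys follow the parameter descriptions above; treat them as assumptions, not a definitive payload.

```python
# Illustrative enrichment definition for natural_language_understanding.
enrichment = {
    "destination_field": "enriched_text",  # at most 1 level below an existing field
    "source_field": "text",                # field to be enriched
    "enrichment": "natural_language_understanding",
    "overwrite": False,                    # keep an existing destination_field
    "ignore_downstream_errors": False,     # fail the document on enrichment errors
    "options": {
        "features": {
            "keywords": {"sentiment": True, "emotion": False, "limit": 50},
            "entities": {"sentiment": True, "mentions": True, "limit": 50},
            "sentiment": {"document": True, "targets": ["product", "support"]},
        },
        "language": "en",  # overrides automatic language detection
    },
}
```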
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, if remove_nulls is invoked at all, it is invoked as the last normalization operation because it can be time-expensive.
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box
indicates the configuration is to connect to an instance of Enterprise Box.salesforce
indicates the configuration is to connect to Salesforce.sharepoint
indicates the configuration is to connect to Microsoft SharePoint Online.web_crawl
indicates the configuration is to perform a web page crawl.cloud_object_storage
indicates the configuration is to connect to a cloud object store.
Allowable values: [
box
,salesforce
,sharepoint
,web_crawl
,cloud_object_storage
]The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When
true
, the source is re-crawled based on the frequency field in this object. When false
the source is not re-crawled. When false
and connecting to Salesforce, the source is crawled annually. Default:
true
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
Default:
America/New_York
The crawl schedule in the specified time_zone.
five_minutes
: Runs every five minutes.hourly
: Runs every hour.daily
: Runs every day between 00:00 and 06:00.weekly
: Runs every week on Sunday between 00:00 and 06:00.monthly
: Runs on the first Sunday of every month between 00:00 and 06:00.
Allowable values: [
daily
,weekly
,monthly
,five_minutes
,hourly
]
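A minimal sketch of a source object with a schedule, assuming the field names described above (type, credential_id, and the schedule object's enabled, time_zone, and frequency fields); the credential ID is a placeholder.

```python
# Illustrative source definition: re-crawl a Salesforce source weekly.
source = {
    "type": "salesforce",
    "credential_id": "my-credential-id",  # placeholder; create via the credentials method
    "schedule": {
        "enabled": True,                  # re-crawl based on "frequency"
        "time_zone": "America/New_York",  # IANA time zone name
        "frequency": "weekly",            # daily | weekly | monthly | five_minutes | hourly
    },
}
```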
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.- objects
The name of the Salesforce document object to crawl. For example,
case
.The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.- site_collections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to
web_crawl
.- urls
The starting URL to crawl.
When
true
, crawls of the specified URL are limited to the host part of the url field.Default:
true
The number of concurrent URLs to fetch.
gentle
means one URL is fetched at a time with a delay between each call.normal
means as many as two URLs are fetched concurrently with a short delay between fetch calls.aggressive
means that up to ten URLs are fetched concurrently with a short delay between fetch calls.Allowable values: [
gentle
,normal
,aggressive
]Default:
normal
When
true
, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.Default:
false
The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
Default:
2
The maximum number of milliseconds to wait for a response from the web server.
Default:
30000
When
true
, the crawler will ignore any robots.txt
encountered by the crawler. This should only ever be done when crawling a website that the user owns. This must be set to true
when a gateway_id is specified in the credentials. Default:
false
Array of URLs to be excluded while crawling. The crawler will not follow links that contain this string. For example, listing
https://ibm.com/watson
also excludeshttps://ibm.com/watson/discovery
.
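The web crawl options above can be sketched as follows. The key names are assumptions based on the parameter descriptions, and the URLs are placeholders.

```python
# Illustrative web_crawl options for a single starting URL.
web_crawl_options = {
    "urls": [{
        "url": "https://example.com/docs",     # starting URL (placeholder)
        "limit_to_starting_hosts": True,       # stay on the starting host
        "crawl_speed": "normal",               # gentle | normal | aggressive
        "allow_untrusted_certificate": False,  # reject untrusted SSL signers
        "maximum_hops": 2,                     # follow links up to 2 hops out
        "request_timeout": 30000,              # milliseconds
        "override_robots_txt": False,          # honor robots.txt
        "blacklist": ["https://example.com/docs/private"],  # prefix exclusions
    }],
}
```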
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to
cloud_object_storage
, and the crawl_all_buckets field is false
or not specified.
- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set totrue
, the buckets array must not be specified.
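For a cloud object storage source, the bucket options above might look like this sketch; the bucket names and limits are placeholders, and the key names are assumptions based on the parameter descriptions.

```python
# Illustrative cloud object storage options: crawl two named buckets.
cos_options = {
    "crawl_all_buckets": False,  # must be false (or omitted) when "buckets" is given
    "buckets": [
        {"name": "contracts-bucket", "limit": 500},  # crawl at most 500 documents
        {"name": "reports-bucket"},                  # no limit: crawl everything
    ],
}
```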
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
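As an illustrative sketch, a PDF heading-detection configuration combining the fields above (heading level, size bounds, bold/italic, font name); the exact key names are assumptions based on the parameter descriptions.

```python
# Illustrative PDF conversion settings: map large bold text to <h1>.
pdf_settings = {
    "heading": {
        "fonts": [
            # 24pt-or-larger bold text becomes an h1 heading.
            {"level": 1, "min_size": 24, "bold": True},
            # 18-23pt text in a named font becomes an h2 heading.
            {"level": 2, "min_size": 18, "max_size": 23, "name": "Helvetica"},
        ]
    }
}
```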
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of Word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keep_content
An array of XPaths.
Object containing an array of XPaths.
- exclude_content
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
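A sketch of the HTML conversion settings described above; the key names are assumptions based on the parameter descriptions, and the XPaths are placeholders.

```python
# Illustrative HTML conversion settings.
html_settings = {
    "exclude_tags_completely": ["script", "style"],  # drop tag and content
    "exclude_tags_keep_content": ["span", "em"],     # drop tag, keep its text
    "keep_content": {"xpaths": ["//main"]},          # convert only this subtree
    "exclude_content": {"xpaths": ["//div[@id='footer']"]},
    "keep_tag_attributes": ["href"],
    "exclude_tag_attributes": ["onclick"],
}
```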
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Default:
false
Defines the heading levels at which the document is split into segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to
false
in the smart_document_understanding object.Default:
["h1","h2"]
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field
sub-title
is specified, when a document is uploaded each time the smart document understanding conversion encounters a field of typesub-title
the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true
in the smart_document_understanding object.
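The two segmentation modes above can be sketched as follows; the field names are assumptions based on the parameter descriptions.

```python
# Illustrative segmentation settings.
# Heading-based splitting (smart document understanding disabled):
segment_by_headings = {"enabled": True, "selector_tags": ["h1", "h2"]}

# Field-based splitting (smart document understanding enabled):
segment_by_fields = {"enabled": True, "annotated_fields": ["sub-title"]}
```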
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- json_normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, if remove_nulls is invoked at all, it is invoked as the last normalization operation because it can be time-expensive.
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
When
true
, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition. Default:
true
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Default:
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if
text
is a top-level field with no sub-fields, text.foo
is a valid destination but text.foo.bar
is not. Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to
natural_language_understanding
.Indicates that the enrichments will overwrite the destination_field field if it already exists.
Default:
false
Name of the enrichment service to call. The only supported option is
natural_language_understanding
. The elements
option is deprecated and support ended on 10 July 2020. The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Default:
false
Options that are specific to a particular enrichment.
The
elements
enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence
project type instead.
- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When
true
, sentiment analysis of keywords will be performed on the specified field.When
true
, emotion detection of keywords will be performed on the specified field.The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When
true
, sentiment analysis of entities will be performed on the specified field.When
true
, emotion detection of entities will be performed on the specified field.The maximum number of entities to extract for each instance of the specified field.
When
true
, the number of mentions of each identified entity is recorded. The default isfalse
.When
true
, the types of mentions for each identified entity are recorded. The default is false
.When
true
, a list of sentence locations for each instance of each identified entity is recorded. The default isfalse
. The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model
alchemy
.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When
true
, sentiment analysis is performed on the entire field.A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When
true
, emotion detection is performed on the entire field.A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semantic_roles
When
true
, entities are extracted from the identified sentence parts.When
true
, keywords are extracted from the identified sentence parts. The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with
natural_language_understanding
enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model is en-news
.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are
ar
(Arabic),en
(English),fr
(French),de
(German),it
(Italian),pt
(Portuguese),ru
(Russian),es
(Spanish), andsv
(Swedish). Note: Not all features support all languages; automatic detection is recommended. Allowable values: [
ar
,en
,fr
,de
,it
,pt
,ru
,es
,sv
]The element extraction model to use, which can be
contract
only. Theelements
enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, if remove_nulls is invoked at all, it is invoked as the last normalization operation because it can be time-expensive.
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box
indicates the configuration is to connect to an instance of Enterprise Box.salesforce
indicates the configuration is to connect to Salesforce.sharepoint
indicates the configuration is to connect to Microsoft SharePoint Online.web_crawl
indicates the configuration is to perform a web page crawl.cloud_object_storage
indicates the configuration is to connect to a cloud object store.
Allowable values: [
box
,salesforce
,sharepoint
,web_crawl
,cloud_object_storage
]The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When
true
, the source is re-crawled based on the frequency field in this object. When false
the source is not re-crawled. When false
and connecting to Salesforce, the source is crawled annually. Default:
true
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
Default:
America/New_York
The crawl schedule in the specified time_zone.
five_minutes
: Runs every five minutes.hourly
: Runs every hour.daily
: Runs every day between 00:00 and 06:00.weekly
: Runs every week on Sunday between 00:00 and 06:00.monthly
: Runs on the first Sunday of every month between 00:00 and 06:00.
Allowable values: [
daily
,weekly
,monthly
,five_minutes
,hourly
]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.- objects
The name of the Salesforce document object to crawl. For example,
case
.The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.- site_collections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to
web_crawl
.- urls
The starting URL to crawl.
When
true
, crawls of the specified URL are limited to the host part of the url field.Default:
true
The number of concurrent URLs to fetch.
gentle
means one URL is fetched at a time with a delay between each call.normal
means as many as two URLs are fetched concurrently with a short delay between fetch calls.aggressive
means that up to ten URLs are fetched concurrently with a short delay between fetch calls.Allowable values: [
gentle
,normal
,aggressive
]Default:
normal
When
true
, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.Default:
false
The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
Default:
2
The maximum number of milliseconds to wait for a response from the web server.
Default:
30000
When
true
, the crawler will ignore any robots.txt
encountered by the crawler. This should only ever be done when crawling a website that the user owns. This must be set to true
when a gateway_id is specified in the credentials. Default:
false
Array of URLs to be excluded while crawling. The crawler will not follow links that contain this string. For example, listing
https://ibm.com/watson
also excludeshttps://ibm.com/watson/discovery
.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to
cloud_object_storage
, and the crawl_all_buckets field is false
or not specified.
- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set totrue
, the buckets array must not be specified.
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- Pdf
Object containing heading detection conversion settings for PDF documents.
- Heading
Array of font matching configurations.
- Fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
A list of Word conversion settings.
- Word
Object containing heading detection conversion settings for Microsoft Word documents.
- Heading
Array of font matching configurations.
- Fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
Array of Microsoft Word styles to convert.
- Styles
HTML heading level that content matching this style is tagged with.
Array of Word style names to convert.
A list of HTML conversion settings.
- Html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- KeepContent
An array of XPaths.
Object containing an array of XPaths.
- ExcludeContent
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- Segment
Enables/disables the Document Segmentation feature.
Default:
false
Defines the heading levels at which the document is split into segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to
false
in the smart_document_understanding object.Default:
["h1","h2"]
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field
sub-title
is specified, when a document is uploaded each time the smart document understanding conversion encounters a field of typesub-title
the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true
in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- JsonNormalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, if remove_nulls is invoked at all, it is invoked as the last normalization operation because it can be time-expensive.
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
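The five operations above can be sketched as a small Python function. This is an illustrative model of the documented behavior, not the service's implementation: it handles only top-level fields, whereas the service also processes nested fields and remove_nulls removes nested null values.

```python
def normalize(doc, operations):
    """Apply a list of {operation, source_field, destination_field} steps in order."""
    for op in operations:
        kind = op["operation"]
        src = op.get("source_field")
        dst = op.get("destination_field")
        if kind == "copy":
            if src in doc:
                doc[dst] = doc[src]      # overwrites any existing destination value
        elif kind == "move":
            if src in doc:
                doc[dst] = doc.pop(src)  # a copy followed by a remove
        elif kind == "merge":
            # the destination always becomes an array, even if the source is absent
            if dst not in doc:
                doc[dst] = []
            elif not isinstance(doc[dst], list):
                doc[dst] = [doc[dst]]
            if src in doc:
                doc[dst].append(doc.pop(src))
        elif kind == "remove":
            doc.pop(src, None)           # destination_field is ignored
        elif kind == "remove_nulls":
            # simplified: the service removes nested nulls across the whole document
            doc = {k: v for k, v in doc.items() if v is not None}
    return doc

doc = {"title": "IBM News", "author": None, "tags": "press"}
ops = [
    {"operation": "merge", "source_field": "tags", "destination_field": "topics"},
    {"operation": "remove_nulls"},
]
print(normalize(doc, ops))  # {'title': 'IBM News', 'topics': ['press']}
```

Note how merge both converts the destination into an array and deletes the source field, which is why it keeps field types consistent across documents.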
When true, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition. Default: true
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Default:
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.
Field to be enriched. Arrays can be specified as the source_field if the enrichment service for this enrichment is set to natural_language_understanding.
Indicates that the enrichments will overwrite the destination_field field if it already exists.
Default:
false
Name of the enrichment service to call. The only supported option is natural_language_understanding. The elements option is deprecated and support ended on 10 July 2020.
The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Default:
false
Options that are specific to a particular enrichment.
The elements enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence project type instead.
- Options
Object containing Natural Language Understanding features to be used.
- Features
An object specifying the Keyword enrichment and related parameters.
- Keywords
When true, sentiment analysis of keywords will be performed on the specified field.
When true, emotion detection of keywords will be performed on the specified field.
The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- Entities
When true, sentiment analysis of entities will be performed on the specified field.
When true, emotion detection of entities will be performed on the specified field.
The maximum number of entities to extract for each instance of the specified field.
When true, the number of mentions of each identified entity is recorded. The default is false.
When true, the types of mentions for each identified entity are recorded. The default is false.
When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.
The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model alchemy.
An object specifying the sentiment extraction enrichment and related parameters.
- Sentiment
When true, sentiment analysis is performed on the entire field.
A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- Emotion
When true, emotion detection is performed on the entire field.
A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- SemanticRoles
When true, entities are extracted from the identified sentence parts.
When true, keywords are extracted from the identified sentence parts.
The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- Relations
For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model is en-news.
An object specifying the concepts enrichment and related parameters.
- Concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.
Allowable values: [ar, en, fr, de, it, pt, ru, es, sv]
The element extraction model to use, which can be contract only. The elements enrichment is deprecated.
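Put together, an entry in the enrichments array might look like the following Python dictionary. The nested feature objects mirror the descriptions above; the property names for the overwrite and error-handling flags (overwrite, ignore_downstream_errors) are assumptions, and all values are illustrative only.

```python
# Hypothetical entry for the enrichments array, assembled from the field
# descriptions above. The "overwrite" and "ignore_downstream_errors" property
# names are assumptions; all values are illustrative.
enrichment = {
    "enrichment": "natural_language_understanding",
    "source_field": "text",
    "destination_field": "enriched_text",  # at most 1 level deeper than an existing field
    "overwrite": False,                    # assumed property name
    "ignore_downstream_errors": False,     # assumed property name
    "options": {
        "features": {
            "keywords": {"sentiment": True, "emotion": False, "limit": 50},
            "entities": {"sentiment": True, "limit": 50, "mentions": True},
            "categories": {},              # empty object enables the enrichment
        },
        "language": "en",  # overrides automatic language detection
    },
}
```

An object of this shape could be placed in the enrichments array of a config.json file such as the one uploaded in the request examples below.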
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box indicates the configuration is to connect to an instance of Enterprise Box.
salesforce indicates the configuration is to connect to Salesforce.
sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
web_crawl indicates the configuration is to perform a web page crawl.
cloud_object_storage indicates the configuration is to connect to a cloud object store.
Allowable values: [box, salesforce, sharepoint, web_crawl, cloud_object_storage]
The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- Schedule
When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; when false and connecting to Salesforce, the source is crawled annually. Default: true
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
Default:
America/New_York
The crawl schedule in the specified time_zone.
five_minutes: Runs every five minutes.
hourly: Runs every hour.
daily: Runs every day between 00:00 and 06:00.
weekly: Runs every week on Sunday between 00:00 and 06:00.
monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Allowable values: [daily, weekly, monthly, five_minutes, hourly]
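A minimal schedule object using the fields described above (values are illustrative; the same shape appears in the response example later in this section):

```python
# Illustrative schedule object; weekly crawls run on Sunday between
# 00:00 and 06:00 in the specified IANA time zone.
schedule = {
    "enabled": True,
    "time_zone": "America/New_York",
    "frequency": "weekly",
}
```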
The options object defines which items to crawl from the source system.
- Options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.- Folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.- Objects
The name of the Salesforce document object to crawl. For example, case.
The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.- SiteCollections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to
web_crawl
.- Urls
The starting URL to crawl.
When true, crawls of the specified URL are limited to the host part of the url field. Default: true
The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means as many as two URLs are fetched concurrently with a short delay between fetch calls. aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls.
Allowable values: [gentle, normal, aggressive]
Default: normal
When true, allows the crawl to interact with HTTPS sites with SSL certificates signed by untrusted signers. Default: false
The maximum number of hops to make from the initial URL. When a page is crawled each link on that page will also be crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
Default:
2
The maximum milliseconds to wait for a response from the web server.
Default:
30000
When true, the crawler ignores any robots.txt encountered. This should only ever be done when crawling a web site the user owns. This must be set to true when a gateway_id is specified in the credentials. Default: false
Array of URLs to be excluded while crawling. The crawler will not follow links that contain any of these strings. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
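The web_crawl fields above might be combined into an options object like the following. The JSON property names here (limit_to_starting_hosts, crawl_speed, and so on) are assumptions inferred from the field descriptions, and the URLs are placeholders.

```python
# Illustrative web_crawl source options. Property names are assumptions
# based on the field descriptions above; the URL values are placeholders.
web_crawl_options = {
    "urls": [
        {
            "url": "https://example.com",          # starting URL to crawl
            "limit_to_starting_hosts": True,       # stay on this host
            "crawl_speed": "normal",               # gentle | normal | aggressive
            "allow_untrusted_certificate": False,
            "maximum_hops": 2,                     # links-from-start depth limit
            "request_timeout": 30000,              # milliseconds
            "override_robots_txt": False,
            "blacklist": ["https://example.com/private"],
        }
    ]
}
```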
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to cloud_object_storage, and the crawl_all_buckets field is false or not specified.
- Buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When true, all buckets in the specified cloud object store are crawled. If set to true, the buckets array must not be specified.
curl -X POST -u "apikey":"{apikey}" -H "Content-Type: application/json" -d @config.json "{url}/v1/environments/{environment_id}/configurations?version=2019-04-30"
IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

DiscoveryService discovery = new DiscoveryService("2019-04-30", authenticator);
discovery.SetServiceUrl("{url}");

var result = discovery.CreateConfiguration(
    environmentId: "{environmentId}",
    name: "doc-config"
    );

Console.WriteLine(result.Response);
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
Discovery discovery = new Discovery("2019-04-30", authenticator);
discovery.setServiceUrl("{url}");

String environmentId = "{environment_id}";
String configurationName = "doc-config";

CreateConfigurationOptions.Builder createBuilder = new CreateConfigurationOptions.Builder();
Configuration configuration = GsonSingleton.getGson().fromJson(
    new FileReader("./config.json"),
    com.ibm.watson.internal.discovery.model.configuration.Configuration.class);
configuration.setName(configurationName);
createBuilder.configuration(configuration);
createBuilder.environmentId(environmentId);
Configuration createResponse = discovery.createConfiguration(createBuilder.build()).execute().getResult();
const DiscoveryV1 = require('ibm-watson/discovery/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV1({
  version: '2019-04-30',
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const createConfigurationParams = {
  environmentId: '{environment_id}',
  name: 'node-examples-test',
};

discovery.createConfiguration(createConfigurationParams)
  .then(configuration => {
    console.log(JSON.stringify(configuration, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import os
import json
from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV1(
    version='2019-04-30',
    authenticator=authenticator
)
discovery.set_service_url('{url}')

with open(os.path.join(os.getcwd(), 'config.json')) as config_data:
    data = json.load(config_data)

new_config = discovery.create_configuration(
    '{environment_id}',
    data['name'],
    description=data['description'],
    conversions=data['conversions'],
    enrichments=data['enrichments'],
    normalizations=data['normalizations']).get_result()

print(json.dumps(new_config, indent=2))
Response
A custom configuration for the environment.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The unique identifier of the configuration.
The creation date of the configuration, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
An array of document enrichment settings for the configuration.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
Object containing source parameters for the configuration.
A custom configuration for the environment.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "salesforce",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keepContent
An array of XPaths.
Object containing an array of XPaths.
- excludeContent
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to false in the smart_document_understanding object.
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field sub-title is specified, when a document is uploaded each time the smart document understanding conversion encounters a field of type sub-title the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- jsonNormalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
When true, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.
Field to be enriched. Arrays can be specified as the source_field if the enrichment service for this enrichment is set to natural_language_understanding.
Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is natural_language_understanding. The elements option is deprecated and support ended on 10 July 2020.
The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The elements enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence project type instead.
- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When true, sentiment analysis of keywords will be performed on the specified field.
When true, emotion detection of keywords will be performed on the specified field.
The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When true, sentiment analysis of entities will be performed on the specified field.
When true, emotion detection of entities will be performed on the specified field.
The maximum number of entities to extract for each instance of the specified field.
When true, the number of mentions of each identified entity is recorded. The default is false.
When true, the types of mentions for each identified entity are recorded. The default is false.
When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.
The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model alchemy.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When true, sentiment analysis is performed on the entire field.
A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When true, emotion detection is performed on the entire field.
A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semanticRoles
When true, entities are extracted from the identified sentence parts.
When true, keywords are extracted from the identified sentence parts.
The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model is en-news.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.
Possible values: [ar, en, fr, de, it, pt, ru, es, sv]
The element extraction model to use, which can be contract only. The elements enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box indicates the configuration is to connect to an instance of Enterprise Box.
salesforce indicates the configuration is to connect to Salesforce.
sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
web_crawl indicates the configuration is to perform a web page crawl.
cloud_object_storage indicates the configuration is to connect to a cloud object store.
Possible values: [box, salesforce, sharepoint, web_crawl, cloud_object_storage]
The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; when false and connecting to Salesforce, the source is crawled annually.
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes: Runs every five minutes.
hourly: Runs every hour.
daily: Runs every day between 00:00 and 06:00.
weekly: Runs every week on Sunday between 00:00 and 06:00.
monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [daily, weekly, monthly, five_minutes, hourly]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.- objects
The name of the Salesforce document object to crawl. For example, case.
The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.- siteCollections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to
web_crawl
.- urls
The starting URL to crawl.
When
true
, crawls of the specified URL are limited to the host part of the url field.The number of concurrent URLs to fetch.
gentle
means one URL is fetched at a time with a delay between each call.normal
means as many as two URLs are fectched concurrently with a short delay between fetch calls.aggressive
means that up to ten URLs are fetched concurrently with a short delay between fetch calls.Possible values: [
gentle
,normal
,aggressive
]When
true
, allows the crawl to interact with HTTPS sites whose SSL certificates have untrusted signers.The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum number of milliseconds to wait for a response from the web server.
When
true
, the crawler will ignore anyrobots.txt
encountered by the crawler. This should only ever be done when crawling a web site that the user owns. This must be set totrue
when a gateway_id is specified in the credentials.Array of URLs to exclude while crawling. The crawler does not follow links that contain this string. For example, listing
https://ibm.com/watson
also excludeshttps://ibm.com/watson/discovery
.
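The hop-counting rule described above can be sketched as a breadth-first traversal. The link graph, URLs, and `crawl_urls` helper below are hypothetical illustrations of the rule, not part of the service or SDK:

```python
from collections import deque

def crawl_urls(links, start, maximum_hops):
    """Return the set of URLs a crawler would visit from `start`,
    following the documented rule: the starting page is 0 hops,
    its links are 1 hop, and so on up to `maximum_hops`."""
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        url, hops = queue.popleft()
        if hops == maximum_hops:
            continue  # links on this page would exceed the hop limit
        for linked in links.get(url, []):
            if linked not in visited:
                visited.add(linked)
                queue.append((linked, hops + 1))
    return visited

# Hypothetical link graph: a -> b -> c -> d
links = {"a": ["b"], "b": ["c"], "c": ["d"]}
print(sorted(crawl_urls(links, "a", maximum_hops=2)))  # ['a', 'b', 'c']
```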
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to
cloud_object_store
, and the crawl_all_buckets field isfalse
or not specified.- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set totrue
, the buckets array must not be specified.
A custom configuration for the environment.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "salesforce",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keep_content
An array of XPaths.
Object containing an array of XPaths.
- exclude_content
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to
false
in the smart_document_understanding object.Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field
sub-title
is specified, when a document is uploaded each time the smart document understanding conversion encounters a field of typesub-title
the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set totrue
in the smart_document_understanding object.
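The splitting behavior described above can be illustrated with a minimal sketch. The `segment_html` helper below is hypothetical and uses a naive regex on one heading level; it is not the service's conversion pipeline:

```python
import re

def segment_html(html, level="h2"):
    """Split an HTML string at the given heading level; each segment's
    title is the content of the heading it starts at (a simplified
    illustration of Document Segmentation, not the service code)."""
    pattern = re.compile(r"<{0}>(.*?)</{0}>".format(level), re.DOTALL)
    matches = list(pattern.finditer(html))
    segments = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        segments.append({"title": m.group(1), "text": html[m.end():end].strip()})
    return segments

doc = "<h2>Intro</h2><p>one</p><h2>Details</h2><p>two</p>"
print([s["title"] for s in segment_html(doc)])  # ['Intro', 'Details']
```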
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- json_normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
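The five operations can be sketched in a few lines. The helper below is a hypothetical illustration that handles only flat, top-level fields, whereas the service also operates on nested fields:

```python
def apply_normalization(doc, operation, source_field=None, destination_field=None):
    """Apply one normalization operation to a flat document dict,
    following the documented semantics (simplified sketch only)."""
    if operation == "copy":
        doc[destination_field] = doc[source_field]
    elif operation == "move":
        # copy followed by remove: overwrites destination_field if present
        doc[destination_field] = doc.pop(source_field)
    elif operation == "merge":
        merged = doc.get(destination_field)
        if not isinstance(merged, list):
            merged = [merged] if merged is not None else []
        if source_field in doc:
            merged.append(doc.pop(source_field))
        doc[destination_field] = merged  # always an array afterwards
    elif operation == "remove":
        doc.pop(source_field, None)
    elif operation == "remove_nulls":
        for key in [k for k, v in doc.items() if v is None]:
            del doc[key]
    return doc

doc = {"title": "IBM News", "author": None, "tag": "press"}
apply_normalization(doc, "merge", "tag", "tags")
apply_normalization(doc, "remove_nulls")
print(doc)  # {'title': 'IBM News', 'tags': ['press']}
```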
When
true
, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if
text
is a top-level field with no sub-fields,text.foo
is a valid destination buttext.foo.bar
is not.Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to
natural_language_understanding
.Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is
natural_language_understanding
. Theelements
option is deprecated and support ended on 10 July 2020.The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The
elements
enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create acontent_intelligence
project type instead.- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When
true
, sentiment analysis of keywords will be performed on the specified field.When
true
, emotion detection of keywords will be performed on the specified field.The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When
true
, sentiment analysis of entities will be performed on the specified field.When
true
, emotion detection of entities will be performed on the specified field.The maximum number of entities to extract for each instance of the specified field.
When
true
, the number of mentions of each identified entity is recorded. The default isfalse
.When
true
, the types of mentions for each identified entity are recorded. The default isfalse
.When
true
, a list of sentence locations for each instance of each identified entity is recorded. The default isfalse
.The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model
alchemy
.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When
true
, sentiment analysis is performed on the entire field.A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When
true
, emotion detection is performed on the entire field.A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semantic_roles
When
true
, entities are extracted from the identified sentence parts.When
true
, keywords are extracted from the identified sentence parts.The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with
natural_language_understanding
enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model isen-news
.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are
ar
(Arabic),en
(English),fr
(French),de
(German),it
(Italian),pt
(Portuguese),ru
(Russian),es
(Spanish), andsv
(Swedish). Note: Not all features support all languages; automatic detection is recommended.Possible values: [
ar
,en
,fr
,de
,it
,pt
,ru
,es
,sv
]The element extraction model to use, which can be
contract
only. Theelements
enrichment is deprecated.
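As a sketch of the shape these fields take together, the dict below assembles a hypothetical entry for the enrichments array. Field names are those documented above; the `text` source field and the chosen feature values are assumptions for illustration:

```python
import json

# Illustrative only: one entry for the `enrichments` array, enabling the
# NLU keywords, entities, and concepts features on a `text` source field.
enrichment = {
    "enrichment": "natural_language_understanding",
    "source_field": "text",
    "destination_field": "enriched_text",
    "options": {
        "features": {
            "keywords": {"sentiment": True, "emotion": False, "limit": 50},
            "entities": {"sentiment": True, "mentions": True, "limit": 50},
            "concepts": {"limit": 8},
        },
        "language": "en",
    },
}

# The object serializes to the JSON shape shown in the example above.
print(json.dumps(enrichment)[:40])
```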
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box
indicates the configuration is to connect to an instance of Enterprise Box.salesforce
indicates the configuration is to connect to Salesforce.sharepoint
indicates the configuration is to connect to Microsoft SharePoint Online.web_crawl
indicates the configuration is to perform a web page crawl.cloud_object_storage
indicates the configuration is to connect to a cloud object store.
Possible values: [
box
,salesforce
,sharepoint
,web_crawl
,cloud_object_storage
]The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When
true
, the source is re-crawled based on the frequency field in this object. Whenfalse
the source is not re-crawled. Whenfalse
and connecting to Salesforce, the source is crawled annually.The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes
: Runs every five minutes.hourly
: Runs every hour.daily
: Runs every day between 00:00 and 06:00.weekly
: Runs every week on Sunday between 00:00 and 06:00.monthly
: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [
daily
,weekly
,monthly
,five_minutes
,hourly
]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.- objects
The name of the Salesforce document object to crawl. For example,
case
.The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.- site_collections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to
web_crawl
.- urls
The starting URL to crawl.
When
true
, crawls of the specified URL are limited to the host part of the url field.The number of concurrent URLs to fetch.
gentle
means one URL is fetched at a time with a delay between each call.normal
means as many as two URLs are fetched concurrently with a short delay between fetch calls.aggressive
means that up to ten URLs are fetched concurrently with a short delay between fetch calls.Possible values: [
gentle
,normal
,aggressive
]When
true
, allows the crawl to interact with HTTPS sites whose SSL certificates have untrusted signers.The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum number of milliseconds to wait for a response from the web server.
When
true
, the crawler will ignore anyrobots.txt
encountered by the crawler. This should only ever be done when crawling a web site that the user owns. This must be set totrue
when a gateway_id is specified in the credentials.Array of URLs to exclude while crawling. The crawler does not follow links that contain this string. For example, listing
https://ibm.com/watson
also excludeshttps://ibm.com/watson/discovery
.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to
cloud_object_store
, and the crawl_all_buckets field isfalse
or not specified.- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set totrue
, the buckets array must not be specified.
A custom configuration for the environment.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "salesforce",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keep_content
An array of XPaths.
Object containing an array of XPaths.
- exclude_content
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to
false
in the smart_document_understanding object.Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field
sub-title
is specified, when a document is uploaded each time the smart document understanding conversion encounters a field of typesub-title
the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set totrue
in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- json_normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
When
true
, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if
text
is a top-level field with no sub-fields,text.foo
is a valid destination buttext.foo.bar
is not.Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to
natural_language_understanding
.Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is
natural_language_understanding
. Theelements
option is deprecated and support ended on 10 July 2020.The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The
elements
enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create acontent_intelligence
project type instead.- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When
true
, sentiment analysis of keywords will be performed on the specified field.When
true
, emotion detection of keywords will be performed on the specified field.The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When
true
, sentiment analysis of entities will be performed on the specified field.When
true
, emotion detection of entities will be performed on the specified field.The maximum number of entities to extract for each instance of the specified field.
When
true
, the number of mentions of each identified entity is recorded. The default isfalse
.When
true
, the types of mentions for each identified entity are recorded. The default isfalse
.When
true
, a list of sentence locations for each instance of each identified entity is recorded. The default isfalse
.The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model
alchemy
.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When
true
, sentiment analysis is performed on the entire field.A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When
true
, emotion detection is performed on the entire field.A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semantic_roles
When
true
, entities are extracted from the identified sentence parts.When
true
, keywords are extracted from the identified sentence parts.The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with
natural_language_understanding
enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model isen-news
.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are
ar
(Arabic),en
(English),fr
(French),de
(German),it
(Italian),pt
(Portuguese),ru
(Russian),es
(Spanish), andsv
(Swedish). Note: Not all features support all languages; automatic detection is recommended.Possible values: [
ar
,en
,fr
,de
,it
,pt
,ru
,es
,sv
]The element extraction model to use, which can be
contract
only. Theelements
enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
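The five operation semantics above can be sketched in a few lines of Python. This is a simplified, hypothetical model that treats documents as flat dicts; the service itself applies these operations to nested JSON fields.

```python
def normalize(doc, operations):
    """Apply normalization operations in order, returning the new document."""
    for op in operations:
        kind = op["operation"]
        src = op.get("source_field")
        dst = op.get("destination_field")
        if kind == "copy" and src in doc:
            doc[dst] = doc[src]
        elif kind == "move" and src in doc:
            doc[dst] = doc.pop(src)
        elif kind == "merge":
            # destination becomes an array even if the source is absent,
            # keeping the field type consistent across documents
            if not isinstance(doc.get(dst), list):
                doc[dst] = [doc[dst]] if dst in doc else []
            if src in doc:
                doc[dst].append(doc.pop(src))
        elif kind == "remove":
            doc.pop(src, None)
        elif kind == "remove_nulls":
            # the real service removes nulls at every nesting level;
            # this sketch only handles the top level
            doc = {k: v for k, v in doc.items() if v is not None}
    return doc

doc = {"title": "IBM News", "author": None, "html": "<p>press release</p>"}
ops = [
    {"operation": "copy", "source_field": "title", "destination_field": "headline"},
    {"operation": "remove", "source_field": "html"},
    {"operation": "remove_nulls"},
]
print(normalize(doc, ops))  # {'title': 'IBM News', 'headline': 'IBM News'}
```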
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box
indicates the configuration is to connect to an instance of Enterprise Box.salesforce
indicates the configuration is to connect to Salesforce.sharepoint
indicates the configuration is to connect to Microsoft SharePoint Online.web_crawl
indicates the configuration is to perform a web page crawl.cloud_object_storage
indicates the configuration is to connect to a cloud object store.
Possible values: [
box
,salesforce
,sharepoint
,web_crawl
,cloud_object_storage
]The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When
true
, the source is re-crawled based on the frequency field in this object. Whenfalse
the source is not re-crawled; whenfalse
and connecting to Salesforce, the source is crawled annually.The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes
: Runs every five minutes.hourly
: Runs every hour.daily
: Runs every day between 00:00 and 06:00.weekly
: Runs every week on Sunday between 00:00 and 06:00.monthly
: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [
daily
,weekly
,monthly
,five_minutes
,hourly
]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.- objects
The name of the Salesforce document object to crawl. For example,
case
.The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.- site_collections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to
web_crawl
.- urls
The starting URL to crawl.
When
true
, crawls of the specified URL are limited to the host part of the url field.The number of concurrent URLs to fetch.
gentle
means one URL is fetched at a time with a delay between each call.normal
means as many as two URLs are fetched concurrently with a short delay between fetch calls.aggressive
means that up to ten URLs are fetched concurrently with a short delay between fetch calls.Possible values: [
gentle
,normal
,aggressive
]When
true
, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.The maximum number of hops to make from the initial URL. When a page is crawled each link on that page will also be crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum milliseconds to wait for a response from the web server.
When
true
, the crawler will ignore anyrobots.txt
encountered by the crawler. This should only ever be done when crawling a web site the user owns. This must be set to true
when a gateway_id is specified in the credentials.Array of URLs to exclude while crawling. The crawler will not follow links that contain this string. For example, listing
https://ibm.com/watson
also excludeshttps://ibm.com/watson/discovery
.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to
cloud_object_store
, and the crawl_all_buckets field isfalse
or not specified.- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set totrue
, the buckets array must not be specified.
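Combining the fields above, a source object for a web crawl might look like the following sketch. The property names are inferred from the field descriptions and SDK models, and the URL and values are illustrative placeholders, not defaults.

```json
{
  "source": {
    "type": "web_crawl",
    "credential_id": "{credential_id}",
    "schedule": {
      "enabled": true,
      "time_zone": "America/New_York",
      "frequency": "daily"
    },
    "options": {
      "urls": [
        {
          "url": "https://example.com",
          "limit_to_starting_hosts": true,
          "crawl_speed": "normal",
          "maximum_hops": 2
        }
      ]
    }
  }
}
```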
A custom configuration for the environment.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "salesforce",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- Conversions
A list of PDF conversion settings.
- Pdf
Object containing heading detection conversion settings for PDF documents.
- Heading
Array of font matching configurations.
- Fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
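As an illustration, the PDF font-matching fields above could combine into a conversions fragment like the following. The font names and sizes are arbitrary examples, and the property names are inferred from the field descriptions rather than quoted from a schema.

```json
{
  "conversions": {
    "pdf": {
      "heading": {
        "fonts": [
          { "level": 1, "min_size": 20, "bold": true, "name": "Helvetica" },
          { "level": 2, "min_size": 14, "max_size": 19 }
        ]
      }
    }
  }
}
```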
A list of Word conversion settings.
- Word
Object containing heading detection conversion settings for Microsoft Word documents.
- Heading
Array of font matching configurations.
- Fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
Array of Microsoft Word styles to convert.
- Styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- Html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- KeepContent
An array of XPaths.
Object containing an array of XPaths.
- ExcludeContent
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- Segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to
false
in the smart_document_understanding object.Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field
sub-title
is specified, when a document is uploaded each time the smart document understanding conversion encounters a field of typesub-title
the document is split at that point and the content of the field used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set totrue
in the smart_document_understanding object.
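Putting this together, a segment object that splits documents at h1 and h2 headings might look like the sketch below. The selector field name is inferred from the SDK's segment settings model and should be treated as an assumption.

```json
{
  "segment": {
    "enabled": true,
    "selector_tags": ["h1", "h2"]
  }
}
```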
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- JsonNormalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
When
true
, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- Enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if
text
is a top-level field with no sub-fields,text.foo
is a valid destination buttext.foo.bar
is not.Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to
natural_language_understanding
.Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is
natural_language_understanding
. Theelements
option is deprecated and support ended on 10 July 2020.The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The
elements
enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence
project type instead.- Options
Object containing Natural Language Understanding features to be used.
- Features
An object specifying the Keyword enrichment and related parameters.
- Keywords
When
true
, sentiment analysis of keywords will be performed on the specified field.When
true
, emotion detection of keywords will be performed on the specified field.The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- Entities
When
true
, sentiment analysis of entities will be performed on the specified field.When
true
, emotion detection of entities will be performed on the specified field.The maximum number of entities to extract for each instance of the specified field.
When
true
, the number of mentions of each identified entity is recorded. The default isfalse
.When
true
, the types of mentions for each identified entity are recorded. The default isfalse
.When
true
, a list of sentence locations for each instance of each identified entity is recorded. The default isfalse
.The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model
alchemy
.
An object specifying the sentiment extraction enrichment and related parameters.
- Sentiment
When
true
, sentiment analysis is performed on the entire field.A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- Emotion
When
true
, emotion detection is performed on the entire field.A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- SemanticRoles
When
true
, entities are extracted from the identified sentence parts.When
true
, keywords are extracted from the identified sentence parts.The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- Relations
For use with
natural_language_understanding
enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model is en-news
.
An object specifying the concepts enrichment and related parameters.
- Concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are
ar
(Arabic),en
(English),fr
(French),de
(German),it
(Italian),pt
(Portuguese),ru
(Russian),es
(Spanish), andsv
(Swedish). Note: Not all features support all languages; automatic detection is recommended.Possible values: [
ar
,en
,fr
,de
,it
,pt
,ru
,es
,sv
]The element extraction model to use, which can be
contract
only. Theelements
enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- Normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- Source
The type of source to connect to.
box
indicates the configuration is to connect to an instance of Enterprise Box.salesforce
indicates the configuration is to connect to Salesforce.sharepoint
indicates the configuration is to connect to Microsoft SharePoint Online.web_crawl
indicates the configuration is to perform a web page crawl.cloud_object_storage
indicates the configuration is to connect to a cloud object store.
Possible values: [
box
,salesforce
,sharepoint
,web_crawl
,cloud_object_storage
]The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- Schedule
When
true
, the source is re-crawled based on the frequency field in this object. Whenfalse
the source is not re-crawled; whenfalse
and connecting to Salesforce, the source is crawled annually.The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes
: Runs every five minutes.hourly
: Runs every hour.daily
: Runs every day between 00:00 and 06:00.weekly
: Runs every week on Sunday between 00:00 and 06:00.monthly
: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [
daily
,weekly
,monthly
,five_minutes
,hourly
]
The options object defines which items to crawl from the source system.
- Options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.- Folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.- Objects
The name of the Salesforce document object to crawl. For example,
case
.The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.- SiteCollections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to
web_crawl
.- Urls
The starting URL to crawl.
When
true
, crawls of the specified URL are limited to the host part of the url field.The number of concurrent URLs to fetch.
gentle
means one URL is fetched at a time with a delay between each call.normal
means as many as two URLs are fetched concurrently with a short delay between fetch calls.aggressive
means that up to ten URLs are fetched concurrently with a short delay between fetch calls.Possible values: [
gentle
,normal
,aggressive
]When
true
, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.The maximum number of hops to make from the initial URL. When a page is crawled each link on that page will also be crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum milliseconds to wait for a response from the web server.
When
true
, the crawler will ignore anyrobots.txt
encountered by the crawler. This should only ever be done when crawling a web site the user owns. This must be be set totrue
when a gateway_id is specied in the credentials.Array of URL's to be excluded while crawling. The crawler will not follow links which contains this string. For example, listing
https://ibm.com/watson
also excludeshttps://ibm.com/watson/discovery
.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to
cloud_object_store
, and the crawl_all_buckets field isfalse
or not specified.- Buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set totrue
, the buckets array must not be specified.
Status Code
Configuration successfully created.
Bad request.
Forbidden. Returned if you attempt to add a configuration to a read-only environment.
{ "configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e", "name": "IBM News", "created": "2015-08-24T18:42:25.324Z", "updated": "2015-08-24T18:42:25.324Z", "description": "A configuration useful for ingesting IBM press releases.", "conversions": { "html": { "exclude_tags_keep_content": [ "span" ], "exclude_content": { "xpaths": [ "/home" ] } }, "segment": { "enabled": true, "annotated_fields": [ "custom-field-1", "custom-field-2" ] }, "json_normalizations": [ { "operation": "move", "source_field": "extracted_metadata.title", "destination_field": "metadata.title" }, { "operation": "move", "source_field": "extracted_metadata.author", "destination_field": "metadata.author" }, { "operation": "remove", "source_field": "extracted_metadata" } ] }, "enrichments": [ { "enrichment": "natural_language_understanding", "source_field": "title", "destination_field": "enriched_title", "options": { "features": { "keywords": { "sentiment": true, "emotion": false, "limit": 50 }, "entities": { "sentiment": true, "emotion": false, "limit": 50, "mentions": true, "mention_types": true, "sentence_locations": true, "model": "WKS-model-id" }, "sentiment": { "document": true, "targets": [ "IBM", "Watson" ] }, "emotion": { "document": true, "targets": [ "IBM", "Watson" ] }, "categories": {}, "concepts": { "limit": 8 }, "semantic_roles": { "entities": true, "keywords": true, "limit": 50 }, "relations": { "model": "WKS-model-id" } } } } ], "normalizations": [ { "operation": "move", "source_field": "metadata.title", "destination_field": "title" }, { "operation": "move", "source_field": "metadata.author", "destination_field": "author" }, { "operation": "remove", "source_field": "html" }, { "operation": "remove_nulls" } ], "source": { "type": "salesforce", "credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b", "schedule": { "enabled": true, "time_zone": "America/New_York", "frequency": "weekly" }, "options": { "site_collections": [ { "site_collection_path": "/sites/TestSiteA", "limit": 10 } 
] } } }
List configurations
Lists existing configurations for the service instance.
Lists existing configurations for the service instance.
Lists existing configurations for the service instance.
Lists existing configurations for the service instance.
Lists existing configurations for the service instance.
GET /v1/environments/{environment_id}/configurations
ServiceCall<ListConfigurationsResponse> listConfigurations(ListConfigurationsOptions listConfigurationsOptions)
listConfigurations(params)
list_configurations(
self,
environment_id: str,
*,
name: str = None,
**kwargs,
) -> DetailedResponse
ListConfigurations(string environmentId, string name = null)
Request
Use the ListConfigurationsOptions.Builder
to create a ListConfigurationsOptions
object that contains the parameter values for the listConfigurations
method.
Path Parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2019-04-30
.Find configurations with the given name.
The listConfigurations options.
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Find configurations with the given name.
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Find configurations with the given name.
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Find configurations with the given name.
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Find configurations with the given name.
curl -u "apikey":"{apikey}" "{url}/v1/environments/{environment_id}/configurations?version=2019-04-30"
IamAuthenticator authenticator = new IamAuthenticator( apikey: "{apikey}" ); DiscoveryService discovery = new DiscoveryService("2019-04-30", authenticator); discovery.SetServiceUrl("{url}"); var result = discovery.ListConfigurations( environmentId: "{environmentId}" ); Console.WriteLine(result.Response);
IamAuthenticator authenticator = new IamAuthenticator("{apikey}"); Discovery discovery = new Discovery("2019-04-30", authenticator); discovery.setServiceUrl("{url}"); String environmentId = "{environment_id}"; ListConfigurationsOptions listOptions = new ListConfigurationsOptions.Builder(environmentId).build(); ListConfigurationsResponse listResponse = discovery.listConfigurations(listOptions).execute().getResult();
const DiscoveryV1 = require('ibm-watson/discovery/v1'); const { IamAuthenticator } = require('ibm-watson/auth'); const discovery = new DiscoveryV1({ version: '2019-04-30', authenticator: new IamAuthenticator({ apikey: '{apikey}', }), serviceUrl: '{url}', }); const listConfigurationsParams = { environmentId: '{environment_id}', }; discovery.listConfigurations(listConfigurationsParams) .then(listConfigurationsResponse => { console.log(JSON.stringify(listConfigurationsResponse, null, 2)); }) .catch(err => { console.log('error:', err); });
import json from ibm_watson import DiscoveryV1 from ibm_cloud_sdk_core.authenticators import IAMAuthenticator authenticator = IAMAuthenticator('{apikey}') discovery = DiscoveryV1( version='2019-04-30', authenticator=authenticator ) discovery.set_service_url('{url}') configs = discovery.list_configurations('{environment_id}').get_result() print(json.dumps(configs, indent=2))
Response
Object containing an array of available configurations.
An array of configurations that are available for the service instance.
Examples:{ "configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e", "name": "IBM News", "created": "2015-08-24T18:42:25.324Z", "updated": "2015-08-24T18:42:25.324Z", "description": "A configuration useful for ingesting IBM press releases.", "conversions": { "html": { "exclude_tags_keep_content": [ "span" ], "exclude_content": { "xpaths": [ "/home" ] } }, "segment": { "enabled": true, "annotated_fields": [ "custom-field-1", "custom-field-2" ] }, "json_normalizations": [ { "operation": "move", "source_field": "extracted_metadata.title", "destination_field": "metadata.title" }, { "operation": "move", "source_field": "extracted_metadata.author", "destination_field": "metadata.author" }, { "operation": "remove", "source_field": "extracted_metadata" } ] }, "enrichments": [ { "enrichment": "natural_language_understanding", "source_field": "title", "destination_field": "enriched_title", "options": { "features": { "keywords": { "sentiment": true, "emotion": false, "limit": 50 }, "entities": { "sentiment": true, "emotion": false, "limit": 50, "mentions": true, "mention_types": true, "sentence_locations": true, "model": "WKS-model-id" }, "sentiment": { "document": true, "targets": [ "IBM", "Watson" ] }, "emotion": { "document": true, "targets": [ "IBM", "Watson" ] }, "categories": {}, "concepts": { "limit": 8 }, "semantic_roles": { "entities": true, "keywords": true, "limit": 50 }, "relations": { "model": "WKS-model-id" } } } } ], "normalizations": [ { "operation": "move", "source_field": "metadata.title", "destination_field": "title" }, { "operation": "move", "source_field": "metadata.author", "destination_field": "author" }, { "operation": "remove", "source_field": "html" }, { "operation": "remove_nulls" } ], "source": { "type": "salesforce", "credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b", "schedule": { "enabled": true, "time_zone": "America/New_York", "frequency": "weekly" }, "options": { "site_collections": [ { "site_collection_path": "/sites/TestSiteA", 
"limit": 10 } ] } } }
- configurations
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
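Taken together, a conversions fragment that promotes large bold text in PDFs to h1 and maps a Word style to h1 might look like this sketch (the font name, sizes, and style name are illustrative values, not service defaults):

```python
# Illustrative conversions fragment: PDF font matching plus a Word
# style mapping, both converted to h1 headings.
conversions = {
    "pdf": {
        "heading": {
            "fonts": [
                {"level": 1, "min_size": 20, "bold": True, "name": "Arial"}
            ]
        }
    },
    "word": {
        "heading": {
            "styles": [{"level": 1, "names": ["Title"]}]
        }
    },
}
print(conversions["pdf"]["heading"]["fonts"][0]["level"])  # 1
```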
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keepContent
An array of XPaths.
Object containing an array of XPaths.
- excludeContent
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the heading that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to false in the smart_document_understanding object.
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field sub-title is specified, then each time the smart document understanding conversion encounters a field of type sub-title in an uploaded document, the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- jsonNormalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
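The five operations can be pictured with a small pure-Python sketch. It is not the service implementation; for simplicity it treats field names as flat keys, whereas the service also accepts dotted paths:

```python
def normalize(doc, operations):
    """Apply copy/move/merge/remove/remove_nulls in order (flat keys only)."""
    for op in operations:
        kind = op["operation"]
        src, dst = op.get("source_field"), op.get("destination_field")
        if kind == "copy":
            doc[dst] = doc[src]                      # overwrite dst if present
        elif kind == "move":
            doc[dst] = doc.pop(src)                  # copy followed by remove
        elif kind == "merge":
            current = doc.get(dst)
            merged = current if isinstance(current, list) else (
                [] if current is None else [current]
            )
            if src in doc:                           # dst becomes an array even
                merged.append(doc.pop(src))          # if src is absent
            doc[dst] = merged
        elif kind == "remove":
            doc.pop(src, None)                       # destination_field ignored
        elif kind == "remove_nulls":
            doc = {k: v for k, v in doc.items() if v is not None}
    return doc

doc = {"title": "t", "author": None, "body": "b", "tags": "news"}
ops = [
    {"operation": "move", "source_field": "title", "destination_field": "headline"},
    {"operation": "merge", "source_field": "body", "destination_field": "tags"},
    {"operation": "remove_nulls"},
]
print(normalize(doc, ops))  # {'tags': ['news', 'b'], 'headline': 't'}
```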
When true, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.
Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to natural_language_understanding.
Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is natural_language_understanding. The elements option is deprecated and support ended on 10 July 2020.
The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The elements enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence project type instead.
- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When true, sentiment analysis of keywords will be performed on the specified field.
When true, emotion detection of keywords will be performed on the specified field.
The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When true, sentiment analysis of entities will be performed on the specified field.
When true, emotion detection of entities will be performed on the specified field.
The maximum number of entities to extract for each instance of the specified field.
When true, the number of mentions of each identified entity is recorded. The default is false.
When true, the types of mentions for each identified entity are recorded. The default is false.
When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.
The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model alchemy.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When true, sentiment analysis is performed on the entire field.
A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When true, emotion detection is performed on the entire field.
A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semanticRoles
When true, entities are extracted from the identified sentence parts.
When true, keywords are extracted from the identified sentence parts.
The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model en-news.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.
Possible values: [ar, en, fr, de, it, pt, ru, es, sv]
The element extraction model to use, which can be contract only. The elements enrichment is deprecated.
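As a concrete sketch, a single enrichments entry that runs Natural Language Understanding keyword and entity extraction over a text field could look like the following (the source_field and destination_field values are illustrative):

```python
# Illustrative enrichment definition: NLU keyword and entity extraction
# over the "text" field, with results written to "enriched_text".
enrichment = {
    "enrichment": "natural_language_understanding",
    "source_field": "text",
    "destination_field": "enriched_text",
    "options": {
        "features": {
            "keywords": {"sentiment": True, "emotion": False, "limit": 50},
            "entities": {"sentiment": True, "limit": 50, "mentions": True},
        }
    },
}
print(sorted(enrichment["options"]["features"]))  # ['entities', 'keywords']
```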
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box indicates the configuration is to connect to an instance of Enterprise Box.
salesforce indicates the configuration is to connect to Salesforce.
sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
web_crawl indicates the configuration is to perform a web page crawl.
cloud_object_storage indicates the configuration is to connect to a cloud object store.
Possible values: [box, salesforce, sharepoint, web_crawl, cloud_object_storage]
The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; when false and connecting to Salesforce, the source is crawled annually.
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes: Runs every five minutes.
hourly: Runs every hour.
daily: Runs every day between 00:00 and 06:00.
weekly: Runs every week on Sunday between 00:00 and 06:00.
monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [daily, weekly, monthly, five_minutes, hourly]
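For example, a schedule object that re-crawls weekly on New York time might be sketched as:

```python
# Illustrative schedule: re-crawl weekly, with crawl times based on
# the America/New_York time zone.
schedule = {
    "enabled": True,
    "time_zone": "America/New_York",
    "frequency": "weekly",
}
print(schedule["frequency"])  # weekly
```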
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.
- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.
- objects
The name of the Salesforce document object to crawl. For example, case.
The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to sharepoint.
- siteCollections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to web_crawl.
- urls
The starting URL to crawl.
When true, crawls of the specified URL are limited to the host part of the url field.
The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means as many as two URLs are fetched concurrently with a short delay between fetch calls. aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls.
Possible values: [gentle, normal, aggressive]
When true, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.
The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum milliseconds to wait for a response from the web server.
When true, the crawler ignores any robots.txt encountered. This should only ever be done when crawling a web site the user owns. This must be set to true when a gateway_id is specified in the credentials.
Array of URLs to exclude while crawling. The crawler will not follow links that contain any of these strings. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
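Putting the web crawl settings together, a urls entry might be sketched as follows; the exact JSON field names (for example limit_to_starting_hosts, crawl_speed, blacklist) are assumptions based on the descriptions above and should be checked against your SDK version:

```python
# Illustrative web_crawl options entry; field names are assumed, not
# confirmed - verify against the SDK model before use.
web_crawl_url = {
    "url": "https://example.com/docs",
    "limit_to_starting_hosts": True,   # stay on example.com
    "crawl_speed": "normal",           # gentle | normal | aggressive
    "maximum_hops": 2,                 # start page is hop 0
    "blacklist": ["https://example.com/private"],
}
print(web_crawl_url["crawl_speed"])  # normal
```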
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to cloud_object_store, and the crawl_all_buckets field is false or not specified.
- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When true, all buckets in the specified cloud object store are crawled. If set to true, the buckets array must not be specified.
Object containing an array of available configurations.
An array of configurations that are available for the service instance.
Examples:{ "configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e", "name": "IBM News", "created": "2015-08-24T18:42:25.324Z", "updated": "2015-08-24T18:42:25.324Z", "description": "A configuration useful for ingesting IBM press releases.", "conversions": { "html": { "exclude_tags_keep_content": [ "span" ], "exclude_content": { "xpaths": [ "/home" ] } }, "segment": { "enabled": true, "annotated_fields": [ "custom-field-1", "custom-field-2" ] }, "json_normalizations": [ { "operation": "move", "source_field": "extracted_metadata.title", "destination_field": "metadata.title" }, { "operation": "move", "source_field": "extracted_metadata.author", "destination_field": "metadata.author" }, { "operation": "remove", "source_field": "extracted_metadata" } ] }, "enrichments": [ { "enrichment": "natural_language_understanding", "source_field": "title", "destination_field": "enriched_title", "options": { "features": { "keywords": { "sentiment": true, "emotion": false, "limit": 50 }, "entities": { "sentiment": true, "emotion": false, "limit": 50, "mentions": true, "mention_types": true, "sentence_locations": true, "model": "WKS-model-id" }, "sentiment": { "document": true, "targets": [ "IBM", "Watson" ] }, "emotion": { "document": true, "targets": [ "IBM", "Watson" ] }, "categories": {}, "concepts": { "limit": 8 }, "semantic_roles": { "entities": true, "keywords": true, "limit": 50 }, "relations": { "model": "WKS-model-id" } } } } ], "normalizations": [ { "operation": "move", "source_field": "metadata.title", "destination_field": "title" }, { "operation": "move", "source_field": "metadata.author", "destination_field": "author" }, { "operation": "remove", "source_field": "html" }, { "operation": "remove_nulls" } ], "source": { "type": "salesforce", "credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b", "schedule": { "enabled": true, "time_zone": "America/New_York", "frequency": "weekly" }, "options": { "site_collections": [ { "site_collection_path": "/sites/TestSiteA", 
"limit": 10 } ] } } }
- configurations
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keep_content
An array of XPaths.
Object containing an array of XPaths.
- exclude_content
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the heading that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to false in the smart_document_understanding object.
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field sub-title is specified, then each time the smart document understanding conversion encounters a field of type sub-title in an uploaded document, the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- json_normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
When true, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.
Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to natural_language_understanding.
Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is natural_language_understanding. The elements option is deprecated and support ended on 10 July 2020.
The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The elements enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence project type instead.
- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When true, sentiment analysis of keywords will be performed on the specified field.
When true, emotion detection of keywords will be performed on the specified field.
The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When true, sentiment analysis of entities will be performed on the specified field.
When true, emotion detection of entities will be performed on the specified field.
The maximum number of entities to extract for each instance of the specified field.
When true, the number of mentions of each identified entity is recorded. The default is false.
When true, the types of mentions for each identified entity are recorded. The default is false.
When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.
The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model alchemy.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When true, sentiment analysis is performed on the entire field.
A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When true, emotion detection is performed on the entire field.
A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semantic_roles
When true, entities are extracted from the identified sentence parts.
When true, keywords are extracted from the identified sentence parts.
The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model en-news.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.
Possible values: [ar, en, fr, de, it, pt, ru, es, sv]
The element extraction model to use, which can be contract only. The elements enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
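The operation semantics listed above can be sketched in a few lines of Python. This is a minimal illustration of the documented copy/move/merge/remove/remove_nulls behavior on top-level fields only; the function name is hypothetical, and the service itself also handles dotted nested field paths:

```python
def normalize(doc, operations):
    """Apply Discovery-style JSON normalization operations in order.

    Each operation is a dict with "operation" and, where relevant,
    "source_field" and "destination_field". Top-level fields only,
    for brevity.
    """
    for op in operations:
        kind = op["operation"]
        src, dst = op.get("source_field"), op.get("destination_field")
        if kind == "copy":
            # Destination is overwritten if it already exists.
            if src in doc:
                doc[dst] = doc[src]
        elif kind == "move":
            # Identical to copy followed by remove of the source field.
            if src in doc:
                doc[dst] = doc.pop(src)
        elif kind == "merge":
            # Destination always becomes an array, even if source is absent,
            # so the destination type stays consistent across documents.
            current = doc.get(dst)
            doc[dst] = current if isinstance(current, list) else (
                [] if current is None else [current])
            if src in doc:
                doc[dst].append(doc.pop(src))
        elif kind == "remove":
            doc.pop(src, None)
        elif kind == "remove_nulls":
            # Recursively drop null values from the entire document.
            def strip(value):
                if isinstance(value, dict):
                    return {k: strip(v) for k, v in value.items()
                            if v is not None}
                if isinstance(value, list):
                    return [strip(v) for v in value if v is not None]
                return value
            doc = strip(doc)
    return doc
```

For example, moving `a` to `x`, merging `b` into `x`, and then running `remove_nulls` on `{"a": 1, "b": 2, "c": None}` yields `{"x": [1, 2]}`.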
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box
indicates the configuration is to connect to an instance of Enterprise Box.salesforce
indicates the configuration is to connect to Salesforce.sharepoint
indicates the configuration is to connect to Microsoft SharePoint Online.web_crawl
indicates the configuration is to perform a web page crawl.cloud_object_storage
indicates the configuration is to connect to a cloud object store.
Possible values: [
box
,salesforce
,sharepoint
,web_crawl
,cloud_object_storage
]The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When
true
, the source is re-crawled based on the frequency field in this object. When false,
the source is not re-crawled. When false
and connecting to Salesforce, the source is crawled annually. The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes
: Runs every five minutes.hourly
: Runs every hour.daily
: Runs every day between 00:00 and 06:00.weekly
: Runs every week on Sunday between 00:00 and 06:00.monthly
: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [
daily
,weekly
,monthly
,five_minutes
,hourly
]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.- objects
The name of the Salesforce document object to crawl. For example,
case
.The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.- site_collections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to
web_crawl
.- urls
The starting URL to crawl.
When
true
, crawls of the specified URL are limited to the host part of the url field.The number of concurrent URLs to fetch.
gentle
means one URL is fetched at a time with a delay between each call.normal
means as many as two URLs are fetched concurrently with a short delay between fetch calls.aggressive
means that up to ten URLs are fetched concurrently with a short delay between fetch calls.Possible values: [
gentle
,normal
,aggressive
]When
true
, allows the crawl to interact with HTTPS sites whose SSL certificates have untrusted signers. The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum milliseconds to wait for a response from the web server.
When
true
, the crawler will ignore anyrobots.txt
encountered by the crawler. This should only ever be done when crawling a website that the user owns. This must be set to true
when a gateway_id is specified in the credentials. Array of URLs to exclude while crawling. The crawler does not follow links that contain this string. For example, listing
https://ibm.com/watson
also excludeshttps://ibm.com/watson/discovery
.
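Taken together, the hop limit and the substring-based exclusion list above amount to a link filter like the following. This is an illustrative sketch, not the crawler's actual code; the function name and parameters are hypothetical:

```python
def should_follow(link, hops, blacklist, maximum_hops):
    """Decide whether a crawler following the rules above visits a link.

    A link is skipped if it contains any blacklisted string (so listing
    "https://ibm.com/watson" also excludes "https://ibm.com/watson/discovery"),
    or if it lies more than maximum_hops from the initial URL (the seed
    page is 0 hops, its links are 1 hop, and so on).
    """
    if any(excluded in link for excluded in blacklist):
        return False
    return hops <= maximum_hops
```

Note that the exclusion is a plain containment test, which is why every URL underneath an excluded path is skipped as well.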
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to
cloud_object_store
, and the crawl_all_buckets field isfalse
or not specified.- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set totrue
, the buckets array must not be specified.
Object containing an array of available configurations.
An array of configurations that are available for the service instance.
Examples:{ "configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e", "name": "IBM News", "created": "2015-08-24T18:42:25.324Z", "updated": "2015-08-24T18:42:25.324Z", "description": "A configuration useful for ingesting IBM press releases.", "conversions": { "html": { "exclude_tags_keep_content": [ "span" ], "exclude_content": { "xpaths": [ "/home" ] } }, "segment": { "enabled": true, "annotated_fields": [ "custom-field-1", "custom-field-2" ] }, "json_normalizations": [ { "operation": "move", "source_field": "extracted_metadata.title", "destination_field": "metadata.title" }, { "operation": "move", "source_field": "extracted_metadata.author", "destination_field": "metadata.author" }, { "operation": "remove", "source_field": "extracted_metadata" } ] }, "enrichments": [ { "enrichment": "natural_language_understanding", "source_field": "title", "destination_field": "enriched_title", "options": { "features": { "keywords": { "sentiment": true, "emotion": false, "limit": 50 }, "entities": { "sentiment": true, "emotion": false, "limit": 50, "mentions": true, "mention_types": true, "sentence_locations": true, "model": "WKS-model-id" }, "sentiment": { "document": true, "targets": [ "IBM", "Watson" ] }, "emotion": { "document": true, "targets": [ "IBM", "Watson" ] }, "categories": {}, "concepts": { "limit": 8 }, "semantic_roles": { "entities": true, "keywords": true, "limit": 50 }, "relations": { "model": "WKS-model-id" } } } } ], "normalizations": [ { "operation": "move", "source_field": "metadata.title", "destination_field": "title" }, { "operation": "move", "source_field": "metadata.author", "destination_field": "author" }, { "operation": "remove", "source_field": "html" }, { "operation": "remove_nulls" } ], "source": { "type": "salesforce", "credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b", "schedule": { "enabled": true, "time_zone": "America/New_York", "frequency": "weekly" }, "options": { "site_collections": [ { "site_collection_path": "/sites/TestSiteA", 
"limit": 10 } ] } } }
- configurations
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
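A fonts entry like the ones above can be read as a matching rule that maps extracted font properties to an HTML heading level. The following sketch uses the schema's field names, but the treatment of absent rule fields and of false flags as "don't care" is an assumption, not documented behavior:

```python
def match_heading(font_rule, font):
    """Return the HTML heading tag for a font, or None if the rule fails.

    font_rule uses the schema's fields (level, min_size, max_size, bold,
    italic, name); font is a dict of properties extracted from the
    document. Missing rule fields are treated as unconstrained.
    """
    if "min_size" in font_rule and font["size"] < font_rule["min_size"]:
        return None
    if "max_size" in font_rule and font["size"] > font_rule["max_size"]:
        return None
    for flag in ("bold", "italic"):
        # When the rule sets the flag to true, the font must have it.
        if font_rule.get(flag) and not font.get(flag, False):
            return None
    if "name" in font_rule and font_rule["name"] != font.get("name"):
        return None
    return "h{}".format(font_rule["level"])  # heading tag to emit
```

For instance, a rule `{"level": 2, "min_size": 14, "max_size": 20, "bold": true}` converts 16-point bold text to an h2 heading and leaves 12-point text unmatched.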
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keep_content
An array of XPaths.
Object containing an array of XPaths.
- exclude_content
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to
false
in the smart_document_understanding object.Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field
sub-title
is specified, when a document is uploaded each time the smart document understanding conversion encounters a field of typesub-title
the document is split at that point and the content of the field used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set totrue
in the smart_document_understanding object.
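The annotated-field splitting described above behaves roughly as follows. This is a simplified sketch over an already-annotated sequence of (field type, content) pairs; the real conversion operates on full uploaded documents, and the function name is illustrative:

```python
def segment_on_fields(annotated, split_fields):
    """Split annotated content into segments at the listed field types.

    Whenever a field whose type is in split_fields is encountered, a new
    segment starts and that field's content becomes the segment's title,
    mirroring the documented annotated_fields behavior.
    """
    segments = []
    current = {"title": None, "body": []}
    for field_type, content in annotated:
        if field_type in split_fields:
            # Close the previous segment before starting a new one.
            if current["body"] or current["title"] is not None:
                segments.append(current)
            current = {"title": content, "body": []}
        else:
            current["body"].append(content)
    segments.append(current)
    return segments
```

Splitting on a hypothetical `sub-title` field therefore produces one segment per sub-title occurrence, each titled with that field's content.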
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- json_normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
When
true
, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if
text
is a top-level field with no sub-fields,text.foo
is a valid destination but text.foo.bar
is not.Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to
natural_language_understanding
.Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is
natural_language_understanding
. Theelements
option is deprecated and support ended on 10 July 2020.The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The
elements
enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create acontent_intelligence
project type instead.- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When
true
, sentiment analysis of keywords will be performed on the specified field.When
true
, emotion detection of keywords will be performed on the specified field.The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When
true
, sentiment analysis of entities will be performed on the specified field.When
true
, emotion detection of entities will be performed on the specified field.The maximum number of entities to extract for each instance of the specified field.
When
true
, the number of mentions of each identified entity is recorded. The default isfalse
.When
true
, the types of mentions for each identified entity are recorded. The default is false
.When
true
, a list of sentence locations for each instance of each identified entity is recorded. The default isfalse
. The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model
alchemy
.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When
true
, sentiment analysis is performed on the entire field.A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When
true
, emotion detection is performed on the entire field.A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semantic_roles
When
true
, entities are extracted from the identified sentence parts.When
true
, keywords are extracted from the identified sentence parts. The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with
natural_language_understanding
enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model is en-news
.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are
ar
(Arabic),en
(English),fr
(French),de
(German),it
(Italian),pt
(Portuguese),ru
(Russian),es
(Spanish), andsv
(Swedish). Note: Not all features support all languages; automatic detection is recommended. Possible values: [
ar
,en
,fr
,de
,it
,pt
,ru
,es
,sv
]The element extraction model to use, which can be
contract
only. Theelements
enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box
indicates the configuration is to connect to an instance of Enterprise Box.salesforce
indicates the configuration is to connect to Salesforce.sharepoint
indicates the configuration is to connect to Microsoft SharePoint Online.web_crawl
indicates the configuration is to perform a web page crawl.cloud_object_storage
indicates the configuration is to connect to a cloud object store.
Possible values: [
box
,salesforce
,sharepoint
,web_crawl
,cloud_object_storage
]The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When
true
, the source is re-crawled based on the frequency field in this object. When false,
the source is not re-crawled. When false
and connecting to Salesforce, the source is crawled annually. The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes
: Runs every five minutes.hourly
: Runs every hour.daily
: Runs every day between 00:00 and 06:00.weekly
: Runs every week on Sunday between 00:00 and 06:00.monthly
: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [
daily
,weekly
,monthly
,five_minutes
,hourly
]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.- objects
The name of the Salesforce document object to crawl. For example,
case
.The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.- site_collections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to
web_crawl
.- urls
The starting URL to crawl.
When
true
, crawls of the specified URL are limited to the host part of the url field.The number of concurrent URLs to fetch.
gentle
means one URL is fetched at a time with a delay between each call.normal
means as many as two URLs are fetched concurrently with a short delay between fetch calls.aggressive
means that up to ten URLs are fetched concurrently with a short delay between fetch calls.Possible values: [
gentle
,normal
,aggressive
]When
true
, allows the crawl to interact with HTTPS sites whose SSL certificates have untrusted signers. The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum milliseconds to wait for a response from the web server.
When
true
, the crawler will ignore anyrobots.txt
encountered by the crawler. This should only ever be done when crawling a website that the user owns. This must be set to true
when a gateway_id is specified in the credentials. Array of URLs to exclude while crawling. The crawler does not follow links that contain this string. For example, listing
https://ibm.com/watson
also excludeshttps://ibm.com/watson/discovery
.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to
cloud_object_store
, and the crawl_all_buckets field isfalse
or not specified.- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set totrue
, the buckets array must not be specified.
Object containing an array of available configurations.
An array of configurations that are available for the service instance.
Examples:{ "configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e", "name": "IBM News", "created": "2015-08-24T18:42:25.324Z", "updated": "2015-08-24T18:42:25.324Z", "description": "A configuration useful for ingesting IBM press releases.", "conversions": { "html": { "exclude_tags_keep_content": [ "span" ], "exclude_content": { "xpaths": [ "/home" ] } }, "segment": { "enabled": true, "annotated_fields": [ "custom-field-1", "custom-field-2" ] }, "json_normalizations": [ { "operation": "move", "source_field": "extracted_metadata.title", "destination_field": "metadata.title" }, { "operation": "move", "source_field": "extracted_metadata.author", "destination_field": "metadata.author" }, { "operation": "remove", "source_field": "extracted_metadata" } ] }, "enrichments": [ { "enrichment": "natural_language_understanding", "source_field": "title", "destination_field": "enriched_title", "options": { "features": { "keywords": { "sentiment": true, "emotion": false, "limit": 50 }, "entities": { "sentiment": true, "emotion": false, "limit": 50, "mentions": true, "mention_types": true, "sentence_locations": true, "model": "WKS-model-id" }, "sentiment": { "document": true, "targets": [ "IBM", "Watson" ] }, "emotion": { "document": true, "targets": [ "IBM", "Watson" ] }, "categories": {}, "concepts": { "limit": 8 }, "semantic_roles": { "entities": true, "keywords": true, "limit": 50 }, "relations": { "model": "WKS-model-id" } } } } ], "normalizations": [ { "operation": "move", "source_field": "metadata.title", "destination_field": "title" }, { "operation": "move", "source_field": "metadata.author", "destination_field": "author" }, { "operation": "remove", "source_field": "html" }, { "operation": "remove_nulls" } ], "source": { "type": "salesforce", "credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b", "schedule": { "enabled": true, "time_zone": "America/New_York", "frequency": "weekly" }, "options": { "site_collections": [ { "site_collection_path": "/sites/TestSiteA", 
"limit": 10 } ] } } }
- Configurations
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- Conversions
A list of PDF conversion settings.
- Pdf
Object containing heading detection conversion settings for PDF documents.
- Heading
Array of font matching configurations.
- Fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
A list of Word conversion settings.
- Word
Object containing heading detection conversion settings for Microsoft Word documents.
- Heading
Array of font matching configurations.
- Fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
Array of Microsoft Word styles to convert.
- Styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- Html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- KeepContent
An array of XPaths.
Object containing an array of XPaths.
- ExcludeContent
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- Segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to
false
in the smart_document_understanding object.Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field
sub-title
is specified, when a document is uploaded each time the smart document understanding conversion encounters a field of typesub-title
the document is split at that point and the content of the field used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set totrue
in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- JsonNormalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
When
true
, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- Enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if
text
is a top-level field with no sub-fields,text.foo
is a valid destination but text.foo.bar
is not.Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to
natural_language_understanding
.Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is
natural_language_understanding
. Theelements
option is deprecated and support ended on 10 July 2020.The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The elements enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence project type instead.
- Options
Object containing Natural Language Understanding features to be used.
- Features
An object specifying the Keyword enrichment and related parameters.
- Keywords
When true, sentiment analysis of keywords will be performed on the specified field.
When true, emotion detection of keywords will be performed on the specified field.
The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- Entities
When true, sentiment analysis of entities will be performed on the specified field.
When true, emotion detection of entities will be performed on the specified field.
The maximum number of entities to extract for each instance of the specified field.
When true, the number of mentions of each identified entity is recorded. The default is false.
When true, the types of mentions for each identified entity are recorded. The default is false.
When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.
The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model alchemy.
An object specifying the sentiment extraction enrichment and related parameters.
- Sentiment
When true, sentiment analysis is performed on the entire field.
A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- Emotion
When true, emotion detection is performed on the entire field.
A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- SemanticRoles
When true, entities are extracted from the identified sentence parts.
When true, keywords are extracted from the identified sentence parts.
The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- Relations
For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model is en-news.
An object specifying the concepts enrichment and related parameters.
- Concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.
Possible values: [ar, en, fr, de, it, pt, ru, es, sv]
The element extraction model to use, which can be contract only. The elements enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- Normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation, if it is invoked at all, because it can be time-expensive.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
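The operation semantics above can be sketched in Python. This is an illustrative model of the documented behavior, not service code; the service applies these operations server-side during ingestion, and remove_nulls is omitted here for brevity.

```python
# Illustrative model of the documented normalization operations
# (copy, move, merge, remove) applied to a document dict in order.

def normalize(doc, operations):
    for op in operations:
        kind = op["operation"]
        src = op.get("source_field")
        dest = op.get("destination_field")
        if kind == "copy":
            # Overwrites any existing value of destination_field.
            doc[dest] = doc[src]
        elif kind == "move":
            # Same as copy followed by remove of the source field.
            doc[dest] = doc.pop(src)
        elif kind == "merge":
            # destination_field becomes an array even if source_field is
            # absent, so the field type stays consistent across documents.
            if not isinstance(doc.get(dest), list):
                doc[dest] = [doc[dest]] if dest in doc else []
            if src in doc:
                doc[dest].append(doc.pop(src))
        elif kind == "remove":
            doc.pop(src, None)
    return doc

doc = {"extracted_metadata": {"title": "IBM News"}, "html": "<p>...</p>"}
ops = [
    {"operation": "move", "source_field": "extracted_metadata",
     "destination_field": "metadata"},
    {"operation": "remove", "source_field": "html"},
]
normalize(doc, ops)
# doc is now {"metadata": {"title": "IBM News"}}
```

Because operations run in array order, a move into a field that a later operation removes leaves no trace of either field.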
Object containing source parameters for the configuration.
- Source
The type of source to connect to.
box indicates the configuration is to connect to an instance of Enterprise Box.
salesforce indicates the configuration is to connect to Salesforce.
sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
web_crawl indicates the configuration is to perform a web page crawl.
cloud_object_storage indicates the configuration is to connect to a cloud object store.
Possible values: [box, salesforce, sharepoint, web_crawl, cloud_object_storage]
The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- Schedule
When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; when false and connecting to Salesforce, the source is crawled annually.
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes: Runs every five minutes.
hourly: Runs every hour.
daily: Runs every day between 00:00 and 06:00.
weekly: Runs every week on Sunday between 00:00 and 06:00.
monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [daily, weekly, monthly, five_minutes, hourly]
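Assembled into a request body, a source schedule object using the fields above might look like the following fragment (values are illustrative):

```json
"schedule": {
  "enabled": true,
  "time_zone": "America/New_York",
  "frequency": "weekly"
}
```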
The options object defines which items to crawl from the source system.
- Options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.
- Folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.
- Objects
The name of the Salesforce document object to crawl. For example, case.
The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to sharepoint.
- SiteCollections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to web_crawl.
- Urls
The starting URL to crawl.
When true, crawls of the specified URL are limited to the host part of the url field.
The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means as many as two URLs are fetched concurrently with a short delay between fetch calls. aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls.
Possible values: [gentle, normal, aggressive]
When true, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.
The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum milliseconds to wait for a response from the web server.
When true, the crawler ignores any robots.txt it encounters. This should only ever be done when crawling a web site the user owns. This must be set to true when a gateway_id is specified in the credentials.
Array of URLs to exclude while crawling. The crawler does not follow links that contain this string. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
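Put together, a web_crawl options entry covering the fields above might look like the following sketch. The snake_case field names shown (limit_to_starting_hosts, crawl_speed, allow_untrusted_certificate, maximum_hops, request_timeout, override_robots_txt, blacklist) are assumptions based on the v1 request schema conventions; verify them against the schema for your SDK.

```json
"options": {
  "urls": [
    {
      "url": "https://example.com/docs",
      "limit_to_starting_hosts": true,
      "crawl_speed": "normal",
      "allow_untrusted_certificate": false,
      "maximum_hops": 2,
      "request_timeout": 30000,
      "override_robots_txt": false,
      "blacklist": ["https://example.com/docs/archive"]
    }
  ]
}
```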
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to cloud_object_store, and the crawl_all_buckets field is false or not specified.
- Buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When true, all buckets in the specified cloud object store are crawled. If set to true, the buckets array must not be specified.
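A complete cloud object storage source sketch, assuming the snake_case field names name, limit, and crawl_all_buckets from the v1 schema conventions (values are illustrative):

```json
"source": {
  "type": "cloud_object_storage",
  "credential_id": "{credential_id}",
  "options": {
    "buckets": [
      { "name": "my-bucket", "limit": 1000 }
    ],
    "crawl_all_buckets": false
  }
}
```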
Status Code
Successful response.
Bad request.
No Sample Response
Get configuration details
GET /v1/environments/{environment_id}/configurations/{configuration_id}
ServiceCall<Configuration> getConfiguration(GetConfigurationOptions getConfigurationOptions)
getConfiguration(params)
get_configuration(
self,
environment_id: str,
configuration_id: str,
**kwargs,
) -> DetailedResponse
GetConfiguration(string environmentId, string configurationId)
Request
Use the GetConfigurationOptions.Builder to create a GetConfigurationOptions object that contains the parameter values for the getConfiguration method.
Path Parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
The ID of the configuration.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2019-04-30.
The getConfiguration options.
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the configuration.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the configuration.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the configuration.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the configuration.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl -u "apikey":"{apikey}" "{url}/v1/environments/{environment_id}/configurations/{configuration_id}?version=2019-04-30"
IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

DiscoveryService discovery = new DiscoveryService("2019-04-30", authenticator);
discovery.SetServiceUrl("{url}");

var result = discovery.GetConfiguration(
    environmentId: "{environmentId}",
    configurationId: "{configurationId}"
    );

Console.WriteLine(result.Response);
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
Discovery discovery = new Discovery("2019-04-30", authenticator);
discovery.setServiceUrl("{url}");

String environmentId = "{environment_id}";
String configurationId = "{configuration_id}";

GetConfigurationOptions getOptions =
    new GetConfigurationOptions.Builder(environmentId, configurationId).build();
Configuration getResponse = discovery.getConfiguration(getOptions).execute().getResult();
const DiscoveryV1 = require('ibm-watson/discovery/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV1({
  version: '2019-04-30',
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const getConfigurationParams = {
  environmentId: '{environment_id}',
  configurationId: '{configuration_id}',
};

discovery.getConfiguration(getConfigurationParams)
  .then(configuration => {
    console.log(JSON.stringify(configuration, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV1(
    version='2019-04-30',
    authenticator=authenticator
)
discovery.set_service_url('{url}')

config = discovery.get_configuration(
    '{environment_id}',
    '{configuration_id}').get_result()
print(json.dumps(config, indent=2))
Response
A custom configuration for the environment.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The unique identifier of the configuration.
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
An array of document enrichment settings for the configuration.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
Object containing source parameters for the configuration.
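The created and updated timestamps use the pattern yyyy-MM-dd'T'HH:mm:ss.SSS'Z'. As a sketch, they can be parsed in Python (3.7+, where strptime's %z directive accepts the literal "Z" suffix) like this:

```python
from datetime import datetime, timezone

# Parse a Discovery timestamp such as the "created" field of a configuration.
created = datetime.strptime("2015-08-24T18:42:25.324Z", "%Y-%m-%dT%H:%M:%S.%f%z")
print(created.astimezone(timezone.utc).isoformat())
# 2015-08-24T18:42:25.324000+00:00
```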
A custom configuration for the environment.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "salesforce",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
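For example, a PDF heading configuration that maps large bold Helvetica text to level-1 headings might look like this sketch (field names level, min_size, max_size, bold, italic, and name follow the v1 schema conventions; values are illustrative):

```json
"pdf": {
  "heading": {
    "fonts": [
      {
        "level": 1,
        "min_size": 18,
        "max_size": 32,
        "bold": true,
        "italic": false,
        "name": "Helvetica"
      }
    ]
  }
}
```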
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keepContent
An array of XPaths.
Object containing an array of XPaths.
- excludeContent
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to false in the smart_document_understanding object.
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field sub-title is specified, each time the smart document understanding conversion encounters a field of type sub-title in an uploaded document, the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- jsonNormalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation, if it is invoked at all, because it can be time-expensive.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
When true, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.
Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to natural_language_understanding.
Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is natural_language_understanding. The elements option is deprecated and support ended on 10 July 2020. The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The elements enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence project type instead.
- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When true, sentiment analysis of keywords will be performed on the specified field.
When true, emotion detection of keywords will be performed on the specified field.
The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When true, sentiment analysis of entities will be performed on the specified field.
When true, emotion detection of entities will be performed on the specified field.
The maximum number of entities to extract for each instance of the specified field.
When true, the number of mentions of each identified entity is recorded. The default is false.
When true, the types of mentions for each identified entity are recorded. The default is false.
When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.
The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model alchemy.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When true, sentiment analysis is performed on the entire field.
A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When true, emotion detection is performed on the entire field.
A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semanticRoles
When true, entities are extracted from the identified sentence parts.
When true, keywords are extracted from the identified sentence parts.
The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model is en-news.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.
Possible values: [ar, en, fr, de, it, pt, ru, es, sv]
The element extraction model to use, which can be contract only. The elements enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation, if it is invoked at all, because it can be time-expensive.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box indicates the configuration is to connect to an instance of Enterprise Box.
salesforce indicates the configuration is to connect to Salesforce.
sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
web_crawl indicates the configuration is to perform a web page crawl.
cloud_object_storage indicates the configuration is to connect to a cloud object store.
Possible values: [box, salesforce, sharepoint, web_crawl, cloud_object_storage]
The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; when false and connecting to Salesforce, the source is crawled annually.
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes: Runs every five minutes.
hourly: Runs every hour.
daily: Runs every day between 00:00 and 06:00.
weekly: Runs every week on Sunday between 00:00 and 06:00.
monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [daily, weekly, monthly, five_minutes, hourly]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.
- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.
- objects
The name of the Salesforce document object to crawl. For example, case.
The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to sharepoint.
- siteCollections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to web_crawl.
- urls
The starting URL to crawl.
When true, crawls of the specified URL are limited to the host part of the url field.
The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means as many as two URLs are fetched concurrently with a short delay between fetch calls. aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls.
Possible values: [gentle, normal, aggressive]
When true, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.
The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum milliseconds to wait for a response from the web server.
When
true
, the crawler will ignore anyrobots.txt
encountered by the crawler. This should only ever be done when crawling a website that the user owns. This must be set totrue
when a gateway_id is specified in the credentials.Array of URLs to be excluded while crawling. The crawler will not follow links that contain any of these strings. For example, listing
https://ibm.com/watson
also excludeshttps://ibm.com/watson/discovery
.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to
cloud_object_store
, and the crawl_all_buckets field isfalse
or not specified.- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set totrue
, the buckets array must not be specified.
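The web crawl fields above can be assembled into a single source object. The sketch below shows one as a plain Python dict; the nested property names (limit_to_starting_hosts, crawl_speed, maximum_hops, request_timeout, override_robots_txt, blacklist) are assumptions inferred from the field descriptions above, so verify them against the request schema before use.

```python
# Illustrative web_crawl source object built from the fields described above.
# Property names inside "urls" are assumptions, not confirmed API identifiers.
web_crawl_source = {
    "type": "web_crawl",
    "credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",  # example ID
    "schedule": {
        "enabled": True,
        "time_zone": "America/New_York",
        "frequency": "daily",            # daily crawls run between 00:00 and 06:00
    },
    "options": {
        "urls": [
            {
                "url": "https://example.com/docs",   # starting URL (hop 0)
                "limit_to_starting_hosts": True,     # stay on example.com
                "crawl_speed": "normal",             # up to two concurrent fetches
                "maximum_hops": 2,                   # links of links, no further
                "request_timeout": 30000,            # milliseconds
                "override_robots_txt": False,        # respect robots.txt
                "blacklist": ["https://example.com/docs/archive"],
            }
        ]
    },
}
```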
A custom configuration for the environment.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "salesforce",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keep_content
An array of XPaths.
Object containing an array of XPaths.
- exclude_content
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to
false
in the smart_document_understanding object.Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field
sub-title
is specified, when a document is uploaded each time the smart document understanding conversion encounters a field of typesub-title
the document is split at that point and the content of the field used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set totrue
in the smart_document_understanding object.
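As a concrete illustration of the conversion settings described above, the dict below sketches a configuration that promotes large bold PDF text to level-1 headings and strips script tags from HTML. The font-setting property names (level, min_size, max_size, bold, italic, name) and the exclude_tags_completely key are assumptions inferred from the field descriptions, so confirm them against the request schema; exclude_tags_keep_content appears in the example JSON above.

```python
# Hypothetical conversions object: font-based PDF heading detection plus
# HTML tag filtering. Property names are assumptions, not confirmed API keys.
conversions = {
    "pdf": {
        "heading": {
            "fonts": [
                # Bold 20-40pt text becomes an <h1> heading.
                {"level": 1, "min_size": 20, "max_size": 40,
                 "bold": True, "italic": False, "name": "Helvetica"},
                # Bold 16-19pt text becomes an <h2> heading.
                {"level": 2, "min_size": 16, "max_size": 19, "bold": True},
            ]
        }
    },
    "html": {
        "exclude_tags_completely": ["script", "style"],  # drop tag and content
        "exclude_tags_keep_content": ["span"],           # drop tag, keep text
    },
}
```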
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- json_normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
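The five normalization operations can be modelled on a plain Python dict. The sketch below is a minimal interpreter for the semantics described above; the helper functions are illustrative and not part of any Watson SDK.

```python
# Minimal sketch of the copy/move/merge/remove/remove_nulls semantics.
def _resolve(doc, path, create=False):
    """Walk a dotted field path; return (parent dict, final key)."""
    parts = path.split(".")
    for part in parts[:-1]:
        doc = doc.setdefault(part, {}) if create else doc[part]
    return doc, parts[-1]

def _strip_nulls(node):
    """Recursively delete keys whose value is None."""
    if isinstance(node, dict):
        for key in [k for k, v in node.items() if v is None]:
            del node[key]
        for value in node.values():
            _strip_nulls(value)
    elif isinstance(node, list):
        for value in node:
            _strip_nulls(value)

def normalize(doc, operations):
    """Apply normalization operations in order, mutating and returning doc."""
    for op in operations:
        kind = op["operation"]
        if kind == "remove_nulls":
            _strip_nulls(doc)                      # ignores source/destination
            continue
        src_parent, src_key = _resolve(doc, op["source_field"])
        if kind == "remove":
            src_parent.pop(src_key, None)          # destination is ignored
            continue
        dst_parent, dst_key = _resolve(doc, op["destination_field"], create=True)
        if kind == "copy":
            dst_parent[dst_key] = src_parent[src_key]
        elif kind == "move":                       # copy followed by remove
            dst_parent[dst_key] = src_parent.pop(src_key)
        elif kind == "merge":
            existing = dst_parent.get(dst_key)
            if not isinstance(existing, list):     # always coerce to array
                existing = [] if existing is None else [existing]
            if src_key in src_parent:              # append then drop source
                existing.append(src_parent.pop(src_key))
            dst_parent[dst_key] = existing
    return doc
```

For example, the move/remove_nulls/remove sequence from the JSON example above turns `{"extracted_metadata": {"title": "T", "author": None}}` into `{"metadata": {"title": "T"}}`.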
When
true
, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if
text
is a top-level field with no sub-fields,text.foo
is a valid destination buttext.foo.bar
is not.Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to
natural_language_understanding
.Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is
natural_language_understanding
. Theelements
option is deprecated and support ended on 10 July 2020.The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The
elements
enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create acontent_intelligence
project type instead.- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When
true
, sentiment analysis of keywords will be performed on the specified field.When
true
, emotion detection of keywords will be performed on the specified field.The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When
true
, sentiment analysis of entities will be performed on the specified field.When
true
, emotion detection of entities will be performed on the specified field.The maximum number of entities to extract for each instance of the specified field.
When
true
, the number of mentions of each identified entity is recorded. The default isfalse
.When
true
, the types of mentions for each identified entity are recorded. The default isfalse
.When
true
, a list of sentence locations for each instance of each identified entity is recorded. The default isfalse
.The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model
alchemy
.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When
true
, sentiment analysis is performed on the entire field.A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When
true
, emotion detection is performed on the entire field.A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semantic_roles
When
true
, entities are extracted from the identified sentence parts.When
true
, keywords are extracted from the identified sentence parts.The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with
natural_language_understanding
enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model isen-news
.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are
ar
(Arabic),en
(English),fr
(French),de
(German),it
(Italian),pt
(Portuguese),ru
(Russian),es
(Spanish), andsv
(Swedish). Note: Not all features support all languages; automatic detection is recommended.Possible values: [
ar
,en
,fr
,de
,it
,pt
,ru
,es
,sv
]The element extraction model to use, which can be
contract
only. Theelements
enrichment is deprecated.
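The destination_field nesting rule for enrichments ("must already exist or be at most 1 level deeper than an existing field") can be expressed as a small check. The helper below is hypothetical and for illustration only; the service performs this validation itself.

```python
def valid_destination(doc_fields, destination_field):
    """Return True if destination_field already exists or is at most one
    level deeper than an existing field. doc_fields is a set of dotted
    field paths. Illustrative helper, not part of any Watson SDK."""
    if destination_field in doc_fields:
        return True
    # A new top-level field has no parent and is always allowed;
    # otherwise the parent path must already exist.
    parent = destination_field.rsplit(".", 1)[0] if "." in destination_field else ""
    return parent == "" or parent in doc_fields

fields = {"text", "metadata", "metadata.title"}
assert valid_destination(fields, "text.foo")          # one level below "text"
assert not valid_destination(fields, "text.foo.bar")  # two levels deep
```

This matches the example in the text: with a top-level text field that has no sub-fields, text.foo is a valid destination but text.foo.bar is not.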
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box
indicates the configuration is to connect to an instance of Enterprise Box.salesforce
indicates the configuration is to connect to Salesforce.sharepoint
indicates the configuration is to connect to Microsoft SharePoint Online.web_crawl
indicates the configuration is to perform a web page crawl.cloud_object_storage
indicates the configuration is to connect to a cloud object store.
Possible values: [
box
,salesforce
,sharepoint
,web_crawl
,cloud_object_storage
]The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When
true
, the source is re-crawled based on the frequency field in this object. Whenfalse
the source is not re-crawled. Whenfalse
and connecting to Salesforce, the source is crawled annually.The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes
: Runs every five minutes.hourly
: Runs every hour.daily
: Runs every day between 00:00 and 06:00.weekly
: Runs every week on Sunday between 00:00 and 06:00.monthly
: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [
daily
,weekly
,monthly
,five_minutes
,hourly
]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.- objects
The name of the Salesforce document object to crawl. For example,
case
.The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.- site_collections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to
web_crawl
.- urls
The starting URL to crawl.
When
true
, crawls of the specified URL are limited to the host part of the url field.The number of concurrent URLs to fetch.
gentle
means one URL is fetched at a time with a delay between each call.normal
means as many as two URLs are fetched concurrently with a short delay between fetch calls.aggressive
means that up to ten URLs are fetched concurrently with a short delay between fetch calls.Possible values: [
gentle
,normal
,aggressive
]When
true
, allows the crawl to interact with HTTPS sites that have SSL certificates signed by untrusted signers.The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum number of milliseconds to wait for a response from the web server.
When
true
, the crawler will ignore anyrobots.txt
encountered by the crawler. This should only ever be done when crawling a website that the user owns. This must be set totrue
when a gateway_id is specified in the credentials.Array of URLs to be excluded while crawling. The crawler will not follow links that contain any of these strings. For example, listing
https://ibm.com/watson
also excludeshttps://ibm.com/watson/discovery
.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to
cloud_object_store
, and the crawl_all_buckets field isfalse
or not specified.- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set totrue
, the buckets array must not be specified.
A custom configuration for the environment.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "salesforce",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keep_content
An array of XPaths.
Object containing an array of XPaths.
- exclude_content
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to
false
in the smart_document_understanding object.Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field
sub-title
is specified, when a document is uploaded each time the smart document understanding conversion encounters a field of typesub-title
the document is split at that point and the content of the field used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set totrue
in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- json_normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
When
true
, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if
text
is a top-level field with no sub-fields,text.foo
is a valid destination buttext.foo.bar
is not.Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to
natural_language_understanding
.Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is
natural_language_understanding
. Theelements
option is deprecated and support ended on 10 July 2020.The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The
elements
enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create acontent_intelligence
project type instead.- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When
true
, sentiment analysis of keywords will be performed on the specified field.When
true
, emotion detection of keywords will be performed on the specified field.The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When
true
, sentiment analysis of entities will be performed on the specified field.When
true
, emotion detection of entities will be performed on the specified field.The maximum number of entities to extract for each instance of the specified field.
When
true
, the number of mentions of each identified entity is recorded. The default isfalse
.When
true
, the types of mentions for each identified entity are recorded. The default isfalse
.When
true
, a list of sentence locations for each instance of each identified entity is recorded. The default isfalse
.The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model
alchemy
.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When
true
, sentiment analysis is performed on the entire field.A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When
true
, emotion detection is performed on the entire field.A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semantic_roles
When
true
, entities are extracted from the identified sentence parts.When
true
, keywords are extracted from the identified sentence parts.The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with
natural_language_understanding
enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model isen-news
.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are
ar
(Arabic),en
(English),fr
(French),de
(German),it
(Italian),pt
(Portuguese),ru
(Russian),es
(Spanish), andsv
(Swedish). Note: Not all features support all languages; automatic detection is recommended.Possible values: [
ar
,en
,fr
,de
,it
,pt
,ru
,es
,sv
]The element extraction model to use, which can be
contract
only. Theelements
enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
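The five normalization operations above can be illustrated with a small Python sketch. This is not the service implementation, only a minimal simulation of the documented semantics on a flat document dict (nested fields and the field names used here are hypothetical).

```python
# Illustrative sketch (not the service implementation) of the five
# normalization operations on a flat ingested-document dict.

def normalize(doc, operations):
    """Apply Discovery-style normalization operations in order."""
    for op in operations:
        kind = op["operation"]
        src = op.get("source_field")
        dst = op.get("destination_field")
        if kind == "copy":
            # destination is overwritten if it already exists
            doc[dst] = doc[src]
        elif kind == "move":
            # identical to a copy followed by a remove of the source field
            doc[dst] = doc.pop(src)
        elif kind == "merge":
            # destination becomes an array even if the source is missing,
            # so the destination's type stays consistent across documents
            if not isinstance(doc.get(dst), list):
                doc[dst] = [doc[dst]] if dst in doc else []
            if src in doc:
                doc[dst].append(doc.pop(src))
        elif kind == "remove":
            doc.pop(src, None)
        elif kind == "remove_nulls":
            # operates on the whole document; typically invoked last
            doc = {k: v for k, v in doc.items() if v is not None}
    return doc

doc = {"metadata.title": "IBM News", "html": "<p>...</p>", "author": None}
ops = [
    {"operation": "move", "source_field": "metadata.title", "destination_field": "title"},
    {"operation": "remove", "source_field": "html"},
    {"operation": "remove_nulls"},
]
print(normalize(doc, ops))  # {'title': 'IBM News'}
```

Note that the real remove_nulls also removes nested null values; this sketch handles top-level fields only.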
Object containing source parameters for the configuration.
- source
The type of source to connect to. box indicates the configuration is to connect to an instance of Enterprise Box. salesforce indicates the configuration is to connect to Salesforce. sharepoint indicates the configuration is to connect to Microsoft SharePoint Online. web_crawl indicates the configuration is to perform a web page crawl. cloud_object_storage indicates the configuration is to connect to a cloud object store.
Possible values: [box, salesforce, sharepoint, web_crawl, cloud_object_storage]
The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; when false and connecting to Salesforce, the source is crawled annually.
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes: Runs every five minutes.
hourly: Runs every hour.
daily: Runs every day between 00:00 and 06:00.
weekly: Runs every week on Sunday between 00:00 and 06:00.
monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [daily, weekly, monthly, five_minutes, hourly]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.
- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.
- objects
The name of the Salesforce document object to crawl. For example, case.
The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to sharepoint.
- site_collections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to web_crawl.
- urls
The starting URL to crawl.
When true, crawls of the specified URL are limited to the host part of the url field.
The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means as many as two URLs are fetched concurrently with a short delay between fetch calls. aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls.
Possible values: [gentle, normal, aggressive]
When true, allows the crawl to interact with HTTPS sites with SSL certificates signed by untrusted signers.
The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum number of milliseconds to wait for a response from the web server.
When true, the crawler ignores any robots.txt that it encounters. This should only ever be done when crawling a web site the user owns. This must be set to true when a gateway_id is specified in the credentials.
Array of URLs to be excluded while crawling. The crawler does not follow links that contain any of these strings. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
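The maximum_hops and URL-exclusion rules above can be sketched as a breadth-first traversal over a toy in-memory link graph. This is purely illustrative (the URLs and the crawl function are made up, not the service's crawler), but it shows how hop counting and substring-based exclusion interact.

```python
# Illustrative sketch of the hop and exclusion rules on a toy link graph.
from collections import deque

def crawl(start, links, maximum_hops, blacklist=()):
    """Return URLs reachable within maximum_hops, skipping excluded URLs."""
    visited = {start}            # the starting page is 0 hops
    queue = deque([(start, 0)])
    while queue:
        url, hops = queue.popleft()
        if hops == maximum_hops:
            continue             # links on this page would exceed the limit
        for nxt in links.get(url, []):
            # an exclusion entry blocks every URL containing that string,
            # so "https://ibm.com/watson" also blocks ".../watson/discovery"
            if nxt in visited or any(b in nxt for b in blacklist):
                continue
            visited.add(nxt)
            queue.append((nxt, hops + 1))
    return visited

links = {
    "https://example.com": ["https://example.com/a", "https://ibm.com/watson"],
    "https://example.com/a": ["https://example.com/a/b"],
}
# With maximum_hops=1, /a and ibm.com/watson (1 hop) are fetched,
# but /a/b (2 hops) is not.
print(crawl("https://example.com", links, maximum_hops=1))
```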
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to cloud_object_store, and the crawl_all_buckets field is false or not specified.
- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When true, all buckets in the specified cloud object store are crawled. If set to true, the buckets array must not be specified.
A custom configuration for the environment.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "salesforce",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- Conversions
A list of PDF conversion settings.
- Pdf
Object containing heading detection conversion settings for PDF documents.
- Heading
Array of font matching configurations.
- Fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
A list of Word conversion settings.
- Word
Object containing heading detection conversion settings for Microsoft Word documents.
- Heading
Array of font matching configurations.
- Fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
Array of Microsoft Word styles to convert.
- Styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- Html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- KeepContent
An array of XPaths.
Object containing an array of XPaths.
- ExcludeContent
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- Segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to false in the smart_document_understanding object.
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field sub-title is specified, when a document is uploaded, each time the smart document understanding conversion encounters a field of type sub-title, the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- JsonNormalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
When true, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- Enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.
Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to natural_language_understanding.
Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is natural_language_understanding. The elements option is deprecated and support ended on 10 July 2020. The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
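The destination_field depth rule above (the field must already exist, or extend an existing field by at most one level) can be checked client-side before sending a configuration. The helper below is hypothetical, not part of any SDK, and assumes that a brand-new top-level field also counts as valid.

```python
# Hypothetical helper (not part of the SDK) checking the destination_field
# rule: the field must already exist, or be at most one level deeper than
# an existing field. With top-level field "text": "text.foo" is a valid
# destination, "text.foo.bar" is not.

def valid_destination(existing_fields, destination):
    if destination in existing_fields:
        return True
    parent, _, _ = destination.rpartition(".")
    # one level deeper than an existing field; a new top-level field
    # (parent == "") is assumed valid here
    return parent == "" or parent in existing_fields

fields = {"text", "metadata", "metadata.title"}
print(valid_destination(fields, "text.foo"))      # True: one level below "text"
print(valid_destination(fields, "text.foo.bar"))  # False: "text.foo" does not exist
```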
Options that are specific to a particular enrichment.
The elements enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence project type instead.
- Options
Object containing Natural Language Understanding features to be used.
- Features
An object specifying the Keyword enrichment and related parameters.
- Keywords
When true, sentiment analysis of keywords will be performed on the specified field.
When true, emotion detection of keywords will be performed on the specified field.
The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- Entities
When true, sentiment analysis of entities will be performed on the specified field.
When true, emotion detection of entities will be performed on the specified field.
The maximum number of entities to extract for each instance of the specified field.
When true, the number of mentions of each identified entity is recorded. The default is false.
When true, the types of mentions for each identified entity are recorded. The default is false.
When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.
The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model alchemy.
An object specifying the sentiment extraction enrichment and related parameters.
- Sentiment
When true, sentiment analysis is performed on the entire field.
A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- Emotion
When true, emotion detection is performed on the entire field.
A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- SemanticRoles
When true, entities are extracted from the identified sentence parts.
When true, keywords are extracted from the identified sentence parts.
The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- Relations
For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model is en-news.
An object specifying the concepts enrichment and related parameters.
- Concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.
Possible values: [ar, en, fr, de, it, pt, ru, es, sv]
The element extraction model to use, which can be contract only. The elements enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- Normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- Source
The type of source to connect to. box indicates the configuration is to connect to an instance of Enterprise Box. salesforce indicates the configuration is to connect to Salesforce. sharepoint indicates the configuration is to connect to Microsoft SharePoint Online. web_crawl indicates the configuration is to perform a web page crawl. cloud_object_storage indicates the configuration is to connect to a cloud object store.
Possible values: [box, salesforce, sharepoint, web_crawl, cloud_object_storage]
The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- Schedule
When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; when false and connecting to Salesforce, the source is crawled annually.
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes: Runs every five minutes.
hourly: Runs every hour.
daily: Runs every day between 00:00 and 06:00.
weekly: Runs every week on Sunday between 00:00 and 06:00.
monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [daily, weekly, monthly, five_minutes, hourly]
The options object defines which items to crawl from the source system.
- Options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.
- Folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.
- Objects
The name of the Salesforce document object to crawl. For example, case.
The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to sharepoint.
- SiteCollections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to web_crawl.
- Urls
The starting URL to crawl.
When true, crawls of the specified URL are limited to the host part of the url field.
The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means as many as two URLs are fetched concurrently with a short delay between fetch calls. aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls.
Possible values: [gentle, normal, aggressive]
When true, allows the crawl to interact with HTTPS sites with SSL certificates signed by untrusted signers.
The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum number of milliseconds to wait for a response from the web server.
When true, the crawler ignores any robots.txt that it encounters. This should only ever be done when crawling a web site the user owns. This must be set to true when a gateway_id is specified in the credentials.
Array of URLs to be excluded while crawling. The crawler does not follow links that contain any of these strings. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to cloud_object_store, and the crawl_all_buckets field is false or not specified.
- Buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When true, all buckets in the specified cloud object store are crawled. If set to true, the buckets array must not be specified.
Status Code
Configuration successfully fetched.
Bad request.
{ "configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e", "name": "IBM News", "created": "2015-08-24T18:42:25.324Z", "updated": "2015-08-24T18:42:25.324Z", "description": "A configuration useful for ingesting IBM press releases.", "conversions": { "html": { "exclude_tags_keep_content": [ "span" ], "exclude_content": { "xpaths": [ "/home" ] } }, "segment": { "enabled": true, "annotated_fields": [ "custom-field-1", "custom-field-2" ] }, "json_normalizations": [ { "operation": "move", "source_field": "extracted_metadata.title", "destination_field": "metadata.title" }, { "operation": "move", "source_field": "extracted_metadata.author", "destination_field": "metadata.author" }, { "operation": "remove", "source_field": "extracted_metadata" } ] }, "enrichments": [ { "enrichment": "natural_language_understanding", "source_field": "title", "destination_field": "enriched_title", "options": { "features": { "keywords": { "sentiment": true, "emotion": false, "limit": 50 }, "entities": { "sentiment": true, "emotion": false, "limit": 50, "mentions": true, "mention_types": true, "sentence_locations": true, "model": "WKS-model-id" }, "sentiment": { "document": true, "targets": [ "IBM", "Watson" ] }, "emotion": { "document": true, "targets": [ "IBM", "Watson" ] }, "categories": {}, "concepts": { "limit": 8 }, "semantic_roles": { "entities": true, "keywords": true, "limit": 50 }, "relations": { "model": "WKS-model-id" } } } } ], "normalizations": [ { "operation": "move", "source_field": "metadata.title", "destination_field": "title" }, { "operation": "move", "source_field": "metadata.author", "destination_field": "author" }, { "operation": "remove", "source_field": "html" }, { "operation": "remove_nulls" } ], "source": { "type": "salesforce", "credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b", "schedule": { "enabled": true, "time_zone": "America/New_York", "frequency": "weekly" }, "options": { "site_collections": [ { "site_collection_path": "/sites/TestSiteA", "limit": 10 } 
] } } }
Update a configuration
Replaces an existing configuration.
- Completely replaces the original configuration.
- The configuration_id, updated, and created fields are accepted in the request, but they are ignored, and an error is not generated. It is also acceptable for users to submit an updated configuration with none of the three properties.
- Documents are processed with a snapshot of the configuration as it was at the time the document was submitted to be ingested. This means that already submitted documents will not see any updates made to the configuration.
PUT /v1/environments/{environment_id}/configurations/{configuration_id}
ServiceCall<Configuration> updateConfiguration(UpdateConfigurationOptions updateConfigurationOptions)
updateConfiguration(params)
update_configuration(
self,
environment_id: str,
configuration_id: str,
name: str,
*,
description: str = None,
conversions: 'Conversions' = None,
enrichments: List['Enrichment'] = None,
normalizations: List['NormalizationOperation'] = None,
source: 'Source' = None,
**kwargs,
) -> DetailedResponse
UpdateConfiguration(string environmentId, string configurationId, string name, string description = null, Conversions conversions = null, List<Enrichment> enrichments = null, List<NormalizationOperation> normalizations = null, Source source = null)
Request
Use the UpdateConfigurationOptions.Builder
to create a UpdateConfigurationOptions
object that contains the parameter values for the updateConfiguration
method.
Path Parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
The ID of the configuration.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2019-04-30.
Input an object that enables you to update and customize how your data is ingested and what enrichments are added to your data. The name parameter is required and must be unique within the current environment. All other properties are optional, but if they are omitted, the default values replace the current value of each omitted property.
If the input configuration contains the configuration_id, created, or updated properties, they are ignored and overridden by the system, and an error is not returned so that the overridden fields do not need to be removed when updating a configuration.
The configuration can contain unrecognized JSON fields. Any such fields are ignored and do not generate an error. This makes it easier to use newer configuration files with older versions of the API and the service. It also makes it possible for the tooling to add additional metadata and information to the configuration.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "salesforce",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
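Given a request body like the example above, the call itself is a `PUT` against the path shown earlier with the `version` query parameter. A sketch of composing that request (the base URL below is a placeholder; the Watson SDKs wrap this in `update_configuration` / `updateConfiguration`):

```python
from urllib.parse import quote

API_VERSION = "2019-04-30"  # current version date per this reference

def update_configuration_request(base_url, environment_id, configuration_id):
    """Compose the HTTP method, URL, and query parameters for the update call.

    base_url is your service instance URL (a placeholder here; find the real
    value in your service credentials).
    """
    url = (
        f"{base_url}/v1/environments/{quote(environment_id)}"
        f"/configurations/{quote(configuration_id)}"
    )
    return "PUT", url, {"version": API_VERSION}

# Usage sketch (IDs are placeholders). Send with, for example:
#   requests.put(url, params=params, json=configuration,
#                auth=("apikey", API_KEY))
```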
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The description of the configuration, if available.
Document conversion settings.
An array of document enrichment settings for the configuration.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
Object containing source parameters for the configuration.
The updateConfiguration options.
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the configuration.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keepContent
An array of XPaths.
Object containing an array of XPaths.
- excludeContent
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Default:
false
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to
false
in the smart_document_understanding object.Default:
["h1","h2"]
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field
sub-title
is specified, when a document is uploaded each time the smart document understanding conversion encounters a field of typesub-title
the document is split at that point and the content of the field used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set totrue
in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- jsonNormalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
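The operation semantics above can be illustrated with a small, top-level-fields-only sketch. (This is a toy model for clarity: the real service also resolves dotted field paths such as metadata.title, which this version does not.)

```python
def apply_normalizations(doc, operations):
    """Apply Discovery-style normalization operations to a dict, in order.
    Toy version: fields are treated as top-level keys only."""
    for op in operations:
        kind = op["operation"]
        src = op.get("source_field")
        dst = op.get("destination_field")
        if kind == "copy":
            if src in doc:
                doc[dst] = doc[src]          # source is kept
        elif kind == "move":
            if src in doc:
                doc[dst] = doc.pop(src)      # copy, then remove the source
        elif kind == "merge":
            target = doc.get(dst)
            if not isinstance(target, list):
                # destination always becomes an array, even if src is absent
                target = [target] if dst in doc else []
            if src in doc:
                target.append(doc.pop(src))
            doc[dst] = target
        elif kind == "remove":
            doc.pop(src, None)               # destination_field is ignored
        elif kind == "remove_nulls":
            doc = _strip_nulls(doc)          # operates on the whole document
    return doc

def _strip_nulls(value):
    """Recursively drop None-valued fields from nested dicts."""
    if isinstance(value, dict):
        return {k: _strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [_strip_nulls(v) for v in value]
    return value
```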
When
true
, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.Default:
true
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Default:
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if
text
is a top-level field with no sub-fields,text.foo
is a valid destination buttext.foo.bar
is not.Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to
natural_language_understanding
.Indicates that the enrichments will overwrite the destination_field field if it already exists.
Default:
false
Name of the enrichment service to call. The only supported option is
natural_language_understanding
. Theelements
option is deprecated and support ended on 10 July 2020.The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Default:
false
Options that are specific to a particular enrichment.
The
elements
enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create acontent_intelligence
project type instead.- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When
true
, sentiment analysis of keywords will be performed on the specified field.When
true
, emotion detection of keywords will be performed on the specified field.The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When
true
, sentiment analysis of entities will be performed on the specified field.When
true
, emotion detection of entities will be performed on the specified field.The maximum number of entities to extract for each instance of the specified field.
When
true
, the number of mentions of each identified entity is recorded. The default isfalse
.When
true
, the types of mentions for each identified entity are recorded. The default isfalse
.When
true
, a list of sentence locations for each instance of each identified entity is recorded. The default isfalse
.The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model
alchemy
.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When
true
, sentiment analysis is performed on the entire field.A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When
true
, emotion detection is performed on the entire field.A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semanticRoles
When
true
, entities are extracted from the identified sentence parts.When
true
, keywords are extracted from the identified sentence parts.The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with
natural_language_understanding
enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model isen-news
.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are
ar
(Arabic),en
(English),fr
(French),de
(German),it
(Italian),pt
(Portuguese),ru
(Russian),es
(Spanish), andsv
(Swedish). Note: Not all features support all languages; automatic detection is recommended.Allowable values: [
ar
,en
,fr
,de
,it
,pt
,ru
,es
,sv
]The element extraction model to use, which can be
contract
only. Theelements
enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box
indicates the configuration is to connect to an instance of Enterprise Box.salesforce
indicates the configuration is to connect to Salesforce.sharepoint
indicates the configuration is to connect to Microsoft SharePoint Online.web_crawl
indicates the configuration is to perform a web page crawl.cloud_object_storage
indicates the configuration is to connect to a cloud object store.
Allowable values: [
box
,salesforce
,sharepoint
,web_crawl
,cloud_object_storage
]The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When
true
, the source is re-crawled based on the frequency field in this object. Whenfalse
, the source is not re-crawled; whenfalse
and connecting to Salesforce, the source is crawled annually.Default:
true
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
Default:
America/New_York
The crawl schedule in the specified time_zone.
five_minutes
: Runs every five minutes.hourly
: Runs every hour.daily
: Runs every day between 00:00 and 06:00.weekly
: Runs every week on Sunday between 00:00 and 06:00.monthly
: Runs on the first Sunday of every month between 00:00 and 06:00.
Allowable values: [
daily
,weekly
,monthly
,five_minutes
,hourly
]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.- objects
The name of the Salesforce document object to crawl. For example,
case
.The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.- siteCollections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to
web_crawl
.- urls
The starting URL to crawl.
When
true
, crawls of the specified URL are limited to the host part of the url field.Default:
true
The number of concurrent URLs to fetch.
gentle
means one URL is fetched at a time with a delay between each call.normal
means as many as two URLs are fetched concurrently with a short delay between fetch calls.aggressive
means that up to ten URLs are fetched concurrently with a short delay between fetch calls.Allowable values: [
gentle
,normal
,aggressive
]Default:
normal
When
true
, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.Default:
false
The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
Default:
2
The maximum milliseconds to wait for a response from the web server.
Default:
30000
When
true
, the crawler will ignore anyrobots.txt
encountered by the crawler. This should only ever be done when crawling a web site the user owns. This must be set totrue
when a gateway_id is specified in the credentials.Default:
false
Array of URLs to exclude while crawling. The crawler will not follow links that contain any of these strings. For example, listing
https://ibm.com/watson
also excludeshttps://ibm.com/watson/discovery
.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to
cloud_object_storage
, and the crawl_all_buckets field isfalse
or not specified.- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set totrue
, the buckets array must not be specified.
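Putting the web crawl fields above together, here is a sketch of a source object. The credential_id is a placeholder, and the snake_case option names are assumptions based on the v1 API model (the field-name labels were lost in this page's rendering); verify them against your SDK's Source and SourceOptions classes.

```python
# Sketch of a web_crawl source configuration. credential_id is a placeholder:
# create credentials first and use the returned credential_id. Option field
# names are assumptions and should be verified against the SDK model classes.
web_crawl_source = {
    "type": "web_crawl",
    "credential_id": "REPLACE_WITH_CREDENTIAL_ID",
    "schedule": {
        "enabled": True,
        "time_zone": "America/New_York",  # IANA time zone name
        "frequency": "daily",             # daily | weekly | monthly | five_minutes | hourly
    },
    "options": {
        "urls": [
            {
                "url": "https://example.com/docs",  # starting URL to crawl
                "limit_to_starting_hosts": True,     # stay on the starting host
                "crawl_speed": "normal",             # gentle | normal | aggressive
                "maximum_hops": 2,                   # default per the docs
                "request_timeout": 30000,            # milliseconds, default per the docs
                "override_robots_txt": False,
            }
        ]
    },
}
```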
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the configuration.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keep_content
An array of XPaths.
Object containing an array of XPaths.
- exclude_content
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Default:
false
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to
false
in the smart_document_understanding object.Default:
["h1","h2"]
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field
sub-title
is specified, when a document is uploaded each time the smart document understanding conversion encounters a field of typesub-title
the document is split at that point and the content of the field used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set totrue
in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- json_normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
When
true
, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.Default:
true
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Default:
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if
text
is a top-level field with no sub-fields,text.foo
is a valid destination buttext.foo.bar
is not.Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to
natural_language_understanding
.Indicates that the enrichments will overwrite the destination_field field if it already exists.
Default:
false
Name of the enrichment service to call. The only supported option is
natural_language_understanding
. Theelements
option is deprecated and support ended on 10 July 2020.The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Default:
false
Options that are specific to a particular enrichment.
The
elements
enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create acontent_intelligence
project type instead.- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When
true
, sentiment analysis of keywords will be performed on the specified field.When
true
, emotion detection of keywords will be performed on the specified field.The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When
true
, sentiment analysis of entities will be performed on the specified field.When
true
, emotion detection of entities will be performed on the specified field.The maximum number of entities to extract for each instance of the specified field.
When
true
, the number of mentions of each identified entity is recorded. The default isfalse
.When
true
, the types of mentions for each identified entity are recorded. The default isfalse
.When
true
, a list of sentence locations for each instance of each identified entity is recorded. The default isfalse
.The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model
alchemy
.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When
true
, sentiment analysis is performed on the entire field.A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When
true
, emotion detection is performed on the entire field.A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semantic_roles
When
true
, entities are extracted from the identified sentence parts.When
true
, keywords are extracted from the identified sentence parts.The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with
natural_language_understanding
enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model isen-news
.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are
ar
(Arabic),en
(English),fr
(French),de
(German),it
(Italian),pt
(Portuguese),ru
(Russian),es
(Spanish), andsv
(Swedish). Note: Not all features support all languages; automatic detection is recommended.Allowable values: [
ar
,en
,fr
,de
,it
,pt
,ru
,es
,sv
]The element extraction model to use, which can be
contract
only. Theelements
enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box
indicates the configuration is to connect to an instance of Enterprise Box.salesforce
indicates the configuration is to connect to Salesforce.sharepoint
indicates the configuration is to connect to Microsoft SharePoint Online.web_crawl
indicates the configuration is to perform a web page crawl.cloud_object_storage
indicates the configuration is to connect to a cloud object store.
Allowable values: [
box
,salesforce
,sharepoint
,web_crawl
,cloud_object_storage
]The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled. When false and connecting to Salesforce, the source is crawled annually.
Default: true
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
Default:
America/New_York
The crawl schedule in the specified time_zone.
five_minutes: Runs every five minutes.
hourly: Runs every hour.
daily: Runs every day between 00:00 and 06:00.
weekly: Runs every week on Sunday between 00:00 and 06:00.
monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Allowable values: [daily, weekly, monthly, five_minutes, hourly]
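A minimal schedule object, sketched as a Python dict; the key names (enabled, time_zone, frequency) are inferred from the field descriptions above and should be treated as illustrative:

```python
# Re-crawl the source weekly, using UK local time for the crawl window.
schedule = {
    "enabled": True,               # re-crawl on the given frequency
    "time_zone": "Europe/London",  # any IANA time zone name
    "frequency": "weekly",         # Sundays between 00:00 and 06:00
}

ALLOWED_FREQUENCIES = {"daily", "weekly", "monthly", "five_minutes", "hourly"}
assert schedule["frequency"] in ALLOWED_FREQUENCIES
```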
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.
- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.
- objects
The name of the Salesforce document object to crawl. For example, case.
The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to sharepoint.
- site_collections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to web_crawl.
- urls
The starting URL to crawl.
When true, crawls of the specified URL are limited to the host part of the url field.
Default: true
The number of concurrent URLs to fetch.
gentle means one URL is fetched at a time with a delay between each call.
normal means as many as two URLs are fetched concurrently with a short delay between fetch calls.
aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls.
Allowable values: [gentle, normal, aggressive]
Default: normal
When true, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.
Default: false
The maximum number of hops to make from the initial URL. When a page is crawled each link on that page will also be crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
Default:
2
The maximum number of milliseconds to wait for a response from the web server.
Default:
30000
When true, the crawler will ignore any robots.txt encountered by the crawler. This should only ever be done when crawling a web site the user owns. This must be set to true when a gateway_id is specified in the credentials.
Default: false
Array of URLs to be excluded while crawling. The crawler will not follow links that contain this string. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
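Putting the web crawl fields together, one entry of the urls array might look like the sketch below. Only maximum_hops and gateway_id are named explicitly in the reference above; the other key names are illustrative, inferred from the field descriptions:

```python
# One hypothetical web_crawl "urls" entry; values are examples only.
web_crawl_url = {
    "url": "https://example.com/docs",   # the starting URL to crawl
    "limit_to_starting_hosts": True,     # stay on the example.com host
    "crawl_speed": "normal",             # gentle | normal | aggressive
    "allow_untrusted_certificate": False,
    "maximum_hops": 2,                   # the first page crawled is 0 hops
    "request_timeout": 30000,            # milliseconds to wait for a response
    "override_robots_txt": False,        # leave False unless you own the site
    "blacklist": ["https://example.com/docs/private"],  # prefixes to exclude
}
```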
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to cloud_object_store, and the crawl_all_buckets field is false or not specified.
- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When true, all buckets in the specified cloud object store are crawled. If set to true, the buckets array must not be specified.
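The two ways of scoping a cloud object store crawl are mutually exclusive, as this sketch shows. crawl_all_buckets is named in the reference above; the name and limit keys inside each bucket entry are illustrative:

```python
# Either list specific buckets to crawl...
specific = {"buckets": [{"name": "my-bucket", "limit": 1000}]}

# ...or crawl everything, in which case "buckets" must be omitted.
everything = {"crawl_all_buckets": True}

assert "buckets" not in everything
```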
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the configuration.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags that are excluded but whose content is retained.
Object containing an array of XPaths.
- keep_content
An array of XPaths.
Object containing an array of XPaths.
- exclude_content
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Default:
false
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to false in the smart_document_understanding object.
Default: ["h1","h2"]
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field sub-title is specified, then each time the smart document understanding conversion encounters a field of type sub-title in an uploaded document, the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true in the smart_document_understanding object.
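The two segmentation modes described above can be sketched side by side. The key names (enabled, selector_tags, annotated_fields) are inferred from the field descriptions and should be treated as illustrative:

```python
# Split on h1/h2 headings (collection has SDU disabled).
split_on_headings = {"enabled": True, "selector_tags": ["h1", "h2"]}

# Split on an annotated SDU field (collection has SDU enabled).
split_on_sdu_fields = {"enabled": True, "annotated_fields": ["sub-title"]}
```

The matched heading or annotated field content becomes the title of each resulting segment.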
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- json_normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
When true, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
Default: true
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Default:
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.
Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to natural_language_understanding.
Indicates that the enrichments will overwrite the destination_field field if it already exists.
Default: false
Name of the enrichment service to call. The only supported option is natural_language_understanding. The elements option is deprecated; support ended on 10 July 2020. The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Default:
false
Options that are specific to a particular enrichment.
The elements enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence project type instead.
- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When true, sentiment analysis of keywords will be performed on the specified field.
When true, emotion detection of keywords will be performed on the specified field.
The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When true, sentiment analysis of entities will be performed on the specified field.
When true, emotion detection of entities will be performed on the specified field.
The maximum number of entities to extract for each instance of the specified field.
When true, the number of mentions of each identified entity is recorded. The default is false.
When true, the types of mentions for each identified entity are recorded. The default is false.
When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.
The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model alchemy.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When true, sentiment analysis is performed on the entire field.
A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When true, emotion detection is performed on the entire field.
A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semantic_roles
When true, entities are extracted from the identified sentence parts.
When true, keywords are extracted from the identified sentence parts.
The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model en-news.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.
Allowable values: [ar, en, fr, de, it, pt, ru, es, sv]
The element extraction model to use, which can be contract only. The elements enrichment is deprecated.
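Assembling the enrichment fields above, one entry of the enrichments array might look like the following sketch, which runs NLU keyword and entity extraction over the text field. Key names follow the field descriptions but are illustrative, not authoritative:

```python
# One hypothetical "enrichments" entry for an NLU enrichment.
enrichment = {
    "destination_field": "enriched_text",  # at most 1 level below an existing field
    "source_field": "text",                # field to be enriched
    "overwrite": False,                    # do not clobber an existing destination
    "enrichment": "natural_language_understanding",
    "ignore_downstream_errors": False,     # fail the document on enrichment errors
    "options": {
        "features": {
            "keywords": {"sentiment": True, "emotion": False, "limit": 10},
            "entities": {"sentiment": True, "limit": 50, "mentions": False},
        },
        "language": "en",  # override automatic language detection
    },
}
```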
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box indicates the configuration is to connect to an instance of Enterprise Box.
salesforce indicates the configuration is to connect to Salesforce.
sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
web_crawl indicates the configuration is to perform a web page crawl.
cloud_object_storage indicates the configuration is to connect to a cloud object store.
Allowable values: [box, salesforce, sharepoint, web_crawl, cloud_object_storage]
The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled. When false and connecting to Salesforce, the source is crawled annually.
Default: true
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
Default:
America/New_York
The crawl schedule in the specified time_zone.
five_minutes: Runs every five minutes.
hourly: Runs every hour.
daily: Runs every day between 00:00 and 06:00.
weekly: Runs every week on Sunday between 00:00 and 06:00.
monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Allowable values: [daily, weekly, monthly, five_minutes, hourly]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.
- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.
- objects
The name of the Salesforce document object to crawl. For example, case.
The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to sharepoint.
- site_collections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to web_crawl.
- urls
The starting URL to crawl.
When true, crawls of the specified URL are limited to the host part of the url field.
Default: true
The number of concurrent URLs to fetch.
gentle means one URL is fetched at a time with a delay between each call.
normal means as many as two URLs are fetched concurrently with a short delay between fetch calls.
aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls.
Allowable values: [gentle, normal, aggressive]
Default: normal
When true, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.
Default: false
The maximum number of hops to make from the initial URL. When a page is crawled each link on that page will also be crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
Default:
2
The maximum number of milliseconds to wait for a response from the web server.
Default:
30000
When true, the crawler will ignore any robots.txt encountered by the crawler. This should only ever be done when crawling a web site the user owns. This must be set to true when a gateway_id is specified in the credentials.
Default: false
Array of URLs to be excluded while crawling. The crawler will not follow links that contain this string. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to cloud_object_store, and the crawl_all_buckets field is false or not specified.
- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When true, all buckets in the specified cloud object store are crawled. If set to true, the buckets array must not be specified.
parameters
The ID of the environment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the configuration.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- Pdf
Object containing heading detection conversion settings for PDF documents.
- Heading
Array of font matching configurations.
- Fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
A list of Word conversion settings.
- Word
Object containing heading detection conversion settings for Microsoft Word documents.
- Heading
Array of font matching configurations.
- Fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
Array of Microsoft Word styles to convert.
- Styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- Html
Array of HTML tags that are excluded completely.
Array of HTML tags that are excluded but whose content is retained.
Object containing an array of XPaths.
- KeepContent
An array of XPaths.
Object containing an array of XPaths.
- ExcludeContent
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- Segment
Enables/disables the Document Segmentation feature.
Default:
false
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to false in the smart_document_understanding object.
Default: ["h1","h2"]
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field sub-title is specified, then each time the smart document understanding conversion encounters a field of type sub-title in an uploaded document, the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- JsonNormalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
When true, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
Default: true
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Default:
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.
Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to natural_language_understanding.
Indicates that the enrichments will overwrite the destination_field field if it already exists.
Default: false
Name of the enrichment service to call. The only supported option is natural_language_understanding. The elements option is deprecated; support ended on 10 July 2020. The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Default:
false
Options that are specific to a particular enrichment.
The elements enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence project type instead.
- Options
Object containing Natural Language Understanding features to be used.
- Features
An object specifying the Keyword enrichment and related parameters.
- Keywords
When true, sentiment analysis of keywords will be performed on the specified field.
When true, emotion detection of keywords will be performed on the specified field.
The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- Entities
When true, sentiment analysis of entities will be performed on the specified field.
When true, emotion detection of entities will be performed on the specified field.
The maximum number of entities to extract for each instance of the specified field.
When true, the number of mentions of each identified entity is recorded. The default is false.
When true, the types of mentions for each identified entity are recorded. The default is false.
When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.
The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model alchemy.
An object specifying the sentiment extraction enrichment and related parameters.
- Sentiment
When true, sentiment analysis is performed on the entire field.
A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- Emotion
When true, emotion detection is performed on the entire field.
A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- SemanticRoles
When true, entities are extracted from the identified sentence parts.
When true, keywords are extracted from the identified sentence parts.
The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- Relations
For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model en-news.
An object specifying the concepts enrichment and related parameters.
- Concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.
Allowable values: [ar, en, fr, de, it, pt, ru, es, sv]
The element extraction model to use, which can be contract only. The elements enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
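As a sketch of these operation semantics, the following Python function (illustrative only, not part of any Watson SDK) applies one normalization operation to a flat document represented as a dict; the real service also handles nested fields:

```python
def apply_operation(doc, operation, source_field=None, destination_field=None):
    """Apply one normalization operation to a flat document (dict), in place."""
    if operation == "copy":
        # Overwrites destination_field if it already exists.
        doc[destination_field] = doc[source_field]
    elif operation == "move":
        # Identical to copy followed by remove of the source field.
        doc[destination_field] = doc.pop(source_field)
    elif operation == "merge":
        # Destination is coerced to an array even when the source is absent,
        # so the destination type stays consistent across documents.
        dest = doc.get(destination_field)
        doc[destination_field] = dest if isinstance(dest, list) else (
            [dest] if destination_field in doc else [])
        if source_field in doc:
            doc[destination_field].append(doc.pop(source_field))
    elif operation == "remove":
        doc.pop(source_field, None)
    elif operation == "remove_nulls":
        # Operates on the whole document; source and destination are ignored.
        for key in [k for k, v in doc.items() if v is None]:
            del doc[key]
    return doc
```

For example, apply_operation({"a": 1, "b": 2}, "merge", "a", "b") yields {"b": [2, 1]}.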
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box
indicates the configuration is to connect to an instance of Enterprise Box.salesforce
indicates the configuration is to connect to Salesforce.sharepoint
indicates the configuration is to connect to Microsoft SharePoint Online.web_crawl
indicates the configuration is to perform a web page crawl.cloud_object_storage
indicates the configuration is to connect to a cloud object store.
Allowable values: [
box
,salesforce
,sharepoint
,web_crawl
,cloud_object_storage
]The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- Schedule
When
true
, the source is re-crawled based on the frequency field in this object. Whenfalse
the source is not re-crawled; Whenfalse
and connecting to Salesforce the source is crawled annually.Default:
true
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
Default:
America/New_York
The crawl schedule in the specified time_zone.
five_minutes
: Runs every five minutes.hourly
: Runs every hour.daily
: Runs every day between 00:00 and 06:00.weekly
: Runs every week on Sunday between 00:00 and 06:00.monthly
: Runs on the first Sunday of every month between 00:00 and 06:00.
Allowable values: [
daily
,weekly
,monthly
,five_minutes
,hourly
]
The options object defines which items to crawl from the source system.
- Options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.- Folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.- Objects
The name of the Salesforce document object to crawl. For example,
case
.The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.- SiteCollections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to
web_crawl
.- Urls
The starting URL to crawl.
When
true
, crawls of the specified URL are limited to the host part of the url field.Default:
true
The number of concurrent URLs to fetch.
gentle
means one URL is fetched at a time with a delay between each call.normal
means as many as two URLs are fetched concurrently with a short delay between fetch calls.aggressive
means that up to ten URLs are fetched concurrently with a short delay between fetch calls.Allowable values: [
gentle
,normal
,aggressive
]Default:
normal
When
true
, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.Default:
false
The maximum number of hops to make from the initial URL. When a page is crawled each link on that page will also be crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
Default:
2
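The hop counting described above amounts to a breadth-first traversal with a depth cutoff. A minimal sketch of that behavior (illustrative only; get_links is a hypothetical stand-in for fetching a page and extracting its links, not an SDK function):

```python
from collections import deque

def crawl_within_hops(start_url, get_links, maximum_hops=2):
    """Return {url: hop_count} for every URL within maximum_hops of start_url."""
    seen = {start_url: 0}          # the first page crawled is 0 hops
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if seen[url] == maximum_hops:
            continue               # links from this page would exceed the limit
        for link in get_links(url):
            if link not in seen:
                seen[link] = seen[url] + 1
                queue.append(link)
    return seen
```

With a toy link graph a → b → c → d and maximum_hops=2, pages a, b, and c are crawled but d is not.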
The maximum milliseconds to wait for a response from the web server.
Default:
30000
When
true
, the crawler will ignore anyrobots.txt
encountered by the crawler. This should only ever be done when crawling a website that the user owns. This must be set totrue
when a gateway_id is specified in the credentials.Default:
false
Array of URLs to exclude while crawling. The crawler will not follow links that contain this string. For example, listing
https://ibm.com/watson
also excludeshttps://ibm.com/watson/discovery
.
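The exclusion behaves as a substring match, which is why listing a prefix also covers everything beneath it. A minimal sketch (hypothetical helper, not an SDK function):

```python
def is_excluded(url, blacklist):
    """True when the URL contains any excluded string as a substring,
    so excluding a prefix also excludes every URL under it."""
    return any(entry in url for entry in blacklist)
```

Here is_excluded("https://ibm.com/watson/discovery", ["https://ibm.com/watson"]) is True.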
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to
cloud_object_store
, and the crawl_all_buckets field isfalse
or not specified.- Buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set totrue
, the buckets array must not be specified.
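Tying the source pieces together, a complete source object for a web crawl might look as follows. The field names are an assumption based on the descriptions above and the Watson Discovery v1 SDK models, and the values are placeholders; treat this as an illustrative sketch rather than a copy-paste template:

```python
# Illustrative web_crawl source object; field names assumed, values are placeholders.
source = {
    "type": "web_crawl",
    "credential_id": "{credential_id}",
    "schedule": {
        "enabled": True,                    # re-crawl on the schedule below
        "time_zone": "America/New_York",    # IANA time zone name
        "frequency": "weekly",              # Sundays between 00:00 and 06:00
    },
    "options": {
        "urls": [
            {
                "url": "https://example.com",     # starting URL
                "limit_to_starting_hosts": True,  # stay on this host
                "crawl_speed": "normal",          # gentle | normal | aggressive
                "maximum_hops": 2,
                "request_timeout": 30000,         # milliseconds
                "blacklist": ["https://example.com/private"],
            }
        ]
    },
}
```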
curl -X PUT -u "apikey":"{apikey}" -H "Content-Type: application/json" -d @new_config.json "{url}/v1/environments/{environment_id}/configurations/{configuration_id}?version=2019-04-30"
IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
);

DiscoveryService discovery = new DiscoveryService("2019-04-30", authenticator);
discovery.SetServiceUrl("{url}");

var result = discovery.UpdateConfiguration(
    environmentId: "{environmentId}",
    configurationId: "{configurationId}",
    name: "new-config"
);

Console.WriteLine(result.Response);
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
Discovery discovery = new Discovery("2019-04-30", authenticator);
discovery.setServiceUrl("{url}");

String environmentId = "{environment_id}";
String configurationId = "{configuration_id}";
String updatedConfigurationName = "new-config";

Configuration updatedConfiguration = GsonSingleton.getGson().fromJson(
    new FileReader("{updatedConfigFilePath}"),
    com.ibm.watson.internal.discovery.model.configuration.Configuration.class);

UpdateConfigurationOptions.Builder updateBuilder =
    new UpdateConfigurationOptions.Builder(environmentId, configurationId, updatedConfigurationName);
updateBuilder.configuration(updatedConfiguration);

Configuration updateResponse = discovery.updateConfiguration(updateBuilder.build()).execute().getResult();
const DiscoveryV1 = require('ibm-watson/discovery/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV1({
  version: '2019-04-30',
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const updateConfigurationParams = {
  environmentId: '{environment_id}',
  configurationId: '{configuration_id}',
  name: '{updated or original name if updating another parameter (name is required)}',
};

discovery.updateConfiguration(updateConfigurationParams)
  .then(configuration => {
    console.log(JSON.stringify(configuration, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import os
import json
from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV1(
    version='2019-04-30',
    authenticator=authenticator
)
discovery.set_service_url('{url}')

with open(os.path.join(os.getcwd(), 'config_update.json')) as config_data:
    data = json.load(config_data)

updated_config = discovery.update_configuration(
    '{environment_id}',
    '{configuration_id}',
    data['name'],
    description=data['description'],
    conversions=data['conversions'],
    enrichments=data['enrichments'],
    normalizations=data['normalizations']).get_result()
print(json.dumps(updated_config, indent=2))
Response
A custom configuration for the environment.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The unique identifier of the configuration
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'
The description of the configuration, if available.
Document conversion settings.
An array of document enrichment settings for the configuration.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
Object containing source parameters for the configuration.
A custom configuration for the environment.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "salesforce",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keepContent
An array of XPaths.
Object containing an array of XPaths.
- excludeContent
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to
false
in the smart_document_understanding object.Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field
sub-title
is specified, when a document is uploaded each time the smart document understanding conversion encounters a field of typesub-title
the document is split at that point and the content of the field used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set totrue
in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- jsonNormalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
When
true
, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if
text
is a top-level field with no sub-fields,text.foo
is a valid destination buttext.foo.bar
is not.Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to
natural_language_understanding
.Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is
natural_language_understanding
. Theelements
option is deprecated and support ended on 10 July 2020.The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The
elements
enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create acontent_intelligence
project type instead.- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When
true
, sentiment analysis of keywords will be performed on the specified field.When
true
, emotion detection of keywords will be performed on the specified field.The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When
true
, sentiment analysis of entities will be performed on the specified field.When
true
, emotion detection of entities will be performed on the specified field.The maximum number of entities to extract for each instance of the specified field.
When
true
, the number of mentions of each identified entity is recorded. The default isfalse
.When
true
, the types of mentions for each identified entity are recorded. The default isfalse
.When
true
, a list of sentence locations for each instance of each identified entity is recorded. The default isfalse
.The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model
alchemy
.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When
true
, sentiment analysis is performed on the entire field.A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When
true
, emotion detection is performed on the entire field.A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semanticRoles
When
true
, entities are extracted from the identified sentence parts.When
true
, keywords are extracted from the identified sentence parts.The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with
natural_language_understanding
enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model isen-news
.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are
ar
(Arabic),en
(English),fr
(French),de
(German),it
(Italian),pt
(Portuguese),ru
(Russian),es
(Spanish), andsv
(Swedish). Note: Not all features support all languages; automatic detection is recommended.Possible values: [
ar
,en
,fr
,de
,it
,pt
,ru
,es
,sv
]The element extraction model to use, which can be
contract
only. Theelements
enrichment is deprecated.
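The destination_field depth rule described for enrichments (the field must either already exist or sit at most one level below an existing field, so text.foo is valid when text exists but text.foo.bar is not) can be sketched as a client-side check. This is a hypothetical helper, not an SDK call:

```python
def is_valid_destination(destination, existing_fields):
    """Check the destination_field depth rule against a set of existing field paths."""
    if destination in existing_fields:
        return True                 # field already exists
    if "." not in destination:
        return True                 # new top-level fields are allowed
    parent, _, _ = destination.rpartition(".")
    return parent in existing_fields  # at most 1 level deeper than an existing field
```

For example, is_valid_destination("text.foo", {"text"}) is True while is_valid_destination("text.foo.bar", {"text"}) is False.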
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box
indicates the configuration is to connect to an instance of Enterprise Box.salesforce
indicates the configuration is to connect to Salesforce.sharepoint
indicates the configuration is to connect to Microsoft SharePoint Online.web_crawl
indicates the configuration is to perform a web page crawl.cloud_object_storage
indicates the configuration is to connect to a cloud object store.
Possible values: [
box
,salesforce
,sharepoint
,web_crawl
,cloud_object_storage
]The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When
true
, the source is re-crawled based on the frequency field in this object. Whenfalse
the source is not re-crawled; Whenfalse
and connecting to Salesforce the source is crawled annually.The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes
: Runs every five minutes.hourly
: Runs every hour.daily
: Runs every day between 00:00 and 06:00.weekly
: Runs every week on Sunday between 00:00 and 06:00.monthly
: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [
daily
,weekly
,monthly
,five_minutes
,hourly
]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.- objects
The name of the Salesforce document object to crawl. For example,
case
.The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.- siteCollections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to
web_crawl
.- urls
The starting URL to crawl.
When
true
, crawls of the specified URL are limited to the host part of the url field.The number of concurrent URLs to fetch.
gentle
means one URL is fetched at a time with a delay between each call.normal
means as many as two URLs are fetched concurrently with a short delay between fetch calls.aggressive
means that up to ten URLs are fetched concurrently with a short delay between fetch calls.Possible values: [
gentle
,normal
,aggressive
]When
true
, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.The maximum number of hops to make from the initial URL. When a page is crawled each link on that page will also be crawled if it is within the maximum_hops from the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum milliseconds to wait for a response from the web server.
When
true
, the crawler will ignore anyrobots.txt
encountered by the crawler. This should only ever be done when crawling a website that the user owns. This must be set totrue
when a gateway_id is specified in the credentials.Array of URLs to exclude while crawling. The crawler will not follow links that contain this string. For example, listing
https://ibm.com/watson
also excludeshttps://ibm.com/watson/discovery
.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to
cloud_object_store
, and the crawl_all_buckets field isfalse
or not specified.- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set totrue
, the buckets array must not be specified.
A custom configuration for the environment.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "salesforce",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.When
true
, the font is matched if it is italic.The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keep_content
An array of XPaths.
Object containing an array of XPaths.
- exclude_content
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to false in the smart_document_understanding object.
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field sub-title is specified, each time the smart document understanding conversion encounters a field of type sub-title in an uploaded document, the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true in the smart_document_understanding object.
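The heading-level splitting behavior can be sketched in a few lines; this is an illustrative approximation of what Document Segmentation does, not the service's implementation:

```python
import re

def segment_html(html, level="h2"):
    """Split an HTML string at each heading of the given level and use
    the heading text as the segment's title, mirroring the heading-based
    segmentation behavior described above (illustrative sketch only)."""
    pattern = re.compile(r"<{0}>(.*?)</{0}>".format(level),
                         re.IGNORECASE | re.DOTALL)
    # split() with one capture group yields [preamble, title1, body1, ...]
    parts = pattern.split(html)
    segments = []
    for i in range(1, len(parts), 2):
        segments.append({"title": parts[i].strip(), "text": parts[i + 1].strip()})
    return segments

doc = "<h2>Intro</h2><p>First part.</p><h2>Details</h2><p>Second part.</p>"
print(segment_html(doc))
```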
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- json_normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
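The five operations above can be sketched as a small function over a flat JSON document; this is an illustrative approximation (the service also handles nested field paths), not the service's implementation:

```python
import copy as _copy

def normalize(doc, operations):
    """Apply copy/move/merge/remove/remove_nulls operations, in order,
    to a flat dict. Illustrative sketch of the normalization semantics
    described above; assumes top-level field names only."""
    doc = _copy.deepcopy(doc)
    for op in operations:
        kind = op["operation"]
        src, dst = op.get("source_field"), op.get("destination_field")
        if kind == "copy" and src in doc:
            doc[dst] = doc[src]
        elif kind == "move" and src in doc:
            doc[dst] = doc.pop(src)
        elif kind == "merge":
            # destination becomes an array even if the source is absent
            if not isinstance(doc.get(dst), list):
                doc[dst] = [doc[dst]] if dst in doc else []
            if src in doc:
                doc[dst].append(doc.pop(src))
        elif kind == "remove":
            doc.pop(src, None)
        elif kind == "remove_nulls":
            doc = {k: v for k, v in doc.items() if v is not None}
    return doc

result = normalize(
    {"extracted_title": "IBM News", "draft": None},
    [{"operation": "move", "source_field": "extracted_title",
      "destination_field": "title"},
     {"operation": "remove_nulls"}],
)
print(result)  # {'title': 'IBM News'}
```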
When true, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.
Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to natural_language_understanding.
Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is natural_language_understanding. The elements option is deprecated and support ended on 10 July 2020.
The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The elements enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence project type instead.
- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When true, sentiment analysis of keywords will be performed on the specified field.
When true, emotion detection of keywords will be performed on the specified field.
The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When true, sentiment analysis of entities will be performed on the specified field.
When true, emotion detection of entities will be performed on the specified field.
The maximum number of entities to extract for each instance of the specified field.
When true, the number of mentions of each identified entity is recorded. The default is false.
When true, the types of mentions for each identified entity are recorded. The default is false.
When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.
The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model alchemy.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When true, sentiment analysis is performed on the entire field.
A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When true, emotion detection is performed on the entire field.
A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semantic_roles
When true, entities are extracted from the identified sentence parts.
When true, keywords are extracted from the identified sentence parts.
The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model is en-news.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.
Possible values: [ar, en, fr, de, it, pt, ru, es, sv]
The element extraction model to use, which can be contract only. The elements enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box indicates the configuration is to connect to an instance of Enterprise Box.
salesforce indicates the configuration is to connect to Salesforce.
sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
web_crawl indicates the configuration is to perform a web page crawl.
cloud_object_storage indicates the configuration is to connect to a cloud object store.
Possible values: [box, salesforce, sharepoint, web_crawl, cloud_object_storage]
The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; when false and connecting to Salesforce, the source is crawled annually.
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes: Runs every five minutes.
hourly: Runs every hour.
daily: Runs every day between 00:00 and 06:00.
weekly: Runs every week on Sunday between 00:00 and 06:00.
monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [daily, weekly, monthly, five_minutes, hourly]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.
- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.
- objects
The name of the Salesforce document object to crawl. For example, case.
The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to sharepoint.
- site_collections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to web_crawl.
- urls
The starting URL to crawl.
When true, crawls of the specified URL are limited to the host part of the url field.
The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means as many as two URLs are fetched concurrently with a short delay between fetch calls. aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls.
Possible values: [gentle, normal, aggressive]
When true, allows the crawl to interact with HTTPS sites with SSL certificates signed by untrusted signers.
The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum number of milliseconds to wait for a response from the web server.
When true, the crawler ignores any robots.txt that it encounters. This should only ever be done when crawling a web site that the user owns. This must be set to true when a gateway_id is specified in the credentials.
Array of URLs to be excluded while crawling. The crawler will not follow links that contain this string. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to cloud_object_storage, and the crawl_all_buckets field is false or not specified.
- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When true, all buckets in the specified cloud object store are crawled. If set to true, the buckets array must not be specified.
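Putting the schedule and crawl options together, a hedged sketch of a source object for a web crawl; the credential_id and URL are placeholders, and the option names correspond to the fields described above (starting URL, host limiting, crawl speed, maximum hops, robots.txt override):

```json
{
  "source": {
    "type": "web_crawl",
    "credential_id": "00000000-0000-0000-0000-000000000000",
    "schedule": {
      "enabled": true,
      "time_zone": "America/New_York",
      "frequency": "daily"
    },
    "options": {
      "urls": [
        {
          "url": "https://www.example.com",
          "limit_to_starting_hosts": true,
          "crawl_speed": "normal",
          "maximum_hops": 2,
          "override_robots_txt": false
        }
      ]
    }
  }
}
```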
A custom configuration for the environment.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "sharepoint",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- conversions
A list of PDF conversion settings.
- pdf
Object containing heading detection conversion settings for PDF documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
A list of Word conversion settings.
- word
Object containing heading detection conversion settings for Microsoft Word documents.
- heading
Array of font matching configurations.
- fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When true, the font is matched if it is bold.
When true, the font is matched if it is italic.
The name of the font.
Array of Microsoft Word styles to convert.
- styles
HTML heading level that content matching this style is tagged with.
Array of word style names to convert.
A list of HTML conversion settings.
- html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- keep_content
An array of XPaths.
Object containing an array of XPaths.
- exclude_content
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to false in the smart_document_understanding object.
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field sub-title is specified, each time the smart document understanding conversion encounters a field of type sub-title in an uploaded document, the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- json_normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
When true, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.
Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to natural_language_understanding.
Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is natural_language_understanding. The elements option is deprecated and support ended on 10 July 2020.
The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The elements enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a content_intelligence project type instead.
- options
Object containing Natural Language Understanding features to be used.
- features
An object specifying the Keyword enrichment and related parameters.
- keywords
When true, sentiment analysis of keywords will be performed on the specified field.
When true, emotion detection of keywords will be performed on the specified field.
The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- entities
When true, sentiment analysis of entities will be performed on the specified field.
When true, emotion detection of entities will be performed on the specified field.
The maximum number of entities to extract for each instance of the specified field.
When true, the number of mentions of each identified entity is recorded. The default is false.
When true, the types of mentions for each identified entity are recorded. The default is false.
When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.
The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model alchemy.
An object specifying the sentiment extraction enrichment and related parameters.
- sentiment
When true, sentiment analysis is performed on the entire field.
A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- emotion
When true, emotion detection is performed on the entire field.
A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- semantic_roles
When true, entities are extracted from the identified sentence parts.
When true, keywords are extracted from the identified sentence parts.
The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- relations
For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio; the default public model is en-news.
An object specifying the concepts enrichment and related parameters.
- concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.
Possible values: [ar, en, fr, de, it, pt, ru, es, sv]
The element extraction model to use, which can be contract only. The elements enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- source
The type of source to connect to.
box indicates the configuration is to connect to an instance of Enterprise Box.
salesforce indicates the configuration is to connect to Salesforce.
sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
web_crawl indicates the configuration is to perform a web page crawl.
cloud_object_storage indicates the configuration is to connect to a cloud object store.
Possible values: [box, salesforce, sharepoint, web_crawl, cloud_object_storage]
The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- schedule
When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; when false and connecting to Salesforce, the source is crawled annually.
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes: Runs every five minutes.
hourly: Runs every hour.
daily: Runs every day between 00:00 and 06:00.
weekly: Runs every week on Sunday between 00:00 and 06:00.
monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [daily, weekly, monthly, five_minutes, hourly]
The options object defines which items to crawl from the source system.
- options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.
- folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.
- objects
The name of the Salesforce document object to crawl. For example, case.
The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to sharepoint.
- site_collections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to web_crawl.
- urls
The starting URL to crawl.
When true, crawls of the specified URL are limited to the host part of the url field.
The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means as many as two URLs are fetched concurrently with a short delay between fetch calls. aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls.
Possible values: [gentle, normal, aggressive]
When true, allows the crawl to interact with HTTPS sites with SSL certificates signed by untrusted signers.
The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
The maximum number of milliseconds to wait for a response from the web server.
When true, the crawler ignores any robots.txt that it encounters. This should only ever be done when crawling a web site that the user owns. This must be set to true when a gateway_id is specified in the credentials.
Array of URLs to be excluded while crawling. The crawler will not follow links that contain this string. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to cloud_object_storage, and the crawl_all_buckets field is false or not specified.
- buckets
The name of the cloud object store bucket to crawl.
The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.
When
true
, all buckets in the specified cloud object store are crawled. If set to
true
, the buckets array must not be specified.
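For a cloud object store source, the fields above might combine as in this sketch. The bucket name is a placeholder, and the property names should be checked against the v1 API reference:

```json
{
  "buckets": [
    { "name": "my-documents-bucket", "limit": 500 }
  ],
  "crawl_all_buckets": false
}
```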
A custom configuration for the environment.
{
"configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
"name": "IBM News",
"created": "2015-08-24T18:42:25.324Z",
"updated": "2015-08-24T18:42:25.324Z",
"description": "A configuration useful for ingesting IBM press releases.",
"conversions": {
"html": {
"exclude_tags_keep_content": [
"span"
],
"exclude_content": {
"xpaths": [
"/home"
]
}
},
"segment": {
"enabled": true,
"annotated_fields": [
"custom-field-1",
"custom-field-2"
]
},
"json_normalizations": [
{
"operation": "move",
"source_field": "extracted_metadata.title",
"destination_field": "metadata.title"
},
{
"operation": "move",
"source_field": "extracted_metadata.author",
"destination_field": "metadata.author"
},
{
"operation": "remove",
"source_field": "extracted_metadata"
}
]
},
"enrichments": [
{
"enrichment": "natural_language_understanding",
"source_field": "title",
"destination_field": "enriched_title",
"options": {
"features": {
"keywords": {
"sentiment": true,
"emotion": false,
"limit": 50
},
"entities": {
"sentiment": true,
"emotion": false,
"limit": 50,
"mentions": true,
"mention_types": true,
"sentence_locations": true,
"model": "WKS-model-id"
},
"sentiment": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"emotion": {
"document": true,
"targets": [
"IBM",
"Watson"
]
},
"categories": {},
"concepts": {
"limit": 8
},
"semantic_roles": {
"entities": true,
"keywords": true,
"limit": 50
},
"relations": {
"model": "WKS-model-id"
}
}
}
}
],
"normalizations": [
{
"operation": "move",
"source_field": "metadata.title",
"destination_field": "title"
},
{
"operation": "move",
"source_field": "metadata.author",
"destination_field": "author"
},
{
"operation": "remove",
"source_field": "html"
},
{
"operation": "remove_nulls"
}
],
"source": {
"type": "salesforce",
"credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
"schedule": {
"enabled": true,
"time_zone": "America/New_York",
"frequency": "weekly"
},
"options": {
"site_collections": [
{
"site_collection_path": "/sites/TestSiteA",
"limit": 10
}
]
}
}
}
The unique identifier of the configuration.
The name of the configuration.
Possible values: 0 ≤ length ≤ 255
The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The description of the configuration, if available.
Document conversion settings.
- Conversions
A list of PDF conversion settings.
- Pdf
Object containing heading detection conversion settings for PDF documents.
- Heading
Array of font matching configurations.
- Fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.
When
true
, the font is matched if it is italic.
The name of the font.
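A PDF heading-detection entry built from these font settings might look like the following sketch. The font name and size thresholds are illustrative, and the exact property names (level, min_size, max_size, bold, italic, name) should be confirmed against the v1 API reference:

```json
{
  "pdf": {
    "heading": {
      "fonts": [
        { "level": 1, "min_size": 24, "max_size": 80, "bold": true, "italic": false, "name": "Arial" }
      ]
    }
  }
}
```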
A list of Word conversion settings.
- Word
Object containing heading detection conversion settings for Microsoft Word documents.
- Heading
Array of font matching configurations.
- Fonts
The HTML heading level that any content with the matching font is converted to.
The minimum size of the font to match.
The maximum size of the font to match.
When
true
, the font is matched if it is bold.
When
true
, the font is matched if it is italic.
The name of the font.
Array of Microsoft Word styles to convert.
- Styles
HTML heading level that content matching this style is tagged with.
Array of Word style names to convert.
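Word heading detection can match named styles as well as fonts. A sketch of a styles entry follows; the style names are illustrative and the property names (level, names) should be verified against the v1 API reference:

```json
{
  "word": {
    "heading": {
      "styles": [
        { "level": 2, "names": ["Heading 2", "Subtitle"] }
      ]
    }
  }
}
```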
A list of HTML conversion settings.
- Html
Array of HTML tags that are excluded completely.
Array of HTML tags which are excluded but still retain content.
Object containing an array of XPaths.
- KeepContent
An array of XPaths.
Object containing an array of XPaths.
- ExcludeContent
An array of XPaths.
An array of HTML tag attributes to keep in the converted document.
Array of HTML tag attributes to exclude.
A list of Document Segmentation settings.
- Segment
Enables/disables the Document Segmentation feature.
Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to
false
in the smart_document_understanding object.
Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field
sub-title
is specified, each time the smart document understanding conversion encounters a field of type
sub-title
in an uploaded document, the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to
true
in the smart_document_understanding object.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- JsonNormalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]
The source field for the operation.
The destination field for the operation.
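The operation semantics above can be sketched in a few lines of Python. This is an illustrative re-implementation of the documented behavior (handling top-level fields only, whereas the service also handles nested fields); it is not the service's own code:

```python
def normalize(doc, operations):
    """Apply copy/move/merge/remove/remove_nulls operations in order."""
    for op in operations:
        kind = op["operation"]
        src = op.get("source_field")
        dst = op.get("destination_field")
        if kind == "copy" and src in doc:
            doc[dst] = doc[src]                 # overwrites dst if it exists
        elif kind == "move" and src in doc:
            doc[dst] = doc.pop(src)             # copy, then remove the source
        elif kind == "merge":
            if dst in doc and not isinstance(doc[dst], list):
                doc[dst] = [doc[dst]]           # dst always becomes an array
            doc.setdefault(dst, [])
            if src in doc:
                doc[dst].append(doc.pop(src))   # source removed after merge
        elif kind == "remove":
            doc.pop(src, None)                  # destination_field is ignored
        elif kind == "remove_nulls":
            doc = {k: v for k, v in doc.items() if v is not None}
    return doc

# Example: a move followed by remove_nulls, as in the configuration example.
doc = {"extracted_title": "IBM News", "draft": None}
ops = [
    {"operation": "move", "source_field": "extracted_title", "destination_field": "title"},
    {"operation": "remove_nulls"},
]
print(normalize(doc, ops))  # {'title': 'IBM News'}
```

Note how merge converts the destination to an array even when the source field is absent, which is what keeps the destination field's type consistent across documents.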
When
true
, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.
An array of document enrichment settings for the configuration.
- Enrichments
Describes what the enrichment step does.
Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if
text
is a top-level field with no sub-fields,text.foo
is a valid destination but
text.foo.bar
is not.
Field to be enriched.
Arrays can be specified as the source_field if the enrichment service for this enrichment is set to
natural_language_understanding
.
Indicates that the enrichments will overwrite the destination_field field if it already exists.
Name of the enrichment service to call. The only supported option is
natural_language_understanding
. The
elements
option is deprecated; support ended on 10 July 2020. The options object must contain Natural Language Understanding options.
If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Options that are specific to a particular enrichment.
The
elements
enrichment type is deprecated. Use the Create a project method of the Discovery v2 API to create a
content_intelligence
project type instead.
- Options
Object containing Natural Language Understanding features to be used.
- Features
An object specifying the Keyword enrichment and related parameters.
- Keywords
When
true
, sentiment analysis of keywords will be performed on the specified field.
When
true
, emotion detection of keywords will be performed on the specified field.
The maximum number of keywords to extract for each instance of the specified field.
An object specifying the Entities enrichment and related parameters.
- Entities
When
true
, sentiment analysis of entities will be performed on the specified field.
When
true
, emotion detection of entities will be performed on the specified field.
The maximum number of entities to extract for each instance of the specified field.
When
true
, the number of mentions of each identified entity is recorded. The default is
false
.
When
true
, the types of mentions for each identified entity are recorded. The default is
false
.
When
true
, a list of sentence locations for each instance of each identified entity is recorded. The default is
false
.
The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model
alchemy
.
An object specifying the sentiment extraction enrichment and related parameters.
- Sentiment
When
true
, sentiment analysis is performed on the entire field.
A comma-separated list of target strings that will have any associated sentiment analyzed.
An object specifying the emotion detection enrichment and related parameters.
- Emotion
When
true
, emotion detection is performed on the entire field.
A comma-separated list of target strings that will have any associated emotions detected.
An object that indicates the Categories enrichment will be applied to the specified field.
An object specifying the semantic roles enrichment and related parameters.
- SemanticRoles
When
true
, entities are extracted from the identified sentence parts.
When
true
, keywords are extracted from the identified sentence parts.
The maximum number of semantic roles enrichments to extract from each instance of the specified field.
An object specifying the relations enrichment and related parameters.
- Relations
For use with
natural_language_understanding
enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio, or the default public model
en-news
.
An object specifying the concepts enrichment and related parameters.
- Concepts
The maximum number of concepts enrichments to extract from each instance of the specified field.
ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are
ar
(Arabic),en
(English),fr
(French),de
(German),it
(Italian),pt
(Portuguese),ru
(Russian),es
(Spanish), and
sv
(Swedish). Note: Not all features support all languages; automatic detection is recommended.
Possible values: [
ar
,en
,fr
,de
,it
,pt
,ru
,es
,sv
]
The element extraction model to use, which can be
contract
only. The
elements
enrichment is deprecated.
Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- Normalizations
Identifies what type of operation to perform.
copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Rename is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.
remove - Deletes the source_field field. The destination_field is ignored for this operation.
remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).
Possible values: [
copy
,move
,merge
,remove
,remove_nulls
]
The source field for the operation.
The destination field for the operation.
Object containing source parameters for the configuration.
- Source
The type of source to connect to.
box
indicates the configuration is to connect to an instance of Enterprise Box.
salesforce
indicates the configuration is to connect to Salesforce.
sharepoint
indicates the configuration is to connect to Microsoft SharePoint Online.
web_crawl
indicates the configuration is to perform a web page crawl.
cloud_object_storage
indicates the configuration is to connect to a cloud object store.
Possible values: [
box
,salesforce
,sharepoint
,web_crawl
,cloud_object_storage
]
The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Object containing the schedule information for the source.
- Schedule
When
true
, the source is re-crawled based on the frequency field in this object. When
false
, the source is not re-crawled. When
false
and connecting to Salesforce, the source is crawled annually.
The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
The crawl schedule in the specified time_zone.
five_minutes
: Runs every five minutes.
hourly
: Runs every hour.
daily
: Runs every day between 00:00 and 06:00.
weekly
: Runs every week on Sunday between 00:00 and 06:00.
monthly
: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values: [
daily
,weekly
,monthly
,five_minutes
,hourly
]
The options object defines which items to crawl from the source system.
- Options
Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to
box
.
- Folders
The Box user ID of the user who owns the folder to crawl.
The Box folder ID of the folder to crawl.
The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
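A Box source options object built from these fields might look like the following sketch. The IDs are placeholders, and the property names (owner_user_id, folder_id, limit) should be verified against the v1 API reference:

```json
{
  "folders": [
    { "owner_user_id": "box-user-id", "folder_id": "box-folder-id", "limit": 100 }
  ]
}
```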
Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to
salesforce
.
- Objects
The name of the Salesforce document object to crawl. For example,
case
.
The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
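A Salesforce source options object built from these fields might look like the following sketch. The object name case comes from the description above; the limit property mirrors the site_collections example earlier in this page and should be verified against the v1 API reference:

```json
{
  "objects": [
    { "name": "case", "limit": 1000 }
  ]
}
```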
Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to
sharepoint
.
- SiteCollections
The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
The maximum number of documents to crawl for this site col