Introduction
IBM Watson® Discovery is a cognitive search and content analytics engine that you can add to applications to identify patterns, trends, and actionable insights to drive better decision-making. Securely unify structured and unstructured data with pre-enriched content, and use a simplified query language to eliminate the need for manual filtering of results.
This documentation describes Java SDK major version 11. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Node SDK major version 8. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Python SDK major version 7. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Ruby SDK major version 2. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes .NET Standard SDK major version 7. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Go SDK major version 2. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Swift SDK major version 5. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Unity SDK major version 5. For more information about how to update your code from the previous version, see the migration guide.
The IBM Watson Unity SDK has the following requirements.
- The SDK requires Unity version 2018.2 or later to support Transport Layer Security (TLS) 1.2.
- Set the project settings for both the Scripting Runtime Version and the Api Compatibility Level to .NET 4.x Equivalent. For more information, see TLS 1.0 support.
- The SDK doesn't support WebGL projects. Change your build settings to any platform except WebGL.
For more information about how to install and configure the SDK and SDK Core, see https://github.com/watson-developer-cloud/unity-sdk.
The code examples on this tab use the client library that is provided for Java.
Maven
<dependency>
<groupId>com.ibm.watson</groupId>
<artifactId>ibm-watson</artifactId>
<version>11.0.0</version>
</dependency>
Gradle
compile 'com.ibm.watson:ibm-watson:11.0.0'
GitHub
The code examples on this tab use the client library that is provided for Node.js.
Installation
npm install ibm-watson@^8.0.0
GitHub
The code examples on this tab use the client library that is provided for Python.
Installation
pip install --upgrade "ibm-watson>=7.0.0"
GitHub
The code examples on this tab use the client library that is provided for Ruby.
Installation
gem install ibm_watson
GitHub
The code examples on this tab use the client library that is provided for Go.
Installation
go get -u github.com/watson-developer-cloud/go-sdk/v2
GitHub
The code examples on this tab use the client library that is provided for Swift.
Cocoapods
pod 'IBMWatsonDiscoveryV2', '~> 5.0.0'
Carthage
github "watson-developer-cloud/swift-sdk" ~> 5.0.0
Swift Package Manager
.package(url: "https://github.com/watson-developer-cloud/swift-sdk", from: "5.0.0")
GitHub
The code examples on this tab use the client library that is provided for .NET Standard.
Package Manager
Install-Package IBM.Watson.Discovery.v2 -Version 7.0.0
.NET CLI
dotnet add package IBM.Watson.Discovery.v2 --version 7.0.0
PackageReference
<PackageReference Include="IBM.Watson.Discovery.v2" Version="7.0.0" />
GitHub
The code examples on this tab use the client library that is provided for Unity.
GitHub
IBM Cloud URLs
The base URLs come from the service instance. To find the URL, view the service credentials by clicking the name of the service in the Resource list. Use the value of the URL. Add the method to form the complete API endpoint for your request.
The following example URL represents a Discovery instance that is hosted in Washington DC:
https://api.us-east.discovery.watson.cloud.ibm.com/instances/6bbda3b3-d572-45e1-8c54-22d6ed9e52c2
The following URLs represent the base URLs for Discovery. When you call the API, use the URL that corresponds to the location of your service instance.
- Dallas:
https://api.us-south.discovery.watson.cloud.ibm.com
- Washington DC:
https://api.us-east.discovery.watson.cloud.ibm.com
- Frankfurt:
https://api.eu-de.discovery.watson.cloud.ibm.com
- Sydney:
https://api.au-syd.discovery.watson.cloud.ibm.com
- Tokyo:
https://api.jp-tok.discovery.watson.cloud.ibm.com
- London:
https://api.eu-gb.discovery.watson.cloud.ibm.com
- Seoul:
https://api.kr-seo.discovery.watson.cloud.ibm.com
Set the correct service URL by calling the setServiceUrl() method of the service instance.
Set the correct service URL by specifying the serviceUrl parameter when you create the service instance.
Set the correct service URL by calling the set_service_url() method of the service instance.
Set the correct service URL by specifying the service_url property of the service instance.
Set the correct service URL by calling the SetServiceURL() method of the service instance.
Set the correct service URL by setting the serviceURL property of the service instance.
Set the correct service URL by calling the SetServiceUrl() method of the service instance.
Set the correct service URL by calling the SetServiceUrl() method of the service instance.
Dallas API endpoint example for services managed on IBM Cloud
curl -X {request_method} -u "apikey:{apikey}" "https://api.us-south.discovery.watson.cloud.ibm.com/instances/{instance_id}"
Your service instance might not use this URL
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
Discovery discovery = new Discovery("{version}", authenticator);
discovery.setServiceUrl("https://api.us-east.discovery.watson.cloud.ibm.com");
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { IamAuthenticator } = require('ibm-watson/auth');
const discovery = new DiscoveryV2({
version: '{version}',
authenticator: new IamAuthenticator({
apikey: '{apikey}',
}),
serviceUrl: 'https://api.us-east.discovery.watson.cloud.ibm.com',
});
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(
version='{version}',
authenticator=authenticator
)
discovery.set_service_url('https://api.us-east.discovery.watson.cloud.ibm.com')
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
require "ibm_watson/authenticators"
require "ibm_watson/discovery_v2"
include IBMWatson
authenticator = Authenticators::IamAuthenticator.new(
apikey: "{apikey}"
)
discovery = DiscoveryV2.new(
version: "{version}",
authenticator: authenticator
)
discovery.service_url = "https://api.us-east.discovery.watson.cloud.ibm.com"
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
discovery, discoveryErr := discoveryv2.NewDiscoveryV2(options)
if discoveryErr != nil {
panic(discoveryErr)
}
discovery.SetServiceURL("https://api.us-east.discovery.watson.cloud.ibm.com")
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let discovery = Discovery(version: "{version}", authenticator: authenticator)
discovery.serviceURL = "https://api.us-east.discovery.watson.cloud.ibm.com"
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
IamAuthenticator authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("https://api.us-east.discovery.watson.cloud.ibm.com");
Default URL
https://api.us-south.discovery.watson.cloud.ibm.com
Example for the Washington DC location
var authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("https://api.us-east.discovery.watson.cloud.ibm.com");
Cloud Pak for Data URLs
For services installed on Cloud Pak for Data, the base URLs come from both the cluster and service instance.
You can find the base URL from the Cloud Pak for Data web client in the details page about the instance. Click the name of the service in your list of instances to see the URL.
Use that URL in your requests to Discovery v2. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.
Set the correct service URL by calling the setServiceUrl() method of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.
Set the correct service URL by specifying the serviceUrl parameter when you create the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.
Set the correct service URL by calling the set_service_url() method of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.
Set the correct service URL by setting the service_url property of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.
Set the correct service URL by calling the SetServiceURL() method of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.
Set the correct service URL by setting the serviceURL property of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.
Set the correct service URL by calling the SetServiceUrl() method of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.
Set the correct service URL by calling the SetServiceUrl() method of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.
Endpoint example for Cloud Pak for Data
curl -X {request_method} -H "Authorization: Bearer {token}" "https://{cpd_cluster_host}{:port}/discovery/{deployment_id}/instances/{instance_id}/api"
Endpoint example for Cloud Pak for Data
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}", "{username}", "{password}");
Discovery discovery = new Discovery("{version}", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{deployment_id}/instances/{instance_id}/api");
Endpoint example for Cloud Pak for Data
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');
const discovery = new DiscoveryV2({
version: '{version}',
authenticator: new CloudPakForDataAuthenticator({
username: '{username}',
password: '{password}',
url: 'https://{cpd_cluster_host}{:port}',
}),
serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{deployment_id}/instances/{instance_id}/api',
});
Endpoint example for Cloud Pak for Data
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator
authenticator = CloudPakForDataAuthenticator(
'{username}',
'{password}',
'https://{cpd_cluster_host}{:port}'
)
discovery = DiscoveryV2(
version='{version}',
authenticator=authenticator
)
discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{deployment_id}/instances/{instance_id}/api')
Endpoint example for Cloud Pak for Data
require "ibm_watson/authenticators"
require "ibm_watson/discovery_v2"
include IBMWatson
authenticator = Authenticators::CloudPakForDataAuthenticator.new(
username: "{username}",
password: "{password}",
url: "https://{cpd_cluster_host}{:port}"
)
discovery = DiscoveryV2.new(
version: "{version}",
authenticator: authenticator
)
discovery.service_url = "https://{cpd_cluster_host}{:port}/discovery/{deployment_id}/instances/{instance_id}/api"
Endpoint example for Cloud Pak for Data
discovery, discoveryErr := discoveryv2.NewDiscoveryV2(options)
if discoveryErr != nil {
panic(discoveryErr)
}
discovery.SetServiceURL("https://{cpd_cluster_host}{:port}/discovery/{deployment_id}/instances/{instance_id}/api")
Endpoint example for Cloud Pak for Data
let authenticator = CloudPakForDataAuthenticator(username: "{username}", password: "{password}", url: "https://{cpd_cluster_host}{:port}")
let discovery = Discovery(version: "{version}", authenticator: authenticator)
discovery.serviceURL = "https://{cpd_cluster_host}{:port}/discovery/{deployment_id}/instances/{instance_id}/api"
Endpoint example for Cloud Pak for Data
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
url: "https://{cpd_cluster_host}{:port}",
username: "{username}",
password: "{password}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{deployment_id}/instances/{instance_id}/api");
Endpoint example for Cloud Pak for Data
var authenticator = new CloudPakForDataAuthenticator(
url: "https://{cpd_cluster_host}{:port}",
username: "{username}",
password: "{password}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{deployment_id}/instances/{instance_id}/api");
Disabling SSL verification
All Watson services use Secure Sockets Layer (SSL) (or Transport Layer Security (TLS)) for secure connections between the client and server. The connection is verified against the local certificate store to ensure authentication, integrity, and confidentiality.
If you use a self-signed certificate, you need to disable SSL verification to make a successful connection.
Enabling SSL verification is highly recommended. Disabling SSL jeopardizes the security of the connection and data. Disable SSL only if necessary, and take steps to enable SSL as soon as possible.
To disable SSL verification for a curl request, use the --insecure (-k) option with the request.
To disable SSL verification, create an HttpConfigOptions object and set the disableSslVerification property to true. Then, pass the object to the service instance by using the configureClient method.
To disable SSL verification, set the disableSslVerification parameter to true when you create the service instance.
To disable SSL verification, call the set_disable_ssl_verification method on the service instance and specify True.
To disable SSL verification, set the disable_ssl_verification parameter to true in the configure_http_client() method for the service instance.
To disable SSL verification, call the DisableSSLVerification method on the service instance.
To disable SSL verification, call the disableSSLVerification() method on the service instance. You cannot disable SSL verification on Linux.
To disable SSL verification, call the DisableSslVerification method with true on the service instance.
To disable SSL verification, set the DisableSslVerification property to true on the service instance.
Example to disable SSL verification with a service managed on IBM Cloud. Replace {apikey} and {url} with your service credentials.
curl -k -X {request_method} -u "apikey:{apikey}" "{url}/{method}"
Example to disable SSL verification with a service managed on IBM Cloud
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
Discovery discovery = new Discovery("{version}", authenticator);
discovery.setServiceUrl("{url}");
HttpConfigOptions configOptions = new HttpConfigOptions.Builder()
.disableSslVerification(true)
.build();
discovery.configureClient(configOptions);
Example to disable SSL verification with a service managed on IBM Cloud
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { IamAuthenticator } = require('ibm-watson/auth');
const discovery = new DiscoveryV2({
version: '{version}',
authenticator: new IamAuthenticator({
apikey: '{apikey}',
}),
serviceUrl: '{url}',
disableSslVerification: true,
});
Example to disable SSL verification with a service managed on IBM Cloud
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(
version='{version}',
authenticator=authenticator
)
discovery.set_service_url('{url}')
discovery.set_disable_ssl_verification(True)
Example to disable SSL verification with a service managed on IBM Cloud
require "ibm_watson/authenticators"
require "ibm_watson/discovery_v2"
include IBMWatson
authenticator = Authenticators::IamAuthenticator.new(
apikey: "{apikey}"
)
discovery = DiscoveryV2.new(
version: "{version}",
authenticator: authenticator
)
discovery.service_url = "{url}"
discovery.configure_http_client(disable_ssl_verification: true)
Example to disable SSL verification with a service managed on IBM Cloud
discovery, discoveryErr := discoveryv2.NewDiscoveryV2(options)
if discoveryErr != nil {
panic(discoveryErr)
}
discovery.SetServiceURL("{url}")
discovery.DisableSSLVerification()
Example to disable SSL verification with a service managed on IBM Cloud
let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let discovery = Discovery(version: "{version}", authenticator: authenticator)
discovery.serviceURL = "{url}"
discovery.disableSSLVerification()
Example to disable SSL verification with a service managed on IBM Cloud
IamAuthenticator authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.DisableSslVerification(true);
Example to disable SSL verification with a service managed on IBM Cloud
var authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.DisableSslVerification = true;
Example to disable SSL verification with an installed service
curl -k -X {request_method} -H "Authorization: Bearer {token}" "{url}/v2/{method}"
Example to disable SSL verification with an installed service
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}", "{username}", "{password}");
Discovery discovery = new Discovery("{version}", authenticator);
discovery.setServiceUrl("{url}";
HttpConfigOptions configOptions = new HttpConfigOptions.Builder()
.disableSslVerification(true)
.build();
discovery.configureClient(configOptions);
Example to disable SSL verification with an installed service
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');
const discovery = new DiscoveryV2({
version: '{version}',
authenticator: new CloudPakForDataAuthenticator({
username: '{username}',
password: '{password}',
url: 'https://{cpd_cluster_host}{:port}',
}),
serviceUrl: '{url}',
disableSslVerification: true,
});
Example to disable SSL verification with an installed service
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator
authenticator = CloudPakForDataAuthenticator(
'{username}',
'{password}',
'https://{cpd_cluster_host}{:port}'
)
discovery = DiscoveryV2(
version='{version}',
authenticator=authenticator
)
discovery.set_service_url('{url}')
discovery.set_disable_ssl_verification(True)
Example to disable SSL verification with an installed service
require "ibm_watson/authenticators"
require "ibm_watson/discovery_v2"
include IBMWatson
authenticator = Authenticators::CloudPakForDataAuthenticator.new(
username: "{username}",
password: "{password}",
url: "https://{cpd_cluster_host}{:port}"
)
discovery = DiscoveryV2.new(
version: "{version}",
authenticator: authenticator
)
discovery.service_url = "{url}"
discovery.configure_http_client(disable_ssl_verification: true)
Example to disable SSL verification with an installed service
discovery, discoveryErr := discoveryv2.NewDiscoveryV2(options)
if discoveryErr != nil {
panic(discoveryErr)
}
discovery.SetServiceURL("{url}")
discovery.DisableSSLVerification()
Example to disable SSL verification with an installed service
let authenticator = WatsonCloudPakForDataAuthenticator(username: "{username}", password: "{password}", url: "https://{cpd_cluster_host}{:port}")
let discovery = Discovery(version: "{version}", authenticator: authenticator)
discovery.serviceURL = "{url}"
discovery.disableSSLVerification()
Example to disable SSL verification with an installed service
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
url: "https://{cpd_cluster_host}{:port}",
username: "{username}",
password: "{password}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.DisableSslVerification(true);
Example to disable SSL verification with an installed service
var authenticator = new CloudPakForDataAuthenticator(
url: "https://{cpd_cluster_host}{:port}",
username: "{username}",
password: "{password}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.DisableSslVerification = true;
IBM Cloud
For IBM Cloud instances, you authenticate to the API by using IBM Cloud Identity and Access Management (IAM).
You can pass either a bearer token in an authorization header or an API key. Tokens support authenticated requests without embedding service credentials in every call. API keys use basic authentication. For more information, see Authenticating to Watson services.
- For testing and development, you can pass an API key directly.
- For production use, unless you use the Watson SDKs, use an IAM token.
If you pass in an API key, use apikey for the username and the value of the API key as the password. For example, if the API key is f5sAznhrKQyvBFFaZbtF60m5tzLbqWhyALQawBg5TjRI in the service credentials, include the credentials in your call like this:
curl -u "apikey:f5sAznhrKQyvBFFaZbtF60m5tzLbqWhyALQawBg5TjRI"
For IBM Cloud instances, the SDK provides initialization methods for each form of authentication.
- Use the API key to have the SDK manage the lifecycle of the access token. The SDK requests an access token, ensures that the access token is valid, and refreshes it if necessary.
- Use the access token to manage the lifecycle yourself. You must periodically refresh the token.
For more information, see IAM authentication with the SDK.
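To manage the access token yourself, you can construct the service with a bearer token instead of an API key. The following is a minimal sketch using the Python SDK; it assumes the BearerTokenAuthenticator from the IBM Cloud SDK core package, and the {token} placeholder is a token that you generate and refresh.
Example with a bearer token that you manage
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import BearerTokenAuthenticator
# You manage the token lifecycle: refresh the token before it expires and
# recreate the authenticator with the new value.
authenticator = BearerTokenAuthenticator('{token}')
discovery = DiscoveryV2(
version='{version}',
authenticator=authenticator
)
discovery.set_service_url('{url}')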
IBM Cloud. Replace {apikey} and {url} with your service credentials.
curl -X {request_method} -u "apikey:{apikey}" "{url}/v2/{method}"
IBM Cloud. SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
Discovery discovery = new Discovery("{version}", authenticator);
discovery.setServiceUrl("{url}");
IBM Cloud. SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { IamAuthenticator } = require('ibm-watson/auth');
const discovery = new DiscoveryV2({
version: '{version}',
authenticator: new IamAuthenticator({
apikey: '{apikey}',
}),
serviceUrl: '{url}',
});
IBM Cloud. SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(
version='{version}',
authenticator=authenticator
)
discovery.set_service_url('{url}')
IBM Cloud. SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
require "ibm_watson/authenticators"
require "ibm_watson/discovery_v2"
include IBMWatson
authenticator = Authenticators::IamAuthenticator.new(
apikey: "{apikey}"
)
discovery = DiscoveryV2.new(
version: "{version}",
authenticator: authenticator
)
discovery.service_url = "{url}"
IBM Cloud. SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
import (
"github.com/IBM/go-sdk-core/core"
"github.com/watson-developer-cloud/go-sdk/discoveryv2"
)
func main() {
authenticator := &core.IamAuthenticator{
ApiKey: "{apikey}",
}
options := &discoveryv2.DiscoveryV2Options{
Version: "{version}",
Authenticator: authenticator,
}
discovery, discoveryErr := discoveryv2.NewDiscoveryV2(options)
if discoveryErr != nil {
panic(discoveryErr)
}
discovery.SetServiceURL("{url}")
}
IBM Cloud. SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let discovery = Discovery(version: "{version}", authenticator: authenticator)
discovery.serviceURL = "{url}"
IBM Cloud. SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
IamAuthenticator authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
IBM Cloud. SDK managing the IAM token. Replace {apikey}, {version}, and {url}.
var authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
Cloud Pak for Data
For Cloud Pak for Data, you pass a bearer token in an Authorization header to authenticate to the API. The token is associated with a username.
- For testing and development, you can use the bearer token that's displayed in the Cloud Pak for Data web client. To find this token, view the details for the service instance by clicking the name of the service in your list of instances. The details also include the service endpoint URL. Don't use this token in production because it does not expire.
- For production use, create a user in the Cloud Pak for Data web client to use for authentication. Generate a token from that user's credentials with the POST /v1/authorize method.
For more information, see the Get authorization token method of the Cloud Pak for Data API reference.
For Cloud Pak for Data instances, you can authenticate to the API by passing either username and password credentials or a bearer token that you generate. Username and password credentials use basic authentication, and the SDK manages the token lifecycle for you. Tokens are temporary security credentials; if you pass a token, you maintain the token lifecycle yourself.
For production use, create a user in the Cloud Pak for Data web client to use for authentication, and decide which authentication mechanism to use.
- To have the SDK manage the lifecycle of the token, use the username and password for that new user in your calls.
- To manage the lifecycle of the token yourself, generate a token from that user's credentials. Call the POST /v1/authorize method to generate the token, and then pass the token in an Authorization header in your calls. You can see an example of the method on the Curl tab.
For more information, see the Get authorization token method of the Cloud Pak for Data API reference.
Don't use the bearer token that's displayed in the web client for the instance except during testing and development because that token does not expire.
To find your value for {url}, view the details for the service instance by clicking the name of the service in your list of instances in the Cloud Pak for Data web client.
Cloud Pak for Data. Generating a bearer token.
Replace {cpd_cluster_host} and {port} with the details for the service instance. Replace {username} and {password} with your Cloud Pak for Data credentials.
curl -k -X POST -H "cache-control: no-cache" -H "Content-Type: application/json" -d "{\"username\":\"{username}\",\"password\":\"{password}\"}" "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize"
The response includes a token property.
Authenticating to the API. Replace {token} with your details.
curl -H "Authorization: Bearer {token}" "{url}/v2/{method}"
Cloud Pak for Data. SDK managing the token.
Replace {username} and {password} with your Cloud Pak for Data credentials. Replace {version} with the service version date. For {url}, see Endpoint URLs.
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
Discovery discovery = new Discovery("{version}", authenticator);
discovery.setServiceUrl("{url}");
Cloud Pak for Data. SDK managing the token.
Replace {username} and {password} with your Cloud Pak for Data credentials. Replace {version} with the service version date. For {url}, see Endpoint URLs.
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');
const discovery = new DiscoveryV2({
version: '{version}',
authenticator: new CloudPakForDataAuthenticator({
username: '{username}',
password: '{password}',
url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
}),
serviceUrl: '{url}',
});
Cloud Pak for Data. SDK managing the token.
Replace {username} and {password} with your Cloud Pak for Data credentials. Replace {version} with the service version date. For {url}, see Endpoint URLs.
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator
authenticator = CloudPakForDataAuthenticator(
'{username}',
'{password}',
'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize'
)
discovery = DiscoveryV2(
version='{version}',
authenticator=authenticator
)
discovery.set_service_url('{url}')
Cloud Pak for Data. SDK managing the token.
Replace {username} and {password} with your Cloud Pak for Data credentials. Replace {version} with the service version date. For {url}, see Endpoint URLs.
require "ibm_watson/authenticators"
require "ibm_watson/discovery_v2"
include IBMWatson
authenticator = Authenticators::CloudPakForDataAuthenticator.new(
username: "{username}",
password: "{password}",
url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize"
)
discovery = DiscoveryV2.new(
version: "{version}",
authenticator: authenticator
)
discovery.service_url = "{url}"
Cloud Pak for Data. SDK managing the token.
Replace {username} and {password} with your Cloud Pak for Data credentials. Replace {version} with the service version date. For {url}, see Endpoint URLs.
import (
"github.com/IBM/go-sdk-core/core"
"github.com/watson-developer-cloud/go-sdk/discoveryv2"
)
func main() {
authenticator := &core.CloudPakForDataAuthenticator{
URL: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
Username: "{username}",
Password: "{password}",
}
options := &discoveryv2.DiscoveryV2Options{
Version: "{version}",
Authenticator: authenticator,
}
discovery, discoveryErr := discoveryv2.NewDiscoveryV2(options)
if discoveryErr != nil {
panic(discoveryErr)
}
discovery.SetServiceURL("{url}")
}
Cloud Pak for Data. SDK managing the token.
Replace {username} and {password} with your Cloud Pak for Data credentials. Replace {version} with the service version date. For {url}, see Endpoint URLs.
let authenticator = WatsonCloudPakForDataAuthenticator(username: "{username}", password: "{password}", url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize")
let discovery = Discovery(version: "{version}", authenticator: authenticator)
discovery.serviceURL = "{url}"
Cloud Pak for Data. SDK managing the token.
Replace {username} and {password} with your Cloud Pak for Data credentials. Replace {version} with the service version date. For {cpd_cluster_host}, {port}, {release}, and {instance_id}, see Endpoint URLs.
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
username: "{username}",
password: "{password}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
Cloud Pak for Data. SDK managing the token.
Replace {username} and {password} with your Cloud Pak for Data credentials. Replace {version} with the service version date. For {cpd_cluster_host}, {port}, {release}, and {instance_id}, see Endpoint URLs.
var authenticator = new CloudPakForDataAuthenticator(
url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
username: "{username}",
password: "{password}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
Access between services
Your application might use more than one Watson service. You can grant access between services and you can grant access to more than one service for your applications.
For IBM Cloud services, the method to grant access between Watson services varies depending on the type of API key. For more information, see IAM access.
- To grant access between IBM Cloud services, create an authorization between the services. For more information, see Granting access between services.
- To grant access to your services by applications without using user credentials, create a service ID, add an API key, and assign access policies. For more information, see Creating and working with service IDs.
When you give a user ID access to multiple services, use an endpoint URL that includes the service instance ID (for example, https://api.us-south.discovery.watson.cloud.ibm.com/instances/6bbda3b3-d572-45e1-8c54-22d6ed9e52c2). You can find the instance ID in two places:
- By clicking the service instance row in the Resource list. The instance ID is the GUID in the details pane.
- By clicking the name of the service instance in the list and looking at the credentials URL.
If you don't see the instance ID in the URL, the credentials predate service IDs. Add new credentials from the Service credentials page and use those credentials.
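As an illustration, the following minimal Python sketch (assuming the same authentication pattern as the earlier examples) points the SDK at an instance-scoped endpoint URL; the instance ID is the example value from this section.
Example instance-scoped endpoint URL
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(
version='{version}',
authenticator=authenticator
)
# The path segment after /instances/ is the service instance ID.
discovery.set_service_url('https://api.us-south.discovery.watson.cloud.ibm.com/instances/6bbda3b3-d572-45e1-8c54-22d6ed9e52c2')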
Because the Cloud Pak for Data bearer token is associated with a username, you can use the token for all Cloud Pak for Data Watson services that are associated with that username.
Versioning
API requests require a version parameter that takes a date in the format version=YYYY-MM-DD. When the API is updated with any breaking changes, the service introduces a new version date for the API.
Send the version parameter with every API request. The service uses the API version for the date you specify, or the most recent version before that date. Don't default to the current date. Instead, specify a date that matches a version that is compatible with your app, and don't change it until your app is ready for a later version.
Specify the version to use on API requests with the version parameter when you create the service instance. The service uses the API version for the date you specify, or the most recent version before that date. Don't default to the current date. Instead, specify a date that matches a version that is compatible with your app, and don't change it until your app is ready for a later version.
This documentation describes the current version of Discovery, 2023-03-31. In some cases, differences in earlier versions are noted in the descriptions of parameters and response models.
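As a minimal Python sketch of this practice (following the authentication examples above), pin the version date when you create the service instance:
Example of pinning the version date
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('{apikey}')
# Pin a version date that your app was tested against; don't default to the current date.
discovery = DiscoveryV2(
version='2023-03-31',
authenticator=authenticator
)
discovery.set_service_url('{url}')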
Error handling
Discovery uses standard HTTP response codes to indicate whether a method completed successfully. HTTP response codes in the 2xx range indicate success. A response in the 4xx range is some sort of failure, and a response in the 5xx range usually indicates an internal system error that cannot be resolved by the user. Response codes are listed with the method.
ErrorResponse
Name | Description
---|---
code (integer) | The HTTP response code.
error (string) | General description of an error.
The Java SDK generates an exception for any unsuccessful method invocation. All methods that accept an argument can also throw an IllegalArgumentException.
Exception | Description
---|---
IllegalArgumentException | An invalid argument was passed to the method.
When the Java SDK receives an error response from the Discovery service, it generates an exception from the com.ibm.watson.developer_cloud.service.exception package. All service exceptions contain the following fields.
Field | Description
---|---
statusCode | The HTTP response code that is returned.
message | A message that describes the error.
When the Node SDK receives an error response from the Discovery service, it creates an Error object with information that describes the error that occurred. This error object is passed as the first parameter to the callback function for the method. The contents of the error object are as shown in the following table.
Error
Field | Description
---|---
code | The HTTP response code that is returned.
message | A message that describes the error.
The Python SDK generates an exception for any unsuccessful method invocation. When the Python SDK receives an error response from the Discovery service, it generates an ApiException with the following fields.
Field | Description
---|---
code | The HTTP response code that is returned.
message | A message that describes the error.
info | A dictionary of additional information about the error.
When the Ruby SDK receives an error response from the Discovery service, it generates an ApiException with the following fields.
Field | Description
---|---
code | The HTTP response code that is returned.
message | A message that describes the error.
info | A dictionary of additional information about the error.
The Go SDK generates an error for any unsuccessful service instantiation and method invocation. You can check for the error immediately. The contents of the error object are as shown in the following table.
Error
Field | Description
---|---
code | The HTTP response code that is returned.
message | A message that describes the error.
The Swift SDK returns a WatsonError in the completionHandler for any unsuccessful method invocation. This error type is an enum that conforms to LocalizedError and contains an errorDescription property that returns an error message. Some of the WatsonError cases contain associated values that reveal more information about the error.
Field | Description
---|---
errorDescription | A message that describes the error.
When the .NET Standard SDK receives an error response from the Discovery service, it generates a ServiceResponseException with the following fields.
Field | Description
---|---
Message | A message that describes the error.
CodeDescription | The HTTP response code that is returned.
When the Unity SDK receives an error response from the Discovery service, it generates an IBMError with the following fields.
Field | Description
---|---
Url | The URL that generated the error.
StatusCode | The HTTP response code returned.
ErrorMessage | A message that describes the error.
Response | The contents of the response from the server.
ResponseHeaders | A dictionary of headers returned by the request.
Example error handling
try {
// Invoke a method
} catch (NotFoundException e) {
// Handle Not Found (404) exception
} catch (RequestTooLargeException e) {
// Handle Request Too Large (413) exception
} catch (ServiceResponseException e) {
// Base class for all exceptions caused by error responses from the service
System.out.println("Service returned status code "
+ e.getStatusCode() + ": " + e.getMessage());
}
Example error handling
discovery.method(params)
.catch(err => {
console.log('error:', err);
});
Example error handling
from ibm_watson import ApiException
try:
    # Invoke a method
    pass
except ApiException as ex:
    print("Method failed with status code " + str(ex.code) + ": " + ex.message)
Example error handling
require "ibm_watson"
begin
# Invoke a method
rescue IBMWatson::ApiException => ex
print "Method failed with status code #{ex.code}: #{ex.error}"
end
Example error handling
import "github.com/watson-developer-cloud/go-sdk/discoveryv2"
// Instantiate a service
discovery, discoveryErr := discoveryv2.NewDiscoveryV2(options)
// Check for errors
if discoveryErr != nil {
panic(discoveryErr)
}
// Call a method
result, _, responseErr := discovery.MethodName(&methodOptions)
// Check for errors
if responseErr != nil {
panic(responseErr)
}
Example error handling
discovery.method() {
response, error in
if let error = error {
switch error {
case let .http(statusCode, message, metadata):
switch statusCode {
case .some(404):
// Handle Not Found (404) exception
print("Not found")
case .some(413):
// Handle Request Too Large (413) exception
print("Payload too large")
default:
if let statusCode = statusCode {
print("Error - code: \(statusCode), \(message ?? "")")
}
}
default:
print(error.localizedDescription)
}
return
}
guard let result = response?.result else {
print(error?.localizedDescription ?? "unknown error")
return
}
print(result)
}
Example error handling
try
{
// Invoke a method
}
catch(ServiceResponseException e)
{
Console.WriteLine("Error: " + e.Message);
}
catch (Exception e)
{
Console.WriteLine("Error: " + e.Message);
}
Example error handling
// Invoke a method
discovery.MethodName(Callback, Parameters);
// Check for errors
private void Callback(DetailedResponse<ExampleResponse> response, IBMError error)
{
if (error == null)
{
Log.Debug("ExampleCallback", "Response received: {0}", response.Response);
}
else
{
Log.Debug("ExampleCallback", "Error received: {0}, {1}, {3}", error.StatusCode, error.ErrorMessage, error.Response);
}
}
Additional headers
Some Watson services accept special parameters in headers that are passed with the request.
You can pass request header parameters in all requests or in a single request to the service.
To pass a request header, use the --header (-H) option with a curl request.
To pass header parameters with every request, use the setDefaultHeaders method of the service object. See Data collection for an example use of this method.
To pass header parameters in a single request, use the addHeader method as a modifier on the request before you execute it.
To pass header parameters with every request, specify the headers parameter when you create the service object. See Data collection for an example use of this method.
To pass header parameters in a single request, use the headers method as a modifier on the request before you execute it.
To pass header parameters with every request, use the set_default_headers method of the service object. See Data collection for an example use of this method.
To pass header parameters in a single request, include headers as a dict in the request.
To pass header parameters with every request, use the add_default_headers method of the service object. See Data collection for an example use of this method.
To pass header parameters in a single request, specify the headers method as a chainable method in the request.
To pass header parameters with every request, use the SetDefaultHeaders method of the service object. See Data collection for an example use of this method.
To pass header parameters in a single request, include Headers as a map in the request.
To pass header parameters with every request, add them to the defaultHeaders property of the service object. See Data collection for an example use of this method.
To pass header parameters in a single request, pass the headers parameter to the request method.
To pass header parameters in a single request, use the WithHeader() method as a modifier on the request before you execute it. See Data collection for an example use of this method.
To pass header parameters in a single request, use the WithHeader() method as a modifier on the request before you execute it.
Example header parameter in a request
curl -X {request_method} -H "Request-Header: {header_value}" "{url}/v2/{method}"
Example header parameter in a request
ReturnType returnValue = discovery.methodName(parameters)
.addHeader("Custom-Header", "{header_value}")
.execute();
Example header parameter in a request
const parameters = {
{parameters}
};
discovery.methodName(
parameters,
headers: {
'Custom-Header': '{header_value}'
})
.then(result => {
console.log(result);
})
.catch(err => {
console.log('error:', err);
});
Example header parameter in a request
response = discovery.methodName(
parameters,
headers = {
'Custom-Header': '{header_value}'
})
Example header parameter in a request
response = discovery.headers(
"Custom-Header" => "{header_value}"
).methodName(parameters)
Example header parameter in a request
result, _, responseErr := discovery.MethodName(
&methodOptions{
Headers: map[string]string{
"Accept": "application/json",
},
},
)
Example header parameter in a request
let customHeader: [String: String] = ["Custom-Header": "{header_value}"]
discovery.methodName(parameters, headers: customHeader) {
response, error in
}
Example header parameter in a request for a service managed on IBM Cloud
IamAuthenticator authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.WithHeader("Custom-Header", "header_value");
Example header parameter in a request for an installed service
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
url: "https://{cpd_cluster_host}{:port}",
username: "{username}",
password: "{password}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");
discovery.WithHeader("Custom-Header", "header_value");
Example header parameter in a request for a service managed on IBM Cloud
var authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.WithHeader("Custom-Header", "header_value");
Example header parameter in a request for an installed service
var authenticator = new CloudPakForDataAuthenticator(
url: "https://{cpd_cluster_host}{:port}",
username: "{username}",
password: "{password}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");
discovery.WithHeader("Custom-Header", "header_value");
Response details
The Discovery service might return information to the application in response headers.
To access all response headers that the service returns, include the --include (-i) option with a curl request. To see detailed response data for the request, including request headers, response headers, and extra debugging information, include the --verbose (-v) option with the request.
Example request to access response headers
curl -X {request_method} {authentication_method} --include "{url}/v2/{method}"
To access information in the response headers, use one of the request methods that returns details with the response: executeWithDetails(), enqueueWithDetails(), or rxWithDetails(). These methods return a Response<T> object, where T is the expected response model. Use the getResult() method to access the response object for the method, and use the getHeaders() method to access information in response headers.
Example request to access response headers
Response<ReturnType> response = discovery.methodName(parameters)
.executeWithDetails();
// Access response from methodName
ReturnType returnValue = response.getResult();
// Access information in response headers
Headers responseHeaders = response.getHeaders();
All response data is available in the Response<T> object that is returned by each method. To access information in the response object, use the following properties.
Property | Description
---|---
result | Returns the response for the service-specific method.
headers | Returns the response header information.
status | Returns the HTTP status code.
Example request to access response headers
discovery.methodName(parameters)
.then(response => {
console.log(response.headers);
})
.catch(err => {
console.log('error:', err);
});
The return value from all service methods is a DetailedResponse object. To access information in the result object or response headers, use the following methods.
DetailedResponse
Method | Description
---|---
get_result() | Returns the response for the service-specific method.
get_headers() | Returns the response header information.
get_status_code() | Returns the HTTP status code.
Example request to access response headers
discovery.set_detailed_response(True)
response = discovery.methodName(parameters)
# Access response from methodName
print(json.dumps(response.get_result(), indent=2))
# Access information in response headers
print(response.get_headers())
# Access HTTP response status
print(response.get_status_code())
The return value from all service methods is a DetailedResponse object. To access information in the response object, use the following properties.
DetailedResponse
Property | Description
---|---
result | Returns the response for the service-specific method.
headers | Returns the response header information.
status | Returns the HTTP status code.
Example request to access response headers
response = discovery.methodName(parameters)
# Access response from methodName
print response.result
# Access information in response headers
print response.headers
# Access HTTP response status
print response.status
The return value from all service methods is a DetailedResponse object. To access information in the response object or response headers, use the following methods.
DetailedResponse
Method | Description
---|---
GetResult() | Returns the response for the service-specific method.
GetHeaders() | Returns the response header information.
GetStatusCode() | Returns the HTTP status code.
Example request to access response headers
import (
"github.com/IBM/go-sdk-core/core"
"github.com/watson-developer-cloud/go-sdk/discoveryv2"
)
result, response, responseErr := discovery.MethodName(
&methodOptions{})
// Access result
core.PrettyPrint(response.GetResult(), "Result ")
// Access response headers
core.PrettyPrint(response.GetHeaders(), "Headers ")
// Access status code
core.PrettyPrint(response.GetStatusCode(), "Status Code ")
All response data is available in the WatsonResponse<T> object that is returned in each method's completionHandler.
Example request to access response headers
discovery.methodName(parameters) {
response, error in
guard let result = response?.result else {
print(error?.localizedDescription ?? "unknown error")
return
}
print(result) // The data returned by the service
print(response?.statusCode)
print(response?.headers)
}
The response contains fields for response headers, response JSON, and the status code.
DetailedResponse
Property | Description
---|---
Result | Returns the result for the service-specific method.
Response | Returns the raw JSON response for the service-specific method.
Headers | Returns the response header information.
StatusCode | Returns the HTTP status code.
Example request to access response headers
var results = discovery.MethodName(parameters);
var result = results.Result; // The result object
var responseHeaders = results.Headers; // The response headers
var responseJson = results.Response; // The raw response JSON
var statusCode = results.StatusCode; // The response status code
The response contains fields for response headers, response JSON, and the status code.
DetailedResponse
Property | Description
---|---
Result | Returns the result for the service-specific method.
Response | Returns the raw JSON response for the service-specific method.
Headers | Returns the response header information.
StatusCode | Returns the HTTP status code.
Example request to access response headers
private void Example()
{
discovery.MethodName(Callback, Parameters);
}
private void Callback(DetailedResponse<ResponseType> response, IBMError error)
{
var result = response.Result; // The result object
var responseHeaders = response.Headers; // The response headers
var responseJson = response.Response; // The raw response JSON
var statusCode = response.StatusCode; // The response status code
}
Data collection (IBM Cloud)
By default, Discovery service instances managed on IBM Cloud that are not part of Premium plans collect data about API requests and their results. This data is collected only to improve the services for future users. The collected data is not shared or made public. Data is not collected for services that are part of Premium plans.
To prevent IBM usage of your data for an API request, set the X-Watson-Learning-Opt-Out header parameter to true. You can also disable request logging at the account level. For more information, see Controlling request logging for Watson services.
You must set the header on each request that you do not want IBM to access for general service improvements.
You can set the header by using the setDefaultHeaders method of the service object.
You can set the header by using the headers parameter when you create the service object.
You can set the header by using the set_default_headers method of the service object.
You can set the header by using the add_default_headers method of the service object.
You can set the header by using the SetDefaultHeaders method of the service object.
You can set the header by adding it to the defaultHeaders property of the service object.
You can set the header by using the WithHeader() method of the service object.
Example request with a service managed on IBM Cloud
curl -u "apikey:{apikey}" -H "X-Watson-Learning-Opt-Out: true" "{url}/{method}"
Example request with a service managed on IBM Cloud
Map<String, String> headers = new HashMap<String, String>();
headers.put("X-Watson-Learning-Opt-Out", "true");
discovery.setDefaultHeaders(headers);
Example request with a service managed on IBM Cloud
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { IamAuthenticator } = require('ibm-watson/auth');
const discovery = new DiscoveryV2({
version: '{version}',
authenticator: new IamAuthenticator({
apikey: '{apikey}',
}),
serviceUrl: '{url}',
headers: {
'X-Watson-Learning-Opt-Out': 'true'
}
});
Example request with a service managed on IBM Cloud
discovery.set_default_headers({'x-watson-learning-opt-out': "true"})
Example request with a service managed on IBM Cloud
discovery.add_default_headers(headers: {"x-watson-learning-opt-out" => "true"})
Example request with a service managed on IBM Cloud
import "net/http"
headers := http.Header{}
headers.Add("x-watson-learning-opt-out", "true")
discovery.SetDefaultHeaders(headers)
Example request with a service managed on IBM Cloud
discovery.defaultHeaders["X-Watson-Learning-Opt-Out"] = "true"
Example request with a service managed on IBM Cloud
IamAuthenticator authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
DiscoveryService discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.WithHeader("X-Watson-Learning-Opt-Out", "true");
Example request with a service managed on IBM Cloud
var authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var discovery = new DiscoveryService("{version}", authenticator);
discovery.SetServiceUrl("{url}");
discovery.WithHeader("X-Watson-Learning-Opt-Out", "true");
Synchronous and asynchronous requests
The Java SDK supports both synchronous (blocking) and asynchronous (non-blocking) execution of service methods. All service methods implement the ServiceCall interface.
- To call a method synchronously, use the execute method of the ServiceCall interface. You can call the execute method directly from an instance of the service.
- To call a method asynchronously, use the enqueue method of the ServiceCall interface to receive a callback when the response arrives. The ServiceCallback interface of the method's argument provides onResponse and onFailure methods that you override to handle the callback.
The Ruby SDK supports both synchronous (blocking) and asynchronous (non-blocking) execution of service methods. All service methods implement the Concurrent::Async module. When you use the synchronous or asynchronous methods, an IVar object is returned. You access the DetailedResponse object by calling ivar_object.value.
For more information about the IVar object, see the IVar class docs.
- To call a method synchronously, either call the method directly or use the .await chainable method of the Concurrent::Async module. Calling a method directly (without .await) returns a DetailedResponse object.
- To call a method asynchronously, use the .async chainable method of the Concurrent::Async module.
You can call the .await and .async methods directly from an instance of the service.
Example synchronous request
ReturnType returnValue = discovery.method(parameters).execute();
Example asynchronous request
discovery.method(parameters).enqueue(new ServiceCallback<ReturnType>() {
@Override public void onResponse(ReturnType response) {
. . .
}
@Override public void onFailure(Exception e) {
. . .
}
});
Example synchronous request
response = discovery.method_name(parameters)
or
response = discovery.await.method_name(parameters)
Example asynchronous request
response = discovery.async.method_name(parameters)
Methods
List projects
Lists existing projects for this instance.
Lists existing projects for this instance.
Lists existing projects for this instance.
Lists existing projects for this instance.
Lists existing projects for this instance.
GET /v2/projects
ServiceCall<ListProjectsResponse> listProjects(ListProjectsOptions listProjectsOptions)
listProjects(params)
list_projects(
self,
**kwargs,
) -> DetailedResponse
ListProjects()
Request
Use the ListProjectsOptions.Builder
to create a ListProjectsOptions
object that contains the parameter values for the listProjects
method.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
parameters
parameters
parameters
curl {auth} "{url}/v2/projects?version=2023-03-31"
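The following Python SDK sketch is equivalent to the curl request above; it assumes an IAM API key and the service URL for your instance, following the authentication pattern shown in the earlier examples.

import json
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Authenticate and point the client at your Discovery instance.
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

# list_projects() takes no required parameters and returns a DetailedResponse;
# get_result() extracts the response body as a dict.
projects = discovery.list_projects().get_result()
print(json.dumps(projects, indent=2))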
Response
A list of projects in this instance.
An array of project details.
A list of projects in this instance.
An array of project details.
- projects
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancyTrainingStatus
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
A list of projects in this instance.
An array of project details.
- projects
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancy_training_status
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
A list of projects in this instance.
An array of project details.
- projects
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancy_training_status
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
A list of projects in this instance.
An array of project details.
- Projects
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- RelevancyTrainingStatus
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Status Code
Successful response.
Bad request.
{ "projects": [ { "project_id": "fe119406-4111-4cd8-8418-6f0074ee5a12", "type": "document_retrieval", "name": "Sample Project", "collection_count": 1 }, { "project_id": "b1722e7b-0b42-4fce-a6a4-d7cd8a4895f1", "type": "document_retrieval", "name": "My search bar", "collection_count": 5 }, { "project_id": "c8a8f4f1-9bcb-4035-85ae-5ec2336f2e62", "type": "conversational_search", "name": "Tutorial project", "collection_count": 1 } ] }
{ "projects": [ { "project_id": "fe119406-4111-4cd8-8418-6f0074ee5a12", "type": "document_retrieval", "name": "Sample Project", "collection_count": 1 }, { "project_id": "b1722e7b-0b42-4fce-a6a4-d7cd8a4895f1", "type": "document_retrieval", "name": "My search bar", "collection_count": 5 }, { "project_id": "c8a8f4f1-9bcb-4035-85ae-5ec2336f2e62", "type": "conversational_search", "name": "Tutorial project", "collection_count": 1 } ] }
Create a project
Create a new project for this instance.
Create a new project for this instance.
Create a new project for this instance.
Create a new project for this instance.
Create a new project for this instance.
POST /v2/projects
ServiceCall<ProjectDetails> createProject(CreateProjectOptions createProjectOptions)
createProject(params)
create_project(
self,
name: str,
type: str,
*,
default_query_parameters: 'DefaultQueryParams' = None,
**kwargs,
) -> DetailedResponse
CreateProject(string name, string type, DefaultQueryParams defaultQueryParameters = null)
Request
Use the CreateProjectOptions.Builder
to create a CreateProjectOptions
object that contains the parameter values for the createProject
method.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
An object that represents the project to be created.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only. The Intelligent Document Processing (IDP) project type is available from IBM Cloud-managed instances only.
Allowable values: [intelligent_document_processing, document_retrieval, conversational_search, content_intelligence, content_mining, other]
Default query parameters for this project.
The createProject options.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Allowable values: [document_retrieval, conversational_search, content_intelligence, content_mining, other]
Default query parameters for this project.
- defaultQueryParameters
An array of collection identifiers to query. If empty or omitted, all collections in the project are queried.
Default settings configuration for passage search options.
- passages
When true, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When true, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- tableResults
When true, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
Default: 0
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- suggestedRefinements
When true, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When true, spelling suggestions for the query are returned by default.
When true, highlights for the query are returned by default.
The number of document results returned by default.
A comma-separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
parameters
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Allowable values: [document_retrieval, conversational_search, content_intelligence, content_mining, other]
Default query parameters for this project.
- defaultQueryParameters
An array of collection identifiers to query. If empty or omitted, all collections in the project are queried.
Default settings configuration for passage search options.
- passages
When true, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When true, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- table_results
When true, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
Default: 0
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- suggested_refinements
When true, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When true, spelling suggestions for the query are returned by default.
When true, highlights for the query are returned by default.
The number of document results returned by default.
A comma-separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
parameters
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Allowable values: [document_retrieval, conversational_search, content_intelligence, content_mining, other]
Default query parameters for this project.
- default_query_parameters
An array of collection identifiers to query. If empty or omitted, all collections in the project are queried.
Default settings configuration for passage search options.
- passages
When true, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When true, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- table_results
When true, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
Default: 0
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- suggested_refinements
When true, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When true, spelling suggestions for the query are returned by default.
When true, highlights for the query are returned by default.
The number of document results returned by default.
A comma-separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
parameters
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Allowable values: [document_retrieval, conversational_search, content_intelligence, content_mining, other]
Default query parameters for this project.
- defaultQueryParameters
An array of collection identifiers to query. If empty or omitted, all collections in the project are queried.
Default settings configuration for passage search options.
- Passages
When true, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When true, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- TableResults
When true, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
Default: 0
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- SuggestedRefinements
When true, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When true, spelling suggestions for the query are returned by default.
When true, highlights for the query are returned by default.
The number of document results returned by default.
A comma-separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
curl -X POST {auth} --header "Content-Type: application/json" --data "{ \"name\": \"My project\", \"type\": \"document_retrieval\" }" "{url}/v2/projects?version=2023-03-31"
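As a sketch, the equivalent call with the Python SDK (assuming a discovery service object that is created and configured as in the earlier examples) might look like this:

# Create a Document Retrieval project; name and type are the only required fields.
project = discovery.create_project(
    name='My project',
    type='document_retrieval'
).get_result()
print(project)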
Response
Detailed information about the specified project.
The Universally Unique Identifier (UUID) of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only. The Intelligent Document Processing (IDP) project type is available from IBM Cloud-managed instances only.
Possible values: [intelligent_document_processing, document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancy_training_status
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
Detailed information about the specified project.
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancyTrainingStatus
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
- defaultQueryParameters
An array of collection identifiers to query. If empty or omitted, all collections in the project are queried.
Default settings configuration for passage search options.
- passages
When true, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When true, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- tableResults
When true, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- suggestedRefinements
When true, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When true, spelling suggestions for the query are returned by default.
When true, highlights for the query are returned by default.
The number of document results returned by default.
A comma-separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
Detailed information about the specified project.
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancy_training_status
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
- default_query_parameters
An array of collection identifiers to query. If empty or omitted, all collections in the project are queried.
Default settings configuration for passage search options.
- passages
When true, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When true, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- table_results
When true, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- suggested_refinements
When true, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When true, spelling suggestions for the query are returned by default.
When true, highlights for the query are returned by default.
The number of document results returned by default.
A comma-separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
Detailed information about the specified project.
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancy_training_status
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
- default_query_parameters
An array of collection identifiers to query. If empty or omitted, all collections in the project are queried.
Default settings configuration for passage search options.
- passages
When true, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When true, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- table_results
When true, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- suggested_refinements
When true, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When true, spelling suggestions for the query are returned by default.
When true, highlights for the query are returned by default.
The number of document results returned by default.
A comma-separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
Detailed information about the specified project.
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- RelevancyTrainingStatus
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
- DefaultQueryParameters
An array of collection identifiers to query. If empty or omitted, all collections in the project are queried.
Default settings configuration for passage search options.
- Passages
When true, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When true, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- TableResults
When true, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- SuggestedRefinements
When true, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When true, spelling suggestions for the query are returned by default.
When true, highlights for the query are returned by default.
The number of document results returned by default.
A comma-separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
Status Code
The project has successfully been created.
Bad request.
{ "project_id": "6b18887d-d68e-434d-99f1-20925c7654e0", "type": "document_retrieval", "name": "My project", "collection_count": 0, "default_query_parameters": { "aggregation": "[term(enriched_text.entities.text,name:entities)]", "count": 10, "sort": "", "return": [], "passages": { "enabled": true, "count": 10, "fields": [ "text", "title" ], "characters": 200, "per_document": true, "max_per_document": 1, "find_answers": false, "max_answers_per_passage": 1 }, "highlight": false, "spelling_suggestions": false, "table_results": { "enabled": false, "count": 10, "per_document": 0 } } }
{ "project_id": "6b18887d-d68e-434d-99f1-20925c7654e0", "type": "document_retrieval", "name": "My project", "collection_count": 0, "default_query_parameters": { "aggregation": "[term(enriched_text.entities.text,name:entities)]", "count": 10, "sort": "", "return": [], "passages": { "enabled": true, "count": 10, "fields": [ "text", "title" ], "characters": 200, "per_document": true, "max_per_document": 1, "find_answers": false, "max_answers_per_passage": 1 }, "highlight": false, "spelling_suggestions": false, "table_results": { "enabled": false, "count": 10, "per_document": 0 } } }
Get project
Get details on the specified project.
Get details on the specified project.
Get details on the specified project.
Get details on the specified project.
Get details on the specified project.
GET /v2/projects/{project_id}
ServiceCall<ProjectDetails> getProject(GetProjectOptions getProjectOptions)
getProject(params)
get_project(
self,
project_id: str,
**kwargs,
) -> DetailedResponse
GetProject(string projectId)
Request
Use the GetProjectOptions.Builder
to create a GetProjectOptions
object that contains the parameter values for the getProject
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
The getProject options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}?version=2023-03-31"
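As a sketch, the equivalent call with the Python SDK (assuming a configured discovery service object, as in the earlier examples) might look like this:

# Fetch the project details by its UUID.
project = discovery.get_project(project_id='{project_id}').get_result()
print(project)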
Response
Detailed information about the specified project.
The Universally Unique Identifier (UUID) of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only. The Intelligent Document Processing (IDP) project type is available from IBM Cloud-managed instances only.
Possible values: [intelligent_document_processing, document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancy_training_status
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
Detailed information about the specified project.
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancyTrainingStatus
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
- defaultQueryParameters
An array of collection identifiers to query. If empty or omitted, all collections in the project are queried.
Default settings configuration for passage search options.
- passages
When true, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When true, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- tableResults
When true, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- suggestedRefinements
When true, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When true, spelling suggestions for the query are returned by default.
When true, highlights for the query are returned by default.
The number of document results returned by default.
A comma-separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
Detailed information about the specified project.
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancy_training_status
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
- default_query_parameters
An array of collection identifiers to query. If empty or omitted, all collections in the project are queried.
Default settings configuration for passage search options.
- passages
When true, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When true, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- table_results
When true, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- suggested_refinements
When true, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When true, spelling suggestions for the query are returned by default.
When true, highlights for the query are returned by default.
The number of document results returned by default.
A comma-separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
Detailed information about the specified project.
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancy_training_status
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
- default_query_parameters
An array of collection identifiers to query. If empty or omitted, all collections in the project are queried.
Default settings configuration for passage search options.
- passages
When true, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When true, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- table_results
When true, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- suggested_refinements
When true, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When true, spelling suggestions for the query are returned by default.
When true, highlights for the query are returned by default.
The number of document results returned by default.
A comma-separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
Detailed information about the specified project.
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- RelevancyTrainingStatus
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
- DefaultQueryParameters
An array of collection identifiers to query. If empty or omitted, all collections in the project are queried.
Default settings configuration for passage search options.
- Passages
When true, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When true, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- TableResults
When true, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- SuggestedRefinements
When true, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When true, spelling suggestions for the query are returned by default.
When true, highlights for the query are returned by default.
The number of document results returned by default.
A comma-separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
Status Code
Returns information about the specified project if it exists.
Project not found.
{ "project_id": "6b18887d-d68e-434d-99f1-20925c7654e0", "type": "document_retrieval", "name": "My project", "collection_count": 0, "relevancy_training_status": { "total_examples": 0, "sufficient_label_diversity": false, "processing": false, "minimum_examples_added": false, "available": false, "notices": 0, "minimum_queries_added": false }, "default_query_parameters": { "aggregation": "[term(enriched_text.entities.text,name:entities)]", "count": 10, "sort": "", "return": [], "passages": { "enabled": true, "count": 10, "fields": [ "text", "title" ], "characters": 200, "per_document": true, "max_per_document": 1, "find_answers": false, "max_answers_per_passage": 1 }, "highlight": false, "spelling_suggestions": false, "table_results": { "enabled": false, "count": 10, "per_document": 0 } } }
{ "project_id": "6b18887d-d68e-434d-99f1-20925c7654e0", "type": "document_retrieval", "name": "My project", "collection_count": 0, "relevancy_training_status": { "total_examples": 0, "sufficient_label_diversity": false, "processing": false, "minimum_examples_added": false, "available": false, "notices": 0, "minimum_queries_added": false }, "default_query_parameters": { "aggregation": "[term(enriched_text.entities.text,name:entities)]", "count": 10, "sort": "", "return": [], "passages": { "enabled": true, "count": 10, "fields": [ "text", "title" ], "characters": 200, "per_document": true, "max_per_document": 1, "find_answers": false, "max_answers_per_passage": 1 }, "highlight": false, "spelling_suggestions": false, "table_results": { "enabled": false, "count": 10, "per_document": 0 } } }
Update a project
Update the specified project's name.
Update the specified project's name.
Update the specified project's name.
Update the specified project's name.
Update the specified project's name.
POST /v2/projects/{project_id}
ServiceCall<ProjectDetails> updateProject(UpdateProjectOptions updateProjectOptions)
updateProject(params)
update_project(
self,
project_id: str,
*,
name: str = None,
**kwargs,
) -> DetailedResponse
UpdateProject(string projectId, string name = null)
Request
Use the UpdateProjectOptions.Builder
to create an UpdateProjectOptions
object that contains the parameter values for the updateProject
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
An object that represents the new name of the project.
The new name to give this project.
The updateProject options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The new name to give this project.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The new name to give this project.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The new name to give this project.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The new name to give this project.
curl -X POST {auth} --header "Content-Type: application/json" --data "{ \"name\": \"My updated project\" }" "{url}/v2/projects/{project_id}?version=2023-03-31"
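As a sketch, the equivalent call with the Python SDK (assuming a configured discovery service object, as in the earlier examples) might look like this:

# Rename the project; only the name can be updated.
updated = discovery.update_project(
    project_id='{project_id}',
    name='My updated project'
).get_result()
print(updated)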
Response
Detailed information about the specified project.
The Universally Unique Identifier (UUID) of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only. The Intelligent Document Processing (IDP) project type is available from IBM Cloud-managed instances only.
Possible values: [intelligent_document_processing, document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancy_training_status
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
Detailed information about the specified project.
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancyTrainingStatus
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
- defaultQueryParameters
An array of collection identifiers to query. If empty or omitted, all collections in the project are queried.
Default settings configuration for passage search options.
- passages
When true, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When true, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- tableResults
When true, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- suggestedRefinements
When true, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When true, spelling suggestions for the query are returned by default.
When true, highlights for the query are returned by default.
The number of document results returned by default.
A comma-separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
Detailed information about the specified project.
The unique identifier of this project.
The human-readable name of this project.
The type of project.
The content_intelligence type is a Document Retrieval for Contracts project and the other type is a Custom project. The content_mining and content_intelligence types are available with Premium plan managed deployments and installed deployments only.
Possible values: [document_retrieval, conversational_search, content_mining, content_intelligence, other]
Relevancy training status information for this project.
- relevancy_training_status
When the training data was updated.
The total number of examples.
When true, sufficient label diversity is present to allow training for this project.
When true, the relevancy training is in processing.
When true, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When true, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When true, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
- default_query_parameters
An array of collection identifiers to query. If empty or omitted all collections in the project are queried.
Default settings configuration for passage search options.
- passages
When
true
, a passage search is performed by default.The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When
true
, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- table_results
When
true
, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The
suggested_refinements
parameter that identified dynamic facets from the data is deprecated.
- suggested_refinements
When
true
, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When
true
, spelling suggestions for the query are returned by default.
When
true
, highlights for the query are returned by default.
The number of document results returned by default.
A comma separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
Detailed information about the specified project.
The unique identifier of this project.
The human readable name of this project.
The type of project.
The
content_intelligence
type is a Document Retrieval for Contracts project and the
other
type is a Custom project.
The
content_mining
and
content_intelligence
types are available with Premium plan managed deployments and installed deployments only.
Possible values: [
document_retrieval
,conversational_search
,content_mining
,content_intelligence
,other
]
Relevancy training status information for this project.
- relevancy_training_status
When the training data was updated.
The total number of examples.
When
true
, sufficient label diversity is present to allow training for this project.
When
true
, relevancy training is in progress.
When
true
, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When
true
, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When
true
, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
- default_query_parameters
An array of collection identifiers to query. If empty or omitted all collections in the project are queried.
Default settings configuration for passage search options.
- passages
When
true
, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When
true
, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- table_results
When
true
, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The
suggested_refinements
parameter that identified dynamic facets from the data is deprecated.
- suggested_refinements
When
true
, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When
true
, spelling suggestions for the query are returned by default.
When
true
, highlights for the query are returned by default.
The number of document results returned by default.
A comma separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
Detailed information about the specified project.
The unique identifier of this project.
The human readable name of this project.
The type of project.
The
content_intelligence
type is a Document Retrieval for Contracts project and the
other
type is a Custom project.
The
content_mining
and
content_intelligence
types are available with Premium plan managed deployments and installed deployments only.
Possible values: [
document_retrieval
,conversational_search
,content_mining
,content_intelligence
,other
]
Relevancy training status information for this project.
- RelevancyTrainingStatus
When the training data was updated.
The total number of examples.
When
true
, sufficient label diversity is present to allow training for this project.
When
true
, relevancy training is in progress.
When
true
, the minimum number of examples required to train has been met.
The time that the most recent successful training occurred.
When
true
, relevancy training is available when querying collections in the project.
The number of notices generated during the relevancy training.
When
true
, the minimum number of queries required to train has been met.
The number of collections configured in this project.
Default query parameters for this project.
- DefaultQueryParameters
An array of collection identifiers to query. If empty or omitted all collections in the project are queried.
Default settings configuration for passage search options.
- Passages
When
true
, a passage search is performed by default.
The number of passages to return.
An array of field names to perform the passage search on.
The approximate number of characters that each returned passage will contain.
When
true
, the number of passages that can be returned from a single document is restricted to the max_per_document value.
The default maximum number of passages that can be taken from a single document as the result of a passage query.
Default project query settings for table results.
- TableResults
When
true
, table results for the query are returned by default.
The number of table results to return by default.
The number of table results to include in each result document.
A string representing the default aggregation query for the project.
Object that contains suggested refinement settings.
Note: The
suggested_refinements
parameter that identified dynamic facets from the data is deprecated.
- SuggestedRefinements
When
true
, suggested refinements for the query are returned by default.
The number of suggested refinements to return by default.
When
true
, spelling suggestions for the query are returned by default.
When
true
, highlights for the query are returned by default.
The number of document results returned by default.
A comma separated list of document fields to sort results by default.
An array of field names to return in document results if present by default.
Status Code
Returns the updated project information.
Bad request.
Project not found.
{ "project_id": "6b18887d-d68e-434d-99f1-20925c7654e0", "type": "document_retrieval", "name": "My updated project", "collection_count": 0, "relevancy_training_status": { "total_examples": 0, "sufficient_label_diversity": false, "processing": false, "minimum_examples_added": false, "available": false, "notices": 0, "minimum_queries_added": false }, "default_query_parameters": { "aggregation": "[term(enriched_text.entities.text,name:entities)]", "count": 10, "sort": "", "return": [], "passages": { "enabled": true, "count": 10, "fields": [ "text", "title" ], "characters": 200, "per_document": true, "max_per_document": 1, "find_answers": false, "max_answers_per_passage": 1 }, "highlight": false, "spelling_suggestions": false, "table_results": { "enabled": false, "count": 10, "per_document": 0 } } }
{ "project_id": "6b18887d-d68e-434d-99f1-20925c7654e0", "type": "document_retrieval", "name": "My updated project", "collection_count": 0, "relevancy_training_status": { "total_examples": 0, "sufficient_label_diversity": false, "processing": false, "minimum_examples_added": false, "available": false, "notices": 0, "minimum_queries_added": false }, "default_query_parameters": { "aggregation": "[term(enriched_text.entities.text,name:entities)]", "count": 10, "sort": "", "return": [], "passages": { "enabled": true, "count": 10, "fields": [ "text", "title" ], "characters": 200, "per_document": true, "max_per_document": 1, "find_answers": false, "max_answers_per_passage": 1 }, "highlight": false, "spelling_suggestions": false, "table_results": { "enabled": false, "count": 10, "per_document": 0 } } }
Delete a project
Deletes the specified project.
Important: Deleting a project deletes everything that is part of the specified project, including all collections.
DELETE /v2/projects/{project_id}
ServiceCall<Void> deleteProject(DeleteProjectOptions deleteProjectOptions)
deleteProject(params)
delete_project(
self,
project_id: str,
**kwargs,
) -> DetailedResponse
DeleteProject(string projectId)
Request
Use the DeleteProjectOptions.Builder
to create a DeleteProjectOptions
object that contains the parameter values for the deleteProject
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
The deleteProject options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl -X DELETE {auth} "{url}/v2/projects/{project_id}?version=2023-03-31"
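The SDK calls mirror this curl request. For example, a minimal Python sketch that follows the authentication pattern used by the other Python examples in this document (all placeholder values must be replaced):

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)

discovery.set_service_url('{https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api}')

# Deleting a project removes everything in it, including all collections.
response = discovery.delete_project(project_id='{project_id}')
print(response.get_status_code())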
List fields
Gets a list of the unique fields (and their types) stored in the specified collections.
GET /v2/projects/{project_id}/fields
ServiceCall<ListFieldsResponse> listFields(ListFieldsOptions listFieldsOptions)
listFields(params)
list_fields(
self,
project_id: str,
*,
collection_ids: List[str] = None,
**kwargs,
) -> DetailedResponse
ListFields(string projectId, List<string> collectionIds = null)
Request
Use the ListFieldsOptions.Builder
to create a ListFieldsOptions
object that contains the parameter values for the listFields
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
Comma separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.
The listFields options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Comma separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Comma separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Comma separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Comma separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.
curl {auth} "{url}/v2/projects/{project_id}/fields?collection_ids={collection_id_1},{collection_id_2}&version=2023-03-31"
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("{https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api}");

var result = service.ListFields(
    projectId: "{project_id}",
    collectionIds: new List<string>() { "{collection_id_1}", "{collection_id_2}" }
    );

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}"); Discovery discovery = new Discovery("2020-08-30", authenticator); discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api"); ListFieldsOptions options = new ListFieldsOptions.Builder() .projectId("{project_id}") .addCollectionIds("{collection_id_1}") .addCollectionIds("{collection_id_2}") .build(); ListFieldsResponse response = discovery.listFields(options).execute().getResult(); System.out.println(response);
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
  collectionIds: ['{collection_id_1}', '{collection_id_2}'],
};

discovery.listFields(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)

discovery.set_service_url('{https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api}')

response = discovery.list_fields(
    project_id='{project_id}',
    collection_ids=['{collection_id_1}', '{collection_id_2}']
).get_result()

print(json.dumps(response, indent=2))
Response
The list of fetched fields.
The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations.
-
Fields which contain nested objects are assigned a type of "nested".
-
Fields which belong to a nested object are prefixed with
.properties
(for example,
warnings.properties.severity
means that the
warnings
object has a property called
severity
).
An array that contains information about each field in the collections.
The list of fetched fields.
The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations.
-
Fields which contain nested objects are assigned a type of "nested".
-
Fields which belong to a nested object are prefixed with
.properties
(for example,
warnings.properties.severity
means that the
warnings
object has a property called
severity
).
An array that contains information about each field in the collections.
- fields
The name of the field.
The type of the field.
Possible values: [
nested
,string
,date
,long
,integer
,short
,byte
,double
,float
,boolean
,binary
]
The collection ID of the collection where the field was found.
The list of fetched fields.
The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations.
-
Fields which contain nested objects are assigned a type of "nested".
-
Fields which belong to a nested object are prefixed with
.properties
(for example,
warnings.properties.severity
means that the
warnings
object has a property called
severity
).
An array that contains information about each field in the collections.
- fields
The name of the field.
The type of the field.
Possible values: [
nested
,string
,date
,long
,integer
,short
,byte
,double
,float
,boolean
,binary
]
The collection ID of the collection where the field was found.
The list of fetched fields.
The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations.
-
Fields which contain nested objects are assigned a type of "nested".
-
Fields which belong to a nested object are prefixed with
.properties
(for example,
warnings.properties.severity
means that the
warnings
object has a property called
severity
).
An array that contains information about each field in the collections.
- fields
The name of the field.
The type of the field.
Possible values: [
nested
,string
,date
,long
,integer
,short
,byte
,double
,float
,boolean
,binary
]
The collection ID of the collection where the field was found.
The list of fetched fields.
The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations.
-
Fields which contain nested objects are assigned a type of "nested".
-
Fields which belong to a nested object are prefixed with
.properties
(for example,
warnings.properties.severity
means that the
warnings
object has a property called
severity
).
An array that contains information about each field in the collections.
- Fields
The name of the field.
The type of the field.
Possible values: [
nested
,string
,date
,long
,integer
,short
,byte
,double
,float
,boolean
,binary
]
The collection ID of the collection where the field was found.
Status Code
The list of fetched fields.
The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations:
-
Fields which contain nested JSON objects are assigned a type of "nested".
-
Fields which belong to a nested object are prefixed with
.properties
(for example,
warnings.properties.severity
means that the
warnings
object has a property called
severity
).
Bad request.
{ "fields": [ { "field": "enriched_text", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "nested" }, { "field": "enriched_text.entities", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "nested" }, { "field": "enriched_text.entities.mentions", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "nested" }, { "field": "enriched_text.entities.mentions.location.begin", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "double" }, { "field": "enriched_text.entities.mentions.location.end", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "double" }, { "field": "enriched_text.entities.mentions.text", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "enriched_text.entities.model_name", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "enriched_text.entities.text", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "enriched_text.entities.type", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "extracted_metadata.file_type", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "extracted_metadata.filename", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "extracted_metadata.numPages", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "extracted_metadata.publicationdate", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "date" }, { "field": "extracted_metadata.sha1", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "metadata", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "nested" }, { "field": "metadata.customer_id", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "metadata.parent_document_id", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "text", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "document_id", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" } ] }
{ "fields": [ { "field": "enriched_text", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "nested" }, { "field": "enriched_text.entities", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "nested" }, { "field": "enriched_text.entities.mentions", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "nested" }, { "field": "enriched_text.entities.mentions.location.begin", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "double" }, { "field": "enriched_text.entities.mentions.location.end", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "double" }, { "field": "enriched_text.entities.mentions.text", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "enriched_text.entities.model_name", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "enriched_text.entities.text", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "enriched_text.entities.type", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "extracted_metadata.file_type", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "extracted_metadata.filename", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "extracted_metadata.numPages", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "extracted_metadata.publicationdate", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "date" }, { "field": "extracted_metadata.sha1", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "metadata", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "nested" }, { "field": "metadata.customer_id", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "metadata.parent_document_id", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "text", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" }, { "field": "document_id", "collection_id": "a3a41b30-8336-dc17-0000-017b75119cfe", "type": "string" } ] }
List collections
Lists existing collections for the specified project.
GET /v2/projects/{project_id}/collections
ServiceCall<ListCollectionsResponse> listCollections(ListCollectionsOptions listCollectionsOptions)
listCollections(params)
list_collections(
self,
project_id: str,
**kwargs,
) -> DetailedResponse
ListCollections(string projectId)
Request
Use the ListCollectionsOptions.Builder
to create a ListCollectionsOptions
object that contains the parameter values for the listCollections
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
The listCollections options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/collections?version=2023-03-31"
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("{https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api}");

var result = service.ListCollections(
    projectId: "{project_id}"
    );

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}"); Discovery discovery = new Discovery("2020-08-30", authenticator); discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api"); ListCollectionsOptions options = new ListCollectionsOptions.Builder() .projectId("{project_id}") .build(); ListCollectionsResponse response = discovery.listCollections(options).execute().getResult(); System.out.println(response);
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
};

discovery.listCollections(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)

discovery.set_service_url('{https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api}')

response = discovery.list_collections(
    project_id='{project_id}'
).get_result()

print(json.dumps(response, indent=2))
Response
Response object that contains an array of collection details.
An array that contains information about each collection in the project.
Response object that contains an array of collection details.
{
"collections": [
{
"collection_id": "f1360220-ea2d-4271-9d62-89a910b13c37",
"name": "example"
}
]
}
An array that contains information about each collection in the project.
Examples:
{ "collection_id": "800e58e4-198d-45eb-be87-74e1d6df4e96", "name": "test-collection" }
- collections
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
Response object that contains an array of collection details.
{
"collections": [
{
"collection_id": "f1360220-ea2d-4271-9d62-89a910b13c37",
"name": "example"
}
]
}
An array that contains information about each collection in the project.
Examples:
{ "collection_id": "800e58e4-198d-45eb-be87-74e1d6df4e96", "name": "test-collection" }
- collections
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
Response object that contains an array of collection details.
{
"collections": [
{
"collection_id": "f1360220-ea2d-4271-9d62-89a910b13c37",
"name": "example"
}
]
}
An array that contains information about each collection in the project.
Examples:
{ "collection_id": "800e58e4-198d-45eb-be87-74e1d6df4e96", "name": "test-collection" }
- collections
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
Response object that contains an array of collection details.
{
"collections": [
{
"collection_id": "f1360220-ea2d-4271-9d62-89a910b13c37",
"name": "example"
}
]
}
An array that contains information about each collection in the project.
Examples:
{ "collection_id": "800e58e4-198d-45eb-be87-74e1d6df4e96", "name": "test-collection" }
- Collections
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
Status Code
Successful response.
Bad request.
{ "collections": [ { "collection_id": "f1360220-ea2d-4271-9d62-89a910b13c37", "name": "example" } ] }
{ "collections": [ { "collection_id": "f1360220-ea2d-4271-9d62-89a910b13c37", "name": "example" } ] }
Create a collection
Creates a new collection in the specified project.
POST /v2/projects/{project_id}/collections
ServiceCall<CollectionDetails> createCollection(CreateCollectionOptions createCollectionOptions)
createCollection(params)
create_collection(
self,
project_id: str,
name: str,
*,
description: str = None,
language: str = None,
enrichments: List['CollectionEnrichment'] = None,
conversions: 'Conversions' = None,
normalizations: List['NormalizationOperation'] = None,
**kwargs,
) -> DetailedResponse
CreateCollection(string projectId, string name, string description = null, string language = null, List<CollectionEnrichment> enrichments = null, Conversions conversions = null, List<NormalizationOperation> normalizations = null)
Request
Use the CreateCollectionOptions.Builder
to create a CreateCollectionOptions
object that contains the parameter values for the createCollection
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
An object that represents the collection to be created.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The language of the collection. For a list of supported languages, see the product documentation.
Default:
en
If set to
true
, optical character recognition (OCR) is enabled. For more information, see Optical character recognition.
Default:
false
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
An object with webhook information. For more information, see Document status webhook API.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
The createCollection options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The language of the collection. For a list of supported languages, see the product documentation.
Default:
en
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- jsonNormalizations
Identifies the type of operation to perform. The options include:
-
copy
: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. -
move
: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The
move
operation is the same as a
copy
, except that the source_field is removed after the value is copied. -
merge
: Merges the value of the source_field with the value of the destination_field, and then removes the source_field.
The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
-
conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
-
normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
-
-
remove
: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation. -
remove_nulls
: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because
remove_nulls
operates on the entire document. The
remove_nulls
operation is often called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]
-
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
-
copy
: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. -
move
: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The
move
operation is the same as a
copy
, except that the source_field is removed after the value is copied. -
merge
: Merges the value of the source_field with the value of the destination_field, and then removes the source_field.
The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
-
conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
-
normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
-
-
remove
: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation. -
remove_nulls
: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because
remove_nulls
operates on the entire document. The
remove_nulls
operation is often called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]
-
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The language of the collection. For a list of supported languages, see the product documentation.
Default:
en
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- json_normalizations
Identifies the type of operation to perform. The options include:
-
copy
: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. -
move
: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The
move
operation is the same as a
copy
, except that the source_field is removed after the value is copied. -
merge
: Merges the value of the source_field with the value of the destination_field, and then removes the source_field.
The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
-
conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
-
normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
-
-
remove
: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation. -
remove_nulls
: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because
remove_nulls
operates on the entire document. The
remove_nulls
operation is often called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]
-
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
-
copy
: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. -
move
: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The
move
operation is the same as a
copy
, except that the source_field is removed after the value is copied. -
merge
: Merges the value of the source_field with the value of the destination_field, and then removes the source_field.
The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
-
conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
-
normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
-
-
remove
: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation. -
remove_nulls
: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because
remove_nulls
operates on the entire document. The
remove_nulls
operation is often called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]
-
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The language of the collection. For a list of supported languages, see the product documentation.
Default:
en
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- json_normalizations
Identifies the type of operation to perform. The options include:
-
copy
: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. -
move
: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The
move
operation is the same as a
copy
, except that the source_field is removed after the value is copied. -
merge
: Merges the value of the source_field with the value of the destination_field, and then removes the source_field.
The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
-
conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
-
normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
-
-
remove
: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation. -
remove_nulls
: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because
remove_nulls
operates on the entire document. The
remove_nulls
operation is often called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]
-
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
-
copy
: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. -
move
: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The
move
operation is the same as a
copy
, except that the source_field is removed after the value is copied. -
merge
: Merges the value of the source_field with the value of the destination_field, and then removes the source_field.
The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
-
conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
-
normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
-
-
remove
: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation. -
remove_nulls
: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because
remove_nulls
operates on the entire document. The
remove_nulls
operation is often called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]
-
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The language of the collection. For a list of supported languages, see the product documentation.
Default:
en
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- JsonNormalizations
Identifies the type of operation to perform. The options include:
-
copy
: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. -
move
: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The
move
operation is the same as a
copy
, except that the source_field is removed after the value is copied. -
merge
: Merges the value of the source_field with the value of the destination_field, and then removes the source_field.
The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
-
conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
-
normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
-
-
remove
: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation. -
remove_nulls
: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation becauseremove_nulls
operates on the entire document. Theremove_nulls
operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [
copy
,move
,merge
,remove
,remove_nulls
]-
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request, as in the request fragment that follows this schema.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored because remove_nulls operates on the entire document. The remove_nulls operation is often called as the last normalization operation; however, it does increase the time it takes to process documents.
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
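Taken together, the notes above suggest a request-body fragment like the hedged sketch below: the same copy operation appears under both conversions.json_normalizations and normalizations so that the destination_field keeps a consistent data type, and remove_nulls runs last. The operation, source_field, and destination_field property names follow this request schema; the author and creator field names are illustrative only.
{
  "name": "Tutorials",
  "conversions": {
    "json_normalizations": [
      { "operation": "copy", "source_field": "author", "destination_field": "creator" },
      { "operation": "remove_nulls" }
    ]
  },
  "normalizations": [
    { "operation": "copy", "source_field": "author", "destination_field": "creator" },
    { "operation": "remove_nulls" }
  ]
}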
curl -X POST {auth} --header "Content-Type: application/json" --data "{ \"name\": \"Tutorials\", \"description\": \"Instructional PDFs\" }" "{url}/v2/projects/{project_id}/collections?version=2023-03-31"
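For comparison, a minimal Python SDK sketch of the same request. This is an illustration under stated assumptions, not the only supported form; {apikey}, {url}, {project_id}, and {enrichment_id} are placeholders.
# Minimal sketch: create a collection with the Python SDK.
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

response = discovery.create_collection(
    project_id='{project_id}',
    name='Tutorials',
    description='Instructional PDFs',
    language='en',
    # Optional: apply enrichments by ID (look up real IDs with the
    # List enrichments method; this ID is a placeholder).
    enrichments=[{'enrichment_id': '{enrichment_id}', 'fields': ['text']}],
).get_result()
print(response['collection_id'])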
Response
A collection for storing documents.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
The Universally Unique Identifier (UUID) of the collection.
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
If set to true, optical character recognition (OCR) is enabled. For more information, see Optical character recognition.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
An object with webhook information. For more information, see Document status webhook API.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
An object that describes the Smart Document Understanding model for a collection.
- smart_document_understanding
When true, smart document understanding conversion is enabled for the collection.
Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types, as shown in the sketch after this schema.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud-managed instances only.
- source_document_counts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud-managed instances only.
- document_counts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
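To make the "sum the status types" tip concrete, here is a small, self-contained Python illustration over a trimmed, hypothetical response; the field names match the example response shown later in this section.
# Hedged illustration: total source documents = sum of the status types.
response = {
    "smart_document_understanding": {"enabled": False},
    "source_document_counts": {"pending": 0, "processing": 0, "available": 0, "failed": 0},
}
total_source_documents = sum(response["source_document_counts"].values())
print(total_source_documents)  # 0 until ingestion starts reporting counts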
A collection for storing documents.
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- jsonNormalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored because remove_nulls operates on the entire document. The remove_nulls operation is often called as the last normalization operation; however, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored because remove_nulls operates on the entire document. The remove_nulls operation is often called as the last normalization operation; however, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
An object that describes the Smart Document Understanding model for a collection.
- smartDocumentUnderstanding
When true, smart document understanding conversion is enabled for the collection.
Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- sourceDocumentCounts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- documentCounts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
A collection for storing documents.
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- json_normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored because remove_nulls operates on the entire document. The remove_nulls operation is often called as the last normalization operation; however, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored because remove_nulls operates on the entire document. The remove_nulls operation is often called as the last normalization operation; however, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
An object that describes the Smart Document Understanding model for a collection.
- smart_document_understanding
When true, smart document understanding conversion is enabled for the collection.
Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- source_document_counts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- document_counts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
A collection for storing documents.
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- json_normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored because remove_nulls operates on the entire document. The remove_nulls operation is often called as the last normalization operation; however, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored because remove_nulls operates on the entire document. The remove_nulls operation is often called as the last normalization operation; however, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
An object that describes the Smart Document Understanding model for a collection.
- smart_document_understanding
When true, smart document understanding conversion is enabled for the collection.
Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- source_document_counts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- document_counts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
A collection for storing documents.
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- Enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- Conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- JsonNormalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored because remove_nulls operates on the entire document. The remove_nulls operation is often called as the last normalization operation; however, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- Normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored because remove_nulls operates on the entire document. The remove_nulls operation is often called as the last normalization operation; however, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
An object that describes the Smart Document Understanding model for a collection.
- SmartDocumentUnderstanding
When true, smart document understanding conversion is enabled for the collection.
Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- SourceDocumentCounts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- DocumentCounts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
Status Code
The collection has been successfully created.
Bad request. Possible reasons include:
- No request body.
- Missing required request parameter or its value.
- Missing project.
- Too many collections in project.
- Too many total collections.
- Unsupported language is requested.
- Invalid normalization operation is requested.
- Missing normalization source.
- Missing normalization destination.
- Invalid normalization destination.
Project not found.
{ "name": "Tutorials", "collection_id": "eb0215ed-6ec2-132a-0000-017b740f39c1", "description": "Instructional PDFs", "created": "2021-08-23T17:29:21.104Z", "language": "en", "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-000000000002", "fields": [ "text" ] }, { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "fields": [ "text" ] } ], "smart_document_understanding": { "enabled": false }, "source_document_counts": { "pending": 0, "processing": 0, "available": 0, "failed": 0 }, "document_counts": { "processing": 0, "available": 0, "failed": 0 } }
{ "name": "Tutorials", "collection_id": "eb0215ed-6ec2-132a-0000-017b740f39c1", "description": "Instructional PDFs", "created": "2021-08-23T17:29:21.104Z", "language": "en", "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-000000000002", "fields": [ "text" ] }, { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "fields": [ "text" ] } ], "smart_document_understanding": { "enabled": false }, "source_document_counts": { "pending": 0, "processing": 0, "available": 0, "failed": 0 }, "document_counts": { "processing": 0, "available": 0, "failed": 0 } }
Get collection details
Get details about the specified collection.
GET /v2/projects/{project_id}/collections/{collection_id}
ServiceCall<CollectionDetails> getCollection(GetCollectionOptions getCollectionOptions)
getCollection(params)
get_collection(
self,
project_id: str,
collection_id: str,
**kwargs,
) -> DetailedResponse
GetCollection(string projectId, string collectionId)
Request
Use the GetCollectionOptions.Builder to create a GetCollectionOptions object that contains the parameter values for the getCollection method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The getCollection options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/collections/{collection_id}?version=2023-03-31"
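A matching Python SDK sketch; {apikey}, {url}, {project_id}, and {collection_id} are placeholders.
# Minimal sketch: fetch collection details with the Python SDK.
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

collection = discovery.get_collection(
    project_id='{project_id}',
    collection_id='{collection_id}',
).get_result()
print(collection['name'], collection.get('language'))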
Response
A collection for storing documents.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
The Universally Unique Identifier (UUID) of the collection.
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
If set to true, optical character recognition (OCR) is enabled. For more information, see Optical character recognition.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
An object with webhook information. For more information, see Document status webhook API.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
An object that describes the Smart Document Understanding model for a collection.
- smart_document_understanding
When true, smart document understanding conversion is enabled for the collection.
Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud-managed instances only.
- source_document_counts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status; a short polling sketch follows this schema.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud-managed instances only.
- document_counts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
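Because the counts are populated asynchronously, a client typically polls. A hedged sketch of that pattern, reusing a configured discovery client like the one shown earlier (the helper name and retry budget are illustrative):
# Poll Get collection details until source_document_counts reports documents.
import time

def wait_for_source_documents(discovery, project_id, collection_id, attempts=5):
    counts = {}
    for _ in range(attempts):
        details = discovery.get_collection(
            project_id=project_id, collection_id=collection_id
        ).get_result()
        counts = details.get('source_document_counts') or {}
        if sum(counts.values()) > 0:
            return counts
        time.sleep(60)  # "wait a minute", per the note above
    return counts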
A collection for storing documents.
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- jsonNormalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
An object that describes the Smart Document Understanding model for a collection.
- smartDocumentUnderstanding
When true, smart document understanding conversion is enabled for the collection.

Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- sourceDocumentCounts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- documentCounts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
A collection for storing documents.
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- json_normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
An object that describes the Smart Document Understanding model for a collection.
- smart_document_understanding
When true, smart document understanding conversion is enabled for the collection.

Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- source_document_counts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- document_counts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
A collection for storing documents.
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- json_normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
An object that describes the Smart Document Understanding model for a collection.
- smart_document_understanding
When true, smart document understanding conversion is enabled for the collection.

Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- source_document_counts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- document_counts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
A collection for storing documents.
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- Enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- Conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- JsonNormalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- Normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
An object that describes the Smart Document Understanding model for a collection.
- SmartDocumentUnderstanding
When true, smart document understanding conversion is enabled for the collection.

Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- SourceDocumentCounts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- DocumentCounts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
Status Code
- 200: Returns the specified collection details.
- 404: Collection or project not found.
{ "name": "Tutorials", "collection_id": "eb0215ed-6ec2-132a-0000-017b740f39c1", "description": "Instructional PDFs", "created": "2021-08-23T17:29:21.104Z", "language": "en", "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-000000000002", "fields": [ "text" ] }, { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "fields": [ "text" ] } ], "smart_document_understanding": { "enabled": true, "model": "text_extraction" }, "source_document_counts": { "pending": 0, "processing": 0, "available": 10, "failed": 0 }, "document_counts": { "processing": 0, "available": 10, "failed": 0 } }
{ "name": "Tutorials", "collection_id": "eb0215ed-6ec2-132a-0000-017b740f39c1", "description": "Instructional PDFs", "created": "2021-08-23T17:29:21.104Z", "language": "en", "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-000000000002", "fields": [ "text" ] }, { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "fields": [ "text" ] } ], "smart_document_understanding": { "enabled": true, "model": "text_extraction" }, "source_document_counts": { "pending": 0, "processing": 0, "available": 10, "failed": 0 }, "document_counts": { "processing": 0, "available": 10, "failed": 0 } }
Update a collection
Updates the specified collection's name, description, enrichments, and configuration.
If you apply normalization rules to data in an existing collection, you must initiate reprocessing of the collection. To do so, from the Manage fields page in the product user interface, temporarily change the data type of a field to enable the reprocess button. Change the data type of the field back to its original value, and then click Apply changes and reprocess.
To remove a configuration that applies JSON normalization operations as part of the conversion phase of ingestion, specify an empty json_normalizations object ([]) in the request.
To remove a configuration that applies JSON normalization operations after enrichments are applied, specify an empty normalizations object ([]) in the request.
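As a minimal sketch of the removal described above (Python SDK; values in braces are placeholders, and a plain dict is assumed to stand in for the SDK's Conversions model), pass empty arrays to clear both configurations:

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

# Empty arrays remove the conversion-phase and post-enrichment
# normalization configurations from the collection.
updated = discovery.update_collection(
    project_id='{project_id}',
    collection_id='{collection_id}',
    conversions={'json_normalizations': []},
    normalizations=[],
).get_result()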
POST /v2/projects/{project_id}/collections/{collection_id}
ServiceCall<CollectionDetails> updateCollection(UpdateCollectionOptions updateCollectionOptions)
updateCollection(params)
update_collection(
self,
project_id: str,
collection_id: str,
*,
name: str = None,
description: str = None,
enrichments: List['CollectionEnrichment'] = None,
conversions: 'Conversions' = None,
normalizations: List['NormalizationOperation'] = None,
**kwargs,
) -> DetailedResponse
UpdateCollection(string projectId, string collectionId, string name = null, string description = null, List<CollectionEnrichment> enrichments = null, Conversions conversions = null, List<NormalizationOperation> normalizations = null)
Request
Use the UpdateCollectionOptions.Builder to create an UpdateCollectionOptions object that contains the parameter values for the updateCollection method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
An object that represents the collection to be updated.
The new name of the collection.
Possible values: 0 ≤ length ≤ 255
The new description of the collection.
If set to true, optical character recognition (OCR) is enabled. For more information, see Optical character recognition.
Default: false
An array of enrichments that are applied to this collection.
An object with webhook information. For more information, see Document status webhook API.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
The updateCollection options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The new name of the collection.
Possible values: 0 ≤ length ≤ 255
The new description of the collection.
An array of enrichments that are applied to this collection.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- jsonNormalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
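To make these operations concrete, here is a hedged Python sketch. The title and doc_title field names are hypothetical, and plain dicts are assumed to stand in for the SDK's Conversions and NormalizationOperation models. It applies the same copy operation both during conversion and after enrichment, which, per the note above, keeps the destination_field data type consistent across documents, and it runs remove_nulls last:

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

# Hypothetical operation: copy "title" into a new root-level "doc_title" field.
copy_title = {
    'operation': 'copy',
    'source_field': 'title',
    'destination_field': 'doc_title',
}

updated = discovery.update_collection(
    project_id='{project_id}',
    collection_id='{collection_id}',
    conversions={'json_normalizations': [copy_title]},
    normalizations=[copy_title, {'operation': 'remove_nulls'}],
).get_result()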
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The new name of the collection.
Possible values: 0 ≤ length ≤ 255
The new description of the collection.
An array of enrichments that are applied to this collection.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- json_normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
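Taken together, the conversion-phase operations and the post-enrichment operations are sent as two separate objects in the request body. The following Python sketch illustrates that shape under the parameter hierarchy described above; the field names title and doc_title are hypothetical:

update_body = {
    "conversions": {
        # Runs at the end of the conversion phase of ingestion.
        "json_normalizations": [
            {"operation": "copy", "source_field": "title", "destination_field": "doc_title"}
        ]
    },
    # Runs after enrichments are applied.
    "normalizations": [
        {"operation": "remove_nulls"}
    ],
}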
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The new name of the collection.
Possible values: 0 ≤ length ≤ 255
The new description of the collection.
An array of enrichments that are applied to this collection.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- json_normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The new name of the collection.
Possible values: 0 ≤ length ≤ 255
The new description of the collection.
An array of enrichments that are applied to this collection.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- JsonNormalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Allowable values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
curl -X POST {auth} --header "Content-Type: application/json" --data "{ \"name\": \"Tutorials for developers\" }" "{url}/v2/projects/{project_id}/collections/{collection_id}?version=2023-03-31"
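The same request can be issued through the Python SDK. The following is a minimal sketch rather than a complete program; the API key, service URL, project ID, and collection ID are placeholders:

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Authenticate and point the client at your service instance.
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

# Rename the collection.
collection = discovery.update_collection(
    project_id='{project_id}',
    collection_id='{collection_id}',
    name='Tutorials for developers',
).get_result()
print(collection)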
Response
A collection for storing documents.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
The Universally Unique Identifier (UUID) of the collection.
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
If set to true, optical character recognition (OCR) is enabled. For more information, see Optical character recognition.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
An object with webhook information. For more information, see Document status webhook API.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
An object that describes the Smart Document Understanding model for a collection.
- smart_document_understanding
When true, smart document understanding conversion is enabled for the collection.
Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
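As a sketch of how these settings surface in code, the SDU object can be read from the parsed response of the update or get collection call (the collection variable from the earlier Python sketch):

# 'collection' is the response body parsed into a Python dict.
sdu = collection.get('smart_document_understanding', {})
if sdu.get('enabled'):
    print('SDU model:', sdu.get('model'))  # for example, "text_extraction"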
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud-managed instances only.
- source_document_counts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
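For example, using the count fields shown in the example response at the end of this section, the total can be computed as follows (a sketch; collection is the parsed response dict from the earlier Python example):

counts = collection['source_document_counts']
total_source_documents = (
    counts['pending'] + counts['processing'] + counts['available'] + counts['failed']
)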
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud-managed instances only.
- document_counts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
A collection for storing documents.
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- jsonNormalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
An object that describes the Smart Document Understanding model for a collection.
- smartDocumentUnderstanding
When true, smart document understanding conversion is enabled for the collection.
Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- sourceDocumentCounts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- documentCounts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
A collection for storing documents.
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- json_normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
An object that describes the Smart Document Understanding model for a collection.
- smart_document_understanding
When true, smart document understanding conversion is enabled for the collection.
Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- source_document_counts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- document_counts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
A collection for storing documents.
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- json_normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
An object that describes the Smart Document Understanding model for a collection.
- smart_document_understanding
When true, smart document understanding conversion is enabled for the collection.
Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- source_document_counts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- document_counts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
A collection for storing documents.
The unique identifier of the collection.
The name of the collection.
Possible values: 0 ≤ length ≤ 255
A description of the collection.
The date that the collection was created.
The language of the collection. For a list of supported languages, see the product documentation.
An array of enrichments that are applied to this collection. To get a list of enrichments that are available for a project, use the List enrichments method.
If no enrichments are specified when the collection is created, the default enrichments for the project type are applied. For more information about project default settings, see the product documentation.
- Enrichments
The unique identifier of this enrichment. For more information about how to determine the ID of an enrichment, see the product documentation.
An array of field names that the enrichment is applied to.
If you apply an enrichment to a field from a JSON file, the data is converted to an array automatically, even if the field contains a single value.
Document processing operations that occur during the document conversion phase of ingestion.
Note: Available only from service instances that are managed by IBM Cloud.
- Conversions
Defines operations that normalize the JSON representation of data at the end of the conversion phase of ingestion. When you specify a source_field, choose a field that will exist after data conversion. Operations run in the order that is specified in this array.
New root-level fields that are added with a conversion normalization are available from the product user interface, such as from the Manage fields page and the Fields to enrich lists in the Enrichments page.
- JsonNormalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
Defines operations that normalize the JSON representation of data after enrichments are applied. Operations run in the order that is specified in this array.
New fields that are added with a post-enrichment normalization are displayed in the JSON representation of query results, but are not available from product user interface pages, such as Manage fields.
Note: Available only from service instances that are managed by IBM Cloud.
- Normalizations
Identifies the type of operation to perform. The options include:
- copy: Copies the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten.
- move: Moves the value of the source_field to the destination_field. If the destination_field already exists, then its value is overwritten. The move operation is the same as a copy, except that the source_field is removed after the value is copied.
- merge: Merges the value of the source_field with the value of the destination_field, and then removes the source_field. The destination_field is converted into an array and the value of the source_field is appended to the array. If the destination_field does not exist, the source_field remains unchanged and no destination_field is added.
  - conversions: For JSON normalization operations that occur during ingestion, if the source_field does not exist in a document, the value in the destination_field is unchanged.
  - normalizations: For JSON normalization operations that occur after enrichment, if the source_field does not exist in a document, the destination_field is converted to an array, even if the destination_field has a single value.
  If you want to ensure that the data type of the destination_field is consistent across all documents, include both normalizations and conversions objects in the request.
- remove: Deletes the field that is specified in the source_field parameter. The destination_field parameter is ignored for this operation.
- remove_nulls: Removes all empty nested fields from the ingested document. (Empty root-level fields are removed by default.) The source_field and destination_field parameters are ignored by this operation because remove_nulls operates on the entire document. The remove_nulls operation often is called as the last normalization operation. However, it does increase the time it takes to process documents.
Possible values: [copy, move, merge, remove, remove_nulls]
The source field for the operation.
The destination field for the operation. If the destination field is a new field, the name that you specify for it must meet the naming requirements that are listed in the product documentation.
An object that describes the Smart Document Understanding model for a collection.
- SmartDocumentUnderstanding
When true, smart document understanding conversion is enabled for the collection.
Specifies the type of Smart Document Understanding (SDU) model that is enabled for the collection. The following types of models are supported:
- custom: A user-trained model is applied.
- pre_trained: A pretrained model is applied. This type of model is applied automatically to Document Retrieval for Contracts projects.
- text_extraction: An SDU model that extracts text and metadata from the content. This model is enabled in collections by default regardless of the types of documents in the collection (as long as the service plan supports SDU models).
You can apply user-trained or pretrained models to collections from the Identify fields page of the product user interface. For more information, see the product documentation.
Possible values: [custom, pre_trained, text_extraction]
Object that describes the status of documents that are uploaded to a collection or that are added from a crawled external data source. To get the total number of documents, sum the values of the status types.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- SourceDocumentCounts
Number of source documents that are uploaded to the collection, but not converted yet.
Number of source documents in the collection that are being processed.
Number of source documents in the collection that were added successfully. The number includes any original parent documents that are split into subdocuments during the ingestion process, but does not include the resulting subdocuments in the count.
Number of source documents in the collection for which processing failed. If the original document is split and some subdocuments from it are added successfully but other subdocuments aren't, the status of the original document shows as failed.
Object with counts of documents in the collection grouped by document status.
Document status information is returned asynchronously. If counts are zero, wait a minute and then use the Get collection details method to check the status.
Note: Available from installed instances starting with Cloud Pak for Data 4.7 and from IBM Cloud Plus and Enterprise plan instances only.
- DocumentCounts
Number of documents that are either waiting to be or are being enriched and are not yet available in the collection to be queried.
Number of documents in the collection that were processed successfully and are available to be queried. The number includes any subdocuments that are generated when a source document is split during the ingestion process, but does not include the original parent document.
Number of documents in the collection for which processing failed.
Status Code
Returns the updated collection details.
Bad request.
- No request body.
- Missing project or collection.
- Invalid normalization operation is requested.
- Missing normalization source.
- Missing normalization destination.
- Invalid normalization destination.
Collection or project not found.
{ "name": "Tutorials for developers", "collection_id": "eb0215ed-6ec2-132a-0000-017b740f39c1", "description": "Instructional PDFs", "created": "2021-08-23T17:29:21.104Z", "language": "en", "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-000000000002", "fields": [ "text" ] }, { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "fields": [ "text" ] } ], "smart_document_understanding": { "enabled": true, "model": "text_extraction" }, "source_document_counts": { "pending": 0, "processing": 0, "available": 0, "failed": 0 }, "document_counts": { "processing": 0, "available": 0, "failed": 0 } }
{ "name": "Tutorials for developers", "collection_id": "eb0215ed-6ec2-132a-0000-017b740f39c1", "description": "Instructional PDFs", "created": "2021-08-23T17:29:21.104Z", "language": "en", "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-000000000002", "fields": [ "text" ] }, { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "fields": [ "text" ] } ], "smart_document_understanding": { "enabled": true, "model": "text_extraction" }, "source_document_counts": { "pending": 0, "processing": 0, "available": 0, "failed": 0 }, "document_counts": { "processing": 0, "available": 0, "failed": 0 } }
Delete a collection
Deletes the specified collection from the project. All documents that are stored in the specified collection and are not shared with other collections are also deleted.
Deletes the specified collection from the project. All documents that are stored in the specified collection and are not shared with other collections are also deleted.
Deletes the specified collection from the project. All documents that are stored in the specified collection and are not shared with other collections are also deleted.
Deletes the specified collection from the project. All documents that are stored in the specified collection and are not shared with other collections are also deleted.
Deletes the specified collection from the project. All documents that are stored in the specified collection and are not shared with other collections are also deleted.
DELETE /v2/projects/{project_id}/collections/{collection_id}
ServiceCall<Void> deleteCollection(DeleteCollectionOptions deleteCollectionOptions)
deleteCollection(params)
delete_collection(
self,
project_id: str,
collection_id: str,
**kwargs,
) -> DetailedResponse
DeleteCollection(string projectId, string collectionId)
Request
Use the DeleteCollectionOptions.Builder
to create a DeleteCollectionOptions
object that contains the parameter values for the deleteCollection
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
The deleteCollection options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl -X DELETE {auth} "{url}/v2/projects/{project_id}/collections/{collection_id}?version=2023-03-31"
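In the SDKs, the equivalent call is a single method. A minimal Python sketch, assuming an authenticated DiscoveryV2 client named discovery and placeholder IDs:

# Deletes the collection and any documents that are not shared with other collections.
# Assumes `discovery` is an authenticated DiscoveryV2 client.
response = discovery.delete_collection(
    project_id='{project_id}',
    collection_id='{collection_id}'
).get_result()

print(response)  # an empty response body indicates success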
List documents
Lists the documents in the specified collection. The list includes only the document ID of each document and returns information for up to 10,000 documents.
Note: This method is available only from Cloud Pak for Data version 4.0.9 and later installed instances, and from IBM Cloud-managed instances.
Lists the documents in the specified collection. The list includes only the document ID of each document and returns information for up to 10,000 documents.
Note: This method is available only from Cloud Pak for Data version 4.0.9 and later installed instances and from Plus and Enterprise plan IBM Cloud-managed instances. It is not currently available from Premium plan instances.
Lists the documents in the specified collection. The list includes only the document ID of each document and returns information for up to 10,000 documents.
Note: This method is available only from Cloud Pak for Data version 4.0.9 and later installed instances and from Plus and Enterprise plan IBM Cloud-managed instances. It is not currently available from Premium plan instances.
Lists the documents in the specified collection. The list includes only the document ID of each document and returns information for up to 10,000 documents.
Note: This method is available only from Cloud Pak for Data version 4.0.9 and later installed instances and from Plus and Enterprise plan IBM Cloud-managed instances. It is not currently available from Premium plan instances.
Lists the documents in the specified collection. The list includes only the document ID of each document and returns information for up to 10,000 documents.
Note: This method is available only from Cloud Pak for Data version 4.0.9 and later installed instances and from Plus and Enterprise plan IBM Cloud-managed instances. It is not currently available from Premium plan instances.
GET /v2/projects/{project_id}/collections/{collection_id}/documents
ServiceCall<ListDocumentsResponse> listDocuments(ListDocumentsOptions listDocumentsOptions)
listDocuments(params)
list_documents(
self,
project_id: str,
collection_id: str,
*,
count: int = None,
status: str = None,
has_notices: bool = None,
is_parent: bool = None,
parent_document_id: str = None,
sha256: str = None,
**kwargs,
) -> DetailedResponse
ListDocuments(string projectId, string collectionId, long? count = null, string status = null, bool? hasNotices = null, bool? isParent = null, string parentDocumentId = null, string sha256 = null)
Request
Use the ListDocumentsOptions.Builder
to create a ListDocumentsOptions
object that contains the parameter values for the listDocuments
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
The maximum number of documents to return. Up to 1,000 documents are returned by default. The maximum number allowed is 10,000.
Default:
1000
Filters the documents to include only documents with the specified ingestion status. The options include:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.

You can specify one status value or add a comma-separated list of more than one status value. For example, available,failed.
If set to true, only documents that have notices, meaning documents for which warnings or errors were generated during the ingestion, are returned. If set to false, only documents that don't have notices are returned. If unspecified, no filter based on notices is applied.
Notice details are not available in the result, but you can use the Query collection notices method to find details by adding the parameter query=notices.document_id:{document-id}.
If set to true, only parent documents, meaning documents that were split during the ingestion process and resulted in two or more child documents, are returned. If set to false, only child documents are returned. If unspecified, no filter based on the parent or child relationship is applied.
CSV files, for example, are split into separate documents per line and JSON files are split into separate documents per object.
Filters the documents to include only child documents that were generated when the specified parent document was processed.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Filters the documents to include only documents with the specified SHA-256 hash. Format the hash as a hexadecimal string.
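To build the value for the sha256 filter from a local file, compute the file's SHA-256 digest as a hexadecimal string. A minimal Python sketch (the file path is a placeholder):

import hashlib

# Compute the hexadecimal SHA-256 hash of a local file to use with the
# sha256 query parameter when listing documents.
with open('path/to/file.pdf', 'rb') as f:
    file_hash = hashlib.sha256(f.read()).hexdigest()

print(file_hash)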
The listDocuments options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The maximum number of documents to return. Up to 1,000 documents are returned by default. The maximum number allowed is 10,000.
Default:
1000
Filters the documents to include only documents with the specified ingestion status. The options include:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.

You can specify one status value or add a comma-separated list of more than one status value. For example, available,failed.
If set to true, only documents that have notices, meaning documents for which warnings or errors were generated during the ingestion, are returned. If set to false, only documents that don't have notices are returned. If unspecified, no filter based on notices is applied.
Notice details are not available in the result, but you can use the Query collection notices method to find details by adding the parameter query=notices.document_id:{document-id}.
If set to true, only parent documents, meaning documents that were split during the ingestion process and resulted in two or more child documents, are returned. If set to false, only child documents are returned. If unspecified, no filter based on the parent or child relationship is applied.
CSV files, for example, are split into separate documents per line and JSON files are split into separate documents per object.
Filters the documents to include only child documents that were generated when the specified parent document was processed.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Filters the documents to include only documents with the specified SHA-256 hash. Format the hash as a hexadecimal string.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The maximum number of documents to return. Up to 1,000 documents are returned by default. The maximum number allowed is 10,000.
Default:
1000
Filters the documents to include only documents with the specified ingestion status. The options include:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.

You can specify one status value or add a comma-separated list of more than one status value. For example, available,failed.
If set to true, only documents that have notices, meaning documents for which warnings or errors were generated during the ingestion, are returned. If set to false, only documents that don't have notices are returned. If unspecified, no filter based on notices is applied.
Notice details are not available in the result, but you can use the Query collection notices method to find details by adding the parameter query=notices.document_id:{document-id}.
If set to true, only parent documents, meaning documents that were split during the ingestion process and resulted in two or more child documents, are returned. If set to false, only child documents are returned. If unspecified, no filter based on the parent or child relationship is applied.
CSV files, for example, are split into separate documents per line and JSON files are split into separate documents per object.
Filters the documents to include only child documents that were generated when the specified parent document was processed.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Filters the documents to include only documents with the specified SHA-256 hash. Format the hash as a hexadecimal string.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The maximum number of documents to return. Up to 1,000 documents are returned by default. The maximum number allowed is 10,000.
Default:
1000
Filters the documents to include only documents with the specified ingestion status. The options include:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.

You can specify one status value or add a comma-separated list of more than one status value. For example, available,failed.
If set to true, only documents that have notices, meaning documents for which warnings or errors were generated during the ingestion, are returned. If set to false, only documents that don't have notices are returned. If unspecified, no filter based on notices is applied.
Notice details are not available in the result, but you can use the Query collection notices method to find details by adding the parameter query=notices.document_id:{document-id}.
If set to true, only parent documents, meaning documents that were split during the ingestion process and resulted in two or more child documents, are returned. If set to false, only child documents are returned. If unspecified, no filter based on the parent or child relationship is applied.
CSV files, for example, are split into separate documents per line and JSON files are split into separate documents per object.
Filters the documents to include only child documents that were generated when the specified parent document was processed.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Filters the documents to include only documents with the specified SHA-256 hash. Format the hash as a hexadecimal string.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The maximum number of documents to return. Up to 1,000 documents are returned by default. The maximum number allowed is 10,000.
Default:
1000
Filters the documents to include only documents with the specified ingestion status. The options include:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.

You can specify one status value or add a comma-separated list of more than one status value. For example, available,failed.
If set to true, only documents that have notices, meaning documents for which warnings or errors were generated during the ingestion, are returned. If set to false, only documents that don't have notices are returned. If unspecified, no filter based on notices is applied.
Notice details are not available in the result, but you can use the Query collection notices method to find details by adding the parameter query=notices.document_id:{document-id}.
If set to true, only parent documents, meaning documents that were split during the ingestion process and resulted in two or more child documents, are returned. If set to false, only child documents are returned. If unspecified, no filter based on the parent or child relationship is applied.
CSV files, for example, are split into separate documents per line and JSON files are split into separate documents per object.
Filters the documents to include only child documents that were generated when the specified parent document was processed.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Filters the documents to include only documents with the specified SHA-256 hash. Format the hash as a hexadecimal string.
curl {auth} "{url}/v2/projects/{project_id}/collections/{collection_id}/documents?status=available&version=2023-03-31"
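The equivalent SDK call takes the same filters as named parameters. A minimal Python sketch that matches the curl request above, assuming an authenticated DiscoveryV2 client named discovery and an SDK version that includes this method:

# List up to 1,000 available documents in the collection (the default count).
# Assumes `discovery` is an authenticated DiscoveryV2 client.
response = discovery.list_documents(
    project_id='{project_id}',
    collection_id='{collection_id}',
    status='available'
).get_result()

for document in response.get('documents', []):
    print(document['document_id'])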
Response
Response object that contains an array of documents.
The number of matching results for the document query.
An array that lists the documents in a collection. Only the document ID of each document is returned in the list. You can use the Get document method to get more information about an individual document.
Response object that contains an array of documents.
{
"matching_results": 2,
"documents": [
{
"document_id": "4ffcfd8052005b99469e632506763bac_0"
},
{
"document_id": "4ffcfd8052005b99469e632506763bac_1"
}
]
}
The number of matching results for the document query.
An array that lists the documents in a collection. Only the document ID of each document is returned in the list. You can use the Get document method to get more information about an individual document.
- documents
The unique identifier of the document.
Date and time that the document is added to the collection. For a child document, the date and time when the process that generates the child document runs. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date and time that the document is finished being processed and is indexed. This date changes whenever the document is reprocessed, including for enrichment changes. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The status of the ingestion of the document. The possible values are:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.

Possible values: [available, failed, pending, processing]
Array of JSON objects for notices, meaning warning or error messages, that are produced by the document ingestion process. The array does not include notices that are produced for child documents that are generated when a document is processed.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the notice in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Information about the child documents that are generated from a single document during ingestion or other processing.
- children
Indicates whether the child documents have any notices. The value is false if the document does not have child documents.
Number of child documents. The value is 0 when processing of the document doesn't generate any child documents.
Name of the original source file (if available).
The type of the original source file, such as csv, excel, html, json, pdf, text, word, and so on.
The SHA-256 hash of the original source file. The hash is formatted as a hexadecimal string.
Response object that contains an array of documents.
{
"matching_results": 2,
"documents": [
{
"document_id": "4ffcfd8052005b99469e632506763bac_0"
},
{
"document_id": "4ffcfd8052005b99469e632506763bac_1"
}
]
}
The number of matching results for the document query.
An array that lists the documents in a collection. Only the document ID of each document is returned in the list. You can use the Get document method to get more information about an individual document.
- documents
The unique identifier of the document.
Date and time that the document is added to the collection. For a child document, the date and time when the process that generates the child document runs. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date and time that the document is finished being processed and is indexed. This date changes whenever the document is reprocessed, including for enrichment changes. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The status of the ingestion of the document. The possible values are:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.

Possible values: [available, failed, pending, processing]
Array of JSON objects for notices, meaning warning or error messages, that are produced by the document ingestion process. The array does not include notices that are produced for child documents that are generated when a document is processed.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the notice in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Information about the child documents that are generated from a single document during ingestion or other processing.
- children
Indicates whether the child documents have any notices. The value is false if the document does not have child documents.
Number of child documents. The value is 0 when processing of the document doesn't generate any child documents.
Name of the original source file (if available).
The type of the original source file, such as csv, excel, html, json, pdf, text, word, and so on.
The SHA-256 hash of the original source file. The hash is formatted as a hexadecimal string.
Response object that contains an array of documents.
{
"matching_results": 2,
"documents": [
{
"document_id": "4ffcfd8052005b99469e632506763bac_0"
},
{
"document_id": "4ffcfd8052005b99469e632506763bac_1"
}
]
}
The number of matching results for the document query.
An array that lists the documents in a collection. Only the document ID of each document is returned in the list. You can use the Get document method to get more information about an individual document.
- documents
The unique identifier of the document.
Date and time that the document is added to the collection. For a child document, the date and time when the process that generates the child document runs. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date and time that the document is finished being processed and is indexed. This date changes whenever the document is reprocessed, including for enrichment changes. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The status of the ingestion of the document. The possible values are:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.

Possible values: [available, failed, pending, processing]
Array of JSON objects for notices, meaning warning or error messages, that are produced by the document ingestion process. The array does not include notices that are produced for child documents that are generated when a document is processed.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the notice in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Information about the child documents that are generated from a single document during ingestion or other processing.
- children
Indicates whether the child documents have any notices. The value is false if the document does not have child documents.
Number of child documents. The value is 0 when processing of the document doesn't generate any child documents.
Name of the original source file (if available).
The type of the original source file, such as csv, excel, html, json, pdf, text, word, and so on.
The SHA-256 hash of the original source file. The hash is formatted as a hexadecimal string.
Response object that contains an array of documents.
{
"matching_results": 2,
"documents": [
{
"document_id": "4ffcfd8052005b99469e632506763bac_0"
},
{
"document_id": "4ffcfd8052005b99469e632506763bac_1"
}
]
}
The number of matching results for the document query.
An array that lists the documents in a collection. Only the document ID of each document is returned in the list. You can use the Get document method to get more information about an individual document.
- Documents
The unique identifier of the document.
Date and time that the document is added to the collection. For a child document, the date and time when the process that generates the child document runs. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date and time that the document is finished being processed and is indexed. This date changes whenever the document is reprocessed, including for enrichment changes. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The status of the ingestion of the document. The possible values are:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.

Possible values: [available, failed, pending, processing]
Array of JSON objects for notices, meaning warning or error messages, that are produced by the document ingestion process. The array does not include notices that are produced for child documents that are generated when a document is processed.
- Notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the notice in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Information about the child documents that are generated from a single document during ingestion or other processing.
- Children
Indicates whether the child documents have any notices. The value is false if the document does not have child documents.
Number of child documents. The value is 0 when processing of the document doesn't generate any child documents.
Name of the original source file (if available).
The type of the original source file, such as csv, excel, html, json, pdf, text, word, and so on.
The SHA-256 hash of the original source file. The hash is formatted as a hexadecimal string.
Status Code
Successful response.
Missing project or collection.
{ "matching_results": 2, "documents": [ { "document_id": "4ffcfd8052005b99469e632506763bac_0" }, { "document_id": "4ffcfd8052005b99469e632506763bac_1" } ] }
{ "matching_results": 2, "documents": [ { "document_id": "4ffcfd8052005b99469e632506763bac_0" }, { "document_id": "4ffcfd8052005b99469e632506763bac_1" } ] }
Add a document
Add a document to a collection with optional metadata.
Returns immediately after the system has accepted the document for processing.
Use this method to upload a file to the collection. You cannot use this method to crawl an external data source.
- For a list of supported file types, see the product documentation.
- You must provide document content, metadata, or both. If the request is missing both document content and metadata, it is rejected.
- You can set the Content-Type parameter on the file part to indicate the media type of the document. If the Content-Type parameter is missing or is one of the generic media types (for example, application/octet-stream), then the service attempts to automatically detect the document's media type.
- If the document is uploaded to a collection that shares its data with another collection, the X-Watson-Discovery-Force header must be set to true.
- In curl requests only, you can assign an ID to a document that you add by appending the ID to the endpoint (/v2/projects/{project_id}/collections/{collection_id}/documents/{document_id}). If a document already exists with the specified ID, it is replaced.
For more information about how certain file types and field names are handled when a file is added to a collection, see the product documentation.
Add a document to a collection with optional metadata.
Returns immediately after the system has accepted the document for processing.
Use this method to upload a file to the collection. You cannot use this method to crawl an external data source.
- For a list of supported file types, see the product documentation.
- You must provide document content, metadata, or both. If the request is missing both document content and metadata, it is rejected.
- You can set the Content-Type parameter on the file part to indicate the media type of the document. If the Content-Type parameter is missing or is one of the generic media types (for example, application/octet-stream), then the service attempts to automatically detect the document's media type.
- If the document is uploaded to a collection that shares its data with another collection, the X-Watson-Discovery-Force header must be set to true.
- In curl requests only, you can assign an ID to a document that you add by appending the ID to the endpoint (/v2/projects/{project_id}/collections/{collection_id}/documents/{document_id}). If a document already exists with the specified ID, it is replaced.
For more information about how certain file types and field names are handled when a file is added to a collection, see the product documentation.
Add a document to a collection with optional metadata.
Returns immediately after the system has accepted the document for processing.
Use this method to upload a file to the collection. You cannot use this method to crawl an external data source.
- For a list of supported file types, see the product documentation.
- You must provide document content, metadata, or both. If the request is missing both document content and metadata, it is rejected.
- You can set the Content-Type parameter on the file part to indicate the media type of the document. If the Content-Type parameter is missing or is one of the generic media types (for example, application/octet-stream), then the service attempts to automatically detect the document's media type.
- If the document is uploaded to a collection that shares its data with another collection, the X-Watson-Discovery-Force header must be set to true.
- In curl requests only, you can assign an ID to a document that you add by appending the ID to the endpoint (/v2/projects/{project_id}/collections/{collection_id}/documents/{document_id}). If a document already exists with the specified ID, it is replaced.
For more information about how certain file types and field names are handled when a file is added to a collection, see the product documentation.
Add a document to a collection with optional metadata.
Returns immediately after the system has accepted the document for processing.
Use this method to upload a file to the collection. You cannot use this method to crawl an external data source.
- For a list of supported file types, see the product documentation.
- You must provide document content, metadata, or both. If the request is missing both document content and metadata, it is rejected.
- You can set the Content-Type parameter on the file part to indicate the media type of the document. If the Content-Type parameter is missing or is one of the generic media types (for example, application/octet-stream), then the service attempts to automatically detect the document's media type.
- If the document is uploaded to a collection that shares its data with another collection, the X-Watson-Discovery-Force header must be set to true.
- In curl requests only, you can assign an ID to a document that you add by appending the ID to the endpoint (/v2/projects/{project_id}/collections/{collection_id}/documents/{document_id}). If a document already exists with the specified ID, it is replaced.
For more information about how certain file types and field names are handled when a file is added to a collection, see the product documentation.
Add a document to a collection with optional metadata.
Returns immediately after the system has accepted the document for processing.
Use this method to upload a file to the collection. You cannot use this method to crawl an external data source.
- For a list of supported file types, see the product documentation.
- You must provide document content, metadata, or both. If the request is missing both document content and metadata, it is rejected.
- You can set the Content-Type parameter on the file part to indicate the media type of the document. If the Content-Type parameter is missing or is one of the generic media types (for example, application/octet-stream), then the service attempts to automatically detect the document's media type.
- If the document is uploaded to a collection that shares its data with another collection, the X-Watson-Discovery-Force header must be set to true.
- In curl requests only, you can assign an ID to a document that you add by appending the ID to the endpoint (/v2/projects/{project_id}/collections/{collection_id}/documents/{document_id}). If a document already exists with the specified ID, it is replaced.
For more information about how certain file types and field names are handled when a file is added to a collection, see the product documentation.
POST /v2/projects/{project_id}/collections/{collection_id}/documents
ServiceCall<DocumentAccepted> addDocument(AddDocumentOptions addDocumentOptions)
addDocument(params)
add_document(
self,
project_id: str,
collection_id: str,
*,
file: BinaryIO = None,
filename: str = None,
file_content_type: str = None,
metadata: str = None,
x_watson_discovery_force: bool = None,
**kwargs,
) -> DetailedResponse
AddDocument(string projectId, string collectionId, System.IO.MemoryStream file = null, string filename = null, string fileContentType = null, string metadata = null, bool? xWatsonDiscoveryForce = null)
Request
Use the AddDocumentOptions.Builder
to create an AddDocumentOptions
object that contains the parameter values for the addDocument
method.
Custom Headers
When
true
, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default:
false
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
Form Parameters
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the application/json content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example:
{ "filename": "favorites2.json", "file_type": "json" }
The addDocument options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the application/json content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
The filename for file.
The content type of file. Values for this parameter can be obtained from the HttpMediaType class.
Allowable values: [application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml]
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example:
{ "filename": "favorites2.json", "file_type": "json" }.
When
true
, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default:
false
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the application/json content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
The filename for file.
The content type of file.
Allowable values: [application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml]
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example:
{ "filename": "favorites2.json", "file_type": "json" }.
When
true
, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default:
false
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the application/json content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
The filename for file.
The content type of file.
Allowable values: [application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml]
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example:
{ "filename": "favorites2.json", "file_type": "json" }.
When
true
, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default:
false
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the application/json content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
The filename for file.
The content type of file.
Allowable values: [application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml]
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example:
{ "filename": "favorites2.json", "file_type": "json" }.
When
true
, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default:
false
curl -X POST {auth} --form "file=@{filename}" --form metadata="{\"field_name\": \"content\"}" "{url}/v2/projects/{project_id}/collections/{collection_id}/documents?version=2023-03-31"
Download example document sample1.html
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("{https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api}");

DetailedResponse<DocumentAccepted> result = null;
using (FileStream fs = File.OpenRead("path/to/file.pdf"))
{
    using (MemoryStream ms = new MemoryStream())
    {
        fs.CopyTo(ms);
        result = service.AddDocument(
            projectId: "{project_id}",
            collectionId: "{collection_id}",
            file: ms,
            filename: "example-file",
            fileContentType: "application/pdf"
            );
    }
}

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

File examplePdf = new File("path/to/file.pdf");

AddDocumentOptions options = new AddDocumentOptions.Builder()
  .projectId("{project_id}")
  .collectionId("{collection_id}")
  .file(examplePdf)
  .filename("example-file")
  .fileContentType("application/pdf")
  .build();

DocumentAccepted response = discovery.addDocument(options).execute().getResult();
System.out.println(response);
const fs = require('fs');
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
  collectionId: '{collectionId}',
  file: fs.createReadStream('path/to/file.pdf'),
  filename: 'example-file',
  fileContentType: 'application/pdf',
};

discovery.addDocument(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
import os
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)

discovery.set_service_url('{https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api}')

with open(os.path.join(os.getcwd(), '{path_element}', '{filename}'), 'rb') as fileinfo:
    response = discovery.add_document(
        project_id='{project_id}',
        collection_id='{collection_id}',
        file=fileinfo,
        filename='example-file',
        file_content_type='application/pdf'
    ).get_result()

print(json.dumps(response, indent=2))
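The examples above upload a file only. If you also want to pass the optional metadata part, or write to a collection whose data is shared with another collection, the same call accepts a metadata string and the X-Watson-Discovery-Force header. A minimal Python sketch (parameter values are placeholders; the metadata JSON reuses the example from the request description):

import json

# Add a document with a metadata part; force the add even if the collection
# shares its data with other collections (sets X-Watson-Discovery-Force: true).
# Assumes `discovery` is an authenticated DiscoveryV2 client.
with open('path/to/file.pdf', 'rb') as fileinfo:
    response = discovery.add_document(
        project_id='{project_id}',
        collection_id='{collection_id}',
        file=fileinfo,
        filename='example-file',
        file_content_type='application/pdf',
        metadata=json.dumps({'filename': 'favorites2.json', 'file_type': 'json'}),
        x_watson_discovery_force=True
    ).get_result()

print(response['document_id'], response['status'])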
Response
Information returned after an uploaded document is accepted.
The unique identifier of the ingested document.
Status of the document in the ingestion process. A status of processing is returned for documents that are ingested with a version date before 2019-01-01. The pending status is returned for all others.
Possible values: [processing, pending]
Information returned after an uploaded document is accepted.
{
"document_id": "f1360220-ea2d-4271-9d62-89a910b13c37",
"status": "processing"
}
The unique identifier of the ingested document.
Status of the document in the ingestion process. A status of processing is returned for documents that are ingested with a version date before 2019-01-01. The pending status is returned for all others.
Possible values: [processing, pending]
Information returned after an uploaded document is accepted.
{
"document_id": "f1360220-ea2d-4271-9d62-89a910b13c37",
"status": "processing"
}
The unique identifier of the ingested document.
Status of the document in the ingestion process. A status of processing is returned for documents that are ingested with a version date before 2019-01-01. The pending status is returned for all others.
Possible values: [processing, pending]
Information returned after an uploaded document is accepted.
{
"document_id": "f1360220-ea2d-4271-9d62-89a910b13c37",
"status": "processing"
}
The unique identifier of the ingested document.
Status of the document in the ingestion process. A status of processing is returned for documents that are ingested with a version date before 2019-01-01. The pending status is returned for all others.
Possible values: [processing, pending]
Information returned after an uploaded document is accepted.
{
"document_id": "f1360220-ea2d-4271-9d62-89a910b13c37",
"status": "processing"
}
The unique identifier of the ingested document.
Status of the document in the ingestion process. A status of processing is returned for documents that are ingested with a version date before 2019-01-01. The pending status is returned for all others.
Possible values: [processing, pending]
Status Code
The document has been accepted and will be processed.
Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.
Forbidden. Returned if you attempt to add a document to a collection in a read-only project.
Not found. Returned if you attempt to add a document to a project that doesn't exist or if the collection specified isn't part of the specified project.
Too large. Returned if you attempt to add a document or document metadata that exceeds the maximum allowed size.
Unsupported. Returned if the media type of the uploaded document is not supported by Discovery.
{ "document_id": "f1360220-ea2d-4271-9d62-89a910b13c37", "status": "processing" }
{ "document_id": "f1360220-ea2d-4271-9d62-89a910b13c37", "status": "processing" }
Get document details
Get details about a specific document, whether the document is added by uploading a file or by crawling an external data source.
Note: This method is available only from Cloud Pak for Data version 4.0.9 and later installed instances, and from IBM Cloud-managed instances.
Get details about a specific document, whether the document is added by uploading a file or by crawling an external data source.
Note: This method is available only from Cloud Pak for Data version 4.0.9 and later installed instances and from Plus and Enterprise plan IBM Cloud-managed instances. It is not currently available from Premium plan instances.
Get details about a specific document, whether the document is added by uploading a file or by crawling an external data source.
Note: This method is available only from Cloud Pak for Data version 4.0.9 and later installed instances and from Plus and Enterprise plan IBM Cloud-managed instances. It is not currently available from Premium plan instances.
Get details about a specific document, whether the document is added by uploading a file or by crawling an external data source.
Note: This method is available only from Cloud Pak for Data version 4.0.9 and later installed instances and from Plus and Enterprise plan IBM Cloud-managed instances. It is not currently available from Premium plan instances.
Get details about a specific document, whether the document is added by uploading a file or by crawling an external data source.
Note: This method is available only from Cloud Pak for Data version 4.0.9 and later installed instances and from Plus and Enterprise plan IBM Cloud-managed instances. It is not currently available from Premium plan instances.
GET /v2/projects/{project_id}/collections/{collection_id}/documents/{document_id}
ServiceCall<DocumentDetails> getDocument(GetDocumentOptions getDocumentOptions)
getDocument(params)
get_document(
self,
project_id: str,
collection_id: str,
document_id: str,
**kwargs,
) -> DetailedResponse
GetDocument(string projectId, string collectionId, string documentId)
Request
Use the GetDocumentOptions.Builder
to create a GetDocumentOptions
object that contains the parameter values for the getDocument
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
The getDocument options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/collections/{collection_id}/documents/{document_id}?version=2023-03-31"
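The equivalent SDK call returns the same details object, including the ingestion status and child-document information. A minimal Python sketch, assuming an authenticated DiscoveryV2 client named discovery and placeholder IDs:

# Get details for a single document, including status and child-document info.
# Assumes `discovery` is an authenticated DiscoveryV2 client.
details = discovery.get_document(
    project_id='{project_id}',
    collection_id='{collection_id}',
    document_id='{document_id}'
).get_result()

print(details['status'])
children = details.get('children', {})
print(children.get('count', 0), 'child documents')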
Response
Information about a document.
The unique identifier of the document.
Date and time that the document is added to the collection. For a child document, the date and time when the process that generates the child document runs. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date and time that the document is finished being processed and is indexed. This date changes whenever the document is reprocessed, including for enrichment changes. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The status of the ingestion of the document. The possible values are:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.

Possible values: [available, failed, pending, processing]
Array of JSON objects for notices, meaning warning or error messages, that are produced by the document ingestion process. The array does not include notices that are produced for child documents that are generated when a document is processed.
Information about the child documents that are generated from a single document during ingestion or other processing.
- children
Indicates whether the child documents have any notices. The value is false if the document does not have child documents.
Number of child documents. The value is 0 when processing of the document doesn't generate any child documents.
Name of the original source file (if available).
The type of the original source file, such as csv, excel, html, json, pdf, text, word, and so on.
The SHA-256 hash of the original source file. The hash is formatted as a hexadecimal string.
Information about a document.
The unique identifier of the document.
Date and time that the document is added to the collection. For a child document, the date and time when the process that generates the child document runs. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date and time that the document is finished being processed and is indexed. This date changes whenever the document is reprocessed, including for enrichment changes. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The status of the ingestion of the document. The possible values are:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.
Possible values: [available, failed, pending, processing]
Array of JSON objects for notices, meaning warning or error messages, that are produced by the document ingestion process. The array does not include notices that are produced for child documents that are generated when a document is processed.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Information about the child documents that are generated from a single document during ingestion or other processing.
- children
Indicates whether the child documents have any notices. The value is false if the document does not have child documents.
Number of child documents. The value is 0 when processing of the document doesn't generate any child documents.
Name of the original source file (if available).
The type of the original source file, such as csv, excel, html, json, pdf, text, word, and so on.
The SHA-256 hash of the original source file. The hash is formatted as a hexadecimal string.
Information about a document.
The unique identifier of the document.
Date and time that the document is added to the collection. For a child document, the date and time when the process that generates the child document runs. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date and time that the document is finished being processed and is indexed. This date changes whenever the document is reprocessed, including for enrichment changes. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The status of the ingestion of the document. The possible values are:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.
Possible values: [available, failed, pending, processing]
Array of JSON objects for notices, meaning warning or error messages, that are produced by the document ingestion process. The array does not include notices that are produced for child documents that are generated when a document is processed.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Information about the child documents that are generated from a single document during ingestion or other processing.
- children
Indicates whether the child documents have any notices. The value is false if the document does not have child documents.
Number of child documents. The value is 0 when processing of the document doesn't generate any child documents.
Name of the original source file (if available).
The type of the original source file, such as csv, excel, html, json, pdf, text, word, and so on.
The SHA-256 hash of the original source file. The hash is formatted as a hexadecimal string.
Information about a document.
The unique identifier of the document.
Date and time that the document is added to the collection. For a child document, the date and time when the process that generates the child document runs. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date and time that the document is finished being processed and is indexed. This date changes whenever the document is reprocessed, including for enrichment changes. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The status of the ingestion of the document. The possible values are:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.
Possible values: [available, failed, pending, processing]
Array of JSON objects for notices, meaning warning or error messages, that are produced by the document ingestion process. The array does not include notices that are produced for child documents that are generated when a document is processed.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Information about the child documents that are generated from a single document during ingestion or other processing.
- children
Indicates whether the child documents have any notices. The value is false if the document does not have child documents.
Number of child documents. The value is 0 when processing of the document doesn't generate any child documents.
Name of the original source file (if available).
The type of the original source file, such as csv, excel, html, json, pdf, text, word, and so on.
The SHA-256 hash of the original source file. The hash is formatted as a hexadecimal string.
Information about a document.
The unique identifier of the document.
Date and time that the document is added to the collection. For a child document, the date and time when the process that generates the child document runs. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Date and time that the document is finished being processed and is indexed. This date changes whenever the document is reprocessed, including for enrichment changes. The date-time format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
The status of the ingestion of the document. The possible values are:
- available: Ingestion is finished and the document is indexed.
- failed: Ingestion is finished, but the document is not indexed because of an error.
- pending: The document is uploaded, but the ingestion process is not started.
- processing: Ingestion is in progress.
Possible values: [available, failed, pending, processing]
Array of JSON objects for notices, meaning warning or error messages, that are produced by the document ingestion process. The array does not include notices that are produced for child documents that are generated when a document is processed.
- Notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Information about the child documents that are generated from a single document during ingestion or other processing.
- Children
Indicates whether the child documents have any notices. The value is false if the document does not have child documents.
Number of child documents. The value is 0 when processing of the document doesn't generate any child documents.
Name of the original source file (if available).
The type of the original source file, such as csv, excel, html, json, pdf, text, word, and so on.
The SHA-256 hash of the original source file. The hash is formatted as a hexadecimal string.
Status Code
Returns the specified document details.
Missing project, collection, or document.
{ "document_id": "4ffcfd8052005b99469e632506763bac_0", "created": "2022-04-05T07:01:55.863Z", "updated": "2022-04-21T06:27:26.509Z", "status": "available", "notices": [], "children": { "count": 0, "have_notices": false }, "filename": "Proposal.docx", "file_type": "word" }
{ "document_id": "4ffcfd8052005b99469e632506763bac_0", "created": "2022-04-05T07:01:55.863Z", "updated": "2022-04-21T06:27:26.509Z", "status": "available", "notices": [], "children": { "count": 0, "have_notices": false }, "filename": "Proposal.docx", "file_type": "word" }
Update a document
Replace an existing document or add a document with a specified document ID. Starts ingesting a document with optional metadata.
Use this method to upload a file to a collection. You cannot use this method to crawl an external data source.
If the document is uploaded to a collection that shares its data with another collection, the X-Watson-Discovery-Force header must be set to true.
Notes:
- Uploading a new document with this method automatically replaces any existing document stored with the same document ID.
- If an uploaded document is split into child documents during ingestion, all existing child documents are overwritten, even if the updated version of the document has fewer child documents.
Replace an existing document or add a document with a specified document ID. Starts ingesting a document with optional metadata.
Use this method to upload a file to a collection. You cannot use this method to crawl an external data source.
If the document is uploaded to a collection that shares its data with another collection, the X-Watson-Discovery-Force header must be set to true.
Notes:
- Uploading a new document with this method automatically replaces any existing document stored with the same document ID.
- If an uploaded document is split into child documents during ingestion, all existing child documents are overwritten, even if the updated version of the document has fewer child documents.
Replace an existing document or add a document with a specified document ID. Starts ingesting a document with optional metadata.
Use this method to upload a file to a collection. You cannot use this method to crawl an external data source.
If the document is uploaded to a collection that shares its data with another collection, the X-Watson-Discovery-Force header must be set to true.
Notes:
- Uploading a new document with this method automatically replaces any existing document stored with the same document ID.
- If an uploaded document is split into child documents during ingestion, all existing child documents are overwritten, even if the updated version of the document has fewer child documents.
Replace an existing document or add a document with a specified document ID. Starts ingesting a document with optional metadata.
Use this method to upload a file to a collection. You cannot use this method to crawl an external data source.
If the document is uploaded to a collection that shares its data with another collection, the X-Watson-Discovery-Force header must be set to true.
Notes:
- Uploading a new document with this method automatically replaces any existing document stored with the same document ID.
- If an uploaded document is split into child documents during ingestion, all existing child documents are overwritten, even if the updated version of the document has fewer child documents.
Replace an existing document or add a document with a specified document ID. Starts ingesting a document with optional metadata.
Use this method to upload a file to a collection. You cannot use this method to crawl an external data source.
If the document is uploaded to a collection that shares its data with another collection, the X-Watson-Discovery-Force header must be set to true.
Notes:
- Uploading a new document with this method automatically replaces any existing document stored with the same document ID.
- If an uploaded document is split into child documents during ingestion, all existing child documents are overwritten, even if the updated version of the document has fewer child documents.
POST /v2/projects/{project_id}/collections/{collection_id}/documents/{document_id}
ServiceCall<DocumentAccepted> updateDocument(UpdateDocumentOptions updateDocumentOptions)
updateDocument(params)
update_document(
self,
project_id: str,
collection_id: str,
document_id: str,
*,
file: BinaryIO = None,
filename: str = None,
file_content_type: str = None,
metadata: str = None,
x_watson_discovery_force: bool = None,
**kwargs,
) -> DetailedResponse
UpdateDocument(string projectId, string collectionId, string documentId, System.IO.MemoryStream file = null, string filename = null, string fileContentType = null, string metadata = null, bool? xWatsonDiscoveryForce = null)
Request
Use the UpdateDocumentOptions.Builder
to create a UpdateDocumentOptions
object that contains the parameter values for the updateDocument
method.
Custom Headers
When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default: false
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
Form Parameters
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the application/json content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example:
{ "filename": "favorites2.json", "file_type": "json" }
The updateDocument options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the application/json content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
The filename for file.
The content type of file. Values for this parameter can be obtained from the HttpMediaType class.
Allowable values: [application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml]
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example: { "filename": "favorites2.json", "file_type": "json" }
When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default: false
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the application/json content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
The filename for file.
The content type of file.
Allowable values: [application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml]
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example: { "filename": "favorites2.json", "file_type": "json" }
When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default: false
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the application/json content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
The filename for file.
The content type of file.
Allowable values: [application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml]
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example: { "filename": "favorites2.json", "file_type": "json" }
When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default: false
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the application/json content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
The filename for file.
The content type of file.
Allowable values: [application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml]
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example: { "filename": "favorites2.json", "file_type": "json" }
When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default: false
curl -X POST {auth} --form "file=@{filename}" --form metadata="{\"field_name\": \"content\"}" "{url}/v2/projects/{project_id}/collections/{collection_id}/documents/{document_id}?version=2023-03-31"
Download example document sample1.html
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

var result = service.UpdateDocument(
    projectId: "{project_id}",
    collectionId: "{collection_id}",
    documentId: "{document_id}",
    filename: "updated-file-name"
    );

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

UpdateDocumentOptions options = new UpdateDocumentOptions.Builder()
  .projectId("{project_id}")
  .collectionId("{collection_id}")
  .documentId("{document_id}")
  .metadata("{ \"metadata\": \"value\" }")
  .build();

DocumentAccepted response = discovery.updateDocument(options).execute().getResult();

System.out.println(response);
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
  collectionId: '{collectionId}',
  documentId: '{documentId}',
  metadata: '{"metadata": "value"}',
};

discovery.updateDocument(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
import os

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)

discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

with open(os.path.join(os.getcwd(), '{path_element}', '{filename}'), 'rb') as fileinfo:
    response = discovery.update_document(
        project_id='{project_id}',
        collection_id='{collection_id}',
        document_id='{document_id}',
        file=fileinfo,
        filename='example-file',
        file_content_type='application/pdf'
    ).get_result()

print(json.dumps(response, indent=2))
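If the target collection shares its data with other collections, the update must also set the X-Watson-Discovery-Force header, which the Python SDK exposes as the x_watson_discovery_force argument. A minimal sketch, reusing the discovery client from the sample above (the file name here is hypothetical):

# Force the update even though the collection's data is shared with
# other collections; this sets the X-Watson-Discovery-Force header.
with open('proposal-v2.docx', 'rb') as fileinfo:
    response = discovery.update_document(
        project_id='{project_id}',
        collection_id='{collection_id}',
        document_id='{document_id}',
        file=fileinfo,
        filename='proposal-v2.docx',
        file_content_type='application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        x_watson_discovery_force=True
    ).get_result()

print(response)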
Response
Information returned after an uploaded document is accepted.
The unique identifier of the ingested document.
Status of the document in the ingestion process. A status of processing is returned for documents that are ingested with a version date before 2019-01-01. The pending status is returned for all others.
Possible values: [processing, pending]
Information returned after an uploaded document is accepted.
{
"document_id": "f1360220-ea2d-4271-9d62-89a910b13c37",
"status": "processing"
}
The unique identifier of the ingested document.
Status of the document in the ingestion process. A status of processing is returned for documents that are ingested with a version date before 2019-01-01. The pending status is returned for all others.
Possible values: [processing, pending]
Information returned after an uploaded document is accepted.
{
"document_id": "f1360220-ea2d-4271-9d62-89a910b13c37",
"status": "processing"
}
The unique identifier of the ingested document.
Status of the document in the ingestion process. A status of processing is returned for documents that are ingested with a version date before 2019-01-01. The pending status is returned for all others.
Possible values: [processing, pending]
Information returned after an uploaded document is accepted.
{
"document_id": "f1360220-ea2d-4271-9d62-89a910b13c37",
"status": "processing"
}
The unique identifier of the ingested document.
Status of the document in the ingestion process. A status of processing is returned for documents that are ingested with a version date before 2019-01-01. The pending status is returned for all others.
Possible values: [processing, pending]
Information returned after an uploaded document is accepted.
{
"document_id": "f1360220-ea2d-4271-9d62-89a910b13c37",
"status": "processing"
}
The unique identifier of the ingested document.
Status of the document in the ingestion process. A status of processing is returned for documents that are ingested with a version date before 2019-01-01. The pending status is returned for all others.
Possible values: [processing, pending]
Status Code
The document has been accepted and will be processed.
Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.
Forbidden. Returned if you attempt to add a document to a collection in a read-only project.
Not found. Returned if the project, collection, or document ID is missing or incorrect.
Too large. Returned if you attempt to add a document or document metadata that exceeds the maximum possible.
Unsupported. Returned if the media type of the uploaded document is not supported by Discovery.
{ "document_id": "f1360220-ea2d-4271-9d62-89a910b13c37", "status": "processing" }
{ "document_id": "f1360220-ea2d-4271-9d62-89a910b13c37", "status": "processing" }
Delete a document
Deletes the document with the document ID that you specify from the collection. Removes uploaded documents from the collection permanently. If you delete a document that was added by crawling an external data source, the document will be added again with the next scheduled crawl of the data source. The delete function removes the document from the collection, not from the external data source.
Note: Files such as CSV or JSON files generate subdocuments when they are added to a collection. If you delete a subdocument and then repeat the action that created it, the deleted document is added back into your collection. To remove subdocuments that are generated by an uploaded file, delete the original document instead. You can get the document ID of the original document from the parent_document_id
of the subdocument result.
Deletes the document with the document ID that you specify from the collection. Removes uploaded documents from the collection permanently. If you delete a document that was added by crawling an external data source, the document will be added again with the next scheduled crawl of the data source. The delete function removes the document from the collection, not from the external data source.
Note: Files such as CSV or JSON files generate subdocuments when they are added to a collection. If you delete a subdocument and then repeat the action that created it, the deleted document is added back into your collection. To remove subdocuments that are generated by an uploaded file, delete the original document instead. You can get the document ID of the original document from the parent_document_id
of the subdocument result.
Deletes the document with the document ID that you specify from the collection. Removes uploaded documents from the collection permanently. If you delete a document that was added by crawling an external data source, the document will be added again with the next scheduled crawl of the data source. The delete function removes the document from the collection, not from the external data source.
Note: Files such as CSV or JSON files generate subdocuments when they are added to a collection. If you delete a subdocument and then repeat the action that created it, the deleted document is added back into your collection. To remove subdocuments that are generated by an uploaded file, delete the original document instead. You can get the document ID of the original document from the parent_document_id
of the subdocument result.
Deletes the document with the document ID that you specify from the collection. Removes uploaded documents from the collection permanently. If you delete a document that was added by crawling an external data source, the document will be added again with the next scheduled crawl of the data source. The delete function removes the document from the collection, not from the external data source.
Note: Files such as CSV or JSON files generate subdocuments when they are added to a collection. If you delete a subdocument and then repeat the action that created it, the deleted document is added back into your collection. To remove subdocuments that are generated by an uploaded file, delete the original document instead. You can get the document ID of the original document from the parent_document_id
of the subdocument result.
Deletes the document with the document ID that you specify from the collection. Removes uploaded documents from the collection permanently. If you delete a document that was added by crawling an external data source, the document will be added again with the next scheduled crawl of the data source. The delete function removes the document from the collection, not from the external data source.
Note: Files such as CSV or JSON files generate subdocuments when they are added to a collection. If you delete a subdocument and then repeat the action that created it, the deleted document is added back into your collection. To remove subdocuments that are generated by an uploaded file, delete the original document instead. You can get the document ID of the original document from the parent_document_id
of the subdocument result.
DELETE /v2/projects/{project_id}/collections/{collection_id}/documents/{document_id}
ServiceCall<DeleteDocumentResponse> deleteDocument(DeleteDocumentOptions deleteDocumentOptions)
deleteDocument(params)
delete_document(
self,
project_id: str,
collection_id: str,
document_id: str,
*,
x_watson_discovery_force: bool = None,
**kwargs,
) -> DetailedResponse
DeleteDocument(string projectId, string collectionId, string documentId, bool? xWatsonDiscoveryForce = null)
Request
Use the DeleteDocumentOptions.Builder
to create a DeleteDocumentOptions
object that contains the parameter values for the deleteDocument
method.
Custom Headers
When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default: false
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The deleteDocument options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default: false
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default: false
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default: false
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the document.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
Default: false
curl -X DELETE {auth} "{url}/v2/projects/{project_id}/collections/{collection_id}/documents/{document_id}?version=2023-03-31"
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

var result = service.DeleteDocument(
    projectId: "{project_id}",
    collectionId: "{collection_id}",
    documentId: "{document_id}"
    );

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

DeleteDocumentOptions options = new DeleteDocumentOptions.Builder()
  .projectId("{project_id}")
  .collectionId("{collection_id}")
  .documentId("{document_id}")
  .build();

DeleteDocumentResponse response = discovery.deleteDocument(options).execute().getResult();

System.out.println(response);
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
  collectionId: '{collectionId}',
  documentId: '{documentId}',
};

discovery.deleteDocument(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)

discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

response = discovery.delete_document(
    project_id='{project_id}',
    collection_id='{collection_id}',
    document_id='{document_id}'
).get_result()

print(json.dumps(response, indent=2))
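As the note at the top of this section explains, subdocuments are removed by deleting the original (parent) document. The following Python sketch of that workflow reuses the discovery client from the sample above; it assumes the subdocument's query result exposes a parent_document_id field as described, and the filter value is hypothetical:

# Locate a subdocument that came from an uploaded file, read the ID of
# the parent document that generated it, then delete the parent.
results = discovery.query(
    project_id='{project_id}',
    collection_ids=['{collection_id}'],
    filter='extracted_metadata.filename::"favorites.json"'
).get_result()['results']

if results:
    # Field name per the note above; verify it against your own results.
    parent_id = results[0].get('parent_document_id')
    if parent_id:
        response = discovery.delete_document(
            project_id='{project_id}',
            collection_id='{collection_id}',
            document_id=parent_id
        ).get_result()
        print(response)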
Response
Information returned when a document is deleted.
The unique identifier of the document.
Status of the document. A deleted document has the status deleted.
Possible values: [deleted]
Information returned when a document is deleted.
The unique identifier of the document.
Status of the document. A deleted document has the status deleted.
Possible values: [deleted]
Information returned when a document is deleted.
The unique identifier of the document.
Status of the document. A deleted document has the status deleted.
Possible values: [deleted]
Information returned when a document is deleted.
The unique identifier of the document.
Status of the document. A deleted document has the status deleted.
Possible values: [deleted]
Information returned when a document is deleted.
The unique identifier of the document.
Status of the document. A deleted document has the status deleted.
Possible values: [deleted]
Status Code
The document was successfully deleted.
Forbidden. Returned if you attempt to delete a document in a collection that connects automatically to an external source.
Not found. Returned if the project, collection, or document ID is missing or incorrect.
No Sample Response
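No official sample is published for this response; based on the fields described above, the body has the following shape (values are illustrative only):

{
  "document_id": "f1360220-ea2d-4271-9d62-89a910b13c37",
  "status": "deleted"
}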
Query a project
Search your data by submitting queries that are written in natural language or formatted in the Discovery Query Language. For more information, see the Discovery documentation. The default query parameters differ by project type. For more information about the project default settings, see the Discovery documentation. See the Projects API documentation for details about how to set custom default query settings.
The length of the UTF-8 encoding of the POST body cannot exceed 10,000 bytes, which is roughly equivalent to 10,000 characters in English.
Search your data by submitting queries that are written in natural language or formatted in the Discovery Query Language. For more information, see the Discovery documentation. The default query parameters differ by project type. For more information about the project default settings, see the Discovery documentation. See the Projects API documentation for details about how to set custom default query settings.
The length of the UTF-8 encoding of the POST body cannot exceed 10,000 bytes, which is roughly equivalent to 10,000 characters in English.
Search your data by submitting queries that are written in natural language or formatted in the Discovery Query Language. For more information, see the Discovery documentation. The default query parameters differ by project type. For more information about the project default settings, see the Discovery documentation. See the Projects API documentation for details about how to set custom default query settings.
The length of the UTF-8 encoding of the POST body cannot exceed 10,000 bytes, which is roughly equivalent to 10,000 characters in English.
Search your data by submitting queries that are written in natural language or formatted in the Discovery Query Language. For more information, see the Discovery documentation. The default query parameters differ by project type. For more information about the project default settings, see the Discovery documentation. See the Projects API documentation for details about how to set custom default query settings.
The length of the UTF-8 encoding of the POST body cannot exceed 10,000 bytes, which is roughly equivalent to 10,000 characters in English.
Search your data by submitting queries that are written in natural language or formatted in the Discovery Query Language. For more information, see the Discovery documentation. The default query parameters differ by project type. For more information about the project default settings, see the Discovery documentation. See the Projects API documentation for details about how to set custom default query settings.
The length of the UTF-8 encoding of the POST body cannot exceed 10,000 bytes, which is roughly equivalent to 10,000 characters in English.
POST /v2/projects/{project_id}/query
ServiceCall<QueryResponse> query(QueryOptions queryOptions)
query(params)
query(
self,
project_id: str,
*,
collection_ids: List[str] = None,
filter: str = None,
query: str = None,
natural_language_query: str = None,
aggregation: str = None,
count: int = None,
return_: List[str] = None,
offset: int = None,
sort: str = None,
highlight: bool = None,
spelling_suggestions: bool = None,
table_results: 'QueryLargeTableResults' = None,
suggested_refinements: 'QueryLargeSuggestedRefinements' = None,
passages: 'QueryLargePassages' = None,
similar: 'QueryLargeSimilar' = None,
**kwargs,
) -> DetailedResponse
Query(string projectId, List<string> collectionIds = null, string filter = null, string query = null, string naturalLanguageQuery = null, string aggregation = null, long? count = null, List<string> _return = null, long? offset = null, string sort = null, bool? highlight = null, bool? spellingSuggestions = null, QueryLargeTableResults tableResults = null, QueryLargeSuggestedRefinements suggestedRefinements = null, QueryLargePassages passages = null, QueryLargeSimilar similar = null)
Request
Use the QueryOptions.Builder
to create a QueryOptions
object that contains the parameter values for the query
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
An object that represents the query to be submitted.
A comma-separated list of collection IDs to be queried against.
Searches for documents that match the Discovery Query Language criteria that is specified as input. Filter calls are cached and are faster than query calls because the results are not ordered by relevance. When used with the aggregation, query, or natural_language_query parameters, the filter parameter runs first. This parameter is useful for limiting results to those that contain specific metadata values.
A query search that is written in the Discovery Query Language and returns all matching documents in your data set with full enrichments and full text, and with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You can use this parameter or the natural_language_query parameter to specify the query input, but not both.
A natural language query that returns relevant documents by using training data and natural language understanding. You can use this parameter or the query parameter to specify the query input, but not both. To filter the results based on criteria you specify, include the filter parameter in the request.
Possible values: 1 ≤ length ≤ 2048
An aggregation search that returns an exact answer by combining query search with filters. Useful for applications to build lists, tables, and time series. For more information about the supported types of aggregations, see the Discovery documentation.
Number of results to return.
A list of the fields in the document hierarchy to return. You can specify both root-level (text) and nested (extracted_metadata.filename) fields. If this parameter is an empty list, then all fields are returned.
The number of query results to skip at the beginning. Consider that the count is set to 10 (the default value) and the total number of results that are returned is 100. In this case, the following examples show the returned results for different offset values:
- If offset is set to 95, it returns the last 5 results.
- If offset is set to 10, it returns the second batch of 10 results.
- If offset is set to 100 or more, it returns empty results.
A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.
When true, a highlight field is returned for each result that contains fields that match the query. The matching query terms are emphasized with surrounding <em></em> tags. This parameter is ignored if passages.enabled and passages.per_document are true, in which case passages are returned for each document instead of highlights.
When true and the natural_language_query parameter is used, the natural_language_query parameter is spell checked. The most likely correction is returned in the suggested_query field of the response (if one exists).
Configuration for table retrieval.
- table_results
Whether to enable table retrieval.
Maximum number of tables to return.
Configuration for suggested refinements.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- suggested_refinements
Whether to perform suggested refinements.
Maximum number of suggested refinements texts to be returned. The maximum is 100.
Possible values: 1 ≤ value ≤ 100
Configuration for passage retrieval.
- passages
A passages query that returns the most relevant passages from the results.
If true, ranks the documents by document quality, and then returns the highest-ranked passages per document in a document_passages field for each document entry in the results list of the response. If false, ranks the passages from all of the documents by passage quality regardless of the document quality and returns them in a separate passages field in the response.
Maximum number of passages to return per document in the result. Ignored if passages.per_document is false.
A list of fields to extract passages from. By default, passages are extracted from the text and title fields only. If you add this parameter and specify an empty list ([]) as its value, then the service searches all root-level fields for suitable passages.
The maximum number of passages to return. Ignored if passages.per_document is true.
Possible values: value ≤ 400
The approximate number of characters that any one passage will have.
Possible values: 50 ≤ value ≤ 2000
When true, answer objects are returned as part of each passage in the query results. The primary difference between an answer and a passage is that the length of a passage is defined by the query, where the length of an answer is calculated by Discovery based on how much text is needed to answer the question. This parameter is ignored if passages are not enabled for the query, or no natural_language_query is specified.
If the find_answers parameter is set to true and the per_document parameter is also set to true, then the document search results and the passage search results within each document are reordered using the answer confidences. The goal of this reordering is to place the best answer as the first answer of the first passage of the first document. Similarly, if the find_answers parameter is set to true and the per_document parameter is set to false, then the passage search results are reordered in decreasing order of the highest confidence answer for each document and passage. The find_answers parameter is available only on managed instances of Discovery.
Default: false
The number of answer objects to return per passage if the find_answers parameter is specified as true.
Default: 1
Finds results from documents that are similar to documents of interest. Use this parameter to add a More like these function to your search. You can include this parameter with or without a query, filter, or natural_language_query parameter.
- similar
When true, includes documents in the query results that are similar to documents you specify.
Default: false
The list of documents of interest. Required if enabled is true.
Looks for similarities in the specified subset of fields in the documents. If not specified, all of the document fields are used.
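As a concrete illustration of the body parameters above, here is a minimal Python sketch of a query call, reusing a discovery client configured as in the earlier samples (the query text and filter value are hypothetical):

import json

# Natural language query over one collection, filtered to PDFs, with
# per-document passage retrieval; mirrors the parameters described above.
response = discovery.query(
    project_id='{project_id}',
    collection_ids=['{collection_id}'],
    natural_language_query='What is the process for filing a claim?',
    filter='extracted_metadata.file_type::pdf',
    count=5,
    passages={'enabled': True, 'per_document': True}
).get_result()

print(json.dumps(response, indent=2))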
The query options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
A comma-separated list of collection IDs to be queried against.
Searches for documents that match the Discovery Query Language criteria that is specified as input. Filter calls are cached and are faster than query calls because the results are not ordered by relevance. When used with the aggregation, query, or natural_language_query parameters, the filter parameter runs first. This parameter is useful for limiting results to those that contain specific metadata values.
A query search that is written in the Discovery Query Language and returns all matching documents in your data set with full enrichments and full text, and with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You can use this parameter or the natural_language_query parameter to specify the query input, but not both.
A natural language query that returns relevant documents by using training data and natural language understanding. You can use this parameter or the query parameter to specify the query input, but not both. To filter the results based on criteria you specify, include the filter parameter in the request.
Possible values: 1 ≤ length ≤ 2048
An aggregation search that returns an exact answer by combining query search with filters. Useful for applications to build lists, tables, and time series. For more information about the supported types of aggregations, see the Discovery documentation.
Number of results to return.
A list of the fields in the document hierarchy to return. You can specify both root-level (text) and nested (extracted_metadata.filename) fields. If this parameter is an empty list, then all fields are returned.
The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results.
A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with
-
for descending or+
for ascending. Ascending is the default sort direction if no prefix is specified.When
true
, a highlight field is returned for each result that contains fields that match the query. The matching query terms are emphasized with surrounding<em></em>
tags. This parameter is ignored if passages.enabled and passages.per_document aretrue
, in which case passages are returned for each document instead of highlights.When
true
and the natural_language_query parameter is used, the natural_language_query parameter is spell checked. The most likely correction is returned in the suggested_query field of the response (if one exists).Configuration for table retrieval.
- tableResults
Whether to enable table retrieval.
Maximum number of tables to return.
Configuration for suggested refinements.
Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- suggestedRefinements
Whether to perform suggested refinements.
Maximum number of suggested refinements texts to be returned. The maximum is
100
.Possible values: 1 ≤ value ≤ 100
Configuration for passage retrieval.
- passages
A passages query that returns the most relevant passages from the results.
If true, ranks the documents by document quality, and then returns the highest-ranked passages per document in a document_passages field for each document entry in the results list of the response.
If false, ranks the passages from all of the documents by passage quality regardless of the document quality and returns them in a separate passages field in the response.
Maximum number of passages to return per document in the result. Ignored if passages.per_document is false.
A list of fields to extract passages from. By default, passages are extracted from the text and title fields only. If you add this parameter and specify an empty list ([]) as its value, then the service searches all root-level fields for suitable passages.
The maximum number of passages to return. Ignored if passages.per_document is true.
Possible values: value ≤ 400
The approximate number of characters that any one passage will have.
Possible values: 50 ≤ value ≤ 2000
When true, answer objects are returned as part of each passage in the query results. The primary difference between an answer and a passage is that the length of a passage is defined by the query, whereas the length of an answer is calculated by Discovery based on how much text is needed to answer the question. This parameter is ignored if passages are not enabled for the query, or if no natural_language_query is specified.
If the find_answers parameter is set to true and the per_document parameter is also set to true, then the document search results and the passage search results within each document are reordered using the answer confidences. The goal of this reordering is to place the best answer as the first answer of the first passage of the first document. Similarly, if the find_answers parameter is set to true and the per_document parameter is set to false, then the passage search results are reordered in decreasing order of the highest confidence answer for each document and passage.
The find_answers parameter is available only on managed instances of Discovery.
Default: false
The number of answer objects to return per passage if the find_answers parameter is specified as true.
Default: 1
Finds results from documents that are similar to documents of interest. Use this parameter to add a More like these function to your search. You can include this parameter with or without a query, filter, or natural_language_query parameter.
- similar
When true, includes documents in the query results that are similar to documents you specify.
Default: false
The list of documents of interest. Required if enabled is true.
Looks for similarities in the specified subset of fields in the documents. If not specified, all of the document fields are used.
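For example, here is a minimal sketch of a query that combines several of these parameters, assuming the Python SDK shown in the examples below, a managed IBM Cloud instance (IAMAuthenticator), and placeholder credentials and IDs; the filter and aggregation field names are illustrative assumptions, not values from your project.

import json

from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson import DiscoveryV2
from ibm_watson.discovery_v2 import QueryLargePassages

# Placeholder values ({apikey}, {url}, {project_id}, {collection_id})
# are assumptions; substitute your own.
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2020-08-30', authenticator=authenticator)
discovery.set_service_url('{url}')

response = discovery.query(
    project_id='{project_id}',
    collection_ids=['{collection_id}'],
    # filter runs first, is cached, and is not ranked by relevance.
    filter='enriched_text.entities.text:IBM',
    # natural_language_query results are ranked by relevance and can be
    # spell checked; use query= instead for Discovery Query Language input.
    natural_language_query='What is the warranty period?',
    spelling_suggestions=True,
    count=5,
    # term(field,count:n) is a Discovery Query Language aggregation.
    aggregation='term(enriched_text.entities.text,count:10)',
    passages=QueryLargePassages(
        enabled=True,
        per_document=True,        # return top passages grouped per document
        max_per_document=3,
        find_answers=True,        # managed instances only
        max_answers_per_passage=1,
    ),
).get_result()

print(json.dumps(response, indent=2))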
curl -X POST {auth} --header "Content-Type: application/json" --data "{ \"collection_ids\": [ \"{collection_id_1}\", \"{collection_id_2}\" ], \"query\": \"text:IBM\" }" "{url}/v2/projects/{project_id}/query?version=2023-03-31"
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

var result = service.Query(
    projectId: "{project_id}",
    query: "{field}:{value}"
    );

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

QueryOptions options = new QueryOptions.Builder()
  .projectId("{project_id}")
  .query("{field}:{value}")
  .build();

QueryResponse response = discovery.query(options).execute().getResult();

System.out.println(response);
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
  query: '{field}:{value}',
};

discovery.query(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)

discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

response = discovery.query(
    project_id='{project_id}',
    query='{field}:{value}'
).get_result()

print(json.dumps(response, indent=2))
Response
A response that contains the documents and aggregations for the query.
The number of matching results for the query. Results that match due to a curation only are not counted in the total.
Array of document results for the query.
- results
The remaining key-value pairs
Array of aggregations for the query.
Possible values: 1 ≤ number of items ≤ 50000
- aggregations
An object that contains retrieval type information.
Suggested correction to the submitted natural_language_query value.
Array of suggested refinements. Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
Array of table results.
Passages that best match the query from across all of the collections in the project. Returned if passages.per_document is false.
A response that contains the documents and aggregations for the query.
{
"matching_results": 24,
"retrieval_details": {
"document_retrieval_strategy": "untrained"
},
"results": [
{
"id": "watson-generated ID"
}
],
"aggregations": [
{
"type": "term",
"field": "field",
"count": 1,
"results": [
{
"key": "active",
"matching_results": 34
}
]
}
]
}
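Before the field-by-field reference that follows, here is a short sketch of how an application might walk a response shaped like the example above. Field names such as document_id, document_passages, and passage_text follow the schema described below; the abbreviated example's id key is handled as a fallback, and this is an illustrative assumption rather than library code.

# Sketch only: 'response' is the dict returned by discovery.query(...).get_result().
def summarize(response):
    # Curation-only matches are not counted in matching_results.
    print('matching results:', response.get('matching_results', 0))

    for result in response.get('results', []):
        # The live API returns document_id; the abbreviated example uses id.
        print('document:', result.get('document_id') or result.get('id'))
        # Present only when passages.per_document is true.
        for passage in result.get('document_passages', []):
            print('  passage:', passage.get('passage_text', '')[:80])

    # A term aggregation lists each key with its matching_results count.
    for agg in response.get('aggregations', []):
        if agg.get('type') == 'term':
            for bucket in agg.get('results', []):
                print('facet:', bucket.get('key'), bucket.get('matching_results'))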
The number of matching results for the query. Results that match due to a curation only are not counted in the total.
Array of document results for the query.
- results
The unique identifier of the document.
Metadata of the document.
Metadata of a query result.
- resultMetadata
The document retrieval source that produced this search result.
Possible values: [search, curation]
The collection id associated with this training data set.
The confidence score for the given result. Calculated based on how relevant the result is estimated to be. The score can range from 0.0 to 1.0. The higher the number, the more relevant the document. The confidence value for a result was calculated using the model specified in the document_retrieval_strategy field of the result set. This field is returned only if the natural_language_query parameter is specified in the query.
Possible values: 0 ≤ value ≤ 1
Passages from the document that best match the query. Returned if passages.per_document is true.
- documentPassages
The content of the extracted passage.
The position of the first character of the extracted passage in the originating field.
The position after the last character of the extracted passage in the originating field.
The label of the field from which the passage has been extracted.
An array of extracted answers to the specified query. Returned for natural language queries when passages.per_document is true.
- answers
Answer text for the specified query as identified by Discovery.
The position of the first character of the extracted answer in the originating field.
The position after the last character of the extracted answer in the originating field.
An estimate of the probability that the answer is relevant.
Possible values: 0 ≤ value ≤ 1
Array of aggregations for the query.
Possible values: 1 ≤ number of items ≤ 50000
- aggregations
Returns results from the field that is specified.
- QueryAggregation
Specifies that the aggregation type is term.
The field in the document where the values come from.
The number of results returned. Not returned if relevancy:true is specified in the request.
Identifier specified in the query request of this aggregation. Not returned if relevancy:true is specified in the request.
An array of results.
- results
Value of the field with a nonzero frequency in the document set.
Number of documents that contain the 'key'.
The relevancy score for this result. Returned only if relevancy:true is specified in the request.
Number of documents in the collection that contain the term in the specified field. Returned only when relevancy:true is specified in the request.
Number of documents that are estimated to match the query and also meet the condition. Returned only when relevancy:true is specified in the request.
An array of subaggregations. Returned only when this aggregation is combined with other aggregations in the request or is returned as a subaggregation.
An object that contains retrieval type information.
- retrievalDetails
Identifies the document retrieval strategy used for this query. relevancy_training indicates that the results were returned using a relevancy trained model.
Note: In the event of trained collections being queried, but the trained model is not used to return results, the document_retrieval_strategy is listed as untrained.
Possible values: [untrained, relevancy_training]
Suggested correction to the submitted natural_language_query value.
Array of suggested refinements. Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- suggestedRefinements
The text used to filter.
Array of table results.
- tableResults
The identifier for the retrieved table.
The identifier of the document the table was retrieved from.
The identifier of the collection the table was retrieved from.
HTML snippet of the table info.
The offset of the table html snippet in the original document html.
Full table object retrieved from Table Understanding Enrichment.
- table
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- location
The element's begin index.
The element's end index.
The textual contents of the current table from the input document without associated markup content.
Text and associated location within a table.
- sectionTitle
The text retrieved.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- location
The element's begin index.
The element's end index.
Text and associated location within a table.
- title
The text retrieved.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- location
The element's begin index.
The element's end index.
An array of table-level cells that apply as headers to all the other cells in the current table.
- tableHeaders
The unique ID of the cell in the current table.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- location
The element's begin index.
The element's end index.
The textual contents of the cell from the input document without associated markup content.
The begin index of this cell's row location in the current table.
The end index of this cell's row location in the current table.
The begin index of this cell's column location in the current table.
The end index of this cell's column location in the current table.
An array of row-level cells, each applicable as a header to other cells in the same row as itself, of the current table.
- rowHeaders
The unique ID of the cell in the current table.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- location
The element's begin index.
The element's end index.
The textual contents of this cell from the input document without associated markup content.
Normalized row header text.
The begin index of this cell's row location in the current table.
The end index of this cell's row location in the current table.
The begin index of this cell's column location in the current table.
The end index of this cell's column location in the current table.
An array of column-level cells, each applicable as a header to other cells in the same column as itself, of the current table.
- columnHeaders
The unique ID of the cell in the current table.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- location
The element's begin index.
The element's end index.
The textual contents of this cell from the input document without associated markup content.
Normalized column header text.
The begin index of this cell's row location in the current table.
The end index of this cell's row location in the current table.
The begin index of this cell's column location in the current table.
The end index of this cell's column location in the current table.
An array of key-value pairs identified in the current table.
- keyValuePairs
A key in a key-value pair.
- key
The unique ID of the key in the table.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- location
The element's begin index.
The element's end index.
The text content of the table cell without HTML markup.
A list of values in a key-value pair.
- value
The unique ID of the value in the table.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- location
The element's begin index.
The element's end index.
The text content of the table cell without HTML markup.
An array of cells that are neither table header nor column header nor row header cells, of the current table with corresponding row and column header associations.
- bodyCells
The unique ID of the cell in the current table.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- location
The element's begin index.
The element's end index.
The textual contents of this cell from the input document without associated markup content.
The begin index of this cell's row location in the current table.
The end index of this cell's row location in the current table.
The begin index of this cell's column location in the current table.
The end index of this cell's column location in the current table.
A list of ID values that represent the table row headers that are associated with this body cell.
A list of row header values that are associated with this body cell.
A list of normalized row header values that are associated with this body cell.
A list of ID values that represent the column headers that are associated with this body cell.
A list of column header values that are associated with this body cell.
A list of normalized column header values that are associated with this body cell.
A list of document attributes.
- attributes
The type of attribute.
The text associated with the attribute.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- location
The element's begin index.
The element's end index.
An array of lists of textual entries across the document related to the current table being parsed.
- contexts
The text retrieved.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- location
The element's begin index.
The element's end index.
Passages that best match the query from across all of the collections in the project. Returned if passages.per_document is false.
- passages
The content of the extracted passage.
The confidence score of the passage's analysis. A higher score indicates greater confidence. The score is used to rank the passages from all documents and is returned only if passages.per_document is false.
The unique identifier of the ingested document.
The unique identifier of the collection.
The position of the first character of the extracted passage in the originating field.
The position after the last character of the extracted passage in the originating field.
The label of the field from which the passage has been extracted.
An array of extracted answers to the specified query. Returned for natural language queries when passages.per_document is false.
- answers
Answer text for the specified query as identified by Discovery.
The position of the first character of the extracted answer in the originating field.
The position after the last character of the extracted answer in the originating field.
An estimate of the probability that the answer is relevant.
Possible values: 0 ≤ value ≤ 1
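As a closing illustration of the table result schema above, here is a sketch that pairs each body cell with the texts of its associated row and column headers. Field names such as cell_id, row_header_ids, and column_header_ids follow the snake_case JSON that the REST API returns; this is an assumption-laden example, not library code.

# Sketch only: reconstruct labeled cells from one entry of table_results.
def labeled_cells(table):
    # Index header cells by their unique cell IDs.
    row_headers = {h['cell_id']: h['text'] for h in table.get('row_headers', [])}
    col_headers = {h['cell_id']: h['text'] for h in table.get('column_headers', [])}

    for cell in table.get('body_cells', []):
        rows = [row_headers.get(i, i) for i in cell.get('row_header_ids', [])]
        cols = [col_headers.get(i, i) for i in cell.get('column_header_ids', [])]
        yield rows, cols, cell.get('text', '')

for table_result in response.get('table_results', []):
    for rows, cols, text in labeled_cells(table_result.get('table', {})):
        print(rows, cols, '->', text)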
The number of matching results for the query. Results that match due to a curation only are not counted in the total.
Array of document results for the query.
- results
The unique identifier of the document.
Metadata of the document.
Metadata of a query result.
- result_metadata
The document retrieval source that produced this search result.
Possible values: [
search
,curation
]The collection id associated with this training data set.
The confidence score for the given result. Calculated based on how relevant the result is estimated to be. The score can range from
0.0
to1.0
. The higher the number, the more relevant the document. Theconfidence
value for a result was calculated using the model specified in thedocument_retrieval_strategy
field of the result set. This field is returned only if the natural_language_query parameter is specified in the query.Possible values: 0 ≤ value ≤ 1
Passages from the document that best matches the query. Returned if passages.per_document is
true
.- document_passages
The content of the extracted passage.
The position of the first character of the extracted passage in the originating field.
The position after the last character of the extracted passage in the originating field.
The label of the field from which the passage has been extracted.
An arry of extracted answers to the specified query. Returned for natural language queries when passages.per_document is
true
.- answers
Answer text for the specified query as identified by Discovery.
The position of the first character of the extracted answer in the originating field.
The position after the last character of the extracted answer in the originating field.
An estimate of the probability that the answer is relevant.
Possible values: 0 ≤ value ≤ 1
Array of aggregations for the query.
Possible values: 1 ≤ number of items ≤ 50000
- aggregations
Returns results from the field that is specified.
- QueryAggregation
Specifies that the aggregation type is
term
.The field in the document where the values come from.
The number of results returned. Not returned if
relevancy:true
is specified in the request.Identifier specified in the query request of this aggregation. Not returned if
relevancy:true
is specified in the request.An array of results.
- results
Value of the field with a nonzero frequency in the document set.
Number of documents that contain the 'key'.
The relevancy score for this result. Returned only if
relevancy:true
is specified in the request.Number of documents in the collection that contain the term in the specified field. Returned only when
relevancy:true
is specified in the request.Number of documents that are estimated to match the query and also meet the condition. Returned only when
relevancy:true
is specified in the request.An array of subaggregations. Returned only when this aggregation is combined with other aggregations in the request or is returned as a subaggregation.
An object contain retrieval type information.
- retrieval_details
Identifies the document retrieval strategy used for this query.
relevancy_training
indicates that the results were returned using a relevancy trained model.Note: In the event of trained collections being queried, but the trained model is not used to return results, the document_retrieval_strategy is listed as
untrained
.Possible values: [
untrained
,relevancy_training
]
Suggested correction to the submitted natural_language_query value.
Array of suggested refinements. Note: The
suggested_refinements
parameter that identified dynamic facets from the data is deprecated.- suggested_refinements
The text used to filter.
Array of table results.
- table_results
The identifier for the retrieved table.
The identifier of the document the table was retrieved from.
The identifier of the collection the table was retrieved from.
HTML snippet of the table info.
The offset of the table html snippet in the original document html.
Full table object retrieved from Table Understanding Enrichment.
- table
The numeric location of the identified element in the document, represented with two integers labeled
begin
andend
.- location
The element's
begin
index.The element's
end
index.
The textual contents of the current table from the input document without associated markup content.
Text and associated location within a table.
- section_title
The text retrieved.
The numeric location of the identified element in the document, represented with two integers labeled
begin
andend
.- location
The element's
begin
index.The element's
end
index.
Text and associated location within a table.
- title
The text retrieved.
The numeric location of the identified element in the document, represented with two integers labeled
begin
andend
.- location
The element's
begin
index.The element's
end
index.
An array of table-level cells that apply as headers to all the other cells in the current table.
- table_headers
The unique ID of the cell in the current table.
The numeric location of the identified element in the document, represented with two integers labeled
begin
andend
.- location
The element's
begin
index.The element's
end
index.
The textual contents of the cell from the input document without associated markup content.
The
begin
index of this cell'srow
location in the current table.The
end
index of this cell'srow
location in the current table.The
begin
index of this cell'scolumn
location in the current table.The
end
index of this cell'scolumn
location in the current table.
An array of row-level cells, each applicable as a header to other cells in the same row as itself, of the current table.
- row_headers
The unique ID of the cell in the current table.
The numeric location of the identified element in the document, represented with two integers labeled
begin
andend
.- location
The element's
begin
index.The element's
end
index.
The textual contents of this cell from the input document without associated markup content.
Normalized row header text.
The
begin
index of this cell'srow
location in the current table.The
end
index of this cell'srow
location in the current table.The
begin
index of this cell'scolumn
location in the current table.The
end
index of this cell'scolumn
location in the current table.
An array of column-level cells, each applicable as a header to other cells in the same column as itself, of the current table.
- column_headers
The unique ID of the cell in the current table.
The numeric location of the identified element in the document, represented with two integers labeled
begin
andend
.- location
The element's
begin
index.The element's
end
index.
The textual contents of this cell from the input document without associated markup content.
Normalized column header text.
The
begin
index of this cell'srow
location in the current table.The
end
index of this cell'srow
location in the current table.The
begin
index of this cell'scolumn
location in the current table.The
end
index of this cell'scolumn
location in the current table.
An array of key-value pairs identified in the current table.
- key_value_pairs
A key in a key-value pair.
- key
The unique ID of the key in the table.
The numeric location of the identified element in the document, represented with two integers labeled
begin
andend
.- location
The element's
begin
index.The element's
end
index.
The text content of the table cell without HTML markup.
A list of values in a key-value pair.
- value
The unique ID of the value in the table.
The numeric location of the identified element in the document, represented with two integers labeled
begin
andend
.- location
The element's
begin
index.The element's
end
index.
The text content of the table cell without HTML markup.
An array of cells that are neither table header nor column header nor row header cells, of the current table with corresponding row and column header associations.
- body_cells
The unique ID of the cell in the current table.
The numeric location of the identified element in the document, represented with two integers labeled
begin
andend
.- location
The element's
begin
index.The element's
end
index.
The textual contents of this cell from the input document without associated markup content.
The
begin
index of this cell'srow
location in the current table.The
end
index of this cell'srow
location in the current table.The
begin
index of this cell'scolumn
location in the current table.The
end
index of this cell'scolumn
location in the current table.A list of ID values that represent the table row headers that are associated with this body cell.
A list of row header values that are associated with this body cell.
A list of normalized row header values that are associated with this body cell.
A list of ID values that represent the column headers that are associated with this body cell.
A list of column header values that are associated with this body cell.
A list of normalized column header values that are associated with this body cell.
A list of document attributes.
- attributes
The type of attribute.
The text associated with the attribute.
The numeric location of the identified element in the document, represented with two integers labeled
begin
andend
.- location
The element's
begin
index.The element's
end
index.
An array of lists of textual entries across the document related to the current table being parsed.
- contexts
The text retrieved.
The numeric location of the identified element in the document, represented with two integers labeled
begin
andend
.- location
The element's
begin
index.The element's
end
index.
Passages that best match the query from across all of the collections in the project. Returned if passages.per_document is
false
.- passages
The content of the extracted passage.
The confidence score of the passage's analysis. A higher score indicates greater confidence. The score is used to rank the passages from all documents and is returned only if passages.per_document is
false
.The unique identifier of the ingested document.
The unique identifier of the collection.
The position of the first character of the extracted passage in the originating field.
The position after the last character of the extracted passage in the originating field.
The label of the field from which the passage has been extracted.
An array of extracted answers to the specified query. Returned for natural language queries when passages.per_document is
false
.- answers
Answer text for the specified query as identified by Discovery.
The position of the first character of the extracted answer in the originating field.
The position after the last character of the extracted answer in the originating field.
An estimate of the probability that the answer is relevant.
Possible values: 0 ≤ value ≤ 1
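To receive these project-wide passages and answers, per-document passages must be disabled in the query. A minimal Python sketch, assuming an authenticated discovery client configured as in the SDK examples elsewhere in this reference and the Python SDK's QueryLargePassages model (the placeholder IDs and question text are illustrative only):
from ibm_watson.discovery_v2 import QueryLargePassages

# Sketch only: request passages ranked across all collections in the
# project (per_document=False) and ask Discovery to extract answers.
response = discovery.query(
    project_id='{project_id}',
    natural_language_query='example question',
    passages=QueryLargePassages(
        enabled=True,
        per_document=False,
        find_answers=True
    )
).get_result()

for passage in response.get('passages', []):
    print(passage.get('passage_score'), passage.get('passage_text'))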
A response that contains the documents and aggregations for the query.
{
"matching_results": 24,
"retrieval_details": {
"document_retrieval_strategy": "untrained"
},
"results": [
{
"id": "watson-generated ID"
}
],
"aggregations": [
{
"type": "term",
"field": "field",
"count": 1,
"results": [
{
"key": "active",
"matching_results": 34
}
]
}
]
}
The number of matching results for the query. Results that match due to a curation only are not counted in the total.
Array of document results for the query.
- Results
The unique identifier of the document.
Metadata of the document.
Metadata of a query result.
- ResultMetadata
The document retrieval source that produced this search result.
Possible values: [search, curation]
The collection id associated with this training data set.
The confidence score for the given result. Calculated based on how relevant the result is estimated to be. The score can range from 0.0 to 1.0. The higher the number, the more relevant the document. The confidence value for a result was calculated using the model specified in the document_retrieval_strategy field of the result set. This field is returned only if the natural_language_query parameter is specified in the query.
Possible values: 0 ≤ value ≤ 1
Passages from the document that best matches the query. Returned if passages.per_document is true.
- DocumentPassages
The content of the extracted passage.
The position of the first character of the extracted passage in the originating field.
The position after the last character of the extracted passage in the originating field.
The label of the field from which the passage has been extracted.
An array of extracted answers to the specified query. Returned for natural language queries when passages.per_document is true.
- Answers
Answer text for the specified query as identified by Discovery.
The position of the first character of the extracted answer in the originating field.
The position after the last character of the extracted answer in the originating field.
An estimate of the probability that the answer is relevant.
Possible values: 0 ≤ value ≤ 1
Array of aggregations for the query.
Possible values: 1 ≤ number of items ≤ 50000
- Aggregations
Returns results from the field that is specified.
- QueryAggregation
Specifies that the aggregation type is term.
The field in the document where the values come from.
The number of results returned. Not returned if relevancy:true is specified in the request.
Identifier specified in the query request of this aggregation. Not returned if relevancy:true is specified in the request.
An array of results.
- Results
Value of the field with a nonzero frequency in the document set.
Number of documents that contain the 'key'.
The relevancy score for this result. Returned only if relevancy:true is specified in the request.
Number of documents in the collection that contain the term in the specified field. Returned only when relevancy:true is specified in the request.
Number of documents that are estimated to match the query and also meet the condition. Returned only when relevancy:true is specified in the request.
An array of subaggregations. Returned only when this aggregation is combined with other aggregations in the request or is returned as a subaggregation.
An object that contains retrieval type information.
- RetrievalDetails
Identifies the document retrieval strategy used for this query. relevancy_training indicates that the results were returned using a relevancy trained model.
Note: If trained collections are queried but the trained model is not used to return results, the document_retrieval_strategy is listed as untrained.
Possible values: [untrained, relevancy_training]
Suggested correction to the submitted natural_language_query value.
Array of suggested refinements. Note: The suggested_refinements parameter that identified dynamic facets from the data is deprecated.
- SuggestedRefinements
The text used to filter.
Array of table results.
- TableResults
The identifier for the retrieved table.
The identifier of the document the table was retrieved from.
The identifier of the collection the table was retrieved from.
HTML snippet of the table info.
The offset of the table HTML snippet in the original document HTML.
Full table object retrieved from Table Understanding Enrichment.
- Table
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- Location
The element's begin index.
The element's end index.
The textual contents of the current table from the input document without associated markup content.
Text and associated location within a table.
- SectionTitle
The text retrieved.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- Location
The element's begin index.
The element's end index.
Text and associated location within a table.
- Title
The text retrieved.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- Location
The element's begin index.
The element's end index.
An array of table-level cells that apply as headers to all the other cells in the current table.
- TableHeaders
The unique ID of the cell in the current table.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- Location
The element's begin index.
The element's end index.
The textual contents of the cell from the input document without associated markup content.
The begin index of this cell's row location in the current table.
The end index of this cell's row location in the current table.
The begin index of this cell's column location in the current table.
The end index of this cell's column location in the current table.
An array of row-level cells, each applicable as a header to other cells in the same row as itself, of the current table.
- RowHeaders
The unique ID of the cell in the current table.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- Location
The element's begin index.
The element's end index.
The textual contents of this cell from the input document without associated markup content.
Normalized row header text.
The begin index of this cell's row location in the current table.
The end index of this cell's row location in the current table.
The begin index of this cell's column location in the current table.
The end index of this cell's column location in the current table.
An array of column-level cells, each applicable as a header to other cells in the same column as itself, of the current table.
- ColumnHeaders
The unique ID of the cell in the current table.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- Location
The element's begin index.
The element's end index.
The textual contents of this cell from the input document without associated markup content.
Normalized column header text.
The begin index of this cell's row location in the current table.
The end index of this cell's row location in the current table.
The begin index of this cell's column location in the current table.
The end index of this cell's column location in the current table.
An array of key-value pairs identified in the current table.
- KeyValuePairs
A key in a key-value pair.
- Key
The unique ID of the key in the table.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- Location
The element's begin index.
The element's end index.
The text content of the table cell without HTML markup.
A list of values in a key-value pair.
- Value
The unique ID of the value in the table.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- Location
The element's begin index.
The element's end index.
The text content of the table cell without HTML markup.
An array of cells that are neither table header nor column header nor row header cells, of the current table with corresponding row and column header associations.
- BodyCells
The unique ID of the cell in the current table.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- Location
The element's begin index.
The element's end index.
The textual contents of this cell from the input document without associated markup content.
The begin index of this cell's row location in the current table.
The end index of this cell's row location in the current table.
The begin index of this cell's column location in the current table.
The end index of this cell's column location in the current table.
A list of ID values that represent the table row headers that are associated with this body cell.
A list of row header values that are associated with this body cell.
A list of normalized row header values that are associated with this body cell.
A list of ID values that represent the column headers that are associated with this body cell.
A list of column header values that are associated with this body cell.
A list of normalized column header values that are associated with this body cell.
A list of document attributes.
- Attributes
The type of attribute.
The text associated with the attribute.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- Location
The element's begin index.
The element's end index.
An array of lists of textual entries across the document related to the current table being parsed.
- Contexts
The text retrieved.
The numeric location of the identified element in the document, represented with two integers labeled begin and end.
- Location
The element's begin index.
The element's end index.
Passages that best match the query from across all of the collections in the project. Returned if passages.per_document is false.
- Passages
The content of the extracted passage.
The confidence score of the passage's analysis. A higher score indicates greater confidence. The score is used to rank the passages from all documents and is returned only if passages.per_document is false.
The unique identifier of the ingested document.
The unique identifier of the collection.
The position of the first character of the extracted passage in the originating field.
The position after the last character of the extracted passage in the originating field.
The label of the field from which the passage has been extracted.
An array of extracted answers to the specified query. Returned for natural language queries when passages.per_document is false.
- Answers
Answer text for the specified query as identified by Discovery.
The position of the first character of the extracted answer in the originating field.
The position after the last character of the extracted answer in the originating field.
An estimate of the probability that the answer is relevant.
Possible values: 0 ≤ value ≤ 1
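As a concrete illustration of the term aggregation described above, a minimal Python sketch, again assuming an authenticated discovery client (the field name is an illustrative placeholder, not a required value):
# Sketch only: run a term aggregation and read back the buckets.
response = discovery.query(
    project_id='{project_id}',
    aggregation='term(enriched_text.entities.text,count:10)'
).get_result()

for aggregation in response.get('aggregations', []):
    for bucket in aggregation.get('results', []):
        print(bucket['key'], bucket['matching_results'])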
Status Code
Query executed successfully.
Bad request.
- Project has no collections.
- A list of document ids is required in similar.document_ids when similar.enabled is true.
- Bad request.
{ "matching_results": 24, "retrieval_details": { "document_retrieval_strategy": "untrained" }, "results": [ { "id": "watson-generated ID" } ], "aggregations": [ { "type": "term", "field": "field", "count": 1, "results": [ { "key": "active", "matching_results": 34 } ] } ] }
{ "matching_results": 24, "retrieval_details": { "document_retrieval_strategy": "untrained" }, "results": [ { "id": "watson-generated ID" } ], "aggregations": [ { "type": "term", "field": "field", "count": 1, "results": [ { "key": "active", "matching_results": 34 } ] } ] }
Get Autocomplete Suggestions
Returns completion query suggestions for the specified prefix.
Suggested words are based on terms from the project documents. Suggestions are not based on terms from the project's search history, and the project does not learn from previous user choices.
Returns completion query suggestions for the specified prefix.
Suggested words are based on terms from the project documents. Suggestions are not based on terms from the project's search history, and the project does not learn from previous user choices.
Returns completion query suggestions for the specified prefix.
Suggested words are based on terms from the project documents. Suggestions are not based on terms from the project's search history, and the project does not learn from previous user choices.
Returns completion query suggestions for the specified prefix.
Suggested words are based on terms from the project documents. Suggestions are not based on terms from the project's search history, and the project does not learn from previous user choices.
Returns completion query suggestions for the specified prefix.
Suggested words are based on terms from the project documents. Suggestions are not based on terms from the project's search history, and the project does not learn from previous user choices.
GET /v2/projects/{project_id}/autocompletion
ServiceCall<Completions> getAutocompletion(GetAutocompletionOptions getAutocompletionOptions)
getAutocompletion(params)
get_autocompletion(
self,
project_id: str,
prefix: str,
*,
collection_ids: List[str] = None,
field: str = None,
count: int = None,
**kwargs,
) -> DetailedResponse
GetAutocompletion(string projectId, string prefix, List<string> collectionIds = null, string field = null, long? count = null)
Request
Use the GetAutocompletionOptions.Builder to create a GetAutocompletionOptions object that contains the parameter values for the getAutocompletion method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The prefix to use for autocompletion. For example, the prefix Ho could autocomplete to hot, housing, or how.
Comma separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.
The field in the result documents that autocompletion suggestions are identified from.
The number of autocompletion suggestions to return.
Default:
5
The getAutocompletion options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The prefix to use for autocompletion. For example, the prefix Ho could autocomplete to hot, housing, or how.
Comma separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.
The field in the result documents that autocompletion suggestions are identified from.
The number of autocompletion suggestions to return.
Default:
5
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The prefix to use for autocompletion. For example, the prefix Ho could autocomplete to hot, housing, or how.
Comma separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.
The field in the result documents that autocompletion suggestions are identified from.
The number of autocompletion suggestions to return.
Default:
5
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The prefix to use for autocompletion. For example, the prefix Ho could autocomplete to hot, housing, or how.
Comma separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.
The field in the result documents that autocompletion suggestions are identified from.
The number of autocompletion suggestions to return.
Default:
5
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The prefix to use for autocompletion. For example, the prefix Ho could autocomplete to hot, housing, or how.
Comma separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.
The field in the result documents that autocompletion suggestions are identified from.
The number of autocompletion suggestions to return.
Default:
5
curl {auth} "{url}/v2/projects/{project_id}/autocompletion?collection_ids={collection_id_1},{collection_id_2}&prefix=ab&version=2023-03-31"
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
);

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

var result = service.GetAutocompletion(
    projectId: "{project_id}",
    prefix: "Ho",
    count: 5
);

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");

Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

GetAutocompletionOptions options = new GetAutocompletionOptions.Builder()
  .projectId("{project_id}")
  .prefix("Ho")
  .count(5L)
  .build();

Completions response = discovery.getAutocompletion(options).execute().getResult();

System.out.println(response);
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
  prefix: 'Ho',
  count: 5,
};

discovery.getAutocompletion(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)
discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

response = discovery.get_autocompletion(
    project_id='{project_id}',
    prefix='Ho',
    count=5
).get_result()

print(json.dumps(response, indent=2))
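The examples above authenticate against Cloud Pak for Data. For managed instances on IBM Cloud, the same request can be authenticated with an IAM API key; a minimal Python sketch (the {apikey} and {url} placeholders stand in for your own service credentials):
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Sketch only: IAM authentication for IBM Cloud instances.
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

response = discovery.get_autocompletion(
    project_id='{project_id}',
    prefix='Ho',
    count=5
).get_result()
print(response)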
Response
An object that contains an array of autocompletion suggestions.
Array of autocomplete suggestions based on the provided prefix.
An object that contains an array of autocompletion suggestions.
Array of autocomplete suggestions based on the provided prefix.
An object that contains an array of autocompletion suggestions.
Array of autocomplete suggestions based on the provided prefix.
An object that contains an array of autocompletion suggestions.
Array of autocomplete suggestions based on the provided prefix.
An object that contains an array of autocompletion suggestions.
Array of autocomplete suggestions based on the provided prefix.
Status Code
Object that contains an array of possible completions.
The specified field does not exist.
{ "completions": [ "absolutely" ] }
{ "completions": [ "absolutely" ] }
Query collection notices
Finds collection-level notices (errors and warnings) that are generated when documents are ingested.
Finds collection-level notices (errors and warnings) that are generated when documents are ingested.
Finds collection-level notices (errors and warnings) that are generated when documents are ingested.
Finds collection-level notices (errors and warnings) that are generated when documents are ingested.
Finds collection-level notices (errors and warnings) that are generated when documents are ingested.
GET /v2/projects/{project_id}/collections/{collection_id}/notices
ServiceCall<QueryNoticesResponse> queryCollectionNotices(QueryCollectionNoticesOptions queryCollectionNoticesOptions)
queryCollectionNotices(params)
query_collection_notices(
self,
project_id: str,
collection_id: str,
*,
filter: str = None,
query: str = None,
natural_language_query: str = None,
count: int = None,
offset: int = None,
**kwargs,
) -> DetailedResponse
QueryCollectionNotices(string projectId, string collectionId, string filter = null, string query = null, string naturalLanguageQuery = null, long? count = null, long? offset = null)
Request
Use the QueryCollectionNoticesOptions.Builder to create a QueryCollectionNoticesOptions object that contains the parameter values for the queryCollectionNotices method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
Searches for documents that match the Discovery Query Language criteria that is specified as input. Filter calls are cached and are faster than query calls because the results are not ordered by relevance. When used with the aggregation, query, or natural_language_query parameters, the filter parameter runs first. This parameter is useful for limiting results to those that contain specific metadata values.
A query search that is written in the Discovery Query Language and returns all matching documents in your data set with full enrichments and full text, and with the most relevant documents listed first. You can use this parameter or the natural_language_query parameter to specify the query input, but not both.
A natural language query that returns relevant documents by using natural language understanding. You can use this parameter or the query parameter to specify the query input, but not both. To filter the results based on criteria you specify, include the filter parameter in the request.
Possible values: 1 ≤ length ≤ 2048
Number of results to return. The maximum for the count and offset values together in any one query is 10,000.
Default:
10
The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10,000.
The queryCollectionNotices options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Searches for documents that match the Discovery Query Language criteria that is specified as input. Filter calls are cached and are faster than query calls because the results are not ordered by relevance. When used with the aggregation, query, or natural_language_query parameters, the filter parameter runs first. This parameter is useful for limiting results to those that contain specific metadata values.
A query search that is written in the Discovery Query Language and returns all matching documents in your data set with full enrichments and full text, and with the most relevant documents listed first. You can use this parameter or the natural_language_query parameter to specify the query input, but not both.
A natural language query that returns relevant documents by using natural language understanding. You can use this parameter or the query parameter to specify the query input, but not both. To filter the results based on criteria you specify, include the filter parameter in the request.
Possible values: 1 ≤ length ≤ 2048
Number of results to return. The maximum for the count and offset values together in any one query is 10,000.
Default:
10
The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10,000.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Searches for documents that match the Discovery Query Language criteria that is specified as input. Filter calls are cached and are faster than query calls because the results are not ordered by relevance. When used with the aggregation, query, or natural_language_query parameters, the filter parameter runs first. This parameter is useful for limiting results to those that contain specific metadata values.
A query search that is written in the Discovery Query Language and returns all matching documents in your data set with full enrichments and full text, and with the most relevant documents listed first. You can use this parameter or the natural_language_query parameter to specify the query input, but not both.
A natural language query that returns relevant documents by using natural language understanding. You can use this parameter or the query parameter to specify the query input, but not both. To filter the results based on criteria you specify, include the filter parameter in the request.
Possible values: 1 ≤ length ≤ 2048
Number of results to return. The maximum for the count and offset values together in any one query is 10,000.
Default:
10
The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10,000.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Searches for documents that match the Discovery Query Language criteria that is specified as input. Filter calls are cached and are faster than query calls because the results are not ordered by relevance. When used with the aggregation, query, or natural_language_query parameters, the filter parameter runs first. This parameter is useful for limiting results to those that contain specific metadata values.
A query search that is written in the Discovery Query Language and returns all matching documents in your data set with full enrichments and full text, and with the most relevant documents listed first. You can use this parameter or the natural_language_query parameter to specify the query input, but not both.
A natural language query that returns relevant documents by using natural language understanding. You can use this parameter or the query parameter to specify the query input, but not both. To filter the results based on criteria you specify, include the filter parameter in the request.
Possible values: 1 ≤ length ≤ 2048
Number of results to return. The maximum for the count and offset values together in any one query is 10,000.
Default:
10
The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10,000.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Searches for documents that match the Discovery Query Language criteria that is specified as input. Filter calls are cached and are faster than query calls because the results are not ordered by relevance. When used with the aggregation, query, or natural_language_query parameters, the filter parameter runs first. This parameter is useful for limiting results to those that contain specific metadata values.
A query search that is written in the Discovery Query Language and returns all matching documents in your data set with full enrichments and full text, and with the most relevant documents listed first. You can use this parameter or the natural_language_query parameter to specify the query input, but not both.
A natural language query that returns relevant documents by using natural language understanding. You can use this parameter or the query parameter to specify the query input, but not both. To filter the results based on criteria you specify, include the filter parameter in the request.
Possible values: 1 ≤ length ≤ 2048
Number of results to return. The maximum for the count and offset values together in any one query is 10,000.
Default:
10
The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10,000.
curl {auth} '{url}/v2/projects/{project_id}/collections/{collection_id}/notices?version=2023-03-31&query=notices.step:conversion&count=2'
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
);

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

// Pass count as its own argument instead of embedding "&count=2" in the query string.
var result = service.QueryCollectionNotices(
    projectId: "{projectId}",
    collectionId: "{collectionId}",
    query: "notices.step:conversion",
    count: 2
);

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");

Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

QueryCollectionNoticesOptions queryCollectionNoticesOptions = new QueryCollectionNoticesOptions.Builder()
  .projectId("{project_id}")
  .collectionId("{collection_id}")
  .naturalLanguageQuery("warning")
  .build();

QueryNoticesResponse response = discovery.queryCollectionNotices(queryCollectionNoticesOptions).execute().getResult();

System.out.println(response);
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

// Pass count as its own parameter instead of embedding '&count=2' in the query string.
const params = {
  projectId: '{projectId}',
  collectionId: '{collectionId}',
  query: 'notices.step:conversion',
  count: 2,
};

discovery.queryCollectionNotices(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)
discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

# Use query_collection_notices for collection-level notices and pass the
# count as its own parameter rather than embedding "&count=2" in the query.
response = discovery.query_collection_notices(
    project_id='{project_id}',
    collection_id='{collection_id}',
    query='notices.step:conversion',
    count=2
).get_result()

print(json.dumps(response, indent=2))
Response
Object that contains notice query results.
The number of matching results.
Array of document results that match the query.
Object that contains notice query results.
The number of matching results.
Array of document results that match the query.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Object that contains notice query results.
The number of matching results.
Array of document results that match the query.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Object that contains notice query results.
The number of matching results.
Array of document results that match the query.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Object that contains notice query results.
The number of matching results.
Array of document results that match the query.
- Notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Status Code
Query for collection notices executed successfully.
Bad request.
{ "matching_results": 48, "notices": [ { "severity": "warning", "created": "2021-09-15T20:11:22Z", "description": "We couldn't download the requested content from https://www.cdc.gov/coronavirus/2019-ncov/global-covid-19/essential-health-services.html because the request timed out.", "step": "conversion", "document_id": "9yJk9qKOQ", "notice_id": "failed_crawl" }, { "severity": "warning", "created": "2021-09-15T20:17:30Z", "description": "We couldn't download the requested content from https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covid-19-data-sources.html because the request timed out.", "step": "conversion", "document_id": "0xSkAIgmj", "notice_id": "failed_crawl" } ] }
{ "matching_results": 48, "notices": [ { "severity": "warning", "created": "2021-09-15T20:11:22Z", "description": "We couldn't download the requested content from https://www.cdc.gov/coronavirus/2019-ncov/global-covid-19/essential-health-services.html because the request timed out.", "step": "conversion", "document_id": "9yJk9qKOQ", "notice_id": "failed_crawl" }, { "severity": "warning", "created": "2021-09-15T20:17:30Z", "description": "We couldn't download the requested content from https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covid-19-data-sources.html because the request timed out.", "step": "conversion", "document_id": "0xSkAIgmj", "notice_id": "failed_crawl" } ] }
Query project notices
Finds project-level notices (errors and warnings). Currently, project-level notices are generated by relevancy training.
Finds project-level notices (errors and warnings). Currently, project-level notices are generated by relevancy training.
Finds project-level notices (errors and warnings). Currently, project-level notices are generated by relevancy training.
Finds project-level notices (errors and warnings). Currently, project-level notices are generated by relevancy training.
Finds project-level notices (errors and warnings). Currently, project-level notices are generated by relevancy training.
GET /v2/projects/{project_id}/notices
ServiceCall<QueryNoticesResponse> queryNotices(QueryNoticesOptions queryNoticesOptions)
queryNotices(params)
query_notices(
self,
project_id: str,
*,
filter: str = None,
query: str = None,
natural_language_query: str = None,
count: int = None,
offset: int = None,
**kwargs,
) -> DetailedResponse
QueryNotices(string projectId, string filter = null, string query = null, string naturalLanguageQuery = null, long? count = null, long? offset = null)
Request
Use the QueryNoticesOptions.Builder to create a QueryNoticesOptions object that contains the parameter values for the queryNotices method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
Searches for documents that match the Discovery Query Language criteria that is specified as input. Filter calls are cached and are faster than query calls because the results are not ordered by relevance. When used with the aggregation, query, or natural_language_query parameters, the filter parameter runs first. This parameter is useful for limiting results to those that contain specific metadata values.
A query search that is written in the Discovery Query Language and returns all matching documents in your data set with full enrichments and full text, and with the most relevant documents listed first. You can use this parameter or the natural_language_query parameter to specify the query input, but not both.
A natural language query that returns relevant documents by using natural language understanding. You can use this parameter or the query parameter to specify the query input, but not both. To filter the results based on criteria you specify, include the filter parameter in the request.
Possible values: 1 ≤ length ≤ 2048
Number of results to return. The maximum for the count and offset values together in any one query is 10,000.
Default:
10
The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10,000.
The queryNotices options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Searches for documents that match the Discovery Query Language criteria that is specified as input. Filter calls are cached and are faster than query calls because the results are not ordered by relevance. When used with the aggregation, query, or natural_language_query parameters, the filter parameter runs first. This parameter is useful for limiting results to those that contain specific metadata values.
A query search that is written in the Discovery Query Language and returns all matching documents in your data set with full enrichments and full text, and with the most relevant documents listed first. You can use this parameter or the natural_language_query parameter to specify the query input, but not both.
A natural language query that returns relevant documents by using natural language understanding. You can use this parameter or the query parameter to specify the query input, but not both. To filter the results based on criteria you specify, include the filter parameter in the request.
Possible values: 1 ≤ length ≤ 2048
Number of results to return. The maximum for the count and offset values together in any one query is 10,000.
Default:
10
The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10,000.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Searches for documents that match the Discovery Query Language criteria that is specified as input. Filter calls are cached and are faster than query calls because the results are not ordered by relevance. When used with the aggregation, query, or natural_language_query parameters, the filter parameter runs first. This parameter is useful for limiting results to those that contain specific metadata values.
A query search that is written in the Discovery Query Language and returns all matching documents in your data set with full enrichments and full text, and with the most relevant documents listed first. You can use this parameter or the natural_language_query parameter to specify the query input, but not both.
A natural language query that returns relevant documents by using natural language understanding. You can use this parameter or the query parameter to specify the query input, but not both. To filter the results based on criteria you specify, include the filter parameter in the request.
Possible values: 1 ≤ length ≤ 2048
Number of results to return. The maximum for the count and offset values together in any one query is 10,000.
Default:
10
The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10,000.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Searches for documents that match the Discovery Query Language criteria that is specified as input. Filter calls are cached and are faster than query calls because the results are not ordered by relevance. When used with the aggregation, query, or natural_language_query parameters, the filter parameter runs first. This parameter is useful for limiting results to those that contain specific metadata values.
A query search that is written in the Discovery Query Language and returns all matching documents in your data set with full enrichments and full text, and with the most relevant documents listed first. You can use this parameter or the natural_language_query parameter to specify the query input, but not both.
A natural language query that returns relevant documents by using natural language understanding. You can use this parameter or the query parameter to specify the query input, but not both. To filter the results based on criteria you specify, include the filter parameter in the request.
Possible values: 1 ≤ length ≤ 2048
Number of results to return. The maximum for the count and offset values together in any one query is 10,000.
Default:
10
The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10,000.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Searches for documents that match the Discovery Query Language criteria that is specified as input. Filter calls are cached and are faster than query calls because the results are not ordered by relevance. When used with the aggregation, query, or natural_language_query parameters, the filter parameter runs first. This parameter is useful for limiting results to those that contain specific metadata values.
A query search that is written in the Discovery Query Language and returns all matching documents in your data set with full enrichments and full text, and with the most relevant documents listed first. You can use this parameter or the natural_language_query parameter to specify the query input, but not both.
A natural language query that returns relevant documents by using natural language understanding. You can use this parameter or the query parameter to specify the query input, but not both. To filter the results based on criteria you specify, include the filter parameter in the request.
Possible values: 1 ≤ length ≤ 2048
Number of results to return. The maximum for the count and offset values together in any one query is 10,000.
Default:
10
The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10,000.
curl {auth} "{url}/v2/projects/{project_id}/notices?natural_language_query=error&version=2023-03-31"
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
);

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

var result = service.QueryNotices(
    projectId: "{project_id}",
    query: "{field}:{value}"
);

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");

Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

QueryNoticesOptions options = new QueryNoticesOptions.Builder()
  .projectId("{project_id}")
  .query("{field}:{value}")
  .build();

QueryNoticesResponse response = discovery.queryNotices(options).execute().getResult();

System.out.println(response);
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
  query: '{field}:{value}',
};

discovery.queryNotices(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)
discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

response = discovery.query_notices(
    project_id='{project_id}',
    query='{field}:{value}'
).get_result()

print(json.dumps(response, indent=2))
Response
Object that contains notice query results.
The number of matching results.
Array of document results that match the query.
Object that contains notice query results.
The number of matching results.
Array of document results that match the query.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Object that contains notice query results.
The number of matching results.
Array of document results that match the query.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Object that contains notice query results.
The number of matching results.
Array of document results that match the query.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [warning, error]
Ingestion or training step in which the notice occurred.
The description of the notice.
Object that contains notice query results.
The number of matching results.
Array of document results that match the query.
- Notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [ warning, error ]
Ingestion or training step in which the notice occurred.
The description of the notice.
Status Code
Query for project notices executed successfully.
Bad request.
No Sample Response
Get a custom stop words list
Returns the custom stop words list that is used by the collection. For information about the default stop words lists that are applied to queries, see the product documentation.
Returns the custom stop words list that is used by the collection. For information about the default stop words lists that are applied to queries, see the product documentation.
Returns the custom stop words list that is used by the collection. For information about the default stop words lists that are applied to queries, see the product documentation.
Returns the custom stop words list that is used by the collection. For information about the default stop words lists that are applied to queries, see the product documentation.
Returns the custom stop words list that is used by the collection. For information about the default stop words lists that are applied to queries, see the product documentation.
GET /v2/projects/{project_id}/collections/{collection_id}/stopwords
ServiceCall<StopWordList> getStopwordList(GetStopwordListOptions getStopwordListOptions)
getStopwordList(params)
get_stopword_list(
self,
project_id: str,
collection_id: str,
**kwargs,
) -> DetailedResponse
GetStopwordList(string projectId, string collectionId)
Request
Use the GetStopwordListOptions.Builder to create a GetStopwordListOptions object that contains the parameter values for the getStopwordList method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The getStopwordList options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/collections/{collection_id}/stopwords?version=2023-03-31"
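The same request through the Python SDK looks like the following sketch. It is illustrative only and reuses the Cloud Pak for Data placeholders from the other Python samples in this reference:

import json
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

# Placeholder credentials and URLs, as in the other samples in this reference.
authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize')

discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

# Fetch the custom stop words list for the collection.
response = discovery.get_stopword_list(
    project_id='{project_id}',
    collection_id='{collection_id}'
).get_result()

# Prints { "stopwords": [] } when no custom list is applied.
print(json.dumps(response, indent=2))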
Response
List of words to filter out of text that is submitted in queries.
List of stop words.
List of words to filter out of text that is submitted in queries.
List of stop words.
List of words to filter out of text that is submitted in queries.
List of stop words.
List of words to filter out of text that is submitted in queries.
List of stop words.
List of words to filter out of text that is submitted in queries.
List of stop words.
Status Code
List of custom stop words or an empty array if no custom stop words list is applied to the collection.
{ "stopwords": [] }
{ "stopwords": [] }
Create a custom stop words list
Adds a list of custom stop words. Stop words are words that you want the service to ignore when they occur in a query because they're not useful in distinguishing the semantic meaning of the query. The stop words list cannot contain more than 1 million characters.
A default stop words list is used by all collections. The default list is applied both at indexing time and at query time. A custom stop words list that you add is used at query time only.
The custom stop words list augments the default stop words list; you cannot remove stop words. For information about the default stop words lists per language, see the product documentation.
Adds a list of custom stop words. Stop words are words that you want the service to ignore when they occur in a query because they're not useful in distinguishing the semantic meaning of the query. The stop words list cannot contain more than 1 million characters.
A default stop words list is used by all collections. The default list is applied both at indexing time and at query time. A custom stop words list that you add is used at query time only.
The custom stop words list augments the default stop words list; you cannot remove stop words. For information about the default stop words lists per language, see the product documentation.
Adds a list of custom stop words. Stop words are words that you want the service to ignore when they occur in a query because they're not useful in distinguishing the semantic meaning of the query. The stop words list cannot contain more than 1 million characters.
A default stop words list is used by all collections. The default list is applied both at indexing time and at query time. A custom stop words list that you add is used at query time only.
The custom stop words list augments the default stop words list; you cannot remove stop words. For information about the default stop words lists per language, see the product documentation.
Adds a list of custom stop words. Stop words are words that you want the service to ignore when they occur in a query because they're not useful in distinguishing the semantic meaning of the query. The stop words list cannot contain more than 1 million characters.
A default stop words list is used by all collections. The default list is applied both at indexing time and at query time. A custom stop words list that you add is used at query time only.
The custom stop words list augments the default stop words list; you cannot remove stop words. For information about the default stop words lists per language, see the product documentation.
Adds a list of custom stop words. Stop words are words that you want the service to ignore when they occur in a query because they're not useful in distinguishing the semantic meaning of the query. The stop words list cannot contain more than 1 million characters.
A default stop words list is used by all collections. The default list is applied both at indexing time and at query time. A custom stop words list that you add is used at query time only.
The custom stop words list augments the default stop words list; you cannot remove stop words. For information about the default stop words lists per language, see the product documentation.
POST /v2/projects/{project_id}/collections/{collection_id}/stopwords
ServiceCall<StopWordList> createStopwordList(CreateStopwordListOptions createStopwordListOptions)
createStopwordList(params)
create_stopword_list(
self,
project_id: str,
collection_id: str,
*,
stopwords: List[str] = None,
**kwargs,
) -> DetailedResponse
CreateStopwordList(string projectId, string collectionId, List<string> stopwords = null)
Request
Use the CreateStopwordListOptions.Builder to create a CreateStopwordListOptions object that contains the parameter values for the createStopwordList method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
List of words to filter out of text that is submitted in queries.
List of stop words.
The createStopwordList options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
List of stop words.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
List of stop words.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
List of stop words.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
List of stop words.
curl -X POST {auth} --header "Content-Type: application/json" --data "{ \"stopwords\": [ \"a\", \"about\", \"again\", \"am\", \"an\", \"and\", \"any\", \"are\", \"as\", \"at\", \"be\", \"because\", \"been\", \"being\", \"between\", \"both\", \"but\", \"by\" ] }" "{url}/v2/projects/{project_id}/collections/{collection_id}/stopwords?version=2023-03-31"
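A corresponding Python sketch; the short word list here is only an example, and the placeholders are the same assumptions as in the other Python samples:

import json
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize')

discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

# Create or replace the custom stop words list for the collection.
# The full list must contain fewer than 1,000,000 characters.
response = discovery.create_stopword_list(
    project_id='{project_id}',
    collection_id='{collection_id}',
    stopwords=['a', 'about', 'again', 'am', 'an', 'and', 'any', 'are']
).get_result()

print(json.dumps(response, indent=2))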
Response
List of words to filter out of text that is submitted in queries.
List of stop words.
List of words to filter out of text that is submitted in queries.
List of stop words.
List of words to filter out of text that is submitted in queries.
List of stop words.
List of words to filter out of text that is submitted in queries.
List of stop words.
List of words to filter out of text that is submitted in queries.
List of stop words.
Status Code
The array of custom stop words that are applied to the collection.
Bad request. Possible reasons include:
- Unable to create stop words because the request body was not valid JSON.
- Stop words creation request contained no stop words.
- Could not find the named collection.
- Input stop words file is too large. Input must be less than 1,000,000 characters.
{ "stopwords": [ "a", "about", "again", "am", "an", "and", "any", "are", "as", "at", "be", "because", "been", "being", "between", "both", "but", "by" ] }
{ "stopwords": [ "a", "about", "again", "am", "an", "and", "any", "are", "as", "at", "be", "because", "been", "being", "between", "both", "but", "by" ] }
Delete a custom stop words list
Deletes a custom stop words list to stop using it in queries against the collection. After a custom stop words list is deleted, the default stop words list is used.
Deletes a custom stop words list to stop using it in queries against the collection. After a custom stop words list is deleted, the default stop words list is used.
Deletes a custom stop words list to stop using it in queries against the collection. After a custom stop words list is deleted, the default stop words list is used.
Deletes a custom stop words list to stop using it in queries against the collection. After a custom stop words list is deleted, the default stop words list is used.
Deletes a custom stop words list to stop using it in queries against the collection. After a custom stop words list is deleted, the default stop words list is used.
DELETE /v2/projects/{project_id}/collections/{collection_id}/stopwords
ServiceCall<Void> deleteStopwordList(DeleteStopwordListOptions deleteStopwordListOptions)
deleteStopwordList(params)
delete_stopword_list(
self,
project_id: str,
collection_id: str,
**kwargs,
) -> DetailedResponse
DeleteStopwordList(string projectId, string collectionId)
Request
Use the DeleteStopwordListOptions.Builder to create a DeleteStopwordListOptions object that contains the parameter values for the deleteStopwordList method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The deleteStopwordList options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl -X DELETE {auth} "{url}/v2/projects/{project_id}/collections/{collection_id}/stopwords?version=2023-03-31"
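In Python, the same deletion is a single call, sketched here with the usual placeholder setup:

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize')

discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

# Remove the custom stop words list; the default list is used afterward.
discovery.delete_stopword_list(
    project_id='{project_id}',
    collection_id='{collection_id}')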
Get the expansion list
Returns the current expansion list for the specified collection. If an expansion list is not specified, an empty expansions array is returned.
Returns the current expansion list for the specified collection. If an expansion list is not specified, an empty expansions array is returned.
Returns the current expansion list for the specified collection. If an expansion list is not specified, an empty expansions array is returned.
Returns the current expansion list for the specified collection. If an expansion list is not specified, an empty expansions array is returned.
Returns the current expansion list for the specified collection. If an expansion list is not specified, an empty expansions array is returned.
GET /v2/projects/{project_id}/collections/{collection_id}/expansions
ServiceCall<Expansions> listExpansions(ListExpansionsOptions listExpansionsOptions)
listExpansions(params)
list_expansions(
self,
project_id: str,
collection_id: str,
**kwargs,
) -> DetailedResponse
ListExpansions(string projectId, string collectionId)
Request
Use the ListExpansionsOptions.Builder to create a ListExpansionsOptions object that contains the parameter values for the listExpansions method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The listExpansions options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/collections/{collection_id}/expansions?version=2023-03-31"
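For comparison, a Python sketch of the same request (placeholders are assumptions, as in the other samples):

import json
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize')

discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

# Returns { "expansions": [] } when no expansion list is defined.
response = discovery.list_expansions(
    project_id='{project_id}',
    collection_id='{collection_id}'
).get_result()

print(json.dumps(response, indent=2))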
Response
The query expansion definitions for the specified collection.
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
The query expansion definitions for the specified collection.
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
- expansions
A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.
A list of terms that this expansion will be expanded to. If specified without input_terms, the list also functions as the input term list.
The query expansion definitions for the specified collection.
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
- expansions
A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.
A list of terms that this expansion will be expanded to. If specified without input_terms, the list also functions as the input term list.
The query expansion definitions for the specified collection.
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
- expansions
A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.
A list of terms that this expansion will be expanded to. If specified without input_terms, the list also functions as the input term list.
The query expansion definitions for the specified collection.
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
- Expansions
A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.
A list of terms that this expansion will be expanded to. If specified without input_terms, the list also functions as the input term list.
Status Code
Successfully fetched expansion details.
Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.
{ "expansions": [] }
{ "expansions": [] }
Create or update an expansion list
Creates or replaces the expansion list for this collection. An expansion list introduces alternative wording for key terms that are mentioned in your collection. By identifying synonyms or common misspellings, you expand the scope of a query beyond exact matches. The maximum number of expanded terms allowed per collection is 5,000.
Creates or replaces the expansion list for this collection. An expansion list introduces alternative wording for key terms that are mentioned in your collection. By identifying synonyms or common misspellings, you expand the scope of a query beyond exact matches. The maximum number of expanded terms allowed per collection is 5,000.
Creates or replaces the expansion list for this collection. An expansion list introduces alternative wording for key terms that are mentioned in your collection. By identifying synonyms or common misspellings, you expand the scope of a query beyond exact matches. The maximum number of expanded terms allowed per collection is 5,000.
Creates or replaces the expansion list for this collection. An expansion list introduces alternative wording for key terms that are mentioned in your collection. By identifying synonyms or common misspellings, you expand the scope of a query beyond exact matches. The maximum number of expanded terms allowed per collection is 5,000.
Creates or replaces the expansion list for this collection. An expansion list introduces alternative wording for key terms that are mentioned in your collection. By identifying synonyms or common misspellings, you expand the scope of a query beyond exact matches. The maximum number of expanded terms allowed per collection is 5,000.
POST /v2/projects/{project_id}/collections/{collection_id}/expansions
ServiceCall<Expansions> createExpansions(CreateExpansionsOptions createExpansionsOptions)
createExpansions(params)
create_expansions(
self,
project_id: str,
collection_id: str,
expansions: List['Expansion'],
**kwargs,
) -> DetailedResponse
CreateExpansions(string projectId, string collectionId, List<Expansion> expansions)
Request
Use the CreateExpansionsOptions.Builder to create a CreateExpansionsOptions object that contains the parameter values for the createExpansions method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
An object that defines the expansion list.
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
The createExpansions options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
- expansions
A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.
A list of terms that this expansion will be expanded to. If specified without input_terms, the list also functions as the input term list.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
- expansions
A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.
A list of terms that this expansion will be expanded to. If specified without input_terms, the list also functions as the input term list.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
- expansions
A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.
A list of terms that this expansion will be expanded to. If specified without input_terms, the list also functions as the input term list.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
- expansions
A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.
A list of terms that this expansion will be expanded to. If specified without input_terms, the list also functions as the input term list.
curl -X POST {auth} --header "Content-Type: application/json" --data "{ \"expansions\": [ { \"input_terms\": [ \"on premise\" ], \"expanded_terms\": [ \"on premises\", \"on-premises\" ] }, { \"input_terms\": [ \"car\" ], \"expanded_terms\": [ \"car\", \"automobile\", \"vehicle\" ] }, { \"expanded_terms\": [ \"ibm\", \"international business machines\", \"big blue\" ] } ] }" "{url}/v2/projects/{project_id}/collections/{collection_id}/expansions?version=2023-03-31"
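The Python SDK accepts the same definitions. In the sketch below, plain dictionaries mirror the input_terms and expanded_terms fields of the SDK's Expansion model; all identifiers and values are placeholders or examples:

import json
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize')

discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

expansions = [
    # Unidirectional: input_terms are replaced by expanded_terms at query time.
    {'input_terms': ['on premise'],
     'expanded_terms': ['on premises', 'on-premises']},
    # Bidirectional: every listed term expands to all the others.
    {'expanded_terms': ['ibm', 'international business machines', 'big blue']},
]

# Replaces any existing expansion list for the collection.
response = discovery.create_expansions(
    project_id='{project_id}',
    collection_id='{collection_id}',
    expansions=expansions
).get_result()

print(json.dumps(response, indent=2))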
Response
The query expansion definitions for the specified collection.
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
The query expansion definitions for the specified collection.
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
- expansions
A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.
A list of terms that this expansion will be expanded to. If specified without input_terms, the list also functions as the input term list.
The query expansion definitions for the specified collection.
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
- expansions
A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.
A list of terms that this expansion will be expanded to. If specified without input_terms, the list also functions as the input term list.
The query expansion definitions for the specified collection.
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
- expansions
A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.
A list of terms that this expansion will be expanded to. If specified without input_terms, the list also functions as the input term list.
The query expansion definitions for the specified collection.
An array of query expansion definitions.
Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional.
- Bidirectional: Each entry in the expanded_terms list expands to include all expanded terms. For example, a query for ibm expands to ibm OR international business machines OR big blue.
- Unidirectional: The terms in input_terms in the query are replaced by the terms in expanded_terms. For example, a query for the often misused term on premise is converted to on premises OR on-premises and does not contain the original term. If you want an input term to be included in the query, then repeat the input term in the expanded terms list.
Possible values: 0 ≤ number of items ≤ 5000
- Expansions
A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.
A list of terms that this expansion will be expanded to. If specified without input_terms, the list also functions as the input term list.
Status Code
The expansion list has been accepted and will be used for all future queries.
Bad request. Possible reasons include:
- Unable to create expansions because the request body was not valid JSON.
- Query expansion creation request contained no expansions.
- Request exceeded the expansion mappings limit.
- Could not find the named collection.
{ "expansions": [ { "input_terms": [ "on premise" ], "expanded_terms": [ "on premises", "on-premises" ] }, { "input_terms": [ "car" ], "expanded_terms": [ "car", "automobile", "vehicle" ] }, { "expanded_terms": [ "ibm", "international business machines", "big blue" ] } ] }
{ "expansions": [ { "input_terms": [ "on premise" ], "expanded_terms": [ "on premises", "on-premises" ] }, { "input_terms": [ "car" ], "expanded_terms": [ "car", "automobile", "vehicle" ] }, { "expanded_terms": [ "ibm", "international business machines", "big blue" ] } ] }
Delete the expansion list
Removes the expansion information for this collection. To disable query expansion for a collection, delete the expansion list.
Removes the expansion information for this collection. To disable query expansion for a collection, delete the expansion list.
Removes the expansion information for this collection. To disable query expansion for a collection, delete the expansion list.
Removes the expansion information for this collection. To disable query expansion for a collection, delete the expansion list.
Removes the expansion information for this collection. To disable query expansion for a collection, delete the expansion list.
DELETE /v2/projects/{project_id}/collections/{collection_id}/expansions
ServiceCall<Void> deleteExpansions(DeleteExpansionsOptions deleteExpansionsOptions)
deleteExpansions(params)
delete_expansions(
self,
project_id: str,
collection_id: str,
**kwargs,
) -> DetailedResponse
DeleteExpansions(string projectId, string collectionId)
Request
Use the DeleteExpansionsOptions.Builder to create a DeleteExpansionsOptions object that contains the parameter values for the deleteExpansions method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The deleteExpansions options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl -X DELETE {auth} "{url}/v2/projects/{project_id}/collections/{collection_id}/expansions?version=2023-03-31"
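And a matching Python sketch for removing the list (same placeholder setup as the other samples):

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize')

discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

# Deleting the expansion list disables query expansion for the collection.
discovery.delete_expansions(
    project_id='{project_id}',
    collection_id='{collection_id}')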
List component settings
Returns default configuration settings for components.
Returns default configuration settings for components.
Returns default configuration settings for components.
Returns default configuration settings for components.
Returns default configuration settings for components.
GET /v2/projects/{project_id}/component_settings
ServiceCall<ComponentSettingsResponse> getComponentSettings(GetComponentSettingsOptions getComponentSettingsOptions)
getComponentSettings(params)
get_component_settings(
self,
project_id: str,
**kwargs,
) -> DetailedResponse
GetComponentSettings(string projectId)
Request
Use the GetComponentSettingsOptions.Builder to create a GetComponentSettingsOptions object that contains the parameter values for the getComponentSettings method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The getComponentSettings options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/component_settings?version=2023-03-31"
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

var result = service.GetComponentSettings(
    projectId: "{project_id}"
    );

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

GetComponentSettingsOptions options = new GetComponentSettingsOptions.Builder()
  .projectId("{project_id}")
  .build();

ComponentSettingsResponse response = discovery.getComponentSettings(options).execute().getResult();
System.out.println(response);
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
};

discovery.getComponentSettings(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)
discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

response = discovery.get_component_settings(
    project_id='{project_id}'
).get_result()

print(json.dumps(response, indent=2))
Response
The default component settings for this project.
Fields shown in the results section of the UI.
Whether or not autocomplete is enabled.
Whether or not structured search is enabled.
Number of results shown per page.
A list of component setting aggregations.
The default component settings for this project.
Fields shown in the results section of the UI.
- fieldsShown
Body label.
- body
Use the whole passage as the body.
Use a specific field as the title.
Title label.
- title
Use a specific field as the title.
Whether or not autocomplete is enabled.
Whether or not structured search is enabled.
Number of results shown per page.
A list of component setting aggregations.
- aggregations
Identifier used to map aggregation settings to aggregation configuration.
User-friendly alias for the aggregation.
Whether users are allowed to select more than one of the aggregation terms.
Type of visualization to use when rendering the aggregation.
Possible values: [ auto, facet_table, word_cloud, map ]
The default component settings for this project.
Fields shown in the results section of the UI.
- fields_shown
Body label.
- body
Use the whole passage as the body.
Use a specific field as the title.
Title label.
- title
Use a specific field as the title.
Whether or not autocomplete is enabled.
Whether or not structured search is enabled.
Number of results shown per page.
A list of component setting aggregations.
- aggregations
Identifier used to map aggregation settings to aggregation configuration.
User-friendly alias for the aggregation.
Whether users are allowed to select more than one of the aggregation terms.
Type of visualization to use when rendering the aggregation.
Possible values: [ auto, facet_table, word_cloud, map ]
The default component settings for this project.
Fields shown in the results section of the UI.
- fields_shown
Body label.
- body
Use the whole passage as the body.
Use a specific field as the title.
Title label.
- title
Use a specific field as the title.
Whether or not autocomplete is enabled.
Whether or not structured search is enabled.
Number of results shown per page.
A list of component setting aggregations.
- aggregations
Identifier used to map aggregation settings to aggregation configuration.
User-friendly alias for the aggregation.
Whether users are allowed to select more than one of the aggregation terms.
Type of visualization to use when rendering the aggregation.
Possible values: [ auto, facet_table, word_cloud, map ]
The default component settings for this project.
Fields shown in the results section of the UI.
- FieldsShown
Body label.
- Body
Use the whole passage as the body.
Use a specific field as the title.
Title label.
- Title
Use a specific field as the title.
Whether or not autocomplete is enabled.
Whether or not structured search is enabled.
Number of results shown per page.
A list of component setting aggregations.
- Aggregations
Identifier used to map aggregation settings to aggregation configuration.
User-friendly alias for the aggregation.
Whether users are allowed to select more than one of the aggregation terms.
Type of visualization to use when rendering the aggregation.
Possible values: [ auto, facet_table, word_cloud, map ]
Status Code
Successful response.
{ "results_per_page": 5, "structured_search": false, "fields_shown": { "body": { "use_passage": true, "field": "" }, "title": { "field": "title" } }, "aggregations": [ { "name": "entities", "label": "Top Entities", "multiple_selections_allowed": true }, { "name": "_system_collections", "label": "Collections", "multiple_selections_allowed": true } ], "autocomplete": true }
{ "results_per_page": 5, "structured_search": false, "fields_shown": { "body": { "use_passage": true, "field": "" }, "title": { "field": "title" } }, "aggregations": [ { "name": "entities", "label": "Top Entities", "multiple_selections_allowed": true }, { "name": "_system_collections", "label": "Collections", "multiple_selections_allowed": true } ], "autocomplete": true }
List training queries
List the training queries for the specified project.
List the training queries for the specified project.
List the training queries for the specified project.
List the training queries for the specified project.
List the training queries for the specified project.
GET /v2/projects/{project_id}/training_data/queries
ServiceCall<TrainingQuerySet> listTrainingQueries(ListTrainingQueriesOptions listTrainingQueriesOptions)
listTrainingQueries(params)
list_training_queries(
self,
project_id: str,
**kwargs,
) -> DetailedResponse
ListTrainingQueries(string projectId)
Request
Use the ListTrainingQueriesOptions.Builder to create a ListTrainingQueriesOptions object that contains the parameter values for the listTrainingQueries method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The listTrainingQueries options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/training_data/queries?version=2023-03-31"
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

var result = service.ListTrainingQueries(
    projectId: "{project_id}"
    );

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

ListTrainingQueriesOptions options = new ListTrainingQueriesOptions.Builder()
  .projectId("{project_id}")
  .build();

TrainingQuerySet response = discovery.listTrainingQueries(options).execute().getResult();
System.out.println(response);
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
};

discovery.listTrainingQueries(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)
discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

response = discovery.list_training_queries(
    project_id='{project_id}'
).get_result()

print(json.dumps(response, indent=2))
Response
Object specifying the training queries contained in the identified training set.
Array of training queries. At least 50 queries are required for training to begin. A maximum of 10,000 queries are returned.
Possible values: 50 ≤ number of items ≤ 10000
Object specifying the training queries contained in the identified training set.
Array of training queries. At least 50 queries are required for training to begin. A maximum of 10,000 queries are returned.
Possible values: 50 ≤ number of items ≤ 10000
- queries
The query ID associated with the training query.
The natural text query that is used as the training query.
The filter used on the collection before the natural_language_query is applied. Only specify a filter if the documents that you consider to be most relevant are not included in the top 100 results when you submit test queries. If you specify a filter during training, apply the same filter to queries that are submitted at runtime for optimal ranking results.
The date and time the query was created.
The date and time the query was updated.
Array of training examples.
- examples
The document ID associated with this training example.
The collection ID associated with this training example.
The relevance score of the training example. Scores range from 0 to 100. Zero means not relevant. The higher the number, the more relevant the example.
The date and time the example was created.
The date and time the example was updated.
Object specifying the training queries contained in the identified training set.
Array of training queries. At least 50 queries are required for training to begin. A maximum of 10,000 queries are returned.
Possible values: 50 ≤ number of items ≤ 10000
- queries
The query ID associated with the training query.
The natural text query that is used as the training query.
The filter used on the collection before the natural_language_query is applied. Only specify a filter if the documents that you consider to be most relevant are not included in the top 100 results when you submit test queries. If you specify a filter during training, apply the same filter to queries that are submitted at runtime for optimal ranking results.
The date and time the query was created.
The date and time the query was updated.
Array of training examples.
- examples
The document ID associated with this training example.
The collection ID associated with this training example.
The relevance score of the training example. Scores range from 0 to 100. Zero means not relevant. The higher the number, the more relevant the example.
The date and time the example was created.
The date and time the example was updated.
Object specifying the training queries contained in the identified training set.
Array of training queries. At least 50 queries are required for training to begin. A maximum of 10,000 queries are returned.
Possible values: 50 ≤ number of items ≤ 10000
- queries
The query ID associated with the training query.
The natural text query that is used as the training query.
The filter used on the collection before the natural_language_query is applied. Only specify a filter if the documents that you consider to be most relevant are not included in the top 100 results when you submit test queries. If you specify a filter during training, apply the same filter to queries that are submitted at runtime for optimal ranking results.
The date and time the query was created.
The date and time the query was updated.
Array of training examples.
- examples
The document ID associated with this training example.
The collection ID associated with this training example.
The relevance score of the training example. Scores range from
0
to100
. Zero means not relevant. The higher the number, the more relevant the example.The date and time the example was created.
The date and time the example was updated.
Object specifying the training queries contained in the identified training set.
Array of training queries. At least 50 queries are required for training to begin. A maximum of 10,000 queries are returned.
Possible values: 50 ≤ number of items ≤ 10000
- Queries
The query ID associated with the training query.
The natural text query that is used as the training query.
The filter used on the collection before the natural_language_query is applied. Only specify a filter if the documents that you consider to be most relevant are not included in the top 100 results when you submit test queries. If you specify a filter during training, apply the same filter to queries that are submitted at runtime for optimal ranking results.
The date and time the query was created.
The date and time the query was updated.
Array of training examples.
- Examples
The document ID associated with this training example.
The collection ID associated with this training example.
The relevance score of the training example. Scores range from
0
to100
. Zero means not relevant. The higher the number, the more relevant the example.The date and time the example was created.
The date and time the example was updated.
Status Code
Training data for the specified project found and returned.
The specified project does not exist.
No Sample Response
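Although no official sample response is provided, the following hypothetical body (invented IDs and timestamps, shaped to match the schema above) shows what a result might look like:

{
  "queries": [
    {
      "query_id": "3c4fff84-1500-455c-b125-eaa2d319f6d3",
      "natural_language_query": "why is the sky blue",
      "filter": "text:meteorology",
      "examples": [
        {
          "document_id": "54f95ac0-3e4f-4756-bea6-7a67b2713c81",
          "collection_id": "800e58e4-198d-45eb-be87-74e1d6df4e96",
          "relevance": 1,
          "created": "2022-01-01T00:00:00Z",
          "updated": "2022-01-01T00:00:00Z"
        }
      ],
      "created": "2022-01-01T00:00:00Z",
      "updated": "2022-01-01T00:00:00Z"
    }
  ]
}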
Delete training queries
Removes all training queries for the specified project.
DELETE /v2/projects/{project_id}/training_data/queries
ServiceCall<Void> deleteTrainingQueries(DeleteTrainingQueriesOptions deleteTrainingQueriesOptions)
deleteTrainingQueries(params)
delete_training_queries(
self,
project_id: str,
**kwargs,
) -> DetailedResponse
DeleteTrainingQueries(string projectId)
Request
Use the DeleteTrainingQueriesOptions.Builder to create a DeleteTrainingQueriesOptions object that contains the parameter values for the deleteTrainingQueries method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The deleteTrainingQueries options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
curl -X DELETE {auth} "{url}/v2/projects/{project_id}/training_data/queries?version=2023-03-31"
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

var result = service.DeleteTrainingQueries(
    projectId: "{project_id}"
    );

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

DeleteTrainingQueriesOptions options = new DeleteTrainingQueriesOptions.Builder()
  .projectId("{project_id}")
  .build();

discovery.deleteTrainingQueries(options).execute();
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
};

discovery.deleteTrainingQueries(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)

discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

response = discovery.delete_training_queries(
    project_id='{project_id}'
).get_result()

print(json.dumps(response, indent=2))
Create a training query
Add a query to the training data for this project. The query can contain a filter and natural language query.
Note: You cannot apply relevancy training to a content_mining project type.
POST /v2/projects/{project_id}/training_data/queries
ServiceCall<TrainingQuery> createTrainingQuery(CreateTrainingQueryOptions createTrainingQueryOptions)
createTrainingQuery(params)
create_training_query(
self,
project_id: str,
natural_language_query: str,
examples: List['TrainingExample'],
*,
filter: str = None,
**kwargs,
) -> DetailedResponse
CreateTrainingQuery(string projectId, string naturalLanguageQuery, List<TrainingExample> examples, string filter = null)
Request
Use the CreateTrainingQueryOptions.Builder to create a CreateTrainingQueryOptions object that contains the parameter values for the createTrainingQuery method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
An object that represents the query to be submitted. At least 50 queries are required for training to begin. A maximum of 10,000 queries are allowed.
The natural text query that is used as the training query.
Array of training examples.
The filter used on the collection before the natural_language_query is applied. Only specify a filter if the documents that you consider to be most relevant are not included in the top 100 results when you submit test queries. If you specify a filter during training, apply the same filter to queries that are submitted at runtime for optimal ranking results.
The createTrainingQuery options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
The natural text query that is used as the training query.
Array of training examples.
- examples
The document ID associated with this training example.
The collection ID associated with this training example.
The relevance score of the training example. Scores range from 0 to 100. Zero means not relevant. The higher the number, the more relevant the example.
The filter used on the collection before the natural_language_query is applied. Only specify a filter if the documents that you consider to be most relevant are not included in the top 100 results when you submit test queries. If you specify a filter during training, apply the same filter to queries that are submitted at runtime for optimal ranking results.
curl -X POST {auth} --data "{ \"natural_language_query\": \"why is the sky blue\", \"filter\": \"text:meteorology\", \"examples\": [{ \"document_id\": \"54f95ac0-3e4f-4756-bea6-7a67b2713c81\", \"relevance\": 1, \"collection_id\": \"800e58e4-198d-45eb-be87-74e1d6df4e96\" }, { \"document_id\": \"01bcca32-7300-4c9f-8d32-33ed7ea643da\", \"relevance\": 5, \"collection_id\": \"800e58e4-198d-45eb-be87-74e1d6df4e96\" }] }" "{url}/v2/projects/{project_id}/training_data/queries?version=2023-03-31"
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

TrainingExample trainingExample = new TrainingExample()
{
    CollectionId = "{collection_id}",
    DocumentId = "{document_id}"
};

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

var result = service.CreateTrainingQuery(
    projectId: "{project_id}",
    examples: new List<TrainingExample>() { trainingExample },
    naturalLanguageQuery: "This is an example of a query"
    );

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

TrainingExample trainingExample = new TrainingExample.Builder()
  .collectionId("{collection_id}")
  .documentId("{document_id}")
  .relevance(1L)
  .build();

CreateTrainingQueryOptions options = new CreateTrainingQueryOptions.Builder()
  .projectId("{project_id}")
  .addExamples(trainingExample)
  .naturalLanguageQuery("This is an example of a query")
  .build();

TrainingQuery response = discovery.createTrainingQuery(options).execute().getResult();
System.out.println(response);
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
  naturalLanguageQuery: 'This is an example of a query',
  examples: [
    {
      collection_id: '{collectionId}',
      document_id: '{documentId}',
    },
  ],
};

discovery.createTrainingQuery(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json

from ibm_watson import DiscoveryV2
from ibm_watson.discovery_v2 import TrainingExample
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)

discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

training_example = TrainingExample(
    document_id='{document_id}',
    collection_id='{collection_id}',
    relevance=1
)

response = discovery.create_training_query(
    project_id='{project_id}',
    natural_language_query='This is an example of a query',
    examples=[training_example]
).get_result()

print(json.dumps(response, indent=2))
Response
Object that contains training query details.
The query ID associated with the training query.
The natural text query that is used as the training query.
The filter used on the collection before the natural_language_query is applied. Only specify a filter if the documents that you consider to be most relevant are not included in the top 100 results when you submit test queries. If you specify a filter during training, apply the same filter to queries that are submitted at runtime for optimal ranking results.
The date and time the query was created.
The date and time the query was updated.
Array of training examples.
- examples
The document ID associated with this training example.
The collection ID associated with this training example.
The relevance score of the training example. Scores range from 0 to 100. Zero means not relevant. The higher the number, the more relevant the example.
The date and time the example was created.
The date and time the example was updated.
Status Code
The query was successfully added.
Invalid headers or request.
The specified project does not exist.
No Sample Response
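Although no official sample response is provided, a newly created training query might look like the following hypothetical body (invented values, field names as in the schema above):

{
  "query_id": "3c4fff84-1500-455c-b125-eaa2d319f6d3",
  "natural_language_query": "why is the sky blue",
  "filter": "text:meteorology",
  "examples": [
    {
      "document_id": "54f95ac0-3e4f-4756-bea6-7a67b2713c81",
      "collection_id": "800e58e4-198d-45eb-be87-74e1d6df4e96",
      "relevance": 1
    }
  ],
  "created": "2022-01-01T00:00:00Z",
  "updated": "2022-01-01T00:00:00Z"
}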
Get a training data query
Get details for a specific training data query, including the query string and all examples.
GET /v2/projects/{project_id}/training_data/queries/{query_id}
ServiceCall<TrainingQuery> getTrainingQuery(GetTrainingQueryOptions getTrainingQueryOptions)
getTrainingQuery(params)
get_training_query(
self,
project_id: str,
query_id: str,
**kwargs,
) -> DetailedResponse
GetTrainingQuery(string projectId, string queryId)
Request
Use the GetTrainingQueryOptions.Builder to create a GetTrainingQueryOptions object that contains the parameter values for the getTrainingQuery method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The ID of the query used for training.
Possible values: 1 ≤ length ≤ 255, value must match regular expression ^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The getTrainingQuery options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
The ID of the query used for training.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/training_data/queries/{query_id}?version=2023-03-31"
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

var result = service.GetTrainingQuery(
    projectId: "{project_id}",
    queryId: "{query_id}"
    );

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

GetTrainingQueryOptions options = new GetTrainingQueryOptions.Builder()
  .projectId("{project_id}")
  .queryId("{query_id}")
  .build();

TrainingQuery response = discovery.getTrainingQuery(options).execute().getResult();
System.out.println(response);
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
  queryId: '{queryId}',
};

discovery.getTrainingQuery(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)

discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

response = discovery.get_training_query(
    project_id='{project_id}',
    query_id='{query_id}'
).get_result()

print(json.dumps(response, indent=2))
Response
Object that contains training query details.
The query ID associated with the training query.
The natural text query that is used as the training query.
The filter used on the collection before the natural_language_query is applied. Only specify a filter if the documents that you consider to be most relevant are not included in the top 100 results when you submit test queries. If you specify a filter during training, apply the same filter to queries that are submitted at runtime for optimal ranking results.
The date and time the query was created.
The date and time the query was updated.
Array of training examples.
- examples
The document ID associated with this training example.
The collection ID associated with this training example.
The relevance score of the training example. Scores range from 0 to 100. Zero means not relevant. The higher the number, the more relevant the example.
The date and time the example was created.
The date and time the example was updated.
Status Code
Details of the specified training query.
Query or project not found.
No Sample Response
Update a training query
Updates an existing training query and its examples. You must resubmit all of the examples with the update request.
POST /v2/projects/{project_id}/training_data/queries/{query_id}
ServiceCall<TrainingQuery> updateTrainingQuery(UpdateTrainingQueryOptions updateTrainingQueryOptions)
updateTrainingQuery(params)
update_training_query(
self,
project_id: str,
query_id: str,
natural_language_query: str,
examples: List['TrainingExample'],
*,
filter: str = None,
**kwargs,
) -> DetailedResponse
UpdateTrainingQuery(string projectId, string queryId, string naturalLanguageQuery, List<TrainingExample> examples, string filter = null)
Request
Use the UpdateTrainingQueryOptions.Builder to create an UpdateTrainingQueryOptions object that contains the parameter values for the updateTrainingQuery method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The ID of the query used for training.
Possible values: 1 ≤ length ≤ 255, value must match regular expression ^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The body of the training query that is to be updated.
The natural text query that is used as the training query.
Array of training examples.
The filter used on the collection before the natural_language_query is applied. Only specify a filter if the documents that you consider to be most relevant are not included in the top 100 results when you submit test queries. If you specify a filter during training, apply the same filter to queries that are submitted at runtime for optimal ranking results.
The updateTrainingQuery options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
The ID of the query used for training.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
The natural text query that is used as the training query.
Array of training examples.
- examples
The document ID associated with this training example.
The collection ID associated with this training example.
The relevance score of the training example. Scores range from 0 to 100. Zero means not relevant. The higher the number, the more relevant the example.
The filter used on the collection before the natural_language_query is applied. Only specify a filter if the documents that you consider to be most relevant are not included in the top 100 results when you submit test queries. If you specify a filter during training, apply the same filter to queries that are submitted at runtime for optimal ranking results.
curl -X POST {auth} --data "{ \"query_id\": \"3c4fff84-1500-455c-b125-eaa2d319f6d3\", \"natural_language_query\": \"why is the sky blue\", \"filter\": \"text:meteorology\", \"examples\": [{ \"document_id\": \"54f95ac0-3e4f-4756-bea6-7a67b2713c81\", \"relevance\": 1, \"collection_id\": \"800e58e4-198d-45eb-be87-74e1d6df4e96\" }, { \"document_id\": \"01bcca32-7300-4c9f-8d32-33ed7ea643da\", \"relevance\": 5, \"collection_id\": \"800e58e4-198d-45eb-be87-74e1d6df4e96\" }] }" "{url}/v2/projects/{project_id}/training_data/queries/{query_id}?version=2023-03-31"
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

var newFilter = "field:1";
TrainingExample newTrainingExample = new TrainingExample()
{
    CollectionId = "{collection_id}",
    DocumentId = "{document_id}"
};

var result = service.UpdateTrainingQuery(
    projectId: "{project_id}",
    queryId: "{query_id}",
    naturalLanguageQuery: "This is a new example of a query",
    examples: new List<TrainingExample>() { newTrainingExample },
    filter: newFilter
    );
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

TrainingExample newTrainingExample = new TrainingExample.Builder()
  .collectionId("{collection_id}")
  .documentId("{document_id}")
  .relevance(1L)
  .build();

String newQuery = "This is a new query!";

UpdateTrainingQueryOptions options = new UpdateTrainingQueryOptions.Builder()
  .projectId("{project_id}")
  .queryId("{query_id}")
  .addExamples(newTrainingExample)
  .naturalLanguageQuery(newQuery)
  .build();

TrainingQuery response = discovery.updateTrainingQuery(options).execute().getResult();
System.out.println(response);
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
  queryId: '{queryId}',
  naturalLanguageQuery: 'This is a new query!',
  examples: [
    {
      document_id: '{documentId}',
      collection_id: '{collectionId}',
      relevance: 1,
    },
  ],
};

discovery.updateTrainingQuery(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json

from ibm_watson import DiscoveryV2
from ibm_watson.discovery_v2 import TrainingExample
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)

discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

training_example = TrainingExample(
    document_id='{document_id}',
    collection_id='{collection_id}',
    relevance=1
)

response = discovery.update_training_query(
    project_id='{project_id}',
    query_id='{query_id}',
    natural_language_query='This is an example of a query',
    examples=[training_example],
    filter='{field:1}'
).get_result()

print(json.dumps(response, indent=2))
Response
Object that contains training query details.
The query ID associated with the training query.
The natural text query that is used as the training query.
The filter used on the collection before the natural_language_query is applied. Only specify a filter if the documents that you consider to be most relevant are not included in the top 100 results when you submit test queries. If you specify a filter during training, apply the same filter to queries that are submitted at runtime for optimal ranking results.
The date and time the query was created.
The date and time the query was updated.
Array of training examples.
- examples
The document ID associated with this training example.
The collection ID associated with this training example.
The relevance score of the training example. Scores range from 0 to 100. Zero means not relevant. The higher the number, the more relevant the example.
The date and time the example was created.
The date and time the example was updated.
Status Code
The query was successfully updated.
Bad request.
No Sample Response
Delete a training data query
Removes details from a training data query, including the query string and all examples.
To delete an example, use the Update a training query method and omit the example that you want to delete from the example set.
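As a minimal sketch of that approach, using the Python SDK client configured as in the examples above (the document ID placeholder is hypothetical):

# Fetch the training query, drop the unwanted example, and resubmit the rest.
query = discovery.get_training_query(
    project_id='{project_id}',
    query_id='{query_id}'
).get_result()

# Keep every example except the one to delete.
remaining = [
    example for example in query['examples']
    if example['document_id'] != '{document_id_to_remove}'
]

# Resubmitting the reduced set removes the omitted example.
response = discovery.update_training_query(
    project_id='{project_id}',
    query_id='{query_id}',
    natural_language_query=query['natural_language_query'],
    examples=remaining,
    filter=query.get('filter')
).get_result()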
DELETE /v2/projects/{project_id}/training_data/queries/{query_id}
ServiceCall<Void> deleteTrainingQuery(DeleteTrainingQueryOptions deleteTrainingQueryOptions)
deleteTrainingQuery(params)
delete_training_query(
self,
project_id: str,
query_id: str,
**kwargs,
) -> DetailedResponse
DeleteTrainingQuery(string projectId, string queryId)
Request
Use the DeleteTrainingQueryOptions.Builder to create a DeleteTrainingQueryOptions object that contains the parameter values for the deleteTrainingQuery method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The ID of the query used for training.
Possible values: 1 ≤ length ≤ 255, value must match regular expression ^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The deleteTrainingQuery options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
The ID of the query used for training.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
curl -X DELETE {auth} "{url}/v2/projects/{project_id}/training_data/queries/{query_id}?version=2023-03-31"
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

DiscoveryService service = new DiscoveryService("2020-08-30", authenticator);
service.SetServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

var result = service.DeleteTrainingQuery(
    projectId: "{project_id}",
    queryId: "{query_id}"
    );

Console.WriteLine(result.Response);
CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
Discovery discovery = new Discovery("2020-08-30", authenticator);
discovery.setServiceUrl("https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api");

DeleteTrainingQueryOptions deleteTrainingQueryOptions = new DeleteTrainingQueryOptions.Builder()
  .projectId("{project_id}")
  .queryId("{query_id}")
  .build();

discovery.deleteTrainingQuery(deleteTrainingQueryOptions).execute().getResult();
const DiscoveryV2 = require('ibm-watson/discovery/v2');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const discovery = new DiscoveryV2({
  authenticator: new CloudPakForDataAuthenticator({
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    username: '{username}',
    password: '{password}',
  }),
  version: '2020-08-30',
  serviceUrl: 'https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api',
});

const params = {
  projectId: '{projectId}',
  queryId: '{queryId}',
};

discovery.deleteTrainingQuery(params)
  .then(response => {
    console.log(JSON.stringify(response.result, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
    disable_ssl_verification=True)

discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)

discovery.set_service_url('https://{cpd_cluster_host}{:port}/discovery/{release}/instances/{instance_id}/api')

response = discovery.delete_training_query(
    project_id='{project_id}',
    query_id='{query_id}'
).get_result()

print(json.dumps(response, indent=2))
List enrichments
Lists the enrichments available to this project. The Part of Speech and Sentiment of Phrases enrichments might be listed, but are reserved for internal use only.
GET /v2/projects/{project_id}/enrichments
ServiceCall<Enrichments> listEnrichments(ListEnrichmentsOptions listEnrichmentsOptions)
listEnrichments(params)
list_enrichments(
self,
project_id: str,
**kwargs,
) -> DetailedResponse
ListEnrichments(string projectId)
Request
Use the ListEnrichmentsOptions.Builder to create a ListEnrichmentsOptions object that contains the parameter values for the listEnrichments method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The listEnrichments options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/enrichments?version=2023-03-31"
Response
An object that contains an array of enrichment definitions.
- enrichments
An array of enrichment definitions.
The unique identifier of this enrichment.
The human-readable name for this enrichment.
The description of this enrichment.
The type of this enrichment.
Possible values: [part_of_speech, sentiment, natural_language_understanding, dictionary, regular_expression, uima_annotator, rule_based, watson_knowledge_studio_model, classifier]
An object that contains options for the current enrichment. Starting with version 2020-08-30, the enrichment options are not included in responses from the List Enrichments method.
- options
An array of supported languages for this enrichment. When creating an enrichment, only specify a language that is used by the model or in the dictionary. Required when type is dictionary. Optional when type is rule_based. Not valid when creating any other type of enrichment.
The name of the entity type. This value is used as the field name in the index. Required when type is dictionary or regular_expression. Not valid when creating any other type of enrichment.
The regular expression to apply for this enrichment. Required when type is regular_expression. Not valid when creating any other type of enrichment.
The name of the result document field that this enrichment creates. Required when type is rule_based or classifier. Not valid when creating any other type of enrichment.
A unique identifier of the document classifier. Required when type is classifier. Not valid when creating any other type of enrichment.
A unique identifier of the document classifier model. Required when type is classifier. Not valid when creating any other type of enrichment.
Specifies a threshold. Only classes with evaluation confidence scores that are higher than the specified threshold are included in the output. Optional when type is classifier. Not valid when creating any other type of enrichment.
Possible values: 0 ≤ value ≤ 1
Evaluates only the classes that fall in the top set of results when ranked by confidence. For example, if set to 5, the top five classes for each document are evaluated. If set to 0, the confidence_threshold is used to determine the predicted classes. Optional when type is classifier. Not valid when creating any other type of enrichment.
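To make the interaction between confidence_threshold and top_k concrete, the selection logic described above can be sketched in Python as follows. This is an illustrative model of the documented behavior, not the service's implementation.

# Illustrative only: models how top_k and confidence_threshold select classes.
def predicted_classes(classes, confidence_threshold=0.0, top_k=0):
    # classes: list of (label, confidence) pairs from a classifier.
    ranked = sorted(classes, key=lambda c: c[1], reverse=True)
    if top_k > 0:
        # A positive top_k keeps only the highest-confidence classes.
        return ranked[:top_k]
    # With top_k at 0, the threshold decides which classes survive.
    return [c for c in ranked if c[1] > confidence_threshold]

# Example: only the class scoring above 0.5 is returned.
print(predicted_classes([('billing', 0.82), ('refund', 0.44)], confidence_threshold=0.5))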
Status Code
Returns an array of available enrichments.
Bad request.
Project not found.
{ "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-000000000016", "name": "Sentiment of Document", "type": "natural_language_understanding", "description": "Predict document-level sentiment" }, { "enrichment_id": "701db916-fc83-57ab-0000-000000000018", "name": "Keywords", "type": "natural_language_understanding", "description": "Extract keywords from each document" }, { "enrichment_id": "701db916-fc83-57ab-0000-000000000012", "name": "Table Understanding", "type": "rule_based", "description": "Understand tabular data in HTML via understanding of each table's column headers, row headers, body cells, as well as relevant context and title" }, { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "name": "Entities v2", "type": "natural_language_understanding", "description": "Extract named entities from each document with v2 type system" }, { "enrichment_id": "701db916-fc83-57ab-0000-000000000004", "name": "Sentiment of Phrases", "type": "sentiment", "description": "Extract phrases and expressions which convey sentiment." }, { "enrichment_id": "701db916-fc83-57ab-0000-000000000017", "name": "Entities v1 legacy", "type": "natural_language_understanding", "description": "Extract named entities from each document with v1 type system" }, { "enrichment_id": "701db916-fc83-57ab-0000-000000000002", "name": "Part of Speech", "type": "part_of_speech", "description": "Extract words and phrases from unstructured content and mark these extractions as annotations." }, { "enrichment_id": "314567a0-2bf7-11ee-be56-0242ac120002", "name": "Sentence Classification", "type": "sentence_classifier", "description": "Classify sentences from each document" } ] }
{ "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-000000000016", "name": "Sentiment of Document", "type": "natural_language_understanding", "description": "Predict document-level sentiment" }, { "enrichment_id": "701db916-fc83-57ab-0000-000000000018", "name": "Keywords", "type": "natural_language_understanding", "description": "Extract keywords from each document" }, { "enrichment_id": "701db916-fc83-57ab-0000-000000000012", "name": "Table Understanding", "type": "rule_based", "description": "Understand tabular data in HTML via understanding of each table's column headers, row headers, body cells, as well as relevant context and title" }, { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "name": "Entities v2", "type": "natural_language_understanding", "description": "Extract named entities from each document with v2 type system" }, { "enrichment_id": "701db916-fc83-57ab-0000-000000000004", "name": "Sentiment of Phrases", "type": "sentiment", "description": "Extract phrases and expressions which convey sentiment." }, { "enrichment_id": "701db916-fc83-57ab-0000-000000000017", "name": "Entities v1 legacy", "type": "natural_language_understanding", "description": "Extract named entities from each document with v1 type system" }, { "enrichment_id": "701db916-fc83-57ab-0000-000000000002", "name": "Part of Speech", "type": "part_of_speech", "description": "Extract words and phrases from unstructured content and mark these extractions as annotations." }, { "enrichment_id": "314567a0-2bf7-11ee-be56-0242ac120002", "name": "Sentence Classification", "type": "sentence_classifier", "description": "Classify sentences from each document" } ] }
Create an enrichment
Create an enrichment for use with the specified project. To apply the enrichment to a collection in the project, use the Collections API.
POST /v2/projects/{project_id}/enrichments
ServiceCall<Enrichment> createEnrichment(CreateEnrichmentOptions createEnrichmentOptions)
createEnrichment(params)
create_enrichment(
self,
project_id: str,
enrichment: 'CreateEnrichment',
*,
file: BinaryIO = None,
**kwargs,
) -> DetailedResponse
CreateEnrichment(string projectId, CreateEnrichment enrichment, System.IO.MemoryStream file = null)
Request
Use the CreateEnrichmentOptions.Builder to create a CreateEnrichmentOptions object that contains the parameter values for the createEnrichment method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
Form Parameters
Information about a specific enrichment.
- enrichment
The human-readable name for this enrichment.
The description of this enrichment.
The type of this enrichment. The following types are supported:
- classifier: Creates a document classifier enrichment from a document classifier model that you create by using the Document classifier API. Note: A text classifier enrichment can be created only from the product user interface.
- dictionary: Creates a custom dictionary enrichment that you define in a CSV file.
- regular_expression: Creates a custom regular expression enrichment from regex syntax that you specify in the request.
- rule_based: Creates an enrichment from an advanced rules model that is created and exported as a ZIP file from Watson Knowledge Studio.
- uima_annotator: Creates an enrichment from a custom UIMA text analysis model that is defined in a PEAR file created in one of the following ways:
  - Watson Explorer Content Analytics Studio. Note: Supported in IBM Cloud Pak for Data instances only.
  - Rule-based model that is created in Watson Knowledge Studio.
- watson_knowledge_studio_model: Creates an enrichment from a Watson Knowledge Studio machine learning model that is defined in a ZIP file.
- webhook: Connects to an external enrichment application by using a webhook.
- sentence_classifier: Classifies sentences in your documents. This feature is available in IBM Cloud-managed instances only. The sentence classifier feature is beta functionality. Beta features are not supported by the SDKs.
Allowable values: [classifier, dictionary, regular_expression, uima_annotator, rule_based, watson_knowledge_studio_model, webhook, sentence_classifier]
An object that contains options for the current enrichment. Starting with version 2020-08-30, the enrichment options are not included in responses from the List Enrichments method.
- options
An array of supported languages for this enrichment. When creating an enrichment, only specify a language that is used by the model or in the dictionary. Required when type is dictionary. Optional when type is rule_based. Not valid when creating any other type of enrichment.
The name of the entity type. This value is used as the field name in the index. Required when type is dictionary or regular_expression. Not valid when creating any other type of enrichment.
The regular expression to apply for this enrichment. Required when type is regular_expression. Not valid when creating any other type of enrichment.
The name of the result document field that this enrichment creates. Required when type is rule_based or classifier. Not valid when creating any other type of enrichment.
The Universally Unique Identifier (UUID) of the document classifier. Required when type is classifier. Not valid when creating any other type of enrichment.
The Universally Unique Identifier (UUID) of the document classifier model. Required when type is classifier. Not valid when creating any other type of enrichment.
Specifies a threshold. Only classes with evaluation confidence scores that are higher than the specified threshold are included in the output. Optional when type is classifier. Not valid when creating any other type of enrichment.
Possible values: 0 ≤ value ≤ 1. Default: 0
Evaluates only the classes that fall in the top set of results when ranked by confidence. For example, if set to 5, the top five classes for each document are evaluated. If set to 0, the confidence_threshold is used to determine the predicted classes. Optional when type is classifier. Not valid when creating any other type of enrichment. Default: 0
A URL that uses the SSL protocol (begins with https) for the webhook. Required when type is webhook. Not valid when creating any other type of enrichment.
The Discovery API version that is used to distinguish the schema. Specify the version in yyyy-mm-dd format. Optional when type is webhook. Not valid when creating any other type of enrichment. Default: 2023-03-31
A private key can be included in the request to authenticate with the external service. The maximum length is 1,024 characters. Optional when type is webhook. Not valid when creating any other type of enrichment.
An array of headers to pass with the HTTP request. Optional when type is webhook. Not valid when creating any other type of enrichment.
Discovery calculates offsets of the text's location with this encoding type in documents. Use the same location encoding type in both Discovery and the external enrichment for a document. These encoding types are supported: utf-8, utf-16, and utf-32. Optional when type is webhook. Not valid when creating any other type of enrichment. Default: utf-16
The enrichment file to upload. Expected file types per enrichment are as follows:
- CSV for dictionary and sentence_classifier (the training data CSV file to upload)
- PEAR for uima_annotator and rule_based (Explorer)
- ZIP for watson_knowledge_studio_model and rule_based (Studio Advanced Rule Editor)
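For illustration, a complete enrichment form part for a regular_expression enrichment might look like the following JSON. The option names mirror the fields described above; the pattern and entity type are hypothetical examples, so adjust them to your own data.

{
  "name": "US phone numbers",
  "description": "Tag US-style phone numbers",
  "type": "regular_expression",
  "options": {
    "entity_type": "phone_number",
    "regular_expression": "\\d{3}-\\d{3}-\\d{4}"
  }
}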
The createEnrichment options.
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
Information about a specific enrichment.
- enrichment
The human-readable name for this enrichment.
The description of this enrichment.
The type of this enrichment. The following types are supported:
- classifier: Creates a document classifier enrichment from a document classifier model that you create by using the Document classifier API. Note: A text classifier enrichment can be created only from the product user interface.
- dictionary: Creates a custom dictionary enrichment that you define in a CSV file.
- regular_expression: Creates a custom regular expression enrichment from regex syntax that you specify in the request.
- rule_based: Creates an enrichment from an advanced rules model that is created and exported as a ZIP file from Watson Knowledge Studio.
- uima_annotator: Creates an enrichment from a custom UIMA text analysis model that is defined in a PEAR file created in one of the following ways:
  - Watson Explorer Content Analytics Studio. Note: Supported in IBM Cloud Pak for Data instances only.
  - Rule-based model that is created in Watson Knowledge Studio.
- watson_knowledge_studio_model: Creates an enrichment from a Watson Knowledge Studio machine learning model that is defined in a ZIP file.
Allowable values: [classifier, dictionary, regular_expression, uima_annotator, rule_based, watson_knowledge_studio_model]
An object that contains options for the current enrichment. Starting with version 2020-08-30, the enrichment options are not included in responses from the List Enrichments method.
- options
An array of supported languages for this enrichment. When creating an enrichment, only specify a language that is used by the model or in the dictionary. Required when type is dictionary. Optional when type is rule_based. Not valid when creating any other type of enrichment.
The name of the entity type. This value is used as the field name in the index. Required when type is dictionary or regular_expression. Not valid when creating any other type of enrichment.
The regular expression to apply for this enrichment. Required when type is regular_expression. Not valid when creating any other type of enrichment.
The name of the result document field that this enrichment creates. Required when type is rule_based or classifier. Not valid when creating any other type of enrichment.
A unique identifier of the document classifier. Required when type is classifier. Not valid when creating any other type of enrichment.
A unique identifier of the document classifier model. Required when type is classifier. Not valid when creating any other type of enrichment.
Specifies a threshold. Only classes with evaluation confidence scores that are higher than the specified threshold are included in the output. Optional when type is classifier. Not valid when creating any other type of enrichment.
Possible values: 0 ≤ value ≤ 1. Default: 0
Evaluates only the classes that fall in the top set of results when ranked by confidence. For example, if set to 5, the top five classes for each document are evaluated. If set to 0, the confidence_threshold is used to determine the predicted classes. Optional when type is classifier. Not valid when creating any other type of enrichment. Default: 0
The enrichment file to upload. Expected file types per enrichment are as follows:
- CSV for dictionary
- PEAR for uima_annotator and rule_based (Explorer)
- ZIP for watson_knowledge_studio_model and rule_based (Studio Advanced Rule Editor)
curl -X POST {auth} --header "Content-Type: multipart/form-data" --form enrichment="{\"name\": \"Products\", \"type\": \"dictionary\", \"description\": \"Products dictionary\", \"options\": {\"languages\": [\"en\"], \"entity_type\": \"products\"}}" --form file=@product_list.csv "{url}/v2/projects/{project_id}/enrichments?version=2023-03-31"
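The equivalent call through the Python SDK, sketched with placeholder credentials; the enrichment definition can be passed as a plain dict that mirrors the JSON in the curl example.

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials -- replace {apikey}, {url}, and {project_id}.
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

enrichment = {
    'name': 'Products',
    'description': 'Products dictionary',
    'type': 'dictionary',
    'options': {'languages': ['en'], 'entity_type': 'products'}
}

# A dictionary enrichment requires a CSV file of terms to match.
with open('product_list.csv', 'rb') as csv_file:
    result = discovery.create_enrichment(
        project_id='{project_id}',
        enrichment=enrichment,
        file=csv_file,
    ).get_result()
print(result['enrichment_id'])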
Response
Information about a specific enrichment.
The Universally Unique Identifier (UUID) of this enrichment.
The human-readable name for this enrichment.
The description of this enrichment.
The type of this enrichment.
Possible values: [part_of_speech, sentiment, natural_language_understanding, dictionary, regular_expression, uima_annotator, rule_based, watson_knowledge_studio_model, classifier, webhook, sentence_classifier]
An object that contains options for the current enrichment. Starting with version 2020-08-30, the enrichment options are not included in responses from the List Enrichments method.
Status Code
The enrichment has been successfully created.
Bad request.
{ "enrichment_id": "5b36383b-fa57-0bfe-0000-017b77fba220", "name": "Products", "type": "dictionary", "description": "Products dictionary", "options": { "languages": [ "en" ], "entity_type": "products" } }
{ "enrichment_id": "5b36383b-fa57-0bfe-0000-017b77fba220", "name": "Products", "type": "dictionary", "description": "Products dictionary", "options": { "languages": [ "en" ], "entity_type": "products" } }
Get enrichment
Get details about a specific enrichment.
GET /v2/projects/{project_id}/enrichments/{enrichment_id}
ServiceCall<Enrichment> getEnrichment(GetEnrichmentOptions getEnrichmentOptions)
getEnrichment(params)
get_enrichment(
self,
project_id: str,
enrichment_id: str,
**kwargs,
) -> DetailedResponse
GetEnrichment(string projectId, string enrichmentId)
Request
Use the GetEnrichmentOptions.Builder to create a GetEnrichmentOptions object that contains the parameter values for the getEnrichment method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the enrichment.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The getEnrichment options.
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
The ID of the enrichment.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/enrichments/{enrichment_id}?version=2023-03-31"
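A minimal Python sketch of the same request, with the usual placeholder credentials:

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials -- replace the bracketed values with your own.
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

# Fetch the details of a single enrichment by its ID.
enrichment = discovery.get_enrichment(
    project_id='{project_id}',
    enrichment_id='{enrichment_id}',
).get_result()
print(enrichment['name'], enrichment['type'])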
Response
Information about a specific enrichment.
The Universally Unique Identifier (UUID) of this enrichment.
The human-readable name for this enrichment.
The description of this enrichment.
The type of this enrichment.
Possible values: [part_of_speech, sentiment, natural_language_understanding, dictionary, regular_expression, uima_annotator, rule_based, watson_knowledge_studio_model, classifier, webhook, sentence_classifier]
An object that contains options for the current enrichment. Starting with version 2020-08-30, the enrichment options are not included in responses from the List Enrichments method.
Status Code
Returns information about the specified enrichment.
Enrichment or project not found.
{ "enrichment_id": "5b36383b-fa57-0bfe-0000-017b77fba220", "name": "Products", "type": "dictionary", "description": "Products dictionary", "options": { "languages": [ "en" ], "entity_type": "products" } }
{ "enrichment_id": "5b36383b-fa57-0bfe-0000-017b77fba220", "name": "Products", "type": "dictionary", "description": "Products dictionary", "options": { "languages": [ "en" ], "entity_type": "products" } }
Update an enrichment
Updates an existing enrichment's name and description.
POST /v2/projects/{project_id}/enrichments/{enrichment_id}
ServiceCall<Enrichment> updateEnrichment(UpdateEnrichmentOptions updateEnrichmentOptions)
updateEnrichment(params)
update_enrichment(
self,
project_id: str,
enrichment_id: str,
name: str,
*,
description: str = None,
**kwargs,
) -> DetailedResponse
UpdateEnrichment(string projectId, string enrichmentId, string name, string description = null)
Request
Use the UpdateEnrichmentOptions.Builder to create an UpdateEnrichmentOptions object that contains the parameter values for the updateEnrichment method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the enrichment.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
An object that lists the new name and description for an enrichment.
A new name for the enrichment.
A new description for the enrichment.
The updateEnrichment options.
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
The ID of the enrichment.
Possible values: 1 ≤ length ≤ 255, value must match regular expression /^[a-zA-Z0-9_-]*$/
A new name for the enrichment.
A new description for the enrichment.
curl -X POST {auth} --header "Content-Type: application/json" --data "{ \"name\": \"Products and Services\" }" "{url}/v2/projects/{project_id}/enrichments/{enrichment_id}?version=2023-03-31"
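A Python sketch of the rename shown in the curl example, again with placeholder credentials; the description parameter is optional and is left unchanged here.

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials -- replace the bracketed values with your own.
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

# Rename the enrichment; only name and description can be updated.
updated = discovery.update_enrichment(
    project_id='{project_id}',
    enrichment_id='{enrichment_id}',
    name='Products and Services',
).get_result()
print(updated['name'])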
Response
Information about a specific enrichment.
The Universally Unique Identifier (UUID) of this enrichment.
The human-readable name for this enrichment.
The description of this enrichment.
The type of this enrichment.
Possible values: [part_of_speech, sentiment, natural_language_understanding, dictionary, regular_expression, uima_annotator, rule_based, watson_knowledge_studio_model, classifier, webhook, sentence_classifier]
An object that contains options for the current enrichment. Starting with version 2020-08-30, the enrichment options are not included in responses from the List Enrichments method.
Status Code
Returns the updated enrichment details.
Bad request.
Enrichment or project not found.
{ "enrichment_id": "5b36383b-fa57-0bfe-0000-017b77fba220", "name": "Products and Services", "type": "dictionary", "description": "Products dictionary", "options": { "languages": [ "en" ], "entity_type": "products" } }
{ "enrichment_id": "5b36383b-fa57-0bfe-0000-017b77fba220", "name": "Products and Services", "type": "dictionary", "description": "Products dictionary", "options": { "languages": [ "en" ], "entity_type": "products" } }
Delete an enrichment
Deletes an existing enrichment from the specified project.
Note: Only enrichments that have been manually created can be deleted.
DELETE /v2/projects/{project_id}/enrichments/{enrichment_id}
ServiceCall<Void> deleteEnrichment(DeleteEnrichmentOptions deleteEnrichmentOptions)
deleteEnrichment(params)
delete_enrichment(
self,
project_id: str,
enrichment_id: str,
**kwargs,
) -> DetailedResponse
DeleteEnrichment(string projectId, string enrichmentId)
Request
Use the DeleteEnrichmentOptions.Builder
to create a DeleteEnrichmentOptions
object that contains the parameter values for the deleteEnrichment
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the enrichment.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
The deleteEnrichment options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the enrichment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the enrichment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the enrichment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the enrichment.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl -X DELETE {auth} "{url}/v2/projects/{project_id}/enrichments/{enrichment_id}?version=2023-03-31"
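The SDK equivalents are a single call. For example, a minimal sketch with the Python SDK, assuming IAM authentication; the {apikey}, {url}, {project_id}, and {enrichment_id} placeholders are stand-ins, as in the curl example above.
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholders -- substitute your own credentials and IDs.
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

# Deletes the enrichment; a successful call returns an empty response body.
discovery.delete_enrichment(
    project_id='{project_id}',
    enrichment_id='{enrichment_id}',
)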
List batches
A batch is a set of documents that are ready for enrichment by an external application. After you apply a webhook enrichment to a collection, and then process or upload documents to the collection, Discovery creates a batch with a unique batch_id.
To start, you must register your external application as a webhook type by using the Create enrichment API method.
Use the List batches API to get the following:
- Notified batches that are not yet pulled by the external enrichment application.
- Batches that are pulled, but not yet pushed to Discovery by the external enrichment application.
GET /v2/projects/{project_id}/collections/{collection_id}/batches
Request
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
curl {auth} "{url}/v2/projects/{project_id}/collections/{collection_id}/batches?version=2023-03-31"
Response
An object that contains a list of batches that are ready for enrichment by the external application.
An array that lists the batches in a collection.
Status Code
Returns an array of available batches that are ready to be pulled by the external enrichment application.
Batches time out automatically around 72 hours after they are created.
Missing project or collection, or bad request.
{ "batches": [ { "batch_id": "e9e1316b-a8b7-48d6-b538-e608af616a12", "enrichment_id": "fd290d8b-53e2-dba1-0000-018a8d150b85", "created": "2023-09-21T06:38:21.260Z" }, { "batch_id": "aaab573d-e48c-3445-7b9d-5f3f9a9970215", "enrichment_id": "fd290d8b-53e2-dba1-0000-018a8d150b85", "created": "2023-09-21T05:38:21.260Z" } ] }
Pull batches
Pull a batch of documents from Discovery for enrichment by an external application. Be sure to include the Accept-Encoding: gzip
header in this request so that you receive the compressed file. You can also implement retry logic when calling this method to recover from transient network errors.
GET /v2/projects/{project_id}/collections/{collection_id}/batches/{batch_id}
Request
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
The Universally Unique Identifier (UUID) of the document batch that is being requested from Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
curl {auth} --header "Accept-Encoding: gzip" "{url}/v2/projects/{project_id}/collections/{collection_id}/batches/{batch_id}?version=2023-03-31"
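A minimal Python sketch of the pull, including the retry logic and gzip handling that the description recommends; the placeholders and apikey basic auth are the same assumptions as in the earlier sketch.
import gzip
import json
import requests

url = "{url}/v2/projects/{project_id}/collections/{collection_id}/batches/{batch_id}"

# Retry a few times to recover from transient network errors.
response = None
for attempt in range(3):
    try:
        response = requests.get(
            url,
            params={"version": "2023-03-31"},
            headers={"Accept-Encoding": "gzip"},
            auth=("apikey", "{apikey}"),
            timeout=60,
        )
        response.raise_for_status()
        break
    except requests.RequestException:
        if attempt == 2:
            raise

# The body is the {batch_id}.ndjson.gz file. Decompress it explicitly unless
# the HTTP layer already did so, then parse one JSON document per line.
raw = response.content
try:
    text = gzip.decompress(raw).decode("utf-8")
except gzip.BadGzipFile:
    text = raw.decode("utf-8")
documents = [json.loads(line) for line in text.splitlines() if line.strip()]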
Response
A compressed newline-delimited JSON (NDJSON) file that contains the documents in the batch. The NDJSON format is used to describe structured data. The file name format is {batch_id}.ndjson.gz
. For more information, see Binary attachment from the pull batches method.
A compressed NDJSON file that contains the documents.
Status Code
Returns a compressed NDJSON file.
Not found. Returned when the project, collection, or batch is missing.
{ "document_id": "3fb9f330-bd94-41a8-ac75-f732af13b5ac_1", "location_encoding": "utf-32", "language": "en", "artifact": "2016/1/2{\"parent_document_id\":\"3fb9f330-bd94-41a8-ac75-f732af13b5ac\"}1vanilla ice creamFemalecontamination_tamperingI got some ice cream for my children, but there was something like a piece of thread inside the cup.20QueensSilver Card MemberIce cream", "features": [ { "type": "field", "location": { "begin": 0, "end": 8 }, "properties": { "field_name": "date", "field_index": 0, "field_type": "string" } }, { "type": "field", "location": { "begin": 8, "end": 69 }, "properties": { "field_name": "metadata", "field_index": 0, "field_type": "json" } }, { "type": "field", "location": { "begin": 69, "end": 70 }, "properties": { "field_name": "claim_id", "field_index": 0, "field_type": "long" } }, { "type": "field", "location": { "begin": 70, "end": 87 }, "properties": { "field_name": "claim_product", "field_index": 0, "field_type": "string" } }, { "type": "field", "location": { "begin": 87, "end": 93 }, "properties": { "field_name": "client_sex", "field_index": 0, "field_type": "string" } }, { "type": "field", "location": { "begin": 93, "end": 116 }, "properties": { "field_name": "label", "field_index": 0, "field_type": "string" } }, { "type": "field", "location": { "begin": 116, "end": 216 }, "properties": { "field_name": "body", "field_index": 0, "field_type": "string" } }, { "type": "field", "location": { "begin": 216, "end": 218 }, "properties": { "field_name": "client_age", "field_index": 0, "field_type": "long" } }, { "type": "field", "location": { "begin": 218, "end": 224 }, "properties": { "field_name": "client_location", "field_index": 0, "field_type": "string" } }, { "type": "field", "location": { "begin": 224, "end": 242 }, "properties": { "field_name": "client_segment", "field_index": 0, "field_type": "string" } }, { "type": "field", "location": { "begin": 242, "end": 251 }, "properties": { "field_name": "claim_product_line", "field_index": 0, "field_type": "string" } } ] }
Push batches
Push a batch of documents to Discovery after annotation by an external application. You can implement retry logic when calling this method to avoid any network errors.
POST /v2/projects/{project_id}/collections/{collection_id}/batches/{batch_id}
Request
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
The Universally Unique Identifier (UUID) of the document batch that is being requested from Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
Form Parameters
A compressed newline-delimited JSON (NDJSON) file, which is a JSON file with one row of data per line. For example,
{batch_id}.ndjson.gz
. For more information, see Binary attachment in the push batches method.
There is no limitation on the name of the file because Discovery does not use the name for processing. The list of features in the document is specified in the
features
object.
curl -X POST {auth} --form "file=@{filename}" "{url}/v2/projects/{project_id}/collections/{collection_id}/batches/{batch_id}?version=2023-03-31"
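The same push from Python, sketched with the requests library under the same placeholder and auth assumptions; batch.ndjson.gz stands for your annotated file, whose name Discovery ignores.
import requests

url = "{url}/v2/projects/{project_id}/collections/{collection_id}/batches/{batch_id}"

# The batch_id in the URL, not the file name, tells Discovery which batch this is.
with open("batch.ndjson.gz", "rb") as gz_file:
    response = requests.post(
        url,
        params={"version": "2023-03-31"},
        files={"file": ("batch.ndjson.gz", gz_file, "application/gzip")},
        auth=("apikey", "{apikey}"),
        timeout=60,
    )
response.raise_for_status()
print(response.json())  # {"accepted": true} when the batch is accepted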
Response
Whether or not a batch push of documents is successful. Discovery does not check what is included in the .gz
file, and returns a response as soon as it temporarily stores the .gz
file.
Documents included in a pushed .gz
file are processed, and annotations from the external enrichment are added to the corresponding fields in the indexed documents of the collection.
Any documents in the .gz
file that were not part of the batch (the batch with the same batch_id) that was pulled from Discovery by using the Pull batches method are ignored.
Status Code
The batch has been accepted and is being processed.
Not found. Returned when the project, collection, or batch is missing. It is also returned when the batch has already been processed or has timed out.
The batch was already pushed and accepted. Returned if the batch is pushed again between the time it is submitted and the time Discovery finishes processing the submitted documents.
{ "accepted": true }
List document classifiers
Get a list of the document classifiers in a project. Returns only the name and classifier ID of each document classifier.
GET /v2/projects/{project_id}/document_classifiers
ServiceCall<DocumentClassifiers> listDocumentClassifiers(ListDocumentClassifiersOptions listDocumentClassifiersOptions)
listDocumentClassifiers(params)
list_document_classifiers(
self,
project_id: str,
**kwargs,
) -> DetailedResponse
ListDocumentClassifiers(string projectId)
Request
Use the ListDocumentClassifiersOptions.Builder
to create a ListDocumentClassifiersOptions
object that contains the parameter values for the listDocumentClassifiers
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
The listDocumentClassifiers options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/document_classifiers?version=2023-03-31"
Response
An object that contains a list of document classifier definitions.
An array of document classifier definitions.
An object that contains a list of document classifier definitions.
An array of document classifier definitions.
- classifiers
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- federatedClassification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
An object that contains a list of document classifier definitions.
An array of document classifier definitions.
- classifiers
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- federated_classification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
An object that contains a list of document classifier definitions.
An array of document classifier definitions.
- classifiers
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- federated_classification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
An object that contains a list of document classifier definitions.
An array of document classifier definitions.
- Classifiers
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- Enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- FederatedClassification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
Status Code
Returns a list of document classifiers.
Document project is not found.
{ "classifiers": [ { "name": "Customer comments classifier", "classifier_id": "0e36fdd2-7fb0-812b-0000-017ed29d6587" }, { "name": "Product feedback classification", "classifier_id": "357f301e-e918-0b73-0000-017e8983070c" } ] }
{ "classifiers": [ { "name": "Customer comments classifier", "classifier_id": "0e36fdd2-7fb0-812b-0000-017ed29d6587" }, { "name": "Product feedback classification", "classifier_id": "357f301e-e918-0b73-0000-017e8983070c" } ] }
Create a document classifier
Create a document classifier. You can use the API to create a document classifier in any project type. After you create a document classifier, you can use the Enrichments API to create a classifier enrichment, and then the Collections API to apply the enrichment to a collection in the project.
Note: This method is supported on installed instances (IBM Cloud Pak for Data) or IBM Cloud-managed Premium or Enterprise plan instances.
POST /v2/projects/{project_id}/document_classifiers
ServiceCall<DocumentClassifier> createDocumentClassifier(CreateDocumentClassifierOptions createDocumentClassifierOptions)
createDocumentClassifier(params)
create_document_classifier(
self,
project_id: str,
training_data: BinaryIO,
classifier: 'CreateDocumentClassifier',
*,
test_data: BinaryIO = None,
**kwargs,
) -> DetailedResponse
CreateDocumentClassifier(string projectId, System.IO.MemoryStream trainingData, CreateDocumentClassifier classifier, System.IO.MemoryStream testData = null)
Request
Use the CreateDocumentClassifierOptions.Builder
to create a CreateDocumentClassifierOptions
object that contains the parameter values for the createDocumentClassifier
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
Form Parameters
The training data CSV file to upload. The CSV file must have headers. The file must include a field that contains the text you want to classify and a field that contains the classification labels that you want to use to classify your data. If you want to specify multiple values in a single field, use a semicolon as the value separator. For a sample file, see the product documentation.
An object that manages the settings and data that is required to train a document classification model.
- classifier
A human-readable name of the document classifier.
Possible values: length ≤ 255
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
The name of the field from the training and test data that contains the classification labels.
A description of the document classifier.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
The Universally Unique Identifier (UUID) of the enrichment.
An array of field names where the enrichment is applied.
An object with details for creating federated document classifier models.
- federated_classification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
The CSV with test data to upload. The column values in the test file must be the same as the column values in the training data file. If no test data is provided, the training data is split into two separate groups of training and test data.
The createDocumentClassifier options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The training data CSV file to upload. The CSV file must have headers. The file must include a field that contains the text you want to classify and a field that contains the classification labels that you want to use to classify your data. If you want to specify multiple values in a single field, use a semicolon as the value separator. For a sample file, see the product documentation.
An object that manages the settings and data that is required to train a document classification model.
- classifier
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
The name of the field from the training and test data that contains the classification labels.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An object with details for creating federated document classifier models.
- federatedClassification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
The CSV with test data to upload. The column values in the test file must be the same as the column values in the training data file. If no test data is provided, the training data is split into two separate groups of training and test data.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The training data CSV file to upload. The CSV file must have headers. The file must include a field that contains the text you want to classify and a field that contains the classification labels that you want to use to classify your data. If you want to specify multiple values in a single field, use a semicolon as the value separator. For a sample file, see the product documentation.
An object that manages the settings and data that is required to train a document classification model.
The CSV with test data to upload. The column values in the test file must be the same as the column values in the training data file. If no test data is provided, the training data is split into two separate groups of training and test data.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The training data CSV file to upload. The CSV file must have headers. The file must include a field that contains the text you want to classify and a field that contains the classification labels that you want to use to classify your data. If you want to specify multiple values in a single field, use a semicolon as the value separator. For a sample file, see the product documentation.
An object that manages the settings and data that is required to train a document classification model.
The CSV with test data to upload. The column values in the test file must be the same as the column values in the training data file. If no test data is provided, the training data is split into two separate groups of training and test data.
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The training data CSV file to upload. The CSV file must have headers. The file must include a field that contains the text you want to classify and a field that contains the classification labels that you want to use to classify your data. If you want to specify multiple values in a single field, use a semicolon as the value separator. For a sample file, see the product documentation.
An object that manages the settings and data that is required to train a document classification model.
The CSV with test data to upload. The column values in the test file must be the same as the column values in the training data file. If no test data is provided, the training data is split into two separate groups of training and test data.
curl -X POST {auth} --header "Content-Type: multipart/form-data" --form classifier="{\"name\": \"Customer comments classifier\", \"description\": \"User reviews\", \"language\": \"en\", \"answer_field\": \"label\", \"enrichments\": [ {\"enrichment_id\":\"701db916-fc83-57ab-0000-00000000001e\", \"fields\": [ \"text\", \"body\"] } ], \"federated_classification\": { \"field\": \"claim_product_line\" } }" --form "training_data=@training.csv;type=text/csv" --form "test_data=@test.csv;type=text/csv" "{url}/v2/projects/{project_id}/document_classifiers?version=2023-03-31"
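The same request sketched with the Python SDK, assuming IAM authentication and that the CreateDocumentClassifier model class is importable from ibm_watson.discovery_v2, as in recent SDK releases; training.csv and test.csv are the sample file names from the curl example.
from ibm_watson import DiscoveryV2
from ibm_watson.discovery_v2 import CreateDocumentClassifier
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

# Minimal classifier definition; enrichments and federated_classification
# can be added the same way as in the curl example above.
classifier = CreateDocumentClassifier(
    name='Customer comments classifier',
    language='en',
    answer_field='label',
    description='User reviews',
)

with open('training.csv', 'rb') as training_file, open('test.csv', 'rb') as test_file:
    result = discovery.create_document_classifier(
        project_id='{project_id}',
        training_data=training_file,
        classifier=classifier,
        test_data=test_file,
    ).get_result()
print(result['classifier_id'])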
Response
Information about a document classifier.
The Universally Unique Identifier (UUID) of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
Information about a document classifier.
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- federatedClassification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
Information about a document classifier.
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- federated_classification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
Information about a document classifier.
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- federated_classification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
Information about a document classifier.
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- Enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- FederatedClassification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
Status Code
Returns the document classifier details.
Bad request. A required parameter is null or invalid. Specific failure messages include:
- No request body (missing classifier configuration).
- Missing required request parameter or its value.
- Unsupported language is requested.
- Provided training_data or test_data does not have the field requested in answer_field.
- Requested enrichment does not exist.
- Provided training_data or test_data does not have the fields requested in enrichments.fields.
- Provided training_data or test_data does not have the field requested in federated_classification.field.
Missing project.
{ "name": "Customer comments classifier", "classifier_id": "0e36fdd2-7fb0-812b-0000-017ed29d6587", "description": "User reviews", "created": "2022-03-09T17:51:51.592Z", "language": "en", "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "fields": [ "text", "body" ] } ], "answer_field": "label", "recognized_fields": [ "text", "body", "date", "client_age", "client_location", "label", "type_column", "client_segment", "claim_id", "claim_product", "claim_product_line" ], "training_data_file": "training.csv", "test_data_file": "test.csv", "federated_classification": { "field": "claim_product_line" } }
{ "name": "Customer comments classifier", "classifier_id": "0e36fdd2-7fb0-812b-0000-017ed29d6587", "description": "User reviews", "created": "2022-03-09T17:51:51.592Z", "language": "en", "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "fields": [ "text", "body" ] } ], "answer_field": "label", "recognized_fields": [ "text", "body", "date", "client_age", "client_location", "label", "type_column", "client_segment", "claim_id", "claim_product", "claim_product_line" ], "training_data_file": "training.csv", "test_data_file": "test.csv", "federated_classification": { "field": "claim_product_line" } }
Get a document classifier
Get details about a specific document classifier.
GET /v2/projects/{project_id}/document_classifiers/{classifier_id}
ServiceCall<DocumentClassifier> getDocumentClassifier(GetDocumentClassifierOptions getDocumentClassifierOptions)
getDocumentClassifier(params)
get_document_classifier(
self,
project_id: str,
classifier_id: str,
**kwargs,
) -> DetailedResponse
GetDocumentClassifier(string projectId, string classifierId)
Request
Use the GetDocumentClassifierOptions.Builder
to create a GetDocumentClassifierOptions
object that contains the parameter values for the getDocumentClassifier
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the classifier.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
The getDocumentClassifier options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/document_classifiers/{classifier_id}?version=2023-03-31"
Response
Information about a document classifier.
The Universally Unique Identifier (UUID) of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
Information about a document classifier.
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- federatedClassification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
Information about a document classifier.
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- federated_classification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
Information about a document classifier.
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- federated_classification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
Information about a document classifier.
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as en for English or ja for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- Enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- FederatedClassification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
Status Code
Returns details about a specific document classifier.
Document classifier or project not found.
{ "name": "Customer comments classifier", "classifier_id": "0e36fdd2-7fb0-812b-0000-017ed29d6587", "description": "User reviews", "created": "2022-03-09T17:51:51.592Z", "language": "en", "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "fields": [ "text", "body" ] } ], "answer_field": "label", "recognized_fields": [ "text", "body", "date", "client_age", "client_location", "label", "type_column", "client_segment", "claim_id", "claim_product", "claim_product_line" ], "training_data_file": "training.csv", "test_data_file": "test.csv", "federated_classification": { "field": "claim_product_line" } }
{ "name": "Customer comments classifier", "classifier_id": "0e36fdd2-7fb0-812b-0000-017ed29d6587", "description": "User reviews", "created": "2022-03-09T17:51:51.592Z", "language": "en", "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "fields": [ "text", "body" ] } ], "answer_field": "label", "recognized_fields": [ "text", "body", "date", "client_age", "client_location", "label", "type_column", "client_segment", "claim_id", "claim_product", "claim_product_line" ], "training_data_file": "training.csv", "test_data_file": "test.csv", "federated_classification": { "field": "claim_product_line" } }
Update a document classifier
Update the document classifier name or description, update the training data, or add or update the test data.
POST /v2/projects/{project_id}/document_classifiers/{classifier_id}
ServiceCall<DocumentClassifier> updateDocumentClassifier(UpdateDocumentClassifierOptions updateDocumentClassifierOptions)
updateDocumentClassifier(params)
update_document_classifier(
self,
project_id: str,
classifier_id: str,
classifier: 'UpdateDocumentClassifier',
*,
training_data: BinaryIO = None,
test_data: BinaryIO = None,
**kwargs,
) -> DetailedResponse
UpdateDocumentClassifier(string projectId, string classifierId, UpdateDocumentClassifier classifier, System.IO.MemoryStream trainingData = null, System.IO.MemoryStream testData = null)
Request
Use the UpdateDocumentClassifierOptions.Builder
to create a UpdateDocumentClassifierOptions
object that contains the parameter values for the updateDocumentClassifier
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found from the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the classifier.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
Form Parameters
An object that contains a new name or description for a document classifier, updated training data, or new or updated test data.
- classifier
A new name for the classifier.
Possible values: length ≤ 255
A new description for the classifier.
The training data CSV file to upload. The CSV file must have headers. The file must include a field that contains the text you want to classify and a field that contains the classification labels that you want to use to classify your data. If you want to specify multiple values in a single column, use a semicolon as the value separator. For a sample file, see the product documentation.
The CSV with test data to upload. The column values in the test file must be the same as the column values in the training data file. If no test data is provided, the training data is split into two separate groups of training and test data.
The updateDocumentClassifier options.
The ID of the project. This information can be found from the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
An object that contains a new name or description for a document classifier, updated training data, or new or updated test data.
- classifier
A new name for the classifier.
Possible values: length ≤ 255
A new description for the classifier.
The training data CSV file to upload. The CSV file must have headers. The file must include a field that contains the text you want to classify and a field that contains the classification labels that you want to use to classify your data. If you want to specify multiple values in a single column, use a semicolon as the value separator. For a sample file, see the product documentation.
The CSV file with test data to upload. The column values in the test file must be the same as the column values in the training data file. If no test data is provided, the training data is split into two separate groups of training and test data.
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
An object that contains a new name or description for a document classifier, updated training data, or new or updated test data.
The training data CSV file to upload. The CSV file must have headers. The file must include a field that contains the text you want to classify and a field that contains the classification labels that you want to use to classify your data. If you want to specify multiple values in a single column, use a semicolon as the value separator. For a sample file, see the product documentation.
The CSV file with test data to upload. The column values in the test file must be the same as the column values in the training data file. If no test data is provided, the training data is split into two separate groups of training and test data.
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
An object that contains a new name or description for a document classifier, updated training data, or new or updated test data.
The training data CSV file to upload. The CSV file must have headers. The file must include a field that contains the text you want to classify and a field that contains the classification labels that you want to use to classify your data. If you want to specify multiple values in a single column, use a semicolon as the value separator. For a sample file, see the product documentation.
The CSV file with test data to upload. The column values in the test file must be the same as the column values in the training data file. If no test data is provided, the training data is split into two separate groups of training and test data.
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
An object that contains a new name or description for a document classifier, updated training data, or new or updated test data.
The training data CSV file to upload. The CSV file must have headers. The file must include a field that contains the text you want to classify and a field that contains the classification labels that you want to use to classify your data. If you want to specify multiple values in a single column, use a semicolon as the value separator. For a sample file, see the product documentation.
The CSV file with test data to upload. The column values in the test file must be the same as the column values in the training data file. If no test data is provided, the training data is split into two separate groups of training and test data.
curl -X POST {auth} --header "Content-Type: multipart/form-data" --form classifier="{ \"name\": \"Customer feedback classifier\", \"description\": \"User reviews and feedback\" }" --form "training_data=@training.csv;type=text/csv" --form "test_data=@test.csv;type=text/csv" "{url}/v2/projects/{project_id}/document_classifiers/{classifier_id}?version=2023-03-31"
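For comparison, here is a minimal sketch of the same call with the Python SDK, following the update_document_classifier signature shown above. The credentials, URL, and IDs are placeholders, the UpdateDocumentClassifier model class is assumed to be importable from ibm_watson.discovery_v2, and the sample CSV contents in the comment are invented to illustrate the format (semicolons separate multiple labels in one column).

from ibm_watson import DiscoveryV2
from ibm_watson.discovery_v2 import UpdateDocumentClassifier
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials and endpoint; replace with your own.
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

# training.csv and test.csv are assumed to exist locally, for example:
#   text,label
#   "Slow reimbursement, but the representative was helpful",service;claims
#   "Great mobile app",product
with open('training.csv', 'rb') as training_data, open('test.csv', 'rb') as test_data:
    classifier = discovery.update_document_classifier(
        project_id='{project_id}',
        classifier_id='{classifier_id}',
        classifier=UpdateDocumentClassifier(
            name='Customer feedback classifier',
            description='User reviews and feedback',
        ),
        training_data=training_data,
        test_data=test_data,
    ).get_result()

print(classifier['name'], classifier['classifier_id'])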
Response
Information about a document classifier.
The Universally Unique Identifier (UUID) of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as
en
for English or
ja
for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
Information about a document classifier.
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as
en
for English or
ja
for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- federatedClassification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
Information about a document classifier.
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as
en
for English or
ja
for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- federated_classification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
Information about a document classifier.
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as
en
for English or
ja
for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- federated_classification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
Information about a document classifier.
A unique identifier of the document classifier.
A human-readable name of the document classifier.
Possible values: length ≤ 255
A description of the document classifier.
The date that the document classifier was created.
The language of the training data that is associated with the document classifier. Language is specified by using the ISO 639-1 language code, such as
en
for English or
ja
for Japanese.
An array of enrichments to apply to the data that is used to train and test the document classifier. The output from the enrichments is used as features by the classifier to classify the document content both during training and at run time.
- Enrichments
A unique identifier of the enrichment.
An array of field names where the enrichment is applied.
An array of fields that are used to train the document classifier. The same set of fields must exist in the training data, the test data, and the documents where the resulting document classifier enrichment is applied at run time.
The name of the field from the training and test data that contains the classification labels.
Name of the CSV file with training data that is used to train the document classifier.
Name of the CSV file with data that is used to test the document classifier. If no test data is provided, a subset of the training data is used for testing purposes.
An object with details for creating federated document classifier models.
- FederatedClassification
Name of the field that contains the values from which multiple classifier models are defined. For example, you can specify a field that lists product lines to create a separate model per product line.
Status Code
Returns the updated document classifier details.
Bad request. A required parameter is null or invalid. Specific failure messages include:
- No request body (missing classifier configuration).
- Provided training_data or test_data does not have the field requested in answer_field.
- Provided training_data or test_data does not have the fields requested in enrichments.fields.
- Provided training_data or test_data does not have the field requested in federated_classification.field.
Missing project or classifier.
{ "name": "Customer feedback classifier", "classifier_id": "0e36fdd2-7fb0-812b-0000-017ed29d6587", "description": "User reviews and feedback", "created": "2022-03-09T17:51:51.592Z", "language": "en", "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "fields": [ "text", "body" ] } ], "answer_field": "label", "recognized_fields": [ "text", "body", "date", "client_age", "client_location", "label", "type_column", "client_segment", "claim_id", "claim_product", "claim_product_line" ], "training_data_file": "training.csv", "test_data_file": "test.csv", "federated_classification": { "field": "claim_product_line" } }
{ "name": "Customer feedback classifier", "classifier_id": "0e36fdd2-7fb0-812b-0000-017ed29d6587", "description": "User reviews and feedback", "created": "2022-03-09T17:51:51.592Z", "language": "en", "enrichments": [ { "enrichment_id": "701db916-fc83-57ab-0000-00000000001e", "fields": [ "text", "body" ] } ], "answer_field": "label", "recognized_fields": [ "text", "body", "date", "client_age", "client_location", "label", "type_column", "client_segment", "claim_id", "claim_product", "claim_product_line" ], "training_data_file": "training.csv", "test_data_file": "test.csv", "federated_classification": { "field": "claim_product_line" } }
Delete a document classifier
Deletes an existing document classifier from the specified project.
Deletes an existing document classifier from the specified project.
Deletes an existing document classifier from the specified project.
Deletes an existing document classifier from the specified project.
Deletes an existing document classifier from the specified project.
DELETE /v2/projects/{project_id}/document_classifiers/{classifier_id}
ServiceCall<Void> deleteDocumentClassifier(DeleteDocumentClassifierOptions deleteDocumentClassifierOptions)
deleteDocumentClassifier(params)
delete_document_classifier(
self,
project_id: str,
classifier_id: str,
**kwargs,
) -> DetailedResponse
DeleteDocumentClassifier(string projectId, string classifierId)
Request
Use the DeleteDocumentClassifierOptions.Builder
to create a DeleteDocumentClassifierOptions
object that contains the parameter values for the deleteDocumentClassifier
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the classifier.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
The deleteDocumentClassifier options.
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl -X DELETE {auth} "{url}/v2/projects/{project_id}/document_classifiers/{classifier_id}?version=2023-03-31"
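The equivalent Python SDK call is a one-liner once a client exists; a successful request returns an empty 204 response. The credentials and IDs below are placeholders.

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

discovery = DiscoveryV2(version='2023-03-31', authenticator=IAMAuthenticator('{apikey}'))
discovery.set_service_url('{url}')

# Remove the classifier enrichment from any collections first; otherwise
# the service rejects the deletion (see the 400 status description below).
discovery.delete_document_classifier(
    project_id='{project_id}',
    classifier_id='{classifier_id}',
)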
Response
Response type: object
Status Code
The document classifier was deleted successfully.
The document classifier cannot be deleted because the resulting classifier enrichment is applied to a collection. You must remove the enrichment from any collections that are using it before you attempt to delete the document classifier.
Missing project or classifier.
No Sample Response
List document classifier models
Get a list of the document classifier models in a project. Returns only the name and model ID of each document classifier model.
Get a list of the document classifier models in a project. Returns only the name and model ID of each document classifier model.
Get a list of the document classifier models in a project. Returns only the name and model ID of each document classifier model.
Get a list of the document classifier models in a project. Returns only the name and model ID of each document classifier model.
Get a list of the document classifier models in a project. Returns only the name and model ID of each document classifier model.
GET /v2/projects/{project_id}/document_classifiers/{classifier_id}/models
ServiceCall<DocumentClassifierModels> listDocumentClassifierModels(ListDocumentClassifierModelsOptions listDocumentClassifierModelsOptions)
listDocumentClassifierModels(params)
list_document_classifier_models(
self,
project_id: str,
classifier_id: str,
**kwargs,
) -> DetailedResponse
ListDocumentClassifierModels(string projectId, string classifierId)
Request
Use the ListDocumentClassifierModelsOptions.Builder
to create a ListDocumentClassifierModelsOptions
object that contains the parameter values for the listDocumentClassifierModels
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the classifier.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
The listDocumentClassifierModels options.
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/document_classifiers/{classifier_id}/models?version=2023-03-31"
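In the Python SDK, the same request looks like the following sketch (placeholder credentials and IDs); each entry in the models array carries at least the model_id and name shown in the sample response below.

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

discovery = DiscoveryV2(version='2023-03-31', authenticator=IAMAuthenticator('{apikey}'))
discovery.set_service_url('{url}')

response = discovery.list_document_classifier_models(
    project_id='{project_id}',
    classifier_id='{classifier_id}',
).get_result()

for model in response.get('models', []):
    print(model['model_id'], model['name'])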
Response
An object that contains a list of document classifier model definitions.
An array of document classifier model definitions.
An object that contains a list of document classifier model definitions.
An array of document classifier model definitions.
- models
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- microAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- macroAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- perClass
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
An object that contains a list of document classifier model definitions.
An array of document classifier model definitions.
- models
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- micro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- macro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- per_class
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
An object that contains a list of document classifier model definitions.
An array of document classifier model definitions.
- models
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- micro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- macro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- per_class
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
An object that contains a list of document classifier model definitions.
An array of document classifier model definitions.
- Models
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- Evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- MicroAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- MacroAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- PerClass
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
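To make the difference between the micro- and macro-averaged metrics above concrete, the following standalone Python sketch computes precision both ways from invented per-class counts. The numbers are illustrative only and are not part of any API response.

# Invented counts per class: tp = true positives, fp = false positives.
per_class = {
    'positive': {'tp': 90, 'fp': 10},
    'neutral':  {'tp': 10, 'fp': 5},
    'negative': {'tp': 5,  'fp': 5},
}

def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

# Macro-average: compute the metric per class, then average the per-class values.
macro = sum(precision(c['tp'], c['fp']) for c in per_class.values()) / len(per_class)

# Micro-average: pool the counts across all classes, then compute the metric once.
micro = precision(
    sum(c['tp'] for c in per_class.values()),
    sum(c['fp'] for c in per_class.values()),
)

print(f'macro={macro:.3f} micro={micro:.3f}')
# Macro-averaging treats every class equally, so the rare 'negative' class
# drags the score down; micro-averaging weights classes by document volume,
# so the common 'positive' class dominates.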
Status Code
Returns a list of document classifier models.
Document classifier or project not found.
{ "models": [ { "name": "Model 1.0", "model_id": "031f168e-0716-f6fe-0000-017f84d080e1" } ] }
{ "models": [ { "name": "Model 1.0", "model_id": "031f168e-0716-f6fe-0000-017f84d080e1" } ] }
Create a document classifier model
Create a document classifier model by training a model that uses the data and classifier settings defined in the specified document classifier.
Note: This method is supported on installed instances (IBM Cloud Pak for Data) or IBM Cloud-managed Premium or Enterprise plan instances.
Create a document classifier model by training a model that uses the data and classifier settings defined in the specified document classifier.
Note: This method is supported on installed instances (IBM Cloud Pak for Data) or IBM Cloud-managed Premium or Enterprise plan instances.
Create a document classifier model by training a model that uses the data and classifier settings defined in the specified document classifier.
Note: This method is supported on installed instances (IBM Cloud Pak for Data) or IBM Cloud-managed Premium or Enterprise plan instances.
Create a document classifier model by training a model that uses the data and classifier settings defined in the specified document classifier.
Note: This method is supported on installed instances (IBM Cloud Pak for Data) or IBM Cloud-managed Premium or Enterprise plan instances.
Create a document classifier model by training a model that uses the data and classifier settings defined in the specified document classifier.
Note: This method is supported on installed instances (IBM Cloud Pak for Data) or IBM Cloud-managed Premium or Enterprise plan instances.
POST /v2/projects/{project_id}/document_classifiers/{classifier_id}/models
ServiceCall<DocumentClassifierModel> createDocumentClassifierModel(CreateDocumentClassifierModelOptions createDocumentClassifierModelOptions)
createDocumentClassifierModel(params)
create_document_classifier_model(
self,
project_id: str,
classifier_id: str,
name: str,
*,
description: str = None,
learning_rate: float = None,
l1_regularization_strengths: List[float] = None,
l2_regularization_strengths: List[float] = None,
training_max_steps: int = None,
improvement_ratio: float = None,
**kwargs,
) -> DetailedResponse
CreateDocumentClassifierModel(string projectId, string classifierId, string name, string description = null, double? learningRate = null, List<double?> l1RegularizationStrengths = null, List<double?> l2RegularizationStrengths = null, long? trainingMaxSteps = null, double? improvementRatio = null)
Request
Use the CreateDocumentClassifierModelOptions.Builder
to create a CreateDocumentClassifierModelOptions
object that contains the parameter values for the createDocumentClassifierModel
method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the classifier.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
An object that contains the training configuration information for the document classifier to be trained.
The name of the document classifier model.
Possible values: 0 ≤ length ≤ 255
A description of the document classifier model.
A tuning parameter in an optimization algorithm that determines the step size at each iteration of the training process. It influences how much of any newly acquired information overrides the existing information, and therefore is said to represent the speed at which a machine learning model learns. The default value is
0.1
.
Possible values: 0 ≤ value ≤ 1
Default:
0.1
Avoids overfitting by shrinking the coefficient of less important features to zero, which removes some features altogether. You can specify many values for hyper-parameter optimization. The default value is
[0.000001]
.
Possible values: value ≥ 0
Default:
[0.000001]
A method you can apply to avoid overfitting your model on the training data. You can specify many values for hyper-parameter optimization. The default value is
[0.000001]
.
Possible values: value ≥ 0
Default:
[0.000001]
Maximum number of training steps to complete. This setting is useful if you need the training process to finish in a specific time frame to fit into an automated process. The default value is ten million.
Possible values: value ≥ 0
Default:
10000000
Stops the training run early if the improvement ratio is not met by the time the process reaches a certain point. The default value is
0.00001
.
Possible values: 0 ≤ value ≤ 1
Default:
0.00001
The createDocumentClassifierModel options.
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the document classifier model.
Possible values: 0 ≤ length ≤ 255
A description of the document classifier model.
A tuning parameter in an optimization algorithm that determines the step size at each iteration of the training process. It influences how much of any newly acquired information overrides the existing information, and therefore is said to represent the speed at which a machine learning model learns. The default value is
0.1
.
Possible values: 0 ≤ value ≤ 1
Default:
0.1
Avoids overfitting by shrinking the coefficient of less important features to zero, which removes some features altogether. You can specify many values for hyper-parameter optimization. The default value is
[0.000001]
.
Possible values: value ≥ 0
Default:
[1.0E-6]
A method you can apply to avoid overfitting your model on the training data. You can specify many values for hyper-parameter optimization. The default value is
[0.000001]
.
Possible values: value ≥ 0
Default:
[1.0E-6]
Maximum number of training steps to complete. This setting is useful if you need the training process to finish in a specific time frame to fit into an automated process. The default value is ten million.
Possible values: value ≥ 0
Default:
10000000
Stops the training run early if the improvement ratio is not met by the time the process reaches a certain point. The default value is
0.00001
.
Possible values: 0 ≤ value ≤ 1
Default:
0.000010
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the document classifier model.
Possible values: 0 ≤ length ≤ 255
A description of the document classifier model.
A tuning parameter in an optimization algorithm that determines the step size at each iteration of the training process. It influences how much of any newly acquired information overrides the existing information, and therefore is said to represent the speed at which a machine learning model learns. The default value is
0.1
.
Possible values: 0 ≤ value ≤ 1
Default:
0.1
Avoids overfitting by shrinking the coefficient of less important features to zero, which removes some features altogether. You can specify many values for hyper-parameter optimization. The default value is
[0.000001]
.
Possible values: value ≥ 0
Default:
[1.0E-6]
A method you can apply to avoid overfitting your model on the training data. You can specify many values for hyper-parameter optimization. The default value is
[0.000001]
.
Possible values: value ≥ 0
Default:
[1.0E-6]
Maximum number of training steps to complete. This setting is useful if you need the training process to finish in a specific time frame to fit into an automated process. The default value is ten million.
Possible values: value ≥ 0
Default:
10000000
Stops the training run early if the improvement ratio is not met by the time the process reaches a certain point. The default value is
0.00001
.
Possible values: 0 ≤ value ≤ 1
Default:
0.000010
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the document classifier model.
Possible values: 0 ≤ length ≤ 255
A description of the document classifier model.
A tuning parameter in an optimization algorithm that determines the step size at each iteration of the training process. It influences how much of any newly acquired information overrides the existing information, and therefore is said to represent the speed at which a machine learning model learns. The default value is
0.1
.
Possible values: 0 ≤ value ≤ 1
Default:
0.1
Avoids overfitting by shrinking the coefficient of less important features to zero, which removes some features altogether. You can specify many values for hyper-parameter optimization. The default value is
[0.000001]
.
Possible values: value ≥ 0
Default:
[1.0E-6]
A method you can apply to avoid overfitting your model on the training data. You can specify many values for hyper-parameter optimization. The default value is
[0.000001]
.
Possible values: value ≥ 0
Default:
[1.0E-6]
Maximum number of training steps to complete. This setting is useful if you need the training process to finish in a specific time frame to fit into an automated process. The default value is ten million.
Possible values: value ≥ 0
Default:
10000000
Stops the training run early if the improvement ratio is not met by the time the process reaches a certain point. The default value is
0.00001
.
Possible values: 0 ≤ value ≤ 1
Default:
0.000010
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The name of the document classifier model.
Possible values: 0 ≤ length ≤ 255
A description of the document classifier model.
A tuning parameter in an optimization algorithm that determines the step size at each iteration of the training process. It influences how much of any newly acquired information overrides the existing information, and therefore is said to represent the speed at which a machine learning model learns. The default value is
0.1
.
Possible values: 0 ≤ value ≤ 1
Default:
0.1
Avoids overfitting by shrinking the coefficient of less important features to zero, which removes some features altogether. You can specify many values for hyper-parameter optimization. The default value is
[0.000001]
.
Possible values: value ≥ 0
Default:
[1.0E-6]
A method you can apply to avoid overfitting your model on the training data. You can specify many values for hyper-parameter optimization. The default value is
[0.000001]
.
Possible values: value ≥ 0
Default:
[1.0E-6]
Maximum number of training steps to complete. This setting is useful if you need the training process to finish in a specific time frame to fit into an automated process. The default value is ten million.
Possible values: value ≥ 0
Default:
10000000
Stops the training run early if the improvement ratio is not met by the time the process reaches a certain point. The default value is
0.00001
.
Possible values: 0 ≤ value ≤ 1
Default:
0.000010
curl -X POST {auth} --header "Content-Type: application/json" --data "{ \"name\": \"Model 1.0\", \"description\": \"First model\", \"learning_rate\": 0.1, \"l1_regularization_strengths\": [ 0.000001 ], \"l2_regularization_strengths\": [ 0.000001 ], \"training_max_steps\": 10000000, \"improvement_ratio\": 0.00001 }" "{url}/v2/projects/{project_id}/document_classifiers/{classifier_id}/models?version=2023-03-31"
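The same training run can be started from the Python SDK; this sketch simply mirrors the curl example, passing the documented default hyperparameters explicitly. All credentials and IDs are placeholders.

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

discovery = DiscoveryV2(version='2023-03-31', authenticator=IAMAuthenticator('{apikey}'))
discovery.set_service_url('{url}')

model = discovery.create_document_classifier_model(
    project_id='{project_id}',
    classifier_id='{classifier_id}',
    name='Model 1.0',
    description='First model',
    learning_rate=0.1,                       # step size per training iteration
    l1_regularization_strengths=[0.000001],  # candidate L1 penalties to try
    l2_regularization_strengths=[0.000001],  # candidate L2 penalties to try
    training_max_steps=10000000,             # upper bound on training steps
    improvement_ratio=0.00001,               # early-stopping threshold
).get_result()

# Training is asynchronous: status starts as 'training' and later becomes
# 'available' or 'failed'.
print(model['model_id'], model['status'])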
Response
Information about a document classifier model.
The Universally Unique Identifier (UUID) of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
The Universally Unique Identifier (UUID) of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Information about a document classifier model.
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- microAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- macroAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- perClass
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Information about a document classifier model.
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- micro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- macro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- per_class
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Information about a document classifier model.
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- micro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- macro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- per_class
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Information about a document classifier model.
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- Evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- MicroAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- MacroAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- PerClass
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Status Code
Returns the document classifier details.
Bad request. A required parameter is null or invalid. Specific failure messages include:
- No request body (discovery_missing_model_config).
- Missing required request parameter or its value (classifier_missing_field_name).
Not found. Missing project or classifier.
{ "name": "Model 1.0", "model_id": "47477591-b520-6039-0000-017e9d20e634", "description": "First model", "created": "2022-01-27T20:01:25.952Z", "updated": "2022-01-27T20:01:26.094Z", "status": "training" }
{ "name": "Model 1.0", "model_id": "47477591-b520-6039-0000-017e9d20e634", "description": "First model", "created": "2022-01-27T20:01:25.952Z", "updated": "2022-01-27T20:01:26.094Z", "status": "training" }
Get a document classifier model
Get details about a specific document classifier model.
GET /v2/projects/{project_id}/document_classifiers/{classifier_id}/models/{model_id}
ServiceCall<DocumentClassifierModel> getDocumentClassifierModel(GetDocumentClassifierModelOptions getDocumentClassifierModelOptions)
getDocumentClassifierModel(params)
get_document_classifier_model(
self,
project_id: str,
classifier_id: str,
model_id: str,
**kwargs,
) -> DetailedResponse
GetDocumentClassifierModel(string projectId, string classifierId, string modelId)
Request
Use the GetDocumentClassifierModelOptions.Builder to create a GetDocumentClassifierModelOptions object that contains the parameter values for the getDocumentClassifierModel method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the classifier.
The Universally Unique Identifier (UUID) of the classifier model.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The getDocumentClassifierModel options.
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier model.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier model.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier model.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier model.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl {auth} "{url}/v2/projects/{project_id}/document_classifiers/{classifier_id}/models/{model_id}?version=2023-03-31"
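For example, the same request with the Python SDK. This is a minimal sketch: the {apikey}, {url}, and ID values are placeholders that you must replace with your own.

import json

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Authenticate and point the client at your service instance (placeholder values).
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

# Fetch the classifier model details; evaluation metrics are included
# after the status changes from "training" to "available".
model = discovery.get_document_classifier_model(
    project_id='{project_id}',
    classifier_id='{classifier_id}',
    model_id='{model_id}',
).get_result()

print(json.dumps(model, indent=2))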
Response
Information about a document classifier model.
The Universally Unique Identifier (UUID) of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
The Universally Unique Identifier (UUID) of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Information about a document classifier model.
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- microAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- macroAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- perClass
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Information about a document classifier model.
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- micro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- macro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- per_class
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Information about a document classifier model.
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- micro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- macro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- per_class
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Information about a document classifier model.
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- Evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- MicroAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- MacroAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- PerClass
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Status Code
Returns details about a specific document classifier model.
Project, document classifier, or document classifier model not found.
{ "name": "Model 1.0", "model_id": "47477591-b520-6039-0000-017e9d20e634", "description": "First model", "created": "2022-03-14T14:37:33.553Z", "updated": "2022-03-14T14:40:41.634Z", "training_data_file": "training.csv", "test_data_file": "test.csv", "status": "available", "evaluation": { "micro_average": { "precision": 0.3484848439693451, "recall": 0.7666666507720947, "f1": 0.4791666567325592 }, "macro_average": { "precision": 0.3279411792755127, "recall": 0.5714285969734192, "f1": 0.3811137080192566 }, "per_class": [ { "precision": 0.5, "recall": 1, "f1": 0.6666666865348816, "name": "expiration_date" }, { "precision": 0.25, "recall": 0.5, "f1": 0.3333333432674408, "name": "ingredient.allergy" }, { "precision": 0.29411765933036804, "recall": 1, "f1": 0.4545454680919647, "name": "amount.shortage" }, { "precision": 0.3529411852359772, "recall": 1, "f1": 0.52173912525177, "name": "package_container" }, { "precision": 1, "recall": 0.5, "f1": 0.6666666865348816, "name": "prank" }, { "precision": 0.4000000059604645, "recall": 1, "f1": 0.5714285969734192, "name": "package_container.leak" }, { "precision": 0.5, "recall": 1, "f1": 0.6666666865348816, "name": "change_of_properties" }, { "precision": 0, "recall": 0, "f1": 0, "name": "contamination_tampering" }, { "precision": 1, "recall": 1, "f1": 1, "name": "package_container.dirt" }, { "precision": 0.29411765933036804, "recall": 1, "f1": 0.4545454680919647, "name": "other" } ] }, "enrichment_id": "b9a35e7b-5073-1d0b-0000-017f89122c9d", "deployed_at": "2022-03-14T14:40:40.572Z" }
{ "name": "Model 1.0", "model_id": "47477591-b520-6039-0000-017e9d20e634", "description": "First model", "created": "2022-03-14T14:37:33.553Z", "updated": "2022-03-14T14:40:41.634Z", "training_data_file": "training.csv", "test_data_file": "test.csv", "status": "available", "evaluation": { "micro_average": { "precision": 0.3484848439693451, "recall": 0.7666666507720947, "f1": 0.4791666567325592 }, "macro_average": { "precision": 0.3279411792755127, "recall": 0.5714285969734192, "f1": 0.3811137080192566 }, "per_class": [ { "precision": 0.5, "recall": 1, "f1": 0.6666666865348816, "name": "expiration_date" }, { "precision": 0.25, "recall": 0.5, "f1": 0.3333333432674408, "name": "ingredient.allergy" }, { "precision": 0.29411765933036804, "recall": 1, "f1": 0.4545454680919647, "name": "amount.shortage" }, { "precision": 0.3529411852359772, "recall": 1, "f1": 0.52173912525177, "name": "package_container" }, { "precision": 1, "recall": 0.5, "f1": 0.6666666865348816, "name": "prank" }, { "precision": 0.4000000059604645, "recall": 1, "f1": 0.5714285969734192, "name": "package_container.leak" }, { "precision": 0.5, "recall": 1, "f1": 0.6666666865348816, "name": "change_of_properties" }, { "precision": 0, "recall": 0, "f1": 0, "name": "contamination_tampering" }, { "precision": 1, "recall": 1, "f1": 1, "name": "package_container.dirt" }, { "precision": 0.29411765933036804, "recall": 1, "f1": 0.4545454680919647, "name": "other" } ] }, "enrichment_id": "b9a35e7b-5073-1d0b-0000-017f89122c9d", "deployed_at": "2022-03-14T14:40:40.572Z" }
Update a document classifier model
Update the document classifier model name or description.
POST /v2/projects/{project_id}/document_classifiers/{classifier_id}/models/{model_id}
ServiceCall<DocumentClassifierModel> updateDocumentClassifierModel(UpdateDocumentClassifierModelOptions updateDocumentClassifierModelOptions)
updateDocumentClassifierModel(params)
update_document_classifier_model(
self,
project_id: str,
classifier_id: str,
model_id: str,
*,
name: str = None,
description: str = None,
**kwargs,
) -> DetailedResponse
UpdateDocumentClassifierModel(string projectId, string classifierId, string modelId, string name = null, string description = null)
Request
Use the UpdateDocumentClassifierModelOptions.Builder to create an UpdateDocumentClassifierModelOptions object that contains the parameter values for the updateDocumentClassifierModel method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the classifier.
The Universally Unique Identifier (UUID) of the classifier model.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
An object that lists a new name or description for a document classifier model.
A new name for the document classifier model.
Possible values: length ≤ 255
A new description for the document classifier model.
The updateDocumentClassifierModel options.
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier model.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
A new name for the document classifier model.
Possible values: length ≤ 255
A new description for the document classifier model.
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier model.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
A new name for the document classifier model.
Possible values: length ≤ 255
A new description for the document classifier model.
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier model.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
A new name for the document classifier model.
Possible values: length ≤ 255
A new description for the document classifier model.
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier model.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
A new name for the document classifier model.
Possible values: length ≤ 255
A new description for the document classifier model.
curl -X POST {auth} --header "Content-Type: application/json" --data "{ \"name\": \"Renamed model\", \"description\": \"Updated model\" }" "{url}/v2/projects/{project_id}/document_classifiers/{classifier_id}/models/{model_id}?version=2023-03-31"
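The equivalent Python SDK call, sketched with placeholder IDs and assuming discovery is an authenticated DiscoveryV2 client as in the earlier example:

# Rename the classifier model and update its description.
model = discovery.update_document_classifier_model(
    project_id='{project_id}',
    classifier_id='{classifier_id}',
    model_id='{model_id}',
    name='Renamed model',
    description='Updated model',
).get_result()

print(model['name'])  # Renamed model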
Response
Information about a document classifier model.
The Universally Unique Identifier (UUID) of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
The Universally Unique Identifier (UUID) of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Information about a document classifier model.
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- microAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- macroAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- perClass
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Information about a document classifier model.
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- micro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- macro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- per_class
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Information about a document classifier model.
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- micro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- macro_average
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- per_class
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Information about a document classifier model.
A unique identifier of the document classifier model.
A human-readable name of the document classifier model.
Possible values: length ≤ 255
A description of the document classifier model.
The date that the document classifier model was created.
The date that the document classifier model was last updated.
Name of the CSV file that contains the training data that is used to train the document classifier model.
Name of the CSV file that contains data that is used to test the document classifier model. If no test data is provided, a subset of the training data is used for testing purposes.
The status of the training run.
Possible values: [training, available, failed]
An object that contains information about a trained document classifier model.
- Evaluation
A micro-average aggregates the contributions of all classes to compute the average metric. Classes refers to the classification labels that are specified in the answer_field.
- MicroAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A macro-average computes the metric independently for each class and then takes the average. Class refers to the classification label that is specified in the answer_field.
- MacroAverage
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
An array of evaluation metrics, one set of metrics for each class, where class refers to the classification label that is specified in the answer_field.
- PerClass
Class name. Each class name is derived from a value in the answer_field.
A metric that measures how many of the overall documents are classified correctly.
Possible values: 0 ≤ value ≤ 1
A metric that measures how often documents that should be classified into certain classes are classified into those classes.
Possible values: 0 ≤ value ≤ 1
A metric that measures whether the optimal balance between precision and recall is reached. The F1 score can be interpreted as a weighted average of the precision and recall values. An F1 score reaches its best value at 1 and worst value at 0.
Possible values: 0 ≤ value ≤ 1
A unique identifier of the enrichment that is generated by this document classifier model.
The date that the document classifier model was deployed.
Status Code
Returns the updated document classifier model details.
No request body.
Project, document classifier, or document classifier model not found.
{ "name": "Renamed model", "model_id": "47477591-b520-6039-0000-017e9d20e634", "description": "Updated model", "created": "2022-03-14T14:37:33.553Z", "updated": "2022-03-14T20:59:58.685Z", "training_data_file": "training.csv", "test_data_file": "test.csv", "status": "available", "evaluation": { "micro_average": { "precision": 0.3484848439693451, "recall": 0.7666666507720947, "f1": 0.4791666567325592 }, "macro_average": { "precision": 0.3279411792755127, "recall": 0.5714285969734192, "f1": 0.3811137080192566 }, "per_class": [ { "precision": 0.5, "recall": 1, "f1": 0.6666666865348816, "name": "expiration_date" }, { "precision": 0.25, "recall": 0.5, "f1": 0.3333333432674408, "name": "ingredient.allergy" }, { "precision": 0.29411765933036804, "recall": 1, "f1": 0.4545454680919647, "name": "amount.shortage" }, { "precision": 0.3529411852359772, "recall": 1, "f1": 0.52173912525177, "name": "package_container" }, { "precision": 1, "recall": 0.5, "f1": 0.6666666865348816, "name": "prank" }, { "precision": 0.4000000059604645, "recall": 1, "f1": 0.5714285969734192, "name": "package_container.leak" }, { "precision": 0.5, "recall": 1, "f1": 0.6666666865348816, "name": "change_of_properties" }, { "precision": 0, "recall": 0, "f1": 0, "name": "contamination_tampering" }, { "precision": 1, "recall": 1, "f1": 1, "name": "package_container.dirt" }, { "precision": 0.29411765933036804, "recall": 1, "f1": 0.4545454680919647, "name": "other" } ] }, "enrichment_id": "b9a35e7b-5073-1d0b-0000-017f89122c9d", "deployed_at": "2022-03-14T14:40:40.572Z" }
{ "name": "Renamed model", "model_id": "47477591-b520-6039-0000-017e9d20e634", "description": "Updated model", "created": "2022-03-14T14:37:33.553Z", "updated": "2022-03-14T20:59:58.685Z", "training_data_file": "training.csv", "test_data_file": "test.csv", "status": "available", "evaluation": { "micro_average": { "precision": 0.3484848439693451, "recall": 0.7666666507720947, "f1": 0.4791666567325592 }, "macro_average": { "precision": 0.3279411792755127, "recall": 0.5714285969734192, "f1": 0.3811137080192566 }, "per_class": [ { "precision": 0.5, "recall": 1, "f1": 0.6666666865348816, "name": "expiration_date" }, { "precision": 0.25, "recall": 0.5, "f1": 0.3333333432674408, "name": "ingredient.allergy" }, { "precision": 0.29411765933036804, "recall": 1, "f1": 0.4545454680919647, "name": "amount.shortage" }, { "precision": 0.3529411852359772, "recall": 1, "f1": 0.52173912525177, "name": "package_container" }, { "precision": 1, "recall": 0.5, "f1": 0.6666666865348816, "name": "prank" }, { "precision": 0.4000000059604645, "recall": 1, "f1": 0.5714285969734192, "name": "package_container.leak" }, { "precision": 0.5, "recall": 1, "f1": 0.6666666865348816, "name": "change_of_properties" }, { "precision": 0, "recall": 0, "f1": 0, "name": "contamination_tampering" }, { "precision": 1, "recall": 1, "f1": 1, "name": "package_container.dirt" }, { "precision": 0.29411765933036804, "recall": 1, "f1": 0.4545454680919647, "name": "other" } ] }, "enrichment_id": "b9a35e7b-5073-1d0b-0000-017f89122c9d", "deployed_at": "2022-03-14T14:40:40.572Z" }
Delete a document classifier model
Deletes an existing document classifier model from the specified project.
DELETE /v2/projects/{project_id}/document_classifiers/{classifier_id}/models/{model_id}
ServiceCall<Void> deleteDocumentClassifierModel(DeleteDocumentClassifierModelOptions deleteDocumentClassifierModelOptions)
deleteDocumentClassifierModel(params)
delete_document_classifier_model(
self,
project_id: str,
classifier_id: str,
model_id: str,
**kwargs,
) -> DetailedResponse
DeleteDocumentClassifierModel(string projectId, string classifierId, string modelId)
Request
Use the DeleteDocumentClassifierModelOptions.Builder to create a DeleteDocumentClassifierModelOptions object that contains the parameter values for the deleteDocumentClassifierModel method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the classifier.
The Universally Unique Identifier (UUID) of the classifier model.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
The deleteDocumentClassifierModel options.
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier model.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier model.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier model.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the classifier model.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
curl -X DELETE {auth} "{url}/v2/projects/{project_id}/document_classifiers/{classifier_id}/models/{model_id}?version=2023-03-31"
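The equivalent Python SDK call, again assuming an authenticated discovery client and placeholder IDs; a successful deletion returns no response body:

# Delete the classifier model. Remove the classifier enrichment from any
# collections that use it first, or the request fails.
discovery.delete_document_classifier_model(
    project_id='{project_id}',
    classifier_id='{classifier_id}',
    model_id='{model_id}',
)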
Response
Response type: object
Status Code
The document classifier model was deleted successfully.
The document classifier model cannot be deleted because the resulting classifier enrichment is applied to a collection. You must remove the enrichment from any collections that are using it before you attempt to delete the model.
Project, document classifier, or document classifier model not found.
No Sample Response
Analyze a document
Process a document and return it for real-time use. Supports JSON files only.
The file is not stored in the collection, but is processed according to the collection's configuration settings. To get results, enrichments must be applied to a field in the collection that also exists in the file that you want to analyze. For example, to analyze text in a Quote field, you must apply enrichments to the Quote field in the collection configuration. Then, when you analyze the file, the text in the Quote field is analyzed and results are written to a field named enriched_Quote.
Submit a request against only one collection at a time. The documents that are already in the collection are not significant; it is the enrichments that are defined for the collection that matter. If you submit requests to several collections, several models are initiated at the same time, which can cause request failures.
Note: This method is supported with Enterprise plan deployments and installed deployments only.
POST /v2/projects/{project_id}/collections/{collection_id}/analyze
ServiceCall<AnalyzedDocument> analyzeDocument(AnalyzeDocumentOptions analyzeDocumentOptions)
analyzeDocument(params)
analyze_document(
self,
project_id: str,
collection_id: str,
*,
file: BinaryIO = None,
filename: str = None,
file_content_type: str = None,
metadata: str = None,
**kwargs,
) -> DetailedResponse
AnalyzeDocument(string projectId, string collectionId, System.IO.MemoryStream file = null, string filename = null, string fileContentType = null, string metadata = null)
Request
Use the AnalyzeDocumentOptions.Builder to create an AnalyzeDocumentOptions object that contains the parameter values for the analyzeDocument method.
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
The Universally Unique Identifier (UUID) of the collection.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is 2023-03-31.
Form Parameters
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the application/json content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example:
{ "filename": "favorites2.json", "file_type": "json" }
The analyzeDocument options.
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the application/json content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
The filename for file.
The content type of file. Values for this parameter can be obtained from the HttpMediaType class.
Allowable values: [
application/json
,application/msword
,application/vnd.openxmlformats-officedocument.wordprocessingml.document
,application/pdf
,text/html
,application/xhtml+xml
]
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example:
{ "filename": "favorites2.json", "file_type": "json" }.
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the
application/json
content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
The filename for file.
The content type of file.
Allowable values: [
application/json
,application/msword
,application/vnd.openxmlformats-officedocument.wordprocessingml.document
,application/pdf
,text/html
,application/xhtml+xml
]
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example:
{ "filename": "favorites2.json", "file_type": "json" }.
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the
application/json
content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
The filename for file.
The content type of file.
Allowable values: [
application/json
,application/msword
,application/vnd.openxmlformats-officedocument.wordprocessingml.document
,application/pdf
,text/html
,application/xhtml+xml
]
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example:
{ "filename": "favorites2.json", "file_type": "json" }.
parameters
The ID of the project. This information can be found on the Integrate and Deploy page in Discovery.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
The ID of the collection.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
/^[a-zA-Z0-9_-]*$/
Add a document: The content of the document to ingest. For the supported file types and maximum supported file size limits when adding a document, see the documentation.
Analyze a document: The content of the document to analyze but not ingest. Only the
application/json
content type is supported by the Analyze API. For maximum supported file size limits, see the product documentation.
The filename for file.
The content type of file.
Allowable values: [
application/json
,application/msword
,application/vnd.openxmlformats-officedocument.wordprocessingml.document
,application/pdf
,text/html
,application/xhtml+xml
]
Add information about the file that you want to include in the response.
The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected.
Example:
{ "filename": "favorites2.json", "file_type": "json" }.
curl -X POST {auth} --header "Content-Type: multipart/form-data" --form metadata="{\"filename\": \"favorites2.json\", \"file_type\": \"json\"}" --form "file=@favorites2.json;type=application/json" "{url}/v2/projects/{project_id}/collections/{collection_id}/analyze?version=2023-03-31"
Download example document favorites2.json
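For reference, the following is a minimal Python sketch of the same request using the analyze_document method documented in this section. The IAM authentication and the {apikey}, {url}, {project_id}, and {collection_id} placeholders are assumptions; substitute the values for your own instance.

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Assumed IAM credentials and service URL; replace the placeholders.
authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

# Analyze favorites2.json against the collection's enrichment configuration.
# The document is processed but not added to the collection.
with open('favorites2.json', 'rb') as favorites:
    result = discovery.analyze_document(
        project_id='{project_id}',
        collection_id='{collection_id}',
        file=favorites,
        filename='favorites2.json',
        file_content_type='application/json',
        metadata='{"filename": "favorites2.json", "file_type": "json"}',
    ).get_result()

# Root-level fields plus any enriched_* fields, for example enriched_Quote.
print(result)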
Response
An object that contains the converted document and any identified enrichments. Root-level fields from the original file are also returned.
Array of notices that are triggered when the files are processed.
Result of the document analysis.
- result
The remaining key-value pairs
An object that contains the converted document and any identified enrichments. Root-level fields from the original file are also returned.
Array of notices that are triggered when the files are processed.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include:
index_failed
,index_failed_too_many_requests
,index_failed_incompatible_field
,index_failed_cluster_unavailable
,ingestion_timeout
,ingestion_error
,bad_request
,internal_error
,missing_model
,unsupported_model
,smart_document_understanding_failed_incompatible_field
,smart_document_understanding_failed_internal_error
,smart_document_understanding_failed_warning
,smart_document_understanding_page_error
,smart_document_understanding_page_warning
. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [
warning
,error
]
Ingestion or training step in which the notice occurred.
The description of the notice.
Result of the document analysis.
- result
Metadata that was specified with the request.
An object that contains the converted document and any identified enrichments. Root-level fields from the original file are also returned.
Array of notices that are triggered when the files are processed.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include:
index_failed
,index_failed_too_many_requests
,index_failed_incompatible_field
,index_failed_cluster_unavailable
,ingestion_timeout
,ingestion_error
,bad_request
,internal_error
,missing_model
,unsupported_model
,smart_document_understanding_failed_incompatible_field
,smart_document_understanding_failed_internal_error
,smart_document_understanding_failed_warning
,smart_document_understanding_page_error
,smart_document_understanding_page_warning
. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [
warning
,error
]
Ingestion or training step in which the notice occurred.
The description of the notice.
Result of the document analysis.
- result
Metadata that was specified with the request.
An object that contains the converted document and any identified enrichments. Root-level fields from the original file are also returned.
Array of notices that are triggered when the files are processed.
- notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include:
index_failed
,index_failed_too_many_requests
,index_failed_incompatible_field
,index_failed_cluster_unavailable
,ingestion_timeout
,ingestion_error
,bad_request
,internal_error
,missing_model
,unsupported_model
,smart_document_understanding_failed_incompatible_field
,smart_document_understanding_failed_internal_error
,smart_document_understanding_failed_warning
,smart_document_understanding_page_error
,smart_document_understanding_page_warning
. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [
warning
,error
]
Ingestion or training step in which the notice occurred.
The description of the notice.
Result of the document analysis.
- result
Metadata that was specified with the request.
An object that contains the converted document and any identified enrichments. Root-level fields from the original file are also returned.
Array of notices that are triggered when the files are processed.
- Notices
Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include:
index_failed
,index_failed_too_many_requests
,index_failed_incompatible_field
,index_failed_cluster_unavailable
,ingestion_timeout
,ingestion_error
,bad_request
,internal_error
,missing_model
,unsupported_model
,smart_document_understanding_failed_incompatible_field
,smart_document_understanding_failed_internal_error
,smart_document_understanding_failed_warning
,smart_document_understanding_page_error
,smart_document_understanding_page_warning
. Note: This is not a complete list. Other values might be returned.
The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Unique identifier of the document.
Unique identifier of the collection.
Unique identifier of the query used for relevance training.
Severity level of the notice.
Possible values: [
warning
,error
]
Ingestion or training step in which the notice occurred.
The description of the notice.
Result of the document analysis.
- Result
Metadata that was specified with the request.
Status Code
The analyzed document.
Bad request.
Collection not supported for Analyze.
Project or collection not found.
Analyze timeout.
Document or metadata too large.
Unsupported media type.
Too many requests, try again later.
{ "result": { "enriched_Quote": [ { "keywords": [ { "text": "day", "mentions": [ { "text": "day", "location": { "begin": 10, "end": 13 } } ], "relevance": 0.673739 }, { "text": "stranger", "mentions": [ { "text": "stranger", "location": { "begin": 28, "end": 36 } } ], "relevance": 0.596757 }, { "text": "parents", "mentions": [ { "text": "parents", "location": { "begin": 52, "end": 59 } } ], "relevance": 0.568336 }, { "text": "mother", "mentions": [ { "text": "mother", "location": { "begin": 66, "end": 72 } } ], "relevance": 0.755562 }, { "text": "Mr. Collins", "mentions": [ { "text": "Mr. Collins", "location": { "begin": 118, "end": 129 } } ], "relevance": 0.945891 } ], "entities": [ { "text": "one", "type": "Number", "mentions": [ { "text": "one", "confidence": 0.8, "location": { "begin": 40, "end": 43 } } ], "model_name": "natural_language_understanding" }, { "text": "Mr. Collins", "type": "Person", "mentions": [ { "text": "Mr. Collins", "confidence": 0.89255315, "location": { "begin": 118, "end": 129 } } ], "model_name": "natural_language_understanding" } ] } ], "url": "https://www.gutenberg.org/files/1342/1342-h/1342-h.htm#link2HCH0020", "Subject": "Parental love", "Year": "1813/01/01", "Book": "Pride and Prejudice", "Author": "Jane Austen", "Quote": [ "From this day you must be a stranger to one of your parents. Your mother will never see you again if you do not marry Mr. Collins, and I will never see you again if you do." ], "metadata": { "filename": "favorites2.json", "file_type": "json" }, "Speaker": "Mr. Bennett" }, "notices": [] }
{ "result": { "enriched_Quote": [ { "keywords": [ { "text": "day", "mentions": [ { "text": "day", "location": { "begin": 10, "end": 13 } } ], "relevance": 0.673739 }, { "text": "stranger", "mentions": [ { "text": "stranger", "location": { "begin": 28, "end": 36 } } ], "relevance": 0.596757 }, { "text": "parents", "mentions": [ { "text": "parents", "location": { "begin": 52, "end": 59 } } ], "relevance": 0.568336 }, { "text": "mother", "mentions": [ { "text": "mother", "location": { "begin": 66, "end": 72 } } ], "relevance": 0.755562 }, { "text": "Mr. Collins", "mentions": [ { "text": "Mr. Collins", "location": { "begin": 118, "end": 129 } } ], "relevance": 0.945891 } ], "entities": [ { "text": "one", "type": "Number", "mentions": [ { "text": "one", "confidence": 0.8, "location": { "begin": 40, "end": 43 } } ], "model_name": "natural_language_understanding" }, { "text": "Mr. Collins", "type": "Person", "mentions": [ { "text": "Mr. Collins", "confidence": 0.89255315, "location": { "begin": 118, "end": 129 } } ], "model_name": "natural_language_understanding" } ] } ], "url": "https://www.gutenberg.org/files/1342/1342-h/1342-h.htm#link2HCH0020", "Subject": "Parental love", "Year": "1813/01/01", "Book": "Pride and Prejudice", "Author": "Jane Austen", "Quote": [ "From this day you must be a stranger to one of your parents. Your mother will never see you again if you do not marry Mr. Collins, and I will never see you again if you do." ], "metadata": { "filename": "favorites2.json", "file_type": "json" }, "Speaker": "Mr. Bennett" }, "notices": [] }
Delete labeled data
Deletes all data associated with a specified customer ID. The method has no effect if no data is associated with the customer ID.
You associate a customer ID with data by passing the X-Watson-Metadata header with a request that passes data. For more information about personal data and customer IDs, see Information security.
Note: This method is only supported on IBM Cloud instances of Discovery.
Deletes all data associated with a specified customer ID. The method has no effect if no data is associated with the customer ID.
You associate a customer ID with data by passing the X-Watson-Metadata header with a request that passes data. For more information about personal data and customer IDs, see Information security.
Note: This method is only supported on IBM Cloud instances of Discovery.
Deletes all data associated with a specified customer ID. The method has no effect if no data is associated with the customer ID.
You associate a customer ID with data by passing the X-Watson-Metadata header with a request that passes data. For more information about personal data and customer IDs, see Information security.
Note: This method is only supported on IBM Cloud instances of Discovery.
Deletes all data associated with a specified customer ID. The method has no effect if no data is associated with the customer ID.
You associate a customer ID with data by passing the X-Watson-Metadata header with a request that passes data. For more information about personal data and customer IDs, see Information security.
Note: This method is only supported on IBM Cloud instances of Discovery.
Deletes all data associated with a specified customer ID. The method has no effect if no data is associated with the customer ID.
You associate a customer ID with data by passing the X-Watson-Metadata header with a request that passes data. For more information about personal data and customer IDs, see Information security.
Note: This method is only supported on IBM Cloud instances of Discovery.
DELETE /v2/user_data
ServiceCall<Void> deleteUserData(DeleteUserDataOptions deleteUserDataOptions)
deleteUserData(params)
delete_user_data(
self,
customer_id: str,
**kwargs,
) -> DetailedResponse
DeleteUserData(string customerId)
Request
Use the DeleteUserDataOptions.Builder
to create a DeleteUserDataOptions
object that contains the parameter values for the deleteUserData
method.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
The customer ID for which all data is to be deleted.
The deleteUserData options.
The customer ID for which all data is to be deleted.
parameters
The customer ID for which all data is to be deleted.
parameters
The customer ID for which all data is to be deleted.
parameters
The customer ID for which all data is to be deleted.
curl -X DELETE {auth} "{url}/v2/user_data?customer_id={id}&version=2023-03-31"
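A minimal Python sketch of the same request using the delete_user_data method documented above; the {apikey}, {url}, and {id} placeholders are assumptions.

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
discovery = DiscoveryV2(version='2023-03-31', authenticator=authenticator)
discovery.set_service_url('{url}')

# Deletes all data associated with the customer ID; a no-op if no data
# is associated with that ID.
discovery.delete_user_data(customer_id='{id}')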
List curations
Lists the currently configured curation queries and the associated curated responses. The curations API methods are beta functionality. Beta features are not supported by the SDKs.
GET /v2/projects/{project_id}/curations
Request
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
curl {auth} "{url}/v2/projects/{project_id}/curations?version=2023-03-31"
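Because the curations methods are beta and not supported by the SDKs, a plain HTTP call is the practical option. The following sketch uses the Python requests library with IBM Cloud basic authentication (user apikey, password set to the API key); the {url}, {project_id}, and {apikey} placeholders are assumptions.

import requests

response = requests.get(
    '{url}/v2/projects/{project_id}/curations',
    params={'version': '2023-03-31'},
    auth=('apikey', '{apikey}'),
)
response.raise_for_status()

# Each curation pairs a natural language query with its curated results.
for curation in response.json()['curations']:
    print(curation['curation_id'], curation['natural_language_query'])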
Response
Array of queries with curated responses for the specified project.
The project ID of the project that contains these curations.
Array of curated queries and responses.
Status Code
List of curations associated with the specified project.
Specified project not found.
{ "curations": [ { "curation_id": "a3dd3f517283be53c7a05013213450d960635334", "natural_language_query": "What types of data sources are supported", "curated_results": [ { "collection_id": "47477591-b520-6039-0000-017ea213e837", "document_id": "web_crawl_b43bc2d2-9445-51b1-ab1d-57459c7446ce" } ] }, { "curation_id": "c1175536f509405bc68a9f76235fa7bbb6f9af2f", "natural_language_query": "What is a project", "curated_results": [ { "collection_id": "47477591-b520-6039-0000-017ea213e837", "document_id": "web_crawl_123a2a56-8c26-5acb-9544-c4702ac899a4", "snippet": "A project is a convenient way to collect and manage the resources in your application. You can assign a project type and connect your data to the project by creating a collection." } ] } ] }
Create curation
Adds a new curated query and specifies the documents to return as its curated results. A project can contain up to 5,000 curated queries and 50,000 curated documents in total.
Note: The curations API methods are beta functionality. Beta features are not supported by the SDKs.
POST /v2/projects/{project_id}/curations
Request
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
Natural language query to curate and array of results to return when the query is specified.
The curated natural language query.
Array of curated results.
curl -X POST {auth} --header "Content-Type: application/json" --data "{ \"natural_language_query\": \"What is a project\", \"curated_results\": [{ \"document_id\": \"web_crawl_123a2a56-8c26-5acb-9544-c4702ac899a4\", \"collection_id\": \"47477591-b520-6039-0000-017ea213e837\", \"snippet\": \"A project is a convenient way to collect and manage the resources in your application. You can assign a project type and connect your data to the project by creating a collection.\" }] }" "{url}/v2/projects/{project_id}/curations?version=2023-03-31"
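The same request as a Python sketch with the requests library; the collection and document IDs come from the example above, and the remaining placeholders are assumptions.

import requests

# One curated query with a single curated result; the snippet is optional.
curation = {
    'natural_language_query': 'What is a project',
    'curated_results': [{
        'collection_id': '47477591-b520-6039-0000-017ea213e837',
        'document_id': 'web_crawl_123a2a56-8c26-5acb-9544-c4702ac899a4',
    }],
}

response = requests.post(
    '{url}/v2/projects/{project_id}/curations',
    params={'version': '2023-03-31'},
    json=curation,  # requests sets the application/json content type
    auth=('apikey', '{apikey}'),
)
print(response.json()['curation_id'])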
Response
Curated query and responses.
The curation ID of this curation.
The curated natural language query.
Array of curated results.
Status Code
Curation that has been created.
Specified natural language query already exists.
Specified project, collection, or document not found.
{ "curation_id": "c1175536f509405bc68a9f76235fa7bbb6f9af2f", "natural_language_query": "What is a project", "curated_results": [ { "collection_id": "47477591-b520-6039-0000-017ea213e837", "document_id": "web_crawl_123a2a56-8c26-5acb-9544-c4702ac899a4", "snippet": "A project is a convenient way to collect and manage the resources in your application. You can assign a project type and connect your data to the project by creating a collection." } ] }
Get curation
Gets details about the specified curation. The curations API methods are beta functionality. Beta features are not supported by the SDKs.
GET /v2/projects/{project_id}/curations/{curation_id}
Request
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
The ID of the curation.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
curl {auth} "{url}/v2/projects/3d8e91bc-a277-41b5-aa1e-c39327bc15bf/curations/a3dd3f517283be53c7a05013213450d960635334?version=2023-03-31"
Response
Object that contains an array of curated results.
Array of curated results.
Status Code
Object that contains an array of curated results for the specified curation ID.
Specified project or curation ID not found.
{ "curated_results": [ { "collection_id": "47477591-b520-6039-0000-017ea213e837", "document_id": "web_crawl_b43bc2d2-9445-51b1-ab1d-57459c7446ce" } ] }
Delete curation
Deletes the specified curation. The curations API methods are beta functionality. Beta features are not supported by the SDKs.
DELETE /v2/projects/{project_id}/curations/{curation_id}
Request
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
The ID of the curation.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
curl -X DELETE {auth} "{url}/v2/projects/3d8e91bc-a277-41b5-aa1e-c39327bc15bf/curations/c1175536f509405bc68a9f76235fa7bbb6f9af2f?version=2023-03-31"
Update curation results
Updates the curated result documents for an existing curated query. For example, you can enhance a curated result by adding a text snippet to it. The curations API methods are beta functionality. Beta features are not supported by the SDKs.
POST /v2/projects/{project_id}/curations/{curation_id}/curated_results
Request
Path Parameters
The Universally Unique Identifier (UUID) of the project. This information can be found on the Integrate and Deploy page in Discovery.
The ID of the curation.
Possible values: 1 ≤ length ≤ 255, Value must match regular expression
^[a-zA-Z0-9_-]*$
Query Parameters
Release date of the version of the API you want to use. Specify dates in YYYY-MM-DD format. The current version is
2023-03-31
.
Result to add to the specified curated query.
The document ID of the curated result.
The collection ID of the curated result.
Text to return in the
passage_text
field when this curated document is returned for the specified natural language query. If passages.per_document is
true
, the text snippet that you specify is returned as the top passage instead of the original passage that is chosen by search. Only one text snippet can be specified per document. If passages.max_per_document is greater than
1
, the snippet is returned first, followed by the passages that are chosen by search.
Possible values: 0 ≤ length ≤ 2000
curl -X POST {auth} --header "Content-Type: application/json" --data "{ \"document_id\": \"web_crawl_b43bc2d2-9445-51b1-ab1d-57459c7446ce\", \"collection_id\": \"47477591-b520-6039-0000-017ea213e837\", \"snippet\": \"Watson Discovery can get data from many popular third-party data repositories. See the \\\"Supported data sources\\\" table for more details.\" }" "{url}/v2/projects/3d8e91bc-a277-41b5-aa1e-c39327bc15bf/curations/a3dd3f517283be53c7a05013213450d960635334/curated_results?version=2023-03-31"
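Building the body in Python avoids the shell escaping that the nested quotation marks in the snippet require. A sketch with the requests library, with placeholder IDs and credentials as assumptions:

import requests

curated_result = {
    'document_id': 'web_crawl_b43bc2d2-9445-51b1-ab1d-57459c7446ce',
    'collection_id': '47477591-b520-6039-0000-017ea213e837',
    # Quotes inside the snippet need no special escaping here.
    'snippet': ('Watson Discovery can get data from many popular third-party '
                'data repositories. See the "Supported data sources" table '
                'for more details.'),
}

response = requests.post(
    '{url}/v2/projects/{project_id}/curations/{curation_id}/curated_results',
    params={'version': '2023-03-31'},
    json=curated_result,
    auth=('apikey', '{apikey}'),
)
print(response.json())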
Response
Result information for a curated query.
The document ID of the curated result.
The collection ID of the curated result.
Text to return in the
passage_text
field when this curated document is returned for the specified natural language query. If passages.per_document is
true
, the text snippet that you specify is returned as the top passage instead of the original passage that is chosen by search. Only one text snippet can be specified per document. If passages.max_per_document is greater than
1
, the snippet is returned first, followed by the passages that are chosen by search.
Possible values: 0 ≤ length ≤ 2000
Status Code
Result has been added to the curation.
Specified document ID already exists in this curation.
Specified project, collection, or document not found.
{ "collection_id": "47477591-b520-6039-0000-017ea213e837", "document_id": "web_crawl_b43bc2d2-9445-51b1-ab1d-57459c7446ce", "snippet": "Watson Discovery can get data from many popular third-party data repositories. See the \"Supported data sources\" table for more details." }