Analyzing documents based on their structure
Create a model that understands the content of a document based on the document's format and structure.
First, decide whether you want to use a pretrained model, define your own model, or extract text only.
- Pretrained model
-
Applies a noncustomizable model that extracts text and identifies tables, lists, and sections.
Instead of training the model yourself, you can apply an existing model that is trained to identify tables, lists, and sections in various types of documents.
If capturing information from tables is critical to your use case, consider using a pretrained model.
For more information, see Apply a pretrained SDU model.
Superscripts and subscripts are identified only when the collection uses the pretrained model. They are not identified in collections that use a custom model.
- User-trained model
-
Opens the Smart Document Understanding tool, which you can use to label certain types of text so that they are stored in fields other than the text field. When you label a section of a document as a custom field, you can later apply enrichments to the field or split your documents on each occurrence of the field. You can also search or filter by the field, or omit the field from the index.
For more information, see Define a user-trained SDU model.
- Text extraction only
-
Indexes any text that is recognized in the source documents in the text field. This option is used by default.
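
The choice among the three options can be summarized as a small decision helper. The following Python sketch is illustrative only: the `SduOption` enum, the `suggest_sdu_options` function, and its parameters are hypothetical names that are not part of any Discovery API. It encodes only the guidance given in the descriptions above.

```python
from enum import Enum


class SduOption(Enum):
    """The three options described above (hypothetical names, not an API)."""
    PRETRAINED = "Pretrained model"
    USER_TRAINED = "User-trained model"
    TEXT_ONLY = "Text extraction only"


def suggest_sdu_options(needs_tables=False,
                        needs_super_subscripts=False,
                        needs_custom_fields=False):
    """Return the options worth considering for the stated needs.

    Assumption: each need maps to the option whose description mentions it.
    When no need is stated, the default (text extraction only) is returned.
    """
    suggestions = []
    # Tables, and superscripts and subscripts, are tied to the pretrained model.
    if needs_tables or needs_super_subscripts:
        suggestions.append(SduOption.PRETRAINED)
    # Custom fields are defined with the Smart Document Understanding tool.
    if needs_custom_fields:
        suggestions.append(SduOption.USER_TRAINED)
    # With no special needs, the default option applies.
    if not suggestions:
        suggestions.append(SduOption.TEXT_ONLY)
    return suggestions


if __name__ == "__main__":
    for option in suggest_sdu_options(needs_tables=True):
        print(option.value)
```

If the helper returns more than one option, the needs conflict and you must decide which one matters more for your use case.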