Daily index limit considerations for IBM Cloud Logs Priority insights
The Priority insights feature in IBM® Cloud Logs parses log records into typed fields and stores them in an index to provide reduced latency and very fast queries. However, Priority insights limits the number of fields that can be stored in the daily index of an IBM Cloud Logs service instance.
When you reach the index field limit, new fields in log records are no longer indexed. When you use filters or queries in Priority insights on the Logs page on fields that are not indexed, log records with non-indexed fields are not returned. In this case, you might have the impression that log records are missing. In fact, the log records are there, but they are not found by searches in Priority insights.
This topic explains:
- How log records are parsed into fields.
- What is a field-based search.
- What is the daily index.
- What happens when the daily index field limit is reached.
- How to determine whether the daily index field limit is reached.
- How to determine whether a field is in the daily index.
- What you can do to stay below the daily index limit.
The Store and search feature in IBM Cloud Logs works differently than Priority insights. Store and search has no indexing limits. If you are missing log records in Priority insights queries, use the same query in All Logs on the Logs page or submit it on the Archive query page. The Store and search feature requires that you have connected an IBM Cloud Object Storage (COS) bucket to your IBM Cloud Logs service instance.
Parsing log records into fields
Log records sent to IBM Cloud Logs can be text in many different formats: JSON, logfmt, Extended Log File Format, Syslog, Common Log Format, HAProxy log formats, and so on, or any text that does not follow a known format.
If a log record is in JSON format, its JSON members are parsed into named fields together with their contents. For example, `{ "key": "value" }` is parsed into a field named `key` with content `value`.
If a log record is not in JSON format, or is mixed with non-JSON data such as a timestamp at the beginning of the log record, parsing rules can be used in IBM Cloud Logs to transform the log record into JSON format that can be parsed. Parsing rules can also be used to extract portions of the log record into named fields.
If a log record is in proper JSON format, the log record is parsed into named fields with their contents and the fields are indexed. Parsing rules can be applied to fix improper JSON format or to extract log record content into named fields. After the parsing rules are applied, the log record is indexed.
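For illustration, a parsing rule could turn a plain-text log record into JSON so that its parts become named, indexable fields. The log line and the extracted field names below are made up; the actual output depends on the parsing rules that you define:

```text
# Hypothetical log record as received (plain text, not parseable into fields as-is):
192.0.2.10 - - [10/Oct/2025:13:55:36 +0000] "GET /orders HTTP/1.1" 200 2326

# Possible result after a parsing rule extracts portions into named fields (illustrative only):
{ "client_ip": "192.0.2.10", "method": "GET", "path": "/orders", "status": 200, "bytes": 2326 }
```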
When a log record is stored in Priority insights, it is also stored in Store and search if an IBM Cloud Logs COS data bucket is connected. That is, the log record is stored in two places in parallel.
What happens when Priority insights processes log records?
Priority insights keeps track of all known fields in its index. When a log record is processed, Priority insights checks whether the field names in the log record are already known. For an unknown field, a field mapping is added to the index; the mapping records the field type and other instructions about how to index the field.
After unknown fields have been added to the index, field values from the log record are indexed for fast field-based and full-text search. In addition, the full log record is stored.
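Conceptually, you can think of the index as keeping one mapping entry per known field name with its detected type. The following is a simplified illustration only, not the actual internal mapping format used by IBM Cloud Logs:

```text
# Simplified, illustrative view of field mappings in a daily index:
#   field name -> detected type (plus further indexing instructions)
status  -> number
path    -> text
message -> text
```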
Label and metadata fields
Log records in IBM Cloud Logs have label and metadata fields:
- `Application` or `Subsystem` are log record label fields. In Lucene queries, these fields need to be prefixed with `coralogix.`. In DataPrime queries, these fields need to be prefixed with `$l.`.
- `Timestamp`, `Severity`, and `priorityclass` are log record metadata fields. In Lucene queries, these fields need to be prefixed with `coralogix.`. In DataPrime queries, these fields need to be prefixed with `$m.`.
Label and metadata fields are received by IBM Cloud Logs together with the log record data or are populated by IBM Cloud Logs. All label and metadata fields are known fields in the Priority insights index. IBM Cloud Logs users cannot add label and metadata fields to the index or remove them.
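For example, restricting a query to the `Application` label might look like the following sketches. The application name `payment-service` is a placeholder, and the exact field paths (such as `metadata.applicationName` or `applicationname`) are assumptions; check the field names that your instance actually shows:

```text
# Lucene: label fields are prefixed with "coralogix." (field path is illustrative)
coralogix.metadata.applicationName:"payment-service"

# DataPrime: label fields use the $l. accessor, metadata fields use the $m. accessor
source logs | filter $l.applicationname == 'payment-service'
```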
Full-text and field-based search
If you don't restrict a query in Priority insights to a specific field, the search is made on all indexed log record fields. This is also known as a full-text query:
- Lucene queries: Don't prefix a query term with `<field>:` when searching on all indexed log record fields. This includes log record label and metadata fields. For example: `"application unavailable"`
- DataPrime queries: Use a query similar to `source logs | filter $d ~~ '<text>'` when searching on all indexed log record data fields. Searching log record data, label, and metadata fields at the same time requires a more complex query; see the sketch after this list.
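A sketch of such a combined query, assuming the label accessor names `applicationname` and `subsystemname` and that the `~` text-match operator is used on the label fields (adjust these to your own fields), might look like this:

```text
source logs
  | filter $d ~~ 'unavailable'
      || $l.applicationname ~ 'unavailable'
      || $l.subsystemname ~ 'unavailable'
```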
You can use field-based search and filters in Priority insights to search only specified indexed log record fields:
- Filters: On the Filter pane, you can select field values from a list of detected values. Only log records that match the selected field values are returned. By default, the `Application`, `Subsystem`, and `Severity` fields are available for filtering. Other fields detected in log records can be added to define filters.
- Queries: Lucene and DataPrime queries can be used to search only specified indexed fields.
  - Lucene: Use the `<field>:<query term>` syntax to apply the specified query term only to the specified field. For example: `message:"application unavailable"`
  - DataPrime: Use the field accessors `$d.<data field>`, `$l.<label field>`, or `$m.<metadata field>` to query the specified field. For example: `filter $d.message ~ 'application unavailable'`
Use field-based searches whenever possible to optimize your searches and reduce overall query time.
Daily indices
Each IBM Cloud Logs service instance has a separate set of Priority insights indices. The first index in the set is created when the service instance is created. Once a day, at 00:00 UTC, a new index is created and added to the set. At the same time, the oldest index is removed if it has reached its expiration date.
A new daily index only contains the known label and metadata fields. These already count towards the daily index limit. Every day, all other fields in the index are learned anew from the processed log records. As a result, fields in daily indices can differ if the processed log records differ.
Daily index limits
Priority insights limits the number of fields that can be stored in the daily index of an IBM Cloud Logs service instance.
Once the daily index field limit is reached, the following will happen when Priority insights processes log records:
- Unknown fields are no longer added to the daily index and their field values are not indexed. These fields and their values are not lost because Priority insights always stores the full log record. However, fields that are not indexed cannot be searched, either with full-text or field-based searches.
- Field values of fields that are already known in the daily index will be indexed as usual.
For example:
- Priority insights processes a new log record with two fields: `{ "known": "indexed", "unknown": "stored" }`.
- Field `known` already has a field mapping in the daily index. Since it is known, its value `indexed` is indexed.
- Field `unknown` has no field mapping yet in the daily index. It is unknown. Priority insights cannot add a field mapping because the daily index limit is reached. Its value `stored` is not indexed.
- Priority insights stores the full log record.
- The following queries will return the log record:
  - Lucene query `indexed`.
  - Lucene query `known:indexed`.
  - DataPrime query `source logs | filter $d ~~ 'indexed'`.
  - DataPrime query `source logs | filter $d.known ~ 'indexed'`.
- The following queries will not return the log record because field `unknown` has no field mapping in the daily index and value `stored` is not indexed for field `unknown`:
  - Lucene query `stored`.
  - Lucene query `unknown:stored`.
  - DataPrime query `source logs | filter $d ~~ 'stored'`.
  - DataPrime query `source logs | filter $d.unknown ~ 'stored'`.
A new daily index automatically contains the known label and metadata fields. These are counted towards the daily index limit.
Since new fields are added to the index in the order in which they appear, and no more fields are added after the limit is reached, the set of fields in the index and the indexed field values can differ from day to day. For this reason, a search for a field might work one day but not the next.
Check daily index usage
You can check the usage of the current daily index from the IBM Cloud Logs UI by clicking Usage > Mapping stats. Check the Used keys today statistic. If all keys are used, the daily index field limit is reached.
You can only check usage for the current day, not for previous days. Typically, this statistic is meaningful unless you check immediately after the daily index is created or your log records contain many new fields shortly before the current daily index closes.
Determine if a field is in the daily index
In the IBM Cloud Logs UI, click Explore Logs > Logs. Then select Priority insights and the Logs tab. Open Settings on the result list header and select Show mapping errors under Annotations. With that annotation option, all fields of the selected log record in the result list that have no mapping in the daily Priority insights index are marked with a red exclamation mark indicator. If you hover the mouse pointer over the indicator, IBM Cloud Logs displays a message.
There are two main reasons why a field is not indexed:
- You reached the daily index field limit of your IBM Cloud Logs service instance.
- The field is only contained in log records that have a mapping exception.
Staying below the daily index field limit
There are different strategies to stay below the daily index field limit:
- Modify your log structure.
- Use TCO policies to only store high priority log records in Priority insights.
- Use Stringify JSON field parsing rules to turn complex JSON object values into escaped JSON.
- Use Remove Fields parsing rules to remove fields from log records.
- Use TCO policies or Block parsing rules to prevent log records from being stored.
Modifying your log structure
If you are sending logs from an application that you control, you might be able to modify your log structure. Try to optimize your log structure to contain only fields that are relevant to your searches and investigations. All other information can be sent in a text field if indexing is not relevant.
You can always use free text searches to find full or partial matches within a text field.
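As an illustrative sketch with hypothetical field names, an application could keep only the fields that are actually used in searches as JSON members and move the rest into a single text field:

```text
# Before: every member becomes an indexed field, including rarely searched ones
{ "status": 500, "path": "/orders", "db_shard": "s3", "cache_hits": 12, "retry_count": 2 }

# After: only search-relevant fields stay as JSON members; the rest is carried
# in one text field and is still findable with full-text search
{ "status": 500, "path": "/orders", "details": "db_shard=s3 cache_hits=12 retry_count=2" }
```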
TCO policies
TCO policies in IBM Cloud Logs can be used to classify received log records into different priority classes. Only log records with high priority are processed and stored by Priority insights. Only log records that are searched frequently should be sent to Priority insights.
Reducing the amount of logs sent to Priority insights will typically also reduce the number of indexed fields.
TCO policies can be applied by `Application Name`, `Subsystem Name`, and `Severity` of the logs. For example, you might want to send `info` or `debug` logs to the Analyze and alert tier or even the Store and search tier, keeping only the most critical logs in Priority insights for quick searches.
Stringify JSON field
Use a Stringify JSON field parsing rule to rename a field with a complex JSON object value and turn the value into escaped JSON. Priority insights only parses values in JSON format into named fields with content, but not escaped JSON. With the parsing rule, the transformed field is no longer an object field but a text field. Priority insights no longer parses the former complex JSON object value into nested fields. Content of the transformed field no longer works with filters or field-based queries, but can still be searched with full-text queries.
This approach is useful for complex log records in JSON format where most of the nested fields will rarely be used in filters or field searches.
For example, IBM Cloud activity tracking events have `requestData` and `responseData` fields that have JSON object values. If you need audit events in Priority insights instead of Analyze and alert or Store and search, you can define Stringify JSON field parsing rules to change the values of the `requestData` and `responseData` fields into escaped JSON.
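For illustration, with made-up event content and a hypothetical destination field name, a Stringify JSON field parsing rule could change the nested `requestData` object into a single text field that contains escaped JSON:

```text
# Before: each nested member of requestData becomes its own indexed field
{ "action": "read-object", "requestData": { "bucket": "my-bucket", "key": "report.csv" } }

# After: the rule writes the value as escaped JSON into a text field
# (the destination field name "requestText" is a hypothetical choice)
{ "action": "read-object", "requestText": "{\"bucket\":\"my-bucket\",\"key\":\"report.csv\"}" }
```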
Remove fields
Use a Remove Fields parsing rule to permanently delete unnecessary fields from log records. Removed fields are not stored in Priority insights or Store and search and cannot be used in alert conditions. There is no way to recover removed fields.
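As a sketch with hypothetical field names, a Remove Fields parsing rule that removes a `debug_dump` field would change a log record as follows:

```text
# Before the Remove Fields rule:
{ "message": "order created", "order_id": "A-1021", "debug_dump": { "threads": 42, "heap_mb": 512 } }

# After the rule: the field is permanently removed and is neither stored nor indexed
{ "message": "order created", "order_id": "A-1021" }
```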
Block log records
TCO policies or Block parsing rules can be used to permanently prevent log records from being processed and stored by Priority insights. There is no way to recover blocked logs.
Reducing the amount of logs in Priority insights will typically also reduce the number of indexed fields.
By default, blocked logs are not processed or stored by Analyze and alert or Store and search either. Block parsing rules offer an option to partially block log records and still process and store them in Store and search.