Configuring buckets for long term storage and search
IBM Cloud Logs uses IBM Cloud Object Storage buckets to store data and metrics for long term storage and search.
About buckets
IBM Cloud Object Storage is a highly available, durable, and secure platform for storing unstructured data. The files that are uploaded into IBM Cloud Object Storage are called objects. Objects can be anywhere from a few bytes up to 10 TB. They are organized into buckets, which serve as containers for objects and can be configured independently from one another in terms of locations, resiliency, billing rates, security, and object lifecycle. For more information, see What is IBM Cloud Object Storage?.
To manage buckets, your user must be granted permissions to work with buckets on the IBM Cloud Object Storage instance. For more information about roles, see Identity and Access Management roles.
To create a bucket, choose one of the following options:
| Action | More info |
|---|---|
| Create a bucket through the IBM Cloud UI | Learn more |
| Create a bucket through the IBM Cloud CLI | Learn more |
| Create a bucket by using cURL | Learn more |
| Create a bucket by using the REST API | Learn more |
| Create a bucket with a different storage class by using the REST API | Learn more |
| Create a bucket with Key Protect or Hyper Protect Crypto Services managed encryption keys (SSE-KP) by using the REST API | Learn more |
| Create a bucket by using Terraform | Learn more |
For more information, see Getting started with IBM Cloud Object Storage.
About buckets with IBM Cloud Logs
For each IBM Cloud Logs instance, you can configure 1 data bucket and 1 metrics bucket.
- The data bucket is used to store logs that are ingested and not blocked by a TCO policy or a block parsing rule. For more information, see Configuring the data bucket.
- The metrics bucket is used to store collected data usage metrics as well as metrics generated by IBM Cloud Logs. For information on configuring the metrics bucket, see Configuring the metrics bucket.
You can configure the data and metrics buckets in the same region or in a different region from your IBM Cloud Logs instance. The buckets and the IBM Cloud Logs instance can be in the same account or in different accounts.
You should create a bucket with Cross Region resiliency to store and access data across multiple geographical regions to ensure high availability, durability, and disaster recovery capabilities. See Creating and modifying IBM Cloud Object Storage buckets.
You can configure the same bucket as your data bucket and your metrics bucket. However, consider the following recommendations:
- Use different buckets for data and for metrics for production environments.
- Use separate buckets for logs and metrics if you have different data retention requirements on logs and metrics.
You are responsible for the bucket and the data that is uploaded into the buckets. You decide for how long you want to keep the data in a bucket.
- Compliance, corporate, and industry requirements are key inputs for deciding how long to keep the data.
- In IBM Cloud Object Storage, you can configure object lifecycle policies, including tags, to automatically delete files from your buckets.
Deleted data will no longer be queryable. Ensure that you no longer require the deleted data for any queries or processes before removing it.
To use different object lifecycle periods for metrics and logs data, you must use different buckets to handle your log data and your metrics data separately, and configure the lifecycle policies appropriately.
To use different lifecycle periods for logs data ingested through different data pipelines, you must configure archive retention tags in IBM Cloud Logs and lifecycle policies filtering by tag appropriately.
While all data stored in IBM Cloud Object Storage is automatically encrypted using randomly generated keys, some workloads require that the keys can be rotated, deleted, or otherwise controlled by a key management system (KMS) like IBM® Key Protect for IBM Cloud®. Data at rest is encrypted with automatic provider-side Advanced Encryption Standard (AES) 256-bit encryption and the Secure Hash Algorithm (SHA)-256 hash. Data in motion is secured by using the built-in carrier grade Transport Layer Security/Secure Sockets Layer (TLS/SSL) or SNMPv3 with AES encryption. If you want more control over encryption, you can make use of IBM® Key Protect for IBM Cloud® to manage generated or "bring your own" keying. For more information, see Encrypting a bucket with IBM® Key Protect for IBM Cloud® and Key-protect COS Integration.
Notice that data that is stored in the data bucket includes data across all TCO data pipelines: data from Priority insights, Analyze and alert and Store and search. If the data must be protected by a customer-managed encryption only, then TCO policies need to be configured to exclusively process data through the Analyze and alert or Store and search data pipelines. For more information, see Configuring the TCO Optimizer.
The IBM Cloud Object Storage service is billed separately from IBM Cloud Logs. IBM Cloud Object Storage storage costs are determined by the pricing plan that you choose for the IBM Cloud Object Storage instance.
IBM Cloud Logs does not support IBM Cloud Object Storage buckets that are configured with retention policies, object lock policies, or public access enabled, because IBM Cloud Logs requires deletion permissions on the logs and metrics buckets.
IAM Service to service authorization between IBM Cloud Logs and IBM Cloud Object Storage
You must define a service to service (S2S) authorization between IBM Cloud Logs and IBM Cloud Object Storage to allow IBM Cloud Logs to read and write data into the buckets.
For more information, see:
Data bucket
You can configure a data bucket for an IBM Cloud Logs instance. For more information, see Configuring the data bucket.
- The data bucket stores and retains logs for as long as you need them.
- If you have regulatory and compliance requirements, check the location where you can create the bucket. Then, if performance is critical, consider creating the bucket in the same region where the IBM Cloud Logs instance is provisioned.
- You must configure the direct endpoint as the bucket endpoint.
  Direct endpoints are used for requests originating from resources within VPCs. Direct endpoints provide better performance than Public endpoints and do not incur charges for any outgoing or incoming bandwidth, even if the traffic crosses regions or data centers. For more information, see Endpoint Types.
- You are responsible for the maintenance of the data bucket. In IBM Cloud Logs, you can use IBM Cloud Object Storage object tags to help you automatically manage the log data in a bucket. For more information, see Deleting files from the data bucket.
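The direct endpoint requirement above can be checked before you save a bucket configuration. The following is a minimal sketch, not an official validation: it assumes that direct endpoint hostnames follow the pattern `s3.direct.<location>.cloud-object-storage.appdomain.cloud`, and the function name `is_direct_endpoint` is hypothetical. Confirm the hostname against the Endpoint Types documentation for your region.

```python
from urllib.parse import urlparse


def is_direct_endpoint(endpoint: str) -> bool:
    """Return True if the hostname looks like a COS direct endpoint.

    Assumption: direct endpoints use hostnames of the form
    s3.direct.<location>.cloud-object-storage.appdomain.cloud
    """
    # Accept values given with or without a scheme.
    url = endpoint if "://" in endpoint else f"https://{endpoint}"
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # The first two labels distinguish direct from public and private endpoints.
    return len(labels) > 2 and labels[0] == "s3" and labels[1] == "direct"


if __name__ == "__main__":
    for candidate in (
        "s3.direct.us-south.cloud-object-storage.appdomain.cloud",
        "https://s3.us-south.cloud-object-storage.appdomain.cloud",
    ):
        print(candidate, is_direct_endpoint(candidate))
```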
Files uploaded to the data bucket
Logs are stored as Parquet files with the following structure:
cx/parquet/v1/team_id=<TEAM>/dt=<DT>/hr=<HR>/UUID.parquet
Metadata is stored in manifest files with this structure:
cx/parquet/v1/_manifest/team_id=<TEAM>/dt=<DT>/hr=<HR>/_manifest/UUID.manifest
For example:
cx/parquet/v1/team_id=58/dt=2024-12-18/hr=14/_manifest/df7bda51-9a1a-4c67-9f4d-b17f93ec4fd1.manifest
cx/parquet/v1/team_id=58/dt=2024-12-18/hr=14/710bb5f8-0cfc-4706-8aec-27ec7d993af8.parquet
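If you list the objects in the data bucket yourself, the key layout shown above can be split into its parts. The following Python sketch is illustrative only: the names `DataBucketKey` and `parse_key` are hypothetical, and the pattern is inferred from the example keys rather than from a published specification. It accepts a `_manifest/` segment either directly after `v1/` or after the `hr=` segment, because the structure line and the example differ on this point.

```python
import re
from typing import NamedTuple, Optional


class DataBucketKey(NamedTuple):
    team_id: str
    date: str     # value of the dt= segment, for example 2024-12-18
    hour: str     # value of the hr= segment, for example 14
    kind: str     # "parquet" or "manifest", taken from the file extension
    file_id: str  # file name without the extension


# Pattern inferred from the example keys; treat it as an assumption.
_KEY_RE = re.compile(
    r"^cx/parquet/v1/(?:_manifest/)?"
    r"team_id=(?P<team>[^/]+)/dt=(?P<dt>[^/]+)/hr=(?P<hr>[^/]+)/"
    r"(?:_manifest/)?(?P<file_id>[^/]+)\.(?P<ext>parquet|manifest)$"
)


def parse_key(key: str) -> Optional[DataBucketKey]:
    """Split an object key into its parts, or return None if it does not match."""
    match = _KEY_RE.match(key)
    if match is None:
        return None
    return DataBucketKey(
        match["team"], match["dt"], match["hr"], match["ext"], match["file_id"]
    )


if __name__ == "__main__":
    print(parse_key(
        "cx/parquet/v1/team_id=58/dt=2024-12-18/hr=14/"
        "710bb5f8-0cfc-4706-8aec-27ec7d993af8.parquet"
    ))
```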
Deleting files from the data bucket
In IBM Cloud Object Storage, you can define expiration rules (lifecycle policies) on buckets. An expiration rule deletes objects after a defined period (from the object creation date). The expiration rules for each bucket are evaluated once every 24 hours. Any object that qualifies for expiration (based on the objects' expiration date) will be queued for deletion. The deletion of expired objects begins the following day and will typically take less than 24 hours.
- You can limit the scope of an expiration rule by using one or more filters, such as an object prefix, an object tag, or an object size.
- You can use tags as a filter option that allows expiration rules to apply to objects that contain a matching tag. The tag filter is provided as a container that specifies a key string and value string. The key string must be less than 128 characters.
- If no prefix, tag or object size is configured, the policy will apply to all objects in the bucket. For more information, see Deleting stale data with expiration rules.
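To make the scoping behavior concrete, the following Python sketch models how a rule's filters and age threshold might be evaluated. It is a local simulation under simplifying assumptions, not the service's implementation: the class and function names are hypothetical, the size filter is modeled as a minimum size in bytes (the text above does not say how the size filter compares), and the daily evaluation and deletion queue are ignored.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple


@dataclass
class StoredObject:
    key: str
    size: int
    created: datetime
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class ExpirationRule:
    days: int
    prefix: Optional[str] = None
    tag: Optional[Tuple[str, str]] = None  # (key, value)
    min_size: Optional[int] = None         # hypothetical size filter, in bytes


def in_scope(obj: StoredObject, rule: ExpirationRule) -> bool:
    """A rule with no filters applies to every object in the bucket."""
    if rule.prefix is not None and not obj.key.startswith(rule.prefix):
        return False
    if rule.tag is not None and obj.tags.get(rule.tag[0]) != rule.tag[1]:
        return False
    if rule.min_size is not None and obj.size < rule.min_size:
        return False
    return True


def qualifies_for_expiration(obj: StoredObject, rule: ExpirationRule, now: datetime) -> bool:
    """True when the object is in scope and at least rule.days old."""
    return in_scope(obj, rule) and now - obj.created >= timedelta(days=rule.days)


if __name__ == "__main__":
    created = datetime(2024, 12, 18, 14, tzinfo=timezone.utc)
    obj = StoredObject("cx/parquet/v1/example.parquet", 1024, created,
                       {"ICL_ARCHIVE_RETENTION": "low"})
    rule = ExpirationRule(days=30, tag=("ICL_ARCHIVE_RETENTION", "low"))
    print(qualifies_for_expiration(obj, rule, created + timedelta(days=30)))
```

Because a rule without filters matches every object, it is worth confirming the filters on any rule you add to a bucket that IBM Cloud Logs still needs to query.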
In IBM Cloud Object Storage, you can configure expiration rules (lifecycle policies) to automatically delete object files based on the number of days since the object creation date. However, if you want more granular control over the data that is kept for search in the data bucket, with files deleted automatically under different retention periods, you must configure IBM Cloud Object Storage expiration rules that limit their scope by using the object tag ICL_ARCHIVE_RETENTION, with the tag values that you define in your IBM Cloud Logs instance.
To use archive retention tags, you must complete the following steps:
- In IBM Cloud Logs, configure IBM Cloud Object Storage object tags to manage automatically how long log data is available for search in the data bucket.
  - You must configure and enable archive retention tags in your IBM Cloud Logs instance. For more information, see Configuring archive retention tags to manage data retention.
  - You can define up to 3 custom object tags that you can use to define 3 different expiration periods on the log data.
  - You can use the default tag to define a default expiration period that you can apply to data that is not explicitly managed through a custom object tag.

  After you activate archive retention tags, every file in your data bucket is tagged with the custom tag ICL_ARCHIVE_RETENTION. The value of the tag is set to a custom tag value or to default. This action cannot be undone. Retention tags cannot be deactivated once enabled.
- In your IBM Cloud Object Storage data bucket lifecycle policies section, configure expiration rules for each tag, including default.
  Use the key ICL_ARCHIVE_RETENTION. The value string must be less than 256 characters. For example, you can use values like high, medium, and low.
  Make sure the tag names that you configure in IBM Cloud Logs match the tag values you set in the expiration policies in your bucket. Tag values are case-sensitive.
- In IBM Cloud Logs, configure 1 or more TCO policies and define the object tag to use with the data selected in the policy. If no tag is configured, the default tag is used.
  Data that is sent to the log data bucket is uploaded into object files. Each file has 1 object tag ICL_ARCHIVE_RETENTION and value. For more information, see Retention tags.
Archive retention tags are attached to object files that are uploaded into the data bucket after they are defined and enabled in the IBM Cloud Logs instance.
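The expiration rules described in the steps above can be generated rather than typed by hand, which helps keep tag values consistent between IBM Cloud Logs and the bucket. The following Python sketch builds one rule per tag value. The dictionary layout mirrors the S3-style lifecycle configuration (`ID`, `Status`, `Filter`, `Expiration`); whether your SDK or API version accepts exactly this shape is an assumption to verify. The function name, the rule IDs, and the retention periods are hypothetical.

```python
from typing import Dict, List

TAG_KEY = "ICL_ARCHIVE_RETENTION"
MAX_CUSTOM_TAGS = 3        # up to 3 custom tags, plus "default"
MAX_VALUE_LENGTH = 256     # the value string must be shorter than this


def build_expiration_rules(retention_days: Dict[str, int]) -> List[dict]:
    """Build one tag-filtered expiration rule per retention tag value.

    retention_days maps a tag value (case-sensitive) to a number of days.
    """
    if "default" not in retention_days:
        raise ValueError("configure a rule for the 'default' tag value as well")
    custom = [value for value in retention_days if value != "default"]
    if len(custom) > MAX_CUSTOM_TAGS:
        raise ValueError(f"at most {MAX_CUSTOM_TAGS} custom tag values are supported")

    rules = []
    for value, days in retention_days.items():
        if not value or len(value) >= MAX_VALUE_LENGTH:
            raise ValueError(f"invalid tag value: {value!r}")
        if days < 1:
            raise ValueError(f"expiration for {value!r} must be at least 1 day")
        rules.append({
            "ID": f"icl-retention-{value}",   # hypothetical naming scheme
            "Status": "Enabled",
            "Filter": {"Tag": {"Key": TAG_KEY, "Value": value}},
            "Expiration": {"Days": days},
        })
    return rules


if __name__ == "__main__":
    # Example retention periods; choose values that match your requirements.
    for rule in build_expiration_rules({"high": 90, "medium": 30, "low": 7, "default": 14}):
        print(rule)
```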
Data bucket restrictions
Storage classes
IBM Cloud Object Storage buckets used by IBM Cloud Logs as data buckets can be configured only with the following storage classes:
- Smart Tier
- Standard
The following storage classes are not supported by IBM Cloud Logs as data buckets:
- Vault
- Cold Vault
Archive rules
IBM Cloud Object Storage allows you to define archive rules on buckets that archive objects automatically after the defined time period. Archived objects have a lower cost than regular objects, but need to be restored before they can be read again.
IBM Cloud Logs cannot read archived objects. Searching archived objects in the All Logs view, or querying them in Archive queries, returns an error message.
IBM Cloud Object Storage buckets used as IBM Cloud Logs data buckets must not define archive rules that immediately archive objects, or archive objects within a few hours.
If you do not need to search logs older than a certain time period, for example, a month, you can define an IBM Cloud Object Storage archive rule to archive objects older than the time period required for searching. Do not configure archiving for a period of less than 7 days.
By archiving data that you do not need to search, you can retain the log data at a reduced cost. If required, you can restore archived objects if you need to search the data by using IBM Cloud Logs in the future.
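The storage class and archive rule restrictions in this section can be expressed as a simple pre-flight check. The following Python sketch is illustrative only: the function name and its parameters are hypothetical, the storage class is compared by its display name, and you must supply the values yourself, for example after looking up the bucket configuration.

```python
from typing import List, Optional

SUPPORTED_STORAGE_CLASSES = {"smart tier", "standard"}
MIN_ARCHIVE_DAYS = 7


def check_data_bucket(storage_class: str,
                      archive_after_days: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means no restriction is violated.

    archive_after_days is None when the bucket has no archive rule.
    """
    problems = []
    if storage_class.strip().lower() not in SUPPORTED_STORAGE_CLASSES:
        problems.append(
            f"storage class {storage_class!r} is not supported for data buckets"
        )
    if archive_after_days is not None and archive_after_days < MIN_ARCHIVE_DAYS:
        problems.append(
            f"archive rule of {archive_after_days} day(s) is shorter than "
            f"{MIN_ARCHIVE_DAYS} days"
        )
    return problems


if __name__ == "__main__":
    print(check_data_bucket("Standard", archive_after_days=30))
    print(check_data_bucket("Cold Vault", archive_after_days=1))
```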
Successful read activity tracking events
IBM Cloud Activity Tracker Event Routing drops successful cloud-object-storage.object.read events that are initiated by IBM Cloud Logs instances because they are not needed. When reviewing activity tracking events related to IBM Cloud Logs activity, you will not see successful cloud-object-storage.object.read events.
Metrics bucket
You can configure a metrics bucket for an IBM Cloud Logs instance. For more information, see Configuring the metrics bucket.
- The metrics bucket stores and retains metrics from your events in a long-term index for as long as you need them.
  When you enable metrics, you can generate metrics from logs. These metrics are stored in the metrics bucket as Prometheus index blocks.
- If you have regulatory and compliance requirements, check the location where you can create the bucket. Then, if performance is critical, consider creating the bucket in the same region where the IBM Cloud Logs instance is provisioned.
- You must configure the direct endpoint as the bucket endpoint.
  Direct endpoints are used for requests to a bucket that originate from resources within VPCs. Direct endpoints provide better performance over Public endpoints and do not incur charges for any outgoing or incoming bandwidth even if the traffic is cross regions or across data centers. For more information, see Endpoint Types.
- You are responsible for the maintenance of the metrics bucket. In IBM Cloud Object Storage, you can define an expiration rule to maintain data in the metrics bucket. For more information, see Deleting stale data with expiration rules.