Querying data directly from the archive
You can query your IBM Cloud Logs archive directly by using a third-party framework with a standard Apache Parquet reader and the required schema.
Archive folder structure
IBM Cloud Logs archive data is stored in standard hive-like partitions with the following partition fields:
- team_id=<team-id>: IBM Cloud Logs Team ID
- dt=YYYY-MM-DD: Date of the data in UTC
- hr=HH: Hour of the data in UTC
These fields can be defined as virtual columns inside the framework and can be used as filters in a query.
Be aware of the following:
- Both dt and hr are based on the event timestamp.
- The team_id=<team-id> partition lets you reuse the same bucket and prefix to write data from multiple IBM Cloud Logs teams and query them all in one query.
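Because the partition fields follow the hive-style key=value layout, the object prefix for any event can be derived from its team ID and UTC event timestamp. The following is a minimal sketch; the helper name archive_prefix and the team ID value are illustrative, not part of the product API.

```python
from datetime import datetime, timezone

def archive_prefix(team_id: str, event_time: datetime) -> str:
    """Build the hive-style partition prefix for a team and a UTC event time."""
    ts = event_time.astimezone(timezone.utc)
    # Partition order matches the archive layout: team_id, then dt, then hr
    return f"team_id={team_id}/dt={ts:%Y-%m-%d}/hr={ts:%H}"

# An event logged at 08:50 UTC on 2022-03-28 for a hypothetical team "1234"
print(archive_prefix("1234", datetime(2022, 3, 28, 8, 50, tzinfo=timezone.utc)))
# team_id=1234/dt=2022-03-28/hr=08
```

A query framework that registers team_id, dt, and hr as virtual columns can prune to exactly these prefixes, so filtering on them avoids scanning the whole archive.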
Fields
Each Apache Parquet file has three fields with data as JSON-formatted strings:
- src_obj__event_metadata: A JSON object containing metadata related to the event.
- src_obj__event_labels: A JSON object containing the labels of the event (such as the IBM Cloud Logs applicationName and subsystemName).
- src_obj__user_data: A JSON object containing the actual event data.
The following is an example of src_obj__event_metadata:
{
"timestamp": "2022-03-28T08:50:57.946",
"severity": "Debug",
"priorityclass": "low",
"logid": "some-uuid"
}
The following is an example of src_obj__event_labels:
{
"applicationname": "some-app",
"subsystemname": "some-subsystem",
"category": "some-category",
"classname": "some-class",
"methodname": "some-method",
"computername": "some-computer",
"threadid": "some-thread-id",
"ipaddress": "some-ip-address"
}
The following is an example of src_obj__user_data:
{
"_container_id": "0f099482cf3b507462020e9052516554b65865fb761af8e076735312772352bf",
"host": "ip-10-1-11-144",
"short_message": "10.1.11.144 - - [28/Mar/2022:08:50:57 +0000] \"GET /check HTTP/1.1\" 200 16559 \"-\" \"Consul Health Check\" \"-\""
}
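Because each of the three columns holds a JSON-formatted string rather than a nested structure, a query tool that reads the Parquet file must decode the strings itself. The following is a minimal stdlib sketch; the row values are abbreviated from the examples above, not read from a real archive.

```python
import json

# One archive row as it might come back from a Parquet reader:
# each column is a JSON-formatted string (values abbreviated for illustration).
row = {
    "src_obj__event_metadata": (
        '{"timestamp": "2022-03-28T08:50:57.946", "severity": "Debug",'
        ' "priorityclass": "low", "logid": "some-uuid"}'
    ),
    "src_obj__event_labels": (
        '{"applicationname": "some-app", "subsystemname": "some-subsystem"}'
    ),
    "src_obj__user_data": (
        '{"host": "ip-10-1-11-144", "short_message": "GET /check HTTP/1.1 200"}'
    ),
}

# Decode each JSON string into a dict before filtering or projecting fields
metadata = json.loads(row["src_obj__event_metadata"])
labels = json.loads(row["src_obj__event_labels"])
user_data = json.loads(row["src_obj__user_data"])

print(metadata["severity"], labels["applicationname"], user_data["host"])
# Debug some-app ip-10-1-11-144
```

Most SQL engines offer an equivalent built-in (for example, a JSON-extraction function) that can be applied to these columns directly in a query.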
Archive Query with SQL and parsing
The IBM Cloud Logs Archive Query feature lets you query logs directly from your archive with SQL query syntax, without counting against your daily quota, even if the data was never indexed. This lets you store more of your data in the Analyze and alert and Store and search pipelines and take advantage of the real-time analysis and remote storage search capabilities of IBM Cloud Logs. As a result, you can use a shorter retention period and still query all your data quickly.
Archive queries run on the archive that you set in IBM Cloud Logs and are available for all TCO logging levels. For example, logs prioritized for the Analyze and alert pipeline can still be queried without indexing the data. You can also view and query them in LiveTail, receive real-time alerts and notifications of anomalies, and use parsing rules, log aggregation, and events to metrics at a lower cost than data sent to the Frequent Search pipeline.