Querying data directly from the archive
You can query your IBM Cloud Logs archive by using any third-party framework that provides a standard Apache Parquet reader, together with the required schema described in this section.
Archive folder structure
IBM Cloud Logs archive data is stored in standard Hive-like partitions with the following partition fields:
- `team_id=<team-id>`: IBM Cloud Logs team ID
- `dt=YYYY-MM-DD`: Date of the data in UTC
- `hr=HH`: Hour of the data in UTC

These fields can be defined as virtual columns inside the framework and can be used as filters in a query.
Be aware of the following:
- Both `dt` and `hr` are based on the event timestamp.
- The `team_id=<team-id>` partition lets you reuse the same bucket and prefix to write data from multiple IBM Cloud Logs teams and query them all in a single query.
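As an illustration, the following is a minimal sketch using Python with pyarrow, which can discover Hive-style partitions and expose them as virtual columns. The bucket path `s3://my-archive-bucket/logs` and the team ID `1234` are hypothetical placeholders, and the sketch assumes your Cloud Object Storage bucket is reachable through an S3-compatible endpoint with credentials already configured.

```python
import pyarrow as pa
import pyarrow.dataset as ds

# Declare the Hive-style partition fields as virtual columns.
# Declaring them explicitly as strings avoids surprises from
# automatic type inference on values such as "08".
partitioning = ds.partitioning(
    pa.schema([
        ("team_id", pa.string()),
        ("dt", pa.string()),
        ("hr", pa.string()),
    ]),
    flavor="hive",
)

# Hypothetical bucket and prefix; replace with your own.
dataset = ds.dataset(
    "s3://my-archive-bucket/logs",
    format="parquet",
    partitioning=partitioning,
)

# The partition fields act as filters, so only one hour of one
# team's data is actually read from the archive.
table = dataset.to_table(
    filter=(
        (ds.field("team_id") == "1234")
        & (ds.field("dt") == "2022-03-28")
        & (ds.field("hr") == "08")
    )
)
print(table.num_rows)
```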
Fields
Each Apache Parquet file has three fields with data as JSON-formatted strings:
- `src_obj__event_metadata`: A JSON object containing metadata related to the event.
- `src_obj__event_labels`: A JSON object containing the labels of the event (such as the IBM Cloud Logs `applicationName` and `subsystemName`).
- `src_obj__user_data`: A JSON object containing the actual event data.
The following is an example of `src_obj__event_metadata`:
```json
{
  "timestamp": "2022-03-28T08:50:57.946",
  "severity": "Debug",
  "priorityclass": "low",
  "logid": "some-uuid"
}
```
The following is an example of `src_obj__event_labels`:
```json
{
  "applicationname": "some-app",
  "subsystemname": "some-subsystem",
  "category": "some-category",
  "classname": "some-class",
  "methodname": "some-method",
  "computername": "some-computer",
  "threadid": "some-thread-id",
  "ipaddress": "some-ip-address"
}
```
The following is an example of `src_obj__user_data`:
```json
{
  "_container_id": "0f099482cf3b507462020e9052516554b65865fb761af8e076735312772352bf",
  "host": "ip-10-1-11-144",
  "short_message": "10.1.11.144 - - [28/Mar/2022:08:50:57 +0000] \"GET /check HTTP/1.1\" 200 16559 \"-\" \"Consul Health Check\" \"-\""
}
```
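Because the three fields are JSON-formatted strings rather than nested columns, most frameworks need an explicit decoding step. The following minimal Python sketch continues from the hypothetical `table` read in the earlier example, decoding each field with the standard `json` module; the field names and values come from the examples above.

```python
import json

# Decode the three JSON-string fields row by row.
for row in table.to_pylist():
    metadata = json.loads(row["src_obj__event_metadata"])
    labels = json.loads(row["src_obj__event_labels"])
    user_data = json.loads(row["src_obj__user_data"])

    # Example: print Debug events emitted by one application.
    if (
        metadata.get("severity") == "Debug"
        and labels.get("applicationname") == "some-app"
    ):
        print(metadata["timestamp"], user_data.get("short_message"))
```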
Archive queries and parsing
With the IBM Cloud Logs Archive Query feature, you can query logs directly from your archive by using Lucene, DataPrime, or regular expression query syntax, without the queries counting against your daily quota, even if the data was never indexed. You can store more of your data in the Analyze and alert and Store and search pipelines and take advantage of the IBM Cloud Logs real-time analysis and remote storage search capabilities. This means that you can use a shorter retention period and still quickly query all of your data.
Archive queries run on the archive that you set in IBM Cloud Logs and are available for all TCO logging levels. For example, if you prioritize logs into the Analyze and alert pipeline, you can still query them without indexing the data. You can also view and query them in LiveTail, receive real-time alerts and notifications of anomalies, and use parsing rules, log aggregation, and events to metrics, all at a lower cost than data sent to the Priority insights pipeline.