Using sampling to optimize metrics

Sampling analyzes a representative subset of data rather than processing every individual data point, which helps maintain performance and scalability while still delivering accurate insights. Given the volume of data that CIS handles (over 700 million events per second), sampling is essential for delivering fast, cost-effective metrics across large datasets.

In a small number of cases, metrics that are provided in the CIS dashboard and GraphQL API are based on a sample—a subset of the dataset. In these cases, CIS metrics return an estimate derived from the sampled value. For example, if during an attack the sampling rate is 10% and 5,000 events are sampled, CIS estimates the total number of events as 50,000 (5,000 × 10) and reports this value.
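Expressed as code, the extrapolation is a single division. The following is a minimal sketch of that arithmetic, not part of any CIS API:

```python
def estimate_total(sampled_count: int, sample_rate: float) -> int:
    """Extrapolate an estimated total from a sampled count.

    sample_rate is the sampled fraction, for example 0.10 for a 10% rate.
    """
    return round(sampled_count / sample_rate)

# 5,000 events sampled at a 10% rate -> estimated total of 50,000 events
print(estimate_total(5_000, 0.10))  # 50000
```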

CIS primarily uses adaptive sampling, including a method called Adaptive Bit Rate (ABR), which adjusts the level of detail in the data returned based on query complexity and volume. When the number of records is small or the query is simple, full-resolution data (100%) is used. As the dataset grows or the query becomes more complex, progressively lower sample rates (such as 10% or 1%) are applied to ensure that queries complete efficiently.

This approach prevents large queries from consuming excessive computing resources, ensuring fair distribution and consistent performance for all users. Data is stored at multiple resolutions (100%, 10%, and 1%), allowing the system to select the appropriate resolution based on the query’s complexity and size, helping ABR deliver fast, accurate results.
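As a mental model, resolution selection can be sketched as choosing the highest-resolution copy of the data whose scanned-row count stays within a per-query budget. The thresholds and names in this sketch are invented for illustration; they are not CIS internals.

```python
# Hypothetical sketch of ABR-style resolution selection.
# The resolutions match those described above (100%, 10%, 1%);
# the row budget is an invented value for illustration.
RESOLUTIONS = [1.0, 0.10, 0.01]   # fraction of events stored at each resolution
MAX_ROWS_PER_QUERY = 1_000_000    # assumed per-query row budget

def choose_resolution(estimated_event_count: int) -> float:
    """Return the highest resolution that keeps scanned rows within budget."""
    for resolution in RESOLUTIONS:
        if estimated_event_count * resolution <= MAX_ROWS_PER_QUERY:
            return resolution
    return RESOLUTIONS[-1]  # fall back to the coarsest resolution

print(choose_resolution(500_000))      # 1.0  -> full-resolution data
print(choose_resolution(8_000_000))    # 0.1  -> 10% sample
print(choose_resolution(900_000_000))  # 0.01 -> 1% sample
```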

The CIS GraphQL API exposes datasets that are powered by adaptive sampling. These nodes include Adaptive in the name and can be discovered through introspection.
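As a sketch of that discovery step, the following snippet runs a standard GraphQL introspection query and filters for the Adaptive naming convention. The endpoint URL and token are placeholders, not documented CIS values.

```python
import requests

# Placeholder endpoint and token; substitute your actual CIS GraphQL
# endpoint and API token.
GRAPHQL_ENDPOINT = "https://example.com/graphql"
API_TOKEN = "YOUR_API_TOKEN"

# Standard GraphQL introspection: list every type name in the schema.
INTROSPECTION_QUERY = """
{
  __schema {
    types {
      name
    }
  }
}
"""

response = requests.post(
    GRAPHQL_ENDPOINT,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"query": INTROSPECTION_QUERY},
)
response.raise_for_status()

# Keep only the nodes that follow the Adaptive naming convention.
type_names = [t["name"] for t in response.json()["data"]["__schema"]["types"]]
adaptive_nodes = [name for name in type_names if "Adaptive" in name]
print(adaptive_nodes)
```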

Why sampling is applied

CIS metrics are designed to deliver data at the appropriate level of detail as quickly as possible. Sampling helps achieve this by reducing the amount of data that is processed, allowing CIS to return metrics within seconds—even during spikes in volume, such as a burst of firewall events during an attack. Without sampling, queries can take minutes or more to complete, which is too long when validating mitigation efforts or troubleshooting issues.

CIS processes over 700 million events per second across its global network. Storing and processing all of this data in real time would take too much time and computing power to be practical. Sampling balances accuracy with performance, making metrics faster, more scalable, and more efficient. Because the datasets are so large, sampled values remain statistically meaningful and provide reliable insights.

This approach is similar to sampling techniques used in other domains:

  • Google Maps: Lower-resolution imagery when zoomed out mirrors how CIS adjusts sampling rates to deliver fast, relevant insights based on query size.
  • Opinion polls: A small, representative sample can reflect system-wide trends.
  • Movie frames: Viewing a movie at 30 frames per second (fps) instead of 60 fps still tells the full story. Similarly, sampling preserves the key patterns in your data.

While ABR sampling resolution isn’t always visible, the number of rows read is a good indicator: the more rows read, the higher the resolution and reliability of the results.

Types of sampling

CIS metrics use two primary types of sampling: adaptive sampling and fixed sampling. The method applied depends on the dataset and how the data is queried.

Adaptive sampling

CIS metrics primarily rely on adaptive sampling, which means the sample rate fluctuates depending on the volume of data that is ingested or queried. If the number of records is relatively small, sampling is typically not used, allowing full data to be returned. However, as the volume of records increases, progressively lower sample rates are applied to maintain performance and responsiveness.

This model is used in several data sources, including Security Events (also known as Firewall Events) and the Security Event Log. Data nodes that use adaptive sampling are easy to identify by the Adaptive suffix in the node name, as in firewallEventsAdaptive.
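For illustration, a query against an adaptive node might look like the following sketch. Only the firewallEventsAdaptive node name comes from this document; the zone filter, field names, and limit are assumptions about the schema.

```python
# Hypothetical query against an adaptive node; the dimensions and
# filter fields are illustrative and may differ from the actual schema.
FIREWALL_EVENTS_QUERY = """
{
  viewer {
    zones(filter: {zoneTag: "YOUR_ZONE_ID"}) {
      firewallEventsAdaptive(
        filter: {datetime_gt: "2024-01-01T00:00:00Z"}
        limit: 100
      ) {
        action
        clientIP
        datetime
      }
    }
  }
}
"""
# Post this query to the GraphQL endpoint in the same way as the
# introspection example shown earlier.
```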

Fixed sampling

The following data nodes are based on fixed sampling, where the sample rate does not vary:

Fixed sampling

| Dataset | Nodes | Rate | Notes |
|---------|-------|------|-------|
| Firewall Rules Preview | firewallRulePreviewGroups | 1% | Use with caution. A 1% sample rate does not provide accurate estimates for datasets smaller than a certain threshold, a scenario that the CIS dashboard calls out explicitly but the API does not. |
| Network metrics | ipFlows1mGroups, ipFlows1hGroups, ipFlows1dGroups, ipFlows1mAttacksGroups | 0.012% | The sampling rate is in terms of packet count (1 of every 8,192 packets). |
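With a fixed rate, extrapolation reduces to a constant multiplier. The following minimal sketch uses the rates from the table above; the function itself is illustrative and not part of the CIS API.

```python
# Fixed sampling rates from the table above.
FIREWALL_RULES_PREVIEW_RATE = 0.01   # 1% of events
NETWORK_METRICS_RATE = 1 / 8192      # ~0.012% (1 of every 8,192 packets)

def extrapolate(sampled: int, rate: float) -> int:
    """Scale a sampled count up to an estimated total."""
    return round(sampled / rate)

# 120 sampled packets -> roughly 983,000 packets estimated in total
print(extrapolate(120, NETWORK_METRICS_RATE))  # 983040
```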

Other considerations

Keep the following considerations in mind:

Access to raw data
Because sampling is primarily adaptive and automatically adjusts to provide an accurate estimate, the sampling rate cannot be directly controlled. Enterprise customers have access to raw data through CIS logs.
When sampling occurs
Sampling is typically applied to high-traffic datasets where full data metrics are impractical. For smaller datasets, full data analysis is often performed without sampling.
Sampling rates
Sampling rates vary depending on the dataset and product. CIS helps ensure that sampling rates are consistent within a single dataset to maintain accuracy across queries.
Impact on metrics
While sampling reduces the volume of processed data, aggregated metrics such as totals, averages, and percentiles are extrapolated from the sample by using the sampling rate, so the reported metrics accurately represent the entire dataset.
Limitations
Sampling might not capture extremely rare events with very low occurrence rates.