Monitoring Event Streams service metrics by using IBM Cloud Monitoring

Gen 2

IBM Cloud® Monitoring is a third-party, cloud-native, container-intelligence management system that you can include as part of your IBM Cloud architecture. Use it to gain operational visibility into the performance and health of your applications, services, and platforms. It offers administrators, DevOps teams, and developers full-stack telemetry with advanced features to monitor and troubleshoot, define alerts, and design custom dashboards.

While you monitor service metrics with IBM Cloud Monitoring, Kafka clients (producers and consumers) have their own set of metrics to monitor their performance and health.

Opting in to and enabling Event Streams service metrics

Event Streams service metrics can broadly be categorized into two different groups: Default and Enhanced.

Enabling default Event Streams service metrics

Before you can start to use Event Streams IBM Cloud Monitoring metrics, you must first opt in, and then enable platform metrics by completing the following steps:

  1. Enable platform metrics for Event Streams. For more information, see Enabling platform metrics.

    The owner of the account has full access to this metrics data. For more information about managing access for other users, see Getting started with IBM Cloud Monitoring - manage user access.

  2. To navigate from the Event Streams instance page to the Monitoring dashboard, click Actions on the instance page and select Monitoring.

    On your first usage, you might see a welcome wizard. To advance to the dashboard selection menu, select Next and then Skip on the Choosing an installation method page. Accept the prompts that follow. You can then select the IBM Event Streams or IBM Event Streams (Enterprise) dashboard, depending on the plan that you use.

Enabling enhanced Event Streams metrics

The enhanced Event Streams metrics consist of three groups: topic, partition, and consumers. You can opt in to one, two, or all three groups. The available metrics are described in the topic, partition, and consumers tables.

Enabling enhanced metrics introduces more global gauge metrics and therefore increases the costs.

Before you can start to use enhanced Event Streams metrics, you must first enable them by completing the following step:

  • Run the following command to update the service instance to start using enhanced metrics:

    ibmcloud resource service-instance-update <instance-name> -p '{"dataservices": {"options":{"metrics":["topic","partition","consumers"]}}}'
    

When enhanced metrics are enabled, the following new dashboards become available, depending on the groups that you selected: IBM Event Streams(Topic), IBM Event Streams(Partitions), and IBM Event Streams(Consumers).

To opt out of enhanced metrics, run the following command:

ibmcloud resource service-instance-update <instance-name> -p '{"dataservices": {"options":{"metrics":[]}}}'

Dashboards are available only after metrics start to be recorded. It might take a few minutes for them to initialize.
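The opt-in and opt-out commands in this section differ only in the JSON that is passed to -p. As a minimal sketch, the following Python helper builds that JSON for a chosen set of enhanced metric groups. The function name build_metrics_payload and its validation logic are illustrative assumptions and are not part of the IBM Cloud CLI; only the payload shape is taken from the commands in this section.

```python
import json

# The three enhanced metric groups named in this section.
ENHANCED_GROUPS = ("topic", "partition", "consumers")


def build_metrics_payload(groups):
    """Build the JSON string for the -p option (hypothetical helper).

    `groups` is any iterable of group names. An empty iterable produces
    the opt-out payload with an empty metrics list.
    """
    requested = list(groups)
    unknown = [g for g in requested if g not in ENHANCED_GROUPS]
    if unknown:
        raise ValueError(f"unknown enhanced metric groups: {unknown}")
    # Keep a stable order and drop duplicates.
    selected = [g for g in ENHANCED_GROUPS if g in requested]
    return json.dumps({"dataservices": {"options": {"metrics": selected}}})


if __name__ == "__main__":
    print(build_metrics_payload(["topic", "consumers"]))
    print(build_metrics_payload([]))
```

The resulting string can be passed, in single quotes, as the value of -p in the ibmcloud resource service-instance-update command.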

Event Streams service metrics cost information

Before you opt in to using Monitoring metrics, be aware of the cost of doing so. The estimated cost depends on the following considerations:

  • The Event Streams plan that you use.
  • How many unique time series are sent for each plan.
  • The number of topics that you created.
  • The number of partitions that you created.
  • Whether you have the topic, partition, or consumers enhanced metrics enabled, or all of them.

For more information, see Monitoring pricing.

Event Streams service metrics details

The following tables describe the specific metrics that are provided by Event Streams for each plan.

Service metrics available by service plan

Metric name | Enterprise Gen2
Authentication failures | Yes
Consume message conversion time | Yes
Inactive consumer groups | Yes
Instance bytes in per second | Yes
Instance bytes out per second | Yes
Instance utilization | Yes
Number of partitions | Yes
Produce message conversion time | Yes
IAM ID bytes in per second | Yes
IAM ID bytes out per second | Yes
Rebalancing consumer groups | Yes
Stable consumer groups | Yes
Topic bytes in per second | Yes
Topic bytes out per second | Yes
Number of topics | Yes
Number of offline partitions | Yes
Number of under in-sync replica partitions | Yes

Enhanced service metrics available with topic enabled

Metrics available for topic
Metric name | Enterprise Gen2
Topic size | Yes

Enhanced service metrics available with partition enabled

Metrics available for partition
Metric name | Enterprise Gen2
Message rate per partition | Yes

Use this metric to detect whether message activity is distributed unevenly across the partitions in a topic and whether the number of partitions in the topic is scaled appropriately.

Authentication failures

This metric tracks the total number of authentication failures.

Authentication failures metric metadata
Metadata Description
Metric Name ibm_eventstreams_kafka_authentication_failure_total
Metric Type counter
Value Type none
Segment By Service instance, Service instance name

A value of 0 is expected under normal conditions. A nonzero value indicates that one or more clients are attempting to connect using invalid credentials. Verify that all clients are configured with correct authentication credentials and that no outdated or revoked credentials are in use.

Consume message conversion time

Indicates the accumulated time spent performing message conversion for clients that are consuming by using older protocol versions.

Consume message conversion time metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_consume_conversions_time_quantile
Metric Type gauge
Value Type second
Segment By Service instance, Quantile, Service instance name

Ideally, this value is zero. A nonzero value indicates that clients are experiencing more latency because they use an older protocol level. Those clients are down-level and must be upgraded. Ensure that all clients are at the latest levels.

IAM ID bytes in per second

This metric tracks the rate of data, in bytes per second, that is sent to the service by each IAM ID.

IAM ID bytes in per second metric metadata
Metadata Description
Metric Name ibm_eventstreams_iam_id_bytes_in_per_second
Metric Type gauge
Value Type byte
Segment By Service instance name, IBM IAM Id, Service instance

Use this metric to monitor usage patterns and identify trends in data ingestion across IAM IDs. Higher than expected values for a specific IAM ID can indicate disproportionate throughput usage. Compare activity across IAM IDs to understand traffic distribution and detect anomalies. This metric can also help inform decisions about rate limits or quotas if certain IAM IDs consistently consume more throughput than intended.

IAM ID bytes out per second

This metric tracks the rate of data, in bytes per second, that is sent from the service to each IAM ID.

IAM ID bytes out per second metric metadata
Metadata Description
Metric Name ibm_eventstreams_iam_id_bytes_out_per_second
Metric Type gauge
Value Type byte
Segment By Service instance name, IBM IAM Id, Service instance

Use this metric to monitor usage patterns and track how data is distributed to IAM IDs. Higher than expected values for a specific IAM ID can indicate disproportionate data consumption. Compare activity across IAM IDs to understand traffic patterns and identify potential anomalies. This metric can also help inform decisions about rate limits or quotas if certain IAM IDs consistently consume more outbound throughput than intended.
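To illustrate how activity can be compared across IAM IDs, the following sketch computes each IAM ID's share of the total throughput from one set of sampled values and flags the IDs that exceed a chosen share. The function names, the sample IAM IDs, and the 50% threshold are assumptions made for illustration only. The input is a plain mapping of IAM ID to bytes per second, however those values were retrieved.

```python
def throughput_shares(rates_by_iam_id):
    """Return each IAM ID's fraction of the total bytes per second."""
    total = sum(rates_by_iam_id.values())
    if total <= 0:
        return {iam_id: 0.0 for iam_id in rates_by_iam_id}
    return {iam_id: rate / total for iam_id, rate in rates_by_iam_id.items()}


def disproportionate_iam_ids(rates_by_iam_id, share_threshold=0.5):
    """List IAM IDs whose share of throughput exceeds the threshold.

    The default threshold of 0.5 is an arbitrary, illustrative choice.
    """
    shares = throughput_shares(rates_by_iam_id)
    return sorted(i for i, share in shares.items() if share > share_threshold)


if __name__ == "__main__":
    # Hypothetical sample of bytes-per-second values, keyed by IAM ID.
    sample = {"iam-ServiceId-a": 750.0, "iam-ServiceId-b": 250.0}
    print(throughput_shares(sample))
    print(disproportionate_iam_ids(sample))
```

The same helper applies to either the bytes in or the bytes out metric, because both are segmented by IAM ID.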

Inactive consumer groups

The number of inactive consumer groups in an Event Streams instance.

Inactive consumer groups metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_inactive_consumergroups
Metric Type gauge
Value Type none
Segment By Service instance, Service instance name

This metric is for information only and does not indicate a problem. Spikes indicate that a set of consumer groups stopped sending messages.

Instance bytes in per second

This metric tracks the rate of data, in bytes per second, that is produced to an Event Streams instance.

Instance bytes in per second metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_bytes_in_per_second
Metric Type gauge
Value Type byte
Segment By Service instance, Service instance name

Use this metric to monitor data ingestion rates and identify trends in client-produced throughput. Compare observed values against the recommended limits for your plan and instance to ensure that your workload remains within supported thresholds. For more information, see Event Streams.

Instance bytes out per second

This metric tracks the rate of data, in bytes per second, that is consumed from an Event Streams instance.

Instance bytes out per second metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_bytes_out_per_second
Metric Type gauge
Value Type byte
Segment By Service instance, Service instance name

Use this metric to monitor trends in how much data, in bytes per second, your clients consume from your cluster. To determine the recommended limits for your plan and instance, see Event Streams.

Instance utilization

The level of utilization of an Event Streams instance. This is a numeric value between zero and two (inclusive):

  • 0 indicates that the workload being processed by this instance is within the capacity of the instance. More precisely, the utilization level is under 80%.
  • 1 indicates that the workload being processed by this instance is approaching the capacity limit for the instance. Review whether it is appropriate to scale the service instance. More precisely, the utilization level is over 80% and under 95%.
  • 2 indicates that the workload being processed by this instance is at the capacity limit for the instance. As a result, messaging latency might increase. Review whether it is appropriate to scale the service instance. More precisely, the utilization level is over 95%.

Instance utilization metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_utilization
Metric Type gauge
Value Type none
Segment By Service instance, Service instance name
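As a minimal sketch, the three utilization values can be mapped to the meanings that are listed for this metric. The function name describe_utilization and the wording of the returned strings are illustrative assumptions; only the values 0, 1, and 2 and their thresholds come from the description of this metric.

```python
# Meaning of each value of ibm_eventstreams_instance_utilization.
UTILIZATION_LEVELS = {
    0: "within capacity (utilization under 80%)",
    1: "approaching capacity (utilization over 80% and under 95%); review scaling",
    2: "at capacity (utilization over 95%); latency might increase, review scaling",
}


def describe_utilization(value):
    """Translate a sampled utilization value into a short description."""
    level = int(value)
    if level != value or level not in UTILIZATION_LEVELS:
        raise ValueError(f"unexpected utilization value: {value!r}")
    return UTILIZATION_LEVELS[level]


if __name__ == "__main__":
    for sample in (0, 1, 2):
        print(sample, "->", describe_utilization(sample))
```

A check of this kind might back an alert that fires when the value is 1 or higher for a sustained period.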

Message rate per partition

The rate of change of this metric gives the number of messages per second that arrive in a partition of a topic in an Event Streams instance.

Message rate per partition metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_message_rate_per_partition
Metric Type gauge
Value Type none
Segment By Service instance, Service instance name, IBM Event Streams Kafka topic, IBM Event Streams Kafka partition
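Because the message rate is derived from the rate of change of this metric, two samples taken a known interval apart are enough to estimate it. The following sketch computes a per-partition rate from two samples and a simple imbalance ratio across partitions. The function names, the sample values, and the choice of "busiest partition divided by mean" as the imbalance measure are assumptions for illustration.

```python
def partition_rates(first, second, interval_seconds):
    """Messages per second for each partition between two samples.

    `first` and `second` map a partition number to the sampled metric value.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        partition: (second[partition] - first[partition]) / interval_seconds
        for partition in first
        if partition in second
    }


def imbalance_ratio(rates):
    """Busiest partition's rate divided by the mean rate (1.0 means even)."""
    values = list(rates.values())
    mean = sum(values) / len(values) if values else 0.0
    return max(values) / mean if mean > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical samples for a three-partition topic, taken 60 seconds apart.
    earlier = {0: 100, 1: 100, 2: 100}
    later = {0: 700, 1: 160, 2: 160}
    rates = partition_rates(earlier, later, 60)
    print(rates)
    print(imbalance_ratio(rates))
```

A ratio well above 1.0 suggests that one partition receives most of the traffic, which can point to a skewed message key distribution.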

Number of offline partitions

The number of offline partitions in an Event Streams instance.

Number of offline partitions metric metadata
Metadata Description
Metric Name ibm_eventstreams_kafka_offline_partitions
Metric Type gauge
Value Type none
Segment By Service instance, Service instance name

Ideally, this value is zero. A nonzero value might indicate a temporary issue with the cluster. It might also indicate a difficulty with Kafka partition leader election.

Number of partitions

The number of leader partitions in an Event Streams instance.

Number of partitions metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_partitions
Metric Type gauge
Value Type none
Segment By Service instance, Service instance name

Use this metric to monitor trends in your usage. To determine the recommended limits for your plan and instance, see Event Streams.

Number of topics

The number of topics in an Event Streams instance.

Number of topics metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_topics
Metric Type gauge
Value Type none
Segment By Service instance, Service instance name

Number of under in-sync replica partitions

The number of partitions with fewer than two in-sync replicas.

Number of under in-sync replica partitions metric metadata
Metadata Description
Metric Name ibm_eventstreams_kafka_under_minisr_partitions
Metric Type gauge
Value Type none
Segment By Service instance, Service instance name

Ideally this value should be zero. A nonzero value might highlight a temporary issue with the cluster.

Produce message conversion time

Indicates the accumulated time spent performing message conversion for clients that are producing by using older protocol versions.

Produce message conversion time metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_produce_conversions_time_quantile
Metric Type gauge
Value Type second
Segment By Service instance, Quantile, Service instance name

Ideally, this value is zero. Consistent growth in this value indicates that some clients are down-level and should be upgraded. Ensure that all clients are at the latest levels.

Rebalancing consumer groups

The number of rebalancing consumer groups in an Event Streams instance.

Rebalancing consumer groups metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_rebalancing_consumergroups
Metric Type gauge
Value Type none
Segment By Service instance, Service instance name

While this value is expected to be greater than zero occasionally (because broker restarts happen frequently), sustained high levels suggest that consumers might be restarting frequently and leaving or rejoining the consumer groups. Check your client logs.

Stable consumer groups

The number of stable consumer groups in an Event Streams instance.

Stable consumer groups metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_stable_consumergroups
Metric Type gauge
Value Type none
Segment By Service instance, Service instance name

Use this metric along with rebalancing consumer groups. If this value is consistently zero and the number of rebalancing consumer groups is high, a cluster problem is likely. If this value is nonzero and the number of rebalancing consumer groups is high, a consumer group issue is likely.
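The combined reading of the stable and rebalancing consumer group metrics can be sketched as a small decision function. The function name, the returned strings, and the threshold that defines a "high" number of rebalancing groups are assumptions; choose a threshold that suits your workload.

```python
def diagnose_consumer_groups(stable_samples, rebalancing_samples, high_threshold=5):
    """Interpret recent samples of the stable and rebalancing group counts.

    `high_threshold` is an illustrative value for what counts as a high
    number of rebalancing consumer groups.
    """
    if not stable_samples or not rebalancing_samples:
        raise ValueError("both sample lists must be non-empty")
    # "High" here means every recent sample is at or above the threshold.
    rebalancing_high = all(v >= high_threshold for v in rebalancing_samples)
    if not rebalancing_high:
        return "no sustained rebalancing"
    if all(v == 0 for v in stable_samples):
        return "possible cluster problem"
    return "possible consumer group issue"


if __name__ == "__main__":
    print(diagnose_consumer_groups([0, 0, 0], [8, 9, 7]))
    print(diagnose_consumer_groups([4, 3, 4], [8, 9, 7]))
    print(diagnose_consumer_groups([12, 12, 12], [0, 1, 0]))
```

In either of the first two outcomes, client logs remain the first place to look for the cause.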

Topic bytes in per second

This metric tracks the rate of data, in bytes per second, that is produced to each topic.

Topic bytes in per second metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_topic_bytes_in_per_second
Metric Type gauge
Value Type byte
Segment By Service instance, IBM Event Streams Kafka topic, Service instance name

Use this metric to monitor data ingestion at the topic level and identify trends in producer activity. Compare throughput across topics to detect imbalances and potential anomalies.

Topic bytes out per second

This metric tracks the rate of data, in bytes per second, that is consumed from each topic.

Topic bytes out per second metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_topic_bytes_out_per_second
Metric Type gauge
Value Type byte
Segment By Service instance, IBM Event Streams Kafka topic, Service instance name

Use this metric to monitor data consumption at the topic level and understand how throughput is distributed across consumers. Compare activity across topics to identify bottlenecks, imbalances, or anomalies.

Topic size

The total disk space that is currently used by the partitions of a topic. For example, if a topic has two partitions, one with 2 MB of data and one with 4 MB of data, the metric reports the size as 6 MB. Use this metric to monitor storage usage and optimize partitioning.

Topic size metric metadata
Metadata Description
Metric Name ibm_eventstreams_instance_topic_size
Metric Type gauge
Value Type none
Segment By Service instance, Service instance name, IBM Event Streams Kafka topic

Attributes for Segmentation

Global attributes

The following attributes are available for segmenting all of the listed metrics.

Global attributes
Attribute | Attribute name | Attribute description
Cloud Type | ibm_ctype | The cloud type is a value of public, dedicated, or local.
Location | ibm_location | The location of the monitored resource. This might be a region, a data center, or global.
Scope | ibm_scope | The scope is the account, organization, or space GUID that is associated with this metric.
Service name | ibm_service_name | The name of the service that generates this metric.
Service instance | ibm_service_instance | The service instance GUID identifies the instance that the metric is associated with.
Service instance name | ibm_service_instance_name | The user-provided name of the service instance. It is not necessarily unique, because it depends on the name that the user provides.
Resource group | ibm_resource_group_name | The name of the resource group where the service instance was created.

Additional attributes

The following attributes are available for segmenting one or more of the listed metrics. See the individual metrics for segmentation options.

Additional attributes
Attribute | Attribute name | Attribute description
IBM Event Streams Kafka partition | ibm_eventstreams_partition | IBM Event Streams Kafka partition.
IBM Event Streams Kafka topic | ibm_eventstreams_topic | IBM Event Streams Kafka topic.
Quantile | ibm_quantile | The quantile that is represented when a metric supports segmenting by quantile.
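The attribute names in these tables can serve as labels when a metric is segmented in a query. The following sketch assembles a PromQL-style expression from a metric name and a list of attribute names. It assumes that your Monitoring instance accepts PromQL queries and exposes the attributes as labels with exactly these names; the helper name promql_sum_by and the instance name in the example are hypothetical.

```python
def promql_sum_by(metric, labels, instance_name=None):
    """Build a `sum by (...)` expression for a metric (hypothetical helper).

    `labels` are attribute names such as ibm_eventstreams_topic. When
    `instance_name` is given, the query is restricted to that service
    instance through the ibm_service_instance_name attribute.
    """
    if not labels:
        raise ValueError("at least one label is required")
    selector = ""
    if instance_name is not None:
        selector = f'{{ibm_service_instance_name="{instance_name}"}}'
    return f"sum by ({', '.join(labels)}) ({metric}{selector})"


if __name__ == "__main__":
    print(
        promql_sum_by(
            "ibm_eventstreams_instance_topic_bytes_in_per_second",
            ["ibm_eventstreams_topic"],
            instance_name="my-instance",  # hypothetical instance name
        )
    )
```

Only attributes that are listed under Segment By for a metric, or the global attributes, are meaningful in the label list.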

For more information about enabling platform metrics from the Event Streams dashboard and viewing metrics, see Monitoring Event Streams metrics.