Monitoring in IBM Cloud

You can use IBM Cloud® Monitoring to monitor the performance and overall system health of your organization.

You can collect metrics from a number of platforms, orchestrators, and a wide range of applications, such as Prometheus, JMX, StatsD, Kubernetes, and other application stacks, whether they run in IBM Cloud, outside IBM Cloud, or on-prem. You can also add more metrics by creating custom metrics and adding integrations.

See the IBM support statement for the use of agents and exporters.

You can monitor metrics through the Monitoring web UI, or through other platforms like Grafana.

The following figure shows the components overview for the IBM Cloud Monitoring service that is running on IBM Cloud:

[Figure: IBM Cloud Monitoring component overview on the IBM Cloud]

When you configure a Monitoring agent, data for default metrics is automatically collected. These metrics include metadata that you can use to label, segment, and display metrics when you monitor them. You do not need additional instrumentation or configuration in your hosts: the agent collects these metrics automatically, which gives you insight into what is happening in the hosts.

To monitor your infrastructure, network, and applications with the IBM Cloud® Monitoring service, you can deploy Monitoring agents on supported hosts. The host determines the agent type that you can deploy. The agent type determines the metrics that are collected automatically for that host.

To start collecting default metrics, you must configure a Monitoring agent in each environment that you want to monitor.

You can monitor hosts in IBM Cloud, on-prem, and in other clouds.

Get started working with IBM Cloud Monitoring
Source	Info
IBM Cloud services	Enabling platform metrics; Working with platform metrics; Services generating metrics
Kubernetes clusters	Monitoring a Kubernetes cluster
Red Hat OpenShift clusters	Monitoring a Red Hat OpenShift cluster
VPC/VSI	Monitoring a Linux VPC server instance
Bare metal	Monitoring a Linux bare metal server
Windows environments	Monitoring a Windows environment
Linux environments	Working with the Linux agent; Deploying the agent on a Linux host with no public access
VMware Solutions	Monitoring for VMware Shared
VMware as a service	Monitoring for VMware as a service
VMware self-managed solution - vCenter Server with NSX-T architecture	Monitoring for VMware vCenter Server deployments

Provisioning the Monitoring service

To start using the IBM Cloud Monitoring service in the IBM Cloud, you must provision an instance of the Monitoring service in each region where you operate within IBM Cloud.

You provision an instance within the context of a resource group. You use a resource group to organize your services in the IBM Cloud for access control and billing purposes. You can provision the Monitoring instance in the default resource group or in a custom resource group. For more information, see Provisioning an instance.

When you provision an instance, you automatically get an access key. The access key is used to authenticate the sender of metrics from a resource to a Monitoring instance.

Configuring sources

After you provision an instance, you must configure metric sources, enable platform metrics, or both.

A metric source is any resource whose performance and health you want to monitor and control.

When you configure a source by using an agent, data for default metrics is automatically collected.

You can also configure custom metrics and add labels to those metrics to describe their characteristics. Data for these custom metrics is also automatically collected.

For more information on how to configure sources to collect default metrics and custom metrics, see Collecting metrics.

For example, you can configure a Monitoring agent to collect metrics from a Kubernetes cluster. You use the access key to configure the agent, which is responsible for collecting metric data and forwarding it to your instance. After the agent is deployed, it automatically collects pre-defined metrics and forwards them to the Monitoring instance. You can configure which metrics to monitor in an environment.

You can enable platform metrics to monitor IBM Cloud services. You can configure only 1 Monitoring instance per region to automatically collect platform metrics.
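
The limit of 1 platform metrics instance per region can be checked before you change any configuration. The following is a minimal sketch, assuming a hypothetical inventory of instances given as (name, region, platform metrics enabled) tuples. The tuple layout, the instance names, and the function name are illustrative only and are not part of any Monitoring API.

```python
from collections import Counter


def regions_with_multiple_platform_instances(instances):
    """Return the regions where more than one instance has platform metrics enabled.

    `instances` is a hypothetical inventory: an iterable of
    (instance_name, region, platform_metrics_enabled) tuples.
    """
    counts = Counter(region for _name, region, enabled in instances if enabled)
    return sorted(region for region, count in counts.items() if count > 1)


# Hypothetical inventory used only for illustration.
inventory = [
    ("monitoring-prod", "us-south", True),
    ("monitoring-dev", "us-south", True),
    ("monitoring-eu", "eu-de", True),
    ("monitoring-test", "eu-de", False),
]
conflicts = regions_with_multiple_platform_instances(inventory)
print(conflicts)
```

In this made-up inventory, two instances in us-south both claim platform metrics, so that region is reported as a conflict.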

Collecting metrics

You can collect metrics from a number of platforms, orchestrators, and a wide range of applications such as Prometheus, JMX, StatsD, Kubernetes, and other application stacks, that are available in the IBM Cloud®, outside the IBM Cloud, or on-prem. You can also add more metrics by creating custom metrics and adding integrations. For more information, see Collecting metrics.

Sending metrics

You can send metrics via the public or the private endpoints by using the appropriate ingestion URL. Details can be found in the endpoints section.
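
Choosing between the public and the private endpoint can be sketched as a small helper. The hostname pattern below is an assumption of this sketch and is not stated on this page, and the function name is hypothetical. Confirm the exact ingestion URL for your region in the endpoints section before you use it.

```python
def ingestion_host(region, private=False):
    """Build an ingestion hostname for a region.

    The hostname pattern is an ASSUMPTION of this sketch; check the
    endpoints section for the authoritative value for your region.
    """
    if not region:
        raise ValueError("region must be a non-empty string, for example 'us-south'")
    # Private endpoints are assumed to insert a 'private' label after 'ingest'.
    prefix = "ingest.private" if private else "ingest"
    return f"{prefix}.{region}.monitoring.cloud.ibm.com"


public_host = ingestion_host("us-south")
private_host = ingestion_host("us-south", private=True)
```

Keeping the choice in one function means that an agent configuration can switch between public and private networking by changing a single flag.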

Viewing metrics

You can monitor and manage metrics through the Monitoring Web UI. For more information, see Viewing metrics.

Note that there is a delay before metric data is shown for new time series. Data is not ready until the initial indexing of a new metric source is completed. Therefore, new sources that you configure, such as clusters, platform metrics, or systems, take some time to become visible through the Monitoring UI.

Sending notifications

You can configure single alerts and multi-condition alerts to notify you about problems that may require attention. When an alert is triggered, you can be notified through 1 or more notification channels. An alert definition can generate multi-channel notifications.

An alert is a notification event that you can use to warn about situations that require attention. Each alert has a severity status. This status informs you about the criticality of the information it reports on.

For example, you can set up Monitoring to send alert notifications to IBM Cloud Event Notifications.

For more information, see Working with alerts and events.

Data location

Metric data is hosted on the IBM Cloud.

Each multi-zone region (MZR) location collects and aggregates metrics for each instance of the IBM Cloud Monitoring service that runs in that location.
Data is colocated in the region where the IBM Cloud Monitoring instance is provisioned. For example, metric data for an instance that is provisioned in US South is hosted in the US South region.

Data collection

Monitoring agent data is collected at a 10-second frequency.

Data that is published by platform metrics is collected at a 1-minute frequency.
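
The two collection frequencies translate directly into the number of data points that you can expect per time series. The following is a minimal sketch of that arithmetic. The constant and function names are illustrative, and the sketch assumes that no samples are dropped.

```python
AGENT_INTERVAL_SECONDS = 10     # Monitoring agent data, as stated above
PLATFORM_INTERVAL_SECONDS = 60  # platform metrics data, as stated above


def samples_in_window(window_seconds, interval_seconds):
    """Return how many samples one time series produces in a window."""
    return window_seconds // interval_seconds


agent_per_hour = samples_in_window(3600, AGENT_INTERVAL_SECONDS)
platform_per_hour = samples_in_window(3600, PLATFORM_INTERVAL_SECONDS)
```

Under these assumptions, an agent time series produces 360 samples per hour and a platform metrics time series produces 60.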

Data retention

Data is retained for each instance based on a roll-up policy.

As time progresses, the data is rolled up from a fine granularity to a coarser one by the end of 2 months.

The roll-up policy describes the granularity of the data over time:

Data is retained at 10-second resolution for the first 4 hours.
Data is retained at 1-minute resolution for 2 days.
Data is retained at 10-minute resolution for 2 weeks.
Data is retained at 1-hour resolution for 2 months.
Data is retained at 1-day resolution for 15 months.
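
The roll-up policy above can be read as a lookup from the age of the data to the finest resolution that is still available. The following is a minimal sketch of that lookup. It assumes that a month is 30 days and that each tier boundary is inclusive; neither detail is stated in the policy, and the names are illustrative.

```python
from datetime import timedelta

# Roll-up tiers from the list above: (maximum data age, resolution retained).
# ASSUMPTION: months are approximated as 30 days.
ROLLUP_TIERS = [
    (timedelta(hours=4), timedelta(seconds=10)),
    (timedelta(days=2), timedelta(minutes=1)),
    (timedelta(weeks=2), timedelta(minutes=10)),
    (timedelta(days=60), timedelta(hours=1)),   # 2 months
    (timedelta(days=450), timedelta(days=1)),   # 15 months
]


def finest_resolution(age):
    """Return the finest resolution retained for data of the given age.

    Returns None when the data is older than the longest retention window.
    """
    for max_age, resolution in ROLLUP_TIERS:
        if age <= max_age:  # ASSUMPTION: tier boundaries are inclusive
            return resolution
    return None


three_day_old = finest_resolution(timedelta(days=3))
```

For example, data that is 3 days old has already left the 1-minute tier, so the lookup returns the 10-minute resolution.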