Default alerts reference
Cloud Databases integrates with IBM Cloud® Monitoring to provide visibility into the health and performance of your database instances. Monitoring and alerting help administrators detect issues early, respond to resource constraints, and maintain service reliability. Starting in December 2025, IBM Cloud Monitoring will automatically enable a default set of up to five critical alerts for each new and existing Cloud Databases instance when platform metrics are enabled. These alerts monitor key resource metrics such as memory usage, disk I/O, and CPU load, and are preconfigured to send notifications to the account owner's email address.
This guide explains the critical alerts installed for your managed database. For each alert, it outlines what the metric monitors, why it matters for performance and availability, and the recommended actions to take when the alert triggers. Use this reference to proactively manage capacity, prevent capacity-related or settings-related outages, and maintain resiliency.
Setting up monitoring
Default alerts are installed only if you integrate with IBM Cloud Monitoring, which gives you operational visibility into the performance and health of your applications, services, and platforms.
To begin collecting metrics for your database instance:
- Provision an instance of IBM Cloud Monitoring.
- Enable Platform metrics in the same region as your database instance.
- Access your monitoring dashboards from the IBM Cloud Monitoring section in the Cloud console under Observability.
Metrics for instances in multi-zone regions (MZRs) are available in-region. For single-zone regions (SZRs), metrics are forwarded to a designated MZR, for example che01.
For more information, see Cloud Databases IBM Cloud Monitoring integration.
Critical alerts: Benefits and response guidance
Cloud Databases provides a set of common and service-specific metrics to help you monitor performance and resource usage. Every metric supported by each database is listed in the Observability section of the documentation. For a few critical metrics, at least one alert is set up to notify you when thresholds are exceeded. These alerts are explained in the following sections. To see the full list of metrics for each database, see the ICD monitoring integration.
PostgreSQL alerts
| Alert | Condition | Explanation |
|---|---|---|
| PostgreSQL CPU usage is greater than 90%: `avg by (ibm_service_instance_name, ibm_service_instance, ibm_scope, ibm_resource) (avg_over_time (ibm_databases_for_postgresql_cpu_used_percent[10m])) > 0.9` | > 0.90 | This metric tracks CPU usage for Databases for PostgreSQL. When usage stays above 90%, the database may slow down, stall transactions, or cause application timeouts. Sustained CPU pressure is often due to inefficient queries, large workloads, or insufficient resources. Review and optimize expensive queries or scale compute resources to restore headroom. |
| PostgreSQL disk usage is greater than 80%: `max by (ibm_service_instance_name, ibm_service_instance, ibm_scope) (avg_over_time (ibm_databases_for_postgresql_disk_used_percent[10m])) > 0.8` | > 0.80 | This metric tracks the maximum disk usage across Databases for PostgreSQL instances. Above 80%, at least one instance is critically close to running out of space, risking blocked transactions and degraded performance. Expand storage, or archive or purge unused data immediately. |
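To make the shape of these conditions concrete, the following Python sketch mimics how a rule such as the CPU alert behaves: the samples in a 10-minute window are averaged, as `avg_over_time` does, and the result is compared with the threshold. The sample values and function names are hypothetical and only illustrate the idea; IBM Cloud Monitoring evaluates the actual query for you.

```python
from statistics import fmean


def avg_over_time(samples):
    """Average the samples in one lookback window (the role avg_over_time plays)."""
    return fmean(samples)


def alert_fires(samples, threshold):
    """Return True when the window average is strictly above the threshold."""
    return avg_over_time(samples) > threshold


# Hypothetical CPU readings (as a fraction of 1.0) from one 10-minute window.
brief_spike = [0.55, 0.60, 0.99, 0.58, 0.57]
sustained_load = [0.93, 0.95, 0.92, 0.96, 0.94]

# A short spike is smoothed out by the average; sustained pressure is not.
print(alert_fires(brief_spike, 0.9))
print(alert_fires(sustained_load, 0.9))
```

Averaging over the window is what keeps a single momentary spike from paging anyone, while a load that stays high for most of the window still triggers the alert.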
MongoDB alerts
| Alert | Condition | Explanation |
|---|---|---|
| MongoDB CPU usage is greater than 90%: `avg by (ibm_service_instance_name, ibm_service_instance, ibm_scope, ibm_resource) (avg_over_time (ibm_databases_for_mongodb_cpu_used_percent[10m])) > 0.9` | > 0.90 | CPU usage above 90% in Databases for MongoDB signals heavy query load or insufficient capacity. Sustained pressure impacts replication lag, write throughput, and query latency. Review slow queries with profiling tools, shard or index data where needed, or scale the instance’s CPU resources. |
| MongoDB disk usage is greater than 90%: `max by (ibm_service_instance_name, ibm_service_instance, ibm_scope) (avg_over_time (ibm_databases_for_mongodb_disk_used_percent[10m])) > 0.9` | > 0.90 | This metric tracks the maximum Databases for MongoDB disk usage across instances. Above 90%, journaling, replication, and storage engine operations may fail. Databases for MongoDB requires free space for internal writes and recovery operations. Expand storage, or archive or purge unused collections, to prevent write failures. |
| MongoDB connection count is greater than 1000: `sum by (ibm_service_instance_name, ibm_service_instance, ibm_scope) (avg_over_time (ibm_databases_for_mongodb_connections[10m])) > 1000` | > 1000 | This metric shows active client connections to Databases for MongoDB. Surpassing 1,000 connections may overwhelm available resources, leading to errors or degraded performance. Connection surges often come from unpooled apps or misbehaving clients. Implement connection pooling and, if needed, scale the instance to handle demand. |
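The three MongoDB conditions differ mainly in how series are combined per instance: `avg by` for CPU, `max by` for disk, and `sum by` for connections. The sketch below illustrates that grouping step in Python. The instance name `orders-db`, the `member` label, and the values are hypothetical, chosen only to show why a summed connection count can cross 1,000 even when no single series does.

```python
from collections import defaultdict


def aggregate_by(series, label_names, reducer):
    """Group series by the chosen labels, then reduce each group's values."""
    groups = defaultdict(list)
    for labels, value in series:
        key = tuple(labels[name] for name in label_names)
        groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


# Hypothetical 10-minute connection averages, one series per database member.
members = [
    ({"ibm_service_instance_name": "orders-db", "member": "m-0"}, 420.0),
    ({"ibm_service_instance_name": "orders-db", "member": "m-1"}, 390.0),
    ({"ibm_service_instance_name": "orders-db", "member": "m-2"}, 310.0),
]

# "sum by" adds the members together; "max by" keeps only the busiest one.
totals = aggregate_by(members, ["ibm_service_instance_name"], sum)
peaks = aggregate_by(members, ["ibm_service_instance_name"], max)

print(totals[("orders-db",)] > 1000)  # the summed count crosses the threshold
print(peaks[("orders-db",)] > 1000)   # no individual member does
```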
MySQL alerts
| Alert | Condition | Explanation |
|---|---|---|
| MySQL CPU usage is greater than 95%: `avg by (ibm_service_instance_name, ibm_service_instance, ibm_scope, ibm_resource) (avg_over_time (ibm_databases_for_mysql_cpu_used_percent[10m])) > 0.95` | > 0.95 | Databases for MySQL CPU above 95% indicates that the system is overloaded with queries or background processes. This can delay transactions and degrade application performance. Tune inefficient queries (for example, by reviewing EXPLAIN plans) or scale compute capacity to handle demand. |
| MySQL disk usage is greater than 90%: `max by (ibm_service_instance_name, ibm_service_instance, ibm_scope) (avg_over_time (ibm_databases_for_mysql_disk_used_percent[10m])) > 0.9` | > 0.90 | Maximum disk usage exceeding 90% indicates that at least one Databases for MySQL instance is critically close to running out of space. This can halt transactions and degrade stability. Add storage immediately, or purge or archive unused tables to reduce pressure. |
| MySQL connection count is above 95% of total available: `avg by (ibm_service_instance_name, ibm_service_instance, ibm_scope) (avg_over_time (ibm_databases_for_mysql_connection_used_percent[10m])) > 0.95` | > 0.95 | This metric tracks the percentage of used Databases for MySQL connections. When it reaches 100%, new clients are blocked, leading to connection errors. When connection usage exceeds 95%, increase max_connections cautiously or adopt connection pooling to avoid overload. |
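As a rough illustration of the connection alert, the sketch below assumes that the used-connections metric is the ratio of open connections to the configured `max_connections`; that interpretation, the function name, and the numbers are assumptions made for this example, not a statement of how the metric is computed internally.

```python
def connection_used_fraction(open_connections, max_connections):
    """Fraction of the connection limit in use (assumed meaning of the metric)."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return open_connections / max_connections


# Hypothetical instance with max_connections set to 200.
limit = 200
for open_connections in (150, 190, 192):
    used = connection_used_fraction(open_connections, limit)
    remaining = limit - open_connections
    # The default alert condition is a strict "> 0.95" comparison.
    print(open_connections, round(used, 3), used > 0.95, remaining)
```

Under that assumption, an instance limited to 200 connections has fewer than 10 free slots by the time the alert fires, which is why pooling or a careful increase of max_connections is worth planning before the limit is reached.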
Elasticsearch alerts
| Alert | Condition | Explanation |
|---|---|---|
| Elasticsearch CPU usage is greater than 95%: `avg by (ibm_service_instance_name, ibm_service_instance, ibm_scope, ibm_resource) (avg_over_time (ibm_databases_for_elasticsearch_cpu_used_percent[10m])) > 0.95` | > 0.95 | Databases for Elasticsearch CPU usage above 95% affects indexing, queries, and cluster responsiveness. Sustained overload risks node instability. Optimize queries, reduce shard counts, or scale compute resources. |
| Elasticsearch cluster status is red: `avg by (ibm_service_instance_name, ibm_service_instance, ibm_scope) (avg_over_time (ibm_databases_for_elasticsearch_cluster_status[10m])) == 0` | = 0 | Cluster status = 0 indicates that Databases for Elasticsearch is red, meaning primary shards are missing or unassigned. This poses a risk of data loss. Check node health, ensure sufficient disk space, and reallocate shards. |
| Elasticsearch disk usage is greater than 80%: `max by (ibm_service_instance_name, ibm_service_instance, ibm_scope) (avg_over_time (ibm_databases_for_elasticsearch_disk_used_percent[10m])) > 0.8` | > 0.80 | Databases for Elasticsearch disk usage above 80% prevents new indices or replicas from being created and risks cluster instability. Free space is vital for shard balancing and merging. Expand storage, or delete or archive old indices. |
| Elasticsearch JVM heap usage is greater than 95%: `avg by (ibm_service_instance_name, ibm_service_instance, ibm_scope, ibm_resource) (avg_over_time (ibm_databases_for_elasticsearch_jvm_heap_percent[10m])) > 95` | > 95 | JVM heap above 95% in Databases for Elasticsearch indicates garbage collection pressure and risk of node crashes. Increase heap size cautiously, optimize queries, or scale the cluster to distribute load. |
Redis alerts
| Alert | Condition | Explanation |
|---|---|---|
| Redis memory usage is greater than 85%: `max by (ibm_service_instance_name, ibm_service_instance, ibm_scope) (avg_over_time (ibm_databases_for_redis_memory_used_percent[10m])) > 0.85` | > 0.85 | Databases for Redis is memory-driven, and usage above 85% risks forced key evictions or out-of-memory (OOM) errors. High memory pressure can cause unpredictable data loss if eviction policies are triggered. Scale the memory allocation or enforce TTL and eviction policies aligned with application needs. |
| Redis disk usage is greater than 80%: `max by (ibm_service_instance_name, ibm_service_instance, ibm_scope) (avg_over_time (ibm_databases_for_redis_disk_used_percent[10m])) > 0.80` | > 0.80 | Databases for Redis persistence relies on disk space for snapshots and AOF logs. Above 80% usage, data persistence may fail, risking durability. Expand storage capacity or clean up unnecessary keys and backups. |
| Redis connection count is greater than 9500: `avg by (ibm_service_instance_name, ibm_service_instance, ibm_scope, ibm_resource) (avg_over_time (ibm_databases_for_redis_connected_clients[10m])) > 9500` | > 9500 | This metric measures the number of connected Databases for Redis clients. Surpassing 9,500 can overwhelm networking resources, slow responses, or cause dropped connections. Ensure efficient client pooling and scale Databases for Redis instances if the workload requires more connections. |
RabbitMQ alerts
| Alert | Condition | Explanation |
|---|---|---|
| RabbitMQ CPU usage is greater than 95%: `avg by (ibm_service_instance_name, ibm_service_instance, ibm_scope, ibm_resource) (avg_over_time (ibm_messages_for_rabbitmq_cpu_used_percent[10m])) > 0.95` | > 0.95 | Messages for RabbitMQ CPU above 95% suggests that the broker is overloaded by message throughput or routing. Sustained CPU saturation risks slowdowns or dropped messages. Scale compute or optimize routing and queues. |
| `avg(avg (ibm_messages_for_rabbitmq_disk_used_percent))` | > 0.85 | Messages for RabbitMQ relies on disk for message durability. Above 85% usage, queues may block publishers or lose messages. Expand disk capacity or clear unused queues. |
| RabbitMQ disk usage is greater than 85%: `max by (ibm_service_instance_name, ibm_service_instance, ibm_scope) (avg_over_time (ibm_messages_for_rabbitmq_disk_used_percent[10m])) > 0.85` | > 0.85 | Maximum Messages for RabbitMQ disk usage over 85% indicates that some nodes are nearly full, risking message persistence failures. Add disk capacity or purge old or unconsumed queues immediately. |
Configure alerts
You can modify, test, silence, or delete individual alerts. In addition, you can disable Cloud Databases default alerts per database or as a whole in your IBM Cloud Monitoring dashboard, under Alerts in the left navigation panel. You can also customize alert thresholds for your workloads and explore the full Alerts library, which offers preconfigured alerts and best practices for deeper insights and proactive monitoring.
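When you plan threshold changes, it can help to record the defaults next to your intended overrides before editing anything in the dashboard. The following Python sketch is only a local planning aid, not an IBM Cloud Monitoring API: the dictionary layout and the `with_overrides` helper are hypothetical, and the default values are copied from the MongoDB and Redis tables in this reference.

```python
import copy

# Default thresholds copied from the MongoDB and Redis tables in this reference.
DEFAULT_THRESHOLDS = {
    "mongodb": {"cpu_used_percent": 0.90, "disk_used_percent": 0.90, "connections": 1000},
    "redis": {"memory_used_percent": 0.85, "disk_used_percent": 0.80, "connected_clients": 9500},
}


def with_overrides(defaults, overrides):
    """Return a copy of the defaults with per-service overrides applied."""
    merged = copy.deepcopy(defaults)
    for service, metrics in overrides.items():
        for metric, value in metrics.items():
            # Reject typos: only metrics that have a default alert can be overridden.
            if metric not in merged.get(service, {}):
                raise KeyError(f"no default alert for {service}/{metric}")
            merged[service][metric] = value
    return merged


# Hypothetical plan: alert earlier on Redis memory for a cache-heavy workload.
planned = with_overrides(DEFAULT_THRESHOLDS, {"redis": {"memory_used_percent": 0.75}})
print(planned["redis"])
```

Keeping the defaults untouched and deriving a separate plan makes it easy to see which alerts deviate from the shipped values when you later review them.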
Next steps
Default alerts for Cloud Databases cover only critical conditions, chosen based on proven patterns observed across enterprise-scale deployments. For most customers who use their databases effectively, no additional notifications are routed to their inbox. To ensure that you receive these critical notifications, verify that your notification channels are correctly configured, adding and managing multiple notification channels where needed. For instructions, see Working with notification channels.