Performance

IBM Cloud® Messages for RabbitMQ deployments can be manually scaled to fit your usage or configured to autoscale under certain resource conditions. When you tune the performance of your deployment, consider a few factors.

Monitoring your deployment

Messages for RabbitMQ deployments offer an integration with the IBM Cloud® Monitoring service for basic monitoring of resource usage on your deployment. Many of the available metrics, like disk usage and IOPS, are presented to help you configure autoscaling on your deployment. Observing trends in your usage and configuring autoscaling to respond to them can help alleviate performance problems before your deployment becomes unstable due to resource exhaustion.

RabbitMQ Memory Usage

RabbitMQ provides a detailed breakdown of memory usage, which gives you insight into how memory is allocated and used in your deployment. Most notably, connections, queue mirrors, and accumulated messages all use memory. If your use case calls for many open connections at a time, consider increasing memory. Likewise, if you have queues that contain only transient messages that don't need replication, you can reduce memory usage by adjusting their mirroring policy. Be aware that a mirroring policy with fewer or no mirrors means those messages can be deleted when a node restarts, and they are gone forever.
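For example, here is a minimal sketch of adjusting a queue mirroring policy through the RabbitMQ management HTTP API with Python and the requests library. The hostname, port, credentials, policy name, and queue-name pattern are placeholders, and the policy values are only an illustration; choose values that match your own replication needs.

```python
import requests

# Placeholder connection details for the RabbitMQ management HTTP API.
ADMIN_URL = "https://your-deployment-hostname:15671"
AUTH = ("admin", "your-password")

# Example policy: mirror matching queues to exactly 2 nodes instead of all nodes.
# "%2F" in the URL is the URL-encoded default vhost "/".
policy = {
    "pattern": "^transient\\.",  # queues whose names start with "transient."
    "definition": {"ha-mode": "exactly", "ha-params": 2, "ha-sync-mode": "automatic"},
    "apply-to": "queues",
    "priority": 0,
}

resp = requests.put(
    f"{ADMIN_URL}/api/policies/%2F/transient-low-mirroring",
    json=policy,
    auth=AUTH,
)
resp.raise_for_status()
```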

Occasionally, RabbitMQ can experience memory spikes. Specifically, with Messages for RabbitMQ deployments, updates and maintenance in which a node is restarted or replaced cause memory usage to increase while the restarted or new node resyncs. If your RabbitMQ consistently uses a high percentage of its available memory, one of these spikes can run your deployment out of memory and cause it to crash. It is a good idea to scale your memory so that it can accommodate resyncing a node.

RabbitMQ Memory Alarms

By default, when the RabbitMQ server uses more than 40% of the available RAM, it raises a memory alarm and blocks incoming messages from publishers. The memory alarm is cleared when consumers use enough of the messages or the messages are moved to disk. Once the memory alarm is cleared, normal service resumes and the publishers are unblocked. Note that the alarm does not prevent the RabbitMQ server from using more than 40% of the allocated memory; it is merely the point at which publishers are throttled. For more information, see the RabbitMQ documentation.

RabbitMQ Disk Alarms

By default, when the RabbitMQ server detects that free disk space has dropped below a certain threshold, it raises a disk alarm. The threshold for Messages for RabbitMQ is 80% of your deployment's disk size. The alarm blocks incoming messages from publishers and prevents messages in memory from being written to disk. The alarm is cluster-wide, so if disk space on one node gets too low, publishers are blocked on all nodes. To clear the alarm, either consume messages that are already written to disk so that their space is reclaimed, or scale your deployment to a larger disk size.

More information about disk alarms can be found in the RabbitMQ documentation.

Disk IOPS

The number of input/output operations per second (IOPS) is limited by the type of storage volume that is being used. Storage volumes for Messages for RabbitMQ deployments are provisioned on Block Storage Endurance Volumes in the 10 IOPS per GB tier, so, for example, a deployment with 100 GB of disk has 1,000 IOPS available. IOPS limits can affect RabbitMQ message throughput and storage operations. Reaching these limits can cause the disk to fall behind on reclaiming space after messages are consumed, leading to disk alarms and publisher throttling until activity slows down. You can increase the number of IOPS available to your deployment by increasing disk space.

Quorum Queues

High availability can be managed with quorum queues. Using quorum queues affects performance; they need more memory and disk space for the write-ahead log (WAL) that they use to maintain state, and more disk I/O because they persist all data on disk. If you have implemented quorum queues, or are considering them, the RabbitMQ documentation has a good write-up of their effect on both resource use and performance.
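If you are adopting quorum queues, the sketch below shows how a queue can be declared as a quorum queue with the Python pika driver by setting the x-queue-type argument. The connection URL, credentials, and queue name are placeholders.

```python
import pika

# Placeholder connection details; adjust the URL and credentials for your deployment.
params = pika.URLParameters("amqps://user:password@your-deployment-hostname:5671/%2F")

connection = pika.BlockingConnection(params)
channel = connection.channel()

# Declaring a queue with x-queue-type=quorum creates a quorum queue.
# Quorum queues must be durable and cannot be exclusive or auto-delete.
channel.queue_declare(
    queue="orders",
    durable=True,
    arguments={"x-queue-type": "quorum"},
)

connection.close()
```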

RabbitMQ Alarm Monitoring

When a disk or memory alarm is triggered, RabbitMQ emits a connection.blocked notification to publishing connections. Many drivers support the protocol extension necessary to catch the notification, so you can design your application to respond to RabbitMQ alarms.
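As an illustration, here is a minimal sketch with the Python pika driver (assuming pika 1.x) that registers callbacks for the blocked and unblocked notifications. The connection URL and credentials are placeholders, and other drivers expose equivalent hooks under different names.

```python
import pika

# Placeholder connection details; adjust the URL and credentials for your deployment.
params = pika.URLParameters("amqps://user:password@your-deployment-hostname:5671/%2F")
connection = pika.BlockingConnection(params)

def on_blocked(conn, method_frame):
    # Called when RabbitMQ sends connection.blocked because of a memory or disk alarm.
    # Pause or buffer publishing here until the alarm clears.
    print("Publishing blocked:", method_frame.method.reason)

def on_unblocked(conn, method_frame):
    # Called when the alarm clears and RabbitMQ sends connection.unblocked.
    print("Publishing unblocked")

connection.add_on_connection_blocked_callback(on_blocked)
connection.add_on_connection_unblocked_callback(on_unblocked)
```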

You can also monitor alarms from the RabbitMQ HTTP API. Use the GET /api/nodes endpoint, and look for mem_alarm and disk_free_alarm in the response.

For a closer look at memory alarms, you can retrieve a single node's memory breakdown by using the GET /api/nodes/{node}/memory endpoint.
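For example, here is a minimal sketch that queries both endpoints with Python and the requests library; the hostname, port, and credentials are placeholders.

```python
import requests

ADMIN_URL = "https://your-deployment-hostname:15671"  # placeholder
AUTH = ("admin", "your-password")                     # placeholder

# Check every node for raised memory or disk alarms.
nodes = requests.get(f"{ADMIN_URL}/api/nodes", auth=AUTH).json()
for node in nodes:
    print(node["name"], "mem_alarm:", node["mem_alarm"],
          "disk_free_alarm:", node["disk_free_alarm"])

# Drill into a single node's memory breakdown.
node_name = nodes[0]["name"]
memory = requests.get(f"{ADMIN_URL}/api/nodes/{node_name}/memory", auth=AUTH).json()
print(memory["memory"])
```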

Standard Health Checks

The RabbitMQ HTTP API provides a couple of health check endpoints to verify the state of the RabbitMQ nodes in your deployment.
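For example, on RabbitMQ versions that expose health checks under /api/health/checks (3.8 and later), you can query the cluster-wide and node-local alarm checks as sketched below; the hostname, port, and credentials are placeholders.

```python
import requests

ADMIN_URL = "https://your-deployment-hostname:15671"  # placeholder
AUTH = ("admin", "your-password")                     # placeholder

# Returns HTTP 200 when no alarms are raised anywhere in the cluster,
# or HTTP 503 with details when a memory or disk alarm is active.
resp = requests.get(f"{ADMIN_URL}/api/health/checks/alarms", auth=AUTH)
print(resp.status_code, resp.json())

# The same check, limited to the node that serves the request.
resp = requests.get(f"{ADMIN_URL}/api/health/checks/local-alarms", auth=AUTH)
print(resp.status_code, resp.json())
```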

Health checks consume system resources. On smaller, less busy deployments, a health check returns quickly. Larger deployments, or deployments under load, can take longer to return results.