Monitoring an IBM Cloudant cluster

A key part of ensuring best performance, or troubleshooting any problems, is monitoring the affected system.

The monitoring API is only available to IBM® Cloudant® for IBM Cloud® Enterprise customers with dedicated clusters and not to IBM Cloud® Public customers.

You want to be able to answer the question:

In what way has the system behavior changed as a result of any configuration or application modifications?

To answer the question, you need data, which comes from monitoring the system. To monitor the system while it replicates, use the _active_tasks endpoint, which is described in more detail in the Active tasks documentation.

For more detailed system information, use the cluster monitoring API.

Monitoring metrics overview

When you monitor the cluster, you obtain data about how it is performing. Details such as the number of HTTP requests or the number of documents processed per second are available through the monitoring API. Only an administrative user can invoke the API, and each request is directed to a specific monitoring endpoint.

For example, if you wanted to monitor the number of documents processed by a map function each second, you would direct the request to the map_doc endpoint.

For more information, see the full list of available endpoints in the Monitoring endpoints section.

The data is returned in JSON format by default. You can specify a raw format if you prefer.

Syntax of the monitoring request

All requests to the monitoring API have the following form:

curl -u $ADMIN_USER "https://$ADMIN_USER.cloudant.com/_api/v2/monitoring/$END_POINT?cluster=$CLUSTER[&format=(json|raw)]"

The fields are described in the following table:

Monitoring API request fields
Field	Meaning
`ADMIN_USER`	The account name. The account must have administrative privileges.
`CLUSTER`	The cluster that you are interested in.
`DURATION`	The duration of the time series query. Select one of the following time intervals: `["5min", "30min", "1h", "12h", "24h", "1d", "3d", "7d", "1w", "1m", "3m", "6m", "12m", "1y"]`. `DURATION` must be paired with either `START` or `END`.
`END`	UTC timestamp, in ISO-8601 format or as UTC epoch seconds, that specifies the end of a time series query. A query cannot have `START` and `END` set to the same time, or `END` set earlier than `START`.
`END_POINT`	The aspect of the cluster you want to monitor.
`START`	UTC timestamp, in ISO-8601 format or as integer seconds since the epoch, that specifies the starting point of a time series query that is mutually exclusive with `END`.

Several of the fields have default values:

Default values for monitoring API request fields
Field	Default value
`DURATION`	5 minutes
`END`	No default value
`START`	The current time
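As a minimal sketch of how a request URL of the form shown above could be assembled programmatically, the following Python function builds it from its parts. The function name `build_monitoring_url` and its argument names are illustrative assumptions, not part of the API. Only the `cluster`, `format`, and `node` query parameters that appear in this document's examples are used.

```python
from urllib.parse import quote, urlencode


def build_monitoring_url(account, endpoint, cluster, fmt="json", node=None):
    """Build a monitoring API URL (hypothetical helper, not part of the API)."""
    if fmt not in ("json", "raw"):
        raise ValueError("format must be 'json' or 'raw'")
    params = {"cluster": cluster, "format": fmt}
    if node is not None:
        # The connections and network examples pass the load balancer as `node`.
        params["node"] = node
    # Keep "/" unescaped so endpoints such as rate/status_code stay intact.
    path = quote(endpoint, safe="/")
    return (
        f"https://{account}.cloudant.com/_api/v2/monitoring/{path}"
        f"?{urlencode(params)}"
    )


if __name__ == "__main__":
    print(build_monitoring_url("myaccount", "disk_use", "myclustername"))
```

The resulting string can be passed to curl or to any HTTP client, together with the administrative credentials.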

Results format

By default, the monitoring results are returned in JSON format. If you prefer, you can choose to receive the results in raw format.

The results include a text string that identifies the metric that is stored on the server that provides the API capability, for example:

sumSeries(net.cloudant.mycustomer001.db*.df.srv.used)

The results include cluster-level data.

IBM Cloudant stores the queried data at the following resolutions: 10 seconds for the past 24 hours; 1 minute for the past 7 days; and 1 hour for the past 2 years. As a result, and to ensure that IBM Cloudant always stores the higher resolution interval length, deltas on the boundary of these resolutions are trimmed by one interval's length.
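To make the resolution tiers concrete, the following sketch maps the age of a data point to the interval at which it would be stored, based only on the three tiers listed above. The function name is hypothetical, two years is approximated as 730 days, and the trimming at tier boundaries is not modelled.

```python
DAY = 24 * 60 * 60  # seconds in one day

# (maximum age in seconds, resolution in seconds), finest tier first.
# Two years is approximated as 730 days, an assumption of this sketch.
RESOLUTIONS = [
    (1 * DAY, 10),
    (7 * DAY, 60),
    (730 * DAY, 3600),
]


def expected_step_seconds(age_seconds):
    """Return the stored resolution for data of this age, or None if older than all tiers."""
    if age_seconds < 0:
        raise ValueError("age_seconds must not be negative")
    for max_age, step in RESOLUTIONS:
        if age_seconds <= max_age:
            return step
    return None


if __name__ == "__main__":
    for age in (3600, 2 * DAY, 30 * DAY):
        print(age, expected_step_seconds(age))
```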

With `format=json` (default)

Unless you specify otherwise, the metric data is returned in JSON format. Each entry in a `datapoints` array is a [datapoint, timestamp] pair, where the timestamp is expressed in UTC epoch seconds.

See an example monitoring request for disk use data returned in JSON format:

curl "https://$ACCOUNT.cloudant.com/_api/v2/monitoring/disk_use?cluster=myclustername&format=json"

See an example result after you request disk use data in JSON format:

[
	{
		"target": "sumSeries(net.cloudant.mycustomer001.db*.df.srv.used)",
		"datapoints": [
			[523562172416.0, 1391019360],
			[524413976576.0, 1391019420],
			[519036682240.0, 1391019480],
			[518762102784.0, 1391019540],
			[523719393280.0, 1391019600]
		]
	},
	{
		"target": "sumSeries(net.cloudant.mycustomer001.db*.df.srv.free)",
		"datapoints": [
			[6488926978048.0, 1391019360],
			[6487768301568.0, 1391019420],
			[6493145661440.0, 1391019480],
			[6493420257280.0, 1391019540],
			[4330660167680.0, 1391019600]
		]
	}
]
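One way to consume a result of this shape is sketched below: the function returns the most recent non-null pair for each target. The helper name `latest_values` is an assumption for illustration, and the sample is an excerpt of the example result above.

```python
import json

# Excerpt of the example disk use result shown above.
SAMPLE = """
[
  {"target": "sumSeries(net.cloudant.mycustomer001.db*.df.srv.used)",
   "datapoints": [[518762102784.0, 1391019540], [523719393280.0, 1391019600]]},
  {"target": "sumSeries(net.cloudant.mycustomer001.db*.df.srv.free)",
   "datapoints": [[6493420257280.0, 1391019540], [4330660167680.0, 1391019600]]}
]
"""


def latest_values(series_list):
    """Map each target to its most recent non-null (value, timestamp) pair."""
    latest = {}
    for series in series_list:
        points = [(value, ts) for value, ts in series["datapoints"] if value is not None]
        # Select by timestamp rather than by position, in case the order differs.
        latest[series["target"]] = max(points, key=lambda p: p[1]) if points else None
    return latest


if __name__ == "__main__":
    for target, point in latest_values(json.loads(SAMPLE)).items():
        print(target, point)
```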

With `format=raw`

The raw format data contains a series of text strings, identifying the name of the metric and associated values.

The text string (for example sumSeries(net.cloudant.mycustomer001.db*.df.srv.used)) is the name of the metric. The next two numbers are the start and end times, expressed as UTC epoch seconds. The final number is the step size in seconds.

The numbers after the | character contain the metric data that is obtained from your chosen endpoint. For example, requesting metric data from the disk use endpoint returns the output from a df command, with the disk use expressed as bytes stored.

See an example monitoring request for disk use data returned in raw format:

curl "https://$ACCOUNT.cloudant.com/_api/v2/monitoring/disk_use?cluster=myclustername&format=raw"

See an example result after you request disk use data in raw format:

sumSeries(net.cloudant.mycustomer001.db*.df.srv.used),1391019780,1391020080,60|344708448256.0,345318227968.0,346120126464.0,346716471296.0,175483256832.0
sumSeries(net.cloudant.mycustomer001.db*.df.srv.free),1391019780,1391020080,60|6.49070326579e+12,6.4896982057e+12,6.48884414054e+12,6.48801589658e+12,4.32277107507e+12
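The layout described above (metric name, start, end, step, then the values after the | character) can be parsed with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, each value is assumed to belong to the timestamp `start + i * step`, and a missing value is assumed to appear as the literal text `None`, which this document does not confirm.

```python
def parse_raw_line(line):
    """Parse one line of raw-format output into a dictionary (illustrative helper)."""
    header, _, values_part = line.strip().partition("|")
    # Split from the right so that any commas inside the metric name are preserved.
    name, start, end, step = header.rsplit(",", 3)
    start, end, step = int(start), int(end), int(step)
    values = [
        None if text == "None" else float(text)  # "None" marker is an assumption
        for text in values_part.split(",")
        if text
    ]
    # Pair each value with its timestamp, in the same order as the JSON format.
    datapoints = [(value, start + i * step) for i, value in enumerate(values)]
    return {"target": name, "start": start, "end": end, "step": step,
            "datapoints": datapoints}


if __name__ == "__main__":
    example = ("sumSeries(net.cloudant.mycustomer001.db*.df.srv.used),"
               "1391019780,1391020080,60|344708448256.0,345318227968.0")
    print(parse_raw_line(example))
```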

Monitoring endpoints

To list all of the currently supported monitoring endpoints, make a request to the monitoring endpoint.

The following table lists the supported monitoring endpoints that are provided by the API:

Monitoring API endpoints
Endpoint	Description
`connections`	The status of multiple load balancer connections.
`disk_use`	The disk use, as measured by a `df` command.
`kv_emits`	The number of `key:value` emits per second.
`map_doc`	The number of documents processed by a map function, per second.
`network`	The octets that are received and transmitted.
`rate/status_code`	The rate of requests, which are grouped by status code.
`rate/verb`	The rate of requests, which are grouped by HTTP verb.
`rps`	The number of reads per second.
`wps`	The number of writes per second.

See an example showing how to obtain a list of the currently supported monitoring endpoints:

curl "https://$ACCOUNT.cloudant.com/_api/v2/monitoring"

See an example response that lists the available monitoring endpoints:

{
    "targets": [
        "node_disk_free_srv",
        "rps",
        "kv_emits",
        "smoosh_channels/slack_dbs",
        "smoosh_channels/upgrade_dbs",
        "smoosh_channels/ratio_dbs",
        "smoosh_channels/ratio_views",
        "smoosh_channels/slack_views",
        "smoosh_channels/upgrade_views",
        "uptime",
        "map_doc",
        "wps",
        "node_peak_cpu",
        "rate/status_code",
        "rate/verb",
        "disk_use",
        "node_mean_cpu",
        "memory",
        "os_proc_count",
        "run_queue",
        "node_disk_use_srv",
        "process_count",
        "response_time"
    ]
}
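Because the list returned by a cluster can differ from the table above, a client might inspect the response before it requests a metric. The following sketch separates top-level endpoints from grouped ones such as `rate/verb`. The function name is an assumption, and the sample is an excerpt of the example response.

```python
def split_targets(response):
    """Separate top-level endpoints from grouped ones such as rate/verb."""
    top_level, grouped = [], {}
    for target in response["targets"]:
        prefix, sep, name = target.partition("/")
        if sep:
            # "rate/verb" is recorded as group "rate" with member "verb".
            grouped.setdefault(prefix, []).append(name)
        else:
            top_level.append(target)
    return top_level, grouped


if __name__ == "__main__":
    sample = {"targets": ["rps", "rate/status_code", "rate/verb", "disk_use"]}
    top_level, grouped = split_targets(sample)
    print(top_level)
    print(grouped)
```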

Examples of monitoring requests

`connections`

See an example of a connections monitoring request:

curl "https://$ACCOUNT.cloudant.com/_api/v2/monitoring/connections?cluster=myclustername&node=myloadbalancername&format=json"

The response includes a data series for the following connection states:

{
  "end": 1512989500,
  "start": 1512989170,
  "target_responses": [
     {
       "datapoints": [
            [
              0,
              1512989170
            ]
        ],
        "target": "myclustername.myloadbalancername Connections CLOSED"
      },
      { 
        "datapoints": [
            [
              19,
              1512989170
            ]
        ],
        "target": "myclustername.myloadbalancername Connections CLOSE_WAIT"
      },
      {
        "datapoints": [
            [
              2,
              1512989170
            ]
        ],
        "target": "myclustername.myloadbalancername Connections CLOSING"
      },
      {
        "datapoints": [
            [
              280,
              1512989170
            ]
        ],
        "target": "myclustername.myloadbalancername Connections ESTABLISHED"
      },
      {
        "datapoints": [
            [
              7,
              1512989170
            ]
        ],
        "target": "myclustername.myloadbalancername Connections FIN_WAIT1"
      },
      { 
        "datapoints": [
            [
              0,
              1512989170
            ]
        ],
        "target": "myclustername.myloadbalancername Connections FIN_WAIT2"
      },
      {
        "datapoints": [
            [
              0,
              1512989170
            ]
        ],
        "target": "myclustername.myloadbalancername Connections LAST_ACK"
      },
      {
        "datapoints": [
            [
              4,
              1512989170
            ]
        ],
        "target": "myclustername.myloadbalancername Connections LISTEN"
      },
      {
        "datapoints": [
            [
              1,
              1512989170
            ]
        ],
        "target": "myclustername.myloadbalancername Connections SYN_RECV"
      },
      {
        "datapoints": [
            [
              0,
              1512989170
            ]
        ],
        "target": "myclustername.myloadbalancername Connections SYN_SENT"
      },
      {
        "datapoints": [
            [
              28,
              1512989170
            ]
        ],
        "target": "myclustername.myloadbalancername Connections TIME_WAIT"
      }
   ]
}

You must explicitly specify the load balancer in the request by using the node parameter, as the example request shows.
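A response of this shape can be reduced to one count for each connection state. The sketch below assumes that the state is always the last word of the `target` string, as it is in the example response. The function name is hypothetical.

```python
def connection_counts(response):
    """Return {state: latest non-null count} from a connections response."""
    counts = {}
    for series in response["target_responses"]:
        # Assumption: the state name is the last space-separated word of the target.
        state = series["target"].rsplit(" ", 1)[-1]
        values = [value for value, _ts in series["datapoints"] if value is not None]
        counts[state] = values[-1] if values else None
    return counts


if __name__ == "__main__":
    sample = {
        "target_responses": [
            {"datapoints": [[280, 1512989170]],
             "target": "myclustername.myloadbalancername Connections ESTABLISHED"},
            {"datapoints": [[28, 1512989170]],
             "target": "myclustername.myloadbalancername Connections TIME_WAIT"},
        ]
    }
    print(connection_counts(sample))
```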

`disk_use`

See an example of a disk_use monitoring request:

curl "https://$ACCOUNT.cloudant.com/_api/v2/monitoring/disk_use?cluster=myclustername&format=json"

See example results (abbreviated) from a disk_use monitoring request:

{
	"start": 1435935759,
	"end": 1435936059,
	"target_responses": [
		{
			"target": "myclustername Used disk space (bytes)",
			"datapoints": [
				[
					6855438336.0,
					1435935780
				],
				[
					null,
					1435935795
				],
				[
					null,
					1435935810
				],
				...
				[
					null,
					1435936065
				]
			]
		},
		{
			"target": "myclustername Free disk space (bytes)",
			"datapoints": [
				[
					7141069422592.0,
					1435935780
				],
				[
					null,
					1435935795
				],
				[
					null,
					1435935810
				],
				...
				[
					null,
					1435936065
				]
			]
		}
	]
}
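The used and free series can be combined into a single percentage. The following sketch takes the latest non-null value of each series and assumes that the target labels contain the phrases "Used disk space" and "Free disk space", as in the example results. The function name is an illustrative assumption.

```python
def disk_used_percent(response):
    """Return used space as a percentage of used + free, or None if data is missing."""
    used = free = None
    for series in response["target_responses"]:
        values = [value for value, _ts in series["datapoints"] if value is not None]
        if not values:
            continue  # every datapoint in this series was null
        # Assumption: the labels match those in the example results.
        if "Used disk space" in series["target"]:
            used = values[-1]
        elif "Free disk space" in series["target"]:
            free = values[-1]
    if used is None or free is None or used + free == 0:
        return None
    return 100.0 * used / (used + free)


if __name__ == "__main__":
    sample = {
        "target_responses": [
            {"target": "myclustername Used disk space (bytes)",
             "datapoints": [[6855438336.0, 1435935780], [None, 1435935795]]},
            {"target": "myclustername Free disk space (bytes)",
             "datapoints": [[7141069422592.0, 1435935780], [None, 1435935795]]},
        ]
    }
    print(disk_used_percent(sample))
```

Null datapoints are skipped because, as the example results show, a series can contain many of them.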

`kv_emits`

See an example of a kv_emits monitoring request:

curl "https://$ACCOUNT.cloudant.com/_api/v2/monitoring/kv_emits?cluster=myclustername&format=json"

See example results (abbreviated) from a kv_emits monitoring request:

{
	"start": 1436194248,
	"end": 1436194548,
	"target_responses": [
		{
			"target": "myclustername Key:value pairs emitted per second from map functions",
			"datapoints": [
				[
					0.0,
					1436194230
				],
				[
					0.0,
					1436194245
				],
				[
					0.8000000000001819,
					1436194260
				],
				...
				[
					null,
					1436194515
				]
			]
		}
	]
}

`map_doc`

See an example of a map_doc monitoring request:

curl "https://$ACCOUNT.cloudant.com/_api/v2/monitoring/map_doc?cluster=myclustername&format=json"

See example results (abbreviated) from a map_doc monitoring request:

{
	"start": 1436194475,
	"end": 1436194775,
	"target_responses": [
		{
			"target": "myclustername Documents per second through map functions",
			"datapoints": [
				[
					0.0,
					1436194480
				],
				[
					0.5,
					1436194495
				],
				[
					0.4000000000005457,
					1436194510
				],
				...
				[
					0.0,
					1436194765
				]
			]
		}
	]
}

`network`

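See an example of a network monitoring request:

curl "https://$ACCOUNT.cloudant.com/_api/v2/monitoring/network?cluster=myclustername&node=myloadbalancername&format=json"

See example results from a network monitoring request:

{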
  "end": 1512989748,
  "start": 1512989450,
  "target_responses": [
      {
        "datapoints": [
          [
           20247725.5,
           1512989450
          ]
        ],
        "target": "myclustername Octets tx Per Second"
      },
      {
         "datapoints": [
             [
               17697329.3046875,
               1512989450
             ]
         ],
         "target": "myclustername Octets rx Per Second" 
       }
    ]
}

You must explicitly specify the load balancer in the request by using the node parameter, as the example request shows.

`rate/status_code`

See an example of a rate/status_code monitoring request:

curl "https://$ACCOUNT.cloudant.com/_api/v2/monitoring/rate/status_code?cluster=myclustername&format=json"

See example results (abbreviated) from a rate/status_code monitoring request:

{
	"start": 1436194902,
	"end": 1436195202,
	"target_responses": [
		{
			"target": "myclustername 2xx",
			"datapoints": [
				[
					null,
					1436194910
				],
				[
					36.0,
					1436194920
				],
				...
				[
					0.0,
					1436195200
				]
			]
		},
		{
			"target": "myclustername 3xx",
			"datapoints": [
				[
					null,
					1436194910
				],
				[
					0.0,
					1436194920
				],
				...
				[
					null,
					1436195200
				]
			]
		},
		{
			"target": "myclustername 4xx",
			"datapoints": [
				[
					null,
					1436194910
				],
				[
					0.0,
					1436194920
				],
				...
				[
					0.0,
					1436195200
				]
			]
		},
		{
			"target": "myclustername 5xx",
			"datapoints": [
				[
					null,
					1436194910
				],
				[
					0.0,
					1436194920
				],
				...
				[
					null,
					1436195200
				]
			]
		}
	]
}

`rate/verb`

See an example of a rate/verb monitoring request:

curl "https://$ACCOUNT.cloudant.com/_api/v2/monitoring/rate/verb?cluster=myclustername&format=json"

See example results (abbreviated) from a rate/verb monitoring request:

{
	"start": 1436195497,
	"end": 1436195797,
	"target_responses": [
		{
			"target": "myclustername GET", 
			"datapoints": [
				[
					null, 
					1436195500
				], 
				[
					36.0, 
					1436195510
				], 
				...
				[
					49.5, 
					1436195790
				]
			]
		}, 
		{
			"target": "myclustername POST", 
			"datapoints": [
				[
					null, 
					1436195500
				], 
				[
					0.0, 
					1436195510
				], 
				...
				[
					0.0, 
					1436195790
				]
			]
		}, 
		{
			"target": "myclustername PUT", 
			"datapoints": [
				[
					null, 
					1436195500
				], 
				[
					0.0, 
					1436195510
				], 
				...
				[
					0.0, 
					1436195790
				]
			]
		}, 
		{
			"target": "myclustername DELETE", 
			"datapoints": [
				[
					null, 
					1436195500
				], 
				[
					0.0, 
					1436195510
				], 
				...
				[
					0.0, 
					1436195790
				]
			]
		}, 
		{
			"target": "myclustername COPY", 
			"datapoints": [
				[
					null, 
					1436195500
				], 
				[
					0.0, 
					1436195510
				], 
				...
				[
					0.0, 
					1436195790
				]
			]
		}, 
		{
			"target": "myclustername HEAD", 
			"datapoints": [
				[
					null, 
					1436195500
				], 
				[
					0.0,
					1436195510
				], 
				...
				[
					0.0, 
					1436195790
				]
			]
		}
	]
}

`response_time`

See an example of a response_time monitoring request:

curl "https://$ACCOUNT.cloudant.com/_api/v2/monitoring/response_time?cluster=myclustername&format=json"

See example results (abbreviated) from a response_time monitoring request:

{
  "start": 1523984559,
  "end": 1523984859,
  "target_responses": [
    {
      "datapoints": [
        [
          118.1668472290039,
          1523984540
        ],
        [
          90.57628631591797,
          1523984550
        ],
        [
          142.6778106689453,
          1523984560
        ],
        [
          118.42487335205078,
          1523984570
        ],
        [
          120.38044738769531,
          1523984580
        ],
        [
          103.94148254394531,
          1523984590
        ],
        [
          126.64134979248047,
          1523984600
        ],
        [
          113.03324127197266,
          1523984610
        ],
        [
          136.9058074951172,
          1523984620
        ],
        [
          148.68711853027344,
          1523984630
        ],
        [
          121.22771453857422,
          1523984640
        ],
        [
          142.86459350585938,
          1523984650
        ],
        [
          103.75953674316406,
          1523984660
        ],
        [
          139.1707763671875,
          1523984670
        ],
        [
          118.29866027832031,
          1523984680
        ],
        [
          126.3541259765625,
          1523984690
        ],
        [
          115.5962905883789,
          1523984700
        ],
        [
          106.68751525878906,
          1523984710
        ],
        [
          144.12387084960938,
          1523984720
        ],
        [
          103.8598861694336,
          1523984730
        ],
        [
          136.84429931640625,
          1523984740
        ],
        [
          110.58084106445312,
          1523984750
        ],
        [
          94.69702911376953,
          1523984760
        ],
        [
          126.85747528076172,
          1523984770
        ],
        [
          100.8759994506836,
          1523984780
        ],
        [
          145.0876922607422,
          1523984790
        ],
        [
          100.77622985839844,
          1523984800
        ],
        [
          null,
          1523984810
        ],
        [
          null,
          1523984820
        ],
        [
          null,
          1523984830
        ]
      ],
      "target": "myclustername Response Time (ms)"
    }
  ]
}

`rps`

See an example of an rps monitoring request:

curl "https://$ACCOUNT.cloudant.com/_api/v2/monitoring/rps?cluster=myclustername&format=json"

See example results (abbreviated) from an rps monitoring request:

{
	"start": 1436195908,
	"end": 1436196208,
	"target_responses": [
		{
			"target": "myclustername Document Reads Per Second",
			"datapoints": [
				[
					0.20000000001164153,
					1436195910
				],
				[
					0.10000000000582077,
					1436195925
				],
				...
				[
					null,
					1436196195
				]
			]
		}
	]
}

`wps`

See an example of a wps monitoring request:

curl "https://$ACCOUNT.cloudant.com/_api/v2/monitoring/wps?cluster=myclustername&format=json"

See example results (abbreviated) from a wps monitoring request:

{
	"start": 1436195999,
	"end": 1436196299,
	"target_responses": [
		{
			"target": "myclustername Document Writes Per Second",
			"datapoints": [
				[
					1.2999999999992724,
					1436196000
				],
				[
					0.5,
					1436196015
				],
				...
				[
					null,
					1436196285
				]
			]
		}
	]
}
