Setting up an operational monitoring solution

This tutorial shows you one way to meet the IBM Cloud Framework for Financial Services requirements that are related to operational monitoring by using Prometheus and Grafana on Red Hat OpenShift on IBM Cloud. With this approach, you can gain insight into the health and performance of the provisioned infrastructure and the workloads that run on it, all while keeping your monitoring data within your environment.

We provide guidance here, but you are solely responsible for installing, configuring, and operating third-party software in a way that satisfies IBM Cloud Framework for Financial Services requirements. In addition, IBM does not provide support for third-party software.

Monitoring solution architecture

The architecture diagram shows a monitoring deployment within a Red Hat OpenShift on IBM Cloud cluster for a single region. The architecture enables gathering metrics for Red Hat OpenShift on IBM Cloud applications and virtual server instances within your VPCs. The label "Prom" in the workload clusters represents Prometheus.

IBM Cloud for Financial Services reference architecture with operational monitoring
Figure 1. Single-region IBM Cloud for Financial Services reference architecture with operational monitoring

When you configure Red Hat OpenShift on IBM Cloud for both operational logging and operational monitoring, the worker nodes can be shared: you can use the same worker pool for both logging and monitoring, and the same taint to steer the monitoring and logging pods to that shared pool.

Red Hat OpenShift on IBM Cloud provides a built-in monitoring stack, which is used to set up monitoring and default alerting. For more information, see Understanding the monitoring stack.

To implement your operational monitoring solution, you need to complete the following high-level steps:

  1. Provision an instance of Red Hat OpenShift on IBM Cloud.
  2. Configure the worker pool in your Red Hat OpenShift on IBM Cloud cluster.
  3. Configure the Red Hat OpenShift on IBM Cloud monitoring stack.
  4. Configure monitoring for a user-defined Red Hat OpenShift on IBM Cloud project.
  5. Configure monitoring for a VPC virtual server instance.
  6. Set up a custom Grafana dashboard.

Before you begin

  • You have a VPC provisioned.
  • Subnets are provisioned across 3 zones within a region.
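
You can check these prerequisites with the VPC infrastructure plug-in for the ibmcloud CLI. The following commands are a sketch; they list your VPCs and subnets so that you can confirm that subnets exist in three zones of the region:

  ibmcloud is vpcs
  ibmcloud is subnets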

Provision Red Hat® OpenShift® on IBM Cloud®

To capture metrics from workloads running outside of Red Hat OpenShift on IBM Cloud (such as a virtual server instance), you need to provision an instance of Red Hat OpenShift on IBM Cloud if you don't already have one.

  1. Provision Red Hat® OpenShift® on IBM Cloud® within the workload VPC where you plan to install the monitoring service. Use the following configuration:
  • Red Hat OpenShift on IBM Cloud version: 4.6.x
  • Worker zones: User defined subnet in each zone of the region
  • Worker nodes per zone: 1
  • Flavor: mx2.4x32 - 4 vCPU, 32 GB Memory
  • Master service endpoint: Private endpoint only
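
    For reference, a cluster with roughly this configuration can be created with the ibmcloud CLI. The following is a sketch, not a definitive invocation: the cluster name, VPC ID, zone names, subnet IDs, and IBM Cloud Object Storage instance CRN are placeholders that you must substitute, and the remaining zones are attached with the zone add command:

      ibmcloud oc cluster create vpc-gen2 \
        --name <CLUSTER_NAME> \
        --version 4.6_openshift \
        --vpc-id <VPC_ID> \
        --zone <ZONE_1> \
        --subnet-id <SUBNET_ID_1> \
        --flavor mx2.4x32 \
        --workers 1 \
        --cos-instance <COS_CRN> \
        --disable-public-service-endpoint

      ibmcloud oc zone add vpc-gen2 \
        --cluster <CLUSTER_NAME> \
        --worker-pool default \
        --zone <ZONE_2> \
        --subnet-id <SUBNET_ID_2>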

Provision a Red Hat OpenShift on IBM Cloud worker pool

  1. Use a separate worker pool for the monitoring stack to keep its resources distinct from other workload resources (a CLI sketch for creating the pool follows these steps). Provision a new worker pool with the following configuration:

    • Worker zones: User defined subnet in each zone of the region
    • Worker nodes per zone: 1
    • Flavor: mx2.4x32 - 4 vCPU, 32 GB Memory
  2. After provisioning completes, access the Red Hat OpenShift on IBM Cloud cluster.

  3. Taint the worker pool. Tainting the worker pool ensures that only pods that tolerate the taint, in this case the monitoring stack, run on it. For more information about taints and tolerations, see the Red Hat OpenShift on IBM Cloud documentation. You can set a taint on a worker pool with the following ibmcloud CLI command:

    ibmcloud oc worker-pool taint set --worker-pool <WORKER_POOL> --cluster <CLUSTER> --taint KEY=VALUE:EFFECT
    

    For example, you can set a taint of logging-monitoring=node:NoExecute by using the following ibmcloud CLI command:

    ibmcloud oc worker-pool taint set --worker-pool <WORKER_POOL> --cluster <CLUSTER> --taint logging-monitoring=node:NoExecute
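
    The pool itself can also be created from the CLI. The following is a sketch with placeholder names: create the pool, attach a subnet in each zone with the zone add command, and then inspect the pool (including any taints) with worker-pool get:

    ibmcloud oc worker-pool create vpc-gen2 \
      --name <WORKER_POOL> \
      --cluster <CLUSTER> \
      --flavor mx2.4x32 \
      --size-per-zone 1

    ibmcloud oc zone add vpc-gen2 \
      --cluster <CLUSTER> \
      --worker-pool <WORKER_POOL> \
      --zone <ZONE> \
      --subnet-id <SUBNET_ID>

    ibmcloud oc worker-pool get --worker-pool <WORKER_POOL> --cluster <CLUSTER>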
    

Configure the Red Hat OpenShift on IBM Cloud monitoring stack

  1. For conceptual information about the monitoring stack, see Understanding the monitoring stack.

  2. To configure the monitoring stack with supported options, see Maintenance and support for monitoring.

    1. Create and edit the cluster-monitoring-config ConfigMap in the openshift-monitoring project with the following configuration. For more information, see Configuring the monitoring stack. The sample code configures the monitoring stack to complete the following tasks:
    • Run the monitoring stack on the provisioned worker pool.
    • Add a retention period of 1 year.
    • Add a 100 GB persistent volume to Prometheus.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true
        prometheusOperator:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        prometheusK8s:
          retention: 1y
          volumeClaimTemplate:
            spec:
              storageClassName: ibmc-vpc-block-retain-general-purpose
              volumeMode: Filesystem
              resources:
                requests:
                  storage: 100Gi
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        alertmanagerMain:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        kubeStateMetrics:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        openshiftStateMetrics:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        telemeterClient:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        k8sPrometheusAdapter:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        thanosQuerier:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
    
    2. Create the user-workload-monitoring-config ConfigMap in the openshift-user-workload-monitoring project with the following configuration. For more information, see Configuring the monitoring stack. The sample code configures the monitoring stack to complete the following tasks:
    • Run the user workload monitoring stack on the provisioned worker pool.
    • Add a retention period of 1 year.
    • Add a 100 GB persistent volume to Prometheus.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: user-workload-monitoring-config
      namespace: openshift-user-workload-monitoring
    data:
      config.yaml: |
        prometheus:
          retention: 1y
          volumeClaimTemplate:
            spec:
              storageClassName: ibmc-vpc-block-retain-general-purpose
              volumeMode: Filesystem
              resources:
                requests:
                  storage: 100Gi
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
    

Configure monitoring for a user-defined Red Hat OpenShift on IBM Cloud project

If you are running your applications within Red Hat OpenShift on IBM Cloud, you can configure the monitoring stack to monitor your user-defined project.

  1. Create a ServiceMonitor for each of your projects where you want to collect metrics. (A sample ServiceMonitor follows these steps.)

    If you do not have an application available, you can use the sample service to test and verify.

  2. Create and manage alerts for your project.
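
For example, a ServiceMonitor for a hypothetical application my-app that exposes metrics on a Service port named web in the project my-project might look like the following sketch. The names and labels are illustrative and must match your own Service:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: my-app-monitor
      namespace: my-project
      labels:
        k8s-app: my-app
    spec:
      endpoints:
      - interval: 30s
        port: web        # must match a named port on the Service
        path: /metrics
      selector:
        matchLabels:
          app: my-app    # must match the labels on the Service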

Configure monitoring for a virtual server instance

You can use the Red Hat OpenShift on IBM Cloud monitoring stack to gather metrics for your virtual server instances in your VPC. Complete the following steps to set up monitoring for virtual server instances.

Linux

  1. To monitor Linux host metrics, you can install Prometheus Node Exporter on the virtual server to expose a wide range of metrics.
    1. Log in to your virtual server instance by using SSH.
    2. Issue the following commands:
    cd /tmp
    curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz
    tar -xvf node_exporter-1.1.2.linux-amd64.tar.gz
    sudo mv node_exporter-1.1.2.linux-amd64/node_exporter /usr/local/bin/
    
    3. Create a custom node exporter service file in /etc/systemd/system/node_exporter.service with the following content:
    [Unit]
    Description=Node Exporter
    After=network.target
    
    [Service]
    User=node_exporter
    Group=node_exporter
    Type=simple
    ExecStart=/usr/local/bin/node_exporter
    
    [Install]
    WantedBy=multi-user.target
    
    4. Register the node exporter service:
    sudo useradd --system --shell /bin/false node_exporter
    sudo systemctl daemon-reload
    sudo systemctl start node_exporter
    sudo systemctl enable node_exporter
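
    To confirm that Node Exporter is serving metrics, you can query it locally on its default port of 9100:

    curl -s http://localhost:9100/metrics | head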
    

Windows

  1. To monitor Windows host metrics, you can install the Prometheus exporter for Windows machines.
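
    As a sketch, the windows_exporter MSI package supports unattended installation; the version in the file name is a placeholder, and the collectors listed here are only an example. Note that the Windows exporter listens on port 9182 by default, so adjust the port in the scrape configuration in the next section accordingly:

    msiexec /i windows_exporter-<VERSION>-amd64.msi ENABLED_COLLECTORS=os,cpu,memory,logical_disk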

Custom endpoint for exposing metrics

Your application can also expose an endpoint that you can use to provide metrics to Prometheus.

  1. Allow TCP inbound traffic to port 9100 and any other ports that your application uses to expose metrics within the VPC access control list and security group that your virtual server instance is in.
  2. Within the Red Hat OpenShift on IBM Cloud cluster, create a project called monitoring-vpc-vsi.
    oc new-project monitoring-vpc-vsi
    
  3. For each of your virtual server instances, apply the following configuration in the monitoring-vpc-vsi project. Make the appropriate changes where noted with <variables>. The following content is a sample and assumes that your virtual server instance is scraped by using the HTTP protocol on port 9100 with the path /metrics:
    1. Apply the Service resource:
      kind: Service
      apiVersion: v1
      metadata:
        name: vsi-<Name of VSI you used to provision>
        namespace: monitoring-vpc-vsi
        labels:
          k8s-app: vsi-<Name of VSI you used to provision>
      spec:
        externalName: <IP Address of your VSI>
        type: ExternalName
        ports:
        - name: http
          port: 9100
          protocol: TCP
          targetPort: 9100
      
    2. Apply the Endpoints resource:
      apiVersion: v1
      kind: Endpoints
      metadata:
        name: vsi-<Name of VSI you used to provision>
        namespace: monitoring-vpc-vsi
        labels:
          k8s-app: vsi-<Name of VSI you used to provision>
      subsets:
      - addresses:
        - ip: <IP Address of your VSI>
        ports:
        - name: http
          port: 9100
          protocol: TCP
      
    3. Apply the ServiceMonitor resource:
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: vsi-<Name of VSI you used to provision>
        labels:
          k8s-app: vsi-<Name of VSI you used to provision>
        namespace: monitoring-vpc-vsi
      spec:
        endpoints:
        - interval: 15s
          port: http
          path: /metrics
          honorLabels: true
        jobLabel: k8s-app
        namespaceSelector:
          matchNames:
          - monitoring-vpc-vsi
        selector:
          matchLabels:
            k8s-app: vsi-<Name of VSI you used to provision>
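
    After the three resources are applied, you can verify that the virtual server instance is being scraped. The following checks are one way to do it; run the PromQL query from the metrics page of the OpenShift web console:

      # Confirm that the Service, Endpoints, and ServiceMonitor exist
      oc -n monitoring-vpc-vsi get service,endpoints,servicemonitor

      # Each virtual server target should report up == 1 once scraping starts:
      #   up{namespace="monitoring-vpc-vsi"}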
      

Set up a custom Grafana dashboard

Red Hat OpenShift on IBM Cloud comes with a static Grafana dashboard by default. The static dashboard cannot easily be extended with new dashboards, alerts, or other features. The following steps describe how to provision a Grafana instance by using the Grafana Operator and how to create your own dashboards. For more information and available installation options, see Grafana Operator.

  1. Install the Grafana Operator:

    1. Create a namespace in which to install the operator.
      oc new-project grafana
      
    2. From the web interface of your Red Hat OpenShift on IBM Cloud cluster, switch to the grafana project.
    3. From the web interface, select Operators -> OperatorHub and find Grafana Operator (community).
    4. Click Continue to accept the disclaimer and then click Install.
    5. Keep the configuration as is and make sure that the Installed Namespace is grafana.
  2. Apply the following Grafana resource. The settings are configurable.

    apiVersion: integreatly.org/v1alpha1
    kind: Grafana
    metadata:
      name: grafana
      namespace: grafana
    spec:
      dataStorage:
        accessModes:
          - ReadWriteOnce
        class: ibmc-vpc-block-metro-10iops-tier
        size: 10Gi
      config:
        auth:
          disable_signout_menu: false
        auth.anonymous:
          enabled: false
        log:
          level: warn
          mode: console
        security:
          admin_password: secret
          admin_user: root
      tolerations:
        - key: "logging-monitoring"
          operator: "Equal"
          value: "node"
          effect: "NoExecute"
      ingress:
        enabled: true
      dashboardLabelSelector:
        - matchExpressions:
            - key: app
              operator: In
              values:
                - grafana
    
  3. Wait for all pods in the grafana namespace to be in the Running state.

  4. Log in to the Grafana instance. The default username and password are root/secret, as set in the Grafana resource. After you are logged in, change the password. You can locate the route by using the following command:

    oc get route
    
  5. Add Grafana users.

  6. Connect Prometheus to Grafana.

    1. Grant the cluster-monitoring-view cluster role to the grafana-serviceaccount service account.

      oc adm policy add-cluster-role-to-user cluster-monitoring-view -z grafana-serviceaccount
      
    2. Create a bearer token for the grafana-serviceaccount service account.

      oc serviceaccounts get-token grafana-serviceaccount -n grafana
      
    3. Create a GrafanaDataSource resource. In the following YAML example, substitute <BEARER_TOKEN> with the output of the previous command.

      apiVersion: integreatly.org/v1alpha1
      kind: GrafanaDataSource
      metadata:
        name: prometheus-datasource
        namespace: grafana
      spec:
        datasources:
          - access: proxy
            editable: true
            isDefault: true
            jsonData:
              httpHeaderName1: 'Authorization'
              timeInterval: 5s
              tlsSkipVerify: true
            name: Prometheus
            secureJsonData:
              httpHeaderValue1: 'Bearer <BEARER_TOKEN>'
            type: prometheus
            url: 'https://thanos-querier.openshift-monitoring.svc.cluster.local:9091'
        name: prometheus-datasource.yaml
      

      You can add another Prometheus instance for a different Red Hat OpenShift on IBM Cloud cluster by including it in the data sources list. You must create a grafana-serviceaccount service account within the cluster that you want to connect to and grant it the cluster role privileges. Then, you can generate a bearer token.

    4. Create GrafanaDashboard resources. You can also import existing dashboards by using their JSON. A minimal example follows.
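
      For example, a minimal GrafanaDashboard resource that graphs CPU usage from Node Exporter metrics might look like the following sketch. The app: grafana label matches the dashboardLabelSelector in the Grafana resource above; the panel JSON is illustrative only:

      apiVersion: integreatly.org/v1alpha1
      kind: GrafanaDashboard
      metadata:
        name: vsi-cpu
        namespace: grafana
        labels:
          app: grafana          # matches the dashboardLabelSelector above
      spec:
        json: |
          {
            "title": "VSI CPU usage",
            "panels": [
              {
                "id": 1,
                "title": "CPU busy (per instance)",
                "type": "graph",
                "datasource": "Prometheus",
                "targets": [
                  {
                    "expr": "1 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))"
                  }
                ]
              }
            ]
          }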