Setting up an operational monitoring solution

This tutorial shows you one way to meet the IBM Cloud Framework for Financial Services requirements that are related to operational monitoring by using Prometheus and Grafana on Red Hat OpenShift on IBM Cloud. With this approach, you can gain insight into the health and performance of the provisioned infrastructure and the workloads that run on it, all while keeping your monitoring data within your environment.

We provide guidance here, but you are solely responsible for installing, configuring, and operating third-party software in a way that satisfies IBM Cloud Framework for Financial Services requirements. In addition, IBM does not provide support for third-party software.

Monitoring solution architecture

The architecture diagram shows a monitoring deployment within a Red Hat OpenShift on IBM Cloud cluster for a single region. The architecture enables gathering metrics for Red Hat OpenShift on IBM Cloud applications and virtual server instances within your VPCs. The label "Prom" in the workload clusters represents Prometheus.

IBM Cloud for Financial Services reference architecture with operational monitoring
Figure 1. Single-region IBM Cloud for Financial Services reference architecture with operational monitoring

When you configure Red Hat OpenShift on IBM Cloud for both operational logging and operational monitoring, the worker nodes can be shared: you can use the same worker pool for both logging and monitoring, and the same taint to steer the monitoring and logging pods to that shared pool.

Red Hat OpenShift on IBM Cloud provides a built-in monitoring stack, which is used to set up monitoring and default alerting. For more information, see Understanding the monitoring stack.

To implement your operational monitoring solution, you need to complete the following high-level steps:

  1. Provision an instance of Red Hat OpenShift on IBM Cloud.
  2. Configure the worker pool in your Red Hat OpenShift on IBM Cloud cluster.
  3. Configure the Red Hat OpenShift on IBM Cloud monitoring stack.
  4. Configure monitoring for a user-defined Red Hat OpenShift on IBM Cloud project.
  5. Configure monitoring for a VPC virtual server instance.
  6. Set up a custom Grafana dashboard.

Before you begin

  • You have a VPC provisioned.
  • Subnets are provisioned across 3 zones within a region.
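
You can check these prerequisites with the VPC infrastructure plug-in for the ibmcloud CLI. The following commands are a sketch; they list your VPCs and subnets so that you can confirm that subnets exist in three zones of the region:

  ibmcloud is vpcs
  ibmcloud is subnets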

Provision Red Hat® OpenShift® on IBM Cloud®

To capture metrics from workloads running outside of Red Hat OpenShift on IBM Cloud (such as a virtual server instance), you need to provision an instance of Red Hat OpenShift on IBM Cloud if you don't already have one.

  1. Provision Red Hat® OpenShift® on IBM Cloud® within the workload VPC where you plan to install the monitoring service. Use the following configuration:
  • Red Hat OpenShift on IBM Cloud version: 4.6.x
  • Worker zones: User defined subnet in each zone of the region
  • Worker nodes per zone: 1
  • Flavor: mx2.4x32 - 4 vCPU, 32 GB Memory
  • Master service endpoint: Private endpoint only
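
    For reference, a cluster with roughly this configuration can be created with the ibmcloud CLI. The following is a sketch, not a definitive invocation: the cluster name, VPC ID, zone names, subnet IDs, and IBM Cloud Object Storage instance CRN are placeholders that you must substitute, and the remaining zones are attached with the zone add command:

      ibmcloud oc cluster create vpc-gen2 \
        --name <CLUSTER_NAME> \
        --version 4.6_openshift \
        --vpc-id <VPC_ID> \
        --zone <ZONE_1> \
        --subnet-id <SUBNET_ID_1> \
        --flavor mx2.4x32 \
        --workers 1 \
        --cos-instance <COS_CRN> \
        --disable-public-service-endpoint

      ibmcloud oc zone add vpc-gen2 \
        --cluster <CLUSTER_NAME> \
        --worker-pool default \
        --zone <ZONE_2> \
        --subnet-id <SUBNET_ID_2>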

Provision a Red Hat OpenShift on IBM Cloud worker pool

  1. Use a separate worker pool for the monitoring stack to keep its resources distinct from other workload resources (a CLI sketch for creating the pool follows these steps). Provision a new worker pool with the following configuration:

    • Worker zones: User defined subnet in each zone of the region
    • Worker nodes per zone: 1
    • Flavor: mx2.4x32 - 4 vCPU, 32 GB Memory
  2. After provisioning completes, access the Red Hat OpenShift on IBM Cloud cluster.

  3. Taint the worker pool. Tainting the worker pool ensures that only pods that tolerate the taint, in this case the monitoring stack, run on it. For more information about taints and tolerations, see the Red Hat OpenShift on IBM Cloud documentation. You can set a taint on a worker pool with the following ibmcloud CLI command:

    ibmcloud oc worker-pool taint set --worker-pool <WORKER_POOL> --cluster <CLUSTER> --taint KEY=VALUE:EFFECT
    

    For example, you can set a taint of logging-monitoring=node:NoExecute by using the following ibmcloud CLI command:

    ibmcloud oc worker-pool taint set --worker-pool <WORKER_POOL> --cluster <CLUSTER> --taint logging-monitoring=node:NoExecute
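
    The pool itself can also be created from the CLI. The following is a sketch with placeholder names: create the pool, attach a subnet in each zone with the zone add command, and then inspect the pool (including any taints) with worker-pool get:

    ibmcloud oc worker-pool create vpc-gen2 \
      --name <WORKER_POOL> \
      --cluster <CLUSTER> \
      --flavor mx2.4x32 \
      --size-per-zone 1

    ibmcloud oc zone add vpc-gen2 \
      --cluster <CLUSTER> \
      --worker-pool <WORKER_POOL> \
      --zone <ZONE> \
      --subnet-id <SUBNET_ID>

    ibmcloud oc worker-pool get --worker-pool <WORKER_POOL> --cluster <CLUSTER>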
    

Configure the Red Hat OpenShift on IBM Cloud monitoring stack

  1. For conceptual information about the monitoring stack, see Understanding the monitoring stack.

  2. To configure the monitoring stack with supported options, see Maintenance and support for monitoring.

    1. Create and edit the cluster-monitoring-config ConfigMap in the openshift-monitoring project with the following configuration. For more information, see Configuring the monitoring stack. The sample code configures the monitoring stack to complete the following tasks:
    • Run the monitoring stack on the provisioned worker pool.
    • Add a retention period of 1 year.
    • Add a 100 GB persistent volume to Prometheus.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true
        prometheusOperator:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        prometheusK8s:
          retention: 1y
          volumeClaimTemplate:
            spec:
              storageClassName: ibmc-vpc-block-retain-general-purpose
              volumeMode: Filesystem
              resources:
                requests:
                  storage: 100Gi
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        alertmanagerMain:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        kubeStateMetrics:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        openshiftStateMetrics:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        telemeterClient:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        k8sPrometheusAdapter:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
        thanosQuerier:
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
    
    2. Create the user-workload-monitoring-config ConfigMap in the openshift-user-workload-monitoring project with the following configuration. For more information, see Configuring the monitoring stack. The sample code configures the monitoring stack to complete the following tasks:
    • Run the user workload monitoring stack on the provisioned worker pool.
    • Add a retention period of 1 year.
    • Add a 100 GB persistent volume to Prometheus.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: user-workload-monitoring-config
      namespace: openshift-user-workload-monitoring
    data:
      config.yaml: |
        prometheus:
          retention: 1y
          volumeClaimTemplate:
            spec:
              storageClassName: ibmc-vpc-block-retain-general-purpose
              volumeMode: Filesystem
              resources:
                requests:
                  storage: 100Gi
          tolerations:
          - key: "logging-monitoring"
            operator: "Equal"
            value: "node"
            effect: "NoExecute"
    

Configure monitoring for a user-defined Red Hat OpenShift on IBM Cloud project

If you are running your applications within Red Hat OpenShift on IBM Cloud, you can configure the monitoring stack to monitor your user-defined project.

  1. Create a ServiceMonitor for each of your projects where you want to collect metrics. (A sample ServiceMonitor follows these steps.)

    If you do not have an application available, you can use the sample service to test and verify.

  2. Create and manage alerts for your project.
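
For example, a ServiceMonitor for a hypothetical application my-app that exposes metrics on a Service port named web in the project my-project might look like the following sketch. The names and labels are illustrative and must match your own Service:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: my-app-monitor
      namespace: my-project
      labels:
        k8s-app: my-app
    spec:
      endpoints:
      - interval: 30s
        port: web        # must match a named port on the Service
        path: /metrics
      selector:
        matchLabels:
          app: my-app    # must match the labels on the Service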

Configure monitoring for a virtual server instance

You can use the Red Hat OpenShift on IBM Cloud monitoring stack to gather metrics for your virtual server instances in your VPC. Complete the following steps to set up monitoring for virtual server instances.

Linux

  1. To monitor Linux host metrics, you can install Prometheus Node Exporter on the virtual server to expose a wide range of metrics.
    1. Log in to your virtual server instance by using SSH.
    2. Issue the following commands:
    cd /tmp
    curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz
    tar -xvf node_exporter-1.1.2.linux-amd64.tar.gz
    sudo mv node_exporter-1.1.2.linux-amd64/node_exporter /usr/local/bin/
    
    3. Create a custom node exporter service file in /etc/systemd/system/node_exporter.service with the following content:
    [Unit]
    Description=Node Exporter
    After=network.target
    
    [Service]
    User=node_exporter
    Group=node_exporter
    Type=simple
    ExecStart=/usr/local/bin/node_exporter
    
    [Install]
    WantedBy=multi-user.target
    
    4. Register the node exporter service:
    sudo useradd --system --shell /bin/false node_exporter
    sudo systemctl daemon-reload
    sudo systemctl start node_exporter
    sudo systemctl enable node_exporter
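
    To confirm that Node Exporter is serving metrics, you can query it locally on its default port of 9100:

    curl -s http://localhost:9100/metrics | head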
    

Windows

  1. To monitor Windows host metrics, you can install the Prometheus exporter for Windows machines.
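
    As a sketch, the windows_exporter MSI package supports unattended installation; the version in the file name is a placeholder, and the collectors listed here are only an example. Note that the Windows exporter listens on port 9182 by default, so adjust the port in the scrape configuration in the next section accordingly:

    msiexec /i windows_exporter-<VERSION>-amd64.msi ENABLED_COLLECTORS=os,cpu,memory,logical_disk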

Custom endpoint for exposing metrics

Your application can also expose an endpoint that you can use to provide metrics to Prometheus.

  1. Allow TCP inbound traffic to port 9100 and any other ports that your application uses to expose metrics within the VPC access control list and security group that your virtual server instance is in.
  2. Within the Red Hat OpenShift on IBM Cloud cluster, create a project called monitoring-vpc-vsi.
    oc new-project monitoring-vpc-vsi
    
  3. For each of your virtual server instances, apply the following configuration in the monitoring-vpc-vsi project. Make the appropriate changes where noted with <variables>. The following content is a sample and assumes that your virtual server instance is scraped by using the HTTP protocol on port 9100 with the path /metrics:
    1. Apply the Service resource:
      kind: Service
      apiVersion: v1
      metadata:
        name: vsi-<Name of VSI you used to provision>
        namespace: monitoring-vpc-vsi
        labels:
          k8s-app: vsi-<Name of VSI you used to provision>
      spec:
        externalName: <IP Address of your VSI>
        type: ExternalName
        ports:
        - name: http
          port: 9100
          protocol: TCP
          targetPort: 9100
      
    2. Apply the Endpoints resource:
      apiVersion: v1
      kind: Endpoints
      metadata:
        name: vsi-<Name of VSI you used to provision>
        namespace: monitoring-vpc-vsi
        labels:
          k8s-app: vsi-<Name of VSI you used to provision>
      subsets:
      - addresses:
        - ip: <IP Address of your VSI>
        ports:
        - name: http
          port: 9100
          protocol: TCP
      
    3. Apply the ServiceMonitor resource:
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: vsi-<Name of VSI you used to provision>
        labels:
          k8s-app: vsi-<Name of VSI you used to provision>
        namespace: monitoring-vpc-vsi
      spec:
        endpoints:
        - interval: 15s
          port: http
          path: /metrics
          honorLabels: true
        jobLabel: k8s-app
        namespaceSelector:
          matchNames:
          - monitoring-vpc-vsi
        selector:
          matchLabels:
            k8s-app: vsi-<Name of VSI you used to provision>
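
    After the three resources are applied, you can verify that the virtual server instance is being scraped. The following checks are one way to do it; run the PromQL query from the metrics page of the OpenShift web console:

      # Confirm that the Service, Endpoints, and ServiceMonitor exist
      oc -n monitoring-vpc-vsi get service,endpoints,servicemonitor

      # Each virtual server target should report up == 1 once scraping starts:
      #   up{namespace="monitoring-vpc-vsi"}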
      

Set up a custom Grafana dashboard

Red Hat OpenShift on IBM Cloud comes with a static Grafana dashboard by default. The static dashboard cannot easily be extended with new dashboards, alerts, or other features. The following steps describe how to provision a Grafana instance by using the Grafana Operator and how to create your own dashboards. For more information and available installation options, see Grafana Operator.

  1. Install the Grafana Operator:

    1. Create a namespace in which to install the operator.
      oc new-project grafana
      
    2. From the web interface of your Red Hat OpenShift on IBM Cloud cluster, switch to the grafana project.
    3. From the web interface, select Operators -> OperatorHub and find Grafana Operator (community).
    4. Click Continue to accept the disclaimer and then click Install.
    5. Keep the configuration as is and make sure that the Installed Namespace is grafana.
  2. Apply the following Grafana resource. The settings are configurable.

    apiVersion: integreatly.org/v1alpha1
    kind: Grafana
    metadata:
      name: grafana
      namespace: grafana
    spec:
      dataStorage:
        accessModes:
          - ReadWriteOnce
        class: ibmc-vpc-block-metro-10iops-tier
        size: 10Gi
      config:
        auth:
          disable_signout_menu: false
        auth.anonymous:
          enabled: false
        log:
          level: warn
          mode: console
        security:
          admin_password: secret
          admin_user: root
      tolerations:
        - key: "logging-monitoring"
          operator: "Equal"
          value: "node"
          effect: "NoExecute"
      ingress:
        enabled: true
      dashboardLabelSelector:
        - matchExpressions:
            - key: app
              operator: In
              values:
                - grafana
    
  3. Wait for all pods in the grafana namespace to be in the Running state.

  4. Log in to the Grafana instance. The default username and password are root/secret, as set in the Grafana resource. After you are logged in, change the password. You can locate the route by using the following command:

    oc get route
    
  5. Add Grafana users.

  6. Connect Prometheus to Grafana.

    1. Grant the cluster-monitoring-view cluster role to the grafana-serviceaccount service account.

      oc adm policy add-cluster-role-to-user cluster-monitoring-view -z grafana-serviceaccount
      
    2. Create a bearer token for the grafana-serviceaccount service account.

      oc serviceaccounts get-token grafana-serviceaccount -n grafana
      
    3. Create a GrafanaDataSource resource. In the following YAML example, substitute <BEARER_TOKEN> with the output of the previous command.

      apiVersion: integreatly.org/v1alpha1
      kind: GrafanaDataSource
      metadata:
        name: prometheus-datasource
        namespace: grafana
      spec:
        datasources:
          - access: proxy
            editable: true
            isDefault: true
            jsonData:
              httpHeaderName1: 'Authorization'
              timeInterval: 5s
              tlsSkipVerify: true
            name: Prometheus
            secureJsonData:
              httpHeaderValue1: 'Bearer <BEARER_TOKEN>'
            type: prometheus
            url: 'https://thanos-querier.openshift-monitoring.svc.cluster.local:9091'
        name: prometheus-datasource.yaml
      

      You can add another Prometheus instance for a different Red Hat OpenShift on IBM Cloud cluster by including it in the data sources list. You must create a grafana-serviceaccount service account within the cluster that you want to connect to and grant it the cluster role privileges. Then, you can generate a bearer token.

    4. Create GrafanaDashboard resources. You can also import existing dashboards by using their JSON. A minimal example follows.
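
      For example, a minimal GrafanaDashboard resource that graphs CPU usage from Node Exporter metrics might look like the following sketch. The app: grafana label matches the dashboardLabelSelector in the Grafana resource above; the panel JSON is illustrative only:

      apiVersion: integreatly.org/v1alpha1
      kind: GrafanaDashboard
      metadata:
        name: vsi-cpu
        namespace: grafana
        labels:
          app: grafana          # matches the dashboardLabelSelector above
      spec:
        json: |
          {
            "title": "VSI CPU usage",
            "panels": [
              {
                "id": 1,
                "title": "CPU busy (per instance)",
                "type": "graph",
                "datasource": "Prometheus",
                "targets": [
                  {
                    "expr": "1 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))"
                  }
                ]
              }
            ]
          }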