StormForge Optimize Live delivers continuous, autonomous rightsizing for Kubernetes workloads.
The StormForge Agent is a Helm chart that combines two components: the stormforge-agent, which surfaces a minimal set of Kubernetes resource metrics (pods, HPAs), and a Prometheus agent, which forwards these metrics to the StormForge Optimize Live SaaS backend.
Before you begin
- Sign up for a StormForge Optimize Live account by visiting https://app.stormforge.io/signup
- Download the StormForge CLI by following the instructions here: https://docs.stormforge.io/optimize-live/getting-started/install-v2/#install-the-stormforge-cli-tool
- Create the access credential that will contain the input variables required to successfully authenticate and deploy the StormForge Agent: https://docs.stormforge.io/optimize-live/getting-started/install-v2/#generate-an-access-credential
Required resources
To run the software, the following resources are required:
- A Kubernetes cluster running a version newer than v1.16
- The StormForge CLI: https://docs.stormforge.io/optimize-live/getting-started/install-v2/#install-the-stormforge-cli-tool
- A valid StormForge Optimize Live license: https://app.stormforge.io
Installing the software
Generating the credentials:
The StormForge CLI generates a credentials file in YAML format. Save the file locally, for example as AUTH_NAME-credentials.yaml.
Running the installation (replace LATEST_VERSION and CLUSTER_NAME in the example with appropriate values):
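A minimal sketch of the installation step, wrapped in a small helper function. The release name stormforge-agent, the namespace stormforge-system, and the SF_CHART variable are assumptions for illustration; set SF_CHART to the chart reference given in the StormForge documentation. The sketch also assumes the credentials file is in Helm values format.

```shell
# Sketch only: release name, namespace, and SF_CHART are assumptions.
sf_install() {
  # $1: chart version (LATEST_VERSION)
  # $2: cluster name (CLUSTER_NAME)
  # $3: path to the credentials file (AUTH_NAME-credentials.yaml)
  helm install stormforge-agent "${SF_CHART:?set SF_CHART to the chart reference}" \
    --namespace stormforge-system \
    --create-namespace \
    --version "$1" \
    --values "$3" \
    --set clusterName="$2"
}
```

With SF_CHART set, this could be called as: sf_install LATEST_VERSION CLUSTER_NAME AUTH_NAME-credentials.yaml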
If you no longer have access to the chart values used to install the agent but don't need to change any of them, you can usually retrieve and reuse the existing values using helm get values.
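As a rough sketch, the existing values can be saved to a file and passed back to Helm later. The release name stormforge-agent and the namespace stormforge-system are assumptions; adjust them to match your installation.

```shell
# Sketch only: the release name and namespace are assumptions.
sf_save_values() {
  # $1: output file for the user-supplied values of the existing release
  helm get values stormforge-agent \
    --namespace stormforge-system \
    --output yaml > "$1"
}
```

For example, sf_save_values current-values.yaml writes the values to current-values.yaml, which can then be supplied to a later helm upgrade with --values.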
Parameters
Parameter | Description | Default |
---|---|---|
stormforge.address | API endpoint for the StormForge Optimize Live SaaS | https://api.stormforge.io/ |
authorization.issuer | Authorization issuer | https://api.stormforge.io/ |
authorization.clientID | client.ID string from the credential YAML. Visit docs.stormforge.io for details | [] |
authorization.clientSecret | client.Secret string from the credential YAML. Visit docs.stormforge.io for details | [] |
workload.allowNamespaces | Namespaces to include in Optimize Live's recommendations. The default is all namespaces except "kube-system" | [] |
workload.denyNamespaces | Namespaces to exclude from Optimize Live's recommendations. Note: workload.allowNamespaces and workload.denyNamespaces are mutually exclusive, with workload.allowNamespaces taking precedence. | [] |
clusterName | String used as the cluster name in the StormForge SaaS UI | [] |
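The parameters above could be collected in a values file along these lines. This is a sketch: the nesting is inferred from the dotted parameter names (the usual Helm convention), the file name stormforge-values.yaml is hypothetical, and every value in capitals is a placeholder.

```shell
# Sketch only: nesting is inferred from the dotted parameter names,
# and CLUSTER_NAME, CLIENT_ID, CLIENT_SECRET, NAMESPACE_TO_EXCLUDE are placeholders.
cat > stormforge-values.yaml <<'EOF'
clusterName: CLUSTER_NAME
stormforge:
  address: https://api.stormforge.io/
authorization:
  issuer: https://api.stormforge.io/
  clientID: CLIENT_ID
  clientSecret: CLIENT_SECRET
workload:
  # allowNamespaces and denyNamespaces are mutually exclusive; set only one.
  denyNamespaces:
    - NAMESPACE_TO_EXCLUDE
EOF
```

Keeping only one of the two namespace lists in the file avoids relying on the precedence rule described in the table.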
Upgrading to a new version
A typical upgrade might look something like the following.
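A hedged sketch of such an upgrade, assuming the release name stormforge-agent, the namespace stormforge-system, and an SF_CHART variable holding the chart reference from the StormForge documentation. It reuses the values of the installed release through Helm's --reuse-values flag.

```shell
# Sketch only: release name, namespace, and SF_CHART are assumptions.
sf_upgrade() {
  # $1: chart version to upgrade to
  helm upgrade stormforge-agent "${SF_CHART:?set SF_CHART to the chart reference}" \
    --namespace stormforge-system \
    --version "$1" \
    --reuse-values
}
```

If you would rather pass values explicitly, replace --reuse-values with --values and a saved values file.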
Uninstalling the software
Complete the following steps to uninstall a Helm Chart from your account.
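As a minimal sketch, removing the release with Helm might look like this. The release name stormforge-agent and the namespace stormforge-system are assumptions; use the names chosen at installation time.

```shell
# Sketch only: the release name and namespace are assumptions.
sf_uninstall() {
  helm uninstall stormforge-agent --namespace stormforge-system
}
```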
Workload Metrics
Here are the workload metrics produced by the StormForge Agent:
Metric | Source | Why |
---|---|---|
sf_kube_pod_container_resource_requests | KSM-like/pod-metrics | Track requests for each container |
sf_kube_pod_container_resource_limits | KSM-like/pod-metrics | Track limits for each container |
sf_kube_replicaset_spec_replicas | KSM-like/replicaset-metrics | Track replicas for replicasets |
sf_kube_statefulset_replicas | KSM-like/statefulset-metrics | Track replicas for statefulsets |
sf_workload_pod_owner | Consolidated metric for ownership | Provides both the pod owner and the workload, replacing KSM kube_pod_owner and kube_replicaset_owner |
sf_workload_replicas | Consolidated metric for the number of replicas | Covers all replica metrics regardless of the pod owner type. Should eventually replace KSM-like kube_replicaset_spec_replicas and kube_statefulset_replicas |
sf_workload_spec_replicas | Consolidated metric for the desired number of replicas | Covers all desired replica metrics regardless of the pod owner type. Should eventually replace KSM-like kube_replicaset_spec_replicas and kube_statefulset_replicas |
sf_workload_status_replicas | Consolidated metric for the observed number of replicas | Covers all observed replica metrics regardless of the pod owner type. Should eventually replace KSM-like kube_replicaset_status_replicas and kube_statefulset_replicas |
sf_workload_pod_container_resource_requests | Consolidated pod metric with requests | Combines all requests metrics in a single metric. Should eventually replace KSM-like kube_pod_container_resource_requests |
sf_workload_pod_container_resource_limits | Consolidated pod metric with limits | Combines all limits metrics in a single metric. Should eventually replace KSM-like kube_pod_container_resource_limits |
container_cpu_usage_seconds_total | cadvisor | Track CPU usage for each container |
container_memory_working_set_bytes | cadvisor | Track memory usage for each container |
sf_horizontalpodautoscaler_spec_min_replicas | KSM-like/horizontalpodautoscaler-metrics | Track minimum replicas for each HPA |
sf_horizontalpodautoscaler_spec_max_replicas | KSM-like/horizontalpodautoscaler-metrics | Track maximum replicas for each HPA |
sf_horizontalpodautoscaler_spec_target_metric | KSM-like/horizontalpodautoscaler-metrics | Track target metric for each HPA |
Individual tenants could have additional metrics.
Troubleshooting StormForge Agent
Getting Logs from Prometheus Agent
If no data appears in AMP, check the Prometheus agent logs. In this example, the agent is running in the stormforge-system namespace:
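A sketch of how the logs might be retrieved with kubectl. Pod names differ per cluster, so the pod is passed as an argument; the helper names are hypothetical.

```shell
# Sketch only: pod names differ per cluster, so list the pods first.
sf_list_pods() {
  kubectl get pods --namespace stormforge-system
}

sf_agent_logs() {
  # $1: name of the pod running the Prometheus agent
  kubectl logs "$1" --namespace stormforge-system --all-containers --tail=200
}
```

Run sf_list_pods to find the pod name, then sf_agent_logs POD_NAME to print the last 200 log lines from every container in that pod.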
If there are no errors, continue with the next steps.
Verify Prom Targets
After installing the agent, verify that it can actually scrape the workload metrics. In particular, the stormforge-agent target uses a static URL configuration, https://<>:8080/metrics, which makes it prone to configuration errors. In this example, the agent is running in the stormforge-system namespace:
To look at the actual metrics from the perspective of the stormforge-agent:
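One way to inspect both the scrape targets and the raw metrics is to forward the relevant ports and query them locally. This is a sketch under several assumptions: the Prometheus agent is assumed to listen on its default port 9090, the metrics endpoint is assumed to be served over HTTPS on port 8080 as in the URL above, and the helper names are hypothetical.

```shell
# Sketch only: port 9090 for Prometheus and HTTPS on 8080 are assumptions.
sf_forward() {
  # $1: pod name, $2: port to forward (same port locally and in the pod)
  kubectl port-forward "pod/$1" "$2:$2" --namespace stormforge-system
}

sf_targets() {
  # Query the Prometheus targets API through the forwarded port.
  curl --silent "http://localhost:${1:-9090}/api/v1/targets"
}

sf_workload_metrics() {
  # Fetch the stormforge-agent metrics endpoint and keep only sf_ samples.
  curl --silent --insecure "https://localhost:8080/metrics" | grep '^sf_'
}
```

Run sf_forward in a separate terminal for the port you need (for example sf_forward POD_NAME 9090), then call sf_targets or sf_workload_metrics while the forward is active.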
Checking Prometheus WAL
Data should be present in the WAL. In this example, the agent is running in the stormforge-system namespace:
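A sketch of listing the WAL directory from inside the Prometheus agent container. The container name and the WAL path depend on the chart's configuration, so both are passed as arguments here rather than assumed; the helper name is hypothetical.

```shell
# Sketch only: container name and WAL directory must be taken from your deployment.
sf_list_wal() {
  # $1: pod name, $2: container name, $3: WAL directory inside the container
  kubectl exec "$1" --namespace stormforge-system --container "$2" -- ls -l "$3"
}
```

Recently modified segment files in the listing suggest that the agent is writing data.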
By default, the WAL holds 30 minutes of data.
Credentials
Credentials are not authorized; request permission:
Bad credentials; double-check the parameters passed during installation (e.g. secrets):
Enable debug logging
Debug logging can now be enabled via HTTP requests.
This makes it practical to enable debug logging for a short period.
The default log level is 1 (info).
This can be changed by: