Google Cloud Managed Service for Prometheus

Overview

Kubecost leverages the open-source Prometheus project as a time series database and post-processes the data in Prometheus to perform cost allocation calculations and provide optimization insights for your Kubernetes clusters. Because Prometheus runs as a single, statically-resourced container on one machine, a sufficiently large cluster, or one that scales out, can exceed the scraping capacity of a single Prometheus server. In this doc, you will learn how Kubecost integrates with Google Cloud Managed Service for Prometheus (GMP), a managed Prometheus-compatible monitoring service, so you can easily monitor Kubernetes costs at scale.

For this integration, GMP must be enabled on your GKE cluster with managed collection. Kubecost is then installed in your GKE cluster and uses the GMP Prometheus binary to ingest metrics into the GMP database seamlessly. In this setup, the Kubecost deployment also automatically creates a Prometheus proxy that allows Kubecost to query the metrics back from the GMP database for cost allocation calculations.

This integration is currently in beta.

Instructions

Prerequisites

  • You have a GCP account/subscription.

  • You have permission to manage GKE clusters and GCP monitoring services.

  • You have an existing GKE cluster with GMP enabled. You can learn more in Google's GMP documentation.
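
To confirm the last prerequisite, you can check whether managed collection is enabled using the gcloud CLI. This is a minimal sketch: it assumes gcloud is installed and authenticated, and ZONE is a placeholder for your cluster's zone (use --region for regional clusters):

# Prints "True" when managed collection is enabled
gcloud container clusters describe "YOUR_GKE_CLUSTER_NAME" \
  --zone "ZONE" \
  --format="value(monitoringConfig.managedPrometheusConfig.enabled)"

# If it is not enabled yet, turn on managed collection
gcloud container clusters update "YOUR_GKE_CLUSTER_NAME" \
  --zone "ZONE" \
  --enable-managed-prometheus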

Installation

You can use the following command to install Kubecost on your GKE cluster and integrate with GMP:

export GCP_PROJECT_ID="YOUR_GCP_PROJECT_ID"
export CLUSTER_NAME="YOUR_GKE_CLUSTER_NAME"
helm upgrade -i kubecost cost-analyzer/ \
--namespace kubecost --create-namespace \
--set prometheus.server.image.repository="gke.gcr.io/prometheus-engine/prometheus" \
--set prometheus.server.image.tag="v2.35.0-gmp.2-gke.0" \
--set global.gmp.enabled="true" \
--set global.gmp.gmpProxy.projectId="${GCP_PROJECT_ID}" \
--set prometheus.server.global.external_labels.cluster_id="${CLUSTER_NAME}" \
--set kubecostProductConfigs.clusterName="${CLUSTER_NAME}" \
--set federatedETL.useMultiClusterDB=true

In this installation command, the following additional flags configure Kubecost to work with GMP:

  • prometheus.server.image.repository and prometheus.server.image.tag replace the standard Prometheus image with the GMP-specific image.

  • global.gmp.enabled and global.gmp.gmpProxy.projectId enable the GMP integration and point it at your GCP project.

  • prometheus.server.global.external_labels.cluster_id and kubecostProductConfigs.clusterName set the cluster name for your Kubecost setup.

You can find additional configurations in our main values.yaml file.
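
If you prefer a values file to --set flags, the same configuration can be expressed as follows. This is a sketch that simply mirrors the flags above; substitute your own project ID and cluster name:

# values-gmp.yaml -- equivalent of the --set flags above
prometheus:
  server:
    image:
      repository: gke.gcr.io/prometheus-engine/prometheus
      tag: v2.35.0-gmp.2-gke.0
    global:
      external_labels:
        cluster_id: YOUR_GKE_CLUSTER_NAME
global:
  gmp:
    enabled: true
    gmpProxy:
      projectId: YOUR_GCP_PROJECT_ID
kubecostProductConfigs:
  clusterName: YOUR_GKE_CLUSTER_NAME
federatedETL:
  useMultiClusterDB: true

You would then install with:

helm upgrade -i kubecost cost-analyzer/ \
  --namespace kubecost --create-namespace \
  -f values-gmp.yaml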

Your Kubecost setup now writes to and reads data from GMP. Data should be ready for viewing within 15 minutes.
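
While you wait, you can confirm that the Kubecost pods came up cleanly (a quick check; the kubecost namespace comes from the installation command above):

kubectl get pods -n kubecost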

Verification

Run the following command to enable port-forwarding to expose the Kubecost dashboard:

kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090

To verify that the integration is set up, open the Kubecost UI at http://localhost:9090, go to Settings, and check the Prometheus Status section.

From the Metrics explorer in your GCP Monitoring console, you can run the following query to verify that Kubecost metrics are being collected:

avg(node_cpu_hourly_cost) by (cluster_id)
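
Alternatively, you can run the same query against GMP's Prometheus-compatible HTTP API with curl. This is a sketch assuming the gcloud CLI is authenticated with permission to read monitoring data in the project; it reuses the GCP_PROJECT_ID variable exported during installation:

curl -s --get \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode "query=avg(node_cpu_hourly_cost) by (cluster_id)" \
  "https://monitoring.googleapis.com/v1/projects/${GCP_PROJECT_ID}/location/global/prometheus/api/v1/query" \
  | jq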

Troubleshooting

Cluster efficiency displaying as 0%, or efficiency only displaying for the most recent cluster

The queries below must return data for Kubecost to calculate costs correctly. Before running them, set these environment variables:

KUBECOST_NAMESPACE=kubecost
KUBECOST_DEPLOYMENT=kubecost-cost-analyzer
CLUSTER_ID=YOUR_CLUSTER_NAME
  1. Verify the connection to GMP and that the container_memory_working_set_bytes metric is available:

If you have set kubecostModel.promClusterIDLabel in the Helm chart, change the cluster_id label in the queries below to match it.

kubectl exec -it -n $KUBECOST_NAMESPACE \
  deployments/$KUBECOST_DEPLOYMENT -c cost-analyzer-frontend \
  -- curl "0:9090/model/prometheusQuery?query=container_memory_working_set_bytes\{cluster_id=\"$CLUSTER_ID\"\}" \
  | jq
  2. Verify Kubecost metrics are available in GMP:

kubectl exec -it -n $KUBECOST_NAMESPACE \
  deployments/$KUBECOST_DEPLOYMENT -c cost-analyzer-frontend \
  -- curl "0:9090/model/prometheusQuery?query=node_total_hourly_cost\{cluster_id=\"$CLUSTER_ID\"\}" \
  | jq

You should receive an output similar to:

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "",
          "cluster": "",
          "id": "/",
          "instance": "",
          "job": "kubelet",
          "location": "",
          "node": "",
          "project_id": ""
        },
        "value": [
          1697627020,
          "2358820864"
        ]
      },

If id returns as a blank value, you can set the following Helm value to force-set cluster as the Prometheus cluster ID label:

kubecostModel:
  promClusterIDLabel: cluster
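
For example, you could apply this to an existing release with a Helm upgrade. A sketch, where --reuse-values preserves the settings from the original installation:

helm upgrade kubecost cost-analyzer/ \
  --namespace kubecost \
  --reuse-values \
  --set kubecostModel.promClusterIDLabel="cluster"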

If the above queries fail, check the following:

  1. Check the logs of the sigv4proxy container (it may be in the Kubecost deployment or the Prometheus server deployment, depending on your setup):

kubectl logs deployments/$KUBECOST_DEPLOYMENT -c sigv4proxy --tail -1

In a working sigv4proxy, there will be very few logs.

Output from a correctly working proxy looks like:

time="2023-09-21T17:40:15Z" level=info msg="Stripping headers []" StripHeaders="[]"
time="2023-09-21T17:40:15Z" level=info msg="Listening on :8005" port=":8005"
  2. Check logs in the cost-model container for Prometheus connection issues:

kubectl logs deployments/$KUBECOST_DEPLOYMENT -c cost-model --tail -1 | grep -i err

Example errors:

ERR CostModel.ComputeAllocation: pod query 1 try 2 failed: avg(kube_pod_container_status_running...
Prometheus communication error: 502 (Bad Gateway) ...
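
As a further connectivity check, you can reuse the /model/prometheusQuery endpoint from the verification steps above with Prometheus' built-in up metric; an empty result suggests Kubecost cannot reach the GMP query proxy at all. This sketch uses the same environment variables as the earlier queries:

kubectl exec -it -n $KUBECOST_NAMESPACE \
  deployments/$KUBECOST_DEPLOYMENT -c cost-analyzer-frontend \
  -- curl "0:9090/model/prometheusQuery?query=up" \
  | jq '.data.result | length'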

Additionally, read our Custom Prometheus integration troubleshooting guide if you run into any other errors while setting up the integration. For support from GCP, you can submit a support request at the GCP support hub.
