Kubecost leverages the open-source Prometheus project as a time series database and post-processes the data in Prometheus to perform cost allocation calculations and provide optimization insights for your Kubernetes clusters. Prometheus runs as a single, statically-resourced container, so as your cluster grows or scales out, it can exceed the scraping capacity of a single Prometheus server. In this doc, you will learn how Kubecost integrates with Google Cloud Managed Service for Prometheus (GMP), a managed Prometheus-compatible monitoring service, to make it easy to monitor Kubernetes costs at scale.
This integration requires GMP with managed collection to be enabled on your GKE cluster. Kubecost is then installed in your GKE cluster and uses the GMP Prometheus binary to seamlessly ingest metrics into the GMP database. In this setup, the Kubecost deployment also automatically creates a Prometheus proxy that allows Kubecost to query the metrics from the GMP database for cost allocation calculations.
This integration is currently in beta.
You have a GCP account/subscription.
You have permission to manage GKE clusters and GCP monitoring services.
You have an existing GKE cluster with GMP enabled. You can learn more here.
You can use the following command to install Kubecost on your GKE cluster and integrate with GMP:
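A hedged sketch of that command, assuming the standard Kubecost Helm repository (project ID, cluster name, and the GMP image tag are placeholders; check Google's documentation for the current GMP Prometheus image tag):

```sh
helm upgrade --install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set global.gmp.enabled=true \
  --set global.gmp.gmpProxy.projectId=<YOUR_PROJECT_ID> \
  --set prometheus.server.image.repository=gke.gcr.io/prometheus-engine/prometheus \
  --set prometheus.server.image.tag=<GMP_IMAGE_TAG> \
  --set prometheus.server.global.external_labels.cluster_id=<YOUR_CLUSTER_NAME> \
  --set kubecostProductConfigs.clusterName=<YOUR_CLUSTER_NAME>
```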
In this installation command, the following additional flags are set so that Kubecost works with GMP:
- `prometheus.server.image.repository` and `prometheus.server.image.tag` replace the standard Prometheus image with the GMP-specific image.
- `global.gmp.enabled` and `global.gmp.gmpProxy.projectId` enable the GMP integration.
- `prometheus.server.global.external_labels.cluster_id` and `kubecostProductConfigs.clusterName` set the name for your Kubecost setup.
You can find additional configurations at our main values.yaml file.
Your Kubecost setup now writes and collects data from GMP. Data should be ready for viewing within 15 minutes.
Run the following command to enable port-forwarding to expose the Kubecost dashboard:
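A minimal example, assuming the default deployment name and namespace; the dashboard is then available at http://localhost:9090:

```sh
kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090
```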
To verify that the integration is set up, go to Settings in the Kubecost UI, and check the Prometheus Status section.
From your GCP Monitoring > Metrics explorer console, you can run the following query to verify that Kubecost metrics are collected:
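For example, querying a Kubecost-emitted metric such as the following should return series once ingestion is working:

```
node_total_hourly_cost
```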
The below queries must return data for Kubecost to calculate costs correctly. For the queries to work, set the environment variables:
Verify the connection to GMP and that the metric `container_memory_working_set_bytes` is available:
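A sketch of such a check (the service name `kubecost-prometheus-server.kubecost` assumes a default install; adjust it to the proxy endpoint of your setup):

```sh
kubectl exec -i -t -n kubecost deployments/kubecost-cost-analyzer -c cost-analyzer -- \
  curl -sG http://kubecost-prometheus-server.kubecost/api/v1/query \
  --data-urlencode "query=container_memory_working_set_bytes{cluster_id=\"${CLUSTER_ID}\"}"
```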
If you have set `kubecostModel.promClusterIDLabel` in the Helm chart, you will need to change the query (`CLUSTER_ID`) to match the label.
Verify Kubecost metrics are available in GMP:
You should receive an output similar to:
If `id` returns a blank value, you can set the following Helm value to force-set `cluster` as the Prometheus cluster ID label:
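For example, as a Helm flag (mirroring the value referenced above):

```sh
--set kubecostModel.promClusterIDLabel=cluster
```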
If the above queries fail, check the following:
Check the logs of the `sigv4proxy` container (it may be in the Kubecost deployment or the Prometheus server deployment, depending on your setup):
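For example, assuming the proxy runs as a sidecar of the cost-analyzer deployment in the `kubecost` namespace:

```sh
kubectl logs deployment/kubecost-cost-analyzer -c sigv4proxy -n kubecost
```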
In a working `sigv4proxy`, there will be very few logs.
Correctly working log output:
Check the logs of the `cost-model` container for Prometheus connection issues:
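For example (deployment and namespace names assume a default install):

```sh
kubectl logs deployment/kubecost-cost-analyzer -c cost-model -n kubecost | grep -i -e "prometheus" -e "error"
```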
Example errors:
Additionally, read our Custom Prometheus integration troubleshooting guide if you run into any other errors while setting up the integration. For support from GCP, you can submit a support request at the GCP support hub.
In the standard deployment of Kubecost, Kubecost is deployed with a bundled Prometheus instance to collect and store metrics from your Kubernetes cluster. Kubecost also provides the flexibility to connect to your own time series database or storage. Grafana Mimir is an open-source, horizontally scalable, highly available, multi-tenant TSDB for long-term storage of Prometheus metrics.
This document will show you how to integrate Grafana Mimir with Kubecost for long-term metrics retention. In this setup, you use the Grafana Agent to collect metrics from Kubecost and your Kubernetes cluster. The metrics are then remote-written to your existing Grafana Mimir setup without an authenticating reverse proxy.
You have access to a running Kubernetes cluster
You have an existing Grafana Mimir setup
Install the Grafana Agent for Kubernetes on your cluster. On the existing K8s cluster where you intend to install Kubecost, run the following commands to install the Grafana Agent to scrape metrics from the Kubecost `/metrics` endpoint. The script below installs the Grafana Agent with the necessary scrape configuration for Kubecost; you may want to add additional scrape configuration for your setup.
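As an illustration, the agent's configuration takes this general shape (the Mimir endpoint, tenant ID, and Kubecost service target are placeholders; Mimir's `X-Scope-OrgID` header carries the tenant since no authenticating reverse proxy is used):

```yaml
metrics:
  wal_directory: /var/lib/agent/wal
  global:
    scrape_interval: 60s
    external_labels:
      cluster: <YOUR_CLUSTER_NAME>
  configs:
    - name: integrations
      remote_write:
        - url: http://<your-mimir-host>/api/v1/push
          headers:
            X-Scope-OrgID: <your-tenant-id>
      scrape_configs:
        - job_name: kubecost
          metrics_path: /metrics
          static_configs:
            - targets: ['kubecost-cost-analyzer.kubecost.svc:9003']
```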
You can also verify that `grafana-agent` is scraping data with the following command (optional):
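One way to check, assuming the agent was installed as a StatefulSet named `grafana-agent` in the `kubecost` namespace:

```sh
kubectl logs -n kubecost statefulset/grafana-agent | grep -i -e "scrape" -e "error"
```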
To learn more about how to install and configure the Grafana Agent, as well as additional scrape configuration, please refer to the Grafana Agent documentation, or you can view the Kubecost Prometheus scrape config at this GitHub repository.
Run the following command to deploy Kubecost. Remember to update the environment variable values with your Mimir setup information.
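A hedged sketch of that deployment, pointing Kubecost at Mimir's Prometheus-compatible query endpoint and disabling the bundled Prometheus (the endpoint is a placeholder):

```sh
export MIMIR_PROM_ENDPOINT="http://<your-mimir-host>/prometheus"

helm upgrade --install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set global.prometheus.enabled=false \
  --set global.prometheus.fqdn=${MIMIR_PROM_ENDPOINT}
```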
The process is complete. By now, you should have successfully integrated Kubecost with your Grafana Mimir setup.
There are several considerations when disabling Kubecost's included Prometheus deployment. Kubecost strongly recommends installing Kubecost with the bundled Prometheus in most environments.
The Kubecost Prometheus deployment is optimized to not interfere with other observability instrumentation and by default only contains metrics that are useful to the Kubecost product. This results in 70-90% fewer metrics than a Prometheus deployment using default settings.
Additionally, if multi-cluster metric aggregation is required, Kubecost provides a turnkey solution that is highly tuned and simple to support using the included Prometheus deployment.
This feature is accessible to all users. However, please note that comprehensive support is provided with a paid support plan.
Kubecost requires the following minimum versions:
Prometheus: v2.18 (v2.13-2.17 supported with limited functionality)
kube-state-metrics: v1.6.0+
cAdvisor: kubelet v1.11.0+
node-exporter: v0.16+ (Optional)
If you have node-exporter and/or KSM running on your cluster, follow this step to disable the Kubecost included versions. Additional detail on KSM requirements.
In contrast to our recommendation above, we do recommend disabling Kubecost's bundled node-exporter and kube-state-metrics if you already have them running in your cluster.
This process is not recommended. Before continuing, review the Bring your own Prometheus section if you haven't already.
Pass the following parameters in your Helm install:
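Based on the chart values referenced throughout this doc, those parameters are:

```sh
--set global.prometheus.fqdn=http://<your.prometheus.service>:9090 \
--set global.prometheus.enabled=false
```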
The FQDN can be a full path, such as `https://prometheus-prod-us-central-x.grafana.net/api/prom/` if you use Grafana Cloud-managed Prometheus. Learn more in the Grafana Cloud Integration for Kubecost doc.
Have your Prometheus scrape the cost-model `/metrics` endpoint. These metrics are needed for reporting accurate pricing data. Here is an example scrape config:
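A sketch consistent with Kubecost's published scrape job (the service name assumes Kubecost runs in the `kubecost` namespace):

```yaml
- job_name: kubecost
  honor_labels: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  dns_sd_configs:
    - names:
        - kubecost-cost-analyzer.kubecost
      type: 'A'
      port: 9003
```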
This config needs to be added under `extraScrapeConfigs` in the Prometheus configuration. See the example extraScrapeConfigs.yaml.
By default, the Prometheus chart included with Kubecost (bundled Prometheus) contains scrape configs optimized for Kubecost-required metrics. You need to add those scrape jobs to your existing Prometheus setup so that Kubecost can provide more accurate cost data while minimizing the additional load on your existing Prometheus.
You can find the full scrape configs of our bundled-Prometheus here. You can check Prometheus documentation for more information about the scrape config, or read this documentation if you are using Prometheus Operator.
This step is optional. If you do not set up Kubecost's CPU usage recording rule, Kubecost will fall back to a PromQL subquery which may put unnecessary load on your Prometheus.
The Kubecost-bundled Prometheus includes a recording rule used to calculate maximum CPU usage, a critical component of the request right-sizing recommendation functionality. Add the recording rules to reduce query load here.
Alternatively, if your environment supports `serviceMonitors` and `prometheusRules`, pass these values to your Helm install:
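For example:

```sh
--set serviceMonitor.enabled=true \
--set prometheusRule.enabled=true
```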
To confirm this job is successfully scraped by Prometheus, you can view the Targets page in Prometheus and look for a job named `kubecost`.
This step is optional, and only impacts certain efficiency metrics. View issue/556 for a description of what will be missing if this step is skipped.
You'll need to add the following relabel config to the job that scrapes the node exporter DaemonSet.
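One common form of that relabel config, assuming the job discovers node-exporter pods via Kubernetes service discovery:

```yaml
relabel_configs:
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node
```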
This does not override the source label. It creates a new label called `kubernetes_node` and copies the value of pod into it.
In order to distinguish between multiple clusters, Kubecost needs to know which label is used in Prometheus to identify the cluster name. Use the `.Values.kubecostModel.promClusterIDLabel` Helm value. The default cluster label is `cluster_id`, though many environments use the key `cluster`.
By default, metric retention is 91 days; however, data retention can be increased with the configurable value `etlDailyStoreDurationDays`. You can find this value here. Increasing the default `etlDailyStoreDurationDays` value will naturally result in greater memory usage. At higher values, this can cause errors when trying to display this information in the Kubecost UI. You can remedy this by increasing the Step size when using the Allocations dashboard.
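For example, in your Helm values (a sketch; this assumes the value lives under `kubecostModel` in the chart's values.yaml):

```yaml
kubecostModel:
  etlDailyStoreDurationDays: "120"
```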
The Diagnostics page (Settings > View Full Diagnostics) provides diagnostic info on your integration. Scroll down to Prometheus Status to verify that your configuration is successful.
Below you can find solutions to common Prometheus configuration problems. View the Kubecost Diagnostics doc for more information.
This is evidenced by the pod error message `No valid prometheus config file at ...` and by the init pods hanging. We recommend running `curl <your_prometheus_url>/api/v1/status/config` from a pod in the cluster to confirm that your Prometheus config is returned. Here is an example, but this needs to be updated based on your pod name and Prometheus address:
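A hedged example (pod/deployment names assume a default install; replace the URL with your Prometheus address):

```sh
kubectl exec -n kubecost deploy/kubecost-cost-analyzer -c cost-analyzer -- \
  curl -s http://<your_prometheus_url>/api/v1/status/config
```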
In the above example, `<your_prometheus_url>` may include a port number and/or namespace, for example: `http://prometheus-operator-kube-p-prometheus.monitoring:9090/api/v1/status/config`.
If the config file is not returned, this is an indication that an incorrect Prometheus address has been provided. If a config file is returned from one pod in the cluster but not the Kubecost pod, then the Kubecost pod likely has its access restricted by a network policy, service mesh, etc.
Network policies, mesh networks, or other security-related tooling can block network traffic between Prometheus and Kubecost, which will result in the Kubecost scrape target showing as down in the Prometheus Targets UI. To troubleshoot this type of error, you can use the `curl` command from within the cost-analyzer container to try to reach the Prometheus target. Note that the namespace and deployment name in this command may need to be updated to match your environment; this example uses the default Kubecost Prometheus deployment.
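A sketch of such a check, querying the bundled Prometheus service from the cost-analyzer container (service, namespace, and deployment names assume a default install):

```sh
kubectl exec -n kubecost deploy/kubecost-cost-analyzer -c cost-analyzer -- \
  curl -sG http://kubecost-prometheus-server.kubecost/api/v1/query \
  --data-urlencode 'query={job="kubecost"}'
```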
When successful, this command should return all of the metrics that Kubecost uses. Failures may indicate that network traffic is being blocked.
- Ensure Prometheus isn't being CPU throttled due to a low resource request.
- Review the Dependency Requirements section above.
- Visit the Prometheus Targets page (screenshot above).
- Make sure that `honor_labels` is enabled.
Ensure results are not null for both queries below.
- Make sure Prometheus is scraping Kubecost by searching metrics for: `node_total_hourly_cost`
- Ensure kube-state-metrics are available: `kube_node_status_capacity`
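One way to run both checks is via the Prometheus HTTP API (the endpoint assumes the bundled Prometheus service; substitute your own):

```sh
curl -sG http://kubecost-prometheus-server.kubecost/api/v1/query \
  --data-urlencode 'query=node_total_hourly_cost'

curl -sG http://kubecost-prometheus-server.kubecost/api/v1/query \
  --data-urlencode 'query=kube_node_status_capacity'
```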
For both queries, verify nodes are returned. A successful response should look like:
An error will look like:
Ensure that all clusters and nodes have values; output should be similar to the above Single Cluster Tests.
- Make sure Prometheus is scraping Kubecost by searching metrics for: `node_total_hourly_cost`. On macOS, change `date -d '1 day ago'` to `date -v '-1d'`.
- Ensure kube-state-metrics are available: `kube_node_status_capacity`
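A sketch of a time-bounded, per-cluster version of these checks (the endpoint and the `cluster_id` aggregation label are assumptions; see the macOS note above for the `date` flag):

```sh
curl -sG http://kubecost-prometheus-server.kubecost/api/v1/query \
  --data-urlencode 'query=avg(node_total_hourly_cost) by (cluster_id)' \
  --data-urlencode "time=$(date -d '1 day ago' +%s)"
```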
For both queries, verify nodes are returned. A successful response should look like:
An error will look like:
Kubecost leverages the open-source Prometheus project as a time series database and post-processes the data in Prometheus to perform cost allocation calculations and provide optimization insights for your Kubernetes clusters, such as Amazon Elastic Kubernetes Service (Amazon EKS). Prometheus runs as a single, statically-resourced container, so as your cluster grows or scales out, it can exceed the scraping capacity of a single Prometheus server. In collaboration with Amazon Web Services (AWS), Kubecost integrates with Amazon Managed Service for Prometheus (AMP), a managed Prometheus-compatible monitoring service, to enable customers to easily monitor Kubernetes costs at scale.
The architecture of this integration is similar to Amazon EKS cost monitoring with Kubecost, which is described in the previous blog post, with some enhancements as follows:
In this integration, an additional AWS SigV4 container is added to the cost-analyzer pod, acting as a proxy to help query metrics from Amazon Managed Service for Prometheus using the AWS SigV4 signing process. It enables passwordless authentication to reduce the risk of exposing your AWS credentials.
When the Amazon Managed Service for Prometheus integration is enabled, the bundled Prometheus server in the Kubecost Helm chart is configured in remote_write mode. The bundled Prometheus server sends the collected metrics to Amazon Managed Service for Prometheus using the AWS SigV4 signing process. All metrics and data are stored in Amazon Managed Service for Prometheus, and Kubecost queries the metrics directly from Amazon Managed Service for Prometheus instead of the bundled Prometheus. This relieves customers of the burden of maintaining and scaling a local Prometheus instance.
There are two architectures you can deploy:
- The Quick-Start architecture supports a small multi-cluster setup of up to 100 clusters.
- The Federated architecture supports a large multi-cluster setup of over 100 clusters.
The infrastructure can manage up to 100 clusters. The following architecture diagram illustrates the small-scale infrastructure setup:
To support the large-scale infrastructure of over 100 clusters, Kubecost leverages a Federated ETL architecture. In addition to the Amazon Prometheus workspace, Kubecost stores its extract, transform, and load (ETL) data in a central S3 bucket. Kubecost's ETL data is a computed cache based on Prometheus's metrics, from which users can perform all possible Kubecost queries. By storing the ETL data in an S3 bucket, this integration offers resiliency for your cost allocation data, improves performance, and enables a highly available architecture for your Kubecost setup.
The following architecture diagram illustrates the large-scale infrastructure setup:
- You have an existing AWS account.
- You have IAM credentials to create Amazon Managed Service for Prometheus workspaces and IAM roles programmatically.
- You have an existing Amazon EKS cluster with OIDC enabled.
- Your Amazon EKS clusters have the Amazon EBS CSI driver installed.
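First, create the Amazon Managed Service for Prometheus workspace. A minimal sketch using the AWS CLI (the alias is a placeholder):

```sh
aws amp create-workspace --alias kubecost-amp --region $AWS_REGION
```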
The example output should be in this format:
The Amazon Managed Service for Prometheus workspace should be created in a few seconds. Run the following command to get the workspace ID:
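For example (assuming the alias used above):

```sh
export AMP_WORKSPACE_ID=$(aws amp list-workspaces --alias kubecost-amp \
  --query "workspaces[0].workspaceId" --output text)
echo $AMP_WORKSPACE_ID
```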
Run the following command to set environment variables for integrating Kubecost with Amazon Managed Service for Prometheus:
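A sketch of the variables used throughout the rest of this setup (values are placeholders; the remote-write URL follows AMP's standard endpoint scheme):

```sh
export AWS_REGION=<YOUR_AWS_REGION>
export CLUSTER_NAME=<YOUR_CLUSTER_NAME>
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export REMOTEWRITEURL="https://aps-workspaces.${AWS_REGION}.amazonaws.com/workspaces/${AMP_WORKSPACE_ID}/api/v1/remote_write"
```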
Note: You can ignore Step 2 for the small-scale infrastructure setup.
a. Create an object store (S3 bucket) to store Kubecost ETL metrics. Run the following command in your workspace:
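For example (bucket names must be globally unique):

```sh
export KC_BUCKET=kubecost-etl-metrics-${AWS_ACCOUNT_ID}
aws s3 mb s3://${KC_BUCKET} --region ${AWS_REGION}
```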
b. Create an IAM policy to grant access to the S3 bucket. The following policy is for demo purposes only. You may need to consult your security team and make appropriate changes depending on your organization's requirements.
Run the following command in your workspace:
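A sketch of such a policy and its creation (scope and actions are illustrative; tighten them per your security requirements):

```sh
cat > kubecost-s3-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": [
        "arn:aws:s3:::${KC_BUCKET}",
        "arn:aws:s3:::${KC_BUCKET}/*"
      ]
    }
  ]
}
EOF

aws iam create-policy \
  --policy-name kubecost-s3-federated-policy \
  --policy-document file://kubecost-s3-policy.json
```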
c. Create a Kubernetes secret to allow Kubecost to write ETL files to the S3 bucket. Run the following command in your workspace:
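A sketch, assuming Kubecost's object-store convention of a `federated-store.yaml` file (the secret name is an assumption; match it to your Helm values):

```sh
cat > federated-store.yaml << EOF
type: S3
config:
  bucket: "${KC_BUCKET}"
  endpoint: "s3.amazonaws.com"
  region: "${AWS_REGION}"
EOF

kubectl create namespace kubecost
kubectl create secret generic kubecost-object-store \
  --namespace kubecost --from-file federated-store.yaml
```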
The following commands automate these tasks:
- Create an IAM role with the AWS-managed IAM policy and a trust policy for the following service accounts: `kubecost-cost-analyzer-amp`, `kubecost-prometheus-server-amp`.
- Modify the current K8s service accounts with annotations to attach the new IAM role.
Run the following command in your workspace:
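A sketch using eksctl and the AWS-managed AMP policies (service account names match those listed above):

```sh
for SA in kubecost-cost-analyzer-amp kubecost-prometheus-server-amp; do
  eksctl create iamserviceaccount \
    --name ${SA} \
    --namespace kubecost \
    --cluster ${CLUSTER_NAME} --region ${AWS_REGION} \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusQueryAccess \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
    --override-existing-serviceaccounts \
    --approve
done
```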
For more information, you can check AWS documentation at IAM roles for service accounts and learn more about the Amazon Managed Service for Prometheus managed policy at Identity-based policy examples for Amazon Managed Service for Prometheus.
Run the following command to create a file called config-values.yaml, which contains the default settings Kubecost will use to connect to your Amazon Managed Service for Prometheus workspace.
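A sketch of that file's shape (the `global.amp.*` values mirror the integration described above; the localhost:8005 proxy port is an assumption based on the SigV4 sidecar's default in this setup):

```sh
cat > config-values.yaml << EOF
global:
  amp:
    enabled: true
    prometheusServerEndpoint: http://localhost:8005/workspaces/${AMP_WORKSPACE_ID}
    remoteWriteService: https://aps-workspaces.${AWS_REGION}.amazonaws.com/workspaces/${AMP_WORKSPACE_ID}/api/v1/remote_write
    sigv4:
      region: ${AWS_REGION}
EOF
```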
Run this command to install Kubecost on the primary cluster and integrate it with the Amazon Managed Service for Prometheus workspace:
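A hedged sketch of that install (the service-account chart values are assumptions; adjust to your chart version):

```sh
helm upgrade --install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  -f config-values.yaml \
  --set kubecostProductConfigs.clusterName=${CLUSTER_NAME} \
  --set prometheus.server.global.external_labels.cluster_id=${CLUSTER_NAME} \
  --set serviceAccount.create=false \
  --set serviceAccount.name=kubecost-cost-analyzer-amp \
  --set prometheus.serviceAccounts.server.create=false \
  --set prometheus.serviceAccounts.server.name=kubecost-prometheus-server-amp
```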
These installation steps are similar to those for the primary cluster setup, except that you do not need to follow the steps in the section "Create Amazon Managed Service for Prometheus workspace", and you need to update the environment variables below to match your additional clusters. Please note that `AMP_WORKSPACE_ID` and `KC_BUCKET` are the same as for the primary cluster.
Run this command to install Kubecost on each additional cluster and integrate it with the Amazon Managed Service for Prometheus workspace:
Your Kubecost setup is now writing and collecting data from AMP. Data should be ready for viewing within 15 minutes.
To verify that the integration is set up, go to Settings in the Kubecost UI, and check the Prometheus Status section.
Read our Custom Prometheus integration troubleshooting guide if you run into any errors while setting up the integration. For support from AWS, you can submit a support request through your existing AWS support contract.
You can add these recording rules to improve performance. Recording rules allow you to precompute frequently needed or computationally expensive expressions and save their results as a new set of time series. Querying the precomputed result is often much faster than running the original expression every time it is needed. Follow these instructions to add the following rules:
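As an illustration of the recording-rule format (a generic example, not Kubecost's published rule set):

```yaml
groups:
  - name: CPU
    rules:
      - record: cluster:cpu_usage:rate5m
        expr: avg(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
```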
The queries below must return data for Kubecost to calculate costs correctly. For the queries to work, set the following environment variables:
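For example (values are placeholders):

```sh
export KUBECOST_NAMESPACE=kubecost
export AMP_WORKSPACE_ID=<YOUR_WORKSPACE_ID>
export CLUSTER_ID=<YOUR_CLUSTER_NAME>
```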
Verify the connection to AMP and that the metric `container_memory_working_set_bytes` is available:
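A sketch of this check, issued from the cost-analyzer container through the SigV4 proxy sidecar (the localhost:8005 proxy port is an assumption based on this setup):

```sh
kubectl exec -i -t -n ${KUBECOST_NAMESPACE} deployments/kubecost-cost-analyzer -c cost-analyzer -- \
  curl -sG "http://localhost:8005/workspaces/${AMP_WORKSPACE_ID}/api/v1/query" \
  --data-urlencode "query=container_memory_working_set_bytes{cluster_id=\"${CLUSTER_ID}\"}"
```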
If you have set `kubecostModel.promClusterIDLabel`, you will need to change the query (`CLUSTER_ID`) to match the label (typically `cluster` or `alpha_eksctl_io_cluster_name`).
The output should contain a JSON entry similar to the following.
The value of `cluster_id` should match the value of `kubecostProductConfigs.clusterName`.
Verify Kubecost metrics are available in AMP:
The output should contain a JSON entry similar to:
If the above queries fail, check the following:
Check the logs of the `sigv4proxy` container (it may be in the Kubecost deployment or the Prometheus server deployment, depending on your setup):
In a working `sigv4proxy`, there will be very few logs.
Correctly working log output:
Check the logs of the `cost-model` container for Prometheus connection issues:
Example errors:
Grafana Cloud is a composable observability platform that integrates metrics, traces, and logs with Grafana. Customers can leverage the best open-source observability software without the overhead of installing, maintaining, and scaling their own observability stack.
This document will show you how to integrate the Grafana Cloud Prometheus metrics service with Kubecost.
You have access to a running Kubernetes cluster
You have created a Grafana Cloud account
You have permissions to create Grafana Cloud API keys
Install the Grafana Agent for Kubernetes on your cluster. On the existing K8s cluster where you intend to install Kubecost, run the following commands to install the Grafana Agent to scrape metrics from the Kubecost `/metrics` endpoint. The script below installs the Grafana Agent with the necessary scraping configuration for Kubecost; you may want to add additional scrape configuration for your setup. Please remember to replace the following values with your actual Grafana Cloud values (see the illustrative configuration after this list):
- `REPLACE-WITH-GRAFANA-PROM-REMOTE-WRITE-ENDPOINT`
- `REPLACE-WITH-GRAFANA-PROM-REMOTE-WRITE-USERNAME`
- `REPLACE-WITH-GRAFANA-PROM-REMOTE-WRITE-API-KEY`
- `REPLACE-WITH-YOUR-CLUSTER-NAME`
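As an illustration, the agent's scrape and remote-write configuration takes this general shape, using the placeholders above (the Kubecost service target assumes a default install):

```yaml
metrics:
  wal_directory: /var/lib/agent/wal
  global:
    scrape_interval: 60s
    external_labels:
      cluster: REPLACE-WITH-YOUR-CLUSTER-NAME
  configs:
    - name: integrations
      remote_write:
        - url: REPLACE-WITH-GRAFANA-PROM-REMOTE-WRITE-ENDPOINT
          basic_auth:
            username: REPLACE-WITH-GRAFANA-PROM-REMOTE-WRITE-USERNAME
            password: REPLACE-WITH-GRAFANA-PROM-REMOTE-WRITE-API-KEY
      scrape_configs:
        - job_name: kubecost
          metrics_path: /metrics
          static_configs:
            - targets: ['kubecost-cost-analyzer.kubecost.svc:9003']
```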
You can also verify that `grafana-agent` is scraping data with the following command (optional):
Create a K8s secret named `dbsecret` to allow Kubecost to query the metrics from Grafana Cloud Prometheus. First, create two files in your working directory, called `USERNAME` and `PASSWORD` respectively.
Verify that you can run queries against your Grafana Cloud Prometheus query endpoint (optional):
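A sketch of such a query using the credential files created above (the endpoint is a placeholder):

```sh
curl -sG "https://<your-grafana-prom-endpoint>/api/prom/api/v1/query" \
  -u "$(cat USERNAME):$(cat PASSWORD)" \
  --data-urlencode "query=up"
```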
Create the K8s secret named `dbsecret`:
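For example, from the directory containing the two files:

```sh
kubectl create secret generic dbsecret \
  --namespace kubecost \
  --from-file=USERNAME \
  --from-file=PASSWORD
```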
Verify that the credentials appear correctly (optional):
After installing cortextool, create a file called `kubecost_rules.yaml` with the following command:
Then, make sure you are in the same directory as your `kubecost_rules.yaml`, and load the rules using cortextool. Replace the address with your Grafana Cloud's Prometheus endpoint (remember to omit the `/api/prom` path from the endpoint URL).
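A sketch of the load command (tenant ID and API key are placeholders):

```sh
cortextool rules load \
  --address=https://<your-grafana-prom-endpoint> \
  --id=<your-tenant-id> \
  --key=<your-api-key> \
  kubecost_rules.yaml
```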
Print out the rules to verify that they’ve been loaded correctly:
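For example:

```sh
cortextool rules print \
  --address=https://<your-grafana-prom-endpoint> \
  --id=<your-tenant-id> \
  --key=<your-api-key>
```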
Install Kubecost on your K8s cluster with the Grafana Cloud Prometheus query endpoint and the `dbsecret` you created in Step 2.
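A hedged sketch of that install (the basic-auth secret value name is an assumption; the endpoint keeps the `/api/prom` path for queries):

```sh
helm upgrade --install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set global.prometheus.enabled=false \
  --set global.prometheus.fqdn=https://<your-grafana-prom-endpoint>/api/prom \
  --set global.prometheus.queryServiceBasicAuthSecretName=dbsecret
```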
The process is complete. By now, you should have successfully completed the Kubecost integration with Grafana Cloud.
To learn more about how to install and configure the Grafana Agent, as well as additional scrape configuration, please refer to the Grafana Agent documentation, or you can view the Kubecost Prometheus scrape config at this GitHub repository.
To set up recording rules in Grafana Cloud, download the cortextool CLI. While they are optional, they offer improved performance.
Optionally, you can also add our Kubecost Grafana dashboard to your organization to visualize your cloud costs in Grafana.