To get started with Kubecost and OpenCost, visit our Installation page which will take you step by step through getting Kubecost set up.
This installation method is available for free and leverages the Kubecost Helm Chart. It provides access to all OpenCost and Kubecost community functionality and can scale to large clusters. This will also provide a token for trialing and retaining data across different Kubecost product tiers.
You can also install directly with the Kubecost Helm Chart with Helm v3.1+ using the following commands. This provides the same functionality as the step above but doesn't generate a product token for managing tiers or upgrade trials.
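For reference, a typical Helm v3 install uses the public chart repository shown below; adjust the release name and namespace to your needs:

```sh
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace
```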
You can run Helm Template against the Kubecost Helm Chart to generate local YAML output. This requires extra effort when compared to directly installing the Helm Chart but is more flexible than deploying a flat manifest.
You can install via flat manifest. This install path is not recommended because it has limited flexibility for managing your deployment and future upgrades.
Lastly, you can deploy the open-source OpenCost project directly as a Pod. This install path provides a subset of free functionality and is available here. Specifically, this install path deploys the underlying cost allocation model without the same UI or access to enterprise functionality: cloud provider billing integration, RBAC/SAML support, and scale improvements in Kubecost.
Kubecost has a number of product configuration options that you can specify at install time in order to minimize the number of settings changes required within the product UI. This makes it simple to redeploy Kubecost. These values can be configured under kubecostProductConfigs in our values.yaml. These parameters are passed to a ConfigMap that Kubecost detects and writes to its /var/configs.
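As a sketch, a small values file setting a couple of commonly used kubecostProductConfigs keys might look like the following (the keys shown are illustrative examples, not an exhaustive or authoritative list; consult values.yaml for what your chart version supports):

```sh
cat > kubecost-product-configs.yaml <<EOF
kubecostProductConfigs:
  clusterName: my-cluster   # example key: display name for this cluster (assumption)
  currencyCode: USD         # example key: currency shown in the UI (assumption)
EOF

helm upgrade --install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace \
  -f kubecost-product-configs.yaml
```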
If you encounter any errors while installing Kubecost, first visit our Troubleshoot Install doc. If the error you are experiencing is not already documented here, or a solution is not found, contact our Support team at support@kubecost.com for more help.
Kubecost releases are scheduled on a near-monthly basis. You can keep up to date with new Kubecost updates and patches by following our release notes here.
After installing Kubecost, you will be able to update Kubecost with the following command, which will upgrade you to the most recent version:
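If you installed with the chart repository shown earlier, the upgrade typically looks like this sketch:

```sh
helm upgrade kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost
```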
You can upgrade or downgrade to a specific version of Kubecost with the following command:
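For example, the --version flag pins the chart to a specific release (the version shown is a placeholder):

```sh
helm upgrade kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --version <CHART_VERSION>
```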
To uninstall Kubecost and its dependencies, run the following command:
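Assuming the release and namespace names used in the earlier examples:

```sh
helm uninstall kubecost --namespace kubecost
# Optionally remove the namespace once the release is gone:
kubectl delete namespace kubecost
```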
After successfully installing Kubecost, first time users should review our First Time User Guide to start immediately seeing the benefits of the product while also ensuring their workspace is properly set up.
After successfully installing Kubecost, new users should familiarize themselves with these onboarding steps to begin immediately realizing value. This doc will explain to you the core features and options you will have access to and direct you to other necessary docs groups that will help you get set up.
While certain steps in this article may be optional depending on your setup, these are recommended best practices for seeing the most value out of Kubecost as soon as possible.
Many Kubernetes adopters may have billing with cloud service providers (CSPs) that differs from public pricing. By default, Kubecost will detect the CSP of the cluster where it is installed and pull list prices for nodes, storage, and LoadBalancers across all major CSPs: Azure, AWS, and GCP.
However, Kubecost is also able to integrate these CSPs to receive the most accurate billing data. By completing a cloud integration, Kubecost is able to reconcile costs with your actual cloud bill to reflect enterprise discounts, Spot market prices, commitment discounts, and more.
New users should seek to integrate any and all CSPs they use into Kubecost. For an overview of cloud integrations and getting started, see our Cloud Billing Integrations doc. Once you have completed all necessary integrations, return to this article.
Due to the frequency of updates from providers, it can take anywhere from 24 to 48 hours to see adjusted costs.
Now that your base install and CSP integrations are complete, it's time to determine the accuracy against your cloud bill. Based on different methods of cost aggregation, Kubecost should assess your billing data within a 3-5% margin of error.
After enabling port-forwarding, you should have access to the Kubecost UI. Explore the different pages in the left navigation, starting with the Monitor dashboards. These pages, including Allocations, Assets, Clusters, and Cloud Costs, are comprised of different categories of cost spending, and allow you to apply customized queries for specific billing data. These queries can then be saved in the form of reports for future quick access. Each page of the Kubecost UI has more dedicated information in the Navigating the Kubecost UI section.
It's important to take precautions to ensure your billing data is preserved, and you know how to monitor your infrastructure's health.
Metrics reside in Prometheus, but extracting information from this store directly, whether for the UI or for API responses, is not performant at scale. For this reason, the data is optimized and stored through an extract, transform, load (ETL) process. When Kubecost documentation refers to ETL, it usually means this process and the data it produces.
Like any other system, backup of critical data is a must, and backing up ETL is no exception. To address this, we offer a number of different options based on your product tier. Descriptions and instructions for our backup functionalities can be found in our ETL Backup doc.
Similar to most systems, monitoring health is vital. For this, we offer several means of monitoring the health of both Kubecost and the host cluster.
Alerts can be configured to enable a proactive approach to monitoring your spend, and can be distributed across different workplace communication tools including email, Slack, and Microsoft Teams. Alerts can establish budgets for your different types of spend and cost-efficiency, and warn you if those budgets are reached. These Alerts are able to be configured via Helm or directly in your Kubecost UI.
The Health page will display an overall cluster health score which assesses how reliably and efficiently your infrastructure is performing. Scores start at 100 and decrease based on how severe any present errors are.
Kubecost has multiple ways of supporting multi-cluster environments, which vary based on your Kubecost product tier.
Kubecost Free will only allow you to view a single cluster at a time in the Kubecost UI. However, you can connect multiple different clusters and switch through them using Kubecost's context switcher.
Kubecost Enterprise provides a "single-pane-of-glass" view which combines metrics across all clusters into a shared storage bucket. One cluster is designated as the primary cluster from which you view the UI, with all other clusters considered secondary. Attempting to view the UI through a secondary cluster will not display metrics across your entire environment.
It is recommended to complete the steps above for your primary cluster before adding any secondary clusters. To learn more about advanced multi-cluster/Federated configurations, see our Multi-Cluster doc.
After completing these primary steps, you are well on your way to being proficient in Kubecost. However, managing Kubernetes infrastructure can be complicated, and for that we have plenty more documentation to help. For advanced or optional configuration options, see our Next Steps with Kubecost guide which will introduce you to additional concepts.
Kubecost requires a Kubernetes cluster to be deployed.
Users should be running Kubernetes 1.20+.
Kubernetes 1.28 is officially supported as of v1.105.
Versions outside of the stated compatibility range may work, depending on individual configurations, but are untested.
Managed Kubernetes clusters (e.g. EKS, GKE, AKS) most common
Kubernetes distributions (e.g. OpenShift, DigitalOcean, Rancher, Tanzu)
Bootstrapped Kubernetes cluster
On-prem and air-gapped using custom pricing sheets
AWS (Amazon Web Services)
All regions supported, as shown in opencost/pkg/cloud/awsprovider.go
x86, ARM
GCP (Google Cloud Platform)
All regions supported, as shown in opencost/pkg/cloud/gcpprovider.go
x86
Azure (Microsoft)
All regions supported, as shown in opencost/pkg/cloud/azureprovider.go
x86
This list is certainly not exhaustive! This is simply a list of observations as to where our users run Kubecost based on their questions and feedback. Please contact us with any questions!
Once you have familiarized yourself with Kubecost and integrated with any cloud providers, it's time to move on to more advanced concepts. This doc provides commonly used product configurations and feature overviews to help get you up and running after the Kubecost product has been installed. You may be redirected to other Kubecost docs to learn more about specific concepts or follow tutorials.
The default Kubecost installation has a 32GB persistent volume and a 15-day retention period for Prometheus metrics. This is enough space to retain data for roughly 300 pods, depending on your exact node and container count. See the Kubecost Helm chart configuration options to adjust both the retention period and storage size.
To determine the appropriate disk size, you can use this formula to approximate:
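A commonly used approximation, taken from Prometheus operational guidance, is: needed_disk_space = retention_time_seconds × ingested_samples_per_second × bytes_per_sample.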
Where ingested samples can be measured as the average over a recent period, e.g. sum(avg_over_time(scrape_samples_post_metric_relabeling[24h])). On average, Prometheus uses around 1.5-2 bytes per sample. So, ingesting 100k samples per minute and retaining them for 15 days would demand around 40 GB. It's recommended to add another 20-30% capacity for headroom and WAL. More info on disk sizing here.
More than 30 days of data should not be stored in Prometheus for larger clusters. For long-term data retention, contact us at support@kubecost.com about Kubecost with durable storage enabled. More info on Kubecost storage here.
Users should set and/or update resource requests and limits before taking Kubecost into production at scale. These inputs can be configured in the Kubecost values.yaml for Kubecost modules and subcharts.
The exact recommended values for these parameters depend on the size of your cluster, availability requirements, and usage of the Kubecost product. Suggested values for each container can be found within Kubecost itself on the namespace page. More info on these recommendations is available here.
For best results, run Kubecost for up to seven days on a production cluster, then tune resource requests/limits based on resource consumption.
To broaden usage to other teams or departments within your Kubecost environment, basic security measures will usually be required. There are a number of options for protecting your workspace depending on your Kubecost product tier.
Establishing an ingress controller will allow for control of access for your workspace. Learn more about enabling external access in Kubecost with our Ingress Examples doc.
SSO/SAML/RBAC/OIDC are only officially supported on Kubecost Enterprise plans.
You can configure SSO and RBAC on a separate baseline deployment, which will not only shorten the deployment time of security features, but will also avoid unwanted access denial. This is helpful when using only one developer deployment. See our user management guides below:
For teams already running node exporter on the default port, our bundled node exporter may remain in a Pending state. You can optionally use an existing node exporter DaemonSet by setting the prometheus.nodeExporter.enabled and prometheus.serviceAccounts.nodeExporter.create Kubecost Helm chart config options to false. This requires your existing node exporter endpoint to be visible from the namespace where Kubecost is installed. More config options are shown here.
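Using the flag form, disabling the bundled node exporter might look like the following (the same --set flags work at install time):

```sh
helm upgrade kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --set prometheus.nodeExporter.enabled=false \
  --set prometheus.serviceAccounts.nodeExporter.create=false
```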
You may optionally pass the following Helm flags to install Kubecost and its bundled dependencies without any persistent volumes. However, any time the Prometheus server pod is restarted, all historical billing data will be lost unless Thanos or other long-term storage is enabled in the Kubecost product.
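A sketch of such an install is shown below; the persistence toggles used here are assumptions, so verify them against values.yaml for your chart version before relying on them:

```sh
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace \
  --set prometheus.server.persistentVolume.enabled=false \
  --set persistentVolume.enabled=false
```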
Efficiency and idle costs can teach you more about the cost-value of your Kubernetes spend by showing you how efficiently your resources are used. To learn more about pod resource efficiency and cluster idle costs, see Efficiency and Idle.
Often while using and configuring Kubecost, our documentation may ask you to pass certain Helm flag values. There are three different approaches for passing custom Helm values into your Kubecost product, which are explained in this doc. In these examples, we are updating the kubecostProductConfigs.productKey.key Helm value, which enables Kubecost Enterprise; however, these methods will work for all other Helm flags.

Method 1: --set command-line flags. For example, you can pass only the product key if that is all you need to configure.
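A sketch of that command, passing the product key directly at the command line:

```sh
helm upgrade --install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace \
  --set kubecostProductConfigs.productKey.key="<YOUR_PRODUCT_KEY>"
  # Depending on chart version, kubecostProductConfigs.productKey.enabled=true
  # may also be required (assumption; check values.yaml).
```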
Method 2: values file. Similar to Method 1, you can create a separate values file that contains only the parameters needed.
Your values.yaml should look like this:
Then run your install command:
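A sketch of that flow, creating a small values file with just the product key and passing it with -f:

```sh
cat > kubecost-key-values.yaml <<EOF
kubecostProductConfigs:
  productKey:
    key: "<YOUR_PRODUCT_KEY>"
EOF

helm upgrade --install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace \
  -f kubecost-key-values.yaml
```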
Enabling external access to the Kubecost product requires exposing access to port 9090 on the kubecost-cost-analyzer
pod. Exposing this endpoint will handle routing to Grafana as well. There are multiple ways to do this, including Ingress or Service definitions.
Please exercise caution when exposing Kubecost via an Ingress controller especially if there is no authentication in use. Consult your organization's internal recommendations.
Common samples below and others can be found on our .
The following example definitions use the NGINX .
Once an AWS Load Balancer (ALB) Controller is installed, you can use the following Ingress resource manifest pointed at the Kubecost cost-analyzer service:
This file contains the default Helm values that come with your Kubecost install. Taking this approach means you may need to sync with the repo to use the latest release. Be careful when applying certain Helm values related to your UI configuration to your secondary clusters. For more information, see this section in our Multi-Cluster doc about .
Here is a that uses a Kubernetes Secret.
When deploying Grafana on a non-root URL, you also need to update your grafana.ini to reflect this. More info can be found in .
Integration with cloud service providers (CSPs) via their respective billing APIs allows Kubecost to display out-of-cluster (OOC) costs (e.g. AWS S3, Google Cloud Storage, Azure Storage Account). Additionally, it allows Kubecost to reconcile Kubecost's in-cluster predictions with actual billing data to improve accuracy.
If you are using Kubecost Cloud, do not attempt to modify your install using information from this article. You need to consult Kubecost Cloud's specific cloud integration procedures which can be found here.
As indicated above, setting up a cloud integration with your CSP allows Kubecost to pull in additional billing data. The two processes that incorporate this information are reconciliation and CloudCost (formerly known as CloudUsage).
Reconciliation matches in-cluster assets with items found in the billing data pulled from the CSP. This allows Kubecost to display the most accurate depiction of your in-cluster spending. Additionally, the reconciliation process creates Network assets for in-cluster nodes based on the information in the billing data. The main drawback of this process is that CSPs have a 6 to 24-hour delay in releasing billing data, and reconciliation requires a complete day of cost data to reconcile with the in-cluster assets. This results in a 48-hour window between resource usage and reconciliation. If reconciliation is performed within this window, asset costs are deflated to the partially complete costs shown in the billing data.
Cost-based metrics are based on on-demand pricing unless there is definitive data from a CSP that the node is not on-demand. This way estimates are as accurate as possible. If a new reserved instance is provisioned or a node joins a savings plan:
Kubecost continues to emit on-demand pricing until the node is added to the cloud bill.
Once the node is added to the cloud bill, Kubecost starts emitting something closer to the actual price.
For the time period where Kubecost assumed the node was on-demand but it was actually reserved, reconciliation fixes the price in ETL.
The reconciled assets will inherit the labels from the corresponding items in the billing data. If there exist identical label keys between the original assets and those of the billing data items, the label value of the original asset will take precedence.
Visit Settings, then toggle on Highlight Unreconciled Costs, then select Save at the bottom of the page to apply changes. Now, when you visit your Allocations or Assets dashboards, the most recent 36 hours of data will display hatching to signify unreconciled costs.
As of v1.106 of Kubecost, CloudCost is enabled by default, and Cloud Usage is disabled. Upgrading Kubecost will not affect the UI or hinder performance relating to this.
CloudCost allows Kubecost to pull in OOC cloud spend from your CSP's billing data, including any services run by the CSP as well as compute resources. By labelling OOC costs, their value can be distributed to your Allocations data as external costs. This allows you to better understand the proportion of OOC cloud spend that your in-cluster usage depends on.
Your cloud billing data is reflected in the aggregate costs of Account, Provider, Invoice Entity, and Service. Aggregating and drilling down into any of these categories will provide a subset of the entire bill, based on the Helm value .values.cloudCost.topNItems, which defaults to 1,000. This subset is each day's top n items by cost. An optional label list can be used to include or exclude items to be pulled from the bill.

CloudCost items become available as soon as they appear in the billing data, subject to the 6 to 24-hour delay mentioned above, and are updated as the billing data becomes more complete.
You can view your existing cloud integrations and their success status in the Kubecost UI by visiting Settings, then scrolling to Cloud Integrations. To create a new integration or learn more about existing integrations, select View additional details to go to the Cloud Integrations page.
Here, you can view your integrations and filter by successful or failed integrations. For non-successful integrations, Kubecost will display a diagnostic error message in the Status column to contextualize steps toward successful integration.
Select an individual integration to view a side panel that contains the most recent run, next run, refresh rate, and an exportable YAML of Helm configs for its CSP's integration values.
You can add a new cloud integration by selecting Add Integration. For guides on how to set up an integration for a specific CSP, follow these links to helpful Kubecost documentation:
Select an existing cloud integration, then in the slide panel that appears, select Delete.
The Kubecost Helm chart provides values that can enable or disable each cloud process on the deployment once a cloud integration has been set up. Turning off either of these processes will disable all the benefits provided by them.
Often an integrated cloud account name may be a series of random letters and numbers which do not reflect the account's owner, team, or function. Kubecost allows you to rename cloud accounts to create more readable cloud metrics in your Kubecost UI. After you have successfully integrated your cloud account (see above), you need to manually edit your values.yaml and provide the original account name and your intended rename:
You will see these changes reflected in Kubecost's UI on the Overview page under Cloud Costs Breakdown. These example account IDs could benefit from being renamed:
The ETL contains a Map of Cloud Stores, each representing an integration with a CSP. Each Cloud Store is responsible for the Cloud Usage and reconciliation pipelines, which respectively add OOC costs and adjust Kubecost's estimated costs using cost and usage data pulled from the CSP. Each Cloud Store has a unique identifier called the ProviderKey, which varies depending on which CSP is being connected to and ensures that duplicate configurations are not introduced into the ETL. The value of the ProviderKey is the following for each CSP, at the scope at which the billing data is gathered:
AWS: Account Id
GCP: Project Id
Azure: Subscription Id
The ProviderKey can be used as an argument for the Cloud Usage and reconciliation repair API endpoints to indicate that the specified operation should only be performed on a single Cloud Store, rather than all of them, which is the default behavior. Additionally, the Cloud Store keeps track of the Cloud Connection Status for both Cloud Usage and reconciliation. The Cloud Connection Status is meant to be used as a tool for determining the health of the cloud connection that is the basis of each Cloud Store, and it has various failure states intended to provide actionable information on how to get your cloud connection running properly. These are the Cloud Connection Statuses:
INITIAL_STATUS: The zero value of the Cloud Connection Status, meaning the cloud connection is untested. Once the Cloud Connection Status has changed, it should not return to this value. This status is assigned to the Cloud Store on creation.
MISSING_CONFIGURATION: Kubecost has not detected any method of Cloud Configuration. This value is only possible on the first Cloud Store that is created as a wrapper for the open-source CSP. This status is assigned during failures in Configuration Retrieval.
INCOMPLETE_CONFIGURATION: Cloud Configuration is missing the required values to connect to the cloud provider. This status is assigned during failures in Configuration Retrieval.
FAILED_CONNECTION: All required Cloud Configuration values are filled in, but a connection with the CSP cannot be established. This is indicative of a typo in one of the Cloud Configuration values or an issue in how the connection was set up in the CSP's console. The assignment of this status varies between CSPs, but it should happen if an error is thrown when interacting with an object from the CSP's SDK.
MISSING_DATA: The Cloud Integration is properly configured, but the CSP is not returning billing/cost and usage data. This status is indicative of the billing/cost and usage data export of the CSP being incorrectly set up or the export being set up in the last 48 hours and not having started populating data yet. This status is set when a query has been successfully made but the results come back empty. If the CSP already has a SUCCESSFUL_CONNECTION status, then this status should not be set because this indicates that the specific query made may have been empty.
SUCCESSFUL_CONNECTION: The Cloud Integration is properly configured and returning data. This status is set on any successful query where data is returned
After starting or restarting Cloud Usage or reconciliation, two subprocesses are started: one which fills in historic data over the coverage of the Daily CloudUsage and Asset Store, and one which runs periodically on a predefined interval to collect and process new cost and usage data as it is made available by the CSP. The ETL's status endpoint contains a cloud object that provides information about each Cloud Store including the Cloud Connection Status and diagnostic information about Cloud Usage and Reconciliation. The diagnostic items on the Cloud Usage and Reconciliation are:
Coverage: The window of time that the historical subprocess has covered
LastRun: The last time that the process ran, updates each time the periodic subprocess runs
NextRun: Next scheduled run of the periodic subprocess
Progress: Ratio of Coverage to Total amount of time to be covered
RefreshRate: The interval that the periodic subprocess runs
Resolution: The window size of the process
StartTime: When the Cloud Process was started
For more information on APIs related to rebuilding and repairing Cloud Usage or reconciliation, see the CloudCost Diagnostic APIs doc.
Multi-cloud integrations are only officially supported on Kubecost Enterprise plans.
This document outlines how to set up cloud integration for accounts on multiple cloud service providers (CSPs), or for multiple accounts on the same cloud provider. This configuration can be used independently of, or in addition to, other cloud integration configurations provided by Kubecost. Once configured, Kubecost will display cloud assets for all configured accounts and perform reconciliation for all accounts that have been configured.
For each cloud account that you would like to configure, you will need to make sure that it is exporting cost data to its respective service to allow Kubecost to gain access to it.
Azure: Set up cost data export following this .
GCP: Set up BigQuery billing data exports with this .
AWS: Follow steps 1-3 to set up and configure a Cost and Usage Report (CUR) in our .
Alibaba: Create a user account with access to the .
The secret should contain a file named cloud-integration.json with the following format (only containing applicable CSPs in your setup):
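As a sketch, the file is a JSON object with one array per provider; the top-level key names below are assumptions, so confirm them against the sample files referenced above and include only the CSPs you use:

```sh
cat > cloud-integration.json <<'EOF'
{
  "aws": [],
  "azure": [],
  "gcp": [],
  "alibaba": []
}
EOF
```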
This method of cloud integration supports multiple configurations per cloud provider simply by adding each cost export to their respective arrays in the .json file. The structure and required values for the configuration objects for each cloud provider are described below. Once you have filled in the configuration object, use the command:
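A typical way to create that secret from the file (the secret name is your choice; the namespace is assumed to be kubecost):

```sh
kubectl create secret generic <SECRET_NAME> \
  --from-file=cloud-integration.json \
  --namespace kubecost
```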
Once the secret is created, set .Values.kubecostProductConfigs.cloudIntegrationSecret
to <SECRET_NAME>
and upgrade Kubecost via Helm.
The following values can be located in the Azure Portal under Cost Management > Exports, or Storage accounts:
azureSubscriptionID is the Subscription ID belonging to the Storage account which stores your exported Azure cost report data.
azureStorageAccount is the name of the Storage account where the exported Azure cost report data is being stored.
azureStorageAccessKey can be found by selecting Access Keys from the navigation sidebar then selecting Show keys. Using either of the two keys will work.
azureStorageContainer is the name that you chose for the exported cost report when you set it up. This is the name of the container where the CSV cost reports are saved in your Storage account.
azureContainerPath is an optional value which should be used if there is more than one billing report that is exported to the configured container. The path provided should have only one billing export because Kubecost will retrieve the most recent billing report for a given month found within the path.
azureCloud is an optional value which denotes the cloud where the storage account exists. Possible values are public and gov. The default is public.
Set these values into the following object and add them to the Azure array:
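A sketch of one such entry is shown below, using the field names described above as JSON keys. Treat the key names as assumptions and cross-check them against the sample files; the example overwrites cloud-integration.json with only an Azure array, so merge it with any other providers you are configuring:

```sh
cat > cloud-integration.json <<'EOF'
{
  "azure": [
    {
      "azureSubscriptionID": "<SUBSCRIPTION_ID>",
      "azureStorageAccount": "<STORAGE_ACCOUNT_NAME>",
      "azureStorageAccessKey": "<STORAGE_ACCESS_KEY>",
      "azureStorageContainer": "<EXPORT_CONTAINER_NAME>",
      "azureContainerPath": "",
      "azureCloud": "public"
    }
  ]
}
EOF
```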
If you don't already have a GCP service key for any of the projects you would like to configure, you can run the following commands in your command line to generate and export one. Make sure your GCP project is where your external costs are being run.
You can then get your service account key to paste into the UI:
<KEY_JSON> is the GCP service key created above. This value should be left as a JSON when inserted into the configuration object.
<PROJECT_ID> is the Project ID in the GCP service key.
<BILLING_DATA_DATASET> requires a BigQuery dataset prefix (e.g. billing_data) in addition to the BigQuery table name. A full example is billing_data.gcp_billing_export_v1_018AIF_74KD1D_534A2.
Set these values into the following object and add it to the GCP array:
Many of these values in this config can be generated using the following command:
Gather each of these values from the AWS console for each account you would like to configure.
<ACCESS_KEY_ID> is the ID of the Access Key created in the previous step.
<ACCESS_KEY_SECRET> is the secret of the Access Key created in the previous step.
<ATHENA_BUCKET_NAME> is the S3 bucket storing Athena query results which Kubecost has permission to access. The name of the bucket should match s3://aws-athena-query-results-*, so the IAM roles defined above will automatically allow access to it. The bucket can have a canned ACL set to Private or other permissions as needed.
<ATHENA_REGION> is the AWS region Athena is running in.
<ATHENA_DATABASE> is the name of the database created by the Athena setup. The Athena database name is available as the value (physical ID) of AWSCURDatabase in the CloudFormation stack created above.
<ATHENA_TABLE> is the name of the table created by the Athena setup. The table name is typically the database name with the leading athenacurcfn_ removed (but is not available as a CloudFormation stack resource).
<ATHENA_WORKGROUP> is the workgroup assigned to be used with Athena. The default value is Primary.
<ATHENA_PROJECT_ID> is the AWS Account ID where the Athena CUR is located. For example: 530337586277.
<MASTER_PAYER_ARN> is an optional value which should be set if you are using a multi-account billing setup and are not accessing Athena through the primary account. It should be set to the ARN of the role in the management (formerly master payer) account, for example: arn:aws:iam::530337586275:role/KubecostRole.
Set these values into the following object and add them to the AWS array in the cloud-integration.json:
Additionally set the kubecostProductConfigs.athenaProjectID
Helm value to the AWS account that Kubecost is being installed in.
Kubecost does not support complete integrations with Alibaba, but you will still be able to view accurate list prices for cloud resources. Gather these following values from the Alibaba Cloud Console for your account:
clusterRegion is the most used region.
accountID is your Alibaba account ID.
serviceKeyName is the RAM user key name.
serviceKeySecret is the RAM user secret.
Set these values into the following object and add them to the Alibaba array in your cloud-integration.json:
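A sketch of one such entry is shown below, using the field names above as JSON keys; the top-level array name and key names are assumptions, so confirm them against the sample files:

```sh
cat > cloud-integration.json <<'EOF'
{
  "alibaba": [
    {
      "clusterRegion": "<MOST_USED_REGION>",
      "accountID": "<ALIBABA_ACCOUNT_ID>",
      "serviceKeyName": "<RAM_USER_KEY_NAME>",
      "serviceKeySecret": "<RAM_USER_SECRET>"
    }
  ]
}
EOF
```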
There are many ways to integrate your AWS Cost and Usage Report (CUR) with Kubecost. This tutorial is intended as the best-practice method for users whose environments meet the following assumptions:
Kubecost will run in a different account than the AWS Payer Account
The IAM permissions will utilize AWS IAM roles for service accounts (IRSA) to avoid shared secrets
The configuration of Kubecost will be done using a cloud-integration.json file, and not via Kubecost UI (following infrastructure as code practices)
If this is not an accurate description of your environment, see our doc for more options.
This guide is a one-time setup per AWS payer account and is typically one per organization. It can be automated, but may not be worth the effort given that it will not be needed again.
Kubecost supports multiple AWS payer accounts as well as multiple cloud providers from a single Kubecost primary cluster. For multiple payer accounts, create additional entries inside the array below.
Detail for multiple cloud provider setups is .
cloud-integration.json
iam-payer-account-cur-athena-glue-s3-access.json
iam-payer-account-trust-primary-account.json
iam-access-cur-in-payer-account.json
Begin by opening cloud-integration.json, which should look like this:
Update athenaWorkgroup to primary, then save the file and close it. The remaining values will be obtained during this tutorial.
For time granularity, select Daily.
Select the checkbox to enable Resource IDs in the report.
Select the checkbox to enable Athena integration with the report.
Select the checkbox to enable the JSON IAM policy to be applied to your bucket.
If this CUR data is only used by Kubecost, it is safe to expire or delete the objects after seven days of retention.
AWS may take up to 24 hours to publish data. Wait until this is complete before continuing to the next step.
While you wait, update the following configuration files:
Update your cloud-integration.json file by providing a projectID value, which will be the AWS payer account number where the CUR is located and where the Kubecost primary cluster is running.
Update your iam-payer-account-cur-athena-glue-s3-access.json file by replacing all instances of CUR_BUCKET_NAME with the name of the bucket you created for CUR data.
Your S3 path prefix can be found by going to your AWS Cost and Usage Reports dashboard and selecting your bucket's report. In the Report details tab, you will find the S3 path prefix.
Once Athena is set up with the CUR, you will need to create a new S3 bucket for Athena query results. The bucket used for the CUR cannot be used for the Athena output.
Select Create bucket. The Create Bucket page opens.
Provide a name for your bucket. This is the value for athenaBucketName
in your cloud-integration.json file. Use the same region used for the CUR bucket.
Select Create bucket at the bottom of the page.
Select Settings, then select Manage. The Manage settings window opens.
Set Location of query result to the S3 bucket you just created, then select Save.
Navigate to Athena in the AWS Console. Be sure the region matches the one used in the steps above. Update your cloud-integration.json file with the following values. Use the screenshots below for help.
athenaBucketName: the name of the Athena bucket you created in this step
athenaDatabase: the value in the Database dropdown
athenaRegion: the AWS region value where your Athena query is configured
athenaTable: the partitioned value found in the Table list
For Athena query results written to an S3 bucket only accessed by Kubecost, it is safe to expire or delete the objects after one day of retention.
From the AWS payer account
In iam-payer-account-cur-athena-glue-s3-access.json, replace all ATHENA_RESULTS_BUCKET_NAME instances with your Athena S3 bucket name (the default will look like aws-athena-query-results-xxxx).
In iam-payer-account-trust-primary-account.json, replace SUB_ACCOUNT_222222222 with the account number of the account where the Kubecost primary cluster will run.
In the same location as your downloaded configuration files, run the following command to create the appropriate policy (jq is not required):
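A hedged sketch of that step with the AWS CLI is shown below. The policy name is an assumption, and KubecostRole matches the example role name used elsewhere in this doc; adapt both to your own naming conventions:

```sh
# Create the CUR/Athena/Glue/S3 access policy in the payer account:
aws iam create-policy \
  --policy-name kubecost-cur-athena-glue-s3-access \
  --policy-document file://iam-payer-account-cur-athena-glue-s3-access.json

# Create the role the Kubecost sub account will assume, with the trust policy:
aws iam create-role \
  --role-name KubecostRole \
  --assume-role-policy-document file://iam-payer-account-trust-primary-account.json

# Attach the access policy to the role (use the ARN returned by create-policy):
aws iam attach-role-policy \
  --role-name KubecostRole \
  --policy-arn arn:aws:iam::PAYER_ACCOUNT_11111111111:policy/kubecost-cur-athena-glue-s3-access
```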
Now we can obtain the last value, masterPayerARN, for cloud-integration.json as the ARN associated with the newly-created IAM role, as seen below in the AWS console:
By arriving at this step, you should have been able to provide all values to your cloud-integration.json file. If any values are missing, reread the tutorial and follow any steps needed to obtain those values.
From the AWS Account where the Kubecost primary cluster will run
In iam-access-cur-in-payer-account.json, update PAYER_ACCOUNT_11111111111 with the AWS account number of the payer account and create a policy allowing Kubecost to assumeRole in the payer account:
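For example (the policy name is an assumption):

```sh
aws iam create-policy \
  --policy-name kubecost-access-cur-in-payer-account \
  --policy-document file://iam-access-cur-in-payer-account.json
```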
Note the output ARN (used in the iamserviceaccount --attach-policy-arn below):
Create a namespace and set environment variables:
Enable the OIDC-Provider:
Create the Kubernetes service account, attaching the assumeRole policy. Replace SUB_ACCOUNT_222222222 with the AWS account number where the primary Kubecost cluster will run.
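A sketch using eksctl is shown below; the service account name and policy name are assumptions, and the policy ARN is the one noted in the previous step:

```sh
eksctl create iamserviceaccount \
  --cluster <YOUR_EKS_CLUSTER_NAME> \
  --namespace kubecost \
  --name kubecost-serviceaccount \
  --attach-policy-arn arn:aws:iam::SUB_ACCOUNT_222222222:policy/kubecost-access-cur-in-payer-account \
  --approve
```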
Create the secret (in this setup, there are no actual secrets in this file):
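For example, with cloud-integration used here as an assumed secret name:

```sh
kubectl create secret generic cloud-integration \
  --from-file=cloud-integration.json \
  --namespace kubecost
```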
Install Kubecost using the service account and cloud-integration secret:
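A sketch of that install, reusing the service account created by eksctl and pointing Kubecost at the secret (names match the assumptions above):

```sh
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --set serviceAccount.create=false \
  --set serviceAccount.name=kubecost-serviceaccount \
  --set kubecostProductConfigs.cloudIntegrationSecret=cloud-integration
```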
It can take over an hour to process the billing data for large AWS accounts. In the short term, follow the logs and look for a message similar to (7.7 complete), which should grow gradually to (100.0 complete). Some errors (ERR) are expected, as seen below.
Integrating Kubecost with your AWS data provides the ability to allocate out-of-cluster (OOC) costs, e.g. RDS instances and S3 buckets, back to Kubernetes concepts like namespace and deployment as well as reconcile cluster assets back to your billing data. The latter is especially helpful when teams are using Reserved Instances, Savings Plans, or Enterprise Discounts. All billing data remains on your cluster when using this functionality and is not shared externally. Read our doc for more information on how Kubecost connects with Cloud Service Providers.
The following guide provides the steps required for enabling OOC costs allocation and accurate pricing, e.g. reserved instance price allocation. In a multi-account organization, all of the following steps will need to be completed in the payer account.
You can learn how to perform this using our doc.
Kubecost utilizes AWS tagging to allocate the costs of AWS resources outside of the Kubernetes cluster to specific Kubernetes concepts, such as namespaces, pods, etc. These costs are then shown in a unified dashboard within the Kubecost interface.
To allocate external AWS resources to a Kubernetes concept, use the following tag naming scheme:
Kubernetes Concept | AWS Tag Key | AWS Tag Value |
---|---|---|
Cluster | kubernetes_cluster | cluster-name |
Namespace | kubernetes_namespace | namespace-name |
Deployment | kubernetes_deployment | deployment-name |
Label | kubernetes_label_NAME* | label-value |
DaemonSet | kubernetes_daemonset | daemonset-name |
Pod | kubernetes_pod | pod-name |
Container | kubernetes_container | container-name |
In the kubernetes_label_NAME tag key, the NAME portion should appear exactly as the tag appears inside of Kubernetes. For example, for the tag app.kubernetes.io/name, this tag key would appear as kubernetes_label_app.kubernetes.io/name.
To use an alternative or existing AWS tag schema, you may supply these in your values.yaml under kubecostProductConfigs.labelMappingConfigs.<aggregation>_external_label. Also be sure to set kubecostProductConfigs.labelMappingConfigs.enabled=true.
Tags may take several hours to show up in the Cost Allocations Tags section described in the next step.
Tags that contain : in the key may be converted to _ in the Kubecost UI due to Prometheus readability. To use AWS Label Mapping Configs, use this mapping format:
In order to make the custom Kubecost AWS tags appear on the CURs, and therefore in Kubecost, individual cost allocation tags must be enabled. Details on which tags to enable can be found in Step 2.
Account-level tags are applied (as labels) to all the Assets built from resources defined under a given AWS account. You can filter AWS resources in the Kubecost Assets View (or API) by account-level tags by adding them ('tag:value') in the Label/Tag filter.
If a resource has a label with the same name as an account-level tag, the resource label value will take precedence.
Modifications incurred on account-level tags may take several hours to update on Kubecost.
Your AWS account will need the organizations:ListAccounts and organizations:ListTagsForResource permissions to benefit from this feature.
In the Kubecost UI, view the Allocations dashboard. If external costs are not shown, open your browser's Developer Tools > Console to see any reported errors.
Query Athena directly to ensure data is available. Note: it can take up to 6 hours for data to be written.
Finally, review pod logs from the cost-model container in the cost-analyzer pod and look for auth errors or Athena query results.
Kubecost uses public pricing from Cloud Service Providers (CSPs) to calculate costs until the actual cloud bill is available, at which point Kubecost will reconcile your Spot prices from your Cost and Usage Report (CUR). This is almost always ready within 48 hours. Most users will likely prefer to configure cloud billing integration instead of configuring the Spot data feed manually as demonstrated in this article.
However, if the majority of costs are due to Spot nodes, it may be useful to configure the Spot pricing data feed as it will increase accuracy for short-term (<48 hour) node costs until the Spot prices from the CUR are available. Note that all other (non-Spot) costs will still be based on public (on-demand) pricing until CUR billing data is reconciled.
With Kubecost, Spot pricing data can be pulled hourly by integrating directly with the AWS Spot feed.
First, to enable the AWS Spot data feed, follow AWS' doc.
While configuring, note the settings used as these values will be needed for the Kubecost configuration.
There are multiple options: this can either be set from the Kubecost UI or via .Values.kubecostProductConfigs in the Helm chart. If you set any kubecostProductConfigs from the Helm chart, all changes via the front end will be deleted on pod restart.
projectID: the Account ID of the AWS Account on which the Spot nodes are running.
awsSpotDataRegion: the region of your Spot data bucket.
awsSpotDataBucket: the configured bucket for the Spot data feed.
awsSpotDataPrefix: the optional configured prefix for your Spot data feed bucket.
spotLabel: an optional Kubernetes node label name designating whether a node is a Spot node. Used to provide pricing estimates until exact Spot data becomes available from the CUR.
spotLabelValue: an optional Kubernetes node label value designating a Spot node. Used to provide pricing estimates until exact Spot data becomes available from the CUR. For example, if your Spot nodes carry a label lifecycle:spot, then the spotLabel would be lifecycle and the spotLabelValue would be spot.
In the UI, you can access these fields via the Settings page, then scroll to Cloud Cost Settings. Next to Spot Instance Configuration, select Update, then fill out all fields.
Spot data feeds are an account level setting, not a payer level. Every AWS Account will have its own Spot data feed. Spot data feed is not currently available in AWS GovCloud.
For Spot data written to an S3 bucket only accessed by Kubecost, it is safe to delete objects after three days of retention.
Kubecost requires read access to the Spot data feed bucket. The following IAM policy can be used to grant Kubecost read access to the Spot data feed bucket.
To attach the IAM policy to the Kubecost service account, you can use IRSA or the account's service key.
If your serviceaccount/kubecost-cost-analyzer already has IRSA annotations attached, be sure to include all policies necessary when running this command.
Create a service-key.json as shown:
Create a K8s secret:
Set the following Helm config:
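Using the value names described above, the Helm configuration might look like this sketch (shown as --set flags; the same keys can live under kubecostProductConfigs in values.yaml):

```sh
helm upgrade kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --set kubecostProductConfigs.projectID="<AWS_ACCOUNT_ID>" \
  --set kubecostProductConfigs.awsSpotDataRegion="<SPOT_BUCKET_REGION>" \
  --set kubecostProductConfigs.awsSpotDataBucket="<SPOT_BUCKET_NAME>" \
  --set kubecostProductConfigs.awsSpotDataPrefix="<OPTIONAL_PREFIX>"
```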
Verify the below points:
Make sure data is present in the Spot data feed bucket.
Make sure the Project ID is configured correctly. You can cross-verify the values under Helm values in a bug report.
Check the value of kubecost_node_is_spot in Prometheus:
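For example, you can port-forward to the bundled Prometheus and run the query named above; the service name below is the default for the bundled Prometheus when the release is named kubecost and is an assumption:

```sh
kubectl port-forward --namespace kubecost svc/kubecost-prometheus-server 9003:80 &
# PromQL query: kubecost_node_is_spot
curl -G 'http://localhost:9003/api/v1/query' --data-urlencode 'query=kubecost_node_is_spot'
```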
"1" means Spot data instance configuration is correct.
"0" means not configured properly.
Is there a prefix? If so, is it configured in Kubecost?
Make sure the IAM permissions are aligned with https://github.com/kubecost/cloudformation/blob/7feace26637aa2ece1481fda394927ef8e1e3cad/kubecost-single-account-permissions.yaml#L36
Make sure Kubecost has the permissions it needs to access the Spot data feed bucket
The Spot Instance in the Spot data feed bucket should match the instance in the cluster where the Spot data feed is configured. awsSpotDataBucket has to be present in the right cluster.
A GitHub repository with sample files required can be found . Select the folder with the name of the cloud service you are configuring.
For each AWS account that you would like to configure, create an Access Key for the Kubecost user who has access to the CUR. Navigate to , and select Access Management > Users. Find the Kubecost user and select Security Credentials > Create Access Key. Note the Access Key ID and Secret access key.
To begin, download the recommended configuration template files from our . You will need the following files from this folder:
Follow the to create a CUR export using the settings below.
As part of the CUR creation process, Amazon creates a CloudFormation template that is used to create the Athena integration. It is created in the CUR S3 bucket under s3-path-prefix/cur-name
and typically has the filename crawler-cfn.yml. This .yml is your CloudFormation template. You will need it in order to complete the CUR Athena integration. You can read more about this .
Navigate to the .
Navigate to the dashboard.
For help with troubleshooting, see the section in our original .
For more information, consult AWS' .
To view examples of common label mapping configs, see .
For instructions on enabling user-defined cost allocation tags, consult AWS'
You may need to upgrade your AWS Glue if you are running an old version. See for more info.
Value | Default | Description |
---|---|---|
.Values.kubecostModel.etlAssetReconciliationEnabled | true | Enables reconciliation processes and endpoints. This Helm value corresponds to the ETL_ASSET_RECONCILIATION_ENABLED environment variable. |
.Values.kubecostModel.etlCloudUsage | true | Enables Cloud Usage processes and endpoints. This Helm value corresponds to the ETL_CLOUD_USAGE_ENABLED environment variable. |
.Values.kubecostModel.etlCloudRefreshRateHours | 6 | The interval at which the run loop executes for both reconciliation and Cloud Usage. Reducing this value will decrease resource usage and billing data access costs, but will result in a larger delay in the most current data being displayed. This Helm value corresponds to the ETL_CLOUD_REFRESH_RATE_HOURS environment variable. |
.Values.kubecostModel.etlCloudQueryWindowDays | 7 | The maximum number of days that will be queried from a cloud integration in a single query. Reducing this value can help to reduce memory usage during the build process, but will also result in more queries, which can drive up billing data access costs. This Helm value corresponds to the ETL_CLOUD_QUERY_WINDOW_DAYS environment variable. |
.Values.kubecostModel.etlCloudRunWindowDays | 3 | The number of days into the past each run loop will query. Reducing this value will reduce memory load; however, it can cause Kubecost to miss updates to the CUR, in which case the affected day will need to be manually repaired. This Helm value corresponds to the ETL_CLOUD_RUN_WINDOW_DAYS environment variable. |
Kubecost needs access to the Microsoft Azure Billing Rate Card API to access accurate pricing data for your Kubernetes resources.
You can also get this functionality plus external costs by completing the full Azure billing integration.
Start by creating an Azure role definition. Below is an example definition, replace YOUR_SUBSCRIPTION_ID
with the Subscription ID where your Kubernetes cluster lives:
Save this into a file called myrole.json.
Next, you'll want to register that role with Azure:
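Assuming the role definition was saved as myrole.json, registering it with the Azure CLI looks roughly like this:

```sh
az role definition create --role-definition @myrole.json
```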
Next, create an Azure service principal.
Keep this information which is used in the service-key.json below.
Create a file called service-key.json and update it with the Service Principal details from the above steps:
Next, create a Secret for the Azure Service Principal
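For example, using azure-service-key as the secret name (the name referenced in the examples below):

```sh
kubectl create secret generic azure-service-key \
  --from-file=service-key.json \
  --namespace kubecost
```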
When managing the service account key as a Kubernetes Secret, the secret must reference the service account key JSON file, and that file must be named service-key.json.
Finally, set the kubecostProductConfigs.serviceKeySecretName
Helm value to the name of the Kubernetes secret you created. We use the value azure-service-key
in our examples.
In the Helm values file:
Or at the command line:
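Both forms are sketched below:

```sh
# values file form (then pass with -f during install/upgrade):
cat > azure-ratecard-values.yaml <<EOF
kubecostProductConfigs:
  serviceKeySecretName: azure-service-key
EOF

# or directly at the command line:
helm upgrade kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --set kubecostProductConfigs.serviceKeySecretName=azure-service-key
```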
Kubecost supports querying the Azure APIs for cost data based on the region, offer durable ID, and currency defined in your Microsoft Azure offer.
Those properties are configured with the following Helm values:
kubecostProductConfigs.azureBillingRegion
kubecostProductConfigs.azureOfferDurableID
kubecostProductConfigs.currencyCode
Be sure to verify your billing information with Microsoft and update the above Helm values to reflect your bill to country, subscription offer durable ID/number, and currency.
The following Microsoft documents are a helpful reference:
Kubecost provides the ability to allocate out-of-cluster (OOC) costs, e.g. Cloud SQL instances and Cloud Storage buckets, back to Kubernetes concepts like namespaces and deployments.
Read the Cloud Billing Integrations doc for more information on how Kubecost connects with cloud service providers.
The following guide provides the steps required for allocating OOC costs in GCP.
A GitHub repository with sample files used in the below instructions can be found here.
Begin by reviewing Google's documentation on exporting cloud billing data to BigQuery.
GCP users must create a detailed billing export to gain access to all Kubecost CloudCost features including reconciliation. Exports of type "Standard usage cost data" and "Pricing Data" do not have the correct information to support CloudCosts.
If you are using the alternative multi-cloud integration method, Step 2 is not required.
If your Big Query dataset is in a different project than the one where Kubecost is installed, please see the section on Cross-Project Service Accounts.
Add a service account key to allocate OOC resources (e.g. storage buckets and managed databases) back to their Kubernetes owners. The service account needs the following:
If you don't already have a GCP service account with the appropriate rights, you can run the following commands in your command line to generate and export one. Make sure your GCP project is where your external costs are being run.
After creating the GCP service account, you can connect it to Kubecost in one of two ways before configuring:
You can set up an IAM policy binding to bind a Kubernetes service account to your GCP service account as seen below, where:
NAMESPACE is the namespace Kubecost is installed into
KSA_NAME is the name of the service account attributed to the Kubecost deployment
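A sketch of that binding with gcloud is shown below, where GSA_NAME and PROJECT_ID stand for the GCP service account and project used above (all names are placeholders):

```sh
# Allow the Kubernetes service account to impersonate the GCP service account:
gcloud iam service-accounts add-iam-policy-binding \
  GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"

# Annotate the Kubernetes service account with the GCP service account it maps to:
kubectl annotate serviceaccount KSA_NAME \
  --namespace NAMESPACE \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
```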
You will also need to enable the IAM Service Account Credentials API in the GCP project.
Create a service account key:
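For example (the service account name is a placeholder; the file name matches what Kubecost expects later in this doc):

```sh
gcloud iam service-accounts keys create ./compute-viewer-kubecost-key.json \
  --iam-account <GCP_SERVICE_ACCOUNT_NAME>@<PROJECT_ID>.iam.gserviceaccount.com
```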
Once the GCP service account has been connected, set up the remaining configuration parameters.
You're almost done. Now it's time to configure Kubecost to finalize your connectivity.
It is recommended to provide the GCP details in your values.yaml to ensure they are retained during an upgrade or redeploy. First, set the following configs:
If you've connected using Workload Identity Federation, add these configs:
Otherwise, if you've connected using a service account key, create a secret for the GCP service account key you've created and add the following configs:
When managing the service account key as a Kubernetes secret, the secret must reference the service account key JSON file, and that file must be named compute-viewer-kubecost-key.json.
In Kubecost, select Settings from the left navigation, and under Cloud Integrations, select Add Cloud Integration > GCP, then provide the relevant information in the GCP Billing Data Export Configuration window:
GCP Service Key: Optional field. If you've created a service account key, copy the contents of the compute-viewer-kubecost-key.json file and paste them here. If you've connected using Workload Identity federation in Step 3, you should leave this box empty.
GCP Project Id: The ID of your GCP project.
GCP Billing Database: Requires a BigQuery dataset prefix (e.g. billing_data
) in addition to the BigQuery table name. A full example is billing_data.gcp_billing_export_resource_v1_XXXXXX_XXXXXX_XXXXX
Be careful when handling your service key! Ensure you have entered it correctly into Kubecost. Don't lose it or let it become publicly available.
You can now label assets with the following schema to allocate costs back to their appropriate Kubernetes owner. Learn more here on updating GCP asset labels.
To use an alternative or existing label schema for GCP cloud assets, you may supply these in your values.yaml under the kubecostProductConfigs.labelMappingConfigs.<aggregation>_external_label
.
Google generates special labels for GKE resources (e.g. "goog-gke-node", "goog-gke-volume"). Values with these labels are excluded from OOC costs because Kubecost already includes them as in-cluster assets. Thus, to make sure all cloud assets are included, we recommend installing Kubecost on each cluster where insights into costs are required.
Project-level labels are applied to all the Assets built from resources defined under a given GCP project. You can filter GCP resources in the Kubecost Cloud Costs Explorer (or API).
If a resource has a label with the same name as a project-level label, the resource label value will take precedence.
Modifications incurred on project-level labels may take several hours to update on Kubecost.
Due to organizational constraints, it is common that Kubecost must be run in a separate project from the project containing the billing data Big Query dataset, which is needed for Cloud Integration. Configuring Kubecost in this scenario is still possible, but some of the values in the above script will need to be changed. First, you will need the project id of the projects where Kubecost is installed, and the Big Query dataset is located. Additionally, you will need a GCP user with the permissions iam.serviceAccounts.setIamPolicy
for the Kubecost project and the ability to manage the roles listed above for the Big Query Project. With these, fill in the following script to set the relevant variables:
Once these values have been set, this script can be run and will create the service account needed for this configuration.
Now that your service account is created, follow the normal configuration instructions.
There are cases where labels applied at the account level do not show up in the date-partitioned data. If account-level labels are not showing up, you can switch to querying them unpartitioned by setting an extraEnv in Kubecost: GCP_ACCOUNT_LABELS_NOT_PARTITIONED: true. See here.
InvalidQuery 400 error for GCP integration: In cases where Kubecost does not detect a connection following GCP integration, revisit Step 1 and ensure you have enabled the detailed usage cost export, not the standard usage cost export. Kubecost uses detailed billing cost to display your OOC spend, and if it was not configured correctly during installation, you may receive errors about your integration.
Kubecost is capable of aggregating the costs of EC2 compute resources over a given timeframe with a specified duration step size. To achieve this, Kubecost uses Athena queries to gather usage data points with differing price models. The result of this process is a list of resources with their cost by timeframe.
The reconciliation process makes two queries to Athena, one to gather resources that are paid for with either the on-demand model or a savings plan and one query for resources on the reservation price model. The first query includes resources given at a blended rate, which could be on-demand usage or resources that have exceeded the limits of a savings plan. It will also include resources that are part of a savings plan which will have a savings plan effective cost. The second query only includes reserved resources and the cost which reflects the rate they were reserved at.
The queries make use of the following columns from Athena:
line_item_usage_start_date: The beginning timestamp of the line item usage. Used to filter resource usage within a date range and to aggregate on usage window.
line_item_usage_end_date: The ending timestamp of the line item usage. Used to filter resource usage within a date range and to aggregate on usage window.
line_item_resource_id: An ID, also called the provider ID, given to line items that are instantiated resources.
line_item_line_item_type: The type of a line item; used to determine if the resource usage is covered by a savings plan and has a discounted price.
line_item_usage_type: What is being used in a line item; for a compute resource, this is the type of VM and where it is running.
line_item_product_code: The service that a line item is from. Used to filter out items that are not from EC2.
reservation_reservation_a_r_n: The Amazon Resource Name for the reservation of the line item; the presence of this value is used to identify a resource as being part of a reservation plan.
line_item_unblended_cost: The undiscounted cost of a resource.
savings_plan_savings_plan_effective_cost: The cost of a resource discounted by a savings plan.
reservation_effective_cost: The cost of a resource discounted by a reservation.
This query is grouped by six columns:
line_item_usage_start_date
line_item_usage_end_date
line_item_resource_id
line_item_line_item_type
line_item_usage_type
line_item_product_code
The columns line_item_unblended_cost and savings_plan_savings_plan_effective_cost are summed on this grouping. Finally, the query filters out rows that are outside the given date range, rows with a missing line_item_resource_id, and rows with a line_item_product_code not equal to "AmazonEC2". The grouping has three important aspects: the timeframe of the line items, the resource as defined by the resource ID, and the usage type, which is later used to determine the proper cost of the resource as it was used. This means that line items are grouped according to the resource, the timeframe of the usage, and the rate at which the usage was charged.
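For illustration, a simplified query following the grouping and filters described above could look like the sketch below. This is not the exact query Kubecost issues; the workgroup, database, table, dates, and output location are all placeholders:

```bash
# Hedged sketch of the on-demand/savings plan query shape; names and dates are placeholders.
aws athena start-query-execution \
  --work-group primary \
  --query-execution-context Database=athenacurcfn_kubecost_cur \
  --result-configuration OutputLocation=s3://aws-athena-query-results-123456789012-us-east-1/ \
  --query-string "
    SELECT
      line_item_usage_start_date,
      line_item_usage_end_date,
      line_item_resource_id,
      line_item_line_item_type,
      line_item_usage_type,
      line_item_product_code,
      SUM(line_item_unblended_cost)                 AS unblended_cost,
      SUM(savings_plan_savings_plan_effective_cost) AS savings_plan_effective_cost
    FROM kubecost_cur
    WHERE line_item_usage_start_date >= timestamp '2024-01-01'
      AND line_item_usage_end_date   <= timestamp '2024-01-07'
      AND line_item_resource_id      <> ''
      AND line_item_product_code     =  'AmazonEC2'
    GROUP BY 1, 2, 3, 4, 5, 6"
```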
The reservation query is grouped on five columns:
line_item_usage_start_date
line_item_usage_end_date
reservation_reservation_a_r_n
line_item_resource_id
line_item_product_code
The query sums reservation_effective_cost on this grouping, filters by the date window, removes rows with a missing reservation_reservation_a_r_n, and removes line items with a line_item_product_code not equal to "AmazonEC2". This grouping is on resource ID by timeframe, removing all non-reservation line items.
The on-demand query is categorized into different resource types: compute, network, storage, and others. Network usage is identified by the presence of "byte" in the line_item_usage_type. Compute and storage are identified by the "i-" and "vol-" prefixes in line_item_resource_id, respectively. Non-compute values are removed from the results. Of the two costs aggregated by this query, the correct one to use is determined by the line_item_line_item_type: if it has a value of "SavingsPlanCoveredUsage", then savings_plan_savings_plan_effective_cost is used as the cost; otherwise, line_item_unblended_cost is used.
In the reservation query, all of the results are of the compute category and there is only the reservation_effective_cost
to use as a cost.
These results are then merged into one set, with the provider id used to associate the cost with other information about the resource.
There are several different ways to look at your node cost data. The default for the Cost Explorer is "Unblended", but it makes the most sense from an allocation perspective to use amortized rates. Be sure Amortized costs is selected when looking at cost data. Here's an example of how dramatically they can vary on our test cluster.
The t2.mediums here are covered by a savings plan. Unblended, the cost is only $0.06/day for two. When Amortized costs is selected, the price jumps to $1.50/day.
This should closely match our data on the Assets page, for days where we have adjustments come in from the pricing CUR.
Connecting your Azure account to Kubecost allows you to view Kubernetes metrics side-by-side with out-of-cluster (OOC) costs (e.g. Azure Database Services). Additionally, it allows Kubecost to reconcile measured Kubernetes spend with your actual Azure bill. This gives teams running Kubernetes a complete and accurate picture of costs. For more information, read Cloud Billing Integrations and this blog post.
To configure Kubecost's Azure Cloud Integration, you will need to set up daily exports of cost reports to Azure storage. Kubecost will then access your cost reports through the Azure Storage API to display your OOC cost data alongside your in-cluster costs.
A GitHub repository with sample files used in the instructions below can be found here.
Follow Azure's Create and Manage Exported Data tutorial to export cost reports. For Metric, make sure you select Amortized cost (Usage and Purchases). For Export type, make sure you select Daily export of month-to-date costs. Do not select File Partitioning. Also, take note of the Account name and Container specified when choosing where to export the data to. Note that a successful cost export will require Microsoft.CostManagementExports
to be registered in your subscription.
Alternatively, you can follow this Kubecost guide.
It will take a few hours to generate the first report, after which Kubecost can use the Azure Storage API to pull that data.
Once the cost export has successfully executed, verify that a non-empty CSV file has been created at this path: <STORAGE_ACCOUNT>/<CONTAINER_NAME>/<OPTIONAL_CONTAINER_PATH>/<COST_EXPORT_NAME>/<DATE_RANGE>/<CSV_FILE>
.
If you have sensitive data in an existing Azure Storage account, it is recommended to create a separate Azure Storage account to store your cost data export.
For more granular billing data it is possible to scope Azure cost exports to resource groups, management groups, departments, or enrollments. AKS clusters will create their own resource groups which can be used. This functionality can then be combined with Kubecost multi-cloud to ingest multiple scoped billing exports.
Obtain the following values from Azure to provide to Kubecost. These values can be located in the Azure Portal by selecting Storage Accounts, then selecting your specific Storage account for details.
azureSubscriptionID
is the "Subscription ID" belonging to the Storage account which stores your exported Azure cost report data.
azureStorageAccount
is the name of the Storage account where the exported Azure cost report data is being stored.
azureStorageAccessKey
can be found by selecting Access keys in your Storage account left navigation under "Security + networking". Using either of the two keys will work.
azureStorageContainer
is the name that you chose for the exported cost report when you set it up. This is the name of the container where the CSV cost reports are saved in your Storage account.
azureContainerPath
is an optional value which should be used if there is more than one billing report that is exported to the configured container. The path provided should have only one billing export because Kubecost will retrieve the most recent billing report for a given month found within the path.
azureCloud
is an optional value which denotes the cloud where the storage account exists; possible values are public
and gov
. The default is public
.
Next, create a JSON file which must be named cloud-integration.json with the following format:
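A hedged sketch of what this file might contain is shown below, assuming the multi-cloud format with a top-level azure list; the keys mirror the values described above and every value is a placeholder:

```bash
# Hedged sketch of cloud-integration.json; replace every value with your own.
cat <<'EOF' > cloud-integration.json
{
  "azure": [
    {
      "azureSubscriptionID": "00000000-0000-0000-0000-000000000000",
      "azureStorageAccount": "kubecostexportstorage",
      "azureStorageAccessKey": "<ACCESS_KEY>",
      "azureStorageContainer": "kubecostexport",
      "azureContainerPath": "",
      "azureCloud": "public"
    }
  ]
}
EOF
```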
Additional details about the cloud-integration.json
file can be found in our multi-cloud integration doc.
Next, create the Secret:
Next, ensure the following are set in your Helm values:
Next, upgrade Kubecost via Helm:
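A combined sketch of these three steps might look like the following; the secret name cloud-integration is only an example and simply has to match the Helm value that references it:

```bash
# Hedged sketch: secret name and file paths are examples.
kubectl create secret generic cloud-integration -n kubecost \
  --from-file=cloud-integration.json

cat <<'EOF' > azure-integration-values.yaml
kubecostProductConfigs:
  cloudIntegrationSecret: cloud-integration
EOF

helm upgrade kubecost kubecost/cost-analyzer -n kubecost \
  -f values.yaml -f azure-integration-values.yaml
```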
You can verify a successful configuration by checking the following in the Kubecost UI:
The Assets dashboard will be broken down by Kubernetes assets.
The Assets dashboard will no longer show a banner that says "External cloud cost not configured".
The Diagnostics page (via Settings > View Full Diagnostics) view will show a green checkmark under Cloud Integrations.
If there are no in-cluster costs for a particular day, then there will not be out-of-cluster costs either.
Kubecost utilizes Azure tagging to allocate the costs of Azure resources outside of the Kubernetes cluster to specific Kubernetes concepts, such as namespaces, pods, etc. These costs are then shown in a unified dashboard within the Kubecost interface.
To allocate external Azure resources to a Kubernetes concept, use the following tag naming scheme:
In the kubernetes_label_NAME
tag key, the NAME portion should appear exactly as the tag appears inside of Kubernetes. For example, for the tag app.kubernetes.io/name
, this tag key would appear as kubernetes_label_app.kubernetes.io/name.
To use an alternative or existing Azure tag schema, you may supply these in your values.yaml under the kubecostProductConfigs.labelMappingConfigs.<aggregation>_external_label
. Also be sure to set kubecostProductConfigs.labelMappingConfigs.enabled = true.
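A minimal sketch of this mapping is shown below; the namespace_external_label and cluster_external_label keys are examples of the <aggregation>_external_label pattern named above, so confirm the exact keys available in your chart version:

```bash
# Hedged sketch: aggregation key names follow the <aggregation>_external_label pattern described above.
cat <<'EOF' > label-mapping-values.yaml
kubecostProductConfigs:
  labelMappingConfigs:
    enabled: true
    namespace_external_label: kubernetes_namespace
    cluster_external_label: kubernetes_cluster
EOF
```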
For more details on what Azure resources support tagging, along with what resource type tags are available in cost reports, please review the official Microsoft documentation here.
To troubleshoot a configuration that is not yet working:
$ kubectl get secrets -n kubecost
to verify you've properly configured cloud-integration.json
.
$ helm get values kubecost
to verify you've properly set .Values.kubecostProductConfigs.cloudIntegrationSecret
Verify that a non-empty CSV file has been created at this path in your Azure Portal Storage Account: <STORAGE_ACCOUNT>/<CONTAINER_NAME>/<OPTIONAL_CONTAINER_PATH>/<COST_EXPORT_NAME>/<DATE_RANGE>/<CSV_FILE>
. Ensure new CSVs are being generated every day.
When opening a cost report CSV, ensure that there are rows in the file that do not have a MeterCategory of “Virtual Machines” or “Storage”, as these items are ignored because they are in-cluster costs. Additionally, make sure that there are items with a UsageDateTime that matches the date you are interested in.
When reviewing logs:
The following error is reflective of Kubecost's previous Azure Cloud Integration method and can be safely disregarded.
ERR Error, Failed to locate azure storage config file: /var/azure-storage-config/azure-storage-config.json
By default, Kubecost pulls on-demand asset prices from the public AWS pricing API. For more accurate pricing, this integration will allow Kubecost to reconcile your current measured Kubernetes spend with your actual AWS bill. This integration also properly accounts for Enterprise Discount Programs, Reserved Instance usage, Savings Plans, Spot usage, and more.
You will need permissions to create the Cost and Usage Report (CUR) and to add IAM credentials for Athena and S3. An optional permission is the ability to add and execute CloudFormation templates. Kubecost does not require root access in the AWS account.
This guide contains multiple possible methods for connecting Kubecost to AWS billing, based on user environment and preference. Because of this, there may not be a straightforward approach for new users. To address this, a streamlined guide containing best practices can be found here for IRSA environments. This quick start guide has some assumptions to carefully consider, and may not be applicable for all users. See prerequisites in the linked article.
Integrating your AWS account with Kubecost may be a complicated process if you aren’t deeply familiar with the AWS platform and how it interacts with Kubecost. This section provides an overview of some of the key terminology and AWS services that are involved in the process of integration.
Cost and Usage Report: AWS report which tracks cloud spending and writes to an Amazon Simple Storage Service (Amazon S3) bucket for ingestion and long term historical data. The CUR is originally formatted as a CSV, but when integrated with Athena, is converted to Parquet format.
Amazon Athena: Analytics service which queries the CUR S3 bucket for your AWS cloud spending, then outputs data to a separate S3 bucket. Kubecost uses Athena to query for the bill data to perform reconciliation. Athena is technically optional for AWS cloud integration, but without it, Kubecost will only provide unreconciled costs (on-demand public rates).
S3 bucket: Cloud object storage tool which both CURs and Athena output cost data to. Kubecost needs access to these buckets in order to read that data.
For the below guide, a GitHub repository with sample files can be found here.
Follow these steps to set up a Legacy CUR using the settings below.
Select the Legacy CUR export type.
For time granularity, select Daily.
Under 'Additional content', select the Enable resource IDs checkbox.
Under 'Report data integration' select the Amazon Athena checkbox.
For CUR data written to an S3 bucket only accessed by Kubecost, it is safe to expire or delete the objects after seven days of retention.
Remember the name of the bucket you create for CUR data. This will be used in Step 2.
Familiarize yourself with how column name restrictions differ between CURs and Athena tables. AWS may change your CUR name when you upload your CUR to your Athena table in Step 2, documented in AWS' Running Amazon Athena queries. As best practice, use all lowercase letters and only use _
as a special character.
AWS may take up to 24 hours to publish data. Wait until this is complete before continuing to the next step.
If you believe you have the correct permissions, but cannot access the Billing and Cost Management page, have the owner of your organization's root account follow these instructions.
As part of the CUR creation process, Amazon also creates a CloudFormation template that is used to create the Athena integration. It is created in the CUR S3 bucket, listed in the Objects tab in the path s3-path-prefix/cur-name
and typically has the filename crawler-cfn.yml
. This .yml is your necessary CloudFormation template. You will need it in order to complete the CUR Athena integration. For more information, see the AWS doc Setting up Athena using AWS CloudFormation templates.
Your S3 path prefix can be found by going to your AWS Cost and Usage Reports dashboard and selecting your newly-created CUR. In the 'Report details' tab, you will find the S3 path prefix.
Once Athena is set up with the CUR, you will need to create a new S3 bucket for Athena query results.
Navigate to the S3 Management Console.
Select Create bucket. The Create Bucket page opens.
Use the same region used for the CUR bucket and pick a name that follows the format aws-athena-query-results-.
Select Create bucket at the bottom of the page.
Navigate to the Amazon Athena dashboard.
Select Settings, then select Manage. The Manage settings window opens.
Set Location of query result to the S3 bucket you just created, which will look like s3://aws-athena-query-results..., then select Save.
For Athena query results written to an S3 bucket only accessed by Kubecost, it is safe to expire or delete the objects after 1 day of retention.
Kubecost offers a set of CloudFormation templates to help set your IAM roles up.
If you’re new to provisioning IAM roles, we suggest downloading our templates and using the CloudFormation wizard to set these up. You can learn how to do this in AWS' Creating a stack on the AWS CloudFormation console doc. Open the step below which represents your CUR and management account arrangement, download the .yaml file listed, and upload them as the stack template in the 'Creating a stack' > 'Selecting a stack template' step.
If you are using the alternative multi-cloud integration method, steps 4 and 5 are not required.
Now that the policies have been created, attach those policies to Kubecost. We support the following methods:
These values can either be set from the Kubecost UI or via .Values.kubecostProductConfigs
in the Helm chart. Values for all fields must be provided.
To add values in the Kubecost UI, select Settings from the left navigation, then scroll to Cloud Cost Settings. Select Update next to External Cloud Cost Configuration (AWS). The Billing Data Export Configuration window opens. Fill in all the below fields:
When you are done, select Update to confirm.
If you set any kubecostProductConfigs
from the Helm chart, all changes via the front end will be overridden on pod restart.
athenaProjectID
: The AWS AccountID where the Athena CUR is, likely your management account.
athenaBucketName
: An S3 bucket to store Athena query results that you’ve created that Kubecost has permission to access
The name of the bucket should match s3://aws-athena-query-results-*
, so the IAM roles defined above will automatically allow access to it
The bucket can have a Canned ACL of Private
or other permissions as you see fit.
athenaRegion
: The AWS region Athena is running in
athenaDatabase
: The name of the database created by the Athena setup
The athena database name is available as the value (physical id) of AWSCURDatabase
in the CloudFormation stack created above (in Step 2: Setting up Athena)
athenaTable
: the name of the table created by the Athena setup
The table name is typically the database name with the leading athenacurcfn_
removed (but is not available as a CloudFormation stack resource). Confirm the table name by visiting the Athena dashboard.
athenaWorkgroup
: The workgroup assigned to be used with Athena. If not specified, defaults to Primary
Make sure to use only underscore as a delimiter if needed for tables and views. Using a hyphen/dash will not work even though you might be able to create it. See the AWS docs for more info.
If you are using a multi-account setup, you will also need to set .Values.kubecostProductConfigs.masterPayerARN
to the Amazon Resource Number (ARN) of the role in the management account, e.g. arn:aws:iam::530337586275:role/KubecostRole
.
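Putting these together, a hedged sketch of the Helm configuration might look like the following; every value shown is a placeholder:

```bash
# Hedged sketch: all IDs, names, and regions below are placeholders.
cat <<'EOF' > athena-values.yaml
kubecostProductConfigs:
  athenaProjectID: "123456789012"
  athenaBucketName: "s3://aws-athena-query-results-123456789012-us-east-1"
  athenaRegion: us-east-1
  athenaDatabase: athenacurcfn_kubecost_cur
  athenaTable: kubecost_cur
  athenaWorkgroup: primary
  # Only needed for multi-account setups:
  # masterPayerARN: arn:aws:iam::123456789012:role/KubecostRole
EOF
helm upgrade kubecost kubecost/cost-analyzer -n kubecost -f values.yaml -f athena-values.yaml
```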
Once you've integrated with the CUR, you can visit Settings > View Full Diagnostics in the UI to determine if Kubecost has been successfully integrated with your CUR. If any problems are detected, you will see a yellow warning sign under the cloud provider permissions status header.
You can check pod logs for authentication errors by running: kubectl get pods -n <namespace>
kubectl logs <kubecost-pod-name> -n <namespace> -c cost-model
If you do not see any authentication errors, log in to your AWS console and visit the Athena dashboard. You should be able to find the CUR. Ensure that the database with the CUR matches the athenaTable entered in Step 5. It likely has a prefix with athenacurcfn_
:
You can also check query history to see if any queries are failing:
Symptom: A similar error to this will be shown on the Diagnostics page under Pricing Sources. You can search in the Athena "Recent queries" dashboard to find additional info about the error.
Resolution: This error is typically caused by the incorrect (Athena results) s3 bucket being specified in the CloudFormation template of Step 3 from above. To resolve the issue, ensure the bucket used for storing the AWS CUR report (Step 1) is specified in the S3ReadAccessToAwsBillingData
SID of the IAM policy (default: kubecost-athena-access) attached to the user or role used by Kubecost (Default: KubecostUser / KubecostRole). See the following example.
This error can also occur when the management account cross-account permissions are incorrect, however, the solution may differ.
Symptom: A similar error to this will be shown on the Diagnostics page under Pricing Sources.
Resolution: Please verify that the prefix s3://
was used when setting the athenaBucketName
Helm value or when configuring the bucket name in the Kubecost UI.
Symptom: A similar error to this will be shown on the Diagnostics page under Pricing Sources.
Resolution: While rare, this issue was caused by an Athena instance that failed to provision properly on AWS. The solution was to delete the Athena DB and deploy a new one. To verify this is needed, find the failed query ID in the Athena "Recent queries" dashboard and attempt to manually run the query.
Symptom: A similar error to this will be shown on the Diagnostics page under Pricing Sources.
Resolution: Previously, if you ran a query without specifying a value for query result location, and the query result location setting was not overridden by a workgroup, Athena created a default location for you. Now, before you can run an Athena query in a region in which your account hasn't used Athena previously, you must specify a query result location, or use a workgroup that overrides the query result location setting. While Athena no longer creates a default query results location for you, previously created default aws-athena-query-results-MyAcctID-MyRegion
locations remain valid and you can continue to use them. The bucket should be in the format of: aws-athena-query-results-MyAcctID-MyRegion
It may also be required to remove and reinstall Kubecost. If doing this, please remember to back up ETL files prior or contact support for additional assistance. See also this AWS doc on specifying a query result location.
Symptom: A similar error to this will be shown on the Diagnostics page under Pricing Sources or in the Kubecost cost-model
container logs.
Resolution: Verify in AWS' Cost and Usage Reports dashboard that the Resource IDs are enabled as "Report content" for the CUR created in Step 1. If the Resource IDs are not enabled, you will need to re-create the report (this will require redoing Steps 1 and 2 from this doc).
Symptom: A similar error to this will be shown on the Diagnostics page under Pricing Sources or in the Kubecost cost-model
container logs.
Resolution: Verify that s3://
was included in the bucket name when setting the .Values.kubecostProductConfigs.athenaBucketName
Helm value.
AWS services used here are:
Kubecost's cost-model
requires roughly 2 CPU and 10 GB of RAM per 50,000 pods monitored. The backing Prometheus database requires roughly 2 CPU and 25 GB per million metrics ingested per minute. You can pick the EC2 instances necessary to run Kubecost accordingly.
Kubecost can write its cache to disk. Roughly 32 GB per 100,000 pods monitored is sufficient. (Optional: our cache can exist in memory)
Cloudformation (Optional: manual IAM configuration or via Terraform is fine)
EKS (Optional: all K8s flavors are supported)
In order to create a Google service account for use with Thanos, navigate to the Google Cloud Platform home page and select IAM & Admin > Service Accounts.
From here, select the option Create Service Account.
Provide a service account name, ID, and description, then select Create and Continue.
You should now be at the Service account permissions (optional) page. Select the first Role dropdown and select Storage Object Creator. Select Add Another Role, then select Storage Object Viewer from the second dropdown. Select Continue.
You should now be prompted to allow specific accounts access to this service account. This should be based on specific internal needs and is not a requirement. You can leave this empty and select Done.
Once back to the Service accounts page, select the Actions icon > Manage keys. Then, select the Add Key dropdown and select Create new key. A Create private key window opens.
Select JSON as the Key type and select Create. This will download a JSON service account key entry for use with the Thanos object-store.yaml
mentioned in the initial setup step.
Certain features of Kubecost, including Savings Insights like Orphaned Resources and Reserved Instances, require access to the cluster's GCP account. This is usually indicated by a 403 error from Google APIs which is due to 'insufficient authentication scopes'. Viewing this error in the Kubecost UI will display the cause of the error as "ACCESS_TOKEN_SCOPE_INSUFFICIENT"
.
To obtain access to these features, follow this tutorial which will show you how to configure your Google IAM Service Account and Workload Identity for your application.
Go to your GCP Console and select APIs & Services > Credentials from the left navigation. Select + Create Credentials > API Key.
On the Credentials page, select the icon in the Actions column for your newly-created API key, then select Edit API key. The Edit API key page opens.
Under ‘API restrictions’, select Restrict key, then from the dropdown, select only Cloud Billing API. Select OK to confirm. Then select Save at the bottom of the page.
From here, consult Google Cloud's guide to perform the following steps:
Enable Workload Identity on an existing GCP cluster, or spin up a new cluster which will have Workload Identity enabled by default
Migrate any existing workloads to Workload Identity
Configure your applications to use Workload Identity
Create both a Kubernetes service account (KSA) and an IAM service account (GSA).
Annotate the KSA with the email of the GSA.
Update your pod spec to use the annotated KSA, and ensure all nodes on that workload use Workload Identity.
You can stop once you have modified your pod spec (before 'Verify the Workload Identity Setup'). You should now have a GCP cluster with Workload Identity enabled, and both a KSA and a GSA, which are connected via the role roles/iam.workloadIdentityUser
.
In the GCP Console, select IAM & Admin > IAM. Find your newly-created GSA and select the Edit Principal pencil icon. You will need to provide the following roles to this service account:
BigQuery Data Viewer
BigQuery Job User
BigQuery User
Compute Viewer
Service Account Token Creator
Select Save.
The following roles need to be added to your IAM service account:
roles/bigquery.user
roles/compute.viewer
roles/bigquery.dataViewer
roles/bigquery.jobUser
roles/iam.serviceAccountTokenCreator
Use this command to add each role individually to the GSA:
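A hedged sketch of adding these bindings is shown below; the project ID and GSA address are placeholders:

```bash
# Hedged sketch: loop over the roles listed above and bind each to the GSA.
GSA="kubecost-gsa@my-project.iam.gserviceaccount.com"
for ROLE in roles/bigquery.user roles/compute.viewer roles/bigquery.dataViewer \
            roles/bigquery.jobUser roles/iam.serviceAccountTokenCreator; do
  gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:${GSA}" \
    --role "${ROLE}"
done
```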
From here, restart the pod(s) to confirm your changes. You should now have access to all expected Kubecost functionality through your service account with Workload Identity.
Kubecost Free can now be installed on an unlimited number of individual clusters. Larger teams will benefit from using Kubecost Enterprise to better manage many clusters. See our documentation for more details.
In an Enterprise multi-cluster setup, the UI is accessed through a designated primary cluster. All other clusters in the environment send metrics to a central object-store with a lightweight agent (aka secondary clusters). The primary cluster is designated by setting the Helm flag .Values.federatedETL.primaryCluster=true
, which instructs this cluster to read from the combined
folder that was processed by the federator. This cluster will consume additional resources to run the Kubecost UI and backend.
As of Kubecost 1.108, agent health is monitored by a diagnostics deployment that collects information from the local cluster and sends it to an object-store. This data is then processed by the Primary cluster and accessed via the UI and API.
Because the UI is only accessible through the primary cluster, Helm flags related to UI display are not applied to secondary clusters.
This feature is only supported for Kubecost Enterprise.
There are two primary methods to aggregate all cluster information back to a single Kubecost UI:
Both methods allow for greater compute efficiency by running the most resource-intensive workloads on a single primary cluster.
For environments that already have a Prometheus instance, ETL Federation may be preferred because only a single Kubecost pod is required.
The below diagrams highlight the two architectures:
Kubecost ETL Federation (Preferred)
Kubecost Thanos Federation
This feature is only officially supported on Kubecost Enterprise plans.
Thanos is a tool to aggregate Prometheus metrics to a central object storage (S3 compatible) bucket. Thanos is implemented as a sidecar on the Prometheus pod on all clusters. Thanos Federation is one of two primary methods to aggregate all cluster information back to a single view as described in our article.
The preferred method for multi-cluster is ETL Federation. The configuration guide below is for Kubecost Thanos Federation, which may not scale as well as ETL Federation in large environments.
This guide will cover how to enable Thanos on your primary cluster, and on any additional secondary clusters.
Follow the steps in the Configuring Thanos doc to enable all required Thanos components on a Kubecost primary cluster, including the Prometheus sidecar.
For each additional cluster, only the Thanos sidecar is needed.
Consider the following Thanos recommendations for secondaries:
Ensure you provide a unique identifier for prometheus.server.global.external_labels.cluster_id
to have additional clusters be visible in the Kubecost product, e.g. cluster-two
.
cluster_id
can be replaced with another label (e.g. cluster
) by modifying .Values.kubecostModel.promClusterIDLabel.
Federated ETL Architecture is only officially supported on Kubecost Enterprise plans.
This doc provides recommendations to improve the stability and recoverability of your Kubecost data when deploying in a Federated ETL architecture.
Kubecost can rebuild its extract, transform, load (ETL) data using Prometheus metrics from each cluster. It is strongly recommended to retain local cluster Prometheus metrics that meet an organization's disaster recovery requirements.
For long term storage of Prometheus metrics, we recommend setting up a Thanos sidecar container to push Prometheus metrics to a cloud storage bucket.
Use your cloud service provider's bucket versioning feature to take frequent snapshots of the bucket holding your Kubecost data (ETL files and Prometheus metrics).
Aggregator is a new backend for Kubecost. It is used in a configuration without Thanos, replacing the component. Aggregator serves a critical subset of Kubecost APIs, but will eventually be the default model for Kubecost and serve all APIs. Currently, Aggregator supports all major monitoring and savings APIs, and also budgets and reporting.
Existing documentation for Kubecost APIs will use endpoints for non-Aggregator environments unless otherwise specified, but will still be compatible after configuring Aggregator.
Aggregator is designed to accommodate queries of large-scale datasets by improving API load times and reducing UI errors. It is not designed to introduce new functionality; it is meant to improve functionality at scale.
Aggregator is currently free for all Enterprise users to configure, and is always able to be rolled back.
Aggregator can only be configured in a Federated ETL environment
Must be using v1.107.0 of Kubecost or newer
Your values.yaml file must have set kubecostDeployment.queryServiceReplicas
to its default value 0
.
You must have your context set to your primary cluster. Kubecost Aggregator cannot be deployed on secondary clusters.
Select from one of the two templates below and save the content as federated-store.yaml. This will be your configuration template required to set up Aggregator.
The name of the .yaml file used to create the secret must be named federated-store.yaml or Aggregator will not start.
Basic configuration:
Advanced configuration (for larger deployments):
There is no baseline for what is considered a larger deployment, which will be dependent on load times in your Kubecost environment.
Once you’ve configured your federated-store.yaml, create a secret using the following command:
Finally, upgrade your existing Kubecost installation. This command will install Kubecost if it does not already exist:
Upgrading your existing Kubecost using your configured federated-store.yaml file above will reset all existing Helm values configured in your values.yaml. If you wish to preserve any of those changes, append your values.yaml by adding the contents of your federated-store.yaml file into it, then replacing federated-store.yaml
with values.yaml
in the upgrade command below:
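A hedged sketch of the secret creation and upgrade is shown below; the secret name federated-store and the release/chart names assume a standard kubecost/cost-analyzer installation named kubecost:

```bash
# Hedged sketch: adjust secret, release, and namespace names to your environment.
kubectl create secret generic federated-store -n kubecost \
  --from-file=federated-store.yaml

helm upgrade --install kubecost kubecost/cost-analyzer -n kubecost \
  -f federated-store.yaml
```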
When first enabled, the aggregator pod will ingest the last three years (if applicable) of ETL data from the federated-store. This may take several hours. Because the combined folder is ignored, the federator pod is not used here, but can still run if needed. You can run kubectl get pods
and ensure the aggregator
pod is running, but should still wait for all data to be ingested.
Federated ETL is only officially supported for Kubecost Enterprise plans.
Federated extract, transform, load (ETL) is one of two methods to aggregate all cluster information back to a single display described in our doc. Federated ETL gives teams the benefit of combining multiple Kubecost installations into one view without dependency on Thanos.
There are two primary advantages for using ETL Federation:
For environments that already have a Prometheus instance, Kubecost only requires a single pod per monitored cluster
Many solutions that aggregate Prometheus metrics (like Thanos), are often expensive to scale in large environments
This guide has specific detail on how ETL Federation works and its deployment options.
The federated ETL is composed of three types of clusters.
Federated Clusters: The clusters which are being federated (clusters whose data will be combined and viewable at the end of the federated ETL pipeline). These clusters upload their ETL files after they have built them to Federated Storage.
Federator Clusters: The cluster on which the Federator (see Other components below) is set to run within the core cost-analyzer container. This cluster combines the Federated Cluster data uploaded to federated storage into combined storage.
Primary Cluster: A cluster where you can see the total Federated data that was combined from your Federated Clusters. These clusters read from combined storage.
These cluster designations can overlap, in that some clusters may be several types at once. A cluster that is a Federated Cluster, Federator Cluster, and Primary Cluster will perform the following functions:
As a Federated Cluster, push local cluster cost data to be combined from its local ETL build pipeline.
As a Federator Cluster, run the Federator inside the cost-analyzer, which pulls this local cluster data from S3, combines them, then pushes them back to combined storage.
As a Primary Cluster, pull back this combined data from combined storage to serve it on Kubecost APIs and/or the Kubecost frontend.
The Storages referred to here are an S3 (or GCP/Azure equivalent) storage bucket which acts as remote storage for the Federated ETL Pipeline.
Federated Storage: A set of folders on paths <bucket>/federated/<cluster id>
which are essentially ETL backup data, holding a “copy” of Federated Cluster data. Federated Clusters push this data to Federated Storage to be combined by the Federator. Federated Clusters write this data, and the Federator reads this data.
Combined Storage: A folder on S3 on the path <bucket>/federated/combined
which holds one set of ETL data containing all the allocations/assets
in all the ETL data from Federated Storage. The Federator takes files from Federated Storage and combines them, adding a single set of combined ETL files to Combined Storage to be read by the Primary Cluster. The Federator writes this data, and the Primary Cluster reads this data.
The Federator: A component of the cost-model which is run on the Federator Cluster, which can be a Federated Cluster, a Primary Cluster, or neither. The Federator takes the ETL binaries from Federated Storage and merges them, adding them to Combined Storage.
Federated ETL: The pipeline containing the above components.
This diagram shows an example setup of the Federated ETL with:
Three pure Federated Clusters (not classified as any other cluster type): Cluster 1, Cluster 2, and Cluster 3
One Federator Cluster that is also a Federated Cluster: Cluster 4
One Primary Cluster that is also a Federated Cluster: Cluster 5
The result is 5 clusters federated together.
Ensure each federated cluster has a unique clusterName
and cluster_id
:
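A minimal sketch of these two values is shown below; cluster-one is a placeholder, and both values should be unique per cluster and kept in sync:

```bash
# Hedged sketch: keep clusterName and cluster_id in sync and unique per cluster.
cat <<'EOF' > cluster-name-values.yaml
kubecostProductConfigs:
  clusterName: cluster-one
prometheus:
  server:
    global:
      external_labels:
        cluster_id: cluster-one
EOF
```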
For any cluster in the pipeline (Federator, Federated, Primary, or any combination of the three), create a file federated-store.yaml with the same format used for Thanos/S3 backup.
Add a secret using that file: kubectl create secret generic <secret_name> -n kubecost --from-file=federated-store.yaml
. Then set .Values.kubecostModel.federatedStorageConfigSecret
to the kubernetes secret name.
For all clusters you want to federate together (i.e. see their data on the Primary Cluster), set .Values.federatedETL.federatedCluster
to true
. This cluster is now a Federated Cluster, and can also be a Federator or Primary Cluster.
For the cluster “hosting” the Federator, set .Values.federatedETL.federator.enabled
to true
. This cluster is now a Federator Cluster, and can also be a Federated or Primary Cluster.
Optional: If you have any Federated Clusters pushing to a store that you do not want a Federator Cluster to federate, add the cluster id under the Federator config section .Values.federatedETL.federator.clusters
. If this parameter is empty or not set, the Federator will take all ETL files in the /federated
directory and federate them automatically.
Multiple Federators federating from the same source will not break, but it’s not recommended.
In Kubecost, the Primary Cluster
serves the UI and API endpoints as well as reconciling cloud billing (cloud-integration).
For the cluster that will be the Primary Cluster, set .Values.federatedETL.primaryCluster
to true
. This cluster is now a Primary Cluster, and can also be a Federator or Federated Cluster.
Cloud-integration requires .Values.federatedETL.federator.primaryClusterID
set to the same value used for .Values.kubecostProductConfigs.clusterName
Important: If the Primary Cluster is also to be federated, please wait 2-3 hours for data to populate Federated Storage before setting a Federated Cluster to primary (i.e. set .Values.federatedETL.federatedCluster
to true
, then wait to set .Values.federatedETL.primaryCluster
to true
). This allows for maximum certainty of data consistency.
If you do not set this cluster to be federated as well as primary, you will not see local data for this cluster.
The Primary Cluster’s local ETL will be overwritten with combined federated data.
This can be undone by unsetting it as a Primary Cluster and rebuilding ETL.
Setting a Primary Cluster may result in a loss of the cluster’s local ETL data, so it is recommended to back up any filestore data that one would want to save to S3 before designating the cluster as primary.
Alternatively, a fresh Kubecost install can be used as a consumer of combined federated data by setting it as the Primary but not a Federated Cluster.
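Putting the flags from the steps above together, a hedged sketch of the values for a cluster acting as a Federated, Federator, and Primary Cluster at once might look like the following; the secret and cluster names are placeholders, and remember the note above about waiting before marking an existing federated cluster as primary:

```bash
# Hedged sketch: one cluster acting as Federated + Federator + Primary.
cat <<'EOF' > federated-etl-values.yaml
kubecostModel:
  federatedStorageConfigSecret: federated-store   # secret created from federated-store.yaml
federatedETL:
  federatedCluster: true        # push this cluster's ETL data to federated storage
  primaryCluster: true          # read combined data and serve the Kubecost UI/APIs
  federator:
    enabled: true               # run the Federator on this cluster
    primaryClusterID: cluster-one
kubecostProductConfigs:
  clusterName: cluster-one      # must match primaryClusterID for cloud-integration
EOF
```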
The Federated ETL should begin functioning. On any ETL action on a Federated Cluster (Load/Put into local ETL store) the Federated Clusters will add data to Federated Storage. The Federator will run 5 minutes after the Federator Cluster startup, and then every 30 minutes after that. The data is merged into the Combined Storage, where it can be read by the Primary.
To verify Federated Clusters are uploading their data correctly, check the container logs on a Federated Cluster. It should log federated uploads when ETL build steps run. The S3 bucket can also be checked to see if data is being written to the /federated/<cluster_id>
path.
To verify the Federator is functioning, check the container logs on the Federator Cluster. The S3 bucket can also be checked to verify that data is being written to /federated/combined
.
To verify the entire pipeline is working, either query Allocations/Assets
or view the respective views on the frontend. Multi-cluster data should appear after:
The Federator has run at least once.
There was data in the Federated Storage for the Federator to have combined.
If you are using an internal certificate authority (CA), follow this tutorial instead of the above Setup section.
Begin by creating a ConfigMap with the certificate provided by the CA on every agent, including the Federator and any federated clusters, and name the file kubecost-federator-certs.yaml.
Now run the following command, making sure you specify the location for the ConfigMap you created:
kubectl create cm kubecost-federator-certs --from-file=/path/to/kubecost-federator-certs.yaml
Mount the certificate on the Federator and any federated clusters by passing these Helm flags to your values.yaml/manifest:
Create a file federated-store.yaml, which will go on all clusters:
Now run the following command (omit kubectl create namespace kubecost
if your kubecost
namespace already exists, or this command will fail):
Follow the same verification steps described above.
Sample configurations for each cloud provider can be found in our repo.
You can configure the Thanos sidecar by following the Configuring Thanos doc or your existing Thanos setup. Additionally, ensure you configure the following:
An object-store.yaml secret, so the Thanos sidecar has permissions to read/write to the cloud storage bucket
A unique cluster_id, so Kubecost is able to distinguish which metric belongs to which cluster in the Thanos bucket.
Configure Prometheus alerting to get notified when you are losing metrics or when metrics deviate beyond a known standard.
Next, you will need to create an additional cloud-integration
secret. Follow this tutorial to generate your cloud-integration.json file, then run the following command:
Alternatively, the most common configurations can be found in our repo.
When using ETL Federation, there are several methods to recover Kubecost data in the event of data loss. See our doc for more details regarding these methods.
In the event of missing or inaccurate data, you may need to rebuild your ETL pipelines. This is a documented procedure. See the doc for information and troubleshooting steps.
| Kubernetes Concept | Azure Tag Key | Azure Tag Value |
|---|---|---|
| Cluster | kubernetes_cluster | cluster-name |
| Namespace | kubernetes_namespace | namespace-name |
| Deployment | kubernetes_deployment | deployment-name |
| Label | kubernetes_label_NAME* | label-value |
| DaemonSet | kubernetes_daemonset | daemonset-name |
| Pod | kubernetes_pod | pod-name |
| Container | kubernetes_container | container-name |
| Field | Description |
|---|---|
| Athena Region | The AWS region Athena is running in |
| Athena Database | The name of the database created by the Athena setup |
| Athena Tablename | The name of the table created by the Athena setup |
| Athena Result Bucket | An S3 bucket to store Athena query results that you’ve created that Kubecost has permission to access |
| AWS account ID | The AWS account ID where the Athena CUR is, likely your management account |
To use Azure Storage as a Thanos object store, you need to pre-create a storage account from the Azure portal or by using the Azure CLI. Follow the instructions from the Azure Storage Documentation.
Now create a YAML file named object-store.yaml
with the following format:
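A hedged sketch of the Thanos Azure object store format is shown below; the storage account, key, and container values are placeholders:

```bash
# Hedged sketch of object-store.yaml for Azure; replace all values with your own.
cat <<'EOF' > object-store.yaml
type: AZURE
config:
  storage_account: "kubecostthanosstorage"
  storage_account_key: "<STORAGE_ACCOUNT_KEY>"
  container: "thanos"
EOF
```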
Start by creating a new Google Cloud Storage bucket. The following example uses a bucket named thanos-bucket
. Next, download a service account JSON file from Google's service account manager (steps).
Now create a YAML file named object-store.yaml
in the following format, using your bucket name and service account details:
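A hedged sketch of the Thanos GCS object store format is shown below; the bucket name and the inlined service account JSON are placeholders taken from the key file you downloaded:

```bash
# Hedged sketch of object-store.yaml for GCS; paste your own service account key JSON.
cat <<'EOF' > object-store.yaml
type: GCS
config:
  bucket: "thanos-bucket"
  service_account: |-
    {
      "type": "service_account",
      "project_id": "my-project",
      "private_key_id": "<PRIVATE_KEY_ID>",
      "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
      "client_email": "thanos@my-project.iam.gserviceaccount.com",
      "client_id": "<CLIENT_ID>"
    }
EOF
```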
Note: Because this is a YAML file, it requires this specific indention.
Warning: Do not apply a retention policy to your Thanos bucket, as it will prevent Thanos compaction from completing.
Kubecost v1.67.0+ uses Thanos 0.15.0. If you're upgrading to Kubecost v1.67.0+ from an older version and using Thanos, with AWS S3 as your backing storage for Thanos, you'll need to make a small change to your Thanos Secret in order to bump the Thanos version to 0.15.0 before you upgrade Kubecost.
Thanos 0.15.0 has over 10x performance improvements, so this is recommended.
Your values-thanos.yaml needs to be updated to the new defaults here. The PR bumps the image version, adds the query-frontend component, and increases concurrency.
This is simplified if you're using our default values-thanos.yaml, which has the new configs already.
For the Thanos Secret you're using, the encrypt-sse
line needs to be removed. Everything else should stay the same.
For example, view this sample config:
The easiest way to do this is to delete the existing secret and upload a new one:
kubectl delete secret -n kubecost kubecost-thanos
Update your secret .YAML file as above, and save it as object-store.yaml.
kubectl create secret generic kubecost-thanos -n kubecost --from-file=./object-store.yaml
Once this is done, you're ready to upgrade!
Kubecost uses a shared storage bucket to store metrics from clusters, known as durable storage, in order to provide a single-pane-of-glass for viewing cost across many clusters. Multi-cluster is an enterprise feature of Kubecost.
There are multiple methods to provide Kubecost access to an S3 bucket. This guide has two examples:
Using a Kubernetes secret
Attaching an AWS Identity and Access Management (IAM) role to the service account used by Prometheus
Both methods require an S3 bucket. Our example bucket is named kc-thanos-store
.
This is a simple S3 bucket with all public access blocked. No other bucket configuration changes should be required.
Once created, add an IAM policy to access this bucket. See our AWS Thanos IAM Policy doc for instructions.
To use the Kubernetes secret method for allowing access, create a YAML file named object-store.yaml
with contents similar to the following example. See region to endpoint mappings here.
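A hedged sketch of this file is shown below; the bucket, endpoint, region, and keys are placeholders:

```bash
# Hedged sketch of object-store.yaml for S3 using static credentials.
cat <<'EOF' > object-store.yaml
type: S3
config:
  bucket: "kc-thanos-store"
  endpoint: "s3.us-east-1.amazonaws.com"
  region: "us-east-1"
  access_key: "<ACCESS_KEY_ID>"
  secret_key: "<SECRET_ACCESS_KEY>"
EOF
```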
Instead of using a secret key in a file, many users will prefer this method: attaching an IAM role to the service account used by the Thanos pods.
Attach the policy to the Thanos pods service accounts. Your object-store.yaml
should follow the format below when using this option, which does not contain the secret_key and access_key fields.
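A hedged sketch of the keyless variant is shown below; it is the same format as above with the credential fields removed:

```bash
# Hedged sketch of object-store.yaml for S3 when relying on an attached IAM role.
cat <<'EOF' > object-store.yaml
type: S3
config:
  bucket: "kc-thanos-store"
  endpoint: "s3.us-east-1.amazonaws.com"
  region: "us-east-1"
EOF
```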
Then, follow this AWS guide to enable attaching IAM roles to pods.
You can define the IAM role to associate with a service account in your cluster by creating a service account in the same namespace as Kubecost and adding an annotation to it of the form eks.amazonaws.com/role-arn: arn:aws:iam::<AWS_ACCOUNT_ID>:role/<IAM_ROLE_NAME>
as described here.
Once that annotation has been created, configure the following:
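One possible sketch of this configuration is shown below, assuming the bundled Prometheus chart exposes a serviceAccounts.server block that can point the prometheus-server (and its thanos-sidecar) at the annotated service account; verify the exact keys for your chart version:

```bash
# Hedged sketch: the prometheus.serviceAccounts.server path is an assumption.
cat <<'EOF' > thanos-irsa-values.yaml
prometheus:
  serviceAccounts:
    server:
      create: false
      name: kubecost-thanos-sa   # the pre-created, annotated service account
EOF
```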
You can encrypt the S3 bucket where Kubecost data is stored in AWS via S3 and KMS. However, because Thanos can store potentially millions of objects, it is suggested that you use bucket-level encryption instead of object-level encryption. More details available in these external docs:
Visit the Configuring Thanos doc for troubleshooting help.
In order to create an AWS IAM policy for use with Thanos:
Navigate to the AWS console and select IAM.
Select Policies in the Navigation menu, then select Create Policy.
Add the following JSON in the policy editor:
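The exact policy JSON is not reproduced here, but a minimal sketch of the kind of S3 permissions Thanos needs might look like the following; adjust the actions to your security requirements:

```bash
# Hedged sketch of a minimal Thanos S3 policy; paste the JSON body into the policy editor.
cat <<'EOF' > kc-thanos-store-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::<your-bucket-name>"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::<your-bucket-name>/*"]
    }
  ]
}
EOF
```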
Make sure to replace <your-bucket-name>
with the name of your newly-created S3 bucket.
Select Review policy and name this policy, e.g. kc-thanos-store-policy
.
Navigate to Users in IAM control panel, then select Add user.
Provide a username (e.g. kubecost-thanos-service-account
) and select Programmatic access.
Select Attach existing policies directly, search for the policy name provided in Step 4, then create the user.
Capture your Access Key ID and secret in the view below:
If you don’t want to use a service account, IAM credentials retrieved from an instance profile are also supported. You must get both access key and secret key from the same method (i.e. both from service or instance profile). More info on retrieving credentials here.
This feature is only officially supported on Kubecost Enterprise plans.
Kubecost leverages Thanos and durable storage for three different purposes:
Centralize metric data for a global multi-cluster view into Kubernetes costs via a Prometheus sidecar
Allow for unlimited data retention
Backup Kubecost ETL data
To enable Thanos, follow these steps:
This step creates the object-store.yaml file that contains your durable storage target (e.g. GCS, S3, etc.) configuration and access credentials. The details of this file are documented thoroughly in Thanos documentation.
We have guides for using cloud-native storage for the largest cloud providers. Other providers can be similarly configured.
Use the appropriate guide for your cloud provider:
Create a secret with the .yaml file generated in the previous step:
Each cluster needs to be labelled with a unique Cluster ID, which is done in two places.
values-clusterName.yaml
The Thanos subchart includes thanos-bucket
, thanos-query
, thanos-store
, thanos-compact
, and service discovery for thanos-sidecar
. These components are recommended when deploying Thanos on the primary cluster.
These values can be adjusted under the thanos
block in values-thanos.yaml. Available options are here: thanos/values.yaml
The thanos-store container is configured to request 2.5GB of memory; this may be reduced for smaller deployments. thanos-store is only used on the primary Kubecost cluster.
To verify installation, check to see all Pods are in a READY state. View Pod logs for more detail and see common troubleshooting steps below.
Thanos sends data to the bucket every 2 hours. Once 2 hours have passed, logs should indicate if data has been sent successfully or not.
You can monitor the logs with:
Monitoring logs this way should return results like this:
As an aside, you can validate the Prometheus metrics are all configured with correct cluster names with:
To troubleshoot the IAM Role Attached to the serviceaccount, you can create a Pod using the same service account used by the thanos-sidecar (default is kubecost-prometheus-server
):
s3-pod.yaml
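A hedged sketch of such a pod is shown below; the image, pod name, and bucket are placeholders, while the service account name matches the default noted above:

```bash
# Hedged sketch: throwaway pod that reuses the thanos-sidecar's service account to list the bucket.
cat <<'EOF' > s3-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-access-test
  namespace: kubecost
spec:
  serviceAccountName: kubecost-prometheus-server
  restartPolicy: Never
  containers:
    - name: aws-cli
      image: amazon/aws-cli:latest
      command: ["aws", "s3", "ls", "s3://kc-thanos-store"]
EOF
kubectl apply -f s3-pod.yaml
kubectl logs -n kubecost s3-access-test
```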
This should return a list of objects (or at least not give a permission error).
If a cluster is not successfully writing data to the bucket, review thanos-sidecar
logs with the following command:
Logs in the following format are evidence of a successful bucket write:
/stores endpoint
If thanos-query can't connect to both the sidecar and the store, you may want to directly specify the store gRPC service address instead of using DNS discovery (the default). You can quickly test if this is the issue by running:
kubectl edit deployment kubecost-thanos-query -n kubecost
and adding
--store=kubecost-thanos-store-grpc.kubecost:10901
to the container args. This will cause a query restart and you can visit /stores
again to see if the store has been added.
If it has, you'll want to use these addresses instead of DNS more permanently by setting .Values.thanos.query.stores in values-thanos.yaml.
A common error is as follows, which means you do not have the correct access to the supplied bucket:
Assuming pods are running, use port forwarding to connect to the thanos-query-http
endpoint:
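A hedged sketch of the port-forward is shown below; the service name and port can vary slightly by chart version, so confirm them with kubectl get svc -n kubecost:

```bash
# Hedged sketch: service name/port are assumptions; adjust to your deployment.
kubectl port-forward -n kubecost svc/kubecost-thanos-query-http 8080:10902
```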
Then navigate to http://localhost:8080 in your browser. This page should look very similar to the Prometheus console.
If you navigate to Stores using the top navigation bar, you should be able to see the status of both the thanos-store
and thanos-sidecar
which accompanied the Prometheus server:
Also note that the sidecar should identify with the unique cluster_id
provided in your values.yaml in the previous step. Default value is cluster-one
.
The default retention period for when data is moved into the object storage is currently 2h. This configuration is based on Thanos suggested values. By default, it will be 2 hours before data is written to the provided bucket.
Instead of waiting 2h to ensure that Thanos was configured correctly, the default log level for the Thanos workloads is debug
(it's very light logging even on debug). You can get logs for the thanos-sidecar
, which is part of the prometheus-server
Pod, and thanos-store
. The logs should give you a clear indication of whether or not there was a problem consuming the secret and what the issue is. For more on Thanos architecture, view this resource.
This document will describe why your Kubecost instance’s data can be useful to share with us, what content is in the data, and how to share it.
Kubecost product releases are tested and verified against a combination of generated/synthetic Kubernetes cluster data and examples of customer data that have been shared with us. Customers who share snapshots of their data with us help to ensure that product changes handle their specific use cases and scales. Because the Kubecost product for many customers is run as an on-prem service, with no data sharing back to us, we do not inherently have this data for many of our customers.
Sharing data with us requires an ETL backup executed by the customer in their own environment before the resulting data can be sent out. Kubecost's ETL is a computed cache built upon Prometheus metrics and cloud billing data, on which nearly all API requests made by the user and the Kubecost frontend currently rely. Therefore, the ETL data will contain metric data and identifying information for that metric (e.g. a container name, pod name, namespace, and cluster name) during a time window, but will not contain other information about containers, pods, clusters, cloud resources, etc. You can read more about these metric details in our doc.
The full methodology for creating the ETL backup can be found in our doc. Once these files have been backed up, the content will look as follows before compressing the data:
Once the data is downloaded to the local disk from either the automated or manual ETL backup methods, the data must be converted to a gzip file. A suggested method for downloading the ETL backup and compressing it quickly is to use the script in Kubecost's etl-backup repo. Check out the tar syntax in that script if doing this manually without the script. When the compressed ETL backup is ready to share, please work with a Kubecost support engineer on sharing the file with us. Our most common approach is to use a Google Drive folder with access limited to you and the support engineer, but we recognize not all companies are open to this and will work with you to determine the most business-appropriate method.
If you are interested in reviewing the contents of the data, either before or after sending the ETL backup to us, you can find an example Golang implementation showing how to read the ETL files.
Secondary clusters use a minimal Kubecost deployment to send their metrics to a central storage-bucket (aka durable storage) that is accessed by the primary cluster to provide a single-pane-of-glass view into all aggregated cluster costs globally. This aggregated cluster view is exclusive to Kubecost Enterprise.
Kubecost's UI will appear broken when set to a secondary cluster. It should only be used for troubleshooting.
This guide explains the settings that can be tuned in order to run only the minimum required Kubecost components, so that secondary clusters run as efficiently as possible.
See the Additional resources section below for complete examples in our GitHub repo.
Disable product caching and reduce query concurrency with the following parameters:
Grafana is not needed on secondary clusters.
Kubecost and its accompanying Prometheus collect a reduced set of metrics that allow for lower resource/storage usage than a standard Prometheus deployment.
The following configuration options further reduce resource consumption when not using the Kubecost frontend:
Potentially reducing retention even further, metrics are sent to the storage-bucket every 2 hours.
You can tune prometheus.server.persistentVolume.size
depending on scale, or outright disable persistent storage.
Disable Thanos components. These are only used for troubleshooting on secondary clusters. See this guide for troubleshooting via kubectl logs.
Secondary clusters write to the global storage-bucket via the thanos-sidecar on the prometheus-server pod.
You can disable node-exporter and the service account if cluster/node rightsizing recommendations are not required.
node-exporter must be disabled if there is an existing DaemonSet. More info here.
For reference, this secondary-clusters.yaml
snippet is a list of the most common settings for efficient secondary clusters:
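A hedged sketch of such a snippet is shown below; the exact keys can vary by chart version, so compare against the sample files in our repo before applying:

```bash
# Hedged sketch of common secondary-cluster settings referenced above.
cat <<'EOF' > secondary-clusters.yaml
global:
  grafana:
    enabled: false            # Grafana is not needed on secondary clusters
    proxy: false
prometheus:
  nodeExporter:
    enabled: false            # disable if rightsizing recommendations are not required
  serviceAccounts:
    nodeExporter:
      create: false
  server:
    persistentVolume:
      size: 32Gi              # tune (or disable) depending on scale
EOF
```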
You can find complete installation guides and sample files on our repo.
Additional considerations for properly tuning resource consumption is here.
We do not recommend enabling ETL Backup in conjunction with Federated ETL.
Kubecost's extract, transform, load (ETL) data is a computed cache based on Prometheus's metrics, from which the user can perform all possible Kubecost queries. The ETL data is stored in a persistent volume mounted to the kubecost-cost-analyzer
pod.
There are a number of reasons why you may want to backup this ETL data:
To ensure a copy of your Kubecost data exists, so you can restore the data if needed
To reduce the amount of historical data stored in Prometheus/Thanos, and instead retain historical ETL data
Beginning in v1.100, this feature is enabled by default if you have Thanos enabled. To opt out, set .Values.kubecostModel.etlBucketConfigSecret="".
Kubecost provides cloud storage backups for ETL backing storage. Backups are not the typical approach of "halt all reads/writes and dump the database." Instead, the backup system is a transparent feature that will always ensure that local ETL data is backed up, and if local data is missing, it can be retrieved from backup storage. This feature protects users from accidental data loss by ensuring that previously backed-up data can be restored at runtime.
Durable backup storage functionality is supported with a Kubecost Enterprise plan.
When the ETL pipeline collects data, it stores daily and hourly (if configured) cost metrics on a configured storage. This defaults to a PV-based disk storage, but can be configured to use external durable storage on the following providers:
AWS S3
Azure Blob Storage
Google Cloud Storage
This configuration secret follows the same layout documented for Thanos here.
You will need to create a file named object-store.yaml using the chosen storage provider configuration (documented below), and run the following command to create the secret from this file:
The file must be named object-store.yaml.
If Kubecost was installed via Helm, ensure the following value is set.
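A combined sketch of the secret creation and the corresponding Helm value is shown below; the secret name is an example and must match on both sides:

```bash
# Hedged sketch: secret name is an example.
kubectl create secret generic kubecost-etl-backup -n kubecost \
  --from-file=object-store.yaml

cat <<'EOF' > etl-backup-values.yaml
kubecostModel:
  etlBucketConfigSecret: kubecost-etl-backup
EOF
helm upgrade kubecost kubecost/cost-analyzer -n kubecost -f values.yaml -f etl-backup-values.yaml
```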
If you are using an existing disk storage option for your ETL data, enabling the durable backup feature will retroactively back up all previously stored data. This feature is also fully compatible with the existing S3 backup feature.
If you are using a memory store for your ETL data with a local disk backup (kubecostModel.etlFileStoreEnabled: false
), the backup feature will simply replace the local backup. In order to take advantage of the retroactive backup feature, you will need to update to file store (kubecostModel.etlFileStoreEnabled: true
). This option is now enabled by default in the Helm chart.
The simplest way to backup Kubecost's ETL is to copy the pod's ETL store to your local disk. You can then send that file to any other storage system of your choice. We provide a script to do that.
To restore the backup, untar the results of the ETL backup script into the ETL directory of the pod.
There is also a Bash script available to restore the backup in Kubecost's etl-backup repo.
This feature is still in development, but there is currently a status card available on the Diagnostics page that will eventually show the status of the backup system:
In some scenarios like when using Memory store, setting kubecostModel.etlHourlyStoreDurationHours
to a value of 48
hours or less will cause ETL backup files to become truncated. The current recommendation is to keep etlHourlyStoreDurationHours at its default of 49
hours.
This feature is currently in beta. It is enabled by default.
Multi-Cluster Diagnostics offers a single view into the health of all the clusters you currently monitor with Kubecost.
Health checks include, but are not limited to:
Whether Kubecost is correctly emitting metrics
Whether Kubecost is being scraped by Prometheus
Whether Prometheus has scraped the required metrics
Whether Kubecost's ETL files are healthy
Additional configuration options can be found in the values.yaml under the diagnostics: key.
The multi-cluster diagnostics feature is run as an independent deployment (i.e. deployment/kubecost-diagnostics
). Each diagnostics deployment monitors the health of Kubecost and sends that health data to the central object store at the /diagnostics
filepath.
The below diagram depicts these interactions. This diagram is specific to the requests required for diagnostics only. For additional diagrams, see our multi-cluster guide.
The diagnostics API can be accessed through /model/multi-cluster-diagnostics?window=2d
(or /model/mcd
for short)
The window
query parameter is required, which will return all diagnostics within the specified time window.
GET
http://<your-kubecost-address>/model/multi-cluster-diagnostics
The Multi-cluster Diagnostics API provides a single view into the health of all the clusters you currently monitor with Kubecost.
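For example, a minimal request against the endpoint above (the address and window value are placeholders):

```bash
curl "http://<your-kubecost-address>/model/multi-cluster-diagnostics?window=2d"
```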
This feature is only supported on Kubecost Enterprise plans.
The query service replica (QSR) is a scale-out query service that reduces load on the cost-model pod. It allows for improved horizontal scaling by being able to handle queries for larger intervals, and multiple simultaneous queries.
The query service will forward /model/allocation
and /model/assets
requests to the Query Services StatefulSet.
The diagram below demonstrates the backing architecture of this query service and its functionality.
There are three options that can be used for the source ETL Files:
For environments that have Kubecost Federated ETL enabled, this store will be used; no additional configuration is required.
For single cluster environments, QSR can target the ETL backup store. To learn more about ETL backups, see the ETL Backup doc.
Alternatively, an object-store containing the ETL dataset to be queried can be configured using a secret kubecostDeployment.queryServiceConfigSecret
. The file name of the secret must be object-store.yaml
. Examples can be found in our Configuring Thanos doc.
QSR uses persistent volume storage to avoid excessive S3 transfers. Data is retrieved from S3 hourly as new ETL files are created and stored in these PVs. The databaseVolumeSize
should be larger than the size of the data in the S3 bucket.
When the pods start, data from the object-store is synced and this can take a significant time in large environments. During the sync, parts of the Kubecost UI will appear broken or have missing data. You can follow the pod logs to see when the sync is complete.
The default of 100Gi is enough storage for 1M pods and 90 days of retention. This can be adjusted:
Once the data store is configured, set kubecostDeployment.queryServiceReplicas
to a non-zero value and perform a Helm upgrade.
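A sketch of that upgrade, assuming a Helm release named kubecost in the kubecost namespace:

```bash
helm upgrade kubecost kubecost/cost-analyzer -n kubecost \
  --reuse-values \
  --set kubecostDeployment.queryServiceReplicas=2
```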
Once QSR has been enabled, the new pods will automatically handle all API requests to /model/allocation
and /model/assets
.
Amazon Elastic Kubernetes Service (Amazon EKS) is a managed container service to run and scale Kubernetes applications in the AWS cloud. In collaboration with Amazon EKS, Kubecost provides an optimized bundle for Amazon EKS cluster cost visibility that enables customers to accurately track costs by namespace, cluster, pod, or organizational concepts such as team or application. Customers can use their existing AWS support agreements to obtain support. Kubernetes platform administrators and finance leaders can use Kubecost to visualize a breakdown of their Amazon EKS cluster charges, allocate costs, and charge back organizational units such as application teams.
In this article, you will learn more about how the Amazon EKS architecture interacts with Kubecost. You will also learn to deploy Kubecost on EKS using one of three different methods:
Deploy Kubecost on an Amazon EKS cluster using Amazon EKS add-on
Deploy Kubecost on an Amazon EKS cluster via Helm
Deploy Kubecost on an Amazon EKS Anywhere cluster using Helm
User experience diagram:
Amazon EKS cost monitoring with Kubecost architecture:
Subscribe to Kubecost on AWS Marketplace here.
You have access to an Amazon EKS cluster.
After subscribing to Kubecost on AWS Marketplace and following the on-screen instructions successfully, you are redirected to the Amazon EKS console. To get started in the Amazon EKS console, go to your EKS clusters, and in the Add-ons tab, select Get more add-ons to find the Kubecost EKS add-on in the cluster settings of your existing EKS clusters. You can use the search bar to find "Kubecost - Amazon EKS cost monitoring" and follow the on-screen instructions to enable the Kubecost add-on for your Amazon EKS cluster. You can learn more about direct deployment to Amazon EKS clusters from this AWS blog post.
On your workspace, run the following command to enable the Kubecost add-on for your Amazon EKS cluster:
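A sketch of that command follows; the add-on name kubecost_kubecost is an assumption, so confirm the exact name with aws eks describe-addon-versions:

```bash
aws eks create-addon --addon-name kubecost_kubecost \
  --cluster-name $YOUR_CLUSTER_NAME --region $AWS_REGION
```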
You need to replace $YOUR_CLUSTER_NAME
and $AWS_REGION
accordingly with your actual Amazon EKS cluster name and AWS region.
To monitor the installation status, you can run the following command:
The Kubecost add-on should be available in a few minutes. Run the following command to enable port-forwarding to expose the Kubecost dashboard:
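For example, assuming the add-on installed into the kubecost namespace with the default deployment name:

```bash
kubectl port-forward -n kubecost deployment/kubecost-cost-analyzer 9090:9090
```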
To disable Kubecost add-on, you can run the following command:
To get started, you can follow these steps to deploy Kubecost into your Amazon EKS cluster in a few minutes using Helm.
You have access to an Amazon EKS cluster.
If your cluster is version 1.23 or later, you must have the Amazon EBS CSI driver installed on your cluster. You can also follow these instructions to install Amazon EBS CSI driver:
Run the following command to create an IAM service account with the policies needed to use the Amazon EBS CSI Driver.
Remember to replace $CLUSTER_NAME
with your actual cluster name.
Install the Amazon EBS CSI add-on for EKS using the AmazonEKS_EBS_CSI_DriverRole by issuing the following command:
After completing these prerequisite steps, you're ready to begin EKS integration.
In your environment, run the following command from your terminal to install Kubecost on your existing Amazon EKS cluster:
To install Kubecost on an Amazon EKS cluster on AWS Graviton2 (ARM-based processor), you can run the following command:
On the Amazon EKS cluster with mixed processor architecture worker nodes (AMD64, ARM64), this parameter can be used to schedule Kubecost deployment on ARM-based worker nodes: --set nodeSelector."beta\.kubernetes\.io/arch"=arm64
Remember to replace $VERSION with the actual version number. You can find all available versions via the Amazon ECR public gallery here.
By default, the installation will include certain prerequisite software including Prometheus and kube-state-metrics. To customize your deployment, such as skipping these prerequisites if you already have them running in your cluster, you can configure any of the available values to modify storage, network configuration, and more.
Run the following command to enable port-forwarding to expose the Kubecost dashboard:
You can now access Kubecost's UI by visiting http://localhost:9090
in your local web browser. Here, you can monitor your Amazon EKS cluster cost and efficiency. Depending on your organization’s requirements and setup, you may have different options for exposing Kubecost for internal access. Here are a few examples you can use as references:
See Kubecost's Ingress Examples doc as a reference for using Nginx ingress controller with basic auth.
You can also consider using AWS LoadBalancer controller to expose Kubecost and use Amazon Cognito for authentication, authorization, and user management. You can learn more via the AWS blog post Authenticate Kubecost Users with Application Load Balancer and Amazon Cognito.
Deploying Kubecost on EKS Anywhere via Helm is not the officially recommended method by Kubecost or AWS. The recommended method is via EKS add-on (see above).
Amazon EKS Anywhere (EKS-A) is an alternate deployment of EKS which allows you to create and configure on-premises clusters, including on your own virtual machines. It is possible to deploy Kubecost on EKS-A clusters to monitor spend data.
Deploying Kubecost on an EKS-A cluster should function similarly at the cluster level, such as when retrieving Allocations or Assets data. However, because on-prem servers wouldn't be visible in a CUR (as the billing source is managed outside AWS), certain features like the Cloud Cost Explorer will not be accessible.
You have installed the EKS-A installer and have access to an Amazon EKS-A cluster.
In your environment, run the following command from your terminal to install Kubecost on your existing Amazon EKS Anywhere cluster:
To install Kubecost on an EKS-A cluster on AWS Graviton2 (ARM-based processor), you can run the following command:
On the Amazon EKS cluster with mixed processor architecture worker nodes (AMD64, ARM64), this parameter can be used to schedule Kubecost deployment on ARM-based worker nodes: --set nodeSelector."beta\.kubernetes\.io/arch"=arm64
Remember to replace $VERSION with the actual version number. You can find all available versions via the Amazon ECR public gallery here.
By default, the installation will include certain prerequisite software including Prometheus and kube-state-metrics. To customize your deployment, such as skipping these prerequisites if you already have them running in your cluster, you can configure any of the available values to modify storage, network configuration, and more.
Run the following command to enable port-forwarding to expose the Kubecost dashboard:
You can now access Kubecost's UI by visiting http://localhost:9090
in your local web browser. Here, you can monitor your Amazon EKS cluster cost and efficiency through the Allocations and Assets pages.
Amazon EKS documentation:
AWS blog content:
This document provides the steps for installing the Kubecost product from the AWS Marketplace.
To deploy Kubecost from AWS Marketplace, you need to assign an IAM policy with appropriate IAM permission to a Kubernetes service account before starting the deployment. You can either use AWS managed policy arn:aws:iam::aws:policy/AWSMarketplaceMeteringRegisterUsage
or create your own IAM policy. You can learn more with AWS' tutorial.
Here's an example IAM policy:
Create an IAM role with AWS-managed IAM policy.
Create a K8s service account name awsstore-serviceaccount
in your Amazon EKS cluster.
Set up a trust relationship between the created IAM role with awsstore-serviceaccount.
Modify awsstore-serviceaccount
annotation to associate it with the created IAM role
Remember to replace CLUSTER_NAME
with your actual Amazon EKS cluster name.
Define which available version you would like to install using the following command. You can check available version titles from the AWS Marketplace product, e.g., prod-1.95.0:
export IMAGETAG=<VERSION-TITLE>
Deploy Kubecost with Helm using the following command:
Run this command to enable port-forwarding and access the Kubecost UI:
You can now start monitoring your Amazon EKS cluster cost with Kubecost by visiting http://localhost:9090
.
We recommend doing this via . The command below helps to automate these manual steps:
More details on how to set up the appropriate trust relationships are available .
Your Amazon EKS cluster needs to have the IAM OIDC provider enabled to set up IRSA. Learn more with AWS' doc.

Name | Type | Description |
---|---|---|
window* | string | Duration of time over which to query. Accepts words like today, week, month, yesterday, lastweek, lastmonth; durations like 30m, 12h, 7d; comma-separated RFC3339 date pairs like 2021-01-02T15:04:05Z,2021-02-02T15:04:05Z; comma-separated Unix timestamp (seconds) pairs like 1578002645,1580681045. |
Installing Kubecost on an Alibaba cluster is the same as other cloud providers with Helm v3.1+:
helm install kubecost kubecost/cost-analyzer -n kubecost -f values.yaml
Your values.yaml files must contain the below parameters:
The alibaba-service-key
can be created using the following command:
The path you provide must contain a file with your Alibaba Cloud secrets. These secrets can be passed in a JSON file in the following format:
These two can be generated in the Alibaba Cloud portal. Hover over your user account icon, then select AccessKey Management. A new window opens. Select Create AccessKey to generate a unique access token that will be used for all activities related to Kubecost.
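As a sketch, assuming the JSON file is saved locally as alibaba-secret.json and Kubecost runs in the kubecost namespace:

```bash
kubectl create secret generic alibaba-service-key -n kubecost \
  --from-file=alibaba-secret.json
```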
Currently, Kubecost does not support complete integration of your Alibaba billing data like for other major cloud providers. Instead, Kubecost will only support public pricing integration, which will provide proper list prices for all cloud-based resources. Features like reconciliation and savings insights are not available for Alibaba. For more information on setting up a public pricing integration, see our Multi-Cloud Integrations doc.
When you list the available StorageClasses in your Alibaba K8s cluster, you may find that none is marked as the default. In that case, the Kubecost installation may fail, with the cost-model pod and Prometheus server pod stuck in a Pending state.
To fix this issue, set one of the StorageClasses in the Alibaba K8s cluster as the default using the below command:
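For example, using the standard Kubernetes default-class annotation (replace <storage-class-name> with the class you want to promote):

```bash
kubectl patch storageclass <storage-class-name> \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```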
Following this, installation should proceed as normal.
Installing Kubecost on a GKE Autopilot cluster is similar to other cloud providers with Helm v3.1+, with a few changes. Autopilot requires the use of Google Managed Prometheus service, which generates additional costs within your Google Cloud account.
helm install kubecost kubecost/cost-analyzer -n kubecost -f values.yaml
Your values.yaml files must contain the below parameters. Resources are specified for each section of the Kubecost deployment, and Pod Security Policies are disabled.
Open the OperatorConfig on your Autopilot Cluster resource for editing:
Add the following collection section to the resource:
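A sketch of that section, based on GMP's managed-collection kubelet scraping configuration (verify the field names against current Google Cloud documentation):

```yaml
collection:
  kubeletScraping:
    interval: 30s
```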
Save the file and close the editor. After a short time, the Kubelet metric endpoints will be scraped and the metrics become available for querying in Managed Service for Prometheus.
Plural is a free, open-source tool that enables you to deploy Kubecost on Kubernetes with the cloud provider of your choice. Plural is an open-source DevOps platform for self-hosting applications on Kubernetes without the management overhead. With baked-in SSO, automated upgrades, and secret encryption, you get all the benefits of a managed service with none of the lock-in or cost.
Kubecost is available as direct install with Plural, and it synergizes very well with the ecosystem, providing cost monitoring out of the box to users that deploy their Kubernetes clusters with Plural.
First, create an account on Plural. This is only to track your installations and allow for the delivery of automated upgrades. You will not be asked to provide any infrastructure credentials or sensitive information.
Next, install the Plural CLI by following steps 1-3 of Plural's CLI Quickstart guide.
You'll need a Git repository to store your Plural configuration. This will contain the Helm charts, Terraform config, and Kubernetes manifests that Plural will autogenerate for you.
You have two options:
Run plural init
in any directory to let Plural initiate an OAuth workflow to create a Git repo for you.
Create a Git repo manually, clone it down, and run plural init
inside it.
Running plural init
will start a configuration wizard to configure your Git repo and cloud provider for use with Plural. You're now ready to install Kubecost on your Plural repo.
To find the console bundle name for your cloud provider, run:
Now, to add it to your workspace, run the install command. If you're on AWS, this is what the command would look like:
Plural's Kubecost distribution has support for AWS, GCP, and Azure, so feel free to pick whichever best fits your infrastructure.
The CLI will prompt you to choose whether you want to use Plural OIDC. OIDC allows you to log in to the applications you host on Plural with your login acting as an SSO provider.
To generate the configuration and deploy your infrastructure, run:
Note: Deploys will generally take 10-20 minutes, based on your cloud provider.
To make management of your installation as simple as possible, we recommend installing the Plural Console. The console provides tools for resource scaling, automated upgrades, dashboards tailored to your Kubecost installation, and log aggregation. This can be done using the exact same process as above, using AWS as an example:
Now, head over to kubecost.YOUR_SUBDOMAIN.onplural.sh
to access the Kubecost UI. If you set up a different subdomain for Kubecost during installation, make sure to use that instead.
To monitor and manage your Kubecost installation, head over to the Plural Console at console.YOUR_SUBDOMAIN.onplural.sh
.
To bring down your Plural installation of Kubecost at any time, run:
To bring your entire Plural deployment down, run:
Note: Only do this if you're absolutely sure you want to bring down all associated resources with this repository.
If you have any issues with installing Kubecost on Plural, feel free to join the Plural Discord Community and we can help you out.
If you'd like to request any new features for our Kubecost installation, feel free to open an issue or PR here.
To learn more about what you can do with Plural and more advanced uses of the platform, feel free to dive deeper into Plural's docs.
The following requirements are given:
Rancher with default monitoring
Use of an existing Prometheus and Grafana (Kubecost will be installed without Prometheus and Grafana)
Istio with gateway and sidecar for deployments
Kubecost v1.85.0+ includes changes to support cAdvisor metrics without the container_name
rewrite rule.
Istio is activated by editing the namespace. To do this, execute the command kubectl edit namespace kubecost
and insert the label istio-injection: enabled
After Istio has been activated, some adjustments must be made to the deployment with kubectl -n kubecost edit deployment kubecost-cost-analyzer
to allow communication within the namespace, for example so that the health check completes successfully. When editing the deployment, the following two annotations must be added:
An authorization policy governs access restrictions in namespaces and specifies how resources within a namespace are allowed to access it.
Peer authentication is used to set how traffic is tunneled to the Istio sidecar. In this example, enforcing TLS is disabled so that Prometheus can scrape the metrics from Kubecost (if this is not done, an HTTP 503 error is returned).
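A sketch of such a PeerAuthentication resource, assuming the kubecost namespace and PERMISSIVE mTLS so that plaintext scrapes are accepted:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: kubecost
  namespace: kubecost
spec:
  mtls:
    mode: PERMISSIVE   # allow both mTLS and plaintext so Prometheus can scrape Kubecost
```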
A destination rule is used to specify how traffic should be handled after routing to a service. In this example, TLS is disabled for connections from Kubecost to Prometheus and Grafana (namespace "cattle-monitoring-system").
A virtual service is used to direct data traffic specifically to individual services within the service mesh. The virtual service defines how the routing should run. A gateway is required for a virtual service.
After creating the virtual service, Kubecost should be accessible at the URL http(s)://${gateway}/kubecost/
.
This article is the primary reference for installing Kubecost in an air-gapped environment with a user-managed container registry.
This section details all required and optional Kubecost images. Optional images are used depending on the specific configuration needed.
Please substitute the appropriate version for prod-x.xx.x. Latest releases can be found here.
To find the exact images used for each Kubecost release, a command such as this can be used:
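For example, assuming the chart repo alias kubecost (substitute the chart version you plan to install):

```bash
helm template kubecost kubecost/cost-analyzer --version <chart-version> \
  | grep "image:" | sort -u
```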
The alpine/k8s image is not used in real deployments. It is only in the Helm chart for testing purposes.
Frontend: gcr.io/kubecost1/frontend
CostModel: gcr.io/kubecost1/cost-model
NetworkCosts: gcr.io/kubecost1/kubecost-network-costs (used for network-allocation)
Cluster controller: gcr.io/kubecost1/cluster-controller:v0.9.0 (used for write actions)
BusyBox: registry.hub.docker.com/library/busybox:latest (only for NFS)
quay.io/prometheus/prometheus
prom/node-exporter
quay.io/prometheus-operator/prometheus-config-reloader
grafana/grafana
kiwigrid/k8s-sidecar
thanosio/thanos
There are two options to configure asset prices in your on-premise Kubernetes environment:
Per-resource prices can be configured in a Helm values file (reference) or directly in the Kubecost Settings page. This allows you to directly supply the cost of a certain Kubernetes resources, such as a CPU month, a RAM Gb month, etc.
Use quotes if setting "0.00" for any item under kubecostProductConfigs.defaultModelPricing
. Failure to do so will result in the value(s) not being written to the Kubecost cost-model's PV (/var/configs/default.json).
When setting CPU and RAM monthly prices, the values will be broken down to the hourly rate for the total monthly price set under kubecostProductConfigs.defaultModelPricing. The values will adjust accordingly in /var/configs/default.json in the Kubecost cost-model container.
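A sketch of such a values file; the child key names shown here (CPU, RAM, storage) are assumptions, so confirm them against the chart's values reference:

```yaml
kubecostProductConfigs:
  defaultModelPricing:
    enabled: true
    CPU: "28.0"     # monthly price per vCPU
    RAM: "3.09"     # monthly price per GiB of RAM
    storage: "0.04" # monthly price per GiB of persistent storage
```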
This method allows each individual asset in your environment to have a unique price. This leverages the Kubecost custom CSV pipeline which is available on Enterprise plans.
Use a proxy for the AWS pricing API. You can set AWS_PRICING_URL to the address of your proxy via the extra env var.
Grafana Cloud is a composable observability platform, integrating metrics, traces and logs with Grafana. Customers can leverage the best open source observability software without the overhead of installing, maintaining, and scaling your observability stack.
This document will show you how to integrate the Grafana Cloud Prometheus metrics service with Kubecost.
You have access to a running Kubernetes cluster
You have created a Grafana Cloud account
You have permissions to create Grafana Cloud API keys
Install the Grafana Agent for Kubernetes on your cluster. On the existing K8s cluster that you intend to install Kubecost, run the following commands to install the Grafana Agent to scrape the metrics from Kubecost /metrics
endpoint. The script below installs the Grafana Agent with the necessary scraping configuration for Kubecost; you may want to add additional scrape configuration for your setup. Please remember to replace the following values with your actual Grafana Cloud values:
REPLACE-WITH-GRAFANA-PROM-REMOTE-WRITE-ENDPOINT
REPLACE-WITH-GRAFANA-PROM-REMOTE-WRITE-USERNAME
REPLACE-WITH-GRAFANA-PROM-REMOTE-WRITE-API-KEY
REPLACE-WITH-YOUR-CLUSTER-NAME
You can also verify if grafana-agent
is scraping data with the following command (optional):
To learn more about how to install and config Grafana agent as well as additional scrape configuration, please refer to Grafana Agent documentation or you can check Kubecost Prometheus scrape config at this GitHub repository.
Create a Kubernetes secret named dbsecret to allow Kubecost to query the metrics from Grafana Cloud Prometheus. Create two files in your working directory, called USERNAME and PASSWORD respectively.
Verify that you can run queries against your Grafana Cloud Prometheus query endpoint (optional):
Create a K8s secret named dbsecret:
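Assuming the USERNAME and PASSWORD files are in the current directory and Kubecost runs in the kubecost namespace:

```bash
kubectl create secret generic dbsecret -n kubecost \
  --from-file=USERNAME \
  --from-file=PASSWORD
```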
Verify if the credentials appear correctly (optional):
To set up recording rules in Grafana Cloud, download the Cortextool CLI utility. While they are optional, they offer improved performance.
After installing the tool, create a file called kubecost_rules.yaml with the following command:
Then, make sure you are in the same directory as your kubecost_rules.yaml, and load the rules using Cortextool. Replace the address with your Grafana Cloud’s Prometheus endpoint (remember to omit the /api/prom path from the endpoint URL).
Print out the rules to verify that they’ve been loaded correctly:
Install Kubecost on your K8s cluster with Grafana Cloud Prometheus query endpoint and dbsecret
you created in Step 2.
The process is complete. By now, you should have successfully completed the Kubecost integration with Grafana Cloud.
Optionally, you can also add our Kubecost Dashboard for Grafana Cloud to your organization to visualize your cloud costs in Grafana.
There are several considerations when disabling the Kubecost included Prometheus deployment. Kubecost strongly recommends installing Kubecost with the bundled Prometheus in most environments.
The Kubecost Prometheus deployment is optimized to not interfere with other observability instrumentation and by default only contains metrics that are useful to the Kubecost product. This results in 70-90% fewer metrics than a Prometheus deployment using default settings.
Additionally, if multi-cluster metric aggregation is required, Kubecost provides a turnkey solution that is highly tuned and simple to support using the included Prometheus deployment.
This feature is accessible to all users. However, please note that comprehensive support is provided with a paid support plan.
Kubecost requires the following minimum versions:
Prometheus: v2.18 (v2.13-2.17 supported with limited functionality)
kube-state-metrics: v1.6.0+
cAdvisor: kubelet v1.11.0+
node-exporter: v0.16+ (Optional)
If you have node-exporter and/or KSM running on your cluster, follow this step to disable the Kubecost included versions. Additional detail on KSM requirements.
Unlike the general recommendation above, we do recommend disabling Kubecost's bundled node-exporter and kube-state-metrics if you already have them running in your cluster.
This process is not recommended. Before continuing, review the Bring your own Prometheus section if you haven't already.
Pass the following parameters in your Helm install:
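A sketch of those parameters (the Prometheus address is a placeholder for your own endpoint):

```bash
helm upgrade --install kubecost kubecost/cost-analyzer -n kubecost \
  --set global.prometheus.enabled=false \
  --set global.prometheus.fqdn=http://prometheus-server.monitoring.svc:9090
```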
The FQDN can be a full path via https://prometheus-prod-us-central-x.grafana.net/api/prom/
if you use Grafana Cloud-managed Prometheus. Learn more in the Grafana Cloud Integration for Kubecost doc.
Have your Prometheus scrape the cost-model /metrics
endpoint. These metrics are needed for reporting accurate pricing data. Here is an example scrape config:
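The following is a minimal sketch; the service name kubecost-cost-analyzer.kubecost and metrics port 9003 assume a default install in the kubecost namespace:

```yaml
- job_name: kubecost
  honor_labels: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  dns_sd_configs:
    - names:
        - kubecost-cost-analyzer.kubecost
      type: 'A'
      port: 9003
```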
This config needs to be added to extraScrapeConfigs
in the Prometheus configuration. See the example extraScrapeConfigs.yaml.
By default, the Prometheus chart included with Kubecost (bundled-Prometheus) contains scrape configs optimized for Kubecost-required metrics. You need to add those scrape config jobs to your existing Prometheus setup to allow Kubecost to provide more accurate cost data and to optimize the resources required by your existing Prometheus.
You can find the full scrape configs of our bundled-Prometheus here. You can check Prometheus documentation for more information about the scrape config, or read this documentation if you are using Prometheus Operator.
This step is optional. If you do not set up Kubecost's CPU usage recording rule, Kubecost will fall back to a PromQL subquery which may put unnecessary load on your Prometheus.
Kubecost-bundled Prometheus includes a recording rule used to calculate CPU usage max, a critical component of the request right-sizing recommendation functionality. Add the recording rules to reduce query load here.
Alternatively, if your environment supports serviceMonitors
and prometheusRules
, pass these values to your Helm install:
To confirm this job is successfully scraped by Prometheus, you can view the Targets page in Prometheus and look for a job named kubecost
.
This step is optional, and only impacts certain efficiency metrics. View issue/556 for a description of what will be missing if this step is skipped.
You'll need to add the following relabel config to the job that scrapes the node exporter DaemonSet.
This does not override the source label. It creates a new label called kubernetes_node
and copies the pod's node name value into it.
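A sketch of that relabel config; it assumes the node exporter job uses Kubernetes pod service discovery, which exposes the __meta_kubernetes_pod_node_name label:

```yaml
relabel_configs:
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node
```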
In order to distinguish between multiple clusters, Kubecost needs to know which label is used in Prometheus to identify the cluster name. Set this with .Values.kubecostModel.promClusterIDLabel. The default cluster label is cluster_id
, though many environments use the key of cluster
.
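For example, if your Prometheus labels series with cluster rather than cluster_id:

```bash
--set kubecostModel.promClusterIDLabel=cluster
```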
By default, metric retention is 91 days, however the retention of data can be further increased with a configurable value for a property etlDailyStoreDurationDays
. You can find this value here.
Increasing the default etlDailyStoreDurationDays
value will naturally result in greater memory usage. At higher values, this can cause errors when trying to display this information in the Kubecost UI. You can remedy this by increasing the Step size when using the Allocations dashboard.
The Diagnostics page (Settings > View Full Diagnostics) provides diagnostic info on your integration. Scroll down to Prometheus Status to verify that your configuration is successful.
Below you can find solutions to common Prometheus configuration problems. View the Kubecost Diagnostics doc for more information.
This problem is evidenced by the pod error message No valid prometheus config file at ... and by the init pods hanging. We recommend running curl <your_prometheus_url>/api/v1/status/config
from a pod in the cluster to confirm that your Prometheus config is returned. Here is an example, but this needs to be updated based on your pod name and Prometheus address:
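A sketch of that check, run from any pod that has curl available (the pod name and Prometheus address are placeholders):

```bash
kubectl exec -n kubecost <pod-name> -- \
  curl -s http://<your_prometheus_url>/api/v1/status/config
```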
In the above example, <your_prometheus_url> may include a port number and/or namespace, example: http://prometheus-operator-kube-p-prometheus.monitoring:9090/api/v1/status/config
If the config file is not returned, this is an indication that an incorrect Prometheus address has been provided. If a config file is returned from one pod in the cluster but not the Kubecost pod, then the Kubecost pod likely has its access restricted by a network policy, service mesh, etc.
Network policies, Mesh networks, or other security related tooling can block network traffic between Prometheus and Kubecost which will result in the Kubecost scrape target state as being down in the Prometheus targets UI. To assist in troubleshooting this type of error you can use the curl
command from within the cost-analyzer container to try and reach the Prometheus target. Note that the namespace and deployment names in this command may need to be updated to match your environment; this example uses the default Kubecost Prometheus deployment.
When successful, this command should return all of the metrics that Kubecost uses. Failures may be indicative of the network traffic being blocked.
Ensure Prometheus isn't being CPU throttled due to a low resource request.
Review the Dependency Requirements section above
Visit Prometheus Targets page (screenshot above)
Make sure that honor_labels is enabled
Ensure results are not null for both queries below.
Make sure Prometheus is scraping Kubecost search metrics for: node_total_hourly_cost
Ensure kube-state-metrics are available: kube_node_status_capacity
For both queries, verify nodes are returned. A successful response should look like:
An error will look like:
Ensure that all clusters and nodes have values; output should be similar to the above Single Cluster Tests.
Make sure Prometheus is scraping Kubecost search metrics for: node_total_hourly_cost
On macOS, change date -d '1 day ago'
to date -v '-1d'
Ensure kube-state-metrics are available: kube_node_status_capacity
For both queries, verify nodes are returned. A successful response should look like:
An error will look like:
Kubecost leverages the open-source Prometheus project as a time series database and post-processes the data in Prometheus to perform cost allocation calculations and provide optimization insights for your Kubernetes clusters such as Amazon Elastic Kubernetes Service (Amazon EKS). Prometheus is a single machine statically-resourced container, so depending on your cluster size or when your cluster scales out, it could exceed the scraping capabilities of a single Prometheus server. In collaboration with Amazon Web Services (AWS), Kubecost integrates with Amazon Managed Service for Prometheus (AMP), a managed Prometheus-compatible monitoring service, to enable the customer to easily monitor Kubernetes cost at scale.
The architecture of this integration is similar to Amazon EKS cost monitoring with Kubecost, which is described in the previous blog post, with some enhancements as follows:
In this integration, an additional AWS SigV4 container is added to the cost-analyzer pod, acting as a proxy to help query metrics from Amazon Managed Service for Prometheus using the AWS SigV4 signing process. It enables passwordless authentication to reduce the risk of exposing your AWS credentials.
When the Amazon Managed Service for Prometheus integration is enabled, the bundled Prometheus server in the Kubecost Helm Chart is configured in the remote_write mode. The bundled Prometheus server sends the collected metrics to Amazon Managed Service for Prometheus using the AWS SigV4 signing process. All metrics and data are stored in Amazon Managed Service for Prometheus, and Kubecost queries the metrics directly from Amazon Managed Service for Prometheus instead of the bundled Prometheus. This means customers do not need to worry about maintaining and scaling the local Prometheus instance.
There are two architectures you can deploy:
The Quick-Start architecture supports a small multi-cluster setup of up to 100 clusters.
The Federated architecture supports a large multi-cluster setup for over 100 clusters.
The infrastructure can manage up to 100 clusters. The following architecture diagram illustrates the small-scale infrastructure setup:
To support the large-scale infrastructure of over 100 clusters, Kubecost leverages a Federated ETL architecture. In addition to Amazon Prometheus Workspace, Kubecost stores its extract, transform, and load (ETL) data in a central S3 bucket. Kubecost's ETL data is a computed cache based on Prometheus's metrics, from which users can perform all possible Kubecost queries. By storing the ETL data on an S3 bucket, this integration offers resiliency to your cost allocation data, improves the performance and enables high availability architecture for your Kubecost setup.
The following architecture diagram illustrates the large-scale infrastructure setup:
You have an existing AWS account.
You have IAM credentials to create Amazon Managed Service for Prometheus and IAM roles programmatically.
You have an existing Amazon EKS cluster with OIDC enabled.
Your Amazon EKS clusters have the Amazon EBS CSI driver installed.
The example output should be in this format:
The Amazon Managed Service for Prometheus workspace should be created in a few seconds. Run the following command to get the workspace ID:
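A sketch of that lookup, assuming the workspace was created with the alias kubecost-amp:

```bash
aws amp list-workspaces --alias kubecost-amp \
  --query "workspaces[0].workspaceId" --output text
```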
Run the following command to set environment variables for integrating Kubecost with Amazon Managed Service for Prometheus:
Note: You can ignore Step 2 for the small-scale infrastructure setup.
a. Create Object store S3 bucket to store Kubecost ETL metrics. Run the following command in your workspace:
b. Create IAM Policy to grant access to the S3 bucket. The following policy is for demo purposes only. You may need to consult your security team and make appropriate changes depending on your organization's requirements.
Run the following command in your workspace:
c. Create Kubernetes secret to allow Kubecost to write ETL files to the S3 bucket. Run the following command in your workspace:
The following commands help to automate these tasks:
Create an IAM role with the AWS-managed IAM policy and trusted policy for the following service accounts: kubecost-cost-analyzer-amp
, kubecost-prometheus-server-amp
.
Modify current K8s service accounts with annotation to attach a new IAM role.
Run the following command in your workspace:
For more information, you can check AWS documentation at IAM roles for service accounts and learn more about Amazon Managed Service for Prometheus managed policy at Identity-based policy examples for Amazon Managed Service for Prometheus
Run the following command to create a file called config-values.yaml, which contains the defaults that Kubecost will use for connecting to your Amazon Managed Service for Prometheus workspace.
Run this command to install Kubecost and integrate it with the Amazon Managed Service for Prometheus workspace as the primary:
These installation steps are similar to those for a primary cluster setup, except you do not need to follow the steps in the section "Create Amazon Managed Service for Prometheus workspace", and you need to update the environment variables below to match your additional clusters. Please note that the AMP_WORKSPACE_ID
and KC_BUCKET
are the same as the primary cluster.
Run this command to install Kubecost and integrate it with the Amazon Managed Service for Prometheus workspace as the additional cluster:
Your Kubecost setup is now writing and collecting data from AMP. Data should be ready for viewing within 15 minutes.
To verify that the integration is set up, go to Settings in the Kubecost UI, and check the Prometheus Status section.
Read our Custom Prometheus integration troubleshooting guide if you run into any errors while setting up the integration. For support from AWS, you can submit a support request through your existing AWS support contract.
You can add these recording rules to improve the performance. Recording rules allow you to precompute frequently needed or computationally expensive expressions and save their results as a new set of time series. Querying the precomputed result is often much faster than running the original expression every time it is needed. Follow these instructions to add the following rules:
The below queries must return data for Kubecost to calculate costs correctly.
For the queries below to work, set the environment variables:
Verify connection to AMP and that the metric for container_memory_working_set_bytes
is available:
If you have set kubecostModel.promClusterIDLabel
, you will need to change the query (CLUSTER_ID
) to match the label (typically cluster
or alpha_eksctl_io_cluster_name
).
The output should contain a JSON entry similar to the following.
The value of cluster_id
should match the value of kubecostProductConfigs.clusterName
.
Verify Kubecost metrics are available in AMP:
The output should contain a JSON entry similar to:
If the above queries fail, check the following:
Check logs of the sigv4proxy
container (may be the Kubecost deployment or Prometheus Server deployment depending on your setup):
In a working sigv4proxy
, there will be very few logs.
Correctly working log output:
Check logs in the cost-model container for Prometheus connection issues:
Example errors:
Rafay is a SaaS-first Kubernetes Operations Platform (KOP) with enterprise-class scalability, zero-trust security and interoperability for managing applications across public clouds, data centers & edge.
See Rafay documentation to learn more about the platform and how to use it.
This document will walk you through installing Kubecost on a cluster that has been provisioned or imported using the Rafay controller. The steps below describe how to create and use a custom cluster blueprint via the Rafay Web Console. The entire workflow can also be fully automated and embedded into an automation pipeline using the RCTL CLI utility or Rafay REST APIs.
You have already provisioned or imported one or more Kubernetes clusters using the Rafay controller.
Under Integrations:
Select Repositories and create a new repository named kubecost
of type Helm.
Select Create.
Enter the endpoint value of https://kubecost.github.io/cost-analyzer/
.
Select Save.
You'll need to override the default values.yaml file. Create a new file called kubecost-custom-values.yaml with the following content:
Login to the Rafay Web Console and navigate to your Project as an Org Admin or Infrastructure Admin.
Under Infrastructure, select Namespaces and create a new namespace called kubecost
, and select type Wizard.
Select Save & Go to Placement.
Select the cluster(s) that the namespace will be added to. Select Save & Go To Publish.
Select Publish to publish the namespace to the selected cluster(s).
Once the namespace has been published, select Exit.
Under Infrastructure, select Clusters.
Select the kubectl button on the cluster to open a virtual terminal.
Verify that the kubecost
namespace has been created by running the following command:
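A simple check from the virtual terminal:

```bash
kubectl get namespace kubecost
```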
From the Web Console:
Select Add-ons and Create a new add-on called kubecost.
Select Bring your own.
Select Helm 3 for type.
Select Pull files from repository.
Select Helm for the repository type.
Select kubecost
for the namespace.
Select Select.
Create a new version of the add-on.
Select New Version.
Provide a version name such as v1
.
Select kubecost
for the repository.
Enter cost-analyzer
for the chart name.
Upload the kubecost-custom-values.yaml
file that was previously created.
Select Save Changes.
Once you've created the Kubecost add-on, use it in assembling a custom cluster blueprint. You can add other add-ons to the same custom blueprint.
Under Infrastructure, select Blueprints.
Create a new blueprint and give it a name such as kubecost
.
Select Save.
Create a new version of the blueprint.
Select New Version.
Provide a version name such as v1
.
Under Add-Ons, select the kubecost
Add-on and the version that was previously created.
Select Save Changes.
You may now apply this custom blueprint to a cluster.
Select Options for the target cluster in the Web Console.
Select Update Blueprint and select the kubecost
blueprint and version you created previously.
Select Save and Publish.
This will start the deployment of the add-ons configured in the kubecost
blueprint to the targeted cluster. The blueprint sync process can take a few minutes. Once complete, the cluster will display the current cluster blueprint details and whether the sync was successful or not.
You can optionally verify whether the correct resources have been created on the cluster. Select the kubectl
button on the cluster to open a virtual terminal.
Then, verify the pods in the kubecost
namespace. Run kubectl get pod -n kubecost
, and check that the output is similar to the example below.
In order to access the Kubecost UI, you'll need to enable access to the frontend application using port-forward. To do this, download and use the Kubeconfig
with the KubeCTL CLI (../../accessproxy/kubectl_cli/
).
You can now access the Kubecost UI by visiting http://localhost:9090
in your browser.
You have now successfully created a custom cluster blueprint with the kubecost
add-on and applied to a cluster. Use this blueprint on as many clusters as you require.
You can find Rafay's documentation on Kubecost as well as guides for how to create or import a cluster using the Rafay controller on the Rafay Product Documentation site.
Using an existing Grafana deployment can be accomplished through one of two options:
Linking to an external Grafana directly
Deploying with Grafana sidecar enabled
After installing Kubecost, select Settings from the left navigation and update Grafana Address to a URL that is visible to users accessing Grafana dashboards. This variable can alternatively be passed at the time you deploy Kubecost via the kubecostProductConfigs.grafanaURL
parameter in values.yaml. Next, import Kubecost Grafana dashboards as JSON from this folder.
Passing the Grafana parameters below in your values.yaml will install ConfigMaps for Grafana dashboards that will be picked up by the Grafana sidecar if you have Grafana with the dashboard sidecar already installed.
Ensure that the following flags are set in your Operator deployment:
sidecar.dashboards.enabled=true
sidecar.dashboards.searchNamespace
isn't restrictive. Use ALL
if Kubecost runs in another namespace.
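As a rough sketch of how these settings typically appear when Grafana is deployed via its Helm chart (if you use the Grafana Operator, set the equivalent fields on its custom resource instead):

```yaml
grafana:
  sidecar:
    dashboards:
      enabled: true
      searchNamespace: ALL   # use ALL if Kubecost runs in another namespace
```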
The Kubecost UI cannot link to the Grafana dashboards unless kubecostProductConfigs.grafanaURL
is set, either via the Helm chart, or via the Settings page, as described in Option 1.
When using Kubecost on a custom ingress path, you must add this path to the Grafana root_url
:
If you choose to disable Grafana, set the following Helm values to ensure successful pod startup:
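A sketch of those values, assuming the chart's global Grafana toggles:

```bash
--set global.grafana.enabled=false \
--set global.grafana.proxy=false
```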
Kubecost supports deploying to Red Hat OpenShift (OCP) and includes options and features which assist in getting Kubecost running quickly and easily with OpenShift-specific resources.
There are two main options to deploy Kubecost on OpenShift.
More details and instructions on both deployment options are covered in the sections below.
A standard deployment of Kubecost to OpenShift is no different from deployments to other platforms with the exception of additional settings which may be required to successfully deploy to OpenShift.
Kubecost is installed with Cost Analyzer and Prometheus as a time-series database. Data is gathered by the Prometheus instance bundled with Kubecost. Kubecost then pushes and queries metrics to and from Prometheus.
The standard deployment is illustrated in the following diagram.
An existing OpenShift or OpenShift-compatible cluster (ex., OKD).
Access to the cluster to create a new project and deploy new workloads.
helm
CLI installed locally.
Add the Kubecost Helm chart repository and scan for new charts.
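For example, using the chart endpoint referenced earlier in this guide:

```bash
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
```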
Install Kubecost using OpenShift specific values. Note that the below command fetches the OpenShift values from the development branch which may not reflect the state of the release which was just installed. We recommend using the corresponding values file from the chart release.
Because OpenShift disallows defining certain fields in a pod's securityContext
configuration, values specific to OpenShift must be used. The necessary values have already been defined in the OpenShift values file but may be customized to your specific needs.
If you want to install Kubecost with your desired cluster name, provide the following values to either your values override file or via the --set
command. Remember to replace the cluster name/id with the value you wish to use for this installation.
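For example, using the cluster-name keys referenced elsewhere in this guide (replace the name with your own):

```yaml
kubecostProductConfigs:
  clusterName: my-openshift-cluster
prometheus:
  server:
    global:
      external_labels:
        cluster_id: my-openshift-cluster
```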
Other OpenShift-specific values include the ability to deploy a Route and SecurityContextConstraints for optional components requiring more privileges such as Kubecost network costs and Prometheus node exporter. To view all the latest OpenShift-specific values and their use, please see the OpenShift values file.
If you have not opted to do so during installation, it may be necessary to create a Route to the service kubecost-cost-analyzer
on port 9090
of the kubecost
project (if using default values). For more information on Routes, see the OpenShift documentation here.
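A sketch of creating such a Route with the oc CLI, assuming the default service name and project:

```bash
oc -n kubecost expose service kubecost-cost-analyzer --port=9090
```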
After installation, wait for all pods to be ready. Kubecost will begin collecting data and may take up to 15 minutes for the UI to reflect the resources in the local cluster.
Kubecost offers a Red Hat community operator which can be found in the Operator Hub catalog of the OpenShift web console. When using this deployment method, the operator is installed and a Kubernetes Custom Resource is created which then triggers the operator to deploy the Helm chart. The chart deployed by the community operator is the same chart which is referenced in the standard deployment.
An existing OpenShift cluster.
Access to the cluster to create a new project and deploy new workloads.
Log in to your OCP cluster web console and select Operators > OperatorHub > then enter "Kubecost" in the search box.
Click the Install button to be taken to the operator installation page.
On the installation page, select the update approval method and then click Install.
Once the operator has been installed, create a namespace in which to deploy a Kubecost installation.
You can also select Operators > Installed Operators to review the details as shown below.
Once the namespace has been created, create the CostAnalyzer Custom Resource (CR) with the desired values for your installation. The CostAnalyzer CR represents the total Helm values used to deploy Kubecost and any of its components. This may either be created in the OperatorHub portal or via the oc
CLI. The default CostAnalyzer sample provided is pre-configured for a basic installation of Kubecost.
To create the CostAnalyzer resource from OperatorHub, from the installed Kubecost operator page, click on the CostAnalyzer tab and click the Create CostAnalyzer button.
Click on the radio button YAML view to see a full example of a CostAnalyzer CR. Here, you can either create a CostAnalyzer directly or download the Custom Resource for later use.
Change the namespace
field to kubecost
if this was the name of your namespace created previously.
Click the Create button to create the CostAnalyzer based on the current YAML.
After about a minute, Kubecost should be up and running based upon the configuration defined in the CostAnalyzer CR. You can get details on this installation by clicking on the instance which was just deployed.
If you have not opted to do so during installation, it may be necessary to create a Route to the service kubecost-cost-analyzer
on port 9090
of the kubecost
project (if using default values). For more information on Routes, see the OpenShift documentation here.
As of v1.67, the persistent volume attached to Kubecost's primary pod (cost-analyzer) contains Kubecost's computed ETL cache as well as product configuration data. While it's technically optional (because all configurations can be set via ConfigMap), it dramatically reduces the load against your Prometheus/Thanos installations on pod restart/redeploy. For this reason, it's strongly encouraged on larger clusters.
If you are creating a new installation of Kubecost:
We recommend that you back Kubecost with at least a 32GB disk. This is the default as of 1.72.0.
If you are upgrading an existing version of Kubecost:
If your provisioner supports volume expansion, we will automatically resize you to a 32GB disk when upgrading to 1.72.0.
If your provisioner does not support volume expansion:
If all your configs are supplied via values.yaml in Helm or via ConfigMap and have not been added from the front end, you can safely delete the PV and upgrade.
We suggest you delete the old PV, then run Kubecost with a 32GB disk. This is the default in 1.72.0
If you cannot safely delete the PV storing your configs and configure them on a new PV:
If you are not on a regional cluster, provision a second PV by setting persistentVolume.dbPVEnabled=true
If you are on a regional cluster, provision a second PV using a topology-aware storage class (). You can set this disk’s storage class by setting persistentVolume.dbStorageClass=your-topology-aware-storage-class-name
If you're using just one PV and still see issues with Kubecost being rescheduled on zones outside of your disk, consider using a topology-aware storage class. You can set the Kubecost disk’s storage class by setting persistentVolume.storageClass
to your topology-aware storage class name.
Kubecost leverages the open-source Prometheus project as a time series database and post-processes the data in Prometheus to perform cost allocation calculations and provide optimization insights for your Kubernetes clusters. Prometheus is a single machine statically-resourced container, so depending on your cluster size or when your cluster scales out, your cluster could exceed the scraping capabilities of a single Prometheus server. In this doc, you will learn how Kubecost integrates with , a managed Prometheus-compatible monitoring service, to enable the customer to monitor Kubernetes costs at scale easily.
For this integration, GMP must be enabled for your GKE cluster with managed collection. Next, Kubecost is installed in your GKE cluster and leverages the GMP Prometheus binary to seamlessly ingest metrics into the GMP database. In this setup, the Kubecost deployment also automatically creates a Prometheus proxy that allows Kubecost to query the metrics from the GMP database for cost allocation calculation.
This integration is currently in beta.
You have a GCP account/subscription.
You have permission to manage GKE clusters and GCP monitoring services.
You have an existing GKE cluster with GMP enabled. You can learn more .
You can use the following command to install Kubecost on your GKE cluster and integrate with GMP:
In this installation command, these additional flags are added to have Kubecost work with GMP:
prometheus.server.image.repository
and prometheus.server.image.tag
replace the standard Prometheus image with GMP specific image.
global.gmp.enabled
and global.gmp.gmpProxy.projectId
are for enabling the GMP integration.
prometheus.server.global.external_labels.cluster_id
and kubecostProductConfigs.clusterName
helps to set the name for your Kubecost setup.
Your Kubecost setup now writes and collects data from GMP. Data should be ready for viewing within 15 minutes.
Run the following command to enable port-forwarding to expose the Kubecost dashboard:
To verify that the integration is set up, go to Settings in the Kubecost UI, and check the Prometheus Status section.
The below queries must return data for Kubecost to calculate costs correctly. For the queries to work, set the environment variables:
Verify connection to GMP and that the metric for container_memory_working_set_bytes
is available:
If you have set kubecostModel.promClusterIDLabel
in the Helm chart, you will need to change the query (CLUSTER_ID
) to match the label.
Verify Kubecost metrics are available in GMP:
You should receive an output similar to:
If id
returns as a blank value, you can set the following Helm value to force set cluster
as the Prometheus cluster ID label:
If the above queries fail, check the following:
Check logs of the sigv4proxy
container (may be the Kubecost deployment or Prometheus Server deployment depending on your setup):
In a working sigv4proxy
, there will be very few logs.
Correctly working log output:
Check logs in the cost-model
container for Prometheus connection issues:
Example errors:
In the standard deployment of , Kubecost is deployed with a bundled Prometheus instance to collect and store metrics of your Kubernetes cluster. Kubecost also provides the flexibility to connect with your time series database or storage. is an open-source, horizontally scalable, highly available, multi-tenant TSDB for long-term storage for Prometheus.
This document will show you how to integrate Grafana Mimir with Kubecost for long-term metrics retention. In this setup, you need to use the Grafana Agent to collect metrics from Kubecost and your Kubernetes cluster. The metrics will then be remote-written to your existing Grafana Mimir setup.
You have access to a running Kubernetes cluster
You have an existing Grafana Mimir setup
Install the Grafana Agent for Kubernetes on your cluster. On the existing K8s cluster that you intend to install Kubecost, run the following commands to install the Grafana Agent to scrape the metrics from Kubecost /metrics
endpoint. The script below installs the Grafana Agent with the necessary scraping configuration for Kubecost; you may want to add additional scrape configuration for your setup.
You can also verify if grafana-agent
is scraping data with the following command (optional):
Run the following command to deploy Kubecost. Please remember to update the environment variables values with your Mimir setup information.
The process is complete. By now, you should have successfully completed the Kubecost integration with your Grafana Mimir setup.
You can find additional configurations at our main file.
From your , you can run the following query to verify whether Kubecost metrics are being collected:
Additionally, read our if you run into any other errors while setting up the integration. For support from GCP, you can submit a support request at the .
To learn more about how to install and configure the Grafana agent, as well as additional scrape configuration, please refer to documentation, or you can view the Kubecost Prometheus scrape config at this .