To get started with Kubecost and OpenCost, visit our Installation page which will take you step by step through getting Kubecost set up.
This installation method is available for free and leverages the Kubecost Helm Chart. It provides access to all OpenCost and Kubecost community functionality and can scale to large clusters. This will also provide a token for trialing and retaining data across different Kubecost product tiers.
You can also install directly with the Kubecost Helm Chart with Helm v3.1+ using the following commands. This provides the same functionality as the step above but doesn't generate a product token for managing tiers or upgrade trials.
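For reference, a typical Helm v3 install uses the public chart repository shown below; adjust the release name and namespace to your needs:

```sh
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace
```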
You can run Helm Template against the Kubecost Helm Chart to generate local YAML output. This requires extra effort when compared to directly installing the Helm Chart but is more flexible than deploying a flat manifest.
You can install via flat manifest. This install path is not recommended because it has limited flexibility for managing your deployment and future upgrades.
Lastly, you can deploy the open-source OpenCost project directly as a Pod. This install path provides a subset of free functionality and is available here. Specifically, this install path deploys the underlying cost allocation model without the same UI or access to enterprise functionality: cloud provider billing integration, RBAC/SAML support, and scale improvements in Kubecost.
Kubecost has a number of product configuration options that you can specify at install time in order to minimize the number of settings changes required within the product UI. This makes it simple to redeploy Kubecost. These values can be configured under kubecostProductConfigs in our values.yaml. These parameters are passed to a ConfigMap that Kubecost detects and writes to its /var/configs.
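As a sketch, a small values file setting a couple of commonly used kubecostProductConfigs keys might look like the following (the keys shown are illustrative examples, not an exhaustive or authoritative list; consult values.yaml for what your chart version supports):

```sh
cat > kubecost-product-configs.yaml <<EOF
kubecostProductConfigs:
  clusterName: my-cluster   # example key: display name for this cluster (assumption)
  currencyCode: USD         # example key: currency shown in the UI (assumption)
EOF

helm upgrade --install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace \
  -f kubecost-product-configs.yaml
```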
If you encounter any errors while installing Kubecost, first visit our Troubleshoot Install doc. If the error you are experiencing is not already documented here, or a solution is not found, contact our Support team at support@kubecost.com for more help.
Kubecost releases are scheduled on a near-monthly basis. You can keep up to date with new Kubecost updates and patches by following our release notes here.
After installing Kubecost, you will be able to update Kubecost with the following command, which will upgrade you to the most recent version:
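If you installed with the chart repository shown earlier, the upgrade typically looks like this sketch:

```sh
helm upgrade kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost
```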
You can upgrade or downgrade to a specific version of Kubecost with the following command:
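For example, the --version flag pins the chart to a specific release (the version shown is a placeholder):

```sh
helm upgrade kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --version <CHART_VERSION>
```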
To uninstall Kubecost and its dependencies, run the following command:
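Assuming the release and namespace names used in the earlier examples:

```sh
helm uninstall kubecost --namespace kubecost
# Optionally remove the namespace once the release is gone:
kubectl delete namespace kubecost
```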
After successfully installing Kubecost, first time users should review our First Time User Guide to start immediately seeing the benefits of the product while also ensuring their workspace is properly set up.
After successfully installing Kubecost, new users should familiarize themselves with these onboarding steps to begin immediately realizing value. This doc will explain to you the core features and options you will have access to and direct you to other necessary docs groups that will help you get set up.
While certain steps in this article may be optional depending on your setup, these are recommended best practices for seeing the most value out of Kubecost as soon as possible.
Many Kubernetes adopters may have billing with cloud service providers (CSPs) that differs from public pricing. By default, Kubecost will detect the CSP of the cluster where it is installed and pull list prices for nodes, storage, and LoadBalancers across all major CSPs: Azure, AWS, and GCP.
However, Kubecost is also able to integrate these CSPs to receive the most accurate billing data. By completing a cloud integration, Kubecost is able to reconcile costs with your actual cloud bill to reflect enterprise discounts, Spot market prices, commitment discounts, and more.
New users should seek to integrate any and all CSPs they use into Kubecost. For an overview of cloud integrations and getting started, see our Cloud Billing Integrations doc. Once you have completed all necessary integrations, return to this article.
Due to the frequency of updates from providers, it can take anywhere from 24 to 48 hours to see adjusted costs.
Now that your base install and CSP integrations are complete, it's time to determine the accuracy against your cloud bill. Based on different methods of cost aggregation, Kubecost should assess your billing data within a 3-5% margin of error.
After enabling port-forwarding, you should have access to the Kubecost UI. Explore the different pages in the left navigation, starting with the Monitor dashboards. These pages, including Allocations, Assets, Clusters, and Cloud Costs, are comprised of different categories of cost spending, and allow you to apply customized queries for specific billing data. These queries can then be saved in the form of reports for future quick access. Each page of the Kubecost UI has more dedicated information in the Navigating the Kubecost UI section.
It's important to take precautions to ensure your billing data is preserved, and you know how to monitor your infrastructure's health.
Metrics reside in Prometheus, but extracting information from this store directly, whether for the UI or for API responses, is not performant at scale. For this reason, the data is optimized and stored through an extract, transform, load (ETL) process. When Kubecost documentation refers to ETL, it usually means this process and the data it produces.
Like any other system, backup of critical data is a must, and backing up ETL is no exception. To address this, we offer a number of different options based on your product tier. Descriptions and instructions for our backup functionalities can be found in our ETL Backup doc.
Similar to most systems, monitoring health is vital. For this, we offer several means of monitoring the health of both Kubecost and the host cluster.
Alerts can be configured to enable a proactive approach to monitoring your spend, and can be distributed across different workplace communication tools including email, Slack, and Microsoft Teams. Alerts can establish budgets for your different types of spend and cost-efficiency, and warn you if those budgets are reached. These Alerts are able to be configured via Helm or directly in your Kubecost UI.
The Health page will display an overall cluster health score which assesses how reliably and efficiently your infrastructure is performing. Scores start at 100 and decrease based on how severe any present errors are.
Kubecost has multiple ways of supporting multi-cluster environments, which vary based on your Kubecost product tier.
Kubecost Free will only allow you to view a single cluster at a time in the Kubecost UI. However, you can connect multiple different clusters and switch through them using Kubecost's context switcher.
Kubecost Enterprise provides a "single-pane-of-glass" view which combines metrics across all clusters into a shared storage bucket. One cluster is designated as the primary cluster from which you view the UI, with all other clusters considered secondary. Attempting to view the UI through a secondary cluster will not display metrics across your entire environment.
It is recommended to complete the steps above for your primary cluster before adding any secondary clusters. To learn more about advanced multi-cluster/Federated configurations, see our Multi-Cluster doc.
After completing these primary steps, you are well on your way to being proficient in Kubecost. However, managing Kubernetes infrastructure can be complicated, and for that we have plenty more documentation to help. For advanced or optional configuration options, see our Next Steps with Kubecost guide which will introduce you to additional concepts.
Kubecost requires a Kubernetes cluster to be deployed.
Users should be running Kubernetes 1.20+.
Kubernetes 1.28 is officially supported as of v1.105.
Versions outside of the stated compatibility range may work, depending on individual configurations, but are untested.
Managed Kubernetes clusters (e.g. EKS, GKE, AKS) most common
Kubernetes distributions (e.g. OpenShift, DigitalOcean, Rancher, Tanzu)
Bootstrapped Kubernetes cluster
On-prem and air-gapped using custom pricing sheets
AWS (Amazon Web Services)
All regions supported, as shown in opencost/pkg/cloud/awsprovider.go
x86, ARM
GCP (Google Cloud Platform)
All regions supported, as shown in opencost/pkg/cloud/gcpprovider.go
x86
Azure (Microsoft)
All regions supported, as shown in opencost/pkg/cloud/azureprovider.go
x86
This list is certainly not exhaustive! This is simply a list of observations as to where our users run Kubecost based on their questions and feedback. Please contact us with any questions!
Once you have familiarized yourself with Kubecost and integrated with any cloud providers, it's time to move on to more advanced concepts. This doc provides commonly used product configurations and feature overviews to help get you up and running after the Kubecost product has been installed. You may be redirected to other Kubecost docs to learn more about specific concepts or follow tutorials.
The default Kubecost installation has a 32GB persistent volume and a 15-day retention period for Prometheus metrics. This is enough space to retain data for roughly 300 pods, depending on your exact node and container count. See the Kubecost Helm chart configuration options to adjust both the retention period and storage size.
To determine the appropriate disk size, you can use this formula to approximate:
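A commonly used approximation, taken from Prometheus operational guidance, is: needed_disk_space = retention_time_seconds × ingested_samples_per_second × bytes_per_sample.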
Where ingested samples can be measured as the average over a recent period, e.g. sum(avg_over_time(scrape_samples_post_metric_relabeling[24h])). On average, Prometheus uses around 1.5-2 bytes per sample. So, ingesting 100k samples per minute and retaining them for 15 days would demand around 40 GB. It's recommended to add another 20-30% capacity for headroom and WAL. More info on disk sizing here.
More than 30 days of data should not be stored in Prometheus for larger clusters. For long-term data retention, contact us at support@kubecost.com about Kubecost with durable storage enabled. More info on Kubecost storage here.
Users should set and/or update resource requests and limits before taking Kubecost into production at scale. These inputs can be configured in the Kubecost values.yaml for Kubecost modules and subcharts.
The exact recommended values for these parameters depend on the size of your cluster, availability requirements, and usage of the Kubecost product. Suggested values for each container can be found within Kubecost itself on the namespace page. More info on these recommendations is available here.
For best results, run Kubecost for up to seven days on a production cluster, then tune resource requests/limits based on resource consumption.
To broaden usage to other teams or departments within your Kubecost environment, basic security measures will usually be required. There are a number of options for protecting your workspace depending on your Kubecost product tier.
Establishing an ingress controller will allow for control of access for your workspace. Learn more about enabling external access in Kubecost with our Ingress Examples doc.
SSO/SAML/RBAC/OIDC are only officially supported on Kubecost Enterprise plans.
You can configure SSO and RBAC on a separate baseline deployment, which will not only shorten the deployment time of security features, but will also avoid unwanted access denial. This is helpful when using only one developer deployment. See our user management guides below:
For teams already running node exporter on the default port, our bundled node exporter may remain in a Pending state. You can optionally use an existing node exporter DaemonSet by setting the prometheus.nodeExporter.enabled and prometheus.serviceAccounts.nodeExporter.create Kubecost Helm chart config options to false. This requires your existing node exporter endpoint to be visible from the namespace where Kubecost is installed. More config options are shown here.
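Using the flag form, disabling the bundled node exporter might look like the following (the same --set flags work at install time):

```sh
helm upgrade kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --set prometheus.nodeExporter.enabled=false \
  --set prometheus.serviceAccounts.nodeExporter.create=false
```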
You may optionally pass the following Helm flags to install Kubecost and its bundled dependencies without any persistent volumes. However, any time the Prometheus server pod is restarted, all historical billing data will be lost unless Thanos or other long-term storage is enabled in the Kubecost product.
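A sketch of such an install is shown below; the persistence toggles used here are assumptions, so verify them against values.yaml for your chart version before relying on them:

```sh
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace \
  --set prometheus.server.persistentVolume.enabled=false \
  --set persistentVolume.enabled=false
```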
Efficiency and idle costs can teach you more about the cost-value of your Kubernetes spend by showing you how efficiently your resources are used. To learn more about pod resource efficiency and cluster idle costs, see Efficiency and Idle.
Often while using and configuring Kubecost, our documentation may ask you to pass certain Helm flag values. There are three different approaches for passing custom Helm values into your Kubecost product, which are explained in this doc. In these examples, we are updating the kubecostProductConfigs.productKey.key Helm value, which enables Kubecost Enterprise; however, these methods will work for all other Helm flags.

Method 1: --set command-line flags. For example, you can pass only the product key if that is all you need to configure.
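A sketch of that command, passing the product key directly at the command line:

```sh
helm upgrade --install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace \
  --set kubecostProductConfigs.productKey.key="<YOUR_PRODUCT_KEY>"
  # Depending on chart version, kubecostProductConfigs.productKey.enabled=true
  # may also be required (assumption; check values.yaml).
```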
Method 2: values file. Similar to Method 1, you can create a separate values file that contains only the parameters needed.
Your values.yaml should look like this:
Then run your install command:
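A sketch of that flow, creating a small values file with just the product key and passing it with -f:

```sh
cat > kubecost-key-values.yaml <<EOF
kubecostProductConfigs:
  productKey:
    key: "<YOUR_PRODUCT_KEY>"
EOF

helm upgrade --install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace \
  -f kubecost-key-values.yaml
```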
Enabling external access to the Kubecost product requires exposing access to port 9090 on the kubecost-cost-analyzer
pod. Exposing this endpoint will handle routing to Grafana as well. There are multiple ways to do this, including Ingress or Service definitions.
Please exercise caution when exposing Kubecost via an Ingress controller especially if there is no authentication in use. Consult your organization's internal recommendations.
Common samples below and others can be found on our .
The following example definitions use the NGINX .
Once an AWS Load Balancer (ALB) Controller is installed, you can use the following Ingress resource manifest pointed at the Kubecost cost-analyzer service:
This file contains the default Helm values that come with your Kubecost install. Taking this approach means you may need to sync with the repo to use the latest release. Be careful when applying certain Helm values related to your UI configuration to your secondary clusters. For more information, see this section in our Multi-Cluster doc about .
Here is a that uses a Kubernetes Secret.
When deploying Grafana on a non-root URL, you also need to update your grafana.ini to reflect this. More info can be found in .
Integration with cloud service providers (CSPs) via their respective billing APIs allows Kubecost to display out-of-cluster (OOC) costs (e.g. AWS S3, Google Cloud Storage, Azure Storage Account). Additionally, it allows Kubecost to reconcile Kubecost's in-cluster predictions with actual billing data to improve accuracy.
If you are using Kubecost Cloud, do not attempt to modify your install using information from this article. You need to consult Kubecost Cloud's specific cloud integration procedures which can be found here.
As indicated above, setting up a cloud integration with your CSP allows Kubecost to pull in additional billing data. The two processes that incorporate this information are reconciliation and CloudCost (formerly known as CloudUsage).
Reconciliation matches in-cluster assets with items found in the billing data pulled from the CSP. This allows Kubecost to display the most accurate depiction of your in-cluster spending. Additionally, the reconciliation process creates Network assets for in-cluster nodes based on the information in the billing data. The main drawback of this process is that CSPs have a 6 to 24-hour delay in releasing billing data, and reconciliation requires a complete day of cost data to reconcile with the in-cluster assets. This results in a 48-hour window between resource usage and reconciliation. If reconciliation is performed within this window, asset costs are deflated to the partially complete costs shown in the billing data.
Cost-based metrics are based on on-demand pricing unless there is definitive data from a CSP that the node is not on-demand. This way estimates are as accurate as possible. If a new reserved instance is provisioned or a node joins a savings plan:
Kubecost continues to emit on-demand pricing until the node is added to the cloud bill.
Once the node is added to the cloud bill, Kubecost starts emitting something closer to the actual price.
For the time period where Kubecost assumed the node was on-demand but it was actually reserved, reconciliation fixes the price in ETL.
The reconciled assets will inherit the labels from the corresponding items in the billing data. If there exist identical label keys between the original assets and those of the billing data items, the label value of the original asset will take precedence.
Visit Settings, then toggle on Highlight Unreconciled Costs, then select Save at the bottom of the page to apply changes. Now, when you visit your Allocations or Assets dashboards, the most recent 36 hours of data will display hatching to signify unreconciled costs.
As of v1.106 of Kubecost, CloudCost is enabled by default, and Cloud Usage is disabled. Upgrading Kubecost will not affect the UI or hinder performance relating to this.
CloudCost allows Kubecost to pull in OOC cloud spend from your CSP's billing data, including any services run by the CSP as well as compute resources. By labelling OOC costs, their value can be distributed to your Allocations data as external costs. This allows you to better understand the proportion of OOC cloud spend that your in-cluster usage depends on.
Your cloud billing data is reflected in the aggregate costs of Account, Provider, Invoice Entity, and Service. Aggregating and drilling down into any of these categories will provide a subset of the entire bill, based on the Helm value .values.cloudCost.topNItems, which defaults to 1,000. This subset is each day's top n items by cost. An optional label list can be used to include or exclude items to be pulled from the bill.

CloudCost items become available as soon as they appear in the billing data, subject to the 6 to 24-hour delay mentioned above, and are updated as the billing data becomes more complete.
You can view your existing cloud integrations and their success status in the Kubecost UI by visiting Settings, then scrolling to Cloud Integrations. To create a new integration or learn more about existing integrations, select View additional details to go to the Cloud Integrations page.
Here, you can view your integrations and filter by successful or failed integrations. For non-successful integrations, Kubecost will display a diagnostic error message in the Status column to contextualize steps toward successful integration.
Select an individual integration to view a side panel that contains the most recent run, next run, refresh rate, and an exportable YAML of Helm configs for its CSP's integration values.
You can add a new cloud integration by selecting Add Integration. For guides on how to set up an integration for a specific CSP, follow these links to helpful Kubecost documentation:
Select an existing cloud integration, then in the slide panel that appears, select Delete.
The Kubecost Helm chart provides values that can enable or disable each cloud process on the deployment once a cloud integration has been set up. Turning off either of these processes will disable all the benefits provided by them.
Often an integrated cloud account name may be a series of random letters and numbers which do not reflect the account's owner, team, or function. Kubecost allows you to rename cloud accounts to create more readable cloud metrics in your Kubecost UI. After you have successfully integrated your cloud account (see above), you need to manually edit your values.yaml and provide the original account name and your intended rename:
You will see these changes reflected in Kubecost's UI on the Overview page under Cloud Costs Breakdown. These example account IDs could benefit from being renamed:
The ETL contains a Map of Cloud Stores, each representing an integration with a CSP. Each Cloud Store is responsible for the Cloud Usage and reconciliation pipelines, which respectively add OOC costs and adjust Kubecost's estimated costs using cost and usage data pulled from the CSP. Each Cloud Store has a unique identifier called the ProviderKey, which varies depending on which CSP is being connected to and ensures that duplicate configurations are not introduced into the ETL. The value of the ProviderKey is the following for each CSP, at the scope at which the billing data is gathered:
AWS: Account Id
GCP: Project Id
Azure: Subscription Id
The ProviderKey can be used as an argument for the Cloud Usage and reconciliation repair API endpoints to indicate that the specified operation should only be performed on a single Cloud Store, rather than all of them, which is the default behavior. Additionally, the Cloud Store keeps track of the Cloud Connection Status for both Cloud Usage and reconciliation. The Cloud Connection Status is meant to be used as a tool for determining the health of the cloud connection that is the basis of each Cloud Store, and it has various failure states intended to provide actionable information on how to get your cloud connection running properly. These are the Cloud Connection Statuses:
INITIAL_STATUS: The zero value of the Cloud Connection Status, meaning the cloud connection is untested. Once the Cloud Connection Status has changed, it should not return to this value. This status is assigned to the Cloud Store on creation.
MISSING_CONFIGURATION: Kubecost has not detected any method of Cloud Configuration. This value is only possible on the first Cloud Store that is created as a wrapper for the open-source CSP. This status is assigned during failures in Configuration Retrieval.
INCOMPLETE_CONFIGURATION: Cloud Configuration is missing the required values to connect to the cloud provider. This status is assigned during failures in Configuration Retrieval.
FAILED_CONNECTION: All required Cloud Configuration values are filled in, but a connection with the CSP cannot be established. This is indicative of a typo in one of the Cloud Configuration values or an issue in how the connection was set up in the CSP's console. The assignment of this status varies between CSPs, but it should happen if an error is thrown when interacting with an object from the CSP's SDK.
MISSING_DATA: The Cloud Integration is properly configured, but the CSP is not returning billing/cost and usage data. This status is indicative of the billing/cost and usage data export of the CSP being incorrectly set up or the export being set up in the last 48 hours and not having started populating data yet. This status is set when a query has been successfully made but the results come back empty. If the CSP already has a SUCCESSFUL_CONNECTION status, then this status should not be set because this indicates that the specific query made may have been empty.
SUCCESSFUL_CONNECTION: The Cloud Integration is properly configured and returning data. This status is set on any successful query where data is returned
After starting or restarting Cloud Usage or reconciliation, two subprocesses are started: one which fills in historic data over the coverage of the Daily CloudUsage and Asset Store, and one which runs periodically on a predefined interval to collect and process new cost and usage data as it is made available by the CSP. The ETL's status endpoint contains a cloud object that provides information about each Cloud Store including the Cloud Connection Status and diagnostic information about Cloud Usage and Reconciliation. The diagnostic items on the Cloud Usage and Reconciliation are:
Coverage: The window of time that the historical subprocess has covered
LastRun: The last time that the process ran, updates each time the periodic subprocess runs
NextRun: Next scheduled run of the periodic subprocess
Progress: Ratio of Coverage to Total amount of time to be covered
RefreshRate: The interval that the periodic subprocess runs
Resolution: The window size of the process
StartTime: When the Cloud Process was started
For more information on APIs related to rebuilding and repairing Cloud Usage or reconciliation, see the CloudCost Diagnostic APIs doc.
Multi-cloud integrations are only officially supported on Kubecost Enterprise plans.
This document outlines how to set up cloud integration for accounts on multiple cloud service providers (CSPs), or for multiple accounts on the same cloud provider. This configuration can be used independently of, or in addition to, other cloud integration configurations provided by Kubecost. Once configured, Kubecost will display cloud assets for all configured accounts and perform reconciliation for all accounts that have been configured.
For each cloud account that you would like to configure, you will need to make sure that it is exporting cost data to its respective service to allow Kubecost to gain access to it.
Azure: Set up cost data export following this .
GCP: Set up BigQuery billing data exports with this .
AWS: Follow steps 1-3 to set up and configure a Cost and Usage Report (CUR) in our .
Alibaba: Create a user account with access to the .
The secret should contain a file named cloud-integration.json with the following format (only containing applicable CSPs in your setup):
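As a sketch, the file is a JSON object with one array per provider; the top-level key names below are assumptions, so confirm them against the sample files referenced above and include only the CSPs you use:

```sh
cat > cloud-integration.json <<'EOF'
{
  "aws": [],
  "azure": [],
  "gcp": [],
  "alibaba": []
}
EOF
```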
This method of cloud integration supports multiple configurations per cloud provider simply by adding each cost export to their respective arrays in the .json file. The structure and required values for the configuration objects for each cloud provider are described below. Once you have filled in the configuration object, use the command:
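A typical way to create that secret from the file (the secret name is your choice; the namespace is assumed to be kubecost):

```sh
kubectl create secret generic <SECRET_NAME> \
  --from-file=cloud-integration.json \
  --namespace kubecost
```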
Once the secret is created, set .Values.kubecostProductConfigs.cloudIntegrationSecret
to <SECRET_NAME>
and upgrade Kubecost via Helm.
The following values can be located in the Azure Portal under Cost Management > Exports, or Storage accounts:
azureSubscriptionID is the Subscription ID belonging to the Storage account which stores your exported Azure cost report data.
azureStorageAccount is the name of the Storage account where the exported Azure cost report data is being stored.
azureStorageAccessKey can be found by selecting Access Keys from the navigation sidebar then selecting Show keys. Using either of the two keys will work.
azureStorageContainer is the name that you chose for the exported cost report when you set it up. This is the name of the container where the CSV cost reports are saved in your Storage account.
azureContainerPath is an optional value which should be used if there is more than one billing report that is exported to the configured container. The path provided should have only one billing export because Kubecost will retrieve the most recent billing report for a given month found within the path.
azureCloud is an optional value which denotes the cloud where the storage account exists. Possible values are public and gov. The default is public.
Set these values into the following object and add them to the Azure array:
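A sketch of one such entry is shown below, using the field names described above as JSON keys. Treat the key names as assumptions and cross-check them against the sample files; the example overwrites cloud-integration.json with only an Azure array, so merge it with any other providers you are configuring:

```sh
cat > cloud-integration.json <<'EOF'
{
  "azure": [
    {
      "azureSubscriptionID": "<SUBSCRIPTION_ID>",
      "azureStorageAccount": "<STORAGE_ACCOUNT_NAME>",
      "azureStorageAccessKey": "<STORAGE_ACCESS_KEY>",
      "azureStorageContainer": "<EXPORT_CONTAINER_NAME>",
      "azureContainerPath": "",
      "azureCloud": "public"
    }
  ]
}
EOF
```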
If you don't already have a GCP service key for any of the projects you would like to configure, you can run the following commands in your command line to generate and export one. Make sure your GCP project is where your external costs are being run.
You can then get your service account key to paste into the UI:
<KEY_JSON> is the GCP service key created above. This value should be left as a JSON when inserted into the configuration object.
<PROJECT_ID> is the Project ID in the GCP service key.
<BILLING_DATA_DATASET> requires a BigQuery dataset prefix (e.g. billing_data) in addition to the BigQuery table name. A full example is billing_data.gcp_billing_export_v1_018AIF_74KD1D_534A2.
Set these values into the following object and add it to the GCP array:
Many of these values in this config can be generated using the following command:
Gather each of these values from the AWS console for each account you would like to configure.
<ACCESS_KEY_ID> is the ID of the Access Key created in the previous step.
<ACCESS_KEY_SECRET> is the secret of the Access Key created in the previous step.
<ATHENA_BUCKET_NAME> is the S3 bucket storing Athena query results which Kubecost has permission to access. The name of the bucket should match s3://aws-athena-query-results-*, so the IAM roles defined above will automatically allow access to it. The bucket can have a canned ACL set to Private or other permissions as needed.
<ATHENA_REGION> is the AWS region Athena is running in.
<ATHENA_DATABASE> is the name of the database created by the Athena setup. The Athena database name is available as the value (physical ID) of AWSCURDatabase in the CloudFormation stack created above.
<ATHENA_TABLE> is the name of the table created by the Athena setup. The table name is typically the database name with the leading athenacurcfn_ removed (but is not available as a CloudFormation stack resource).
<ATHENA_WORKGROUP> is the workgroup assigned to be used with Athena. The default value is Primary.
<ATHENA_PROJECT_ID> is the AWS Account ID where the Athena CUR is located. For example: 530337586277.
<MASTER_PAYER_ARN> is an optional value which should be set if you are using a multi-account billing setup and are not accessing Athena through the primary account. It should be set to the ARN of the role in the management (formerly master payer) account, for example: arn:aws:iam::530337586275:role/KubecostRole.
Set these values into the following object and add them to the AWS array in the cloud-integration.json:
Additionally set the kubecostProductConfigs.athenaProjectID
Helm value to the AWS account that Kubecost is being installed in.
Kubecost does not support complete integrations with Alibaba, but you will still be able to view accurate list prices for cloud resources. Gather these following values from the Alibaba Cloud Console for your account:
clusterRegion is the most used region.
accountID is your Alibaba account ID.
serviceKeyName is the RAM user key name.
serviceKeySecret is the RAM user secret.
Set these values into the following object and add them to the Alibaba array in your cloud-integration.json:
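A sketch of one such entry is shown below, using the field names above as JSON keys; the top-level array name and key names are assumptions, so confirm them against the sample files:

```sh
cat > cloud-integration.json <<'EOF'
{
  "alibaba": [
    {
      "clusterRegion": "<MOST_USED_REGION>",
      "accountID": "<ALIBABA_ACCOUNT_ID>",
      "serviceKeyName": "<RAM_USER_KEY_NAME>",
      "serviceKeySecret": "<RAM_USER_SECRET>"
    }
  ]
}
EOF
```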
There are many ways to integrate your AWS Cost and Usage Report (CUR) with Kubecost. This tutorial is intended as the best-practice method for users whose environments meet the following assumptions:
Kubecost will run in a different account than the AWS Payer Account
The IAM permissions will utilize AWS IAM roles for service accounts (IRSA) to avoid shared secrets
The configuration of Kubecost will be done using a cloud-integration.json file, and not via Kubecost UI (following infrastructure as code practices)
If this is not an accurate description of your environment, see our doc for more options.
This guide is a one-time setup per AWS payer account and is typically one per organization. It can be automated, but may not be worth the effort given that it will not be needed again.
Kubecost supports multiple AWS payer accounts as well as multiple cloud providers from a single Kubecost primary cluster. For multiple payer accounts, create additional entries inside the array below.
Detail for multiple cloud provider setups is .
cloud-integration.json
iam-payer-account-cur-athena-glue-s3-access.json
iam-payer-account-trust-primary-account.json
iam-access-cur-in-payer-account.json
Begin by opening cloud-integration.json, which should look like this:
Update athenaWorkgroup to primary, then save the file and close it. The remaining values will be obtained during this tutorial.
For time granularity, select Daily.
Select the checkbox to enable Resource IDs in the report.
Select the checkbox to enable Athena integration with the report.
Select the checkbox to enable the JSON IAM policy to be applied to your bucket.
If this CUR data is only used by Kubecost, it is safe to expire or delete the objects after seven days of retention.
AWS may take up to 24 hours to publish data. Wait until this is complete before continuing to the next step.
While you wait, update the following configuration files:
Update your cloud-integration.json file by providing a projectID value, which will be the AWS payer account number where the CUR is located and where the Kubecost primary cluster is running.
Update your iam-payer-account-cur-athena-glue-s3-access.json file by replacing all instances of CUR_BUCKET_NAME with the name of the bucket you created for CUR data.
Your S3 path prefix can be found by going to your AWS Cost and Usage Reports dashboard and selecting your bucket's report. In the Report details tab, you will find the S3 path prefix.
Once Athena is set up with the CUR, you will need to create a new S3 bucket for Athena query results. The bucket used for the CUR cannot be used for the Athena output.
Select Create bucket. The Create Bucket page opens.
Provide a name for your bucket. This is the value for athenaBucketName
in your cloud-integration.json file. Use the same region used for the CUR bucket.
Select Create bucket at the bottom of the page.
Select Settings, then select Manage. The Manage settings window opens.
Set Location of query result to the S3 bucket you just created, then select Save.
Navigate to Athena in the AWS Console. Be sure the region matches the one used in the steps above. Update your cloud-integration.json file with the following values. Use the screenshots below for help.
athenaBucketName: the name of the Athena bucket you created in this step
athenaDatabase: the value in the Database dropdown
athenaRegion: the AWS region value where your Athena query is configured
athenaTable: the partitioned value found in the Table list
For Athena query results written to an S3 bucket only accessed by Kubecost, it is safe to expire or delete the objects after one day of retention.
From the AWS payer account
In iam-payer-account-cur-athena-glue-s3-access.json, replace all ATHENA_RESULTS_BUCKET_NAME instances with your Athena S3 bucket name (the default will look like aws-athena-query-results-xxxx).
In iam-payer-account-trust-primary-account.json, replace SUB_ACCOUNT_222222222 with the account number of the account where the Kubecost primary cluster will run.
In the same location as your downloaded configuration files, run the following command to create the appropriate policy (jq is not required):
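A hedged sketch of that step with the AWS CLI is shown below. The policy name is an assumption, and KubecostRole matches the example role name used elsewhere in this doc; adapt both to your own naming conventions:

```sh
# Create the CUR/Athena/Glue/S3 access policy in the payer account:
aws iam create-policy \
  --policy-name kubecost-cur-athena-glue-s3-access \
  --policy-document file://iam-payer-account-cur-athena-glue-s3-access.json

# Create the role the Kubecost sub account will assume, with the trust policy:
aws iam create-role \
  --role-name KubecostRole \
  --assume-role-policy-document file://iam-payer-account-trust-primary-account.json

# Attach the access policy to the role (use the ARN returned by create-policy):
aws iam attach-role-policy \
  --role-name KubecostRole \
  --policy-arn arn:aws:iam::PAYER_ACCOUNT_11111111111:policy/kubecost-cur-athena-glue-s3-access
```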
Now we can obtain the last value, masterPayerARN, for cloud-integration.json as the ARN associated with the newly-created IAM role, as seen below in the AWS console:
By arriving at this step, you should have been able to provide all values to your cloud-integration.json file. If any values are missing, reread the tutorial and follow any steps needed to obtain those values.
From the AWS Account where the Kubecost primary cluster will run
In iam-access-cur-in-payer-account.json, update PAYER_ACCOUNT_11111111111 with the AWS account number of the payer account and create a policy allowing Kubecost to assumeRole in the payer account:
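For example (the policy name is an assumption):

```sh
aws iam create-policy \
  --policy-name kubecost-access-cur-in-payer-account \
  --policy-document file://iam-access-cur-in-payer-account.json
```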
Note the output ARN (used in the iamserviceaccount --attach-policy-arn below):
Create a namespace and set environment variables:
Enable the OIDC-Provider:
Create the Kubernetes service account, attaching the assumeRole policy. Replace SUB_ACCOUNT_222222222 with the AWS account number where the primary Kubecost cluster will run.
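A sketch using eksctl is shown below; the service account name and policy name are assumptions, and the policy ARN is the one noted in the previous step:

```sh
eksctl create iamserviceaccount \
  --cluster <YOUR_EKS_CLUSTER_NAME> \
  --namespace kubecost \
  --name kubecost-serviceaccount \
  --attach-policy-arn arn:aws:iam::SUB_ACCOUNT_222222222:policy/kubecost-access-cur-in-payer-account \
  --approve
```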
Create the secret (in this setup, there are no actual secrets in this file):
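For example, with cloud-integration used here as an assumed secret name:

```sh
kubectl create secret generic cloud-integration \
  --from-file=cloud-integration.json \
  --namespace kubecost
```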
Install Kubecost using the service account and cloud-integration secret:
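A sketch of that install, reusing the service account created by eksctl and pointing Kubecost at the secret (names match the assumptions above):

```sh
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --set serviceAccount.create=false \
  --set serviceAccount.name=kubecost-serviceaccount \
  --set kubecostProductConfigs.cloudIntegrationSecret=cloud-integration
```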
It can take over an hour to process the billing data for large AWS accounts. In the short term, follow the logs and look for a message similar to (7.7 complete), which should grow gradually to (100.0 complete). Some errors (ERR) are expected, as seen below.
Integrating Kubecost with your AWS data provides the ability to allocate out-of-cluster (OOC) costs, e.g. RDS instances and S3 buckets, back to Kubernetes concepts like namespace and deployment as well as reconcile cluster assets back to your billing data. The latter is especially helpful when teams are using Reserved Instances, Savings Plans, or Enterprise Discounts. All billing data remains on your cluster when using this functionality and is not shared externally. Read our doc for more information on how Kubecost connects with Cloud Service Providers.
The following guide provides the steps required for enabling OOC costs allocation and accurate pricing, e.g. reserved instance price allocation. In a multi-account organization, all of the following steps will need to be completed in the payer account.
You can learn how to perform this using our doc.
Kubecost utilizes AWS tagging to allocate the costs of AWS resources outside of the Kubernetes cluster to specific Kubernetes concepts, such as namespaces, pods, etc. These costs are then shown in a unified dashboard within the Kubecost interface.
To allocate external AWS resources to a Kubernetes concept, use the following tag naming scheme:
Kubernetes Concept | AWS Tag Key | AWS Tag Value |
---|---|---|
Cluster | kubernetes_cluster | cluster-name |
Namespace | kubernetes_namespace | namespace-name |
Deployment | kubernetes_deployment | deployment-name |
Label | kubernetes_label_NAME* | label-value |
DaemonSet | kubernetes_daemonset | daemonset-name |
Pod | kubernetes_pod | pod-name |
Container | kubernetes_container | container-name |
In the kubernetes_label_NAME tag key, the NAME portion should appear exactly as the tag appears inside of Kubernetes. For example, for the tag app.kubernetes.io/name, this tag key would appear as kubernetes_label_app.kubernetes.io/name.
To use an alternative or existing AWS tag schema, you may supply these in your values.yaml under kubecostProductConfigs.labelMappingConfigs.<aggregation>_external_label. Also be sure to set kubecostProductConfigs.labelMappingConfigs.enabled=true.
Tags may take several hours to show up in the Cost Allocations Tags section described in the next step.
Tags that contain : in the key may be converted to _ in the Kubecost UI due to Prometheus readability. To use AWS Label Mapping Configs, use this mapping format:
In order to make the custom Kubecost AWS tags appear on the CURs, and therefore in Kubecost, individual cost allocation tags must be enabled. Details on which tags to enable can be found in Step 2.
Account-level tags are applied (as labels) to all the Assets built from resources defined under a given AWS account. You can filter AWS resources in the Kubecost Assets View (or API) by account-level tags by adding them ('tag:value') in the Label/Tag filter.
If a resource has a label with the same name as an account-level tag, the resource label value will take precedence.
Modifications incurred on account-level tags may take several hours to update on Kubecost.
Your AWS account will need the organizations:ListAccounts and organizations:ListTagsForResource permissions to benefit from this feature.
In the Kubecost UI, view the Allocations dashboard. If external costs are not shown, open your browser's Developer Tools > Console to see any reported errors.
Query Athena directly to ensure data is available. Note: it can take up to 6 hours for data to be written.
Finally, review pod logs from the cost-model container in the cost-analyzer pod and look for auth errors or Athena query results.
Kubecost uses public pricing from Cloud Service Providers (CSPs) to calculate costs until the actual cloud bill is available, at which point Kubecost will reconcile your Spot prices from your Cost and Usage Report (CUR). This is almost always ready within 48 hours. Most users will likely prefer to configure cloud billing integration instead of configuring the Spot data feed manually as demonstrated in this article.
However, if the majority of costs are due to Spot nodes, it may be useful to configure the Spot pricing data feed as it will increase accuracy for short-term (<48 hour) node costs until the Spot prices from the CUR are available. Note that all other (non-Spot) costs will still be based on public (on-demand) pricing until CUR billing data is reconciled.
With Kubecost, Spot pricing data can be pulled hourly by integrating directly with the AWS Spot feed.
First, to enable the AWS Spot data feed, follow AWS' doc.
While configuring, note the settings used as these values will be needed for the Kubecost configuration.
There are multiple options: this can either be set from the Kubecost UI or via .Values.kubecostProductConfigs in the Helm chart. If you set any kubecostProductConfigs from the Helm chart, all changes via the front end will be deleted on pod restart.
projectID: the Account ID of the AWS Account on which the Spot nodes are running.
awsSpotDataRegion: the region of your Spot data bucket.
awsSpotDataBucket: the configured bucket for the Spot data feed.
awsSpotDataPrefix: the optional configured prefix for your Spot data feed bucket.
spotLabel: an optional Kubernetes node label name designating whether a node is a Spot node. Used to provide pricing estimates until exact Spot data becomes available from the CUR.
spotLabelValue: an optional Kubernetes node label value designating a Spot node. Used to provide pricing estimates until exact Spot data becomes available from the CUR. For example, if your Spot nodes carry a label lifecycle:spot, then the spotLabel would be lifecycle and the spotLabelValue would be spot.
In the UI, you can access these fields via the Settings page, then scroll to Cloud Cost Settings. Next to Spot Instance Configuration, select Update, then fill out all fields.
Spot data feeds are an account level setting, not a payer level. Every AWS Account will have its own Spot data feed. Spot data feed is not currently available in AWS GovCloud.
For Spot data written to an S3 bucket only accessed by Kubecost, it is safe to delete objects after three days of retention.
Kubecost requires read access to the Spot data feed bucket. The following IAM policy can be used to grant Kubecost read access to the Spot data feed bucket.
To attach the IAM policy to the Kubecost service account, you can use IRSA or the account's service key.
If your serviceaccount/kubecost-cost-analyzer already has IRSA annotations attached, be sure to include all policies necessary when running this command.
Create a service-key.json as shown:
Create a K8s secret:
Set the following Helm config:
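Using the value names described above, the Helm configuration might look like this sketch (shown as --set flags; the same keys can live under kubecostProductConfigs in values.yaml):

```sh
helm upgrade kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --set kubecostProductConfigs.projectID="<AWS_ACCOUNT_ID>" \
  --set kubecostProductConfigs.awsSpotDataRegion="<SPOT_BUCKET_REGION>" \
  --set kubecostProductConfigs.awsSpotDataBucket="<SPOT_BUCKET_NAME>" \
  --set kubecostProductConfigs.awsSpotDataPrefix="<OPTIONAL_PREFIX>"
```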
Verify the below points:
Make sure data is present in the Spot data feed bucket.
Make sure the Project ID is configured correctly. You can cross-verify the values under Helm values in a bug report.
Check the value of kubecost_node_is_spot in Prometheus:
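For example, you can port-forward to the bundled Prometheus and run the query named above; the service name below is the default for the bundled Prometheus when the release is named kubecost and is an assumption:

```sh
kubectl port-forward --namespace kubecost svc/kubecost-prometheus-server 9003:80 &
# PromQL query: kubecost_node_is_spot
curl -G 'http://localhost:9003/api/v1/query' --data-urlencode 'query=kubecost_node_is_spot'
```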
"1" means Spot data instance configuration is correct.
"0" means not configured properly.
Is there a prefix? If so, is it configured in Kubecost?
Make sure the IAM permissions are aligned with https://github.com/kubecost/cloudformation/blob/7feace26637aa2ece1481fda394927ef8e1e3cad/kubecost-single-account-permissions.yaml#L36
Make sure Kubecost has the permissions it needs to access the Spot data feed bucket
The Spot Instance in the Spot data feed bucket should match the instance in the cluster where the Spot data feed is configured. awsSpotDataBucket has to be present in the right cluster.
A GitHub repository with sample files required can be found . Select the folder with the name of the cloud service you are configuring.
For each AWS account that you would like to configure, create an Access Key for the Kubecost user who has access to the CUR. Navigate to , and select Access Management > Users. Find the Kubecost user and select Security Credentials > Create Access Key. Note the Access Key ID and Secret access key.
To begin, download the recommended configuration template files from our . You will need the following files from this folder:
Follow the to create a CUR export using the settings below.
As part of the CUR creation process, Amazon creates a CloudFormation template that is used to create the Athena integration. It is created in the CUR S3 bucket under s3-path-prefix/cur-name
and typically has the filename crawler-cfn.yml. This .yml is your CloudFormation template. You will need it in order to complete the CUR Athena integration. You can read more about this .
Navigate to the .
Navigate to the dashboard.
For help with troubleshooting, see the section in our original .
For more information, consult AWS' .
To view examples of common label mapping configs, see .
For instructions on enabling user-defined cost allocation tags, consult AWS'
You may need to upgrade your AWS Glue if you are running an old version. See for more info.
Value | Default | Description |
---|---|---|
.Values.kubecostModel.etlAssetReconciliationEnabled | true | Enables reconciliation processes and endpoints. This Helm value corresponds to the ETL_ASSET_RECONCILIATION_ENABLED environment variable. |
.Values.kubecostModel.etlCloudUsage | true | Enables Cloud Usage processes and endpoints. This Helm value corresponds to the ETL_CLOUD_USAGE_ENABLED environment variable. |
.Values.kubecostModel.etlCloudRefreshRateHours | 6 | The interval at which the run loop executes for both reconciliation and Cloud Usage. Reducing this value will decrease resource usage and billing data access costs, but will result in a larger delay in the most current data being displayed. This Helm value corresponds to the ETL_CLOUD_REFRESH_RATE_HOURS environment variable. |
.Values.kubecostModel.etlCloudQueryWindowDays | 7 | The maximum number of days that will be queried from a cloud integration in a single query. Reducing this value can help to reduce memory usage during the build process, but will also result in more queries, which can drive up billing data access costs. This Helm value corresponds to the ETL_CLOUD_QUERY_WINDOW_DAYS environment variable. |
.Values.kubecostModel.etlCloudRunWindowDays | 3 | The number of days into the past each run loop will query. Reducing this value will reduce memory load; however, it can cause Kubecost to miss updates to the CUR, in which case the affected day will need to be manually repaired. This Helm value corresponds to the ETL_CLOUD_RUN_WINDOW_DAYS environment variable. |
Kubecost needs access to the Microsoft Azure Billing Rate Card API to access accurate pricing data for your Kubernetes resources.
You can also get this functionality plus external costs by completing the full Azure billing integration.
Start by creating an Azure role definition. Below is an example definition, replace YOUR_SUBSCRIPTION_ID
with the Subscription ID where your Kubernetes cluster lives:
Save this into a file called myrole.json.
Next, you'll want to register that role with Azure:
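Assuming the role definition was saved as myrole.json, registering it with the Azure CLI looks roughly like this:

```sh
az role definition create --role-definition @myrole.json
```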
Next, create an Azure service principal.
Keep this information which is used in the service-key.json below.
Create a file called service-key.json and update it with the Service Principal details from the above steps:
Next, create a Secret for the Azure Service Principal
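For example, using azure-service-key as the secret name (the name referenced in the examples below):

```sh
kubectl create secret generic azure-service-key \
  --from-file=service-key.json \
  --namespace kubecost
```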
When managing the service account key as a Kubernetes Secret, the secret must reference the service account key JSON file, and that file must be named service-key.json.
Finally, set the kubecostProductConfigs.serviceKeySecretName
Helm value to the name of the Kubernetes secret you created. We use the value azure-service-key
in our examples.
In the Helm values file:
Or at the command line:
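Both forms are sketched below:

```sh
# values file form (then pass with -f during install/upgrade):
cat > azure-ratecard-values.yaml <<EOF
kubecostProductConfigs:
  serviceKeySecretName: azure-service-key
EOF

# or directly at the command line:
helm upgrade kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --set kubecostProductConfigs.serviceKeySecretName=azure-service-key
```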
Kubecost supports querying the Azure APIs for cost data based on the region, offer durable ID, and currency defined in your Microsoft Azure offer.
Those properties are configured with the following Helm values:
kubecostProductConfigs.azureBillingRegion
kubecostProductConfigs.azureOfferDurableID
kubecostProductConfigs.currencyCode
Be sure to verify your billing information with Microsoft and update the above Helm values to reflect your bill to country, subscription offer durable ID/number, and currency.
The following Microsoft documents are a helpful reference:
Kubecost provides the ability to allocate out-of-cluster (OOC) costs, e.g. Cloud SQL instances and Cloud Storage buckets, back to Kubernetes concepts like namespaces and deployments.
Read the Cloud Billing Integrations doc for more information on how Kubecost connects with cloud service providers.
The following guide provides the steps required for allocating OOC costs in GCP.
A GitHub repository with sample files used in the below instructions can be found here.
Begin by reviewing Google's documentation on exporting cloud billing data to BigQuery.
GCP users must create a detailed billing export to gain access to all Kubecost CloudCost features including reconciliation. Exports of type "Standard usage cost data" and "Pricing Data" do not have the correct information to support CloudCosts.
If you are using the alternative multi-cloud integration method, Step 2 is not required.
If your Big Query dataset is in a different project than the one where Kubecost is installed, please see the section on Cross-Project Service Accounts.
Add a service account key to allocate OOC resources (e.g. storage buckets and managed databases) back to their Kubernetes owners. The service account needs the following:
If you don't already have a GCP service account with the appropriate rights, you can run the following commands in your command line to generate and export one. Make sure your GCP project is where your external costs are being run.
After creating the GCP service account, you can connect it to Kubecost in one of two ways before configuring:
You can set up an IAM policy binding to bind a Kubernetes service account to your GCP service account as seen below, where:
NAMESPACE is the namespace Kubecost is installed into
KSA_NAME is the name of the service account attributed to the Kubecost deployment
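A sketch of that binding with gcloud is shown below, where GSA_NAME and PROJECT_ID stand for the GCP service account and project used above (all names are placeholders):

```sh
# Allow the Kubernetes service account to impersonate the GCP service account:
gcloud iam service-accounts add-iam-policy-binding \
  GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"

# Annotate the Kubernetes service account with the GCP service account it maps to:
kubectl annotate serviceaccount KSA_NAME \
  --namespace NAMESPACE \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
```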
You will also need to enable the IAM Service Account Credentials API in the GCP project.
Create a service account key:
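For example (the service account name is a placeholder; the file name matches what Kubecost expects later in this doc):

```sh
gcloud iam service-accounts keys create ./compute-viewer-kubecost-key.json \
  --iam-account <GCP_SERVICE_ACCOUNT_NAME>@<PROJECT_ID>.iam.gserviceaccount.com
```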
Once the GCP service account has been connected, set up the remaining configuration parameters.
You're almost done. Now it's time to configure Kubecost to finalize your connectivity.
It is recommended to provide the GCP details in your values.yaml to ensure they are retained during an upgrade or redeploy. First, set the following configs:
If you've connected using Workload Identity Federation, add these configs:
Otherwise, if you've connected using a service account key, create a secret for the GCP service account key you've created and add the following configs:
When managing the service account key as a Kubernetes secret, the secret must reference the service account key JSON file, and that file must be named compute-viewer-kubecost-key.json.
In Kubecost, select Settings from the left navigation, and under Cloud Integrations, select Add Cloud Integration > GCP, then provide the relevant information in the GCP Billing Data Export Configuration window:
GCP Service Key: Optional field. If you've created a service account key, copy the contents of the compute-viewer-kubecost-key.json file and paste them here. If you've connected using Workload Identity federation in Step 3, you should leave this box empty.
GCP Project Id: The ID of your GCP project.
GCP Billing Database: Requires a BigQuery dataset prefix (e.g. billing_data
) in addition to the BigQuery table name. A full example is billing_data.gcp_billing_export_resource_v1_XXXXXX_XXXXXX_XXXXX
Be careful when handling your service key! Ensure you have entered it correctly into Kubecost. Don't lose it or let it become publicly available.
You can now label assets with the following schema to allocate costs back to their appropriate Kubernetes owner. Learn more here on updating GCP asset labels.
To use an alternative or existing label schema for GCP cloud assets, you may supply these in your values.yaml under the kubecostProductConfigs.labelMappingConfigs.<aggregation>_external_label
.
Google generates special labels for GKE resources (e.g. "goog-gke-node", "goog-gke-volume"). Values with these labels are excluded from OOC costs because Kubecost already includes them as in-cluster assets. Thus, to make sure all cloud assets are included, we recommend installing Kubecost on each cluster where insights into costs are required.
Project-level labels are applied to all the Assets built from resources defined under a given GCP project. You can filter GCP resources in the Kubecost Cloud Costs Explorer (or API).
If a resource has a label with the same name as a project-level label, the resource label value will take precedence.
Modifications incurred on project-level labels may take several hours to update on Kubecost.
Due to organizational constraints, it is common that Kubecost must be run in a separate project from the project containing the billing data Big Query dataset, which is needed for Cloud Integration. Configuring Kubecost in this scenario is still possible, but some of the values in the above script will need to be changed. First, you will need the project id of the projects where Kubecost is installed, and the Big Query dataset is located. Additionally, you will need a GCP user with the permissions iam.serviceAccounts.setIamPolicy
for the Kubecost project and the ability to manage the roles listed above for the Big Query Project. With these, fill in the following script to set the relevant variables:
Once these values have been set, this script can be run and will create the service account needed for this configuration.
Now that your service account is created, follow the normal configuration instructions.
There are cases where labels applied at the account level do not show up in the date-partitioned data. If account-level labels are not showing up, you can switch to querying them unpartitioned by setting an extraEnv in Kubecost: GCP_ACCOUNT_LABELS_NOT_PARTITIONED: true. See here.
InvalidQuery 400 error for GCP integration: In cases where Kubecost does not detect a connection following GCP integration, revisit Step 1 and ensure you have enabled the detailed usage cost export, not the standard usage cost export. Kubecost uses detailed billing cost to display your OOC spend, and if it was not configured correctly during installation, you may receive errors about your integration.
Kubecost is capable of aggregating the costs of EC2 compute resources over a given timeframe with a specified duration step size. To achieve this, Kubecost uses Athena queries to gather usage data points with differing price models. The result of this process is a list of resources with their cost by timeframe.
The reconciliation process makes two queries to Athena, one to gather resources that are paid for with either the on-demand model or a savings plan and one query for resources on the reservation price model. The first query includes resources given at a blended rate, which could be on-demand usage or resources that have exceeded the limits of a savings plan. It will also include resources that are part of a savings plan which will have a savings plan effective cost. The second query only includes reserved resources and the cost which reflects the rate they were reserved at.
The queries make use of the following columns from Athena:
line_item_usage_start_date: The beginning timestamp of the line item usage. Used to filter resource usage within a date range and to aggregate on usage window.
line_item_usage_end_date: The ending timestamp of the line item usage. Used to filter resource usage within a date range and to aggregate on usage window.
line_item_resource_id: An ID, also called the provider ID, given to line items that are instantiated resources.
line_item_line_item_type: The type of a line item; used to determine if the resource usage is covered by a savings plan and has a discounted price.
line_item_usage_type: What is being used in a line item; for a compute resource, this is the type of VM and where it is running.
line_item_product_code: The service that a line item is from. Used to filter out items that are not from EC2.
reservation_reservation_a_r_n: The Amazon Resource Name for the reservation of the line item; the presence of this value is used to identify a resource as being part of a reservation plan.
line_item_unblended_cost: The undiscounted cost of a resource.
savings_plan_savings_plan_effective_cost: The cost of a resource discounted by a savings plan.
reservation_effective_cost: The cost of a resource discounted by a reservation.
This query is grouped by six columns:
line_item_usage_start_date
line_item_usage_end_date
line_item_resource_id
line_item_line_item_type
line_item_usage_type
line_item_product_code
The columns line_item_unblended_cost and savings_plan_savings_plan_effective_cost are summed on this grouping. Finally, the query filters out rows that are outside the given date range, rows with a missing line_item_resource_id, and rows with a line_item_product_code not equal to "AmazonEC2". The grouping has three important aspects: the timeframe of the line items, the resource as defined by the resource ID, and the usage type, which is later used to determine the proper cost of the resource as it was used. This means that line items are grouped according to the resource, the timeframe of the usage, and the rate at which the usage was charged.
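For illustration, a simplified query following the grouping and filters described above could look like the sketch below. This is not the exact query Kubecost issues; the workgroup, database, table, dates, and output location are all placeholders:

```bash
# Hedged sketch of the on-demand/savings plan query shape; names and dates are placeholders.
aws athena start-query-execution \
  --work-group primary \
  --query-execution-context Database=athenacurcfn_kubecost_cur \
  --result-configuration OutputLocation=s3://aws-athena-query-results-123456789012-us-east-1/ \
  --query-string "
    SELECT
      line_item_usage_start_date,
      line_item_usage_end_date,
      line_item_resource_id,
      line_item_line_item_type,
      line_item_usage_type,
      line_item_product_code,
      SUM(line_item_unblended_cost)                 AS unblended_cost,
      SUM(savings_plan_savings_plan_effective_cost) AS savings_plan_effective_cost
    FROM kubecost_cur
    WHERE line_item_usage_start_date >= timestamp '2024-01-01'
      AND line_item_usage_end_date   <= timestamp '2024-01-07'
      AND line_item_resource_id      <> ''
      AND line_item_product_code     =  'AmazonEC2'
    GROUP BY 1, 2, 3, 4, 5, 6"
```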
The reservation query is grouped on five columns:
line_item_usage_start_date
line_item_usage_end_date
reservation_reservation_a_r_n
line_item_resource_id
line_item_product_code
The query sums reservation_effective_cost on this grouping, filters by the date window, removes rows with a missing reservation_reservation_a_r_n, and removes line items with a line_item_product_code not equal to "AmazonEC2". This grouping is on resource ID by timeframe, removing all non-reservation line items.
The on-demand query is categorized into different resource types: compute, network, storage, and others. Network usage is identified by the presence of "byte" in the line_item_usage_type. Compute and storage are identified by the "i-" and "vol-" prefixes in line_item_resource_id, respectively. Non-compute values are removed from the results. Of the two costs aggregated by this query, the correct one to use is determined by the line_item_line_item_type: if it has a value of "SavingsPlanCoveredUsage", then savings_plan_savings_plan_effective_cost is used as the cost; otherwise, line_item_unblended_cost is used.
In the reservation query, all of the results are of the compute category and there is only the reservation_effective_cost
to use as a cost.
These results are then merged into one set, with the provider id used to associate the cost with other information about the resource.
There are several different ways to look at your node cost data. The default for the Cost Explorer is "Unblended", but it makes the most sense from an allocation perspective to use amortized rates. Be sure Amortized costs is selected when looking at cost data. Here's an example of how dramatically they can vary on our test cluster.
The t2.mediums here are covered by a savings plan. Unblended, the cost is only $0.06/day for two. When Amortized costs is selected, the price jumps to $1.50/day.
This should closely match our data on the Assets page, for days where we have adjustments come in from the pricing CUR.
Connecting your Azure account to Kubecost allows you to view Kubernetes metrics side-by-side with out-of-cluster (OOC) costs (e.g. Azure Database Services). Additionally, it allows Kubecost to reconcile measured Kubernetes spend with your actual Azure bill. This gives teams running Kubernetes a complete and accurate picture of costs. For more information, read Cloud Billing Integrations and this blog post.
To configure Kubecost's Azure Cloud Integration, you will need to set up daily exports of cost reports to Azure storage. Kubecost will then access your cost reports through the Azure Storage API to display your OOC cost data alongside your in-cluster costs.
A GitHub repository with sample files used in the instructions below can be found here.
Follow Azure's Create and Manage Exported Data tutorial to export cost reports. For Metric, make sure you select Amortized cost (Usage and Purchases). For Export type, make sure you select Daily export of month-to-date costs. Do not select File Partitioning. Also, take note of the Account name and Container specified when choosing where to export the data to. Note that a successful cost export will require Microsoft.CostManagementExports
to be registered in your subscription.
Alternatively, you can follow this Kubecost guide.
It will take a few hours to generate the first report, after which Kubecost can use the Azure Storage API to pull that data.
Once the cost export has successfully executed, verify that a non-empty CSV file has been created at this path: <STORAGE_ACCOUNT>/<CONTAINER_NAME>/<OPTIONAL_CONTAINER_PATH>/<COST_EXPORT_NAME>/<DATE_RANGE>/<CSV_FILE>
.
If you have sensitive data in an existing Azure Storage account, it is recommended to create a separate Azure Storage account to store your cost data export.
For more granular billing data it is possible to scope Azure cost exports to resource groups, management groups, departments, or enrollments. AKS clusters will create their own resource groups which can be used. This functionality can then be combined with Kubecost multi-cloud to ingest multiple scoped billing exports.
Obtain the following values from Azure to provide to Kubecost. These values can be located in the Azure Portal by selecting Storage Accounts, then selecting your specific Storage account for details.
azureSubscriptionID
is the "Subscription ID" belonging to the Storage account which stores your exported Azure cost report data.
azureStorageAccount
is the name of the Storage account where the exported Azure cost report data is being stored.
azureStorageAccessKey
can be found by selecting Access keys in your Storage account left navigation under "Security + networking". Using either of the two keys will work.
azureStorageContainer
is the name that you chose for the exported cost report when you set it up. This is the name of the container where the CSV cost reports are saved in your Storage account.
azureContainerPath
is an optional value which should be used if there is more than one billing report that is exported to the configured container. The path provided should have only one billing export because Kubecost will retrieve the most recent billing report for a given month found within the path.
azureCloud
is an optional value which denotes the cloud where the storage account exists; possible values are public
and gov
. The default is public
.
Next, create a JSON file which must be named cloud-integration.json with the following format:
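A hedged sketch of what this file might contain is shown below, assuming the multi-cloud format with a top-level azure list; the keys mirror the values described above and every value is a placeholder:

```bash
# Hedged sketch of cloud-integration.json; replace every value with your own.
cat <<'EOF' > cloud-integration.json
{
  "azure": [
    {
      "azureSubscriptionID": "00000000-0000-0000-0000-000000000000",
      "azureStorageAccount": "kubecostexportstorage",
      "azureStorageAccessKey": "<ACCESS_KEY>",
      "azureStorageContainer": "kubecostexport",
      "azureContainerPath": "",
      "azureCloud": "public"
    }
  ]
}
EOF
```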
Additional details about the cloud-integration.json
file can be found in our multi-cloud integration doc.
Next, create the Secret:
Next, ensure the following are set in your Helm values:
Next, upgrade Kubecost via Helm:
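A combined sketch of these three steps might look like the following; the secret name cloud-integration is only an example and simply has to match the Helm value that references it:

```bash
# Hedged sketch: secret name and file paths are examples.
kubectl create secret generic cloud-integration -n kubecost \
  --from-file=cloud-integration.json

cat <<'EOF' > azure-integration-values.yaml
kubecostProductConfigs:
  cloudIntegrationSecret: cloud-integration
EOF

helm upgrade kubecost kubecost/cost-analyzer -n kubecost \
  -f values.yaml -f azure-integration-values.yaml
```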
You can verify a successful configuration by checking the following in the Kubecost UI:
The Assets dashboard will be broken down by Kubernetes assets.
The Assets dashboard will no longer show a banner that says "External cloud cost not configured".
The Diagnostics page (via Settings > View Full Diagnostics) view will show a green checkmark under Cloud Integrations.
If there are no in-cluster costs for a particular day, then there will not be out-of-cluster costs either.
Kubecost utilizes Azure tagging to allocate the costs of Azure resources outside of the Kubernetes cluster to specific Kubernetes concepts, such as namespaces, pods, etc. These costs are then shown in a unified dashboard within the Kubecost interface.
To allocate external Azure resources to a Kubernetes concept, use the following tag naming scheme:
In the kubernetes_label_NAME
tag key, the NAME portion should appear exactly as the tag appears inside of Kubernetes. For example, for the tag app.kubernetes.io/name
, this tag key would appear as kubernetes_label_app.kubernetes.io/name.
To use an alternative or existing Azure tag schema, you may supply these in your values.yaml under the kubecostProductConfigs.labelMappingConfigs.<aggregation>_external_label
. Also be sure to set kubecostProductConfigs.labelMappingConfigs.enabled = true.
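A minimal sketch of this mapping is shown below; the namespace_external_label and cluster_external_label keys are examples of the <aggregation>_external_label pattern named above, so confirm the exact keys available in your chart version:

```bash
# Hedged sketch: aggregation key names follow the <aggregation>_external_label pattern described above.
cat <<'EOF' > label-mapping-values.yaml
kubecostProductConfigs:
  labelMappingConfigs:
    enabled: true
    namespace_external_label: kubernetes_namespace
    cluster_external_label: kubernetes_cluster
EOF
```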
For more details on what Azure resources support tagging, along with what resource type tags are available in cost reports, please review the official Microsoft documentation here.
To troubleshoot a configuration that is not yet working:
$ kubectl get secrets -n kubecost
to verify you've properly configured cloud-integration.json
.
$ helm get values kubecost
to verify you've properly set .Values.kubecostProductConfigs.cloudIntegrationSecret
Verify that a non-empty CSV file has been created at this path in your Azure Portal Storage Account: <STORAGE_ACCOUNT>/<CONTAINER_NAME>/<OPTIONAL_CONTAINER_PATH>/<COST_EXPORT_NAME>/<DATE_RANGE>/<CSV_FILE>
. Ensure new CSVs are being generated every day.
When opening a cost report CSV, ensure that there are rows in the file that do not have a MeterCategory of “Virtual Machines” or “Storage”, as these items are ignored because they are in-cluster costs. Additionally, make sure that there are items with a UsageDateTime that matches the date you are interested in.
When reviewing logs:
The following error is reflective of Kubecost's previous Azure Cloud Integration method and can be safely disregarded.
ERR Error, Failed to locate azure storage config file: /var/azure-storage-config/azure-storage-config.json
By default, Kubecost pulls on-demand asset prices from the public AWS pricing API. For more accurate pricing, this integration will allow Kubecost to reconcile your current measured Kubernetes spend with your actual AWS bill. This integration also properly accounts for Enterprise Discount Programs, Reserved Instance usage, Savings Plans, Spot usage, and more.
You will need permissions to create the Cost and Usage Report (CUR) and to add IAM credentials for Athena and S3. An optional permission is the ability to add and execute CloudFormation templates. Kubecost does not require root access in the AWS account.
This guide contains multiple possible methods for connecting Kubecost to AWS billing, based on user environment and preference. Because of this, there may not be a straightforward approach for new users. To address this, a streamlined guide containing best practices can be found here for IRSA environments. This quick start guide has some assumptions to carefully consider, and may not be applicable for all users. See prerequisites in the linked article.
Integrating your AWS account with Kubecost may be a complicated process if you aren’t deeply familiar with the AWS platform and how it interacts with Kubecost. This section provides an overview of some of the key terminology and AWS services that are involved in the process of integration.
Cost and Usage Report: AWS report which tracks cloud spending and writes to an Amazon Simple Storage Service (Amazon S3) bucket for ingestion and long term historical data. The CUR is originally formatted as a CSV, but when integrated with Athena, is converted to Parquet format.
Amazon Athena: Analytics service which queries the CUR S3 bucket for your AWS cloud spending, then outputs data to a separate S3 bucket. Kubecost uses Athena to query for the bill data to perform reconciliation. Athena is technically optional for AWS cloud integration, but without it, Kubecost will only provide unreconciled costs (on-demand public rates).
S3 bucket: Cloud object storage tool which both CURs and Athena output cost data to. Kubecost needs access to these buckets in order to read that data.
For the below guide, a GitHub repository with sample files can be found here.
Follow these steps to set up a Legacy CUR using the settings below.
Select the Legacy CUR export type.
For time granularity, select Daily.
Under 'Additional content', select the Enable resource IDs checkbox.
Under 'Report data integration' select the Amazon Athena checkbox.
For CUR data written to an S3 bucket only accessed by Kubecost, it is safe to expire or delete the objects after seven days of retention.
Remember the name of the bucket you create for CUR data. This will be used in Step 2.
Familiarize yourself with how column name restrictions differ between CURs and Athena tables. AWS may change your CUR name when you upload your CUR to your Athena table in Step 2, documented in AWS' Running Amazon Athena queries. As best practice, use all lowercase letters and only use _
as a special character.
AWS may take up to 24 hours to publish data. Wait until this is complete before continuing to the next step.
If you believe you have the correct permissions, but cannot access the Billing and Cost Management page, have the owner of your organization's root account follow these instructions.
As part of the CUR creation process, Amazon also creates a CloudFormation template that is used to create the Athena integration. It is created in the CUR S3 bucket, listed in the Objects tab in the path s3-path-prefix/cur-name
and typically has the filename crawler-cfn.yml
. This .yml is your necessary CloudFormation template. You will need it in order to complete the CUR Athena integration. For more information, see the AWS doc Setting up Athena using AWS CloudFormation templates.
Your S3 path prefix can be found by going to your AWS Cost and Usage Reports dashboard and selecting your newly-created CUR. In the 'Report details' tab, you will find the S3 path prefix.
Once Athena is set up with the CUR, you will need to create a new S3 bucket for Athena query results.
Navigate to the S3 Management Console.
Select Create bucket. The Create Bucket page opens.
Use the same region used for the CUR bucket and pick a name that follows the format aws-athena-query-results-.
Select Create bucket at the bottom of the page.
Navigate to the Amazon Athena dashboard.
Select Settings, then select Manage. The Manage settings window opens.
Set Location of query result to the S3 bucket you just created, which will look like s3://aws-athena-query-results..., then select Save.
For Athena query results written to an S3 bucket only accessed by Kubecost, it is safe to expire or delete the objects after 1 day of retention.
Kubecost offers a set of CloudFormation templates to help set your IAM roles up.
If you’re new to provisioning IAM roles, we suggest downloading our templates and using the CloudFormation wizard to set these up. You can learn how to do this in AWS' Creating a stack on the AWS CloudFormation console doc. Open the step below which represents your CUR and management account arrangement, download the .yaml file listed, and upload them as the stack template in the 'Creating a stack' > 'Selecting a stack template' step.
If you are using the alternative multi-cloud integration method, steps 4 and 5 are not required.
Now that the policies have been created, attach those policies to Kubecost. We support the following methods:
These values can either be set from the Kubecost UI or via .Values.kubecostProductConfigs
in the Helm chart. Values for all fields must be provided.
To add values in the Kubecost UI, select Settings from the left navigation, then scroll to Cloud Cost Settings. Select Update next to External Cloud Cost Configuration (AWS). The Billing Data Export Configuration window opens. Fill in all the below fields:
When you are done, select Update to confirm.
If you set any kubecostProductConfigs
from the Helm chart, all changes via the front end will be overridden on pod restart.
athenaProjectID
: The AWS AccountID where the Athena CUR is, likely your management account.
athenaBucketName
: An S3 bucket to store Athena query results that you’ve created that Kubecost has permission to access
The name of the bucket should match s3://aws-athena-query-results-*
, so the IAM roles defined above will automatically allow access to it
The bucket can have a Canned ACL of Private
or other permissions as you see fit.
athenaRegion
: The AWS region Athena is running in
athenaDatabase
: The name of the database created by the Athena setup
The athena database name is available as the value (physical id) of AWSCURDatabase
in the CloudFormation stack created above (in Step 2: Setting up Athena)
athenaTable
: the name of the table created by the Athena setup
The table name is typically the database name with the leading athenacurcfn_
removed (but is not available as a CloudFormation stack resource). Confirm the table name by visiting the Athena dashboard.
athenaWorkgroup
: The workgroup assigned to be used with Athena. If not specified, defaults to Primary
Make sure to use only underscore as a delimiter if needed for tables and views. Using a hyphen/dash will not work even though you might be able to create it. See the AWS docs for more info.
If you are using a multi-account setup, you will also need to set .Values.kubecostProductConfigs.masterPayerARN
to the Amazon Resource Number (ARN) of the role in the management account, e.g. arn:aws:iam::530337586275:role/KubecostRole
.
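Putting these together, a hedged sketch of the Helm configuration might look like the following; every value shown is a placeholder:

```bash
# Hedged sketch: all IDs, names, and regions below are placeholders.
cat <<'EOF' > athena-values.yaml
kubecostProductConfigs:
  athenaProjectID: "123456789012"
  athenaBucketName: "s3://aws-athena-query-results-123456789012-us-east-1"
  athenaRegion: us-east-1
  athenaDatabase: athenacurcfn_kubecost_cur
  athenaTable: kubecost_cur
  athenaWorkgroup: primary
  # Only needed for multi-account setups:
  # masterPayerARN: arn:aws:iam::123456789012:role/KubecostRole
EOF
helm upgrade kubecost kubecost/cost-analyzer -n kubecost -f values.yaml -f athena-values.yaml
```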
Once you've integrated with the CUR, you can visit Settings > View Full Diagnostics in the UI to determine if Kubecost has been successfully integrated with your CUR. If any problems are detected, you will see a yellow warning sign under the cloud provider permissions status header.
You can check pod logs for authentication errors by running: kubectl get pods -n <namespace>
kubectl logs <kubecost-pod-name> -n <namespace> -c cost-model
If you do not see any authentication errors, log in to your AWS console and visit the Athena dashboard. You should be able to find the CUR. Ensure that the database with the CUR matches the athenaTable entered in Step 5. It likely has a prefix with athenacurcfn_
:
You can also check query history to see if any queries are failing:
Symptom: A similar error to this will be shown on the Diagnostics page under Pricing Sources. You can search in the Athena "Recent queries" dashboard to find additional info about the error.
Resolution: This error is typically caused by the incorrect (Athena results) s3 bucket being specified in the CloudFormation template of Step 3 from above. To resolve the issue, ensure the bucket used for storing the AWS CUR report (Step 1) is specified in the S3ReadAccessToAwsBillingData
SID of the IAM policy (default: kubecost-athena-access) attached to the user or role used by Kubecost (Default: KubecostUser / KubecostRole). See the following example.
This error can also occur when the management account cross-account permissions are incorrect, however, the solution may differ.
Symptom: A similar error to this will be shown on the Diagnostics page under Pricing Sources.
Resolution: Please verify that the prefix s3://
was used when setting the athenaBucketName
Helm value or when configuring the bucket name in the Kubecost UI.
Symptom: A similar error to this will be shown on the Diagnostics page under Pricing Sources.
Resolution: While rare, this issue was caused by an Athena instance that failed to provision properly on AWS. The solution was to delete the Athena DB and deploy a new one. To verify this is needed, find the failed query ID in the Athena "Recent queries" dashboard and attempt to manually run the query.
Symptom: A similar error to this will be shown on the Diagnostics page under Pricing Sources.
Resolution: Previously, if you ran a query without specifying a value for query result location, and the query result location setting was not overridden by a workgroup, Athena created a default location for you. Now, before you can run an Athena query in a region in which your account hasn't used Athena previously, you must specify a query result location, or use a workgroup that overrides the query result location setting. While Athena no longer creates a default query results location for you, previously created default aws-athena-query-results-MyAcctID-MyRegion
locations remain valid and you can continue to use them. The bucket should be in the format of: aws-athena-query-results-MyAcctID-MyRegion
It may also be required to remove and reinstall Kubecost. If doing this, please remember to back up ETL files prior or contact support for additional assistance. See also this AWS doc on specifying a query result location.
Symptom: A similar error to this will be shown on the Diagnostics page under Pricing Sources or in the Kubecost cost-model
container logs.
Resolution: Verify in AWS' Cost and Usage Reports dashboard that the Resource IDs are enabled as "Report content" for the CUR created in Step 1. If the Resource IDs are not enabled, you will need to re-create the report (this will require redoing Steps 1 and 2 from this doc).
Symptom: A similar error to this will be shown on the Diagnostics page under Pricing Sources or in the Kubecost cost-model
container logs.
Resolution: Verify that s3://
was included in the bucket name when setting the .Values.kubecostProductConfigs.athenaBucketName
Helm value.
AWS services used here are:
Kubecost's cost-model
requires roughly 2 CPU and 10 GB of RAM per 50,000 pods monitored. The backing Prometheus database requires roughly 2 CPU and 25 GB per million metrics ingested per minute. You can pick the EC2 instances necessary to run Kubecost accordingly.
Kubecost can write its cache to disk. Roughly 32 GB per 100,000 pods monitored is sufficient. (Optional: our cache can exist in memory)
Cloudformation (Optional: manual IAM configuration or via Terraform is fine)
EKS (Optional: all K8s flavors are supported)
In order to create a Google service account for use with Thanos, navigate to the Google Cloud Platform home page and select IAM & Admin > Service Accounts.
From here, select the option Create Service Account.
Provide a service account name, ID, and description, then select Create and Continue.
You should now be at the Service account permissions (optional) page. Select the first Role dropdown and select Storage Object Creator. Select Add Another Role, then select Storage Object Viewer from the second dropdown. Select Continue.
You should now be prompted to allow specific accounts access to this service account. This should be based on specific internal needs and is not a requirement. You can leave this empty and select Done.
Once back to the Service accounts page, select the Actions icon > Manage keys. Then, select the Add Key dropdown and select Create new key. A Create private key window opens.
Select JSON as the Key type and select Create. This will download a JSON service account key entry for use with the Thanos object-store.yaml
mentioned in the initial setup step.
Certain features of Kubecost, including Savings Insights like Orphaned Resources and Reserved Instances, require access to the cluster's GCP account. This is usually indicated by a 403 error from Google APIs which is due to 'insufficient authentication scopes'. Viewing this error in the Kubecost UI will display the cause of the error as "ACCESS_TOKEN_SCOPE_INSUFFICIENT"
.
To obtain access to these features, follow this tutorial which will show you how to configure your Google IAM Service Account and Workload Identity for your application.
Go to your GCP Console and select APIs & Services > Credentials from the left navigation. Select + Create Credentials > API Key.
On the Credentials page, select the icon in the Actions column for your newly-created API key, then select Edit API key. The Edit API key page opens.
Under ‘API restrictions’, select Restrict key, then from the dropdown, select only Cloud Billing API. Select OK to confirm. Then select Save at the bottom of the page.
From here, consult Google Cloud's guide to perform the following steps:
Enable Workload Identity on an existing GCP cluster, or spin up a new cluster which will have Workload Identity enabled by default
Migrate any existing workloads to Workload Identity
Configure your applications to use Workload Identity
Create both a Kubernetes service account (KSA) and an IAM service account (GSA).
Annotate the KSA with the email of the GSA.
Update your pod spec to use the annotated KSA, and ensure all nodes on that workload use Workload Identity.
You can stop once you have modified your pod spec (before 'Verify the Workload Identity Setup'). You should now have a GCP cluster with Workload Identity enabled, and both a KSA and a GSA, which are connected via the role roles/iam.workloadIdentityUser
.
In the GCP Console, select IAM & Admin > IAM. Find your newly-created GSA and select the Edit Principal pencil icon. You will need to provide the following roles to this service account:
BigQuery Data Viewer
BigQuery Job User
BigQuery User
Compute Viewer
Service Account Token Creator
Select Save.
The following roles need to be added to your IAM service account:
roles/bigquery.user
roles/compute.viewer
roles/bigquery.dataViewer
roles/bigquery.jobUser
roles/iam.serviceAccountTokenCreator
Use this command to add each role individually to the GSA:
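A hedged sketch of adding these bindings is shown below; the project ID and GSA address are placeholders:

```bash
# Hedged sketch: loop over the roles listed above and bind each to the GSA.
GSA="kubecost-gsa@my-project.iam.gserviceaccount.com"
for ROLE in roles/bigquery.user roles/compute.viewer roles/bigquery.dataViewer \
            roles/bigquery.jobUser roles/iam.serviceAccountTokenCreator; do
  gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:${GSA}" \
    --role "${ROLE}"
done
```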
From here, restart the pod(s) to confirm your changes. You should now have access to all expected Kubecost functionality through your service account with Workload Identity.
Kubecost Free can now be installed on an unlimited number of individual clusters. Larger teams will benefit from using Kubecost Enterprise to better manage many clusters. See our documentation for more details.
In an Enterprise multi-cluster setup, the UI is accessed through a designated primary cluster. All other clusters in the environment send metrics to a central object-store with a lightweight agent (aka secondary clusters). The primary cluster is designated by setting the Helm flag .Values.federatedETL.primaryCluster=true
, which instructs this cluster to read from the combined
folder that was processed by the federator. This cluster will consume additional resources to run the Kubecost UI and backend.
As of Kubecost 1.108, agent health is monitored by a diagnostics deployment that collects information from the local cluster and sends it to an object-store. This data is then processed by the Primary cluster and accessed via the UI and API.
Because the UI is only accessible through the primary cluster, Helm flags related to UI display are not applied to secondary clusters.
This feature is only supported for Kubecost Enterprise.
There are two primary methods to aggregate all cluster information back to a single Kubecost UI:
Both methods allow for greater compute efficiency by running the most resource-intensive workloads on a single primary cluster.
For environments that already have a Prometheus instance, ETL Federation may be preferred because only a single Kubecost pod is required.
The below diagrams highlight the two architectures:
Kubecost ETL Federation (Preferred)
Kubecost Thanos Federation
This feature is only officially supported on Kubecost Enterprise plans.
Thanos is a tool to aggregate Prometheus metrics to a central object storage (S3 compatible) bucket. Thanos is implemented as a sidecar on the Prometheus pod on all clusters. Thanos Federation is one of two primary methods to aggregate all cluster information back to a single view as described in our article.
The preferred method for multi-cluster is ETL Federation. The configuration guide below is for Kubecost Thanos Federation, which may not scale as well as ETL Federation in large environments.
This guide will cover how to enable Thanos on your primary cluster, and on any additional secondary clusters.
Follow the steps in the Configuring Thanos doc to enable all required Thanos components on a Kubecost primary cluster, including the Prometheus sidecar.
For each additional cluster, only the Thanos sidecar is needed.
Consider the following Thanos recommendations for secondaries:
Ensure you provide a unique identifier for prometheus.server.global.external_labels.cluster_id
to have additional clusters be visible in the Kubecost product, e.g. cluster-two
.
cluster_id
can be replaced with another label (e.g. cluster
) by modifying .Values.kubecostModel.promClusterIDLabel.
Federated ETL Architecture is only officially supported on Kubecost Enterprise plans.
This doc provides recommendations to improve the stability and recoverability of your Kubecost data when deploying in a Federated ETL architecture.
Kubecost can rebuild its extract, transform, load (ETL) data using Prometheus metrics from each cluster. It is strongly recommended to retain local cluster Prometheus metrics that meet an organization's disaster recovery requirements.
For long term storage of Prometheus metrics, we recommend setting up a Thanos sidecar container to push Prometheus metrics to a cloud storage bucket.
Use your cloud service provider's bucket versioning feature to take frequent snapshots of the bucket holding your Kubecost data (ETL files and Prometheus metrics).
Aggregator is a new backend for Kubecost. It is used in a configuration without Thanos, replacing the component. Aggregator serves a critical subset of Kubecost APIs, but will eventually be the default model for Kubecost and serve all APIs. Currently, Aggregator supports all major monitoring and savings APIs, and also budgets and reporting.
Existing documentation for Kubecost APIs will use endpoints for non-Aggregator environments unless otherwise specified, but will still be compatible after configuring Aggregator.
Aggregator is designed to accommodate queries of large-scale datasets by improving API load times and reducing UI errors. It is not designed to introduce new functionality; it is meant to improve functionality at scale.
Aggregator is currently free for all Enterprise users to configure, and is always able to be rolled back.
Aggregator can only be configured in a Federated ETL environment
Must be using v1.107.0 of Kubecost or newer
Your values.yaml file must have set kubecostDeployment.queryServiceReplicas
to its default value 0
.
You must have your context set to your primary cluster. Kubecost Aggregator cannot be deployed on secondary clusters.
Select from one of the two templates below and save the content as federated-store.yaml. This will be your configuration template required to set up Aggregator.
The name of the .yaml file used to create the secret must be named federated-store.yaml or Aggregator will not start.
Basic configuration:
Advanced configuration (for larger deployments):
There is no baseline for what is considered a larger deployment, which will be dependent on load times in your Kubecost environment.
Once you’ve configured your federated-store.yaml, create a secret using the following command:
Finally, upgrade your existing Kubecost installation. This command will install Kubecost if it does not already exist:
Upgrading your existing Kubecost using your configured federated-store.yaml file above will reset all existing Helm values configured in your values.yaml. If you wish to preserve any of those changes, append your values.yaml by adding the contents of your federated-store.yaml file into it, then replacing federated-store.yaml
with values.yaml
in the upgrade command below:
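A hedged sketch of the secret creation and upgrade is shown below; the secret name federated-store and the release/chart names assume a standard kubecost/cost-analyzer installation named kubecost:

```bash
# Hedged sketch: adjust secret, release, and namespace names to your environment.
kubectl create secret generic federated-store -n kubecost \
  --from-file=federated-store.yaml

helm upgrade --install kubecost kubecost/cost-analyzer -n kubecost \
  -f federated-store.yaml
```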
When first enabled, the aggregator pod will ingest the last three years (if applicable) of ETL data from the federated-store. This may take several hours. Because the combined folder is ignored, the federator pod is not used here, but can still run if needed. You can run kubectl get pods
and ensure the aggregator
pod is running, but should still wait for all data to be ingested.
Federated ETL is only officially supported for Kubecost Enterprise plans.
Federated extract, transform, load (ETL) is one of two methods to aggregate all cluster information back to a single display described in our doc. Federated ETL gives teams the benefit of combining multiple Kubecost installations into one view without dependency on Thanos.
There are two primary advantages for using ETL Federation:
For environments that already have a Prometheus instance, Kubecost only requires a single pod per monitored cluster
Many solutions that aggregate Prometheus metrics (like Thanos), are often expensive to scale in large environments
This guide has specific detail on how ETL Federation works and its deployment options.
The federated ETL is composed of three types of clusters.
Federated Clusters: The clusters which are being federated (clusters whose data will be combined and viewable at the end of the federated ETL pipeline). These clusters upload their ETL files after they have built them to Federated Storage.
Federator Clusters: The cluster on which the Federator (see Other components below) is set to run within the core cost-analyzer container. This cluster combines the Federated Cluster data uploaded to federated storage into combined storage.
Primary Cluster: A cluster where you can see the total Federated data that was combined from your Federated Clusters. These clusters read from combined storage.
These cluster designations can overlap, in that some clusters may be several types at once. A cluster that is a Federated Cluster, Federator Cluster, and Primary Cluster will perform the following functions:
As a Federated Cluster, push local cluster cost data to be combined from its local ETL build pipeline.
As a Federator Cluster, run the Federator inside the cost-analyzer, which pulls this local cluster data from S3, combines them, then pushes them back to combined storage.
As a Primary Cluster, pull back this combined data from combined storage to serve it on Kubecost APIs and/or the Kubecost frontend.
The Storages referred to here are an S3 (or GCP/Azure equivalent) storage bucket which acts as remote storage for the Federated ETL Pipeline.
Federated Storage: A set of folders on paths <bucket>/federated/<cluster id>
which are essentially ETL backup data, holding a “copy” of Federated Cluster data. Federated Clusters push this data to Federated Storage to be combined by the Federator. Federated Clusters write this data, and the Federator reads this data.
Combined Storage: A folder on S3 on the path <bucket>/federated/combined
which holds one set of ETL data containing all the allocations/assets
in all the ETL data from Federated Storage. The Federator takes files from Federated Storage and combines them, adding a single set of combined ETL files to Combined Storage to be read by the Primary Cluster. The Federator writes this data, and the Primary Cluster reads this data.
The Federator: A component of the cost-model which is run on the Federator Cluster, which can be a Federated Cluster, a Primary Cluster, or neither. The Federator takes the ETL binaries from Federated Storage and merges them, adding them to Combined Storage.
Federated ETL: The pipeline containing the above components.
This diagram shows an example setup of the Federated ETL with:
Three pure Federated Clusters (not classified as any other cluster type): Cluster 1, Cluster 2, and Cluster 3
One Federator Cluster that is also a Federated Cluster: Cluster 4
One Primary Cluster that is also a Federated Cluster: Cluster 5
The result is 5 clusters federated together.
Ensure each federated cluster has a unique clusterName
and cluster_id
:
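A minimal sketch of these two values is shown below; cluster-one is a placeholder, and both values should be unique per cluster and kept in sync:

```bash
# Hedged sketch: keep clusterName and cluster_id in sync and unique per cluster.
cat <<'EOF' > cluster-name-values.yaml
kubecostProductConfigs:
  clusterName: cluster-one
prometheus:
  server:
    global:
      external_labels:
        cluster_id: cluster-one
EOF
```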
For any cluster in the pipeline (Federator, Federated, Primary, or any combination of the three), create a file federated-store.yaml with the same format used for Thanos/S3 backup.
Add a secret using that file: kubectl create secret generic <secret_name> -n kubecost --from-file=federated-store.yaml
. Then set .Values.kubecostModel.federatedStorageConfigSecret
to the kubernetes secret name.
For all clusters you want to federate together (i.e. see their data on the Primary Cluster), set .Values.federatedETL.federatedCluster
to true
. This cluster is now a Federated Cluster, and can also be a Federator or Primary Cluster.
For the cluster “hosting” the Federator, set .Values.federatedETL.federator.enabled
to true
. This cluster is now a Federator Cluster, and can also be a Federated or Primary Cluster.
Optional: If you have any Federated Clusters pushing to a store that you do not want a Federator Cluster to federate, add the cluster id under the Federator config section .Values.federatedETL.federator.clusters
. If this parameter is empty or not set, the Federator will take all ETL files in the /federated
directory and federate them automatically.
Multiple Federators federating from the same source will not break, but it’s not recommended.
In Kubecost, the Primary Cluster
serves the UI and API endpoints as well as reconciling cloud billing (cloud-integration).
For the cluster that will be the Primary Cluster, set .Values.federatedETL.primaryCluster
to true
. This cluster is now a Primary Cluster, and can also be a Federator or Federated Cluster.
Cloud-integration requires .Values.federatedETL.federator.primaryClusterID
set to the same value used for .Values.kubecostProductConfigs.clusterName
Important: If the Primary Cluster is also to be federated, please wait 2-3 hours for data to populate Federated Storage before setting a Federated Cluster to primary (i.e. set .Values.federatedETL.federatedCluster
to true
, then wait to set .Values.federatedETL.primaryCluster
to true
). This allows for maximum certainty of data consistency.
If you do not set this cluster to be federated as well as primary, you will not see local data for this cluster.
The Primary Cluster’s local ETL will be overwritten with combined federated data.
This can be undone by unsetting it as a Primary Cluster and rebuilding ETL.
Setting a Primary Cluster may result in a loss of the cluster’s local ETL data, so it is recommended to back up any filestore data that one would want to save to S3 before designating the cluster as primary.
Alternatively, a fresh Kubecost install can be used as a consumer of combined federated data by setting it as the Primary but not a Federated Cluster.
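Putting the flags from the steps above together, a hedged sketch of the values for a cluster acting as a Federated, Federator, and Primary Cluster at once might look like the following; the secret and cluster names are placeholders, and remember the note above about waiting before marking an existing federated cluster as primary:

```bash
# Hedged sketch: one cluster acting as Federated + Federator + Primary.
cat <<'EOF' > federated-etl-values.yaml
kubecostModel:
  federatedStorageConfigSecret: federated-store   # secret created from federated-store.yaml
federatedETL:
  federatedCluster: true        # push this cluster's ETL data to federated storage
  primaryCluster: true          # read combined data and serve the Kubecost UI/APIs
  federator:
    enabled: true               # run the Federator on this cluster
    primaryClusterID: cluster-one
kubecostProductConfigs:
  clusterName: cluster-one      # must match primaryClusterID for cloud-integration
EOF
```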
The Federated ETL should begin functioning. On any ETL action on a Federated Cluster (Load/Put into local ETL store) the Federated Clusters will add data to Federated Storage. The Federator will run 5 minutes after the Federator Cluster startup, and then every 30 minutes after that. The data is merged into the Combined Storage, where it can be read by the Primary.
To verify Federated Clusters are uploading their data correctly, check the container logs on a Federated Cluster. It should log federated uploads when ETL build steps run. The S3 bucket can also be checked to see if data is being written to the /federated/<cluster_id>
path.
To verify the Federator is functioning, check the container logs on the Federator Cluster. The S3 bucket can also be checked to verify that data is being written to /federated/combined
.
To verify the entire pipeline is working, either query Allocations/Assets
or view the respective views on the frontend. Multi-cluster data should appear after:
The Federator has run at least once.
There was data in the Federated Storage for the Federator to have combined.
If you are using an internal certificate authority (CA), follow this tutorial instead of the above Setup section.
Begin by creating a ConfigMap with the certificate provided by the CA on every agent, including the Federator and any federated clusters, and name the file kubecost-federator-certs.yaml.
Now run the following command, making sure you specify the location for the ConfigMap you created:
kubectl create cm kubecost-federator-certs --from-file=/path/to/kubecost-federator-certs.yaml
Mount the certificate on the Federator and any federated clusters by passing these Helm flags to your values.yaml/manifest:
Create a file federated-store.yaml, which will go on all clusters:
Now run the following command (omit kubectl create namespace kubecost
if your kubecost
namespace already exists, or this command will fail):
Follow the same verification steps described above.
Sample configurations for each cloud provider can be found in our repo.
You can configure the Thanos sidecar by following the Configuring Thanos doc or your existing Thanos setup. Additionally, ensure you configure the following:
An object-store.yaml secret, so the Thanos sidecar has permissions to read/write to the cloud storage bucket
A unique cluster_id, so Kubecost is able to distinguish which metric belongs to which cluster in the Thanos bucket.
Configure Prometheus alerting to get notified when you are losing metrics or when metrics deviate beyond a known standard.
Next, you will need to create an additional cloud-integration
secret. Follow this tutorial to generate your cloud-integration.json file, then run the following command:
Alternatively, the most common configurations can be found in our repo.
When using ETL Federation, there are several methods to recover Kubecost data in the event of data loss. See our doc for more details regarding these methods.
In the event of missing or inaccurate data, you may need to rebuild your ETL pipelines. This is a documented procedure. See the doc for information and troubleshooting steps.
| Kubernetes Concept | Azure Tag Key | Azure Tag Value |
|---|---|---|
| Cluster | kubernetes_cluster | cluster-name |
| Namespace | kubernetes_namespace | namespace-name |
| Deployment | kubernetes_deployment | deployment-name |
| Label | kubernetes_label_NAME* | label-value |
| DaemonSet | kubernetes_daemonset | daemonset-name |
| Pod | kubernetes_pod | pod-name |
| Container | kubernetes_container | container-name |
| Field | Description |
|---|---|
| Athena Region | The AWS region Athena is running in |
| Athena Database | The name of the database created by the Athena setup |
| Athena Tablename | The name of the table created by the Athena setup |
| Athena Result Bucket | An S3 bucket to store Athena query results that you’ve created that Kubecost has permission to access |
| AWS account ID | The AWS account ID where the Athena CUR is, likely your management account |
To use Azure Storage as a Thanos object store, you need to pre-create a storage account from the Azure portal or by using the Azure CLI. Follow the instructions from the Azure Storage Documentation.
Now create a YAML file named object-store.yaml
with the following format:
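A hedged sketch of the Thanos Azure object store format is shown below; the storage account, key, and container values are placeholders:

```bash
# Hedged sketch of object-store.yaml for Azure; replace all values with your own.
cat <<'EOF' > object-store.yaml
type: AZURE
config:
  storage_account: "kubecostthanosstorage"
  storage_account_key: "<STORAGE_ACCOUNT_KEY>"
  container: "thanos"
EOF
```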
Start by creating a new Google Cloud Storage bucket. The following example uses a bucket named thanos-bucket
. Next, download a service account JSON file from Google's service account manager (steps).
Now create a YAML file named object-store.yaml
in the following format, using your bucket name and service account details:
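A hedged sketch of the Thanos GCS object store format is shown below; the bucket name and the inlined service account JSON are placeholders taken from the key file you downloaded:

```bash
# Hedged sketch of object-store.yaml for GCS; paste your own service account key JSON.
cat <<'EOF' > object-store.yaml
type: GCS
config:
  bucket: "thanos-bucket"
  service_account: |-
    {
      "type": "service_account",
      "project_id": "my-project",
      "private_key_id": "<PRIVATE_KEY_ID>",
      "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
      "client_email": "thanos@my-project.iam.gserviceaccount.com",
      "client_id": "<CLIENT_ID>"
    }
EOF
```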
Note: Because this is a YAML file, it requires this specific indention.
Warning: Do not apply a retention policy to your Thanos bucket, as it will prevent Thanos compaction from completing.
Kubecost v1.67.0+ uses Thanos 0.15.0. If you're upgrading to Kubecost v1.67.0+ from an older version and using Thanos, with AWS S3 as your backing storage for Thanos, you'll need to make a small change to your Thanos Secret in order to bump the Thanos version to 0.15.0 before you upgrade Kubecost.
Thanos 0.15.0 has over 10x performance improvements, so this is recommended.
Your values-thanos.yaml needs to be updated to the new defaults here. The PR bumps the image version, adds the query-frontend component, and increases concurrency.
This is simplified if you're using our default values-thanos.yaml, which has the new configs already.
For the Thanos Secret you're using, the encrypt-sse
line needs to be removed. Everything else should stay the same.
For example, view this sample config:
The easiest way to do this is to delete the existing secret and upload a new one:
kubectl delete secret -n kubecost kubecost-thanos
Update your secret .YAML file as above, and save it as object-store.yaml.
kubectl create secret generic kubecost-thanos -n kubecost --from-file=./object-store.yaml
Once this is done, you're ready to upgrade!
Kubecost uses a shared storage bucket to store metrics from clusters, known as durable storage, in order to provide a single-pane-of-glass for viewing cost across many clusters. Multi-cluster is an enterprise feature of Kubecost.
There are multiple methods to provide Kubecost access to an S3 bucket. This guide has two examples:
Using a Kubernetes secret
Attaching an AWS Identity and Access Management (IAM) role to the service account used by Prometheus
Both methods require an S3 bucket. Our example bucket is named kc-thanos-store
.
This is a simple S3 bucket with all public access blocked. No other bucket configuration changes should be required.
Once created, add an IAM policy to access this bucket. See our AWS Thanos IAM Policy doc for instructions.
To use the Kubernetes secret method for allowing access, create a YAML file named object-store.yaml
with contents similar to the following example. See region to endpoint mappings here.
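A hedged sketch of this file is shown below; the bucket, endpoint, region, and keys are placeholders:

```bash
# Hedged sketch of object-store.yaml for S3 using static credentials.
cat <<'EOF' > object-store.yaml
type: S3
config:
  bucket: "kc-thanos-store"
  endpoint: "s3.us-east-1.amazonaws.com"
  region: "us-east-1"
  access_key: "<ACCESS_KEY_ID>"
  secret_key: "<SECRET_ACCESS_KEY>"
EOF
```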
Instead of using a secret key in a file, many users will prefer this method: attaching an IAM role to the service account used by the Thanos pods.
Attach the policy to the Thanos pods service accounts. Your object-store.yaml
should follow the format below when using this option, which does not contain the secret_key and access_key fields.
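A hedged sketch of the keyless variant is shown below; it is the same format as above with the credential fields removed:

```bash
# Hedged sketch of object-store.yaml for S3 when relying on an attached IAM role.
cat <<'EOF' > object-store.yaml
type: S3
config:
  bucket: "kc-thanos-store"
  endpoint: "s3.us-east-1.amazonaws.com"
  region: "us-east-1"
EOF
```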
Then, follow this AWS guide to enable attaching IAM roles to pods.
You can define the IAM role to associate with a service account in your cluster by creating a service account in the same namespace as Kubecost and adding an annotation to it of the form eks.amazonaws.com/role-arn: arn:aws:iam::<AWS_ACCOUNT_ID>:role/<IAM_ROLE_NAME>
as described here.
Once that annotation has been created, configure the following:
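One possible sketch of this configuration is shown below, assuming the bundled Prometheus chart exposes a serviceAccounts.server block that can point the prometheus-server (and its thanos-sidecar) at the annotated service account; verify the exact keys for your chart version:

```bash
# Hedged sketch: the prometheus.serviceAccounts.server path is an assumption.
cat <<'EOF' > thanos-irsa-values.yaml
prometheus:
  serviceAccounts:
    server:
      create: false
      name: kubecost-thanos-sa   # the pre-created, annotated service account
EOF
```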
You can encrypt the S3 bucket where Kubecost data is stored in AWS via S3 and KMS. However, because Thanos can store potentially millions of objects, it is suggested that you use bucket-level encryption instead of object-level encryption. More details available in these external docs:
Visit the Configuring Thanos doc for troubleshooting help.
In order to create an AWS IAM policy for use with Thanos:
Navigate to the AWS console and select IAM.
Select Policies in the Navigation menu, then select Create Policy.
Add the following JSON in the policy editor:
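The exact policy JSON is not reproduced here, but a minimal sketch of the kind of S3 permissions Thanos needs might look like the following; adjust the actions to your security requirements:

```bash
# Hedged sketch of a minimal Thanos S3 policy; paste the JSON body into the policy editor.
cat <<'EOF' > kc-thanos-store-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::<your-bucket-name>"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::<your-bucket-name>/*"]
    }
  ]
}
EOF
```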
Make sure to replace <your-bucket-name>
with the name of your newly-created S3 bucket.
Select Review policy and name this policy, e.g. kc-thanos-store-policy
.
Navigate to Users in IAM control panel, then select Add user.
Provide a username (e.g. kubecost-thanos-service-account
) and select Programmatic access.
Select Attach existing policies directly, search for the policy name provided in Step 4, then create the user.
Capture your Access Key ID and secret in the view below:
If you don’t want to use a service account, IAM credentials retrieved from an instance profile are also supported. You must get both access key and secret key from the same method (i.e. both from service or instance profile). More info on retrieving credentials here.
This feature is only officially supported on Kubecost Enterprise plans.
Kubecost leverages Thanos and durable storage for three different purposes:
Centralize metric data for a global multi-cluster view into Kubernetes costs via a Prometheus sidecar
Allow for unlimited data retention
Backup Kubecost ETL data
To enable Thanos, follow these steps:
This step creates the object-store.yaml file that contains your durable storage target (e.g. GCS, S3, etc.) configuration and access credentials. The details of this file are documented thoroughly in Thanos documentation.
We have guides for using cloud-native storage for the largest cloud providers. Other providers can be similarly configured.
Use the appropriate guide for your cloud provider:
Create a secret with the .yaml file generated in the previous step:
Each cluster needs to be labelled with a unique Cluster ID, which is done in two places.
values-clusterName.yaml
The Thanos subchart includes thanos-bucket
, thanos-query
, thanos-store
, thanos-compact
, and service discovery for thanos-sidecar
. These components are recommended when deploying Thanos on the primary cluster.
These values can be adjusted under the thanos
block in values-thanos.yaml. Available options are here: thanos/values.yaml
The thanos-store container is configured to request 2.5GB of memory; this may be reduced for smaller deployments. thanos-store is only used on the primary Kubecost cluster.
To verify installation, check to see all Pods are in a READY state. View Pod logs for more detail and see common troubleshooting steps below.
Thanos sends data to the bucket every 2 hours. Once 2 hours have passed, logs should indicate if data has been sent successfully or not.
You can monitor the logs with:
Monitoring logs this way should return results like this:
As an aside, you can validate the Prometheus metrics are all configured with correct cluster names with:
To troubleshoot the IAM Role Attached to the serviceaccount, you can create a Pod using the same service account used by the thanos-sidecar (default is kubecost-prometheus-server
):
s3-pod.yaml
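A hedged sketch of such a pod is shown below; the image, pod name, and bucket are placeholders, while the service account name matches the default noted above:

```bash
# Hedged sketch: throwaway pod that reuses the thanos-sidecar's service account to list the bucket.
cat <<'EOF' > s3-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-access-test
  namespace: kubecost
spec:
  serviceAccountName: kubecost-prometheus-server
  restartPolicy: Never
  containers:
    - name: aws-cli
      image: amazon/aws-cli:latest
      command: ["aws", "s3", "ls", "s3://kc-thanos-store"]
EOF
kubectl apply -f s3-pod.yaml
kubectl logs -n kubecost s3-access-test
```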
This should return a list of objects (or at least not give a permission error).
If a cluster is not successfully writing data to the bucket, review thanos-sidecar
logs with the following command:
Logs in the following format are evidence of a successful bucket write:
/stores endpoint
If thanos-query can't connect to both the sidecar and the store, you may want to directly specify the store gRPC service address instead of using DNS discovery (the default). You can quickly test if this is the issue by running:
kubectl edit deployment kubecost-thanos-query -n kubecost
and adding
--store=kubecost-thanos-store-grpc.kubecost:10901
to the container args. This will cause a query restart and you can visit /stores
again to see if the store has been added.
If it has, you'll want to use these addresses instead of DNS more permanently by setting .Values.thanos.query.stores in values-thanos.yaml.
A common error is as follows, which means you do not have the correct access to the supplied bucket:
Assuming pods are running, use port forwarding to connect to the thanos-query-http
endpoint:
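A hedged sketch of the port-forward is shown below; the service name and port can vary slightly by chart version, so confirm them with kubectl get svc -n kubecost:

```bash
# Hedged sketch: service name/port are assumptions; adjust to your deployment.
kubectl port-forward -n kubecost svc/kubecost-thanos-query-http 8080:10902
```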
Then navigate to http://localhost:8080 in your browser. This page should look very similar to the Prometheus console.
If you navigate to Stores using the top navigation bar, you should be able to see the status of both the thanos-store
and thanos-sidecar
which accompanied the Prometheus server:
Also note that the sidecar should identify with the unique cluster_id
provided in your values.yaml in the previous step. Default value is cluster-one
.
The default retention period for when data is moved into the object storage is currently 2h. This configuration is based on Thanos suggested values. By default, it will be 2 hours before data is written to the provided bucket.
Instead of waiting 2h to ensure that Thanos was configured correctly, the default log level for the Thanos workloads is debug
(it's very light logging even on debug). You can get logs for the thanos-sidecar
, which is part of the prometheus-server
Pod, and thanos-store
. The logs should give you a clear indication of whether or not there was a problem consuming the secret and what the issue is. For more on Thanos architecture, view this resource.
This document will describe why your Kubecost instance’s data can be useful to share with us, what content is in the data, and how to share it.
Kubecost product releases are tested and verified against a combination of generated/synthetic Kubernetes cluster data and examples of customer data that have been shared with us. Customers who share snapshots of their data with us help to ensure that product changes handle their specific use cases and scales. Because the Kubecost product for many customers is run as an on-prem service, with no data sharing back to us, we do not inherently have this data for many of our customers.
Sharing data with us requires an ETL backup executed by the customer in their own environment before the resulting data can be sent out. Kubecost's ETL is a computed cache built upon Prometheus metrics and cloud billing data, on which nearly all API requests made by the user and the Kubecost frontend currently rely. Therefore, the ETL data will contain metric data and identifying information for that metric (e.g. a container name, pod name, namespace, and cluster name) during a time window, but will not contain other information about containers, pods, clusters, cloud resources, etc. You can read more about these metric details in our doc.
The full methodology for creating the ETL backup can be found in our doc. Once these files have been backed up, the content will look as follows before compressing the data:
Once the data is downloaded to the local disk from either the automated or manual ETL backup methods, the data must be converted to a gzip file. A suggested method for downloading the ETL backup and compressing it quickly is to use the script in Kubecost's etl-backup repo. Check out the tar syntax in that script if doing this manually without the script. When the compressed ETL backup is ready to share, please work with a Kubecost support engineer on sharing the file with us. Our most common approach is to use a Google Drive folder with access limited to you and the support engineer, but we recognize not all companies are open to this and will work with you to determine the most business-appropriate method.
If you are interested in reviewing the contents of the data, either before or after sending the ETL backup to us, you can find an example Golang implementation showing how to read the ETL files.
Secondary clusters use a minimal Kubecost deployment to send their metrics to a central storage-bucket (aka durable storage) that is accessed by the primary cluster to provide a single-pane-of-glass view into all aggregated cluster costs globally. This aggregated cluster view is exclusive to Kubecost Enterprise.
Kubecost's UI will appear broken when set to a secondary cluster. It should only be used for troubleshooting.
This guide explains the settings that can be tuned in order to run only the minimum required Kubecost components, so that secondary clusters run as efficiently as possible.
See the Additional resources section below for complete examples in our GitHub repo.
Disable product caching and reduce query concurrency with the following parameters:
Grafana is not needed on secondary clusters.
Kubecost and its accompanying Prometheus collect a reduced set of metrics that allow for lower resource/storage usage than a standard Prometheus deployment.
The following configuration options further reduce resource consumption when not using the Kubecost frontend:
Potentially reducing retention even further, metrics are sent to the storage-bucket every 2 hours.
You can tune prometheus.server.persistentVolume.size
depending on scale, or outright disable persistent storage.
Disable Thanos components. These are only used for troubleshooting on secondary clusters. See this guide for troubleshooting via kubectl logs.
Secondary clusters write to the global storage-bucket via the thanos-sidecar on the prometheus-server pod.
You can disable node-exporter and the service account if cluster/node rightsizing recommendations are not required.
node-exporter must be disabled if there is an existing DaemonSet. More info here.
For reference, this secondary-clusters.yaml
snippet is a list of the most common settings for efficient secondary clusters:
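A hedged sketch of such a snippet is shown below; the exact keys can vary by chart version, so compare against the sample files in our repo before applying:

```bash
# Hedged sketch of common secondary-cluster settings referenced above.
cat <<'EOF' > secondary-clusters.yaml
global:
  grafana:
    enabled: false            # Grafana is not needed on secondary clusters
    proxy: false
prometheus:
  nodeExporter:
    enabled: false            # disable if rightsizing recommendations are not required
  serviceAccounts:
    nodeExporter:
      create: false
  server:
    persistentVolume:
      size: 32Gi              # tune (or disable) depending on scale
EOF
```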
You can find complete installation guides and sample files on our repo.
Additional considerations for properly tuning resource consumption is here.
We do not recommend enabling ETL Backup in conjunction with Federated ETL.
Kubecost's extract, transform, load (ETL) data is a computed cache based on Prometheus's metrics, from which the user can perform all possible Kubecost queries. The ETL data is stored in a persistent volume mounted to the kubecost-cost-analyzer
pod.
There are a number of reasons why you may want to backup this ETL data:
To ensure a copy of your Kubecost data exists, so you can restore the data if needed
To reduce the amount of historical data stored in Prometheus/Thanos, and instead retain historical ETL data
Beginning in v1.100, this feature is enabled by default if you have Thanos enabled. To opt out, set .Values.kubecostModel.etlBucketConfigSecret="".
Kubecost provides cloud storage backups for ETL backing storage. Backups are not the typical approach of "halt all reads/writes and dump the database." Instead, the backup system is a transparent feature that will always ensure that local ETL data is backed up, and if local data is missing, it can be retrieved from backup storage. This feature protects users from accidental data loss by ensuring that previously backed-up data can be restored at runtime.
Durable backup storage functionality is supported with a Kubecost Enterprise plan.
When the ETL pipeline collects data, it stores daily and hourly (if configured) cost metrics on a configured storage. This defaults to a PV-based disk storage, but can be configured to use external durable storage on the following providers:
AWS S3
Azure Blob Storage
Google Cloud Storage
This configuration secret follows the same layout documented for Thanos here.
You will need to create a file named object-store.yaml using the chosen storage provider configuration (documented below), and run the following command to create the secret from this file:
The file must be named object-store.yaml.
If Kubecost was installed via Helm, ensure the following value is set.
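A combined sketch of the secret creation and the corresponding Helm value is shown below; the secret name is an example and must match on both sides:

```bash
# Hedged sketch: secret name is an example.
kubectl create secret generic kubecost-etl-backup -n kubecost \
  --from-file=object-store.yaml

cat <<'EOF' > etl-backup-values.yaml
kubecostModel:
  etlBucketConfigSecret: kubecost-etl-backup
EOF
helm upgrade kubecost kubecost/cost-analyzer -n kubecost -f values.yaml -f etl-backup-values.yaml
```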
If you are using an existing disk storage option for your ETL data, enabling the durable backup feature will retroactively back up all previously stored data. This feature is also fully compatible with the existing S3 backup feature.
If you are using a memory store for your ETL data with a local disk backup (kubecostModel.etlFileStoreEnabled: false
), the backup feature will simply replace the local backup. In order to take advantage of the retroactive backup feature, you will need to update to file store (kubecostModel.etlFileStoreEnabled: true
). This option is now enabled by default in the Helm chart.
The simplest way to backup Kubecost's ETL is to copy the pod's ETL store to your local disk. You can then send that file to any other storage system of your choice. We provide a script to do that.
To restore the backup, untar the results of the ETL backup script into the ETL directory of the pod.
There is also a Bash script available to restore the backup in Kubecost's etl-backup repo.
This feature is still in development, but there is currently a status card available on the Diagnostics page that will eventually show the status of the backup system:
In some scenarios like when using Memory store, setting kubecostModel.etlHourlyStoreDurationHours
to a value of 48
hours or less will cause ETL backup files to become truncated. The current recommendation is to keep etlHourlyStoreDurationHours at its default of 49
hours.
This feature is currently in beta. It is enabled by default.
Multi-Cluster Diagnostics offers a single view into the health of all the clusters you currently monitor with Kubecost.
Health checks include, but are not limited to:
Whether Kubecost is correctly emitting metrics
Whether Kubecost is being scraped by Prometheus
Whether Prometheus has scraped the required metrics
Whether Kubecost's ETL files are healthy
Additional configuration options can be found in the values.yaml under the diagnostics: key.
The multi-cluster diagnostics feature is run as an independent deployment (i.e. deployment/kubecost-diagnostics
). Each diagnostics deployment monitors the health of Kubecost and sends that health data to the central object store at the /diagnostics
filepath.
The below diagram depicts these interactions. This diagram is specific to the requests required for diagnostics only. For additional diagrams, see our multi-cluster guide.
The diagnostics API can be accessed through /model/multi-cluster-diagnostics?window=2d
(or /model/mcd
for short)
The window
query parameter is required, which will return all diagnostics within the specified time window.
GET
http://<your-kubecost-address>/model/multi-cluster-diagnostics
The Multi-cluster Diagnostics API provides a single view into the health of all the clusters you currently monitor with Kubecost.
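For example, a minimal request against the endpoint above (the address and window value are placeholders):

```bash
curl "http://<your-kubecost-address>/model/multi-cluster-diagnostics?window=2d"
```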
This feature is only supported on Kubecost Enterprise plans.
The query service replica (QSR) is a scale-out query service that reduces load on the cost-model pod. It allows for improved horizontal scaling by being able to handle queries for larger intervals, and multiple simultaneous queries.
The query service will forward /model/allocation
and /model/assets
requests to the Query Services StatefulSet.
The diagram below demonstrates the backing architecture of this query service and its functionality.
There are three options that can be used for the source ETL Files:
For environments that have Kubecost Federated ETL enabled, this store will be used; no additional configuration is required.
For single cluster environments, QSR can target the ETL backup store. To learn more about ETL backups, see the ETL Backup doc.
Alternatively, an object-store containing the ETL dataset to be queried can be configured using a secret kubecostDeployment.queryServiceConfigSecret
. The file name of the secret must be object-store.yaml
. Examples can be found in our Configuring Thanos doc.
QSR uses persistent volume storage to avoid excessive S3 transfers. Data is retrieved from S3 hourly as new ETL files are created and stored in these PVs. The databaseVolumeSize
should be larger than the size of the data in the S3 bucket.
When the pods start, data from the object-store is synced and this can take a significant time in large environments. During the sync, parts of the Kubecost UI will appear broken or have missing data. You can follow the pod logs to see when the sync is complete.
The default of 100Gi is enough storage for 1M pods and 90 days of retention. This can be adjusted:
Once the data store is configured, set kubecostDeployment.queryServiceReplicas
to a non-zero value and perform a Helm upgrade.
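A sketch of that upgrade, assuming a Helm release named kubecost in the kubecost namespace:

```bash
helm upgrade kubecost kubecost/cost-analyzer -n kubecost \
  --reuse-values \
  --set kubecostDeployment.queryServiceReplicas=2
```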
Once QSR has been enabled, the new pods will automatically handle all API requests to /model/allocation
and /model/assets
.
Amazon Elastic Kubernetes Service (Amazon EKS) is a managed container service to run and scale Kubernetes applications in the AWS cloud. In collaboration with Amazon EKS, Kubecost provides an optimized bundle for Amazon EKS cluster cost visibility that enables customers to accurately track costs by namespace, cluster, pod, or organizational concepts such as team or application. Customers can use their existing AWS support agreements to obtain support. Kubernetes platform administrators and finance leaders can use Kubecost to visualize a breakdown of their Amazon EKS cluster charges, allocate costs, and charge back organizational units such as application teams.
In this article, you will learn more about how the Amazon EKS architecture interacts with Kubecost. You will also learn to deploy Kubecost on EKS using one of three different methods:
Deploy Kubecost on an Amazon EKS cluster using Amazon EKS add-on
Deploy Kubecost on an Amazon EKS cluster via Helm
Deploy Kubecost on an Amazon EKS Anywhere cluster using Helm
User experience diagram:
Amazon EKS cost monitoring with Kubecost architecture:
Subscribe to Kubecost on AWS Marketplace here.
You have access to an Amazon EKS cluster.
After subscribing to Kubecost on AWS Marketplace and following the on-screen instructions successfully, you are redirected to the Amazon EKS console. To get started in the Amazon EKS console, go to your EKS clusters, and in the Add-ons tab, select Get more add-ons to find the Kubecost EKS add-on in the cluster settings of your existing EKS clusters. You can use the search bar to find "Kubecost - Amazon EKS cost monitoring" and follow the on-screen instructions to enable the Kubecost add-on for your Amazon EKS cluster. You can learn more about direct deployment to Amazon EKS clusters from this AWS blog post.
On your workspace, run the following command to enable the Kubecost add-on for your Amazon EKS cluster:
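A sketch of that command follows; the add-on name kubecost_kubecost is an assumption, so confirm the exact name with aws eks describe-addon-versions:

```bash
aws eks create-addon --addon-name kubecost_kubecost \
  --cluster-name $YOUR_CLUSTER_NAME --region $AWS_REGION
```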
You need to replace $YOUR_CLUSTER_NAME
and $AWS_REGION
accordingly with your actual Amazon EKS cluster name and AWS region.
To monitor the installation status, you can run the following command:
The Kubecost add-on should be available in a few minutes. Run the following command to enable port-forwarding to expose the Kubecost dashboard:
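For example, assuming the add-on installed into the kubecost namespace with the default deployment name:

```bash
kubectl port-forward -n kubecost deployment/kubecost-cost-analyzer 9090:9090
```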
To disable Kubecost add-on, you can run the following command:
To get started, you can follow these steps to deploy Kubecost into your Amazon EKS cluster in a few minutes using Helm.
You have access to an Amazon EKS cluster.
If your cluster is version 1.23 or later, you must have the Amazon EBS CSI driver installed on your cluster. You can also follow these instructions to install Amazon EBS CSI driver:
Run the following command to create an IAM service account with the policies needed to use the Amazon EBS CSI Driver.
Remember to replace $CLUSTER_NAME
with your actual cluster name.
Install the Amazon EBS CSI add-on for EKS using the AmazonEKS_EBS_CSI_DriverRole by issuing the following command:
After completing these prerequisite steps, you're ready to begin EKS integration.
In your environment, run the following command from your terminal to install Kubecost on your existing Amazon EKS cluster:
To install Kubecost on an Amazon EKS cluster on AWS Graviton2 (ARM-based processor), you can run the following command:
On the Amazon EKS cluster with mixed processor architecture worker nodes (AMD64, ARM64), this parameter can be used to schedule Kubecost deployment on ARM-based worker nodes: --set nodeSelector."beta\.kubernetes\.io/arch"=arm64
Remember to replace $VERSION with the actual version number. You can find all available versions via the Amazon ECR public gallery here.
By default, the installation will include certain prerequisite software including Prometheus and kube-state-metrics. To customize your deployment, such as skipping these prerequisites if you already have them running in your cluster, you can configure any of the available values to modify storage, network configuration, and more.
Run the following command to enable port-forwarding to expose the Kubecost dashboard:
You can now access Kubecost's UI by visiting http://localhost:9090
in your local web browser. Here, you can monitor your Amazon EKS cluster cost and efficiency. Depending on your organization’s requirements and setup, you may have different options for exposing Kubecost for internal access. Here are a few examples you can use as references:
See Kubecost's Ingress Examples doc as a reference for using Nginx ingress controller with basic auth.
You can also consider using AWS LoadBalancer controller to expose Kubecost and use Amazon Cognito for authentication, authorization, and user management. You can learn more via the AWS blog post Authenticate Kubecost Users with Application Load Balancer and Amazon Cognito.
Deploying Kubecost on EKS Anywhere via Helm is not the officially recommended method by Kubecost or AWS. The recommended method is via EKS add-on (see above).
Amazon EKS Anywhere (EKS-A) is an alternate deployment of EKS which allows you to create and configure on-premises clusters, including on your own virtual machines. It is possible to deploy Kubecost on EKS-A clusters to monitor spend data.
Deploying Kubecost on an EKS-A cluster should function similarly at the cluster level, such as when retrieving Allocations or Assets data. However, because on-prem servers wouldn't be visible in a CUR (as the billing source is managed outside AWS), certain features like the Cloud Cost Explorer will not be accessible.
You have installed the EKS-A installer and have access to an Amazon EKS-A cluster.
In your environment, run the following command from your terminal to install Kubecost on your existing Amazon EKS Anywhere cluster:
To install Kubecost on an EKS-A cluster on AWS Graviton2 (ARM-based processor), you can run the following command:
On the Amazon EKS cluster with mixed processor architecture worker nodes (AMD64, ARM64), this parameter can be used to schedule Kubecost deployment on ARM-based worker nodes: --set nodeSelector."beta\.kubernetes\.io/arch"=arm64
Remember to replace $VERSION with the actual version number. You can find all available versions via the Amazon ECR public gallery here.
By default, the installation will include certain prerequisite software including Prometheus and kube-state-metrics. To customize your deployment, such as skipping these prerequisites if you already have them running in your cluster, you can configure any of the available values to modify storage, network configuration, and more.
Run the following command to enable port-forwarding to expose the Kubecost dashboard:
You can now access Kubecost's UI by visiting http://localhost:9090
in your local web browser. Here, you can monitor your Amazon EKS cluster cost and efficiency through the Allocations and Assets pages.
Amazon EKS documentation:
AWS blog content:
This document provides the steps for installing the Kubecost product from the AWS Marketplace.
To deploy Kubecost from AWS Marketplace, you need to assign an IAM policy with appropriate IAM permission to a Kubernetes service account before starting the deployment. You can either use AWS managed policy arn:aws:iam::aws:policy/AWSMarketplaceMeteringRegisterUsage
or create your own IAM policy. You can learn more with AWS' tutorial.
Here's an example IAM policy:
Create an IAM role with AWS-managed IAM policy.
Create a K8s service account name awsstore-serviceaccount
in your Amazon EKS cluster.
Set up a trust relationship between the created IAM role with awsstore-serviceaccount.
Modify awsstore-serviceaccount
annotation to associate it with the created IAM role
Remember to replace CLUSTER_NAME
with your actual Amazon EKS cluster name.
Define which available version you would like to install using the following command. You can check available version titles from the AWS Marketplace product, e.g., prod-1.95.0:
export IMAGETAG=<VERSION-TITLE>
Deploy Kubecost with Helm using the following command:
Run this command to enable port-forwarding and access the Kubecost UI:
You can now start monitoring your Amazon EKS cluster cost with Kubecost by visiting http://localhost:9090
.
We recommend doing this via . The command below helps to automate these manual steps:
More details on how to set up the appropriate trust relationships are available .
Your Amazon EKS cluster needs to have the IAM OIDC provider enabled to set up IRSA. Learn more with AWS' doc.

Name | Type | Description |
---|---|---|
window* | string | Duration of time over which to query. Accepts words like today, week, month, yesterday, lastweek, lastmonth; durations like 30m, 12h, 7d; comma-separated RFC3339 date pairs like 2021-01-02T15:04:05Z,2021-02-02T15:04:05Z; comma-separated Unix timestamp (seconds) pairs like 1578002645,1580681045. |
Installing Kubecost on an Alibaba cluster is the same as other cloud providers with Helm v3.1+:
helm install kubecost kubecost/cost-analyzer -n kubecost -f values.yaml
Your values.yaml files must contain the below parameters:
The alibaba-service-key
can be created using the following command:
The path you provide must contain a file with your Alibaba Cloud secrets. These secrets can be passed in a JSON file in the following format:
These two can be generated in the Alibaba Cloud portal. Hover over your user account icon, then select AccessKey Management. A new window opens. Select Create AccessKey to generate a unique access token that will be used for all activities related to Kubecost.
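As a sketch, assuming the JSON file is saved locally as alibaba-secret.json and Kubecost runs in the kubecost namespace:

```bash
kubectl create secret generic alibaba-service-key -n kubecost \
  --from-file=alibaba-secret.json
```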
Currently, Kubecost does not support complete integration of your Alibaba billing data like for other major cloud providers. Instead, Kubecost will only support public pricing integration, which will provide proper list prices for all cloud-based resources. Features like reconciliation and savings insights are not available for Alibaba. For more information on setting up a public pricing integration, see our Multi-Cloud Integrations doc.
When you list the available StorageClasses in your Alibaba K8s cluster, you may find that none is marked as the default. In that case, the Kubecost installation may fail, with the cost-model pod and Prometheus server pod stuck in a Pending state.
To fix this issue, set one of the StorageClasses in the Alibaba K8s cluster as the default using the below command:
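For example, using the standard Kubernetes default-class annotation (replace <storage-class-name> with the class you want to promote):

```bash
kubectl patch storageclass <storage-class-name> \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```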
Following this, installation should proceed as normal.
Installing Kubecost on a GKE Autopilot cluster is similar to other cloud providers with Helm v3.1+, with a few changes. Autopilot requires the use of Google Managed Prometheus service, which generates additional costs within your Google Cloud account.
helm install kubecost kubecost/cost-analyzer -n kubecost -f values.yaml
Your values.yaml files must contain the below parameters. Resources are specified for each section of the Kubecost deployment, and Pod Security Policies are disabled.
Open the OperatorConfig on your Autopilot Cluster resource for editing:
Add the following collection section to the resource:
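A sketch of that section, based on GMP's managed-collection kubelet scraping configuration (verify the field names against current Google Cloud documentation):

```yaml
collection:
  kubeletScraping:
    interval: 30s
```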
Save the file and close the editor. After a short time, the Kubelet metric endpoints will be scraped and the metrics become available for querying in Managed Service for Prometheus.
Plural is a free, open-source tool that enables you to deploy Kubecost on Kubernetes with the cloud provider of your choice. Plural is an open-source DevOps platform for self-hosting applications on Kubernetes without the management overhead. With baked-in SSO, automated upgrades, and secret encryption, you get all the benefits of a managed service with none of the lock-in or cost.
Kubecost is available as direct install with Plural, and it synergizes very well with the ecosystem, providing cost monitoring out of the box to users that deploy their Kubernetes clusters with Plural.
First, create an account on Plural. This is only to track your installations and allow for the delivery of automated upgrades. You will not be asked to provide any infrastructure credentials or sensitive information.
Next, install the Plural CLI by following steps 1-3 of Plural's CLI Quickstart guide.
You'll need a Git repository to store your Plural configuration. This will contain the Helm charts, Terraform config, and Kubernetes manifests that Plural will autogenerate for you.
You have two options:
Run plural init
in any directory to let Plural initiate an OAuth workflow to create a Git repo for you.
Create a Git repo manually, clone it down, and run plural init
inside it.
Running plural init
will start a configuration wizard to configure your Git repo and cloud provider for use with Plural. You're now ready to install Kubecost on your Plural repo.
To find the console bundle name for your cloud provider, run:
Now, to add it to your workspace, run the install command. If you're on AWS, this is what the command would look like:
Plural's Kubecost distribution has support for AWS, GCP, and Azure, so feel free to pick whichever best fits your infrastructure.
The CLI will prompt you to choose whether you want to use Plural OIDC. OIDC allows you to log in to the applications you host on Plural with your login acting as an SSO provider.
To generate the configuration and deploy your infrastructure, run:
Note: Deploys will generally take 10-20 minutes, based on your cloud provider.
To make management of your installation as simple as possible, we recommend installing the Plural Console. The console provides tools for resource scaling, automated upgrades, dashboards tailored to your Kubecost installation, and log aggregation. This can be done using the exact same process as above, using AWS as an example:
Now, head over to kubecost.YOUR_SUBDOMAIN.onplural.sh
to access the Kubecost UI. If you set up a different subdomain for Kubecost during installation, make sure to use that instead.
To monitor and manage your Kubecost installation, head over to the Plural Console at console.YOUR_SUBDOMAIN.onplural.sh
.
To bring down your Plural installation of Kubecost at any time, run:
To bring your entire Plural deployment down, run:
Note: Only do this if you're absolutely sure you want to bring down all associated resources with this repository.
If you have any issues with installing Kubecost on Plural, feel free to join the Plural Discord Community and we can help you out.
If you'd like to request any new features for our Kubecost installation, feel free to open an issue or PR here.
To learn more about what you can do with Plural and more advanced uses of the platform, feel free to dive deeper into Plural's docs.
The following requirements are given:
Rancher with default monitoring
Use of an existing Prometheus and Grafana (Kubecost will be installed without Prometheus and Grafana)
Istio with gateway and sidecar for deployments
Kubecost v1.85.0+ includes changes to support cAdvisor metrics without the container_name
rewrite rule.
Istio is activated by editing the namespace. To do this, execute the command kubectl edit namespace kubecost
and insert the label istio-injection: enabled
After Istio has been activated, some adjustments must be made to the deployment with kubectl -n kubecost edit deployment kubecost-cost-analyzer
to allow communication within the namespace, for example so that the health check completes successfully. When editing the deployment, the following two annotations must be added:
An authorization policy governs access restrictions in namespaces and specifies how resources within a namespace are allowed to access it.
Peer authentication is used to set how traffic is tunneled to the Istio sidecar. In this example, enforcing TLS is disabled so that Prometheus can scrape the metrics from Kubecost (if this is not done, an HTTP 503 error is returned).
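A sketch of such a PeerAuthentication resource, assuming the kubecost namespace and PERMISSIVE mTLS so that plaintext scrapes are accepted:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: kubecost
  namespace: kubecost
spec:
  mtls:
    mode: PERMISSIVE   # allow both mTLS and plaintext so Prometheus can scrape Kubecost
```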
A destination rule is used to specify how traffic should be handled after routing to a service. In this example, TLS is disabled for connections from Kubecost to Prometheus and Grafana (namespace "cattle-monitoring-system").
A virtual service is used to direct data traffic specifically to individual services within the service mesh. The virtual service defines how the routing should run. A gateway is required for a virtual service.
After creating the virtual service, Kubecost should be accessible at the URL http(s)://${gateway}/kubecost/
.
This article is the primary reference for installing Kubecost in an air-gapped environment with a user-managed container registry.
This section details all required and optional Kubecost images. Optional images are used depending on the specific configuration needed.
Please substitute the appropriate version for prod-x.xx.x. Latest releases can be found here.
To find the exact images used for each Kubecost release, a command such as this can be used:
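For example, assuming the chart repo alias kubecost (substitute the chart version you plan to install):

```bash
helm template kubecost kubecost/cost-analyzer --version <chart-version> \
  | grep "image:" | sort -u
```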
The alpine/k8s image is not used in real deployments. It is only in the Helm chart for testing purposes.
Frontend: gcr.io/kubecost1/frontend
CostModel: gcr.io/kubecost1/cost-model
NetworkCosts: gcr.io/kubecost1/kubecost-network-costs (used for network-allocation)
Cluster controller: gcr.io/kubecost1/cluster-controller:v0.9.0 (used for write actions)
BusyBox: registry.hub.docker.com/library/busybox:latest (only for NFS)
quay.io/prometheus/prometheus
prom/node-exporter
quay.io/prometheus-operator/prometheus-config-reloader
grafana/grafana
kiwigrid/k8s-sidecar
thanosio/thanos
There are two options to configure asset prices in your on-premise Kubernetes environment:
Per-resource prices can be configured in a Helm values file (reference) or directly in the Kubecost Settings page. This allows you to directly supply the cost of a certain Kubernetes resources, such as a CPU month, a RAM Gb month, etc.
Use quotes if setting "0.00" for any item under kubecostProductConfigs.defaultModelPricing
. Failure to do so will result in the value(s) not being written to the Kubecost cost-model's PV (/var/configs/default.json).
When setting CPU and RAM monthly prices, the values will be broken down to the hourly rate for the total monthly price set under kubecostProductConfigs.defaultModelPricing. The values will adjust accordingly in /var/configs/default.json in the Kubecost cost-model container.
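A sketch of such a values file; the child key names shown here (CPU, RAM, storage) are assumptions, so confirm them against the chart's values reference:

```yaml
kubecostProductConfigs:
  defaultModelPricing:
    enabled: true
    CPU: "28.0"     # monthly price per vCPU
    RAM: "3.09"     # monthly price per GiB of RAM
    storage: "0.04" # monthly price per GiB of persistent storage
```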
This method allows each individual asset in your environment to have a unique price. This leverages the Kubecost custom CSV pipeline which is available on Enterprise plans.
Use a proxy for the AWS pricing API. You can set AWS_PRICING_URL to the address of your proxy via the extra env var.
Grafana Cloud is a composable observability platform, integrating metrics, traces and logs with Grafana. Customers can leverage the best open source observability software without the overhead of installing, maintaining, and scaling your observability stack.
This document will show you how to integrate the Grafana Cloud Prometheus metrics service with Kubecost.
You have access to a running Kubernetes cluster
You have created a Grafana Cloud account
You have permissions to create Grafana Cloud API keys
Install the Grafana Agent for Kubernetes on your cluster. On the existing K8s cluster that you intend to install Kubecost, run the following commands to install the Grafana Agent to scrape the metrics from Kubecost /metrics
endpoint. The script below installs the Grafana Agent with the necessary scraping configuration for Kubecost; you may want to add additional scrape configuration for your setup. Please remember to replace the following values with your actual Grafana Cloud values:
REPLACE-WITH-GRAFANA-PROM-REMOTE-WRITE-ENDPOINT
REPLACE-WITH-GRAFANA-PROM-REMOTE-WRITE-USERNAME
REPLACE-WITH-GRAFANA-PROM-REMOTE-WRITE-API-KEY
REPLACE-WITH-YOUR-CLUSTER-NAME
You can also verify if grafana-agent
is scraping data with the following command (optional):
To learn more about how to install and config Grafana agent as well as additional scrape configuration, please refer to Grafana Agent documentation or you can check Kubecost Prometheus scrape config at this GitHub repository.
Create a Kubernetes secret named dbsecret to allow Kubecost to query the metrics from Grafana Cloud Prometheus. Create two files in your working directory, called USERNAME and PASSWORD respectively.
Verify that you can run queries against your Grafana Cloud Prometheus query endpoint (optional):
Create a K8s secret named dbsecret:
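Assuming the USERNAME and PASSWORD files are in the current directory and Kubecost runs in the kubecost namespace:

```bash
kubectl create secret generic dbsecret -n kubecost \
  --from-file=USERNAME \
  --from-file=PASSWORD
```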
Verify if the credentials appear correctly (optional):
To set up recording rules in Grafana Cloud, download the Cortextool CLI utility. While they are optional, they offer improved performance.
After installing the tool, create a file called kubecost_rules.yaml with the following command:
Then, make sure you are in the same directory as your kubecost_rules.yaml, and load the rules using Cortextool. Replace the address with your Grafana Cloud’s Prometheus endpoint (remember to omit the /api/prom path from the endpoint URL).
Print out the rules to verify that they’ve been loaded correctly:
Install Kubecost on your K8s cluster with Grafana Cloud Prometheus query endpoint and dbsecret
you created in Step 2.
The process is complete. By now, you should have successfully completed the Kubecost integration with Grafana Cloud.
Optionally, you can also add our Kubecost Dashboard for Grafana Cloud to your organization to visualize your cloud costs in Grafana.
There are several considerations when disabling the Kubecost included Prometheus deployment. Kubecost strongly recommends installing Kubecost with the bundled Prometheus in most environments.
The Kubecost Prometheus deployment is optimized to not interfere with other observability instrumentation and by default only contains metrics that are useful to the Kubecost product. This results in 70-90% fewer metrics than a Prometheus deployment using default settings.
Additionally, if multi-cluster metric aggregation is required, Kubecost provides a turnkey solution that is highly tuned and simple to support using the included Prometheus deployment.
This feature is accessible to all users. However, please note that comprehensive support is provided with a paid support plan.
Kubecost requires the following minimum versions:
Prometheus: v2.18 (v2.13-2.17 supported with limited functionality)
kube-state-metrics: v1.6.0+
cAdvisor: kubelet v1.11.0+
node-exporter: v0.16+ (Optional)
If you have node-exporter and/or KSM running on your cluster, follow this step to disable the Kubecost included versions. Additional detail on KSM requirements.
Unlike the general recommendation above, we do recommend disabling Kubecost's bundled node-exporter and kube-state-metrics if you already have them running in your cluster.
This process is not recommended. Before continuing, review the Bring your own Prometheus section if you haven't already.
Pass the following parameters in your Helm install:
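A sketch of those parameters (the Prometheus address is a placeholder for your own endpoint):

```bash
helm upgrade --install kubecost kubecost/cost-analyzer -n kubecost \
  --set global.prometheus.enabled=false \
  --set global.prometheus.fqdn=http://prometheus-server.monitoring.svc:9090
```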
The FQDN can be a full path via https://prometheus-prod-us-central-x.grafana.net/api/prom/
if you use Grafana Cloud-managed Prometheus. Learn more in the Grafana Cloud Integration for Kubecost doc.
Have your Prometheus scrape the cost-model /metrics
endpoint. These metrics are needed for reporting accurate pricing data. Here is an example scrape config:
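The following is a minimal sketch; the service name kubecost-cost-analyzer.kubecost and metrics port 9003 assume a default install in the kubecost namespace:

```yaml
- job_name: kubecost
  honor_labels: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  dns_sd_configs:
    - names:
        - kubecost-cost-analyzer.kubecost
      type: 'A'
      port: 9003
```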
This config needs to be added to extraScrapeConfigs
in the Prometheus configuration. See the example extraScrapeConfigs.yaml.
By default, the Prometheus chart included with Kubecost (bundled-Prometheus) contains scrape configs optimized for Kubecost-required metrics. You need to add those scrape config jobs to your existing Prometheus setup to allow Kubecost to provide more accurate cost data and to optimize the resources required by your existing Prometheus.
You can find the full scrape configs of our bundled-Prometheus here. You can check Prometheus documentation for more information about the scrape config, or read this documentation if you are using Prometheus Operator.
This step is optional. If you do not set up Kubecost's CPU usage recording rule, Kubecost will fall back to a PromQL subquery which may put unnecessary load on your Prometheus.
Kubecost-bundled Prometheus includes a recording rule used to calculate CPU usage max, a critical component of the request right-sizing recommendation functionality. Add the recording rules to reduce query load here.
Alternatively, if your environment supports serviceMonitors
and prometheusRules
, pass these values to your Helm install:
To confirm this job is successfully scraped by Prometheus, you can view the Targets page in Prometheus and look for a job named kubecost
.
This step is optional, and only impacts certain efficiency metrics. View issue/556 for a description of what will be missing if this step is skipped.
You'll need to add the following relabel config to the job that scrapes the node exporter DaemonSet.
This does not override the source label. It creates a new label called kubernetes_node
and copies the pod's node name value into it.
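A sketch of that relabel config; it assumes the node exporter job uses Kubernetes pod service discovery, which exposes the __meta_kubernetes_pod_node_name label:

```yaml
relabel_configs:
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node
```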
In order to distinguish between multiple clusters, Kubecost needs to know which label is used in Prometheus to identify the cluster name. Set this with .Values.kubecostModel.promClusterIDLabel. The default cluster label is cluster_id
, though many environments use the key of cluster
.
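For example, if your Prometheus labels series with cluster rather than cluster_id:

```bash
--set kubecostModel.promClusterIDLabel=cluster
```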
By default, metric retention is 91 days, however the retention of data can be further increased with a configurable value for a property etlDailyStoreDurationDays
. You can find this value here.
Increasing the default etlDailyStoreDurationDays
value will naturally result in greater memory usage. At higher values, this can cause errors when trying to display this information in the Kubecost UI. You can remedy this by increasing the Step size when using the Allocations dashboard.
The Diagnostics page (Settings > View Full Diagnostics) provides diagnostic info on your integration. Scroll down to Prometheus Status to verify that your configuration is successful.
Below you can find solutions to common Prometheus configuration problems. View the Kubecost Diagnostics doc for more information.
This problem is evidenced by the pod error message No valid prometheus config file at ... and by the init pods hanging. We recommend running curl <your_prometheus_url>/api/v1/status/config
from a pod in the cluster to confirm that your Prometheus config is returned. Here is an example, but this needs to be updated based on your pod name and Prometheus address:
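A sketch of that check, run from any pod that has curl available (the pod name and Prometheus address are placeholders):

```bash
kubectl exec -n kubecost <pod-name> -- \
  curl -s http://<your_prometheus_url>/api/v1/status/config
```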
In the above example, <your_prometheus_url> may include a port number and/or namespace, example: http://prometheus-operator-kube-p-prometheus.monitoring:9090/api/v1/status/config
If the config file is not returned, this is an indication that an incorrect Prometheus address has been provided. If a config file is returned from one pod in the cluster but not the Kubecost pod, then the Kubecost pod likely has its access restricted by a network policy, service mesh, etc.
Network policies, Mesh networks, or other security related tooling can block network traffic between Prometheus and Kubecost which will result in the Kubecost scrape target state as being down in the Prometheus targets UI. To assist in troubleshooting this type of error you can use the curl
command from within the cost-analyzer container to try and reach the Prometheus target. Note that the namespace and deployment names in this command may need to be updated to match your environment; this example uses the default Kubecost Prometheus deployment.
When successful, this command should return all of the metrics that Kubecost uses. Failures may be indicative of the network traffic being blocked.
Ensure Prometheus isn't being CPU throttled due to a low resource request.
Review the Dependency Requirements section above
Visit Prometheus Targets page (screenshot above)
Make sure that honor_labels is enabled
Ensure results are not null for both queries below.
Make sure Prometheus is scraping Kubecost search metrics for: node_total_hourly_cost
Ensure kube-state-metrics are available: kube_node_status_capacity
For both queries, verify nodes are returned. A successful response should look like:
An error will look like:
Ensure that all clusters and nodes have values; output should be similar to the above Single Cluster Tests.
Make sure Prometheus is scraping Kubecost search metrics for: node_total_hourly_cost
On macOS, change date -d '1 day ago'
to date -v '-1d'
Ensure kube-state-metrics are available: kube_node_status_capacity
For both queries, verify nodes are returned. A successful response should look like:
An error will look like:
Kubecost leverages the open-source Prometheus project as a time series database and post-processes the data in Prometheus to perform cost allocation calculations and provide optimization insights for your Kubernetes clusters such as Amazon Elastic Kubernetes Service (Amazon EKS). Prometheus is a single machine statically-resourced container, so depending on your cluster size or when your cluster scales out, it could exceed the scraping capabilities of a single Prometheus server. In collaboration with Amazon Web Services (AWS), Kubecost integrates with Amazon Managed Service for Prometheus (AMP), a managed Prometheus-compatible monitoring service, to enable the customer to easily monitor Kubernetes cost at scale.
The architecture of this integration is similar to Amazon EKS cost monitoring with Kubecost, which is described in the previous blog post, with some enhancements as follows:
In this integration, an additional AWS SigV4 container is added to the cost-analyzer pod, acting as a proxy to help query metrics from Amazon Managed Service for Prometheus using the AWS SigV4 signing process. It enables passwordless authentication to reduce the risk of exposing your AWS credentials.
When the Amazon Managed Service for Prometheus integration is enabled, the bundled Prometheus server in the Kubecost Helm Chart is configured in the remote_write mode. The bundled Prometheus server sends the collected metrics to Amazon Managed Service for Prometheus using the AWS SigV4 signing process. All metrics and data are stored in Amazon Managed Service for Prometheus, and Kubecost queries the metrics directly from Amazon Managed Service for Prometheus instead of the bundled Prometheus. This means customers do not need to worry about maintaining and scaling the local Prometheus instance.
There are two architectures you can deploy:
The Quick-Start architecture supports a small multi-cluster setup of up to 100 clusters.
The Federated architecture supports a large multi-cluster setup for over 100 clusters.
The infrastructure can manage up to 100 clusters. The following architecture diagram illustrates the small-scale infrastructure setup:
To support the large-scale infrastructure of over 100 clusters, Kubecost leverages a Federated ETL architecture. In addition to Amazon Prometheus Workspace, Kubecost stores its extract, transform, and load (ETL) data in a central S3 bucket. Kubecost's ETL data is a computed cache based on Prometheus's metrics, from which users can perform all possible Kubecost queries. By storing the ETL data on an S3 bucket, this integration offers resiliency to your cost allocation data, improves the performance and enables high availability architecture for your Kubecost setup.
The following architecture diagram illustrates the large-scale infrastructure setup:
You have an existing AWS account.
You have IAM credentials to create Amazon Managed Service for Prometheus and IAM roles programmatically.
You have an existing Amazon EKS cluster with OIDC enabled.
Your Amazon EKS clusters have the Amazon EBS CSI driver installed.
The example output should be in this format:
The Amazon Managed Service for Prometheus workspace should be created in a few seconds. Run the following command to get the workspace ID:
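A sketch of that lookup, assuming the workspace was created with the alias kubecost-amp:

```bash
aws amp list-workspaces --alias kubecost-amp \
  --query "workspaces[0].workspaceId" --output text
```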
Run the following command to set environment variables for integrating Kubecost with Amazon Managed Service for Prometheus:
Note: You can ignore Step 2 for the small-scale infrastructure setup.
a. Create Object store S3 bucket to store Kubecost ETL metrics. Run the following command in your workspace:
b. Create IAM Policy to grant access to the S3 bucket. The following policy is for demo purposes only. You may need to consult your security team and make appropriate changes depending on your organization's requirements.
Run the following command in your workspace:
c. Create Kubernetes secret to allow Kubecost to write ETL files to the S3 bucket. Run the following command in your workspace:
The following commands help to automate these tasks:
Create an IAM role with the AWS-managed IAM policy and trusted policy for the following service accounts: kubecost-cost-analyzer-amp
, kubecost-prometheus-server-amp
.
Modify current K8s service accounts with annotation to attach a new IAM role.
Run the following command in your workspace:
For more information, you can check AWS documentation at IAM roles for service accounts and learn more about Amazon Managed Service for Prometheus managed policy at Identity-based policy examples for Amazon Managed Service for Prometheus
Run the following command to create a file called config-values.yaml, which contains the defaults that Kubecost will use for connecting to your Amazon Managed Service for Prometheus workspace.
Run this command to install Kubecost and integrate it with the Amazon Managed Service for Prometheus workspace as the primary:
These installation steps are similar to those for a primary cluster setup, except you do not need to follow the steps in the section "Create Amazon Managed Service for Prometheus workspace", and you need to update the environment variables below to match your additional clusters. Please note that the AMP_WORKSPACE_ID
and KC_BUCKET
are the same as the primary cluster.
Run this command to install Kubecost and integrate it with the Amazon Managed Service for Prometheus workspace as the additional cluster:
Your Kubecost setup is now writing and collecting data from AMP. Data should be ready for viewing within 15 minutes.
To verify that the integration is set up, go to Settings in the Kubecost UI, and check the Prometheus Status section.
Read our Custom Prometheus integration troubleshooting guide if you run into any errors while setting up the integration. For support from AWS, you can submit a support request through your existing AWS support contract.
You can add these recording rules to improve the performance. Recording rules allow you to precompute frequently needed or computationally expensive expressions and save their results as a new set of time series. Querying the precomputed result is often much faster than running the original expression every time it is needed. Follow these instructions to add the following rules:
The below queries must return data for Kubecost to calculate costs correctly.
For the queries below to work, set the environment variables:
Verify connection to AMP and that the metric for container_memory_working_set_bytes
is available:
If you have set kubecostModel.promClusterIDLabel
, you will need to change the query (CLUSTER_ID
) to match the label (typically cluster
or alpha_eksctl_io_cluster_name
).
The output should contain a JSON entry similar to the following.
The value of cluster_id
should match the value of kubecostProductConfigs.clusterName
.
Verify Kubecost metrics are available in AMP:
The output should contain a JSON entry similar to:
If the above queries fail, check the following:
Check logs of the sigv4proxy
container (may be the Kubecost deployment or Prometheus Server deployment depending on your setup):
In a working sigv4proxy
, there will be very few logs.
Correctly working log output:
Check logs in the cost-model container for Prometheus connection issues:
Example errors:
Rafay is a SaaS-first Kubernetes Operations Platform (KOP) with enterprise-class scalability, zero-trust security and interoperability for managing applications across public clouds, data centers & edge.
See Rafay documentation to learn more about the platform and how to use it.
This document will walk you through installing Kubecost on a cluster that has been provisioned or imported using the Rafay controller. The steps below describe how to create and use a custom cluster blueprint via the Rafay Web Console. The entire workflow can also be fully automated and embedded into an automation pipeline using the RCTL CLI utility or Rafay REST APIs.
You have already provisioned or imported one or more Kubernetes clusters using the Rafay controller.
Under Integrations:
Select Repositories and create a new repository named kubecost
of type Helm.
Select Create.
Enter the endpoint value of https://kubecost.github.io/cost-analyzer/
.
Select Save.
You'll need to override the default values.yaml file. Create a new file called kubecost-custom-values.yaml with the following content:
Login to the Rafay Web Console and navigate to your Project as an Org Admin or Infrastructure Admin.
Under Infrastructure, select Namespaces and create a new namespace called kubecost
, and select type Wizard.
Select Save & Go to Placement.
Select the cluster(s) that the namespace will be added to. Select Save & Go To Publish.
Select Publish to publish the namespace to the selected cluster(s).
Once the namespace has been published, select Exit.
Under Infrastructure, select Clusters.
Select the kubectl button on the cluster to open a virtual terminal.
Verify that the kubecost
namespace has been created by running the following command:
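A simple check from the virtual terminal:

```bash
kubectl get namespace kubecost
```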
From the Web Console:
Select Add-ons and Create a new add-on called kubecost.
Select Bring your own.
Select Helm 3 for type.
Select Pull files from repository.
Select Helm for the repository type.
Select kubecost
for the namespace.
Select Select.
Create a new version of the add-on.
Select New Version.
Provide a version name such as v1
.
Select kubecost
for the repository.
Enter cost-analyzer
for the chart name.
Upload the kubecost-custom-values.yaml
file that was previously created.
Select Save Changes.
Once you've created the Kubecost add-on, use it in assembling a custom cluster blueprint. You can add other add-ons to the same custom blueprint.
Under Infrastructure, select Blueprints.
Create a new blueprint and give it a name such as kubecost
.
Select Save.
Create a new version of the blueprint.
Select New Version.
Provide a version name such as v1
.
Under Add-Ons, select the kubecost
Add-on and the version that was previously created.
Select Save Changes.
You may now apply this custom blueprint to a cluster.
Select Options for the target cluster in the Web Console.
Select Update Blueprint and select the kubecost
blueprint and version you created previously.
Select Save and Publish.
This will start the deployment of the add-ons configured in the kubecost
blueprint to the targeted cluster. The blueprint sync process can take a few minutes. Once complete, the cluster will display the current cluster blueprint details and whether the sync was successful or not.
You can optionally verify whether the correct resources have been created on the cluster. Select the kubectl
button on the cluster to open a virtual terminal.
Then, verify the pods in the kubecost
namespace. Run kubectl get pod -n kubecost
, and check that the output is similar to the example below.
In order to access the Kubecost UI, you'll need to enable access to the frontend application using port-forward. To do this, download and use the Kubeconfig
with the KubeCTL CLI (../../accessproxy/kubectl_cli/
).
You can now access the Kubecost UI by visiting http://localhost:9090
in your browser.
You have now successfully created a custom cluster blueprint with the kubecost
add-on and applied to a cluster. Use this blueprint on as many clusters as you require.
You can find Rafay's documentation on Kubecost as well as guides for how to create or import a cluster using the Rafay controller on the Rafay Product Documentation site.
Using an existing Grafana deployment can be accomplished through one of two options:
Linking to an external Grafana directly
Deploying with Grafana sidecar enabled
After installing Kubecost, select Settings from the left navigation and update Grafana Address to a URL that is visible to users accessing Grafana dashboards. This variable can alternatively be passed at the time you deploy Kubecost via the kubecostProductConfigs.grafanaURL
parameter in values.yaml. Next, import Kubecost Grafana dashboards as JSON from this folder.
Passing the Grafana parameters below in your values.yaml will install ConfigMaps for Grafana dashboards that will be picked up by the Grafana sidecar if you have Grafana with the dashboard sidecar already installed.
Ensure that the following flags are set in your Operator deployment:
sidecar.dashboards.enabled=true
sidecar.dashboards.searchNamespace
isn't restrictive. Use ALL
if Kubecost runs in another namespace.
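As a rough sketch of how these settings typically appear when Grafana is deployed via its Helm chart (if you use the Grafana Operator, set the equivalent fields on its custom resource instead):

```yaml
grafana:
  sidecar:
    dashboards:
      enabled: true
      searchNamespace: ALL   # use ALL if Kubecost runs in another namespace
```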
The Kubecost UI cannot link to the Grafana dashboards unless kubecostProductConfigs.grafanaURL
is set, either via the Helm chart, or via the Settings page, as described in Option 1.
When using Kubecost on a custom ingress path, you must add this path to the Grafana root_url
:
If you choose to disable Grafana, set the following Helm values to ensure successful pod startup:
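A sketch of those values, assuming the chart's global Grafana toggles:

```bash
--set global.grafana.enabled=false \
--set global.grafana.proxy=false
```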
Kubecost supports deploying to Red Hat OpenShift (OCP) and includes options and features which assist in getting Kubecost running quickly and easily with OpenShift-specific resources.
There are two main options to deploy Kubecost on OpenShift.
More details and instructions on both deployment options are covered in the sections below.
A standard deployment of Kubecost to OpenShift is no different from deployments to other platforms with the exception of additional settings which may be required to successfully deploy to OpenShift.
Kubecost is installed with Cost Analyzer and Prometheus as a time-series database. Data is gathered by the Prometheus instance bundled with Kubecost. Kubecost then pushes and queries metrics to and from Prometheus.
The standard deployment is illustrated in the following diagram.
An existing OpenShift or OpenShift-compatible cluster (ex., OKD).
Access to the cluster to create a new project and deploy new workloads.
helm
CLI installed locally.
Add the Kubecost Helm chart repository and scan for new charts.
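For example, using the chart endpoint referenced earlier in this guide:

```bash
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
```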
Install Kubecost using OpenShift specific values. Note that the below command fetches the OpenShift values from the development branch which may not reflect the state of the release which was just installed. We recommend using the corresponding values file from the chart release.
Because OpenShift disallows defining certain fields in a pod's securityContext
configuration, values specific to OpenShift must be used. The necessary values have already been defined in the OpenShift values file but may be customized to your specific needs.
If you want to install Kubecost with your desired cluster name, provide the following values to either your values override file or via the --set
command. Remember to replace the cluster name/id with the value you wish to use for this installation.
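For example, using the cluster-name keys referenced elsewhere in this guide (replace the name with your own):

```yaml
kubecostProductConfigs:
  clusterName: my-openshift-cluster
prometheus:
  server:
    global:
      external_labels:
        cluster_id: my-openshift-cluster
```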
Other OpenShift-specific values include the ability to deploy a Route and SecurityContextConstraints for optional components requiring more privileges such as Kubecost network costs and Prometheus node exporter. To view all the latest OpenShift-specific values and their use, please see the OpenShift values file.
If you have not opted to do so during installation, it may be necessary to create a Route to the service kubecost-cost-analyzer
on port 9090
of the kubecost
project (if using default values). For more information on Routes, see the OpenShift documentation here.
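A sketch of creating such a Route with the oc CLI, assuming the default service name and project:

```bash
oc -n kubecost expose service kubecost-cost-analyzer --port=9090
```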
After installation, wait for all pods to be ready. Kubecost will begin collecting data and may take up to 15 minutes for the UI to reflect the resources in the local cluster.
Kubecost offers a Red Hat community operator which can be found in the Operator Hub catalog of the OpenShift web console. When using this deployment method, the operator is installed and a Kubernetes Custom Resource is created which then triggers the operator to deploy the Helm chart. The chart deployed by the community operator is the same chart which is referenced in the standard deployment.
An existing OpenShift cluster.
Access to the cluster to create a new project and deploy new workloads.
Log in to your OCP cluster web console and select Operators > OperatorHub > then enter "Kubecost" in the search box.
Click the Install button to be taken to the operator installation page.
On the installation page, select the update approval method and then click Install.
Once the operator has been installed, create a namespace in which to deploy a Kubecost installation.
You can also select Operators > Installed Operators to review the details as shown below.
Once the namespace has been created, create the CostAnalyzer Custom Resource (CR) with the desired values for your installation. The CostAnalyzer CR represents the total Helm values used to deploy Kubecost and any of its components. This may either be created in the OperatorHub portal or via the oc
CLI. The default CostAnalyzer sample provided is pre-configured for a basic installation of Kubecost.
To create the CostAnalyzer resource from OperatorHub, from the installed Kubecost operator page, click on the CostAnalyzer tab and click the Create CostAnalyzer button.
Click on the radio button YAML view to see a full example of a CostAnalyzer CR. Here, you can either create a CostAnalyzer directly or download the Custom Resource for later use.
Change the namespace
field to kubecost
if this was the name of your namespace created previously.
Click the Create button to create the CostAnalyzer based on the current YAML.
After about a minute, Kubecost should be up and running based upon the configuration defined in the CostAnalyzer CR. You can get details on this installation by clicking on the instance which was just deployed.
If you have not opted to do so during installation, it may be necessary to create a Route to the service kubecost-cost-analyzer
on port 9090
of the kubecost
project (if using default values). For more information on Routes, see the OpenShift documentation here.
As of v1.67, the persistent volume attached to Kubecost's primary pod (cost-analyzer) contains Kubecost's computed ETL cache as well as product configuration data. While it's technically optional (because all configurations can be set via ConfigMap), it dramatically reduces the load against your Prometheus/Thanos installations on pod restart/redeploy. For this reason, it's strongly encouraged on larger clusters.
If you are creating a new installation of Kubecost:
We recommend that you back Kubecost with at least a 32GB disk. This is the default as of 1.72.0.
If you are upgrading an existing version of Kubecost:
If your provisioner supports volume expansion, we will automatically resize you to a 32GB disk when upgrading to 1.72.0.
If your provisioner does not support volume expansion:
If all your configs are supplied via values.yaml in Helm or via ConfigMap and have not been added from the front end, you can safely delete the PV and upgrade.
We suggest you delete the old PV, then run Kubecost with a 32GB disk. This is the default in 1.72.0
If you cannot safely delete the PV storing your configs and configure them on a new PV:
If you are not on a regional cluster, provision a second PV by setting persistentVolume.dbPVEnabled=true
If you are on a regional cluster, provision a second PV using a topology-aware storage class (). You can set this disk’s storage class by setting persistentVolume.dbStorageClass=your-topology-aware-storage-class-name
If you're using just one PV and still see issues with Kubecost being rescheduled on zones outside of your disk, consider using a topology-aware storage class. You can set the Kubecost disk’s storage class by setting persistentVolume.storageClass
to your topology-aware storage class name.
Kubecost leverages the open-source Prometheus project as a time series database and post-processes the data in Prometheus to perform cost allocation calculations and provide optimization insights for your Kubernetes clusters. Prometheus is a single machine statically-resourced container, so depending on your cluster size or when your cluster scales out, your cluster could exceed the scraping capabilities of a single Prometheus server. In this doc, you will learn how Kubecost integrates with , a managed Prometheus-compatible monitoring service, to enable the customer to monitor Kubernetes costs at scale easily.
For this integration, GMP must be enabled for your GKE cluster with managed collection. Next, Kubecost is installed in your GKE cluster and leverages the GMP Prometheus binary to seamlessly ingest metrics into the GMP database. In this setup, the Kubecost deployment also automatically creates a Prometheus proxy that allows Kubecost to query the metrics from the GMP database for cost allocation calculation.
This integration is currently in beta.
You have a GCP account/subscription.
You have permission to manage GKE clusters and GCP monitoring services.
You have an existing GKE cluster with GMP enabled. You can learn more .
You can use the following command to install Kubecost on your GKE cluster and integrate with GMP:
In this installation command, these additional flags are added to have Kubecost work with GMP:
prometheus.server.image.repository
and prometheus.server.image.tag
replace the standard Prometheus image with GMP specific image.
global.gmp.enabled
and global.gmp.gmpProxy.projectId
are for enabling the GMP integration.
prometheus.server.global.external_labels.cluster_id
and kubecostProductConfigs.clusterName
helps to set the name for your Kubecost setup.
Your Kubecost setup now writes and collects data from GMP. Data should be ready for viewing within 15 minutes.
Run the following command to enable port-forwarding to expose the Kubecost dashboard:
To verify that the integration is set up, go to Settings in the Kubecost UI, and check the Prometheus Status section.
The below queries must return data for Kubecost to calculate costs correctly. For the queries to work, set the environment variables:
Verify connection to GMP and that the metric for container_memory_working_set_bytes
is available:
If you have set kubecostModel.promClusterIDLabel
in the Helm chart, you will need to change the query (CLUSTER_ID
) to match the label.
Verify Kubecost metrics are available in GMP:
You should receive an output similar to:
If id
returns as a blank value, you can set the following Helm value to force set cluster
as the Prometheus cluster ID label:
If the above queries fail, check the following:
Check logs of the sigv4proxy
container (may be the Kubecost deployment or Prometheus Server deployment depending on your setup):
In a working sigv4proxy
, there will be very few logs.
Correctly working log output:
Check logs in the cost-model
container for Prometheus connection issues:
Example errors:
In the standard deployment of , Kubecost is deployed with a bundled Prometheus instance to collect and store metrics of your Kubernetes cluster. Kubecost also provides the flexibility to connect with your time series database or storage. is an open-source, horizontally scalable, highly available, multi-tenant TSDB for long-term storage for Prometheus.
This document will show you how to integrate Grafana Mimir with Kubecost for long-term metrics retention. In this setup, you need to use the Grafana Agent to collect metrics from Kubecost and your Kubernetes cluster. The metrics will then be remote-written to your existing Grafana Mimir setup.
You have access to a running Kubernetes cluster
You have an existing Grafana Mimir setup
Install the Grafana Agent for Kubernetes on your cluster. On the existing K8s cluster that you intend to install Kubecost, run the following commands to install the Grafana Agent to scrape the metrics from Kubecost /metrics
endpoint. The script below installs the Grafana Agent with the necessary scraping configuration for Kubecost; you may want to add additional scrape configuration for your setup.
You can also verify if grafana-agent
is scraping data with the following command (optional):
Run the following command to deploy Kubecost. Please remember to update the environment variables values with your Mimir setup information.
The process is complete. By now, you should have successfully completed the Kubecost integration with your Grafana Mimir setup.
You can find additional configurations at our main file.
From your , you can run the following query to verify whether Kubecost metrics are being collected:
Additionally, read our if you run into any other errors while setting up the integration. For support from GCP, you can submit a support request at the .
To learn more about how to install and configure the Grafana agent, as well as additional scrape configuration, please refer to documentation, or you can view the Kubecost Prometheus scrape config at this .