Actions is currently in beta. Please read the documentation carefully.
Actions is only available with a Kubecost Enterprise plan.
The Actions page is where you can create scheduled savings actions that Kubecost will execute for you. The Actions page supports creating actions for multiple turndown and right-sizing features.
Actions can only be applied to your primary cluster. To use Actions on a secondary cluster, you must manually switch to that cluster's Kubecost frontend.
The Actions page will exist inside the Savings folder in the left navigation, but must first be enabled before it appears. The two steps below which enable Kubecost Actions do not need to be performed sequentially as written.
Because the Actions page is currently a beta feature, it does not appear as part of Kubecost's base functionality. To enable experimental features, select Settings from the left navigation, then toggle on the Enable experimental features switch. Select Save at the bottom of the Settings page to confirm your changes. The Actions page will now appear in your left navigation, but you will not be able to perform any actions until you've enabled the Cluster Controller (see below).
Some features included in Kubecost Actions are only available in GKE/EKS environments. See the Cluster Controller doc for more clarity on which features you will have access to after enabling the Cluster Controller.
On the Actions page, select Create Action in the top right. The Create New Action window opens.
You will have the option to perform one of several available Actions:
Cluster Turndown: Schedule clusters to spin down when unused and back up when needed
Request Sizing: Ensure your containers aren't over-provisioned
Cluster Sizing: Configure your cluster in the most cost-effective way
Namespace Turndown: Schedule unused workloads to spin down
Guided Sizing: Continuous container and node right-sizing
Selecting one of these Actions will take you off the Actions page to an Action-specific page which will allow you to perform the action in moments.
If the Cluster Controller was not properly enabled, the Create New Action window will inform you and limit functionality until the Cluster Controller has been successfully enabled.
Cluster Turndown is a scheduling feature that allows you to reduce costs for clusters when they are not actively being used, without spinning them down completely. This is done by temporarily removing all existing nodes except for master nodes. The Cluster Turndown page allows you to create a schedule for when to turn your cluster down and up again.
Selecting Cluster Turndown from the Create new action window will take you to the Cluster Turndown page. The page should display available clusters for turndown. Begin by selecting Create Schedule next to the cluster you wish to turn down. Select what date and time you wish to turn down the cluster, and what date and time you wish to turn it back up. Select Apply to finalize.
You can delete an existing turndown schedule by selecting the trash can icon.
Learn more about cluster turndown's advanced functionality here.
See the existing documentation on Automatic Request Right-Sizing to learn more about this feature. If you have successfully enabled the Cluster Controller, you can skip the Setup section of that article.
Cluster Sizing will provide right-sizing recommendations for your cluster by determining the cluster's needs based on the type of work running and its resource requirements. You will receive a simple (uses one node type) and a complex (uses two or more node types) recommendation.
Kubecost may hide the complex recommendation when it is more expensive than the simple recommendation, and present a single recommendation instead.
Visiting the Cluster Sizing Recommendations page from the Create New Action window will immediately prompt you with a suggested recommendation that will replace your current node pools with the displayed node pools. You can select Adopt to immediately resize, or select Cancel if you want to continue exploring.
Learn more about cluster right-sizing functionality here.
Namespace turndown allows you to delete your abandoned workloads. Instead of requiring the user to manually size down or delete their unused workloads, Kubecost can delete namespaces full of idle pods on demand or on a continual basis. This can be helpful for routine cleanup of neglected resources. Namespace turndown is supported on all cluster types.
Selecting Namespace Turndown from the Create new action window will open the Namespace Turndown page.
Begin by providing a name for your Action in the Job Name field. For the schedule, provide a cron string that determines when the turndown occurs (leave this field as 0 0 * * * to perform turndown every night at midnight).
For schedule type, select Scheduled or Smart from the dropdown.
Scheduled turndown will delete all non-ignored namespaces.
Smart turndown will confirm that all workloads in the namespace are idle before deleting.
Then you can provide optional values for the following fields:
Ignore Targets: Filter out namespaces you don't want turned down. Supports wildcard filtering: by ending your filter with *, you can filter for multiple namespaces which include that filter. For example, entering kube* will prevent any namespace featuring kube from being turned down. Namespace turndown will always ignore namespaces named kube-*, the default namespace, and the namespace the Cluster Controller is enabled on.
Ignore labels: Filter out key-value labels that you don't want turned down.
Select Create Schedule to finalize.
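The trailing-asterisk matching described under Ignore Targets can be sketched in a few lines of Python. This is an illustrative reimplementation, not Kubecost's code; the prefix semantics and built-in ignore list are inferred from the description above (the Cluster Controller's own namespace is omitted for brevity):

```python
def is_ignored(namespace, ignore_targets):
    """Return True if the namespace matches an ignore target.

    A target ending in '*' is treated as a prefix wildcard (e.g. 'kube*'
    matches 'kube-system'); any other target must match exactly.
    """
    # Namespaces turndown always ignores, per the description above.
    builtin = ["kube-*", "default"]
    for target in list(ignore_targets) + builtin:
        if target.endswith("*"):
            if namespace.startswith(target[:-1]):
                return True
        elif namespace == target:
            return True
    return False
```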
Guided Kubernetes Sizing provides a one-click or continuous right-sizing solution in two steps, request sizing and then cluster sizing. These implementations function exactly like Kubecost's existing container and cluster right-sizing features.
In the first collapsible tab, you can configure your container request sizing.
The Auto resizing toggle switch will determine whether you want to perform a one-time resize, or a continuous auto-resize. Default is one-time (off).
Frequency: Only available when Auto resizing is toggled on. Determines how frequently right-sizing will occur. Options are daily, weekly, monthly, or quarterly.
Start Time: Only available when Auto resizing is toggled on. Determines the day, and time of day, that auto-resizing will begin occurring. Will default to the current date and time if left blank.
Select Start One-Time Resize/Start Auto-Resizing Now to finalize.
In the second collapsible tab, you can configure continuous cluster sizing.
Architecture: Supports x86 or ARM.
Target Utilization: The fraction of node resources that should be in use, leaving headroom for variable or increasing resource consumption. Default is 0.8.
Frequency: Determines how frequently right-sizing will occur. Options are daily, weekly, monthly, or quarterly.
Start Time: Determines the day, and time of day, that auto-resizing will begin occurring. Will default to the current date and time if left blank.
Select Enable Auto-Resizing Now to finalize.
Once you have successfully created an Action, you will see it on the Actions page under Scheduled Actions. Here you will be able to view a Schedule, the Next Run, Affected Workloads, and the Status. You can select Details to view more information about a specific Action, or delete the scheduled Action by selecting the trash can icon.
This feature is in beta. Please read the documentation carefully.
Kubecost can automatically implement its recommendations for container resource requests if you have the Cluster Controller component enabled. Using container request right-sizing (RRS) allows you to instantly optimize resource allocation across your entire cluster. You can easily eliminate resource over-allocation in your cluster, which paves the way for vast savings via cluster right-sizing and other optimizations.
There are no restrictions to receive container RRS recommendations.
To adopt these recommendations, you must enable the Cluster Controller on that cluster. In order for Kubecost to apply a recommendation, it needs write access to your cluster, which is enabled with the Cluster Controller.
Select Savings in the left navigation, then select Right-size your container requests. The Request right-sizing recommendations page opens.
Select Customize to modify the right-sizing settings. Your customization settings will tell Kubecost how to calculate its recommendations, so make sure it properly represents your environment and activity:
Window: Duration of deployment activity Kubecost should observe
Profile: Select from Development, Production, or High Availability, which come with preconfigured values for the CPU/RAM target utilization fields. Selecting Custom will allow you to manually configure these fields.
CPU/RAM recommendation algorithm: Always configured to Max.
CPU/RAM target utilization: Refers to the percentage of used resources over total resources available.
Add Filters: Optional configuration to limit the deployments which will have right-sizing recommendations applied. This will provide greater flexibility in optimizing your environment. Ensure you select the plus icon next to the filter value text box to add the filter. Multiple filters can be added.
When finished, select Save.
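As a rough sketch of how the Max algorithm and target utilization combine, a request recommendation can be thought of as the peak observed usage divided by the target utilization. Kubecost's exact formula may differ; this is illustrative only:

```python
def recommended_request(max_observed_usage, target_utilization):
    """Size the request so that peak observed usage lands at the
    target utilization level (illustrative 'Max' style sizing)."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target utilization must be in (0, 1]")
    return max_observed_usage / target_utilization
```

Under this reading, a container that peaked at 200 CPU millicores with an 80% target utilization would be recommended a 250m request.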
Your configured recommendations can also be downloaded as a CSV file by selecting the three dots button > Download CSV.
There are several ways to adopt Kubecost's container RRS recommendations, depending on how frequently you wish to utilize this feature for your container requests.
To apply RRS as you configured in one instance, select Resize Requests Now > Yes, apply the recommendation.
Also referred to as continuous container RRS, autoscaling allows you to configure a schedule to routinely apply RRS to your deployments. You can configure this by selecting Enable Autoscaling, selecting your Start Date and schedule, then confirming with Apply.
Both one-click and continuous container RRS can be configured via Savings Actions. On the Actions page, select Create Action, then select either:
Request Sizing: Will open the Container RRS page with the schedule window open to configure and apply.
Guided Sizing: Will open the Guided Sizing page and allow you to apply both one-click RRS and continuous cluster sizing.
Spot Commander is a Savings feature which identifies workloads for which it is available and cost-effective to switch to Spot nodes, resizing the cluster in the process. Spot-readiness is determined through a checklist which analyzes each workload and assesses the minimal cost required. It also generates CLI commands to help you implement the recommendation.
The recommended Spot cluster configuration uses all of the data available to Kubecost to compute a "resizing" of your cluster's nodes into a set of on-demand (standard) nodes O and a set of spot (preemptible) nodes S. This configuration is produced by applying a scheduling heuristic to the usage data for all of your workloads. This recommendation offers a more accurate picture of the savings possible from implementing spot nodes because node costs make up the cost of a cluster; once O and S have been determined, the savings are the current cost of your nodes minus the estimated cost of O and S.
The recommended configuration assumes that all workloads considered spot-ready by the Spot Checklist will be schedulable on spot nodes and that workloads considered not spot-ready will only be schedulable on on-demand nodes. Kubernetes has taints and tolerations for achieving this behavior, and cloud providers usually publish guides for using spot nodes with taints and tolerations in your managed cluster.
Different cloud providers have different guarantees on shutdown windows and automatic draining of spot nodes that are about to be removed. Consult your provider’s documentation before introducing spot nodes to your cluster.
Kubecost marking a workload as spot ready is not a guarantee. A domain expert should always carefully consider the workload before approving it to run on spot nodes.
Determining O and S is achieved by first partitioning all workloads on the cluster (based on the results of the Checklist) into two sets: spot-ready workloads R and non-spot-ready workloads N. Kubecost consults its maximum resource usage data (in each Allocation, Kubecost records the maximum CPU and RAM used in the window) and determines the following for each of R and N:
The maximum CPU used by any workload
The maximum RAM used by any workload
The total CPU (sum of all individual maximums) required by non-DaemonSet workloads
The total RAM (sum of all individual maximums) required by non-DaemonSet workloads
The total CPU (sum of all individual maximums) required by DaemonSet workloads
The total RAM (sum of all individual maximums) required by DaemonSet workloads
Kubecost uses this data with a configurable target utilization (e.g., 90%) for R and N to create O and S:
Every node in O and S must reserve 100% - target utilization (e.g., 100% - 90% = 10%) of its CPU and RAM
Every node in O must be able to schedule the DaemonSet requirements in R and N
Every node in S must be able to schedule the DaemonSet requirements in R
With the remaining resources:
The largest CPU requirement in N must be schedulable on a node in O
The largest RAM requirement in N must be schedulable on a node in O
The largest CPU requirement in R must be schedulable on a node in S
The largest RAM requirement in R must be schedulable on a node in S
The total CPU requirements of N must be satisfiable by the total CPU available in O
The total RAM requirements of N must be satisfiable by the total RAM available in O
The total CPU requirements of R must be satisfiable by the total CPU available in S
The total RAM requirements of R must be satisfiable by the total RAM available in S
It is recommended to set the target utilization at or below 95% to allow resources for the operating system and the kubelet.
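The constraints above can be sketched as a feasibility check, one resource dimension (CPU or RAM) at a time. This is a simplified illustration of the listed rules, not Kubecost's actual recommendation solver; for the on-demand pool the caller passes the DaemonSet requirements of both R and N, while the spot pool only receives those of R:

```python
def node_fits(capacity, target_util, daemonset_req, largest_req):
    """A node reserves (1 - target utilization) of its capacity, must
    schedule the DaemonSet requirements, and must still fit the largest
    single workload requirement."""
    usable = capacity * target_util
    return usable - daemonset_req >= largest_req

def pool_satisfies(capacity, count, target_util,
                   daemonset_req, total_req, largest_req):
    """Check one homogeneous node pool against the constraints for one
    resource dimension (CPU or RAM)."""
    if not node_fits(capacity, target_util, daemonset_req, largest_req):
        return False
    # After every node runs the DaemonSets, remaining capacity across
    # the pool must cover the summed non-DaemonSet requirements.
    usable_total = count * (capacity * target_util - daemonset_req)
    return usable_total >= total_req
```

For example, three 4-core nodes at 90% target utilization with 0.5 cores of DaemonSets per node can cover 8 total cores of workloads whose largest single requirement is 2 cores, while two such nodes cannot.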
The configuration currently only recommends one node type for O and one node type for S, but we are considering adding multiple node type support. If your cluster requires specific node types for certain workloads, consider using Kubecost's recommendation as a launching point for a cluster configuration that supports your specific needs.
The Spot Readiness Checklist investigates your Kubernetes workloads to attempt to identify those that are candidates to be schedulable on Spot (preemptible) nodes. Spot nodes are deeply-discounted nodes (up to 90% cheaper) from your cloud provider that do not come with an availability guarantee. They can disappear at any time, though most cloud providers guarantee some sort of alert and a small shutdown window, on the order of tens of seconds to minutes, before the node disappears.
Spot-ready workloads, therefore, are workloads that can tolerate some level of instability in the nodes they run on. Examples of Spot-ready workloads are usually state-free: many microservices, Spark/Hadoop nodes, etc.
The Spot Checklist performs a series of checks that use your own workload configuration to determine readiness:
Controller Type (Deployment, StatefulSet, etc.)
Replica count
Local storage
Controller Pod Disruption Budget
Rolling update strategy (Deployment-only)
Manual annotation overrides
You can access the Spot Checklist in the Kubecost UI by selecting Settings > Spot Instances > Spot Checklist.
The checklist is configured to investigate a fixed set of controllers, currently only Deployments and StatefulSets.
Deployments are considered Spot-ready because they are relatively stateless, intended to only ensure a certain number of pods are running at a given time.
StatefulSets should generally be considered not Spot-ready; as their name implies, they usually represent stateful workloads that require the guarantees StatefulSets provide. Scheduling StatefulSet pods on Spot nodes can lead to data loss.
Workloads with a configured replica count of 1 are not considered Spot-ready because if the single replica is removed from the cluster due to a Spot node outage, the workload goes down. Replica counts greater than 1 signify a level of Spot-readiness because workloads that can be replicated tend to tolerate a variable number of replicas, such as when replicas disappear due to Spot node outages.
Currently, workloads are only checked for the presence of an emptyDir volume. If one is present, the workload is assumed to be not Spot-ready.
More generally, the presence of a writable volume implies a lack of Spot readiness. If a pod is shut down non-gracefully while it is in the middle of a write, data integrity could be compromised. More robust volume checks are currently under consideration.
It is possible to configure a Pod Disruption Budget (PDB) for controllers that causes the scheduler to (where possible) adhere to certain availability requirements for the controller. If a controller has a PDB set up, we read it, compute its minimum available replicas, and use a simple threshold on the ratio min available / replicas to determine if the PDB indicates readiness. We chose to interpret a ratio of > 0.5 to indicate a lack of readiness because it implies a reasonably high availability requirement.
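The PDB heuristic amounts to a simple ratio test. A sketch (rounding of percentage-based minAvailable values is omitted):

```python
def pdb_indicates_spot_ready(min_available, replicas):
    """A min-available / replicas ratio above 0.5 implies a high
    availability requirement, so the workload is treated as not
    spot-ready."""
    return (min_available / replicas) <= 0.5
```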
If you are considering this check while evaluating your workloads for Spot-readiness, do not immediately discount them because of this check failing. Workloads should always be evaluated on a case-by-case basis and it is possible that an unnecessarily strict PDB was configured.
Deployments have multiple options for update strategies and by default they are configured with a Rolling Update Strategy (RUS) with 25% max unavailable. If a deployment has an RUS configured, we do a similar min available (calculated from max unavailable in rounded-down integer form and replica count) calculation as with PDBs, but threshold it at 0.9 instead of 0.5. Doing so ensures that default-configured deployments with replica counts greater than 3 will pass the check.
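The rolling-update check can be sketched the same way, using the rounded-down max-unavailable count and the stricter 0.9 threshold:

```python
import math

def rus_indicates_spot_ready(replicas, max_unavailable_pct=0.25):
    """Compute min available from the rounded-down max-unavailable
    count, then apply the 0.9 readiness threshold."""
    max_unavailable = math.floor(replicas * max_unavailable_pct)
    min_available = replicas - max_unavailable
    return (min_available / replicas) <= 0.9
```

With the default 25% max unavailable, deployments with 4 or more replicas pass this check while those with 3 or fewer fail, matching the behavior described above.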
We also support manually overriding the Spot readiness of a controller by annotating the controller itself, or the namespace it is running in, with spot.kubecost.com/spot-ready=true.
The Checklist is now deployed alongside a recommended cluster configuration which automatically suggests a set of Spot and on-demand nodes to use in your cluster based on the Checklist. If you do not want to use that, read the following for some important information:
Kubecost marking a workload as Spot ready is not a guarantee. A domain expert should always carefully consider the workload before approving it to run on Spot nodes.
Most cloud providers support a mix of Spot and non-Spot nodes in the cluster and publish guides for configuring this.
Different cloud providers have different guarantees on shutdown windows and automatic draining of Spot nodes that are about to be removed. Consult your provider’s documentation before introducing Spot nodes to your cluster.
It is a good idea to use taints and tolerations to schedule only Spot-ready workloads on Spot nodes.
Additionally, it is generally wise to use smaller size Spot nodes. This minimizes the scheduling impact of individual Spot nodes being reclaimed by your cloud provider. Consider one Spot node of 20 CPU cores and 120 GB RAM against 5 Spot nodes of 4 CPU and 24 GB. In the first case, that single node being reclaimed could force tens of pods to be rescheduled, potentially causing scheduling problems, especially if capacity is low and spinning up a new node takes too long. In the second case, fewer pods are forced to be rescheduled if a reclaim event occurs, thus lowering the likelihood of scheduling problems.
The health score starts at 100. Penalties reduce the score. There are three penalty types:
WarningPenalty is applied when:
Single Cluster (a master exists on the cluster; applies to kops-based Kubernetes deployments on AWS)
Single Region
Predictive Disk Growth crosses a 90% threshold
ErrorPenalty is applied when:
Any Nodes in the Cluster are Not Ready
Any Nodes are under MemoryPressure
SevereErrorPenalty is applied when:
Memory Usage exceeds 90% of Available Memory on the Cluster
The Cluster Health alert is based on a threshold of change. For example, an alert threshold of 14 would trigger any time an Error penalty was applied.
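Put together, the scoring model might look like the following sketch. The penalty magnitudes here are hypothetical, chosen only so that an Error penalty (15) crosses a change threshold of 14; the actual weights Kubecost uses are not stated here:

```python
# Hypothetical penalty weights -- illustrative only; the actual
# magnitudes used by Kubecost are not documented here.
PENALTY_WEIGHTS = {"warning": 5, "error": 15, "severe": 30}

def health_score(applied_penalties):
    """Start at 100 and subtract each applied penalty, flooring at 0."""
    score = 100
    for kind in applied_penalties:
        score -= PENALTY_WEIGHTS[kind]
    return max(score, 0)
```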
Kubecost can provide and implement recommendations for right-sizing your supported clusters to ensure they are configured in the most cost-effective way. Recommendations are available for any and all clusters. Kubecost in certain configurations is also capable of taking a recommendation and applying it directly to your cluster in one click. These two processes should be distinguished respectively as viewing cluster recommendations vs. adopting cluster recommendations.
Kubecost is also able to implement cluster sizing recommendations on a user-scheduled interval, known as continuous cluster right-sizing.
You can access cluster right-sizing by selecting Savings in the left navigation, then select the Right-size your cluster nodes panel.
Kubecost will offer two recommendations: simple (uses one node type) and complex (uses two or more node types). Kubecost may hide the complex recommendation when it is more expensive than the simple recommendation, and present a single recommendation instead. These recommendations and their metrics will be displayed in a chart next to your existing configuration in order to compare values like total cost, node count, and usage.
Kubecost provides its right-sizing recommendations based on the characteristics of your cluster. You have the option to edit certain properties to generate relevant recommendations.
There are multiple dropdown menus to consider:
In the Cluster dropdown, you can select the individual cluster you wish to apply right-sizing recommendations to.
In the Window dropdown, select the number of days to query for your cluster's most recent activity. Options range from 1 day to 7 days. If your cluster has varying performance on different days of the week, it's better to select a longer interval for the most consistent recommendations.
You can toggle on Show optimization inputs to view resources which will determine the minimum size of your nodes. These resources are:
DaemonSet VCPUs/RAM: Resources allocated by DaemonSets on each node.
Max pod VCPUs/RAM: Largest resource allocation by any single Pod in the cluster.
Non-DaemonSet/static VCPUs/RAM: Sum of resources allocated to Pods not controlled by DaemonSets.
Finally, you can select Edit to provide information about the function of your cluster.
In the Profile dropdown, select the most relevant category of your cluster. You can select Production, Development, or High Availability.
Production: Stable cluster activity, will provide some extra space for potential spikes in activity.
Development: Cluster can tolerate a small amount of instability, will run cluster somewhat close to capacity.
High availability: Cluster should avoid instability at all costs, will size cluster with lots of extra space to account for unexpected spikes in activity.
In the Architecture dropdown, select either x86 or ARM. You may only see x86 as an option. This is normal. At the moment, ARM architecture recommendations are only supported on AWS clusters.
With this information provided, Kubecost can provide the most accurate recommendations for running your clusters efficiently. By following some additional steps, you will be able to adopt Kubecost's recommendation, applying it directly to your cluster.
To receive cluster right-sizing recommendations, you must first:
Have a GKE/EKS/AWS Kops cluster
To adopt cluster right-sizing recommendations, you must:
Have a GKE/EKS/AWS Kops cluster
In order for Kubecost to apply a recommendation, it needs write access to your cluster, which is enabled with the Cluster Controller.
To adopt a recommendation, select Adopt recommendation > Adopt. Implementation of right-sizing for your cluster should take roughly 10-30 minutes.
Recommendations via Kubecost Actions can only be adopted on your primary cluster. To adopt recommendations on a secondary cluster via Kubecost Actions, you must first manually switch to that cluster's Kubecost frontend.
Continuous cluster right-sizing has the same requirements needed as implementing any cluster right-sizing recommendations. See above for a complete description of prerequisites.
If you are using Persistent Volumes (PVs) with AWS's Elastic Block Store (EBS) Container Storage Interface (CSI), you may run into a problem post-resize where pods are in a Pending state because of a "volume node affinity conflict". This may be because the pod needs to mount an already-created PV which is in an Availability Zone (AZ) without node capacity for the pod. This is a limitation of the EBS CSI.
Kubecost mitigates this problem by ensuring continuous cluster right-sizing creates at least one node per AZ by forcing NodeGroups to have a node count greater than or equal to the number of AZs of the EKS cluster. This will also prevent you from setting a minimum node count for your recommendation below the number of AZs for your cluster. If the EBS CSI continues to be problematic, you can consider switching your CSI to services like Elastic File System (EFS) or FSx for Lustre.
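The node-count floor described above reduces to a one-line rule (a sketch; the function and parameter names are illustrative):

```python
def node_group_minimum(requested_minimum, availability_zones):
    """Force the node group floor up to the number of AZs so each zone
    keeps at least one node for zone-bound EBS-backed volumes."""
    return max(requested_minimum, len(availability_zones))
```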
The Kubecost Allocations dashboard allows you to quickly see allocated spend across all native Kubernetes concepts, e.g. namespace, k8s label, and service. It also allows for allocating cost to organizational concepts like team, product/project, department, or environment. This document explains the metrics presented and describes how you can control the data displayed in this view.
Kubecost provides a variety of options for configuring your allocations queries to view the information you need. Below is a table of the major configuration options, with in-depth explanations in this article for how they work.
Select the date range of the report, called the window, by setting specific start and end dates, or by using one of the preset options. You can use Select Start and Select End to establish custom date ranges as well.
Step size refers to the length of time of each group of data displayed on your dashboard across the window. Options are Default, Daily, Weekly, Monthly, and Quarterly. When retaining long periods of data through custom configurations (such as Prometheus), consider using larger step sizes to avoid potential display errors. The step size when selecting Default is dependent on the size of your window.
Here you can aggregate cost by namespace, deployment, service, and other native Kubernetes concepts. While selecting Single Aggregation, you will only be able to select one concept at a time. While selecting Multi Aggregation, you will be able to aggregate by multiple concepts at the same time.
Service in this context refers to a Kubernetes object that exposes an interface to outside consumers.
When aggregating by namespace, the Allocations dashboard will only display namespaces that have or have had workloads running in them. If you don't see a namespace on this dashboard, you should confirm whether the namespace is running a workload.
Cost aggregations are also visible by other meaningful organizational concepts, e.g. Team, Department, and Product. These aggregations are based on Kubernetes labels, referenced at both the pod and namespace level, with pod-level labels favored over namespace labels when both are present. The Kubernetes label name used for these concepts can be configured in Settings or in values.yaml after setting kubecostProductConfigs.labelMappingConfigs.enabled to true. Workloads without the relevant label will be shown as __unallocated__.
Kubernetes annotations can also be used for cost allocation purposes, but this requires enabling a Helm flag. Learn more about using annotations. To see annotations, you must add them to the label groupings via Settings or in values.yaml; they will not work as one-off labels added directly into reports.
To find which pods are not part of the relevant label set, you can either apply an __unallocated__ label filter in this allocation view or explore kubectl queries using label selectors.
The Edit icon has additional options for configuring your query such as how to display your data, adding filters, and configuring shared resources.
Allocating idle costs proportionately distributes slack or idle cluster costs to tenants. Idle refers to resources that are provisioned but not being fully used or requested by a tenant.
As an example, if your cluster is only 25% utilized, as measured by the max of resource usage and requests, applying idle costs would proportionately increase the cost of each pod/namespace/deployment by 4x. This feature can be enabled by default in Settings.
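The proportional distribution in that example can be sketched as follows (an illustrative sketch, not Kubecost's implementation):

```python
def share_idle(allocated_costs, idle_cost):
    """Distribute an idle cost across tenants in proportion to each
    tenant's allocated (non-idle) cost."""
    total = sum(allocated_costs.values())
    return {tenant: cost + idle_cost * (cost / total)
            for tenant, cost in allocated_costs.items()}
```

A cluster that is 25% utilized ($25 allocated, $75 idle) sees every tenant's cost multiplied by 4 after sharing.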
The idle costs dropdown allows you to choose how you wish your idle costs to be displayed:
Hide: Hide idle costs completely.
Separate: Idle costs appear as their own cost, visualized as a gray-colored bar in your display table.
Share By Cluster: Idle costs are grouped by the cluster they belong to.
Share By Node: Idle costs are grouped by the node they belong to.
To learn more about sharing idle costs, see here.
View Allocation data in the following formats:
Cost: Total cost per aggregation over date range
Cost over time: Cost per aggregation broken down over days or hours depending on date range
Efficiency over time: Shows resource efficiency over given date range
Proportional cost: Cost per aggregate displayed as a percentage of total cost over date range
Cost Treemap: Hierarchically structured view of costs in current aggregation
You can select Edit > Chart > Cost over time from the dropdown to have your data displayed on a per-day basis. Hovering over any day's data will provide a breakdown of your spending.
View either cumulative or run rate costs measured over the selected time window based on the resources allocated.
Cumulative Cost: represents the actual/historical spend captured by the Kubecost agent over the selected time window
Rate metrics: Monthly, daily, or hourly "run rate" cost, also used for projected cost figures, based on samples in the selected time window
Cost allocations are based on the following:
Resources allocated, i.e. max of resource requests and usage
The cost of each resource
The amount of time resources were provisioned
For more information, refer to the OpenCost spec.
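The two cost measures can be illustrated with hourly samples. The 730-hour month is a common cloud billing convention; Kubecost's exact projection method may differ:

```python
def cumulative_cost(hourly_costs):
    """Actual spend over the window: the sum of per-hour samples."""
    return sum(hourly_costs)

def monthly_run_rate(hourly_costs, hours_per_month=730):
    """Project the window's average hourly cost out to a month."""
    return sum(hourly_costs) / len(hourly_costs) * hours_per_month
```

A day of $1/hour spend is $24 cumulative but projects to a $730 monthly run rate.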
Filter resources by namespace, clusterID, and/or Kubernetes label to more closely investigate a rise in spend or key cost drivers at different aggregations such as deployments or pods. When a filter is applied, only resources with this matching value will be shown. These filters are also applied to external out-of-cluster (OOC) asset tags. Supported filters are as follows:
Comma-separated lists are supported to filter by multiple categories, e.g. a namespace filter equal to kube-system,kubecost. Wildcard filters are also supported, indicated by a * following the filter, e.g. namespace=kube* to return any namespace beginning with kube.
You can also implement more advanced forms of filtering to include or exclude values including prefixes or suffixes for any of the above categories in the table. Selecting the filtering dropdown (default Equals) will show you all available filtering options. These are reflective of Kubecost's v2 filtering language.
Select how shared costs set on the settings page will be shared among allocations. Pick from default shared resources, or select a custom shared resource. A custom shared resource can be selected in the Configure custom shared resources feature at the bottom of the Edit window.
The three horizontal dots icon (directly next to Save) will provide additional options for handling your report:
Open Report: Allows you to open one of your saved reports without first navigating to the Reports page
Alerts: Send one of four reports routinely: recurring, efficiency, budget, and spend change
Download CSV: Download your current report as a CSV file
Download PDF: Download your current report as a PDF file
Cost allocation metrics are available for both in-cluster and OOC resources:
The rightmost column in the Allocations metrics table allows you to perform additional actions on individual line items (functionality will vary based on how you aggregate):
Inspect: Opens an advanced cost overview of the namespace in a new tab.
Inspect Shared Costs: Opens an advanced cost overview of your shared costs in a new tab.
View Right-Sizing: Opens the Container Request Right-Sizing Recommendations page in a new tab.
Reports are saved queries from your various Monitoring dashboards which can be referenced at a later date for convenience. Aggregation, filters, and other details of your query will be saved in the report, and the report can be opened at any time. Reports are currently supported by the Allocations, Assets, and Cloud Cost Explorer dashboards.
Reports can be managed via values.yaml or the Kubecost UI. This reference outlines the process of configuring saved reports through a values file, and provides documentation on the required and optional parameters.
Begin by selecting Create a report. There are five report types available. Three of these correspond to Kubecost's different monitoring dashboards. The other two are specialized beta features.
Allocation Report
Asset Report
Advanced Report (beta)
Cloud Cost Report
Cost Center Report (beta)
Selecting a monitoring report type will take you to the respective dashboard. Provide the details of the query, then select Save. The report will now be saved on your Reports page for easy access.
For help creating an Advanced Report (either type), select the respective hyperlink above for a step-by-step process.
After creating a report, you are able to share that report in recurring intervals via email as a PDF or CSV file. Shared reports replicate your saved query parameters every interval so you can view cost changes over time.
Sharing reports is only available for Allocations, Assets, and Cloud Cost Reports, not either type of Advanced Report.
In the line for the report you want to share, select the three horizontal dots icon in the Actions column. Select Share report from the menu. The Share Report window opens. Provide the following fields:
Interval: Interval that recurring reports will be sent out. Supports Daily, Weekly, and Monthly. Weekly reports default to going out Sunday at midnight. Monthly reports default to midnight on the first of the month. When selecting Monthly and resetting on a day of the month not found in every month, the report will reset at the latest available day of that month. For example, if you choose to reset on the 31st, it will reset on the 30th for months with only 30 days.
Format: Supports PDF or CSV.
Add email: Email(s) to distribute the report to.
Select Apply to finalize. When you have created a schedule for your report, the selected interval will be displayed in the Interval column of your Reports page.
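The Monthly interval's reset-day clamping described above can be sketched as follows (a simplified illustration, not Kubecost's actual implementation):

```python
import calendar

def monthly_reset_day(preferred_day: int, year: int, month: int) -> int:
    """Clamp the chosen reset day to the last available day of the month,
    so a report set to reset on the 31st resets on the 30th in 30-day months."""
    last_day = calendar.monthrange(year, month)[1]
    return min(preferred_day, last_day)

print(monthly_reset_day(31, 2023, 4))  # 30 -- April has only 30 days
print(monthly_reset_day(31, 2023, 1))  # 31
```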
The saved report settings, under global.savedReports, accept two parameters:
enabled: determines whether Kubecost will read saved reports configured via values.yaml; the default value is false
reports: a list of report parameter maps
The following fields apply to each map item under the reports key:
title: the title/name of your custom report; any non-empty string is accepted
window: the time window the allocation report covers. The following values are supported:
keywords: today, week (week-to-date), month (month-to-date), yesterday, lastweek, lastmonth
number of days: {N}d (last N days), e.g. 30d for the last 30 days
date range: {start},{end} (comma-separated RFC-3339 date strings or Unix timestamps), e.g. 2021-01-01T00:00:00Z,2021-01-02T00:00:00Z or 1609459200,1609545600 for the single day of 1 January 2021
Note: for all window options, if a window is requested that spans "partial" days, the window will be rounded up to include the nearest full date(s), e.g. 2021-01-01T15:04:05Z,2021-01-02T20:21:22Z will return the two full days of 1 January 2021 and 2 January 2021.
aggregateBy: the desired aggregation parameter, equivalent to Breakdown in the Kubecost UI. Supports:
cluster
container
controller
controllerKind
daemonset
department
deployment
environment
job
label (requires the format label:<label_name>)
namespace
node
owner
pod
product
service
statefulset
team
chartDisplay: can be one of category, series, efficiency, percentage, or treemap. See Cost Allocation Charts for more info.
idle: idle cost allocation; supports hide, shareByNode, shareByCluster, and separate
rate: can be one of cumulative, monthly, daily, or hourly
accumulate: determines whether or not to sum Allocation costs across the entire window, equivalent to Resolution in the UI; supports true (Entire window resolution) and false (Daily resolution)
sharedNamespaces: a list of namespaces to share costs for
sharedOverhead: an integer representing overhead costs to share
sharedLabels: a list of labels to share costs for; requires the format label:<label_name>
filters: a list of maps, each consisting of a property and a value
property: supports cluster, node, namespace, and label
value: the property value(s) to filter on; supports wildcard filtering with a * suffix
Special case label value examples: app:cost-analyzer, app:cost*. Wildcard filters only apply to the label value, e.g. ap*:cost-analyzer is not valid.
Note: multiple filter properties evaluate as ANDs; multiple filter values evaluate as ORs, e.g. (namespace=foo,bar), (node=fizz) evaluates as (namespace == foo || namespace == bar) && node == fizz
Important: if no filters are used, supply an empty list []
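Assembled from the parameters above, a values.yaml fragment might look like the following (the titles and filter values are illustrative only):

```yaml
global:
  savedReports:
    enabled: true
    reports:
      - title: "Example Namespace Report"   # any non-empty string
        window: "30d"
        aggregateBy: "namespace"
        chartDisplay: "category"
        idle: "separate"
        rate: "cumulative"
        accumulate: false                   # Daily resolution
        sharedNamespaces:
          - kube-system
        sharedOverhead: 100
        sharedLabels:
          - label:app
        filters:
          - property: "cluster"
            value: "cluster-one*"           # wildcard suffix supported
      - title: "Example Label Report"
        window: "lastweek"
        aggregateBy: "label:app"
        chartDisplay: "series"
        idle: "hide"
        rate: "monthly"
        accumulate: true                    # Entire window resolution
        filters: []                         # required when no filters are used
```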
When global.savedReports.enabled = true is set in the values file, the reports defined in values.yaml are created when the Kubecost pod starts. Reports can still be freely created/deleted via the UI while the pod is running. However, when the pod restarts, whatever is defined in the values file supersedes any UI changes.
Generally, the ConfigMap, if present, serves as the source of truth at startup.
If saved reports are not provided via values.yaml, meaning global.savedReports.enabled = false, reports created via the UI are saved to a persistent volume and persist across pod restarts.
Review these steps to verify that saved reports are being passed to the Kubecost application correctly:
Confirm that global.savedReports.enabled is set to true
Ensure that the Helm values are successfully read into the ConfigMap
Run helm template ./cost-analyzer -n kubecost > test-saved-reports-config.yaml
Open test-saved-reports-config.yaml
Find the section starting with # Source: cost-analyzer/templates/cost-analyzer-saved-reports-configmap.yaml
Confirm that the Helm values appear in the ConfigMap under the data field
Ensure that the JSON string is successfully mapped to the appropriate configs
Navigate to your Reports page in the Kubecost UI and ensure that the configured report parameters have been set by selecting the Report name.
As of v1.104, cloud data is parsed through the Cloud Costs Explorer dashboard instead of through Assets. Read our announcement here for more information.
The Kubecost Assets dashboard shows Kubernetes cluster costs broken down by the individual backing assets in your cluster (e.g. cost by node, disk, and other assets). It’s used to identify spend drivers over time and to audit Allocation data. This view can also optionally show out-of-cluster assets by service, tag/label, etc.
Similar to our Allocation API, the Assets API uses our ETL pipeline, which aggregates data daily. This allows for enterprise scale with much higher performance.
Kubecost provides a variety of options for configuring your assets queries to view the information you need. Below is a table of the major configuration options, with in-depth explanations in this article for how they work.
Select the date range of the report by setting specific start and end dates, or using one of the preset options.
Here you can aggregate cost by native Kubernetes concepts. With Single Aggregation, you can select only one concept at a time. With Multi Aggregation, you can filter for multiple concepts at the same time. By default, assets are aggregated by Service.
The Edit icon has additional options to filter your search:
Change the display of your recent assets by service. Daily provides a day-by-day breakdown of assets. Entire window creates a semicircle chart in which each asset's portion is sized by its total cost within the displayed time frame.
View either cumulative or run rate costs measured over the selected time window based on the assets being filtered for.
Cumulative Cost: represents the actual/historical spend captured by the Kubecost agent over the selected time window
Rate metrics: Monthly, daily, or hourly “run rate” cost, also used for projected cost figures, based on samples in the selected time window
Filter assets by category, service, or other means. When a filter is applied, only resources with this matching value will be shown.
The three horizontal dots icon will provide additional options for handling your reports:
Open Report: Open one of your saved reports
Download CSV: Download your current report as a CSV file
The assets metrics table displays your aggregate assets, with four columns to organize by.
Name: Name of the aggregate group
Credits: Amount deducted from total cost due to provider-applied credit. A negative number means the total cost was reduced.
Adjusted: Amount added to total cost based on reconciliation with cloud provider’s billing data.
Total cost: Shows the total cost of the aggregate asset factoring in additions or subtractions from the Credits and Adjusted columns.
Hovering over the gray info icon next to each asset will provide you with the hours run and hourly cost of the asset. To the left of each asset name is one of several Category icons (you can aggregate by these): Storage, Network, Compute, Management, and Other.
Gray bubble text may appear next to an asset. These are all manually-assigned labels to an asset. To filter assets for a particular label, select the Edit search parameters icon, then select Label/Tag from the Filters dropdown and enter the complete name of the label.
You can select an aggregate asset to view all individual assets comprising it. Each individual asset should have a ProviderID.
After granting Kubecost permission to access cloud billing data, Kubecost adjusts its asset prices once cloud billing data becomes available, e.g. AWS Cost and Usage Report and the spot data feed. Until this data is available from cloud providers, Kubecost uses data from public cloud APIs to determine cost, or alternatively custom pricing sheets. This allows teams to have highly accurate estimates of asset prices in real-time and then become even more precise once cloud billing data becomes available, which is often 1-2 hours for spot nodes and up to a day for reserved instances/savings plans.
While cloud adjustments typically lag by roughly a day, there are certain adjustments, e.g. credits, that may continue to come in over the course of the month, and in some cases at the very end of the month, so reconciliation adjustments may continue to update over time.
The Savings page provides miscellaneous functionality to help you use resources more effectively and assess wasteful spending. In the center of the page, you will see your estimated monthly savings available. The savings value is calculated from all enabled Savings features, across the clusters and cluster profile designated via the dropdowns in the top right of the page.
The Savings page provides an array of panels containing different insights capable of lowering your Kubernetes and cloud spend.
The monthly savings values on this page are precomputed every hour for performance reasons, while per-cluster views of these numbers, and the numbers on each individual Savings insight page, are computed live. This may result in some discrepancies between estimated savings values of the Savings page and the pages of individual Savings insights.
You can archive individual Savings insights if you feel they are not helpful, or you cannot perform those functions within your organization or team. Archived Savings insights will not add to your estimated monthly savings available.
To temporarily archive a Savings insight, select the three horizontal dots icon inside its panel, then select Archive. You can unarchive an insight by selecting Unarchive.
You can also adjust your insight panels display by selecting View. From the View dropdown, you have the option to filter your insight panels by archived or unarchived insights. This allows you to effectively hide specific Savings insights after archiving them. Archived panels will appear grayed out, or disappear depending on your current filter.
By default, the Savings page and any displayed metrics (for example, estimated monthly savings available) will apply to all connected clusters. You can view metrics and insights for a single cluster by selecting it from the dropdown in the top right of the Savings page.
Functionality for most cloud insight features only exists when All Clusters is selected in the cluster dropdown. Individual clusters will usually only have access to Kubernetes insight features.
On the Savings page, as well as on certain individual Savings insights, you have the ability to designate a cluster profile. Savings recommendations such as right-sizing are calculated in part based on your current cluster profile:
Production: Expects stable cluster activity; will provide some extra space for potential spikes in activity.
Development: Cluster can tolerate a small amount of instability; will run the cluster somewhat close to capacity.
High availability: Cluster should avoid instability at all costs; will size the cluster with lots of extra space to account for unexpected spikes in activity.
Cost Center Report is a beta feature. Please share your feedback as we are in active development of this feature.
A Cost Center Report (CCR) allows you to join your Kubernetes resource costs with cloud-native services. For example, it allows combining S3 and/or BigQuery costs with the Kubernetes namespace that is consuming those services.
The reporting supports multiple types of resource matches in terms of labels/tags/accounts/K8s object names/etc.
Begin by selecting Reports in the left navigation. Then, select Create a report > Advanced Report - Cost Centers. The Cost Center Report page opens.
In the Report name field, enter a custom value name for your report. This name will appear on your Reports page for quick access after creation.
In the Cost center name field, enter the desired name for your Cost Center. Once a Report name and Cost center name have been provided, it should appear at the bottom of the page in the Report Preview. However, continue with this section to learn how to customize your Cost Center Report and complete its creation.
You can aggregate your cloud costs by a variety of fields (default is Service). Single and multi-aggregation, and custom labels, are supported. Then, select the desired cloud cost metric. Cloud cost metrics are calculated differently depending on your cloud service provider (CSP).
Certain selected cloud cost metrics may produce errors forming your report preview. Use Net Amortized Cost, the default option, if you experience this error.
You can also provide custom filters to display only resources which match the filter value in your Cost Center Report. Select Filter and choose a filter type from the dropdown, then provide your filter value in the text field. Select the plus sign icon to add your filter.
Your Kubernetes workload data can be read as your Kubernetes allocations. You can aggregate and filter for your allocation data in the same way as your cloud cost data as described above. Default aggregation is Namespace.
Your cost center should automatically appear in the Report Preview. There is no need to finalize its creation; it will exist as long as all required fields have been provided. The Report Preview provides cost data for each cost center.
After configuring a cost center, you can select Collapse to close that configuration (this is only to condense the page view, it will not affect your overall reported data).
Any cloud provider tag or label can be used, but be sure to follow the Cloud Billing Integrations guide for any respective CSPs to ensure that they are included with the billing data.
When using tags and labels, separate the key and value with a colon (:). Example: owner:frontend.
A single CCR allows for the creation of multiple cost centers within it. To create an additional cost center, select Add cost center. This will open a new cost center tab and the functionality of creating a cost center will be the same.
You can delete a cost center by selecting Delete Cost Center in its tab, or selecting the trash can icon in the line of that cost center in the Report Preview.
When you are finished adding or deleting cost centers, select Done to finalize your CCR. You will be taken to a page for your reports. You can select individual cost centers for breakdowns of cloud costs and Kubernetes costs.
A cost center name is required in order for your cost center to appear in the Report Preview. However, if you select Done without giving a name to a cost center, it will appear in your Report with a blank space for a name. It can still be interacted with, but it is recommended to name all cost centers.
The Cost column for each line item is the sum of all other cost columns.
You can also adjust the window of spend data by selecting the Time window box and choosing either a preset or entering a custom range.
When viewing a breakdown of your cloud costs, you may see the same aggregate repeated multiple times. These represent the same property across multiple days. When you expand the window range, you should naturally see the number of line items increase.
If you return to the Reports page, you will now see your CCR displayed amongst your other reports. Selecting the three horizontal dots in the Actions column of your CCR will allow you to Edit or Delete the CCR.
The Cloud Cost Explorer is a dashboard which provides visualization and filtering of your cloud spending. This dashboard includes the costs for all assets in your connected cloud accounts by pulling from those providers' Cost and Usage Reports (CURs) or other cloud billing reports.
If you haven't performed a successful billing integration with a cloud service provider, the Cloud Cost Explorer won't have cost data to display. Before using the Cloud Cost Explorer, make sure to read our Cloud Billing Integrations guide to get started, then see our specific articles for the cloud service providers you want to integrate with.
As of v1.104, Cloud Cost is enabled by default. If you are using v1.104+, you can skip the Installation and Configuration section.
For versions of Kubecost up to v1.103, Cloud Cost needs to be enabled first through Helm, using the following parameters:
Enabling Cloud Cost is required. Optional parameters include:
labelList.labels: Comma-separated list of labels; an empty string indicates that the list is disabled
labelList.IsIncludeList: If true, the label list is a white list; if false, it is a black list
topNItems: Number of sampled "top items" to collect per day
While Cloud Cost is enabled, it is recommended to disable Cloud Usage, which is more memory-intensive.
Disabling Cloud Usage will restrict functionality of your Assets dashboard. This is intentional. Learn more about Cloud Usage here.
topNItems
Item-level data in the Cloud Cost Explorer is only a sample of the most expensive entries, determined by the Helm flag topNItems. This value can be increased substantially, but doing so can lead to higher memory consumption. If you receive the message "We don't have item-level data with the current filters applied" in the UI when attempting to filter, you may need to increase the value of topNItems (default is 1,000) or reconfigure your query.
You can adjust your displayed metrics using the date range feature, represented by Last 7 days, the default range. This will control the time range of metrics that appear. Select the date range of the report by setting specific start and end dates, or by using one of the preset options.
You can adjust your displayed metrics by aggregating your cost by category. Supported fields are Workspace, Provider, Billing Account, Service Item, as well as custom labels. The Cloud Cost Explorer dashboard supports single and multi-aggregation. See the table below for descriptions of each field.
Selecting the Edit button will allow for additional filtering and pricing display options for your cloud data.
You can filter displayed dashboard metrics by selecting Edit, then adding a filter. Filters can be created for the following categories to view costs exclusively for items (see descriptions of each category in the Aggregate filters table above):
Service
Account
Invoice Entity
Provider
Labels
Cost Metric
The Cost Metric dropdown allows you to adjust the displayed cost data based on different calculations. Cost Metric values are based on and calculated following standard FinOps dimensions and metrics, but may be calculated differently depending on your CSP. Learn more about how these metrics are calculated by each CSP in the Cloud Cost Metrics doc. The five available metrics supported by the Cloud Cost Explorer are:
Your cloud cost spending will be displayed across your dashboard with several key metrics:
K8s Utilization: Percent of cost which can be traced back to a Kubernetes cluster
Total cost: Total cloud spending
Sum of Sample Data: Appears only when aggregating by Item. Lists only the top costs for the selected timeframe, so the displayed total may not match your CUR.
All line items, after aggregation, should be selectable, allowing you to drill down to further analyze your spending. For example, when aggregating cloud spend by Service, you can select an individual cloud service (AmazonEC2, for example) and view spending, K8s utilization, and other details unique to that item.
This document describes how Kubecost calculates network costs.
Kubecost uses best-effort to allocate network transfer costs to the workloads generating those costs. The level of accuracy has several factors described below.
There are two primary factors when determining how network costs are calculated:
Network costs DaemonSet: Must be enabled in order to view network costs
Cloud integration: Optional, allows for accurate cloud billing information
A default installation of Kubecost will use the onDemand rates for internet egress and proportionally assign those costs by pod using the metric container_network_transmit_bytes_total. This is not exactly the same as the costs obtained via the network costs DaemonSet, but will be approximately similar.
When you enable the network costs DaemonSet, Kubecost has the ability to attribute the network-byte traffic to specific pods. This will allow the most accurate cost distribution, as Kubecost has per-pod metrics for source and destination traffic.
Learn how to enable the network costs DaemonSet in seconds here.
Kubecost uses cloud integration to pull actual cloud provider billing information. Without enabling cloud integration, these prices will be based on public onDemand pricing.
Cloud providers allocate data transfers as line items on a per-node basis. Kubecost will allocate network transfer costs based on each pod's share of container_network_transmit_bytes_total on its node.
This results in accurate node-based costs. However, it only estimates which pod/application is actually responsible for the network transfer costs.
Enabling both cloud integration and the network costs DaemonSet allows Kubecost to assign the most accurate data transfer costs to each pod.
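The per-node proportional allocation described above can be sketched as follows (a simplified illustration; in practice the pod byte counts come from the container_network_transmit_bytes_total metric):

```python
def pod_network_cost(pod_tx_bytes: float, node_tx_bytes: float, node_transfer_cost: float) -> float:
    """Allocate a node's data transfer line item to a pod by that pod's
    share of bytes transmitted on the node."""
    return (pod_tx_bytes / node_tx_bytes) * node_transfer_cost

# Hypothetical node with a $10 transfer line item and two pods
# that transmitted 750 MB and 250 MB respectively
print(pod_network_cost(750e6, 1e9, 10.0))  # 7.5
print(pod_network_cost(250e6, 1e9, 10.0))  # 2.5
```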
At this time, there is a minor limitation where Kubecost cannot determine accurate costs for pods that use hostNetwork. Today, these pods share all network costs with the node.
Kubecost displays all local disks it detects with low usage, with recommendations for resizing and predicted cost savings.
You can access the Local Disks page by selecting Settings in the left navigation, then selecting Manage local disks.
You will see a table of all disks in your environment which fall under 20% current usage. For each disk, the table will display its connected cluster, its current utilization, resizing recommendation, and potential savings. Selecting an individual line item will take you offsite to a Grafana dashboard for more metrics relating to that disk.
In the Cluster dropdown, you can filter your table of disks to an individual cluster in your environment.
In the Profile dropdown, you can configure your desired overhead percentage, which refers to the percentage of extra usage you would like applied to each disk in relation to its current usage. The available overhead percentages are:
Development (25%)
Production (50%)
High Availability (100%)
The value of your overhead percentage will affect your resizing recommendation and estimated savings: a higher overhead percentage results in a higher average resize recommendation and lower average estimated savings. The overhead percentage is applied to your current usage (in GiB), then added to your usage to obtain the value Kubecost rounds up to for its resizing recommendation. For example, for a disk with a usage of 12 GiB, with Production (50%) selected from the Profile dropdown, 6 GiB (50% of 12) will be added to the usage, resulting in a resizing recommendation of 18 GiB.
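The worked example above can be sketched as follows (an illustration of the stated formula, not Kubecost's implementation; the final round-up to an available disk size is omitted):

```python
def disk_resize_recommendation(usage_gib: float, overhead: float) -> float:
    """Recommended size = current usage plus the profile's overhead percentage."""
    return usage_gib * (1 + overhead)

# Profile overheads listed above
PROFILES = {"Development": 0.25, "Production": 0.50, "High Availability": 1.00}

print(disk_resize_recommendation(12, PROFILES["Production"]))  # 18.0
```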
Kubecost can only provide detection of underused disks with recommendations for resizing. It does not assist with node turndown.
The Abandoned Workloads page can detect workloads which have not sent or received a meaningful rate of traffic over a configurable duration.
You can access the Abandoned Workloads page by selecting Savings in the left navigation, then selecting Manage abandoned workloads.
The Abandoned Workloads page will display front and center an estimated savings amount per month based on a number of detected workloads considered abandoned, defined by two values:
Traffic threshold (bytes/sec): This slider determines a meaningful rate of traffic (bytes in and out per second) used to detect workload activity. Only workloads below the threshold are taken into account; therefore, as you increase the threshold, you should observe the number of detected workloads increase.
Window (days): From the main dropdown, you will be able to select the duration of time to check for activity. Presets include 2 days, 7 days, and 30 days. As you increase the duration, you should observe the total detected workloads increase.
Beneath your total savings value and slider scale, you will see a dashboard containing all abandoned workloads. The number of total line items should be equal to the number of workloads displayed underneath your total savings value.
You can filter your workloads through four dropdowns; across clusters, namespaces, owners, and owner kinds.
Selecting an individual line item will expand the item, providing you with additional traffic data for that item.
Kubecost is able to provide recommendations for resizing your PVs by comparing their average usage to their maximum capacity, and can recommend sizing down to smaller storage sizes.
To access the Persistent Volume Right-Sizing Recommendations page, select Savings from the left navigation, then select Right-size persistent volumes.
Kubecost will display a table containing all PVs in your environment. Table columns include the PV name, its corresponding cluster, and metrics pertaining to usage and savings. The estimated savings per month for each table item is calculated by subtracting your recommended cost from the current cost.
You can filter your table of PVs using the Cluster dropdown to view PVs in an individual cluster, or across all connected clusters.
You can also adjust Kubecost’s average recommended capacity size using the Profile dropdown, which establishes the minimum excess capacity you will require for every PV, using local usage data from the past six hours. The percentage value associated with each Profile is the minimum unused capacity required per PV, which is added to the max usage to obtain Kubecost’s recommendation. Recommended capacity is calculated as (max usage + (max usage * overhead percentage)) in GiB. This is then converted to GB and rounded to the nearest tenth when displayed in the UI (a capacity of 1 GiB will be converted to 1.1 GB). Max usage is converted from GiB to GB in the same way. The smallest capacity Kubecost will recommend per PV is 1.1 GB. From there, the recommended capacity increases in intervals of 1 GiB. The higher the minimum excess capacity needed, the higher the average recommended capacity, and therefore the lower the average savings.
For example, for a PV with a max usage of 2 GiB, and a selected Production Profile (which requires 50% excess capacity), the overhead will be calculated as 2 * .5, then added to the max usage, resulting in a minimum recommended capacity of 3 GiB. This will then be converted to approximately 3.2 GB for the final recommendation.
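The calculation above can be sketched as follows (an illustration of the stated formula, not Kubecost's implementation):

```python
def recommended_capacity_gib(max_usage_gib: float, overhead: float) -> float:
    """max usage + (max usage * overhead percentage), floored at 1 GiB."""
    return max(max_usage_gib * (1 + overhead), 1.0)

def display_gb(gib: float) -> float:
    """Convert GiB to GB and round to the nearest tenth, as shown in the UI."""
    return round(gib * 1.073741824, 1)

rec = recommended_capacity_gib(2, 0.50)  # Production profile: 50% excess capacity
print(rec, display_gb(rec))  # 3.0 3.2
```

Note that the 1 GiB floor converts to the 1.1 GB minimum recommendation mentioned above.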
Kubecost does not directly assist with resizing your PVs.
Kubecost displays all disks and IP addresses that are not utilized by any cluster. These may still incur charges, and so you should consider these orphaned resources for deletion.
You can access the Orphaned Resources page by selecting Savings in the left navigation, then selecting Manage orphaned resources.
Disks and IP addresses (collectively referred to as resources) will be displayed in a single table. Selecting an individual line item will expand its tab and provide more metrics about the resource, including cost per month, size (disks only), region, and a description of the resource.
You can filter your table of resources using two dropdowns:
The Resource dropdown will allow you to filter by resource type (Disk or IP Address).
The Region dropdown will filter by the region associated with the resource. Resources with the region “Global” cannot be filtered, and will only display when All has been selected.
Above your table will be an estimated monthly savings value. This value is the sum of all displayed resources’ savings. As you filter your table of resources, this value will naturally adjust.
For cross-functional convenience, you can copy the name of any resource by selecting the copy icon next to it.
For teams interested in reducing their Kubernetes costs, it's beneficial to first understand how provisioned resources have been used. There are two major concepts to start with: pod resource efficiency and cluster idle costs.
Pod resource efficiency is defined as the resource utilization versus the resource request over a given time window. It is cost-weighted and can be expressed as follows:
(((CPU Usage / CPU Requested) * CPU Cost) + ((RAM Usage / RAM Requested) * RAM Cost)) / (RAM Cost + CPU Cost)
where:
CPU Usage = rate(container_cpu_usage_seconds_total) over the time window
RAM Usage = avg(container_memory_working_set_bytes) over the time window
For example, if a pod requests 2 CPU and 1 GB while using 500mCPU and 500MB, and CPU on the node costs $10/CPU while RAM costs $1/GB, then the requested CPU costs $20 and the requested RAM costs $1, giving ((0.5/2) * 20 + (0.5/1) * 1) / (20 + 1) = 5.5 / 21 ≈ 26%
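The efficiency formula can be sketched directly in code, reproducing the example above:

```python
def pod_efficiency(cpu_usage, cpu_request, cpu_cost, ram_usage, ram_request, ram_cost):
    """Cost-weighted pod efficiency: utilization vs. request for CPU and RAM,
    each weighted by its share of total cost."""
    return (((cpu_usage / cpu_request) * cpu_cost) +
            ((ram_usage / ram_request) * ram_cost)) / (cpu_cost + ram_cost)

# Example above: 2 CPU / 1 GB requested at $10/CPU and $1/GB
# (so $20 CPU cost and $1 RAM cost), with 500mCPU / 500MB used
print(f"{pod_efficiency(0.5, 2, 20, 0.5, 1, 1):.0%}")  # 26%
```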
Cluster idle cost is defined as the difference between the cost of allocated resources and the cost of the hardware they run on. Allocation is defined as the max of usage and requests. It can also be expressed as follows:
idle_cost = sum(cluster_cost) - (cpu_allocation_cost + ram_allocation_cost + gpu_allocation_cost)
where allocation = max(request, usage)
Node idle cost can be expressed as:
idle_cost = sum(node_cost) - (cpu_allocation_cost + ram_allocation_cost + gpu_allocation_cost)
where allocation = max(request, usage)
So, idle cost can also be thought of as the cost of the space where the Kubernetes scheduler could place pods without disrupting any existing workloads, but currently is not doing so.
Idle can be charged back to pods on a cost-weighted basis or viewed as a separate line item. As an example, consider the following representations:
[ ... ] = cluster
( ... ) = node
wN = workload
-- = idle capacity
Then, a cluster might look like:
[ ( w1, w2, w3, w4, --, --), (w5, --, --, --, --, --) ]
In total, there are 12 units of resources, and idle can be shared as follows:
Separate: In this single cluster across two nodes, there are 7 idle units in total.
Share By Node: The first node has 4 resource units used and 2 idle. The second node has 1 unit used and 5 idle. If you share idle by node, then w1-w4 will share 2 idle units, and w5 will get 5 idle units.
Share By Cluster: The single cluster has 5 units used and 7 idle. If you share idle by cluster, then w1-w5 will share the 7 idle units.
If for example you are aggregating by namespace, idle costs will be distributed to each namespace proportional to how much that namespace costs. Specifically:
namespace_cpu_idle_cost = (namespace_cpu_cost / (total_cpu_cost - idle_cpu_cost)) * idle_cpu_cost
This same principle applies for RAM, and also applies to any aggregation that is used (e.g. Deployment, Label, Service, Team).
The most common pattern for cost reduction is to ensure service owners tune the efficiency of their pods, and ensure cluster owners scale resources to appropriately minimize idle.
Efficiency targets can depend on the SLAs of the application. See our Request Right-Sizing API doc for more details.
It's recommended to target idle in the following ranges:
CPU: 50%-65%
Memory: 45%-60%
Storage: 65%-80%
Target figures are highly dependent on the predictability and distribution of your resource usage (e.g. P99 vs median), the impact of high utilization on your core product/business metrics, and more. While too low resource utilization is wasteful, too high utilization can lead to latency increases, reliability issues, and other negative behavior.
Kubecost displays all nodes with low CPU and RAM utilization, indicating they may be candidates for turndown or resizing, while providing checks to ensure they can be safely drained.
You can access the Underutilized Nodes page by selecting Savings in the left navigation, then selecting Manage underutilized nodes.
To receive accurate recommendations, you should set the maximum utilization percentage for CPU/RAM for your cluster. This is so Kubecost can determine if your environment can perform successfully below the selected utilization once a node has been drained. This is visualized by the Maximum CPU/RAM Request Utilization slider bar. In the Profile dropdown, you can select three preset values, or a custom option:
Development: Sets the utilization to 80%.
Production: Sets the utilization to 65%.
High Availability: Sets the utilization to 50%.
Custom: Allows you to manually move the slider.
Kubecost provides recommendations by performing a Node Check and a Pod Check to determine if a node can be drained without creating problems for your environment. For example, if draining the node would put the cluster above the utilization request threshold, the Node Check will fail. Only a node that passes both Checks will be recommended for safe drainage. For nodes that fail at least one Check, selecting the node will provide a window of potential pod issues.
Kubecost does not directly assist in turning nodes down.
The Audits dashboard cannot be used until you have enabled the Cost Events Audit API via Helm. See the Cost Events Audit API doc for instructions.
The Audits dashboard provides a log of changes made to your deployment. It's powered by the Cost Events Audit API and the Predict API. Supported event types include creations and deletions of Deployments and StatefulSets.
Cost impact from additions or deletions is provided using the Predict API. Deletions should naturally result in cost savings, indicated by a negative value, with the opposite effect for additions.
This section of the docs will break down how to navigate the Kubecost UI. The UI is composed of several primary dashboards which provide cost visualization, as well as multiple savings and governance tools. Below is the main Overview page, which contains several helpful panels for observing workload stats and trends.
The Clusters dashboard provides a list of all your monitored clusters, as well as additional clusters detected in your cloud bill. The dashboard provides details about your clusters including cost, efficiency, and cloud provider. You are able to filter your list of clusters by when clusters were last seen, activity status, and by name (see below).
Monitoring of multiple clusters is only supported in Kubecost Enterprise plans. Learn more about Kubecost Enterprise's multi-cluster view.
To enable the Clusters dashboard, you must perform these two steps:
Enable Cloud Costs
Enabling Cloud Costs through Helm can be done using the following parameters:
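The original parameter list is not reproduced here. As a sketch only (key names vary by chart version and should be verified against your chart's values.yaml), enabling Cloud Costs typically involves pointing Kubecost at a cloud integration secret and turning the feature on:

```yaml
# Sketch only: verify these keys against your chart version's values.yaml
kubecostProductConfigs:
  # name of the Kubernetes secret created from your cloud-integration.json
  cloudIntegrationSecret: cloud-integration
cloudCost:
  enabled: true   # turn on Cloud Costs processing
```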
Clusters are primarily distinguished into three categories:
Clusters monitored by Kubecost (green circle next to cluster name)
Clusters not monitored by Kubecost (yellow circle next to cluster name)
Inactive clusters (gray circle next to cluster name)
Monitored clusters are those that have cost metrics which will appear within your other Monitoring dashboards, like Allocations and Assets. Unmonitored clusters are clusters whose existence is determined from cloud integration, but haven't been added to Kubecost. Inactive clusters are clusters Kubecost once monitored, but haven't reported data over a certain period of time. This time period is three hours for Thanos-enabled clusters, and one hour for non-Thanos clusters.
Efficiency and Last Seen metrics are only provided for monitored clusters.
Efficiency is calculated as the amount of node capacity that is used, compared to what is available.
Selecting any metric in a specific cluster's row will take you to a Cluster Details page for that cluster which provides more extensive metrics, including assets and namespaces associated with that cluster and their respective cost metrics.
You are able to filter clusters through a window of when all clusters were last seen (default is Last 7 days). Although unmonitored clusters will not provide a metric for Last Seen, they will still appear in applicable windows.
You can also filter your clusters for Active, Inactive, or Unmonitored status, and search for clusters by name.
Advanced Reporting is a beta feature. Read the documentation carefully.
Advanced Reporting allows teams to sculpt and tailor custom reports to easily view the information they care about. Providing an intersection between Kubernetes allocation and cloud assets data, this tool provides insight into important cost considerations for both workload and external infrastructure costs.
Begin by accessing the Reports page. Select Create a report, then select Advanced Report. The Advanced Reporting page opens.
Advanced Reporting will display your Allocations data and allow for similar configuring and editing. However, that data can now also intersect your cloud service, provider, or accounts.
Some line items will display a magnifying lens icon next to the name. Selecting this icon will provide a Cloud Breakdown which compares Kubernetes costs and out-of-cluster (OOC) costs. You will also see OOC costs broken down by cloud service provider (CSP).
The Advanced Reporting page manages the configurations which make up a report. Review the following tools which specify your query:
Configuration | Description |
---|
The Service aggregation in this context refers to a Kubernetes object that exposes an interface to outside consumers, not a CSP feature.
Selecting Edit will open a slide panel with additional configuration options.
When a filter is applied, only results matching that value will display.
Field to handle default and custom shared resources (adjusted on the Settings page). Configure custom shared overhead costs, namespaces, and labels
After completing all configurations for your report, select Save. A name for your report based on your configuration will be auto-generated, but you have the option to provide a custom name. Finalize by selecting Save.
Reports are saved to your organization, like Allocations and Assets reports, rather than locally.
Line items that possess any out-of-cluster (OOC) costs, i.e. cloud costs, will display a magnifying lens icon next to their name. Selecting this icon will open a slide panel that compares your Kubernetes and OOC costs.
You can choose to aggregate those OOC costs by selecting the Cloud Breakdown button next to Aggregate By then selecting from one of the available options. You can aggregate by Provider, Service, Account, or use Custom Data Mapping to override default label mappings.
Kubecost alerts allow teams to receive updates on real-time Kubernetes spend. They are configurable via the Kubecost UI or Helm values. This resource gives an overview of how to configure alerts sent through email, Slack, and Microsoft Teams. Alerts are either created to monitor specific data sets and trends, or they must be toggled on or off. The following alert types are supported:
Budget: Sends an alert when spending crosses a defined threshold
Efficiency: Detects when a Kubernetes tenant is operating below a target cost-efficiency threshold
Recurring update: Sends an alert with cluster spending across all or a subset of Kubernetes resources
Spend change: Sends an alert reporting unexpected spend increases relative to moving averages
Asset budget: Sends an alert when spend for a particular set of assets crosses a defined threshold
Asset recurring update: Sends an alert with asset spend across all or a subset of cloud resources
Cloud cost budget: Sends an alert when the total cost of cloud spend goes over a set budget limit
Cluster health: Used to determine if the cluster's health score changes by a specific threshold. Can only be toggled on/off.
Kubecost health: Used for production monitoring of the health of Kubecost itself. Can only be toggled on/off.
values.yaml is the source of truth. Alerts set through values.yaml will continually overwrite any alerts set manually through the Kubecost UI.
The alert settings, under global.notifications.alertConfigs in cost-analyzer/values.yaml, accept four global fields:
frontendUrl: optional; your cost analyzer front-end URL, used for linkbacks in alert bodies
globalSlackWebhookUrl: optional; a global Slack webhook used for alerts, enabled by default if provided
globalMsTeamsWebhookUrl: optional; a global Microsoft Teams webhook used for alerts, enabled by default if provided
globalAlertEmails: a global list of emails for alerts
Example Helm values.yaml:
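The example block is missing above; the following is a minimal sketch of the four global fields (all webhook URLs and email addresses are placeholders):

```yaml
global:
  notifications:
    alertConfigs:
      # optional; used for linkbacks in alert bodies
      frontendUrl: http://localhost:9090
      # optional; global Slack webhook, enabled by default if provided
      globalSlackWebhookUrl: https://hooks.slack.com/services/<placeholder>
      # optional; global Microsoft Teams webhook, enabled by default if provided
      globalMsTeamsWebhookUrl: https://example.webhook.office.com/<placeholder>
      # global list of emails for alerts
      globalAlertEmails:
        - recipient@example.com
        - additionalRecipient@example.com
```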
In addition to all global fields, every alert accepts optional individual ownerContact (a list of email addresses), slackWebhookUrl (if different from globalSlackWebhookUrl), and msTeamsWebhookUrl (if different from globalMsTeamsWebhookUrl) fields. Alerts will default to the global settings if these optional fields are not supplied.
Defines spend budgets and alerts on budget overruns.
Example Helm values.yaml:
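The example block is missing above; as a sketch (assuming the standard alert fields type, threshold, window, aggregation, and filter), a budget alert that triggers when the kubecost namespace spends more than $50 per day might look like:

```yaml
global:
  notifications:
    alertConfigs:
      alerts:
        # alert when daily spend for the kubecost namespace crosses $50
        - type: budget
          threshold: 50        # budget in configured currency units
          window: daily        # or 1d
          aggregation: namespace
          filter: kubecost
```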
Alerts when Kubernetes tenants, e.g. namespaces or label sets, are running below defined cost-efficiency thresholds.
The example below sends a Slack alert when any namespace spending is running below 40% cost efficiency and has spent more than $100 during the last day.
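A sketch of that alert, assuming the efficiencyThreshold and spendThreshold field names and a placeholder Slack webhook:

```yaml
global:
  notifications:
    alertConfigs:
      alerts:
        # Slack alert for namespaces running below 40% cost efficiency
        # that have spent more than $100 over the last day
        - type: efficiency
          efficiencyThreshold: 0.4   # alert below 40% efficiency
          spendThreshold: 100        # only consider tenants spending > $100
          window: 1d
          aggregation: namespace
          filter: '*'
          slackWebhookUrl: https://hooks.slack.com/services/<placeholder>
```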
Sends a recurring alert with a summary report of cost and efficiency metrics.
window values:
<N>d, where N in [1, 7): every N days
7d or weekly: 0:00:00 UTC every Monday
30d or monthly: 0:00:00 UTC on the first day of the month
aggregation values:
label requires the following format: label:<label_name>
annotation requires the following format: annotation:<annotation_name>
This example sends a recurring alert for allocation data for all namespaces every seven days:
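A sketch of that alert, assuming the recurringUpdate type name:

```yaml
global:
  notifications:
    alertConfigs:
      alerts:
        # recurring summary of allocation data for all namespaces every 7 days
        - type: recurringUpdate
          window: 7d
          aggregation: namespace
          filter: '*'
```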
Detects unexpected spend increases/decreases relative to historical moving averages.
Example Helm values.yaml:
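The example block is missing above; as a sketch (assuming the baselineWindow and relativeThreshold field names):

```yaml
global:
  notifications:
    alertConfigs:
      alerts:
        # alert on a 20% spend change vs. the trailing 30-day baseline
        - type: spendChange
          relativeThreshold: 0.20   # fractional change vs. baseline
          window: 1d
          baselineWindow: 30d       # days of history used as the baseline
          aggregation: namespace
          filter: default
```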
Defines asset budgets and alerts when Kubernetes assets overrun the threshold set.
Example Helm values.yaml:
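The example block is missing above; as a sketch (the cluster name in filter is hypothetical):

```yaml
global:
  notifications:
    alertConfigs:
      alerts:
        # alert when daily asset spend for a given cluster crosses $100
        - type: assetBudget
          threshold: 100       # budget in configured currency units
          window: 1d
          aggregation: cluster
          filter: cluster-one  # hypothetical cluster name
```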
Sends a recurring alert with a Kubernetes assets summary report.
window values:
<N>d, where N in [1, 7): every N days
7d or weekly: 0:00:00 UTC every Monday
30d or monthly: 0:00:00 UTC on the first day of the month
aggregation values:
label requires the following format: label:<label_name>
annotation requires the following format: annotation:<annotation_name>
Two example alerts, one which provides weekly summaries of Kubernetes asset spend data aggregated by cluster, and one which provides weekly summaries of asset spend data for one specific cluster:
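A sketch of those two alerts. The assetRecurringUpdate type name and the cluster name are assumptions; check your chart's documented alert types:

```yaml
global:
  notifications:
    alertConfigs:
      alerts:
        # weekly summary of asset spend aggregated by cluster
        - type: assetRecurringUpdate   # type name assumed
          window: weekly
          aggregation: cluster
          filter: '*'
        # weekly summary of asset spend for one specific cluster
        - type: assetRecurringUpdate   # type name assumed
          window: weekly
          aggregation: cluster
          filter: cluster-one          # hypothetical cluster name
```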
Defines cloud cost budgets and alerts when cloud spend overruns the threshold set.
Cluster health alerts occur when the cluster health score changes by a specific threshold. The health score is calculated based on the following criteria:
Low Cluster Memory
Low Cluster CPU
Too Many Pods
Crash Looping Pods
Out of Memory Pods
Failed Jobs
Example Helm values.yaml:
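The example block is missing above; as a sketch (assuming the health type name, with window and threshold as the doc's troubleshooting note requires):

```yaml
global:
  notifications:
    alertConfigs:
      alerts:
        # alert when the cluster health score changes by 5 or more
        - type: health
          window: 10m
          threshold: 5
```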
Diagnostic alerts fire when an event impacts Kubecost's product uptime. This feature can be enabled in seconds from a values file. The following events are grouped into distinct categories, each of which results in a separate alert notification:
Prometheus is unreachable
Kubecost Metrics Availability:
Kubecost exported metrics missing over last 5 minutes
cAdvisor exported metrics missing over last 5 minutes
cAdvisor exported metrics missing expected labels in the last 5 minutes
Kubestate Metrics (KSM) exported metrics missing over last 5 minutes
Kubestate Metrics (KSM) unexpected version
Node Exporter metrics are missing over last 5 minutes.
Scrape Interval: Prometheus self-scraped metrics missing over last 5 minutes
CPU Throttling detected on cost-model in the last 10 minutes
Clusters Added/Removed (Enterprise Multicluster Support Only)
Required parameters:
type: diagnostic
window: <N>m (configurable, N > 0)
Optional parameters:
diagnostics: an object containing specific diagnostic checks to run (default is true for all). See the configuration example below for options.
Example Helm values.yaml:
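The example block is missing above; as a sketch, the individual keys under diagnostics are illustrative (each toggles one of the checks listed earlier) and should be verified against your chart version:

```yaml
global:
  notifications:
    alertConfigs:
      alerts:
        - type: diagnostic
          window: 10m
          # keys below are illustrative; each enables/disables one check
          diagnostics:
            prometheusStatus: true
            kubecostMetrics: true
            cadvisor: true
            ksm: true
            nodeExporter: true
            scrapeInterval: true
            cpuThrottling: true
```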
Cluster Health Alerts and Kubecost Health Alerts work differently from other alert types. While other alerts monitor cost data for cost or efficiency anomalies, these two monitor the health of Kubecost itself, as well as the health of the cluster running Kubecost. For this reason, multiple alerts of these types cannot be created. In the UI, switches for these alert types can be toggled on or off, managing a single instance of each and allowing the settings of that instance to be adjusted.
There is no validation around Cluster Health Alerts. If a Health Alert configuration is invalid, it will appear to save, but will not actually take effect. Please check carefully that the alert has a Window and Threshold properly specified.
Global recipients specify a default fallback recipient for each type of message. If an alert does not define any email recipients, its messages will be sent to any emails specified in the Global Recipients email list. Likewise, if an alert does not define a webhook, its messages will be sent to the global webhook, if one is present. Alerts that do define recipients will ignore the global setting for recipients of that type.
The remaining alert types all target a set of allocation data with window, aggregation, and filter parameters, and trigger based on the target data. The table results can be filtered using the Filter alerts search bar next to + Create Alert. This input can filter based on alert name, type, aggregation, window, and/or filter.
Select + Create Alert to open the Create Alert window where you configure the details of your alert.
The fields for each alert type should resemble their corresponding Helm values in the above tables.
Alerts can also be edited, removed, and tested from the table. Editing opens a dialog similar to the alert creation dialog, for editing the chosen alert.
When creating an alert, you can have these alerts sent through email, Slack, or Microsoft Teams. You can customize the subject field for an email, and attach multiple recipients. Alerts sent via email will contain a PDF of your report which shows the Kubecost UI for your Allocation/Asset page(s). This can be helpful for distributing visual information to those without immediate access to Kubecost.
The Test arrow icons, as well as a separate Test Alert button in the Edit Alert window, can be used to issue a "test" alert. This can be useful to ensure that alerting infrastructure is working correctly and that an alert is properly configured. Issuing a test from the alert edit modal tests the alert with any modifications that have not yet been saved.
All times are in UTC. Alert send times are determined by parsing the supplied window parameter. Alert diagnostics with the next and last scheduled run times are available via <your-kubecost-url>/model/alerts/status.
Supported: the weekly and daily special cases, <N>d, and <M>h (1 ≤ N ≤ 7, 1 ≤ M ≤ 24). Currently unsupported: time zone adjustments, windows greater than 7d, and windows less than 1h.
An <N>d alert sends at 00:00 UTC N day(s) from now, i.e., N days from now rounded down to midnight. For example, a 5d alert scheduled on Monday will send on Saturday at 00:00, and subsequently the next Thursday at 00:00.
An <N>h alert sends at the earliest time of day after now that is a multiple of N. For example, a 6h alert scheduled at any time between 12 pm and 6 pm will send next at 6 pm, and subsequently at 12 am the next day.
If 24 is not divisible by the hourly window, the alert is scheduled at the next multiple of <N>h after now, starting from the current day at 00:00. For example, a 7h alert scheduled at 22:00 checks 00:00, 7:00, 14:00, and 21:00 before arriving at the next send time of 4:00 tomorrow.
Review these steps to verify alerts are being passed to the Kubecost application correctly.
Check /model/alerts/configs to ensure the alerts system has been configured properly.
Check /model/alerts/status to ensure alerts have been scheduled correctly.
The status endpoint returns all of the running alerts including schedule metadata:
scheduledOn: The date and time (UTC) that the alert was scheduled.
lastRun: The date and time (UTC) that the alert last ran checks (set to 0001-01-01T00:00:00Z if the alert has never run).
nextRun: The date and time (UTC) that the alert will next run checks.
lastError: If running the alert checks fails for unexpected reasons, this field will contain the error message.
If using Helm:
Run kubectl get configmap alert-configs -n kubecost -o json to view the alerts ConfigMap.
Ensure that the Helm values are successfully read into the ConfigMap under the alerts.json key of the data field.
Ensure that the JSON string is successfully mapped to the appropriate configs.
Confirm that Kubecost has received configuration data:
Visit the Alerts page in the Kubecost UI to view configured alert settings as well as any of the alerts configured from Helm.
Alerts set up in the UI will be overwritten by Helm values.yaml if the pod restarts.
Additionally, confirm that the alerts scheduler has properly parsed and scheduled the next run for each alert by visiting <your-kubecost-url>/model/alerts/status to view individual alert parameters as well as the next and last scheduled run times for individual alerts. Confirm that nextRun has been updated from "0001-01-01T00:00:00Z".
If nextRun fails to update, or alerts are not sent at the nextRun time, check pod logs by running kubectl logs $(kubectl get pods -n kubecost | awk '{print $1}' | grep "^kubecost-cost-analyzer.\{16\}") -n kubecost -c cost-model > kubecost-logs.txt
Common causes of misconfiguration include the following:
Unsupported CSV filters: spendChange alerts accept multiple filter values when comma-separated; other alert types do not.
Unsupported alert type: all alert type names are in camelCase. Check spelling and capitalization for all alert parameters.
Enable the Cluster Controller on that cluster and perform the action there.
If you have enabled the Cluster Controller, you can also perform immediate right-sizing by selecting Savings, then selecting Actions. On the Actions page, select Create Action > Cluster Sizing to receive immediate recommendations and the option to adopt them.
Continuous Cluster Right-Sizing is accessible via the Actions page. On the Actions page, select Create Action > Guided Sizing. This feature implements both cluster right-sizing and request right-sizing.
For a tutorial on using Guided Sizing, see .
Using Cluster Autoscaler on AWS may result in a similar error. See more .
Enable for any and all cloud service providers you wish to view clusters with
For detail on how Kubecost identifies clusters, see .
Parameter | Value(s) | Description |
---|
Parameter | Value(s) | Description |
---|
Parameter | Value(s) | Description |
---|
Parameter | Value(s) | Description |
---|
Parameter | Value(s) | Description |
---|
Parameter | Value(s) | Description |
---|
Parameter | Value(s) | Details |
---|
Unsupported aggregation parameters: see the doc for details.
Element | Description
---|---
Date Range | Will report Last 7 days by default. Manually select your start and end date, or choose a preset option
Aggregate By | Aggregate costs by one or several concepts. Add custom labels
Save/Unsave | Save or unsave the current report
Edit | Includes multiple filtering tools including cost metric and shared resources
Additional options icon | Additional options for opening and downloading reports
Filter | Description
---|---
Cluster | Limit results to workloads in a set of clusters with matching IDs. Note: clusterID is passed in values at install-time.
Node | Limit results to workloads where the node name is filtered for.
Namespace | Limit results to workloads in a set of Kubernetes namespaces.
Label | Limit results to workloads with matching Kubernetes labels. Namespace labels are applied to all of its workloads. Supports filtering by the `__unallocated__` field as well.
Service | Limit results to workloads based on Kubernetes service name.
Controller | Limit results to workloads based on Kubernetes controller name.
Controller Kind | Limit results to workloads based on Kubernetes controller type (DaemonSet, Deployment, Job, StatefulSet, ReplicaSet, etc.).
Pod | Limit results to workloads where the Kubernetes pod name is filtered for.
Metric | Description
---|---
CPU | The total cost of CPU allocated to this object, e.g. namespace or deployment. The amount of CPU allocated is the greater of CPU usage and CPU requested over the measured time window. The price of allocated CPU is based on cloud billing APIs or custom pricing sheets.
GPU | The cost of GPUs requested by this object, as measured by resource limits. Prices are based on cloud billing prices or custom pricing sheets for on-prem deployments.
RAM | The total cost of memory allocated to this object, e.g. namespace or deployment. The amount of memory allocated is the greater of memory usage and memory requested over the measured time window. The price of allocated memory is based on cloud billing APIs or custom pricing sheets.
Persistent Volume (PV) Cost | The cost of persistent storage volumes claimed by this object. Prices are based on cloud billing prices or custom pricing sheets for on-prem deployments.
Network | The cost of network traffic based on internet egress, cross-zone egress, and other billed transfer. Note: these costs must be enabled. When Network Traffic Costs are not enabled, node network costs from the cloud service provider's billing integration are spread proportionally based on cost-weighted usage.
Load Balancer (LB) Cost | The cost of the cloud-service load balancer that has been allocated.
Shared | The cost of shared resources allocated to this tenant. This field covers shared overhead, shared namespaces, and shared labels. Can be explored further via Inspect Shared Costs. Idle costs are not included in Shared costs.
Cost Efficiency | The percentage of requested CPU and memory dollars utilized over the measured time window. Values range from 0 to above 100 percent. Workloads with no requests but with usage, or workloads with usage greater than requests, can report efficiency above 100%.
Element | Description
---|---
Date Range (Last 7 days) | Will report Last 7 days by default. Manually select your start and end date, or pick one of twelve preset options
Aggregate By | Aggregate costs by one or several concepts. Add custom labels
Save/Unsave | Save or unsave the current report
Edit | Adjust cost metrics and how data is displayed
Additional options icon | Additional options for opening and downloading reports
Aggregation | Description
---|---
Account | The ID of the billing account your cloud provider bill comes from (e.g. AWS Management/Payer Account ID, GCP Billing Account ID, Azure Billing Account ID)
Provider | Cloud service provider (e.g. AWS, Azure, GCP)
Invoice Entity | Cloud provider account (e.g. AWS Account, Azure Subscription, GCP Project)
Service | Cloud provider services (e.g. S3, microsoft.compute, BigQuery)
Item | Individual items from your cloud billing report(s)
Labels | Labels/tags on your cloud resources (e.g. AWS tags, Azure tags, GCP labels)
Cost Metric | Description
---|---
Amortized Net Cost | Net Cost with upfront fees removed and amortized (default)
Net Cost | Costs inclusive of discounts and credits. Will also include one-time and recurring charges.
List Cost | CSP pricing without any discounts
Invoiced Cost | Pricing based on usage during the billing period
Amortized Cost | Effective/upfront cost across the billing period
Date Range | Manually select your start and end date, or choose a preset option. Default is Last 7 days. |
Aggregate By | Field by which to aggregate results, such as by Namespace, Cluster, etc. |
|
| Alert type |
|
| The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24. |
|
| Configurable, accepts |
|
| Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values |
|
| The cost threshold (i.e. budget) in configured currency units. |
|
| Cost metric type. Accepts |
|
| Alert type. |
|
| The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24. |
|
|
|
| Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values. |
|
| Cost threshold in configured currency units. |
|
| Alert type. |
|
| The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24. |
|
|
|
| Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values. |
|
| Optional. Efficiency threshold ranging from 0.0 to 1.0. |
|
| The cost threshold (i.e. budget) in configured currency units. |
|
| Alert type. |
|
| The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24. |
|
|
|
| Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values |
|
| Alert type. |
|
| The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24. |
|
|
|
| Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values. |
|
| Collect data from N days prior to queried items to establish cost baseline. Configurable, where N ≥ 1. |
|
| Percentage of change from the baseline (positive or negative) which will trigger the alert. Configurable where N ≥ -1. |
|
| Alert type |
|
| The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24. |
|
|
|
| Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values |
|
| The cost threshold (i.e. budget) in configured currency units. |
|
| Alert type |
|
| The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24. |
|
|
|
| Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values |
Kubecost will display volumes unused by any pod. You can consider these volumes for deletion, or move them to a cheaper storage tier.
You can access the Unclaimed Volumes page by selecting Savings in the left navigation, then selecting Manage unclaimed volumes.
Volumes will be displayed in a table, and can be sorted By Owner or By Namespace. You can view owner, storage class, and size for your volumes.
Using the Cluster dropdown, you can filter volumes connected to an individual cluster in your environment.
Budgets are a way of establishing spend limits for your clusters, namespaces, or labels. They can be created in moments using the Budgets dashboard.
Begin by selecting the New Budget button in the top right corner of the dashboard. A new window will display from the right side of your screen.
Provide the following fields:
Budget name: The name of your budget
Budget cap: The allotted amount of your budget per interval
The currency of your budget is unchangeable in the Budgets dashboard. To change currency type, go to Settings > Currency, then select Save at the bottom of the Settings page to apply changes. Changing the currency type will affect cost displays across all of Kubecost, not just the Budgets dashboard. Kubecost does not convert spending costs to other currency types; it only changes the symbol displayed next to the cost. For best results, configure your currency to match your spend.
Determine the length of your budget and reset date using the two dropdowns under the Budget cap text box. Budgets can be either Weekly or Monthly, and can reset on any day of the week/month. This means you don't need to recreate your budgets repeatedly and can align them with your schedules or processes.
From the first dropdown, select whether this budget will apply to a namespace, cluster, or a label. In the second dropdown, choose the individual item in that category. When Namespace or Cluster has been selected, the dropdown menu should attempt to autocomplete by searching for all potential items.
Labels need to be provided in a key:value format that describes the object the budget applies to.
Budget Actions are an optional method of better monitoring your budgets. You can use Actions to create an alert when your budget hits a certain percentage threshold, and send out an email, Slack, and/or Microsoft Teams alert.
Budget Actions by default check against the limits every 8 hours.
To begin, select New Action. Select your Trigger percentage value (leaving your Trigger percentage at 100 will only alert you once the budget has been exceeded). Then, provide any emails or webhooks where you would like to receive your alerts. Select Save.
If you are interested in implementing additional alerts to monitor further spending or Kubecost health, read our Alerts doc.
Finalize your budget by selecting Save. Your budget has been created and should appear on the dashboard.
Once your budget has been created, it will immediately display your current spending. There are multiple ways of inspecting or adjusting your existing budgets.
Selecting Details in the row of a specific budget will open a window displaying all details for your budget, including current spending, budget remaining, reset date, and any existing Actions.
You can also select View detailed breakdown to display an Allocations query for your budgeted item, or Download Budget Report to download your budget as a PDF file.
Selecting Edit in the row of a specific budget will open a window allowing you to edit all details about your budget, similar to when you initially created it. All details are able to be changed here.
Selecting Delete will open the Delete Budget window. Confirm by selecting Delete.
Configurable, accepts all aggregations supported by the .